Thinking about the Table Storage of Windows Azure
Disclaimer: I am not exactly a Table Storage expert. In this post, I am just trying to sort out my own thoughts about this service offered with Windows Azure. Check my follow-up post.
Soon after we announced the release of Lokad.Cloud, our new O/C mapper (object to cloud), folks on the Azure Forums raised the question of Table Storage.
Although it might be surprising, Lokad.Cloud does not provide - yet - any support for Table Storage.
At this point, I feel very uncertain about Table Storage, not in the sense that I do not trust Microsoft to end up with a finely tuned product, but rather at the patterns and practices level.
Basically, the Table Storage is an entity storage that features three special system properties:
- PartitionKey: a grouping criterion - data having the same PartitionKey being kept close.
- RowKey: the unique identifier for the entity.
- Timestamp: the equivalent of Blob Storage ETag.
So far, I get the feeling that many developers are attracted to Table Storage for the wrong reasons. In particular, Table Storage is not a substitute for your plain old SQL tables:
- No support for transactions.
- No support for keys (let alone foreign keys).
- No possible refactoring (properties are frozen at setup).
If you are looking for those features, you’re most likely betting on the wrong horse. You should be considering SQL Azure instead.
Then, some might argue that SQL Azure won’t scale above 10GB (at least considering the current pricing plans offered by Microsoft). Well, the catch is that Table Storage won’t scale either, at least not unless you are very cautious with your queries.
AFAIK, the only indexed column of the Table Storage is the RowKey. Thus, any filtering criterion based on custom entity properties is likely to get abysmal performance as soon as your Table Storage gets large.
Well, sort of: the most probable scenario is likely to be worse, as your queries are just going to time out after exceeding 60s.
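To make the difference concrete, here is a toy in-memory sketch. It is not the Table Storage API: the table is modeled as a plain dictionary keyed by the system keys, and every name in it (`build_table`, `get_by_key`, `filter_by_property`, the `Country` and `Amount` properties) is hypothetical. It only illustrates why a keyed lookup stays cheap while a filter on a custom property has to visit every entity.

```python
from typing import Any, Dict, List, Optional, Tuple

Entity = Dict[str, Any]
Key = Tuple[str, str]  # (PartitionKey, RowKey), modeled as the dictionary key


def build_table(n: int) -> Dict[Key, Entity]:
    """Build a toy table of n entities spread over 10 partitions."""
    table: Dict[Key, Entity] = {}
    for i in range(n):
        key = (f"p{i % 10}", f"r{i:06d}")
        # Hypothetical custom properties; 1 entity out of 100 is "FR".
        table[key] = {"Country": "FR" if i % 100 == 0 else "US", "Amount": i}
    return table


def get_by_key(table: Dict[Key, Entity], partition_key: str,
               row_key: str) -> Optional[Entity]:
    """Keyed access: a single lookup, whatever the table size."""
    return table.get((partition_key, row_key))


def filter_by_property(table: Dict[Key, Entity], name: str,
                       value: Any) -> Tuple[List[Entity], int]:
    """Filter on a custom property: every entity has to be visited.

    Returns the matching entities and the number of entities visited.
    """
    visited = 0
    matches: List[Entity] = []
    for entity in table.values():
        visited += 1
        if entity.get(name) == value:
            matches.append(entity)
    return matches, visited
```

With 1,000 entities, `get_by_key` touches one entry while `filter_by_property` visits all 1,000 to return a handful of matches; on a real service that scan is what would eventually hit the timeout.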
Again, my goal here is not to bash the Table Storage, but it must be understood that the Table Storage is clearly not a magically scalable equivalent of the plain old SQL tables.
Back to Lokad.Cloud: we did not consider adding Table Storage because we did not feel the need for it, even though our forecasting back-end probably sits quite high on the complexity spectrum of current cloud apps.
Indeed, the Blob Storage is surprisingly powerful, with very predictable performance too:
- Storing complex objects is a non-issue with a serializer at hand.
- A blob name prefix is a very efficient substitute to the PartitionKey.
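The two points above can be sketched with a toy in-memory blob store. This is not the Blob Storage API and not the Lokad.Cloud API: `ToyBlobStore`, its methods, and the `customers/fr/...` naming scheme are all hypothetical, and JSON stands in for whatever serializer you have at hand.

```python
import json
from typing import Any, Dict, List


class ToyBlobStore:
    """In-memory stand-in for a blob container (illustration only)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, name: str, obj: Any) -> None:
        """Serialize an object and store it under the given blob name."""
        self._blobs[name] = json.dumps(obj).encode("utf-8")

    def get(self, name: str) -> Any:
        """Fetch a blob by name and deserialize it."""
        return json.loads(self._blobs[name].decode("utf-8"))

    def list_names(self, prefix: str) -> List[str]:
        """List blob names starting with the prefix, in sorted order.

        This toy version scans all names in memory; the point is only
        that the prefix plays the grouping role of a PartitionKey.
        """
        return sorted(n for n in self._blobs if n.startswith(prefix))


# The prefix "customers/fr/" acts as the grouping criterion and the
# last name segment identifies the object within the group.
store = ToyBlobStore()
store.put("customers/fr/0001", {"name": "Alice", "orders": [12, 15]})
store.put("customers/fr/0002", {"name": "Bob", "orders": []})
store.put("customers/us/0001", {"name": "Carol", "orders": [7]})

french_customers = [store.get(n) for n in store.list_names("customers/fr/")]
```

Here `french_customers` holds the two objects stored under the `customers/fr/` prefix, nested lists included, without any schema being declared.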
Basically, it seems to me that, for now, any Table Storage operation can be executed with the same performance on the Blob Storage. Later on, when Table Storage starts supporting secondary indexes, this situation is likely to evolve, but in the meantime I still cannot think of a single situation that would definitively favor Table Storage over Blob Storage.
Reader Comments (2)
There are a number of points in here that are quite inaccurate (and make me wonder how much time you’ve actually spent with Table Storage):
- The PartitionKey and RowKey both make up the primary key
- Properties are not “frozen at setup” - in fact, you can have an arbitrary set of properties on an entity and change that schema at any time by updating the entity
- Partitions represent a physical boundary - so queries within a partition are quite fast
- Table storage does support transactions (within a table partition)
- The scalability by partition happens transparently - so you don’t need to come up with your own scheme, or work out how many partitions to have, or be responsible for moving a partition to a new physical location when it gets to a certain size.
September 15, 2009 | Michael Hart
Thanks Michael. I had not figured out that Table Storage had progressed so much in the last CTP (as limited transactions are now supported).
September 15, 2009 | joannes