Table Storage gotcha in Azure
Table Storage is a powerful component of the Windows Azure Storage. Yet, I feel that there is quite a significant friction working directly against the Table Storage, and it really calls for more high level patterns.
Recently, I have been toying more with the v1.0 of the Azure tools released in November’09, and I would like to share a couple of gotchas with the community hoping it will save you a couple of hours.
Gotcha 1: no REST level .NET library is provided
Contrary to other storage services, there is no .NET library provided as a wrapper around the raw REST specification of the Table Storage. Hence, you have no choice but to go for ADO.NET.
This situation rather frustrating because ADO.NET does not really reflect the real power of the Table Storage. Intrinsically, there nothing fundamentaly wrong with ADO.NET, it just suffers the law of leaky abstractions, and yes, the table client is leaking.
Gotcha 2: constraints on Table names are specific
I would have expected all the storage units (that is to say queues, containers and tables) in Windows Azure to come with similar naming constraints. Unfortunately it’s not the case, as table names do not support hyphens for example.
Gotcha 3: table client performs no client-side validation
If your entity has properties that do not match the set of supported property types then properties get silently ignored. I got burned through a int property that I was naively expecting to be supported. Note that I am perfectly fine with the limitations of the Table Storage, yet, I would have expected the table client to throw an exception instead of silently ignoring the problem.
Similarly, since the table client performs no validation, DataServiceContext.SaveChangesWithRetries behaves very poorly with the default retry policy as a failing call due to, say, and entity that already exists in the storage, is going to attempted again and again, as if it was a network failure. In this sort of situation, you really want to fail fast, not to spend 180s re-attempting the operation.
Gotcha 4: no batching by default
By default DataServiceContext.SaveChanges does not save entities in batch, but performs 1 storage call for each entity. Obviously, this is a very inefficient approach if you have many entities. Hence, you should really make sure that SaveChanges is called with the option SaveChangeOptions.Batch.
Gotcha 4: paging takes a lot of plumbing
Contrary to Blob Storage library that abstracts away most nitty-gritty details such as the management of continuation tokens, the table client does not. You are forced into a lot of plumbing to perform something as simple as paging through entities.
Then, back to the method SaveChanges, if you need to save more than 100 entities at once, you will have to deal yourself with the capacity limitations of the Table Storage. Simply put, you will have to split your calls into smaller ones: the table client doesn’t do that for you.
Gotcha 5: random access to many entities are once takes even more plumbing
As outlined before, the primary benefit of the Table Storage is to provide a cloud storage much more suited than the Blob Storage for fine-grained data access (up to 100x cheaper actually). Hence, you really want to grab entities by batch of 100 whenever possible.
Turns out that retrieving 100 entities following a random access pattern (within the same partition obviously) is really far from being straightforward. You can check my solution posted on the Azure forums.
Gotcha 6: table client support limited tweaks through events
It took me a while to realize that such customization was possible in the first place as those events feel like outliers in the whole StorageClient design. It’s about the only part where events are used, and leveraging side-effects on events is usually considered as really brittle .NET design.
Stay tuned for an O/C mapper to be included in Lokad.Cloud for Table Storage. I am still figuring out how to deal with overflowing entities.