Serialization in the cloud: SharedContract vs. SharedType
Every time developers decide not to go for relational databases in cloud apps, they end up with custom storage formats. In my (limited) experience, that is one of the inescapable laws of cloud computing.
Hence, serialization plays a very important role in cloud apps, either for persistence or for transient computations where input data needs to be distributed among several computing nodes.
In the case of Lokad.Cloud, our O/C mapper (object to cloud), the blob storage abstraction relies on seamless serialization. Looking for a serialization solution, we initially went the quick & dirty way with the BinaryFormatter, which has been available since .NET 1.1, that is to say forever in the .NET world.
BinaryFormatter is easy to set up, but pain lies ahead:
1. No support for versioning: what happens to your data when your code changes?
2. Since it embeds all the .NET type info, it’s not really compact, even for small data structures (if you just want to serialize an array of 1M doubles, it’s OK though, but that’s not the typical situation).
3. It offers little hope for interoperability of any kind. Even interactions between distinct .NET Framework versions can be subject to problems.
A robust serialization approach is needed
With the advent of WCF (Windows Communication Foundation), Microsoft teams came up with a much improved vision for serialization. In particular, they introduced two distinct serialization behaviors:
- SharedContract through the DataContractSerializer.
- SharedType through the NetDataContractSerializer.
Both serializers produce XML streams but there is a major design gap between the two.
Shared contract assumes that the contract (the schema, in XML terminology) will be available at deserialization time. In essence, it’s a static spec while the implementation is subject to evolution. The benefit is that versioning, and even performance to some extent, can be expected to be great, as the schema is both static and closed.
Shared type, on the other hand, assumes that the concrete .NET implementation will be available at deserialization time. The main benefit of the shared type approach is its expressiveness, as basically any .NET object graph can be serialized (objects just need to be marked as [Serializable]). Yet, as the price to pay for this expressiveness, versioning does suffer.
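To make the difference concrete, here is a minimal sketch written in Python rather than .NET. It does not use the WCF serializers and only mimics the two behaviors. Every name in it (`Customer`, `to_contract_xml`, `from_contract_xml`, `to_typed_xml`, `from_typed_xml`, `KNOWN_TYPES`) is hypothetical. The contract-style functions depend only on agreed element names, while the type-style functions record the concrete class name in the stream and need that class at read time.

```python
import xml.etree.ElementTree as ET


class Customer:
    """Hypothetical type used only for this illustration."""

    def __init__(self, name, country="unknown"):
        self.name = name
        self.country = country


# --- Shared-contract style: the stream only carries agreed element names. ---
def to_contract_xml(customer):
    root = ET.Element("Customer")
    ET.SubElement(root, "Name").text = customer.name
    ET.SubElement(root, "Country").text = customer.country
    return ET.tostring(root, encoding="unicode")


def from_contract_xml(xml_text):
    root = ET.fromstring(xml_text)
    # A field missing from older data falls back to a default,
    # so the implementation can evolve without breaking old streams.
    return Customer(
        name=root.findtext("Name"),
        country=root.findtext("Country", default="unknown"),
    )


# --- Shared-type style: the stream names the concrete class. ---
KNOWN_TYPES = {"Customer": Customer}


def to_typed_xml(obj):
    root = ET.Element("Object", {"type": type(obj).__name__})
    for field, value in vars(obj).items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_typed_xml(xml_text, known_types=KNOWN_TYPES):
    root = ET.fromstring(xml_text)
    # Raises KeyError if the class was renamed or removed since the write.
    cls = known_types[root.get("type")]
    obj = cls.__new__(cls)  # rebuild the instance without calling __init__
    for child in root:
        setattr(obj, child.tag, child.text)
    return obj
```

In this sketch, a stream written before `Country` existed, such as `<Customer><Name>Ada</Name></Customer>`, still loads through `from_contract_xml`, whereas `from_typed_xml` fails as soon as the recorded class name can no longer be resolved.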
Serialization and O/C mapper
Our O/C mapper is designed not only to enable persistence (and performance), but also to ease the setup of transient computations to be run over the cloud.
As far as persistence is concerned, you really want to go for a SharedContract approach. Otherwise, data migration from old .NET types to new .NET types is going to heavily mess up your design through a massive violation of the DRY principle (Don’t Repeat Yourself): you would typically need to have old and new types side by side.
Then, for transient computations, SharedType is a much friendlier approach. Indeed, why should you care about data schema and versioning if you can just discard old data and regenerate it as part of your migration? That’s going to be a lot easier, provided that outdated data is considered expendable.
As a final concern for the O/C mapper, it should be noted that CPU is really cheap compared to storage. Hence, you don’t want to store raw XML in the cloud, but rather GZipped XML (a CPU vs. storage tradeoff that pays off under cloud pricing).
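The tradeoff can be illustrated with a short sketch, again in Python as a stand-in for the .NET code. The helper names `build_sample_xml`, `pack` and `unpack` are hypothetical, and the sample document is invented for the illustration. The actual ratio depends entirely on the data, so no figure is claimed here.

```python
import gzip
import xml.etree.ElementTree as ET


def build_sample_xml(count=1000):
    """Build a repetitive XML document, as serialized object lists tend to be."""
    root = ET.Element("Orders")
    for i in range(count):
        order = ET.SubElement(root, "Order")
        ET.SubElement(order, "Id").text = str(i)
        ET.SubElement(order, "Quantity").text = str(i % 7)
    return ET.tostring(root, encoding="utf-8")


def pack(xml_bytes):
    """Compress before upload: spends CPU to save stored bytes."""
    return gzip.compress(xml_bytes)


def unpack(blob):
    """Decompress after download."""
    return gzip.decompress(blob)


raw = build_sample_xml()
blob = pack(raw)
print(len(raw), len(blob))  # raw size vs. compressed size, in bytes
```

Since XML repeats the same element names over and over, it tends to compress well, which is why the extra CPU spent on each read and write is usually worth it.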
The case of Lokad.Cloud
For Lokad.Cloud, we will provide a GZipped XML serializer based on a combination of both the DataContractSerializer and the NetDataContractSerializer, to get the best of both worlds. DataContractSerializer will be used by default, but it will be possible to switch to NetDataContractSerializer through a simple attribute (the idea has been borrowed from Aaron Skonnard).
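The "default plus opt-in attribute" design could look roughly like the sketch below. It is written in Python and is not the Lokad.Cloud implementation: the `use_type_serializer` decorator plays the role of the attribute, and `to_xml`, `serialize`, `Forecast` and `PartialSum` are hypothetical names standing in for the two .NET serializers and for user types.

```python
import gzip
import xml.etree.ElementTree as ET


def use_type_serializer(cls):
    """Hypothetical marker, playing the role of the opt-in attribute."""
    cls._use_type_serializer = True
    return cls


def to_xml(obj, include_type):
    # With include_type=False this mimics the contract-based default;
    # with include_type=True the concrete class name is recorded.
    attrib = {"type": type(obj).__name__} if include_type else {}
    root = ET.Element("Data", attrib)
    for field, value in vars(obj).items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def serialize(obj):
    """Pick the behavior from the marker, then GZip the resulting XML."""
    typed = getattr(type(obj), "_use_type_serializer", False)
    return gzip.compress(to_xml(obj, include_type=typed))


class Forecast:  # persisted data: default, contract-style
    def __init__(self, value):
        self.value = value


@use_type_serializer
class PartialSum:  # transient computation state: opted in to type-style
    def __init__(self, total):
        self.total = total
```

Keeping the contract-based behavior as the default means that persisted data gets the versioning-friendly path unless a type explicitly opts out of it.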