A few design tips for your NoSQL app
Since Lokad migrated to Windows Azure about 18 months ago, we have relied almost exclusively on NoSQL - namely Blob Storage, Table Storage and Queue Storage. Similar cloud storage abstractions exist for all major cloud providers; you can think of them as NoSQL as a Service.
It took us significant effort to redesign our apps around NoSQL. Indeed, cloud storage isn’t a new flavor of SQL; it’s a radically different paradigm, and it required in-depth adjustments to the core architecture of our apps.
In this post, I will try to summarize the gotchas we collected while (re)designing Lokad.com.
You need an O/C (object to cloud) mapper
NoSQL services are orders of magnitude simpler than your typical SQL service. As a consequence, the impedance mismatch between your object-oriented code and the storage dialect is also much lower than with SQL; this follows directly from the relative lack of expressiveness of NoSQL.
Nevertheless, introducing an O/C mapper was a major complexity buster. At present, we no longer access cloud storage directly, and the O/C mapper layer is a major bonus for abstracting away many subtleties such as retry policies, MD5, queue message overflow, …
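To give a feel for what such a layer hides, here is a minimal Python sketch. It is not Lokad's actual mapper: `InMemoryBlobStore`, `TransientStorageError` and `ObjectCloudMapper` are hypothetical names made up for this illustration, and the in-memory store merely stands in for a real cloud client. The point is only that retries and MD5 checks live in one place instead of being scattered through the app.

```python
import hashlib
import json
import time


class TransientStorageError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, throttling)."""


class InMemoryBlobStore:
    """Hypothetical stand-in for a cloud blob client; not a real SDK."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, payload, md5):
        self._blobs[name] = (payload, md5)

    def get(self, name):
        return self._blobs[name]


class ObjectCloudMapper:
    """Keeps retries, hashing and encoding out of the application code."""

    def __init__(self, store, max_attempts=3, base_delay=0.1):
        self._store = store
        self._max_attempts = max_attempts
        self._base_delay = base_delay

    def _with_retry(self, operation):
        for attempt in range(1, self._max_attempts + 1):
            try:
                return operation()
            except TransientStorageError:
                if attempt == self._max_attempts:
                    raise
                # Exponential backoff between attempts.
                time.sleep(self._base_delay * 2 ** (attempt - 1))

    def save(self, name, obj):
        payload = json.dumps(obj, sort_keys=True).encode("utf-8")
        md5 = hashlib.md5(payload).hexdigest()
        self._with_retry(lambda: self._store.put(name, payload, md5))

    def load(self, name):
        payload, md5 = self._with_retry(lambda: self._store.get(name))
        # Verify integrity before handing the object back to the caller.
        if hashlib.md5(payload).hexdigest() != md5:
            raise ValueError(f"MD5 mismatch for blob {name!r}")
        return json.loads(payload.decode("utf-8"))


mapper = ObjectCloudMapper(InMemoryBlobStore())
mapper.save("forecasts/42", {"sku": "A-1", "quantity": 12})
print(mapper.load("forecasts/42"))
```

The application code only ever sees `save` and `load`; swapping the retry policy or the hashing scheme touches a single class.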
Performance is obtained mostly by design
NoSQL is not only simpler but also more predictable when it comes to performance. However, that does not mean a solution built on top of NoSQL automatically benefits from scalability and performance - quite the opposite, actually. NoSQL comes with strict built-in limitations. For example, you can’t expect more than 20 updates / second on a single blob, which is almost ridiculously low compared to its SQL counterpart.
Your design needs to embrace the strengths of NoSQL and be really cautious about not hitting bottlenecks. The good news is that those are much easier to spot. Indeed, no later optimization will save your app from abysmal performance if the storage architecture doesn’t match the dominant I/O patterns of your app (see the Table Storage or the 100x cost factor).
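As one possible illustration of designing around a per-blob update limit, here is a small Python sketch of a sharded counter: increments are spread over several blobs and summed on read. This is a generic technique, not something described above as Lokad's design; `InMemoryBlobs` and `ShardedCounter` are hypothetical names, and the read-modify-write below is not atomic (a real store would need optimistic concurrency, which is left out here).

```python
import random


class InMemoryBlobs:
    """Hypothetical stand-in for blob storage, counting writes per blob."""

    def __init__(self):
        self.values = {}
        self.writes = {}

    def read(self, name):
        return self.values.get(name, 0)

    def write(self, name, value):
        self.values[name] = value
        self.writes[name] = self.writes.get(name, 0) + 1


class ShardedCounter:
    """Spreads increments over several blobs so that no single blob
    has to absorb every update."""

    def __init__(self, blobs, name, shard_count=16):
        self._blobs = blobs
        self._shards = [f"{name}/shard-{i}" for i in range(shard_count)]

    def increment(self, amount=1):
        # Pick a shard at random; not atomic in this simplified sketch.
        shard = random.choice(self._shards)
        self._blobs.write(shard, self._blobs.read(shard) + amount)

    def total(self):
        # Reads are more expensive: one read per shard.
        return sum(self._blobs.read(shard) for shard in self._shards)


blobs = InMemoryBlobs()
counter = ShardedCounter(blobs, "page-views")
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

The trade-off is explicit: writes get cheaper per blob, reads get more expensive. Whether that is a good deal depends entirely on the dominant I/O pattern of the app.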
Go for a contract-based serializer
A serializer, i.e. a component that lets you turn an arbitrary object graph into a serialized byte stream, is extremely convenient for NoSQL. In particular, it provides a near-seamless way to let your object-oriented code interact with the storage. In many ways, the impedance mismatch between objects and NoSQL is much lower than it was between objects and SQL.
Sometimes, however, serializers are nearly too powerful. In particular, it’s easy to serialize objects that are part of the runtime, which can prove brittle over time. Indeed, upgrading the runtime might end up breaking your serialization patterns. That’s why I advise going for simple yet explicit contract-based serialization schemes.
Although we did use a lot of XML in our early days on the cloud, we are now migrating away from XML in favor of JSON, Protocol Buffers or ad hoc high-density binary encodings, which in our experience provide a better tradeoff between readability, flexibility and performance.
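To make "explicit contract" a bit more concrete, here is a minimal Python sketch using JSON. The `Customer` entity, its field names and the `"v"` version field are all assumptions made up for this example. The idea is that the stored format is spelled out by hand rather than derived from whatever the runtime's object layout happens to be (as a runtime-native serializer such as Python's `pickle` would do).

```python
import json
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    display_name: str

    def to_contract(self):
        # Field names on the wire are spelled out here, so renaming a
        # Python attribute does not silently change the stored format.
        return {"v": 1, "id": self.customer_id, "name": self.display_name}

    @classmethod
    def from_contract(cls, data):
        if data.get("v") != 1:
            raise ValueError(f"unsupported contract version: {data.get('v')!r}")
        return cls(customer_id=data["id"], display_name=data["name"])


def serialize(customer):
    return json.dumps(customer.to_contract(), sort_keys=True).encode("utf-8")


def deserialize(payload):
    return Customer.from_contract(json.loads(payload.decode("utf-8")))


payload = serialize(Customer(7, "Acme"))
print(payload)  # b'{"id": 7, "name": "Acme", "v": 1}'
print(deserialize(payload))
```

It is a few more lines than handing the object to a generic serializer, but the stored bytes now depend only on the contract, not on the class internals.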
Entity isolation is the easiest path to versioning
One mistake Lokad made in its early NoSQL days was applying too much of the DRY principle (Don’t Repeat Yourself). Indeed, sharing the same class between entities is a sure way to end up with painful versioning issues later on. Touching entities once data has been serialized with them is always somewhat risky, because you can end up with data that you can’t deserialize any more.
Since the schema evolution required for one entity doesn’t necessarily match the evolution of the other entities, you ought to keep them apart upfront. Hence, I suggest giving up on DRY early - when it comes to entities - to ease later evolutions.
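Here is a small Python sketch of what keeping entities apart can look like. The `Order` and `Invoice` entities, their fields and the `"v"` version marker are hypothetical. Both carry an address, but each owns its own copy of the fields instead of sharing an `Address` class, so the invoice can gain a country field without touching a single stored order.

```python
import json


def order_to_json(order_id, street, city):
    # Order owns its address fields; nothing is shared with Invoice.
    return json.dumps({"v": 1, "order_id": order_id,
                       "ship_street": street, "ship_city": city})


def order_from_json(text):
    data = json.loads(text)
    return data["order_id"], data["ship_street"], data["ship_city"]


def invoice_to_json(invoice_id, street, city, country):
    # Invoice moved to version 2 (added country) without touching Order.
    return json.dumps({"v": 2, "invoice_id": invoice_id,
                       "bill_street": street, "bill_city": city,
                       "bill_country": country})


def invoice_from_json(text):
    data = json.loads(text)
    if data["v"] == 1:
        # Data written before the change: upgrade it on read.
        data["bill_country"] = "unknown"
    return (data["invoice_id"], data["bill_street"],
            data["bill_city"], data["bill_country"])


old_invoice = ('{"v": 1, "invoice_id": 9, '
               '"bill_street": "1 Main St", "bill_city": "Paris"}')
print(invoice_from_json(old_invoice))  # (9, '1 Main St', 'Paris', 'unknown')
```

The duplication is real, but it is the cheap kind: the upgrade logic stays local to one entity, and data serialized for the other entities remains readable.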
With proper design, aka CQRS, the need for SQL drops to near-zero
Over the last two decades, SQL has been king. As a consequence, nearly all apps embed SQL structural assumptions very deep into their architecture, making a relational database an irreplaceable component - by design.
This aspect was a surprise to us, as we initiated our cloud migration extensively leveraging SQL databases. Now, as we are gaining maturity at developing cloudy apps, we are gradually phasing those databases out: not because of performance or capabilities, simply because they aren’t needed anymore.
Reader Comments (2)
Good article. I’ve read it while searching for patterns over nosql entity versioning. Still, i can’t get used to not sharing part of the entity and only version changing attributes (=generic entity leafs) . Still, i’m not sure whether this is the best pattern to use. Can i ask about the magnitude of versioning that you have been using: no of entities/no of related entities/versions?.
June 3, 2012 | Gabi
Hi Gabi, Lokad doesn’t have more than a few hundred entities, even considering all our sub-systems (as of June 2012). Then, we don’t have that many changes either. I suspect that less than 1/4 of the entities are ever changed. However, this still amounts to dozens of subtle incremental changes. Overall, we are pushing roughly one persistence change per week or so. Hope it helps.
June 4, 2012 | Joannes Vermorel