Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta
« Google App Engine becoming much more like Windows Azure? | Main | Telling the difference between cloud and smoke »
Tuesday
Apr052011

A few design tips for your NoSQL app

Since the migration of Lokad toward Windows Azure about 18 months ago, we have been near exclusively relying on NoSQL - namely Blob Storage, Table Storage and Queue Storage. Similar cloud storage abstractions exist for all major cloud providers, you can think of them as NoSQL as a Service.

It took us a significant effort to redesign our apps around NoSQL. Indeed, cloud storage isn't a new flavor of SQL, it's a radically different paradigm and it required in-depth adjustment of the core architecture of our apps.

In this post, I will try to summarize  gotchas we grabbed while (re)designing Lokad.com.

You need an O/C (object to cloud) mapper

NoSQL services are orders of magnitude simpler than your typical SQL service. As a consequence, the impedance mismatch between your object oriented code and the storage dialect is also much lower compared to SQL; this is a direct consequence of the relative lack of expressiveness of NoSQL.

Nevertheless, introducing an O/C mapper was a major complexity buster. At present time, we no more access cloud storage directly, and the O/C mapper layer is a major bonus to abstract away may subtleties such as retry policies, MD5, queue message overflow, ...

Performance is obtained mostly by design

NoSQL is not only simpler but more predictable as well when it comes to performance. However, it does not mean that a solution build on top of NoSQL automatically benefit from scalability and performance - quite the opposite actually.  NoSQL comes with strict built-in limitations. For example, you can't expect more than 20 updates / second on a single blob, which is near ridiculously low compared to its SQL counterpart.

Your design needs to embrace the strengths of NoSQL and be really cautious about not hitting bottlenecks. Good news, those are much easier to spot. Indeed, no later optimization will save your app from abysmal performance if the storage architecture doesn't match dominant I/O patterns of your app (see the Table Storage or the 100x cost factor).

Go for contract-based serializer

A serializer, aka a component that let you turn an arbitrary object graph into a serialized byte stream, is extremely convenient for NoSQL. In particular, it provides a near-seamless way to let your object-oriented code interact with the storage. In many ways, the impedance mismatch objects vs NoSQL is much lower than it was for objects vs SQL.

Although, sometimes, serializers are nearly too powerful. In particular, it's easy to serialize objects part of the runtime which can prove brittle over time. Indeed, upgrading the runtime might end-up breaking your serialization patterns. That's why I advise to go for simple yet explicit contract-based serialization schemes.

Although we did use a lot of XML in our early days on the cloud, we are now migrating away from XML in favor of JSON, Protocol Buffers or adhoc high-density binary encoding that provides better readability vs flexibility vs performance tradeoff in our experience.

Entity isolation is easiest path to versioning

One early mistake of Lokad in our early NoSQL day was apply too much of DRY principle (Don't Repeat Yourself).  Indeed, sharing the same class between entities is a sure way to end-up with painful versioning issues later on.  Indeed, touching entities once data has been serialized with them is always somewhat risky, because you can end-up with data that you can't deserialize any more.

Since the schema evolution required for one entity doesn't necessarily match the evolution of the other entities, you ought to keep them apart upfront. Hence, I suggest to give up on DRY early - when it comes to entities - to ease later evolutions.

With proper design, aka CQRS, needs for SQL drop to near-zero

Over the last two decades, SQL has been king. As a consequence, nearly all apps embed SQL structural assumptions very deep into their architecture, making a relational database an irreplaceable component - by design.

Yet,  we have find out that when the app deeply embraces concepts such as CQRS, event sourcing, domain driven design and task-based UI, then there is no more need for SQL databases.

This aspect was a surprise to us, as we initiated our cloud migration extensively leveraging SQL databases. Now, as we are gaining maturity at developing cloudy apps, we are gradually phasing those databases out: not because of performance or capabilities, simply because they aren't needed anymore.

Reader Comments (2)

Good article. I've read it while searching for patterns over nosql entity versioning.
Still, i can't get used to not sharing part of the entity and only version changing attributes (=generic entity leafs) .
Still, i'm not sure whether this is the best pattern to use.
Can i ask about the magnitude of versioning that you have been using: no of entities/no of related entities/versions?

June 3, 2012 | Unregistered CommenterGabi

Hi Gari, Lokad doesn't have more than a few hundreds entities, even considering all our sub-systems (as of June 2012). Then, we don't have that many changes either. I suspect that less than 1/4 of the entities are ever changed. However, this still amounts for dozens of incremental subtle changes. Overall, we are roughly pushing one persistence change per week or so. Hope it helps.

June 4, 2012 | Registered CommenterJoannes Vermorel
Comments for this entry have been disabled. Additional comments may not be added to this entry at this time.