Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta
« Thinking an academic package for Azure | Main | MapReduce as burstable low-cost CPU »
Wednesday
Mar102010

You don't know how much you'd miss an O/C mapper till you get one

When we started moving our enterprise app toward Windows Azure, we quickly realized that scalable enterprise cloud apps were tough to develop, real tough.

Windows Azure wasn't at fault here, quite the opposite actually, but the cloud computing paradigm itself is tough to develop enterprise apps. Indeed, scalability in enterprise apps can't be solved by just pilling up tons of memcached servers.

Enterprise apps aren't about scaling out some super-simplistic webapp to a billion users who will be performing reads 99.9% of the time, but rather scaling out complex business logic and accordingly complex business data along.

This lead us to implement Lokad.Cloud, an open source .NET O/C mapper (object-to-cloud) much similar in the spirit to O/R mapper such as NHibernate but tailored for NoSQL storage.

I am proud to announce that Lokad.Cloud has reached its v1.0 milestone.

As a matter of fact, you've probably never heard of O/C mappers, so I will explain why relying a decent O/C mapper should be a primary concern for any ambitious cloud app developer.

To illustrate the point, I am going to list a few subtleties that arise as soon you start using the Queue Storage. As far cloud apps are concerned, Queue Storage is one of the most powerful and most handy abstraction to achieve true scale out behaviors.

Microsoft provides the StorageClient which is basically a .NET wrapper around the REST API offered by the Queue Storage. Let see how an O/C mapper implemented on top of the StorageClient can make queues even better:

  • Strong typed messages: Queue Storage deals with binary messages, not with objects. Obviously, you don't want to entangle your business logic with serialization/deserialization logic. Business logic only cares about the semantic of the processing, not about the underlying data format used for persistence while transiting data over the cloud. The O/C mapper is here to provide a strong typed view of the Queue Storage.
  • Overflowing messages: Queue Storage upper bounds messages to 8kB. This limitation is fine as the Blob Storage is available to deal with large (even gigantic) blobs. Yet again, you don't want to mix storage contingencies (8kB message limit) with your business logic. The O/C mapper lets large message overflow into the Blob Storage.
  • Garbage collection: you might think that manually handling overflowing messages is just fine. Not quite so. What will happen to your overflowing messages, conveniently stored in the Blob Storage, if the queue (for good or ill reasons) happens to be cleared? Simple, you end up with a cloud storage leak: dead piece of data start to pill-up into your storage, and you get charge for it. In order to avoid such situation, you need a cloud garbage collector that makes sure that expired data are automatically collected. The O/C mapper embeds a storage garbage collector.
  • Auto-deletion of messages: Messages should not only be retrieved from the Queue, but also deleted once processed. Following the GC idea, developers should not be expected to delete queue messages when the message processing goes OK, much like you don't have to care about destroying objects getting out of reach. The O/C mapper auto-deletes queue messages upon process completion.
  • Delayed messages: Queue Storage does not offer any simple way to schedule a message to reappear in the queue at a specified time. You can come up with your own custom logic, but again, why should the business logic even bother about such details. The O/C mapper supports delayed messages so that you don't have to think about it.
  • Poisoned queues: that one is deadly subtle one. A poisoned queue message refers to a message that leads to a faulty processing, typically an uncaught exception being thrown by the business logic while trying to process the message. The problem is intricately coupled to the good behavior of the Queue, indeed, if a retrieved message fails to be deleted within a certain amount of time, the message will reappear in the Queue. This behavior is excellent for building robust cloud apps. but deadly if not properly handled. Indeed, faulty messages are going to fail and to reappear over and over, consuming ever increasing cloud resources for no good reason. In a way, poisoned messages represents processing leaks. The O/C mapper detects poisoned messages and isolate them for further investigation and eventual reprocessing once the code is fixed.
  • Abandoning messages: In the clouds, you should not expect VM instances to stay up forever.  In addition to hardware faults, the fabric might decide anytime to shutdown one of your instance.  If a worker instance gets shut down while processing a message, then the processing will be lost until the message reappears in the Queue. Nevertheless, such extra delay might negatively impact your business service level, as an operation that was supposed to take only half a minute might suddenly take 1h (the expiration delay of your message). If the VM gets the chance to be notified of the upcoming shutdown, the O/C mapper abandons in-process messages, making them available for processing again without waiting for expiration.

I have only illustrated here a few point about Queue Storage, but Blob Storage, Table Storage, Management API, Performance Monitoring, ...  also need to rely on higher level abstractions as offered by an O/C mapper such as Lokad.Cloud to become fluently usable.

Don't waste any more time crippling your business logic with cloud contingencies, and start using some O/C mapper. I suggest Lokad.Cloud, but I admit this is biased viewpoint.

Reader Comments (1)

I might have a biased viewpoint too (:D), but I have to say that Lokad.Cloud really brings Azure Storage to an excellent level. Thanks for making it free and open-source!

March 10, 2010 | Unregistered CommenterDario Solera
Comments for this entry have been disabled. Additional comments may not be added to this entry at this time.