I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)


Entries in cloudcomputing (30)


Cloud 2.0, what future for cloud computing?

Almost one year ago, I posted a personal review about Azure, Amazon, Google App Engine, VMware and the others. One year later, the cloud computing market is definitely taking shape. Patterns are emerging along with early standardization attempts.

My personal guess is that the cloud computing market (not the technology) will somehow reach v1.0 status at the very end of 2009, when the last big player - that is to say, Microsoft - will have finally launched its own cloud.

My personal definition for cloud computing v1.0 is a complex technology mash-up that involves a series of computing resource abstractions:

  • Scalable key-value storage (1)

  • Scalable queues

  • Computing nodes on demand (1)

  • Scalable functional CPU (à la MapReduce)

  • Scalable cache (2)

  • Sharded relational DB (3)

(1) Both storage and computing nodes come in two flavors depending on whether the cloud supports geo-localization of its resources. In particular, read-only geo-localized scalable storage - also known as a content delivery network - provides advanced automated geo-localization, while computing nodes are still geo-localized manually.

(2) At present, virtually no major cloud provider supports a distributed cache - but considering the success of and community interest in Memcached, I am guessing that all major cloud providers will be supporting this service by the end of 2010.
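To make the cache idea concrete, here is a minimal sketch of the cache-aside pattern such a service enables, with a plain Python dict standing in for a Memcached-style client (all names invented for illustration):

```python
# Cache-aside pattern: check the distributed cache first, fall back to
# the authoritative (slow) store on a miss, then populate the cache.
# A plain dict stands in for the distributed cache here.

class CacheAsideStore:
    def __init__(self, backing_store):
        self.cache = {}                   # stand-in for the distributed cache
        self.backing_store = backing_store
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no slow fetch
        self.misses += 1
        value = self.backing_store[key]   # expensive fetch in real life
        self.cache[key] = value
        return value

store = CacheAsideStore({"user:42": "Joannes"})
store.get("user:42")   # first call misses and fills the cache
store.get("user:42")   # second call is served from the cache
```

The whole point of offering this as a managed cloud service is that the `self.cache` dict above becomes a shared, scalable tier instead of per-process memory.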

(3) Again, virtually no major cloud provider supports a sharded relational DB at the moment, but considering the importance of relational data in virtually every single enterprise app, I am also guessing that most major cloud providers will offer it by the end of 2010.
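A sharded relational DB mostly boils down to routing each row to one of several ordinary databases. A minimal sketch of hash-based shard routing, where the shard count and key naming are assumptions for illustration:

```python
import hashlib

# Hash-based shard routing: the application derives the shard from the
# sharding key, so each shard stays a plain relational database.
SHARD_COUNT = 4

def shard_for(key: str) -> int:
    # md5 gives a stable, well-spread hash across processes and machines
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT

# Rows with the same customer ID always land on the same shard,
# so per-customer queries never need to span shards.
shard = shard_for("customer-1001")
```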

With those services in place, I will consider the cloud v1.0 milestone reached.

Guessing what lies further ahead, beyond 2010, is a difficult game, as cloud computing technology is still evolving at a very fast pace.

Yet, I think (or rather I guess) that there will be two major forces for cloud computing 2.0:

  • Drastic productivity improvements through mature environments.

  • Fine grained geo-localization for near real-time latencies (say 10ms).

Indeed, at present time, cloud computing is mostly an option available for projects carrying little or no legacy, as the migration toward the cloud represents a complete redesign of most apps.

Furthermore, cloud computing v1.0 involves loads of hard-core development skills and a significant amount of knowledge about distributed computing. This is a vast barrier that will slow down the adoption rate of the cloud.

Thus, a key aspect of cloud computing 2.0 will be to obtain drastic productivity improvements through mature programming environments that significantly facilitate the design and testing of cloud apps. Considering the breadth of issues involved in migrating existing apps toward the cloud, I believe that this task will require no less investment than the actual design of the cloud v1.0.

Then, if cloud v1.0 is vastly scalable, it is also still far from real-time interactions (*), as latency is, at best, only marginally better than what is obtained with classical server setups. Indeed, geo-localization is made available, but at a very coarse-grained level (typically continents), and rather in a spirit of compliance with local regulations, as opposed to latency fine-tuning.

(*) Check OnLive for an early attempt at low-latency cloud infrastructure.

I feel that the potential for on-demand computing resources made available nearly locally - allowing near real-time interactions, from mobile apps to urban commodities - is huge. UI responsiveness is addictive, and the competition between cloud providers will reflect that.

Yet, lowering the latency will probably mean multiplying cloud data centers around the world so that most people (who will remain as blissfully ignorant about cloud computing, as they are about water supply) can enjoy loads of services with improved user experience.

To achieve that, I suspect that major cloud providers will end up with dozens (and ultimately hundreds) of data centers, starting with the largest and wealthiest cities.

Considering that data centers typically cost hundreds of millions of dollars, Cloud 2.0 will represent investments no less significant than what was made historically to set up the power grid.


Azure Management API concerns

Disclaimer: this post is based on my (limited) understanding of the Azure Management API; I started reading the docs only a few hours ago.

Microsoft has just released the first preview of their Management API for Windows Azure.

As far as I understand the content of the newly released API (check the MSDN reference), it just lets you automate what was previously done manually through the Windows Azure Console.

At this point, I have two concerns:

  1. No way to adjust your instance count for a given role.

  2. Auto-management (*) involves loads of quirks.

(*) Auto-Management: the ability for a cloud app to scale itself up and down depending on the workload.
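As a sketch of what auto-management means in practice, here is one plausible scaling rule - sizing the worker pool from the queue backlog - with all thresholds invented for illustration, not taken from any Azure API:

```python
# Auto-management sketch: derive the desired instance count for a role
# from the current queue backlog, within fixed bounds. All parameters
# below are illustrative assumptions.

def target_instance_count(queue_length: int,
                          items_per_instance: int = 100,
                          min_instances: int = 1,
                          max_instances: int = 20) -> int:
    # Ceiling division: enough instances to drain the backlog,
    # never below the floor, never above the cap.
    wanted = -(-queue_length // items_per_instance)
    return max(min_instances, min(max_instances, wanted))

# An auto-managed app would call this periodically, then ask the
# Management API to converge the role toward the returned count.
print(target_instance_count(250))   # backlog of 250 items
```

This is exactly the kind of control loop that becomes painful when the Management API offers no simple way to adjust the instance count.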

I am not really satisfied with this Management API, as it does not seem to address the basic requirements for easily scaling my (future) cloud app up or down.

Being able to deploy a new Azure package programmatically is nice, but we were already doing that in Lokad.Cloud. Thanks to the AppDomain restart trick, I suspect we will keep deploying that way, as deployment through Lokad.Cloud is likely to remain about 100x faster.

That being said, the Management API is powerful, but it does not seem to address auto-management, at least not in a simple fashion.

The single feature I was looking forward to was being able to adjust the number of instances on demand through a very simple API that would have let me do three things:

  1. Create a new instance for the current role.

  2. Shut down the current instance.

  3. Get the status of instances attached to the current role.

That's it!
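Sketched as code, the entire wished-for API fits in an interface this small (all names here are hypothetical; nothing below reflects the actual Management API):

```python
from abc import ABC, abstractmethod

# The whole instance-count API I am wishing for, as a hypothetical
# interface. None of these calls exist in the real Management API.

class RoleInstanceManager(ABC):
    @abstractmethod
    def start_instance(self) -> None:
        """Create a new instance for the current role."""

    @abstractmethod
    def shutdown_instance(self) -> None:
        """Shut down the current instance."""

    @abstractmethod
    def get_instance_statuses(self) -> list:
        """Status of every instance attached to the current role."""

# An in-memory fake is enough to show how small the surface is.
class FakeManager(RoleInstanceManager):
    def __init__(self):
        self.instances = []

    def start_instance(self):
        self.instances.append("Running")

    def shutdown_instance(self):
        self.instances.pop()

    def get_instance_statuses(self):
        return list(self.instances)

manager = FakeManager()
manager.start_instance()
manager.start_instance()
print(manager.get_instance_statuses())
```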

Notice that I am not asking here to deploy a new package, or to change the production/staging status. I just need to be able to tweak the instance count.

In particular, I would expect a non-SSL REST API for those limited operations, much like the other REST APIs available for the cloud storage.

Indeed, the security concerns related to instance count management are nearly identical to the ones related to the cloud storage. Well, not quite, as in practice securing your storage is far more sensitive.


Table Storage or the 100x cost factor

Until very recently, I was a bit puzzled by the Table Storage. I couldn't manage to get a clear understanding of how the Table Storage could be a killer option against the Blob Storage.

I get it now: Table Storage can cut your storage costs by 100x.

As outlined by other folks already, I/O costs typically represent more than 10x the storage costs if your objects weigh less than 6KB (the computation has been done for the Amazon S3 pricing, but the Windows Azure pricing happens to be nearly identical).

Thus, if you happen to have loads of fine-grained objects to store in your cloud - say, less-than-140-characters tweets - you're likely to end up with an insane I/O bill if you store those fine-grained items in the Blob Storage.

But don't lower your hopes: that's precisely the sort of situation the Table Storage has been designed for, as this service lets you insert/update/delete entities in batches of 100 through Entity Group Transactions.
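The 100x factor is plain arithmetic: with a flat per-transaction price, batching 100 entities per transaction divides the transaction bill by 100. A sketch with an assumed per-transaction rate (the ratio, not the dollar amounts, is the point):

```python
# Back-of-the-envelope: writing 10 million tweet-sized items.
# The per-transaction price is an illustrative assumption, roughly
# in line with 2009-era cloud pricing ($0.01 per 10,000 transactions).

ITEMS = 10_000_000
PRICE_PER_TRANSACTION = 0.01 / 10_000

# Blob Storage: one storage transaction per item.
blob_cost = ITEMS * PRICE_PER_TRANSACTION

# Table Storage: 100 entities per Entity Group Transaction.
table_cost = (ITEMS // 100) * PRICE_PER_TRANSACTION

print(f"blob: ${blob_cost:,.2f}  table: ${table_cost:,.2f}  "
      f"ratio: {blob_cost / table_cost:.0f}x")
```

Whatever the actual per-transaction rate, the ratio between the two bills stays at 100x as long as batches are full.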

This fine-grained item orientation is reflected in the limitations that apply to entities:

  • A single entity should not weigh more than 1MB.

  • A single group transaction should not weigh more than 4MB.

  • A single entity property should not weigh more than 64KB.

Situations where your cloud apps end up processing loads of small items - the threshold being at about 60KB - are likely to be good candidates for the Table Storage.
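To stay within those limits, a client has to pack entities into valid group transactions. A sketch of such a packing routine, using the limits quoted above (the function name is invented):

```python
# Pack entities into Entity Group Transactions without exceeding
# 100 entities or 4MB per batch (limits quoted in the post).

MAX_ENTITIES_PER_BATCH = 100
MAX_BATCH_BYTES = 4 * 1024 * 1024

def make_batches(entity_sizes):
    """entity_sizes: iterable of per-entity payload sizes in bytes."""
    batches, current, current_bytes = [], [], 0
    for size in entity_sizes:
        # Flush the current batch if adding this entity would
        # break either the entity-count or the byte-size limit.
        if (len(current) == MAX_ENTITIES_PER_BATCH
                or current_bytes + size > MAX_BATCH_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

# 250 one-kilobyte entities fit in 3 batches: 100, 100 and 50.
print([len(b) for b in make_batches([1024] * 250)])
```

A real client would also group entities by partition key, since group transactions only span a single partition; that detail is omitted here.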

We will definitely try to reflect this in our favorite O/C mapper.


O/C mapper - object to cloud

When we started to port our forecasting technology toward the cloud, we decided to create a new open source project called Lokad.Cloud that would isolate all the pieces of our cloud infrastructure that weren't specific to Lokad.

The project was initially subtitled Lokad.Cloud - .NET execution framework for Windows Azure, as its primary goal was to provide a cloud equivalent of the plain old Windows Services. We quickly ended up with QueueServices, which happen to be quite handy for designing horizontally scalable apps.

But more recently, the project has taken a new orientation, becoming more and more an O/C mapper (object to cloud) inspired by the terminology used by O/R mappers. When it comes to horizontal scaling, a key idea is that data and data processing cannot be considered in isolation anymore.

With classic client-server apps, persistence logic is not supposed to invade your core business logic. Yet, when your business logic happens to become so intensive that it must be distributed, you end up in a very cloudy situation where data and data processing become closely coupled in order to achieve horizontal scalability.

That being said, close coupling between data and data processing isn't doomed to be an ugly mess. We have found that obsessively object-oriented patterns applied to the Blob Storage can make the code both elegant and readable.
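As an illustration of the O/C mapper idea, here is a sketch of a typed blob container that serializes objects to named blobs, with a dict standing in for the Blob Storage (`BlobSet` and its methods are invented names, not the actual Lokad.Cloud API):

```python
import json

# O/C mapper sketch: a typed "blob set" serializes objects to named
# blobs inside a container, so business code manipulates objects
# while the mapper handles the storage plumbing.

class BlobSet:
    def __init__(self, storage, container):
        self.storage = storage        # stand-in for the Blob Storage
        self.container = container

    def put(self, name, obj):
        self.storage[f"{self.container}/{name}"] = json.dumps(obj)

    def get(self, name):
        return json.loads(self.storage[f"{self.container}/{name}"])

cloud = {}
forecasts = BlobSet(cloud, "forecasts")
forecasts.put("sku-42", {"sku": "sku-42", "demand": [3, 5, 8]})
print(forecasts.get("sku-42"))
```

The appeal of the pattern is that the serialization format and blob naming scheme live in one place instead of leaking into every piece of business logic.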

Lokad.Cloud is entering its beta stage with the release of the 0.2.x series, check it out.


Lokad.Cloud - alpha version released

One of the major little-known weaknesses of cloud computing is development productivity. Indeed, developing over the cloud ain't easy, and as complexity grows, the management of a complex, fully-distributed app may become a nightmare. At Lokad, as we started migrating a fairly complex technology, we got the feeling that we needed strong patterns and practices - tailored for the cloud - so that we wouldn't get lost halfway through the migration process.

That's how Lokad.Cloud was born.

In short, Lokad.Cloud is a framework that can be used to rationalize and speed up the development of back-end apps over Windows Azure. Read more in the announcement made directly on the Windows Azure Forums.