MapReduce as burstable low-cost CPU

About two months ago, when Mike Wickstrand set up a UserVoice instance for Windows Azure, I immediately posted my own suggestion concerning MapReduce. MapReduce is a distributed computing concept initially published by Google in late 2004.

Against all odds, my suggestion, driven by the needs of Lokad, made it into the Top 10 most requested features for Windows Azure (well, 9th rank, with about 20 times fewer votes than the No. 1 request for scaled-down hosting).

Lately, I had the opportunity to discuss this item further with folks at Microsoft who are gathering market feedback on it. In the software business, users frequently ask for features that they turn out not to want in the end. The difficulty is that a proposed feature may or may not correctly address the initial problem.

While preparing for the interview, I realized that, to some extent, I had fallen into the same trap when asking for MapReduce. Actually, we have already reimplemented our own MapReduce equivalent, which is not that hard thanks to the Queue Storage.
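To make the point concrete, here is a minimal sketch of how a queue plus a blob store can carry most of the MapReduce plumbing. It is written in Python with in-memory stand-ins: the `blobs` dictionary plays the role of Blob Storage and `work_queue` the role of Queue Storage. Every name in it (`run_mapreduce`, `word_count_map`, `word_count_reduce`, the `input/` and `mapped/` prefixes) is hypothetical; this is neither Lokad's code nor the Azure Storage API, and it runs in a single process where real workers would run in parallel.

```python
import queue
from collections import defaultdict


def run_mapreduce(items, map_fn, reduce_fn, batch_size=2):
    """Toy MapReduce driven by a work queue and a blob store (both in memory)."""
    blobs = {}                   # stand-in for Blob Storage: name -> payload
    work_queue = queue.Queue()   # stand-in for Queue Storage: messages are blob names

    # Split the input into batches, store each batch as a blob, enqueue its name.
    for start in range(0, len(items), batch_size):
        name = f"input/{start // batch_size}"
        blobs[name] = items[start:start + batch_size]
        work_queue.put(name)

    # Map phase: in a real deployment, many workers would run this loop
    # independently, each pulling the next message from the shared queue.
    mapped_names = []
    while not work_queue.empty():
        name = work_queue.get()
        out_name = name.replace("input/", "mapped/")
        blobs[out_name] = [pair for item in blobs[name] for pair in map_fn(item)]
        mapped_names.append(out_name)

    # Shuffle: group the intermediate (key, value) pairs by key.
    grouped = defaultdict(list)
    for out_name in mapped_names:
        for key, value in blobs[out_name]:
            grouped[key].append(value)

    # Reduce phase: one call per key.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def word_count_map(line):
    # Emit one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]


def word_count_reduce(word, counts):
    # Sum the partial counts collected for this word.
    return sum(counts)


result = run_mapreduce(["a b a", "b c", "a"], word_count_map, word_count_reduce)
```

The sketch leaves out everything that makes a real deployment interesting, such as retries, failed workers and parallel reducers, but it shows why the framework itself becomes thin once durable queues and blobs are available.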

I care very little about framework specifics, be it MapReduce, Hadoop, DryadLinq or something not yet invented. Lokad has no cloud legacy calling for a specific implementation.

What I do care about is much simpler. In order to deliver truckloads of forecasts, Lokad needs:

  1. large scale CPU
  2. burstable CPU
  3. low cost CPU

Windows Azure is already doing a great job addressing Point 1. Thanks to the massive Microsoft investments in Azure datacenters, thousands of VMs can already be instantiated if needed.

When asking for MapReduce, I was instead expressing my concern for Point 2 and Point 3. Indeed,

Then, low-cost CPU is somewhat at odds with burstable CPU, as illustrated by the Reserved Instances pricing of Amazon.

As far as low-level cloud computing components are concerned, lowering costs usually means giving up on expressiveness as a trade-off:

Seeking large scale burstable CPU, here is the list of items that we would be very willing to surrender in order to lower the CPU pricing:

Obviously, there are plenty of options to drag the price down in exchange for a more constrained framework. Since Azure has the unique opportunity to deliver some very .NET oriented features, I am especially interested in approaches that would leverage sandboxed code execution - giving up entirely on the OS itself to focus purely on the .NET Runtime.

I am very eager to see how Microsoft will be moving forward on this request. Stay tuned.

Reader Comments (2)

I was wondering if you got a chance to check the Cloud MapReduce implementation before reimplementing your own MapReduce. March 2, 2010 | Alex Popescu

Hi Alex, yes I had a look at a couple of papers before reimplementing my own version. Yet, the trick is that once you have the Queue Storage and the Blob Storage in your hands, MapReduce suddenly becomes a lot simpler to implement. In fact, most of the actual MapReduce complexity just gets abstracted away by the Azure Storage itself. Code length is roughly equivalent to the one outlined in the Cloud MapReduce paper. March 3, 2010 | Joannes Vermorel