I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in amazon (3)


Thoughts about the Windows Azure pricing

Microsoft has recently unveiled its pricing for Windows Azure. In short, Microsoft has aligned exactly with the pricing offered by Amazon. CPU costs $0.12 / h, meaning that a single instance running 24/7 for a month costs $86.40, which is fairly expensive compared to classical hosting providers, where you can get more for roughly half the price.
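The monthly figure follows directly from the hourly rate. Here is a minimal sketch, assuming a 30-day month of continuous uptime (720 hours) and using `Decimal` to avoid floating-point rounding; the function name is hypothetical:

```python
from decimal import Decimal

HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instance running 24/7

def monthly_instance_cost(hourly_rate: str, instances: int = 1) -> Decimal:
    """Cost of keeping `instances` VMs running for a full month at `hourly_rate` USD/h."""
    return Decimal(hourly_rate) * HOURS_PER_MONTH * instances

print(monthly_instance_cost("0.12"))       # 86.40 (one Azure instance, one month)
print(monthly_instance_cost("0.12") * 12)  # 1036.80 (the same instance over a year)
```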

But well, this situation was expected, as Microsoft probably does not want to start a price war with its business partners who still sell dedicated Windows Server hosting. The current Azure pricing is high enough to deter most companies, except those that happen to have peaky needs.

To me, the Azure pricing is fine except in 3 areas:

  • Each Azure WebRole costs at least $86.40 / month no matter how little web traffic you get (reminder: with Azure, you need a distinct WebRole for every distinct webapp). This situation is caused by the architecture of Windows Azure, where a VM gets dedicated to every WebRole. Compared to Google App Engine (GAE), the situation does not look too good for Azure: with GAE, hosting a low-traffic webapp is virtually free. Free vs. $1000 / year is likely to make a difference for most small and medium businesses, especially if you end up with a dozen webapps to cover all your needs.

  • Cloud Storage operations are expensive: the storage itself is rather cheap at $0.15 / GB / month, but the cost of $0.01 per 10K operations might be a killer for cloud apps that rely intensively on small storage operations. Yes, one can argue that AWS is not cheaper on this point, but this is not entirely true, as AWS provides other services such as block storage (EBS), which comes with a 10x lower price per operation (EBS could be used to lower the pressure on blob storage whenever possible).

  • Raw CPU at $0.12 / h is expensive and Azure offers no solution to lower this price whereas AWS offers CPU at $0.015 / h through their MapReduce service.
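To see how the per-WebRole floor and the per-operation fee weigh on a small business, here is a rough monthly bill estimator using the prices quoted above ($0.12 / h per instance, $0.15 / GB / month, $0.01 per 10K storage operations). It is a simplified sketch: bandwidth and other charges are ignored, and the workload in the usage example is hypothetical.

```python
from decimal import Decimal

# Windows Azure prices as quoted in this post.
CPU_PER_HOUR = Decimal("0.12")
STORAGE_PER_GB_MONTH = Decimal("0.15")
PRICE_PER_10K_OPS = Decimal("0.01")
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month

def azure_monthly_bill(webapps: int, stored_gb: int, storage_ops: int) -> dict:
    """Rough monthly bill: each webapp needs its own always-on WebRole."""
    compute = CPU_PER_HOUR * HOURS_PER_MONTH * webapps
    storage = STORAGE_PER_GB_MONTH * stored_gb
    operations = PRICE_PER_10K_OPS * Decimal(storage_ops) / 10_000
    return {"compute": compute, "storage": storage, "operations": operations}

# Hypothetical small business: a dozen low-traffic webapps,
# 10 GB of data and 1 billion small storage operations per month.
bill = azure_monthly_bill(webapps=12, stored_gb=10, storage_ops=1_000_000_000)
for item, cost in bill.items():
    print(f"{item:>10}: ${cost:,.2f}")
```

Under these assumptions, the bill is dominated by the fixed compute floor ($1,036.80) and the storage operations ($1,000.00), while the 10 GB of storage itself costs only $1.50.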

Obviously, those pricing weaknesses closely reflect cloud technologies that are missing from Azure (at the moment). The MapReduce issue will be fixed when Microsoft ports DryadLinq to Azure. Block storage and shared low-cost web hosting might also be on their way (although I have little info on that matter). As a side note, the Azure Cache Provider might be a killer tool to reduce the pressure on the cloud storage (but its pricing is not known yet).
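The way a cache relieves cloud storage can be illustrated with a read-through cache that only hits the billed storage on a miss. This is a hypothetical in-process sketch: `CountingStorage` stands in for a blob or table store, and neither class reflects the API of the Azure Cache Provider.

```python
from collections import OrderedDict

class CountingStorage:
    """Hypothetical stand-in for cloud storage; counts billable read operations."""

    def __init__(self, data):
        self._data = dict(data)
        self.operations = 0

    def get(self, key):
        self.operations += 1  # each call would be billed by the provider
        return self._data[key]

class ReadThroughCache:
    """LRU read-through cache: storage is only queried on a cache miss."""

    def __init__(self, storage, capacity=1000):
        self._storage = storage
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._storage.get(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

storage = CountingStorage({f"item-{i}": i for i in range(100)})
cache = ReadThroughCache(storage, capacity=50)
# Hypothetical workload: 10,000 reads concentrated on 20 hot keys.
for n in range(10_000):
    cache.get(f"item-{n % 20}")
print(storage.operations)  # 20 billable reads instead of 10,000
```

A real deployment would also have to invalidate or refresh entries when the underlying data is written, which this sketch ignores.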

As a final note, it's interesting to see that cloud computing pricing really depends on the quality of the software used to run the cloud. Better software typically allows computing hardware to be delivered at much lower cost, almost 10x lower in many situations.


Cloud Computing vs. Hardware as a Service

In a previous post, I discussed why I believe that cloud computing is going to be an arena for big players, and not a friendly place for the little guys.

Recently, many people have told me about such-and-such small company that supposedly delivers cloud computing too, with a service that would match the ones offered by big players.

Basically, the discussion goes like this:

Hey, we too are able to instantiate virtual machines on-demand. We have some nice virtual machine deployment scripts, a nice WebUI to administrate all the nodes, we are now matching the Amazon offer.

Nope, you’re not.

Basically, what those little players are offering is simply Hardware as a Service. For years now, computing hardware has been more or less an on-demand commodity. My favorite host can typically set up a new server in 48h, and I can cancel my subscription anytime (although I will have to pay for the entire month). Some more aggressive hosting providers offer fully automated server setup, and your new server is usually available in less than 1h.

Now, those small companies calling themselves cloud providers are able to set up new servers in seconds instead of minutes, and the trick to do that is simple: they use virtualization and deployment scripts.

But, in my opinion, this isn’t cloud computing; this is just hardware as a service with lower overhead, both at the infrastructure level and for the system administrators themselves.

So, what is so radically different about cloud computing?

In my opinion, the radical novelty of cloud computing is the promise that you won’t have to worry about resource allocation anymore.

In particular, I don’t want to figure out whether I need 1, 2, 3 or 42 computing nodes to handle a massive web traffic surge from Slashdot; I just want to tell my cloud provider:

Here is the script for my web page, do whatever is needed to ensure good performance, and send me the bill at the end of the month.

Note that this is exactly what Google App Engine is doing: it relieves web developers from the burden of figuring out how they are going to scale their web apps. Google does the magic for them, so that web developers can focus on the specific value of their web apps instead of the complex infrastructure actually needed to achieve scalability.

Quoting Thomas Serval at a round table about Azure a few months ago:

In the past, each time we have multiplied the traffic by 10 on our applications, we have been forced more or less to rewrite the application from scratch. The promise of cloud computing is to let you achieve unlimited scalability from day one.

Obviously, cloud computing isn’t magic: applications will still need to be carefully designed to achieve unlimited scalability. Yet I believe that, thanks to the cloud computing frameworks currently being published, it won’t be that hard in the future.

Thus cloud computing is not Hardware as a Service simply because Hardware as a Service does not do anything about scalability in itself.

The true benefit of cloud computing is to provide what I would call scalable computing abstractions. Those abstractions represent physical resources such as CPU, memory or bandwidth, but with additional constraints (usually structural constraints) that make it actually possible to provide an infinitely scalable instance of the desired resource.
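A minimal sketch of such a structural constraint, assuming a toy in-memory model: if the only operations allowed are `put` and `get` by key, every key can be routed to a partition by hashing it, so capacity grows by adding partitions and no operation ever needs to see the whole dataset. `PartitionedStore` and the partition count are hypothetical; real services add replication and rebalancing on top.

```python
import hashlib

class PartitionedStore:
    """Toy key-value store whose keys are spread across independent partitions."""

    def __init__(self, partition_count: int):
        self.partitions = [dict() for _ in range(partition_count)]

    def _partition_for(self, key: str) -> dict:
        # A stable hash (unlike Python's salted hash()) routes a key to the
        # same partition in every process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key: str, value: bytes) -> None:
        self._partition_for(key)[key] = value

    def get(self, key: str) -> bytes:
        return self._partition_for(key)[key]

store = PartitionedStore(partition_count=8)
for i in range(1000):
    store.put(f"blob-{i}", f"payload {i}".encode())
print(store.get("blob-42"))                # b'payload 42'
print([len(p) for p in store.partitions])  # keys spread over the 8 partitions
```

The simple modulo routing reshuffles most keys when the partition count changes; production systems typically use consistent hashing or explicit partition maps to avoid that.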

For example, nowadays nearly all cloud providers include a distributed and reliable hashtable implementation in their offer: S3 for Amazon, Blob Storage for Windows Azure, ... FIPFO is another popular scalable storage abstraction: First-In Probably First Out, i.e. queues, but without deterministic ordering. As long as you rely only on those scalable storage abstractions, you should not have to care about the scalability of your storage.
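FIPFO semantics can be sketched with a queue spread over several partitions, where each dequeue pulls from a randomly chosen non-empty partition: messages come out roughly, but not strictly, in insertion order, and that relaxation is what lets the partitions work independently. This is a hypothetical toy model, not a description of any provider's actual queue implementation.

```python
import random
from collections import deque

class FipfoQueue:
    """First-In Probably First-Out: FIFO within a partition, no global ordering."""

    def __init__(self, partition_count: int, seed: int = 0):
        self.partitions = [deque() for _ in range(partition_count)]
        self.rng = random.Random(seed)
        self._next = 0

    def enqueue(self, message) -> None:
        # Round-robin writes spread the load over all partitions.
        self.partitions[self._next].append(message)
        self._next = (self._next + 1) % len(self.partitions)

    def dequeue(self):
        """Pop the oldest message of a random non-empty partition, or None if empty."""
        candidates = [p for p in self.partitions if p]
        if not candidates:
            return None
        return self.rng.choice(candidates).popleft()

queue = FipfoQueue(partition_count=4, seed=42)
for i in range(12):
    queue.enqueue(i)
received = []
while (message := queue.dequeue()) is not None:
    received.append(message)
print(received)  # every message exactly once, order only approximately preserved
```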

So far, scalable storage abstractions have been the primary focus of most cloud providers. Yet, I suspect that the next battle will be scalable CPU abstractions.

Indeed, Amazon has recently unveiled its new Amazon Elastic MapReduce, and, like other people, I believe that MapReduce will be a game changer. First, Amazon is delivering CPU at 0.015 USD/h while its competitors are still above 0.10 USD/h at the time of writing. Then, if we consider that the Amazon native MapReduce implementation is going to be way more efficient than custom in-house implementations (simply because Amazon folks have the time and the experience needed to get the load balancing settings right), then Amazon has just divided the CPU price by 10.

Then, what I see as the killer benefit of MapReduce is that I don’t have to care anymore about how many nodes I need.

For example, at Lokad, we have tons of time-series to process. Let’s say that we want to extract seasonality patterns out of 100 million time-series (each time-series ranging from a few hundred to a few thousand points). With MapReduce, I just have to specify the algorithm to process a single time-series and pass the huge time-series collection as argument. The cloud infrastructure handles all the magic for me. In particular, I don’t have to care anymore about nodes crashing along the way, or about dynamically expanding / shrinking the number of computing nodes.
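The programming model described here can be sketched as follows: write a function for a single time-series, then hand the whole collection to a map operation. In this hypothetical sketch, the built-in `map` plays the role of the cloud infrastructure (which would distribute the calls over nodes and retry failed ones), and `seasonal_profile` is a deliberately naive seasonality estimate, not Lokad's actual algorithm.

```python
from functools import partial
from statistics import mean

def seasonal_profile(series, period=12):
    """Naive seasonality estimate: mean value at each position of the cycle,
    divided by the overall mean (1.0 means on par with the overall mean)."""
    if len(series) < period:
        raise ValueError("series shorter than one full period")
    overall = mean(series)
    if overall == 0:
        return [0.0] * period  # flat zero series: no meaningful profile
    return [mean(series[offset::period]) / overall for offset in range(period)]

def run_map(function, collection):
    """Stand-in for a cloud MapReduce service: applies `function` to every item.
    A real service would shard `collection` across nodes and retry failed tasks."""
    return list(map(function, collection))

# Hypothetical input: three tiny series with a cycle of length 3.
collection = [
    [1, 2, 3, 1, 2, 3],
    [10, 10, 10, 10, 10, 10],
    [0, 4, 2, 0, 4, 2],
]
profiles = run_map(partial(seasonal_profile, period=3), collection)
print(profiles)  # [[0.5, 1.0, 1.5], [1.0, 1.0, 1.0], [0.0, 2.0, 1.0]]
```

Using `functools.partial` rather than a lambda keeps the per-series function picklable, which matters once the map is actually shipped to other processes or machines.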

MapReduce is a very constrained framework that forces you to apply the very same function everywhere, but the input collection can be arbitrarily large. In my experience, if you’re not able to scale a data-mining problem through MapReduce, then nothing will – or, more precisely, the design complexity will be so great that you are most likely to give up anyway.
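The constraint is easier to see with the full pattern: the same `mapper` runs on every record and emits key/value pairs, the framework groups values by key, and the same `reducer` runs on every group. Here is a minimal single-process sketch; the records (series identifiers tagged with a product category and a point count) are hypothetical.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process MapReduce: map every record, group by key, reduce every group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # the same mapper applied everywhere
            groups[key].append(value)
    # The same reducer applied to every key; groups are independent, hence parallelizable.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical records: (series id, product category, number of points).
records = [
    ("s1", "shoes", 300),
    ("s2", "shoes", 1200),
    ("s3", "hats", 450),
]

def points_by_category(record):
    _series_id, category, points = record
    yield category, points

def total(_key, values):
    return sum(values)

print(map_reduce(records, points_by_category, total))  # {'shoes': 1500, 'hats': 450}
```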

Those scalable resource abstractions represent the core value offered by cloud providers. Yet, those scalable resource abstractions are truly hard to design and even harder to optimize. Yes, you might know a small company that auto-deploys virtual machines, but, in my opinion, this does not reach even 10% of the potential benefits brought by the cloud.

Those benefits will be achieved through scalable resource abstractions, and each one of those abstractions is going to cost a massive amount of brain power to get right.


Cloud computing: a personal review about Azure, Amazon, Google Engine, VMWare and the others

My own personal definition of cloud computing is a hosting provider that delivers automated, near-real-time allocation of arbitrarily large amounts of computing resources such as CPU, memory, storage and bandwidth.

For companies such as Lokad, I believe that cloud computing will shape many aspects of the software business in the next decade.

Obviously, all cloud computing providers have limits on the amount of resources that one can get allocated, but I want to emphasize that, for the end-user, the cloud is expected to be so large that the limitation is rather the cost of resource allocation, as opposed to hitting technical obstacles such as the need to perform a two-week upgrade from one hosting solution to another.

Big players arena

Considering that the price tag of a state-of-the-art data center is now reaching $500M, cloud computing is an arena for big players. I don't expect small players to stay competitive for long in this game.

The current players are:

  • Amazon Web Services, probably the first production-ready cloud offer on the market.

  • Google App Engine, a Python cloud framework by Google.

  • Windows Azure, just unveiled by Microsoft a few weeks ago.

  • VMWare, the virtualization specialist, who unveiled their Cloud vService last September.

  • Salesforce and their Platform as a Service offering. Definitely cloud computing, but mostly restricted to B2B apps oriented toward CRM.

Then, I expect a couple of companies to enter the cloud computing market within the next three years (just wild guesses; I have no insider info on those companies).

  • Sun might go for a Java-oriented cloud computing framework, much like Windows Azure, leveraging its VirtualBox product.

  • Yahoo will probably release something based on Hadoop because they have publicly expressed a lot of interest in this area.

There will most probably be a myriad of small players providing tools and utilities built on top of those clouds, but I do not expect small or medium companies to succeed at gaining momentum with their own grid.

In particular, it's unclear to me whether open source is going to play any significant role, at the infrastructure level, in the future of cloud computing, although open source will be present at the application level.

Indeed, open-source is virtually nonexistent in areas such as web search engines (yes, I am aware of Lucene, but it's very far from being significant on this market). I am expecting a similar situation for the cloud market.


Some people are worried about privacy, security and reliability issues when opting for a cloud provider. My personal opinion is that those points are probably among the strongest benefits of the cloud.

Indeed, only those who have never managed loads of applications may believe that homemade IT infrastructure management efficiently addresses privacy, security and reliability concerns. In my experience, achieving a good level of security and reliability is hard for IT-oriented medium-sized companies and much harder for large non-IT-oriented companies.

Also, I am pretty sure that those concerns are among the top priorities for big cloud players. A no-name small cloud hosting company can afford a data leak, but for a Google-sized company, the damage caused by such an accident is immense. As a result, the most rational option is to invest massive amounts of effort in preventing those accidents.

Basically, I think that clouds can significantly reduce the need for system administrators and infrastructure managers by providing a secure and reliable environment where getting security patches and fighting botnets is part of the service.

Drawback: re-design for the cloud

The largest drawback that I can see is the amount of work needed to migrate applications toward clouds. Indeed, cloud hosting is a very different beast compared to regular hosting.

  • Scalability only applies with proper application design - which varies from one cloud to another.

  • Data access latency is large: you need data caching everywhere.

  • ACID properties of your storage are loose at best.
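As an example of such a redesign, loose transactional guarantees are commonly handled with optimistic concurrency: each entity carries a version tag (an ETag in many cloud storage APIs), and an update only succeeds if the tag has not changed since the entity was read; otherwise the caller re-reads and retries. The store below is a hypothetical in-memory stand-in, not any provider's API.

```python
import itertools

class ConflictError(Exception):
    """Raised when the entity was modified since it was read."""

class VersionedStore:
    """Hypothetical key-value store with compare-and-swap updates on a version tag."""

    def __init__(self):
        self._entities = {}  # key -> (version, value)
        self._versions = itertools.count(1)

    def read(self, key):
        return self._entities[key]  # (version, value)

    def write(self, key, value, expected_version=None):
        current = self._entities.get(key)
        current_version = current[0] if current else None
        if current_version != expected_version:
            raise ConflictError(key)
        self._entities[key] = (next(self._versions), value)

def increment(store, key, retries=5):
    """Read-modify-write loop that retries when a concurrent update wins the race."""
    for _ in range(retries):
        version, value = store.read(key)
        try:
            store.write(key, value + 1, expected_version=version)
            return
        except ConflictError:
            continue  # someone else updated the entity; re-read and try again
    raise ConflictError(key)

store = VersionedStore()
store.write("counter", 0)  # creation: no previous version expected
increment(store, "counter")
increment(store, "counter")
print(store.read("counter"))  # (3, 2)
```

In an actual cloud store, the version check and the write happen atomically on the server side; this single-threaded sketch only illustrates the client-side retry logic.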

Thus, I expect that the strongest hindering factor for cloud adoption will be the technical challenges caused by the cloud itself.

If you don't need scalability, hosting on expensive-but-reliable dedicated servers is still the fastest way to bring a software product to market. Then, if you happen to have massive computing needs, you probably have massive sales as well, and, well, sales fixes everything.

Computing resources being commoditized? Not so sure.

With all those emerging clouds, will we see a commoditization of the computing resources? I don't expect it.

Actually, cloud frameworks are very diverse, and switching from one cloud to another is going to involve massive changes at best and a complete rewrite at worst. Let's see:

  • Amazon provides on-demand instantiation of near-physical servers running either Linux or Windows. The code can be natively executed on top of a custom OS. Scalability is achieved through programmatic computing node instantiation.

  • Google App Engine provides a Python-only (*) web app framework. Each web request gets treated independently, and scalability is a given. The code is executed in a sandboxed virtual environment. The OS is mostly irrelevant.

  • Windows Azure offers a .NET execution environment associated with IIS. The code is executed in a sandboxed virtual environment on top of a virtualized OS. Scalability is achieved by having working instances "sleeping" and waiting for the surge of incoming work.

  • VMWare takes any OS image and brings it to the cloud. Scalability is limited, but other benefits apply.

  • SalesForce provides a specific framework oriented toward enterprise applications.

(*) I guess that Google will probably release a reduced Java framework at some point, much like Android.

Thus, for the next couple of years, choosing a cloud hosting provider would most probably mean significant vendor lock-in. One more reason not to go for small players.

I expect cloud computing to remain an emerging market for at least 5 years. YAWG (Yet Another Wild Guess): 18 months to get the cloud offers out of their beta status, 18 months to train hordes of developers on those new frameworks, 18 months to write or migrate apps. During this time, I expect aggressive pricing from all actors, and little or no abuse of the "lock-in" power.

Then, when the market matures, I guess that third-party vendors will provide tools to ease, if not automate, the migration from one cloud to another, much like the Java-to-.NET conversion tools.