Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Tuesday, July 28, 2009

Thoughts about the Windows Azure pricing

Microsoft has recently unveiled its pricing for Windows Azure. In short, Microsoft aligned exactly with the pricing offered by Amazon. CPU costs $0.12 / h, meaning that a single instance running 24/7 for a month costs $86.40, which is fairly expensive compared to classical hosting providers, where you can get more for roughly half the price.
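The monthly figure, and the roughly $1000 per year for a single always-on WebRole, follow from simple arithmetic. A minimal Python sketch, assuming a 30-day billing month (the constant and function names are mine):

```python
# Hourly price quoted in the post; the 30-day month is an assumption.
AZURE_CPU_USD_PER_HOUR = 0.12
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def monthly_cost(usd_per_hour: float, days: int = DAYS_PER_MONTH) -> float:
    """Cost of one instance kept running 24/7 for a whole month."""
    return usd_per_hour * HOURS_PER_DAY * days


monthly = monthly_cost(AZURE_CPU_USD_PER_HOUR)
yearly = 12 * monthly  # one always-on instance (e.g. one WebRole) per year

print(f"{monthly:.2f} USD/month, {yearly:.2f} USD/year")
# prints: 86.40 USD/month, 1036.80 USD/year
```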

But well, this situation was expected, as Microsoft probably does not want to start a price war with its business partners who still sell dedicated Windows Server hosting. Current Azure pricing is high enough to deter most companies except those that happen to have peaky needs.

To me, the Azure pricing is fine except in 3 areas:

• Each Azure WebRole costs at least $86.40 / month no matter how little web traffic you have (reminder: with Azure you need a distinct WebRole for every distinct webapp). This situation is caused by the architecture of Windows Azure, where a VM gets dedicated to every WebRole. If we compare with Google App Engine (GAE), the situation does not look too good for Azure: with GAE, hosting a low-traffic webapp is virtually free. Free vs. $1000 / year is likely to make a difference for most small and medium businesses, especially if you end up with a dozen webapps to cover all your needs.

• Cloud Storage operations are expensive: the storage itself is rather cheap at $0.15 / GB / month, but the cost of $0.01 per 10K operations might be a killer for cloud apps that rely intensively on small storage operations. Yes, one can argue that AWS is not cheaper here, but this is not entirely true, as AWS provides other services such as block storage (EBS) that comes with a 10x lower price per operation; EBS could be used to lower the pressure on blob storage whenever possible.

• Raw CPU at $0.12 / h is expensive, and Azure offers no way to lower this price, whereas AWS offers CPU at $0.015 / h through its Elastic MapReduce service.
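To see why the per-operation price matters more than the per-GB price for chatty apps, here is a rough Python sketch using the prices quoted above. The workload (10 GB stored, 100 small storage operations per second sustained over a 30-day month) is purely hypothetical:

```python
# Prices quoted in the post; the workload below is hypothetical.
STORAGE_USD_PER_GB_MONTH = 0.15
STORAGE_USD_PER_10K_OPS = 0.01
AZURE_CPU_USD_PER_HOUR = 0.12
EMR_CPU_USD_PER_HOUR = 0.015
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def storage_bill(gigabytes: float, ops_per_second: float) -> tuple[float, float]:
    """Return (capacity cost, operation cost) in USD for one month."""
    capacity = gigabytes * STORAGE_USD_PER_GB_MONTH
    operations = ops_per_second * SECONDS_PER_MONTH
    return capacity, operations / 10_000 * STORAGE_USD_PER_10K_OPS


# Hypothetical app: 10 GB stored, 100 small storage operations per second.
capacity_usd, ops_usd = storage_bill(10, 100)
cpu_ratio = AZURE_CPU_USD_PER_HOUR / EMR_CPU_USD_PER_HOUR

print(f"capacity: {capacity_usd:.2f} USD, operations: {ops_usd:.2f} USD")
# prints: capacity: 1.50 USD, operations: 259.20 USD
print(f"Azure CPU costs {cpu_ratio:.1f}x the Elastic MapReduce rate")
# prints: Azure CPU costs 8.0x the Elastic MapReduce rate
```

Under these assumptions, the operations cost more than 150 times the storage capacity itself, which is why a cheaper per-operation tier such as EBS can change the bill significantly.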

Obviously, those pricing weaknesses closely reflect cloud technologies that Azure is (at the moment) missing. The MapReduce issue will be fixed when Microsoft ports DryadLinq to Azure. Block storage and shared low-cost web hosting might also be on their way (although I have little information on that matter). As a side note, the Azure Cache Provider might be a killer tool to reduce the pressure on the cloud storage (but its pricing is not known yet).

As a final note, it's interesting to see that cloud computing pricing really depends on the quality of the software used to run the cloud. Better software typically delivers computing hardware at much lower costs, almost 10x lower in many situations.

Tuesday, May 19, 2009

FIPFO - First In Probably First Out

The FIFO (First In First Out) is a very well-known concept in computer science. In one of my previous posts, I used the term FIPFO, First In Probably First Out, to refer to the cloud equivalent of the FIFO.

Indeed, the basic idea behind that term is that pure FIFOs cannot scale much because of synchronization constraints: every producer and consumer has to agree on a single global order. Yet, if you loosen the semantics just a little, that is to say FIPFO, then you get an infinitely scalable data structure.
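One way to picture the loosened semantics is a queue split into independent FIFO shards. The following Python sketch is a hypothetical illustration (the class name and the sharding scheme are mine; this is not how Windows Azure Queue is implemented): each shard keeps its own order, but there is no global order, so shards never need to share a lock.

```python
import random
from collections import deque
from typing import Optional


class ShardedFipfoQueue:
    """Toy First-In-Probably-First-Out queue built from independent FIFO shards.

    Hypothetical illustration only. Each shard could live on a different
    machine, so producers and consumers never agree on a single global order.
    """

    def __init__(self, shard_count: int, seed: Optional[int] = None):
        self._shards = [deque() for _ in range(shard_count)]
        self._rng = random.Random(seed)

    def enqueue(self, item) -> None:
        # Any shard will do: writers never coordinate with each other.
        self._rng.choice(self._shards).append(item)

    def dequeue(self):
        # Take the oldest item of a random non-empty shard; returns None when
        # every shard is empty (this toy does not support None as an item).
        non_empty = [shard for shard in self._shards if shard]
        if not non_empty:
            return None
        return self._rng.choice(non_empty).popleft()


fipfo = ShardedFipfoQueue(shard_count=4, seed=42)
for i in range(10):
    fipfo.enqueue(i)
drained = [fipfo.dequeue() for _ in range(10)]
print(drained)  # every item exactly once, roughly but not strictly in order
```

Adding shards adds throughput at the price of looser ordering: an item can only be overtaken by items sitting in other shards.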

Considering its simplicity and usefulness, I believe that FIPFO will be ubiquitous in future cloud computing applications.

Matthieu, a colleague at Lokad, asked me whether FIPFO was a well-known concept or yet another wacky term that I had just made up on my blog. Well, sorry, I can't really remember. FIPFO just seems a very appropriate way to describe data structures such as the Windows Azure Queue, but I am no longer sure whether I read about it elsewhere.

According to Google, there is only a single other result for FIPFO at the time of writing, by another person who came up with this term a few days before my initial post while describing the Drupal API. Yet, I had never read the Drupal forums before, so I guess we independently ended up with the same idea and terminology.

Monday, April 6, 2009

Cloud Computing vs. Hardware as a Service

In a previous post, I discussed why I believed that cloud computing was going to be an arena for big players, and not a friendly place for the little guys.

Recently, many people told me about such-and-such small company that was supposed to deliver cloud computing too, with a service that would match the ones offered by big players.

Basically, the discussion goes like this:

Hey, we too are able to instantiate virtual machines on-demand. We have some nice virtual machine deployment scripts, a nice WebUI to administrate all the nodes, we are now matching the Amazon offer.

Nope, you’re not.

Basically, what those little players are offering is simply Hardware as a Service. For years now, computing hardware has been more or less an on-demand commodity. My favorite host can typically set up a new server in 48h, and I can cancel my subscription anytime (although I have to pay for the entire month). Some more aggressive hosting providers offer fully automated server setup, and your new server is usually available in less than 1h.

Now, what those small companies calling themselves cloud providers can do is set up new servers in seconds instead of minutes, and the trick is simple: they use virtualization and deployment scripts.

But, in my opinion, this isn't cloud computing; this is just Hardware as a Service with a lower overhead, both at the infrastructure level and for the system administrators themselves.

So, what is so radically different with cloud computing?

In my opinion, the radical novelty of cloud computing is the promise that you won’t have to worry about resource allocation anymore.

In particular, I don’t want to figure out if I need 1, 2, 3 or 42 computing nodes to handle a massive web traffic surge from Slashdot, I just want to tell my cloud provider:

Here is the script for my web page, do whatever is needed to ensure good performance, and send me the bill at the end of the month.

Note that this is exactly what Google App Engine is doing. Google App Engine is relieving web developers from the burden of having to figure out how they are going to scale their web apps. Google is doing the magic for them so that web developers can focus on the specific value of their web apps instead of focusing on the complex infrastructure actually needed to achieve scalability.

Quoting Thomas Serval, from a round table about Azure a few months ago:

In the past, each time we have multiplied the traffic by 10 on our applications, we have been forced more or less to rewrite the application from scratch. The promise of cloud computing is to let you achieve unlimited scalability from day one.

Obviously, cloud computing isn’t magic, thus applications will need to be carefully designed to achieve unlimited scalability, yet I believe that thanks to the cloud computing frameworks currently being published, it won’t be that hard in the future.

Thus cloud computing is not Hardware as a Service simply because Hardware as a Service does not do anything about scalability in itself.

The true benefit of cloud computing is to provide what I would call scalable computing abstractions. Those abstractions represent physical resources such as CPU, memory or bandwidth, but with additional constraints (usually structural constraints) that make it actually possible to provide an infinitely scalable instance of the desired resource.

For example, nowadays more or less all cloud providers include a distributed and reliable hashtable implementation in their offer: S3 for Amazon, Blob Storage for Windows Azure, ... FIPFO is another popular scalable storage abstraction: First In Probably First Out, i.e. queues, but without deterministic behavior. As long as you rely only on those scalable storage abstractions, you should not have to care about the scalability of your storage.
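The structural constraint behind a distributed hashtable is that every operation targets a single key, so each call touches exactly one node and capacity grows by adding nodes. A minimal Python sketch of that idea, using plain hash partitioning; the class is hypothetical and says nothing about how S3 or Blob Storage are actually built:

```python
import hashlib


class PartitionedHashTable:
    """Toy distributed hashtable: each key lives on exactly one node.

    Hypothetical sketch; real services also handle replication,
    rebalancing and node failures.
    """

    def __init__(self, node_count: int):
        # Each dict stands in for a separate storage machine.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        # A stable hash (unlike Python's per-process salted hash()) maps a key to a node.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._node_for(key).get(key, default)


table = PartitionedHashTable(node_count=8)
for n in range(1000):
    table.put(f"blob-{n}", n * n)
print(table.get("blob-12"))  # prints 144
print([len(node) for node in table.nodes])  # keys spread across the 8 nodes
```

A design note: plain modulo hashing moves most keys whenever the node count changes; production systems typically rely on schemes such as consistent hashing to avoid that.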

So far, scalable storage abstractions have been the primary focus of most cloud providers. Yet, I suspect that the next battle will be scalable CPU abstractions.

Indeed, Amazon has recently unveiled Amazon Elastic MapReduce, and like other people, I believe that MapReduce will be a game changer. First, Amazon is delivering CPU at 0.015 USD/h while its competitors are still above 0.10 USD/h at the moment. Then, if we consider that Amazon's native MapReduce implementation is going to be far more efficient than custom in-house implementations (simply because the Amazon folks have the time and the experience needed to get the load-balancing settings right), then Amazon has just divided the CPU price by 10.

Then, what I see as the killer benefit of MapReduce is that I don't have to care anymore about how many nodes I need.

For example, at Lokad, we have tons of time-series to process. Let's say that we want to extract seasonality patterns out of 100 million time-series (each time-series ranging from a few hundred to a few thousand points). With MapReduce, I just have to specify the algorithm that processes a single time-series and pass the huge time-series collection as an argument. The cloud infrastructure handles all the magic for me. In particular, I don't have to care anymore about nodes crashing along the way, or about dynamically expanding or shrinking the number of computing nodes.
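The appeal is that the user code only deals with one series. The Python sketch below is a hypothetical, sequential stand-in for a MapReduce service; the seasonality measure (mean value at each position of a weekly cycle divided by the overall mean) is a deliberately simple placeholder, not Lokad's actual method:

```python
from collections import defaultdict
from statistics import mean

PERIOD = 7  # assumption: daily data with a weekly cycle


def seasonality_mapper(series_id, values):
    """Map step, written for a single time-series.

    Hypothetical seasonality measure: mean value at each position of the
    cycle divided by the overall mean. Assumes at least PERIOD points and
    a positive overall mean.
    """
    overall = mean(values)
    buckets = defaultdict(list)
    for t, value in enumerate(values):
        buckets[t % PERIOD].append(value)
    yield series_id, [mean(buckets[k]) / overall for k in range(PERIOD)]


def keep_first(key, outputs):
    """Reduce step: here each series yields a single profile."""
    return outputs[0]


def local_map_reduce(collection, mapper, reducer):
    """Sequential stand-in for a MapReduce service. A real framework would
    split the collection across nodes and retry failed tasks; the mapper
    and reducer would stay the same."""
    grouped = defaultdict(list)
    for key, values in collection.items():
        for out_key, out_value in mapper(key, values):
            grouped[out_key].append(out_value)
    return {key: reducer(key, outputs) for key, outputs in grouped.items()}


# Four weeks of a hypothetical series with a weekend peak.
collection = {"sku-1": [10, 10, 10, 10, 10, 20, 20] * 4}
profiles = local_map_reduce(collection, seasonality_mapper, keep_first)
print([round(x, 2) for x in profiles["sku-1"]])
# prints: [0.78, 0.78, 0.78, 0.78, 0.78, 1.56, 1.56]
```

Scaling to 100 million series changes the size of `collection`, not the mapper.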

MapReduce is a very constrained framework that forces you to apply the very same function everywhere, but the input collection can be arbitrarily large. In my experience, if you’re not able to scale a data-mining problem through MapReduce, then nothing will – or, more precisely, the design complexity will be so great that you are most likely to give up anyway.

Those scalable resource abstractions represent the core value offered by cloud providers. Yet, those scalable resource abstractions are truly hard to design and even harder to optimize. Yes, you might know a small company that auto-deploys virtual machines, but, in my opinion, this does not reach even 10% of the potential benefits brought by the cloud.

Those benefits will be achieved through scalable resource abstractions, and each one of those abstractions is going to take a massive amount of brain power to get right.

Tuesday, November 11, 2008

Cloud computing: a personal review about Azure, Amazon, Google Engine, VMWare and the others

My own personal definition of cloud computing is a hosting provider that delivers automated, near-real-time allocation of arbitrarily large amounts of computing resources such as CPU, memory, storage and bandwidth.

For companies such as Lokad, I believe that cloud computing will shape many aspects of the software business in the next decade.

Obviously, all cloud computing providers have limits on the amount of resources that one can get allocated, but I want to emphasize that, for the end-user, the cloud is expected to be so large that the limiting factor is rather the cost of resource allocation, as opposed to technical obstacles such as the need to perform a two-week upgrade from one hosting solution to another.

Big players arena

Considering that the entry ticket for a state-of-the-art data center is now reaching $500M, cloud computing is an arena for big players. I don't expect small players to stay competitive for long in this game.

The current players are:

• Amazon Web Services, probably the first production-ready cloud offer on the market.

• Windows Azure just unveiled by Microsoft a few weeks ago.

• VMWare, the virtualization specialist, which unveiled its Cloud vService last September.

• Salesforce and its Platform as a Service offering. Definitely cloud computing, but mostly restricted to B2B apps oriented toward CRM.

Then, I expect a couple of companies to enter the cloud computing market within the next three years (just wild guesses, I have no insider's info on those companies).

• Sun might go for a Java-oriented cloud computing framework, much like Windows Azure, leveraging their VirtualBox product.

• Yahoo will probably release something based on Hadoop because they have publicly expressed a lot of interest in this area.

There will most probably be a myriad of small players providing tools and utilities built on top of those clouds, but I do not expect small or medium companies to succeed at gaining momentum with their own grid.

In particular, it's unclear to me whether open-source is going to play any significant role, at the infrastructure level, in the future of cloud computing, although open-source will be present at the application level.

Indeed, open-source is virtually nonexistent in areas such as web search engines (yes, I am aware of Lucene, but it's very far from being significant on this market). I am expecting a similar situation for the cloud market.

Benefits

Some people are worried about privacy, security and reliability issues when opting for a cloud provider. My personal opinion is that those points are probably among the strongest benefits of the cloud.

Indeed, only those who have never managed loads of applications may believe that homemade IT infrastructure management efficiently addresses privacy, security and reliability concerns. In my experience, achieving a good level of security and reliability is hard for IT-oriented medium-sized companies and much harder for large non-IT-oriented companies.

Also, I am pretty sure that those concerns are among the top priorities of big cloud players. A no-name small cloud hosting company can afford a data leak, but for a Google-sized company, the damage caused by such an accident is immense. As a result, the most rational option consists in investing massive efforts to prevent those accidents.

Basically, I think that clouds can significantly reduce the need for system administrators and infrastructure managers by providing a secure and reliable environment where getting security patches and fighting botnets is part of the service.

Drawback: re-design for the cloud

The largest drawback that I can see is the amount of work needed to migrate applications toward clouds. Indeed, cloud hosting is a very different beast compared to regular hosting.

• Scalability only applies with proper application design - which varies from one cloud to another.

• Data access latency is large: you need data caching everywhere.

• ACID properties of your storage are loose at best.

Thus, I expect that the strongest hindering factor for cloud adoption will be the technical challenges caused by the cloud itself.

If you don't need scalability, hosting on expensive-but-reliable dedicated servers is still the fastest way to bring a software product to market. Then, if you happen to have massive computing needs, you probably have massive sales as well, and, well, sales fixes everything.

Computing resources being commoditized? Not so sure.

With all those emerging clouds, will we see a commoditization of the computing resources? I don't expect it.

Actually, cloud frameworks are very diverse, and switching from one cloud to another is going to involve massive changes at best and a complete rewrite at worst. Let's see:

• Amazon provides on-demand instantiation of near-physical servers running either Linux or Windows. The code can be natively executed on top of a custom OS. Scalability is achieved through programmatic computing node instantiation.

• Google App Engine provides a Python-only (*) web app framework. Each web request gets treated independently, and scalability is a given. The code is executed in a sandboxed virtual environment. The OS is mostly irrelevant.

• Windows Azure offers a .NET execution environment associated with IIS. The code is executed in a sandboxed virtual environment on top of a virtualized OS. Scalability is achieved by having working instances "sleeping" and waiting for the surge of incoming work.

• VMWare takes any OS image and brings it to the cloud. Scalability is limited, but other benefits apply.

• SalesForce provides a specific framework oriented toward enterprise applications.

(*) I guess that Google will probably release a reduced Java framework at some point, much like Android.

Thus, for the next couple of years, choosing a cloud hosting provider will most probably mean significant vendor lock-in. One more reason not to go for small players.

Cloud computing will be an emerging market for at least 5 years. YAWG (Yet Another Wild Guess): 18 months to get the cloud offers out of their beta status, 18 months to train hordes of developers on those new frameworks, 18 months to write or migrate apps. During this time, I expect aggressive pricing from all actors, and little or no abuse of the "lock-in" power.

Then, when the market matures, I guess that third-party providers will offer tools to ease, if not automate, the migration from one cloud to another, much like the Java-.NET conversion tools.

Sunday, November 2, 2008

Installing VMWare Server 2.0 on an OVH RPS

A French hosting company called OVH provides an interesting offer named Real Private Server (RPS), an intermediate solution between cloud computing and classical dedicated hosting. In a nutshell, an RPS is a true dedicated server that comes with virtual storage starting at 10 GB and going up to 1 TB.

OVH prices the RPS very aggressively (20 EUR/month for a dual-core AMD64, plus 2 EUR/month per 10 GB of storage), which makes the RPS a very interesting offer for backup servers that essentially need a lot of reliable storage. It's roughly at the level of Amazon S3 storage pricing (but you get a regular drive for that price).

Disclaimer: OVH offers rock-bottom hosting prices, but do not expect any support from the OVH staff. This approach might not suit your needs; make sure that you can live without contact with the staff of your hosting provider before migrating anything to OVH. Also, don't expect much less than 48h of delay for almost any operation that needs to be performed on the OVH side. Thus, if you plan to migrate to OVH, count on at least two weeks of delay to get things started smoothly (I mean elapsed time, not actual work).

In this post, I will explain how to install VMWare Server 2.0 on an OVH RPS box. Having little experience with Linux, I spent two full days on the case, and I got the feeling that it was quite a complicated and painful process. May this guide help those who follow the same path.

The process goes as follows:

• Preparing your partitions and OS install.

• Getting the kernel source.

• Adding modules and recompiling the kernel

• Rebooting in HD mode.

• Some preliminary tweaks for modules

• Grab VMWare through Lynx

• Install VMWare

• Tweak your IPTables to get a remote access

First, go to the OVH website, select your RPS and pay. I suggest going for RPS 3 (because of the 2 GB of RAM) and directly opting for at least 10 GB of extra storage; otherwise you're likely to run short of storage when running your virtual machines. At this point, expect 48h of delay before your RPS is ready. Once it is ready, the first step consists of reinstalling the OS, because the default one is not well suited for the VMWare install.

Go to the Manager webapp provided by OVH, select your RPS and choose Reinstall OS. I used the following parameters:

• Fedora Core 8 - English

• iSCSI with 5 GB for / and the rest for /home.

Fedora Core 8 was chosen for its support of RPM packages. Indeed, in my experience, the RPM install of VMWare Server 2.0 was the least dreadful option.

Caution: do not select NFS. I learned the hard way that an RPS is not able to boot from an NFS disk, and booting from its own disk will be necessary for the custom kernel that VMWare requires. Also, the default partition settings grant only 3 GB to the root partition, which is too small; I suggest going for 5 GB, which was sufficient in my situation.

Launch the reinstall. The process is likely to take 1h - upon termination you get a notification email with the new access codes of your server.

At that point comes the difficult part: VMWare Server needs kernel modules that aren't included in any of the Linux distributions offered by OVH. Thus, you need to grab the kernel sources, recompile them with module support, and finally deploy your fresh boot image.

You can download the kernel sources provided by OVH and compile them with:

cd /usr/src
wget ftp://ftp.ovh.net/made-in-ovh/bzImage/linux-2.6.24.5-ovh.tar.bz2
tar xjf linux-2.6.24.5-ovh.tar.bz2
cd linux-2.6.24.5-ovh
wget ftp://ftp.ovh.net/made-in-ovh/bzImage/2.6-config-xxxx-std-ipv4-32
mv 2.6-config-xxxx-std-ipv4-32 .config
make menuconfig
make

When the kernel configuration screen appears, select Load alternative configuration and open the .config file, then enable loadable module support (initially unchecked), and exit and save. See also made-in-ovh for more stuff provided by OVH to tweak your Linux servers.
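Since a missed checkbox only shows up after the 30-minute compile, it can be worth checking the saved .config first. In the kernel configuration, module support corresponds to the CONFIG_MODULES symbol (shown in menuconfig as "Enable loadable module support"). A small Python sketch; the function name and the sample excerpts are hypothetical:

```python
def module_support_enabled(config_text: str) -> bool:
    """Return True if a kernel .config enables loadable module support.

    A disabled option appears as "# CONFIG_MODULES is not set".
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_MODULES=y":
            return True
    return False


# Hypothetical excerpts of a .config before and after the menuconfig step.
before = "# CONFIG_MODULES is not set\nCONFIG_SMP=y\n"
after = "CONFIG_MODULES=y\nCONFIG_MODULE_UNLOAD=y\nCONFIG_SMP=y\n"
print(module_support_enabled(before), module_support_enabled(after))  # prints: False True
```

On the server, you would pass it the contents of /usr/src/linux-2.6.24.5-ovh/.config.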

The kernel compilation takes about 30 min.
Time to get a cup of coffee.

Now that you have a freshly compiled kernel, you need to copy the boot image into your /boot directory. First, go to OVH Manager, select your RPS, and within the Netboot options, choose HD, which stands for local hard drive. For performance reasons, RPS boxes normally boot on shared kernel instances, so you need to force your RPS to boot from its local drive; otherwise your boot changes will have no effect.

Then, on your RPS, run:

cd /usr/src/linux-2.6.24.5-ovh
yum install emacs
cp arch/i386/boot/bzImage /boot/2.6-config-custom-std-ipv4-32
emacs /etc/lilo.conf
/sbin/lilo
shutdown -r now

This lets you edit your LILO boot settings using emacs (you can use whatever editor you like). In the lilo.conf file, replace the boot image name with the name of your newly copied image; running /sbin/lilo then installs the updated boot settings. You then restart with a custom kernel that supports modules.

Let's install VMWare Server. OK, the main joke about downloading VMWare is that you can't use wget, because you have to go through the VMWare web interface instead. Note that you can't cheat VMWare by copy-pasting the temporary download links; I tried, and it does not work.

So I had to resort to lynx, the text-mode web browser (I had not used it in ages), to actually grab the RPM file. Since lynx isn't exactly a convenient way of surfing the web, I suggest first registering on VMWare and then performing only the last download step with lynx.

VMWare Server 2.0 weighs about 500 MB; that's why I told you to set 5 GB for your root partition.

Then, install VMWare Server with

rpm -i VMware-server-2.0.0-122956.i386.rpm

The first run is going to fail because VMWare complains that a module directory is missing.

Couldn't open directory /lib/modules/2.6.24.5-xxxx-std-ipv4-32

Just manually create the missing directory with

cd /lib/modules
mkdir 2.6.24.5-xxxx-std-ipv4-32

Although it's a dirty hack, VMWare does not seem to need those modules anyway. Once the dummy module directory has been created, run the VMWare install again.

I lost 4h on that single step.

Now that we have succeeded at installing VMWare Server, let's configure it with

/usr/bin/vmware-config.pl

Choose the defaults. When asked for your kernel source code, enter the location where you've just compiled your kernel; it should be /usr/src/linux-2.6.24.5-ovh/include. When asked where to put your virtual machines, choose /home/virtual-machines, because you've got more storage on the /home partition. Enter your free VMWare license (you get it on the VMWare download page).

At this point, VMWare Server is running. Version 2.0 comes with a very nice web administration interface (by default at https://localhost:8333/), but this interface is not remotely reachable yet because it's blocked by your default firewall settings. You need to add two lines to your iptables settings.

emacs /etc/sysconfig/iptables
# insert the following two lines (without the leading "# "):
# -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8222 -j ACCEPT
# -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8333 -j ACCEPT
service iptables restart

Caution: I suggest adding those two lines just below the line that deals with port 22 (the port used for SSH). Indeed, appending the lines at the end of the file is unlikely to work, because incoming connections might already have been rejected by another iptables rule. After saving your changes, you need to restart the service.
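If you prefer to script the edit rather than do it by hand, here is a minimal Python sketch of the "insert right after the port 22 rule" advice. The function names and the sample file content are hypothetical; a real /etc/sysconfig/iptables contains more rules:

```python
VMWARE_RULES = [
    "-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8222 -j ACCEPT",
    "-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8333 -j ACCEPT",
]


def _dport(line: str):
    """Return the destination port of a rule line, or None."""
    tokens = line.split()
    if "--dport" in tokens:
        i = tokens.index("--dport")
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def add_vmware_rules(config_text: str) -> str:
    """Insert VMWARE_RULES right after the SSH (port 22) rule, so they are
    evaluated before any catch-all REJECT rule further down.
    Idempotent: rules already present are not added twice."""
    lines = config_text.splitlines()
    missing = [rule for rule in VMWARE_RULES if rule not in lines]
    for i, line in enumerate(lines):
        if _dport(line) == "22":
            lines[i + 1:i + 1] = missing
            return "\n".join(lines) + "\n"
    raise ValueError("no SSH rule found; insert the rules manually")


# Hypothetical, trimmed-down iptables file.
sample = """*filter
:RH-Firewall-1-INPUT - [0:0]
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
"""
updated = add_vmware_rules(sample)
print(updated)
```

On the server, you would back up /etc/sysconfig/iptables, write the result back as root, then run service iptables restart.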

Check that your VMWare web access console is remotely reachable. Good. Now, one more tweak:

service vmware-autostart start

This lets VMWare start automatically at boot time.

This completes the setup of VMWare Server 2.0 on a RPS. If you've managed to do it in less than 4h, I am impressed.

Yet, the job isn't finished, because you now need to set up your virtual machines. The hard part comes from the network settings, because OVH does not support bridged networking. Stay tuned.
