Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Entries in cloudcomputing (30)

Wednesday
Aug 11, 2010

Why perfectly reliable storage is not enough

Cloud computing now offers near-perfectly reliable storage. Amazon S3 announces 99.999999999% durability, and the Windows Azure storage is in the same league.

Yet, perfectly reliable data storage does not prevent data loss - not by a long shot. It only prevents data loss caused by hardware failure, which nowadays is no longer the most frequent way to lose data.

The primary danger threatening your data is plain accidental deletion. Yes, it's possible to set up administrative rights and so on to minimize the surface area of potential trouble. But at the end of the road, someone wields sysadmin powers over the data, and this person is just a few clicks away from causing a lot of trouble.

A long-established pattern to avoid this kind of trouble is automated data snapshots, taken on a daily or weekly basis, that can be restored when something goes utterly wrong. In the SQL world, snapshots are a given: any serious RDBMS provides snapshotting as a basic feature these days.

Yet, in the NoSQL world, things aren't that bright, and at Lokad, we realized that such an obvious feature was still missing from the Windows Azure Storage.

Thus, today, we are releasing Lokad.Snapshot, an open source C#/.NET app targeting the Windows Azure Storage and running on Windows Azure itself. Kudos to Christoph Rüegg, the primary architect of this app. Lokad.Snapshot offers automated snapshots for tables and blobs. In addition, Lokad.Snapshot exposes a Really Simple Monitoring endpoint to be consumed by Lokad.Monitoring.
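
For illustration, a blob snapshot pass can be little more than a server-side copy into a dated container. Below is a minimal sketch assuming the Windows Azure StorageClient library (1.x SDK); the container names and naming scheme are illustrative, not the actual Lokad.Snapshot conventions.

    // Minimal sketch: copy every blob of a container into a dated
    // snapshot container. Names are illustrative, not Lokad.Snapshot's.
    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class BlobSnapshotSketch
    {
        static void Main()
        {
            var account = CloudStorageAccount.Parse(
                "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
            var client = account.CreateCloudBlobClient();

            var source = client.GetContainerReference("data");
            // One dated container per snapshot: restoring is just a copy back.
            var target = client.GetContainerReference(
                "data-snapshot-" + DateTime.UtcNow.ToString("yyyyMMdd"));
            target.CreateIfNotExist();

            var options = new BlobRequestOptions { UseFlatBlobListing = true };
            foreach (var item in source.ListBlobs(options))
            {
                var blob = (CloudBlob)item;
                var copy = target.GetBlobReference(blob.Name);
                copy.CopyFromBlob(blob); // server-side copy, no download round-trip
            }
        }
    }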

The Lokad.Snapshot codebase should still be considered beta, although the app is already in production for our own internal needs at Lokad. If your Azure data isn't snapshotted yet, make sure to have a look at Lokad.Snapshot; it might be a life-saver sooner than expected.

Saturday
Jul 10, 2010

Top 10 cloud computing predictions

The Microsoft Worldwide Partner Conference 2010 is due to begin next Monday, and it's clear that Windows Azure is going to be one of the products that gets the most attention this year.

Over the last 2 years, I have attended, and even taken part in, many cloud computing talks, and I keep hearing tons of very confused opinions on cloud computing, and even more so concerning its future. Hence, here are my top 10 cloud computing predictions for the next 5 years.

1) Cloud will become mainstream in enterprise adoption

Cloud computing is already mainstream in consumer markets. Amazon, Google, Yahoo, Microsoft, ... all of them are running on top of their own clouds. If you're using a web search engine, then you're already using the cloud. In the next 5 years, I expect the cloud to become the mainstream adoption pattern in the enterprise too. I am NOT saying that the cloud will dominate the enterprise just 5 years from now; I am saying that it will dominate new setups and upgrades. It might take one or two decades to progressively move away from strictly on-premise solutions.

2) ISVs will vastly dominate the overall cloud consumption

Yet, the migration toward the cloud will be implicit. Indeed, enterprises care little about cloud computing itself; they will buy SaaS solutions, not raw processing power. The vast majority of those SaaS solutions will be powered by public clouds, but for non-IT companies this fact will be irrelevant. Economic forces will drive ISVs toward the cloud, and ISVs will vastly dominate the overall cloud consumption. Single-tenant apps have a very hard time competing with the low management costs of multi-tenant apps. Nothing will actually prevent companies from buying raw cloud processing power, but I expect this behavior to be marginalized as the SaaS ecosystem grows.

3) Private clouds are nonexistent and will remain marginal

We keep hearing about private clouds, yet, if we exclude the few private clouds designed by internet consumer leaders (eBay, Yahoo, Facebook, Yandex, ...) that have not been turned into public clouds, there is NOTHING even close to a private cloud on the market at present. The only product that starts to look like a private cloud is Eucalyptus, but it's still light-years away from the global solutions built on top of containerized data centers that public clouds represent. The skills and the costs required to operate a cloud are steep, and I can't figure out why companies would go for private clouds. Some will argue that control is of utmost importance, but shareholders might not agree when they realize that even a small cloud costs millions upfront, and millions more in ongoing management. Granted, companies with ad-hoc data centers will keep improving them, probably importing best practices established by major cloud hosters, but that's it. Those improved data centers will still be extremely far from public clouds feature-wise, reliability-wise, and security-wise.

4) Hybrid clouds are fantasy and will remain fantasy

Another myth I keep hearing about is the idea of hybrid clouds: you have your own private cloud, and when you lack capacity, you rent some extra from a public cloud. Although the idea is fascinating, IMHO it's vastly impractical. Designing a true auto-scalable app on top of a cloud - any cloud - is already quite hard. Clouds ease the scale-out process by offering very normalized environments, but scaling out remains a challenge, especially for enterprise apps. Offloading processing power onto some heterogeneous computing environment is a bad idea: software complexity would skyrocket, and it will fail like grid computing failed before it. What was a nice idea in theory was just way too difficult to be routinely implemented. Please note that I am not stating that hybrid clouds are impossible; I am just stating that they are very unwise, and that complexity will come back as punishment.

5) Cloud mashups will be the dominant pattern

I expect SaaS mashups to become the dominant pattern in enterprise environments - for consumer environments, it's already the case. Companies and people alike will combine the apps they like most, irrespective of the underlying clouds. As a result, scenarios where a single company adopts Salesforce for the CRM, Microsoft BPOS for the collaborative suite and NetSuite for the ERP are likely. Obviously, those mashups will require very capable integration tools, which will also be offered on the cloud. RunMyProcess would be a good example of such a tool.

6) Self-hosted servers will be considered a liability

Some people consider self-hosted servers more secure than remote or cloud-hosted solutions. As far as I can tell, 99.99% of the time, this is a complete fallacy. Securing a computing environment takes skills that even my bank (a very large international bank) is obviously lacking. The situation is worse in nearly all the non-IT companies I have investigated while running Lokad. Some companies happen to be very confident in their IT security, but most of the time it's just over-confidence, with no tangible processes to support that confidence. As cloud computing grows more mature, I expect the community consensus to gradually converge toward the opinion that, unless proven otherwise, any self-hosted server should be considered an IT liability.

7) The No1 cloud issue will remain the lack of qualified manpower

Media, influencers, integrators, and cloud providers keep discussing the relative strengths and weaknesses of the cloud, but there is one issue that dwarfs all others, and yet this issue is barely mentioned: the extreme scarcity of workforce talented enough to develop for the cloud. Don't believe me? Just try to hire an experienced cloud computing software architect. Hiring good developers is already extremely hard; hiring good developers who happen to have skills and experience in large-scale distributed systems is even harder.

8) Fine-grained geolocation will be the No1 entry barrier

Two years ago I stated that cloud computing was an arena for big players. I still believe this isn't going to change. In particular, geolocation capabilities - i.e. the ability to bring computing resources close to the end-user - are already dramatically increasing the entry costs in the cloud market. Closer data centers mean lower latencies, and smoother UI behaviors for cloud-hosted apps. Ultra-responsive UIs are so much more enjoyable that it's little wonder that Google recently started to add website speed as an extra criterion in their website ranking. In 5 years, clouds will no longer be expected to have half a dozen worldwide locations (Windows Azure has 6 locations at the moment), but dozens, with a data center close to every major megalopolis. Considering that each data center costs upward of a few hundred million USD, entering the cloud market will be simply impossible for anyone but the largest IT companies on the planet.

9) Cloud computing is not going to kill desktop apps

Some believe that the cloud is going to kill desktop apps. I don't. I believe that all software areas will keep growing (cloud / desktop / embedded / games, ...). There will be more desktop apps 5 years from now, and WAY more cloud apps. Cloud computing will, however, shift the purpose and the value of desktop apps. The App Store is a good example of the strong interactions that are likely to exist between non-web apps and the cloud: apps are available from the cloud at any time, typically interact with the cloud, and bring a top user experience that would be very hard to deliver otherwise. And no, I don't think that World of Warcraft is going to run on HTML 5 any time soon.

10) Dev stacks are going to develop their cloud affinity

The software world is basically divided among a handful of development stacks: Microsoft/.NET, Linux/LAMP, Oracle/Java, ... I expect each stack to develop a growing affinity with one public cloud in particular. The .NET world already has a very natural orientation toward Windows Azure. Linux-based solutions will keep moving forward with Amazon, or possibly Rackspace. As Google expands the coverage of its App Engine, I expect more Java/Python development tools to be released - basically the ones internally developed and used at Google. Some are dreaming about cloud interoperability, but considering the pace of change in the cloud computing world, I don't see that happening in the next 5 years.

What are your predictions for the cloud in the next 5 years?

Saturday
May 15, 2010

Really Simple Monitoring

Moving toward cloud computing relieves you of (most) hardware downtime worries; yet, cloud computing is no magic pill that guarantees that every single one of your apps is ready to serve users as expected.

You need a monitoring system to achieve this. In particular, OS uptime and simple HTTP responsiveness are only scratching the surface as far as monitoring is concerned.

In order to go beyond plain uptime monitoring, Lokad has started a new Windows Azure open source project named Lokad.Monitoring. The project comes with several tenets:

  • A monitoring philosophy,
  • An XML format, the Really Simple Monitoring format (shamelessly inspired by RSS) - see the hypothetical feed sketch below,
  • A web client for Windows Azure.
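
To give a feel for what an RSS-inspired monitoring feed could look like, here is a purely hypothetical example; the actual schema is defined by the Lokad.Monitoring project, and the element names below are illustrative only.

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Hypothetical illustration only: check the Lokad.Monitoring
         project page for the actual Really Simple Monitoring schema. -->
    <rsm>
      <source>MyApp worker role</source>
      <updated>2010-05-15T10:00:00Z</updated>
      <item>
        <name>Queue backlog</name>
        <value>42 pending messages</value>
        <status>ok</status>
      </item>
    </rsm>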

A beta version is already in production. Check the project introduction page.

Wednesday
Apr 28, 2010

Sqwarea, open source game on Windows Azure

Beyond running a small software company, I am also responsible for the Software Engineering and Distributed Computing course at the ENS Paris. For the fourth year in a row, Microsoft graciously supported this course (including some Windows Azure resources).

Every year, a dozen or so 1st year Computer Science students take on a software project. Last year, my students produced Clouster, a scalable clustering algorithm on top of Windows Azure. It was already a significant achievement considering the beta status of Windows Azure at the time (the students upgraded twice from one SDK version to another during the course).

This year, my students went (*) for a massively multiplayer online strategy game named Sqwarea (a heavy contraction of square + war + area).

You are a King battling over a gigantic map to conquer the world. Train soldiers, conquer new territories, and resist the assault of other kingdoms. The world is flat, see for yourself.

Despite my teaching methods, the students have managed to do really great (especially considering that we are only two-thirds through the project at this point), so let's review a few salient facts about this project:

  • Open source, see sqwarea.codeplex.com.
  • ASP.NET MVC, C#, jQuery, OpenID for the front-end.
  • Lokad.Cloud as the persistence and back-end execution framework.
  • Windows Azure as the host.
  • Table Storage for the persistence (1 entity per map square; see the sketch below).
  • Queue Storage to spread the workload among VMs.
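
As a rough illustration of the "1 entity per map square" layout, a per-square entity could look like the sketch below; this is hypothetical, assuming the StorageClient TableServiceEntity base class, and not the actual Sqwarea schema.

    // Hypothetical per-square entity; property names are illustrative,
    // not the actual Sqwarea schema.
    using Microsoft.WindowsAzure.StorageClient;

    public class SquareEntity : TableServiceEntity
    {
        // PartitionKey / RowKey are inherited; for instance
        // PartitionKey = map region, RowKey = "x:y" coordinates.
        public string OwnerKingdom { get; set; }
        public int SoldierCount { get; set; }
    }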

Then, in order to make sure the project wasn't going to be easy, I included a game rule that is really hard to implement:

People and soldiers have to be constantly reminded who the King is; otherwise, they just do things their own way. If, after a conquest, a part of your kingdom is no longer connected to your King through a path of controlled land squares, then the disconnected area reverts to neutral.

Apparently, the students managed to implement a good (and expectedly complicated) scheme to get this connectivity rule working in a very scalable way.
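
On a single machine, the rule boils down to a connected-component check: a breadth-first search from the King's square over the owned squares, after which any owned square that was not reached reverts to neutral. Here is a naive in-memory sketch (with hypothetical types); the hard part, which this sketch entirely dodges, is running such a check against a map sharded across Table Storage partitions.

    // Naive in-memory sketch of the connectivity rule. The students'
    // actual scheme works against Table Storage and scales out; this
    // single-machine version only illustrates the logic.
    using System.Collections.Generic;

    static class ConnectivityRule
    {
        // Returns the owned squares no longer connected to the king
        // through 4-neighbor adjacency; those squares revert to neutral.
        public static HashSet<(int X, int Y)> DisconnectedSquares(
            HashSet<(int X, int Y)> kingdom, (int X, int Y) king)
        {
            var reached = new HashSet<(int X, int Y)> { king };
            var frontier = new Queue<(int X, int Y)>();
            frontier.Enqueue(king);

            while (frontier.Count > 0)
            {
                var (x, y) = frontier.Dequeue();
                var neighbors = new[] { (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1) };
                foreach (var next in neighbors)
                {
                    // Visit each owned square at most once.
                    if (kingdom.Contains(next) && reached.Add(next))
                        frontier.Enqueue(next);
                }
            }

            var lost = new HashSet<(int X, int Y)>(kingdom);
            lost.ExceptWith(reached); // owned but unreachable from the king
            return lost;
        }
    }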

(*) Actually, every year, I choose the project to be carried out by my students. Hence, if you think the project idea is lame, blame me.

Friday
Apr 9, 2010

.NET profiler for Windows Azure

Under modern managed runtimes, performance profiling comes in two flavors:

  • CPU profiling
  • memory profiling

In the last decade, the No1 breakthrough in the profiling arena was the introduction of sampling. Instead of intercepting every single method call and every single object allocation - introducing a 10x slowdown in the process - the profiler only takes samples at regular intervals.

Sampling trades accuracy for performance. In practice, sampling is not just a tradeoff, it's a game changer.

Indeed, even a modest sampling rate - say 2 or 3% of your processing capacity - already gives you an incredibly precise execution profile. Hint: with a 2 GHz CPU, 1% already accounts for 20M cycles per second.

With sampling, it becomes possible to aggregate fine-grained execution statistics under production-like conditions, or even in actual production, leaving the profiler ON all the time.

In the .NET ecosystem, Microsoft has been offering a free (yet rudimentary) memory profiler for years - while 3rd party vendors were providing more advanced tools, such as the excellent dotTrace by JetBrains.

Lately, I discovered that Microsoft had released a new free CPU profiler for .NET along with Visual Studio 2008. Caution: while running this tool for the 1st time, I got a BSOD caused by an unsupported processor; the problem was fixed through this hotfix.

The MS profiler is rather crude, especially on the UI side. Yet, its strong orientation toward the command line and CSV/XML exports makes it rather handy for continuous integration scenarios where the profiler is run behind unit tests (or batch executions) putting the system under performance stress.
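
For instance, a CI script could drive a sampling session from the command line roughly as follows. This is a sketch from memory of the Visual Studio 2008 profiling tools, and the target executable name is illustrative; double-check the exact flags with VSPerfCmd /?.

    REM Sketch of a command-line sampling session; flags are from memory
    REM of the VS2008 profiling tools, verify with VSPerfCmd /?
    VSPerfCmd /start:sample /output:perf.vsp
    VSPerfCmd /launch:MyBatchUnderTest.exe
    REM ... wait for the batch to complete ...
    VSPerfCmd /shutdown
    REM Export the collected samples for the CI reports.
    VSPerfReport perf.vsp /summary:all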

Back to the cloudy part announced in the post title, I believe that profilers will soon be considered a must-have component for cloud computing. Indeed, with the cloud you end up being charged precisely for the resources you consume. Thus, the performance gains obtained with a profiler have a very real and very measurable ROI: say an app runs on 10 small VM instances at $0.12 per hour; shaving 30% off the CPU needs is worth roughly $3,000 a year.

Cloud computing is not cheap per se: if you really want cheap stuff, you can roll your own hardware and get a 90% discount. Cloud computing is low cost only if performance is kept under control: no need to be a performance hero, but poor performance - which could be tolerated in the good ol' days when the customer was paying for the hardware too - now impacts the SaaS vendor instead.

Forecasters expect the cloud computing market to reach dozens of billions of dollars: cost-killer technologies are bound to emerge in such a large market, and I expect profilers to be one of them.

Comparing the very marginal overhead of a sampling profiler to the significant savings that could be obtained by fine-tuning the precise hotspots of cloud apps, I expect cloud profilers to be run in the background for all apps, in testing AND production environments alike.

The strong orientation of Windows Azure toward .NET makes it one of the best clouds to introduce such a profiling layer on top of cloud apps early on.

I am actually toying with the idea of trying to run the MS profiler on Azure directly (you can run arbitrary executables), however it may prove a bit difficult for the time being.