Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Monday, February 8, 2010

## Big Wish List for Windows Azure

At Lokad, we have been working with Windows Azure for more than a year now. Although Microsoft is a late entrant in the cloud computing arena, I am so far extremely satisfied with this choice, as Microsoft is definitely moving in the right direction.

Here is my Big Wish List for Windows Azure: the features that would turn Azure into a killer product, deserving a lion-sized share of the cloud computing marketplace.

My wishes are ordered by component:

• Windows Azure
• SQL Azure
• Table Storage
• Queue Storage
• Blob Storage
• Windows Azure Console
• New services

### Windows Azure

Top priority:

• Faster CPU burst: the total time between the initial VM request (through the Azure Console or the Management API) and the start of client code execution is long, typically 20min, and in my (limited) experience above 1h for any larger number of VMs (say 10+). Obviously, we are nowhere near real-time elastic scalability. In comparison, SQL Azure needs no more than a few seconds to instantiate a new DB. I would really like to see the same excellent behavior on the Windows Azure side.
• Smaller VMs: for now, the smallest VMs are 2GB large and cost $90/month, which brings the cost of a modest web app to $200/month (considering one web role and one worker role). Competitors (such as Rackspace) already offer much smaller VMs, down to 256MB per instance, priced about 10x cheaper. I would really like to see that on Azure as well; otherwise, scaled-down apps are just not possible.
• Per-minute charge: for now, Azure charges by the hour, which means that any hour you start consuming is charged in full. Charging by the minute would be a great incentive to improve performance, as developers could really fine-tune their cloud usage to meet the demand without wasting resources. Obviously, such a feature makes little sense if your VMs take 1h to get started.
• Per-VM termination control: currently, it is not possible to tell the Azure Fabric which VM should be terminated, which is rather annoying. For example, long-running computations can be interrupted at any moment (and will have to be performed again) while idle VMs are kept alive.
• Bandwidth and storage quotas: most apps are never supposed to require truckloads of bandwidth or storage. If they do, it usually means that something is going really wrong; think of a loop endlessly polling data from a remote source. With pay-as-you-go, a single VM can easily generate 10x its own monthly cost through faulty behavior. To prevent such situations, it would be much nicer to be able to assign quotas to roles.

Nice to have:

• Instance count management through RoleEnvironment: the .NET class RoleEnvironment provides basic access to the properties of the current Azure instance. It would be really nice to also provide native .NET access to instance termination (as outlined above) and to instance allocation requests, considering that each role should be handling its own scalability.
• Geo-relocation of services: currently, the geolocation of a service is set at setup time and cannot be changed afterward. Yet the default location is "Asia" (the first item of the list), which makes the process quite error-prone (any manual process should be considered error-prone anyway). It would be nicer if it were possible to relocate a service, possibly with a limited downtime, as it is only a corrective measure, not a production imperative.

### SQL Azure

Top priority:

• DB snapshot & restore toward the Blob Storage: even if the cloud is perfectly reliable, cloud app developers are not. The data of a cloud app (like any other app) can be corrupted by faulty app behavior. Hence, frequent snapshots should be taken so that the data can be restored after a critical failure. The ideal solution for SQL Azure would be to dump DB instances directly into the Blob Storage. Since DB instances are kept small (10GB max), SQL Azure would be nicely suited to this sort of behavior.
• Smaller DB instances (starting at 100MB for $1/month): 100MB is already a lot of data. SQL Azure is a very powerful tool for scaled-down approaches, possibly isolating the data of every single customer (in the case of a multi-tenant app) into its own DB. At $10/month, the overhead is typically too large to go for such strong isolation; but at $1/month, it would become the de facto pattern, leading to smaller and more maintainable DB instances (as opposed to desperately trying to scale up monolithic SQL instances).
• Size auto-migration: currently, a 1GB DB cannot be upgraded to a 10GB instance. The data has to be copied manually first, and the original DB deleted later (and vice versa). It would be much nicer if SQL Azure took care of automatically scaling the size of DB instances up or down (within the 10GB limit, obviously).

Nice to have:

• Geo-relocation of service: same as above. Downtime is OK too; it is just a corrective measure.

### Table Storage

Top priority:

• REST-level .NET client library: at present, Table Storage can only be accessed through an ADO.NET implementation that proves rather troublesome. ADO.NET gets in the way if you really want to get the most out of Table Storage. Instead, it would be much nicer if a .NET wrapper around the REST API was provided as low-level access.

Nice to have:

• Secondary indexes: this one has already been announced, but I am re-posting it here as it would be a really nice feature nonetheless. In particular, it would be very handy to reduce the number of I/O operations in many situations.

### Queue Storage

Nice to have:

• Push multiple messages at once: the Queue offers the possibility to dequeue multiple messages at once, but messages can only be queued one by one. Symmetrizing the queue behavior by offering batched writes too would be really nice.

### Blob Storage

Nice to have:

• Reverse Blob enumeration: prefixed Blob enumeration is probably one of the most powerful features of the Blob Storage. Yet items can only be enumerated in increasing order of their blob names, and in many situations the "canonical" order is exactly the opposite of what you want (ex: retrieving blob names prefixed by dates, starting with the most recent ones). It would be really nice if it were possible to enumerate the other way around too.

### Windows Azure Console

The Windows Azure Console is probably the weakest component of Windows Azure. In many ways, it's a real shame to see such a good piece of technology dragged down so much by the abysmal usability of its administrative web client.

Top priority:

• 100x speed-up: when I say 100x, I really mean it; and even with a 100x factor, it will still be rather slow by most web standards, as a refresh latency of 20min is not uncommon after updating the configuration of a role.
• Basic multi-user admin features: for now, the console is a mono-user app, which is quite a pain in any enterprise environment (what happens when Joe, the sysadmin, goes on vacation?). It would be much nicer if several Live IDs could be granted access rights to an Azure project.
• Billing is a mess, really: besides the fact that about 10 counter-intuitive clicks are required to navigate from the console to your consumption records, the consumption reporting is still of substandard quality. Billing cries for massive look & feel improvements.

Nice to have:

• Project rename: once named, projects cannot be renamed. This is rather annoying, as many situations call for a naming correction. At present, if you are not satisfied with your project name, you've got no choice but to open a new Azure account and start all over again.
• Better handling of large projects: the design of the console is OK if you happen to have a few services to manage, but beyond 10 services it gets messy. Clearly, the console has not been designed to handle dozens of services. It would be way nicer to have a compact tabular display for the service list.
• Aggregated dashboard: Azure services are spread among many panels. With the introduction of new services (Dallas, ...), getting the big picture of your cloud resources is becoming more and more complex. Hence, it would be really nice to have a dashboard aggregating all the resources used by your services.
• OpenID access: Live ID is nice, but OpenID is nice too. OpenID is gaining momentum, and it would be really nice to see Microsoft supporting OpenID here. Note that there is no issue in supporting Live ID and OpenID side by side.

### New services

Finally, there are a couple of new services that I would be thrilled to see featured by Windows Azure:

• .NET Role Profiler: in a cloud environment, optimizing has a very tangible ROI, as each performance gain is reflected in a lower consumption bill. Hence, a .NET profiler would be a killer service for cloud apps based on .NET. Even better, low-overhead sampling profilers could be used to collect data even for systems in production.
• Map Reduce: already featured by Amazon WS, it would be massively useful for the rest of us (like Lokad) who perform intensive computations on the cloud. Microsoft has already been moving in this direction with DryadLinq, but I am eager to see how Azure will be impacted.

This is a rather long list already. Did I forget anything? Just let me know.

Friday, January 15, 2010

## Fat entities for Table Storage in Lokad.Cloud

After realizing the value of the Table Storage, giving a lot of thought to higher-level abstractions, and stumbling upon a lot of gotchas, I have finally ended up with what I believe to be a decent abstraction for the Table Storage.

The purpose of this post is to outline the strategy adopted for this abstraction which is now part of Lokad.Cloud.

Table Storage (TS) comes with an ADO.NET provider as part of the StorageClient library. Although I think that TS itself is a great addition to Windows Azure, frankly, I am disappointed by the quality of the table client library. It looks like a half-baked prototype, far from what I typically expect from a v1.0 library produced by Microsoft.

In many ways, the TS provider now featured by Lokad.Cloud is a pile of workarounds for glitches that are found in the underlying ADO.NET implementation; but it's also much more than that.

The primary benefit brought by TS is a much cheaper way of accessing fine grained data on the cloud, thanks to the Entity Group Transactions.

Although secondary indexes may bring extra bonus points in the future, cheap access to fine-grained data is basically the only advantage of Table Storage over the Blob Storage at present.

I believe there are a couple of frequent misunderstandings about Table Storage. In particular, TS is by no means an alternative to SQL Azure: TS features nothing you would typically expect from a relational database.

TS does feature a query language (a pseudo-equivalent of SQL) that supposedly supports querying entities against any property. Unfortunately, for scalability purposes, TS should never be queried without specifying row keys and/or partition keys. Querying arbitrary properties may give a false impression that it just works; yet to perform such queries, TS has no alternative but to scan the entire storage, which means that your queries will become intractable as soon as your storage grows.

Note: if your storage is not expected to grow, then don't even bother with Table Storage, and go for SQL Azure instead. There is no point in dealing with the quirks of a NoSQL store if you don't need to scale in the first place.

Back to the original point: TS features cheaper data access costs, and obviously this aspect had to be central in Lokad.Cloud; otherwise, it would not have been worthwhile to even bother with TS in the first place.

### Fat entities

To some extent, Lokad.Cloud puts aside most of the property-oriented features of TS. Indeed, querying against properties does not scale anyway (except for the system ones).

Thus, the first idea was to go for fat entities. Here is the entity class shipped in Lokad.Cloud:

public class CloudEntity&lt;T&gt;
{
    public string RowKey { get; set; }
    public string PartitionKey { get; set; }
    public DateTime Timestamp { get; set; }
    public T Value { get; set; }
}

Lokad.Cloud exposes the 3 system properties of TS entities. Then, the CloudEntity class is generic and exposes a single custom property of type T.

When an entity is pushed toward TS, the entity is serialized using the usual serialization pattern applied within Lokad.Cloud.
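To make the pattern concrete, here is a minimal usage sketch; the OrderLine payload type is purely hypothetical, and the CloudEntity class is reproduced so that the snippet stands alone:

```csharp
using System;

// CloudEntity<T> as shipped in Lokad.Cloud, reproduced so that the
// snippet stands alone.
public class CloudEntity<T>
{
    public string RowKey { get; set; }
    public string PartitionKey { get; set; }
    public DateTime Timestamp { get; set; }
    public T Value { get; set; }
}

// Hypothetical payload type; any serializable .NET type can play T.
public class OrderLine
{
    public string Product { get; set; }
    public int Quantity { get; set; }
}

public static class FatEntityDemo
{
    // Client code only deals with the two keys plus one strongly-typed
    // Value; serializing Value is the provider's business.
    public static CloudEntity<OrderLine> Sample()
    {
        return new CloudEntity<OrderLine>
        {
            PartitionKey = "customer-42",
            RowKey = "order-00017",
            Value = new OrderLine { Product = "usb-key", Quantity = 3 }
        };
    }
}
```

The point is that the client never touches entity properties directly: keys aside, the whole payload travels as a single typed object.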

This entity is said to be fat because the maximal size of a CloudEntity is 1MB (actually 960KB), which corresponds to the maximal size of an entity in TS in the first place.

Instead of going for the 64KB limitation per property, Lokad.Cloud offers an implementation that comes with a single 1MB limitation for the whole entity.

Note: Lokad.Cloud relies, under the hood, on a hacky implementation which involves spreading the serialized representation of the CloudEntity over 15 binary properties.
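The overflow trick can be sketched as follows; this is an illustrative reimplementation, not the actual Lokad.Cloud code:

```csharp
using System;

// Illustrative reimplementation of the overflow trick: the serialized
// entity is spread over up to 15 binary properties of 64KB each, hence
// the 960KB ceiling.
public static class FatEntitySplitter
{
    const int PropertySize = 64 * 1024; // 64KB per TS binary property
    const int MaxProperties = 15;       // 15 x 64KB = 960KB

    public static byte[][] Split(byte[] serialized)
    {
        if (serialized.Length > PropertySize * MaxProperties)
            throw new ArgumentException("Entity exceeds 960KB.");

        var count = (serialized.Length + PropertySize - 1) / PropertySize;
        var properties = new byte[count][];
        for (var i = 0; i < count; i++)
        {
            var size = Math.Min(PropertySize, serialized.Length - i * PropertySize);
            properties[i] = new byte[size];
            Array.Copy(serialized, i * PropertySize, properties[i], 0, size);
        }
        return properties;
    }
}
```

A 200KB serialized entity would end up spread over 4 binary properties; reassembly is simply the concatenation of the properties in order.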

At first glance, this design appears questionable as it introduces some serialization overhead instead of relying on the native TS property mechanism. Well, a raw naked entity costs about 1KB due to its Atom representation. In fact, the serialization overhead is negligible, even for small entities; and for complex entities, our serialized representation is usually more compact anyway due to GZIP compression.

The whole point of fat entities is to remove as much friction as possible from the end-developer. Instead of worrying about tight 64KB limits for each entity, the developer has only to worry about a single and much higher limitation.

Furthermore, instead of trying to cram your logic into the dozen supported property types, Lokad.Cloud offers full strong-typing support through serialization.

### Batching everywhere

Lokad.Cloud features a table provider that abstracts the Table Storage. A couple of key methods are illustrated below.

public interface ITableStorageProvider
{
    void Insert&lt;T&gt;(string tableName, IEnumerable&lt;CloudEntity&lt;T&gt;&gt; entities);
    void Delete(string tableName, string partitionKey, IEnumerable&lt;string&gt; rowKeys);
    IEnumerable&lt;CloudEntity&lt;T&gt;&gt; Get&lt;T&gt;(string tableName, string partitionKey, IEnumerable&lt;string&gt; rowKeys);
}

Those methods have no limitations concerning the number of entities: Lokad.Cloud takes care of building batches of 100 entities, or fewer, since the group transaction must also ensure that the total request weighs less than 4MB.

Note that the 4MB restriction of TS transactions is a very reasonable limitation (I am not criticizing this aspect), but the client code of the cloud app is really not the right place to enforce this constraint, as it significantly complicates the logic.
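The slicing logic can be sketched as follows; this is a hypothetical helper, not the actual provider code, and both the 100-entity and the 4MB ceilings are enforced while streaming:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the batching logic: slice an arbitrary entity stream into
// Entity Group Transactions of at most 100 entities and at most 4MB each.
public static class BatchSlicer
{
    public static IEnumerable<List<T>> Slice<T>(
        IEnumerable<T> entities,
        Func<T, int> weigher,           // serialized size of one entity
        int maxCount = 100,             // TS: max 100 entities per group
        int maxBytes = 4 * 1024 * 1024) // TS: max 4MB per request
    {
        var batch = new List<T>();
        var bytes = 0;
        foreach (var entity in entities)
        {
            var weight = weigher(entity);
            if (batch.Count > 0 &&
                (batch.Count == maxCount || bytes + weight > maxBytes))
            {
                yield return batch; // current batch is full, flush it
                batch = new List<T>();
                bytes = 0;
            }
            batch.Add(entity);
            bytes += weight;
        }
        if (batch.Count > 0) yield return batch; // trailing partial batch
    }
}
```

For instance, 250 entities weighing 1KB each would come out as batches of 100, 100 and 50; a stream of fat 1MB entities would instead be flushed every 4 entities by the byte ceiling.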

Then, the table provider also abstracts away the subtle retry policies that are needed while interacting with TS. For example, when posting a 4MB transaction request, there is a non-zero probability of hitting an OperationTimedOut error. In such a situation, you don't want to just retry your transaction, because it is very likely to fail again: the time-out happens when your upload speed does not match the 30s time-out of TS. Hence, the transaction needs to be split into smaller batches, instead of being retried as such.
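The split-on-timeout policy can be sketched like this (a simplified, hypothetical version; the real provider combines it with other retry policies):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the retry policy: retrying a timed-out 4MB transaction as-is
// would very likely time out again, so the batch is split in halves and
// each half is retried recursively instead.
public static class SplitRetry
{
    public static void InsertWithSplitRetry<T>(
        List<T> batch, Action<List<T>> tryInsert)
    {
        try
        {
            tryInsert(batch);
        }
        catch (TimeoutException)
        {
            if (batch.Count <= 1) throw; // cannot split any further

            var half = batch.Count / 2;
            InsertWithSplitRetry(batch.GetRange(0, half), tryInsert);
            InsertWithSplitRetry(batch.GetRange(half, batch.Count - half), tryInsert);
        }
    }
}
```

Each level of recursion halves the payload, so the transaction size quickly drops below whatever the available upload bandwidth can push within the 30s window.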

Lokad.Cloud goes through those details so that you don't have to.

Monday, January 11, 2010

## Scaling-down for Tactical Apps with Azure

Cloud computing is frequently quoted for unleashing the scalability potential of your apps, and the press goes wild quoting the story of XYZ, a random Web 2.0 company that has gone from a few web servers to one zillion web servers in 3 days due to a massive traffic surge.

Yet the horrid truth is: most web apps won't ever need to scale, and an even smaller fraction will need to scale out (as opposed to scaling up).

A pair of servers already scales up to millions of monthly visitors for an app as complex as StackOverflow. Granted, the people behind SO did a stunning job at improving the performance of their app, but still, it illustrates that moderately scaling up already brings you very far.

At Lokad, although we direly need our core forecasting technology to be scalable, nearly all other parts do not. I wish we had so many invoices to process that we would need to scale out our accounting database, but I don't see that happening any time soon.

Actually, over the last few months, we have discovered that cloud computing has the potential to unleash yet another force in the software industry: the power of scaled-down apps.

There is an old rule of thumb in software development that says that increasing the complexity of a project by 25% increases the development effort by 200%. Obviously, that does not look too good for the software industry, but the bright side is: if you cut the complexity by 20%, then you halve the development effort as well.

Based on this insight, I have refined the strategy of Lokad with tactical apps. Basically, a tactical app is a stripped-down web app:

• not core business: if the app crashes, it's not a showstopper.
• features are fanatically stripped down.
• from idea to live app in less than 2 weeks, with a single developer in command.
• no need to scale, or rather, the probability of needing scalability is really low.
• addresses an immediate need; ROI is expected in less than a few months.

Over the last couple of weeks, I have released 3 tactical apps based on Windows Azure.

Basically, each app took me less than 10 full working days to develop, and each one addresses a long-standing issue in its own narrow yet useful way:

• Website localization had been a pain for us from the very beginning. Formalized processes were tedious, and by the time results were obtained, translations were already outdated. Lokad.Translate automates most of the mundane aspects of website localization.
• Helping partners figure out their own implementation bugs while developing against our Forecasting API was a slow, painful process. We had to spend hours guessing what could be wrong in the partner's code (as we typically don't have access to the code).
• Helping prospects figure out how to integrate Lokad into their IT is daunting for a small company such as Lokad: we end up facing about 20 new environments (ERP/CRM/MRP/eCommerce/…) every week. Hence, we really need to plug partners in, and Lokad.Leads helps us do that in a more straightforward manner.

Obviously, if we were to reach 10k visits per day for any one of those apps, that would already be a LOT of traffic.

#### The power of Windows Azure for tactical apps

Tactical apps are not so much a type of apps but rather a fast-paced process to deliver short-term results. The key ingredient is simplicity. In this respect, I have found that the combination of Windows Azure + ASP.NET MVC + SQL Azure + NHibernate + OpenID is a terrific combo for tactical apps.

Basically, ASP.NET MVC offers an app template that is ready to go (following the Ruby on Rails motto of convention over configuration). Actually, for Lokad.Translate and Lokad.Debug, I did not even bother re-skinning the app.

Then, Windows Azure + SQL Azure offer an ultra-standardized environment. No need to think about setting up the environment: it is already set up, and it leaves you very little freedom to change anything, which is GREAT as far as productivity is concerned.

Also, ultra-rapid development is obviously error-prone (which is OK, because tactical apps are really simple). Nevertheless, Azure provides very strong isolation from one app to the next (VM-level isolation). It does not matter much if one app fails and dies from some terminal design error; the damage will be limited to the app itself. Obviously, that would not have been the case in a shared environment.

Finally, through OpenID (and its nice .NET implementation), you can externalize the bulk of your user management (think of registration, confirmation emails, and so on).

At this point, the only major limitation for tactical apps is the Windows Azure pricing which is rather unfriendly to this sort of apps, but I expect the situation to improve over 2010.


Thursday, December 3, 2009

## O/C mapper for TableStorage

The Table Service API is the most subtle of the cloud storage services offered by Windows Azure (which also include the Blob and Queue services for now). I did struggle a while to eventually figure out the unique specificity of Table Storage from a scalability perspective, or rather from a cost-to-scale perspective, as the cloud charges you according to your consumption.

Since the scope of the Table Storage remained fuzzy to me for a long time, the beta version of Lokad.Cloud does not (yet) include support for Table Storage. Rest assured that this is definitively part of our roadmap.

### TableStorage vs. others

Let's start by identifying the specifics of TableStorage compared to other storage options:

• Compared to Blob Storage,
• Table Storage provides much cheaper fine-grained access to individual bits of information. In terms of I/O costs, Table Storage is up to 100x cheaper than Blob Storage thanks to Entity Group Transactions.
• Table Storage will (in the near future) provide secondary indexes, while the Blob Storage only provides a single hierarchical access to blobs.
• Compared to SQL Azure,
• Table Storage lacks just about everything you would expect from a relational database. You cannot perform any Join operation or establish a foreign key relationship, and this is very unlikely to ever be available.
• Yet, while SQL Azure is limited to 10GB (this value might increase in the future, but scaling up is really not the way to go), Table Storage is expected to be nearly infinitely scalable for its own limited set of operations.

The StorageClient library shipped with the Azure SDK is nice, as it provides a first layer of abstraction over the raw REST API. Nevertheless, coding your app directly against the ADO.NET client library seems painful due to the many implementation constraints that come with the REST API. Further separation of concerns is needed here.

### The Fluent NHibernate inspiration

TableStorage has far less expressivity than relational databases; nonetheless, classical O/R mappers are a great source of inspiration, especially nicely designed ones such as NHibernate and its must-have addon Fluent NHibernate.

Although the entity-to-object mapping isn't that complex in the case of TableStorage, I firmly believe that a proper mapping abstraction à la Fluent NH could considerably ease the implementation of cloud apps.

Among the key scenarios that I would like to see addressed by Lokad.Cloud:

• Seamless management of large entity batches when no atomicity is involved: let's say you want to update 1M entities in your Table Storage. Entity Group Transactions can actually reduce I/O costs by 100x. Yet, Entity Group comes with various constraints, such as no more than 100 entities per batch, no more than 4MB per operation, ... Fine-tuning the I/O from the client app would have to be replicated for every table; it really makes sense to abstract that away.
• Seamless overflow management toward the Blob Storage. Indeed, Lokad.Cloud already natively pushes overflowing queued items toward the Blob Storage. In particular, Table Storage assumes that no property should weigh more than 64KB, but manually handling the overflow from the client app seems very tedious (a similar feature is actually already considered for blobs in SQL Azure).
• More customizable mapping from .NET types to native property types. The set of property types supported by the Table Storage is very limited. Although a few more types might be added in the future, Table Storage won't (ever?) handle arbitrary native .NET types. Yet, if you have a serializer at hand, the problem goes away.
• Better versioning management, as .NET properties may or may not match the entity properties. Fluent NH has an exemplary approach here: by default, match with the default rule; otherwise, override the matching. In particular, I do not want the .NET client code to be carved in stone because of some legacy entity that lies in my Table Storage.
• Entity access has to be made through indexed properties (OK, for now, there aren't many). With the native ADO.NET client, it's easy to write Linq queries that give a false sense of efficiency, as if entities could be accessed and filtered against any property. Yet, as data grows, performance is expected to be abysmal (beware of timeouts) unless entities are accessed through their indexes. If your data is not expected to grow, then go for SQL Azure instead, as it's way more convenient anyway.
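The serializer point in the list above can be sketched with DataContractSerializer, one possible choice of serializer and not necessarily the one Lokad.Cloud will adopt:

```csharp
using System.IO;
using System.Runtime.Serialization;

// Sketch: with a serializer at hand, any data-contract-serializable .NET
// type can be collapsed into a single binary property, sidestepping the
// short list of property types natively supported by the Table Storage.
public static class PropertySerializer
{
    // Serialize an arbitrary value into the byte[] of a binary property.
    public static byte[] ToProperty<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return stream.ToArray();
        }
    }

    // Deserialize the binary property back into the original type.
    public static T FromProperty<T>(byte[] property)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream(property))
        {
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

The mapper would call such a pair behind the scenes, so that client code keeps manipulating plain .NET types while the storage only ever sees supported property types.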

Any further aspects that should be managed by the O/C mapper? Any suggestion? I will be coming back soon with some more implementation details.

Saturday, October 10, 2009

## Windows Azure deserves a public roadmap

Last week, I had the chance to meet in person with Steve Marx and Doug Hauger, two key people on the Windows Azure team at Microsoft.

First of all, I have been really pleased: those folks are brilliant. My own little company is betting a lot on Windows Azure. When I tell people (partners, investors, customers) about the amount of work involved in migrating Lokad toward the cloud, the most frequent feedback is that I am expecting way too much from Microsoft, that Lokad is taking way too much risk relying on unproven Microsoft products, that Microsoft has failed many times before, ...

My own belief in the matter is that Microsoft is a large company, with loads of talented people and loads of not-so-talented people too. Yet it seems clear to me now that Microsoft has gathered a top-notch team on Windows Azure, and this alone is a very healthy sign for the future of Windows Azure.

In particular, Doug Hauger spent a lot of time explaining to me his vision of the future of Windows Azure. Again, it was brilliant. Unfortunately, due to an NDA, I won't be able to discuss here the most salient aspects of this roadmap. It's a bit sad, because I am pretty sure that most of the Azure community would be thrilled, like I am, if this vision were openly shared.

Among all the projects going on at Microsoft, one team that I like a lot is the C# team. In my humble opinion, C# is one of the finest products ever released by Microsoft; and one thing that I appreciate a lot about the C# team is that they openly discuss their roadmap. C# 4.0 is not even released, and they have already started to discuss features that lie further ahead. If C# is such a good product, I believe it's precisely because every feature gets openly discussed so much.

Back to Windows Azure: I think everybody would agree that cloud computing is, as a technology, several orders of magnitude more complex than any programming language (even C#). My own experience, from reading questions asked on the Windows Azure Forums, is that many developers still fail to understand the cloud, and keep asking for the wrong features (ex: Remote Desktop). A roadmap would help people avoid such pitfalls, as it would make it much more obvious where Azure is heading.

Then, when we started migrating Lokad toward Azure about 6 months ago, we built our architecture upon a lot of guesses about the features that were most likely to be shipped with Windows Azure. So far, we have been really lucky, and Doug Hauger just confirmed to me last week loads of things that we had only been guesstimating so far. Yet, I would have been 10x more confident had the roadmap been available from the start. You can't expect people to be that lucky at doing forecasts as a line of business.

The world is vast, and no matter how dedicated the Azure team is, it does not seem reasonable to expect them to spend hours with every partner enlightening them with their secret roadmap. Private roadmaps just don't scale. Considering that Microsoft is a late entrant in the cloud computing market (Amazon EC2 has been in production for more than 2 years), a public disclosure of their roadmap seems unlikely to benefit any competitor (or rather, the benefit would be very marginal).

On the other hand, an Azure roadmap would very certainly benefit all the partners already investing in Windows Azure; plus, it would also help convince other partners that Azure is here to stay, not just covering fire.
