Lokad mentioned on Microsoft Senior VP blog

June 20th, 2009

My small company is gaining visibility. After managing to get copied by the Chinese Government itself, Lokad is now listed on the blog of S. Somasegar, senior vice president of the Developer Division at Microsoft.

I am not exactly sure how S. Somasegar ended up on Lokad, but I don’t think that he personally spent time carefully reviewing each one of the 15,000 BizSpark companies. Thus, I guess I have to thank Julien Codorniou for that :-) .

FIPFO - First In Probably First Out

May 19th, 2009

FIFO (First In First Out) is a very well-known concept in computer science. In one of my previous posts, I used the word FIPFO, for First In Probably First Out, to refer to the cloud equivalent of the FIFO.

Indeed, the basic idea behind that term is that you can’t scale pure FIFOs much due to synchronization constraints. Yet, if you just loosen the semantics a little bit, that is to say, FIPFO, then you have an infinitely scalable data structure.

Considering its simplicity and usefulness, I believe that FIPFO will be ubiquitous in future cloud computing applications.
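A minimal in-memory sketch of the idea, under stated assumptions: the `FipfoQueue<T>` class, its sharding scheme and all member names are hypothetical and are not the Windows Azure Queue API. Items are spread over several independent shards, each with its own lock, so no single lock orders the whole queue. Within one shard the order is strict, but across shards it is only approximate.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

/// <summary>Hypothetical FIPFO sketch: several small FIFOs instead of one big one.</summary>
public sealed class FipfoQueue<T>
{
    // One lock per shard: callers hitting different shards never contend.
    private readonly Queue<T>[] _shards;
    private int _enqueueTicket = -1;
    private int _dequeueTicket = -1;

    public FipfoQueue(int shardCount)
    {
        if (shardCount < 1) throw new ArgumentOutOfRangeException("shardCount");
        _shards = new Queue<T>[shardCount];
        for (int i = 0; i < shardCount; i++) _shards[i] = new Queue<T>();
    }

    public void Enqueue(T item)
    {
        var shard = _shards[NextIndex(ref _enqueueTicket)];
        lock (shard) shard.Enqueue(item);
    }

    /// <summary>Returns false only if every shard looked empty during the scan.</summary>
    public bool TryDequeue(out T item)
    {
        int start = NextIndex(ref _dequeueTicket);
        for (int k = 0; k < _shards.Length; k++)
        {
            var shard = _shards[(start + k) % _shards.Length];
            lock (shard)
            {
                if (shard.Count > 0)
                {
                    item = shard.Dequeue();
                    return true;
                }
            }
        }
        item = default(T);
        return false;
    }

    private int NextIndex(ref int ticket)
    {
        // Clear the sign bit so the index stays valid after the counter overflows.
        int t = Interlocked.Increment(ref ticket) & int.MaxValue;
        return t % _shards.Length;
    }
}
```

Every item that goes in eventually comes out, and older items tend to come out first, but two concurrent consumers may observe items in a slightly different order than they were enqueued. That relaxation is what removes the global synchronization point.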

Matthieu, a colleague at Lokad, was asking me if FIPFO was a well-known concept or yet another wacky term that I had just made up on my blog. Well, sorry, I can’t really remember. FIPFO just seems to be a very appropriate way to describe data structures such as the Windows Azure Queue, but I am not sure whether I read about it elsewhere.

According to Google, there is just a single other result for FIPFO at the time of writing, by another person who came up with this term a few days before my initial post while describing the Drupal API. Yet, I had never read the Drupal forums before, so I guess we separately ended up with the same idea and terminology.

Machine learning company, what’s so special?

May 15th, 2009

Developing a machine learning startup is a very particular venture. Check out: Machine learning company, what’s so special? (based on my experience at Lokad)

Copied by the Chinese government

May 4th, 2009

Apparently, my company website has been copied by an official branch of the Chinese government. Although imitation is said to be the sincerest form of flattery, I am not sure how I should handle such a blatant ripoff of Lokad’s copyright.

Key interesting facts:

  • plenty of leftovers from the original website on the Chinese one.
  • imaginative ways of recycling irrelevant illustrations.
  • it’s a .gov.cn website, that is to say an official Department of the Government of China.

The Business of Software folks already have quite a few ideas on the subject. I will probably ponder the case for a few days to decide what to do next.

Co-worker suggestion:
Rinat suggests that I contact them again, saying that since they appear to like our website that much, they might want to try our forecasting technology too.

A few screenshots in case the Chinese website gets updated: 1, 2 and 3.

Startup Class ‘07 and ‘08 at Telecom ParisTech

April 21st, 2009

In my previous post, I detailed 9 steps to make sure your startup exists. Inspired by an initial idea of Chris Exline, I decided to make a small survey of the startups admitted at the Incubator of Telecom ParisTech in 2007 and 2008 (startups are hosted for 18 months by the incubator, and then kicked out, that’s the rule).

To figure out how well the startups of the incubator were doing, I came up with a simple score for the startup websites.

Survival test for startup websites:

  • +2 if look & feel is GOOD.
  • +1 if look & feel is just OK (zero if horrid).
  • +2 if the benefits of the product or service are clear.
  • +1 if I must struggle to figure out the benefits.
    (zero if I am still clueless about benefits after struggling)
  • +1 if there is no happy talk
  • +1 if PageRank is greater than 3 for a B2B company.
  • +1 if PageRank is greater than 5 for a B2C company.
  • +1 if there is an English version.
  • +1 if people can buy or consume right away.
  • +1 if there is news.
  • +1 if there are forums.

The maximal score for this test is 10. One can argue that this test is very subjective. Frankly, after reviewing 50 companies, I rather think otherwise.
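The rubric above can be written down as a small function. This is only a sketch: the `WebsiteFacts` type, the `Quality` enum and every field name are hypothetical, introduced here to make the arithmetic explicit. The two subjective criteria still have to be judged by a human before the score is computed.

```csharp
using System;

// Hypothetical grades for the two subjective criteria (look & feel, benefits).
public enum Quality { Poor = 0, Ok = 1, Good = 2 }

// Hypothetical record of what the reviewer observed on one website.
public sealed class WebsiteFacts
{
    public Quality LookAndFeel;
    public Quality BenefitsClarity;
    public bool NoHappyTalk;
    public bool IsB2B;
    public int PageRank;
    public bool HasEnglishVersion;
    public bool CanBuyOnline;
    public bool HasNews;
    public bool HasForums;
}

public static class SurvivalTest
{
    public static int Score(WebsiteFacts w)
    {
        // Look & feel and benefits are worth 0, 1 or 2 points each.
        int score = (int)w.LookAndFeel + (int)w.BenefitsClarity;

        if (w.NoHappyTalk) score++;

        // PageRank must be greater than 3 for B2B, greater than 5 for B2C.
        int threshold = w.IsB2B ? 3 : 5;
        if (w.PageRank > threshold) score++;

        if (w.HasEnglishVersion) score++;
        if (w.CanBuyOnline) score++;
        if (w.HasNews) score++;
        if (w.HasForums) score++;

        return score; // between 0 and 10
    }
}
```

Since a company is either B2B or B2C, only one of the two PageRank lines can apply, which is why the maximum is 10 and not 11.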

Any website with a decent, professional look is ranked as GOOD with 2 points - no need for flashy graphics, decent is enough. On the other hand, if the website feels amateurish (messed-up colors, random layout) but is still functional, then it’s OK, you get 1 point. If the website is utterly broken in design or in navigation, then it’s zero points.

Same for the benefits. If I can get a rough idea, in less than a minute, of the added value of your company, then you get 2 points. I mean no need for detailed ideas, the big picture is enough. If I have to struggle for 5 minutes to finally guess what your added value could be, then it’s 1 point. If after 5 minutes, I am still utterly clueless, then it’s zero points.

Concerning the PageRank, I am putting a much lower threshold for B2B websites, because those folks typically need 100 times fewer customers than B2C companies to be profitable.

Not having an English version is like shooting yourself in the foot. The French market is small, so small, compared to USA+UK+Canada+India+Australia. To get 1 point here, you don’t need to have translated everything into English, any portion that makes sense is enough.

In my opinion:

  • any 6-month-old startup should get at least 6/10.
  • any 18-month-old startup should get 9/10.

I have collected raw data for 52 startups within a Google Spreadsheet, and here are the results as of 2009-04-21.

Disclaimer: I have a strong bias toward Lokad since it’s my own company. Thus, its score is probably the least reliable score of the whole study(*). Concerning the other startups, I am not involved, thus, I feel more objective.

‘07 class

9 DisMoiOù
9 Lingueo
8 Connecthings
8 Helia
8 La Cartoonerie
8 LivePepper
8 Netineo
8 PREXENS
8 Teacheo
7 FrenchSet
6 Adminext
6 EtherTrust
6 InovaCours
6 Tellus
5 FamilyBy
5 Adipsys
5 Lixys
4 Needer
3 Connect and Go
3 Patent Organizer Software
3 Nexess
2 MobiNear
1 Alphacode
1 Système Polaire
0 Takys

Average score: 5.5

‘08 Class

9 OOdesk
9 Lokad
8 Accessif
8 Haploid
8 Hellocoton
8 PlayAdz
7 CapAngel
7 Jaxio
7 OhMyMode
6 Ecce Vino
6 Quelle Energie
6 Actimos
6 Kwaga
5 Ineovation
4 Eyes Triple Shut
4 Hedera Technology
4 Plugnsurf
3 Absysseo
3 Aquilant Technologies
3 Faveod
3 The Metrics Factory
2 FI Technologies
2 Media Mobility
2 nYouLinK
2 SeQureNet

Average score: 5.3

To be honest, those results look rather poor to me.

  • Two thirds of those startups don’t offer their customers any chance to buy or consume the product or service online.
  • Roughly one third of those startups are not able to express the benefits they could bring to their customers.
  • More than half of the startups don’t even have a limited English version of their website.

Moreover, startups do not improve much over time. Comparing 2007 with 2008, it feels as if there were two categories of startups:

  • the ones that got a good website right from the start.
  • the ones that will never get a good one.

Yet, my own experience tells me that this is obviously not true. Just have a look at the first version of the Lokad website (score: 4) and compare it with the current one. Granted, I am still far from what Branding Geniuses could produce, but still.

I would be interested to see how other incubators are doing on their own.

(*) Feel free to give us a score of zero if you think the Lokad website is damn broken - but please leave a comment so that we can improve :-) .

9 steps to make sure your startup exists

April 18th, 2009

My uISV isn’t even remotely an audience-based business - we are on a narrow B2B segment - but since the very beginning, I have invested a lot of effort to get a decent online presence. So far, every effort that I have made to strengthen the online presence has been very significantly rewarded. Every week or so, excellent news just pops up out of nowhere:

  • A consulting group wants to add the product to its portfolio.
  • A customer sends you a detailed spec of what you should be doing instead, and the suggestions happen to be really smart.
  • A large company wants to know if your product scales up to 1 zillion users, because they are considering buying a zillion licenses.

Nearly one year ago, I had the chance to get my own uISV admitted at the Startup Incubator of Telecom ParisTech. An incubator is a nice place to meet other people that are facing roughly the same sort of problems that you have. To my great surprise, most startups have poor online presence, and even more surprising, most investors seem to have no clue about online presence either.

It’s not clear how much it hurts the business; but in my opinion your online presence is the only tangible proof of your company’s existence for all the people who do not happen to be within a 20km radius of your office.

Thus, here are my 9 steps to make sure the company has an online presence:

  1. No stealth-mode crap, get online, no excuse.
  2. Look & feel should be decent.
  3. Customer benefits come first.
  4. Happy talk has no place on your site.
  5. Decent Google PageRank is required.
  6. English is required.
  7. Public pricing is required.
  8. Blog is required.
  9. Community feedback should be possible.

1. No stealth-mode crap, get online, no excuse
People tend to think too highly of their own ideas. Ideas matter little while execution is everything. Remember that Google was half a decade late in the search engine race; the same goes for Facebook and social networking websites. Stealth development is a game for big players who can sustain years of R&D expenses with no visible returns and then inject millions in marketing once the technology is ready.

2. Look & feel should be decent
Unless you happen to be a graphic designer, don’t even try to skin your website yourself: it will look awfully amateurish and turn your customers away. For $100 or less you can get a nice website template. It might not be unique, but it does not matter. There are so many templates available anyway that 99.99% of your visitors won’t even notice that aspect. In 2009, there is no excuse anymore for having a half-baked website skin.

3. Customer benefits come first
If your visitors can’t figure out the benefits of your technology / product / service, why should they actually care about the way it’s designed? Many startups fail to actually explain the value of what they are offering, and focus strongly on random technical aspects that happened to be a challenge for the development team.

4. Happy talk has no place on your site
Happy talk is an easy way to fill your website. Ever considered putting a “Welcome to our website” sentence on your front page? Well, don’t. Also, for B2B companies, happy talk usually comes as (slightly) more subtle verbiage such as mindless mission statements: “our mission is to serve our customer’s interests”. Make sure that every single word that you put on your website carries a valuable message. If it doesn’t, delete the word.

5. Decent Google PageRank is required
Ever googled a company name to end up on the Facebook page of an employee? Well, that sort of thing happens when your Google PageRank is just too low. More generally, a decent PageRank ensures that if somebody does deep market research, your company will appear. I am not even talking about grabbing thousands of visitors through top SERP on strategic keywords; I am just considering the journalist / student / consultant / … who is trying to figure out all the players of your business niche. If this person can’t find you, then you don’t exist.

6. English is required
If you happen to be a native English speaker, that one isn’t going to be too hard for you. For the rest of us, well, we have to make the effort to get it done nonetheless. The harsh reality is that through English, you can reach roughly 10x more people than you can through any other language. It doesn’t mean that you can’t do other languages, but English should be a primary focus.

7. Public pricing is required
It’s always a bit puzzling to me to notice how reluctant people usually are to display any pricing on their website - especially on B2B websites. Yet, pricing is vital information for your customers. Software or services can be priced from $1 / month to $10 million / month. Where do you stand? This concern stays valid even for beta products. Displaying a price is a very good signal for your customers: it tells them that you are a real company with a real product under way. Without pricing, you’re simply not part of the economic circuit.

8. Blog is required
A company can be long dead while the website is still up and running. Providing some news - any news, anywhere on the website - as long as the dates are visible, is the simplest way to prove to your visitors that the company is still up and running. Having a blog, and posting at least once a month, is probably the easiest way to complete this step. Blogs are dirt cheap and dead simple; no excuse will be considered for not having a blog.

9. Community feedback should be possible
I find it always very frustrating not to be able to provide feedback about a product, a website, a service, whatever. Granted, most web visitors never give any feedback, but some do it all the time. The feedback provided by those users is gold. Don’t neglect your community when setting up web forums is just a matter of hours. Your forums are likely to see low traffic, but in my experience, the little early feedback that you get can actually make a difference in your business. You should not miss that sort of opportunity.

As a final word, I have already started to collect some data about the ‘07 and ‘08 classes of the incubator of Telecom ParisTech. Stay tuned.

Cloud Computing vs. Hardware as a Service

April 6th, 2009

In a previous post, I discussed why I believe that cloud computing is going to be a big-player arena, and not a friendly place for the little guys.

Recently, many people have told me about such and such small company that supposedly delivers cloud computing too, with a service matching the ones offered by the big players.

Basically, the discussion goes like this:

Hey, we too are able to instantiate virtual machines on-demand. We have some nice virtual machine deployment scripts, a nice WebUI to administrate all the nodes, we are now matching the Amazon offer.

Nope, you’re not.

Basically, what those little players are doing is simply Hardware as a Service. For years now, computing hardware has been more or less an on-demand commodity. My favorite host can typically set up a new server in 48h, and I can cancel my subscription anytime (although I will have to pay for the entire month). Some more aggressive host providers are providing fully automated server setup, and your new server is usually available in less than 1h.

Now, what those small companies calling themselves cloud providers are able to do is set up new servers in seconds instead of minutes; and the trick to do that is simple: they use virtualization and deployment scripts.

But, in my opinion, this isn’t cloud computing; this is just hardware as a service with a lower overhead, both at the infrastructure level and for the system administrators themselves.

So, what is so radically different with cloud computing?

In my opinion, the radical novelty of cloud computing is the promise that you won’t have to worry about resource allocation anymore.

In particular, I don’t want to figure out if I need 1, 2, 3 or 42 computing nodes to handle a massive web traffic surge from Slashdot, I just want to tell my cloud provider:

Here is the script for my web page, do whatever is needed to ensure good performance, and send me the bill at the end of the month.

Note that this is exactly what Google App Engine is doing. Google App Engine is relieving web developers from the burden of having to figure out how they are going to scale their web apps. Google is doing the magic for them so that web developers can focus on the specific value of their web apps instead of focusing on the complex infrastructure actually needed to achieve scalability.

Quoting Thomas Serval from the Round Table about Azure a few months ago:

In the past, each time we have multiplied the traffic by 10 on our applications, we have been forced more or less to rewrite the application from scratch. The promise of cloud computing is to let you achieve unlimited scalability from day one.

Obviously, cloud computing isn’t magic, thus applications will need to be carefully designed to achieve unlimited scalability, yet I believe that thanks to the cloud computing frameworks currently being published, it won’t be that hard in the future.

Thus cloud computing is not Hardware as a Service simply because Hardware as a Service does not do anything about scalability in itself.

The true benefit of cloud computing is to provide what I would call scalable computing abstractions. Those abstractions represent physical resources such as CPU, memory or bandwidth, but with additional constraints (usually structural constraints) so that it becomes actually possible to provide an infinitely scalable instance of the desired resource.

For example, nowadays more or less all cloud providers include in their offer a distributed and reliable hash table implementation: S3 for Amazon, Blob Storage for Windows Azure, … FIPFO is another popular scalable storage abstraction: First-In Probably First Out, i.e. queues but without deterministic ordering. As long as you rely only on those scalable storage abstractions, you do not have to care about the scalability of your storage.

So far, scalable storage abstractions have been the primary focus of most cloud providers. Yet, I suspect that the next battle will be scalable CPU abstractions.

Indeed, Amazon has recently unveiled its new Amazon Elastic MapReduce and, like other people, I believe that MapReduce will be a game changer. First, Amazon is delivering CPU at $0.015/h while its competitors are still above $0.10/h at the time. Then, if we consider that the Amazon native MapReduce implementation is going to be way more efficient than custom in-house implementations - simply because Amazon folks have the time and the experience needed to get the load balancing settings right - then Amazon has just divided the CPU price by 10.

Then, what I see as the killer benefit of MapReduce is that I don’t have to care anymore about how many nodes I need.

For example, at Lokad, we have tons of time-series to process. Let’s say that we want to extract seasonality patterns out of 100 million time-series (each time-series ranging from a few hundred to a few thousand points). With MapReduce, I just have to specify the algorithm to process a single time-series and pass the huge time-series collection as an argument. The cloud infrastructure will handle all the magic for me. In particular, I don’t have to care anymore about nodes crashing along the way, or about dynamically expanding / shrinking the number of computing nodes.
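To make the division of labor concrete, here is a local sketch, under heavy assumptions. `SeasonalProfile` is a deliberately crude placeholder (the average value at each position of a fixed period), not Lokad's actual algorithm, and `ProcessAll` is a hypothetical single-machine stand-in for the framework, not the Amazon Elastic MapReduce API. The point is only the shape of the code: the developer writes the function for one series, and something else decides where and how often it runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeasonalitySketch
{
    // The "map" step: written for ONE time-series, with no notion of nodes.
    // Placeholder logic: mean of the values observed at each position in the period.
    public static double[] SeasonalProfile(double[] series, int period)
    {
        if (period < 1) throw new ArgumentOutOfRangeException("period");

        var sums = new double[period];
        var counts = new int[period];
        for (int t = 0; t < series.Length; t++)
        {
            sums[t % period] += series[t];
            counts[t % period]++;
        }

        var profile = new double[period];
        for (int p = 0; p < period; p++)
        {
            profile[p] = counts[p] == 0 ? 0.0 : sums[p] / counts[p];
        }
        return profile;
    }

    // Local stand-in for the framework: applies the same function to every
    // series. A real MapReduce service would spread these calls over many
    // nodes and retry the ones that fail.
    public static Dictionary<string, double[]> ProcessAll(
        IEnumerable<KeyValuePair<string, double[]>> collection, int period)
    {
        return collection.ToDictionary(
            kv => kv.Key,
            kv => SeasonalProfile(kv.Value, period));
    }
}
```

Because `SeasonalProfile` reads nothing but its own input, the calls are independent of each other, which is what lets an infrastructure grow or shrink the number of nodes without any change to this code.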

MapReduce is a very constrained framework that forces you to apply the very same function everywhere, but the input collection can be arbitrarily large. In my experience, if you’re not able to scale a data-mining problem through MapReduce, then nothing will – or, more precisely, the design complexity will be so great that you are most likely to give up anyway.

Those scalable resource abstractions represent the core value offered by cloud providers. Yet, those scalable resource abstractions are truly hard to design and even harder to optimize. Yes, you might know a small company that auto-deploys virtual machines, but, in my opinion, this does not reach even 10% of the potential benefits brought by the cloud.

Those benefits will be achieved through scalable resource abstractions; and each one of those abstractions is going to cost a massive amount of brain power to get done right.

In praise of Voices.com

April 2nd, 2009

I have been a long-time consumer of freelance marketplaces. Yet, all the freelance websites that I have experienced so far left me with a feeling of half-baked design. Guru, oDesk, eLance, rentacoder, just to name a few of them.

The heart of the problem lies in the doomed attempts at supporting any type of freelance job with a single web application.

In contrast, voices.com has a unique focus on voice talents. You won’t find database administrators or supply chain consultants on voices.com; but when it comes to voice-over jobs, the application is just plain great.

Basically, like any other freelance website, you post your job - including your scripts since it’s a voice-over job - and within hours you get dozens of freelance offers. So far, so good, all other freelance websites are doing that.

Yet, the killer feature of voices.com is that each freelancer gives you a 30s recording of their own voice reading your scripts.

And this feature is plain amazing. Instead of wasting hours making desperate attempts at sorting out true talent from the massive amount of junk proposals, you just listen to your 30s samples, which happens to be precisely the rational way to make your decision.

And the best thing is that since voices.com is putting a strong emphasis on talent through this very feature, you’re getting virtually no junk proposals at all. Among the 30 proposals that I received yesterday in less than 6h, most were very good, and a few plain excellent.

Don’t believe me? Just check the very nice job that Ray Grover did for us within a 6h timeframe from job posting to job completion.

High-perf SelectInParallel in 120 lines of C#

March 23rd, 2009

A few months ago at Lokad, we started working on 8-core machines. Multi-core machines need adequate algorithmic design to leverage their processing power; and such a design can be more or less complicated depending on the algorithm that you are trying to parallelize.

In our case, there were many situations where the parallelization was quite straightforward: large loops, all iterations being independent. At that time, PLinq, the parallelization library from Microsoft, was not yet available as a final product (it will be shipped with Visual Studio 2010). Thus, since we were in quite a hurry, we decided to code our own SelectInParallel method (code being provided below). Basically, it’s just Select but with a parallel execution for each item being selected.

Although surprisingly simple, SelectInParallel alone turned out to cover virtually 99% of our multi-core parallelization needs, at least at Lokad.

Yet, when we started trying to speed up algorithms with our first SelectInParallel implementation, we ended up stuck with poor speed-up ratios of 3x or even 2x where I was expecting a near 8x speed-up.

At first, I thought it was an illustration of Amdahl’s law. But a more detailed performance investigation showed that I was just plain wrong. The harsh reality was: threads, when not (very) carefully managed, involve a (very) significant overhead.
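For reference, Amdahl’s law bounds the speed-up on n cores by 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The small helper below (the `Amdahl` class and its parameter names are hypothetical) lets you check what the law alone would imply for the figures above.

```csharp
using System;

public static class Amdahl
{
    /// <summary>Upper bound on speed-up when a fraction of the work is parallelizable.</summary>
    /// <param name="parallelFraction">Share of the work that can run in parallel, from 0 to 1.</param>
    /// <param name="cores">Number of cores, at least 1.</param>
    public static double Speedup(double parallelFraction, int cores)
    {
        if (parallelFraction < 0.0 || parallelFraction > 1.0)
            throw new ArgumentOutOfRangeException("parallelFraction");
        if (cores < 1)
            throw new ArgumentOutOfRangeException("cores");

        // Serial share runs at full cost; parallel share is divided by the core count.
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }
}
```

Solving the formula for a 3x speed-up on 8 cores gives p = 16/21, i.e. roughly a quarter of the work would have to be strictly serial. For loops whose iterations are all independent, a serial share that large would be surprising, which is consistent with the overhead coming from thread management instead.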

Our latest SelectInParallel implementation is now 120 lines long with a quasi-negligible overhead, i.e. bringing a near linear speed-up with the number of CPU cores on your machine. Yet, this performance wasn’t easy to achieve. Let’s review two key aspects of the implementation.

Keep your main thread working: In the first implementation, we followed the naive pattern: start N threads (N being the number of CPUs), wait for them to finish, collect the results and proceed. Bad idea: if the amount of work happens to be small, then simply waiting for your threads to start is going to be a performance killer. Instead, you should start N-1 threads, and get your calling thread working right away.
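A stripped-down sketch of that pattern, separate from the full listing further down. The `ForkJoinSketch` class and `RunOnAllWorkers` method are hypothetical names, and the sketch uses `CountdownEvent` (available since .NET 4) for the final wait, which is a simplification and not what the listing does. Error handling in pooled workers is left out.

```csharp
using System;
using System.Threading;

public static class ForkJoinSketch
{
    /// <summary>
    /// Calls work(0) ... work(workerCount - 1). Only workerCount - 1 calls are
    /// queued on the thread pool; work(0) runs on the calling thread.
    /// </summary>
    public static void RunOnAllWorkers(int workerCount, Action<int> work)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException("workerCount");

        using (var done = new CountdownEvent(workerCount))
        {
            for (int i = 1; i < workerCount; i++)
            {
                int id = i; // copy: the lambda must not capture the loop variable
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // Note: an exception thrown by work(id) is not handled here.
                    try { work(id); }
                    finally { done.Signal(); }
                });
            }

            try
            {
                work(0); // the caller starts working immediately
            }
            finally
            {
                done.Signal();
                done.Wait(); // join the pooled workers before leaving
            }
        }
    }
}
```

When the workload is tiny, the calling thread may finish a large share of it before the pooled workers have even been scheduled, so the start-up latency of the pool is hidden behind useful work.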

Avoid synchronization altogether: At first, we were using a Producer - Consumer threading pattern. Bad idea again: it produces a lot of locking contention, the work queue becoming the main bottleneck of the process. Instead, an arithmetic trick can be used to let the workers tackle disjoint worksets right from the beginning and without any synchronization.
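The listing further down does this with a strided loop (`for(var i = (int) index; i < threshold; i += threadCount)`): worker k takes indices k, k + N, k + 2N, and so on. The helper below, with the hypothetical names `StridedPartition` and `IndicesFor`, isolates just that arithmetic.

```csharp
using System;
using System.Collections.Generic;

public static class StridedPartition
{
    /// <summary>
    /// Indices assigned to one worker: worker, worker + workerCount, ...
    /// Two different workers never receive the same index, so no lock is needed.
    /// </summary>
    public static IEnumerable<int> IndicesFor(int worker, int workerCount, int length)
    {
        for (int i = worker; i < length; i += workerCount)
        {
            yield return i;
        }
    }
}
```

With 3 workers and 10 items, worker 0 gets 0, 3, 6, 9, worker 1 gets 1, 4, 7 and worker 2 gets 2, 5, 8. Every index is covered exactly once, and each worker can compute its own share from its number alone, without consulting a shared queue.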

So far, we have been quite satisfied with our 120-line ersatz of PLinq. I hope this piece of code can help a few other people get the most out of their many-core machines. If you have ideas to further improve the performance of this SelectInParallel implementation, just let me know.

using System;
using System.Threading;

namespace Lokad.Threading
{
    ///<summary>
    /// Quick alternative to PLinq.
    ///</summary>
    public static class ParallelExtensions
    {
        static int _threadCount = Environment.ProcessorCount;

        /// <summary>Gets or sets the number of threads to be used in
        /// the parallel extensions. </summary>
        public static int ThreadCount
        {
            get { return _threadCount; }
            set
            {
                _threadCount = value;
            }
        }

        /// <summary>Fast parallelization of a function over an array.</summary>
        /// <param name="input">Input array to be processed in parallel.</param>
        /// <param name="func">The action to perform (parameters and all the members should be immutable!!!).</param>
        /// <remarks>Threads are recycled. Synchronization overhead is minimal.</remarks>
        public static TResult[] SelectInParallel<TItem, TResult>(this TItem[] input, Func<TItem,TResult> func)
        {
            var results = new TResult[input.Length];

            if (_threadCount == 1 || input.Length == 1)
            {
                for(int i = 0; i < input.Length; i++)
                {
                    results[i] = func(input[i]);
                }

                return results;
            }

            // perf: no more thread than items in collection
            int threadCount = Math.Min(_threadCount, input.Length);

            // perf: start by syncless process, then finish with light index-based sync
            // to adjust varying execution time of the various threads.
            int threshold = Math.Max(0, input.Length - (int) Math.Sqrt(input.Length) - 2*threadCount);
            int workingIndex = threshold - 1;

            var sync = new object();

            Exception exception = null;

            int completedCount = 0;
            WaitCallback worker = index =>
            {
                try
                {
                    // no need for lock - disjoint processing
                    for(var i = (int) index; i < threshold; i += threadCount)
                    {
                        results[i] = func(input[i]);
                    }

                    // joint processing
                    int j;
                    while((j = Interlocked.Increment(ref workingIndex)) < input.Length)
                    {
                        results[j] = func(input[j]);
                    }

                    var r = Interlocked.Increment(ref completedCount);

                    // perf: only the terminating thread actually acquires a lock.
                    if (r == threadCount && (int)index != 0)
                    {
                        lock (sync) Monitor.Pulse(sync);
                    }
                }
                catch (Exception ex)
                {
                    exception = ex;
                    lock (sync) Monitor.Pulse(sync);
                }
            };

            for (int i = 1; i < threadCount; i++)
            {
                ThreadPool.QueueUserWorkItem(worker, i);
            }
            worker((object) 0); // perf: recycle current thread

            // waiting until completion or failure
            while(completedCount < threadCount && exception == null)
            {
                // CAUTION: limit on wait time is needed because if threads
                // have terminated 
                // - AFTER the test of the ‘while’ loop, and
                // - BEFORE the inner ‘lock’ 
                // then, there is no one left to call for ‘Pulse’.
                lock (sync) Monitor.Wait(sync, TimeSpan.FromMilliseconds(10));
            }

            if(exception != null)
            {
                throw exception;
            }

            return results;
        }
    }
}

Round Table about Windows Azure

January 7th, 2009

Microsoft just published a 3-min video extract from the round table about their “Software+Services” strategy. My own contributions were mostly centered on Windows Azure. Check the original page on MSDN.