Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Wednesday
Nov 23, 2011

Among the (small) community interested in the software practices Lokad uses to develop enterprise software over Windows Azure, Lokad.Cloud vs Lokad.CQRS comes up as a recurring question.

It's a good question, and to be entirely honest, the case is not 100% solved even at Lokad. The two projects come:

• from different backgrounds:
    • Lokad.Cloud originates from the hard-core data analytics back-end.
    • Lokad.CQRS originates from our behavioral apps.
• with different intents:
    • Lokad.Cloud wants to simplify hard-core distributed algorithmics.
    • Lokad.CQRS wants to provide flexibility, auditability, extensibility (*).
• and different philosophies:
    • Lokad.Cloud is a sticky framework, it defines pretty much how your app is architected.
    • Lokad.CQRS is more a NoFramework, precisely designed to minimally impact the app.

(*) Without compromising scalability; however, scalability is not the primary purpose.

Then, historically, Lokad.Cloud was developed first (which is a mixed blessing), and, as we have been moving forward, we have started to partition it into standalone sub-projects:

• Lokad.Cloud.Storage, the O/C mapper (object to cloud), dedicated to the interactions with the Azure Storage.
• Lokad.Cloud.AppHost, an AppDomain isolation layer to enable dynamic assembly loading within Azure Worker roles (aka reboot a VM with new assemblies in 5s instead of 5min). (**)
• Lokad.Cloud.Provisioning, a toolkit for the Windows Azure Management API.

(**) Lokad.Cloud does not leverage Lokad.Cloud.AppHost yet; it still relies on a very similar component (which was developed first and, as such, is not as properly decoupled as AppHost).

The case of Lokad.Cloud.Storage is a bit more complicated because Lokad.CQRS already has its own Azure Storage layer, which focuses on CQRS-style storage abstractions. In particular, Lokad.CQRS emphasizes interoperable storage abstractions where the local file storage can be used in place of the cloud storage.
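To make the idea of interoperable storage abstractions more concrete, here is a minimal sketch in Python. It is not the Lokad.CQRS API: the names `BlobStore`, `LocalFileStore`, `InMemoryStore` and `save_view` are hypothetical, and the in-memory class merely stands in for a cloud backend.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage abstraction: the app only ever sees put/get."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalFileStore:
    """Local file system backend, handy for development and tests."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStore:
    """Stand-in for a cloud backend (assumption: a real one would wrap a cloud storage client)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_view(store: BlobStore, name: str, payload: bytes) -> None:
    # Application code depends on the abstraction only, never on a backend.
    store.put(name, payload)


def demo_roundtrip() -> bytes:
    # Same application code, local backend: write a blob, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalFileStore(Path(tmp))
        save_view(store, "customer-42.bin", b"hello")
        return store.get("customer-42.bin")
```

Because `save_view` only relies on `put` and `get`, swapping `LocalFileStore` for another backend requires no change in the application code, which is the property the paragraph above describes.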

### The Future

As far as I can speak for Lokad.CQRS (see the project boss), the project will keep evolving, focusing on enterprise software practices, aka not so much what the framework delivers, but rather how it's intended to structure the app. Then, Lokad.CQRS might be complemented by:

• tools at some point such as a maintenance console.
• refined storage abstractions (probably event-centric ones).

In contrast, Lokad.Cloud will continue its partitioning process to become decoupled and more flexible. In particular,

• the cloud runtime
• the service execution strategy

are still very heavily coupled to other concepts within the execution framework, and likely candidates for sub-projects of their own.

I would not advise combining Lokad.Cloud (execution framework) with Lokad.CQRS within the same app. At Lokad, we don't have any project that adopts this pattern, and the resulting architecture seems fuzzy.

Then, it's possible to adopt a SOA architecture where some heavy-duty functional logic gets isolated, behind an API, into the Lokad.Cloud execution framework, while the bulk of the app adopts CQRS patterns through Lokad.CQRS. This pattern has been adopted to some extent at Lokad.

Friday
Oct 14, 2011

## Oddities of machine learning software code

Developing machine learning software is special. I have already described a bit how it feels to be in a machine learning company, but let's be a bit more specific concerning the code itself.

One of the most shocking aspects of machine learning code is that it tends to be full of super-short, cryptic 1-letter or 2-letter variable names. This goes completely against the general naming conventions, which emphasize readability over brevity. Yet, over the years, I have found that those compact names were best for mathematical / statistical / numerical algorithms.

Indeed,

• Logic is typically overly intricate, with tons of nested loops and seemingly random stopping conditions. Hence, even if the variables were perfectly readable, the logic would remain a show-stopper for any fast-reading attempt.
• Variables typically hold intermediate computational results, which cannot be associated with 2 or 3 English words without being extremely ambiguous at best. It's not a = OnButtonClick but rather a = InterpolatedDetrendedDeseasonalizedQuantile90PercentsOfPromotionEffects.

As a result, extreme variable name brevity makes the code much more compact, which in turn makes it easier to understand the logic. It forces the coder digging into the code to learn by heart the semantics of the variables (because names are cryptic), but this effort is only marginal compared to the amount of effort needed to grasp the logic itself anyway.

Then, magic numbers are all over the place, frequently inlined with the rest of the code. Again, for non-machine-learning code, magic numbers are a big NO-NO, and a cardinal rule of sane software design consists of a clear separation between data and logic. Yet, in statistical algorithms, those seemingly random numerical values are the result of the incremental tuning that is necessary to obtain the desired performance and accuracy.

• There is no benefit in isolating the magic number, because it is used only once.
• The actual numerical value is typically more insightful than the variable name. It helps the developer to get a sense of the behavior of the algorithm.

Then, it remains a good practice to add a lot of inline comments to justify the purpose of the magic numbers, and how they have been optimized.
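As an illustration of this style, here is a small Python sketch of simple exponential smoothing written with short names and an inlined, commented magic number. The function and the value 0.3 are illustrative assumptions, not code or tuned constants from Lokad.

```python
def smooth(y):
    """Simple exponential smoothing of the series y; returns the smoothed series s."""
    s = [y[0]]  # s[0] is initialized with the first observation
    for t in range(1, len(y)):
        # 0.3: smoothing factor, used only here. Illustrative value; in real
        # code this comment would say how it was tuned (data set, metric,
        # range of values tried) so the next reader can re-optimize it.
        s.append(0.3 * y[t] + 0.7 * s[t - 1])
    return s
```

Seeing `0.3` and `0.7` directly in the update rule gives an immediate sense of how reactive the smoothing is, which a named constant would hide behind an extra lookup.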

If your code is super fast, you're probably getting it wrong. For most machine learning problems, it's better to try to take advantage of the outrageously large amount of processing power available nowadays to improve results. I am not saying that super fast code is bad in itself, but if your code is super fast, then it means that you've got room to go for more complex methods that would consume more resources in exchange for better, more accurate, results.

Unit tests are both very handy to validate small blocks of pure-mathematical operations, and yet, quasi-useless for the bulk of the machine learning logic. Indeed, when it comes to statistical accuracy, there is no black & white, only shades of gray. As long as performance is acceptable, the overall accuracy is the only metric that matters. In particular, it happens, from time to time, that a bug - aka a piece of code that does not reflect the original intent of the developer - turns out to behave well over the data. On average, bugs tend to degrade accuracy, but sometimes, one just stumbles upon an interesting (and counter-intuitive) behavior.
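For instance, a small pure-mathematical block such as an error metric can be pinned down with exact expectations. The sketch below is a hypothetical example in Python (the `mae` function is an assumption chosen for illustration); the overall accuracy of a forecasting model, by contrast, has no such exact expected value to assert against.

```python
def mae(y, f):
    """Mean absolute error between actuals y and forecasts f."""
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)


def test_mae():
    # Exact expectations are possible because the operation is pure math.
    assert mae([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
    assert mae([0.0, 0.0], [1.0, -3.0]) == 2.0
```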

Finally, Object-Oriented Programming is still around, but seldom used. Functional Programming is king. This pattern reflects the fact that the machine learning problem itself, either classification or regression, is nothing but trying to build a big complex function to tackle real-world data.

Wednesday
Aug 3, 2011

## Bitcoin, thoughts on a nascent currency system

Bitcoin is a fascinating concept: in short, it's a crypto-currency backed by nothing other than raw processing power and geeky enthusiasm. For those who've never heard of it, you can have a look at the introduction provided by the Bitcoin community itself or by The Economist.

This currency seems to trigger many more positive reactions than skeptical ones. My personal stance is very much in favor of Bitcoin, and I have invested a conservative amount of Euros in exchange for Bitcoins. Granted, nothing that would be too troublesome even considering a 100% loss of value for those Bitcoins.

A lot has been said already about Bitcoin, so I will not go through the routine discussion of pros and cons, but merely make some observations.

### Bitcoin vs Credit Cards and Classical Banking, the long term value

A good deal of interest in Bitcoin is strictly speculative: people go for Bitcoin thinking they have a good chance of cashing out. Yet, when it comes to evaluating the value of a venture of any kind, I am a strong believer in the Guy Kawasaki credo: does it make sense? Is the world a better place with Bitcoin than without? Indeed, making a speculative profit is not enough; Bitcoin has to improve the world in some tangible ways.

Here I believe that Bitcoin addresses a very deep problem: how to pay or receive money without involving either an expensive physical process (meeting and exchanging gold, goods, ...) or an expensive middleman (your bank, your credit card operator, PayPal, ...).

To a web entrepreneur, the current banking system looks like a 19th century legacy setup:

• About 4% (1) of my money gets consumed through system friction.
• It takes days (2) to complete anything that does not go through credit cards.

(1) Indeed, there are many costs that pile up (rough estimates):

• 0.5%, fees of the consumer bank account (explicit or not),
• 1%, fees of the credit card owned by the consumer,
• 2%, merchant fees for any online payment,
• 0.5%, fees of the merchant for its own bank account.
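The arithmetic behind the 4% figure can be checked with a few lines of Python. The percentages are the rough estimates listed above; the dictionary keys and the helper names are hypothetical, introduced only for this sketch.

```python
# Rough friction estimates from the list above, in percent of the amount paid.
FEES_PCT = {
    "consumer bank account": 0.5,
    "consumer credit card": 1.0,
    "merchant online payment": 2.0,
    "merchant bank account": 0.5,
}


def total_friction_pct(fees):
    """Sum the individual fee estimates into one overall friction rate (in %)."""
    return sum(fees.values())


def friction_cost(amount, pct):
    """Money consumed by friction for a payment of `amount` at a rate of `pct` percent."""
    return amount * pct / 100.0
```

With these estimates, `total_friction_pct(FEES_PCT)` adds up to the 4% mentioned earlier, so a payment of 100 loses about 4 along the way, split between the payer's side and the receiver's side.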

(2) For international wire transfers with a routing bank in the middle, a 7 to 10 day delay is pretty much the standard.

And yet, in my experience there is not so much to be done about this friction, at least not if you're just Joe the Plumber or a small business. Marginally lowering those costs through negotiations with the bank is possible if you have leverage (that is to say money) and a lot of patience; but there is only so much one side can do because both sides (payer and receiver) are paying fees anyway.

The long-term promise of Bitcoin is to bring down this 4% friction to 0.1% or less, and to reduce payment latency from days to minutes, possibly seconds with a healthy competitive ecosystem of trusted 3rd parties. Indeed, Bitcoin is not natively designed for low latency transactions, but Bitcoin can be complemented by low latency services (backed by Bitcoin) if the need arises.

Anecdotal evidence: When I purchased Bitcoins on MtGox a few days ago, the sole wire transfer from France to the UK cost me about 4% (EUR to GBP conversion included), plus the transfer took 8 days, because the receiving bank in the UK had a multi-day downtime of one of their systems.

### Weaknesses of Bitcoin

When it comes to assessing the weaknesses of Bitcoin, most people discuss the possibility of breaking the underlying cryptography, or swarming the network with some overwhelming computing power. Yet, Bitcoin has been designed to be natively resilient against this sort of attack, and very capable people are working hard to make Bitcoin even more resilient. Hence, I am not too worried here: the Bitcoin community is now big enough to make those sorts of attacks really complicated.

Anecdotal evidence: I have tried to mine about 0.01 BTC through Deepbit.net, and on my GPU-enabled laptop it was taking about 30h. Naturally, I gave up before the end of the experiment, as it was pointless to waste further electricity. Bitcoin mining has reached the state of being vastly unprofitable for everyone but the experts, which is good. It means Bitcoin has reached the point of diminishing returns where printing money (aka mining) is only very marginally profitable.

The most critical threat for Bitcoin is something simpler and stronger: a potential fade of interest, which may vastly hinder the maturation of the tooling ecosystem. Fade of interest would not annihilate Bitcoin, but it would make it stagnant. Then, in the innovation trade, being stagnant is the closest thing to being dead.

For the short term (next few months), my No1 concern is that a tiny few individuals such as the enigmatic Satoshi Nakamoto may possess more than 100k BTC (or this guy with 370k BTC). And no, the problem is not that the system is unfair - being unfair does not hinder economic success, quite the opposite actually. The problem is that each one of those individuals has the power to disrupt the emerging usage of Bitcoin. As a matter of fact, the first Bitcoin market crash was not the result of a weakness within the protocol, but the result of a not-fully-secured wallet within a trading system. A lot of early adopters are moving around with thousands of BTC, and each one of those, willingly or not, may disrupt Bitcoin trading by simply getting their wallet stolen. A similar analysis goes for all the emerging companies supporting the Bitcoin economy that are really lacking the expertise needed to operate properly (ex: the now infamous MyBitcoin.com downtime fiasco). Those bumps are not for the faint-hearted, and are likely to slow down Bitcoin adoption. As time goes by, this sort of problem will fade through survival of the fittest, but a couple of Bitcoin crashes should be expected.

For the mid-term (6 months ahead to 2 years), the most difficult operation will be to transition the Bitcoin community from mining stage to trading stage, then repeat the process again from trading stage to end-user stage (see below, for the detail of the phases) - and do those transitions without losing the commitment and enthusiasm of the people who contribute the most to the Bitcoin community. Basically, as long as there are smart people enthusiastic about Bitcoin, Bitcoin will keep growing; but the attention sharing economy is a harsh mistress, and the community interest might jump to the next revolutionary idea just as well. See the law of conservation of hype as a practical illustration. Bitcoin has successfully attracted a horde of miners. Now this horde needs to evolve into the next stage, as mining earnings are marginalized.

For the long term (2 years), assuming Bitcoin interest has not faded already, direct Government interventions - for whatever reasons (*) - may kill the community. Outlawing Bitcoin would be hard to enforce to its fullest extent, at least if the Internet still exists, but flagship companies supporting Bitcoin are easy targets. It would also be easy to spot any company publicly accepting Bitcoin as a payment method. Again, the problem is not Bitcoin annihilation - which seems a remote possibility - but rather Bitcoin undergoing a fade of interest if its community has to go underground.

(*) Until 1996, all encryption methods were banned in France, classified as warfare materials. As a result, encryption usage was close to nonexistent despite obvious benefits.

### Assessing a global value for Bitcoin

Many people looking at Bitcoin make the naïve assumption that the number of BTC mined × the USD price per BTC gives a reasonable estimate of the overall market value of Bitcoin. This approach is misleading. First, we don't know for sure how many BTC have been lost already. Super early users were not really treating BTC as a real currency, and it took more than 2 years for Bitcoin to take off. I suspect that many early casual miners have not properly preserved their wallet. This could account for 1M or 2M BTC being lost already (warning: this number is vastly unverifiable).

Second, those who've read Making Money - which I strongly recommend - know that the real long-term backing of any currency is the people behind it, possibly as unwilling taxpayers (but I am digressing). Granted, Bitcoin has no magical Golems backing the protocol, but it has about the next best thing: an enthusiastic, dispersed and growing community of geeks working hard to make Bitcoin a success.

If Bitcoin gets adopted by a sufficiently large number of people, then it will start getting the interest of retail folks. There are already a few eCommerce sites out there supporting Bitcoin, but it's still very niche. The design of Bitcoin offers unprecedented opportunities to support micropayments that were simply not tractable with classical systems. Indeed, anything below $20 is considered as a micropayment by Visa, and there is no widespread electronic solution out there for payments below $1. In comparison, Bitcoin would easily scale down to $0.01 payments (or rather the equivalent amount in BTC) with only a marginal friction. Yet, in order to grab those opportunities, it will take some serious Bitcoin-powered merchant systems, as complete automation is required. Offering to any (non-geek) merchant all the tools he/she needs to receive and process Bitcoin payments is the v3.0 milestone.

#### Bitcoin v4.0 - Enterprise tools

No matter the success of Bitcoin, large companies will probably be among the latest entrants in the Bitcoin economy. In order to make Bitcoin usable in corporate environments, it will require a lot of support from the software industry. For example, there is nothing yet in the Bitcoin software ecosystem that would enable an enterprise to grant rights to people to operate within a spending quota, possibly requesting multiple approvals if a spending goes over a certain threshold. Naturally, the same Bitcoin system would also need to be seamlessly integrated into the primary accounting system in order not to drive both accountants and auditors nuts. Getting Bitcoin corporate-proof is the v4.0 milestone.

### So what next?

Bitcoin is still in the middle of the trading stage but, for those who are inclined to give Bitcoin a chance to establish a very low-friction currency system, the simplest contribution is not to purchase Bitcoins, but simply to start accepting Bitcoin, which is exactly what my company, Lokad.com, started doing.
Monday
Jul 4, 2011

## Why your company should have a single email address (guest post)

My second (ever) guest post has been published today by Jason Cohen, founder at WP Engine: Why your company should have a single email address. This discussion is mostly based on our experience at Lokad. I will address some of the concerns expressed in both the comments on the original post and on the Hacker News discussion.

This is not an email problem, but a CRM problem.

Very true. The secret ingredient to make single email work is, I believe, a CRM such as Relenta (or their next best alternative). Yet, most CRMs completely miss the point and ignore that email plays a central part in B2B nowadays. If sales people are expected to manually feed the CRM, then as far as I have been able to observe, the amount of data actually entered into the CRM is a small fraction at best of all the information that travels through emails.

Non-issue if sales properly update support and vice-versa.

As I was pointing out in the original post already, the world is full of greyish situations. Boundaries within sales / support / billing ... are far from being airtight. The problem with early partitioning is that it vastly hinders your company from even realizing how much overlap there is between those subjects. Don't underestimate the pain you're inflicting on your prospects and clients by letting them decipher which is the right address for their question.

Triage becomes the bottleneck, it won't scale.

If the setup is properly done, then everybody is responsible for the triage whenever there is nothing more urgent to do. Hence, you don't end up with idle folks just waiting for the triage team to do its job; if they are idle, they give a hand to triage. One of the most direct consequences of triage is precisely that it reduces email processing bottlenecks, and lets you scale efficiently with a growing staff.

We are not comfortable passing sensitive information that way.

Email is - by design - an extremely insecure medium. Not because of the technology, but because of the social practices that surround it. Your company can either ignore or embrace this fact. Then again, there are exceptions. As I was also pointing out, at Lokad, we kept our personal mailboxes. If a discussion with a competitor has to take place about a potential acquisition of the company, then yes, it will not go through the shared setup. But how many of such emails do you get? The fraction is simply negligible.

Wednesday
Jun 29, 2011

## Squarespace and blog spam filtering: epic fail

Yesterday, for the 10th time or so, I sent a ticket to the support of Squarespace - the company hosting this very blog - to get them to improve their abysmal spam filter (nonexistent actually) for blog comments. It is a rather frustrating experience to delete about 10 spam comments on a daily basis just because Squarespace can't manage to do things right in this area. Worse, people have been quitting Squarespace for years for this very reason - spam comments being the No1 reason quoted for the change.

The issue is even more infuriating when you consider that:

• It is common knowledge that, when designing software for the web, you have to design for evil. Even if 99.9% of the worldwide population is perfectly harmless, the remaining 0.1% can be extremely painful, and serious measures should be taken in this area. Squarespace, despite all the good stuff they keep delivering (such as their dedicated iPad app), seems to be simply blind to this issue.
• Squarespace raised $38.5M from Accel and Index Ventures. How is it possible that the VC company that has also funded Facebook is not able to provide a hint of feedback to the management of Squarespace concerning a burning issue that is likely to endanger their own investment?

The feedback from the Squarespace support has always two properties:

• Extremely fast, my tickets are addressed within minutes.
• Extremely useless, canned answers constantly suggest trivial but vastly unsatisfying solutions.

In a way, this is not very different from the blog spam content I am trying to get rid of. Hence, I am wondering whether support replies would actually be reported as spam by a decent spam filter; but I digress.

When it comes to customer support KPIs, speed of answer isn't everything. What really matters is to make sure that every problem gets addressed at multiple levels. Solving the immediate problem is only the tip of the iceberg; you have to go for the root cause. In the present case, suggesting to disable comments is not an acceptable solution.

Also, the support staff has been claiming for several years that Squarespace is investing a lot of effort in fixing the spam problem. The worst part is that it might actually be true.

Indeed, spam filtering is a machine learning problem. The fundamental issue with machine learning problems is that unless your company is 100% dedicated to the problem, it can't be solved. Period. (*)

As far as spam filtering goes, Akismet has been around for years. Last time I checked their technology, it was downright excellent; and their pricing is so aggressive it's a non-issue (about \$0.001 per comment for the enterprise package). Squarespace does not even have the excuse that no good dedicated tech is readily available.

At this point, the only reasonable explanation for this situation is either carelessness or ego, the latter being more likely. Since dealing with support is useless, let's see if I get some non-zombie feedback from Squarespace here.

(*) For large companies, very compartmentalized branches work too, a good example being the Kinect software by Microsoft.
