Cloud-first programming languages

The craft of designing programming languages is probably one of the most mature fields of software, and yet it’s surprising to realize how much potential there is in rethinking programming from a cloud-first [0] perspective. At my company Lokad, we ended up writing our own programming language – a narrow domain-specific language geared toward commerce analytics – and we keep stumbling upon capabilities that would have been hard to achieve from a more traditional perspective.

Our language – Envision – lives within the walled garden of its parent company: Lokad provides the tools to author the code as well as the platform to execute the scripts. While this approach has limitations of its own, it also offers some rather unique upsides.

1. Automated language upgrade

Designing a programming language is like any other design challenge: even the most brilliant designer makes mistakes. Then, assuming the language gains some traction, a myriad of programs get written that rely on what has now become an unintended feature. At this point, rolling back any bad design decision takes a monumental effort, because every single piece of code ever written needs to be upgraded separately. All major programming languages (C++, JavaScript, Python, C#) struggle with this problem. Overall, change is very slow, measured in decades [1].

However, if the parent company happens to be in control of all the code in existence, then it becomes possible to automatically refactor, through static code analysis, all the code ever written, and thereby to undo the original design mistake. This does not mean that making mistakes becomes cheap, only that it becomes possible to fix those mistakes within days [2], while regular programming languages mostly have to carry their past mistakes forever.
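
To make this concrete, here is a minimal sketch of such a fleet-wide upgrade pass, written in Python as a stand-in for the real thing: every stored script is parsed, a rewrite rule is applied to its syntax tree, and the upgraded source is written back. The rule shown – rewriting == into a hypothetical eq_nocase helper, echoing [2] – is purely illustrative; a production pass would of course be type-aware and reviewed before rollout.

    # Hypothetical fleet-wide upgrade pass; 'eq_nocase' and the rewrite rule are
    # illustrative, not actual Envision or Lokad internals.
    import ast
    from pathlib import Path

    class EqualsToCaseInsensitive(ast.NodeTransformer):
        """Rewrite 'a == b' into 'eq_nocase(a, b)' so that '==' can later be
        given new semantics without breaking existing scripts."""
        def visit_Compare(self, node):
            self.generic_visit(node)
            if len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq):
                return ast.Call(
                    func=ast.Name(id="eq_nocase", ctx=ast.Load()),
                    args=[node.left, node.comparators[0]],
                    keywords=[],
                )
            return node

    def upgrade_all(script_root: str) -> None:
        # Walk every script the platform stores, upgrade it, write it back.
        for path in Path(script_root).rglob("*.py"):
            tree = ast.parse(path.read_text())
            tree = ast.fix_missing_locations(EqualsToCaseInsensitive().visit(tree))
            path.write_text(ast.unparse(tree))  # ast.unparse requires Python 3.9+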

From a cloud-first perspective, it’s OK to take some degree of risk with language features as long as the features being introduced are simple enough to be refactored away later on. The resulting speed-up in language evolution is massive.

2. Identifying and fixing programming antipatterns

Programming languages are for humans, and humans make mistakes. Some mistakes can be identified automatically through static code analysis, and many more can be identified through dynamic code analysis. Within its walled garden, the company has direct access not only to all the source code, but also to all past executions and all the input data. In this context, it becomes considerably easier to identify programming antipatterns.

Once an antipattern is identified, it becomes possible to selectively warn the impacted programmers with a high degree of accuracy. Better still, it becomes possible to think about the deep fix: the programming alternative that resolves the antipattern altogether.
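
As a rough sketch of what selective warnings can look like, the snippet below (Python, with hypothetical names throughout) flags a known antipattern through a static check and joins it with execution metadata, so that only the authors of scripts that actually run get warned.

    # Hedged sketch: the antipattern regex, the metadata fields and 'notify'
    # are hypothetical placeholders, not Lokad's actual tooling.
    import re
    from dataclasses import dataclass

    @dataclass
    class Script:
        owner: str
        source: str
        runs_last_30_days: int  # pulled from past executions

    # Placeholder pattern: hand-rolled handling of minimal ordering quantities.
    ANTIPATTERN = re.compile(r"\bmoq\b", re.IGNORECASE)

    def notify(owner: str, message: str) -> None:
        print(f"to {owner}: {message}")  # stand-in for an email or in-app warning

    def warn_impacted(scripts: list[Script]) -> None:
        for script in scripts:
            if script.runs_last_30_days > 0 and ANTIPATTERN.search(script.source):
                notify(script.owner,
                       "This script handles minimal ordering quantities by hand; "
                       "consider the dedicated solver instead.")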

For example, at Lokad, we realized a few months ago that lines of code dealing with minimal ordering quantities were frequently buggy. The deep fix was to get rid of this logic entirely through a dedicated numerical solver. The challenge was not so much implementing the solver – although it happened to be a non-trivial algorithm – but realizing that such a solver was needed in the first place.
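
For readers unfamiliar with the problem, here is a toy version of the kind of decision that was previously hand-coded in scripts; it is purely illustrative and not Lokad’s solver, which handles far more general constraints.

    # Toy illustration only: given per-unit economic scores (sorted from most to
    # least desirable) and a supplier's minimal ordering quantity (MOQ), decide
    # how many units to order. Hand-rolled versions of this logic were where
    # the bugs tended to hide.
    def order_quantity(unit_scores: list[float], moq: int) -> int:
        profitable = [s for s in unit_scores if s > 0]
        base = len(profitable)
        if base >= moq:
            return base                      # the MOQ is not binding
        padding = unit_scores[base:moq]      # least-bad units needed to reach the MOQ
        if sum(profitable) + sum(padding) > 0:
            return moq                       # topping up to the MOQ is still worth it
        return 0                             # better to order nothing at all

    print(order_quantity([5.0, 2.0, -1.0, -1.5], moq=4))  # -> 4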

3. Out-of-band calculations

As soon as your logic needs to process a lot of data, computation delays creep in. Calculation delays are typically not an issue in production: results should be served fast, but refreshing the results [3] can take minutes without any impact. As long as nobody is waiting for the newer results, latency matters little.

However, there is one point in time when calculation latency is critical: design time, when the programmer is slowly iterating over hundreds of versions of the same code to incrementally craft the intended calculation. At design time, calculation delays are a real hindrance. Data scientists know the pattern all too well: add two lines to your code, execute, and go grab a coffee while the calculation completes.

But what if the platform were compiling and running your code in the background? What if the platform were even planning ahead of you, pre-computing many elements before you actually need them? It turns out that if the language has been designed upfront with this sort of perspective in mind, it’s very feasible; not all the time, just frequently enough. With Envision, we are already doing this, and it’s not even that hard [4].
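
The underlying idea can be sketched as a cache keyed on the script and its inputs, filled speculatively in the background; the sketch below is in Python with a hypothetical run_script standing in for the real executor, and it glosses over how the platform decides what to pre-compute.

    # Hedged sketch of out-of-band execution: speculative runs fill a cache so
    # that an explicit run often returns instantly. 'run_script' is hypothetical.
    import hashlib
    from concurrent.futures import ThreadPoolExecutor

    _cache: dict[str, object] = {}
    _pool = ThreadPoolExecutor(max_workers=4)

    def _key(script: str, inputs: bytes) -> str:
        return hashlib.sha256(script.encode() + inputs).hexdigest()

    def precompute(script: str, inputs: bytes, run_script) -> None:
        # Called ahead of time, e.g. when a likely next version of the script
        # or a freshly uploaded dataset is detected.
        key = _key(script, inputs)
        if key not in _cache:
            _pool.submit(lambda: _cache.setdefault(key, run_script(script, inputs)))

    def execute(script: str, inputs: bytes, run_script):
        # Called when the programmer actually presses 'run'.
        key = _key(script, inputs)
        if key in _cache:
            return _cache[key]               # already computed out-of-band
        return _cache.setdefault(key, run_script(script, inputs))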

A careful cloud-first design of the programming language can substantially increase the share of calculations that can be performed out-of-band. Those calculations could be performed on local machines, but in practice, relying on the cloud makes everything easier.

4. Data-rich environment

From a classic programming perspective, the programming language – or the framework – is supposed to be decoupled from data. Indeed, why would anyone ship a compiler with datasets in the first place? Except for edge cases, e.g. Unicode ranges or timezones, it’s not clear that it would even make sense to bundle any data with the programming language or the development environment.

Yet, from a cloud-first perspective, it does make sense. For example, in Envision, we provide native access to currency rates, both present and historical. And even within the narrow focus of Lokad, there are many more worthy potential additions: national tax rates, ZIP code geolocation, manufacturer identification through UPC… Other fields would probably have their own domain-specific datasets, ranging from the properties of chemical compounds to trademark registrations.

Embedding terabytes of external data along with the programming environment is a non-issue from a cloud-first perspective, and it makes vast datasets readily available with zero hassle for the programmer.
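
To illustrate what this feels like from the programmer’s side, here is a Python sketch of a currency conversion backed by a platform-maintained rate table; the tiny in-memory dictionary stands in for the real historical dataset, and in Envision the access is a native language feature rather than a library call.

    # Hedged sketch; the rate value is illustrative, and the table is a stand-in
    # for data maintained by the platform rather than by the script author.
    from datetime import date

    PLATFORM_RATES = {
        (date(2015, 3, 2), "EUR", "USD"): 1.12,  # illustrative rate
    }

    def convert(amount: float, src: str, dst: str, on: date) -> float:
        return amount * PLATFORM_RATES[(on, src, dst)]

    # Price of a purchase order, restated in USD at the historical rate.
    print(convert(125.0, "EUR", "USD", date(2015, 3, 2)))  # -> 140.0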

In conclusion, the move toward cloud-first programming languages is an evolution similar to the one from desktop software to SaaS. From afar, both options look similar, but the closer you get, the more differences you notice.

[0] I am not entirely satisfied with this terminology; it could have been LaaS for “Language as a Service”, or maybe IDEE for “Integrated Development and Execution Environment”.

[1] The upgrade from Python 2 to Python 3 will have cost this community roughly a decade. Improving the way null values are handled in C# is also a process that will most likely span over a decade, the endgame being to make those null values unnecessary in C#.

[2] In the initial version of Envision, we decided that the operator ==, when applied to strings, would perform a case-insensitive equality test. In hindsight, this was plainly a bad idea: the operator == should perform a case-sensitive equality test. Recently, we rolled out a major upgrade where all Envision scripts were automatically rewritten to use the new case-insensitive operators, effectively freeing the operator == for its revised, intended semantics.

[3] Most people would favor a spam filter introducing 10 seconds of processing delay per message with 99.99% filtering accuracy over a spam filter needing only 0.1 seconds but offering 99% accuracy. Similarly, when Lokad computes demand forecasts to optimize containers shipped from China to the USA, speeding up the calculation by a few minutes is irrelevant compared to any extra forecasting accuracy to be gained through a better forecasting model.

[4] If somebody uploads a flat file – say a CSV file – to your data processing platform, what comes next? You can safely assume that the file is about to be loaded and parsed, and Lokad does just that. Envision has fancier tricks under the hood than flat-file pre-parsing, but they are the same sort of idea.