Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Entries in Lokad (18)

Sunday
Mar 18, 2007

Format-Graph CmdLet; Drawing graphics with PowerShell

PowerShell, through its object-oriented design, provides a flexible and powerful framework to build interactive shell commands. Lately, in order to produce quick-and-dirty graphs while working on Lokad, I came up with Format-Graph, a CmdLet that outputs text-based graphs of curves extracted directly from the PowerShell pipeline.

Format-Graph screenshot

For those who might be interested, the source code of the Format-Graph CmdLet is freely available as part of the Lokad OpenShell project. The class is stand-alone; you do not need the rest of the Lokad OpenShell project to get it working.

The key idea behind Format-Graph is how it retrieves the values to be plotted from the pipeline. If Format-Graph relied on some arbitrary strongly typed input (think of a class named Point), the CmdLet would be pretty much useless, because using it would require heavy input formatting to produce anything. Instead, Format-Graph leverages .NET reflection to extract the values from the pipeline.

Basically, Format-Graph takes a ValuePropertyName argument that names the property from which the actual double value is extracted. For example, in the screenshot above, I used -ValuePropertyName:Value to extract the values from TimeValue objects.
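To make the idea concrete, here is a minimal sketch in Python rather than in the C#/.NET of the actual CmdLet. It is not the Format-Graph source code: the TimeValue class, the extract_values and format_graph functions, and the height parameter are all hypothetical names chosen for illustration, and Python's getattr stands in for .NET reflection.

```python
from dataclasses import dataclass


@dataclass
class TimeValue:
    # Hypothetical stand-in for the TimeValue objects mentioned above.
    time: str
    value: float


def extract_values(items, value_property_name):
    """Read the named attribute from each object, whatever its type."""
    return [float(getattr(item, value_property_name)) for item in items]


def format_graph(items, value_property_name, height=5):
    """Render the extracted values as rows of text, top row first."""
    values = extract_values(items, value_property_name)
    if not values:
        return []
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero on a flat curve
    # Map each value to a row index between 0 (bottom) and height - 1 (top).
    levels = [round((v - low) / span * (height - 1)) for v in values]
    return [
        "".join("*" if level == row else " " for level in levels)
        for row in range(height - 1, -1, -1)
    ]


# Made-up sample data, only there to exercise the sketch.
series = [
    TimeValue("2007-01", 1.0),
    TimeValue("2007-02", 3.0),
    TimeValue("2007-03", 2.0),
]
rows = format_graph(series, "value", height=3)
print("\n".join(rows))
```

The point of the design is that format_graph never needs to know the type of the objects it receives: any object exposing the named property can be plotted without prior conversion.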

Future development notes: Format-Graph does not provide any axis description for now. I have already included a LabelPropertyName argument as a placeholder (labels are just ignored for now), but I am unsure how to handle the X axis description at this point. Additionally, I am considering (optionally) displaying some scaling information for the Y axis.

Tuesday
Mar 13, 2007

Lokad Sales Forecasting for osCommerce - v1.0 released

We have finally released Lokad Sales Forecasting for osCommerce. It's the first PHP software produced by Lokad.com. Read the announcement on blog.lokad.com.

Friday
Feb 23, 2007

Lokad Desktop Sales Forecasting v1.0 released

I have finally released the Lokad Desktop Sales Forecasting v1.0. See the original blog post on blog.lokad.com.

Tuesday
Jan 30, 2007

Two Lokad-related products shipped

The last few days have been intense, with not one but two releases.

We first shipped Lokad Sales Forecasting v1.0 for ASP.Net, a stand-alone reporting library to be integrated into your favorite ASP.Net eCommerce application.

Then we shipped Lokad OpenShell v1.0, a PowerShell snap-in that features CmdLets related to time-series forecasting. Lokad OpenShell aims to facilitate RAD (Rapid Application Development) approaches while integrating the Lokad technology.

Both products have been released under a BSD license.

Friday
Jan 05, 2007

Missing time-series vs. Empty time-series

Lokad is about time-series forecasting, but as simple as the time-series model may seem (after all, a time-series is nothing more than a list of time-value pairs), there are several subtleties in how time-series are managed. In this post, we will see how the Lokad time-series model distinguishes missing time-value pairs from empty time-value pairs. Since the topic is slightly complex, I would suggest, if you're not familiar with the Lokad technology, having a look at our User Guide (in particular, the Forecasting tasks section).

A practical situation


Let's start with a practical real-life situation. Assume that we have a time-series that includes 12 time-values, one for each month of the year 2005 (starting January 2005, ending December 2005). We can imagine that this time-series represents the monthly sales of a web shop. At the time I am writing this post, it's the beginning of January 2007. What happens if I now insert this time-series into my Lokad account and ask for a monthly forecast? Well, there is an ambiguity in the time-series model, because there are two possibilities:

  • Returning a forecast for January 2007 (let's call it the clock-centric approach): in this case, we consider that the 12 values for the year 2006 are simply missing. Thus, we skip them and produce a forecast nonetheless, based on the data of the year 2005.

  • Returning a forecast for January 2006 (let's call it the data-centric approach): the forecast is based on the last time-value pair available (i.e. December 2005 in the present situation), which is equivalent to assuming that there are no missing values. In this case, the delivered forecast might refer to a period that already lies in the past.

Let's make things clear: Lokad has chosen the data-centric approach. If you ask for a monthly forecast for your 12 time-values ranging from January 2005 to December 2005, you will get a forecast for January 2006, no matter whether you request the forecast at the beginning of 2006 or in a distant future. Lokad takes the last time-value pair of your time-series as the reference to compute the forecasts. This option has been chosen because we believe it's closer to the business requirements.
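The difference between the two approaches boils down to which date anchors the forecast period. The following Python sketch illustrates this under stated assumptions: it is not Lokad code, the function names are hypothetical, months are represented by the first day of the month, and the sales figures are made up.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def data_centric_period(series):
    """Forecast period = the month right after the last time-value pair."""
    last_month = max(month for month, _ in series)
    return next_month(last_month)


def clock_centric_period(today):
    """Forecast period = the current calendar month, whatever the data says."""
    return date(today.year, today.month, 1)


# Twelve monthly values for 2005 (made-up numbers).
sales_2005 = [(date(2005, m, 1), 100.0 + m) for m in range(1, 13)]

print(data_centric_period(sales_2005))         # depends only on the data
print(clock_centric_period(date(2007, 1, 5)))  # depends only on the clock
```

Note that data_centric_period takes no notion of "today" as input, which is why its result stays the same whenever the forecast is requested.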

Some arguments supporting the data-centric approach

Let's review the arguments in favor of the data-centric approach:

  • The data-centric approach has persistent semantics. If the input time-series data does not change, the forecast time-range does not change either (yet the actual forecast values may change over time).

  • The data-centric approach offers the possibility to benchmark the Lokad forecast services. You can import your 2005 product sales data in your Lokad account, get the forecast for 2006, and see how much difference lies between our forecasts and your historical record for 2006.

  • The data-centric approach assumes that there is no missing data in your time-series after the initial time-value pair. This assumption has one strong advantage: simplicity. Indeed, in some data mining fields, missing data is very frequent (think of medical surveys, for example), but when it comes to time-series, it is quite rare.

Yet, this approach involves a minor drawback: you need to handle the lack of data explicitly. For example, in the previous web shop situation, some products of the catalog may not have been sold even once in a given month. In such a case, you must explicitly add a zero time-value to your time-series to represent this lack of sales.
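As an illustration, here is a minimal Python sketch of such a zero-filling step, applied before the data is uploaded. It is only a sketch under stated assumptions: the function name fill_missing_with_zero and its through parameter are hypothetical, months are keyed by the first day of the month, and the sample figures are made up.

```python
from datetime import date


def fill_missing_with_zero(sales_by_month, through=None):
    """Return a dense monthly series where unsold months carry an explicit 0.

    sales_by_month maps the first day of a month to the quantity sold.
    through optionally extends the series up to a given month, so that
    trailing months without any sale are also made explicit.
    """
    if not sales_by_month:
        return []
    first = min(sales_by_month)
    last = max(sales_by_month)
    if through is not None and through > last:
        last = through
    dense = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        key = date(year, month, 1)
        dense.append((key, sales_by_month.get(key, 0.0)))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dense


# A product sold only in January and April 2005 (made-up numbers).
sparse = {date(2005, 1, 1): 3.0, date(2005, 4, 1): 1.0}
dense = fill_missing_with_zero(sparse, through=date(2005, 6, 1))
for month, quantity in dense:
    print(month, quantity)
```

The through parameter matters with the data-centric approach: without the explicit zeros for May and June, the last time-value pair would be April 2005 and the forecast would target May 2005 instead of July 2005.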