With Big Data, can we predict demand on our highways and railways?

Is there such a thing as an infallible crystal ball?

With advances in data handling and processing (the hackneyed, ever-recurring concept of Big Data), new voices are claiming that traffic can be predicted with close to 100% reliability as more information becomes available.

But is this possible?

Although some data processing companies say yes, my answer is no, because the time horizon in which we engineers and companies in the sector operate is different from theirs.

What’s new?

Let’s not fool ourselves: nothing new has been invented, only varnished and, why not say it, improved. Big Data and data analysis have been in use for a long time, just under more mundane names.

You don’t have to look at the competition to see this. At Globalvia, the clearest example is Metro de Sevilla, whose demand budget takes into account the dates of Easter, the April Fair and even match days (including an assumption about how far Sevilla will progress in international competitions, for example).
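As a rough illustration of this kind of calendar-based budgeting, the sketch below applies uplift factors to a baseline on event days. Every number, date and name in it (`BASELINE_TRIPS`, `EVENT_UPLIFT`, `budget_day`) is a hypothetical placeholder, not Metro de Sevilla's actual figures or method.

```python
from datetime import date

# Hypothetical baseline of daily trips on an ordinary day.
BASELINE_TRIPS = 50_000

# Hypothetical uplift factors per event type (illustrative values only).
EVENT_UPLIFT = {"easter": 1.30, "april_fair": 1.60, "football": 1.15}


def budget_day(day, calendar):
    """Return budgeted trips for a day, multiplying the uplifts of its events."""
    factor = 1.0
    for event in calendar.get(day, ()):
        factor *= EVENT_UPLIFT[event]
    return round(BASELINE_TRIPS * factor)


# Illustrative calendar: the dates are placeholders, not real event dates.
calendar = {
    date(2025, 5, 6): ("april_fair",),
    date(2025, 5, 7): ("april_fair", "football"),
}

ordinary = budget_day(date(2025, 3, 3), calendar)
fair_and_match = budget_day(date(2025, 5, 7), calendar)
print(ordinary, fair_and_match)
```

Multiplying the factors is itself an assumption; a real budget might cap the combined effect or fit it from past years.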

As I say, the new data analysis tools do not replace the old calculation methods, but they can improve them: they let us take in more information and are therefore more accurate and faster at characterising our traffic or demand.

This improvement in characterisation is based on obtaining new correlations between traffic and variables that may affect it, opening up a range of possibilities that were unthinkable until recently. Correlating the evolution of traffic with an explanatory variable used to be limited in practice to a handful of well-known variables, perhaps out of habit, but above all because there was no other information, or no means of associating it with traffic data.

Source: Dahl Winters.

But beware: correlation does not imply causation.

For example, a correlation has been found between the increase in sales of ice axes in a shop in Chamonix and the increase in the number of deaths climbing Mont Blanc. This might suggest that anyone who buys an ice axe from such a shop has his or her days numbered. However, although correlated, one does not (a priori) cause the other, and it is more likely that the good weather has attracted more mountaineers to the area, thus increasing both the shop’s sales and the chances of an accident.
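The ice axe example can be reproduced with a small simulation. This is a minimal sketch with invented numbers: good weather (the hidden common cause) raises both a sales index and an accident index, and the two never influence each other directly.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(42)  # fixed seed so the run is repeatable
days = []
for _ in range(2000):
    good_weather = rng.random() < 0.5
    # Both indices depend on the weather only (all coefficients are invented).
    sales = 10 + 20 * good_weather + rng.gauss(0, 3)
    accidents = 1 + 3 * good_weather + rng.gauss(0, 1)
    days.append((good_weather, sales, accidents))

# Correlation over all days: strongly positive.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Correlation on good-weather days only: close to zero.
good = [d for d in days if d[0]]
within_good = pearson([d[1] for d in good], [d[2] for d in good])

print(f"all days: {overall:.2f}, good-weather days only: {within_good:.2f}")
```

Once the weather is held fixed, the link between sales and accidents all but disappears, which is the sign that a third variable was driving both.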

The job of the data analyst (in this case the Traffic Engineer) is to detect inconsistencies in each correlation to avoid misuse of the information, but also to validate the variables that do imply causation.

Models and models

The increasingly accessible information about users (a user being any person who travels for a specific purpose) lets us look to other indicators for the causes of their trips, and therefore improve what we know about those users and (supposedly) improve the prediction of how similar users will behave in the future.

And this is where the first misconception appears: something that may not actually be a predictive model is being called one.

Our use of Big Data in our internal analysis is actually generating, in the case of traffic, a descriptive model: it looks for the reason why a person decided to travel in the past and classifies them into a certain group. If in the future we find a user who fits into a category, we assume that this user will behave like the previously analysed users within our scope.

Predictive models, although similar, assess the probability that a person in a different sample exhibits behaviour similar to that of the subjects analysed. That is, what changes is the domain.

To put it more clearly, descriptive modelling is getting to know the users of (for example) Parla Tram better and using that model to estimate possible new users of the same tram; predictive modelling would be using the behaviour detected in Parla to estimate the demand for Barcelona Trams.

Utility, that great unknown

If someone asks me why there is no system to predict future demand, my answer is that there is a mismatch between utility and the time horizon to work with.

Typical descriptive models claim to determine the demand on a motorway with 100% reliability… 72 hours ahead. That reliability falls as the horizon is extended, but this does not mean the models are not useful, only that the right customer has to be found.

From an O&M point of view, calculating the demand on special days can help to improve the service provided to the user, for example by planning for more trains on a tramway or more vehicles on a motorway.

For traffic engineering this time horizon is very short. Predictions are made for a minimum of one year, but usually cover the entire life of the concession, which can be several decades.

Moreover, we run into another problem: how do you predict the variables that allow you to obtain your demand? Another model would have to be generated for those variables, which in turn will be dependent on others, in what would be an infinite loop of models that would render the methodology useless.

And is the prediction of these variables reliable? An error or deviation in the prediction of the base variables inevitably carries over into the projection of traffic or demand, so the matter is far from simple.
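To see how a small error in a base variable grows over a concession-length horizon, here is a minimal sketch. It assumes, purely for illustration, that traffic grows at a fixed elasticity times GDP growth; the elasticity, the growth rates and the base traffic are invented, and `project_traffic` is a hypothetical helper, not an established forecasting method.

```python
def project_traffic(base_traffic, gdp_growth, elasticity, years):
    """Compound traffic growth, assuming growth = elasticity * GDP growth."""
    return base_traffic * (1.0 + elasticity * gdp_growth) ** years


BASE_TRAFFIC = 10_000   # vehicles per day today (invented)
ELASTICITY = 1.0        # traffic growth per unit of GDP growth (assumed)
ASSUMED_GROWTH = 0.020  # GDP growth used in the forecast
ACTUAL_GROWTH = 0.015   # GDP growth that actually materialises


def overestimate(years):
    """Relative error of the forecast against the outcome after `years`."""
    forecast = project_traffic(BASE_TRAFFIC, ASSUMED_GROWTH, ELASTICITY, years)
    outcome = project_traffic(BASE_TRAFFIC, ACTUAL_GROWTH, ELASTICITY, years)
    return forecast / outcome - 1.0


for horizon in (1, 10, 30):
    print(f"{horizon:>2} years: forecast too high by {overestimate(horizon):.1%}")
```

Under these assumptions, half a percentage point of error in annual GDP growth is barely visible after one year (about 0.5%) but leaves the forecast roughly 16% too high after thirty.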

Therefore, the choice of variables that correlate with traffic may be limited: a variable's own predictability and reliability matter just as much as its correlation with traffic. In many cases candidate variables have to be discarded for these reasons in favour of the "classic" variables that do have more reliable future projections (population, GDP, employment, etc.), although they are not infallible either.


New data analysis tools allow us to improve the way we understand our motorway, metro and tram users and to analyse more variables that justify the reasons why they use our roads, metros and trams.

With this new knowledge it is possible to confirm or expand the variables that explain mobility, and to be more precise in determining future demand, but with certain nuances: the time horizon is limited to the very short term if we want total reliability, and the explanatory variables may not be valid unless they are independent, predictable and reliable.

Therefore, to claim that a traffic prediction model, a new crystal ball, can be created to replace the traffic engineer is, for the time being, risky, as long as scientific and critical methods have to be applied to validate the models that help estimate the demand for our assets.

Source: blogs.sas.com.

Carlos Rol Rúa – Globalvia Traffic Manager