Log Analysis Makes You See Transport Schedule Gaps Differently Now

The gap between planned arrivals and actual arrivals seems to be a common issue in transport schedules. Log analysis might help us see that.

The real problem with schedule gaps

Studies on the value of public transport punctuality have found that passengers do their part by coming to the bus stop or train station ahead of the scheduled arrival, and that they can tolerate a certain level of delay, especially if the service is frequent. A delay on a low-frequency service is less likely to be tolerated.

What we mean by punctuality is that the actual arrival is more or less the same as the scheduled arrival.

Just stating the obvious.

If the actual arrival is the same as the scheduled arrival, then passengers referring to the schedule will have an easier time deciding when to arrive at the bus stop or train station without missing or waiting too long for their ride.

If the bus or train arrives later than scheduled, then it’s a delay, which makes such decisions harder for passengers and, at the same time, gives them waiting-related anxiety.
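
To make the definition concrete, here is a minimal sketch of the gap as "actual minus scheduled", in minutes. The function name `delay_minutes`, the `HH:MM:SS` format and the sample times are assumptions for illustration, not taken from any real schedule.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # assumed timestamp format for this sketch


def delay_minutes(scheduled: str, actual: str) -> float:
    """Return actual minus scheduled arrival in minutes (positive means late)."""
    gap = datetime.strptime(actual, TIME_FORMAT) - datetime.strptime(scheduled, TIME_FORMAT)
    return gap.total_seconds() / 60


# Hypothetical arrivals at one stop
late = delay_minutes("08:15:00", "08:19:30")   # 4.5 minutes late
early = delay_minutes("08:45:00", "08:44:00")  # -1.0, i.e. one minute early
print(late, early)
```

A value near zero is what punctual looks like. Early arrivals come out negative, which matters too, because a passenger who trusts the schedule can miss a ride that left early.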

Such delays become more problematic when they happen regularly, resulting in regular lateness, financial costs and multimodal transport issues. If worse comes to worst, passengers will avoid public transport altogether, which reduces public transport demand (a loss for the environment) and increases e-hailing demand.

These issues show why real-time schedule updates are necessary. Case in point: Leng and Corman (2020) examined this in “The role of information availability to passengers in public transport disruptions: An agent-based simulation approach”.

To evaluate the effects, the duo used an agent-based micro-simulation model (MATSim) to simulate passengers’ behaviours in a multimodal transport system in Zürich, Switzerland. They found that passengers adapt their route choices when they are informed about disruptions, and are happier for having been informed.

But how can this information be obtained and disseminated? Log analysis is one answer.

Log analysis for schedule gaps

To understand log analysis, you first need to know what a log file is.

“A log file is a computer-generated data file that contains information about usage patterns, activities, and operations within an operating system, application, server or another device.”

A definition of a log file as written on Sumo Logic.

An example of a log file is the IoT-based bus event log file containing location and temporal data mentioned in “Data management and applications in a world-leading bus fleet” by Hounsell, Shrestha and Wong (2011).

The log analysis methodology follows the basic data science procedure, applied to log files:

Collection: identifying the devices or sources (e.g., applications, systems, network devices, security), capturing the log files that must be extracted for analysis, exporting them, and reducing their size.

Preparation: checking data quality, determining the log record structure, selecting optimised column data types and cleaning up the data.

Modelling: analysing the information in the log files using the model best suited to the type of event being investigated. To decide which events to investigate, we need to identify the causes of delays, so that public transport operators can analyse the logs of those delays and even prepare for disruptions.

Report/Presentation/Visualisation: showing the results in a comprehensible way using the most suitable chart type.
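
The four steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the CSV layout, the column names, the sample rows and the helper names (`prepare`, `model`, `report`) are all hypothetical, and the "model" is nothing more than a mean delay per stop.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Collection: a hypothetical exported log (column layout is an assumption)
RAW_LOG = """vehicle_id,stop_id,scheduled,actual
bus_12,stop_A,2024-03-04T08:00:00,2024-03-04T08:03:00
bus_12,stop_B,2024-03-04T08:10:00,2024-03-04T08:17:00
bus_07,stop_A,2024-03-04T08:30:00,2024-03-04T08:31:00
bus_07,stop_B,2024-03-04T08:40:00,not-recorded
"""


def prepare(raw):
    """Preparation: parse timestamps and drop records that fail the quality check."""
    records = []
    for row in csv.DictReader(io.StringIO(raw)):
        try:
            scheduled = datetime.fromisoformat(row["scheduled"])
            actual = datetime.fromisoformat(row["actual"])
        except ValueError:
            continue  # unparseable timestamp, skip the record
        delay = (actual - scheduled).total_seconds() / 60
        records.append({"stop_id": row["stop_id"], "delay_min": delay})
    return records


def model(records):
    """Modelling: mean delay per stop, in minutes."""
    by_stop = defaultdict(list)
    for record in records:
        by_stop[record["stop_id"]].append(record["delay_min"])
    return {stop: sum(delays) / len(delays) for stop, delays in by_stop.items()}


def report(mean_delay):
    """Report: a plain-text summary, one line per stop."""
    return "\n".join(
        f"{stop}: {delay:.1f} min late on average"
        for stop, delay in sorted(mean_delay.items())
    )


summary = model(prepare(RAW_LOG))
print(report(summary))
```

The row with `not-recorded` is dropped during preparation, so `stop_B` is averaged over a single arrival. A real pipeline would likely want to count such rejects rather than discard them silently.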

To find out more about using log analysis to calculate/estimate transport schedule gaps, I did some asking around. In response, Mark Varnas of Red9 and Ella Hao of WellPCB shared their views.

Responses have been edited for clarity.

Pros and cons of log analysis for public transport schedule gaps

“Log analysis is a rather new approach to calculating public transport discrepancies.

It has some advantages over other approaches, such as being able to measure the discrepancy in more detail, track changes in the discrepancy over time and compare different ways of estimating a public transport schedule.

However, it also has a few disadvantages, such as a lack of transparency, which can lead to bias in the measurements, and difficulties with quantification, which make it hard for stakeholders to use log data for decision-making or for identifying how best to improve public transportation services.”

Mark of Red9

“Pros are that log analysis can measure route performance accurately because it works with real-time data, and that logs are helpful for finding the causes of schedule gaps, e.g., whether a bus is delayed due to traffic jams or bad weather.

Cons are that the quality of logs depends on the reliability and availability of the sensors that track vehicles, and that data processing takes time, which means that logs can be analyzed only with a delay.”

Ella of WellPCB

Parameters needed for transport schedule gap log analysis

“Log analysis is the process of interpreting the data in order to detect patterns and solve problems. There are three main parameters needed for log analysis in public transport discrepancies:

1. The time delay between different train stations,

2. The number of trains running per hour,

3. The actual time spent in each station”
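
As a rough sketch, the three parameters could be derived from logged station events like this. The record layout, station names and times are hypothetical, and reading "time delay between stations" as the lateness a train picks up from one station to the next is my interpretation, not something the quote spells out.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (train, station, scheduled arrival, actual arrival, actual departure)
EVENTS = [
    ("T1", "North", "08:00", "08:02", "08:03"),
    ("T1", "South", "08:10", "08:15", "08:17"),
    ("T2", "North", "08:30", "08:30", "08:31"),
    ("T2", "South", "08:40", "08:41", "08:42"),
]


def parse(hhmm):
    return datetime.strptime(hhmm, "%H:%M")


def minutes_between(start, end):
    return (parse(end) - parse(start)).total_seconds() / 60


# 1. Delay between stations: change in lateness from North to South
lateness = {
    (train, station): minutes_between(sched, arr)
    for train, station, sched, arr, _dep in EVENTS
}
delay_between = {
    train: lateness[(train, "South")] - lateness[(train, "North")]
    for train in ("T1", "T2")
}

# 2. Trains per hour: arrivals at one station, grouped by hour of day
trains_per_hour = Counter(
    parse(arr).hour for _train, station, _sched, arr, _dep in EVENTS if station == "North"
)

# 3. Time spent in each station: departure minus arrival (dwell time)
dwell = {
    (train, station): minutes_between(arr, dep)
    for train, station, _sched, arr, dep in EVENTS
}

print(delay_between, dict(trains_per_hour), dwell)
```

With these made-up records, T1 loses three more minutes between North and South, while T2 loses one.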


“The following data is required to perform calculations using an example of public transport schedule gaps:

– Timestamps on all events (arrival at stops or delays) and vehicle speed
– Stop location coordinates
– Vehicle type
– Route length”
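
One way to hold these fields together is a small record type. The sketch below is an assumption about how such a record might look (the class name `StopEvent`, the field names, the units and the sample values are all hypothetical), plus one calculation the fields make possible: average speed over the whole route.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StopEvent:
    timestamp: datetime      # when the event was logged
    event: str               # e.g. "arrival" or "delay"
    speed_kmh: float         # vehicle speed at the time of the event
    stop_lat: float          # stop location coordinates
    stop_lon: float
    vehicle_type: str        # e.g. "bus" or "train"
    route_length_km: float   # length of the whole route


def average_speed_kmh(first: StopEvent, last: StopEvent) -> float:
    """Route length divided by the time between the first and last stop events."""
    hours = (last.timestamp - first.timestamp).total_seconds() / 3600
    return first.route_length_km / hours


# Hypothetical first and last stops of a 12 km bus route
start = StopEvent(datetime(2024, 3, 4, 8, 0), "arrival", 0.0, 47.38, 8.54, "bus", 12.0)
end = StopEvent(datetime(2024, 3, 4, 8, 30), "arrival", 0.0, 47.41, 8.55, "bus", 12.0)

print(average_speed_kmh(start, end))  # 12 km in half an hour
```

Comparing this figure with the speed the timetable assumes is one simple way to see whether a gap builds up along the route or at a single stop.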


Most suitable model and chart type for log analysis and visualisation

“There are many different models and charts that can be used for log analysis and visualization of public transport discrepancies.

The best model for this task is a time series chart, which can be used for both the graph of average duration and the graph of the distribution of waiting times.

The most suitable chart type is the line graph. This type of graph can be used for modelling and visualizing data that changes over time – such as public transport discrepancies – and their changes over a specified period.”


“Bar, line, and scatter charts are the most common models that can be used because they are easy to interpret.

However, they are not suitable for visualizing time delays that last more than one day because they cannot show the exact date on which a delay happened.

The most appropriate model is a line chart showing the cumulative distribution of events by the hour or minute with additional bars representing individual events in order from earliest to latest. This type of chart enables people to quickly identify the number of events that occurred at a given time.”
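
The numbers behind such a chart are simple to produce. Here is a minimal sketch that counts delay events per hour and keeps a running (cumulative) total. The event hours are made up, and the text "bars" printed at the end only stand in for a real chart drawn with a plotting library.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical hours of day (0-23) at which delay events were logged
DELAY_EVENT_HOURS = [7, 8, 8, 8, 9, 17, 17, 18]

per_hour = Counter(DELAY_EVENT_HOURS)
hours = sorted(per_hour)                   # earliest to latest
counts = [per_hour[h] for h in hours]      # bar heights: events in each hour
cumulative = list(accumulate(counts))      # line values: [1, 4, 5, 7, 8]

# One row per hour: the bar, then the running total
for hour, count, running in zip(hours, counts, cumulative):
    print(f"{hour:02d}:00  {'#' * count:<4} {running}")
```

The steepest jump in the running total, here at 08:00, marks the hour in which the most delay events were logged.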


How log analysis changes your view of transport schedule gaps

In striving to solve a problem that leads to other problems, log analysis provides a means of closing transport schedule gaps.

Akin to the basic data science procedure, log analysis collects the required data, such as timestamps, route and vehicle details, from log files, prepares the data, runs the data set through a model and visualises the results over time using a time-series line chart.

Despite the data quality challenges, log analysis allows data scientists and operators to measure the gaps closely and trace back to the causes of the gaps that need to be addressed to prevent further gaps.
