For almost a year, we have been writing extensively about using big data to improve several aspects of public transport, showing that transport data collection has come a long way since its survey-filling days.
Here are some of the ways big data can improve public transport:
- People counting to obtain occupancy data
- Analysis of smart card and vehicle location data to estimate punctuality
- Real-time vehicle schedule updates and machine learning to predict delays
- Route optimisation for seamless multimodal journeys
- Demand analysis for transport routes and stops
The data sources behind these use cases can be classified as first-party, second-party or third-party data, depending on who owns and uses the data. This point was also made by Brad Schneider, CEO of Nomad Data. Familiar with the process of matching data buyers with the right data sellers, he shared some insights on the topic from the public transport operator’s point of view.
First-party data for public transport
An operator’s first-party data is their own data, to which they have direct access. Chances are that much of the data an operator uses is first-party data, since the operator is the one collecting it and storing it in their transport management platform.
The following data sets are most likely first-party data:
- People counting data: There are various people counting methods for transport operators to choose from, including turnstile/entrance counter data, surveillance camera data, sensor data, QR codes, smart card data, ticket sales records/apps, surveys, Wi-Fi probes and manual tallies.
- Payment or ticket sales data: It’s fairly obvious that the responsibility for tracking the fares collected falls on the operator’s shoulders.
- Smart card data: If the operator is in charge of issuing smart cards that are topped up with money and tapped at station gates, then the smart card data is theirs. Brad pointed out that smart card data would be considered first-party data since the operator is the one generating it and usually has direct rights to it. However, “If they didn’t negotiate any rights and had to license it, it would then be considered second party.”
- Schedule and route data: The operator is in charge of coordinating the daily commute schedules and routes, so the schedule and route data naturally belong to the operator. “Schedule data tracks the historic timing of transportation arriving at stops/stations,” Brad added.
- Vehicle condition, historic servicing and location data: The operator is in charge of procuring and maintaining public service vehicles (PSVs), as well as monitoring the costs of purchasing and maintaining equipment.
“It’s common that they track high level metrics such as ridership and revenues. Many organizations do a poor job keeping track of cost level items. Cost level items would include equipment purchasing costs, wages and salaries, equipment maintenance costs, station maintenance, fuel, etc. First party data collection can have an enormous impact on helping transport operators understand where the inefficiencies are, giving them a much clearer roadmap of how to improve their operations.”
Nomad Data CEO Brad Schneider on the significance of first-party data for public transport operators.
Second-party data for public transport
Second-party data is a partner company’s first-party data, shared with or licensed to the operator. From the transport operator’s point of view, Brad said second-party data would include:
- “Data from repair vendors around specifics of servicing costs.
- Demographic data from payment providers on who the riders actually are, where they live and other socioeconomic factors.
- Data from equipment vendors on servicing costs from other transport systems.”
Where the operator relies on partner companies for ticketing or smart card solutions, Brad highlighted, as mentioned earlier, that failing to negotiate rights to the data would make the smart card data second-party data.
The importance of proper negotiation is echoed by “Partnerships between operators and public transport authorities. Working practices in relational contracting and collaborative partnerships” by Hrelja et al. (2018). Through interviews, the study explored the role of high-quality partnerships between public transport operators and authorities in supporting public transport networks in fragmented institutional settings in England and Sweden.
Third-party data for public transport
Third-party data is data collected by an organisation that is neither you nor your partner. It is usually used to fill in what first- and second-party data have not captured.
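To make the three categories concrete, here is a minimal Python sketch of the classification rule described so far, assuming each data source can be tagged with the single organisation that holds the rights to it. The names `DataSource`, `classify` and every organisation in the example are hypothetical; real data agreements are rarely this clear-cut.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DataParty(Enum):
    FIRST = "first-party"
    SECOND = "second-party"
    THIRD = "third-party"


@dataclass(frozen=True)
class DataSource:
    name: str
    rights_holder: str  # organisation that owns the data and its usage rights (assumed known)


def classify(source: DataSource, operator: str, partners: set[str]) -> DataParty:
    """Classify a data source from the operator's point of view (illustrative rule only)."""
    if source.rights_holder == operator:
        # The operator generates the data or negotiated direct rights to it.
        return DataParty.FIRST
    if source.rights_holder in partners:
        # A partner's own data that the operator licenses or receives.
        return DataParty.SECOND
    # Collected by an organisation that is neither the operator nor a partner.
    return DataParty.THIRD


# Hypothetical operator, partners and sources, for illustration only.
OPERATOR = "CityTransit"
PARTNERS = {"CardVendorCo", "RepairCo"}

sources = [
    DataSource("smart card taps (rights negotiated)", "CityTransit"),
    DataSource("smart card taps (licensed from vendor)", "CardVendorCo"),
    DataSource("repair servicing costs", "RepairCo"),
    DataSource("mobile phone geolocation", "GeoDataInc"),
]
for s in sources:
    print(f"{s.name}: {classify(s, OPERATOR, PARTNERS).value}")
```

The same smart card data moves from first-party to second-party purely because of who holds the rights, which is the point Brad makes about negotiation.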
Examples of third-party data for a public transport operator include:
- Geolocation or geospatial data, such as satellite data and GTFS (General Transit Feed Specification) data, “to track exit station/stop if this isn’t captured in first party data”.
- “Mobile phone geolocation data to track the actual number of people boarding the system, how far they travel to reach a station and how many unique riders there are.”
- “Credit card data from data aggregators to track overall transport revenue compared to similar systems in other cities/countries.”
- “Web scrapes of fares from other transport systems over time.”
- Weather and traffic forecasts to support delay and cancellation predictions.
- Social media data for passenger sentiment analysis.
Although it gives a rough idea of passenger sentiment, social media data should be taken with a pinch of salt.
“Social media data is extremely biased in nature but it can give you a sense of directional changes in attitudes toward the transport system over time.”
Brad on the reliability of social media data in analysing passenger sentiment.
My assumption was that social media data is biased because it reflects only what social media users post, excluding the opinions of people who have quit social media, people who never use it and people who use only certain platforms. Brad agreed and explained this further:
“Yes, one aspect of the bias is that people who are on social media represent a sub population, not the overall population. They would have different socioeconomic characteristics than the overall population, creating a bias.
Also there is a bias in that people don’t normally post positively about something which is a standard part of their lives, such as their commute, unless there is a problem. This often gives social media comments a bias toward more negative reviews in these cases.”
Brad explaining social media’s biased nature.
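One way to act on Brad’s advice is to read social media data for direction rather than level: the absolute share of negative posts is skewed, but its change over time may still be informative. The Python sketch below assumes the bias stays roughly constant between periods, which is an assumption rather than a given. The functions `negative_share` and `directional_change` and the monthly figures are hypothetical, made up only to exercise the code.

```python
from __future__ import annotations


def negative_share(negative: int, total: int) -> float:
    """Fraction of posts in a period classified as negative."""
    if total <= 0:
        raise ValueError("total must be positive")
    return negative / total


def directional_change(periods: list[tuple[str, int, int]]) -> list[tuple[str, float]]:
    """Change in negative-post share versus the first period, in percentage points.

    Each period is (label, negative_posts, total_posts). Comparing against a
    baseline instead of reading the raw share assumes the sampling and
    negativity bias is roughly the same in every period.
    """
    if not periods:
        return []
    _, neg0, total0 = periods[0]
    baseline = negative_share(neg0, total0)
    return [
        (label, round((negative_share(neg, total) - baseline) * 100, 1))
        for label, neg, total in periods
    ]


# Hypothetical monthly counts, for illustration only.
monthly = [("Jan", 300, 500), ("Feb", 280, 500), ("Mar", 350, 500)]
print(directional_change(monthly))
```

A negative value suggests attitudes improved relative to the baseline month, a positive one that they worsened; the raw 60% or 70% negative share on its own says little about the whole ridership.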
How the data sources fit into public transport
Most of the data used by public transport operators is first-party data, as much of the collection work can be done in-house, while some second-party data comes from partners involved in equipment, maintenance and fare systems. Third-party data is also very useful for public transport operators, because it fills the gaps left by first- and second-party data.
The different data sources can also be used for cross-checking, as not all data sources are reliable on their own (Exhibit A: social media data). In fact, combining data source types is so important for getting a multi-dimensional view of the public transport problems that need solving that, when I asked Brad which data source types a system-wide revamp needs, his answer was all three:
“For a system wide revamp you’ll likely use all three types of data. You’ll combine them to improve your understanding of how efficient you are compared to your peers and how your customers are using your service. It will also help you better understand who your customer is and what alternative transportation they employ in their commutes.”
Brad’s response when asked about the data source types needed to make a system-wide transport revamp a success.
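As a small illustration of cross-checking, the Python sketch below compares daily boardings reported by two independent sources, for example smart card taps and an entrance people counter, and flags days where they disagree by more than a chosen tolerance. The function name `flag_discrepancies`, the 15% default tolerance and the daily counts are all hypothetical; real sources rarely agree exactly, so only large gaps are flagged for a closer look.

```python
from __future__ import annotations


def flag_discrepancies(
    source_a: dict[str, int],
    source_b: dict[str, int],
    tolerance: float = 0.15,
) -> list[str]:
    """Return dates (present in both sources) where the counts differ by more
    than `tolerance`, measured relative to the mean of the two counts."""
    flagged = []
    for day in sorted(source_a.keys() & source_b.keys()):
        a, b = source_a[day], source_b[day]
        mean = (a + b) / 2
        if mean == 0:
            continue  # both sources report zero, so they agree
        if abs(a - b) / mean > tolerance:
            flagged.append(day)
    return flagged


# Hypothetical daily boardings from two sources, for illustration only.
card_taps = {"2024-03-01": 10000, "2024-03-02": 9500, "2024-03-03": 4000}
counter = {"2024-03-01": 10400, "2024-03-02": 9300, "2024-03-03": 6000}

print(flag_discrepancies(card_taps, counter))
```

Measuring the gap against the mean of both counts, rather than treating either source as ground truth, reflects the point that no single source is fully reliable on its own.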