posted on June 19, 2018
You hear the term “big data” tossed around a lot, but what does it really mean? And what does the collection and application of data mean for the transportation world?
Odds are, you’ve benefited from data collection yourself. If you’ve ever used smartphone applications like Google Maps or Waze, not only are you using an app built from user data, but you’re also contributing to the dataset.
With growing privacy concerns and benefits surrounding data collection, how does one navigate the world of data and transportation? What kind of information is being collected, and what should we keep in mind for the future?
I spoke with Christopher Day—an assistant professor at Iowa State University (ISU) in the Department of Civil, Construction, and Environmental Engineering, and a researcher at ISU’s Institute for Transportation (InTrans)—to get an expert opinion.
Day—an experienced transportation researcher and licensed professional engineer—is currently focusing his research on the distillation of large data sets into quantitative performance measures and visualizations that improve our understanding of how transportation systems work, and how they can be made more efficient, safe, and sustainable.
What does the term “big data” mean? How does data intersect with transportation?
Big data is this buzzword that gets thrown around quite a bit. The name implies that you have really large data sets, but I think it’s really more about the ubiquitous nature of data collection. The size of those datasets is more a consequence of the fact that more and more data is being collected from a growing number of sources. Growth of smartphone use in the last decade has really driven this trend.
The main function of transportation systems is to move people and goods around. New data sets are providing some ways to look at these systems that weren’t possible before. In the past, we would rely on building lots of models to try to estimate or predict different properties of transportation systems, such as travel time or the amount of congestion. We can now directly measure, or at least sample, a lot of this with greater efficiency than before. We can do more measuring and less modeling, or at least the models can be better tuned. That’s one advantageous use of big data in transportation systems. There’s better intelligence about what’s going on, along with improved spatial and temporal coverage.
Can you talk more about how mobile devices contribute to this wealth of data? What is personal data used for when it comes to transportation systems and apps?
The key piece of information is the location of the devices as they’re moving around. That’s the main type of information being extracted from smartphones, which has enabled recent applications. For example, if you open the “Maps” app on your phone, and turn on the traffic layer, the roadways show a color-coded scale, showing where congestion is in real time, and you can react to this by adjusting your route. This is probably the most fundamental application for the general public. Transportation agencies additionally need to understand how the system performs over time, so they can be better informed when making decisions about where to invest resources to make improvements. They can also examine their performance in recovering from incidents, responding to weather conditions, and traffic management for special events, among other uses. It’s also possible to examine origin and destination patterns, and develop further insights about how the system is being used.
With growing concerns about data privacy, how does the transportation community deal with privacy concerns? Should there be any privacy concerns with giving your data to sources like Google Maps and Waze?
From the perspective of transportation agencies, we are not interested in knowing the microscopic details of anyone’s individual travel characteristics. I think it’s safe to say that transportation analysts would strongly prefer not to have stewardship over anyone’s personal data. Our focus is on the system performance at a more macroscopic level, and where we do examine things in detail, the identity of the traveler and the specific details of their trip are way beyond what we would want or need to have to understand system performance.
The kind of data state Departments of Transportation (DOTs) currently use for analytics is minute-by-minute average speeds on roadway segments, which is based on analysis of microscopic trip data by a commercial provider. This is the equivalent to having a speed sensor on the roadway, without the cost of having to install and maintain a physical sensor. This data is completely anonymous; the data consists of a timestamp and a number for the average speed. This allows us to look at conditions along the entire state highway system both historically and in real time.
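As a rough illustration of what such an aggregated feed looks like, the Python sketch below rolls individual speed observations up into one-minute averages per roadway segment. The record layout `(segment_id, unix_seconds, speed_mph)` and the function name are assumptions made for this example; the interview does not describe the commercial provider's actual method, which is almost certainly more involved than a plain arithmetic mean.

```python
from collections import defaultdict


def minute_average_speeds(observations):
    """Average speed per (segment, minute) from individual observations.

    observations: iterable of (segment_id, unix_seconds, speed_mph) tuples.
    This record layout is hypothetical, chosen only for illustration.
    """
    # For each (segment, minute) bucket keep a running [sum, count].
    sums = defaultdict(lambda: [0.0, 0])
    for segment_id, ts, speed in observations:
        minute_start = ts - ts % 60  # floor the timestamp to the minute
        bucket = sums[(segment_id, minute_start)]
        bucket[0] += speed
        bucket[1] += 1
    # The output carries only a segment, a timestamp, and an average speed;
    # nothing identifies the trips that contributed to it.
    return {key: total / count for key, (total, count) in sums.items()}


sample = [
    ("A", 0, 60.0),
    ("A", 30, 50.0),
    ("A", 65, 40.0),
    ("B", 10, 30.0),
]
averages = minute_average_speeds(sample)
```

Each entry in `averages` is keyed by a segment and the start of a minute, which matches the "timestamp and a number for the average speed" form described above.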
In future, however, we are probably going to see increasing use of GPS data. I’ve had the opportunity to examine some sample data sets of this type. The trip information is anonymous in the sense that there is no personal information attached to it, and the individual trips are not linked to each other. That is, you can’t see different trips from the same user. Also, that data represented commercial vehicles. It didn’t seem to me that there was much danger of having severe privacy concerns with that particular dataset right now. Now with that said, if the data collection market continues to grow, some valid concerns about data privacy may arise.
There are likely some ways that data can be scrubbed, if you will, or made more anonymous, that would not reduce its value for analytics. For example, if we are interested in improving the coordination of traffic signals, and we want to know where traffic is being stopped along a street, we are primarily interested in the GPS data of trips within a range near to that street. Even if you are looking at origin-destination patterns, the scale of that analysis is on the order of a city: tens or dozens of miles. We don’t really need to pinpoint the start and end location of each trip to come up with useful analytics. We could probably truncate the first and last 1,000 feet of a trip and still have a pretty good idea of where trips are coming from and going to across the city.
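The truncation idea can be sketched in a few lines of Python. This is a minimal illustration and not a method described in the interview: it assumes each trip is an ordered list of `(x, y)` points already projected to planar coordinates in feet, and the function name `trim_trip` is hypothetical.

```python
import math


def trim_trip(points, trim_ft=1000.0):
    """Drop points within trim_ft of travelled distance from either end.

    points: list of (x, y) tuples in feet, in travel order (assumed format).
    Returns an empty list when the trip is too short to keep anything.
    """
    if len(points) < 2:
        return []
    # Cumulative distance travelled up to each point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    # Keep only points at least trim_ft from both the origin and destination.
    return [
        p for p, d in zip(points, cumulative)
        if trim_ft <= d <= total - trim_ft
    ]


# A straight 5,000 ft trip with a point every 500 ft.
trip = [(x, 0.0) for x in range(0, 5001, 500)]
trimmed = trim_trip(trip)
short_trip = trim_trip([(0.0, 0.0), (1500.0, 0.0)])
```

In this example the trimmed trip starts 1,000 feet after the true origin and ends 1,000 feet before the true destination, while a trip shorter than 2,000 feet is dropped entirely. Real GPS traces would first need to be projected from latitude and longitude, which this sketch leaves out.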
How does one benefit from contributing to the data set?
The more contributors to the data, the more accurate the picture is that you get out of it. One example that people use and understand is the map application on their smartphones. If you turn on the traffic layer, you see that not every roadway has a color. When you zoom in and go to the low volume roadways in residential neighborhoods or rural areas, you don’t see any traffic information. The more users, the more coverage you have. And that’s really the benefit—by using it, you end up contributing to the data set that’s being used to give you back the information.
Would things like Google Maps and Waze be possible without user data?
I would say no. Those services are really founded on crowdsourcing that information from the users. Even the maps themselves are drawn or revised based on the actual data being collected from the users. All those systems are totally reliant on the user data to function.
Should we be scared or supportive of these systems?
It’s a bit of a double-edged sword. There is a potential to develop much more intelligence about the way things are working in different systems, transportation being one of them, so there’s potential, with more user data, to help these things work better, and get more out of the infrastructure we already have. At the same time, this type of data may carry some potential to intrude on individual privacy.
Right now, we are seeing an increasing amount of attention being given to privacy rights. Just this month, we had a Supreme Court ruling regarding cell phone records and Fourth Amendment rights. The European Union implemented new privacy regulations last month that provide some directives to companies that collect personal information. The idea is that people have a right to know what data is being collected about them, who is doing the collection, and who has access to that data. I think it’s a good thing that awareness is increasing, because the only way to move forward is to actually have this discussion and develop a solution. I’m optimistic that we can find a middle ground that safeguards individual privacy, while providing a means for extracting useful information about the system.
Google Scholar page: scholar.google.com/citations?user=BJwtiVEAAAAJ&hl=en
Chris Day Iowa State Civil, Construction, and Environmental Engineering page: ccee.iastate.edu/directory/?user_page=cmday
LinkedIn page: www.linkedin.com/in/cmdayisu/
By Hannah Postlethwait, Go! Staff Writer