For Self-Driving Cars, There’s Big Meaning Behind One Big Number: 4 Terabytes
If 3,000 People Talked All at Once, Could You Understand What Each One Was Saying?
By Kathy Winter
As an engineer, I love solving problems and using the “language of math” – or numbers – to understand the world we live in. With meaning beyond their stated numerical value, numbers add context to stories and challenges in a way that words alone cannot. Big numbers are interesting as their meaning is often much more complex than their sheer size might suggest. With one number in particular – 4 terabytes (TB) – this is especially true, and I’m excited about the meaning behind that number for the autonomous driving industry.
First things first: Why that number? Four terabytes is the estimated amount of data that an autonomous car will generate in about an hour and a half of driving – or the amount of time a typical person spends in their car each day. By 2020, that’s also the amount of data that 3,000 individual internet users are expected to put out each and every day. It might not sound like much until you think of it in a different way: How many of us have 3,000 friends on Facebook? Now imagine trying to follow and absorb everything they all post each and every single day.
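To make the comparison concrete, here is a rough back-of-the-envelope sketch of the arithmetic behind those figures. It assumes decimal units (1 TB = 10^12 bytes) and treats "about an hour and a half" as exactly 90 minutes; the function and variable names are illustrative only, and the results are approximations rather than measured values.

```python
# Back-of-the-envelope arithmetic for the 4 TB figure.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TB = 10**12
GB = 10**9

car_bytes = 4 * TB           # data generated by one car (figure from the article)
drive_seconds = 1.5 * 3600   # "about an hour and a half" taken as exactly 90 minutes
internet_users = 3000        # users who together put out the same amount per day


def sustained_rate_gb_per_s(total_bytes, seconds):
    """Average data rate in GB per second over the driving period."""
    return total_bytes / seconds / GB


def per_user_gb_per_day(total_bytes, users):
    """Implied daily data volume per internet user, in GB."""
    return total_bytes / users / GB


rate = sustained_rate_gb_per_s(car_bytes, drive_seconds)
per_user = per_user_gb_per_day(car_bytes, internet_users)

# Time the car needs to produce what one user puts out in a whole day.
seconds_per_user_day = per_user / rate

print(f"Car data rate:            {rate:.2f} GB/s")
print(f"Implied per-user volume:  {per_user:.2f} GB/day")
print(f"Car matches one user-day every {seconds_per_user_day:.1f} s")
```

Under these assumptions, the car averages roughly three-quarters of a gigabyte per second, and produces one internet user's entire daily output in under two seconds of driving.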
If the interesting thing about the data created by a self-driving car were simply the amount of it, 4TB wouldn’t be very exciting. What makes “data the new oil” for autonomous driving – and what makes it a real challenge – is our need to make sense of that data, to turn it into actionable insight that lets cars think, learn and act without human intervention. Data that lets cars do the driving so that the 90 percent of accidents caused by human error [1] may one day be a thing of the past.
Intel is a data company. We know how to create, move, store, process, analyze and manage data – at massive scale – and we’re applying this vast expertise to the autonomous driving industry. From experience, we also know the fastest way to solve the autonomous driving data challenge is through industry collaboration. While there’s a lot of work to do to deliver fully autonomous vehicles by 2021, I am confident that by working with the industry and our partners, together we can get it done.
Autonomous driving data comes in three basic types: technical data, crowdsourced data and personal data.
Technical data is perhaps the most obvious. This data comes from a suite of sensors and is the car’s “view” of the world immediately around it. It helps the car recognize a person or fire hydrant, “see” a new pothole, or maybe calculate how quickly a nearby car is approaching. This kind of technical data is also great for capturing new driving scenarios and pushing them to the cloud for learning and improving the software that controls driving behavior. When this kind of data goes to the cloud, it becomes incredibly valuable to other vehicles connected with that same cloud.
Crowdsourced data is something that a community of local cars takes in from their surroundings, such as traffic or changes to the road conditions. You can imagine all kinds of cool applications that could use this kind of information, such as finding a nearby parking spot or avoiding traffic jams.
Finally, there is personal data, including the radio stations you like to listen to, coffee shops you frequent, routes you prefer and so on. This type of data could be useful in creating the most amazing personalized experience in your autonomous vehicle.
As the industry moves toward fully autonomous cars, data presents a number of challenges for the entire global industry. The first challenge goes back to that original number: 4TB. The exponentially growing size of the data sets demands an enormous amount of compute capacity to organize, process, analyze, understand, share and store them. Think data center server compute power, not PC power.
The need to train autonomous vehicles as quickly as possible presents another challenge. When new driving responses or situations are identified, machine learning, simulation and algorithm improvements must happen almost instantly – not weeks or months later – and updated driving models must be pushed to the cars immediately once available. When, where and how that happens has implications not just for today, but for the day when self-driving cars are the norm.
There’s also the matter of data protection and what it will take for consumers to eventually trust the autonomous experience. How we will achieve truly secure storage and sharing of data is a question I am asked about frequently and one we take very seriously. Which data gets stored? Which gets tossed? Which data sets get shared? And how will we protect it all? These are valid questions that will require industry collaboration and our best experts to address in a meaningful way.
Finally, the data challenge grows over time as small fleets of vehicles eventually become hundreds of millions of vehicles. Making this happen depends on the ability to process increasingly large data sets. True system scalability will be critical both inside our cars – back to that 4TB number – and outside our cars in massive data centers, as the self-driving supercomputer and the cloud that supports it continue to evolve.
No one company can tackle these data challenges on its own. At Intel, we believe the best way to solve the autonomous driving data challenge is to do it collectively, to work together across the industry to develop secure state-of-the-art platforms and to share safety-related information. As we work toward a shared vision of a world without accidents and with mobility for all, industry collaboration will accelerate our ability to deliver. I am thrilled to be working with our Intel team and key partners on the 4TB challenge, as I know that solving this problem will lead to safer roads and a better journey for all.
Kathy Winter is vice president and general manager of the Automated Driving Solutions Division at Intel Corporation. She joined Intel in 2016 from Delphi, where she engineered the first cross-country drive of a fully autonomous vehicle.
This is the third in an occasional series of Intel newsroom editorials related to autonomous driving. To comment or reach Kathy directly, email firstname.lastname@example.org.
[1] National Motor Vehicle Crash Causation Survey, U.S. Department of Transportation, at 25 (2008), https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811059; http://cyberlaw.stanford.edu/blog/2013/12/human-error-cause-vehicle-crashes