A new video data set from the Massachusetts Institute of Technology (MIT) AgeLab and the Toyota Collaborative Safety Research Center (CSRC) may help the engineering community better interpret and predict different data patterns during a continuous driving scene.
The new, open dataset, known as “DriveSeg,” is the latest innovation from CSRC and MIT AgeLab. “This specific project started in 2017 and ended this year,” said Rini Sherony, Senior Principal Engineer at CSRC. “This was a close collaboration between CSRC and MIT, guided by a detailed contract with specific ongoing deliverables.”
What is DriveSeg?
Just as human drivers do, fully autonomous platforms perceive their environment as an ongoing, continuous flow of information. Traditionally, the engineering community has relied on an abundance of static images to identify pedestrians, traffic lights, and other road users via “bounding boxes.”
DriveSeg differs in that it’s video‐based and provides a real-time flow of real-world data. According to MIT and Toyota, with this approach, engineers and researchers are better able to explore data patterns as they happen over time, leading to a number of advancements in ADAS technology.
“The purpose of this dataset is to allow for exploration of the value of temporal dynamics information for full scene segmentation in dynamic, real-world operating environments,” Sherony said. “This will lead to the improvement of perception systems for ADAS and automated vehicles.”
“DriveSeg is a large-scale driving scene segmentation dataset captured from a moving vehicle during continuous daylight driving through a crowded city street,” explained Li Ding, an MIT researcher. “It contains both video data and pixel-level semantic annotation through manual and semi-automated annotation.”
Manual Versus Semi-Auto
DriveSeg data is classified in two ways: manual and semi-auto. DriveSeg manual is two minutes and 47 seconds of high‐resolution video from a midday trip around Cambridge, Massachusetts, home of MIT AgeLab. The video’s 5,000 frames are annotated manually with per‐pixel human labels of 12 classes of road objects (listed below).
By contrast, DriveSeg semi‐auto includes 20,100 video frames (67 video clips at 10 seconds each) drawn from MIT Advanced Vehicle Technologies Consortium data. DriveSeg semi-auto uses the same pixel‐wise semantic annotation scheme as DriveSeg manual; however, its annotations were produced through a semi-automated annotation approach developed at MIT.
“It is an open research question of how much extra information the temporal dynamics of the visual scene carries that is complementary to the information available in the video’s individual frames,” Sherony explained. “Our initial research shows that by adding temporal dynamics, the perception system is able to overcome many edge cases in real-world driving such as sudden sensor failures and motion blur, which are essential to the safety of driving but cannot be handled by static-image-based models.”
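One simple way to see how temporal information can suppress single-frame failures like the glitches Sherony describes is a per-pixel majority vote across neighboring frames. This is an illustrative baseline only, not the MIT/CSRC method; the array shapes and class IDs below are assumptions for the sketch:

```python
import numpy as np

def temporal_majority(pred_stack: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-pixel majority vote over a stack of frame-wise class predictions.

    pred_stack: (T, H, W) integer class IDs for T consecutive frames.
    Returns a single (H, W) smoothed prediction.
    """
    # Count votes per class along the time axis, then take the winning class.
    votes = np.stack([(pred_stack == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)

# Three 2x2 frames; the middle frame has a one-frame "glitch" at pixel (0, 0).
frames = np.array([[[1, 0], [0, 0]],
                   [[5, 0], [0, 0]],   # transient misclassification
                   [[1, 0], [0, 0]]])
print(temporal_majority(frames, 12))  # the glitch is voted out: [[1 0] [0 0]]
```

A model that truly exploits temporal dynamics would be far more sophisticated, but even this voting baseline shows why a one-frame sensor dropout need not corrupt the perceived scene.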
DriveSeg semi-auto leverages both manual and computational efforts for higher efficiency and lower cost. Toyota and MIT say this data set was created to assess the feasibility of annotating a wide range of real‐world driving scenarios and assess the potential of training vehicle perception systems on pixel labels created through AI‐based labeling systems.
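A common pattern for combining manual and computational effort in annotation (offered here as an illustration; the article does not detail MIT's exact pipeline) is to accept a model's high-confidence pseudo-labels automatically and route only low-confidence pixels to human annotators. The function and threshold below are hypothetical:

```python
import numpy as np

def split_for_review(probs: np.ndarray, threshold: float = 0.9):
    """Split pixels into auto-accepted pseudo-labels and ones needing manual review.

    probs: (C, H, W) per-class softmax scores from a segmentation model.
    Returns (labels, needs_review), where needs_review flags low-confidence pixels.
    """
    labels = probs.argmax(axis=0)          # model's best class per pixel
    confidence = probs.max(axis=0)         # how sure the model is
    needs_review = confidence < threshold  # send these pixels to human annotators
    return labels, needs_review

# Toy (2, 1, 2) score map: 2 classes over a 1x2 image.
probs = np.array([[[0.95, 0.40]],
                  [[0.05, 0.60]]])
labels, review = split_for_review(probs)
print(labels)   # [[0 1]]
print(review)   # [[False  True]] -- only the uncertain pixel goes to a human
```

The cost savings come from the fact that in typical driving footage most pixels (road, sky, buildings) are easy for a model, leaving humans to label only the ambiguous remainder.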
“We hope the dataset can help the community to further investigate spatial-temporal modeling for more accurate, efficient, and robust perception systems,” Ding said. “It can be used to build more advanced algorithms as well as evaluate current methods in real-world scenarios.”
DriveSeg: 12 Classes of Road Objects
- Traffic Sign
- Traffic Light
- Terrain (horizontal vegetation)
- Vegetation (vertical vegetation)
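Per-pixel semantic annotation of this kind is typically stored as an integer mask with the same height and width as the video frame, where each value indexes one of the classes. A minimal sketch of working with such a mask (the class-ID ordering and mask values here are stand-ins, not the dataset's actual schema):

```python
import numpy as np

def class_pixel_counts(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Count how many pixels in a per-pixel label mask belong to each class."""
    return np.bincount(mask.ravel(), minlength=num_classes)

# Toy 4x4 mask with class IDs 0-2 standing in for a real annotated frame.
mask = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [2, 2, 2, 2],
                 [2, 2, 2, 2]])
print(class_pixel_counts(mask, 12))  # [4 4 8 0 0 0 0 0 0 0 0 0]
```

Summaries like these per-class pixel counts are a routine first step when checking class balance before training a segmentation model on an annotated dataset.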
How to Access DriveSeg
DriveSeg is available for free and can be used by engineers and the academic community for non‐commercial purposes. “We hope other safety researchers can use it and build upon it to improve safety for autonomous driving,” Sherony said.