Predicting unseen danger in road scenes
1. Why should a car predict unseen danger in a road scene?
Many unpredictable events can happen on the road. The ultimate goal of
an autonomous vehicle is to detect and predict hazards that even a human
driver cannot see, and to warn about them. While developing longitudinal
control for autonomous vehicles, particularly the AEB (autonomous
emergency braking) system, I felt that the simplest form of danger
detection (an emergency stop when an obstacle overlaps my path) is not
safe enough. For example, small parts that have fallen off vehicles in
an accident can cause large secondary accidents. However, it is
impossible to enumerate every dangerous situation a vehicle may
encounter on a case-by-case basis, because accidents are unpredictable.
Therefore, the vehicle should be able to recognize even previously
unseen situations as potential risks, and I aimed to solve this problem
with a deep learning model trained on accident scenario videos.
2. CarCrash dataset and BeamNG
I needed to collect data on accident situations, and after some
investigation I found the CarCrash Dataset, which I decided to use to
train my model initially. However, the dataset had three problems. It
contained only about 1500 five-second accident videos, which was not
enough data. Because the videos were recorded in real accidents, the
noise level varied from clip to clip, so individual videos could carry
misleading information. Finally, most of the accident videos were very
similar to one another. To address these problems, my supervisor
suggested using the BeamNG driving simulator to collect accident videos.
3. Fusion of ViViT and DANN
To learn efficiently from a small dataset, I decided to use the ViViT
(Video Vision Transformer) model, which is known to perform well on
video classification tasks when trained with regularization techniques.
Furthermore, to reduce the gap between the training and testing datasets
caused by differences in background, blurriness, and light reflection,
I decided to incorporate domain adaptation using the DANN
(Domain-Adversarial Neural Network) model proposed by Yaroslav Ganin
et al.
I designed the model to apply DANN by attaching a domain classifier and
a gradient reversal layer to the ViViT model. The goal was to accurately
classify accident and non-accident situations using a small amount of
video data, even when the training and test data differ substantially in
background, blur level, and light reflection.
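
The idea of attaching a domain classifier through a gradient reversal
layer can be sketched in PyTorch as follows. This is a minimal
illustration, not the actual implementation used in this project. The
names GradReverse, DannVideoClassifier, and toy_backbone are
hypothetical, the tiny linear backbone is only a stand-in for a ViViT
encoder, and the tensor shapes, labels, and the value of lambd are
assumptions chosen to keep the example small.

```python
import torch
from torch import nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambd in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input: reversed gradient for x, None for lambd.
        return -ctx.lambd * grad_output, None


class DannVideoClassifier(nn.Module):
    """Hypothetical wrapper: shared backbone, label head, and adversarial domain head."""

    def __init__(self, backbone, feat_dim, num_classes=2, num_domains=2):
        super().__init__()
        self.backbone = backbone  # assumed to map a clip to a (batch, feat_dim) feature
        self.label_head = nn.Linear(feat_dim, num_classes)  # accident vs. non-accident
        self.domain_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, num_domains)
        )

    def forward(self, clips, lambd=1.0):
        feats = self.backbone(clips)
        label_logits = self.label_head(feats)
        # The domain head sees the same features, but its gradient is reversed
        # before it reaches the backbone, pushing features to be domain-invariant.
        domain_logits = self.domain_head(GradReverse.apply(feats, lambd))
        return label_logits, domain_logits


# Stand-in backbone (NOT ViViT): flattens each clip and applies one linear layer.
toy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(4 * 3 * 8 * 8, 32), nn.ReLU())
model = DannVideoClassifier(toy_backbone, feat_dim=32)

clips = torch.randn(6, 4, 3, 8, 8)           # (batch, frames, channels, height, width)
labels = torch.tensor([0, 1, 0, 1, 0, 1])    # assumed: 1 = accident, 0 = non-accident
domains = torch.tensor([0, 0, 0, 1, 1, 1])   # assumed: 0 and 1 mark the two data domains

label_logits, domain_logits = model(clips, lambd=0.5)
loss = nn.functional.cross_entropy(label_logits, labels) \
    + nn.functional.cross_entropy(domain_logits, domains)
loss.backward()
```

In this sketch the domain head is trained to tell the two domains apart,
while the reversed gradient trains the backbone to make that task
harder. The label loss here is computed on every clip for simplicity; if
one domain were unlabeled, only the labeled clips would contribute to
it.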