Predicting pedestrian behavior is critical for fully autonomous vehicles to drive safely and efficiently on busy city streets. Future autonomous cars will need to operate in mixed traffic conditions with not only technical but also social capabilities: estimating the temporally dynamic intent changes of pedestrians, providing explanations of interaction scenes, and supporting algorithms with social intelligence.
The IUPUI-CSRC Pedestrian Situated Intent (PSI) benchmark dataset provides two novel labels on top of comprehensive computer vision annotations. The first is the dynamic intent change of pedestrians to cross in front of the ego-vehicle, collected from 24 drivers with diverse backgrounds. The second is text-based explanations of the drivers' reasoning when estimating pedestrian intents and predicting pedestrian behavior during the interaction period. These labels enable computer vision tasks such as pedestrian intent/behavior prediction, vehicle-pedestrian interaction segmentation, and video-to-language mapping for explainable algorithms. The dataset also contains driving dynamics and explanations of the reasoning behind driving decisions.
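To make the label structure concrete, below is a minimal sketch of how per-frame intent labels and reasoning explanations might be read. The file layout, field names (`frames`, `intents`, `explanation`), and intent categories are illustrative assumptions, not the released PSI annotation format.

```python
import json

def load_intent_annotations(path):
    """Yield one record per annotated frame of a scene.

    Hypothetical schema: field names and intent categories are
    assumptions for illustration, not the actual PSI release format.
    """
    with open(path) as f:
        data = json.load(f)
    for frame in data["frames"]:
        yield {
            "frame_id": frame["frame_id"],
            "pedestrian_id": frame["pedestrian_id"],
            # One situated-intent estimation per driver (up to 24),
            # e.g. "cross" / "not_cross" / "not_sure".
            "intents": frame["intents"],
            # Free-text driver reasoning, when provided.
            "explanation": frame.get("explanation", ""),
        }
```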
The dataset includes:
- 110 unique pedestrian encountering scenes;
- frames annotated with object detection and classification, tracking, posture, and semantic segmentation labels;
- more than 621k estimations of the key pedestrians' situated intents, made by 24 human drivers (see the aggregation sketch below);
- boundaries identified by human drivers when segmenting the 110 scenes based on the key pedestrians' situated intents, together with the corresponding reasoning explanations.
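Because each frame can carry up to 24 independent intent estimations, a common first step is to aggregate them into a single label with an agreement score. The sketch below shows simple majority voting under the assumed label encoding from the earlier example; it is one possible aggregation strategy, not a prescribed PSI procedure.

```python
from collections import Counter

def aggregate_intents(intents):
    """Aggregate multiple drivers' intent estimations for one frame.

    `intents` is a list of per-driver labels, e.g.
    ["cross", "cross", "not_sure", ...] (encoding assumed, see above).
    Returns the majority label and the fraction of drivers agreeing.
    """
    counts = Counter(intents)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(intents)

# Example: 20 of 24 drivers estimate "cross".
label, agreement = aggregate_intents(["cross"] * 20 + ["not_cross"] * 4)
print(label, round(agreement, 2))  # cross 0.83
```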
[Example annotation figures: object detection and semantic segmentation; demo video available]