Predicting pedestrian behavior is critical for fully autonomous vehicles to drive safely and efficiently on busy city streets. Future autonomous cars will need to operate in mixed traffic conditions with not only technical but also social capabilities: estimating the temporally dynamic intent changes of pedestrians, providing explanations of interaction scenes, and supporting algorithms with social intelligence.
The IUPUI-CSRC Pedestrian Situated Intent (PSI) benchmark dataset provides two novel labels on top of comprehensive computer vision annotations. The first is the dynamic intent change of pedestrians to cross in front of the ego-vehicle, collected from 24 drivers with diverse backgrounds. The second is text-based explanations of the drivers' reasoning when estimating pedestrian intents and predicting pedestrian behavior during the interaction period. These labels enable computer vision tasks such as pedestrian intent/behavior prediction, vehicle-pedestrian interaction segmentation, and video-to-language mapping for explainable algorithms. The dataset also contains driving dynamics and explanations of the reasoning behind driving decisions.
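To make the label structure concrete, below is a minimal sketch of how per-frame intent labels and reasoning explanations might be read. The file layout, field names (`frames`, `intents`, `explanation`), and intent categories are illustrative assumptions, not the released PSI annotation format.

```python
import json

def load_intent_annotations(path):
    """Yield one record per annotated frame of a scene.

    Hypothetical schema: field names and intent categories are
    assumptions for illustration, not the actual PSI release format.
    """
    with open(path) as f:
        data = json.load(f)
    for frame in data["frames"]:
        yield {
            "frame_id": frame["frame_id"],
            "pedestrian_id": frame["pedestrian_id"],
            # One situated-intent estimation per driver (up to 24),
            # e.g. "cross" / "not_cross" / "not_sure".
            "intents": frame["intents"],
            # Free-text driver reasoning, when provided.
            "explanation": frame.get("explanation", ""),
        }
```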
The dataset includes:
- 110 unique pedestrian encountering scenes;
- frames annotated with object detection and classification, tracking, posture, and semantic segmentation labels;
- more than 621k estimations of the key pedestrians' situated intents, made by 24 human drivers (see the aggregation sketch below);
- boundaries identified by human drivers when segmenting the 110 scenes based on the key pedestrians' situated intents, together with the corresponding reasoning explanations.
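Because each frame can carry up to 24 independent intent estimations, a common first step is to aggregate them into a single label with an agreement score. The sketch below shows simple majority voting under the assumed label encoding from the earlier example; it is one possible aggregation strategy, not a prescribed PSI procedure.

```python
from collections import Counter

def aggregate_intents(intents):
    """Aggregate multiple drivers' intent estimations for one frame.

    `intents` is a list of per-driver labels, e.g.
    ["cross", "cross", "not_sure", ...] (encoding assumed, see above).
    Returns the majority label and the fraction of drivers agreeing.
    """
    counts = Counter(intents)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(intents)

# Example: 20 of 24 drivers estimate "cross".
label, agreement = aggregate_intents(["cross"] * 20 + ["not_cross"] * 4)
print(label, round(agreement, 2))  # cross 0.83
```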
[Example annotation figures: object detection and semantic segmentation; demo video available]