With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this work, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data of 20 fine-grained and diverse activity categories. We present a novel strategy to extract temporal trajectory-like features from sensor data. We propose to apply the Fisher Kernel framework to fuse video features and temporally enhanced sensor features. Experimental results show that, with careful design of the feature extraction and fusion algorithms, sensor data can enhance information-rich video data.
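The Fisher Kernel fusion mentioned above can be illustrated with a minimal sketch: each modality's descriptors are encoded as a Fisher vector (here, only the gradient with respect to the GMM means, a common simplification), and the per-modality Fisher vectors are concatenated. This is an illustrative toy example with random data, not the paper's actual implementation; the function and variable names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Simplified Fisher vector: gradient w.r.t. the GMM means only."""
    q = gmm.predict_proba(descriptors)          # (N, K) soft assignments
    n = descriptors.shape[0]
    parts = []
    for k in range(gmm.n_components):
        # normalized deviation of each descriptor from component k's mean
        diff = (descriptors - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        parts.append((q[:, k, None] * diff).sum(axis=0)
                     / (n * np.sqrt(gmm.weights_[k])))
    return np.concatenate(parts)

# toy data standing in for video and sensor descriptors
rng = np.random.default_rng(0)
video_desc = rng.normal(size=(100, 8))
sensor_desc = rng.normal(size=(100, 4))

gmm_v = GaussianMixture(n_components=3, covariance_type="diag",
                        random_state=0).fit(video_desc)
gmm_s = GaussianMixture(n_components=3, covariance_type="diag",
                        random_state=0).fit(sensor_desc)

# fuse modalities by concatenating their Fisher vectors
fused = np.concatenate([fisher_vector(video_desc, gmm_v),
                        fisher_vector(sensor_desc, gmm_s)])
print(fused.shape)  # (3*8 + 3*4,) = (36,)
```

The fused vector can then be fed to any standard classifier (e.g. a linear SVM).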
Song Sibo is currently a Ph.D. student at the Singapore University of Technology and Design (SUTD), with a concentration in image, video, and visual data processing. He is under the supervision of Prof. Ngai-Man Cheung and Prof. Selin Damla Ahipasaoglu.
He obtained his Bachelor's degree in Automation from Zhejiang University (ZJU). He was enrolled in the Science and Engineering Honours Class of Chu Kochen Honours College from September 2009 to June 2011. He was selected for the Tohoku University Science Summer Program in July 2012 and was awarded a JASSO Scholarship for his performance during the program.