Our Projects
This is our project page. We study video processing and analysis using deep learning and various intelligent signal processing techniques.
Multi-modal Sensor Fusion & Future Object Localization for Autonomous Driving
Future Object Localization (FOL) helps an autonomous driving agent build awareness of the situations around it. The task is crucial for enhancing the stability of autonomous driving systems, including ADAS, which require perception, prediction, and planning steps. Since a driving agent must perceive and detect its surroundings as well as its own motion, it should be equipped with various types of sensors. In our research, we propose “MS-FOLe (Multi-modal Sensor-fusion FOL with Ego-motion prediction)”, which processes data acquired from multiple sensor types and uses a deep learning (DL) architecture for FOL. The work includes utilizing 3D point cloud data, 2D image processing, object detection, cross-attention to fuse RGB and LiDAR data, and composing the whole DL architecture for the task.
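To illustrate the cross-attention fusion step, the following is a minimal single-head sketch in NumPy: image features act as queries and point-cloud features as keys/values, with a residual connection. All shapes, the single-head setup, and the residual design are illustrative assumptions, not the actual MS-FOLe architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(rgb_feats, lidar_feats):
    """Fuse RGB features (queries) with LiDAR features (keys/values)
    via scaled dot-product cross-attention. Hypothetical sketch only."""
    d = rgb_feats.shape[-1]
    scores = rgb_feats @ lidar_feats.T / np.sqrt(d)   # (N_rgb, N_lidar)
    weights = softmax(scores, axis=-1)                # attention over LiDAR tokens
    attended = weights @ lidar_feats                  # (N_rgb, d)
    return rgb_feats + attended                       # residual fusion

rng = np.random.default_rng(0)
rgb = rng.standard_normal((100, 64))    # e.g. flattened image feature map
lidar = rng.standard_normal((256, 64))  # e.g. encoded point-cloud tokens
fused = cross_attention_fusion(rgb, lidar)
print(fused.shape)  # (100, 64)
```

Each fused RGB token is thus a convex combination of LiDAR tokens weighted by feature similarity, which is the basic mechanism a multi-head version would refine.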
Multi-View 360° Image Super-Resolution
One or more 360° images from adjacent views can be used to significantly improve the resolution of a target 360° image. In this work, we propose an efficient reference-based 360° image super-resolution (RefSR) technique that exploits the wide field of view (FoV) shared among adjacent 360° cameras. A latitude-aware convolution (LatConv) is designed to generate features that are robust to distortion while preserving image quality. We also develop synthetic 360° image datasets and introduce a synthetic-to-real learning scheme that transfers knowledge learned from synthetic 360° images to a deep neural network.
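One way to build intuition for latitude awareness: in an equirectangular 360° image, rows near the poles are horizontally stretched, so their pixels carry less unique information. The toy sketch below scales each row by cos(latitude) before a plain 2D convolution. This is a simplified assumption for illustration; the actual LatConv design may instead condition the kernel itself on latitude.

```python
import numpy as np

def latitude_weights(height):
    """Per-row cos(latitude) weights for an equirectangular image whose
    rows span latitudes from -pi/2 (bottom) to +pi/2 (top)."""
    lats = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2
    return np.cos(lats)

def lat_weighted_conv(img, kernel):
    """Naive latitude-weighted 2D convolution sketch (valid mode):
    scale each row by its latitude weight, then slide the kernel."""
    x = img * latitude_weights(img.shape[0])[:, None]
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

img = np.random.default_rng(1).standard_normal((16, 32))  # toy equirect image
out = lat_weighted_conv(img, np.ones((3, 3)) / 9.0)       # 3x3 box filter
print(out.shape)  # (14, 30)
```

The weighting down-weights the heavily stretched polar rows so that the filter response is dominated by the less distorted equatorial content.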
Video Question Answering Using Compressed Features
Video Question Answering (Video QA) aims to answer a question through semantic reasoning over visual and linguistic information. Handling the large amounts of multi-modal video and language information in a video has recently become important in industry. In this work, we develop a novel deep neural network that obtains Video QA features directly from the coded video bit-stream, reducing computational complexity. The proposed network includes several deep modules dedicated to both the Video QA and video compression systems, which is the first such attempt for the Video QA task.
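To sketch the idea of reasoning over compressed-domain features, the toy example below pools sparse I-frame features together with motion-vector features (both obtainable without fully decoding every frame), attends over them with a question embedding, and scores candidate answers. All shapes, weights, and the gating-by-multiplication design are hypothetical; this is not the proposed network.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compressed_video_qa(iframe_feats, motion_feats, question_emb, W_ans):
    """Toy Video QA over compressed-domain tokens: question-guided
    attention pooling, then answer scoring. Hypothetical sketch."""
    tokens = np.concatenate([iframe_feats, motion_feats], axis=0)   # (T, d)
    scores = tokens @ question_emb / np.sqrt(question_emb.shape[0]) # (T,)
    attn = softmax(scores)                  # question-guided attention weights
    context = attn @ tokens                 # pooled video context, (d,)
    logits = W_ans @ (context * question_emb)   # element-wise fusion, (A,)
    return softmax(logits)                  # answer probabilities

rng = np.random.default_rng(2)
d, n_ans = 32, 4
probs = compressed_video_qa(
    rng.standard_normal((4, d)),    # features from 4 sparse I-frames
    rng.standard_normal((12, d)),   # features from motion vectors
    rng.standard_normal(d),         # question embedding
    rng.standard_normal((n_ans, d)))
print(probs.shape)  # (4,)
```

The point of the sketch is the input side: the video tokens come from bit-stream syntax elements rather than fully decoded RGB frames, which is where the complexity saving arises.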