Learning to Reconstruct Intensity Images From Events
Abstract
The past decade has seen significant progress in computer vision, leading to diverse applications across various domains. However, today’s artificial vision systems still lag behind their biological counterparts in terms of robustness to challenging real-world scenarios, real-time processing, and computational efficiency. These shortcomings can be attributed to classical frame-based acquisition and processing pipelines, which suffer from low temporal resolution, low dynamic range, motion blur, and redundant information flow.
A new class of visual sensory devices called event cameras offers promising solutions to these challenges. Instead of capturing entire frames synchronously, the pixels of an event camera operate independently and respond to local brightness changes by generating asynchronous signals called events. As a result, event cameras have many advantages over traditional frame-based sensors, such as high dynamic range, high temporal resolution, low latency, and minimal motion blur.
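To make this working principle concrete, the following sketch simulates the commonly used event generation model, in which a pixel fires an event whenever its log brightness has changed by more than a contrast threshold since that pixel last fired. The function name, threshold value, and array layout are illustrative assumptions rather than details taken from this thesis.

```python
import numpy as np

def generate_events(log_frames, timestamps, contrast_threshold=0.2):
    """Simulate asynchronous events from a stack of log-intensity frames.

    log_frames: (T, H, W) array of log brightness values
    timestamps: (T,) array of frame times
    Returns a list of (t, x, y, polarity) tuples, one per event.
    """
    reference = log_frames[0].copy()  # per-pixel brightness at the last event
    events = []
    for i in range(1, len(log_frames)):
        diff = log_frames[i] - reference
        ys, xs = np.nonzero(np.abs(diff) >= contrast_threshold)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((timestamps[i], x, y, polarity))
            reference[y, x] = log_frames[i, y, x]  # reset the firing pixel's reference
    return events
```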
This thesis focuses on reconstructing intensity images from events. Reconstructing intensity information leverages the advantages of events for high-quality imaging in challenging scenarios; it also enables the application of established methods developed for frame-based images and facilitates human-centered applications involving event data. We present three main contributions to this task: a novel method that surpasses existing ones in image quality and efficiency, a comprehensive evaluation framework, and a large and diverse benchmark dataset.
First, we develop a novel dynamic neural network architecture based on hypernetworks, named HyperE2VID. HyperE2VID dynamically adapts to event data, unlike existing works that process events with static networks. Its context fusion module leverages complementary elements of event and frame domains, while its filter decomposition steps reduce computational cost. Thanks to this design, it surpasses existing methods in both image quality and computational efficiency.
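To illustrate the hypernetwork idea behind this design, the sketch below shows a convolutional block whose filters are predicted at inference time from a per-sample context tensor, rather than being fixed after training. This is a simplified, hypothetical illustration of dynamic filtering; the module name, layer sizes, and pooling scheme are assumptions and do not reproduce the HyperE2VID implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvBlock(nn.Module):
    """Convolution whose filters are generated per sample by a small hypernetwork."""

    def __init__(self, in_ch, out_ch, ctx_dim, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # hypernetwork: maps a pooled context vector to a full filter bank
        self.hyper = nn.Sequential(
            nn.Linear(ctx_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, out_ch * in_ch * k * k),
        )

    def forward(self, x, context):
        # x: (B, in_ch, H, W) features; context: (B, ctx_dim, H, W) fused context
        b = x.size(0)
        ctx_vec = context.mean(dim=(2, 3))             # global average pooling
        filters = self.hyper(ctx_vec)                  # (B, out_ch * in_ch * k * k)
        filters = filters.view(b * self.out_ch, self.in_ch, self.k, self.k)
        # grouped convolution applies each sample's own predicted filters
        x = x.reshape(1, b * self.in_ch, x.size(2), x.size(3))
        out = F.conv2d(x, filters, padding=self.k // 2, groups=b)
        return out.reshape(b, self.out_ch, out.size(2), out.size(3))
```

Predicting filters from context in this way is what lets a network adapt to varying inputs without retraining its static backbone; in the full architecture, the predicted filters are additionally decomposed to keep the computational cost low.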
Our second contribution is an open-source library for evaluating and analyzing event-based video reconstruction methods, called EVREAL. EVREAL allows us to evaluate different methods comprehensively, considering diverse and challenging scenarios, employing extensive real-world datasets, measuring robustness to several key variables, and assessing performance through multiple metrics and tasks. This evaluation ensures generalizability to real-world scenarios, fair comparison, and reproducibility.
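As a rough illustration of the kind of full-reference comparison such an evaluation performs, the snippet below scores a reconstructed sequence against ground-truth frames using two common image quality metrics. It is a generic stand-in rather than EVREAL's actual interface; the function name and the assumption of aligned 8-bit grayscale frames are illustrative.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sequence(reconstructions, ground_truths):
    """Average PSNR and SSIM over pairs of aligned reconstructed/ground-truth frames."""
    scores = {"psnr": [], "ssim": []}
    for rec, gt in zip(reconstructions, ground_truths):
        scores["psnr"].append(peak_signal_noise_ratio(gt, rec, data_range=255))
        scores["ssim"].append(structural_similarity(gt, rec, data_range=255))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```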
Our third contribution is a new benchmark dataset, HUE. HUE offers high resolution, contains numerous sequences captured in diverse settings, and focuses on low-light conditions, a challenging but rewarding domain for event-based video reconstruction.
Using EVREAL, we evaluate HyperE2VID through extensive experiments on several datasets, including our proposed dataset HUE. We assess image quality under different conditions using various metrics. We also analyze computational complexity and present a detailed ablation study to validate the design choices of HyperE2VID. Our experimental results demonstrate that the proposed dynamic architecture generates higher-quality videos than previous state-of-the-art methods across a wide range of settings, while also reducing memory consumption and inference time.
We expect the event-based vision literature to keep growing and event cameras to become more prominent in the coming years. We believe our method HyperE2VID, together with our evaluation framework EVREAL and benchmark dataset HUE, marks an important step towards enabling high-quality and robust imaging in a computationally efficient way.