Total records: 2, showing 1-2
Towards Understanding Intuitive Physics with Language and Vision
(Fen Bilimleri Enstitüsü, 2021)
Visual question answering (VQA) is one of the difficult tasks in multimodal machine reasoning. VQA requires machines to provide correct answers to questions about an image or a video. Here, the machine should perceive the ...
Monocular Depth Estimation with Self-Supervised Representation Learning
(Fen Bilimleri Enstitüsü, 2022)
Many representations and modalities, such as images, videos, and point clouds, have been developed for better scene understanding. In this thesis, we intentionally characterize scene representation as depth maps in order to leverage ...