13 March 2023
ESA/ESTEC
Europe/Amsterdam timezone

Prompt–RSVQA: Prompting visual context to a language model for Remote Sensing Visual Question Answering

13 Mar 2023, 12:00
20m
Einstein (ESA/ESTEC)

Keplerlaan 1, 2201 AZ Noordwijk, The Netherlands
Short Presentation Session 2 - Earth Observation

Speaker

Christel Chappuis (EPFL)

Description

Despite its potential, Earth observation (EO)-based information remains difficult to access, mostly because of the technical requirements for converting raw image data into actionable information, including the limited availability of large labeled datasets and the need for advanced machine learning skills. New ways of extracting relevant information from images that bypass those requirements are needed to unleash the full potential of EO for the benefit of application fields such as environmental monitoring, agriculture, urban planning, and tourism. Remote sensing visual question answering (RSVQA) was proposed to interface natural language and vision, easing access to the information contained in EO data for a wide audience through simple questions in natural language.

Traditionally, the vision/language interface is an embedding obtained by fusing features from two deep models, one processing the image and the other the question. Despite the success of early VQA models, it remains difficult to control the adequacy of the visual information extracted by the image model, which should act as a context regularising the work of the language model. We propose to extract this context information with a visual model, convert it to text, and inject it into a language model as a prompt. The language model is therefore responsible for processing the question together with the visual context and for extracting the features that are useful for finding the answer. We study the effect of prompting with respect to a black-box visual extractor and discuss the importance of training a visual model that produces accurate context, i.e. an accurate image description.
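
As a rough illustration of the prompting idea, the sketch below converts the output of a visual model into text and prepends it to the question. It is not the authors' implementation: the per-class counts, the function names `describe_context` and `build_prompt`, and the "Context: ... Question: ..." template are all assumptions made for this example, and no actual visual or language model is called.

```python
from typing import Dict


def describe_context(class_counts: Dict[str, int]) -> str:
    """Turn hypothetical visual-model output (per-class counts) into text."""
    # Keep only classes that were actually detected, in a stable order.
    present = [f"{name} ({n})" for name, n in sorted(class_counts.items()) if n > 0]
    if not present:
        return "No classes detected."
    return "Detected classes: " + ", ".join(present) + "."


def build_prompt(context: str, question: str) -> str:
    """Prepend the textual visual context to the question (assumed template)."""
    return f"Context: {context} Question: {question}"


# Hypothetical visual-model output for one image tile.
counts = {"building": 12, "road": 3, "water area": 0}
prompt = build_prompt(describe_context(counts), "Is there a water area in the image?")
print(prompt)
```

The resulting string is what would be passed to the language model in place of a fused image/question embedding, so the quality of the answer depends directly on how accurate the textual context is.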

Primary authors

Christel Chappuis (EPFL)
Valérie Zermatten (EPFL)
Sylvain Lobry (Université Paris Cité)
Bertrand Le Saux (European Space Agency, Φ-lab)
Devis Tuia (EPFL)
