Computational construction grammar for visual question answering

J. Nevens; P. Van Eecke; K. Beuls

doi:10.1515/lingvan-2018-0070

Faculty of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves a 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.

Original language	English
Article number	20180070
Pages (from-to)	1-16
Number of pages	16
Journal	Linguistics Vanguard
Volume	5
Issue number	1
DOIs	https://doi.org/10.1515/lingvan-2018-0070
Publication status	Published - 2019

Access to document

10.1515/lingvan-2018-0070

nevens2019computational: Final published version, 1.23 MB, License: CC BY

Cite this

@article{2ddc9f7293914079b20d43c8f864d915,

title = "Computational construction grammar for visual question answering",

abstract = "In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves a 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.",

keywords = "Computational Construction Grammar, Fluid Construction Grammar, Natural Language Understanding, Procedural Semantics, Visual Question Answering",

author = "J. Nevens and {Van Eecke}, P. and K. Beuls",

note = "Publisher Copyright: {\textcopyright} 2019 Walter de Gruyter GmbH, Berlin/Boston.",

year = "2019",

doi = "10.1515/lingvan-2018-0070",

language = "English",

volume = "5",

pages = "1--16",

journal = "Linguistics Vanguard",

publisher = "de Gruyter",

number = "1",

}

TY - JOUR

T1 - Computational construction grammar for visual question answering

AU - Nevens, J.

AU - Van Eecke, P.

AU - Beuls, K.

PY - 2019

Y1 - 2019

N2 - In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves a 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.

KW - Computational Construction Grammar

KW - Fluid Construction Grammar

KW - Natural Language Understanding

KW - Procedural Semantics

KW - Visual Question Answering

UR - http://www.scopus.com/inward/record.url?scp=85076859115&partnerID=8YFLogxK

U2 - 10.1515/lingvan-2018-0070

DO - 10.1515/lingvan-2018-0070

M3 - Article

VL - 5

SP - 1

EP - 16

JO - Linguistics Vanguard

JF - Linguistics Vanguard

IS - 1

M1 - 20180070

ER -
