TY - JOUR
T1 - Computational construction grammar for visual question answering
AU - Nevens, J.
AU - Van Eecke, P.
AU - Beuls, K.
N1 - Publisher Copyright: © 2019 Walter de Gruyter GmbH, Berlin/Boston.
PY - 2019
Y1 - 2019
N2 - In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves a 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.
KW - Computational Construction Grammar
KW - Fluid Construction Grammar
KW - Natural Language Understanding
KW - Procedural Semantics
KW - Visual Question Answering
UR - http://www.scopus.com/inward/record.url?scp=85076859115&partnerID=8YFLogxK
U2 - 10.1515/lingvan-2018-0070
DO - 10.1515/lingvan-2018-0070
M3 - Article
VL - 5
SP - 1
EP - 16
JO - Linguistics Vanguard
JF - Linguistics Vanguard
IS - 1
M1 - 20180070
ER -