Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation

Research output: Contribution in Book/Catalog/Report/Conference proceeding › Conference contribution


Abstract

A growing number of sign languages are now documented by large-scale digital corpora. Exploiting sign language (SL) corpus data, however, still depends on manual annotation, which is time-consuming and expensive. In this paper, we present ongoing research that tests a new approach to mining SL data more effectively. It relies on the methodology of corpus-based contrastive linguistics and exploits SL corpora as bilingual corpora. We present and illustrate the main improvements we foresee in developing such an approach: downstream, for the benefit of linguistic description and of the bilingual (signed-spoken) competence of teachers, learners and users; and upstream, to enable the automation of the annotation process for sign language data. We also describe the methodology we are using to develop a concordancer able to turn SL corpora into searchable translation corpora, and to derive from it a tool to support annotation.

Original language: English
Title of host publication: Proceedings of the 7th workshop on the Representation and Processing of Sign Languages: Corpus Mining
Subtitle of host publication: LREC 2016
Pages: 159-166
Number of pages: 8
Publication status: Accepted/In press - 2016
Event: 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining - Grand Hotel Bernardin Conference Center, Portoroz, Slovenia
Duration: 28 May 2016 - 28 May 2016

Publication series

Name: Proceedings of the Workshop on the Representation and Processing of Sign Languages

Seminar

Seminar: 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining
Country: Slovenia
City: Portoroz
Period: 28/05/16 - 28/05/16

Fingerprint

Linguistics
Data mining
Testing

Cite this

Meurant, L., Cleve, A., & Crasborn, O. (Accepted/In press). Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation. In Proceedings of the 7th workshop on the Representation and Processing of Sign Languages: Corpus Mining: LREC 2016 (pp. 159-166). (Proceedings of the Workshop on the Representation and Processing of Sign Languages).
@inproceedings{c46629471d3a4ab999ae45c64189ae86,
title = "Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation",
abstract = "A growing number of sign languages are now documented by large-scale digital corpora. Exploiting sign language (SL) corpus data, however, still depends on manual annotation, which is time-consuming and expensive. In this paper, we present ongoing research that tests a new approach to mining SL data more effectively. It relies on the methodology of corpus-based contrastive linguistics and exploits SL corpora as bilingual corpora. We present and illustrate the main improvements we foresee in developing such an approach: downstream, for the benefit of linguistic description and of the bilingual (signed-spoken) competence of teachers, learners and users; and upstream, to enable the automation of the annotation process for sign language data. We also describe the methodology we are using to develop a concordancer able to turn SL corpora into searchable translation corpora, and to derive from it a tool to support annotation.",
author = "Laurence Meurant and Anthony Cleve and Onno Crasborn",
year = "2016",
language = "English",
series = "Proceedings of the Workshop on the Representation and Processing of Sign Languages",
pages = "159--166",
booktitle = "Proceedings of the 7th workshop on the Representation and Processing of Sign Languages: Corpus Mining",

}
