Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation

Laurence Meurant; Anthony Cleve; Onno Crasborn

Research output: Contribution in Book/Catalog/Report/Conference proceeding › Conference contribution

Abstract

A growing number of sign languages are now documented by large-scale digital corpora. However, exploiting sign language (SL) corpus data still depends on manual annotation, which is time-consuming and expensive. In this paper, we present ongoing research that tests a new approach to mining SL data more effectively. It relies on the methodology of corpus-based contrastive linguistics and exploits SL corpora as bilingual corpora. We present and illustrate the main improvements we foresee in developing such an approach: downstream, for the benefit of linguistic description and of the bilingual (signed-spoken) competence of teachers, learners and users; and upstream, to enable the automation of the annotation process for sign language data. We also describe the methodology we are using to develop a concordancer able to turn SL corpora into searchable translation corpora, and to derive from it a tool that supports annotation.

Original language	English
Title of host publication	Proceedings of the 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining
Subtitle of host publication	LREC 2016
Pages	159-166
Number of pages	8
Publication status	Published - 2016
Event	7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining - Grand Hotel Bernardin Conference Center, Portoroz, Slovenia. Duration: 28 May 2016 → 28 May 2016

Publication series

Name	Proceedings of the Workshop on the Representation and Processing of Sign Languages

Seminar

Seminar	7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining
Country/Territory	Slovenia
City	Portoroz
Period	28/05/16 → 28/05/16

Access to Document

http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-SignLanguage_Proceedings.pdf

Projects

Multimodal and contrastive approach to sign and spoken languages
Meurant, L.
1/03/19 → 1/11/21
Project: Research Axis

SILENT: Towards a corpus-based online tool for French-LSFB bilingual needs
Meurant, L. & Cleve, A.
1/01/16 → 31/12/16
Project: Research

Activities

EFSLI 2017
Laurence Meurant (Member of Scientific Committee)
2017
Activity: Participating in or organising an event › Participation in conference

Université de Genève
Laurence Meurant (Invited lecturer), Maxime Gobert (Invited researcher) & Anthony Cleve (Invited lecturer)
9 Sept 2016
Activity: Visiting an external institution › Research/Teaching in an external institution

7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining
Laurence Meurant (Speaker)
28 May 2016
Activity: Participating in or organising an event › Participation in conference

Datasets

Corpus LSFB. First digital open access corpus of movies and annotations of French Belgian Sign Language (LSFB).
Meurant, L. (Creator), Université de Namur, 15 Dec 2015
http://www.corpus-lsfb.be
Dataset

Cite this

Meurant, L., Cleve, A., & Crasborn, O. (2016). Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation. In Proceedings of the 7th workshop on the Representation and Processing of Sign Languages:Corpus Mining: LREC 2016 (pp. 159-166). (Proceedings of the Workshop on the Representation and Processing of Sign Languages). http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-SignLanguage_Proceedings.pdf

@inproceedings{c46629471d3a4ab999ae45c64189ae86,
  title = "Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation",
  abstract = "More and more sign languages nowadays are now documented by large scale digital corpora. But exploiting sign language (SL) corpus data remains subject to the time consuming and expensive manual task of annotating. In this paper, we present an ongoing research that aims at testing a new approach to better mine SL data. It relies on the methodology of corpus-based contrastive linguistics, exploiting SL corpora as bilingual corpora. We present and illustrate the main improvements we foresee in developing such an approach: downstream, for the benefit of the linguistic description and the bilingual (signed - spoken) competence of teachers, learners and the users; and upstream, in order to enable the automatisation of the annotation process of sign language data. We also describe the methodology we are using to develop a concordancer able to turn SL corpora into searchable translation corpora, and to derive from it a tool support to annotation.",
  author = "Laurence Meurant and Anthony Cleve and Onno Crasborn",
  year = "2016",
  language = "English",
  series = "Proceedings of the Workshop on the Representation and Processing of Sign Languages",
  pages = "159--166",
  booktitle = "Proceedings of the 7th workshop on the Representation and Processing of Sign Languages: Corpus Mining",
  note = "7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining ; Conference date: 28-05-2016 Through 28-05-2016",
}
