Local-global Data Augmentation for Contrastive Learning in Static Sign Language Recognition

Research output: Contribution in Book/Catalog/Report/Conference proceeding › Conference contribution

Abstract

Sign language (SL) is a visual language used by the Deaf community. Static sign language recognition (SLR) consists of classifying static hand configurations, i.e., signs, present in isolated images. Due to the expertise required for manual annotation, SLR suffers from a data scarcity issue. Recent studies show that contrastive learning is an effective way to address this issue, as it enables efficient unsupervised pre-training. Contrastive learning leverages data augmentation techniques applied to entire images (global-global augmentation). However, fine-tuned contrastive models often rely on irrelevant aspects of those images, such as the background, instead of focusing solely on the regions of interest. Such models are prone to biases that can lead to unreliable predictions. In response, this paper proposes a new local-global data augmentation technique that helps contrastive models focus, during the fine-tuning step, on regions of interest, i.e., the signer’s hands. This approach (i) improves the accuracy of contrastive learning by up to 15% on some SLR datasets, and (ii) helps the fine-tuned contrastive models better focus on the image regions relevant to SLR.
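The abstract contrasts global-global augmentation (two augmented views of the whole image) with the proposed local-global scheme (pairing a global view with a local view of the signer's hands). The sketch below is only an illustration of that general idea, not the paper's method: the `hand_box` input is a hypothetical output of some hand detector, and the nearest-neighbour resize is used purely to keep the example dependency-free.

```python
import numpy as np

def resize_nearest(img, size):
    """Tiny nearest-neighbour resize so the sketch needs only numpy."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def local_global_views(image, hand_box, out_size=64, rng=None):
    """Build one global view (whole image) and one local view (crop
    around the signer's hands) to form a contrastive pair.

    `hand_box` = (top, left, height, width) is assumed to come from a
    hand detector -- an assumption of this sketch, not the paper.
    A real pipeline would also apply color jitter, flips, etc."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = image.shape
    # Global view: the entire image, resized to the model input size.
    global_view = resize_nearest(image, out_size)
    # Local view: the hand region, with small random jitter so that
    # repeated calls yield slightly different crops.
    t, l, bh, bw = hand_box
    t = min(max(t + int(rng.integers(-4, 5)), 0), h - bh)
    l = min(max(l + int(rng.integers(-4, 5)), 0), w - bw)
    local_view = resize_nearest(image[t:t + bh, l:l + bw], out_size)
    return global_view, local_view
```

In a contrastive objective (e.g. an InfoNCE-style loss), the two returned views would be encoded and pulled together as a positive pair, which is what encourages the model to attend to the hand region rather than the background.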
Original language: English
Title of host publication: IDA 2025
Subtitle of host publication: Intelligent Data Analysis
Publication status: Accepted/In press - 2 Feb 2025
