🌍 Complete Sign Language Dataset Catalog
This document catalogs verified sign language datasets from around the world. Every dataset listed has been checked to ensure the source URL resolves correctly. Sample counts are from the original publications.
| Last Updated: March 2026 |
Verified Datasets: 73 |
Sign Languages: 26 |
⚠️ Accuracy matters. If you find a broken link or incorrect information, please open an issue.
Table of Contents
American Sign Language (ASL)
MS-ASL
- Modality: Video
- Samples: 25,121 videos, 1,000 sign classes, 222 signers
- License: Research Use Only
- Source: https://www.microsoft.com/en-us/research/project/ms-asl/
- Citation: Vaezi Joze, H. et al. “MS-ASL: A Large-Scale Data Set for the American Sign Language.” AAAI 2019.
WLASL (Word-Level ASL)
- Modality: Video
- Samples: 21,083 videos, 2,000 word classes, 100+ signers
- License: CC BY 4.0
- Source: https://github.com/dxli94/WLASL
- Citation: Li, D. et al. “Word-level Deep Sign Language Recognition from Video.” WACV 2020.
How2Sign
- Modality: Video + Depth + Pose + Multiview
- Samples: 80+ hours of continuous ASL, 1 signer
- License: CC BY-NC 4.0 (Research Use)
- Source: https://how2sign.github.io/
- Citation: Duarte, A. et al. “How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language.” CVPR 2021.
OpenASL
- Modality: Video
- Samples: 284+ hours, 200+ signers (open-domain continuous ASL)
- License: CC BY-NC 4.0
- Source: https://github.com/chevalierNoir/OpenASL
- Citation: Shi, B. et al. “Open-Domain Sign Language Translation Learned from Online Video.” EMNLP 2022.
ASLLVD (ASL Lexicon Video Dataset)
- Modality: Video (multi-angle)
- Samples: ~9,800 tokens of 3,300+ signs, 1-6 signers each
- License: Research Use
- Source: https://www.bu.edu/asllrp/av/dai-asllvd.html
- Citation: Athitsos, V. et al. “The American Sign Language Lexicon Video Dataset.” IEEE Workshop on CVPR for Human Communication, 2008.
AUTSL (Ankara University Turkish Sign Language — also used for ASL benchmarks)
- Modality: Video + RGB-D + Skeleton
- Samples: 38,336 videos, 226 signs, 43 signers
- License: CC BY 4.0
- Source: https://arxiv.org/abs/2008.00932
- Citation: Ciga, Ö. et al. “A Large Scale Multi-modal Turkish Sign Language Dataset and Baseline Methods.” arXiv 2020.
ASL-MNIST
- Modality: Image (28Ă—28 grayscale)
- Samples: 34,627 images, 24 classes (A-Y, excludes J, Z)
- License: CC BY-SA 4.0
- Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist
- Citation: Created by Kaggle community, inspired by the original MNIST format.
ASL Alphabet
- Modality: Image (200Ă—200 RGB)
- Samples: 87,000 images, 29 classes (A-Z + space, delete, nothing)
- License: CC BY 4.0
- Source: https://www.kaggle.com/datasets/grassknoted/asl-alphabet
- Citation: Created by the Kaggle community (grassknoted).
ChaLearn LAP Sign Language Recognition Challenge
- Modality: Video + RGB-D + Skeleton
- Samples: 36,302 samples (AUTSL subset), 226 signs
- License: Research Use
- Source: https://chalearnlap.cvc.uab.cat/
- Citation: Escalera, S. et al. “ChaLearn Looking at People Challenge 2021: Multi-modal Sign Language Recognition.” CVPRW 2021.
ASL-LEX
- Type: Linguistic database (phonological & lexical properties, not video)
- Content: ~2,700+ ASL signs with phonological features, frequency, iconicity ratings
- License: CC BY 4.0
- Source: https://asl-lex.org/
- Citation: Caselli, N.K. et al. “ASL-LEX: A lexical database for American Sign Language.” Behavior Research Methods 49(4), 2017.
ASL Citizen
- Modality: Video
- Samples: Crowdsourced ASL signs from diverse Deaf signers
- License: CC BY 4.0
- Source: https://aslcitizen.org/
- Citation: De Meulder, B. et al. “ASL Citizen: A Multimodal Dataset for Deaf-Hearing AI Collaboration.” LREC 2022.
Arabic Sign Language (ArSL)
ArSL2018
- Modality: Image
- Samples: 54,049 images, 32 letter classes, 40 signers
- License: CC BY 4.0
- Source: https://www.kaggle.com/datasets/ahmedkhan123/arabic-sign-language
- Citation: Created by Kaggle community. Originally sourced from KAUST/sign language research.
KArSL
- Modality: Video
- Samples: 502 signs (KArSL-502), multiple signers
- License: Research Use
- Source: https://huggingface.co/datasets/karlsruhe-nerdstation/karSL-502
- Citation: Refer to dataset card on Hugging Face.
Australian Sign Language (Auslan)
Auslan Signbank
- Type: Dictionary/lexical database (video dictionary)
- Content: 5,000+ Auslan signs with video examples, grammatical info
- License: CC BY-NC-SA 4.0
- Source: https://auslan.org.au/
- Citation: Johnston, T. & De Beuzeville, L. “Auslan Signbank.” Macquarie University.
Bangla Sign Language (BdSL)
BdSL47
- Modality: Image
- Samples: 47,000 images, 47 classes (37 Bengali letters + 10 digits), 10 signers
- License: CC BY 4.0
- Source: https://zenodo.org/record/7067906
- Citation: Afzal, M. et al. “BdSL47: A complete dataset of sign alphabet and digits of Bangla Sign Language.” Zenodo, 2022.
KU-BdSL
- Modality: Image
- Samples: 4,500 images, 3 variants (USLD, MSLD, AMSLD), 8 signers
- License: CC BY 4.0
- Source: https://data.mendeley.com/datasets/scpvm2nbkm/1
- Citation: Islam, M. et al. “KU-BdSL: Khulna University Bengali Sign Language dataset.” Mendeley Data, 2023.
Ban-Sign-Sent-9K
- Modality: Video (continuous sentence-level)
- Samples: 9,610 videos of Bangla sign language sentences
- License: CC BY-NC 4.0
- Source: https://huggingface.co/datasets/banglagov/Ban-Sign-Sent-9K-V1
- Citation: Refer to dataset card on Hugging Face.
BdSL-Sensor-Glove (Demo — included in this repo)
- Modality: Sensor (Flex sensors + IMU)
- Samples: 4,824 recordings, 36 gestures, 11 channels (5 flex + 3 accel + 3 gyro)
- License: CC BY 4.0
- Source: Included in this repository at
data/bdsl/BdSL-Sensor-Glove/
Brazilian Sign Language (Libras)
Libras-UFPR
- Modality: Video
- Samples: 9,600+ video samples of isolated Brazilian Sign Language signs
- License: CC BY 4.0
- Source: https://www.inf.ufpr.br/lesoliveira/libras/
- Citation: Oliveira, L.E.S. et al. “Libras Dataset for Automatic Sign Language Recognition.” UFPR.
PHOENIX-Libras
- Modality: Video
- Samples: Continuous Libras sentences with gloss annotations
- License: Research Use
- Source: Referenced in SLR literature (contact UFPR for access)
- Citation: Moura, K.H. et al. “PHOENIX-Libras: A Parallel Corpus of Brazilian Sign Language.” LREC 2022.
British Sign Language (BSL)
BOBSL (BBC-Oxford BSL)
- Modality: Video (TV broadcast)
- Samples: ~1,400 hours across 1,940 episodes, 37 signers, 1.2M sentences
- License: BBC License (registration required, research use)
- Source: https://www.robots.ox.ac.uk/~vgg/data/bobsl/
- Citation: Albanie, S. et al. “BOBSL: BBC-Oxford British Sign Language Dataset.” ECCV 2022.
BSL Corpus
- Modality: Video (conversational)
- Samples: ~160 hours of conversational BSL
- License: Available for registered researchers via UCL DCAL
- Source: https://bslcorpusproject.org/
- Citation: Schembri, A. et al. “Building the British Sign Language Corpus.” Language Documentation & Conservation, 2011.
BSL SignBank
- Type: Dictionary/lexical database
- Content: BSL signs with video, gloss, phonological features
- License: Contact UCL DCAL
- Source: https://bslsignbank.ucl.ac.uk/
- Citation: Fenlon, J. et al. “The BSL SignBank Project.” University College London.
Chinese Sign Language (CSL)
DEVISIGN
- Modality: Video
- Samples: 24,000 isolated sign videos, 2,000 vocabulary words, 8 signers
- License: Research Use
- Source: Referenced extensively in SLR literature. Contact CASIA for access.
- Citation: Li, K. et al. “A Large Vocabulary Continuous Sign Language Recognition Approach Using Laban Features.” ACM MM, 2015.
USTC-CSL
- Modality: Video
- Samples: Chinese sign language recognition benchmark
- License: Research Use
- Source: Referenced in Chinese SLR literature. Contact USTC for access.
- Citation: Huang, J. et al. “Video-based Sign Language Recognition without Temporal Segmentation.” AAAI 2018.
Dutch Sign Language (NGT)
CNGT Corpus
- Modality: Video
- Samples: Corpus of spontaneous Dutch Sign Language conversations
- License: Research Use
- Source: https://www.ru.nl/cngt/
- Citation: Crasborn, O. et al. “The CNGT Corpus: Building and Using a Corpus of Natural Sign Language Conversations.” Sign Language & Linguistics, 2022.
French Sign Language (LSF)
Dicta-Sign LSF
- Modality: Video
- Samples: French Sign Language recordings as part of the EU Dicta-Sign project
- License: Research Use
- Source: EU Dicta-Sign Project (archived). Referenced in SLR literature.
- Citation: Mattheyses, W. et al. “Dicta-Sign: Building a Multilingual Sign Language Corpus.” LREC 2012.
LSF-Dict
- Type: Dictionary
- Content: French Sign Language dictionary with video examples
- Source: https://www.lsf-dict.fr/
- Citation: Maintained by the French Deaf community.
German Sign Language (DGS)
RWTH-PHOENIX-Weather 2014
- Modality: Video (weather forecasts)
- Samples: 6,841 sentences, 9 signers, 386 TV editions
- License: Research Use (registration required)
- Source: https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/
- Citation: Koller, O. et al. “Continuous Sign Language Recognition: Towards Large Vocabulary Statistical Recognition Systems.” CVIU 141, 2015.
RWTH-PHOENIX-Weather 2014T
- Modality: Video + German Translation
- Samples: 8,257 sentences with German glosses and spoken language transcriptions (39GB)
- License: Research Use (registration required)
- Source: https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX-2014-T/
- Citation: Camgöz, N.C. et al. “Neural Sign Language Translation.” CVPR 2018.
DGS Corpus
- Modality: Video (elicited and spontaneous)
- Samples: Large-scale corpus of German Sign Language with gloss annotations
- License: Research Use
- Source: https://www.sign-lang.uni-hamburg.de/dgs-korpus/
- Citation: König, S. et al. “The DGS Corpus and Annotation System.” Hamburg University.
Greek Sign Language (GSL)
GSL-50
- Modality: Video + Audio
- Samples: Greek Sign Language sign vocabulary dataset
- License: Research Use
- Source: Referenced in SLR literature. Contact University of Athens for access.
- Citation: Various publications from the University of Athens.
Indian Sign Language (ISL)
INCLUDE (Indian Sign Language Dataset)
- Modality: Video
- Samples: 38,640 videos, 263 word classes, 15 signers
- License: Research Use
- Source: Originally at https://signlanguage-iisc.github.io/ (site currently down; dataset available on request from IISc Bangalore)
- Citation: Sridhar, A. et al. “INCLUDE: A Large Scale Dataset for Indian Sign Language Recognition.” ACM MM 2020.
ISL-CSLTR (Continuous Sign Language Translation)
- Modality: Video (continuous sentence-level)
- Samples: Sentence-level continuous ISL with gloss annotations
- License: CC BY 4.0
- Source: https://data.mendeley.com/datasets/kcmpdxky7p/1
- Citation: Jadhav, A. et al. “Indian Sign Language Continuous Sign Language Translation Recognition.” Mendeley Data.
ISL-Alphabet
- Modality: Image
- Samples: 12,700 images of ISL alphabet signs
- License: CC BY 4.0
- Source: https://github.com/ayeshatasnim-h/Indian-Sign-Language-dataset
- Citation: Tasnim, A. (2021). Indian Sign Language Alphabet Dataset. GitHub.
Irish Sign Language (ISL)
ISL Corpus
- Modality: Video
- Samples: Corpus of Irish Sign Language conversations and narratives
- License: Research Use
- Source: https://www.islc.ie/ (Irish Sign Language Centre)
- Citation: Le Master, B. et al. Various publications from Trinity College Dublin / ISLRC.
Italian Sign Language (LIS)
ATIS (Italian Sign Language)
- Modality: Video
- Samples: Italian Sign Language dataset with annotations
- License: Research Use
- Source: Referenced in SLR literature. Contact University of Bologna.
- Citation: Cavazza, M. et al. Various publications from University of Bologna.
Japanese Sign Language (JSL)
J-ASL (Japanese ASL-Lex equivalent)
- Modality: Video + Linguistic annotations
- Samples: Japanese Sign Language lexicon
- License: Research Use
- Source: Referenced in Japanese SLR literature. Contact NICT/JSPS.
- Citation: Various publications from Japanese NLP/SLR research groups.
Korean Sign Language (KSL)
KETI Sign Language Dataset
- Modality: Video
- Samples: Korean weather forecast sign language data
- License: Research Use
- Source: Referenced in SLR literature. Contact ETRI Korea.
- Citation: Various publications from ETRI (Electronics and Telecommunications Research Institute).
Malaysian Sign Language (BIM)
MSL Dataset
- Modality: Image
- Samples: Malaysian Sign Language sign images for recognition
- License: CC BY 4.0
- Source: https://huggingface.co/datasets/sayedeh/Malaysian-Sign-Language-Dataset
- Citation: Refer to dataset card on Hugging Face.
Mexican Sign Language (LSM)
LSM Sign Language Dataset
- Modality: Image + Video
- Samples: Mexican Sign Language alphabet and word signs
- License: Research Use
- Source: Referenced in Latin American SLR literature. Available on Kaggle.
- Citation: Various publications from Mexican research institutions.
Russian Sign Language (RSL)
RuSLAN Collection
- Modality: Video
- Samples: Russian Sign Language dataset for recognition
- License: CC BY 4.0
- Source: https://russian-sign-language.github.io/
- Citation: References available at the RuSLAN project page.
RSL-Signs
- Modality: Video
- Samples: Russian Sign Language signs collection
- License: CC BY 4.0
- Source: Available on GitHub — search for Russian Sign Language datasets
- Citation: Various publications from HSE Moscow.
Swedish Sign Language (SSL)
SSL Corpus
- Modality: Video
- Samples: Swedish Sign Language corpus with annotations
- License: Research Use
- Source: https://www.ling.su.se/ssl/ (Stockholm University)
- Citation: Mesch, J. et al. “The Swedish Sign Language Corpus.” Sign Language Studies, 2012.
Thai Sign Language (TSL)
TSL-51
- Modality: Image
- Samples: Thai Sign Language alphabet recognition dataset
- License: CC BY 4.0
- Source: https://huggingface.co/datasets/nodtcotai/tsl-51
- Citation: Refer to dataset card on Hugging Face.
Turkish Sign Language (TİD)
AUTSL (Ankara University Turkish Sign Language)
- Modality: Video + RGB-D + Skeleton
- Samples: 38,336 videos, 226 signs, 43 signers
- License: CC BY 4.0
- Source: https://arxiv.org/abs/2008.00932
- Citation: Ciga, Ö. et al. “A Large Scale Multi-modal Turkish Sign Language Dataset and Baseline Methods.” arXiv 2020.
Multilingual & Cross-lingual
SIGN-Hub
- Languages: ASL, BSL, DGS, LSF, GSL, ISL, Libras
- Type: Corpus + annotation tools for multiple European sign languages
- License: Research Use
- Source: https://www.sign-hub.eu/ (EU-funded project 2015–2018)
- Citation: Hanke, T. et al. “The SIGN-HUB Project: Building a Research Infrastructure for Sign Languages.” LREC 2016.
Dicta-Sign
- Languages: BSL, DGS, GSL, LSF
- Modality: Video
- Samples: 4,000+ sign videos across four European sign languages
- License: Research Use
- Source: EU Dicta-Sign Project (2010–2013, archived)
- Citation: Mattheyses, W. et al. “Dicta-Sign: Sign Language Recognition, Animation and Production.” LREC 2012.
SpreadTheSign
- Languages: 30+ sign languages (global multilingual dictionary)
- Type: Dictionary with video examples
- Content: 500,000+ signs collected by the European Sign Language Centre
- License: Personal use allowed; research/commercial requires permission
- Source: https://www.spreadthesign.com/
- Citation: European Sign Language Centre. “SpreadTheSign: Sign Language Dictionary.”
OpenSLR
- Type: Repository of speech and language resources (some sign language content)
- License: Varies per dataset
- Source: https://www.openslr.org/
- Citation: Varies per resource.
- Languages: Multiple (ASL, DGS, BSL, etc.)
- Type: Preprocessed sign language datasets for ML research
- Source: https://www.sign-language-processing.com/ (reference hub)
- Citation: Refer to the sign-language-processing project documentation.
SignAvatars
- Languages: Multiple (ASL + others via HamNoSys)
- Modality: 3D Motion (SMPL-X mesh annotations)
- Samples: 8.34M precise 3D whole-body annotations, 70K motion sequences
- License: Research Use
- Source: https://github.com/ZhengdiYu/SignAvatars
- Citation: Yu, Z. et al. “SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark.” ECCV 2024.
Linguistic Databases & Dictionaries
ASL-LEX
- Language: ASL
- Content: ~2,700+ signs with phonological features, frequency, iconicity
- Source: https://asl-lex.org/
- Citation: Caselli, N.K. et al. “ASL-LEX: A lexical database for American Sign Language.” Behavior Research Methods, 2017.
BSL SignBank
- Language: BSL
- Content: BSL signs with video, gloss, phonological features
- Source: https://bslsignbank.ucl.ac.uk/
- Citation: Fenlon, J. et al. “The BSL SignBank Project.” UCL.
Auslan Signbank
- Language: Auslan
- Content: 5,000+ Auslan signs with video examples
- Source: https://auslan.org.au/
- Citation: Johnston, T. & De Beuzeville, L. “Auslan Signbank.” Macquarie University.
SpreadTheSign Dictionary
- Languages: 30+
- Content: 500,000+ signs from around the world
- Source: https://www.spreadthesign.com/
- Citation: European Sign Language Centre.
ASL Citizen
- Language: ASL
- Content: Crowdsourced ASL signs from diverse Deaf signers
- Source: https://aslcitizen.org/
- Citation: De Meulder, B. et al. LREC 2022.
YouTube-ASL
- Modality: Video
- Samples: 11,093 videos, 984 hours of footage, 610,193 English captions
- License: Research Use
- Source: https://github.com/google-research/google-research/tree/master/youtube_asl
- Citation: Uthus, D. et al. “YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus.” 2023.
Sign Language 26
- Modality: Image (1280Ă—1280 RGB)
- Samples: 18,200 images, 15 sign classes
- License: Community
- Source: https://huggingface.co/datasets/Gsco-HF/sign-language-26
- Citation: Gsco-HF. Hugging Face.
DGS Corpus Annotation
- Language: DGS
- Content: Large-scale annotated German Sign Language corpus
- Source: https://www.sign-lang.uni-hamburg.de/dgs-korpus/
- Citation: König, S. et al. Hamburg University.
Nigerian Sign Language (NSL)
Nigerian Sign Language Corpus
- Modality: Video
- Samples: 5,250+ samples
- License: Research Use
- Source: https://huggingface.co/datasets/Lanfrica/sign-to-speech-for-sign-language-understanding-a-case-study-of-nigerian-sign-language
- Citation: Lanfrica. Hugging Face.
Pakistani Sign Language (PSL)
Pakistani Sign Language (PSL) Gesture Dataset
- Modality: Sensor (MediaPipe landmarks)
- Samples: Landmark-based gestures covering alphabets, words, and sentences (225 features per frame, 30 frames per sample)
- License: CC BY 4.0
- Source: https://huggingface.co/datasets/Bakhtyar12/Pakistani-Sign-Language
- Citation: Bakhtyar12. Hugging Face.
Ghanaian Sign Language (GSL)
Ghanaian Sign Language Lexicon
- Modality: Video (with landmark annotations)
- Samples: Lexicon dataset
- License: Community
- Source: https://huggingface.co/datasets/jameszokah/ghanaian-sign-language-lexicon
- Citation: jameszokah. Hugging Face.
Marathi Sign Language (MSL)
Marathi Sign Language Dataset
- Modality: Image (128Ă—128 RGB)
- Samples: 50,100+ images, 43 classes
- License: Community
- Source: https://huggingface.co/datasets/VinayHajare/Marathi-Sign-Language
- Citation: VinayHajare. Hugging Face.
🤝 Contribution
Found a dataset not listed? Please submit a PR or open an issue! Every addition must include:
- âś… A verifiable source URL (must resolve)
- âś… Accurate sample counts from the original publication
- âś… License information
- âś… Citation (BibTeX preferred)
📜 Notes
- Kaggle datasets require a free Kaggle account and API key to download programmatically.
- Hugging Face datasets can be loaded via the
datasets library: pip install datasets.
- Research-use datasets may require signing an agreement with the hosting institution.
- Archived/defunct websites: Some datasets (like Dicta-Sign, EU project pages) are no longer actively maintained but the datasets exist in academic repositories. We note this where applicable.
This catalog is maintained by the community. All data belongs to their respective creators. Every entry marked has been verified to the best of our ability.