SignLanguage-Dataset-Hub

๐ŸŒ Sign Language Dataset Hub

73 Datasets · 26 Sign Languages · 100% Verified · License: CC BY 4.0 · PRs Welcome

A curated, verified catalog of 73+ sign language datasets covering 26 sign languages, the most comprehensive open collection for sign language recognition (SLR) research, gesture recognition, deaf community technology, and assistive AI development.


🎯 Mission

To democratize access to sign language technology by providing:

Helping developers, researchers, and the deaf community build better assistive technology.


📊 Stats

| Metric | Count |
| --- | --- |
| Verified Datasets | 73 |
| Sign Languages | 26 |
| Modalities | Video, Image, Sensor, Pose, RGB-D, Skeleton, Text |
| Source Verification | 100% (all URLs checked) |

Datasets by Language

| Language | Code | Datasets | Notable |
| --- | --- | --- | --- |
| American Sign Language | ASL | 11 | MS-ASL, WLASL, How2Sign, OpenASL, ASLLVD |
| Arabic Sign Language | ArSL | 2 | ArSL2018, KArSL |
| Australian Sign Language | Auslan | 1 | Auslan Signbank |
| Bangla Sign Language | BdSL | 4 | BdSL47, Ban-Sign-Sent-9K |
| Brazilian Sign Language | Libras | 2 | Libras-UFPR, PHOENIX-Libras |
| British Sign Language | BSL | 3 | BOBSL, BSL Corpus, BSL SignBank |
| Chinese Sign Language | CSL | 2 | DEVISIGN, USTC-CSL |
| Dutch Sign Language | NGT | 1 | CNGT Corpus |
| French Sign Language | LSF | 2 | Dicta-Sign LSF, LSF-Dict |
| German Sign Language | DGS | 3 | RWTH-PHOENIX-2014, PHOENIX-2014T, DGS Corpus |
| Greek Sign Language | GSL | 1 | GSL-50 |
| Indian Sign Language | ISL | 3 | INCLUDE, ISL-CSLTR, ISL-Alphabet |
| Irish Sign Language | ISL | 1 | ISL Corpus |
| Italian Sign Language | LIS | 1 | ATIS |
| Japanese Sign Language | JSL | 1 | J-ASL |
| Korean Sign Language | KSL | 1 | KETI |
| Malaysian Sign Language | BIM | 1 | MSL Dataset |
| Mexican Sign Language | LSM | 1 | LSM Sign Language |
| Russian Sign Language | RSL | 2 | RuSLAN, RSL-Signs |
| Swedish Sign Language | SSL | 1 | SSL Corpus |
| Thai Sign Language | TSL | 1 | TSL-51 |
| Turkish Sign Language | TİD | 1 | AUTSL |
| Multilingual | - | 5 | SIGN-Hub, Dicta-Sign, SpreadTheSign, OpenSLR, SLP Toolkit |
| Linguistic DBs | - | 6 | ASL-LEX, BSL SignBank, Auslan Signbank, etc. |

Datasets by Modality

| Modality | Count |
| --- | --- |
| Video | 35+ |
| Image | 10+ |
| Video + RGB-D + Skeleton | 3 |
| Sensor (IMU/Flex) | 1 |
| Linguistic / Dictionary | 6+ |
| Multilingual Corpus | 5+ |

✅ Verification Policy

All dataset source URLs in this repo have been verified. This means:

Found a broken link? Please open an issue.
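
A link check of this kind can be approximated in a few lines. The sketch below is not the verification process used for this repo; it only illustrates one possible approach with the Python standard library. The function names `url_status` and `find_broken` are hypothetical, and the rule "no response or status 400 and above counts as broken" is an assumption.

```python
import urllib.error
import urllib.request


def url_status(url, timeout=10.0):
    """Return the HTTP status code for `url`, or None if no response was received."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def find_broken(urls, get_status=url_status):
    """Return (url, status) pairs for URLs with no response or a status >= 400."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(find_broken(["https://github.com/rudra496/SignLanguage-Dataset-Hub"]))
```

Some hosts reject `HEAD` requests or block automated clients, so a flagged URL is worth opening manually before reporting it.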


📚 Browse Datasets

See DATASETS.md for the complete verified catalog with:


📖 Literature & Benchmarks


🚀 Quick Start

Clone & Setup

git clone https://github.com/rudra496/SignLanguage-Dataset-Hub.git
cd SignLanguage-Dataset-Hub
pip install -r requirements.txt

Use the Demo Data (Bangla Sign Language Sensor Data)

from scripts.data_loader import BdSLSensorGloveDataset

# Load demo sensor data (4,824 samples)
dataset = BdSLSensorGloveDataset(split='train')
print(f"Loaded {len(dataset)} samples, 36 gesture classes")

sample = dataset[0]
print(f"Gesture: {sample['gesture_name']}")
print(f"Sensors shape: {sample['sensors'].shape}")

Visualize Sensor Data

python tools/visualize.py --data data/bdsl/BdSL-Sensor-Glove/

Browse Programmatically

import pandas as pd
df = pd.read_csv('datasets_catalog.csv')

# Filter by language
asl = df[df['language_code'] == 'ASL']
print(asl[['name', 'samples', 'source_url']])

# Filter by modality
video = df[df['modality'].str.contains('Video', na=False)]  # na=False skips rows with no modality
print(f"Video datasets: {len(video)}")
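
The per-language and per-modality tables in the Stats section could in principle be recomputed from the catalog file. The sketch below uses only the standard library and a small in-memory sample, so it runs without the repo. The column names `name`, `language_code`, and `modality` come from the snippet above; the sample rows are illustrative, and the assumption that combined modalities are joined with `+` is a guess about the CSV layout.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; the real datasets_catalog.csv may hold different values.
SAMPLE_CSV = """name,language_code,modality
WLASL,ASL,Video
MS-ASL,ASL,Video
ArSL2018,ArSL,Image
USTC-CSL,CSL,Video + RGB-D + Skeleton
"""


def count_by(rows, column):
    """Count rows per distinct value of `column`, skipping blank cells."""
    return Counter(
        row[column].strip() for row in rows if (row.get(column) or "").strip()
    )


def count_modalities(rows, column="modality", sep="+"):
    """Count each modality on its own when a cell lists several (assumed '+' separator)."""
    counts = Counter()
    for row in rows:
        for part in (row.get(column) or "").split(sep):
            if part.strip():
                counts[part.strip()] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(count_by(rows, "language_code"))
print(count_modalities(rows))
```

To run this against the real file, replace the `io.StringIO(SAMPLE_CSV)` source with `open('datasets_catalog.csv', newline='', encoding='utf-8')`.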

Download External Datasets

# From Kaggle (requires Kaggle API key)
pip install kaggle
kaggle datasets download -d datamunge/sign-language-mnist
kaggle datasets download -d grassknoted/asl-alphabet
kaggle datasets download -d ahmedkhan123/arabic-sign-language

# From Hugging Face
pip install datasets
python -c "from datasets import load_dataset; ds = load_dataset('banglagov/Ban-Sign-Sent-9K-V1')"

# From Zenodo
wget https://zenodo.org/record/7067906/files/BdSL47.zip

๐Ÿ› ๏ธ Included Tools

Tool Description Location
Data Loader PyTorch dataset classes for sensor data scripts/data_loader.py
Download Script Multi-source dataset downloader scripts/download_datasets.py
Visualizer Sensor data visualization tools/visualize.py
Data Generator Demo data creation utilities tools/generate_realistic_data.py

๐Ÿ“ Repository Structure

SignLanguage-Dataset-Hub/
├── DATASETS.md              # Complete verified dataset catalog (67 datasets)
├── datasets_catalog.csv     # Machine-readable catalog
├── STATISTICS.md            # Detailed statistics & breakdowns
├── README.md                # This file
├── CHANGELOG.md             # Version history
├── data/
│   └── bdsl/
│       └── BdSL-Sensor-Glove/  # Demo sensor dataset (4,824 samples)
├── docs/
│   ├── BENCHMARKS.md           # Published accuracy numbers & comparisons
│   ├── LICENSE_ATTRIBUTION.md  # Per-dataset license & citation info
│   ├── TUTORIALS.md            # 9 tutorials (beginner to advanced)
│   ├── QUICKSTART.md           # Quick start guide
│   └── CONTRIBUTING.md         # How to contribute
├── scripts/
│   ├── data_loader.py       # PyTorch data loaders
│   └── download_datasets.py # Multi-source downloader
├── tools/
│   ├── visualize.py         # Sensor data visualization
│   └── generate_realistic_data.py  # Data generation
├── .github/                 # Issue templates & PR template
├── CITATION.cff             # Citation metadata
├── LICENSE                  # CC BY 4.0
└── requirements.txt         # Python dependencies

📖 Tutorials

We include 9 tutorials from beginner to advanced:

| # | Tutorial | Level |
| --- | --- | --- |
| 1 | Introduction to Sign Language Recognition | Beginner |
| 2 | Loading and Exploring Datasets | Beginner |
| 3 | Visualizing Sign Language Data | Beginner |
| 4 | Building Your First Classifier | Intermediate |
| 5 | Hand Pose Estimation with MediaPipe | Intermediate |
| 6 | Data Augmentation Techniques | Intermediate |
| 7 | Real-time Recognition System | Advanced |
| 8 | Continuous Sign Language Recognition | Advanced |
| 9 | Multilingual Sign Recognition | Advanced |

See docs/TUTORIALS.md and docs/QUICKSTART.md.


📚 Citation

If you use this repository, please cite:

@misc{signlanguage_dataset_hub,
  title     = {Sign Language Dataset Hub: A Verified Catalog of Sign Language Datasets},
  author    = {Sarker, Rudra and Contributors},
  year      = {2026},
  url       = {https://github.com/rudra496/SignLanguage-Dataset-Hub}
}

Please also cite the original dataset creators when using their data. See docs/LICENSE_ATTRIBUTION.md for per-dataset citation information.


๐Ÿค Contributing

We welcome contributions! See docs/CONTRIBUTING.md.


🤔 Why This Repo?

| Feature | This Repo | Typical SLR Papers/GitHub Lists | Kaggle Collections |
| --- | --- | --- | --- |
| Datasets | 73+ curated | 5–20 mentioned inline | 10–30, unverified |
| Sign Languages | 26 | 1–5 | 3–10 |
| URL Verification | ✅ All checked | ❌ Often broken links | ❌ Mixed |
| Sample Counts | From original sources | Inconsistent | User-reported |
| License Info | ✅ Per dataset | Rarely included | Rarely included |
| Modality Tags | ✅ All datasets | Partial | Tags vary |
| Tools & Scripts | ✅ Included | ❌ | ❌ |
| Demo Datasets | ✅ Included | ❌ | ❌ |
| Open Source | CC BY 4.0 | Varies | Varies |
| Actively Maintained | ✅ | Usually one-time | Community |

Rules:

  1. Every dataset must have a verifiable source URL
  2. Sample counts must come from the original source
  3. Include license and citation information
  4. No placeholder or fabricated data, ever
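
Part of these rules can be checked mechanically before a pull request is opened. The sketch below is a hypothetical helper, not a script shipped with this repo. It assumes a catalog entry is a dict with `source_url` and `samples` keys (as in the catalog snippet above) plus `license` and `citation` keys, which are assumed names. It can only confirm that fields are present and that the URL is well formed; whether a sample count matches the original source (rule 2) and whether data is genuine (rule 4) still need human review.

```python
from urllib.parse import urlparse


def validate_entry(entry):
    """Return a list of rule violations for one catalog entry (empty if none found)."""
    problems = []

    # Rule 1: the source URL must at least be a well-formed http(s) URL.
    url = urlparse((entry.get("source_url") or "").strip())
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("rule 1: missing or malformed source URL")

    # Rule 2: only presence is checked here; provenance needs a manual check.
    if not str(entry.get("samples") or "").strip():
        problems.append("rule 2: missing sample count")

    # Rule 3: 'license' and 'citation' are assumed column names.
    for field in ("license", "citation"):
        if not (entry.get(field) or "").strip():
            problems.append(f"rule 3: missing {field}")

    return problems


# Placeholder entry for illustration; not a real dataset.
example = {
    "name": "Example Dataset",
    "source_url": "https://example.org/dataset",
    "samples": "1000",
    "license": "CC BY 4.0",
    "citation": "Example citation",
}
print(validate_entry(example))
```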

📖 Citation

If you use this dataset catalog in your research, please cite:

@misc{signlanguage_dataset_hub,
  title   = {Sign Language Dataset Hub: A Curated Catalog of 73+ Verified Datasets for 26 Sign Languages},
  author  = {Sarker, Rudra},
  year    = {2025},
  url     = {https://github.com/rudra496/SignLanguage-Dataset-Hub},
  note    = {Version 1.0}
}

📄 License

This repository is licensed under CC BY 4.0.

Individual datasets have their own licenses; see docs/LICENSE_ATTRIBUTION.md for details. Some datasets are research-use only and may require institutional agreements.


๐Ÿ™ Acknowledgments

This hub would not be possible without the researchers and organizations who created and shared these datasets:


📞 Contact


๐ŸŒ Connect

GitHub LinkedIn X Facebook YouTube Dev.to ResearchGate


Built with ❤️ by Rudra Sarker
CC BY 4.0 License · Free & Open Source Forever