SignLanguage-Dataset-Hub

๐ŸŒ Sign Language Dataset Hub

73 Datasets · 26 Sign Languages · 100% Verified · License: CC BY 4.0 · PRs Welcome

A curated, verified catalog of 73+ sign language datasets covering 26 sign languages: an open collection for sign language recognition (SLR) research, gesture recognition, deaf community technology, and assistive AI development.


🎯 Mission

To democratize access to sign language technology, helping developers, researchers, and the deaf community build better assistive technology.


๐Ÿ“Š Stats

| Metric | Count |
|---|---|
| Verified Datasets | 73 |
| Sign Languages | 26 |
| Modalities | Video, Image, Sensor, Pose, RGB-D, Skeleton, Text |
| Source Verification | 100% (all URLs checked) |

Datasets by Language

| Language | Code | Datasets | Notable |
|---|---|---|---|
| American Sign Language | ASL | 11 | MS-ASL, WLASL, How2Sign, OpenASL, ASLLVD |
| Arabic Sign Language | ArSL | 2 | ArSL2018, KArSL |
| Australian Sign Language | Auslan | 1 | Auslan Signbank |
| Bangla Sign Language | BdSL | 4 | BdSL47, Ban-Sign-Sent-9K |
| Brazilian Sign Language | Libras | 2 | Libras-UFPR, PHOENIX-Libras |
| British Sign Language | BSL | 3 | BOBSL, BSL Corpus, BSL SignBank |
| Chinese Sign Language | CSL | 2 | DEVISIGN, USTC-CSL |
| Dutch Sign Language | NGT | 1 | CNGT Corpus |
| French Sign Language | LSF | 2 | Dicta-Sign LSF, LSF-Dict |
| German Sign Language | DGS | 3 | RWTH-PHOENIX-2014, PHOENIX-2014T, DGS Corpus |
| Greek Sign Language | GSL | 1 | GSL-50 |
| Indian Sign Language | ISL | 3 | INCLUDE, ISL-CSLTR, ISL-Alphabet |
| Irish Sign Language | ISL | 1 | ISL Corpus |
| Italian Sign Language | LIS | 1 | ATIS |
| Japanese Sign Language | JSL | 1 | J-ASL |
| Korean Sign Language | KSL | 1 | KETI |
| Malaysian Sign Language | BIM | 1 | MSL Dataset |
| Mexican Sign Language | LSM | 1 | LSM Sign Language |
| Russian Sign Language | RSL | 2 | RuSLAN, RSL-Signs |
| Swedish Sign Language | SSL | 1 | SSL Corpus |
| Thai Sign Language | TSL | 1 | TSL-51 |
| Turkish Sign Language | TİD | 1 | AUTSL |
| Multilingual | – | 5 | SIGN-Hub, Dicta-Sign, SpreadTheSign, OpenSLR, SLP Toolkit |
| Linguistic DBs | – | 6 | ASL-LEX, BSL SignBank, Auslan Signbank, etc. |

Datasets by Modality

| Modality | Count |
|---|---|
| Video | 35+ |
| Image | 10+ |
| Video + RGB-D + Skeleton | 3 |
| Sensor (IMU/Flex) | 1 |
| Linguistic / Dictionary | 6+ |
| Multilingual Corpus | 5+ |

✅ Verification Policy

All dataset source URLs in this repository have been verified.

Found a broken link? Please open an issue.
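The repository's own verification tooling is not described here. As a minimal sketch of how a reachability check on a source URL could work, the helper below uses only the Python standard library. It sends a `HEAD` request and falls back to `GET` when a server refuses `HEAD`. The name `check_url` and its `opener` parameter are hypothetical and exist only for illustration; `opener` lets the function be exercised with a stub instead of the network. A successful response only shows the URL resolves. It does not confirm that the content still matches the catalog entry.

```python
import urllib.error
import urllib.request

def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (ok, detail) for one URL (hypothetical helper, not repo tooling).

    Tries HEAD first and falls back to GET when the server rejects HEAD.
    `opener` defaults to urllib.request.urlopen and can be replaced by a stub.
    """
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(
                url, method=method, headers={"User-Agent": "link-check-sketch"}
            )
            with opener(request, timeout=timeout) as response:
                # urlopen follows redirects and raises HTTPError for error
                # statuses, so reaching this line means the request succeeded.
                return True, f"{method} {response.status}"
        except urllib.error.HTTPError as err:
            if method == "HEAD" and err.code in (403, 405, 501):
                continue  # some servers refuse HEAD; retry with GET
            return False, f"HTTP {err.code}"
        except (OSError, ValueError) as err:
            # URLError, timeouts and connection errors are OSError subclasses;
            # ValueError covers malformed URLs such as a missing scheme.
            return False, type(err).__name__
    return False, "no successful response"
```

To check the whole catalog, apply it to each row's `source_url` and collect the entries where `ok` is false.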


๐Ÿ“š Browse Datasets

See DATASETS.md for the complete verified catalog with:


๐Ÿ“– Literature & Benchmarks


🚀 Quick Start

Clone & Setup

```bash
git clone https://github.com/rudra496/SignLanguage-Dataset-Hub.git
cd SignLanguage-Dataset-Hub
pip install -r requirements.txt
```

Use the Demo Data (Bangla Sign Language Sensor Data)

```python
from scripts.data_loader import BdSLSensorGloveDataset

# Load demo sensor data (4,824 samples)
dataset = BdSLSensorGloveDataset(split='train')
print(f"Loaded {len(dataset)} samples, 36 gesture classes")

sample = dataset[0]
print(f"Gesture: {sample['gesture_name']}")
print(f"Sensors shape: {sample['sensors'].shape}")
```
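The loader takes a `split` argument. How `scripts/data_loader.py` assigns samples to splits is not documented here. The sketch below shows one common approach: a seeded, stratified per-class split of sample indices, using only the standard library. The function name `stratified_split` and the 70/15/15 ratios are illustrative assumptions, not part of the repository's loader. Stratifying keeps each gesture class represented in every split at roughly the same proportion. The local `random.Random(seed)` makes the split reproducible without touching global random state.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, keeping class proportions.

    Hypothetical helper for illustration. `labels` holds one class label per
    sample; the returned dict maps split name to a list of sample indices.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_class, key=str):  # fixed class order for determinism
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        splits["train"].extend(indices[:n_train])
        splits["val"].extend(indices[n_train:n_train + n_val])
        splits["test"].extend(indices[n_train + n_val:])  # remainder goes to test
    return splits
```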

Visualize Sensor Data

```bash
python tools/visualize.py --data data/bdsl/BdSL-Sensor-Glove/
```
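Sometimes a quick look in the terminal is enough, with no plotting library involved. The sketch below renders each sensor channel as a one-line Unicode "sparkline", scaled to that channel's own minimum and maximum. It does not reproduce what `tools/visualize.py` does. It assumes a sample's sensor readings are shaped `(time_steps, channels)`, which is an assumption about the demo data layout. The names `sparkline` and `sensor_sparklines` are hypothetical.

```python
# Eight Unicode block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Render a numeric sequence as a one-line bar chart scaled to its min/max."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat signal: no range to scale
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / (hi - lo) * top)] for v in values)

def sensor_sparklines(frames):
    """Return one sparkline per channel for data shaped (time_steps, channels)."""
    return [sparkline(channel) for channel in zip(*frames)]
```

With a sample from the loader, `print("\n".join(sensor_sparklines(sample["sensors"])))` would print one line per channel, assuming the `(time_steps, channels)` layout holds.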

Browse Programmatically

```python
import pandas as pd
df = pd.read_csv('datasets_catalog.csv')

# Filter by language
asl = df[df['language_code'] == 'ASL']
print(asl[['name', 'samples', 'source_url']])

# Filter by modality (na=False treats rows with a missing modality as non-matches)
video = df[df['modality'].str.contains('Video', na=False)]
print(f"Video datasets: {len(video)}")
```
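If pandas is not installed, the same filtering works with the standard library's `csv` module. This sketch assumes `datasets_catalog.csv` has the `language_code`, `modality`, and `name` columns used in the pandas example. The helper names `load_catalog`, `filter_catalog`, and `count_by_language` are hypothetical. Language codes are matched exactly, and modalities by substring, mirroring `str.contains`.

```python
import csv
from collections import Counter

def load_catalog(path):
    """Read the catalog CSV into a list of dicts, one per dataset row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def filter_catalog(rows, language_code=None, modality=None):
    """Return rows matching an exact language code and/or a modality substring."""
    result = []
    for row in rows:
        if language_code is not None and row.get("language_code") != language_code:
            continue
        if modality is not None and modality not in (row.get("modality") or ""):
            continue
        result.append(row)
    return result

def count_by_language(rows):
    """Count catalog rows per language code, most common first."""
    return Counter(row.get("language_code", "") for row in rows).most_common()
```

For example, `filter_catalog(load_catalog("datasets_catalog.csv"), language_code="ASL", modality="Video")` returns the ASL video rows. `count_by_language` gives a per-language tally that can be compared with the table above.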

Download External Datasets

```bash
# From Kaggle (requires Kaggle API key)
pip install kaggle
kaggle datasets download -d datamunge/sign-language-mnist
kaggle datasets download -d grassknoted/asl-alphabet
kaggle datasets download -d ahmedkhan123/arabic-sign-language

# From Hugging Face
pip install datasets
python -c "from datasets import load_dataset; ds = load_dataset('banglagov/Ban-Sign-Sent-9K-V1')"

# From Zenodo
wget https://zenodo.org/record/7067906/files/BdSL47.zip
```
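If `wget` is not available, the Zenodo step can be done from Python. This is a minimal sketch, and `download` and `safe_extract` are hypothetical helpers, not functions from `scripts/download_datasets.py`. `download` streams the response to disk in chunks, so large archives are never held in memory. `safe_extract` refuses archive members whose paths would escape the target directory before calling `extractall`. `Path.is_relative_to` requires Python 3.9 or newer. If a URL ends in a query string (for example `?download=1`), pass `filename` explicitly.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

def download(url, dest_dir, filename=None, chunk_size=1 << 20):
    """Stream `url` into `dest_dir` in chunks and return the saved file path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / (filename or url.rstrip("/").rsplit("/", 1)[-1])
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return target

def safe_extract(zip_path, dest_dir):
    """Extract a zip archive, rejecting members that would land outside dest_dir."""
    dest_dir = Path(dest_dir).resolve()
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            target = (dest_dir / member).resolve()
            if not target.is_relative_to(dest_dir):
                raise ValueError(f"unsafe path in archive: {member}")
        archive.extractall(dest_dir)
    return dest_dir
```

For the Zenodo example, `safe_extract(download("https://zenodo.org/record/7067906/files/BdSL47.zip", "data/raw"), "data/raw/BdSL47")` would fetch and unpack the archive.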

๐Ÿ› ๏ธ Included Tools

| Tool | Description | Location |
|---|---|---|
| Data Loader | PyTorch dataset classes for sensor data | scripts/data_loader.py |
| Download Script | Multi-source dataset downloader | scripts/download_datasets.py |
| Visualizer | Sensor data visualization | tools/visualize.py |
| Data Generator | Demo data creation utilities | tools/generate_realistic_data.py |

๐Ÿ“ Repository Structure

```
SignLanguage-Dataset-Hub/
├── DATASETS.md              # Complete verified dataset catalog (67 datasets)
├── datasets_catalog.csv     # Machine-readable catalog
├── STATISTICS.md            # Detailed statistics & breakdowns
├── README.md                # This file
├── CHANGELOG.md             # Version history
├── data/
│   └── bdsl/
│       └── BdSL-Sensor-Glove/  # Demo sensor dataset (4,824 samples)
├── docs/
│   ├── BENCHMARKS.md           # Published accuracy numbers & comparisons
│   ├── LICENSE_ATTRIBUTION.md  # Per-dataset license & citation info
│   ├── TUTORIALS.md            # 9 tutorials (beginner to advanced)
│   ├── QUICKSTART.md           # Quick start guide
│   └── CONTRIBUTING.md         # How to contribute
├── scripts/
│   ├── data_loader.py       # PyTorch data loaders
│   └── download_datasets.py # Multi-source downloader
├── tools/
│   ├── visualize.py         # Sensor data visualization
│   └── generate_realistic_data.py  # Data generation
├── .github/                 # Issue templates & PR template
├── CITATION.cff             # Citation metadata
├── LICENSE                  # CC BY 4.0
└── requirements.txt         # Python dependencies
```

๐Ÿ“– Tutorials

We include 9 tutorials from beginner to advanced:

| # | Tutorial | Level |
|---|---|---|
| 1 | Introduction to Sign Language Recognition | Beginner |
| 2 | Loading and Exploring Datasets | Beginner |
| 3 | Visualizing Sign Language Data | Beginner |
| 4 | Building Your First Classifier | Intermediate |
| 5 | Hand Pose Estimation with MediaPipe | Intermediate |
| 6 | Data Augmentation Techniques | Intermediate |
| 7 | Real-time Recognition System | Advanced |
| 8 | Continuous Sign Language Recognition | Advanced |
| 9 | Multilingual Sign Recognition | Advanced |

See docs/TUTORIALS.md and docs/QUICKSTART.md.
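Continuous recognition (tutorial 8) predicts a sequence of glosses from unsegmented video rather than one label per clip. A widely used training objective for this on benchmarks such as RWTH-PHOENIX-2014 is Connectionist Temporal Classification (CTC). Whether tutorial 8 uses CTC is not stated here. The sketch below shows greedy (best-path) CTC decoding. It takes the highest-scoring label in each frame, collapses consecutive repeats, and drops the blank label. It assumes the blank is index 0, which matches the default of PyTorch's `torch.nn.CTCLoss`. The function name `ctc_greedy_decode` is hypothetical.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding (illustrative sketch).

    frame_scores: one list of per-label scores (e.g. log-probabilities) per
    video frame. Returns the decoded label indices with repeats collapsed
    and blanks removed.
    """
    decoded = []
    previous = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)  # per-frame argmax
        # Emit only when the label changes and is not blank; a blank between
        # two identical labels resets `previous`, so genuine repeats survive.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded
```

Per-frame argmax labels `1 1 0 1 2 2 0` decode to `[1, 1, 2]`. A gloss vocabulary list then maps these indices to sign glosses. Beam search usually decodes more accurately than this greedy pass, at a higher cost.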


๐Ÿ“š Citation

If you use this repository, please cite:

```bibtex
@misc{signlanguage_dataset_hub,
  title     = {Sign Language Dataset Hub: A Verified Catalog of Sign Language Datasets},
  author    = {Sarker, Rudra and Contributors},
  year      = {2026},
  url       = {https://github.com/rudra496/SignLanguage-Dataset-Hub}
}
```

Please also cite the original dataset creators when using their data. See docs/LICENSE_ATTRIBUTION.md for per-dataset citation information.


๐Ÿค Contributing

We welcome contributions! See docs/CONTRIBUTING.md.


๐Ÿค” Why This Repo?

| Feature | This Repo | Typical SLR Papers/GitHub Lists | Kaggle Collections |
|---|---|---|---|
| Datasets | 73+ curated | 5–20 mentioned inline | 10–30, unverified |
| Sign Languages | 26 | 1–5 | 3–10 |
| URL Verification | ✅ All checked | ❌ Often broken links | ❌ Mixed |
| Sample Counts | From original sources | Inconsistent | User-reported |
| License Info | ✅ Per dataset | Rarely included | Rarely included |
| Modality Tags | ✅ All datasets | Partial | Tags vary |
| Tools & Scripts | ✅ Included | ❌ | ❌ |
| Demo Datasets | ✅ Included | ❌ | ❌ |
| Open Source | CC BY 4.0 | Varies | Varies |
| Actively Maintained | ✅ | Usually one-time | Community |

Rules:

  1. Every dataset must have a verifiable source URL
  2. Sample counts must come from the original source
  3. Include license and citation information
  4. No placeholder or fabricated data โ€” ever
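Parts of these rules can be checked mechanically before review. The sketch below validates a single catalog row against rule 1 (a well-formed `http`/`https` source URL) and rule 3 (non-empty license and citation fields). Rules 2 and 4 require comparing against the original source, so a human still has to check them. `source_url` matches the column used in the pandas example. The `license` and `citation` column names and the `validate_entry` helper are hypothetical.

```python
from urllib.parse import urlparse

RULE3_FIELDS = ("license", "citation")  # hypothetical column names

def validate_entry(row):
    """Return a list of rule violations for one catalog row (empty if none)."""
    problems = []
    url = (row.get("source_url") or "").strip()
    parsed = urlparse(url)
    # Rule 1: require an absolute http(s) URL with a host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("rule 1: missing or malformed source_url")
    # Rule 3: license and citation information must be present.
    for field in RULE3_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"rule 3: empty {field}")
    return problems
```

This only checks that a URL is well formed, not that it is reachable. Reachability needs a separate network check.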

๐Ÿ“– Citation

If you use this dataset catalog in your research, please cite:

```bibtex
@misc{signlanguage_dataset_hub,
  title   = {Sign Language Dataset Hub: A Curated Catalog of 73+ Verified Datasets for 26 Sign Languages},
  author  = {Sarker, Rudra},
  year    = {2025},
  url     = {https://github.com/rudra496/SignLanguage-Dataset-Hub},
  note    = {Version 1.0}
}
```

๐Ÿ“„ License

This repository is licensed under CC BY 4.0.

Individual datasets have their own licenses โ€” see docs/LICENSE_ATTRIBUTION.md for details. Some datasets are research-use only and may require institutional agreements.


๐Ÿ™ Acknowledgments

This hub would not be possible without the researchers and organizations who created and shared these datasets:


๐Ÿ“ž Contact


๐ŸŒ Connect

GitHub LinkedIn X Facebook YouTube Dev.to ResearchGate


Built with โค๏ธ by Rudra Sarker
CC BY 4.0 License ยท Free & Open Source Forever