LJ Speech Dataset

Overview

LJ Speech is a public domain speech dataset released by Keith Ito in 2017 that remains one of the most widely used benchmarks for single-speaker neural text-to-speech (TTS) models in 2026. It consists of 13,100 short audio clips of a single female speaker reading passages from seven non-fiction books. The collection provides approximately 24 hours of audio at 22,050 Hz in 16-bit mono PCM, and a pipe-delimited metadata.csv file pairs each clip with its original and normalized transcription. Its main value is as a controlled variable: because the recording conditions and speaker characteristics are consistent, researchers can attribute differences in output quality to the model rather than the data, which is why it is a common test bed for architectures such as Tacotron 2, FastSpeech, and HiFi-GAN. In 2026 it also serves as a baseline for zero-shot cross-lingual transfer learning and as a pre-training corpus for more complex multi-speaker generative models. Its public domain (CC0) status makes it one of the most legally frictionless datasets for commercial and academic AI development.
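As a rough illustration of working with the transcription file, the sketch below parses rows laid out as clip ID, original transcription, and normalized transcription, separated by pipe characters. The function name `read_metadata`, the dictionary keys, and the sample rows are assumptions made for this example and are not taken from the dataset.

```python
import csv
import io


def read_metadata(lines):
    """Parse LJ Speech-style metadata rows into dictionaries.

    Assumes three pipe-separated fields per row: clip ID, original
    transcription, normalized transcription. Key names are illustrative.
    """
    # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != 3:
            continue  # skip malformed rows rather than guess at their layout
        clip_id, raw, normalized = fields
        rows.append({"id": clip_id, "raw": raw, "normalized": normalized})
    return rows


# Made-up sample rows for illustration; not copied from the dataset.
sample = io.StringIO(
    'CLIP-0001|He paid $5 for it.|He paid five dollars for it.\n'
    'CLIP-0002|She said "hello" twice.|She said "hello" twice.\n'
)
rows = read_metadata(sample)
print(rows[0]["normalized"])
```

To run this against the real file, pass an open file handle instead of the in-memory sample, for example `open("metadata.csv", encoding="utf-8", newline="")`, assuming the archive has been extracted into the working directory.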

Common tasks

Acoustic Model Training
Vocoder Benchmarking
Automatic Speech Recognition (ASR) Training
Prosody Transfer Research

FAQ

Can I use LJSpeech for commercial products?

Yes, it is in the public domain (CC0), meaning you can use it for any commercial purpose without paying royalties or providing attribution.

What is the sampling rate of the audio?

The audio is sampled at 22,050 Hz, a common rate in neural TTS research.
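One way to confirm the format of a clip is to read its WAV header with Python's standard `wave` module. The sketch below is a minimal example: the helper name `wav_format` is hypothetical, and it writes a one-second silent file to a temporary directory so that it runs without the dataset present.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (sample rate in Hz, bits per sample, channel count) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


# Write one second of 16-bit mono silence at 22,050 Hz as a stand-in clip.
tmp_path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(tmp_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
    wav.setframerate(22050)
    wav.writeframes(b"\x00\x00" * 22050)

rate, bits, channels = wav_format(tmp_path)
print(rate, bits, channels)  # prints: 22050 16 1
```

Pointing `wav_format` at any extracted clip should report the same three values if the file matches the format described above.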

Is the dataset balanced for all phonemes?

Not entirely. It covers standard English well, but its phoneme distribution follows the text of the non-fiction books read, so some rare phonemes may be underrepresented.
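A simple way to check phoneme coverage is to map each transcript word to phonemes with a pronunciation dictionary and count the results. The sketch below shows the counting step only. The three-word `toy_lexicon` and the function name `phoneme_counts` are hypothetical stand-ins; a real analysis would need a full dictionary such as CMUdict and the dataset's normalized transcriptions.

```python
from collections import Counter


def phoneme_counts(transcripts, lexicon):
    """Count phoneme occurrences across transcripts using a word-to-phonemes lexicon."""
    counts = Counter()
    for text in transcripts:
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
            # Words missing from the lexicon are skipped in this sketch.
            counts.update(lexicon.get(word, []))
    return counts


# Toy lexicon for illustration only; not a real pronunciation dictionary.
toy_lexicon = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}

counts = phoneme_counts(["The cat sat."], toy_lexicon)
# Sort ascending by count so the least frequent phonemes come first.
rarest_first = sorted(counts.items(), key=lambda item: item[1])
print(rarest_first)
```

Phonemes that appear near the front of `rarest_first` on the full corpus are the ones a model trained on this data has seen least often.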

How large is the download?

The compressed archive is approximately 2.6 GB; once extracted, it takes up roughly 3.8 GB of disk space.
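The extracted size can be sanity-checked from the audio format alone, since uncompressed PCM takes a fixed number of bytes per second. The sketch below uses the approximate 24-hour duration together with the 22,050 Hz, 16-bit mono format. The function name `pcm_size_bytes` is hypothetical, and the estimate ignores WAV headers and the metadata file, which are small by comparison.

```python
def pcm_size_bytes(hours, sample_rate_hz, bits_per_sample, channels):
    """Estimate the size of uncompressed PCM audio in bytes."""
    seconds = hours * 3600
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return seconds * bytes_per_second


# Approximately 24 hours of 22,050 Hz, 16-bit, mono audio.
size = pcm_size_bytes(24, 22050, 16, 1)
print(f"{size / 1e9:.2f} GB")  # prints: 3.81 GB
```

The result agrees with the roughly 3.8 GB extracted size quoted above.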
