Overview
LJ Speech is a public domain speech dataset released by Keith Ito in 2017 that, as of 2026, remains a de facto standard benchmark for evaluating single-speaker neural text-to-speech (TTS) models. It consists of 13,100 short audio clips of a single female speaker reading passages from seven non-fiction books. The collection provides approximately 24 hours of audio recorded at 22,050 Hz as 16-bit mono PCM, accompanied by both normalized and non-normalized transcriptions in a pipe-delimited CSV file. Its significance for TTS research lies in its role as a control variable: because the recording environment and speaker characteristics are consistent, researchers can use it to isolate the performance of new architectures such as Tacotron 2, FastSpeech, and HiFi-GAN. In 2026 it also serves as a baseline for zero-shot cross-lingual transfer learning and as a pre-training corpus for more complex multi-speaker generative models. Its public domain (CC0) status keeps legal friction low for both commercial and academic AI development.
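The format figures above can be turned into a quick sanity check. The following is a minimal sketch in Python, not an official loader. It assumes each metadata row has three pipe-separated fields (clip ID, raw transcription, normalized transcription) with no header row. The helper names (parse_metadata, matches_expected_format, make_silent_clip) and the sample rows are hypothetical, and the clip is synthetic silence so that the sketch runs without downloading the dataset.

```python
import csv
import io
import wave

# Figures stated in the overview.
NUM_CLIPS = 13_100
TOTAL_HOURS = 24          # approximate
SAMPLE_RATE = 22_050      # Hz
SAMPLE_WIDTH = 2          # bytes per sample (16-bit PCM)
CHANNELS = 1              # mono


def parse_metadata(text):
    """Parse pipe-delimited rows of: clip ID | raw text | normalized text.

    Assumption: three fields per row and no header. QUOTE_NONE keeps any
    double quotes inside a transcription as literal characters.
    """
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    return [
        {"id": row[0], "text": row[1], "normalized": row[2]}
        for row in reader
        if len(row) == 3
    ]


def matches_expected_format(wav_file):
    """Return True if a WAV (path or file object) is 22,050 Hz, 16-bit, mono."""
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == SAMPLE_RATE
            and wav.getsampwidth() == SAMPLE_WIDTH
            and wav.getnchannels() == CHANNELS
        )


def make_silent_clip(seconds=1.0):
    """Build an in-memory stand-in clip (silence) in the stated format."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * int(SAMPLE_RATE * seconds))
    buffer.seek(0)
    return buffer


# Back-of-the-envelope numbers implied by the stated figures.
avg_clip_seconds = TOTAL_HOURS * 3600 / NUM_CLIPS
raw_pcm_bytes = TOTAL_HOURS * 3600 * SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS

# Hypothetical sample rows, not taken from the real metadata file.
sample = (
    "X0001|He paid $5.|He paid five dollars.\n"
    'X0002|She said "no".|She said "no".\n'
)
rows = parse_metadata(sample)

print(f"average clip length: {avg_clip_seconds:.1f} s")
print(f"uncompressed PCM size: {raw_pcm_bytes / 1e9:.2f} GB")
print(f"parsed rows: {len(rows)}")
print(f"synthetic clip matches format: {matches_expected_format(make_silent_clip())}")
```

Taken at face value, the stated figures imply an average clip length of roughly 6.6 seconds and about 3.8 GB of raw PCM audio before any compression. Both numbers are derived from the approximate 24-hour total, so they are estimates and not measured values.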
