
Lalals
AI-powered audio tools for music creation, voice manipulation, and audio enhancement.

Professional-grade polyphonic piano transcription with high-fidelity onset and velocity detection.

Onsets and Frames is a state-of-the-art automatic music transcription (AMT) model developed by the Google Magenta team. It directly addresses the onset/offset problem in polyphonic piano transcription: by using separate prediction heads for the beginning of each note (onsets) and its sustain (frames), it achieves significantly higher precision than traditional frame-based classifiers, and it effectively mitigates the common failure mode in which a single long note is fragmented into several short ones. The model also regresses note velocity, allowing it to capture the expressive dynamics of a performance.

Architecturally, it combines Convolutional Neural Networks (CNNs) for feature extraction with recurrent networks (bidirectional LSTMs, or Transformers in newer iterations) for temporal modeling, and it remains a widely used benchmark for piano transcription. It is primarily distributed via the Magenta library and TensorFlow, making it a favorite among developers building DAW plugins, music-education platforms, and digital archival tools that require high-accuracy conversion of acoustic audio into editable MIDI data.
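The onset-gated decoding idea described above can be sketched in a few lines of plain Python: a note may only begin where the onset head fires, and frame activations alone merely sustain an already-open note. This is an illustrative sketch of the gating rule, not Magenta's actual inference code (which also handles re-articulated onsets and per-head thresholds).

```python
import numpy as np

def decode_notes(onset_probs, frame_probs, threshold=0.5):
    """Decode (pitch, start_frame, end_frame) notes from model outputs.

    onset_probs, frame_probs: arrays of shape (num_frames, num_pitches).
    A note may only BEGIN on a frame where the onset head fires; frame
    activations alone merely sustain an already-active note. This gating
    is what prevents long notes from fragmenting into short ones.
    """
    onsets = onset_probs >= threshold
    frames = frame_probs >= threshold
    num_frames, num_pitches = frames.shape
    notes = []
    for p in range(num_pitches):
        start = None
        for t in range(num_frames):
            active = frames[t, p] or onsets[t, p]
            if start is None and onsets[t, p]:
                start = t                       # onset head opens the note
            elif start is not None and not active:
                notes.append((p, start, t))     # note ends when frames stop
                start = None
        if start is not None:
            notes.append((p, start, num_frames))
    return notes
```

For example, a pitch whose frame head is active for several frames but whose onset head never fires produces no note at all, while an onset followed by sustained frames yields exactly one note spanning the whole activation.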
Onsets and Frames is purpose-built for three closely related tasks: polyphonic piano transcription, note velocity estimation, and MIDI score generation. This narrow domain focus is what lets it deliver optimized results for each.
Uses a dedicated loss term for the start of notes to prevent temporal blurring.
Predicts a MIDI velocity value (0-127) for every detected note onset.
While optimized for piano, the architecture can be re-trained for drums and other percussive instruments.
Supports GAN-based training cycles to improve realism in low-quality audio conditions.
Combines 2D Convolutions with bidirectional LSTMs for spatial-temporal accuracy.
Processes audio as log-mel spectrograms with high frequency resolution.
Weights can be quantized and exported for mobile and browser-based real-time transcription.
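The log-mel front end mentioned above can be sketched with nothing but numpy. The parameters below (16 kHz input, 229 mel bins) follow the published Onsets and Frames configuration, but the filterbank construction, FFT size, and hop length here are illustrative assumptions, not Magenta's actual preprocessing code.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers spaced evenly on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=2048, hop=512, n_mels=229):
    # Frame the signal, window it, and take the magnitude FFT.
    n_frames = 1 + (len(audio) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))           # (frames, bins)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T     # (frames, n_mels)
    return np.log(mel + 1e-6)                           # log compression
```

One second of 16 kHz audio then yields a (28, 229) feature matrix, which is the kind of time-frequency input the CNN stack consumes.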
Install Python 3.10+ environment.
Install TensorFlow and Magenta library via pip.
Download the pre-trained 'onsets_frames_transcription' checkpoints from Google Cloud Storage.
Prepare a high-quality 16kHz mono WAV file of a piano performance.
Configure the transcription script parameters (threshold, frame-stacking).
Run the inference command-line tool on the target audio file.
Analyze the generated .mid file in a DAW or MIDI visualizer.
Optional: Fine-tune the model on custom datasets using the provided training scripts.
Export the model to TFLite for edge-device deployment if required.
Integrate the Python wrapper into your application backend.
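To make the output side of the pipeline concrete, here is a stdlib-only sketch that serializes decoded (pitch, velocity, start_tick, end_tick) notes into a minimal single-track Standard MIDI File, the same kind of .mid artifact step 6 asks you to inspect in a DAW. It illustrates the file format only; it is not Magenta's exporter, and the tick resolution is an arbitrary choice.

```python
import struct

def _varlen(n):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def write_midi(path, notes, ticks_per_beat=480):
    """notes: list of (pitch, velocity, start_tick, end_tick) tuples."""
    # Flatten notes into absolute-time note-on / note-off events.
    events = []
    for pitch, vel, start, end in notes:
        events.append((start, 0x90, pitch, vel))   # note on, channel 0
        events.append((end, 0x80, pitch, 0))       # note off
    # Sorting puts a note-off before a simultaneous note-on (0x80 < 0x90).
    events.sort()
    track = b""
    prev = 0
    for tick, status, pitch, vel in events:
        track += _varlen(tick - prev) + bytes([status, pitch, vel])
        prev = tick
    track += b"\x00\xff\x2f\x00"                   # end-of-track meta event
    header = struct.pack(">4sIHHH", b"MThd", 6, 0, 1, ticks_per_beat)
    with open(path, "wb") as f:
        f.write(header + struct.pack(">4sI", b"MTrk", len(track)) + track)
```

Feeding it the notes decoded from the model (with velocities already scaled to the MIDI 0-127 range) produces a file any DAW or MIDI visualizer will open.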
“Highly praised by the research community for its breakthrough in note-on precision and expressive velocity capture.”
