
A widely used open-source interface for running, training, and deploying local Large Language Models.

Oobabooga Text Generation WebUI is a highly flexible Gradio-based interface designed to serve as a hub for local LLM operations. It lets users run models ranging from small Llama variants to very large architectures made feasible on consumer hardware by quantization. Technically, it functions as a wrapper around multiple inference engines, including Transformers, llama.cpp, ExLlamaV2, AutoGPTQ, and AutoAWQ.

Its architecture is modular, with an extension ecosystem that adds multimodal capabilities, speech-to-text, and long-term memory management. By decoupling the UI from the inference engine, it provides a unified control plane for parameter tuning (temperature, top-p, repetition penalty) and for injecting custom system prompts and character profiles. For enterprises, it serves as a prototyping environment for evaluating model performance before committing to cloud-scale deployments, with no data leaving the machine when operated in air-gapped or purely local environments.
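When the server is started with its API enabled (the `--api` flag), the WebUI exposes an OpenAI-compatible endpoint that external tools can call, which is how the "unified control plane" for sampling parameters looks from a client's side. The sketch below only constructs such a request; the URL, port, and parameter values are the project defaults at the time of writing and may differ in your installation.

```python
import json

# Hypothetical request to the WebUI's OpenAI-compatible API (port 5000 is
# the default but configurable). We only build the request body here --
# actually sending it requires a running server.
URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Explain GBNF grammars briefly."}],
    "temperature": 0.7,         # sampling randomness
    "top_p": 0.9,               # nucleus-sampling cutoff
    "repetition_penalty": 1.15, # discourages repeated tokens
    "max_tokens": 128,
}

body = json.dumps(payload)      # ready to POST with any HTTP client
```

Any HTTP client (curl, requests, a LangChain integration) can then POST `body` to `URL` to run inference against the locally loaded model.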
Oobabooga Text Generation WebUI specializes in several domains: local LLM inference, LoRA/QLoRA fine-tuning, character roleplay, API hosting, and quantization testing.
Multi-backend support: loads models through Transformers, llama.cpp, ExLlamaV2, AutoGPTQ, AutoAWQ, and Hidet, selectable per model.
LoRA training: includes a built-in UI for training Low-Rank Adaptations (LoRA), including QLoRA on quantized base models.
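Conceptually, LoRA freezes the base weight matrix W and learns a low-rank update B·A, so only a small number of parameters train. A minimal pure-Python sketch with toy 3×3 weights and rank 1 (all numbers made up for illustration):

```python
# Conceptual LoRA sketch (toy values, rank r=1, dimension d=3). The frozen
# base matrix W stays untouched; only the small factors A (r x d) and
# B (d x r) would be trained, giving W_eff = W + (alpha / r) * B @ A.

def matmul(X, Y):
    # Naive matrix multiply for small lists-of-lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r, alpha = 3, 1, 2.0
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]            # frozen base weights
A = [[0.1, 0.2, 0.3]]            # r x d trainable factor (made-up values)
B = [[0.5], [0.0], [0.0]]        # d x r trainable factor (made-up values)

delta = matmul(B, A)             # d x d update with rank <= r
W_eff = [[w + (alpha / r) * dw for w, dw in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
# Only the 6 numbers in A and B train instead of the 9 entries of W.
```

At real model scale the same idea is what makes fine-tuning feasible on a single GPU: the trainable parameter count scales with the rank r, not with the full weight dimensions.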
Speculative decoding: uses a smaller draft model to propose tokens, which the larger target model then verifies, accepting agreeing prefixes to reduce latency.
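The draft-and-verify loop can be illustrated with a toy sketch. Both "models" here are hypothetical stand-ins (next-letter predictors), not real LLMs; real engines compare token probability distributions rather than exact strings:

```python
# Toy speculative-decoding loop with hypothetical stand-in "models".

def draft_model(context):
    # Cheap model: always predicts the next letter of the alphabet.
    return chr(ord(context[-1]) + 1)

def target_model(context):
    # Expensive model: agrees with the draft except right after "c".
    return "x" if context[-1] == "c" else chr(ord(context[-1]) + 1)

def speculative_step(context, k=4):
    # 1) The draft model proposes k tokens autoregressively (cheap).
    proposal, ctx = [], context
    for _ in range(k):
        tok = draft_model(ctx)
        proposal.append(tok)
        ctx += tok
    # 2) The target model verifies the proposals: the agreeing prefix is
    #    accepted "for free", and the first mismatch is replaced by the
    #    target's own token.
    ctx = context
    for tok in proposal:
        expected = target_model(ctx)
        ctx += expected
        if tok != expected:
            break
    return ctx

# speculative_step("a") drafts "bcde"; the target accepts "bc" and
# overrides the mismatch with "x", yielding "abcx".
```

The payoff is that the expensive target model runs one verification pass per k drafted tokens instead of one full pass per token.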
Grammar-constrained output: forces the LLM to emit specific formats (such as valid JSON) using GBNF grammars.
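For example, a minimal grammar in llama.cpp's GBNF syntax that constrains output to a one-field JSON object might look like the sketch below; the rule names and the allowed character set are arbitrary choices for illustration:

```gbnf
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 ]* "\""
ws     ::= [ \t\n]*
```

During sampling, tokens that would violate the grammar are masked out, so the model can only produce strings the grammar accepts.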
Extensions: a plugin system through which the community adds features such as speech-to-text, text-to-speech, and long-term memory.
Multimodal loading: can load a primary language model alongside a secondary encoder or vision model (e.g., CLIP).
Notebook mode: a non-chat interface designed for creative writing and long-form content generation.
Install Python 3.11+ and Git for your operating system.
Clone the official repository from GitHub using the 'git clone' command.
Run the 'start_linux.sh', 'start_windows.bat', or 'start_macos.sh' script to initiate the automated installer.
Select your GPU manufacturer (NVIDIA, AMD, Apple Silicon, or CPU-only) when prompted.
Wait for the environment setup to complete, which installs the necessary Torch and CUDA libraries.
Access the WebUI via the provided local URL (typically http://127.0.0.1:7860).
Navigate to the 'Model' tab and paste a Hugging Face repository ID to download a model.
Select the appropriate loader (e.g., ExLlamaV2 or llama.cpp) based on the model format.
Click 'Load' to move the model into VRAM/RAM.
Navigate to the 'Chat' or 'Default' tab to begin inference.
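On Linux, the steps above condense to the following commands (the repository URL and script names follow the official project layout; the Windows and macOS scripts are analogous):

```shell
# Clone the official repository and run the one-shot installer/launcher.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh   # prompts for your GPU backend, installs deps, starts the UI
# Then open http://127.0.0.1:7860 and download a model from the Model tab.
```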
Verified feedback from other users.
“Users praise its versatility and the ability to run almost any LLM locally, though some find the UI density overwhelming.”