Overview
ControlNet is a neural network architecture that adds conditional control over image generation to pretrained diffusion models. It works by keeping a 'locked' copy of the model's network blocks, which preserves the original weights, alongside a 'trainable' copy that learns a task-specific condition. This allows fine-tuning on comparatively small datasets without degrading the pretrained diffusion model. The trainable copy is connected through 'zero convolutions': 1x1 convolution layers whose weights and biases are initialized to zero, so the control branch contributes nothing at the start of training and cannot distort the pretrained model's outputs. Because ControlNet reuses the Stable Diffusion encoder as a deep, robust backbone for the trainable copy, it can learn diverse controls while remaining memory-efficient enough to train on personal devices. When integrated with Stable Diffusion, typical conditioning signals include edge maps from edge detection, line art, and human pose estimates.
