Overview
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings for finding the points in a dataset that are closest to a given query point. It builds large, read-only, file-based index structures that are memory-mapped, so multiple processes can share the same data. Annoy supports Euclidean, Manhattan, cosine (called "angular" in Annoy), Hamming, and dot (inner) product distances.

Annoy is aimed at high-dimensional data such as the vector representations of users and items in recommendation systems. It works best with fewer than about 100 dimensions but still performs well up to about 1,000. Because index creation is decoupled from loading, an index can be built once, saved to a file, and then shared or distributed to other processes and machines. Annoy keeps its memory footprint small, which makes it suitable for applications where memory usage is a prime concern, and it can build an index directly on disk to handle datasets that do not fit into memory.

At Spotify, Annoy is used for music recommendations: matrix factorization assigns every user and item a vector, and Annoy finds similar users or items by searching for nearby vectors.
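To make the distances and the build/save/load workflow concrete, here is a minimal sketch in Python. The helper names (`euclidean`, `angular`, `exact_knn`, `annoy_knn`, and so on) are hypothetical and exist only for illustration. The brute-force functions compute the exact answer that Annoy approximates. For `angular`, the sketch uses the Euclidean distance between normalized vectors, which is how Annoy defines its cosine-based metric. The `annoy_knn` helper uses the `annoy` package's `AnnoyIndex` class (`add_item`, `build`, `save`, `load`, `get_nns_by_vector`, `unload`). It runs only if that package is installed and otherwise returns `None`.

```python
import math
import os
import tempfile


# Brute-force reference versions of the supported distances.
# Assumption: vectors are equal-length lists of numbers; angular() also
# assumes neither vector is all zeros.
def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def angular(u, v):
    # Euclidean distance between the normalized vectors: sqrt(2 * (1 - cos)).
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))  # clamp rounding error


def hamming(u, v):
    # Number of positions where the (binary) vectors differ.
    return sum(1 for a, b in zip(u, v) if a != b)


def neg_dot(u, v):
    # Inner-product search wants the LARGEST u.v, so rank by its negation.
    return -sum(a * b for a, b in zip(u, v))


def exact_knn(items, query, k, dist):
    """Indices of the k items closest to `query`: the exact answer Annoy approximates."""
    order = sorted(range(len(items)), key=lambda i: dist(items[i], query))
    return order[:k]


def annoy_knn(items, query, k, metric="angular", n_trees=10):
    """Build, save, reload and query an Annoy index; None if annoy is not installed."""
    try:
        from annoy import AnnoyIndex
    except ImportError:
        return None
    dim = len(items[0])
    builder = AnnoyIndex(dim, metric)
    for i, vec in enumerate(items):
        builder.add_item(i, vec)
    builder.build(n_trees)  # more trees: more accurate results, larger index
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "items.ann")
        builder.save(path)  # after saving, no more items can be added
        builder.unload()
        # Loading is separate from building; another process could do this part.
        reader = AnnoyIndex(dim, metric)
        reader.load(path)  # memory-maps the file instead of copying it
        result = reader.get_nns_by_vector(query, k)
        reader.unload()  # release the mapping before the directory is removed
    return result


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    query = [1.0, 0.05]
    print("exact:", exact_knn(items, query, 2, angular))
    print("annoy:", annoy_knn(items, query, 2))
```

Comparing `annoy_knn` against `exact_knn` on a sample of queries is a simple way to measure how often the approximate results match the exact ones for a given `n_trees` setting. The brute-force scan costs O(n) distance computations per query, which is the cost an Annoy index is built to avoid.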
