The previous essay in this series highlighted how vector search is essential in many applications that require precise and fast responses.
Vector search methods sometimes perform poorly due to memory and computation strain caused by high vector dimensionality.
Intel's LeanVec uses dimensionality reduction and vector quantization to speed up vector search on large vectors while maintaining accuracy on out-of-distribution queries.
This capability allows applications to search massive vector collections for semantically meaningful results by finding the closest neighbors to a query vector.
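As a point of reference for what this search computes, here is a minimal brute-force sketch (not LeanVec itself) that ranks database vectors by inner-product similarity to a query; the function name `top_k_neighbors` and the random data are illustrative assumptions.

```python
import numpy as np

def top_k_neighbors(database: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k database vectors with the largest inner product with the query."""
    scores = database @ query          # one inner product per database vector, shape (n,)
    return np.argsort(-scores)[:k]     # indices of the k highest-scoring vectors

rng = np.random.default_rng(0)
db = rng.standard_normal((100_000, 512)).astype(np.float32)   # stand-in embedding collection
q = rng.standard_normal(512).astype(np.float32)               # stand-in query embedding
print(top_k_neighbors(db, q, k=5))
```

Computing every inner product exactly like this is what becomes too expensive at scale, which is the cost LeanVec's compression is designed to cut.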
Queries are out-of-distribution (OOD) when the statistical distributions of the database and query vectors diverge, which makes vector compression harder.
The first is cross-modal querying, where a user queries one modality to retrieve relevant elements from another; in text2image search, for instance, text queries are used to find thematically similar images.
A query-agnostic method like PCA would project the database (𝒳) and query (Q) vectors onto the first principal axis of the database (the large green arrow in the figure).
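To make the query-agnostic behavior concrete, the sketch below fits PCA on the database vectors alone and applies the same projection to both database and query vectors; the target dimensionality and the synthetic data are assumptions, and this is an illustration rather than the figure's actual setup.

```python
import numpy as np

def pca_project(database: np.ndarray, queries: np.ndarray, d: int):
    """Fit PCA on the database only (query-agnostic) and project both sets of vectors."""
    mean = database.mean(axis=0)
    # Principal axes come from the database alone; the queries play no role in the fit.
    _, _, vt = np.linalg.svd(database - mean, full_matrices=False)
    axes = vt[:d]                                # top-d principal directions, shape (d, D)
    return (database - mean) @ axes.T, (queries - mean) @ axes.T

rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 128)).astype(np.float32)   # database vectors
Q = rng.standard_normal((100, 128)).astype(np.float32)      # query vectors, possibly OOD
X_low, Q_low = pca_project(X, Q, d=32)
print(X_low.shape, Q_low.shape)                             # (10000, 32) (100, 32)
```

Because the axes are chosen from the database distribution only, directions that matter for OOD queries can be discarded, which is the weakness LeanVec's query-aware reduction addresses.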
To speed up similarity search for deep learning embedding vectors, LeanVec approximates the inner product of a database vector x and a query q.
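Written out, and borrowing the DR_query and DR_DB names used in the next sentence (the exact notation is an assumption), the approximation replaces one high-dimensional inner product with a low-dimensional one:

```latex
% Illustrative form of the approximation; LeanVec additionally quantizes the
% reduced database vectors, which is not shown in this expression.
\[
  \langle q, x \rangle \;\approx\;
  \bigl\langle \mathrm{DR}_{\mathrm{query}}(q),\, \mathrm{DR}_{\mathrm{DB}}(x) \bigr\rangle,
  \qquad
  \mathrm{DR}_{\mathrm{query}}, \mathrm{DR}_{\mathrm{DB}} : \mathbb{R}^{D} \to \mathbb{R}^{d},\ d < D .
\]
```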
LeanVec learns DR_query and DR_DB from data using novel mathematical optimization methodologies.
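The sketch below shows how two such projections would be used once learned: the database is reduced at index-build time, the query is reduced at search time, and the low-dimensional scores produce a shortlist for exact re-ranking. A_query and B_db are random placeholders standing in for DR_query and DR_DB, not the output of LeanVec's actual optimization, and the quantization step LeanVec applies to the reduced database vectors is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 512, 128
A_query = rng.standard_normal((d, D)).astype(np.float32)    # placeholder for DR_query
B_db = rng.standard_normal((d, D)).astype(np.float32)       # placeholder for DR_DB

database = rng.standard_normal((50_000, D)).astype(np.float32)
query = rng.standard_normal(D).astype(np.float32)

# Index-build time: reduce every database vector once.
database_low = database @ B_db.T                             # shape (n, d)

# Search time: reduce the query once, then score the whole collection in d dimensions.
query_low = A_query @ query                                  # shape (d,)
approx_scores = database_low @ query_low                     # approximate inner products
candidates = np.argsort(-approx_scores)[:100]                # shortlist of likely neighbors

# Re-rank the shortlist with exact full-dimensional inner products.
exact_scores = database[candidates] @ query
top10 = candidates[np.argsort(-exact_scores)[:10]]
print(top10)
```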
LeanVec improves SVS performance, exceeding that of HNSWlib, the leading open-source implementation of a top-performing algorithm.