Vector Databases
collection: like a table in SQL; each has its own vector dimension and indexes
point: a row/record that stores a vector plus its payload
payload: metadata attached to a vector
filtering: SQL-style conditions on payload
indexing: organizing vectors for fast search
vectors
// A sparse vector with 4 non-zero elements
{
"indexes": [1, 3, 5, 7],
"values": [0.1, 0.2, 0.3, 0.4]
}
Sparse vectors are mostly zeros, so only the non-zero indexes and values are kept. Qdrant stores them separately in a specialized storage.
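To see why the indexes/values layout is enough for search, here is a minimal sketch of a dot product between two sparse vectors in that layout. The helper name `sparse_dot` and the query values are made up for illustration; this is not Qdrant's internal implementation.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors given as {"indexes": [...], "values": [...]}."""
    # Only dimensions present in both vectors contribute; everything else is zero.
    b_lookup = dict(zip(b["indexes"], b["values"]))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a["indexes"], a["values"]))


doc = {"indexes": [1, 3, 5, 7], "values": [0.1, 0.2, 0.3, 0.4]}
query = {"indexes": [3, 7, 9], "values": [1.0, 0.5, 2.0]}  # hypothetical query vector

score = sparse_dot(query, doc)  # only dimensions 3 and 7 overlap
```

The cost depends on the number of non-zero entries, not on the full dimensionality.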
payload types
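ACORN (ANN Constraint-Optimized Retrieval Network) search algorithm: a filter-aware extension of HNSW search rather than a separate index. When many of a node's neighbors fail the filter, it also explores neighbors of neighbors, which improves recall under restrictive filters at the cost of slower queries than plain HNSW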
we can also group results by document_id via the groups API
without grouping
1. chunk 5 of doc A
2. chunk 8 of doc A
3. chunk 3 of doc A
4. chunk 12 of doc A
5. chunk 1 of doc A
LLM gets nearly identical contexts
with grouping by doc_id
1. best chunk from doc A
2. best chunk from doc B
3. best chunk from doc C
4. best chunk from doc D
5. best chunk from doc E
LLM gets diverse knowledge
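The effect of grouping can be sketched in plain Python. The function `group_hits` and the sample hits are hypothetical; the parameter names `group_by`, `limit` and `group_size` are chosen to resemble the groups API, but this only illustrates the idea and is not how Qdrant does it internally.

```python
def group_hits(hits, group_by="doc_id", limit=5, group_size=1):
    """Keep the best `group_size` hits per group, for at most `limit` groups."""
    groups = {}
    # Walk hits from best to worst so each group fills with its top hits first.
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        key = hit["payload"][group_by]
        if key not in groups:
            if len(groups) == limit:
                continue  # enough groups already, skip new ones
            groups[key] = []
        if len(groups[key]) < group_size:
            groups[key].append(hit)
    return groups


# Made-up search results: doc A dominates the raw ranking.
hits = [
    {"id": 1, "score": 0.95, "payload": {"doc_id": "A"}},
    {"id": 2, "score": 0.93, "payload": {"doc_id": "A"}},
    {"id": 3, "score": 0.90, "payload": {"doc_id": "B"}},
    {"id": 4, "score": 0.88, "payload": {"doc_id": "A"}},
    {"id": 5, "score": 0.70, "payload": {"doc_id": "C"}},
]

grouped = group_hits(hits)
best_per_doc = [group[0]["id"] for group in grouped.values()]
```

With `group_size=1` each document contributes only its best chunk, so the context passed to the LLM covers several documents.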
Storing the same document metadata on millions of chunks wastes storage. Instead, keep the shared metadata once in a separate collection and reference it from group queries to save space. This is called lookup in groups.
don't index if you have fewer than 5k points
create payload indexes (lookup tables) on metadata fields to get faster results for each query type
cardinality = how many points will match my filter
if cardinality is 10 → brute force
if 1000 → brute force
if 100,000 → ANN
if 10,000,000 → ACORN + heavy indexing
Recommendations API
Qdrant can also produce recommendations using one of several built-in strategies (average_vector, best_score, sum_scores)
positive vectors = things the user has liked
negative vectors = things the user dislikes
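As a rough sketch of the average_vector strategy: the positives and negatives are each averaged, and the query is pushed away from the negative average. The formula used here (avg_positive + (avg_positive - avg_negative)) is my understanding of how Qdrant documents it; the function names and example vectors are made up.

```python
def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def average_vector_query(positives, negatives):
    """Build a single query vector from liked and disliked examples."""
    avg_pos = mean(positives)
    if not negatives:
        return avg_pos
    avg_neg = mean(negatives)
    # Assumed formula: move from the positive average further away from the negatives.
    return [p + (p - n) for p, n in zip(avg_pos, avg_neg)]


liked = [[1.0, 0.0], [0.8, 0.2]]   # things the user liked
disliked = [[0.0, 1.0]]            # things the user disliked

query = average_vector_query(liked, disliked)  # roughly [1.8, -0.8]
```

The resulting vector is then used for an ordinary nearest-neighbor search. best_score and sum_scores instead score each candidate against every example individually.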
Discovery API
concept of context: a set of positive/negative pairs; each pair divides the space into a positive zone and a negative zone
discovery search: takes context pairs and a target vector
target vector = what you ultimately want to find
context pairs = positive/negative pairs that constrain where to look
context search: you don't provide a target, just context pairs; returns points that lie in the best positive zones
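One way to picture the zones: a candidate is in a pair's positive zone when it is more similar to the positive than to the negative. The sketch below ranks candidates by how many pairs they satisfy and uses similarity to the target only as a tie-breaker. It is an illustration of the idea with made-up names and dot-product similarity, not Qdrant's exact scoring function.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def zones_satisfied(candidate, context_pairs):
    """Count the pairs for which the candidate is more similar to the positive."""
    return sum(
        1 for positive, negative in context_pairs
        if dot(candidate, positive) > dot(candidate, negative)
    )


def discovery_rank(candidates, context_pairs, target):
    """Order by zones satisfied first, then by similarity to the target."""
    return sorted(
        candidates,
        key=lambda c: (zones_satisfied(c, context_pairs), dot(c, target)),
        reverse=True,
    )


pairs = [([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [-1.0, -1.0])]
candidates = [[0.2, 0.9], [-0.5, -0.5], [0.9, 0.1]]
target = [0.0, 1.0]

ranked = discovery_rank(candidates, pairs, target)
```

Here `[0.2, 0.9]` is closest to the target, but `[0.9, 0.1]` ranks first because it satisfies both context pairs. Dropping the target from the sort key gives a rough picture of context search.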
Hybrid search: uses dense and sparse retrieval to get the most relevant documents
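The notes do not say how the dense and sparse result lists are merged. One common option is Reciprocal Rank Fusion (RRF), sketched here as an assumption; the function name, the constant `k=60`, and the document ids are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


dense_results = ["a", "b", "c"]    # hypothetical ids from dense retrieval
sparse_results = ["c", "a", "d"]   # hypothetical ids from sparse retrieval

fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

RRF only looks at ranks, so the dense and sparse scores do not need to be on the same scale.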
Maximal Marginal Relevance (MMR): without MMR, the top-k results can be nearly identical to each other. For more diverse results, enable MMR and tune the diversity parameter in [0, 1] to trade off relevance against diversity
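A minimal sketch of greedy MMR selection, assuming cosine similarity and assuming that diversity = 0 means pure relevance and diversity = 1 means pure diversity. The function names and vectors are made up; Qdrant's own implementation may differ in detail.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k, diversity=0.5):
    """Greedily pick k candidate indexes, penalizing similarity to already picked ones."""
    remaining = list(range(len(candidates)))
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already selected result (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return (1 - diversity) * relevance - diversity * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]  # index 1 nearly duplicates index 0

relevance_only = mmr(query, candidates, k=2, diversity=0.0)
diverse = mmr(query, candidates, k=2, diversity=0.7)
```

With `diversity=0.0` the near-duplicate at index 1 is picked second. With `diversity=0.7` it is skipped in favor of the less similar index 2.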
after fetching results, we can boost their scores based on payload (metadata) conditions using the score boosting API
BM25 (Best Matching 25) is a ranking function for text search. It uses sparse vectors to represent documents, where each dimension corresponds to a word. Qdrant can generate these sparse embeddings from input text directly on the server
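For intuition, here is a small self-contained BM25 scorer. It uses whitespace tokenization and the common defaults `k1=1.2`, `b=0.75`; both are assumptions for this sketch, and Qdrant's server-side tokenization and parameters may differ. The documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with the classic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "qdrant stores dense and sparse vectors",
    "bm25 ranks text with sparse vectors",
    "cats sleep all day",
]
scores = bm25_scores("bm25 sparse", docs)
```

The second document matches both query terms and scores highest, and the third matches none and scores zero. In the sparse-vector view, each word is a dimension and these per-term weights are the non-zero values.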
optimizer: some data structures, such as indexes, are expensive to update one change at a time. If changes are applied too often, database operations slow down. The optimizer is Qdrant's way of batching changes and cleaning up segments to maintain performance.
Indexing: process of building a data structure over vectors to quickly find nearest neighbors without scanning every vector
Tokenizer: a tokenizer is an algorithm that splits text into smaller pieces called tokens so Qdrant can index and search efficiently
factors affecting tokenization:
Embedding generation pipeline
a. Tokenization
Step: Split the text into tokens and map each token to an ID (special start/end tokens included)
Output: Sequence of token IDs
"I love AI"
→ [101, 1045, 2293, 9932, 102]
b. Token Embedding Lookup
Step: Each token ID is mapped to a vector from the embedding table
Output: Sequence of vectors
(5 tokens) → (5, D)
e.g. → (5, 384) or (5, 3072)
c. Positional Encoding Addition
Step: Add position information so order matters
Output: Position-aware token embeddings
(5, 384) → (5, 384) with order info injected
d. Transformer Layers (Contextualization)
Step: Self-attention mixes information across tokens
Output: Context-aware token embeddings
(5, 384) → (5, 384) but now each token “understands” others
e. Pooling (Sentence Compression)
Step: Combine token embeddings into one vector
Output: Single sentence embedding
(5, 384) → (384,)
f. (Optional) Normalization
Step: Normalize vector length (L2 norm usually)
Output: Final embedding used for similarity search
(384,) → normalized (384,)
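The last two steps can be sketched with toy numbers. Mean pooling is assumed here as the pooling method (models may instead use the first token or another scheme), and the 5 x 4 matrix stands in for real (5, 384) context-aware token embeddings.

```python
import math


def mean_pool(token_embeddings):
    """(tokens, D) -> (D,): average each dimension across all tokens."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


def l2_normalize(vector):
    """Scale the vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Made-up context-aware token embeddings: 5 tokens, D = 4.
token_embeddings = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.3, 0.3],
    [0.0, 0.4, 0.2, 0.1],
    [0.2, 0.2, 0.2, 0.2],
]

sentence = mean_pool(token_embeddings)   # (5, 4) -> (4,)
embedding = l2_normalize(sentence)       # (4,) -> unit-length (4,)
```

After normalization, dot product and cosine similarity give the same ranking, which is why this step is common before similarity search.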