Vector Databases

May 11, 2026 | Vector Databases | 7 min read

// A sparse vector with 4 non-zero elements
{
    "indexes": [1, 3, 5, 7],
    "values": [0.1, 0.2, 0.3, 0.4]
}
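
Storing only the non-zero positions keeps scoring cheap: a dot product only needs the indexes both vectors share. Here is a minimal Python sketch of that idea, assuming the same "indexes"/"values" layout as above. The function name sparse_dot and the query vector are made up for illustration and are not any database's API.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as parallel index/value lists."""
    # Build an index -> value lookup for b, then walk a's non-zero entries.
    b_lookup = dict(zip(b["indexes"], b["values"]))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a["indexes"], a["values"]))


doc = {"indexes": [1, 3, 5, 7], "values": [0.1, 0.2, 0.3, 0.4]}
query = {"indexes": [3, 7, 9], "values": [1.0, 0.5, 2.0]}  # hypothetical query

# Only indexes 3 and 7 overlap: 1.0*0.2 + 0.5*0.4 = 0.4
score = sparse_dot(query, doc)
```

Index 9 appears only in the query, so it contributes nothing to the score.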

without grouping
1. chunk 5 of doc A
2. chunk 8 of doc A
3. chunk 3 of doc A
4. chunk 12 of doc A
5. chunk 1 of doc A

with grouping by doc_id
1. best chunk from doc A
2. best chunk from doc B
3. best chunk from doc C
4. best chunk from doc D
5. best chunk from doc E
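
To make the difference concrete, here is a rough client-side sketch of grouping: keep the best-scoring chunk per doc_id, then rank the groups. A database with a grouping feature would normally do this on the server. The group_by_doc helper, the hit layout, and the scores are all assumptions for illustration.

```python
def group_by_doc(hits, limit=5):
    """Keep only the highest-scoring hit per doc_id, best groups first."""
    best = {}
    for hit in hits:
        current = best.get(hit["doc_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["doc_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:limit]


# Hypothetical scored chunks; a higher score means a closer match.
hits = [
    {"doc_id": "A", "chunk": 5, "score": 0.95},
    {"doc_id": "A", "chunk": 8, "score": 0.93},
    {"doc_id": "B", "chunk": 2, "score": 0.90},
    {"doc_id": "A", "chunk": 3, "score": 0.89},
    {"doc_id": "C", "chunk": 7, "score": 0.85},
]

grouped = group_by_doc(hits)  # one entry each for docs A, B, C
```

Without grouping, three of the five slots go to doc A. With grouping, each document takes at most one slot.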

"I love AI"
→ [101, 1045, 2293, 9932, 102]
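
The mapping above can be mimicked with a toy lookup table. This is only a sketch: it assumes the first and last IDs are start and end markers (written here as [CLS] and [SEP]) and that the three middle IDs stand for the lowercased words. A real tokenizer has a far larger vocabulary and splits unknown words into subword pieces.

```python
# Toy vocabulary; the token strings paired with each ID are assumptions.
VOCAB = {"[CLS]": 101, "[SEP]": 102, "i": 1045, "love": 2293, "ai": 9932}


def encode(text):
    """Lowercase, split on whitespace, and wrap with start/end marker IDs."""
    words = text.lower().split()
    return [VOCAB["[CLS]"]] + [VOCAB[w] for w in words] + [VOCAB["[SEP]"]]


token_ids = encode("I love AI")
```

Three words become five tokens because of the two markers, which is where the 5 in the shapes that follow comes from.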

(5 tokens) → (5, D)
e.g. D = 384 → (5, 384), or D = 3072 → (5, 3072)

(5, 384) → (5, 384) with order info injected
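
One common way to inject order is to add a fixed sinusoidal position table to the token vectors. The sketch below assumes that scheme. Many models use learned position embeddings instead, and the function names here are illustrative.

```python
import math


def positional_encoding(seq_len, dim):
    """Sinusoidal position table of shape (seq_len, dim)."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(token_vectors):
    """Element-wise sum of token vectors and position rows; shape is unchanged."""
    seq_len, dim = len(token_vectors), len(token_vectors[0])
    pe = positional_encoding(seq_len, dim)
    return [[x + p for x, p in zip(vec, row)] for vec, row in zip(token_vectors, pe)]


tokens = [[0.0] * 384 for _ in range(5)]  # placeholder (5, 384) embeddings
with_order = add_positions(tokens)        # still (5, 384)
```

The five input rows are identical, but the output rows differ, so the model can now tell position 0 from position 4.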

(5, 384) → (5, 384) but now each token “understands” others
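
The mixing step is self-attention. Here is a stripped-down sketch with a single head and no learned weights, so queries, keys, and values are all the input itself. Real models add learned projections, several heads, and many stacked layers, so treat this as an illustration of the shape-preserving mixing only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product attention with Q = K = V = x."""
    dim = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in x]
        weights = softmax(scores)
        # New vector = weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(dim)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # tiny (3, 2) stand-in for (5, 384)
mixed = self_attention(x)                 # same shape, rows now blended
```

Every output row is a weighted average of all input rows, so each token's vector now carries information from the others.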

(5, 384) → (384,) pooled into one vector

(384,) → normalized (384,)
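
The last two steps, collapsing the token axis and normalizing, fit in a few lines. The sketch assumes mean pooling, although some models take the first token's vector instead, followed by L2 normalization. The helper names are illustrative.

```python
import math


def mean_pool(token_vectors):
    """Average over the token axis: (T, D) -> (D,)."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def l2_normalize(vec):
    """Scale to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


tokens = [[3.0, 0.0], [3.0, 8.0]]  # tiny (2, 2) stand-in for (5, 384)
pooled = mean_pool(tokens)         # [3.0, 4.0]
unit = l2_normalize(pooled)        # [0.6, 0.8], length 1
```

Once vectors have unit length, a dot product between two of them equals their cosine similarity, which is why many pipelines normalize before indexing.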

Embedding generation pipeline