What Is a Vector Database — And Why Every AI Engineer Should Understand It
If you're building with LLMs and you don't understand vector databases yet…
You're building on magic you don't control.
Let's fix that.
If you haven't read it yet — What Is an Embedding in AI? is a good starting point before this one.
First: What Are We Even Storing?
When you build a RAG pipeline, a lot happens under the hood that most tutorials just gloss over:
- Text gets converted into embeddings
- Embeddings get stored somewhere
- Your query gets converted into an embedding too
- Similar vectors get retrieved
- That context gets sent to the LLM
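The whole loop above fits in a few lines of toy Python. Here, a hashed bag-of-words stands in for a real embedding model and a plain list stands in for the vector store — both are deliberately simplified placeholders, not production code:

```python
import math

def toy_embed(text, dim=8):
    """Toy stand-in for a real embedding model: hashed bag-of-words, normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # vectors are already unit-length

# 1. Text gets converted into embeddings and stored
docs = ["vector databases store embeddings",
        "relational databases store rows",
        "llms generate text from context"]
store = [(doc, toy_embed(doc)) for doc in docs]

# 2. The query gets embedded too, and the most similar vector is retrieved
query_vec = toy_embed("where do embeddings live")
ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# 3. The top match becomes the context sent to the LLM
context = ranked[0][0]
```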
But here's a question too few people stop to ask — where do embeddings actually live?
Not in a traditional database, at least not the way you'd think. Embeddings aren't strings, labels, or IDs.
They're high-dimensional vectors:
[0.021, -0.88, 0.445, ..., 0.19] // 1536 dimensions
That's not something a SQL index was designed for.
Why Traditional Databases Fall Short
Relational databases are incredibly good at what they do:
- Exact matches
- Range queries
- Indexed lookups on structured data
SELECT * FROM users WHERE email = 'np@example.com'
But in AI systems, we're not asking "find an exact match." We're asking — "find semantically similar content."
That's a geometric problem. Not a relational one.
And no amount of clever indexing on a Postgres table is going to make that fast at scale.
Enter: Vector Databases
A vector database is purpose-built to:
- Store embeddings efficiently
- Index high-dimensional vectors
- Perform fast similarity search
- Retrieve the nearest neighbors to a given query
Instead of WHERE name = 'AI', it does:
Find the top-k closest vectors to this embedding.
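That contract is easy to sketch with a brute-force scan. Real vector databases replace the linear scan with an index, but the interface is the same — the function names here are illustrative:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.05], vectors, k=2))  # → [0, 1]: the two closest in angle
```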
And that "closeness" is measured using things like:
- Cosine similarity — angle between vectors
- Dot product — directional alignment
- Euclidean distance — straight-line distance in vector space
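The three measures, side by side in plain Python:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_similarity(a, b))   # 0.0 — orthogonal, no shared direction
print(dot(a, b))                 # 0.0
print(euclidean_distance(a, b))  # 1.4142… — sqrt(2) apart in the plane
```

One practical note: on normalized (unit-length) vectors, cosine similarity and dot product are identical, and Euclidean distance ranks results the same way — which is why many embedding models and vector stores normalize vectors at write time.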
This is what powers semantic search, RAG pipelines, recommendation systems, and context-aware AI assistants. Basically, everything interesting in modern AI.
What's Actually Happening Under the Hood?
You can't brute-force compare millions of vectors on every query — that would be painfully slow.
So vector databases use techniques like:
- Approximate Nearest Neighbor (ANN) search
- HNSW graphs (Hierarchical Navigable Small World)
- Optimized indexing structures built specifically for high-dimensional space
Think of it this way: instead of scanning the entire vector space, the database quickly zooms into the most promising region. That's why retrieval feels instant even with millions of embeddings stored.
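That "zoom into the most promising region" idea can be illustrated with a toy inverted-file (IVF-style) index: partition vectors into buckets around a few centroids, then scan only the bucket nearest the query. This is a deliberately simplified illustration of the ANN principle, not how HNSW itself works:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy dataset clustered around two hand-picked "centroids"
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.1, 0.2], [0.3, -0.1], [9.8, 10.1], [10.2, 9.9]]

# Indexing: assign each vector to the bucket of its nearest centroid
buckets = {0: [], 1: []}
for i, v in enumerate(vectors):
    nearest = min(range(len(centroids)), key=lambda c: euclidean(v, centroids[c]))
    buckets[nearest].append(i)

def ann_search(query):
    """Scan only the bucket nearest the query, not the whole dataset."""
    bucket = min(range(len(centroids)), key=lambda c: euclidean(query, centroids[c]))
    return min(buckets[bucket], key=lambda i: euclidean(query, vectors[i]))

print(ann_search([9.5, 9.5]))  # → 2: found by scanning 2 vectors instead of 4
```

The trade-off is baked into the name: the search is *approximate*. If the true nearest neighbor sits in a bucket you skipped, you miss it — which is exactly the kind of retrieval failure worth knowing how to diagnose.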
It's clever engineering under what feels like magic.
Why This Matters for AI Engineers
Here's where it gets real. If you don't understand vector search:
- You won't know why your RAG is hallucinating
- You won't know how to tune similarity thresholds
- You won't know when retrieval is weak vs. when the LLM is the problem
- You won't design proper chunking strategies for your knowledge base
Vector databases aren't just storage. They're the backbone of contextual intelligence in LLM systems. Get them wrong and your AI is just a confident guesser with no memory.
The Bottom Line
LLMs generate. Vector databases retrieve. Together, they simulate something that feels like understanding.
Separately, they're just math.
If you're building AI systems in 2026, understanding vector search isn't optional — it's foundational. The engineers who really get this layer are the ones who can debug retrieval failures, tune relevance, and build systems that don't just work but work reliably.
That's the kind of AI engineering I'm interested in.
Next up — I wrote about chunking strategies and why getting this wrong quietly kills your RAG quality. Read it here →
— Cheers, NP