I’ve been struggling for the past few weeks to understand what vector databases are, and why they’re such a popular buzzword in AI.
Every article, video, or course I’ve found on vector databases has left me confused after the first few sentences. There’s a lot of jargon like “vectors”, “embeddings”, “RAG”, “semantic search”, etc., and it’s hard to understand vector databases without first knowing what these are.
Let’s start at the beginning.
Typical (Non-Vector) Databases
A typical database is either an SQL database or a NoSQL database. SQL databases store tabular data, which looks similar to an Excel spreadsheet. Each table has rows and columns with data inside, like this list of contacts:
| Name  | Email           | Age |
|-------|-----------------|-----|
| Janac | janac@gmail.com | 30  |
| John  | john@gmail.com  | 20  |
| Juan  | juan@gmail.com  | 41  |
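To make the idea of rows and columns concrete, here is a minimal sketch that loads the contacts table into SQLite using Python's built-in `sqlite3` module and asks it a question. The table name `contacts` and the helper functions `build_contacts_db` and `names_older_than` are my own choices for illustration; they don't come from any particular product.

```python
import sqlite3


def build_contacts_db():
    """Create an in-memory SQLite database holding the contacts table."""
    conn = sqlite3.connect(":memory:")
    # Columns mirror the table above: Name, Email, Age.
    conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO contacts (name, email, age) VALUES (?, ?, ?)",
        [
            ("Janac", "janac@gmail.com", 30),
            ("John", "john@gmail.com", 20),
            ("Juan", "juan@gmail.com", 41),
        ],
    )
    conn.commit()
    return conn


def names_older_than(conn, min_age):
    """Return the names of contacts whose age is above min_age, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM contacts WHERE age > ? ORDER BY name", (min_age,)
    ).fetchall()
    return [name for (name,) in rows]


conn = build_contacts_db()
print(names_older_than(conn, 25))  # ['Janac', 'Juan']
```

The key point is that an SQL database answers questions by matching exact values in columns, such as "age greater than 25".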
Many NoSQL databases, particularly document databases, store data in a JSON-like format. Here’s the equivalent of the above table in JSON:
[
  {
    "name": "Janac",
    "email": "janac@gmail.com",
    "age": 30
  },
  {
    "name": "John",
    "email": "john@gmail.com",
    "age": 20
  },
  {
    "name": "Juan",
    "email": "juan@gmail.com",
    "age": 41
  }
]
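As a rough sketch of how document-style data gets queried, here are the same contacts parsed with Python's built-in `json` module and filtered by age. This is not a real NoSQL database, just a plain Python list standing in for one, and the helper name `emails_older_than` is made up for this example.

```python
import json

# The same three contacts, written as a JSON string (keys must be double-quoted).
CONTACTS_JSON = """
[
  {"name": "Janac", "email": "janac@gmail.com", "age": 30},
  {"name": "John", "email": "john@gmail.com", "age": 20},
  {"name": "Juan", "email": "juan@gmail.com", "age": 41}
]
"""


def emails_older_than(contacts, min_age):
    """Return the emails of contacts whose age is above min_age, in list order."""
    return [c["email"] for c in contacts if c["age"] > min_age]


# json.loads turns the JSON text into a list of Python dictionaries.
contacts = json.loads(CONTACTS_JSON)
print(emails_older_than(contacts, 25))  # ['janac@gmail.com', 'juan@gmail.com']
```

Each contact is a self-contained document with named fields, rather than a row that has to fit a fixed set of columns.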
There are dozens of other, more specialized types of databases that are each optimized for a specific purpose, such as time-series databases (kdb+, for example), graph databases, and geospatial (GIS) databases, but we won’t be covering those in this…