Create a Semantic Search API with FastAPI, Sentence-BERT, and PostgreSQL pgvector
In this post, I’ll walk you through how I built a semantic similarity search API using FastAPI, Sentence-BERT (SBERT), and PostgreSQL with the pgvector extension.
This project began as a proof of concept (POC) to explore how seamlessly deep learning–based text embeddings can be integrated into a traditional relational database for fast, contextual search, without introducing an external vector database or adding unnecessary complexity to the existing tech stack. Repo: kiransabne04/fastapi-sbert-pgvector-similarity, a FastAPI-based text similarity and semantic search API using Sentence-BERT (all-MiniLM-L6-v2) with PostgreSQL + pgvector for vector storage and similarity matching.
Why Semantic Search?
Traditional keyword search only matches exact terms.
Semantic search, on the other hand, understands context — for example, the phrases:
“How do I reset my password?”
and
“Forgot my login credentials.”
mean the same thing, even though they use completely different words.
That’s what Sentence-BERT (SBERT) enables — it transforms sentences into numerical embeddings that capture semantic meaning.
Once we have those embeddings, we can use vector similarity (like cosine similarity) to find text with similar meaning.
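As a quick illustration, cosine similarity is just the dot product of two vectors divided by the product of their magnitudes. A minimal pure-Python sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [1.0, 0.0, 1.0]
v2 = [2.0, 0.0, 2.0]   # same direction as v1, different magnitude
v3 = [0.0, 1.0, 0.0]   # orthogonal to v1

print(cosine_similarity(v1, v2))  # ≈ 1.0 (same direction)
print(cosine_similarity(v1, v3))  # ≈ 0.0 (unrelated)
```

Because the measure depends only on direction, not magnitude, two embeddings pointing the same way score 1.0 regardless of their lengths.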
High-Level Overview
Here’s the high-level flow of the system we built:
FastAPI serves REST endpoints for inserting and searching text.
Sentence-BERT generates a 384-dimensional vector for each text.
PostgreSQL with pgvector stores these embeddings and performs fast similarity queries using vector math.
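To make this flow concrete, here is a minimal in-memory sketch of the same pipeline. Note that `fake_embed` is a toy stand-in for SBERT and all names here are illustrative, not the repo's actual code:

```python
import math

def fake_embed(text: str) -> list[float]:
    # Toy stand-in for SBERT: a 4-dimensional "embedding" from vowel counts,
    # normalized to unit length so a dot product acts as cosine similarity.
    vec = [float(text.count(c)) for c in "aeio"]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

store: list[tuple[str, list[float]]] = []  # plays the role of the items table

def insert_item(text: str) -> None:
    store.append((text, fake_embed(text)))

def find_similar(query: str, top_n: int = 5) -> list[tuple[str, float]]:
    q = fake_embed(query)
    scored = [(t, sum(a * b for a, b in zip(q, v))) for t, v in store]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_n]

insert_item("reset my password")
insert_item("weather forecast")
print(find_similar("forgot my password", top_n=1))
```

In the real system, SBERT replaces `fake_embed` and PostgreSQL replaces the `store` list, but the insert/search shape is the same.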
Tech Stack
| Component | Description |
| --- | --- |
| FastAPI | Web framework for the API |
| Sentence-Transformers | Generates text embeddings (SBERT) |
| PostgreSQL 15 + pgvector | Stores embeddings and runs similarity search |
| Uvicorn | ASGI server for FastAPI |
| Docker Compose | Spins up Postgres with vector support |
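For reference, a minimal docker-compose sketch for the Postgres piece could look like the following (the image tag, credentials, and database name are illustrative; pgvector publishes prebuilt images under pgvector/pgvector):

```yaml
services:
  db:
    image: pgvector/pgvector:pg15   # Postgres 15 with pgvector preinstalled
    environment:
      POSTGRES_USER: app            # illustrative credentials
      POSTGRES_PASSWORD: app
      POSTGRES_DB: semantic_search
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```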
Model used: `all-MiniLM-L6-v2`, which is lightweight, accurate, and great for quick experimentation.
Here’s how the key pieces fit together.
Sentence-BERT Embeddings
Using Hugging Face’s sentence-transformers library, each text is transformed into a 384-dimensional embedding vector:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

text = "A forgotten attic filled with old memories"
embedding = model.encode(text)

print(len(embedding))  # 384
```
PostgreSQL + pgvector
To store and search these embeddings efficiently, we enable the pgvector extension.
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT NOT NULL,
    embedding VECTOR(384),
    created_at TIMESTAMP DEFAULT NOW()
);
```
We then create a vector index to speed up similarity queries:
```sql
CREATE INDEX items_embedding_idx
ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
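On the application side, an embedding has to be serialized into pgvector's text input format (a bracketed, comma-separated list) before insertion. A small helper shows the idea; the commented psycopg2 call is a sketch of how it would be used, not the repo's exact code:

```python
def to_pgvector_literal(embedding) -> str:
    # pgvector's text input format is "[x1,x2,...]", castable with ::vector
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Sketch of a parameterized insert with psycopg2, cast server-side:
#   cur.execute(
#       "INSERT INTO items (title, description, embedding) "
#       "VALUES (%s, %s, %s::vector)",
#       (title, description, to_pgvector_literal(embedding)),
#   )

print(to_pgvector_literal([0.1, 0.2]))  # [0.1,0.2]
```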
FastAPI Endpoints
/insert_item
Accepts multiple text items and inserts their embeddings into PostgreSQL.
```python
@app.post("/insert_item")
def insert_items(request: ItemRequest):
    # Encode all descriptions in a single batch call (faster than one by one)
    embeddings = embedder.encode([i.description for i in request.item_requests])
    # ... insert each (title, description, embedding) row into PostgreSQL as a vector
```
/find_similar
Finds the top-N most similar descriptions by comparing embeddings with pgvector's cosine distance operator `<=>` (pgvector also provides other distance operators worth exploring, such as `<->` for Euclidean distance and `<#>` for negative inner product). Ordering by the raw distance expression, ascending, lets PostgreSQL use the ivfflat index:

```sql
SELECT title, description, 1 - (embedding <=> %s::vector) AS similarity
FROM items
ORDER BY embedding <=> %s::vector
LIMIT 5;
```
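Since `<=>` returns cosine distance, the similarity score and percentage are simple transforms of it. A tiny helper (hypothetical names) makes the conversion explicit:

```python
def distance_to_similarity(distance: float) -> tuple[float, float]:
    # pgvector's <=> operator returns cosine distance; similarity = 1 - distance
    similarity = 1.0 - distance
    return similarity, round(similarity * 100, 2)

print(distance_to_similarity(0.1358))  # ≈ (0.8642, 86.42)
```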
The result: a list of text items with semantic similarity scores.
Example in Action
Input Text
“The smell of aged paper and leather in a quiet bookstore.”
Output
```json
{
  "similar_description": [
    {
      "title": "A vintage bookstore",
      "description_text": "The bookstore smelled of aged paper and leather...",
      "similarity": 0.86,
      "similarity_percent": 86.42
    },
    {
      "title": "A forgotten attic",
      "description_text": "The air in the attic hung heavy with the scent of forgotten things...",
      "similarity": 0.74,
      "similarity_percent": 74.10
    }
  ]
}
```
That’s semantic similarity in action. You can further improve or optimize it for your requirements, for example with additional pre-processing before embedding.
Alternatives to SBERT
A few other alternatives:
intfloat/e5-large-v2
nomic-ai/nomic-embed-text-v1
OpenAI Embeddings (text-embedding-3-large)
Cohere Embeddings (embed-multilingual-v3.0)
Sentence-T5 or Universal Sentence Encoder (USE)
Thoughts:
This little POC, with proper architecture and implementation, has worked wonders for one of our use cases. With just a few hundred lines of Python and SQL, you can build a real semantic search engine, with no external AI infrastructure required.
If you’re exploring NLP, information retrieval, or vector databases, this is one of the best starting points to build on.