Create a Semantic Search API with FastAPI, Sentence-BERT, and PostgreSQL pgvector
In this post, I’ll walk you through how I built a semantic similarity search API using FastAPI, Sentence-BERT (SBERT), and PostgreSQL with the pgvector extension.
This project began as a proof of concept (POC) to explore how seamlessly deep learning–based text embeddings can be integrated into a traditional relational database for fast, contextual search, without introducing an external vector database or adding unnecessary complexity to the existing tech stack. Repo: kiransabne04/fastapi-sbert-pgvector-similarity, a FastAPI-based text similarity and semantic search API using Sentence-BERT (all-MiniLM-L6-v2) with PostgreSQL + pgvector for vector storage and similarity matching.
Why Semantic Search?
Traditional keyword search only matches exact terms.
Semantic search, on the other hand, understands context — for example, the phrases:
“How do I reset my password?”
and
“Forgot my login credentials.”
mean the same thing, even though they use completely different words.
That’s what Sentence-BERT (SBERT) enables — it transforms sentences into numerical embeddings that capture semantic meaning.
Once we have those embeddings, we can use vector similarity (like cosine similarity) to find text with similar meaning.
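As a quick illustration, cosine similarity is just the dot product of two vectors divided by the product of their magnitudes. A minimal pure-Python sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [1.0, 0.0, 1.0]
v2 = [2.0, 0.0, 2.0]   # same direction as v1, different magnitude
v3 = [0.0, 1.0, 0.0]   # orthogonal to v1

print(cosine_similarity(v1, v2))  # ≈ 1.0 (same direction)
print(cosine_similarity(v1, v3))  # ≈ 0.0 (unrelated)
```

Because the measure depends only on direction, not magnitude, two embeddings pointing the same way score 1.0 regardless of their lengths.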
High-Level Overview
Here’s the high-level flow of the system we built:
FastAPI serves REST endpoints for inserting and searching text.
Sentence-BERT generates a 384-dimensional vector for each text.
PostgreSQL with pgvector stores these embeddings and performs fast similarity queries using vector math.
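To make this flow concrete, here is a minimal in-memory sketch of the same pipeline. Note that `fake_embed` is a toy stand-in for SBERT and all names here are illustrative, not the repo's actual code:

```python
import math

def fake_embed(text: str) -> list[float]:
    # Toy stand-in for SBERT: a 4-dimensional "embedding" from vowel counts,
    # normalized to unit length so a dot product acts as cosine similarity.
    vec = [float(text.count(c)) for c in "aeio"]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

store: list[tuple[str, list[float]]] = []  # plays the role of the items table

def insert_item(text: str) -> None:
    store.append((text, fake_embed(text)))

def find_similar(query: str, top_n: int = 5) -> list[tuple[str, float]]:
    q = fake_embed(query)
    scored = [(t, sum(a * b for a, b in zip(q, v))) for t, v in store]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_n]

insert_item("reset my password")
insert_item("weather forecast")
print(find_similar("forgot my password", top_n=1))
```

In the real system, SBERT replaces `fake_embed` and PostgreSQL replaces the `store` list, but the insert/search shape is the same.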
Tech Stack
| Component | Description |
| --- | --- |
| FastAPI | Web framework for the API |
| Sentence-Transformers | Generates text embeddings (SBERT) |
| PostgreSQL 15 + pgvector | Stores embeddings and runs similarity search |
| Uvicorn | ASGI server for FastAPI |
| Docker Compose | Spins up Postgres with vector support |
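For reference, a minimal docker-compose sketch for the Postgres piece could look like the following (the image tag, credentials, and database name are illustrative; pgvector publishes prebuilt images under pgvector/pgvector):

```yaml
services:
  db:
    image: pgvector/pgvector:pg15   # Postgres 15 with pgvector preinstalled
    environment:
      POSTGRES_USER: app            # illustrative credentials
      POSTGRES_PASSWORD: app
      POSTGRES_DB: semantic_search
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```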
Model used: `all-MiniLM-L6-v2`, which is lightweight, accurate, and great for quick experimentation.
Here’s how the key pieces fit together.
Sentence-BERT Embeddings
Using Hugging Face’s sentence-transformers library, each text is transformed into a 384-dimensional embedding vector:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

text = "A forgotten attic filled with old memories"
embedding = model.encode(text)

print(len(embedding))  # 384
```
PostgreSQL + pgvector
To store and search these embeddings efficiently, we enable the pgvector extension.
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT NOT NULL,
    embedding VECTOR(384),
    created_at TIMESTAMP DEFAULT NOW()
);
```
We then create a vector index to speed up similarity queries:
```sql
CREATE INDEX items_embedding_idx
ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
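On the application side, an embedding has to be serialized into pgvector's text input format (a bracketed, comma-separated list) before insertion. A small helper shows the idea; the commented psycopg2 call is a sketch of how it would be used, not the repo's exact code:

```python
def to_pgvector_literal(embedding) -> str:
    # pgvector's text input format is "[x1,x2,...]", castable with ::vector
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Sketch of a parameterized insert with psycopg2, cast server-side:
#   cur.execute(
#       "INSERT INTO items (title, description, embedding) "
#       "VALUES (%s, %s, %s::vector)",
#       (title, description, to_pgvector_literal(embedding)),
#   )

print(to_pgvector_literal([0.1, 0.2]))  # [0.1,0.2]
```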
FastAPI Endpoints
/insert_item
Accepts multiple text items and inserts their embeddings into PostgreSQL.
```python
@app.post("/insert_item")
def insert_items(request: ItemRequest):
    # Encode all descriptions in a single batch call (faster than one by one)
    embeddings = embedder.encode([i.description for i in request.item_requests])
    # ... insert each (title, description, embedding) row into PostgreSQL as a vector
```
/find_similar
Finds the top-N most similar descriptions by comparing embeddings with pgvector's cosine distance operator `<=>` (pgvector also provides other distance operators worth exploring, such as `<->` for Euclidean distance and `<#>` for negative inner product). Ordering by the raw distance expression, ascending, lets PostgreSQL use the ivfflat index:

```sql
SELECT title, description, 1 - (embedding <=> %s::vector) AS similarity
FROM items
ORDER BY embedding <=> %s::vector
LIMIT 5;
```
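Since `<=>` returns cosine distance, the similarity score and percentage are simple transforms of it. A tiny helper (hypothetical names) makes the conversion explicit:

```python
def distance_to_similarity(distance: float) -> tuple[float, float]:
    # pgvector's <=> operator returns cosine distance; similarity = 1 - distance
    similarity = 1.0 - distance
    return similarity, round(similarity * 100, 2)

print(distance_to_similarity(0.1358))  # ≈ (0.8642, 86.42)
```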
The result: a list of text items with semantic similarity scores.
Example in Action
Input Text
“The smell of aged paper and leather in a quiet bookstore.”
Output
```json
{
  "similar_description": [
    {
      "title": "A vintage bookstore",
      "description_text": "The bookstore smelled of aged paper and leather...",
      "similarity": 0.86,
      "similarity_percent": 86.42
    },
    {
      "title": "A forgotten attic",
      "description_text": "The air in the attic hung heavy with the scent of forgotten things...",
      "similarity": 0.74,
      "similarity_percent": 74.10
    }
  ]
}
```
That’s semantic similarity in action. You can further improve or optimize it for your requirements, for example with additional pre-processing before embedding.
Alternatives to SBERT
A few other alternatives:
intfloat/e5-large-v2
nomic-ai/nomic-embed-text-v1
OpenAI Embeddings (text-embedding-3-large)
Cohere Embeddings (embed-multilingual-v3.0)
Sentence-T5 or Universal Sentence Encoder (USE)
Thoughts:
This little POC, with proper architecture and implementation, has worked wonders for one of our use cases. With just a few hundred lines of Python and SQL, you can build a real semantic search engine, with no external AI infrastructure required.
If you’re exploring NLP, information retrieval, or vector databases, this is one of the best starting points to build on.