
PostgreSQL Parameter Tuning for 100M+ Rows


When you’re running PostgreSQL with 100M+ rows, the difference between “it works” and “it flies” is all about tuning—but not the generic, copy-paste advice you get everywhere. Here’s the advanced, battle-tested checklist I’d give to serious DBAs and developers working at scale.

The Hidden Checklist You Actually Need

1. Understand Your Workload First

Not a setting, but non-negotiable.
Ask yourself:

  • OLTP or OLAP?

  • Read-heavy or write-heavy?

  • Bulk inserts? Lots of small transactions?

  • Real-time queries or batch processing?

Every parameter depends on this.
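Before touching a single setting, let the database tell you what kind of workload it actually runs. A rough read/write profile can be pulled from `pg_stat_database` (these are cumulative counters, so call `pg_stat_reset()` first if you want to sample a specific window):

```sql
-- Rough read vs. write profile for the current database.
-- Ratios matter more than absolute numbers here.
SELECT datname,
       tup_returned + tup_fetched               AS reads,
       tup_inserted + tup_updated + tup_deleted AS writes
FROM pg_stat_database
WHERE datname = current_database();
```

A heavily lopsided ratio is your first clue whether to tune for scans (OLAP) or for transaction throughput (OLTP).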

2. shared_buffers – But Don’t Max It Out

Common advice: set shared_buffers = 25% of RAM
Better:
  • Heavy on joins and large scans: go higher (30–40%)
  • Many connections and small transactions: stay conservative (15–20%)
  • Using a connection pooler: you can afford a bit more here

Rule of Thumb: benchmark at 25%, then test up/down in 5% steps.
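One quick signal for whether a bigger `shared_buffers` is even worth testing is the buffer cache hit ratio, available from `pg_stat_database`:

```sql
-- Buffer cache hit ratio; if this sits well below ~0.99 on a hot
-- working set, a larger shared_buffers may help (benchmark to confirm).
SELECT blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```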

3. work_mem – The Secret Weapon for Big Joins

This controls sort/hash memory per operation, not per query, so a single query can use several multiples of it.
The default (4MB) is tiny. Too tiny.

Tune it dynamically:

SET work_mem = '256MB';  -- per session

But for config:

work_mem = 64MB   # OLAP
work_mem = 8MB    # OLTP (typically 4–16MB)

Use EXPLAIN ANALYZE → look for “Sort Method: external merge Disk” in sorts, or “Batches: N” with N > 1 in hashes. If you see that? Bump it.
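Here's what that check looks like in practice (the `orders` table and columns are hypothetical stand-ins for your own big aggregate query):

```sql
-- Run the suspect query under EXPLAIN ANALYZE and check whether
-- the sort/hash spilled to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, sum(amount)
FROM orders                -- hypothetical table
GROUP BY customer_id
ORDER BY 2 DESC;

-- If it spilled, raise work_mem for this session only and re-run:
SET work_mem = '256MB';
```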

4. maintenance_work_mem – Bulk Ops Love This

Used for VACUUM, CREATE INDEX, etc.
The default (64MB) is too low for big tables. Crank it up when running vacuums or index builds manually:

maintenance_work_mem = 1GB

If you're indexing 100M rows, go 2–4GB (if RAM allows).
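The nice part is you don't need to touch the config file for a one-off build; set it per session around the heavy operation (index name and table are hypothetical):

```sql
-- Raise it only for the session doing the heavy maintenance work.
SET maintenance_work_mem = '2GB';
CREATE INDEX CONCURRENTLY idx_orders_created_at   -- hypothetical index
    ON orders (created_at);
RESET maintenance_work_mem;
```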

5. effective_cache_size – Inform the Planner

Doesn't use memory, just tells the planner how much OS cache is available.
Set it to 50–75% of total RAM:

effective_cache_size = 24GB  # on a 32GB machine

Helps avoid bad nested loop plans on big tables.

6. Autovacuum Tuning – The Untold Bottleneck

Massive tables require aggressive VACUUM tuning, otherwise bloat kills you.

In postgresql.conf (note: autovacuum_max_workers and autovacuum_naptime are global-only), or per table via ALTER TABLE:

autovacuum_vacuum_threshold = 1000
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.005
autovacuum_max_workers = 5
autovacuum_naptime = 10s
autovacuum_vacuum_cost_limit = 2000

For HOT update-heavy workloads, consider lower thresholds + more workers.
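For a single huge, hot table you usually don't want these values globally; override them per table instead (the `events` table is a hypothetical example):

```sql
-- Per-table autovacuum overrides for a hot, frequently-updated table.
ALTER TABLE events SET (
    autovacuum_vacuum_threshold     = 1000,
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_limit    = 2000
);
```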

7. Parallelism Parameters – Scale Joins + Aggregates

Enable Postgres to use parallel query features.

max_parallel_workers = 8
max_parallel_workers_per_gather = 4
parallel_tuple_cost = 0.05    # default 0.1
parallel_setup_cost = 500     # default 1000

Lower parallel_tuple_cost and parallel_setup_cost to encourage parallelism.
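Then verify the planner actually takes the bait (the `events` table is hypothetical; any table big enough to scan in parallel works):

```sql
-- Confirm a parallel plan is chosen for a large scan.
SET max_parallel_workers_per_gather = 4;
EXPLAIN (ANALYZE)
SELECT count(*) FROM events;   -- hypothetical large table
-- A parallel plan shows a "Gather" node with "Workers Launched: N".
```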

8. random_page_cost & seq_page_cost – For SSD Optimization

If on SSD (you should be), reduce these to reflect reality.

random_page_cost = 1.1
seq_page_cost = 1.0

The default random_page_cost (4.0) assumes spinning disks. On SSDs the cost gap between random and sequential access is far smaller.
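You can apply this cluster-wide without a restart, or scope it to a tablespace if you mix fast and slow storage (the tablespace name is hypothetical):

```sql
-- Cluster-wide, no restart needed:
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Or per tablespace, if storage tiers differ:
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);  -- hypothetical tablespace
```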

9. Connection Limits – Don’t Overload

Too many active connections will kill performance.

Use a connection pooler like PgBouncer:

max_connections = 100  # keep it low

Let PgBouncer manage thousands of app connections.
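To see why max_connections and work_mem must be budgeted together, here's a back-of-the-envelope sketch. The function name and the 3-operations-per-query figure are illustrative assumptions, not Postgres internals; work_mem is a per-operation limit, so one query can hold several such allocations at once.

```python
def worst_case_work_mem_gb(max_connections, work_mem_mb, ops_per_query=3):
    """Rough peak sort/hash memory if every connection runs a query
    with `ops_per_query` concurrent sort/hash operations.
    (ops_per_query=3 is an illustrative assumption, not a hard limit.)"""
    return max_connections * work_mem_mb * ops_per_query / 1024

# 100 connections x 64MB work_mem x 3 operations ≈ 18.75 GB of RAM at peak
print(worst_case_work_mem_gb(100, 64))
```

This is exactly why "keep max_connections low, raise work_mem, pool in front" beats the reverse.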

10. Logging for Insightful Tuning

Turn on query logging to find bad queries:

log_min_duration_statement = 1000  # ms
log_checkpoints = on
log_autovacuum_min_duration = 0
log_temp_files = 0

Then mine the logs and use pg_stat_statements for real performance work.
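A typical starting query against pg_stat_statements (the extension must be created and loaded via shared_preload_libraries; column names here are per PostgreSQL 13+):

```sql
-- Top 10 queries by total execution time.
SELECT query,
       calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```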

Bonus: Table/Index-Level Tricks

  • Use BRIN indexes for append-only, timestamped data

  • Use partial indexes if only part of the data is queried often

  • Consider UNLOGGED tables for transient data (faster inserts, no WAL)

  • Use pg_repack to reclaim bloat without locking tables

  • Partition large tables and archive cold data

  • Index smartly and monitor index bloat regularly with pgstattuple

  • Use tablespaces and compression to manage storage

  • Use logical or native replication to scale reads out

  • Monitor, debug, and benchmark aggressively
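The first two tricks in that list are one-liners. Both examples below use hypothetical table names and predicates; adapt them to your schema:

```sql
-- BRIN index on an append-only, time-ordered table (tiny on disk):
CREATE INDEX idx_events_created_brin
    ON events USING brin (created_at);   -- hypothetical table

-- Partial index covering only the hot slice of the data:
CREATE INDEX idx_orders_active
    ON orders (customer_id)
    WHERE status = 'active';             -- hypothetical predicate
```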

TL;DR – The Advanced Tune Checklist

| Parameter | Suggested Value | Notes |
|---|---|---|
| shared_buffers | 25–40% RAM | Don’t go too high |
| work_mem | 8MB–256MB | Depends on queries |
| maintenance_work_mem | 1–4GB | For index/vacuum |
| effective_cache_size | ~75% RAM | Informs planner |
| autovacuum_* | Aggressive | Keep bloat down |
| max_connections | ≤ 100 | Use PgBouncer |
| parallel_workers_* | 4–8 | Enable parallel queries |
| random_page_cost | 1.1 (SSD) | Lower for SSDs |

If you're managing 100M+ rows, this is not optional anymore. It’s engineering. And it works. Add your scaling experience in the comments below.
