pgvector in production: lessons learned

I’ve argued more than once that pgvector — Postgres’s vector extension — is the right default for adding semantic search, and that you should reach for a dedicated vector database only when a measured limit forces it. I still believe that. But “use pgvector” is the easy advice; running it in production is where the lessons live, and they’re the things the introductory tutorials skip. After operating pgvector under real load, here’s what I wish I’d known going in: how to choose and tune its index, the recall-versus-speed trade-off you can’t avoid, the discipline of keeping embeddings in sync with their source data, and the operational realities that only surface at scale. This is the practical sequel to the introduction.

The index is the whole performance story

The first and biggest lesson: without the right index, pgvector doesn’t scale, and choosing the index is the central decision. A vector query without an index does an exact nearest-neighbour search — it compares the query vector against every row, which is accurate but linear in your data size and quickly becomes far too slow as the table grows. The fix is an approximate-nearest-neighbour (ANN) index, and pgvector gives you two, with a genuine trade-off between them:

HNSW (Hierarchical Navigable Small World). Generally the better choice for production query performance: it gives excellent query speed and recall. The costs are that it’s slower and more memory-intensive to build, and the index itself is larger. For most production read-heavy workloads, these build-time costs are worth paying for the query-time performance.
IVFFlat. Faster to build and smaller, but generally gives lower query performance than HNSW, and — a real gotcha — its quality depends on being built after you have a representative amount of data, because it partitions the vector space into lists based on the data present at build time. Build it on an empty or tiny table and it performs poorly.

For most production cases I’ve landed on HNSW, accepting the slower build for the better queries. But the real lesson is that this is a decision you must make consciously — the index choice and its parameters are not a detail, they are your vector-search performance, and leaving it to a default or skipping the index entirely is the number-one way pgvector “doesn’t scale” for people who then wrongly blame the extension.
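To make the index decision concrete, here is a minimal sketch of what each option looks like as DDL. The `CREATE INDEX ... USING hnsw` / `USING ivfflat` syntax, the `vector_cosine_ops` operator class, and the HNSW defaults shown (`m = 16`, `ef_construction = 64`) are pgvector's own. The helper functions and the `documents` / `embedding` names are hypothetical, chosen only for illustration. The `lists` heuristic follows the starting point suggested in pgvector's README (rows / 1000 up to a million rows, sqrt(rows) beyond that); treat it as a first guess to measure, not a rule.

```python
import math


def ivfflat_lists(row_count: int) -> int:
    """Starting value for IVFFlat `lists`: rows / 1000 up to 1M rows,
    sqrt(rows) above that. A heuristic to tune from, not a guarantee."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, round(math.sqrt(row_count)))


def hnsw_index_sql(table: str, column: str, m: int = 16,
                   ef_construction: int = 64,
                   opclass: str = "vector_cosine_ops") -> str:
    """Build the DDL for an HNSW index.
    Identifiers are interpolated directly, so pass only trusted constants."""
    return (f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
            f"WITH (m = {m}, ef_construction = {ef_construction});")


def ivfflat_index_sql(table: str, column: str, row_count: int,
                      opclass: str = "vector_cosine_ops") -> str:
    """Build the DDL for an IVFFlat index sized for `row_count` rows.
    Run it only after the table holds representative data."""
    return (f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) "
            f"WITH (lists = {ivfflat_lists(row_count)});")


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(hnsw_index_sql("documents", "embedding"))
    print(ivfflat_index_sql("documents", "embedding", row_count=500_000))
```

The operator class has to match the distance operator your queries use (cosine here), otherwise the planner will not use the index.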

Recall versus speed is a dial you set

The second lesson the tutorials gloss over: ANN indexes are approximate, and you actively control the trade-off between accuracy (recall — how often the index finds the true nearest neighbours) and speed. This isn’t automatic; it’s a dial you set, via the index’s build and query parameters:

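HNSW has two build-time parameters, m (the maximum number of connections per graph node) and ef_construction (the size of the candidate list used while building), plus one query-time parameter, hnsw.ef_search (the size of the candidate list used while searching). Raising them generally improves recall: the build parameters cost build time and index size, while a higher hnsw.ef_search costs query time. Searching less widely is faster but may miss some true matches.
IVFFlat fixes its number of partitions (lists) at build time and has a query-time parameter, ivfflat.probes, for how many of those partitions a query searches: more probes, higher recall, slower query.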

The practical discipline is to tune these against your actual data and your real recall requirements, not to accept defaults blindly. How much recall you need is application-dependent: a “find similar articles” feature can tolerate missing an occasional match, while a feature where a missed result is a real failure needs the dial turned toward accuracy. The point is that you decide where on the speed/accuracy curve to sit, and you should decide it deliberately by measuring recall on your own data — not discover after launch that the defaults put you somewhere you didn’t want.
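Measuring recall is simpler than it sounds: run the same query once exactly and once through the index, then compare the two ID lists. The sketch below does the exact side by brute force in plain Python so it is self-contained. The function names and the tiny dataset are hypothetical, and it assumes non-zero vectors and cosine distance.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force nearest neighbours: compare the query against every
    vector. `vectors` maps id -> vector. Linear in the number of rows."""
    ranked = sorted(vectors, key=lambda vid: cosine_distance(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true nearest neighbours that the approximate
    result also returned. Order within the lists does not matter."""
    if not exact_ids:
        return 1.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


if __name__ == "__main__":
    vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [-1.0, 0.0]}
    truth = exact_top_k([1.0, 0.0], vectors, k=2)
    # Pretend the index returned ids 1 and 3 for the same query.
    print(truth, recall_at_k(truth, [1, 3]))
```

Against a real database you would get the exact list from Postgres itself rather than from Python. pgvector's README suggests running the query inside a transaction with `SET LOCAL enable_indexscan = off` to force an exact search. Then repeat the indexed query at a few settings of `hnsw.ef_search` (or `ivfflat.probes`), average the recall over a sample of real queries, and pick the lowest setting that meets your requirement.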

Keeping embeddings in sync

A lesson that has nothing to do with vectors specifically and everything to do with production correctness: your embeddings must stay in sync with the source data they represent. An embedding is derived from some text — a document, a description, a record. When that source changes, its embedding is now stale and the search results based on it are wrong. The discipline:

Re-embed on change. When the source text is updated, regenerate its embedding and update the vector. Wire this into the data’s update path so it can’t be forgotten — ideally so that changing the source and updating the embedding happen together.
Handle the embedding cost. Generating embeddings is usually a call to an external model API (slow, metered), so bulk updates and re-embeds belong in a batched background job, not inline in a request. It’s the same discipline you’d apply to any slow external call.
Watch for drift. Over time, embeddings can get out of sync through missed updates or bugs. A periodic reconciliation — or at least monitoring for source rows whose embedding is older than their last update — catches the drift before it quietly degrades search quality.

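This is exactly the consistency concern that a single-database pgvector setup makes so much easier than a separate vector store does, but “easier” isn’t “automatic.” You still have to keep the embedding current with its source. What pgvector gives you is the ability to write the source row and its vector in the same transaction (or, when the embedding is generated later in a background job, to record in that same transaction that the vector is now stale), and that’s a real advantage you should actually use.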
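The drift check described above can be sketched in a few lines. This assumes each row records when its source last changed and when its embedding was last generated. The `updated_at` and `embedded_at` column names, the `Row` type, and both helpers are hypothetical, not part of pgvector.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Optional


@dataclass
class Row:
    id: int
    updated_at: datetime             # last change to the source text
    embedded_at: Optional[datetime]  # None means never embedded


def stale_ids(rows: List[Row]) -> List[int]:
    """Ids whose embedding is missing or older than the source.
    Roughly the SQL predicate:
      WHERE embedded_at IS NULL OR embedded_at < updated_at"""
    return [r.id for r in rows
            if r.embedded_at is None or r.embedded_at < r.updated_at]


def batches(ids: List[int], size: int) -> Iterator[List[int]]:
    """Split ids into fixed-size chunks for a background re-embed job."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    rows = [
        Row(1, datetime(2024, 1, 1), datetime(2024, 1, 2)),  # fresh
        Row(2, datetime(2024, 3, 1), datetime(2024, 1, 2)),  # stale
        Row(3, datetime(2024, 1, 1), None),                  # never embedded
    ]
    for batch in batches(stale_ids(rows), size=100):
        print("re-embed", batch)
```

A timestamp comparison is the simplest signal. Storing a hash of the embedded text alongside the vector is a stricter alternative, since it also catches rows whose timestamp was never bumped.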

The operational realities

A handful of things that only show up once pgvector is carrying real production load:

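Index builds are heavy. Building an HNSW index on a large table takes real time and memory, and can impact the database while it runs. The build is considerably faster when the graph fits in maintenance_work_mem, and CREATE INDEX CONCURRENTLY avoids blocking writes at the cost of a longer build. Plan index builds (and rebuilds) like the significant operations they are, off-peak and with capacity headroom, rather than running one casually on a loaded production database.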
Vectors are large; storage adds up. Embedding vectors have hundreds or thousands of dimensions, and pgvector stores each dimension as a 4-byte float, so a single 1,536-dimension vector is roughly 6 KB before any index. At scale the storage (and the index size on top of it) is substantial. Account for it in capacity planning; it’s easy to underestimate how much room a few million embeddings plus their HNSW index consume.
It shares the database with everything else. Your vector queries and index run on the same Postgres serving the rest of your app, so a heavy vector workload is load on your primary database. For most apps this is fine and is exactly the simplicity benefit — but watch it, and know that the option exists to give vectors their own database if they start crowding out everything else.
Measure query performance with production-shaped data. Vector query performance depends heavily on data size and the index, so benchmark with realistic volumes. Performance that’s fine on ten thousand vectors can change character at ten million, and you want to learn that before your users do.
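To put rough numbers on the storage point above, here is a back-of-the-envelope estimate. It uses the per-vector size given in pgvector's documentation (4 bytes per dimension plus 8 bytes of overhead) and deliberately ignores row and page overhead and the index itself, so read the result as a floor. The function names and the 5-million-row, 1,536-dimension scenario are illustrative assumptions.

```python
def vector_bytes(dimensions: int) -> int:
    """Bytes for one pgvector `vector` value: 4 per dimension plus 8."""
    return 4 * dimensions + 8


def raw_vector_storage_gib(row_count: int, dimensions: int) -> float:
    """Lower bound on heap storage for the vectors alone, in GiB.
    Excludes tuple/page overhead and any HNSW or IVFFlat index."""
    return row_count * vector_bytes(dimensions) / 2**30


if __name__ == "__main__":
    # Hypothetical workload: 5 million embeddings of 1,536 dimensions.
    print(vector_bytes(1536))
    print(round(raw_vector_storage_gib(5_000_000, 1536), 1))
```

For that hypothetical workload the vectors alone come to roughly 28.6 GiB. The HNSW index adds to that, and its size is best measured on your own data (for example with `pg_relation_size`) rather than guessed.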

Verdict

pgvector remains the right default for vector search, but running it well in production is a real discipline the introductions skip.

The biggest lesson is that the index is the whole performance story. Without an ANN index pgvector does a linear scan and doesn’t scale, and choosing between HNSW (better queries, heavier and slower to build, usually right for production) and IVFFlat (lighter, but lower performance and quality that depends on building after you have representative data) is a conscious decision, not a detail. ANN indexes are approximate, so recall versus speed is a dial you set via index parameters and must tune against your own data and your real accuracy needs, not accept blindly. Your embeddings must stay in sync with their source: re-embed on change, do bulk embedding in background jobs, and watch for drift. pgvector makes that easier than a separate store by letting you do it in one transaction, an advantage you should use. And the operational realities bite at scale: index builds are heavy, vectors consume real storage, the workload shares your primary database, and you must benchmark with production-shaped data.

None of this changes the verdict that pgvector is the sensible default. It sharpens it, because the people for whom pgvector “doesn’t scale” are almost always the ones who skipped the index, never tuned recall, or let embeddings drift. Do those three things deliberately, plan the operational realities, and pgvector carries real production vector search comfortably, which is exactly why it’s the default worth starting from.