In most systems, the database acts as a boundary. You write data into it, and other systems read from it. If you need something more dynamic, like reacting to changes as they happen, you usually introduce something alongside it, whether that is a service layer, a queue, or a stream.
We ran into a case where that separation became the problem. Some of our databases hold relatively small tables that change very frequently, like configuration and pricing data. These updates sit directly in the path of automated systems, and there can be thousands of processes that need to stay in sync with them.
Polling does not work well in that setup. It either introduces delay or unnecessary load. Putting a service in front of the database has its own issues, because it forces all writes through a single path and changes how people interact with the data. Adding a separate streaming system next to the database creates a different set of trade-offs, introducing another system with its own guarantees and a gap between writing data and observing it.
We ended up keeping the write path simple. Clients write directly to Postgres using normal transactions, without any coordination layer. That avoids introducing another system in the critical path, but it means you need a way to observe what actually changed and distribute that to a large number of consumers. Postgres already has a complete log of those changes, so the problem becomes how to turn that into something usable without breaking the properties you rely on.
What this covers

This post looks at how we turn database writes into a stream of changes, how that maps onto Postgres internals, where the implementation becomes non-trivial, and where the limits show up in practice.
Reading the WAL as a stream

Postgres records every change in the write-ahead log. The WAL exists for durability and recovery, so instead of writing updates directly to disk in random locations, changes are appended sequentially and can be replayed if needed. Logical decoding is the mechanism that turns those WAL records into something applications can consume as a stream of table-level changes.
At the raw storage level, the WAL is not something you can consume directly. It describes physical changes, so you see which bytes changed on which pages rather than something like “this row was updated in this table.” To make it usable, you enable logical decoding, which lets Postgres decode WAL records into inserts, updates, and deletes in an application-specific format.
That gives you something closer to a stream of changes, but transaction semantics still matter. Multiple transactions can be active at the same time, their row changes can happen concurrently, commits happen later, and some transactions roll back entirely.
A useful detail is that a lot of this complexity is already handled inside Postgres. When you build a logical decoding output plugin, Postgres does not just hand you raw interleaved row changes and expect you to reconstruct visibility from scratch. It decodes changes in transaction context, exposes begin and commit boundaries to the plugin, and skips rolled-back work. That means the plugin can focus on translating committed changes into its own protocol. This is the approach we took: we built our own logical decoding plugin and used it to turn decoded changes into the format that our feed distributes to clients.
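Conceptually, the reorder buffer inside Postgres does something like the following before a plugin ever sees a change: it buffers row changes per transaction and replays them only at commit, dropping aborted work entirely. A simplified model, with illustrative names rather than the actual C output plugin API:

```python
# Simplified model of what logical decoding does before invoking an
# output plugin: buffer changes per transaction, emit them only at
# commit, and drop anything from transactions that abort.

def decode(wal_events):
    """wal_events: interleaved (xid, kind, payload) records in WAL order,
    where kind is "change", "commit", or "abort". Returns the ordered
    stream a plugin would observe: begin/changes/commit per committed
    transaction, in commit order."""
    pending = {}  # xid -> buffered row changes
    out = []
    for xid, kind, payload in wal_events:
        if kind == "change":
            pending.setdefault(xid, []).append(payload)
        elif kind == "abort":
            pending.pop(xid, None)  # rolled-back work never reaches the plugin
        elif kind == "commit":
            out.append(("begin", xid))
            out.extend(("change", c) for c in pending.pop(xid, []))
            out.append(("commit", xid))
    return out

# Two interleaved transactions; xid 8 aborts, so only xid 7 is emitted.
events = [
    (7, "change", "INSERT a"),
    (8, "change", "UPDATE b"),
    (7, "change", "UPDATE c"),
    (8, "abort", None),
    (7, "commit", None),
]
stream = decode(events)
```

The real mechanism is more involved (large transactions spill to disk, and streaming mode can emit changes before commit), but the shape is the same: the plugin receives clean transaction boundaries and only committed work.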
From changes to something clients can use

Once you have a logical stream, you still need to distribute it. Since we have thousands of consumers, it is not practical to create a logical decoding slot for each of them. A logical replication slot represents its own stream of changes and maintains its own replication state, which makes slots a poor fit as the direct subscription interface for large numbers of downstream consumers.
In our case, this is handled by an internal component called PG feed. It reads decoded changes from Postgres through a single logical decoding path and forwards them to clients that subscribe to specific tables. There can be thousands of subscribers, each interested in a small subset of the data, which makes filtering and distribution part of the core problem rather than an afterthought.
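The fan-out itself can be thought of as a table-keyed routing step: one decoded stream comes in, and each subscriber sees only the tables it registered for. A minimal sketch, with class and method names of our choosing rather than PG feed's actual interface:

```python
from collections import defaultdict

class Feed:
    """Route decoded changes to subscribers by table name.

    A single decoded stream is published into the feed; each
    subscriber registers for the tables it cares about and is
    invoked only for matching changes.
    """
    def __init__(self):
        self._subs = defaultdict(list)  # table name -> list of callbacks

    def subscribe(self, table, callback):
        self._subs[table].append(callback)

    def publish(self, change):
        # change is a dict such as {"table": "pricing", "op": "update", ...}
        for cb in self._subs.get(change["table"], ()):
            cb(change)

feed = Feed()
seen = []
feed.subscribe("pricing", seen.append)
feed.publish({"table": "pricing", "op": "update", "row": {"id": 1}})
feed.publish({"table": "config", "op": "insert", "row": {"id": 9}})
# Only the pricing change reaches this subscriber.
```

The point of the sketch is the asymmetry: Postgres sees one replication consumer, while filtering and delivery to thousands of subscribers happen in the feed process.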
The design is intentionally narrow. Clients continue to write directly to Postgres, and PG feed is not part of the write path. It only observes changes after they happen. It also does not maintain its own copy of the database. If a client needs the current state of a table, it fetches that from Postgres and then continues with a stream of updates.
Subscriptions are defined at the table level, which means the system only deals with what is physically written. If something is not in the WAL, it does not exist from the perspective of the feed. That rules out joins or derived views and keeps the model aligned with what Postgres can actually guarantee.
The snapshot problem

Subscribing to changes in practice means combining a snapshot with a live stream. A client typically fetches the current state of a table and then expects to receive updates from that point onward, but those two steps overlap in time.
While the snapshot query is running, new transactions are committing changes to the same table. Because of transaction isolation, the query only sees data that is visible in its snapshot, while the change stream includes transactions that commit during and after that read. In PostgreSQL, a Repeatable Read transaction sees a fixed snapshot taken at the start of its first statement, rather than a moving view of later commits.
That creates a gap where updates can arrive before the client has seen the corresponding rows in the snapshot. Handling that gap correctly is where most of the complexity sits. If updates are forwarded immediately, clients observe changes out of order. If they are delayed without enough context, updates can be missed or duplicated.
In our case, PG feed reconciles this explicitly. It reads the table on a regular Postgres connection inside a Repeatable Read transaction and captures pg_current_snapshot() alongside that read. Separately, it consumes decoded WAL from the continuously running replication connection. PostgreSQL exposes snapshot information through the pg_snapshot type and related functions, including pg_current_snapshot(), which makes it possible to reason about transaction visibility at the application layer.
That gives us enough information to decide, for each incoming change, whether it is already reflected in the snapshot the client received or whether it should be delivered afterward. In other words, the snapshot and the replication stream are stitched together in the application rather than by Postgres itself.
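The stitching step can be expressed with the textual form of pg_snapshot, which is `xmin:xmax:xip_list`. A change from transaction `xid` was already visible to the snapshot read if `xid < xmin`, or if `xmin <= xid < xmax` and `xid` is not in the in-progress list; anything else committed "later" from the snapshot's point of view and must be delivered on the stream. This mirrors what `pg_visible_in_snapshot()` computes, and because decoded WAL only ever contains committed transactions, commit status needs no separate handling. (The `pg_snapshot` type uses 64-bit transaction IDs, so plain integer comparison is safe; helper names below are ours.) A sketch:

```python
def parse_pg_snapshot(text):
    """Parse the textual form of pg_snapshot, e.g. '795:799:795,797'.
    Fields are xmin, xmax, and a comma-separated in-progress xid list."""
    xmin, xmax, xip = text.split(":")
    in_progress = {int(x) for x in xip.split(",")} if xip else set()
    return int(xmin), int(xmax), in_progress

def visible_in_snapshot(xid, snapshot):
    """True if a committed transaction's changes were already visible
    to the snapshot read, so the corresponding stream changes can be
    dropped rather than delivered twice."""
    xmin, xmax, in_progress = snapshot
    if xid < xmin:
        return True       # finished before the snapshot was taken
    if xid >= xmax:
        return False      # started after the snapshot was taken
    return xid not in in_progress  # in progress at snapshot time -> not visible

snap = parse_pg_snapshot("795:799:795,797")
# Changes from xid 794 are already in the table read and can be dropped;
# changes from xid 795 (in progress) or xid 800 must be delivered.
```

Each decoded change carries its transaction's xid, so this check is all that is needed to decide, per change, which side of the snapshot boundary it falls on.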
Where it stops working well

The system works well for the shape of data it was designed around: relatively small tables with frequent updates and many consumers. Outside of that, the limits are fairly clear.
The main benefit is on the write side. By not introducing a coordinating layer, clients can interact with Postgres directly using normal transactions, without needing to go through another system or emit events separately. At the same time, other systems can react to those changes with relatively low latency, using a model that stays close to the underlying data.
That combination is useful, but it comes with specific constraints. You are relying on Postgres not just as storage, but as the source of a change stream, and that exposes parts of the system that are usually hidden. Some of those scale well, while others become bottlenecks that you have to design around. Most of the work ends up being in understanding exactly where that boundary sits and shaping your system to stay within it.