Introducing Updates and Deletes on Columnar Postgres

Jonathan Dance, CTO


Hydra is a modern, open source replacement for enterprise data warehouses. It’s fast and feature-rich so devs can build better analytics, quicker.

We are excited to contribute to open source today! Support for updates and deletes on columnar Postgres was our #1 customer feature request. We’re thrilled to make it generally available (GA).

In our previous blog post, How we built the fastest Postgres DB for Analytics, we reviewed how the Hydra team added columnar storage, vectorization, and query parallelization to Postgres and benchmark results using ClickBench. On average, this work resulted in a 23X query performance improvement over basic Postgres.

Star our Github repo to support current and future open source contributions!

How it works

Updating and deleting data are some of the most common functionalities in relational databases. While append-only data stores are useful for immutable data like analytics, it lacks the flexibility necessary for many other database tasks.

When we talk about updates and deletes in the PostgreSQL heap storage engine, we can see that deleted rows are not physically deleted, but instead logically deleted by tuple header data directly on heap. Same applies to updates - older tuple header data is changed and the new tuple, with changed fields, is added on heap. For more information how and why this is done you can check out the online book The Internals of PostgreSQL which provides excellent documentation on this topic.

Hydra Implementation

Columnar storage functionality is dependent on several metadata tables that reside in the columnar schema. For example, the columnar.stripe table contains all stripes that are currently visible to our transaction, and this information will be used to read and locate stripes within our columnar table.

The heap tables provide a consistent view of data in concurrent environments via Postgres’ multi-version model (MVCC). This means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. You can imagine a situation when two concurrent transactions are active - A and B. If transaction A adds rows to the table, then the other transaction will not be able to see them because entries in columnar.stripe will not be visible to transaction B , even though they are visible to transaction A.

Each stripe contains up to 15 chunks (with each chunk contains up to 10,000 rows), and the metadata for each chunk is stored in columnar.chunk . This table is used to filter chunks based on their min and max values. Each chunk column has entries in this table, so when executing a filter (a WHERE clause) these values are checked before reading the chunk based on min and max value.

Since Hydra columnar storage was initially immutable and append only, we needed to have some way to mark rows that are updated and deleted outside of columnar storage. For this purpose, we have added a columnar.row_mask heap table to check if a row should be skipped or not, thus emulating DML functionality.

The definition of columnar.row_mask table is as follows:

CREATE TABLE row_mask (
    storage_id BIGINT NOT NULL,
    start_row_number BIGINT NOT NULL,
    end_row_number BIGINT NOT NULL,
    mask BYTEA,
    PRIMARY KEY (id, storage_id, start_row_number, end_row_number)
) WITH (user_catalog_table = true);

ALTER TABLE columnar.row_mask ADD CONSTRAINT row_mask_stripe_unique
UNIQUE (storage_id, start_row_number);

Each columnar.row_mask entry is associated with a written chunk. When a stripe is ready to be flushed, we also create an entry for each chunk in the stripe. This table is almost completely static — all columns except the mask will not change. mask is a byte array where each bit corresponds to a row in the chunk - for each chunk, up to 1125 bytes are used. Initially all bits are set to zero (visible). When a row is deleted, we will set the corresponding bit to one, indicating that this row should be skipped during scan.

Hydra's columnar DELETE command uses the mask column inside each row_mask row to logically mark the rows that are deleted and hide them from future queries. The UPDATE command is very similar, Hydra combines this DELETE operation with an INSERT, but we don’t have leave any link between the logically-deleted tuple and newly inserted one.

Note that columnar tables use concurrency locking differently. In heap based tables, concurrent transactions that modifies different rows can do so in parallel without blocking. Transactions will only be blocked if they try to modify the same row (row-based locking). Columnar doesn’t have this granularity, so we lock the complete table when doing modifications.

Best Practices

Checking whether rows are deleted has some overhead — it takes about 2ms per 1M rows scanned. Rows are only checked if they are not filtered, so performance depends on what WHERE clauses are in your query. Additionally, chunks that have no deleted rows are not checked, meaning performance on data that has not been modified is extra fast.

Updating and deleting data is not nearly as fast as inserting data, and therefore should be done sparingly. Columnar storage works best if we anticipate the data won't change. Since each transaction creates a stripe, ideally you should perform as many updates as possible in single batch transaction. We will be looking into optimizing this behavior in future releases.

What’s Next?

Deleting and updating rows with help of external metadata objects shows that is possible to have mutable functionality on immutable storage engine, but this has some drawbacks.

When updating or deleting rows in a columnar table, storage is still occupied. We plan to support vacuum on these tables so that unused space can be reclaimed. For now, columnar tables should have a relatively low number of updates or deletes to remain space efficient. Space can be reclaimed using vacuum full , but as this requires rewriting the entire table and it should be done sparingly.

Since we can’t lock a single row for delete or update, Hydra locks complete tables while these commands are in process - this will limit concurrent transaction performance. For concurrent transactional workloads, heap tables are the best approach. Row or columnar is defined at the table level, so transactional heap tables can simply live right next to your analytical columnar tables in the same Hydra database or within a single partitioned table.

If you want to design a real-time data warehouse together on Postgres, sign-up for a 14 day free trial.


Thank you to the following people for contributing content and edits: Mario Karuza, Joseph Sciarrino, Delia Cronin, Conor Wong, Sushobhan Jena, Carlos Sanchez, Pankaj Sachdeva, and Kirthana Senguttuvan.

Hydra Github Repo

Star our Github repo to support current and future open source contributions!

Get a fully managed data warehouse at cloud scale with no vendor lock-in.