Hydra is the best data warehouse for devs. It’s open source, built on Postgres, and rockets through the toughest analytics (OLAP) and hybrid transactional workloads (HTAP). We are excited to contribute to open source today - Hydra adds Update & Delete functionality to columnar storage on Postgres!
In our previous blog post, How we built the fastest Postgres DB for Analytics, we reviewed how the Hydra team added columnar storage, vectorization, and query parallelization to Postgres and benchmark results using ClickBench. Support for updates and deletes on columnar was our #1 customer feature request and we’re thrilled to make it generally available today.
If you want to try Hydra, we offer a 14 day free trial of the cloud-managed version of Hydra data warehouse.
How it works
Updating and deleting data are some of the most common functionalities in relational databases. While append-only data stores are useful for immutable data like analytics, it lacks the flexibility necessary for many other database tasks.
When we talk about updates and deletes in the PostgreSQL heap storage engine, we can see that delete rows are not physically deleted, but instead logically deleted by tuple header data directly on heap. Same applies to updates - older tuple header data is changed and the new tuple, with changed fields, is added on heap. For more information how and why this is done you can check out the online book The Internals of PostgreSQL which provides excellent documentation on this topic.
Columnar storage functionality is dependent on several metadata tables that reside in the
columnar schema. For example, the
columnar.stripe table contains all stripes that are currently visible to our transaction, and this information will be used to read and locate stripes within our columnar table.
The heap tables provide a consistent view of data in concurrent environments via Postgres’ multi-version model (MVCC). This means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. You can imagine a situation when two concurrent transactions are active - A and B. If transaction A adds rows to the table, then the other transaction will not be able to see them because entries in
columnar.stripe will not be visible to transaction B , even though they are visible to transaction A.
Each stripe contains up to 15 chunks (with each chunk contains up to 10,000 rows), and the metadata for each chunk is stored in
columnar.chunk . This table is used to filter chunks based on their min and max values. Each chunk column has entries in this table, so when executing a filter (a
WHERE clause) these values are checked before reading the chunk based on min and max value.
Since Hydra columnar storage was initially immutable and append only, we needed to have some way to mark rows that are updated and deleted outside of columnar storage. For this purpose, we have added a
columnar.row_mask heap table to check if a row should be skipped or not, thus emulating DML functionality.
The definition of
columnar.row_mask table is as follows:
CREATE TABLE row_mask ( id BIGINT NOT NULL, storage_id BIGINT NOT NULL, start_row_number BIGINT NOT NULL, end_row_number BIGINT NOT NULL, mask BYTEA, PRIMARY KEY (id, storage_id, start_row_number, end_row_number) ) WITH (user_catalog_table = true); ALTER TABLE columnar.row_mask ADD CONSTRAINT row_mask_stripe_unique UNIQUE (storage_id, start_row_number);
columnar.row_mask entry is associated with a written chunk. When a stripe is ready to be flushed, we also create an entry for each chunk in the stripe. This table is almost completely static — all columns except the
mask will not change.
mask is a byte array where each bit corresponds to a row in the chunk - for each chunk, up to 1125 bytes are used. Initially all bits are set to zero (visible). When a row is deleted, we will set the corresponding bit to one, indicating that this row should be skipped during scan.
Hydra's columnar DELETE command uses the
mask column inside each
row_mask row to logically mark the rows that are deleted and hide them from future queries. The UPDATE command is very similar, Hydra combines this DELETE operation with an INSERT, but we don’t have leave any link between the logically-deleted tuple and newly inserted one.
Note that columnar tables use concurrency locking differently. In heap based tables, concurrent transactions that modifies different rows can do so in parallel without blocking. Transactions will only be blocked if they try to modify the same row (row-based locking). Columnar doesn’t have this granularity, so we lock the complete table when doing modifications.
Checking whether rows are deleted has some overhead — it takes about 2ms per 1M rows scanned. Rows are only checked if they are not filtered, so performance depends on what
WHERE clauses are in your query. Additionally, chunks that have no deleted rows are not checked, meaning performance on data that has not been modified is extra fast.
Updating and deleting data is not nearly as fast as inserting data, and therefore should be done sparingly. Columnar storage works best if we anticipate the data won't change. Since each transaction creates a stripe, ideally you should perform as many updates as possible in single batch transaction. We will be looking into optimizing this behavior in future releases.
Deleting and updating rows with help of external metadata objects shows that is possible to have mutable functionality on immutable storage engine, but this has some drawbacks.
When updating or deleting rows in a columnar table, storage is still occupied. We plan to support
vacuum on these tables so that unused space can be reclaimed. For now, columnar tables should have a relatively low number of updates or deletes to remain space efficient. Space can be reclaimed using
vacuum full , but as this requires rewriting the entire table and it should be done sparingly.
Since we can’t lock a single row for delete or update, Hydra locks complete tables while these commands are in process - this will limit concurrent transaction performance. For concurrent transactional workloads, heap tables are the best approach. Row or columnar is defined at the table level, so transactional heap tables can simply live right next to your analytical columnar tables in the same Hydra database or within a single partitioned table.
We’re excited for updates and deletes on columnar to enable even fresher data written directly to the data warehouse. If you want to design a real-time data warehouse together on Postgres, email us at email@example.com and sign-up for a 14 day free trial.
Thank you to the following people for contributing content and edits: Mario Karuza, Joseph Sciarrino, Delia Cronin, Conor Wong, Sushobhan Jena, Carlos Sanchez, Pankaj Sachdeva, and Kirthana Senguttuvan.