Deleting data

How does deleting in SQream DB work?

In SQream DB, when you run a delete statement, after the transaction completes, any rows that match the delete predicate will no longer be returned when running queries. A separate process can be used to reclaim the space occupied by these rows, and to remove the small overhead that queries will have until this is done. Some benefits to this design are:

  1. delete transactions complete quickly
  2. the total disk footprint overhead at any time for a delete transaction or cleanup process is small and bounded (while the system still supports low overhead commit, rollback and recovery for delete transactions).

Phase 1: Delete

When a DELETE statement is run, SQream DB records the predicate used. This predicate will be used to filter future statements on this table until all this delete predicate’s matching rows have been physically cleaned up. This filtering process takes full advantage of SQream’s zone map feature.

Phase 2: Clean-up

The cleanup process is not automatic. This gives control to the user or DBA, and gives flexibility on when to run the clean up.

Files marked for deletion during the logical deletion stage are removed from disk. This is achieved by calling both utility function commands: CLEANUP_CHUNKS and CLEANUP_EXTENTS sequentially.

Note

  • ALTER TABLE and other DDL operations are blocked on tables that require clean-up. See more in the Concurrency and locks guide.
  • If the estimated time for a cleanup processs is beyond a threshold, you will get an error message about it. The message will explain how to override this limitation and run the process anywhere.

Notes on data deletion

Deleting data does not free up space

With the exception of a full table delete (TRUNCATE), deleting data does not free up disk space. To free up disk space, trigger the cleanup process.

Select performance on deleted rows

Queries on tables that have deleted rows may have to scan data that hasn’t been cleaned up. In some cases, this can cause queries to take longer than expected. To solve this issue, trigger the cleanup process.

Use TRUNCATE instead of DELETE

For tables that are frequently emptied entirely, consider using TRUNCATE rather than DELETE. TRUNCATE removes the entire content of the table immediately, without requiring a subsequent cleanup to free up disk space.

Cleanup is I/O intensive

The cleanup process actively compacts tables by writing a complete new version of column chunks with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.

Cleanup operations can create significant I/O load on the database. Consider this when planning the best time for the cleanup process.

Example

Deleting values from a table

farm=> SELECT * FROM cool_animals;
1,Dog                 ,7
2,Possum              ,3
3,Cat                 ,5
4,Elephant            ,6500
5,Rhinoceros          ,2100
6,\N,\N

6 rows

farm=> DELETE FROM cool_animals WHERE weight > 1000;
executed

farm=> SELECT * FROM cool_animals;
1,Dog                 ,7
2,Possum              ,3
3,Cat                 ,5
6,\N,\N

4 rows

Deleting values based on more complex predicates

farm=> SELECT * FROM cool_animals;
1,Dog                 ,7
2,Possum              ,3
3,Cat                 ,5
4,Elephant            ,6500
5,Rhinoceros          ,2100
6,\N,\N

6 rows

farm=> DELETE FROM cool_animals WHERE weight > 1000;
executed

farm=> SELECT * FROM cool_animals;
1,Dog                 ,7
2,Possum              ,3
3,Cat                 ,5
6,\N,\N

4 rows

Identifying and cleaning up tables

List tables that haven’t been cleaned up

farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t
   ON dp.table_id = t.table_id
   GROUP BY 1;
cool_animals

1 row

Identify predicates for clean-up

farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t
   ON dp.table_id = t.table_id
   WHERE t.table_name = 'cool_animals';
weight > 1000

1 row

Triggering a cleanup

-- Chunk reorganization (SWEEP)
farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
executed

-- Delete leftover files (VACUUM)
farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
executed


farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t
   ON dp.table_id = t.table_id
   WHERE t.table_name = 'cool_animals';

0 rows