Deleting Data

The Deleting Data page describes how the DELETE statement works and how to maintain tables from which you have deleted data:

Overview

Deleting data typically refers to deleting rows, but can also refer to deleting other table content. The general workflow is to delete the data and then trigger a clean-up operation, which reclaims the space occupied by the deleted rows, as discussed further below.

The DELETE statement deletes rows defined by a predicate that you have specified, preventing them from appearing in subsequent queries.

For example, the following statement uses a predicate to delete the rows for animals heavier than 1000 weight units:

farm=> DELETE FROM cool_animals WHERE weight > 1000;

The major benefit of the DELETE statement is that it deletes rows simply and quickly.

The Deletion Process

Deleting rows occurs in the following two phases:

  • Phase 1 - Deletion - All rows you mark for deletion are ignored when you run any query. These rows are not deleted until the clean-up phase.

  • Phase 2 - Clean-up - The rows you marked for deletion in Phase 1 are physically deleted. The clean-up phase is not automated, letting users or DBAs control when to activate it. The files containing the rows marked for deletion in Phase 1 are removed from disk by sequentially running the utility function commands CLEANUP_CHUNKS and CLEANUP_EXTENTS, as sketched in the example below.
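
The following is a minimal sketch of the end-to-end workflow: the DELETE marks the rows (Phase 1), and the two clean-up commands reorganize the chunks and remove the leftover files from disk (Phase 2). It is based on the cool_animals examples used later on this page; the clean-up commands are described in detail under Triggering a Clean-Up:

farm=> DELETE FROM cool_animals WHERE weight > 1000;
farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');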

Usage Notes

The Usage Notes section includes important information about the DELETE statement:

General Notes

This section describes general notes that apply when deleting rows:

  • The ALTER TABLE command and other DDL operations are locked on tables that require clean-up. If the estimated clean-up time exceeds the permitted threshold, an error message is displayed describing how to override the threshold limitation. For more information, see Concurrency and Locks.

  • If the number of deleted records exceeds the threshold defined by the mixedColumnChunksThreshold parameter, the delete operation is aborted. This alerts users that deleting a large number of records may result in a large number of mixed chunks. To circumvent this alert, use the following syntax (replacing XXX with the desired number of records) before running the delete operation; a usage example follows below:

    set mixedColumnChunksThreshold=XXX;
    
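For example, assuming a delete that is expected to affect up to 5,000,000 records (the value is illustrative only; choose one appropriate for your workload):

farm=> set mixedColumnChunksThreshold=5000000;
farm=> DELETE FROM cool_animals WHERE weight > 1000;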

Deleting Data Does Not Free Space

With the exception of running a full table delete, deleting data does not free unused disk space. To free unused disk space, you must trigger the clean-up process.

For more information on running a full table delete, see TRUNCATE.

For more information on freeing disk space, see Triggering a Clean-Up.
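
For example, assuming you want to remove all rows from the cool_animals table and free their space immediately, a full table delete looks like this (a minimal sketch; see the TRUNCATE reference for the complete syntax):

farm=> TRUNCATE TABLE cool_animals;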

Clean-Up Operations Are I/O Intensive

The clean-up process reduces table size by removing all unused space from column chunks. While this reduces query time, it is itself a lengthy operation that occupies additional disk space for the new copy of the table until it completes.

Tip

Because clean-up operations can create significant I/O load on your database, consider running them sparingly, during off-peak or idle times.

If this is an issue in your environment, consider using CREATE TABLE AS to create a new table containing only the rows you want to keep, then drop the original table and rename the new one, as sketched below.
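
A minimal sketch of that approach, assuming you want to keep only the animals weighing 1000 weight units or less, and that ALTER TABLE ... RENAME TO is available in your environment (note that rows with a NULL weight do not match this predicate and are not copied):

farm=> CREATE TABLE cool_animals_kept AS SELECT * FROM cool_animals WHERE weight <= 1000;
farm=> DROP TABLE cool_animals;
farm=> ALTER TABLE cool_animals_kept RENAME TO cool_animals;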

Examples

The Examples section includes the following examples:

Deleting Rows from a Table

The following example shows how to delete rows from a table.

  1. Display the table:

    farm=> SELECT * FROM cool_animals;
    

    The following table is displayed:

    1,Dog                 ,7
    2,Possum              ,3
    3,Cat                 ,5
    4,Elephant            ,6500
    5,Rhinoceros          ,2100
    6,\N,\N
    
  2. Delete rows from the table:

    farm=> DELETE FROM cool_animals WHERE weight > 1000;
    
  3. Display the table:

    farm=> SELECT * FROM cool_animals;
    

    The following table is displayed:

    1,Dog                 ,7
    2,Possum              ,3
    3,Cat                 ,5
    6,\N,\N
    

Deleting Values Based on Complex Predicates

The following example shows how to delete rows based on a more complex predicate that combines several conditions.

  1. Display the table:

    farm=> SELECT * FROM cool_animals;
    

    The following table is displayed:

    1,Dog                 ,7
    2,Possum              ,3
    3,Cat                 ,5
    4,Elephant            ,6500
    5,Rhinoceros          ,2100
    6,\N,\N
    
  2. Delete rows from the table using a compound predicate:

    farm=> DELETE FROM cool_animals WHERE weight > 1000 AND name NOT LIKE '%hino%';
    
  3. Display the table:

    farm=> SELECT * FROM cool_animals;
    

    The following table is displayed:

    1,Dog                 ,7
    2,Possum              ,3
    3,Cat                 ,5
    5,Rhinoceros          ,2100
    6,\N,\N
    

Identifying and Cleaning Up Tables

The Identifying and Cleaning Up Tables section includes the following examples:

Listing Tables that Have Not Been Cleaned Up

The following example shows how to list tables that have not been cleaned up:

farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t
   ON dp.table_id = t.table_id
   GROUP BY 1;
cool_animals

1 row

Identifying Predicates for Clean-Up

The following example shows how to identify predicates for clean-up:

farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t
   ON dp.table_id = t.table_id
   WHERE t.table_name = 'cool_animals';
weight > 1000

1 row

Triggering a Clean-Up

The following example shows how to trigger a clean-up:

  1. Run the CLEANUP_CHUNKS command (also known as SWEEP) to reorganize the chunks:

    farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
    
  2. Run the CLEANUP_EXTENTS command (also known as VACUUM) to delete the leftover files:

    farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
    
  3. Verify that no delete predicates remain for the table (if the clean-up succeeded, this query returns no rows):

    farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t
       ON dp.table_id = t.table_id
       WHERE t.table_name = 'cool_animals';
    

Best Practices

This section includes best practices for deleting rows:

  • Run CLEANUP_CHUNKS and CLEANUP_EXTENTS after running large DELETE operations.

  • When you delete large segments of data from very large tables, consider creating a new table with CREATE TABLE AS instead, then dropping the original table and renaming the new one, as shown in the sketch under Clean-Up Operations Are I/O Intensive.

  • Avoid killing CLEANUP_EXTENTS operations in progress.

  • SQream is optimized for time-based data, which is data naturally ordered according to date or timestamp. Deleting rows based on such columns performs best, as sketched in the example below.
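
The following is a minimal sketch of a time-based retention delete followed by clean-up; the events table and its ts timestamp column are hypothetical and stand in for your own time-ordered table:

farm=> DELETE FROM events WHERE ts < '2020-01-01';
farm=> SELECT CLEANUP_CHUNKS('public','events');
farm=> SELECT CLEANUP_EXTENTS('public','events');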