Why is it not recommended to use `DELETE` to remove a large amount of data in MySQL?

In this post, we will analyze why it's not recommended to use DELETE to remove large volumes of data, from three perspectives: InnoDB storage space allocation, the performance impact of DELETE, and best practice recommendations.

Overview of InnoDB Storage Architecture

  • Logical Structure

    • Tablespace

    • Segment

    • Extent: Each extent consists of 32 pages.

    • Page: The smallest I/O unit in InnoDB, with a default size of 16KB.

  • Physical Structure

    • Data Files (.ibd / ibdata1): Store tables, indexes, and dictionary metadata.

    • Log Files (ib_logfile*): Record page modifications and are used for crash recovery.

  • Extent Auto-Expansion Strategy

    • Initially, 1 extent is allocated.

    • If the total tablespace is less than 32MB, 1 extent is added at a time.

    • If it exceeds 32MB, 4 extents are added at a time.

Types of InnoDB Tablespaces

  1. System Tablespace (ibdata1): Stores internal dictionary and other metadata.

  2. File-Per-Table Tablespace (innodb_file_per_table=ON): Each table has its own .ibd file.

  3. Undo Tablespace: Stores rollback segments for MVCC (Multi-Version Concurrency Control).

Starting from MySQL 8.0, custom general tablespaces are supported.

CREATE TABLESPACE tbs_hot
  ADD DATAFILE '/hot_data/tbs_hot_001.dbf'
  INITIAL_SIZE = 10G
  AUTOEXTEND_SIZE = 1G
  MAX_SIZE = 32G
  ENGINE = InnoDB;

Hot and Cold Data Separation

  • Hot Data (e.g., users, orders) → Stored in SSD tablespace

  • Cold Data (e.g., logs, archives) → Stored in HDD tablespace

Practical Test: Space Allocation & Reclamation

Create an Empty Table

CREATE TABLE user (
  id           BIGINT PRIMARY KEY AUTO_INCREMENT,
  name         VARCHAR(20) NOT NULL,
  age          TINYINT      NOT NULL,
  gender       CHAR(2)      NOT NULL,
  phone        VARCHAR(16)  NOT NULL,
  create_time  DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  update_time  DATETIME     NOT NULL
) ENGINE=InnoDB;
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 96K Aug  17 10:41 user.ibd

The first extent (32 pages) of an empty table occupies approximately 96KB.

Insert 100,000 Rows

CALL insert_user_data(100000);  -- self-defined stored procedure 
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Aug  17 10:58 user.ibd

More extents were allocated, totaling approximately 896 pages (≈14MB).

Delete 50,000 Rows

DELETE FROM user LIMIT 50000;
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Aug  17 11:05 user.ibd

Space Not Released, Still Remains at 14MB.

InnoDB only marks rows as deleted using a delete flag; it does not physically reclaim the space.

Impact of DELETE on Query Performance

SELECT id, age, phone
  FROM user
 WHERE name LIKE 'cba52%';

Initial Query (1 million rows + index)

Execution Time: 35 ms

COST: 10.499

Physical Reads: 7,968,109

Logical Reads: 7,955,134

Rows Scanned: 25,124

Rows Returned: 12,018

After deleting 500,000 rows, query again

DELETE FROM user LIMIT 500000;
ANALYZE TABLE user;
SELECT id, age, phone
  FROM user
 WHERE name LIKE 'cba52%';

Execution time: 50 ms 

COST: 10.549 

Physical/Logical reads: same as above 

Rows scanned: 25,124 

Rows returned: 0

After deleting half of the data in a large table, the query cost and I/O remain essentially unchanged; only the returned results differ.

Why Large-Scale DELETE Is Not Recommended?

Space Not Reclaimed

  • The .ibd file does not shrink, and extents remain allocated.

Page Fragmentation

  • Random deletes/updates cause page splits and increased gaps.

Difficulties with Subsequent Writes

  • Pages marked as deleted are only reused when inserting smaller rows.

High Cost of Fragmentation Cleanup
  • ALTER TABLE ... ENGINE=InnoDB requires full table rebuild, which is I/O intensive and blocks DML operations.

Best Practices and Optimization Recommendations

Logical Deletion (Marked Deletion)

ALTER TABLE user
  ADD COLUMN is_deleted TINYINT NOT NULL DEFAULT 0;

UPDATE user
   SET is_deleted = 1
 WHERE id = 123456;

-- Filter out deleted rows in queries:
SELECT *
  FROM user
 WHERE is_deleted = 0
   AND name LIKE 'cba52%';

Advantages: No need for large-scale physical deletion, and does not cause fragmentation.

Partition Archiving

Partition by time, regularly exchange partitions to archive historical data.

Online DDL + metadata exchange: zero or minimal blocking.

ALTER TABLE usr_order_bak
  EXCHANGE PARTITION p202505
  WITH TABLE usr_order_mid;

By using partition operations, large amounts of data can be moved instantly without time-consuming DELETE operations.

Permission Isolation

Grant only SELECT, INSERT, and UPDATE permissions to business accounts; disable DELETE privileges.

Split databases by microservice, with separate accounts for each service to prevent accidental deletions.

CREATE USER 'cust_user'@'%'
  IDENTIFIED BY '…';
GRANT SELECT, INSERT, UPDATE
  ON db_user.*;
Dedicated Archiving System

For cold data and historical logs, consider using ClickHouse or Elasticsearch for storage and cleanup.

Use TTL (Time-To-Live) to automatically purge outdated data.

Summary

Deleting large amounts of data does not reduce space; instead, it leaves behind a lot of fragmentation, which negatively impacts indexes and performance.

Logical deletion combined with partition archiving is the proper approach for large-scale data cleanup.

Coupled with permission control and dedicated archiving systems (such as ClickHouse), this ensures both performance and historical data retention.

Comments