Why is it not recommended to use `DELETE` to remove a large amount of data in MySQL?

In this post, we will analyze why it's not recommended to use DELETE to remove large volumes of data, from three perspectives: InnoDB storage space allocation, the performance impact of DELETE, and best practice recommendations.

Overview of InnoDB Storage Architecture

  • Logical Structure

    • Tablespace

    • Segment

    • Extent: 1MB of contiguous space; with the default 16KB page size, each extent consists of 64 pages.

    • Page: The smallest I/O unit in InnoDB, with a default size of 16KB.

  • Physical Structure

    • Data Files (.ibd / ibdata1): Store tables, indexes, and dictionary metadata.

    • Log Files (ib_logfile*): Record page modifications and are used for crash recovery.

  • Extent Auto-Expansion Strategy (file growth can be observed with the query sketched after this list)

    • Initially, pages are allocated one at a time (the first 32 pages of a segment are individual "fragment" pages); whole extents are allocated only as the segment grows.

    • While the tablespace is smaller than 32MB, it is extended by 1 extent at a time.

    • Once it exceeds 32MB, it is extended by 4 extents at a time.
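
A quick way to watch this growth (a sketch; the schema name test is an assumption) is to compare the apparent file size with the space InnoDB has actually allocated:

SELECT name,
       page_size,
       ROUND(file_size / 1024 / 1024, 1)      AS file_mb,       -- apparent size of the .ibd file
       ROUND(allocated_size / 1024 / 1024, 1) AS allocated_mb   -- space actually allocated on disk
  FROM information_schema.innodb_tablespaces
 WHERE name LIKE 'test/%';   -- file-per-table tablespaces are named schema/table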

Types of InnoDB Tablespaces

  1. System Tablespace (ibdata1): Stores the change buffer and, prior to MySQL 8.0, the internal data dictionary and other metadata.

  2. File-Per-Table Tablespace (innodb_file_per_table=ON): Each table has its own .ibd file.

  3. Undo Tablespace: Stores rollback segments for MVCC (Multi-Version Concurrency Control).

Custom general tablespaces have been supported since MySQL 5.7 via CREATE TABLESPACE; the AUTOEXTEND_SIZE option shown below requires MySQL 8.0.23 or later. (INITIAL_SIZE and MAX_SIZE are NDB Cluster options and do not apply to InnoDB tablespaces.)

CREATE TABLESPACE tbs_hot
  ADD DATAFILE '/hot_data/tbs_hot_001.ibd'  -- InnoDB requires the .ibd extension; a directory outside
                                            -- the data directory must be known to InnoDB (innodb_directories)
  AUTOEXTEND_SIZE = 1G
  ENGINE = InnoDB;

Hot and Cold Data Separation

  • Hot Data (e.g., users, orders) → Stored in SSD tablespace

  • Cold Data (e.g., logs, archives) → Stored in HDD tablespace (a placement sketch follows below)
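
A minimal placement sketch, assuming a second, HDD-backed tablespace tbs_cold has been created the same way (table and column names are illustrative):

CREATE TABLE orders (
  id          BIGINT        PRIMARY KEY AUTO_INCREMENT,
  user_id     BIGINT        NOT NULL,
  amount      DECIMAL(10,2) NOT NULL,
  create_time DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB TABLESPACE tbs_hot;              -- hot table lives on SSD

CREATE TABLE order_archive LIKE orders;
ALTER TABLE order_archive TABLESPACE tbs_cold;   -- archived copy is rebuilt into the HDD tablespace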

Practical Test: Space Allocation & Reclamation

Create an Empty Table

CREATE TABLE user (
  id           BIGINT PRIMARY KEY AUTO_INCREMENT,
  name         VARCHAR(20) NOT NULL,
  age          TINYINT      NOT NULL,
  gender       CHAR(2)      NOT NULL,
  phone        VARCHAR(16)  NOT NULL,
  create_time  DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  update_time  DATETIME     NOT NULL
) ENGINE=InnoDB;
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 96K Aug  17 10:41 user.ibd

A freshly created, empty table occupies only about 96KB: a handful of individually allocated 16KB pages (tablespace header, change buffer bitmap, inode, and index root pages) rather than a full extent.

Insert 100,000 Rows

CALL insert_user_data(100000);  -- self-defined stored procedure 
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Aug  17 10:58 user.ibd

More extents were allocated as the data grew, totaling approximately 896 pages (≈14MB, roughly 14 one-megabyte extents).
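
The insert_user_data routine is a self-defined stored procedure; a minimal sketch that generates comparable test data (all generated values are illustrative) could look like this:

DELIMITER //
CREATE PROCEDURE insert_user_data(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  START TRANSACTION;
  WHILE i < n DO
    INSERT INTO user (name, age, gender, phone, update_time)
    VALUES (CONCAT('cba', LPAD(FLOOR(RAND() * 1000000), 6, '0')),      -- random name such as 'cba520731'
            FLOOR(18 + RAND() * 50),                                   -- random age
            IF(RAND() < 0.5, 'M', 'F'),
            CONCAT('13', LPAD(FLOOR(RAND() * 1000000000), 9, '0')),    -- fake 11-digit phone number
            NOW());
    SET i = i + 1;
  END WHILE;
  COMMIT;
END //
DELIMITER ;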

Delete 50,000 Rows

DELETE FROM user LIMIT 50000;
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Aug  17 11:05 user.ibd

The space is not released; the file size remains 14MB.

InnoDB only marks the rows as deleted (sets a delete flag); the freed space is kept inside the tablespace for later reuse and is never returned to the operating system.
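
One way to see this reusable-but-unreleased space (a sketch; the schema name test is an assumption) is to compare the data size with the free space InnoDB tracks inside the file:

SELECT table_name,
       ROUND(data_length / 1024 / 1024, 1) AS data_mb,
       ROUND(data_free  / 1024 / 1024, 1)  AS free_mb   -- space marked reusable, not returned to the OS
  FROM information_schema.tables
 WHERE table_schema = 'test'
   AND table_name   = 'user';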

Impact of DELETE on Query Performance

SELECT id, age, phone
  FROM user
 WHERE name LIKE 'cba52%';
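
This test relies on a secondary index on name (a prefix match with LIKE 'cba52%' can use it); a plausible definition, with an illustrative index name, is:

ALTER TABLE user ADD INDEX idx_name (name);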

Initial Query (1 million rows, secondary index on name)

  • Execution time: 35 ms

  • Cost: 10.499

  • Physical reads: 7,968,109

  • Logical reads: 7,955,134

  • Rows scanned: 25,124

  • Rows returned: 12,018

Delete 500,000 Rows and Query Again

DELETE FROM user LIMIT 500000;
ANALYZE TABLE user;
SELECT id, age, phone
  FROM user
 WHERE name LIKE 'cba52%';

  • Execution time: 50 ms

  • Cost: 10.549

  • Physical/logical reads: essentially the same as above

  • Rows scanned: 25,124

  • Rows returned: 0

Even after deleting half of the rows in a large table, the query cost, rows scanned, and I/O remain essentially unchanged, because the deleted rows still occupy the same index and data pages; only the returned result set shrinks.

Why Is Large-Scale DELETE Not Recommended?

Space Not Reclaimed

  • The .ibd file does not shrink, and extents remain allocated.

Page Fragmentation

  • Random deletes/updates cause page splits and increased gaps.

Difficulties with Subsequent Writes

  • Space from rows marked as deleted is reused only when new rows that fit into the freed gaps are inserted.

High Cost of Fragmentation Cleanup

  • Reclaiming the space requires a full table rebuild (ALTER TABLE ... ENGINE=InnoDB or OPTIMIZE TABLE, as sketched below), which is I/O intensive, needs temporary disk space on the order of the table size, and blocks DML during parts of the rebuild.
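
For reference, either of the following rebuilds the table and shrinks its .ibd file; both rewrite every row and secondary index, so they are best run off-peak:

OPTIMIZE TABLE user;              -- for InnoDB this is mapped to a table rebuild
ALTER TABLE user ENGINE=InnoDB;   -- equivalent explicit rebuild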

Best Practices and Optimization Recommendations

Logical Deletion (Marked Deletion)

ALTER TABLE user
  ADD COLUMN is_deleted TINYINT NOT NULL DEFAULT 0;

UPDATE user
   SET is_deleted = 1
 WHERE id = 123456;

-- Filter out deleted rows in queries:
SELECT *
  FROM user
 WHERE is_deleted = 0
   AND name LIKE 'cba52%';

Advantages: no large-scale physical deletion is needed, and no fragmentation is created.
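
To keep such filtered queries efficient, the flag can be combined with the search column in a composite index (an assumption beyond the original setup; the index name is illustrative):

ALTER TABLE user
  ADD INDEX idx_deleted_name (is_deleted, name);   -- equality on the flag, prefix range on name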

Partition Archiving

Partition by time and regularly exchange partitions to archive historical data.

Online DDL + metadata exchange: zero or minimal blocking.

ALTER TABLE usr_order_bak
  EXCHANGE PARTITION p202505
  WITH TABLE usr_order_mid;

Because EXCHANGE PARTITION is essentially a metadata swap, large amounts of data can be moved almost instantly, without time-consuming DELETE operations (a fuller sketch of the flow follows below).
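
A minimal end-to-end sketch of the archiving flow, with illustrative table and partition names:

-- 1. The live table is RANGE-partitioned by month.
CREATE TABLE usr_order (
  id          BIGINT   NOT NULL,
  order_time  DATETIME NOT NULL,
  amount      DECIMAL(10,2),
  PRIMARY KEY (id, order_time)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (order_time) (
  PARTITION p202505 VALUES LESS THAN ('2025-06-01'),
  PARTITION p202506 VALUES LESS THAN ('2025-07-01'),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- 2. Swap a whole month into an empty table with the same structure,
--    then drop the now-empty partition; both steps are fast metadata operations.
CREATE TABLE usr_order_mid LIKE usr_order;
ALTER TABLE usr_order_mid REMOVE PARTITIONING;

ALTER TABLE usr_order EXCHANGE PARTITION p202505 WITH TABLE usr_order_mid;
ALTER TABLE usr_order DROP PARTITION p202505;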

Permission Isolation

Grant only SELECT, INSERT, and UPDATE permissions to business accounts; disable DELETE privileges.

Split databases by microservice, with separate accounts for each service to prevent accidental deletions.

CREATE USER 'cust_user'@'%'
  IDENTIFIED BY '…';
GRANT SELECT, INSERT, UPDATE
  ON db_user.*
  TO 'cust_user'@'%';

Dedicated Archiving System

For cold data and historical logs, consider using ClickHouse or Elasticsearch for storage and cleanup.

Use TTL (Time-To-Live) to automatically purge outdated data.
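
As an illustration (a hedged sketch in ClickHouse SQL; table and column names are made up), a MergeTree table can expire old rows automatically:

CREATE TABLE user_log
(
    id          UInt64,
    create_time DateTime,
    payload     String
)
ENGINE = MergeTree
ORDER BY (create_time, id)
TTL create_time + INTERVAL 180 DAY;   -- rows older than 180 days are purged in the background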

Summary

Deleting large amounts of data with DELETE does not shrink the data files; instead, it leaves behind fragmentation that hurts indexes and query performance.

Logical deletion combined with partition archiving is the proper approach for large-scale data cleanup.

Coupled with permission control and dedicated archiving systems (such as ClickHouse), this ensures both performance and historical data retention.
