Why is it not recommended to use `DELETE` to remove a large amount of data in MySQL?
This article explains why it is not recommended to use `DELETE` to remove large volumes of data, from three perspectives: InnoDB storage space allocation, the performance impact of `DELETE`, and best-practice recommendations.
Overview of InnoDB Storage Architecture
- Logical Structure
  - Tablespace
  - Segment
  - Extent: each extent consists of 64 consecutive pages (1MB with the default 16KB page size).
  - Page: the smallest I/O unit in InnoDB, with a default size of 16KB.
- Physical Structure
  - Data files (.ibd / ibdata1): store tables, indexes, and dictionary metadata.
  - Log files (ib_logfile*): record page modifications and are used for crash recovery.
Extent Auto-Expansion Strategy
- Initially, 1 extent is allocated.
- While the total tablespace is smaller than 32MB, 1 extent is added at a time.
- Once it exceeds 32MB, 4 extents are added at a time.
Types of InnoDB Tablespaces
- System tablespace (ibdata1): stores the internal data dictionary and other metadata.
- File-per-table tablespace (innodb_file_per_table=ON): each table has its own .ibd file.
- Undo tablespace: stores rollback segments for MVCC (Multi-Version Concurrency Control).
Custom general tablespaces are also supported (introduced in MySQL 5.7 and extended in MySQL 8.0):
CREATE TABLESPACE tbs_hot
  -- The datafile must use the .ibd extension and live in a directory known to
  -- InnoDB (the data directory or one listed in innodb_directories).
  ADD DATAFILE '/hot_data/tbs_hot_001.ibd'
  AUTOEXTEND_SIZE = 64M    -- auto-extension increment (supported from MySQL 8.0.23)
  ENGINE = InnoDB;
Hot and Cold Data Separation
- Hot data (e.g., users, orders) → stored in an SSD-backed tablespace
- Cold data (e.g., logs, archives) → stored in an HDD-backed tablespace
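As a sketch of how a table would be placed in such a tablespace (the table name and columns below are illustrative, not from the original):
-- Create a hot table directly in the SSD-backed general tablespace
CREATE TABLE orders_hot (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  amount DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB TABLESPACE tbs_hot;
-- An existing table can also be moved into it (this rebuilds the table):
-- ALTER TABLE orders_hot TABLESPACE tbs_hot;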
Practical Test: Space Allocation & Reclamation
Create an Empty Table
CREATE TABLE user (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(20) NOT NULL,
age TINYINT NOT NULL,
gender CHAR(2) NOT NULL,
phone VARCHAR(16) NOT NULL,
create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
update_time DATETIME NOT NULL
) ENGINE=InnoDB;
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 96K Aug 17 10:41 user.ibd
A freshly created empty table occupies only about 96KB — just the handful of initial pages (file space header, insert buffer bitmap, INODE page, and index root pages), before any full extent is allocated.
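The same numbers can be checked from inside MySQL via the data dictionary (the schema name db_user below is assumed for illustration):
SELECT name, file_size, allocated_size
FROM information_schema.innodb_tablespaces
WHERE name = 'db_user/user';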
Insert 100,000 Rows
CALL insert_user_data(100000); -- self-defined stored procedure
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Aug 17 10:58 user.ibd
More extents were allocated, totaling approximately 896 pages (≈14MB).
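The stored procedure itself is not shown in the article; a minimal sketch that generates rows matching the table definition might look like this (all generated values are illustrative):
-- Hypothetical sketch of insert_user_data; the real procedure is not shown in the article.
DELIMITER //
CREATE PROCEDURE insert_user_data(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    INSERT INTO user (name, age, gender, phone, create_time, update_time)
    VALUES (
      CONCAT('cba', FLOOR(RAND() * 1000000)),                    -- synthetic name, e.g. 'cba523481'
      FLOOR(18 + RAND() * 50),                                   -- age 18..67
      IF(RAND() < 0.5, 'M', 'F'),
      CONCAT('138', LPAD(FLOOR(RAND() * 100000000), 8, '0')),    -- 11-digit phone number
      NOW(),
      NOW()
    );
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;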
Delete 50,000 Rows
DELETE FROM user LIMIT 50000;
$ ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Aug 17 11:05 user.ibd
The space is not released; the file remains at 14MB.
InnoDB only marks rows as deleted using a delete flag; it does not physically reclaim the space.
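One way to see the deleted-but-unreleased space from inside the server is the DATA_FREE counter in information_schema (a standard query, not part of the original test):
-- DATA_FREE grows after large DELETEs even though the file size does not shrink
SELECT table_name,
       ROUND(data_length / 1024 / 1024, 2) AS data_mb,
       ROUND(data_free  / 1024 / 1024, 2) AS free_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'user';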
Impact of DELETE on Query Performance
SELECT id, age, phone FROM user WHERE name LIKE 'cba52%';
Initial query (1 million rows + index):
- Execution time: 35 ms
- Cost: 10.499
- Physical reads: 7,968,109
- Logical reads: 7,955,134
- Rows scanned: 25,124
- Rows returned: 12,018
After deleting 500,000 rows, query again:
DELETE FROM user LIMIT 500000;
ANALYZE TABLE user;
SELECT id, age, phone FROM user WHERE name LIKE 'cba52%';
- Execution time: 50 ms
- Cost: 10.549
- Physical/logical reads: same as above
- Rows scanned: 25,124
- Rows returned: 0
After deleting half of the data in a large table, the query cost and I/O remain essentially unchanged; only the returned results differ.
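To reproduce this kind of comparison yourself, MySQL 8.0.18+ can report per-step actual times and rows examined (a generic suggestion, not necessarily the tooling behind the figures above):
EXPLAIN ANALYZE
SELECT id, age, phone FROM user WHERE name LIKE 'cba52%';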
Why Large-Scale DELETE Is Not Recommended?
Space Not Reclaimed
- The .ibd file does not shrink, and the already allocated extents stay allocated.
Page Fragmentation
- Random deletes and updates cause page splits and leave growing gaps inside pages.
Difficulties with Subsequent Writes
- Space marked as deleted is only reused when later inserts are small enough to fit into those gaps.
- ALTER TABLE ... ENGINE=InnoDB requires a full table rebuild, which is I/O intensive and can block or slow concurrent DML, as shown below.
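For completeness, the rebuild that actually returns space to the filesystem looks like this (standard statements; best run in a maintenance window):
-- For InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE and rebuilds the table
OPTIMIZE TABLE user;
-- Equivalent explicit rebuild
ALTER TABLE user ENGINE=InnoDB;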
Best Practices and Optimization Recommendations
Logical Deletion (Soft Delete)
ALTER TABLE user
ADD COLUMN is_deleted TINYINT NOT NULL DEFAULT 0;
UPDATE user
SET is_deleted = 1
WHERE id = 123456;
-- Filter out deleted rows in queries:
SELECT *
FROM user
WHERE is_deleted = 0
AND name LIKE 'cba52%';
Advantages: no large-scale physical deletion is needed, and no fragmentation is introduced.
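If most queries carry the is_deleted = 0 filter, it may also be worth extending the index used for the name search (a suggested addition, not part of the original schema):
-- Hypothetical composite index: equality on is_deleted, prefix range on name
ALTER TABLE user ADD INDEX idx_deleted_name (is_deleted, name);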
Partition Archiving
- Partition by time, and regularly exchange partitions to archive historical data.
- Online DDL + metadata exchange: zero or minimal blocking.
ALTER TABLE usr_order_bak
EXCHANGE PARTITION p202505
WITH TABLE usr_order_mid;
By using partition operations, large amounts of data can be moved instantly without time-consuming DELETE operations.
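For context, the partitioned table behind such an exchange might be defined roughly like this (table and partition names are illustrative, assuming monthly RANGE partitions on create_time):
CREATE TABLE usr_order (
  id BIGINT NOT NULL AUTO_INCREMENT,
  amount DECIMAL(10,2) NOT NULL,
  create_time DATETIME NOT NULL,
  PRIMARY KEY (id, create_time)   -- the partitioning column must be part of every unique key
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (create_time) (
  PARTITION p202505 VALUES LESS THAN ('2025-06-01'),
  PARTITION p202506 VALUES LESS THAN ('2025-07-01'),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);
The standalone table being swapped in (usr_order_mid above) must be non-partitioned and have an identical structure; after the exchange, the historical rows live in that table and can be archived or dropped without touching the live data.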
Permission Isolation
- Grant only SELECT, INSERT, and UPDATE permissions to business accounts; do not grant the DELETE privilege.
- Split databases by microservice, with a separate account for each service, to prevent accidental deletions.
CREATE USER 'cust_user'@'%' IDENTIFIED BY '…';
GRANT SELECT, INSERT, UPDATE ON db_user.* TO 'cust_user'@'%';
Dedicated Archiving System
For cold data and historical logs, consider using ClickHouse or Elasticsearch for storage and cleanup.
Use TTL (Time-To-Live) to automatically purge outdated data.
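As a minimal sketch of the TTL idea in ClickHouse (table name, columns, and the 180-day retention period are assumptions, not from the original):
-- ClickHouse: rows older than 180 days are purged automatically by background merges
CREATE TABLE user_log_archive (
  id UInt64,
  create_time DateTime,
  payload String
) ENGINE = MergeTree
ORDER BY (create_time, id)
TTL create_time + INTERVAL 180 DAY;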
Summary
Deleting large amounts of data with DELETE does not shrink the data files; instead, it leaves behind a lot of fragmentation, which hurts indexes and query performance.
Logical deletion combined with partition archiving is the proper approach for large-scale data cleanup.
Coupled with permission control and dedicated archiving systems (such as ClickHouse), this ensures both performance and historical data retention.