Removing Duplicates with a Unique Index
To prevent duplicate rows from being inserted, an index was created on fields A, B, C, and D, but it was mistakenly created as a normal (non-unique) index. As a result, a table of 20 million records now contains duplicates. The question arises: will adding a unique index on these fields remove the duplicates without affecting the other rows?
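Before changing the index, it can help to confirm how widespread the duplicates are. The query below is a minimal sketch: it assumes the table is named mytable (the name used in the statements later in this article) and that a duplicate means identical values in all four of A, B, C, and D. The alias copies and the LIMIT of 20 are illustrative choices.

```sql
-- Count how many times each (A, B, C, D) combination occurs
-- and list only the combinations that occur more than once.
SELECT A, B, C, D, COUNT(*) AS copies
FROM mytable
GROUP BY A, B, C, D
HAVING COUNT(*) > 1
ORDER BY copies DESC
LIMIT 20;   -- illustrative cap; remove it to see every duplicate group
```

An empty result would mean there are no duplicates and a unique index can be added directly.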
Correcting the Index and Handling Duplicates
Adding a unique index with a plain ALTER TABLE statement (without the IGNORE modifier) will fail, because duplicate records already exist. With the IGNORE modifier (ALTER IGNORE TABLE), MySQL keeps one row from each set of duplicates and deletes the others.
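For MySQL versions older than 5.7.4, the IGNORE variant might look like the sketch below. The table name mytable and index name myindex follow the statements used later in this article; old_index is a hypothetical name for the mistakenly created non-unique index, which would otherwise remain as a redundant index.

```sql
-- Only for MySQL versions before 5.7.4; later versions reject ALTER IGNORE TABLE.
-- old_index is a hypothetical name; check SHOW INDEX FROM mytable for the real one.
ALTER IGNORE TABLE mytable
  DROP INDEX old_index,
  ADD UNIQUE INDEX myindex (A, B, C, D);
```

On a 20 million row table this statement can take a long time, so it is worth trying on a copy of the table first.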
Alternative Approach for MySQL Versions 5.7.4 and Above
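In MySQL 5.7.4 and later, ALTER TABLE no longer supports the IGNORE modifier, so a different approach is recommended: copy the rows to a temporary table, empty the original table, add the unique index, and reinsert the rows with INSERT IGNORE so that duplicates are skipped.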
Syntax for Removing Duplicates with INSERT IGNORE
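CREATE TABLE tmp_data SELECT * FROM mytable;             -- copy every row into a scratch table
TRUNCATE TABLE mytable;                                  -- empty the original table (cannot be rolled back)
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);  -- succeeds because the table is now empty
INSERT IGNORE INTO mytable SELECT * FROM tmp_data;       -- rows that would violate the unique index are skipped
DROP TABLE tmp_data;                                     -- remove the scratch table once the result is verified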
Additional Considerations
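The documentation does not specify which row of each duplicate set is retained when IGNORE is used, so do not rely on a particular row surviving. This matters most when the duplicates differ in columns other than A, B, C, and D. It is advisable to test the procedure on a smaller dataset before applying it to a large table.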
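A small trial along these lines could be run on a scratch table first. Everything in this sketch is hypothetical: the table names dedup_test and tmp_dedup_test, the index names, the INT column types, and the extra column E, which stands in for any column outside the unique key.

```sql
-- Hypothetical scratch table reproducing the original mistake.
CREATE TABLE dedup_test (
  A INT, B INT, C INT, D INT,
  E VARCHAR(20),                   -- non-key column that differs between duplicates
  INDEX idx_abcd (A, B, C, D)      -- the mistaken non-unique index
);

INSERT INTO dedup_test (A, B, C, D, E) VALUES
  (1, 1, 1, 1, 'first copy'),
  (1, 1, 1, 1, 'second copy'),
  (2, 2, 2, 2, 'only copy');

-- Same steps as for the real table, applied to the scratch table.
CREATE TABLE tmp_dedup_test SELECT * FROM dedup_test;
TRUNCATE TABLE dedup_test;
ALTER TABLE dedup_test ADD UNIQUE INDEX uq_abcd (A, B, C, D);
INSERT IGNORE INTO dedup_test SELECT * FROM tmp_dedup_test;

-- Inspect which copy of (1, 1, 1, 1) was kept.
SELECT * FROM dedup_test;

-- Clean up the scratch tables.
DROP TABLE tmp_dedup_test;
DROP TABLE dedup_test;
```

Two rows should remain, one per distinct (A, B, C, D) combination. Which value of E survives for the duplicated key is exactly the detail the documentation leaves open, so the trial shows the behavior of one particular server and dataset rather than a guarantee.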