考试系统中做了一个用户导入试题功能,导致用户导入了很多重复的试题,我需要查询及删除一下重复的记录,于是有了这篇文章。
(一)单个字段
1、查找表中多余的重复记录,根据(question_title)字段来判断
代码如下 | 复制代码 |
select * from questions where question_title in (select question_title from people group by question_title having count(question_title) > 1) |
2、删除表中多余的重复记录,根据(question_title)字段来判断,只留有一个记录
代码如下 | 复制代码 |
delete from questions where peopleId in (select peopleId from people group by peopleId having count(question_title) > 1) and min(id) not in (select question_id from questions group by question_title having count(question_title)>1) |
(二)多个字段
删除表中多余的重复记录(多个字段),只留有rowid最小的记录
代码如下 | 复制代码 |
DELETE FROM questions WHERE (questions_title,questions_scope) IN (SELECT questions_title,questions_scope FROM questions GROUP BY questions_title,questions_scope HAVING COUNT(*) > 1) AND question_id NOT IN (SELECT MIN(question_id) FROM questions GROUP BY questions_scope,questions_title HAVING COUNT(*)>1) |
用上述语句无法删除,创建了临时表才删的,求各位达人解释一下。
代码如下 | 复制代码 |
CREATE TABLE tmp AS SELECT question_id FROM questions WHERE (questions_title,questions_scope) IN (SELECT questions_title,questions_scope FROM questions GROUP BY questions_title,questions_scope HAVING COUNT(*) > 1) AND question_id NOT IN (SELECT MIN(question_id) FROM questions GROUP BY questions_scope,questions_title HAVING COUNT(*)>1); DELETE FROM questions WHERE question_id IN (SELECT question_id FROM tmp); DROP TABLE tmp; |
(三)
代码如下 | 复制代码 |
declare @max integer,@id integer declare cur_rows cursor local for select 主字段,count(*) from 表名 group by 主字段 having count(*) >; 1 open cur_rows fetch cur_rows into @id,@max while @@fetch_status=0 begin select @max = @max -1 set rowcount @max delete from 表名 where 主字段 = @id fetch cur_rows into @id,@max end close cur_rows set rowcount 0 |
上面讲到了很多,下面我们一起来看实例删除重复记录实例
例1,表中有主键(可唯一标识的字段),且该字段为数字类型
代码如下 | 复制代码 |
/* 表结构 */ DROP TABLE IF EXISTS `t1`; CREATE TABLE IF NOT EXISTS `t1`( `id` INT(1) NOT NULL AUTO_INCREMENT, `name` VARCHAR(20) NOT NULL, `add` VARCHAR(20) NOT NULL, PRIMARY KEY(`id`) )Engine=InnoDB; /* 插入测试数据 */ SELECT * FROM `t1`; |
查找id最小的重复数据(只查找id字段)
代码如下 | 复制代码 |
/* 查找id最小的重复数据(只查找id字段) */ SELECT DISTINCT MIN(`id`) AS `id` FROM `t1` GROUP BY `name`,`add` HAVING COUNT(1) > 1; +------+ | id | +------+ | 1 | | 12 | | 19 | | 21 | | 6 | | 9 | +------+ rows in set (0.00 sec) |
查找所有重复数据
代码如下 | 复制代码 |
/* 查找所有重复数据 */ SELECT `t1`.* FROM `t1`,( SELECT `name`,`add` FROM `t1` GROUP BY `name`,`add` HAVING COUNT(1) > 1 ) AS `t2` WHERE `t1`.`name` = `t2`.`name` AND `t1`.`add` = `t2`.`add`; +----+------+-----+ | id | name | add | +----+------+-----+ | 1 | abc | 123 | | 2 | abc | 123 | | 4 | abc | 123 | | 6 | xzy | 456 | | 7 | xzy | 456 | | 8 | xzy | 456 | | 9 | xzy | 789 | | 11 | xzy | 789 | | 12 | ijk | 147 | | 13 | ijk | 147 | | 19 | tpk | 963 | | 20 | tpk | 963 | | 21 | wer | 546 | | 22 | wer | 546 | +----+------+-----+ rows in set (0.00 sec) |
查找除id最小的数据外的重复数据
代码如下 | 复制代码 |
/* 查找除id最小的数据外的重复数据 */ SELECT `t1`.* FROM `t1`,( SELECT DISTINCT MIN(`id`) AS `id`,`name`,`add` FROM `t1` GROUP BY `name`,`add` HAVING COUNT(1) > 1 ) AS `t2` WHERE `t1`.`name` = `t2`.`name` AND `t1`.`add` = `t2`.`add` AND `t1`.`id` <> `t2`.`id`; +----+------+-----+ | id | name | add | +----+------+-----+ | 2 | abc | 123 | | 4 | abc | 123 | | 7 | xzy | 456 | | 8 | xzy | 456 | | 11 | xzy | 789 | | 13 | ijk | 147 | | 20 | tpk | 963 | | 22 | wer | 546 | +----+------+-----+ rows in set (0.00 sec) |
例2,表中没有主键(可唯一标识的字段),或者主键并非数字类型(也可以删除重复数据,但效率上肯定比较慢)
例2测试数据
代码如下 | 复制代码 |
/* 表结构 */ DROP TABLE IF EXISTS `noid`; CREATE TABLE IF NOT EXISTS `noid`( `pk` VARCHAR(20) NOT NULL COMMENT '字符串主键', `name` VARCHAR(20) NOT NULL, `add` VARCHAR(20) NOT NULL, PRIMARY KEY(`pk`) )Engine=InnoDB; /* 测试数据,与上例一样的测试数据,只是主键变为字符串形式 */ SELECT * FROM `noid`; |
为表添加自增长的id字段
代码如下 | 复制代码 |
/* 为表添加自增长的id字段 */ ALTER TABLE `noid` ADD `id` INT(1) NOT NULL AUTO_INCREMENT, ADD INDEX `id`(`id`); Query OK, 23 rows affected (0.16 sec) Records: 23 Duplicates: 0 Warnings: 0 SELECT * FROM `noid`; |
MySQL中必须是有索引的字段才可以使用AUTO_INCREMENT
删除重复数据与上例一样,记得删除完数据把id字段也删除了
删除重复数据,只保留一条数据
代码如下 | 复制代码 |
/* 删除重复数据,只保留一条数据 */ DELETE FROM `noid` USING `noid`,( SELECT DISTINCT MIN(`id`) AS `id`,`name`,`add` FROM `noid` GROUP BY `name`,`add` HAVING COUNT(1) > 1 ) AS `t2` WHERE `noid`.`name` = `t2`.`name` AND `noid`.`add` = `t2`.`add` AND `noid`.`id` <> `t2`.`id`; Query OK, 8 rows affected (0.05 sec) /* 删除id字段 */ SELECT * FROM `noid`; |