MySQL: Deleting with Left Join

Today I had to deal with one huge table and cleanup all data where foreign key doesn’t have it’s primary key match in original table, just to remind myself how sub-queries in MySQL are terrible slower than joins.

I have some script which generates domains from typos, so I have one table with original domains (master_domains) and other one (result_domains) with generated typo domains. Basically something like this:

mysql> describe master_domains;
+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment | 
| domain | varchar(255)     | NO   | UNI | NULL    |                | 
+--------+------------------+------+-----+---------+----------------+
2 rows in set (0.07 sec)

mysql> describe result_domains;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(10) unsigned | NO   | PRI | NULL    | auto_increment | 
| domain    | varchar(255)     | NO   | UNI | NULL    |                | 
| master_id | int(10) unsigned | YES  | MUL | NULL    |                | 
+-----------+------------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)

Table result_domains has master_id which is foreign key reference to primary key (id) in master_domains table. Since I also have other scripts generating domains without typos (which store result_domains.master_id field as NULL), today I simple wanted to get rid of those masters without proper master_id reference in result table or in other words those master domains where result_domains.master_id is NOT NULL.

With sub-queries you could write query easily with something like this:

delete from master_domains where id not in 
(select master_id from result_domains_frontend)

It is good habit to always run select query before deleting big number of rows (just to make sure your query is written correctly) so I tried select query first:

select * from master_domains where id not in 
(select master_id from result_domains_frontend) limit 10

However, it took several minutes to run without any output so eventually I’ve decided to stop it. I know that sub-queries are much slower than joins, so decided to do try removal operation with left join.

Left joins are actually perfect weapon to find rows that exist in one (left) and doesn’t exist in other (right) table. They also have one big advantage over sub-queries – they are performing much faster, plus they are backward compatible with old prehistoric MySQL 5.x versions. However delete syntax is little bit tricky so after few trial and errors eventually I came out with this query:

delete master_domains.* from master_domains 
left join result_domains_frontend 
on master_domains.id=result_domains_frontend.master_id 
where result_domains_frontend.master_id is null ;

And voila after a while it came up with result:

mysql> delete master_domains.* from master_domains 
left join result_domains_frontend 
on master_domains.id=result_domains_frontend.master_id 
where result_domains_frontend.master_id is null ;
Query OK, 270558 rows affected (46.58 sec)
mysql> 

Comments

  1. December 15th, 2009 | 3:47 am

    Zašto nemaš indeks ključ na results_domain.master_id?

  2. December 15th, 2009 | 4:03 am

    A i UNIQUE bi trebalo da bude par (master_id, domain) jer ista greška može da se pogodi za dva različita glavna domena.

  3. December 15th, 2009 | 9:54 am

    Original tables are MyISAM tables with plenty more fields than I put here and of course that I have index on master_id field, along with primary on id and unique on domain :) Anyway, let’s add it here so it don’t cofuse others :)

    Btw, I can’t have unique on master_id, domain pair since master_id can also have null values, because there are many domains generated without typo (too complicate to explain) :)

  4. December 18th, 2009 | 11:30 am

    Using aliases can make your life easier…

    delete md from master_domains AS md
    left join result_domains_frontend AS rdf
    on md.id=rdf.master_id
    where rdf.master_id is null ;

  5. December 18th, 2009 | 1:34 pm

    Thanks Shantanu, I didn’t use them here because I wanted to make examples easier to read :)

  6. January 2nd, 2010 | 7:09 am

    MySQL is often used in free software projects that require a full-featured database management system, including projects such as WordPress, phpBB and other software built on the LAMP software stack. It is also used in many high-profile, me so excited to visit the site aits give me good information about the MySQL: Deleting with Left Join .its nice sharing with us……..

Leave a reply