Skip to main content

MySQL 5.x – Finally improved client

Looking at my favorite rss feeds today, I found this post on great MySQL Performance Blog:

…if you press CTRL-C MySQL Command Line Client will not exit but will terminate query being executed.

In other words, in previous versions of MySQL client program, if you issue a query and try to interrupt it by hitting CTRL-C, CTRL-C would actually kill MySQL client itself, but query still continue running in background! In this case the only solution to really kill that query is to find it’s ID on process list (by issuing “show full processlist” query), and then to kill it with a query like “kill 12345”, where 12345 is ID of query that you want to be killed. In other words, something like this:

mysql> select * from odm_result_keywords where keyword like '%foo%joe%';
^CAborted
bash-2.05b$ mysql -A --enable-local-infile -udinke -ppass mydb
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1512 to server version: 4.1.18-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show full processlist;
+------+-------+----------------------------+-------------------+---------+------+--------------+--------------------------------------------------------------------+
| Id   | User  | Host                       | db                | Command | Time | State        | Info                                                               |
+------+-------+----------------------------+-------------------+---------+------+--------------+--------------------------------------------------------------------+
| 1486 | dinke | localhost                  | mydb | Query   |    3 | Sending data | select * from odm_result_keywords where keyword like '%foo%joe.cl' |
+------+-------+----------------------------+-------------------+---------+------+--------------+--------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> kill 1486;
mysql>

Thanks to changes in MySQL client program, all you have to do now is to hit CTRL-C, and query will be stopped immediately:

mysql> select domain from odm_result_keywords_de where whois_status is null and domain like '%.%.%';
Query aborted by Ctrl+C
ERROR 1317 (70100): Query execution was interrupted
mysql>

For more information about this feature (as well as other changes in MySQL 5.0.25) please follow this link.

MySQL – Backup of Big MyISAM Tables

If you ever dealt with backup of MySQL tables, you probably used mysqldump utility, which allows you to dump all data into some mytables.sql file, which you can backup somewhere, import to other MySQL server etc. I used this procedure for a thousands times so far, and generally it goes as follows:

export:

mysqldump -udinke -pmojpass -hhostname.of.server1 dbname table1 table2 tableN > dump_file.sql
mysql -udinke -pmojpass -hhostname.of.server2 dbname  < dump_file.sql

Sometimes there is a problem when you move data from new version of MySQL to old, and in that case you have to specify proper compability flag when running mysql dump (--compatible=name where name can be mysql323, mysql40, postgresql, oracle etc.).

Anyway, few days ago in order to move data from one MySQL to another, I had to dump some ... let say big mysql tables (about 10 tables, where each contained about 10 millions of records). After long lasted procedure (dump to file, gzip, scp to other server) I finally started import. However, after 3 hours (yes, three hours) instead of Linux prompt I got this:

[dinke@um-917 ~/public_html]$ mysql -udinke -p325ewfwt23rasf
keyword_discovery < es_miner_data.sql
ERROR 1582 (23000) at line 163833: Duplicate entry '1167548' for key
'PRIMARY'

WTF? I moved data from old server to new, which means import should run without compability problems. It could be indexes on old table were damaged, but instead to wait like 2 more hours in order to complete check & repair procedure, I've decided to create dump file, this time with ignore option, so all insert queries in dump file are "insert ignore", so in case of error like previous one, errors will be ignored. Not very smart, but those data are not really high sensitive, and I can afford to lose few records but can't afford to lose 10 hours for import!

So, dump, gzip, scp, import again ... which lasted long... loooooong .... so fucking long that after 3 hours after I started import I started to think about other solutions. And solution was dumb but effective. We simple moved all MySQL data files (*.MYI, *.MYD i *.frm) from one server to another, and then we run myisamchk in order to fix those tables because ... we didn't shutdown MySQL server during copy procedure which generally could cause some problems with data.

All in all, this procedure went very fast (the longest was actual copy from one host to another), and in less than half hour I got everything settled down.

At the end I came with conclusion that in case of really big tables, using tool like mysqldump is unaccepted as backup solution because it takes literally hours to complete. One of solution to that problem can be to copy MySQL data files like I did. That shouldn't be a problem, because tables are "platform safe" meaning, binary file created on one platform (ie *.MYI file on Linux) will work without problem when moved to other problem (ie. on Windows). The only one real problem is that if you copy data wihout MySQL shutdown, table files there are in some kind of "state of flux" which can be a problem, especialy if they are highly used in moment when you do copy. That's why you need to do myisamchk on it.

Introduction to GeoIP

In case you’ve ever used Google Analytics or any simmilar tool where you can see exact location from where visitors of your site came from, you’ve probably wondered how they were able to dig that info. Is it magic or what? Of course, it’s not kind of magic, exact location of visitor is defined by visitor’s IP address, and technology used to locate user by his IP is well known as GeoIP.

Today we are going to look how to locate visitor of your site with PHP and Max Mind’s GeoIP database. In examples bellow we used free(lite) versions of GeoIP databases, because fully supported GeoIP databases are not free(you’d have to pay $50USD setup + $12USD update for GeoIP Country and $370USD + $90USD for GeoIP City base). Drawback of lite version is that it is not as accurate as fully supported GeoIP databases, but it is still very usefull and probably good enough for great majority of live projects.

MaxMind offer API for dozen of programming languages (full list is available here), details about PHP API are available here. This tutorial deal with so called “Pure PHP API”, there are also PECL extensions and apache mod_geoip modul available. Apache modul provide better perfomance, but Pure PHP API is easier to set up.

Just for a start let’s download all PHP API files from http://www.maxmind.com/download/geoip/api/php/, and save them somewhere inside of your Web tree(let say /htdocs/geoip). To use GeoIP Country you need to download lite database from here, and for GeoLiteCity download database from here. Just for the sake of simplicity, we are going to unpack both bases to the same dir where we saved our PHP API’s files (/htods/geoip in our example).

GeoIP Country
——————————–
Now, let’s see how country detection works:

<?php
/**
 * GeoIP Country Database Example
 *
 * @version $Id$
 * @package geoip
 * @copyright © 2006 Lampix.net
 * @author Dragan Dinic <dinke@lampix.net>
 */

require_once("geoip.inc");

$gi = geoip_open("GeoIP.dat", GEOIP_STANDARD);

$ip = $_SERVER['REMOTE_ADDR'];
//if you test on localhost use IP bellow for test
//since $_SERVER['REMOTE_ADDR'] would be 127.0.0.1
//$ip = "89.216.226.174";

$country_name = geoip_country_name_by_addr($gi, $ip);
$country_code = geoip_country_code_by_addr($gi, $ip);
if($country_name)
{
	echo "Your country is: $country_name <br />";
	echo "Country Code is: $country_code <br />";
}
else
{
	echo "Sorry, we weren't able to locate you.";
}

geoip_close($gi);
?>

So, at the beggining we’ve included geoip.inc which contains all functions needed to use GeoIP country database, then we’ve created new instance of GeoIP class with geoip_open function, and at the end we called proper functions(geoip_country_name_by_addr and geoip_country_code_by_addr) to get country name/code in which detected IP address reside. Again, in case you test localy, don’t use $_SERVER[‘REMOTE_ADDR’].

When you run script above you should get something like this as output:

Your country is: Serbia and Montenegro
Country Code is: CS

GeoIP City
—————————-
Now, let’s extend visitor’s country data with exact location(like city, postal code etc.)

<?php
/**
 * GeoIP City Database Example
 *
 * @version $Id$
 * @package geoip
 * @copyright © 2006 Lampix.net
 * @author Dragan Dinic <dinke@lampix.net>
 */

require_once("geoipcity.inc");

$gi = geoip_open("GeoLiteCity.dat", GEOIP_STANDARD);

$ip = $_SERVER['REMOTE_ADDR'];
//if you test on localhost use IP bellow for test
//since $_SERVER['REMOTE_ADDR'] would be 127.0.0.1
//$ip = "89.216.226.174";

$record = geoip_record_by_addr($gi, $ip);

if(!$record)
{
	echo "Sorry, we weren't able to locate you.";
}
else
{
	echo "Country: " .$record->country_name . "<br />";
	echo "Country Code: " . $record->country_code . "<br />";
	echo "Country Code 2: " . $record->country_code3 . "<br />";
	echo "Region: " .$record->region . "<br />";
	echo "City: " .$record->city . "<br />";
	echo "Postal Code: " .$record->postal_code . "<br />";
	echo "Latitude: " .$record->latitude . "<br />";
	echo "Longitude: " .$record->longitude . "<br />";
	echo "DMA Code: " .$record->dma_code . "<br />";
	echo "Area Code: " .$record->area_code . "<br />";
}

geoip_close($gi);
?>

As you see, PHP code is simmilar as in our country detection example, with exception that we used geoipcity.inc and GeoLiteCity.dat database. Function geoip_record_by_addr($gi, $ip) return instance of ‘geoiprecord’ class which contains in it’s properties location’s data we used in our example. After you run script you should get output like this one:

Country: Serbia and Montenegro
Country Code: CS
Country Code 2: SCG
Region: 02
City: Beograd
Postal Code: 11000
Latitude: 44.8186
Longitude: 20.4681
DMA Code:
Area Code:

CaseStudy – Redirection depending of Country
————————————————————–
At the end, we are going to see some real GeoIP usage. Our goal is to redirect users on multy language site(blog) to proper language section on the site depending of their location. Here is how code looks like on my own blog:

<?php
/**
 * Case Study - GeoIP Redirection
 *
 * @version $Id$
 * @package geoip
 * @copyright © 2006 Lampix.net
 * @author Dragan Dinic <dinke@lampix.net>
 */

require_once("geoip/geoip.inc");

$gi = geoip_open("geoip/GeoIP.dat",GEOIP_STANDARD);

$country_code = geoip_country_code_by_addr($gi, $_SERVER['REMOTE_ADDR']);

geoip_close($gi);

if($country_code == 'CS')
{
        header("HTTP/1.1 301 Moved Permanently");
        header('Location: http://www.dinke.net/blog/sr/');
}
else
{
        header("HTTP/1.1 301 Moved Permanently");
        header('Location: http://www.dinke.net/blog/en/');
}
?>

Above example is used on this blog in order to redirect all users located out of Serbia to english version of the blog. Sending custom 301 redirection headers is important so bots(like google etc. google) are able to index blog pages without problems.

PHP 5.2 upload progress meter

Yesterday I’ve spent considerable amount of time in order to find out more about the most interesting new PHP 5.2 feature – hook for upload progress meter. Except for this link I haven’t found anything else, no php source code example how to make one.

However, after I looked at internal php mailing list archive, I’ve found this thread, still no php data found, but Rasmus mentioned link with an example at http://progphp.com/progress.php. I immediately tried upload and it looked very cool. Then I looked at html source code and noticed one unusual thing there: APC_UPLOAD_PROGRESS hidden field inside of html source. I knew it must be important so I’ve googled for it, and insteresting enough first result was source code of Rasmus example above 🙂

So, I took complete source (I figured out later it is upload meter example made by Rasmus Lerdorf) and quickly tried to make it working under my fresh new installed PHP 5.2.0. Unfortunately, it didn’t work since it needed apc stuff installed. After I looked at apc documentation, I found that I need to grab it from list of pecl dll’s for windows php 5.2 version. Unfortunately, after I’ve downloaded it from here I’ve noticed apc dll is missing there ?!!

Again, I had to google for php_apc.dll and after a while found needed dll available at http://pecl4win.php.net/ext.php/php_apc.dll. In order to make it working, you have to save dll file under php/ext dir (i.e. c:\php\ext on windows) and put this to php.ini:

extension=php_apc.dll

Unfortunately, it still didn’t work, so I’ve looked at apc docs further. Finally on this page I’ve found apc have new “special feature” which is directly related to our new upload feature.

    apc.rfc1867             RFC1867 File Upload Progress hook handler is only available
                            if you compiled APC against PHP 5.2.0 or later.  When enabled
                            any file uploads which includes a field called
                            APC_UPLOAD_PROGRESS before the file field in an upload form
                            will cause APC to automatically create an upload_
                            user cache entry where  is the value of the
                            APC_UPLOAD_PROGRESS form entry.
                            (Default: 0)

After I figured out on my phpinfo page apc.rfc1867 setting is turned off, I’ve added

apc.rfc1867 = on 

in php.ini, and after restart was finally able to enjoy new fancy upload progress meter 🙂

upload progress meter

Btw, upload also depend of json turned on as well, but it was already turned on so I didn’t have any more problems.

About the code Rasmus used in his example, I am tired to analyze it more now, but obviously it use Yahoo! User Interface Library to create progress bar and json/apc to control it from php during file upload.

I hope this will be helpful for someone. Enjoy 😉

Why 09/11 Happened to USA

Few days ago I watched on history channel an documentary about Berlin Blockade in 1948. On Jun 24, 1948 Soviets started total blockade of Western part of Berlin, with goal to spread their control over whole city. All communitations were blocked, and citizens of West Berlin were faced with lack of food, medicine and fuel for upcoming winter.

American response was quick, and starting from Jun 25 they launched a massive airlift using both civil and military aircrafts that flew supplies into the Western Berlin. Aircrafts were fly by volunteers, mostly by experienced WWII pilots who flew day and night, sometimes without sleep. Landing on improvised Berlin airport was very hard (especially during bad weather) – many brave pilots died during this mission.

One of the pilots involved in this mission was Gail Halvorsen. One day during pause between two flights he gave a few sticks of chewing gum to some children watching the planes from outside the base. Wanting to give them more, he promised to drop more candy from his plane the next day. And really, during next flight, soon before landing Halvorsen dropped candy attached to parachutes to children below.

His actions were soon noticed by the press and gained widespread attention. A wave of public support led to donations which enabled Halvorsen and his crew to drop 850 pounds of candy. By the end of the airlift, around 25 plane crews had dropped 23 tons of chocolate, chewing gum, and other candies over various places in Berlin. The action may have had a substantial impact on the postwar perception of Americans in Germany, and it is still pointed to as a symbol of German-American relations.

Yesterday was 09/11, 5 years after terrorist attack on United States. I see many people are looking for cause, why that happend to USA, why USA was picked as a target and why USA is no longer an icon of freedom in modern world.

Not sure about all causes, but I am sure that one of the reasons is that US advocated their freedom in Vietnam, Cambodia, Iraq, Serbia etc. with bombs and not with candies.

Curl HTTP Client

Frequently, in my daytime job I have to fetch data from various url’s, either by sending get or post request, binding to different IP address etc. Long time I used my own socket based HTTP class, although I wasn’t quite happy with perfomances and various other things with it. I already used curl cli tool (mostly for debugging purposes), but didn’t really liked it’s php api, so I’ve decided to take some spare time and make some kind of oop wrapper for it, which should be easier to use for easy stuff like sending get/post request etc.

I use this class several months since then, and it evolved over time whenever I needed some new feature. Since various of my colleagues found it very usefull (some of them even sent me new methods for it), I’ve decided to put it out for public. Recently I submitted a code to phpclasses.org, and today got confirmation that class is officially approved.

Update 03/01/2007
New version 1.1 released with new features: fetch into file, upload, proxy etc.

Update 15/02/2008
New version 1.2 released with few bug fixes. New features are ability to send post string as string argument in send_post_data method, ability to accept gzipped content, close curl session etc.

Update 25/03/2013
Version 2.0 released. Pretty much the same features as old versions, but with updated code in order to match latest PHP changes. Code will be regularly updated on github.

You can download class and example files directly from gitgub https://github.com/dinke/curl_http_client

Here are few usage examples.

<?php
/**
 * @version 2.0
 * @package dinke.net
 * @copyright © 2013 Lampix.net
 * @author Dragan Dinic 
 */

require_once("curl_http_client.php");

$curl = new Curl_HTTP_Client();

//pretend to be Firefox 19.0 on Mac
$useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:19.0) Gecko/20100101 Firefox/19.0";
$curl->set_user_agent($useragent);

//uncomment next two lines if you want to automatically manage cookies
//which will store them to file and send them on each http request
//$cookies_file = "/tmp/cookies.txt";
//$curl->store_cookies($cookies_file);

//Uncomment next line if you want to set credentials for basic http_auth
//$curl->set_credentials($username, $password);

//Uncomment next line if you want to set specific referrer
//$curl->set_referer('http://www.foo.com/referer_url/');

//if you want to send some post data
//form post data array like this one
$post_data = array('login' => 'pera', 'password' => 'joe', 'other_foo_field' => 'foo_value');
//or like a string: $post_data = 'login=pera&password=joe&other_foo_field=foo_value';

//and send request to http://www.foo.com/login.php. Result page is stored in $html_data string
$html_data = $curl->send_post_data("http://www.foo.com/login.php", $post_data);

//You can also fetch data from somewhere using get method!
//Fetch html from url
$html_data = $curl->fetch_url("http://www.foo.com/foobar.php?login=pera&password=joe&other_foo_field=foo_value");

//if you have more than one IP on your server,
//you can also bind to specific IP address like ...
//$bind_ip = "192.168.0.1";
//$curl->fetch_url("http://www.foo.com/login.php", $bind_ip);
//$html_data = $curl->send_post_data("http://www.foo.com/login.php", $post_data, $bind_ip);

//and there are many other things you can do like

//use proxy
//$curl->set_proxy('http://www.proxyurl.com');

//get http response code for last request
//$http_code = $curl->get_http_response_code();

//get last http request duration in sec
//$duration = $curl->get_request_duration();
?>

MySQL Full-Text Searches

Today I am presenting you with a third and the last in tutorial series about “rarely used MySQL features”.

In first and second tutorials we spoke about transactions and referential integrity – features that are available in InnoDB and BDB tables only. Today we are going to deal with “Full-Text Search”, which is unlike features above available in MyISAM tables only (which is MySQL default engine).

So, let say you’ve just created one database driven site (forum, blog, cms, whatever) and you need to make it searchable. That shouldn’t be too complicated, so withot thinking much, you would probably come to easy solution like:

select * from foo_table where foo_text_field like '%search_text%'

Maybe include more fields in search, let say some dynamically generated from php form, and that should be it, right ? Hmmm not exactly.

Method we used above have one huge disadvantage – it is not able to use indexes! Because the fact that search pattern starts with joker ‘%’, query can’t use indexes even though you’ve defined them in mysql table fields used in search. No indexes means slow lookup (becasue search query have to pass trough all records in your tables). Fortunately, there is solution to this problem and it’s name is Full-Text Search.

Full-Text Search allow you efficient search trough text fields (CHAR, VARCHAR, TEXT), by using so called Full-Text Indexes.

There are three types of full-text searches:
Natural language search
Search pattern is splitted into words, so as a result you got only those records containings those words.

Bolean mode seach
Search pattern is splitted into words, but for each word you can add logic operator which allow you to dig only those records which contain or not contain some word.

Expansion Search
This is search in two phases. First one is identical to natural language search. In second phase words from the most relevant rows returned by the search are added to the search string and the search is done again.

During search, those words that are present in more than 50% of records are ignored. Also ignored are those so called “stopwords” (like “the”, “and”, etc.), as well as words shorter than 4 chars. You can find a list with all stopwords in corresponding section of MySQL Manual, and bear in mind that you can change those by changing path to file with stopwords. You can find a way how to do that here.

Now, lets see few examples. In order to easily illustrate full text usage, we are going to create database with “famous wrong predictions“.

So, lets copy/paste text bellow and save it to your computer in a file wrong_prediction.txt.

Albert Einstein, 1974|There is not the slightest indication that nuclear energy will ever be obtainable. It would mean that the atom would have to be shattered at will
Margaret Thatcher, 1974|It will be years - not in my time - before a woman will become a Prime Minister.
Alexander Graham Bell, c.1880.|One day there will be a telephone in every major city in the USA.
Popular Mechanics, 1949|Computers in the future may weigh no more than 1.5 tons.
Thomas Watson, IBM Computers,1943|I think there is a world market for as many as 5 computers.
Marshal Foch, France, 1912|Aircraft are interesting toys, but of no military value.

Now, lets create proper table, import predictions, and add indexes. Start MySQL command line client (you have to star it from the same dir where you saved your wrong predictions file), and type (or copy/paste) this:

create table wrong_predictions(
who varchar(100) not null,
prediction text not null
) engine = MyISAM;

load data local infile 'wrong_predictions.txt' into table wrong_predictions
fields terminated by '|' lines terminated by 'rn';

ALTER TABLE wrong_predictions
ADD FULLTEXT (who),
ADD FULLTEXT (prediction),
ADD FULLTEXT (who, prediction);

As you can see, first we’ve imported data into table, then we’ve created indexes. The reason why we didn’t create indexes during table creation is perfomance – insert is always slower if table have indexes created. Of course, you are not going to see any difference here, but pay attention to that when you import some million record file 🙂

Now, let’s see how Natural Search works:

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who) AGAINST('Thatcher');
+-------------------------+---------------------------------------------------------------------------------+
| who                     | prediction                                                                      |
+-------------------------+---------------------------------------------------------------------------------+
| Margaret Thatcher, 1974 | It will be years - not in my time - before a womanill become a Prime Minister.  |
+-------------------------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>  SELECT * FROM wrong_predictions WHERE MATCH(who,prediction) AGAINST('computers');
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
| Popular Mechanics, 1949 | Computers in the future may weigh no more than 1.5 tons.    |
+-------------------------+-------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql>

When using MATCH in where clause, order of columns in output is defined with relevance of matched records. We can see that if we issue this query:

mysql> SELECT prediction, MATCH(prediction) AGAINST('computers') AS relevance FROM wrong_predictions;
+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------+
| prediction                                                                                                                                        | relevance        |
+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------+
| There is not the slightest indication that nuclear energy will ever be obtainable. It would mean that the atom would have to be shattered at will | 0                |
| It will be years - not in my time - before a woman will become a Prime Minister.                                                                  | 0                |
| One day there will be a telephone in every major city in the USA.                                                                                 | 0                |
| Computers in the future may weigh no more than 1.5 tons.                                                                                          | 0.66266459031789 |
| I think there is a world market for as many as 5 computers.                                                                                       | 0.67003110026735 |
| Aircraft are interesting toys, but of no military value.                                                                                          | 0                |
+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------+
6 rows in set (0.00 sec)

Boolean Mode allow us to make seach pattern in a simmilar way as in some search engines on Internet, since you can add words that must be presented (+foo) as well as words that must not be there (-foo).

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST('+computers -popular' IN BOOLEAN MODE);
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
+-------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

You can also use jocker like:
pre>mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST(‘comp*’ IN BOOLEAN MODE);
+————————-+————————————————————-+
| who | prediction |
+————————-+————————————————————-+
| Popular Mechanics, 1949 | Computers in the future may weigh no more than 1.5 tons. |
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
+————————-+————————————————————-+
2 rows in set (0.00 sec)

Full description of boolean mode is available in coresponding section of MySQL Manual.

And last but not least, let’s see how Expansion Search works:

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST('market');
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
+-------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST('market' WITH QUERY EXPANSION);
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
| Popular Mechanics, 1949 | Computers in the future may weigh no more than 1.5 tons.    |
+-------------------------+-------------------------------------------------------------+
2 rows in set (0.00 sec)

In first query we’ve looked for word ‘market’ with natural seach, not really needed for query expansion, but usefull so we can track how exactly QUERY EXPANSION works. In second query we used QUERY EXPANSION which works in two phases. First it run natural search (like we did in first query), and then in 2nd phase it form new search phrase with original search phrase (market) concatenated with most highly relevant records we got from first phase. As a result we get more records than with natural search that are relevant with original search. More info about expansion search you find in MySQL Manual.

This is all for this introductory tutorial. Examples I used here serve only to “get your feet wet”, for full explanation of MySQL Full Text indexes you might want to consult MySQL Manual, or some book on subject (I recommend MySQL by Paul Dubois).

MySQL::Foreign Keys and Referential Integrity

Today we are going to continue with exploring those “rarely used” MySQL features. In previous post we were talking about MySQL Transactions and how to use them. Today our focus is on Foreign Keys and Referential Integrity.

Like with transactions, referential integrity support is not available in MyISAM tables, so InnoDB tables has to be used. Many developers probably don’t know that this feature is available since version 3.23.44, which is available since 2001. Anyway, let’s go with a (very) little theory, then we are going to move to some practice.

A foreign key (FK) is a field or group of fields in a database record that point to a key field or group of fields forming a key of another database record in some (usually different) table. The rule of “Referential Integrity” says that foreign key must not have a value which doesn’t exist in table where it refers to. That’s all about theory from me (I told you it will be very short :). If you need more theory feel free to use google for more informations. Now lets see some examples.

Imagine a database with cell phones, which among other things contains tables with phone vendors and phones. Let say something like this:

Vendors
---------------------
ID     Vendor
1.     Nokia
2.     Sony-Ericson
3.     Samsung
4.     Siemens

Phones
-------------------------------
ID.     Model     VID
1.      6600       1
2.      6630       1
3.      3650       1
4.      p800       2
5.      p900       2
...

VID column in ‘phones’ table is in direct relation with ID column of table ‘vendors’, in other words in this particular case VID column is so called foreign key. VID values in our example are related with smarth phones of vendors Nokia (6600, 6630 i 3650) and Sony-Ericson (p800, p900).
In order to keep so called referential integrity of our tables, we need to make sure that values of VID columns of table phones contains only those values that exists in ID column of vendors table. In case we remove some particular vendor from first table, all records referencing to it from phones table must be removed as well (this is known as cascade delete). Simmilar stands for update and insert operations, which means if we update ID in first table, foreign key VID must be updated too. Also, during inserting records into 2nd (phones) table we need to take care that VID value have proper reference in ID column of vendors table. If your RDBMS doesn’t have support for Referential Integrity you have to take care about all these things. Fortunately, by using MySQL InnoDB tables, we can let MySQL do da job for us.

Foreign Key is set during initial table creation. Syntax looks like this:

[CONSTRAINT symbol]
FOREIGN KEY [id] (index_col_name, ...)
REFERENCES tbl_name (index_col_name, ...)
[ON DELETE {RESTRICT | CASCADE | SET NULL | NO ACTION}]
[ON UPDATE {RESTRICT | CASCADE | SET NULL | NO ACTION}]

CONSTRAINT symbol is optional constraint name for this foreign key.
FOREIGN KEY stands for table column which represent foreign key
REFERENCES tbl_name (index_col_name, …) stands for table and exact column where foreign key is referencing
ON DELETE and ON UPDATE defines actions that will be taken in case of delete/update in main table. Possible values are:
RESTRICT deleting records in main table will be refused if there are foreign keys in 2nd table referencing to them
CASCADEdeleting records in main table will result in deleting related records in 2nd table (table with foreign key)
SET NULL simmilar to CASCADE except that records in 2nd table will not be removed, they will be set to NULL instead

In order to demonstrate concept, I am going to create database with cell phones I mentioned above.

create table vendors(
	id int unsigned primary key not null auto_increment,
	vendor varchar(255) not null unique
	) engine = InnoDB;


create table phones(
	id int unsigned primary key not null auto_increment,
	model varchar(255) not null,
	vid int unsigned not null,
	foreign key(vid) references vendors(id)
		on delete cascade
		on update cascade
	) engine = InnoDB;

insert into vendors(vendor) values ('Nokia'),('Sony-Ericson'),('Samsung'),('Siemens');
insert into phones (model,vid) values ('6600',1),('6630',1),('3650',1),('p800',2),('p900',2);

In this example we used option cascade, which means that deleting any record in vendors table will result in removing all records from phones table that refer to them. Now, let try to do some “cheating”, by trying to insert phone with unexistant vendor ID:

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  1 | Nokia        |
|  3 | Samsung      |
|  4 | Siemens      |
|  2 | Sony-Ericson |
+----+--------------+
4 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  1 | 6600  |   1 |
|  2 | 6630  |   1 |
|  3 | 3650  |   1 |
|  4 | p800  |   2 |
|  5 | p900  |   2 |
+----+-------+-----+
5 rows in set (0.02 sec)

mysql> insert into phones(model,vid) values ('unexistant',5);
ERROR 1216 (23000): Cannot add or update a child row: a foreign key constraint fails
mysql>

As you see, it didn’t work. We’ve tried to insert new phone model into phones table, but with vendor id (vid) 5 which doesn’t exist in vendor table, so MySQL complained in order to keep referential integrity of database.

Now, let see how cascade delete/update works:

mysql> update vendors set id=5 where vendor='sony-ericson';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  1 | Nokia        |
|  3 | Samsung      |
|  4 | Siemens      |
|  5 | Sony-Ericson |
+----+--------------+
4 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  1 | 6600  |   1 |
|  2 | 6630  |   1 |
|  3 | 3650  |   1 |
|  4 | p800  |   5 |
|  5 | p900  |   5 |
+----+-------+-----+
5 rows in set (0.00 sec)
mysql>

So, we’ve updated vendor id (vid in vendors table) for ‘sony-ericsson’ to value 5. Thanks to on update cascade options we used during table creation, value of changed field is also updated for all ‘sony-ericsson’ phones. Simmilar would happen if we try to delete some vendor, let’s try Nokia:

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  1 | Nokia        |
|  3 | Samsung      |
|  4 | Siemens      |
|  5 | Sony-Ericson |
+----+--------------+
4 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  1 | 6600  |   1 |
|  2 | 6630  |   1 |
|  3 | 3650  |   1 |
|  4 | p800  |   5 |
|  5 | p900  |   5 |
+----+-------+-----+
5 rows in set (0.00 sec)

mysql> delete from vendors where vendor='Nokia';
Query OK, 1 row affected (0.03 sec)

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  3 | Samsung      |
|  4 | Siemens      |
|  5 | Sony-Ericson |
+----+--------------+
3 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  4 | p800  |   5 |
|  5 | p900  |   5 |
+----+-------+-----+
2 rows in set (0.00 sec)
mysql>

So, as you see all precious Nokia phones are deleted from phones table, since we’ve removed Nokia from vendors table.

This is all for today. I think this tutorial should be enough for introduction, for more information you should consult proper section of MySQL manual.