Skip to main content

MySQL Full-Text Searches

Today I am presenting you with a third and the last in tutorial series about “rarely used MySQL features”.

In first and second tutorials we spoke about transactions and referential integrity – features that are available in InnoDB and BDB tables only. Today we are going to deal with “Full-Text Search”, which is unlike features above available in MyISAM tables only (which is MySQL default engine).

So, let say you’ve just created one database driven site (forum, blog, cms, whatever) and you need to make it searchable. That shouldn’t be too complicated, so withot thinking much, you would probably come to easy solution like:

select * from foo_table where foo_text_field like '%search_text%'

Maybe include more fields in search, let say some dynamically generated from php form, and that should be it, right ? Hmmm not exactly.

Method we used above have one huge disadvantage – it is not able to use indexes! Because the fact that search pattern starts with joker ‘%’, query can’t use indexes even though you’ve defined them in mysql table fields used in search. No indexes means slow lookup (becasue search query have to pass trough all records in your tables). Fortunately, there is solution to this problem and it’s name is Full-Text Search.

Full-Text Search allow you efficient search trough text fields (CHAR, VARCHAR, TEXT), by using so called Full-Text Indexes.

There are three types of full-text searches:
Natural language search
Search pattern is splitted into words, so as a result you got only those records containings those words.

Bolean mode seach
Search pattern is splitted into words, but for each word you can add logic operator which allow you to dig only those records which contain or not contain some word.

Expansion Search
This is search in two phases. First one is identical to natural language search. In second phase words from the most relevant rows returned by the search are added to the search string and the search is done again.

During search, those words that are present in more than 50% of records are ignored. Also ignored are those so called “stopwords” (like “the”, “and”, etc.), as well as words shorter than 4 chars. You can find a list with all stopwords in corresponding section of MySQL Manual, and bear in mind that you can change those by changing path to file with stopwords. You can find a way how to do that here.

Now, lets see few examples. In order to easily illustrate full text usage, we are going to create database with “famous wrong predictions“.

So, lets copy/paste text bellow and save it to your computer in a file wrong_prediction.txt.

Albert Einstein, 1974|There is not the slightest indication that nuclear energy will ever be obtainable. It would mean that the atom would have to be shattered at will
Margaret Thatcher, 1974|It will be years - not in my time - before a woman will become a Prime Minister.
Alexander Graham Bell, c.1880.|One day there will be a telephone in every major city in the USA.
Popular Mechanics, 1949|Computers in the future may weigh no more than 1.5 tons.
Thomas Watson, IBM Computers,1943|I think there is a world market for as many as 5 computers.
Marshal Foch, France, 1912|Aircraft are interesting toys, but of no military value.

Now, lets create proper table, import predictions, and add indexes. Start MySQL command line client (you have to star it from the same dir where you saved your wrong predictions file), and type (or copy/paste) this:

create table wrong_predictions(
who varchar(100) not null,
prediction text not null
) engine = MyISAM;

load data local infile 'wrong_predictions.txt' into table wrong_predictions
fields terminated by '|' lines terminated by 'rn';

ALTER TABLE wrong_predictions
ADD FULLTEXT (who),
ADD FULLTEXT (prediction),
ADD FULLTEXT (who, prediction);

As you can see, first we’ve imported data into table, then we’ve created indexes. The reason why we didn’t create indexes during table creation is perfomance – insert is always slower if table have indexes created. Of course, you are not going to see any difference here, but pay attention to that when you import some million record file 🙂

Now, let’s see how Natural Search works:

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who) AGAINST('Thatcher');
+-------------------------+---------------------------------------------------------------------------------+
| who                     | prediction                                                                      |
+-------------------------+---------------------------------------------------------------------------------+
| Margaret Thatcher, 1974 | It will be years - not in my time - before a womanill become a Prime Minister.  |
+-------------------------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>  SELECT * FROM wrong_predictions WHERE MATCH(who,prediction) AGAINST('computers');
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
| Popular Mechanics, 1949 | Computers in the future may weigh no more than 1.5 tons.    |
+-------------------------+-------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql>

When using MATCH in where clause, order of columns in output is defined with relevance of matched records. We can see that if we issue this query:

mysql> SELECT prediction, MATCH(prediction) AGAINST('computers') AS relevance FROM wrong_predictions;
+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------+
| prediction                                                                                                                                        | relevance        |
+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------+
| There is not the slightest indication that nuclear energy will ever be obtainable. It would mean that the atom would have to be shattered at will | 0                |
| It will be years - not in my time - before a woman will become a Prime Minister.                                                                  | 0                |
| One day there will be a telephone in every major city in the USA.                                                                                 | 0                |
| Computers in the future may weigh no more than 1.5 tons.                                                                                          | 0.66266459031789 |
| I think there is a world market for as many as 5 computers.                                                                                       | 0.67003110026735 |
| Aircraft are interesting toys, but of no military value.                                                                                          | 0                |
+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------+
6 rows in set (0.00 sec)

Boolean Mode allow us to make seach pattern in a simmilar way as in some search engines on Internet, since you can add words that must be presented (+foo) as well as words that must not be there (-foo).

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST('+computers -popular' IN BOOLEAN MODE);
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
+-------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

You can also use jocker like:
pre>mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST(‘comp*’ IN BOOLEAN MODE);
+————————-+————————————————————-+
| who | prediction |
+————————-+————————————————————-+
| Popular Mechanics, 1949 | Computers in the future may weigh no more than 1.5 tons. |
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
+————————-+————————————————————-+
2 rows in set (0.00 sec)

Full description of boolean mode is available in coresponding section of MySQL Manual.

And last but not least, let’s see how Expansion Search works:

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST('market');
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
+-------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM wrong_predictions WHERE MATCH(who,prediction)
-> AGAINST('market' WITH QUERY EXPANSION);
+-------------------------+-------------------------------------------------------------+
| who                     | prediction                                                  |
+-------------------------+-------------------------------------------------------------+
| Thomas Watson, IBM,1943 | I think there is a world market for as many as 5 computers. |
| Popular Mechanics, 1949 | Computers in the future may weigh no more than 1.5 tons.    |
+-------------------------+-------------------------------------------------------------+
2 rows in set (0.00 sec)

In first query we’ve looked for word ‘market’ with natural seach, not really needed for query expansion, but usefull so we can track how exactly QUERY EXPANSION works. In second query we used QUERY EXPANSION which works in two phases. First it run natural search (like we did in first query), and then in 2nd phase it form new search phrase with original search phrase (market) concatenated with most highly relevant records we got from first phase. As a result we get more records than with natural search that are relevant with original search. More info about expansion search you find in MySQL Manual.

This is all for this introductory tutorial. Examples I used here serve only to “get your feet wet”, for full explanation of MySQL Full Text indexes you might want to consult MySQL Manual, or some book on subject (I recommend MySQL by Paul Dubois).

MySQL::Foreign Keys and Referential Integrity

Today we are going to continue with exploring those “rarely used” MySQL features. In previous post we were talking about MySQL Transactions and how to use them. Today our focus is on Foreign Keys and Referential Integrity.

Like with transactions, referential integrity support is not available in MyISAM tables, so InnoDB tables has to be used. Many developers probably don’t know that this feature is available since version 3.23.44, which is available since 2001. Anyway, let’s go with a (very) little theory, then we are going to move to some practice.

A foreign key (FK) is a field or group of fields in a database record that point to a key field or group of fields forming a key of another database record in some (usually different) table. The rule of “Referential Integrity” says that foreign key must not have a value which doesn’t exist in table where it refers to. That’s all about theory from me (I told you it will be very short :). If you need more theory feel free to use google for more informations. Now lets see some examples.

Imagine a database with cell phones, which among other things contains tables with phone vendors and phones. Let say something like this:

Vendors
---------------------
ID     Vendor
1.     Nokia
2.     Sony-Ericson
3.     Samsung
4.     Siemens

Phones
-------------------------------
ID.     Model     VID
1.      6600       1
2.      6630       1
3.      3650       1
4.      p800       2
5.      p900       2
...

VID column in ‘phones’ table is in direct relation with ID column of table ‘vendors’, in other words in this particular case VID column is so called foreign key. VID values in our example are related with smarth phones of vendors Nokia (6600, 6630 i 3650) and Sony-Ericson (p800, p900).
In order to keep so called referential integrity of our tables, we need to make sure that values of VID columns of table phones contains only those values that exists in ID column of vendors table. In case we remove some particular vendor from first table, all records referencing to it from phones table must be removed as well (this is known as cascade delete). Simmilar stands for update and insert operations, which means if we update ID in first table, foreign key VID must be updated too. Also, during inserting records into 2nd (phones) table we need to take care that VID value have proper reference in ID column of vendors table. If your RDBMS doesn’t have support for Referential Integrity you have to take care about all these things. Fortunately, by using MySQL InnoDB tables, we can let MySQL do da job for us.

Foreign Key is set during initial table creation. Syntax looks like this:

[CONSTRAINT symbol]
FOREIGN KEY [id] (index_col_name, ...)
REFERENCES tbl_name (index_col_name, ...)
[ON DELETE {RESTRICT | CASCADE | SET NULL | NO ACTION}]
[ON UPDATE {RESTRICT | CASCADE | SET NULL | NO ACTION}]

CONSTRAINT symbol is optional constraint name for this foreign key.
FOREIGN KEY stands for table column which represent foreign key
REFERENCES tbl_name (index_col_name, …) stands for table and exact column where foreign key is referencing
ON DELETE and ON UPDATE defines actions that will be taken in case of delete/update in main table. Possible values are:
RESTRICT deleting records in main table will be refused if there are foreign keys in 2nd table referencing to them
CASCADEdeleting records in main table will result in deleting related records in 2nd table (table with foreign key)
SET NULL simmilar to CASCADE except that records in 2nd table will not be removed, they will be set to NULL instead

In order to demonstrate concept, I am going to create database with cell phones I mentioned above.

create table vendors(
	id int unsigned primary key not null auto_increment,
	vendor varchar(255) not null unique
	) engine = InnoDB;


create table phones(
	id int unsigned primary key not null auto_increment,
	model varchar(255) not null,
	vid int unsigned not null,
	foreign key(vid) references vendors(id)
		on delete cascade
		on update cascade
	) engine = InnoDB;

insert into vendors(vendor) values ('Nokia'),('Sony-Ericson'),('Samsung'),('Siemens');
insert into phones (model,vid) values ('6600',1),('6630',1),('3650',1),('p800',2),('p900',2);

In this example we used option cascade, which means that deleting any record in vendors table will result in removing all records from phones table that refer to them. Now, let try to do some “cheating”, by trying to insert phone with unexistant vendor ID:

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  1 | Nokia        |
|  3 | Samsung      |
|  4 | Siemens      |
|  2 | Sony-Ericson |
+----+--------------+
4 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  1 | 6600  |   1 |
|  2 | 6630  |   1 |
|  3 | 3650  |   1 |
|  4 | p800  |   2 |
|  5 | p900  |   2 |
+----+-------+-----+
5 rows in set (0.02 sec)

mysql> insert into phones(model,vid) values ('unexistant',5);
ERROR 1216 (23000): Cannot add or update a child row: a foreign key constraint fails
mysql>

As you see, it didn’t work. We’ve tried to insert new phone model into phones table, but with vendor id (vid) 5 which doesn’t exist in vendor table, so MySQL complained in order to keep referential integrity of database.

Now, let see how cascade delete/update works:

mysql> update vendors set id=5 where vendor='sony-ericson';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  1 | Nokia        |
|  3 | Samsung      |
|  4 | Siemens      |
|  5 | Sony-Ericson |
+----+--------------+
4 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  1 | 6600  |   1 |
|  2 | 6630  |   1 |
|  3 | 3650  |   1 |
|  4 | p800  |   5 |
|  5 | p900  |   5 |
+----+-------+-----+
5 rows in set (0.00 sec)
mysql>

So, we’ve updated vendor id (vid in vendors table) for ‘sony-ericsson’ to value 5. Thanks to on update cascade options we used during table creation, value of changed field is also updated for all ‘sony-ericsson’ phones. Simmilar would happen if we try to delete some vendor, let’s try Nokia:

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  1 | Nokia        |
|  3 | Samsung      |
|  4 | Siemens      |
|  5 | Sony-Ericson |
+----+--------------+
4 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  1 | 6600  |   1 |
|  2 | 6630  |   1 |
|  3 | 3650  |   1 |
|  4 | p800  |   5 |
|  5 | p900  |   5 |
+----+-------+-----+
5 rows in set (0.00 sec)

mysql> delete from vendors where vendor='Nokia';
Query OK, 1 row affected (0.03 sec)

mysql> select * from vendors;
+----+--------------+
| id | vendor  |
+----+--------------+
|  3 | Samsung      |
|  4 | Siemens      |
|  5 | Sony-Ericson |
+----+--------------+
3 rows in set (0.00 sec)

mysql> select * from phones;
+----+-------+-----+
| id | model | vid |
+----+-------+-----+
|  4 | p800  |   5 |
|  5 | p900  |   5 |
+----+-------+-----+
2 rows in set (0.00 sec)
mysql>

So, as you see all precious Nokia phones are deleted from phones table, since we’ve removed Nokia from vendors table.

This is all for today. I think this tutorial should be enough for introduction, for more information you should consult proper section of MySQL manual.

MySQL Transactions

As I announced in my previous post, from today I am starting with series of tutorials about those rarely used MySQL features like transactions, referential integrity, fulltext searches etc. Developers working with MySQL don’t use these features very much. Many of them don’t know that those features even exists, so very often we have a situation that on various advocacy like discussions MySQL is declared as “uncomplete” database, because it lack support for above features. However, a lot of people doesn’t know that MySQL support transactions since version 3.23.15 (which is out since May 2000). Simmilar stands with other features I mentioned above.

My goal in this and follow up tutorials is to break some prejudices about MySQL, as well to encourage developers to start using all features available in their favorite RDBMS.

Today we are starting with transactions.

In SQL parlance, transaction stands for series of sql queries that should be run as a whole. If all transaction queries are succesfully executed, transaction is succesfull, otherwise transaction has failed, and we are reverting to previous state before transaction began. Many people think that transaction is something money related, so I will use such example to describe this whole thing in easiest possible way.

Imagine a situation where John Doe needs to pay it’s client Mickey a 1000USD for some service. Here is how this situation can be described with tipical MySQL parlance:

update bank_accounts set amount = amount - 1000
where user = 'JohnDoe';

update bank_accounts set amount = amount + 1000
where user = 'Mickey';

Now, what if some of queries above has failed? We would have a situation that John left short for 1000 bucks without to paying Mickey for his services, or that Mickey get free credit from his bank in case first query is not done. By using transactions in case some error occured, system is reverting back to previous state (immediately before transaction started), so in this particular scenario, both queries or none will be executed.

RDBMS’s with transaction capability typically needs to provide 4 behaviours, which are usually refered as ACID:
Atomicity
Either all of the tasks of a transaction are performed or none of them are.
Consistency
Database is in a legal state before as well as after transaction end.
Isolation
Transaction doesn’t have any effect to other processes. That also means that effect of single queries are not visible to other clients untill transaction is finished.
Durability
Once transaction success, it persist and can’t be undone.

Now, enough theory, let’s do some real life usage. Syntax you will see here is customized to work with MySQL 4.1.x. Depending of your version of MySQL server, you might need to make certain changes. If you want to find out which version you use, run following query in your mysql client program:

select version();

Using transactions is currently possible only with InnoDB and BDB tables. If you’re unsure whether your server supports them, use show engines query which will list supported table types. If you need to enable support for InnoDB (BDB) tables, please consult MySQL Manual how to do that (or contact support if you use MySQL server on your hosting account). In our examples here we are going to use InnoDB tables, but the same principle applies to BDB tables as well.

For start, let’s create one test table:

create table test(
name char(20) not null unique)
engine = innodb;

And let’s try to use one transaction:

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test(name) values('John');
Query OK, 1 row affected (0.01 sec)

mysql> insert into test(name) values('Mickey');
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from test;
+--------+
| name   |
+--------+
| John   |
| Mickey |
+--------+
2 rows in set (0.02 sec)

As you can see, start transaction is used for beggining of transaction. After 2 succesfully executed queries we used commit for the end of transaction, which is done succesfully in this case. Important thing to note is that none of changes during transaction (let say inserting new name John) were not visible to the other clients until commit has executed. That’s Isolation property I mentioned above.

Now, lets look at one failed transaction:

mysql> select * from test;
+--------+
| name   |
+--------+
| John   |
| Mickey |
+--------+
2 rows in set (0.02 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test(name) values('Peter');
Query OK, 1 row affected (0.00 sec)

mysql> insert into test(name) values('John');
ERROR 1062 (23000): Duplicate entry 'John' for key 1
mysql> rollback;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from test;
+--------+
| name   |
+--------+
| John   |
| Mickey |
+--------+
2 rows in set (0.00 sec)

In this case during transaction error has occurred, because we’ve tried to enter duplicate value (‘John’) to a field defined as unique. As a result we got an error, so we issued rollback query. As you can see, after rollback table contains the same records as before transaction begun, which mean the name ‘Peter’ we entered during transaction is not left there, as transaction as a whole has failed.

Starting from MySQL 4.1.1 it is possible to set savepoint during transaction so we can revert to it with rollback command. Here is an example:

mysql> select * from test;
+--------+
| name   |
+--------+
| John   |
| Mickey |
+--------+
2 rows in set (0.00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test(name) values('Peter');
Query OK, 1 row affected (0.00 sec)

mysql> savepoint point1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test(name) values('Bryan');
Query OK, 1 row affected (0.00 sec)

mysql> rollback to savepoint point1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test(name) values('Dragan');
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from test;
+--------+
| name   |
+--------+
| John   |
| Mickey |
| Peter  |
| Dragan |
+--------+
4 rows in set (0.00 sec)

As you can see, after we entered first name (Peter) we’ve saved position as point1. Other name ‘Bryan’ is not present since we reverted back to point1. Also after reverting we’ve succesfully entered one more name (Dragan) before finishing transaction with commit.

This is all about transactions in this short tutorial, I hope it contains enough informations to get your feet wet, so you can start using them with MySQL. If you are looking for more informations about transactions, you can look at MySQL Manual or try some good book on subject. My personal recommendation is MySQL 3rd Edition by Paul DuBois.