blog
Full-Text Searches in MySQL: The Good, the Bad and the Ugly
Sometimes when searching in a MySQL database you might want to run full-text search queries against character-based data. Today we’re discussing the advantages and disadvantages of such search methods.
What are Full-Text Searches in MySQL?
Full-text search is a technique that enables you to search for records that might not perfectly match the search criteria. Full-text searches in MySQL are performed when certain indexes are in use and those indexes have many unique nuances including the following:
- In order for the index to be considered a full-text index, the index must be of the FULLTEXT type.
- FULLTEXT indexes can only be used on tables running the InnoDB or MyISAM storage engines.
- FULLTEXT indexes can only be created for CHAR, VARCHAR, or TEXT columns.
- FULLTEXT indexes are only used when the MATCH() AGAINST() clause is used.
- Full-text searches have three modes: the natural language mode, the boolean mode, and the query expansion mode.
A FULLTEXT index is a special type of index that finds keywords in the text instead of comparing the values to the values in the index. Although FULLTEXT searching is different from other types of matching, do note that you can have a BTREE index and a FULLTEXT index on the same column at the same time – they will not conflict because they are suited for different purposes.
Full-Text Search Types
When running full-text searches in MySQL, keep in mind that there are three search types to choose from:
- A natural language search type – such a search mode interprets the search string as a literal phrase. Enabled by default if no modifier is specified or when the IN NATURAL LANGUAGE MODE modifier is specified;
- A query expansion search type – such a search mode performs the search twice. When searching the second time, the result set includes a few most relevant documents from the first search. Enabled using the WITH QUERY EXPANSION modifier;
- A boolean search type – such a search mode enables searching for complex queries that can include boolean operators such as less than (“<”) and more than (“>”) operators, subexpressions (“(” and “)”), the plus (+) sign, the minus (-) sign, double quotes (“”), an operator that lowers the value’s contribution to the results (~) and the wildcard operator (*) – the wildcard operator allows searching with fuzzy matching (for example, “demo*” would also match “demonstration”). Enabled using the IN BOOLEAN MODE modifier.
Full-Text Searches with the Natural Language Search Mode
A natural language search mode, as noted above, is enabled by default or when the IN NATURAL LANGUAGE MODE modifier is specified. This mode performs a natural language search against a given text collection (one or more columns). The basic query format of full-text searches in MySQL should be similar to the following:
SELECT * FROM table WHERE MATCH(column) AGAINST(“string” IN NATURAL LANGUAGE MODE);
When MATCH() is used together with a WHERE clause, the rows are automatically sorted by the highest relevance first. To search for an exact string, enclose it with double quotes.
Full-Text Searches with the Query Expansion Mode
Full-text searches also support the query expansion mode. Such a search mode is frequently used when the user relies on implied knowledge – for example, the user might search for “DBMS” hoping to see both “MongoDB” and “MySQL” in the search results. The reason why the user might be able to rely on some implied knowledge when using such a search mode is pretty simple – a full-text search with the query expansion mode works by performing the search twice: the second search phrase is the first search phrase concatenated with a few most relevant entries from the first search. That means that, for example, if in the first search one of the rows would contain the word “DBMS” and the word “MySQL”, the second search would find the entries that would include the word “MySQL” even if they do not contain “DBMS”. The query format that would use the query expansion mode would look like so:
SELECT * FROM table WHERE MATCH(column) AGAINST(“string” WITH QUERY EXPANSION);
Full-text Searches Using the Boolean Mode
The boolean mode is perhaps one of the most interesting things that MySQL full-text search has to offer. This mode has many caveats unique to it because it allows you to expand the search capabilities using boolean operators. When the boolean mode is in use, certain characters can have special meaning at the beginning or end of words. For example:
- “+” means AND;
- “-” means NOT;
- The “(“ and “)” operators allows to create subexpressions;
- “<” and “>” operators change the rank of the search value lower or higher;
- “~” lowers the value’s contribution to the search results;
- Double quotes (“”) only match literal values;
- “*” is a wildcard operator (refer to the explanation above).
These operators allow you to expand the functionality of the search: for example, if you would want to retrieve all rows that contain the word “Demo”, but not “Demo2”, you could use a query like so:
SELECT * FROM table WHERE MATCH(column) AGAINST (“+Demo -Demo2” IN BOOLEAN MODE);
You can also use double quotes together with single quotes like so:
SELECT * FROM table WHERE MATCH(column) AGAINST(‘“search string”’ IN BOOLEAN MODE);
Full-Text Search Gotchas
Before using full-text search in MySQL, do keep in mind that the search does have a few “gotchas”:
- Both the InnoDB and MyISAM storage engines have their own lists of stopwords. InnoDB stopword list can be found here, MyISAM stopword list can be found here.
- To define your own stopword list for InnoDB, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, insert stopwords there, then set the value of the innodb_ft_server_stopword_table option in the form of db_name/table_name.
- To define your own stopword list for MyISAM, set the ft_stopword_file variable to the path name of the file containing the stopword list. In the file stopwords can be separated by any nonalphanumberic character except “_” and “‘“. The default stopword file is located at storage/myisam/ft_static.c. Stopwords can be disabled by setting the variable to an empty string.
- Full-text searches are not supported on partitioned tables.
- All columns in a FULLTEXT index must use the same character set and collation.
- Full-text search operations do not treat the % string as a wildcard.
Here’s another catch: you might also want to keep in mind that the built-in FULLTEXT parser determines where words start and end by looking at certain characters including the space (“ “), comma (“,”) and period (“.”) meaning that if your search string contains one or more of those characters, the search results might not be accurate. For example, if your database contains 5 rows with the string “test.demo”, the search query “test.demo” might return more (10, 15 etc.) results including “demo”, “string.demo_example” etc. because it’s going to search for “demo” instead of “test.demo”, so you might be stuck with a lot of irrelevant matches. MySQL does offer a workaround for this issue if you’re willing to write your own plugin in C or C++ (see MySQL documentation), but until then, you cannot do much.
The full list of MySQL’s full-text restrictions can be seen at the MySQL’s documentation page.
Summary
The MySQL full-text search capability provides a simple way to implement various search techniques (natural language search, query expansion search, and boolean search) into your application running MySQL. Each of those search techniques have their own caveats and each of them might be suited for different purposes – when deciding whether to use full-text searching, keep in mind that this type of searching has many subtleties unique to itself, know both the benefits and disadvantages of utilizing full-text searching in MySQL and choose wisely.