blog

A Look at the Percona Toolkit

Paul Namuag

Published: November 25, 2020
Last Updated: May 4, 2022

The Percona Toolkit (PT Toolkit) was initially designed as a swiss army knife for MySQL/MariaDB/Percona Server and its underlying operating system to perform advanced database and system administration tasks that are hard to perform especially for daily routine tasks. With the advancements of technology the toolkit has been evolving to support MongoDB and PostgreSQL as well. Though support for MongoDB and PostgreSQL are not as extensive compared to MySQL. We’ll take a look at these tools which are useful for your DBA or server administration routine tasks.

A Brief History of Percona Toolkit

Percona Toolkit is derived from Maatkit and Aspersa, two of the best-known MySQL management software utility toolkits for MySQL server administration. This software has been downloaded and used by large organizations and companies as part of their daily routine tasks especially in DBA and server administration routines. This software has been years of deployments and supported by the MySQL community. Its maturity level is sufficient enough and proven tested in several deployments from development, to QA, then to production. Its years of deployments have been solidified by tens of thousands of users including some of the best-known Internet sites, have proven the reliability of the Percona Toolkit tools for MySQL/MariaDB/Percona Server and now includes MongoDB and PostgreSQL.

As of this writing, the Percona Toolkit consists of 36 unique command line tools that you can use. The Percona Toolkit has been a de facto standard for working with databases analyzing logs, advanced tasks management for databases such as fixing data inconsistencies, performance analysis, query optimization and query tuning, and a lot more.

Mechanism for Percona Toolkit

Percona Toolkit is designed to work specifically for Linux type environments. It has been tested in many popular Linux distributions and proven to work for old and new versions of MySQL, and that includes particularly Percona Server and MariaDB.

System Requirements

Percona Toolkit are roughly built on Perl so most tools requires it.Below are what this toolkit requires:

Perl v5.8 or newer
Bash v3 or newer
Core Perl modules like Time::HiRes

Tools that connect to MySQL require:

Perl modules DBI and DBD::mysql
MySQL 5.0 or newer

Configuration

The Percona Toolkit allows you to specify a configuration file just as how MySQL works compared to –defaults-file option for example. Its configuration file bears resemblances to MySQL command-line client tools and all the configuration files all follow the same conventions.

When the configuration files are read, the options specified are added command-line options and arguments to the tool, so just think of the configuration files as a way to write your command lines.

When using and relying on a configuration file, you have to take note of its read order as below:

The global Percona Toolkit configuration file, /etc/percona-toolkit/percona-toolkit.conf. All tools read this file, so you should only add options to it that you want to apply to all tools.
The global tool-specific configuration file, /etc/percona-toolkit/TOOL.conf, where TOOL is a tool name like pt-query-digest. This file is named after the specific tool you’re using, so you can add options that apply only to that tool.
The user’s own Percona Toolkit configuration file, $HOME/.percona-toolkit.conf. All tools read this file, so you should only add options to it that you want to apply to all tools.
The user’s tool-specific configuration file, $HOME/.TOOL.conf, where TOOL is a tool name like pt-query-digest. This file is named after the specific tool you’re using, so you can add options that apply only to that tool.

Although the tools from the Percona Toolkit allows you the option to specify a configuration file, you have to place the option –config first among the other options you want to specify. Then, the –config format shall be,

--config /path/to/config/file

and not the –config=/path/to/config/file format which separates the equal symbol (=) between the value assigned. For some cases, you don’t want the tool to read your config file, you can specify –config ” at all.

Risks

The PT Toolkit cannot promise that the tool is pretty safe to use with no impact and no harm for your database. Of course it has been tested and works successfully all the time but mistakes and unanticipated events might occur. Although the majority of the tools are simply reading or collecting information from your server, there are chances that it can stall your database (for example, using pt-ioprofile on production servers) or even corrupt your dataset consistency from what you might expect, for example using pt-table-sync with wrong options specified. In worse case scenarios, using tools such as pt-online-schema-change can possess harm if you intend to use this on a busy server. With all of these FUDs, we certainly encourage you to try the tool first in a non-prod or on a dummy MySQL database table, or even test the command with –dry-run (if option is available) and see the result first.

Another thing to take note of, before deciding to use the tool on a production basis, determine how to have a fallback and how to stop and kill the tool if it ends abruptly or abnormally. There are certain things you have to clear that everything has been stopped and killed normally but there are still tidbits that aren’t yet cleared. For example, using pt-online-schema-change creates MySQL Triggers. You have to check if the triggers are still in existence and you might have to drop these in order to avoid unwanted events or transactions in your database.

Debugging

The PT Toolkit offers debugging mode. Simply you can run the tool but prefix it with PTDEBUG=1. For example,

PTDEBUG=1 pt-visual-explain ... > FILE 2>&1

What’s In The Toolkit?

There are a lot of tools available, and as of this writing, the total available tools that can be used for command lines are 36. Let’s take a bit on what these are.

pt-align

Use this tool to align output into a columnar format from other command line tools (vmstat, iostat, pt-diskstats) to make it easier to read.

pt-archiver

This tool offers the user to archive old records from specific tables especially on a busy and large OLTP database server. Its purpose is to run the job with low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much.

pt-config-diff

diffs MySQL configuration files and server variables. It can perform a diff either using a config file or through a specified DSN.

pt-deadlock-logger

Logs information about MySQL deadlocks on the given DSN. Information is printed to STDOUT (you can use pipes or redirection to store it to a file), and it can also be saved to a table by specifying –dest. The tool runs for forever unless –run-time or –iterations is specified.

pt-diskstats

Prints disk I/O statistics for GNU/Linux. It is somewhat similar to iostat, but it is interactive and more detailed. It can analyze samples gathered from another machine.

pt-duplicate-key-checker

This tool examines MySQL tables for duplicate or redundant indexes and foreign keys. Connection options are read from MySQL option files.

pt-fifo-split

Splits file and pipes lines to a fifo. If no file is specified, it reads through standard input. This is useful when you are going to load or read a huge file, then chunk it to have a minimal impact and store it to your database.

pt-find

It searches for MySQL tables and executes actions, similar to GNU find. The default action is to print the database and table name. This tool is very useful, for example, if you want to find tables that are not InnoDB format then convert as one of your desired actions.

pt-fingerprint

This tool converts queries into fingerprints. Query can be from any RDBMS type of query as it converts queries based on the given non-deterministic value. Basically, this is designed to work for MySQL type of queries. With the –query option, converts the option’s value into a fingerprint. With no options, treats command-line arguments as files and reads and converts semicolon-separated queries from the files. When a file is absent and instead of -, it reads standard input.

pt-fk-error-logger

Logs information about foreign key errors on the given DSN.

pt-heartbeat

Monitor MySQL or PostgreSQL replication delay. You can use it to update a master or monitor a replica. If possible, MySQL connection options are read from your .my.cnf file.

pt-index-usage

Reads queries from logs and analyzes how they use indexes. It is useful when you have unused indexes or duplicate indexes and will print an ALTER statement to drop the indexes.

pt-ioprofile

Watch process IO and print a table of file and I/O activity. It works by attaching strace to the process using ptrace(), which will make it run very slowly until strace detaches. In addition to freezing the server, there is some risk of the process crashing or performing badly after strace detaches from it, or of strace not detaching cleanly and leaving the process in a sleeping state. As a result, this should be considered an intrusive tool, and should not be used on production servers unless you are comfortable with that.

pt-kill

Kill MySQL queries that match certain criteria.

pt-mext

Columnizes repeated output from a program like mysqladmin extended.

pt-mongodb-query-digest

Reports query usage statistics by aggregating queries from MongoDB query profiler.

pt-mongodb-summary

Simply reads or collects information about a MongoDB cluster from several sources to provide an overview of the cluster.

pt-mysql-summary

It reads or collects information from your MySQL database based on the given DSN. It summarizes the status and configuration of a MySQL database which is useful to get a heads up and overview of the working database node.

pt-online-schema-change

It’s purpose is to alter a table on a given modifying action through –alter option. This tool can possess risk if you are using it aggressively on a busy server. You can read more here about Top Open Source Tools for MySQL & MariaDB Migrations which this tool explains where it can be applicable for.

pt-pg-summary

This tool simply collects information for a PostgreSQL server. You can read more here Using the pt-pg-summary Percona Toolkit for PostgreSQL.

pt-pmp

Aggregate GDB stack traces for a selected program.It is a read-only tool which collects GDB stactraces, which can be known to freeze for some period of time and can affect MySQL which goes unresponsive for a while. For that reason, use this tool if you require advanced investigation just like an unresponsive MySQL database or that causes a crash for a specific event you suspect. In addition to freezing the server, there is also some risk of the server crashing or performing badly after GDB detaches from it.

pt-query-digest

Analyzes MySQL queries from slow, processlist, tcpdump, general, and binary log files. By default, queries are grouped by fingerprint (other options can be tables or distill) and reported in descending order of query time (i.e. the slowest queries first).

pt-secure-collect

It has similar tasks as pt-stalk but integrates pt-summary and pt-mysql-summary then encrypts the data.

pt-show-grants

Canonicalize and print MySQL grants so you can effectively replicate, compare and version-control them. Useful as well when you have to preserve your user grants and determine if there were changes to reflect if changes possess security threats.

pt-sift

Browses files created by pt-stalk.

pt-slave-delay

Make your MySQL replica to start or stop as needed to make it lag behind the master. This is useful if you are setting up a DR environment and require that node to be delayed so it won’t affect the disaster that the master has experienced and avoid from replicating instant data.

pt-slave-find

Find and print replication hierarchy tree of MySQL replicas.

pt-slave-restart

Watches one or more MySQL replication slaves for errors, and tries to restart replication if it stops. This tool is useful when you want to skip errors that affect your replica from performing the replicated events or transactions.

pt-stalk

Collect forensic data about MySQL when problems occur. It waits for a trigger condition to occur, then collects data to help diagnose problems. This is a very useful tool especially for performance tuning or diagnosing a crashing MySQL database server identifying what the culprit can be.

pt-summary

Collects information from the system itself and displays the collected information as a report in a readable format.

pt-table-checksum

Verifies your MySQL database cluster integrity. It checks by running checksum algorithm (can be supplied based on –function option). Useful when you need to fix your slaves which are identified to have data drift against the master.

pt-table-sync

Synchronizes MySQL table data efficiently. It is used in tandem with pt-table-checksum for which checksum results shall be based on what pt-table-sync tool has to sync or fix to resolve data inconsistency in the cluster.

pt-table-usage

Analyze by reading queries from a log on how queries use tables.

pt-upgrade

Verify that query results are identical on different servers. This is useful when you want to compare the results from two different versions of MySQL. Let say you are planning to do a major upgrade and wanted to test the performance of your queries against the new version and check if it’s performant or requires tuning for this specific new version of MySQL.

pt-variable-advisor

This tool simply provides suggestions or recommendations based on the analyzed MySQL variables and will let you know of possible problems it can impact.

pt-visual-explain

Similar to pt-align which is used for formatting. But this tool just formats EXPLAIN output as a tree.

Conclusion

We hope this blog let you know what the Percona Toolkit can do and where it is useful. Keep in mind, there are some caveats, so that you have to be careful when using these tools (especially on production and with vital data in your database infrastructure). Always test it first in non-production data and do not test it without knowing the impact first on the production servers.