Severalnines Blog
The automation and management blog for open source databases

Become a MySQL DBA blog series - Troubleshooting with pt-stalk - part 2

Severalnines

In our last post, we showed you how to use pt-stalk to help gather a bunch of data that can be used for troubleshooting. But what type of data does it collect? How do we parse the files to make sense of all that data?

This is the twentieth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

After reading the previous post, you should know how to use pt-stalk to watch MySQL metrics and trigger data collection when a predefined situation occurs. You should also know about a couple of different ways to collect the data, and how to extend pt-stalk to cover some more complex cases.

Today, we’ll talk about the data you collected - what kind of data does pt-stalk provide you with, and how can you utilize it? Let’s start by answering the question - where and how does pt-stalk store its data?

By default, it uses the /var/lib/pt-stalk directory, but you can change this with the --dest option when invoking pt-stalk. If you take a look into the directory, you’ll see multiple files following this pattern:

‘2015_12_03_15_45_20-*’

Basically, every file from a single data set uses the same timestamp, which you’ll find very useful when searching through different incidents. Each of the files contains a specific type of data collected by a particular process executed by pt-stalk. Some of the files are created every time, others only when a particular condition is matched. An example of such a file is ‘*-lock-waits’ - a file created only when at least one transaction was in the ‘LOCK_WAIT’ state. Otherwise you won’t find it in the output.
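Because every file from one incident shares the same timestamp prefix, a small shell pipeline can tell you how many incidents a directory holds and how many files each one produced. This is only a sketch - the file names below are stand-ins, and in practice you would pipe in the listing of your actual --dest directory:

```shell
# Group pt-stalk output files by their 19-character timestamp prefix and
# count how many files each incident produced. In real use, feed it
# "ls /var/lib/pt-stalk" (or whatever directory --dest points at).
count_incidents() {
  sed -n 's/^\([0-9_]\{19\}\)-.*/\1/p' | sort | uniq -c | sort -rn
}

# Demonstration on a few sample file names:
printf '%s\n' \
  2015_12_03_15_45_20-df \
  2015_12_03_15_45_20-diskstats \
  2015_12_03_15_45_20-processlist \
  2015_12_03_16_10_23-df \
  | count_incidents
```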

We will not cover all of the files created by pt-stalk in detail - there are too many of them and some are useful only in edge cases. We want to focus on those which are used most often.

Disk status

Pt-stalk collects some data about the state of the storage layer. The most important aspects are stored in the following files.

*-df

A sample of data from this file may look like this:

TS 1449157522.002282558 2015-12-03 15:45:22
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda1       8115168  2286260   5393632  30% /
none                   4        0         4   0% /sys/fs/cgroup
udev             2018468       12   2018456   1% /dev
tmpfs             404688      412    404276   1% /run
none                5120        0      5120   0% /run/lock
none             2023428        0   2023428   0% /run/shm
none              102400        0    102400   0% /run/user
/dev/xvdf      104806400 11880864  92925536  12% /var/lib/mysql
TS 1449157523.008747734 2015-12-03 15:45:23
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda1       8115168  2286264   5393628  30% /
none                   4        0         4   0% /sys/fs/cgroup
udev             2018468       12   2018456   1% /dev
tmpfs             404688      412    404276   1% /run
none                5120        0      5120   0% /run/lock
none             2023428        0   2023428   0% /run/shm
none              102400        0    102400   0% /run/user
/dev/xvdf      104806400 11879236  92927164  12% /var/lib/mysql

This is simply the output of the df command, executed every second for a duration of 30 seconds. This is a pretty common pattern in pt-stalk - it tries to collect data samples over time to make it easier to compare changes. In this particular case, we are presented with information about disk space in the system - it may come in handy when, e.g., binary logs fill the disk, triggering some weird behavior.
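Because the samples are timestamped, it is easy to pull one value out of all of them and watch how it moved during the incident. A minimal awk sketch, shown here against inline sample data (in practice you would point it at a real file such as /var/lib/pt-stalk/2015_12_03_15_45_20-df):

```shell
# Print the timestamp and available space for the MySQL datadir from each
# sample in a pt-stalk "-df" file. The TS lines carry the date and time in
# fields 3 and 4; the mount point is the last field of each df line.
awk '/^TS /{ts=$3" "$4}
     $NF=="/var/lib/mysql"{print ts, "available:", $4, "KB"}' <<'EOF'
TS 1449157522.002282558 2015-12-03 15:45:22
/dev/xvdf      104806400 11880864  92925536  12% /var/lib/mysql
TS 1449157523.008747734 2015-12-03 15:45:23
/dev/xvdf      104806400 11879236  92927164  12% /var/lib/mysql
EOF
```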

*-diskstats

This file contains samples from /proc/diskstats collected over 30 seconds. The raw output is not very useful, but we can use it together with pt-diskstats, another tool from Percona Toolkit, to present some nice info about I/O utilization. Here’s a sample of this output.

root@ip-172-30-4-156:/var/lib/pt-stalk# pt-diskstats --columns-regex='_s|time|cnc|prg' 2015_12_03_15_45_20-diskstats

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  1.0 xvda       0.0     0.0    0.0    37.8     0.6    2.3      0    37.8   31.2   2.0
  1.0 xvda1      0.0     0.0    0.0    37.8     0.6    2.3      0    37.8   31.2   2.0
  1.0 xvdf      48.7     0.7    3.8   217.6    15.3   37.8      0   266.3  127.3   2.4

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  2.0 xvda       0.0     0.0    0.0     0.0     0.0    0.0      0     0.0    0.0   0.0
  2.0 xvda1      0.0     0.0    0.0     0.0     0.0    0.0      0     0.0    0.0   0.0
  2.0 xvdf      73.5     1.0    0.6   136.3     5.4    2.0      0   209.8    7.7   1.9

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  3.0 xvda       0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  3.0 xvda1      0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  3.0 xvdf      73.8     0.9    0.5   141.5     5.8    1.8      0   215.3    6.3   2.1

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  4.0 xvda       0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  4.0 xvda1      0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  4.0 xvdf      57.1     0.8    1.0   192.3     8.9    3.7      0   249.4   12.1   1.7

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  5.0 xvda       0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  5.0 xvda1      0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  5.0 xvdf      79.0     1.0    0.5   180.0    10.9    4.4    110   259.0   37.5   1.8

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  6.0 xvda       0.0     0.0    0.0     5.9     0.1    0.1      0     5.9    0.5   2.1
  6.0 xvda1      0.0     0.0    0.0     5.9     0.1    0.1      0     5.9    0.5   2.1
  6.0 xvdf      71.4     1.0    3.6   234.0    19.0   40.3      0   305.4  106.2   2.4

Pt-diskstats is a subject for another blog post. We’ll just mention that this tool gives you the ability to aggregate and filter the most important I/O data from your system. Looking into our case here, it’s pretty clear we are suffering from some kind of I/O contention - ‘qtime’, a column which tells us how long an I/O call waits in the scheduler’s queue, spikes to more than 100ms. This is high even for a spindle. This kind of data, combined with knowledge of the limits of your hardware, gives you better insight into whether I/O is the culprit or just one of the symptoms of a problem.
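When there are many samples, eyeballing the qtime column gets tedious. A quick awk filter can flag only the offending samples; the 20ms threshold below is an arbitrary example, and the column positions match the --columns-regex selection shown above (qtime is the 11th field):

```shell
# Flag pt-diskstats samples where the I/O queue time exceeds 20 ms.
# Data lines start with a numeric timestamp; headers do not, so they
# are skipped by the first pattern.
awk '$1 ~ /^[0-9]/ && $11 > 20 {print "ts "$1, $2, "qtime "$11" ms"}' <<'EOF'
  1.0 xvdf      48.7     0.7    3.8   217.6    15.3   37.8      0   266.3  127.3   2.4
  2.0 xvdf      73.5     1.0    0.6   136.3     5.4    2.0      0   209.8    7.7   1.9
  5.0 xvdf      79.0     1.0    0.5   180.0    10.9    4.4    110   259.0   37.5   1.8
EOF
```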

Network data

*-netstat, *-netstat_s

These files contain info about currently open connections (*-netstat) and a summary of network-related statistics (*-netstat_s). This kind of data can be very useful when dealing with network issues.
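A common first question with the connection data is how many connections sit in each TCP state. A small awk sketch, assuming the classic "netstat -ant" layout where the state is the 6th column (the inline lines are illustrative stand-ins for a real *-netstat sample):

```shell
# Count TCP connections by state in a pt-stalk "-netstat" sample.
awk '$1 ~ /^tcp/ {states[$6]++}
     END {for (s in states) print states[s], s}' <<'EOF'
tcp        0      0 172.30.4.156:3306       172.30.4.23:14643       ESTABLISHED
tcp        0      0 172.30.4.156:3306       172.30.4.23:14650       ESTABLISHED
tcp        0      0 172.30.4.156:3306       172.30.4.23:14651       TIME_WAIT
EOF
```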

Memory data

*-meminfo

This file contains samples extracted from /proc/meminfo. Samples are taken every second, collecting information about memory utilization on the system.

TS 1449157522.002282558 2015-12-03 15:45:22
MemTotal:        4046860 kB
MemFree:          121688 kB
Buffers:            6768 kB
Cached:          2353040 kB
SwapCached:            0 kB
Active:          2793796 kB
Inactive:         986648 kB
Active(anon):    1420712 kB
Inactive(anon):      396 kB
Active(file):    1373084 kB
Inactive(file):   986252 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             14476 kB
Writeback:         15772 kB
AnonPages:       1420692 kB
Mapped:           145200 kB
Shmem:               472 kB
Slab:              96028 kB
SReclaimable:      80948 kB
SUnreclaim:        15080 kB
KernelStack:        2160 kB
PageTables:         9996 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2023428 kB
Committed_AS:    1715484 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       14164 kB
VmallocChunk:   34359717888 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1335296 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       26624 kB
DirectMap2M:     4298752 kB
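Since each sample is prefixed with a TS line, tracking a single counter over the incident is a one-liner. Here we follow MemFree; the second sample value below is made up for illustration:

```shell
# Extract one counter (MemFree) from each sample in a pt-stalk
# "-meminfo" file to see how it moved during the incident.
awk '/^TS /{ts=$3" "$4}
     /^MemFree:/{print ts, "MemFree:", $2, "kB"}' <<'EOF'
TS 1449157522.002282558 2015-12-03 15:45:22
MemTotal:        4046860 kB
MemFree:          121688 kB
TS 1449157523.008747734 2015-12-03 15:45:23
MemFree:          119204 kB
EOF
```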

*-pmap

More detailed info about MySQL’s memory allocation can be found in this file.

*-slabinfo

Here you’ll see information collected from /proc/slabinfo. Again, samples are collected every second, showing how slab cache utilization changed over time.

*-procvmstat

This file contains samples of data from /proc/vmstat, collected every second.

CPU and kernel activity

*-procstat

Information about CPU activity as shown in /proc/stat.

*-interrupts

In this file, you can find information about different kinds of interrupts and how they are split between CPU cores. It can be pretty useful when trying to identify weird cases of high system load or, sometimes, hardware issues.

Miscellaneous system data

*-lsof

This particular file contains the output of “lsof -p $(pidof mysqld)”. We can go through the list of files opened by mysqld; this also includes any TCP connections which may be open at that time. Data like this may be interesting when, let’s say, you want to confirm which exact MySQL version you are running (you’ll see which libraries were opened by the mysqld process).

*-ps

This file contains the output of the ps command, listing processes running at the time of the incident. It can be very useful for catching different types of jobs executed directly on the MySQL host, and it gives nice insight into what is running on the system.

*-top

This file stores the collected output of the ‘top’ command - data very similar to what you can find in the ‘*-ps‘ file. An important addition is a snapshot of the CPU utilization per process.

*-vmstat and *-vmstat-overall

These files store data collected using the ‘vmstat’ command. A sample may look like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
71  1      0 122156   6768 2352744    0    0   166  1267   83  200 20 18 47  4 11
 1 11      0 124996   6768 2353260    0    0   684 19456 7023 6694 37 34  0  8 21
56  1      0 125676   6780 2352220    0    0   636  6259 7830 6733 37 36  0  4 23
 0  1      0 125304   6780 2353312    0    0   920  7118 6233 10782 36 30 12  5 17
 0  3      0 122824   6780 2355568    0    0   988  5474 6733 10108 31 27 13 12 16
39  0      0 122396   6780 2356252    0    0   844  8618 8251 10843 30 24 25  9 13
 0  4      0 123236   6780 2355080    0    0   896 22482 11639 9266 30 29  9 18 14
58  0      0 122072   6788 2356484    0    0  1036  6446 8043 11345 33 25  8 20 15
11  2      0 124680   4460 2355928    0    0  1096  9380 8784 11449 36 33  3  8 19
22  0      0 124172   4460 2356952    0    0  1068  5922 8303 12137 36 32  3 11 18
15  0      0 124008   4460 2355200    0    0   796  5764 8256 10389 36 36  3  3 22
54  0      0 125920   4460 2356124    0    0  1012  8041 8449 12298 38 31  2 11 18
61  0      0 122172   4468 2358724    0    0   920  5811 7817 11650 37 33  2  8 20
51  0      0 124420   4248 2357304    0    0   920 10804 6565 10155 32 25  8 21 14
10  0      0 123900   4248 2358144    0    0  1252  5554 8922 12652 38 30  5 10 18
30  0      0 125356   4248 2356552    0    0   760  4208 8296 9757 32 26 22  5 15
27  2      0 123956   4248 2357688    0    0  1420  7164 6986 12300 41 32  4  4 20
 3  1      0 121624   4256 2360016    0    0   912  6373 7653 10600 36 34  6  4 20
 0  0      0 123724   4248 2358048    0    0   748 10550 6878 10302 34 30  7 12 17
52  1      0 124720   4248 2356756    0    0  1068  3228 6470 10636 33 29 17  5 17
23  0      0 123984   4248 2357672    0    0  1072  5455 5990 11145 36 32  7  7 17

It’s a little bit of everything - running processes, memory and swap utilization, I/O and swap disk traffic, the number of interrupts and context switches per second, and CPU utilization. Simple data, but it can give you a good idea of how the system performed at the time of the incident.
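One quick sanity check on such a sample is to average the I/O-wait percentage over the collection window. A sketch, assuming the standard vmstat layout above where ‘wa’ is the 16th field and the first two lines are headers:

```shell
# Average the I/O-wait percentage ("wa", field 16) over the data lines of
# a pt-stalk "-vmstat" file, skipping the two header lines.
awk 'NR > 2 {sum += $16; n++}
     END {if (n) printf "avg wa: %.1f%%\n", sum/n}' <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
71  1      0 122156   6768 2352744    0    0   166  1267   83  200 20 18 47  4 11
 1 11      0 124996   6768 2353260    0    0   684 19456 7023 6694 37 34  0  8 21
56  1      0 125676   6780 2352220    0    0   636  6259 7830 6733 37 36  0  4 23
EOF
```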

Data related to MySQL

*-processlist

This is self-explanatory - this file contains the output of SHOW PROCESSLIST\G, one of the most important means of understanding the state of MySQL.

*-innodbstatus1 and *-innodbstatus2

These files contain the output of SHOW ENGINE INNODB STATUS taken at the start and at the end of the data collection. SHOW ENGINE INNODB STATUS is a crucial tool for understanding the state of MySQL and the InnoDB engine, and having two files allows us to compare how that state changed. It would not be possible to explain, in a couple of sentences, all the ways this data can be useful to a DBA. In short, you’ll find here information about running transactions and their state, info about the latest deadlocks, and lock contention both at the row level and internally (at the mutex/latch level). You can also use this data to get insight into I/O activity in InnoDB. We covered some of these points in two past blog posts:

Monitoring - Understanding and Optimizing CPU-related InnoDB metrics

Monitoring - Understanding and Optimizing IO-related InnoDB metrics

*-lock-waits

Another important piece of data - this file contains information about transactions which were stuck in the ‘LOCK_WAIT’ state when the data was collected. Sample output may look like this:

TS 1449157537.003406070 2015-12-03 15:45:37
*************************** 1. row ***************************
   who_blocks: thread 28242 from 172.30.4.23:14643
  idle_in_trx: 0
max_wait_time: 0
  num_waiters: 1
*************************** 1. row ***************************
    waiting_trx_id: 16480064
    waiting_thread: 28196
         wait_time: 0
     waiting_query: UPDATE sbtest1 SET k=k+1 WHERE id=500167
waiting_table_lock: `sbtest`.`sbtest1`
   blocking_trx_id: 16479993
   blocking_thread: 28242
     blocking_host: 172.30.4.23
     blocking_port: 14643
       idle_in_trx: 0
    blocking_query: COMMIT

As you can see, we have information about the waiting transaction - transaction ID and thread number, the query waiting for execution, and the table involved. We can also see who is blocking - transaction ID, thread number, host, port and the blocking query. At the beginning, you can also see a nice summary of who is blocking and how many other transactions are waiting (num_waiters) - pretty useful for identifying the most offending blockers.

The data collected here comes from the information_schema.innodb_lock_waits table.
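When the summary section is long, ranking blockers by waiter count is a handy shortcut. A minimal awk sketch over the blocking_thread fields; the inline sample adds a hypothetical second waiter so the grouping is visible:

```shell
# Rank blockers in a pt-stalk "-lock-waits" file by how many waiting
# transactions point at them. Splitting on ": " isolates the field value.
awk -F': *' '/blocking_thread:/ {blockers[$2]++}
             END {for (t in blockers)
                    print blockers[t], "waiter(s) blocked by thread", t}' <<'EOF'
    waiting_thread: 28196
   blocking_thread: 28242
    waiting_thread: 28197
   blocking_thread: 28242
EOF
```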

*-transactions

This file contains a list of all transactions running at the time of the data collection. Sample output can look like this:

*************************** 43. row ***************************
                    trx_id: 16479993
                 trx_state: RUNNING
               trx_started: 2015-12-03 15:45:36
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 8
       trx_mysql_thread_id: 28242
                 trx_query: COMMIT
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 4
     trx_lock_memory_bytes: 1184
           trx_rows_locked: 3
         trx_rows_modified: 4
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 10000
          trx_is_read_only: 0
trx_autocommit_non_locking: 0

As you may have noticed, this particular example contains info about the transaction which was the blocker - we identified it in the *-lock-waits file. Knowing the thread or transaction ID, it’s very easy to grep through the lock-waits data, transaction data, processlist and SHOW ENGINE INNODB STATUS output and connect all the pieces together, building a broad view of how the lock contention started and what might have been the culprit. This particular data comes from the information_schema.innodb_trx table and contains lots of info about each of the running transactions. You can find here information on the currently running query, the number of rows locked, when the transaction started, the transaction isolation level and much more.
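Chasing one thread ID across the whole incident boils down to a grep -l over the timestamped file set. A sketch - the directory and its two tiny files are stand-ins created on the fly, where the real invocation would target /var/lib/pt-stalk/2015_12_03_15_45_20-* directly:

```shell
# List which files of one incident mention a given thread id (28242 here).
# Two stand-in files simulate the incident's data set.
dir=$(mktemp -d)
echo 'blocking_thread: 28242'     > "$dir/2015_12_03_15_45_20-lock-waits"
echo 'trx_mysql_thread_id: 28242' > "$dir/2015_12_03_15_45_20-transactions"
grep -l 28242 "$dir"/2015_12_03_15_45_20-*
```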

*-mutex-status1 and *-mutex-status2

These two files contain information collected using ‘SHOW ENGINE INNODB MUTEX’ at the beginning and at the end of the data collection process. Sample output may look like:

InnoDB  &space->latch   os_waits=1
InnoDB  &purge_sys->latch       os_waits=11760
InnoDB  &new_index->lock        os_waits=20451
InnoDB  &dict_operation_lock    os_waits=229
InnoDB  &space->latch   os_waits=9778
InnoDB  &log_sys->checkpoint_lock       os_waits=925
InnoDB  &btr_search_latch_arr[i]        os_waits=23502
InnoDB  table->sync_obj.rw_locks + i    os_waits=2

What you see here is a list of different mutexes and, for each of them, how many OS waits happened. If you do not have detailed wait instrumentation enabled in the performance schema, this is the best way to understand which mutex causes the biggest contention for your workload.
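Since the counters are cumulative, the interesting number is the growth between the two snapshots. A sketch that diffs the os_waits values per mutex - the two inline files stand in for *-mutex-status1 and *-mutex-status2, with made-up second-snapshot values:

```shell
# Compute os_waits growth per mutex between the two SHOW ENGINE INNODB
# MUTEX snapshots, largest growth first.
status1=$(mktemp); status2=$(mktemp)
cat > "$status1" <<'EOF'
InnoDB &purge_sys->latch os_waits=11760
InnoDB &btr_search_latch_arr[i] os_waits=23502
EOF
cat > "$status2" <<'EOF'
InnoDB &purge_sys->latch os_waits=11794
InnoDB &btr_search_latch_arr[i] os_waits=24188
EOF
awk -F'os_waits=' 'NR==FNR {before[$1]=$2; next}
                   $1 in before {print $2-before[$1], $1}' \
  "$status1" "$status2" | sort -rn
```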

*-mysqladmin

This file contains samples of the output of ‘SHOW GLOBAL STATUS’, executed using the mysqladmin tool. What we are looking at here is everything MySQL presents through this command - handlers, InnoDB stats, binlog stats, Galera stats (if you use Galera Cluster). Such data can be useful for investigating different types of problems. In fact, this is exactly the type of data DBAs will want to have on hand and review when needed. The main issue with this output is that MySQL presents some of the data as monotonically increasing counters, while we are more interested in how they change over time. It’s not rocket science to build a script which performs such a conversion, but there’s also no need to reinvent the wheel, as Percona Toolkit contains pt-mext, a tool which does exactly that.

You can use it in the following way to get a nice, browsable, columnar output:

pt-mext -r -- cat /var/lib/pt-stalk/2015_12_03_15_45_20-mysqladmin | less -S
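To make the counter-to-delta idea concrete, here is a toy awk version of the conversion pt-mext performs, run over two samples of a single counter. The counter values and the pipe-delimited layout are illustrative; pt-mext itself handles the full sample set:

```shell
# Turn two absolute samples of the Questions counter into a per-sample
# delta - the conversion pt-mext applies across all counters.
# In "| Questions | 104230 |" lines, field 2 is the name, field 4 the value.
awk '$2=="Questions" {if (prev) print "delta:", $4-prev; prev=$4}' <<'EOF'
| Questions | 104230 |
| Questions | 109112 |
EOF
```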

*-opentables1 and *-opentables2

These files contain a list of the tables open in MySQL at the beginning and at the end of the data collection. Please note that the pt-stalk version current at the time of writing (2.2.16) has a bug where it doesn’t print this data if MySQL contains more than 1000 tables.

*-variables

This file contains the output of ‘SHOW GLOBAL VARIABLES’ - a list of all runtime variables in MySQL at the time of the data collection. There’s not much to add here - such data is invaluable because, as you probably know, seeing a configuration file is not enough to tell what configuration MySQL is actually using. Many MySQL variables are dynamic and can be changed at any time, without making a change in my.cnf.

Tools which help to parse pt-stalk data

As you can see, pt-stalk brings you a considerable amount of data to work with. It may look hard to go through all of it, but it gets easier with experience. Additionally, the more familiar you are with your MySQL installation, the faster you’ll be able to pinpoint possible issues. Still, it’s a process of looking through tens of files to find the important data - a process which can be very time-consuming. Luckily, there are tools that can help in this regard. We’ll talk about two of them.

pt-sift

This tool is part of the Percona Toolkit, and it’s intended to give you an overview of the data collected by pt-stalk. Invocation is simple:

root@ip-172-30-4-156:~# pt-sift /var/lib/pt-stalk/

You are presented with a list of the data sets stored in the directory you’ve chosen. You can use the latest one (the default) or pass another one you want to review.

  2015_11_20_10_15_49  2015_11_20_10_34_20  2015_11_20_10_40_15
  2015_11_24_19_47_52  2015_11_24_19_48_03  2015_11_24_19_48_46
  2015_12_03_12_38_18  2015_12_03_12_42_09  2015_12_03_12_47_18
  2015_12_03_15_43_41  2015_12_03_15_45_20  2015_12_03_16_10_23
  2015_12_03_16_26_41

Select a timestamp from the list [2015_12_03_16_26_41] 2015_12_03_15_45_20

After you pick the sample, you’ll see a summary page:

======== ip-172-30-4-156 at 2015_12_03_15_45_20 DEFAULT (11 of 13) ========
--diskstats--
  #ts device    rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt busy in_prg    io_s  qtime stime
 {29} xvdf      75.9    13.1     1.0     0%    1.2    15.5   151.1    47.2     7.0    29%    5.3    24.6  63%      0   227.0   18.4   2.2
 xvdf  0% 30% 15% 20% 15% 20% 30% 25% 20% 25% . . 30% 25% 15% 25% 20% 15% . 20% 15% 20% 15% 25% 15% . . 25% 10% 15%
--vmstat--
 r b swpd   free buff   cache si so  bi   bo   in    cs us sy id wa st
72 1    0 122064 6768 2352744  0  0 166 1267   83   200 20 18 47  4 11
 0 0    0 126908 4248 2356128  0  0 979 7353 7933 10729 35 31  8  8 18
wa 0% 5% 0% 5% 10% 5% 15% 20% 5% 10% 0% 10% 5% 20% 10% 5% 0% . 10% 5% . 0% 5% . 10% 0% 5% 0% 5% 0%
--innodb--
    txns: 65xACTIVE (180s) 3xnot (0s)
    3 queries inside InnoDB, 0 queries in queue
    Main thread: sleeping, pending reads 1, writes 1, flush 0
    Log: lsn = 28182468703, chkp = 27881812915, chkp age = 300655788
    Threads are waiting at:
     50 log0log.ic line 358
      1 btr0cur.cc line 3892
    Threads are waiting on:
      1 S-lock on RW-latch at 0x7f1bf2f53740 '&block->lock'
--processlist--
    State
     40  updating
     14  wsrep in pre-commit stage
      6  statistics
      5
      3  Sending data
    Command
     65  Query
      7  Sleep
      1  Connect
--stack traces--
    No stack trace file exists
--oprofile--
    No opreport file exists

As you can see, pt-sift presents a summary of the diskstats and vmstat data. It also presents information about InnoDB internals - how many transactions there are, and whether there are any waits. Some data on InnoDB checkpointing is also shown, along with a summary of the processlist output. By typing ‘?’, you can see pt-sift’s internal help. There are a couple of other options you can pick from - view the diskstats output, check the SHOW ENGINE INNODB STATUS output, summarize netstat data, or check the data from SHOW STATUS.

Rain Gauge

Pt-sift is a CLI tool, but if you prefer a UI, you can use Rain Gauge - a tool which presents pt-stalk data in a web browser.

After setup, you’ll be presented with a screen summarizing all the source hosts which pushed data to Rain Gauge, and the number of samples collected for each of them. You can pick a host either from the drop-down menu or by clicking on the list. You’ll see a screen similar to this one:

The next step is to pick a sample you find interesting. You’ll end up on a summary page.

At the top, there are links to different sections of collected data. You can click those links to check the information particularly interesting to you. At the bottom, there’s a summary generated using pt-sift.

Most of the ‘sections’ just print the file contents, split by samples, which doesn’t make much of a difference compared to using the CLI. One of them, though, gives you some extra info. When you check the ‘mysqladmin’ section, you will see a screen like the one below:

It is possible to graph any combination of the counters by picking them from the list on the left. You can also use one of the presets by clicking the buttons below the graph. Finally, at the bottom, you will see the samples stored in multiple columns. This way of presenting the data is really useful when you want to compare the state of different counters. Having the ability to graph the data is also helpful - it’s definitely easier to see trends on a graph than by looking at raw numbers.

With this post, we conclude our coverage of pt-stalk. Hopefully you’ll find this useful if you ever need to use pt-stalk to pinpoint issues in your database environment. In our next post, we will talk about database recovery in MySQL Replication and Galera Cluster setups.


Picture credit: Ken Fager