Steps to Take if You Have a MySQL Outage
A MySQL outage simply means that your MySQL service is not accessible or is unresponsive from the application's perspective. An outage can originate from a number of possible causes:
- Network issue – connectivity problems in the switch, routing, resolver, or load-balancer tier.
- Resource issue – the server has hit a resource limit or bottleneck.
- Misconfiguration – wrong permissions or ownership, an unknown variable, a wrong password, or changed privileges.
- Locking – a global or table lock prevents others from accessing the data.
In this blog post, we’ll look at some steps to take if you’re having a MySQL outage (Linux environment).
Step One: Get the Error Code
When you have an outage, your application will throw errors and exceptions. These errors commonly come with an error code that gives you a rough idea of what you're facing and what to do next to troubleshoot the issue and recover from the outage.
To get more details on the error, check the MySQL Error Code or MariaDB Error Code pages respectively to figure out what the error means.
Step Two: Is the MySQL Server Running?
Log into the server via a terminal and check whether the MySQL daemon is running and listening on the correct port. On Linux, you would do the following:
Firstly, check the MySQL process:
$ ps -ef | grep -i mysql
You should get something in return. Otherwise, MySQL is not running. If MySQL is not running, try to start it up:
$ systemctl start mysql   # systemd
$ service mysql start     # sysvinit/upstart
$ mysqld_safe             # manual
If the above step returns an error, look at the MySQL error log, whose location varies depending on the operating system and the log_error variable in the MySQL configuration file. On RedHat-based servers, the file is commonly located at:
$ cat /var/log/mysqld.log
Pay attention to the most recent lines with log level "[Error]". Lines labelled "[Warning]" can occasionally indicate problems, but that is fairly uncommon. Most of the time, misconfiguration and resource issues can be detected here.
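For example, a quick way to surface only the error-level lines is to filter the log with grep. A minimal sketch (the sample log file and its contents below are fabricated for illustration):

```shell
# Create a small sample error log (illustrative content only).
cat > /tmp/mysqld.log <<'EOF'
2024-01-10T08:00:01.000000Z 0 [Note] mysqld: ready for connections.
2024-01-10T08:05:42.000000Z 0 [Warning] Aborted connection 12 to db: 'app'
2024-01-10T08:06:03.000000Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11
EOF

# Show only error-level lines, case-insensitively.
grep -i '\[error\]' /tmp/mysqld.log
```

On a real server you would point grep at /var/log/mysqld.log (or wherever log_error points) instead of the sample file.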
If MySQL is running, check whether it’s listening to the correct port:
$ netstat -tulpn | grep -i mysql
tcp6       0      0 :::3306        :::*         LISTEN      1089/mysqld
You should see the process name "mysqld", listening on all interfaces (:::3306 or 0.0.0.0:3306) on port 3306 with PID 1089, and the state "LISTEN". If the line instead shows 127.0.0.1:3306, MySQL is only listening locally. You may need to change the bind_address value in the MySQL configuration file to listen on all IP addresses, or simply comment out the line.
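As a sketch of that check (using a fabricated sample configuration file; the real file is usually /etc/my.cnf or /etc/mysql/my.cnf):

```shell
# Sample my.cnf for illustration only.
cat > /tmp/my.cnf <<'EOF'
[mysqld]
bind_address = 127.0.0.1
port = 3306
EOF

# Find the bind_address line; 127.0.0.1 means local-only listening.
grep -E '^\s*bind_address' /tmp/my.cnf

# Switch it to listen on all interfaces (restart MySQL afterwards).
sed -i 's/^\s*bind_address.*/bind_address = 0.0.0.0/' /tmp/my.cnf
grep -E '^\s*bind_address' /tmp/my.cnf
```

Note that MySQL treats dashes and underscores interchangeably in option names, so your file may spell it bind-address.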
Step Three: Check for Connectivity Issues
If the MySQL server is running fine with no errors in the MySQL error log, chances are high that you have a connectivity issue. Start by checking connectivity to the host via ping (if ICMP is enabled) and telnet to the MySQL server from the application server:
(application-server)$ ping db1.mydomain.com
(application-server)$ telnet db1.mydomain.com 3306
Trying db1.mydomain.com...
Connected to 192.168.0.16.
Escape character is '^]'.
O
5.6.46-86.2sN&nz9NZ�32?&>H,EV`_;mysql_native_password
You should see some lines of output from telnet if you can connect to the MySQL port. Now, try once more using the MySQL client from the application server:
(application-server)$ mysql -u db_user -p -h db1.mydomain.com -P3306
ERROR 1045 (28000): Access denied for user 'db_user'@'db1.mydomain.com' (using password: YES)
In the above example, the error gives us some information about what to do next: most likely someone changed the password for "db_user", or the password for this user has expired. This is normal behaviour in MySQL 5.7.4 and later, where the automatic password expiration policy is enabled by default with a 360-day threshold – meaning that all passwords expire once a year.
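If password expiration is the suspect, a DBA with sufficient privileges can check and reset it. A sketch (the user name and host are illustrative, and this must be run against a live server):

```sql
-- Check whether the account's password has expired.
SELECT user, host, password_expired
FROM mysql.user
WHERE user = 'db_user';

-- Reset the password and, optionally, disable expiration for this account.
ALTER USER 'db_user'@'%' IDENTIFIED BY 'new_password';
ALTER USER 'db_user'@'%' PASSWORD EXPIRE NEVER;
```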
Step Four: Check the MySQL Processlist
If MySQL is running fine without connectivity issues, check the MySQL process list to see what processes are currently running:
mysql> SHOW FULL PROCESSLIST;
+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+
| Id  | User | Host      | db   | Command | Time | State | Info                  | Rows_sent | Rows_examined |
+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+
| 117 | root | localhost | NULL | Query   |    0 | init  | SHOW FULL PROCESSLIST |         0 |             0 |
+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+
1 row in set (0.01 sec)
Pay attention to the Info and Time columns. Some MySQL operations are destructive enough to stall the database and make it unresponsive. The following SQL statements, if running, can block other clients from accessing the database or a table (which, from the application's perspective, looks like a brief MySQL outage):
- FLUSH TABLES WITH READ LOCK
- LOCK TABLE …
- ALTER TABLE …
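For the last of these, MySQL's online DDL support (5.6 and later) lets you request a non-blocking algorithm explicitly, so the statement fails fast instead of silently locking the table. A sketch with an illustrative table and column name:

```sql
-- Request an in-place, non-locking ALTER; MySQL raises an error
-- instead of falling back to a blocking table copy.
ALTER TABLE orders
  ADD COLUMN note VARCHAR(255),
  ALGORITHM=INPLACE, LOCK=NONE;
```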
Some long-running transactions can also stall others, eventually causing timeouts for transactions waiting to access the same resources. You can either kill the offending transaction to let others access the same rows, or retry the queued transactions after the long transaction finishes.
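To terminate an offending connection, take its Id from the SHOW FULL PROCESSLIST output and kill it (the id 117 here is illustrative):

```sql
-- Terminate the whole connection, rolling back its open transaction.
KILL 117;

-- Or terminate only the running statement, keeping the connection alive.
KILL QUERY 117;
```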
Proactive monitoring is essential for minimizing the risk of a MySQL outage. If your database is managed by ClusterControl, all of the aspects mentioned above are monitored automatically, without any additional configuration from the user. You will receive alarms in your inbox for anomalies like long-running queries, server misconfiguration, resources exceeding thresholds, and many more. Plus, ClusterControl will automatically attempt to recover your database service if something goes wrong with the host or network.
You can also learn more about MySQL & MariaDB Disaster Recovery by reading our whitepaper.