Understanding MySQL’s New Heatwave

Sebastian Insausti

Data Analytics is important in any company as you can see what happened in the past to be able to make smart decisions or even predict future actions using the existing data.

Analyze a huge amount of data could be hard and you should need to use more than one database engine, to handle OLTP and OLAP workloads. In this blog, we will see what is HeatWave, and how it can help you on this task.

What is HeatWave?

HeatWave is a new integrated engine for MySQL Database Service in the Cloud. It is a distributed, scalable, shared-nothing, in-memory, columnar, query processing engine designed for fast execution of analytic queries. According to the official documentation, it accelerates MySQL performance by 400X for analytics queries, scales out to thousands of cores, and is 2.7X faster at around one-third the cost of the direct competitors. MySQL Database Service, with HeatWave, is the only service for running OLTP and OLAP workloads directly from the MySQL database.

How HeatWave Works

A HeatWave cluster includes a MySQL DB System node and two or more HeatWave nodes. The MySQL DB System node has a HeatWave plugin that is responsible for cluster management, loading data into the HeatWave cluster, query scheduling, and returning query results to the MySQL DB System. HeatWave nodes store data in memory and process analytics queries. Each HeatWave node contains an instance of HeatWave.

The number of HeatWave nodes required depends on the size of your data and the amount of compression that is achieved when loading the data into the HeatWave cluster. We can see the architecture of this product in the following image:

As you can see, users don’t access the HeatWave cluster directly. Queries that meet certain prerequisites are automatically offloaded from the MySQL DB System to the HeatWave cluster for accelerated processing, and the results are returned to the MySQL DB System node and then to the MySQL client or application that issued the query.

How to use it

To enable this feature, you will need to access the Oracle Cloud Management Site, access the existing MySQL DB System (or create a new one), and add an Analitycs Cluster. There you can specify the type of cluster and the number of nodes. You can use the Estimate Node Count feature to know the necessary number based on your workload.

Loading data into a HeatWave cluster requires preparing tables on the MySQL DB System and executing table load operations. 

Preparing Tables

Preparing tables involves modifying table definitions to exclude certain columns, define string column encodings, add data placement keys, and specify HeatWave (RAPID) as the secondary engine for the table, as InnoDB is the primary one.

To define RAPID as the secondary engine for a table, specify the SECONDARY_ENGINE table option in a CREATE TABLE or ALTER TABLE statement:

mysql> CREATE TABLE orders (id INT) SECONDARY_ENGINE = RAPID;
or
mysql> ALTER TABLE orders SECONDARY_ENGINE = RAPID;

Loading Data

Loading a table into a HeatWave cluster requires executing an ALTER TABLE operation with the SECONDARY_LOAD keyword. 

mysql> ALTER TABLE orders SECONDARY_LOAD;

When a table is loaded, data is sliced horizontally and distributed among HeatWave nodes. After a table is loaded, changes to a table's data on the MySQL DB System node are automatically propagated to the HeatWave nodes.

Example

For this example, we will use the table orders:

mysql> SHOW CREATE TABLE orders\G
*************************** 1. row ***************************
       Table: orders
Create Table: CREATE TABLE `orders` (
  `O_ORDERKEY` int NOT NULL,
  `O_CUSTKEY` int NOT NULL,
  `O_ORDERSTATUS` char(1) COLLATE utf8mb4_bin NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) COLLATE utf8mb4_bin NOT NULL,
  `O_CLERK` char(15) COLLATE utf8mb4_bin NOT NULL,
  `O_SHIPPRIORITY` int NOT NULL,
  `O_COMMENT` varchar(79) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

You can exclude columns that you don’t want to load to HeatWave:

mysql> ALTER TABLE orders MODIFY `O_COMMENT` varchar(79) NOT NULL NOT SECONDARY;

Now, define RAPID as SECONDARY_ENGINE for the table:

mysql> ALTER TABLE orders SECONDARY_ENGINE RAPID;

Make sure that you have the SECONDARY_ENGINE parameter added in the table definition:

ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin SECONDARY_ENGINE=RAPID

And finally, load the table into HeatWave:

mysql> ALTER TABLE orders SECONDARY_LOAD;

You can use EXPLAIN to check if it is using the correct engine. You should see something like this:

Extra: Using where; Using temporary; Using filesort; Using secondary engine RAPID

On the MySQL official site, you can see a comparison between a normal execution and using HeatWave:

HeatWave Execution

mysql> SELECT O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT FROM orders 
       WHERE O_ORDERDATE >= DATE '1994-03-01' GROUP BY O_ORDERPRIORITY 
       ORDER BY O_ORDERPRIORITY;
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT        |     2017573 |
| 2-HIGH          |     2015859 |
| 3-MEDIUM        |     2013174 |
| 4-NOT SPECIFIED |     2014476 |
| 5-LOW           |     2013674 |
+-----------------+-------------+
5 rows in set (0.04 sec)

Normal Execution

mysql> SELECT O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT FROM orders 
       WHERE O_ORDERDATE >= DATE '1994-03-01' GROUP BY O_ORDERPRIORITY 
       ORDER BY O_ORDERPRIORITY;
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT        |     2017573 |
| 2-HIGH          |     2015859 |
| 3-MEDIUM        |     2013174 |
| 4-NOT SPECIFIED |     2014476 |
| 5-LOW           |     2013674 |
+-----------------+-------------+
5 rows in set (8.91 sec)

As you can see, there is an important difference in the time of the query, even in a simple query. For more information, you can refer to the official documentation.

Conclusion

A single MySQL Database can be used for both OLTP and Analytics Applications. It is 100% compatible with MySQL on-premises, so you can keep your OLTP workloads on-premises and offload your analytics workloads to HeatWave without changes in your application, or even use it directly on the Oracle Cloud to improve your MySQL performance for Analytics purposes.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.