An Overview of MariaDB Xpand (formerly ClustrixDB)

Paul Namuag

MariaDB Xpand is a new product from MariaDB. It was formerly known as ClustrixDB, which MariaDB Corporation acquired in September 2018.

ClustrixDB is no longer available as a separate entity, but is now included as part of MariaDB Enterprise Server. Now called Xpand, it extends MariaDB Enterprise Server with distributed data and transaction processing, transforming it into a distributed SQL database capable of scaling to millions of transactions per second with a shared-nothing architecture. Xpand is not an all-or-nothing choice, however, as DBAs can use both replicated and distributed tables. Xpand is well suited to complex queries and analytics processing, as it can run queries in parallel across the available nodes in the cluster.

Basically, Xpand is a scale-out SQL database with a shared-nothing architecture, built from the ground up and originally designed to run on commodity hardware with automatic data redistribution (so you never need to shard). It has built-in fault tolerance, all accessible through a simple SQL interface, and supports business-critical MySQL features (replication, triggers, stored routines, etc.). Its license is proprietary only, so if you want to take advantage of this product, you have to contact MariaDB sales first to acquire a valid license.

When To Use MariaDB Xpand

Xpand is designed to handle large volumes of data and lets you scale your database more efficiently: scaling out the cluster is done easily and automatically by Xpand itself. Since the release of MariaDB Platform X5, Xpand has been part of the platform provided to customers as its distributed SQL solution. The Xpand smart engine lets customers scale beyond the InnoDB storage engine's sweet spot, which is high-performance mixed read/write workloads on a single node with the option of adding scale via replication, and employ a highly available, fault-tolerant distributed solution for large-scale workloads.

With Xpand, you have the flexibility to scale on a per-table basis. Start by using Xpand for just a single table and expand the usage as your needs grow beyond what a single node can handle. Increase the use of distributed SQL as your enterprise needs grow beyond replication or clustering. When data or query volumes increase to the point of degrading performance, you can use Xpand to distribute tables or the entire database for improved throughput and concurrency. Xpand has built-in high availability and elasticity, so nodes can be added or removed transparently as needed to scale out.

Just as with MariaDB ColumnStore, the columnar smart engine, cross engine JOINs are possible (and encouraged) between replicated and distributed tables. Unlike other Distributed SQL implementations that distribute the entire database and have, therefore, significant overhead on smaller tables, MariaDB allows the combined use of InnoDB for replicated small data sets and massive distributed data sets via Xpand.

Unfortunately, there's no formal documentation regarding the state of change from ClustrixDB to MariaDB Xpand, so you might still want to rely on https://docs.clustrix.com/ for documentation regarding how ClustrixDB works. It's also known that GTID is not supported by ClustrixDB, though this might have changed since the release of MariaDB 10.5.

How Does MariaDB Xpand Work?

Deploying MariaDB Xpand requires MariaDB Enterprise Servers with the Xpand plugin installed, with the Xpand nodes running alongside them. The setup is similar to a MaxScale and MariaDB Server replication setup for high availability: you can place MaxScale on top to manage connections and transparently fail over between the front-end Enterprise Server instances, which hold the smaller replicated data sets in InnoDB. For the best performance with Xpand, it is also recommended that the front-end servers and the Xpand nodes run on separate physical servers. See the MariaDB Xpand topology architecture diagram from MariaDB for how this works.

To explain further, Xpand splits each table built with Xpand into a number of slices. Each slice is stored on a primary node and then replicated to one or more other nodes to ensure fault tolerance. Each Xpand node can perform both reads and writes, and each node has a map of the data distribution.
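The slicing described above can be illustrated with a small Python sketch. This is not Xpand's actual placement algorithm, which the article does not describe; the hash function, the round-robin placement, and the names `slice_for_key` and `place_slices` are all assumptions made for illustration.

```python
import hashlib


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a row key to a slice number with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def place_slices(num_slices: int, nodes: list[str], replicas: int = 2) -> dict[int, list[str]]:
    """Assign each slice a primary node plus (replicas - 1) copies on other nodes.

    Round-robin placement is a simplification chosen for this sketch.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for s in range(num_slices):
        # The first entry plays the role of the primary, the rest are copies.
        placement[s] = [nodes[(s + r) % len(nodes)] for r in range(replicas)]
    return placement


nodes = ["node1", "node2", "node3"]  # hypothetical node names
placement = place_slices(6, nodes)

# Every node can hold the same map and answer "where does this row live?"
row_slice = slice_for_key("customer:42", 6)
print(row_slice, placement[row_slice])
```

Because the map is just a small dictionary, every node can keep its own copy, which is what lets any node accept a read or a write and route it to the right place.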

For read operations, the major part of the query is pushed down to Xpand, where it is evaluated and the relevant portions are sent to the appropriate Xpand nodes. MariaDB Enterprise Server then collects the data returned by the Xpand nodes to generate a result set.

For write operations, MariaDB Xpand uses a component called the “rebalancer” to automatically and transparently distribute data across the available Xpand nodes.

MariaDB Xpand as a Distributed SQL

Each Xpand node is able to perform both reads and writes. When a query is received by MariaDB Enterprise Server, it is evaluated by a query optimizer and portions of the query are sent to the relevant Xpand nodes. The results are collected and a single result-set returned to the client.
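As a rough illustration of this scatter-and-gather flow, the sketch below fans a filter out to several in-memory "nodes" in parallel and merges the partial results. It only mimics the pattern; the data, the `scan_node` and `scatter_gather` names, and the use of threads are assumptions, and none of this is Xpand's real API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: each node holds only its own slice of the table.
NODE_DATA = {
    "node1": [{"id": 1, "amount": 40}, {"id": 4, "amount": 250}],
    "node2": [{"id": 2, "amount": 120}, {"id": 5, "amount": 15}],
    "node3": [{"id": 3, "amount": 300}],
}


def scan_node(node: str, min_amount: int) -> list[dict]:
    """Evaluate the filter next to the data and return only matching rows."""
    return [row for row in NODE_DATA[node] if row["amount"] >= min_amount]


def scatter_gather(min_amount: int) -> list[dict]:
    """Send the query fragment to every node in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(NODE_DATA)) as pool:
        partials = pool.map(lambda node: scan_node(node, min_amount), NODE_DATA)
        rows = [row for part in partials for row in part]
    # The coordinator builds the single result set the client sees.
    return sorted(rows, key=lambda row: row["id"])


result = scatter_gather(100)
print([row["id"] for row in result])
```

In this toy version the coordinator role is played by `scatter_gather`; in the deployment described here, that role belongs to MariaDB Enterprise Server.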

MariaDB Xpand leverages a shared-nothing architecture; a single node handles each request, and memory and storage are not shared.

MariaDB Xpand HA and Fault Tolerance

MariaDB Xpand is fault tolerant by design. Xpand maintains two replicas of all data using a rebalancer process that runs in the background. Xpand can suffer a single node or zone failure without data loss.

Upon node failure, data is rebalanced from remaining nodes, automatically healing the data protection without intervention. In a zone failure, the rebalancer performs the same operation between nodes and remaining zones.

When the failed node is replaced, the rebalancer redistributes data, restoring MariaDB Xpand to its intended node count.
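The healing step can be sketched in a few lines of Python. This is a simplified model under stated assumptions (two copies per slice, copies placed on the least-loaded surviving node); the `reprotect` function and node names are hypothetical and do not reflect how the real rebalancer is implemented.

```python
def reprotect(placement: dict[int, list[str]], failed: str,
              nodes: list[str], replicas: int = 2) -> dict[int, list[str]]:
    """Return a new slice map that restores the replica count after a node loss."""
    survivors = [n for n in nodes if n != failed]
    if len(survivors) < replicas:
        raise ValueError("not enough surviving nodes to restore protection")

    load = {n: 0 for n in survivors}
    healed = {}
    for s, owners in placement.items():
        kept = [n for n in owners if n != failed]
        if not kept:
            # Every copy lived on the failed node: nothing left to copy from.
            raise RuntimeError(f"slice {s} has no surviving copy")
        healed[s] = kept
        for n in kept:
            load[n] += 1

    for s in sorted(healed):
        while len(healed[s]) < replicas:
            # Copy the slice to the least-loaded survivor that lacks it.
            candidates = [n for n in survivors if n not in healed[s]]
            target = min(candidates, key=lambda n: (load[n], n))
            healed[s].append(target)
            load[target] += 1
    return healed


nodes = ["node1", "node2", "node3", "node4"]
placement = {
    0: ["node1", "node2"],
    1: ["node2", "node3"],
    2: ["node3", "node4"],
    3: ["node4", "node1"],
}
healed = reprotect(placement, failed="node2", nodes=nodes)
print(healed)
```

With two copies of every slice, a single node failure always leaves one copy to rebuild from, which is why the sketch treats a slice with no surviving copy as an error.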

Horizontal Scale-Out with MariaDB Xpand

MariaDB Xpand is flexible by design. If the load on MariaDB Enterprise Server increases, you can add additional Servers to your deployment, load balancing between them using MariaDB MaxScale. Each Server can connect to the Xpand nodes to access data stored on Xpand tables.

If the load on MariaDB Xpand increases, you can scale out by adding new nodes. When you add an Xpand node to the deployment, the rebalancing process redistributes data from the existing nodes. Once complete, the Xpand node can now handle both read and write operations from MariaDB Enterprise Servers.

If the load on MariaDB Xpand decreases, you can scale down by removing nodes. When you remove an Xpand node from the deployment, the rebalancing process redistributes data to the remaining nodes, ensuring fault tolerance.
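To make the scale-out case concrete, here is a minimal sketch of redistributing slice copies after a node is added. The greedy "move from the busiest node to the idlest node" strategy and the `rebalance` name are assumptions for illustration, not a description of Xpand's rebalancer.

```python
def rebalance(placement: dict[int, list[str]], nodes: list[str]) -> int:
    """Even out slice copies across nodes in place; return the number of moves."""
    load = {n: 0 for n in nodes}
    for owners in placement.values():
        for n in owners:
            load[n] += 1

    moves = 0
    while True:
        busiest = max(nodes, key=lambda n: (load[n], n))
        idlest = min(nodes, key=lambda n: (load[n], n))
        if load[busiest] - load[idlest] <= 1:
            return moves  # as even as it can get
        for s in sorted(placement):
            owners = placement[s]
            # Never put two copies of the same slice on one node.
            if busiest in owners and idlest not in owners:
                owners[owners.index(busiest)] = idlest
                load[busiest] -= 1
                load[idlest] += 1
                moves += 1
                break
        else:
            return moves  # no legal move left


old_nodes = ["node1", "node2", "node3"]
placement = {s: [old_nodes[s % 3], old_nodes[(s + 1) % 3]] for s in range(6)}

# Add a fourth, empty node and let the sketch redistribute the copies.
moves = rebalance(placement, old_nodes + ["node4"])
print(moves, placement)
```

Scaling down is the same idea in reverse: copies held by the node being removed are moved to the remaining nodes before it leaves, so every slice keeps its replica count.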

What Makes MariaDB Xpand Scalable?

There are no bottlenecks and no single points of failure. All processors are enlisted in support of query processing. Queries are parallelized and distributed across the cluster to the relevant data. New nodes are automatically recognized and incorporated into the cluster. Workloads and data are automatically balanced across all nodes in the cluster. Cluster-wide SQL relational calculus and ACID properties eliminate multi-node complexity from the development and management of multi-tiered applications. The complexity commonly required to scale existing database models to handle large volumes of data is eliminated. And as your database grows, you just add nodes.

There are several things that affect scalability and performance:

  • Shared-nothing architecture, which eliminates potential bottlenecks. Contrast this with shared-disk / shared-cache architectures that bottleneck, don't scale, and are difficult to manage.
  • Parallelization of queries, which are distributed to the node(s) with the relevant data. Results are created as close to the data as possible, then routed back to the requesting node for consolidation and returned to the client.

This is very different from other systems, which routinely move large amounts of data to the node that is processing the query, then eliminate all the data that doesn't fit the query (typically lots of data). By only moving qualified data across the network to the requesting node, Xpand significantly reduces the network traffic bottleneck. In addition, more processors participate in the data selection process. By selecting data on multiple nodes in parallel, the system produces results more quickly than if all data were selected by a single node, which would first have to collect all the required data from the other nodes in the system.
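A quick back-of-the-envelope comparison shows why this matters. The sketch below counts how many rows cross the network under each approach; the data set, the predicate, and the function names are made up for illustration.

```python
from typing import Callable


def rows_shipped_without_pushdown(node_rows: dict[str, list[int]]) -> int:
    """Every row is sent to the requesting node, which filters afterwards."""
    return sum(len(rows) for rows in node_rows.values())


def rows_shipped_with_pushdown(node_rows: dict[str, list[int]],
                               predicate: Callable[[int], bool]) -> int:
    """Each node filters locally and sends only the qualifying rows."""
    return sum(sum(1 for row in rows if predicate(row))
               for rows in node_rows.values())


# Hypothetical cluster: three nodes holding 1,000 values each.
node_rows = {name: list(range(1000)) for name in ("node1", "node2", "node3")}


def is_match(value: int) -> bool:
    return value < 10  # a selective filter


print(rows_shipped_without_pushdown(node_rows))        # 3000 rows moved
print(rows_shipped_with_pushdown(node_rows, is_match))  # 30 rows moved
```

In this toy case the selective filter cuts the rows sent over the network by a factor of 100, and the filtering work itself is spread over three nodes instead of one.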

Since each node focuses on a particular partition and sends work items to other nodes rather than requesting raw data from other nodes, each node's cache contains more of that node's data, and less redundant data from other nodes. This means cache hit rates will be much higher, significantly reducing the need for slow disk accesses.

Deploying MariaDB Xpand

To start using MariaDB Xpand, you need to set up two separate deployments. An Xpand deployment consists of MariaDB Enterprise Server instances with the Xpand plugin installed, called the front-end servers, and the Xpand nodes running alongside them. For the best performance, the Enterprise Server and the Xpand node should be installed on separate physical servers.

  1. You need to set up the MariaDB Xpand Node. Xpand nodes are configured in a deployment to provide the storage back-end for MariaDB Enterprise Servers with the Xpand storage engine plugin. Servers store data for Xpand tables on Xpand nodes rather than the local file system. Installing the Xpand Node requires a license, which is a JSON object that you can only acquire by reaching out to MariaDB Sales. The installation process is not as quick as a single command or click, so we suggest you follow their installation guide for the Xpand Node.
  2. Deploy a front-end Server. From what I've noticed in the changes they have made, the recommended way to use Xpand appears to be with MariaDB Enterprise Server 10.5. The Xpand 

MariaDB Xpand Hardware Compatibility

If you're curious about hardware compatibility, the MariaDB Platform can run in a variety of environments. As long as your MariaDB servers can run in the environment you currently use, and you can set up the Xpand nodes alongside them with the Xpand plugin installed, it will work. According to their documentation, the supported physical and cloud environments are:

  • On-premises (on-prem)
  • Collocated (colo)
  • Private Cloud
  • Public Cloud
  • Hybridized

For the hardware architecture, it's worth noting that as of MariaDB Enterprise Server 10.4.10-4 (2019-11-18), MariaDB Enterprise Server supports only x86_64 hardware architecture platforms.

Conclusion

MariaDB Xpand makes it convenient to scale a database efficiently. The most appealing aspect of this product is that you can still use MariaDB’s standard SQL functions. It can be added to your existing MariaDB environment, which can then take advantage of its features and scalability. Although that may be enticing, it requires special licensing and large fees to use this product. If it serves a purpose for your enterprise application, then MariaDB Xpand might be worth a try.

 