MySQL+memcached is one of the most widely used setups for storing data in high-scale applications. MySQL is sharded at the application layer to increase write throughput, and objects are cached in memcached to handle high read loads. Caching data in memory reduces the number of round trips to the database, which improves performance and lowers server load.
Because of its simplicity and the availability of client libraries for most popular programming languages (C/C++, PHP, Java, Python, Ruby, C#, etc.), memcached is very widely adopted today, especially by high-traffic websites. Some of the largest users include Facebook, Twitter, YouTube, Wikipedia, Flickr and Craigslist.
2. What is memcached?
memcached is a high-performance, distributed memory object caching system. It is an in-memory key-value store for small chunks of arbitrary data (strings, objects), such as the results of database or API calls. Each item is made up of a key, an expiration time, optional flags and the raw data. memcached does not understand data structures: the application must serialize data before storing it.
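To make the item structure concrete, here is a minimal Python sketch of an item and of the serialization step the application has to perform itself. The `Item` class, the `FLAG_JSON` flag value and the choice of JSON are illustrative assumptions, not part of memcached; real client libraries ship their own serializers.

```python
import json
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical model of a memcached item."""
    key: str
    exptime: int  # expiration in seconds; 0 means no expiry (it can still be evicted)
    flags: int    # opaque integer the server stores and returns unchanged
    data: bytes   # raw payload, already serialized by the application


# Hypothetical flag value this application uses to mark JSON payloads.
FLAG_JSON = 1


def to_item(key: str, obj, ttl: int = 300) -> Item:
    """Serialize a Python object to JSON bytes before it is sent to memcached."""
    return Item(key=key, exptime=ttl, flags=FLAG_JSON,
                data=json.dumps(obj).encode("utf-8"))


def from_item(item: Item):
    """Reverse to_item(): decode JSON payloads, return anything else as raw bytes."""
    if item.flags == FLAG_JSON:
        return json.loads(item.data.decode("utf-8"))
    return item.data
```

For example, `from_item(to_item("user:42", {"name": "alice"}))` returns `{"name": "alice"}`. The flags field is a convenient place to record how the payload was encoded, since the server never inspects it.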
memcached is made up of:
- client software, which is given a list of available memcached servers
- a client-based hashing algorithm, which chooses the server based on the key
- server software, which stores keys and values into an internal hash table. Servers are unaware of each other; there is no synchronization
- server algorithms, which determine when to expire and evict old data
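Because the servers never talk to each other, the client alone decides which server owns a key. One common choice is consistent hashing; the Python sketch below is a simplified illustration of that technique, not the algorithm of any particular memcached client. The server addresses, the MD5-based hash and the `points_per_server` value are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring that maps keys to memcached servers."""

    def __init__(self, servers, points_per_server=100):
        ring = []
        for server in servers:
            for i in range(points_per_server):
                # Several virtual points per server spread keys more evenly.
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._servers = [s for _, s in ring]

    @staticmethod
    def _hash(value):
        # First 32 bits of MD5; any stable hash works for this illustration.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest()[:8], 16)

    def server_for(self, key):
        # Pick the first ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._servers[idx]
```

Compared with a plain `hash(key) % len(servers)`, removing a server from the ring only moves the keys that server owned; every other key keeps its server, so most of the cache stays valid.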
Most applications benefit from caching, unless they are very write-heavy (in which case the cache is invalidated constantly).
3. Scaling applications using MySQL and memcached
memcached provides low-latency access to in-memory data and shields MySQL from much of the application's read traffic, which reduces the load on the database.
The read flow is as follows:
- the application tries to read an object from memcached
- if the object is found, the application uses it
- if it is not found in memcached, the application fetches it from the database and also writes it into the cache
- the next time the application fetches the same object, it is read from the cache (unless the object has expired or been evicted from the cache)
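The read steps above follow the cache-aside pattern. Below is a minimal Python sketch of it. `FakeCache` is a hypothetical in-process stand-in for a memcached client (real clients offer similar `get`/`set` operations over the network), `sqlite3` stands in for MySQL, and the `users` table and the `user:<id>` key scheme are assumptions made for illustration.

```python
import json
import sqlite3
import time


class FakeCache:
    """Hypothetical stand-in for a memcached client: get/set with a TTL."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value bytes)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (time.monotonic() + ttl, value)


def get_user(cache, db, user_id, ttl=300):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:  # cache hit: use the cached object
        return json.loads(cached)
    # Cache miss: fetch the object from the database.
    row = db.execute("SELECT id, name FROM users WHERE id = ?",
                     (user_id,)).fetchone()
    if row is None:
        return None
    user = {"id": row[0], "name": row[1]}
    # Write the object into the cache so the next read is served from memory.
    cache.set(key, json.dumps(user).encode("utf-8"), ttl)
    return user
```

Usage: after `get_user(cache, db, 1)` has loaded a row once, a second call returns the cached copy without touching the database, even if the row has since changed there. That staleness is why updates must also touch the cache.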
On an update, the application first removes the cached copy from memcached, then writes the new data to the database and writes it to memcached as well, so that the cache stays up to date.
With the memcached implementation for NDB, it is now possible to collapse the database and cache layers into a single data tier. This removes the need for the application to keep the cache synchronized with the database.
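The update sequence described above can be sketched in Python as follows. This is a self-contained illustration under stated assumptions: a plain `dict` stands in for memcached (TTLs omitted), `sqlite3` stands in for MySQL, and `update_user_name`, the `users` table and the `user:<id>` key scheme are hypothetical.

```python
import json
import sqlite3


def update_user_name(cache, db, user_id, name):
    """Update one user; `cache` is a dict standing in for memcached."""
    key = f"user:{user_id}"
    # 1. Invalidate the cached copy first.
    cache.pop(key, None)
    # 2. Write to the database; `with db` commits on success, rolls back on error.
    with db:
        db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    # 3. Repopulate the cache with the new value.
    user = {"id": user_id, "name": name}
    cache[key] = json.dumps(user).encode("utf-8")
    return user
```

Removing the key before the database write means that if the write fails, the exception propagates with the cache entry already gone, so the next read reloads the current row from the database instead of serving a stale value.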
4. memcached driver for NDB
Starting with version 7.2.1, the memcached interface is integrated directly into MySQL Cluster. An application sends reads and writes through the memcached API to the memcached process, which in turn invokes an NDB driver. The driver uses the NDB API to bypass the SQL layer and access data directly in the data nodes.
The memcached process can be co-located with the data nodes or the applications, or run on separate VMs.
By default, all key-value data is written to a single table, with each key-value pair stored in its own row. It is also possible to give access to existing data in NDB by defining a key prefix, so that each value is linked to a predefined column in a specific table.
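The prefix idea can be pictured as a lookup from key prefix to storage location. The Python sketch below only illustrates that concept: it is not the NDB driver's configuration schema (which lives in tables in MySQL Cluster), and the `Container` class, table and column names, longest-prefix matching and prefix stripping are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical description of where values for a key prefix live."""
    table: str
    key_column: str
    value_column: str


# Hypothetical mappings: a generic key-value table plus one existing table.
DEFAULT = Container("kv_store", "mkey", "value")
PREFIXES = {
    "user:": Container("users", "id", "name"),
}


def resolve(key):
    """Return (container, lookup key) for a memcached key.

    Assumptions: the longest matching prefix wins, and the prefix is stripped
    before the key is used to look up the row.
    """
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if key.startswith(prefix):
            return PREFIXES[prefix], key[len(prefix):]
    return DEFAULT, key
```

With these mappings, `resolve("user:42")` points at the `name` column of row `42` in `users`, while a key with no configured prefix, such as `session:abc`, falls through to the generic one-row-per-pair table.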
4.1 How to add the memcached API to MySQL Cluster?
Severalnines has created deployment scripts for memcached. The scripts work with a MySQL Cluster that has been deployed using the Severalnines Configurator. You will find them in the install directory of the deployment package on the ClusterControl server.
./install-memcache.sh <hostname> <clusterid>
For example, the following command installs memcached on 10.176.131.76, attaches it to the cluster with ID 1, and installs ndb_memcache_metadata.sql in the cluster:
./install-memcache.sh 10.176.131.76 1
4.2 How to map memcached requests on existing NDB tables?
Data can be cached in the memcached server, persisted in NDB, or both. This is configured per key prefix through tables in MySQL Cluster, as shown in the diagram below.
For further information on how to do this, we recommend this blog post by Andrew Morgan.
5. Integration with ClusterControl
Once installed, the memcached instance is integrated into ClusterControl, so its status can be viewed from the ClusterControl GUI, as shown in the screenshot below.
It is also possible to drill down into memcached stats and graphs.
The memcached process is managed by ClusterControl and is restarted automatically if it fails.