A Deep Dive on Kubernetes Container Storage Interface (CSI)

Discussions around deploying stateful applications on Kubernetes are friendlier today thanks to advancements over the years, particularly Kubernetes operators. Operators are Kubernetes extensions that use custom resources to manage applications and their components.
Many operators use Persistent Volumes to store data in Kubernetes, and these Persistent Volumes are typically provisioned by a Container Storage Interface (CSI) driver. The Container Storage Interface is an API specification designed to abstract and streamline how containerized workloads interact with various storage systems.
Before CSI, container orchestration platforms like Kubernetes, Mesos, and Cloud Foundry often relied on platform-specific storage integration methods. This meant that storage vendors had to develop and maintain multiple drivers or plugins, each tailored to a given orchestrator's unique API and lifecycle management patterns. The result was fragmentation, complexity, and redundancy in the storage ecosystem. It also meant that organizations had to wait for releases to get new storage features or bug fixes, and not everyone upgrades soon after a release.
Now, if a driver correctly implements the CSI API spec, the driver can be used in any supported Container Orchestration system, like Kubernetes. This decouples persistent storage development efforts from core Kubernetes, allowing for the rapid development and iteration of storage drivers across the cloud native ecosystem.
I will start this article by discussing the history of Kubernetes storage plugins and the CSI architecture, then demonstrate how to deploy a CSI driver while explaining its internals. Towards the end, I will briefly explain how dynamic provisioning works in Kubernetes.
Kubernetes Storage Plugins History
If we go back to the early days of Kubernetes, up to version 1.1, it shipped with In-Tree Persistent Volume Plugins such as Fibre Channel and iSCSI, along with other plugins, as part of the main Kubernetes distribution.
As mentioned earlier, this was a challenge early on: every storage vendor needed to learn how to contribute to Kubernetes. That was no easy feat, as every code change had to be reviewed, and vendors were tied to the release cadence of Kubernetes.

Kubernetes 1.2 introduced the In-Tree FlexVolume Plugin, which allowed vendors to write FlexVolume drivers out of tree. That approach proved unfriendly as well, because a FlexVolume driver was a self-contained binary that had to reside on every Kubelet in the cluster that was expected to attach persistent storage.
The FlexVolume Plugin also did not support dynamic provisioning, so vendors had to write their own provisioner to satisfy persistent volume claims with their drivers. All of this prompted the move towards the Container Storage Interface, which lives completely outside of Kubernetes. CSI functionality and development velocity are independent of core Kubernetes.
Container Storage Interface (CSI) Architecture
Like many components in the Kubernetes ecosystem, the Container Storage Interface is an API specification. You can find the spec in the container-storage-interface/spec GitHub repo. The spec consists of two parts:
- A protobuf file that defines the API schema in gRPC terms.
- A markdown file that describes the overall system architecture and goes into detail about each API call.
In this section I will summarize the markdown file, highlighting some examples and scenarios.
The primary focus of the CSI specification is the protocol between a container orchestrator (CO) and a plugin (the "plugin implementation"): a gRPC endpoint that implements the CSI services. The terms CSI plugin and CSI driver are often used interchangeably, but it's important to note that they are not entirely the same thing.
A CSI driver is the actual deployment or implementation of the CSI plugin for a specific storage system (e.g., Azure File Driver, AWS EBS, Google Persistent Disk, or Ceph). It represents the concrete instance of a CSI plugin running within a container orchestrator environment.
A CSI driver is made up of two components: a Controller Plugin and a Node Plugin. The Controller Plugin runs on the Kubernetes control plane host, while the Node Plugin runs on every Kubernetes node, alongside the Kubelet.

Controller Plugin
The Controller Plugin is the brain of a CSI driver, orchestrating the overall management of volumes. Its primary responsibilities are creating and deleting volumes based on user requests for persistent storage, such as through a PersistentVolumeClaim. It acts as an intermediary between Kubernetes and the storage backend (e.g., a cloud provider or storage array), provisioning and de-provisioning volumes as needed.
Beyond basic volume lifecycle management, the Controller Plugin also handles attaching and detaching volumes to and from nodes, as well as snapshotting and restoring them, ensuring that storage is connected where it is needed. It also provides flexibility by enabling volume expansion to accommodate growing storage needs and volume cloning to create duplicates for various use cases.
For example, using a cloud provider driver like Azure File Driver, the Controller Plugin communicates with Azure HTTPS APIs to perform these operations.
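To make the expansion capability concrete, here is a minimal sketch of how a user would request a larger volume; the csi-resizer sidecar (covered later) notices the change and asks the driver's Controller service to expand the volume. The PVC name data-claim is hypothetical, and this only works if the driver and its StorageClass allow expansion (allowVolumeExpansion: true):
# Minimal sketch: grow an existing claim to 20Gi.
# "data-claim" is a hypothetical PVC name; adjust to your own.
kubectl patch pvc data-claim --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Watch the claim until the new capacity is reflected in its status.
kubectl get pvc data-claim -w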
Node Plugin
You can think of the Node Plugin as the hands-on component of a CSI driver. As mentioned earlier, it resides on each node in the cluster and directly interacts with the node’s operating system and storage hardware.
Once the Controller Plugin has attached a volume to a node for a workload to use, the Node Plugin running on that node takes over: it mounts the volume to a well-known path and optionally formats it with the appropriate file system (e.g., ext4, XFS) to ensure compatibility and optimal performance.
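To make that "well-known path" concrete, the commands below show where you would typically find a CSI-provisioned volume on a node once a Pod is using it. The pod UID and PV name are placeholders, and the exact layout can vary, but this is the conventional location under the Kubelet's directory:
# On the node, CSI volume mounts generally land under the Kubelet's pods directory
# (<pod-uid> and <pv-name> are illustrative placeholders):
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount
# findmnt is a quick way to list the CSI-backed mounts on a node:
findmnt | grep kubernetes.io~csi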
The Node Plugin also plays a crucial role in monitoring by reporting metrics like disk usage back to Kubernetes, providing valuable insights into storage health and performance.
CSI Driver examples and capabilities
So far, you’ve seen a few Kubernetes CSI Drivers that implement the CSI Specification. There are more CSI drivers for different storage vendors and file systems. To see all of them, go to the CSI drivers documentation.

Given the flexibility of the CSI specification, it's important to note that not all drivers are the same: each driver supports a different subset of the spec's features and capabilities.
For example, see below a comparison between the two Ceph CSI drivers. Ceph is an open-source, software-defined distributed storage system that provides object, block, and file storage.
| | CephFS | Ceph RBD |
| --- | --- | --- |
| Provisioner | cephfs.csi.ceph.com | rbd.csi.ceph.com |
| Compatible with CSI Version(s) | v0.3, >=v1.0.0 | v0.3, >=v1.0.0 |
| Persistence (Beyond Pod Lifetime) | Persistent | Persistent |
| Supported Access Modes | Read/Write Multiple Pods | Read/Write Single Pod |
| Dynamic Provisioning | Yes | Yes |
| Features | Expansion, Snapshot, Cloning | Raw Block, Snapshot, Expansion, Cloning, Topology, In-tree plugin migration |
We will discuss some of the above features later. Before doing that, let’s understand how to install these drivers on Kubernetes.
Deploying a CSI driver on Kubernetes
A Kubernetes cluster, especially a managed one, usually comes with one or two CSI drivers installed by default. For example, the demos we will cover are done in a managed DigitalOcean cluster, which by default comes with the DigitalOcean Block Storage (dobs.csi.digitalocean.com) CSI driver.

The kubectl get csidrivers command in the image above lists all the drivers in a cluster and their capabilities. With kubectl get csinodes, you can see which nodes in your cluster have drivers installed.
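If you want to run the same checks yourself, the commands below reproduce what the image shows; the exact output will depend on your cluster, and the node name is a placeholder:
# List the CSI drivers registered in the cluster and their capabilities
kubectl get csidrivers
# Show how many drivers each node has registered
kubectl get csinodes
# Inspect the drivers registered on a particular node
kubectl describe csinode <node-name>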
You can find most CSI drivers on Artifact Hub. For demo purposes, let's look at the steps to install the Ceph CephFS CSI driver. As with most packages on Artifact Hub, you'd install it with Helm.
Add the helm repo to your cluster:
helm repo add ceph-csi https://ceph.github.io/csi-charts
helm repo update
Create a namespace where Helm should install the components with:
kubectl create namespace ceph-csi-cephfs
And then install the driver in that namespace:
helm install --namespace "ceph-csi-cephfs" "ceph-csi-cephfs" ceph-csi/ceph-csi-cephfs
Once the installation completes, we can run the kubectl get csidrivers command and see that the driver is deployed.

Now that the driver is deployed, let's inspect it to understand how it works. As mentioned earlier, a CSI driver has two components: a controller component (deployed as a Deployment) and a per-node component (deployed as a DaemonSet).
In the case of the CephFS driver we just deployed, you can see the controller Deployment and the node DaemonSet with the kubectl get commands in the image below.
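If you are following along, these commands list the same workloads; the namespace comes from the Helm install above, so adjust it if you chose a different one:
# The controller component (provisioner) runs as a Deployment
kubectl get deployments -n ceph-csi-cephfs
# The per-node component runs as a DaemonSet on every node
kubectl get daemonsets -n ceph-csi-cephfs
# And the Pods backing both
kubectl get pods -n ceph-csi-cephfs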

The CephFS CSI driver implements the CSI Controller service and is deployed together with sidecar containers. These sidecars (running in the provisioner Deployment in this case) watch Kubernetes objects through the Kubernetes API and external control plane services, and make calls to the driver's CSI Controller service.
The Controller sidecars include:
| Sidecar Name | Kubernetes Resources Watched | CSI API Endpoints Called |
| --- | --- | --- |
| csi-provisioner | PersistentVolumeClaim | CreateVolume, DeleteVolume |
| csi-snapshotter | VolumeSnapshot (Content) | CreateSnapshot, DeleteSnapshot |
| csi-resizer | PersistentVolumeClaim | ControllerExpandVolume |
| liveness-prometheus | The CSI driver itself | HTTP /healthz endpoint |
With the kubectl get command and a JSONPath output expression, we can view the containers in the running Deployment.
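A command along the following lines lists the container names in the provisioner Deployment. The Deployment name here assumes the Helm release name ceph-csi-cephfs used earlier; check kubectl get deployments -n ceph-csi-cephfs if yours differs:
kubectl get deployment ceph-csi-cephfs-provisioner -n ceph-csi-cephfs \
  -o jsonpath='{.spec.template.spec.containers[*].name}'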

The CSI driver's communication with the sidecar containers:

The node component implements the CSI Node service and runs the node-driver-registrar sidecar container. The node-driver-registrar fetches driver information (using NodeGetInfo) from the CSI endpoint and registers the driver with the Kubelet on that node.
The Kubelet, which runs on every node, is responsible for making the CSI Node service calls. These calls mount and unmount the storage volume from the CephFS storage system, making it available for the Pod to consume.

As shown in the image above, the Kubelet makes calls to the CSI driver through a UNIX domain socket (csi.sock) shared on the host via a HostPath volume under /var/lib/kubelet/. There is also a second UNIX domain socket that the node-driver-registrar uses to register the CSI driver with the Kubelet.
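If you have shell access to a node, you can see both sockets on the host. The registration socket name follows the <driver-name>-reg.sock convention used by the node-driver-registrar, so the paths below are what you would expect for this driver but may differ in other setups:
# Socket the Kubelet uses to call the driver's CSI Node service
ls /var/lib/kubelet/plugins/cephfs.csi.ceph.com/
# expected: csi.sock
# Socket the node-driver-registrar exposes for the Kubelet's plugin watcher
ls /var/lib/kubelet/plugins_registry/
# expected: cephfs.csi.ceph.com-reg.sock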
For our deployed Ceph CephFS CSI driver, the following is the volume mounts configuration of the node component running as a DaemonSet in the DigitalOcean cluster.
Note: The inline comments explain the volumes. To see the full YAML configuration, visit this GitHub gist.
containers:
  - name: Ceph_cephfs
    ...
    volumeMounts:
      - mountPath: /csi
        name: socket-dir
      - mountPath: /var/lib/kubelet/pods
        mountPropagation: Bidirectional
        name: mountpoint-dir
      - mountPath: /var/lib/kubelet/plugins
        mountPropagation: Bidirectional
        name: plugin-dir
    ...
  - args:
      - --v=5
      - --csi-address=/csi/csi.sock
      - --kubelet-registration-path=/var/lib/kubelet/plugins/cephfs.csi.ceph.com/csi.sock
    ...
    volumeMounts:
      - mountPath: /csi
        name: socket-dir
      - mountPath: /registration
        name: registration-dir
volumes:
  # This volume is where the socket for kubelet->driver communication is done
  - hostPath:
      path: /var/lib/kubelet/plugins/cephfs.csi.ceph.com
      type: DirectoryOrCreate
    name: socket-dir
  # This volume is where the node-driver-registrar registers the plugin
  # with kubelet
  - hostPath:
      path: /var/lib/kubelet/plugins_registry
      type: Directory
    name: registration-dir
  # This volume is where the driver mounts volumes
  - hostPath:
      path: /var/lib/kubelet/pods
      type: DirectoryOrCreate
    name: mountpoint-dir
Now that you understand how CSI and its drivers work, let’s examine how it handles volume creation at scale with its dynamic provisioning feature.
Dynamic Provisioning in Kubernetes
Dynamic provisioning in Kubernetes is a mechanism that automatically provisions storage resources for a cluster on-demand, without requiring you to manually create and manage PersistentVolumes (PVs).
Traditionally, Kubernetes administrators used static provisioning:
- The cluster admin manually created PersistentVolumes (PVs).
- Application developers created PersistentVolumeClaims (PVCs) that had to match those pre-created PVs (e.g., by matching a label, capacity, or access mode).
This approach can become cumbersome because as soon as your cluster’s storage needs change, you must create more PVs or modify existing ones.
Dynamic provisioning solves these issues by allowing Kubernetes to automatically create PVs based on developer requests in the form of PVCs. This is particularly beneficial for cloud environments (AWS, GCP, Azure) and storage systems that have APIs to programmatically create block or file system storage.
To enable dynamic provisioning in Kubernetes, one more key resource is needed besides the PVs, PVCs, and CSI drivers we've already mentioned: the StorageClass.
A StorageClass provides a way for Kubernetes administrators to describe the storage they offer. Different classes may correspond to backup policies, quality-of-service levels, or arbitrary policies that the cluster administrators have decided upon. Kubernetes itself is unopinionated about what classes represent. Let's take a look at an example.
Dynamic provisioning in Rook-Ceph operator
Rook is an open-source, production-ready orchestrator for the Ceph storage system, with a specialized Kubernetes Operator to automate management.
If you're running Ceph via the Rook operator, you usually create a CephFilesystem custom resource that provides CephFS, and Rook sets up the CSI driver for you. We might then have a StorageClass similar to:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: <rook-ceph-namespace>
  fsName: ourfs # The name of the CephFilesystem
  pool: myfs-data0
  # 'csi.storage.k8s.io/provisioner-secret-name' and 'csi.storage.k8s.io/node-stage-secret-name'
  # store the Ceph credentials. Rook usually creates these Secrets automatically.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # optional: disable thick-provisioning to create a subvolume with minimal overhead
  csi.ceph.com/subvolumeGroup: kube_subvolgroup
reclaimPolicy: Delete
mountOptions:
  - discard
volumeBindingMode: Immediate
In the above StorageClass:
- provisioner: Must match the name of the CephFS CSI driver (rook-ceph.cephfs.csi.ceph.com in Rook’s case).
- fsName: The name of the CephFS filesystem (defined by the Rook CephFilesystem resource).
- pool: The data pool used by that filesystem.
- Secrets: References to the Secrets for provisioning and mounting (these are often managed automatically by Rook).
Once the StorageClass is created, we can define a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
When we apply this PVC, the CephFS CSI driver dynamically creates a subvolume for us in the ourfs filesystem. Then, a corresponding PersistentVolume is created and bound to cephfs-pvc.
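To close the loop, here is a minimal sketch of a Pod that consumes the claim. The Pod and container names and the image are hypothetical; only cephfs-pvc and the StorageClass above come from the example:
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo-pod   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx        # any image that writes to a mounted path
      volumeMounts:
        - name: data
          mountPath: /var/lib/www/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-pvc   # the claim defined above
After applying the manifest, kubectl get pvc cephfs-pvc should show the claim as Bound, and kubectl get pv will list the dynamically created PersistentVolume backing it.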
Cloud Native Storage at Scale
We explored the container storage interface architecture, demoed how to deploy a CSI driver and discussed its internals, and also covered how dynamic provisioning works in Kubernetes.
At Severalnines, we’ve been contributing to the database on Kubernetes community over the years, building CCX Cloud and CCX for CSPs, Kubernetes-based DBaaS implementations for end-users and Cloud Service Providers, respectively.
There are more concepts of the container storage interface and its drivers not covered in this article. In our upcoming posts, we will discuss more features of CSI such as accessing raw block storage, CSI snapshots and ephemeral volumes. In the meantime, don’t forget to follow us on LinkedIn and X or sign up for our newsletter to stay informed of the latest in the database world.