Each of these steps is shown in depth in the following sections:

- Deploy `lvmd` as a `systemd` service or as a DaemonSet on a worker node with LVM installed.
- Prepare cert-manager for topolvm-controller. You may supplement an existing instance.
- Determine how `topolvm-scheduler` is to be run:
  - If you run with a managed control plane (such as GKE, AKS, etc.), `topolvm-scheduler` should be deployed as a Deployment and Service.
  - Otherwise (i.e. bare metal or other unmanaged deployments), `topolvm-scheduler` should be deployed as a DaemonSet.
  - Alternatively, enable Storage Capacity Tracking mode instead of using `topolvm-scheduler`.
- Install the Helm chart.
- Configure `kube-scheduler` to use `topolvm-scheduler`.
`lvmd` is a gRPC service to manage LVM volume groups. The pre-built binary can be downloaded from the releases page. It can also be built from source: `mkdir build; go build -o build/lvmd ./pkg/lvmd`.

`lvmd` can be set up as a daemon or as a Kubernetes DaemonSet.
- Prepare LVM volume groups. A non-empty volume group can be used because the LV names created by TopoLVM will not conflict with existing LVs.
- Edit lvmd.yaml if you want to specify device-class settings to use multiple volume groups. See lvmd.md for details.

  ```yaml
  device-classes:
    - name: ssd
      volume-group: myvg1
      default: true
      spare-gb: 10
  ```

- Install `lvmd` and `lvmd.service`, then start the service. A combined sketch of these steps follows this list.
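The following is a minimal sketch of these steps, assuming a single spare device `/dev/sdb`, the volume group name `myvg1`, and an install location of `/opt/sbin`; adjust all of these for your environment:

```bash
# Prepare an LVM volume group (device and VG name are examples)
sudo pvcreate /dev/sdb
sudo vgcreate myvg1 /dev/sdb

# Install the lvmd binary and its systemd unit file, then start the service
sudo install -m 755 build/lvmd /opt/sbin/lvmd
sudo cp lvmd.service /etc/systemd/system/lvmd.service
sudo systemctl daemon-reload
sudo systemctl enable --now lvmd.service
```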
Alternatively, you can set up `lvmd` as a Kubernetes DaemonSet.

Notice: the `lvmd` container uses `nsenter` to run some LVM commands (like `lvcreate`) as host processes, so you can't launch `lvmd` as a DaemonSet when you're using kind.

To set up `lvmd` as a DaemonSet:
- Prepare LVM volume groups on the host. A non-empty volume group can be used because the LV names created by TopoLVM will not conflict with existing LVs.
- Specify the following options in the values.yaml of the Helm Chart:

  ```yaml
  lvmd:
    managed: true
    socketName: /run/topolvm/lvmd.sock
    deviceClasses:
      - name: ssd
        volume-group: myvg1 # Change this value to your VG name.
        default: true
        spare-gb: 10
  ```
Notice: If you are using a read-only filesystem, or if `/etc/lvm` is mounted read-only, `lvmd` will likely fail to create volumes with status code 5. To mitigate this, you need to set an extra environment variable:
lvmd:
env:
- name: LVM_SYSTEM_DIR
value: /tmp
This section describes how to switch from `lvmd` running as a daemon to the DaemonSet `lvmd`.
- Install the Helm Chart with `lvmd` configured to run as a DaemonSet. You need to set a temporary `socketName` that is not the same as the value used by `lvmd` running as a daemon. After the Helm Chart is installed, the DaemonSet `lvmd` and the `lvmd` daemon exist at the same time, using different sockets.

  ```yaml
  <snip>
  lvmd:
    managed: true
    socketName: /run/topolvm/lvmd.sock # Change this value to something like `/run/topolvm/lvmd-work.sock`.
    deviceClasses:
      - name: ssd
        volume-group: myvg1
        default: true
        spare-gb: 10
  <snip>
  ```
- Change the options of `topolvm-node` so that it communicates with the DaemonSet `lvmd` instead of the `lvmd` daemon. You should set the temporary socket name, which is not the same as the one used by the `lvmd` daemon.

  ```yaml
  <snip>
  node:
    lvmdSocket: /run/lvmd/lvmd.sock # Change this value to something like `/run/lvmd/lvmd-work.sock`.
  <snip>
  ```
- Check that you can create a Pod/PVC and can access existing PVs.
- Stop and remove `lvmd` running as a daemon.
- Change the `socket-name` and `--lvmd-socket` options back to the original values. To reflect the changes to the ConfigMap, restart the DaemonSet `lvmd` manually.

  ```yaml
  <snip>
  lvmd:
    socketName: /run/topolvm/lvmd-work.sock # Change this value back to something like `/run/topolvm/lvmd.sock`.
  <snip>
  node:
    lvmdSocket: /run/lvmd/lvmd-work.sock # Change this value back to something like `/run/lvmd/lvmd.sock`.
  <snip>
  ```
cert-manager is used to issue a self-signed TLS certificate for topolvm-controller. Follow the documentation to install it into your Kubernetes cluster.
Before installing the chart, you must first install the cert-manager CustomResourceDefinition resources.
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.7.0/cert-manager.crds.yaml
Set `cert-manager.enabled=true` in the Helm Chart values.
cert-manager:
enabled: true
You can prepare the certificate manually without cert-manager.
- Prepare PEM-encoded self-signed certificate and key files. The certificate must be valid for the hostname `controller.topolvm-system.svc` (an example using `openssl` follows this list).
- Base64-encode the CA certificate (in its PEM format).
- Create a Secret in the `topolvm-system` namespace as follows:

  ```bash
  kubectl -n topolvm-system create secret tls topolvm-mutatingwebhook \
      --cert=<CERTIFICATE FILE> --key=<KEY FILE>
  ```

- Specify the certificate in the Helm Chart values:

  ```yaml
  <snip>
  webhook:
    caBundle: ... # Base64-encoded, PEM-encoded CA certificate that signs the server certificate
  <snip>
  ```
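For example, a self-signed certificate that satisfies these requirements can be generated with `openssl` (a sketch; the file names are arbitrary, and `-addext` requires OpenSSL 1.1.1 or later):

```bash
# Generate a self-signed certificate valid for controller.topolvm-system.svc
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=controller.topolvm-system.svc" \
  -addext "subjectAltName=DNS:controller.topolvm-system.svc" \
  -keyout tls.key -out tls.crt

# Base64-encode the certificate; use the output for webhook.caBundle
base64 -w0 tls.crt
```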
`topolvm-scheduler` is a scheduler extender for `kube-scheduler`. It must be deployed where `kube-scheduler` can connect to it.

If your Kubernetes cluster runs the control plane on Nodes, `topolvm-scheduler` should be run as a DaemonSet limited to the control plane nodes. `kube-scheduler` then connects to the extender via the loopback network device.

Otherwise, `topolvm-scheduler` should be run as a Deployment and Service. `kube-scheduler` then connects to the Service address.
Set `scheduler.type=daemonset` in the Helm Chart values. This is the default.
```yaml
<snip>
scheduler:
type: daemonset
<snip>
```
In this case, set `scheduler.type=deployment` in the Helm Chart values.
```yaml
<snip>
scheduler:
type: deployment
<snip>
```
This way, `topolvm-scheduler` is exposed by a LoadBalancer Service.

Then edit `urlPrefix` in scheduler-config.yaml to specify the LoadBalancer address.
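For example (a sketch; the IP address is a placeholder for your LoadBalancer address, and the port depends on how the Service is exposed):

```yaml
extenders:
  - urlPrefix: "http://203.0.113.10:9251" # the address of the topolvm-scheduler LoadBalancer Service
    filterVerb: "predicate"
    prioritizeVerb: "prioritize"
    nodeCacheCapable: false
    managedResources:
      - name: "topolvm.io/capacity"
        ignoredByScheduler: true
```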
The node scoring for Pod scheduling can be fine-tuned in the following two ways:

- Adjust the `divisor` parameter in the scoring expression
- Change the weight of the node scoring relative to the default scoring by `kube-scheduler`
The scoring expression in `topolvm-scheduler` is as follows:

min(10, max(0, log2((capacity >> 30) / divisor)))

For example, with the default `divisor` of `1`, a node with more than `1024GiB` of free disk capacity is scored as `10` by `topolvm-scheduler`. `divisor` should be adjusted to suit each environment. The default value and per-device-class values can be specified in scheduler-options.yaml as follows:
default-divisor: 1
divisors:
ssd: 1
hdd: 10
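To illustrate with the values above: `capacity >> 30` converts the free bytes to GiB, so a node with 128 GiB free in the `ssd` device-class scores `log2(128 / 1) = 7`, while the same 128 GiB in the `hdd` device-class scores `log2(128 / 10) ≈ 3.7`; from 1024 GiB upward the `ssd` score is capped at `10`.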
In addition, the scoring weight can be passed to `kube-scheduler` via scheduler-config.yaml. Almost all scoring algorithms in `kube-scheduler` are weighted as `"weight": 1`, so if you want to prioritize the scoring by `topolvm-scheduler`, you have to set its weight to a value larger than one, as follows:
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
leaderElection:
leaderElect: true
clientConnection:
kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
- urlPrefix: "http://127.0.0.1:9251"
filterVerb: "predicate"
prioritizeVerb: "prioritize"
nodeCacheCapable: false
weight: 100 ## EDIT THIS FIELD ##
managedResources:
- name: "topolvm.io/capacity"
ignoredByScheduler: true
TopoLVM supports Storage Capacity Tracking. You can enable Storage Capacity Tracking mode instead of using topolvm-scheduler. You need Kubernetes v1.21 or later to use Storage Capacity Tracking with TopoLVM.
You can see the limitations of using Storage Capacity Tracking from here.
If you want to use Storage Capacity Tracking instead of topolvm-scheduler, you need to set `controller.storageCapacityTracking.enabled=true`, `scheduler.enabled=false`, and `webhook.podMutatingWebhook.enabled=false` in the Helm Chart values.
```yaml
<snip>
controller:
storageCapacityTracking:
enabled: true
<snip>
scheduler:
enabled: false
<snip>
webhook:
podMutatingWebhook:
enabled: false
<snip>
```
TopoLVM installs a mutating webhook for Pods. It may prevent Kubernetes from bootstrapping if the webhook pods and the system pods are both missing.
To work around the problem, add a label to system namespaces such as `kube-system` as follows:
$ kubectl label namespace kube-system topolvm.io/webhook=ignore
You need to create StorageClasses for TopoLVM. The Helm chart creates a StorageClass by default with the following configuration. You can edit the Helm Chart values as needed.
<snip>
storageClasses:
- name: topolvm-provisioner
storageClass:
fsType: xfs
isDefaultClass: false
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
<snip>
The first step is to create a namespace and add a label.
$ kubectl create namespace topolvm-system
$ kubectl label namespace topolvm-system topolvm.io/webhook=ignore
📝 Helm does not support adding labels or other metadata when creating namespaces.
refs: helm/helm#5153, helm/helm#3503
Install Helm Chart using the configured values.yaml.
helm upgrade --namespace=topolvm-system -f values.yaml -i topolvm topolvm/topolvm
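Once the release is installed, you can check that the TopoLVM components have started (the namespace matches the one created above):

$ kubectl get pods -n topolvm-system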
`kube-scheduler` needs to be configured to use the `topolvm-scheduler` extender.

First, choose an appropriate `KubeSchedulerConfiguration` YAML file according to your Kubernetes version.

cp ./deploy/scheduler-config/scheduler-config.yaml ./deploy/scheduler-config/scheduler-config.yaml

Then copy the deploy/scheduler-config directory to the hosts where `kube-scheduler`s run.
If you are installing your cluster from scratch with `kubeadm`, you can use the following configuration:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
metadata:
name: config
kubernetesVersion: v1.25.3
scheduler:
extraVolumes:
- name: "config"
hostPath: /path/to/scheduler-config # absolute path to ./scheduler-config directory
mountPath: /var/lib/scheduler
readOnly: true
extraArgs:
config: /var/lib/scheduler/scheduler-config.yaml
The resulting changes to `/etc/kubernetes/manifests/kube-scheduler.yaml` are as follows (a combined sketch follows this list):

- Add a line to the `command` arguments array such as `- --config=/var/lib/scheduler/scheduler-config.yaml`. Note that this is the location of the file after it is mapped into the `kube-scheduler` container, not where it exists on the node's local filesystem.
- Add a volume mapping to the location of the configuration on your node:

  ```yaml
  spec.volumes:
    - hostPath:
        path: /path/to/scheduler-config # absolute path to ./scheduler-config directory
        type: Directory
      name: topolvm-config
  ```

- Add a `volumeMount` for the scheduler container:

  ```yaml
  spec.containers.volumeMounts:
    - mountPath: /var/lib/scheduler
      name: topolvm-config
      readOnly: true
  ```
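Putting the three changes together, the affected parts of `kube-scheduler.yaml` look roughly like this (a sketch; unrelated fields of the manifest are omitted and the host path is an example):

```yaml
spec:
  containers:
    - command:
        - kube-scheduler
        - --config=/var/lib/scheduler/scheduler-config.yaml
      volumeMounts:
        - mountPath: /var/lib/scheduler
          name: topolvm-config
          readOnly: true
  volumes:
    - hostPath:
        path: /path/to/scheduler-config # absolute path to ./scheduler-config directory
        type: Directory
      name: topolvm-config
```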
See podpvc.yaml for how to use the TopoLVM provisioner.
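As a minimal sketch, assuming the default `topolvm-provisioner` StorageClass shown above (the PVC, Pod, and image names are arbitrary examples):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topolvm-pvc # example name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: topolvm-provisioner
---
apiVersion: v1
kind: Pod
metadata:
  name: topolvm-example # example name
spec:
  containers:
    - name: app
      image: nginx # any image that keeps running works here
      volumeMounts:
        - mountPath: /data
          name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: topolvm-pvc
```

Because the StorageClass uses `volumeBindingMode: WaitForFirstConsumer`, the PVC stays `Pending` until the Pod that uses it is scheduled.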
To create VolumeSnapshots, follow the snapshot controller deployment guide to install the snapshot controller. Do this once per cluster. Refer to the Kubernetes guide for VolumeSnapshot creation.
`VolumeSnapshotClass` example:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-topolvm-snapclass
driver: topolvm.io
deletionPolicy: Delete
`VolumeSnapshot` example:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: new-snapshot
spec:
volumeSnapshotClassName: csi-topolvm-snapclass
source:
persistentVolumeClaimName: snapshot-pvc