Install Alluxio on Kubernetes


This page describes how to deploy Alluxio on Kubernetes and run FIO as validation.

Prerequisites

  • Kubernetes
    • A Kubernetes cluster with version at least 1.19, with feature gates enabled.
    • Ensure the cluster’s Kubernetes Network Policy allows connectivity between applications (Alluxio clients) and the Alluxio pods on the defined ports (a sample policy sketch follows the reference below).
    • The Kubernetes cluster has Helm 3 (version at least 3.6.0) installed.
    • An image registry for storing and managing container images.
  • Alluxio Operator
    • Permission to create CRD (Custom Resource Definition);
    • Permission to create ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator pod;
    • Permission to create the namespace that the operator will run in.

Reference: Using RBAC Authorization
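
If the cluster enforces NetworkPolicies, the following is a minimal sketch of an allow rule for Alluxio client traffic. The policy name, namespace, selectors, and ports (Alluxio defaults: master RPC 19998, master web 19999, worker RPC 29999, worker web 30000) are assumptions and should be adjusted to match the actual deployment and any customized port settings.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-alluxio-clients   # illustrative name
  namespace: alluxio-test       # assumed Alluxio namespace
spec:
  podSelector: {}               # applies to all pods in the namespace; narrow to Alluxio pods as needed
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector: {} # allow clients from any namespace; restrict as needed
      ports:
        - protocol: TCP
          port: 19998           # master RPC (default)
        - protocol: TCP
          port: 19999           # master web (default)
        - protocol: TCP
          port: 29999           # worker RPC (default)
        - protocol: TCP
          port: 30000           # worker web (default)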

Preparation

Download files

# helm chart for alluxio operator
# extract this tarball, which will create the directory alluxio-operator/
alluxio-operator-1.1.2-helmchart.tgz

# docker images
# use docker load to load the respective images into docker
# alluxio operator docker image
alluxio-k8s-operator-1.1.2-docker.tar

# alluxio/alluxio-enterprise docker image
alluxio-enterprise-AI-3.1-3.3.2-docker.tar

# alluxio csi docker image
alluxio-csi-1.1.2-docker.tar

Extract Operator helm chart

# untar the Operator helm chart. This will extract to the directory alluxio-operator/
$ tar -xzf alluxio-operator-1.1.2-helmchart.tgz

Upload images

This example shows how to upload the Alluxio operator image. Repeat these steps for the Alluxio CSI and Alluxio Enterprise images, as shown in the sketch after the commands below.

# load the image from the downloaded tar file
$ docker load < alluxio-k8s-operator-1.1.2-docker.tar
Loaded image: alluxio/k8s-operator:1.1.2

# retag image
$ docker tag alluxio/k8s-operator:1.1.2 <your.private.registry.here>/alluxio/operator:1.1.2

# push image
$ docker push <your.private.registry.here>/alluxio/operator:1.1.2
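
For reference, the same load, retag, and push sequence for the other two images would look like the following sketch, assuming the loaded image names match the image and imageTag values used in the configuration files later in this guide.

# load, retag, and push the CSI image (assumed to load as alluxio/csi:1.1.2)
$ docker load < alluxio-csi-1.1.2-docker.tar
$ docker tag alluxio/csi:1.1.2 <your.private.registry.here>/alluxio/csi:1.1.2
$ docker push <your.private.registry.here>/alluxio/csi:1.1.2

# load, retag, and push the Alluxio Enterprise image
# (assumed to load as alluxio/alluxio-enterprise:AI-3.1-3.3.2)
$ docker load < alluxio-enterprise-AI-3.1-3.3.2-docker.tar
$ docker tag alluxio/alluxio-enterprise:AI-3.1-3.3.2 <your.private.registry.here>/alluxio/alluxio-enterprise:AI-3.1-3.3.2
$ docker push <your.private.registry.here>/alluxio/alluxio-enterprise:AI-3.1-3.3.2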

Prepare configuration files

Create the following configuration files within the extracted directory of the alluxio operator helm chart.

Create the operator configuration in alluxio-operator/alluxio-operator.yaml

nameOverride: alluxio-operator
image: alluxio/operator # set this value to be an accessible registry containing this image
imageTag: 1.1.2
imagePullPolicy: Always

alluxio-csi: # disable CSI
  enabled: false

Create the dataset configuration in alluxio-operator/dataset.yaml

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: null-dataset
spec:
  dataset:
    path: file:///null

Note that placeholder values are used for the dataset name and path; without these placeholders, the mount table feature will not work.

Create the cluster configuration in alluxio-operator/alluxio-cluster.yaml

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  dataset: null-dataset  # note this matches .metadata.name in the dataset configuration
  image: alluxio/alluxio-enterprise # set this value to be an accessible registry containing this image
  imageTag: AI-3.1-3.3.2
  imagePullPolicy: Always
  properties:  # adjust the following alluxio-site.properties as needed
    alluxio.master.journal.type: "NOOP"
    alluxio.master.scheduler.initial.wait.time: "1s"
    alluxio.master.scheduler.restore.job.from.journal: "false"

    alluxio.user.file.writetype.default: "THROUGH"
    alluxio.user.metadata.cache.max.size: "0"
    alluxio.user.file.replication.min: "1"

    alluxio.user.fuse.sync.close.enabled: "true"
    alluxio.fuse.web.enabled: "true"

    alluxio.mount.table.source: "ETCD"
    alluxio.worker.membership.manager.type: "ETCD"

    alluxio.dora.ufs.list.status.cache.nr.files: "0"

    alluxio.security.authorization.permission.enabled: "false"
    alluxio.security.authentication.type: "NOSASL"
    alluxio.network.tls.enabled: "false"

    alluxio.license: "xxxxxxx"

  master:
    count: 1
    resources:
      limits:
        cpu: "1"
        memory: "10Gi"
      requests:
        cpu: "1"
        memory: "2Gi"
    jvmOptions:
      - "-Xmx4g"
      - "-Xms1g"
      - "-XX:MaxDirectMemorySize=4g"
#    nodeSelector:  # adjust the node label to match your environment
#      alluxio-node: "true"
#      # localfs-server: "true"

  worker:
    count: 2
    resources:
      limits:
        cpu: "10"
        memory: "20Gi"
      requests:
        cpu: "1"
        memory: "4Gi"
    jvmOptions:
      - "-Xmx8g"
      - "-Xms2g"
      - "-XX:MaxDirectMemorySize=8g"
#    nodeSelector:  # adjust the node label to match your environment
#      alluxio-node: "true"

  pagestore:
    type: hostPath
    quota: 10Gi
    hostPath: /mnt/alluxio/page

  metastore:
    type: hostPath
    hostPath: /mnt/alluxio/meta


  fuse:
    enabled: true
    hostPathForMount: /mnt/alluxio/fuse
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        cpu: "6"
        memory: "16Gi"
    jvmOptions:
      - "-Xmx4g"
      - "-Xms1g"
      - "-XX:MaxDirectMemorySize=8g"
    mountOptions:
      - allow_other
      - kernel_cache
      - entry_timeout=10000
      - attr_timeout=10000

  metrics:
    prometheusMetricsServlet:
      enabled: true
      podAnnotations:
        prometheus.io/scrape: "true"
        prometheus.io/masterPort: "19999"
        prometheus.io/workerPort: "30000"
        prometheus.io/fusePort: "49999"
        prometheus.io/path: "/metrics/"

  etcd:
    enabled: true
    replicaCount: 3
  alluxio-monitor:
    enabled: true

Verify configurations

  • Modify the image, imageTag, and dataset values in alluxio-operator/alluxio-cluster.yaml. Adjust the cpu, memory, and count values for the master, worker, etcd, and fuse configurations as needed. Use nodeSelector to control which nodes the pods are scheduled on.
  • Bind SSD paths. The above configuration has two hostPath mounts:
    • Alluxio pods use the hostPath mount at /mnt/alluxio/meta to store Alluxio’s metadata. An SSD is recommended for this directory.
    • Alluxio workers use the hostPath mount at /mnt/alluxio/page to store Alluxio’s cached data. An SSD is also recommended for this directory.
  • The path for FUSE’s local_data_cache is /mnt/alluxio/fuse-local-cache.

  • Other hostPath configuration:
    • To mount NAS, first add the corresponding mount path in the hostPaths section of the worker configuration.
    • To use a different path for the FUSE local data cache, also add the corresponding mount path in the hostPaths section of the FUSE configuration. A sketch of both additions follows this list.
    • By default, FUSE is mounted at /mnt/alluxio/fuse. The mounted UFS storage file list can be viewed in the host’s directory /mnt/alluxio/fuse.
  • S3 ECR configuration:
    • Replace the configured docker image values with your AWS ECR address so that the images can be pulled from the corresponding ECR.
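
For reference, the hostPaths additions mentioned above could look like the following sketch. The /mnt/nas path is an illustrative placeholder, and the exact schema of the hostPaths field (for example, a list of paths versus a map of host path to container path) should be confirmed against the CRD or chart values in use.

spec:
  worker:
    hostPaths:
      /mnt/nas: /mnt/nas    # illustrative NAS path exposed to worker pods
  fuse:
    hostPaths:
      /mnt/alluxio/fuse-local-cache: /mnt/alluxio/fuse-local-cache    # alternate FUSE local data cache path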

Deploy cluster

Deploy Alluxio Operator

# deploy alluxio operator
$ helm install operator ./alluxio-operator \
  -f ./alluxio-operator/alluxio-operator.yaml
NAME: operator
LAST DEPLOYED: Wed Feb 28 02:10:08 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

# check alluxio operator status
$ kubectl get pod -n alluxio-operator
NAME                                  READY   STATUS    RESTARTS   AGE
alluxio-controller-669699b5d7-zlv7h   1/1     Running   0          48s
dataset-controller-5649f66b5f-f7hx9   1/1     Running   0          48s

Deploy Alluxio dataset

# create the alluxio namespace (skip this step to use the default namespace)
$ kubectl create namespace alluxio-test
namespace/alluxio-test created

# check namespace status
$ kubectl get namespaces | grep alluxio-test
alluxio-test       Active   91m

# create alluxio dataset
$ kubectl create -f ./alluxio-operator/dataset.yaml -n alluxio-test
dataset.k8s-operator.alluxio.com/null-dataset created

# check alluxio dataset status
$ kubectl get dataset -n alluxio-test
NAME           DATASETPHASE   BOUNDEDALLUXIOCLUSTER
null-dataset   Pending

Deploy and start Alluxio cluster

# deploy alluxio cluster
$ kubectl create -f ./alluxio-operator/alluxio-cluster.yaml -n alluxio-test
alluxiocluster.k8s-operator.alluxio.com/alluxio created

# check alluxio cluster status
$ kubectl get alluxiocluster -n alluxio-test
NAME      CLUSTERPHASE        AGE
alluxio   Creating/Updating   98s

# check the status of the alluxio cluster pods
$ watch kubectl get pod -n alluxio-test
NAME                                         READY   STATUS    RESTARTS      AGE
alluxio-etcd-0                               1/1     Running   0             101s
alluxio-fuse-cd8mm                           1/1     Running   0             101s
alluxio-fuse-vqk7j                           1/1     Running   0             102s
alluxio-master-0                             1/1     Running   3 (48s ago)   101s
alluxio-monitor-grafana-56b97c5689-554c8     1/1     Running   0             102s
alluxio-monitor-prometheus-749fc5f96-cksv6   1/1     Running   0             102s
alluxio-worker-5d46cf9ddf-6c992              1/1     Running   0             101s
alluxio-worker-5d46cf9ddf-gwh8w              1/1     Running   0             101s

Mount storage

In this example, an existing S3 bucket is mounted to Alluxio.

# go into alluxio worker pod
$ pod_worker=$(kubectl get pods -l name=alluxio-worker -o jsonpath='{.items[0].metadata.name}' -n alluxio-test)
$ kubectl exec -it $pod_worker -n alluxio-test -- bash

# mount ufs
$ alluxio mount add \
--option aws.accessKeyId=xxx \
--option aws.secretKey=xxx \
--option alluxio.underfs.s3.region=us-east-1 \
--path /bucket \
--ufs-uri s3://test/
Mounted ufsPath=s3://test/ to alluxioPath=/bucket with 3 options

# check mount point status
$ alluxio mount list
s3://test/  on  /bucket/ properties={aws.secretKey=xxx, alluxio.underfs.s3.region=us-east-1, aws.accessKeyId=xxx}

# go into the alluxio fuse pod and check data in the mount point
$ pod_fuse=$(kubectl get pods -l role=alluxio-fuse -o jsonpath='{.items[0].metadata.name}' -n alluxio-test)
$ kubectl exec -it $pod_fuse -n alluxio-test -- bash
$ ls -l /mnt/alluxio/fuse/bucket/
drwx------ 1 root root          0 Jan  1  1970 2023-10-17/
drwx------ 1 root root          0 Jan  1  1970 alluxio/
drwx------ 1 root root          0 Jan  1  1970 alluxio_ufs/
-rwx------ 1 root root     173279 Oct 17 08:26 log.tar.gz*
drwx------ 1 root root          0 Jan  1  1970 pach_alluxio/

# unmount (as needed)
$ alluxio mount remove --path /bucket
Unmounted /bucket from Alluxio.

Quick Verification - FIO

Follow the instructions here to install the FIO tool on the FUSE pod.
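
As a minimal sketch, assuming the FUSE pod image provides the yum package manager; substitute the package manager available in your image (for example apt-get or apk), or copy a prebuilt fio binary into the pod.

# open a shell in the FUSE pod
$ pod_fuse=$(kubectl get pods -l role=alluxio-fuse -o jsonpath='{.items[0].metadata.name}' -n alluxio-test)
$ kubectl exec -it $pod_fuse -n alluxio-test -- bash

# install fio (assumes yum is available in the image)
$ yum install -y fio
$ fio --version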

Execute the following tests via Alluxio FUSE with FIO

# sequential write IO test
$ fio -name=seq_write -filename=/mnt/alluxio/fuse/bucket/write_test.10G.file -direct=1 -bs=1024k -size=10GB -rw=write -ioengine=psync -numjobs=1
seq_write: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1
fio-2.14
Starting 1 process
seq_write: Laying out IO file(s) (1 file(s) / 10240MB)
fio: posix_fallocate fails: Operation not supported
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/224.0MB/0KB /s] [0/224/0 iops] [eta 00m:00s]
seq_write: (groupid=0, jobs=1): err= 0: pid=11116: Mon Mar 11 06:28:55 2024
  write: io=10240MB, bw=234828KB/s, iops=229, runt= 44653msec
    clat (msec): min=3, max=140, avg= 4.33, stdev= 3.47
     lat (msec): min=3, max=140, avg= 4.36, stdev= 3.47
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    5], 20.00th=[    5],
     | 30.00th=[    5], 40.00th=[    5], 50.00th=[    5], 60.00th=[    5],
     | 70.00th=[    5], 80.00th=[    5], 90.00th=[    5], 95.00th=[    5],
     | 99.00th=[    8], 99.50th=[   18], 99.90th=[   23], 99.95th=[  122],
     | 99.99th=[  128]
    lat (msec) : 4=8.82%, 10=90.60%, 20=0.37%, 50=0.13%, 100=0.02%
    lat (msec) : 250=0.07%
  cpu          : usr=0.81%, sys=0.97%, ctx=20492, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10240/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=10240MB, aggrb=234827KB/s, minb=234827KB/s, maxb=234827KB/s, mint=44653msec, maxt=44653msec

# sequential read IO test
$ fio -iodepth=1 -rw=read -ioengine=libaio -bs=256k -size=10G -numjobs=32 -group_reporting -filename=/mnt/alluxio/fuse/bucket/read_test.10G.file -name=test -direct=1 -runtime=30
test: (g=0): rw=read, bs=256K-256K/256K-256K/256K-256K, ioengine=libaio, iodepth=1
...
fio-2.14
Starting 32 processes
Jobs: 32 (f=32): [R(32)] [100.0% done] [1227MB/0KB/0KB /s] [4906/0/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=3730: Mon Mar 11 06:19:05 2024
  read : io=34902MB, bw=1160.5MB/s, iops=4641, runt= 30076msec
    slat (usec): min=4, max=87212, avg=4277.07, stdev=13632.11
    clat (usec): min=62, max=104489, avg=2585.50, stdev=12348.22
     lat (usec): min=81, max=167405, avg=6862.57, stdev=18198.34
    clat percentiles (usec):
     |  1.00th=[   93],  5.00th=[  110], 10.00th=[  124], 20.00th=[  143],
     | 30.00th=[  157], 40.00th=[  169], 50.00th=[  181], 60.00th=[  195],
     | 70.00th=[  211], 80.00th=[  237], 90.00th=[  318], 95.00th=[ 9408],
     | 99.00th=[79360], 99.50th=[81408], 99.90th=[86528], 99.95th=[89600],
     | 99.99th=[93696]
    lat (usec) : 100=2.03%, 250=81.33%, 500=9.16%, 750=0.45%, 1000=0.19%
    lat (msec) : 2=0.26%, 4=0.09%, 10=1.79%, 20=2.09%, 50=0.04%
    lat (msec) : 100=2.57%, 250=0.01%
  cpu          : usr=0.14%, sys=0.42%, ctx=276829, majf=0, minf=2393
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=139609/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=34902MB, aggrb=1160.5MB/s, minb=1160.5MB/s, maxb=1160.5MB/s, mint=30076msec, maxt=30076msec

Monitoring dashboard

A Grafana dashboard is deployed in the same namespace as the Alluxio cluster and exposed on port 8080 of its host machine. This port must be open and not blocked by the host machine’s firewall.

If using EKS

  1. Run kubectl get pods -owide -n <alluxio namespace> | grep grafana to get the hostname of the node. It should be in the form of ip-10-0-6-132.ec2.internal.
  2. If the machine used to access Grafana is in the same private network as the host machine, access the Grafana UI directly at http://<hostname>:8080. Otherwise, identify the external IP of the host machine to use as the hostname in the URL. Run kubectl get nodes -owide to find the corresponding external IP.
     [centos@ip-172-31-92-52 ~]$ k get nodes -owide
     NAME                         STATUS   ROLES    AGE    VERSION                INTERNAL-IP   EXTERNAL-IP      OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
     ip-10-0-6-132.ec2.internal   Ready    <none>   210d   v1.22.17-eks-0a21954   10.0.6.132    35.173.122.123   Amazon Linux 2   5.4.247-162.350.amzn2.x86_64   docker://20.10.23
    

    In this example, the machine has an external IP of 35.173.122.123, so the Grafana UI should be accessible through http://35.173.122.123:8080

Appendix: Access Alluxio via Kubernetes CSI

Applications can mount Alluxio FUSE through a Persistent Volume Claim (PVC) provisioned via CSI.

CSI yaml configuration file

The default configuration file is located at alluxio-operator/charts/alluxio-csi/values.yaml

If you are not able to access the internet, you will need to download the two dependent CSI sidecar images and upload them to your local image registry, then modify the values for provisioner.image and driverRegistrar.image to point to the corresponding local image addresses (see the sketch after the configuration file below).

nameOverride: alluxio

image: alluxio/csi
imageTag: latest
imagePullPolicy: IfNotPresent
imagePullSecrets:

hostNetwork: false
dnsPolicy:

kubeletPath: /var/lib/kubelet

controllerPlugin:
  # NodeSelector for scheduling Alluxio CSI controller
  nodeSelector: {}
  # Schedule Alluxio CSI controller with affinity.
  affinity: {}
  # Additional tolerations for scheduling Alluxio CSI controller
  tolerations: []
  provisioner:
    image: registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
    resources:
      limits:
        cpu: 100m
        memory: 300Mi
      requests:
        cpu: 10m
        memory: 20Mi
  controller:
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 10m
        memory: 20Mi
nodePlugin:
  # NodeSelector for scheduling Alluxio CSI nodePlugin
  nodeSelector: {}
  # Schedule Alluxio CSI nodePlugin with affinity.
  affinity: {}
  # Additional tolerations for scheduling Alluxio CSI nodePlugin
  tolerations: []
  nodeserver:
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 10m
        memory: 20Mi
  driverRegistrar:
    image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 10m
        memory: 20Mi
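
For air-gapped environments, the following sketch pulls the two sidecar images referenced above on a machine with internet access, retags them, and pushes them to the local registry; <your.private.registry.here> is a placeholder.

# csi-provisioner sidecar
$ docker pull registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
$ docker tag registry.k8s.io/sig-storage/csi-provisioner:v2.0.5 <your.private.registry.here>/sig-storage/csi-provisioner:v2.0.5
$ docker push <your.private.registry.here>/sig-storage/csi-provisioner:v2.0.5

# csi-node-driver-registrar sidecar
$ docker pull registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
$ docker tag registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0 <your.private.registry.here>/sig-storage/csi-node-driver-registrar:v2.0.0
$ docker push <your.private.registry.here>/sig-storage/csi-node-driver-registrar:v2.0.0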

Update the Alluxio operator configuration at alluxio-operator/alluxio-operator.yaml

nameOverride: alluxio-operator
image: alluxio/operator # set to the accessible registry with the images
imageTag: 1.1.2
imagePullPolicy: Always

alluxio-csi: # enable CSI
  enabled: true
  image: alluxio/csi # set to the accessible registry with the images
  imageTag: 1.1.2

To disable the FUSE daemonset, update the following section of the Alluxio cluster configuration at alluxio-operator/alluxio-cluster.yaml

spec:
  fuse:
    enabled: false

Verify access through CSI

The following steps add a CSI FUSE volume to an application pod.

Create a sample pod definition in alluxio-operator/app.yaml

apiVersion: v1
kind: Pod
metadata:
  name: fuse-test
  labels:
    app: fuse-test
spec:
  containers:
    - image: alpine:3.19
      imagePullPolicy: IfNotPresent
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: alluxio-pvc
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-alluxio-csi-fuse-pvc

Run the sample pod and check

# Run the pod
$ kubectl apply -f alluxio-operator/app.yaml -n alluxio-test
pod/fuse-test created

# Enter the pod and check
$ kubectl exec -it fuse-test -n alluxio-test -- sh
$ ls -l /data/bucket/
drwx------    1 root     root             0 Jan  1  1970 2023-10-17
drwx------    1 root     root             0 Jan  1  1970 alluxio
drwx------    1 root     root             0 Jan  1  1970 alluxio_ufs
-rwx------    1 root     root        173279 Oct 17 08:26 log.tar.gz

Troubleshooting

  • Inspect and delete the Alluxio CRDs
    $ kubectl get crd
    
    # delete all crds ending with k8s-operator.alluxio.com
    $ kubectl delete crd datasets.k8s-operator.alluxio.com loads.k8s-operator.alluxio.com updates.k8s-operator.alluxio.com unloads.k8s-operator.alluxio.com alluxioclusters.k8s-operator.alluxio.com
    
  • Wipe the mount table on ETCD
    $ kubectl get pvc -n alluxio-test
    NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    data-alluxio-etcd-0   Bound    pvc-7f066e82-6e56-4386-bcb4-da4bdbcf80f1   8Gi        RWO            gp2            58m
    
    $ kubectl delete pvc data-alluxio-etcd-0 -n alluxio-test
    persistentvolumeclaim "data-alluxio-etcd-0" deleted