FUSE-based POSIX API


Overview

The Alluxio POSIX API is a client-side protocol that allows mounting an Alluxio file system as a standard file system on most Unix variants. With this protocol, standard tools such as ls, cat, and mkdir can access the distributed cache managed by Alluxio. More importantly, applications can interact with Alluxio regardless of the language they are written in (C, C++, Python, Ruby, Perl, or Java), without integrating any Alluxio library into existing applications.

Note that Alluxio-FUSE is different from projects like S3Fs or mountable HDFS, which mount specific storage services like S3 or HDFS to the local filesystem. The Alluxio POSIX API is a generic solution for the many storage systems supported by Alluxio. Data orchestration and caching features from Alluxio speed up I/O access to frequently used data.

Currently, the Alluxio POSIX API is widely used in model training and model distribution to inference servers.

Alluxio stack with its POSIX API

The Alluxio POSIX API is based on the Filesystem in Userspace (FUSE) project. Most basic file system operations are supported. However, given the intrinsic characteristics of Alluxio, such as its write-once/read-many-times file data model, the mounted file system does not have full POSIX semantics and has some limitations. See the Functionalities and Limitations section for details.

Some special characters and patterns in file path names are not supported in Alluxio. Avoid creating path names that contain these patterns, or handle them on the client side:

  1. Question mark (‘?’)
  2. Pattern with period (./ and ../)
  3. Backslash (‘\’)
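
One way to handle this on the client side is to check path names before writing them. The following is a minimal sketch, not part of Alluxio: the function name is_alluxio_safe_path is hypothetical, and it interprets the list above literally by rejecting any path that contains a question mark, a backslash, or the substring ./ (which also covers ../). This is deliberately conservative and also rejects a directory whose name ends in a period.

```shell
#!/bin/sh
# Hypothetical helper (not an Alluxio tool): returns 0 when the path avoids
# the unsupported patterns listed above, 1 otherwise.
is_alluxio_safe_path() {
  case "$1" in
    *'?'*) return 1 ;;   # question mark
    *\\*)  return 1 ;;   # backslash
    *./*)  return 1 ;;   # "./" and "../" (literal substring match)
  esac
  return 0
}

# Example usage:
#   if is_alluxio_safe_path "/data/s3/report.txt"; then
#     echo "path is safe to create"
#   fi
```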

Use FUSE on Kubernetes

Prerequisites

Before following the instructions, make sure a functional Alluxio cluster has been installed. For more information, please refer to the installing Alluxio on Kubernetes page.

Mount with PVC provisioned by CSI

The Container Storage Interface (CSI) is a standard for exposing storage systems to containers in Kubernetes. It is the default way to use Alluxio FUSE on Kubernetes.

After the cluster is installed, the operator creates a PVC named alluxio-alluxio-csi-fuse-pvc. Mount this PVC in any pod that needs it, and the operator will create and bind the corresponding PV.

apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0
  labels:
    app: alluxio
spec:
  containers:
    - image: busybox:stable
      imagePullPolicy: IfNotPresent
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: alluxio-pvc
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-alluxio-csi-fuse-pvc

The configuration above mounts the FUSE file system at the /data directory in the container. Note the following details about the configuration:

  • All pods and replica sets can use the same PVC. Pods on the same node share the same FUSE process.
  • The mountPropagation setting is required so that the mount can recover automatically after the FUSE process crashes.
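
While the FUSE process is restarting, accesses to the mount point may fail or hang for a short time. An application can guard against this by probing the directory before starting work. The sketch below assumes a Linux environment with the timeout utility available; the function names, the 5-second default, and the retry count are illustrative choices, not Alluxio settings.

```shell
#!/bin/sh
# Hypothetical probe: succeeds when the directory can be listed within the
# time limit (in seconds), fails if listing errors out or takes too long.
fuse_mount_ready() {
  dir="$1"
  limit="${2:-5}"
  timeout "$limit" ls "$dir" >/dev/null 2>&1
}

# Retry the probe once per second, up to the given number of attempts.
wait_for_fuse_mount() {
  dir="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if fuse_mount_ready "$dir"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example usage:
#   wait_for_fuse_mount /data 60 || { echo "mount not ready" >&2; exit 1; }
```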

You can run I/O operations (e.g., shell commands, training) on top of the local directory. Here is a simple example:

$ kubectl exec -it fuse-test-0 -- sh
root@fuse-test-0:/$ ls /data/
s3
root@fuse-test-0:/$ echo "hello, world!" >/data/s3/message.txt
root@fuse-test-0:/$ ls /data/s3
message.txt
root@fuse-test-0:/$ cat /data/s3/message.txt
hello, world!

# Accessing `/data/s3` through FUSE is the same as accessing `/s3` through any other Alluxio client
$ kubectl exec -it alluxio-master-0 -- alluxio fs ls /s3/message.txt
             14                 06-27-2024 07:54:40:000 FILE /message.txt

The operations will be translated and executed by the Alluxio system and may be executed on the under storage based on configuration.

Functionalities and Limitations

Most basic file system operations are supported. However, due to the intrinsic characteristics of Alluxio, some operations are not fully supported.

| Category | Supported Operations | Unsupported Operations |
|---|---|---|
| Metadata Write | Create file, delete file, create directory, delete directory, rename, change owner, change group, change mode | Symlink, link, change access/modification time (utimens), change special file attributes (chattr), sticky bit |
| Metadata Read | Get file status, get directory status, list directory status | |
| Data Write | Sequential write, append write, random write, overwrite, truncate | Concurrent writes to the same file by multiple threads/clients |
| Data Read | Sequential read, random read, multiple threads/clients concurrently reading the same file | |
| Combinations | | FIFO special file type |

To enable append write and random write, set the configuration property alluxio.user.fuse.random.access.file.stream.enabled=true.
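
After changing this property, a quick smoke test can confirm that append works on the mounted directory. The sketch below is an assumption-laden example rather than an official check: the function name check_append_write is hypothetical, and this page does not specify how an append fails when the property is not set, so the function simply reports failure if either write fails or the content read back is not what was written.

```shell
#!/bin/sh
# Hypothetical smoke test: create a file, append to it, and read it back.
# Returns 0 only if both lines come back in order.
check_append_write() {
  dir="$1"
  f="$dir/.append-check.$$"
  printf 'first\n' > "$f" || return 1
  printf 'second\n' >> "$f" || { rm -f "$f"; return 1; }
  content="$(cat "$f")"
  rm -f "$f"
  [ "$content" = "first
second" ]
}

# Example usage inside a pod with the FUSE mount at /data:
#   check_append_write /data/s3 && echo "append write works"
```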

Advanced Configuration

FUSE mount options

You can update the mountOptions configurations in the Alluxio Cluster YAML file to set mount options. If no mount option is provided, the value of Alluxio configuration alluxio.fuse.mount.options (default: direct_io) will be used. The available Linux mount options are listed here.

fuse:
  mountOptions:
    - allow_other
    - kernel_cache
    - entry_timeout=10000
    - attr_timeout=10000
    - max_idle_threads=256
    - max_background=256

| Mount option | Default value | Tuning suggestion | Description |
|---|---|---|---|
| direct_io | enabled by default | set when deploying AlluxioFuse in Kubernetes environment | When `direct_io` is enabled, the kernel will not cache data and read-ahead. It eliminates the use of system buffer cache and improves pod stability in Kubernetes environment |
| kernel_cache | | | `kernel_cache` utilizes kernel system caching and improves read performance. This should only be enabled on filesystems where the file data is never changed externally through the underlying storage |
| auto_cache | | set when deploying AlluxioFuse in plain machine | `auto_cache` utilizes kernel system caching and improves read performance. Instead of unconditionally keeping cached data, the cached data is invalidated if the modification time or the size of the file has changed since it was last opened. See [libfuse documentation](https://libfuse.github.io/doxygen/structfuse__config.html#a9db154b1f75284dd4fccc0248be71f66) for more info |
| attr_timeout=N | 1.0 | 600 | The timeout in seconds for which file/directory attributes are cached |
| big_writes | | Set | Stop Fuse from splitting I/O into small chunks and speed up write. [Not supported in libfuse3](https://github.com/libfuse/libfuse/blob/master/ChangeLog.rst#libfuse-300-2016-12-08). Will be ignored if libfuse3 is used. |
| entry_timeout=N | 1.0 | 600 | The timeout in seconds for which name lookups will be cached |
| max_read=N | 131072 | Use default value | Define the maximum size of data that can be read in a single Fuse request. The default is infinite. Note that the size of read requests is limited anyway to 32 pages (which is 128 KB on i386). |
| max_background=N | 12 | 256 | The maximum number of outstanding background requests that the FUSE kernel driver is allowed to submit. |
| max_idle_threads=N | 10 | 256 | The maximum number of idle FUSE daemon threads allowed. If the value is too small, FUSE may frequently create and destroy threads, which will introduce extra performance overhead. |
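
To see which options are in effect on a running mount, you can inspect the kernel mount table from inside a pod. The sketch below is a generic Linux technique, not an Alluxio tool, and the function name list_fuse_mounts is hypothetical. It assumes the usual /proc/mounts layout (device, mount point, file system type, options). Options handled by the kernel, such as allow_other and max_read, normally appear there; options handled in userspace by libfuse, such as attr_timeout, may not.

```shell
#!/bin/sh
# Hypothetical helper: print "<mount point> <options>" for every FUSE mount.
# Reads /proc/mounts by default; a different file can be passed for testing.
list_fuse_mounts() {
  # Match type "fuse" or "fuse.<subtype>", but not the "fusectl" control fs.
  awk '$3 == "fuse" || $3 ~ /^fuse\./ { print $2 " " $4 }' "${1:-/proc/mounts}"
}

# Example usage:
#   list_fuse_mounts
```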

Mount FUSE without PVC

If your Kubernetes version doesn’t support CSI, or the cloud vendor doesn’t grant the permissions needed to use CSI, you can use the DaemonSet type of Alluxio FUSE instead. With this type, the FUSE pods must be deployed on the nodes beforehand. You can use nodeSelector to restrict the deployment to specific nodes.

To use DaemonSet FUSE, change the alluxio-cluster.yaml configuration before deploying the cluster:

apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  fuse:
    type: daemonSet
    hostPathForMount: /mnt/alluxio/fuse # will use /mnt/alluxio/fuse if not specified
    nodeSelector:
      alluxio.com/selected-for-fuse: "true" # label values are strings, so quote the value

FUSE pods will be deployed on all the nodes with the label alluxio.com/selected-for-fuse: true.
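
Nodes must carry this label before FUSE pods can be scheduled on them. The following is a minimal sketch of applying and checking the label with kubectl. The function names and the KUBECTL override are illustrative and not part of Alluxio, and worker-1 is a placeholder node name.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Add (or overwrite) the selector label on one node.
label_fuse_node() {
  $KUBECTL label node "$1" alluxio.com/selected-for-fuse=true --overwrite
}

# List the nodes that currently carry the selector label.
list_fuse_nodes() {
  $KUBECTL get nodes -l alluxio.com/selected-for-fuse=true
}

# Example usage:
#   label_fuse_node worker-1
#   list_fuse_nodes
```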

DaemonSet FUSE mounts the FUSE file system at the host path specified by hostPathForMount. To use the mount in your pod, add a hostPath volume:

apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0
  labels:
    app: alluxio
spec:
  containers:
    - image: busybox:stable
      imagePullPolicy: IfNotPresent
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /mnt/alluxio
          name: alluxio-fuse-mount
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-fuse-mount
      hostPath:
        path: /mnt/alluxio
        type: Directory

The example mounts the parent directory of the FUSE mount point and sets the mountPropagation. In this way, the mount point in the container can auto-recover when the FUSE process crashes.

Data isolation

By default, a FUSE mount exposes the root of the Alluxio namespace, which contains all the mount points. If you provide FUSE access to other users and want to keep them from reading or modifying paths they should not touch, use one of the following methods:

Use sub-path

Mount the PVC with a sub-path. This is suitable when you have control over the user’s pod.

apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0
  labels:
    app: alluxio
spec:
  containers:
    - image: busybox:stable
      imagePullPolicy: IfNotPresent
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: alluxio-pvc
          mountPropagation: HostToContainer
          subPath: s3/path/to/files
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-alluxio-csi-fuse-pvc

In the example configuration, accessing the /data path in the container is the same as accessing the /s3/path/to/files in the Alluxio namespace.

DaemonSet FUSE can also use subPath, but this will break the propagation of the new FUSE mount point to the mount path in the container, preventing it from auto-recovering. Use with caution.

Create separate PVCs

A PVC created with a custom StorageClass can be bound to a sub-path. This requires extra setup, but it is suitable when you can’t control the user’s pod.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alluxio-csi-s3
parameters:
  alluxioClusterName: alluxio
  alluxioClusterNamespace: default
  mountPath: /s3/path/to/files
provisioner: alluxio
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alluxio-csi-s3
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
  storageClassName: alluxio-csi-s3

Create the StorageClass and the PVC above, then mount the PVC to the container. Accessing the mount point in the container is equivalent to accessing /s3/path/to/files in the Alluxio namespace.