COS Object Storage Service

This guide describes how to configure Alluxio with Tencent COS (Cloud Object Storage) as the under storage system. COS is a distributed storage service offered by Tencent Cloud and is accessible over HTTP/HTTPS. It stores large volumes of data and expands bandwidth and capacity transparently, which makes it well suited as a data pool for big data computation and analytics.

Prerequisites

In cluster mode, Alluxio runs on multiple machines, so its binary package must be deployed on each of them.

Before using COS with Alluxio, either create a new bucket or use an existing one. Additionally, identify the directory you wish to use within that bucket, whether by creating a new directory or selecting an existing one. For this guide, the COS bucket name is COS_ALLUXIO_BUCKET, the directory within the bucket is COS_DATA, and the bucket region is COS_REGION.

Basic Setup

Alluxio unifies access to different storage systems through the unified namespace feature. COS UFS is used to access Tencent Cloud object storage and a COS location can be either mounted at the root of the Alluxio namespace or as a top-level directory.

To configure Alluxio to use COS as under storage, modify the configuration file conf/alluxio-site.properties. If this is your first time modifying the configuration, create the file from the template located at conf/alluxio-site.properties.template.

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Edit conf/alluxio-site.properties to set the under storage address to the COS bucket and directory you want to mount to Alluxio. For example, use cos://COS_ALLUXIO_BUCKET/ to mount the whole bucket, or cos://COS_ALLUXIO_BUCKET/COS_DATA to map only the directory /COS_DATA inside the COS bucket COS_ALLUXIO_BUCKET.

alluxio.master.mount.table.root.ufs=cos://COS_ALLUXIO_BUCKET/COS_DATA/

Specify credentials for COS access by adding the following properties in conf/alluxio-site.properties:

fs.cos.access.key=<COS_SECRET_ID>
fs.cos.secret.key=<COS_SECRET_KEY>
fs.cos.region=<COS_REGION>
fs.cos.app.id=<COS_APP_ID>
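
The properties above cover mounting COS at the root of the Alluxio namespace. For the other case mentioned earlier, mounting a COS location as a directory inside the namespace, the following is a minimal sketch that assumes the Alluxio 2.x `fs mount` command and its `--option` flag. The function name `mount_cos_nested`, the `ALLUXIO_BIN` override, the `/cos` mount point, and the `COS_*` environment variables are hypothetical names chosen for illustration.

```shell
# Sketch: mount a COS directory at a nested Alluxio path (Alluxio 2.x CLI assumed).
# ALLUXIO_BIN is a hypothetical override; set it to `echo` to print the
# command instead of running it.
mount_cos_nested() {
  alluxio_path="$1"   # Alluxio path to mount at, e.g. /cos (illustrative)
  cos_uri="$2"        # COS location, e.g. cos://COS_ALLUXIO_BUCKET/COS_DATA/
  "${ALLUXIO_BIN:-./bin/alluxio}" fs mount \
    --option "fs.cos.access.key=${COS_SECRET_ID}" \
    --option "fs.cos.secret.key=${COS_SECRET_KEY}" \
    --option "fs.cos.region=${COS_REGION}" \
    --option "fs.cos.app.id=${COS_APP_ID}" \
    "$alluxio_path" "$cos_uri"
}

# Usage (after exporting COS_SECRET_ID, COS_SECRET_KEY, COS_REGION, COS_APP_ID):
#   mount_cos_nested /cos cos://COS_ALLUXIO_BUCKET/COS_DATA/
```

Passing the credentials as mount options keeps them scoped to this one mount point rather than to the whole cluster configuration.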

Advanced Setup

COS multipart upload

The default upload method uploads a file from start to end in a single pass. The multipart upload method instead splits a file into multiple parts and uploads each part in its own thread. It does not generate any temporary files while uploading.
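
To get a feel for how many parts a file produces, here is a small arithmetic sketch. It assumes the file is cut into fixed-size partitions with a possibly smaller final part, and it uses the 64MB default partition size noted later in this guide. The helper `part_count` is a hypothetical name, not part of Alluxio.

```shell
# Hypothetical helper: number of parts for a file, assuming fixed-size
# partitions where only the last part may be smaller (ceiling division).
part_count() {
  file_bytes="$1"
  part_bytes="$2"
  echo $(( (file_bytes + part_bytes - 1) / part_bytes ))
}

MB=$((1024 * 1024))
# A 1000MB file with 64MB partitions: 15 full parts plus one 40MB part.
part_count $((1000 * MB)) $((64 * MB))   # prints 16
```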

To enable COS multipart upload, you need to modify conf/alluxio-site.properties to include:

alluxio.underfs.cos.multipart.upload.enabled=true

You can also set the following parameters in conf/alluxio-site.properties to tune multipart upload performance.

# Timeout for uploading part when using multipart upload.
alluxio.underfs.object.store.multipart.upload.timeout
# Thread pool size for COS multipart upload.
alluxio.underfs.cos.multipart.upload.threads
# Multipart upload partition size for COS. The default partition size is 64MB. 
alluxio.underfs.cos.multipart.upload.partition.size
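
Putting the multipart settings together, the sketch below appends a complete set of properties to a site properties file. The values (`10min`, `20`, `64MB`) are illustrative placeholders rather than tuned recommendations, and the function name `append_cos_multipart_conf` is hypothetical.

```shell
# Sketch: append illustrative COS multipart settings to a properties file.
# Defaults to conf/alluxio-site.properties when no path is given.
append_cos_multipart_conf() {
  conf_file="${1:-conf/alluxio-site.properties}"
  cat >> "$conf_file" <<'EOF'
# Enable COS multipart upload.
alluxio.underfs.cos.multipart.upload.enabled=true
# Illustrative values only; adjust for your workload.
alluxio.underfs.object.store.multipart.upload.timeout=10min
alluxio.underfs.cos.multipart.upload.threads=20
alluxio.underfs.cos.multipart.upload.partition.size=64MB
EOF
}

# Usage, from the Alluxio installation directory:
#   append_cos_multipart_conf
```

Because the function appends rather than overwrites, check the file for duplicate keys if you run it more than once.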