Azure Data Lake Storage Gen2
This guide describes how to configure Alluxio with Azure Data Lake Storage Gen2 as the under storage system.
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. It converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage.
For more information about Azure Data Lake Storage Gen1, please read its documentation.
Prerequisites
If you haven’t already, please see Prerequisites before you get started.
In preparation for using Azure Data Lake Storage Gen2 with Alluxio, create a new Data Lake storage in your Azure account or use an existing Data Lake storage.
<AZURE_CONTAINER> |
The container you want to use |
<AZURE_DIRECTORY> |
The directory you want to use in the container, either by creating a new directory or using an existing one |
<AZURE_ACCOUNT> |
Your Azure storage account |
You also need a SharedKey to authorize requests.
Basic Setup
To use Azure Data Lake Storage Gen2 as the UFS of Alluxio root mount point,
you need to configure Alluxio to use under storage systems by modifying
conf/alluxio-site.properties
. If it does not exist, create the configuration file from the
template.
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
Specify the underfs address by modifying conf/alluxio-site.properties
to include:
alluxio.dora.client.ufs.root=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
Setup with Shared Key
Specify the Shared Key by adding the following property in conf/alluxio-site.properties
:
fs.azure.account.key.<AZURE_ACCOUNT>.dfs.core.windows.net=<AZURE_SHARED_KEY>
Setup with OAuth 2.0 Client Credentials
Specify the OAuth 2.0 Client Credentials by adding the following property in conf/alluxio-site.properties
:
(Please note that for URL Endpoint, use the V1 token endpoint)
fs.azure.account.oauth2.client.endpoint=<OAUTH_ENDPOINT>
fs.azure.account.oauth2.client.id=<CLIENT_ID>
fs.azure.account.oauth2.client.secret=<CLIENT_SECRET>
Setup with Azure Managed Identities
Specify the Azure Managed Identities by adding the following property in conf/alluxio-site.properties
:
fs.azure.account.oauth2.msi.endpoint=<MSI_ENDPOINT>
fs.azure.account.oauth2.client.id=<CLIENT_ID>
fs.azure.account.oauth2.msi.tenant=<MSI_TENANT>
Running Alluxio Locally with Data Lake Storage
Once you have configured Alluxio to Azure Data Lake Storage Gen2, try running Alluxio locally to see that everything works.