Data Orchestration Hub
Data Orchestration Hub, or the Hub, is a management console for managing an analytics cluster and connecting it to multiple data sources to unify data lakes. The service provides an easy-to-use, unified management view for configuration and monitoring, and wizard-based curation of deployment workflows.
- Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, single cloud or on-premises using guided wizards.
- Monitor Your Alluxio Cluster: View the state of the processes running on your Alluxio cluster.
- Manage Configuration: Set and distribute configuration for a cluster.
When to Use
Data Orchestration Hub is co-deployed on an analytics cluster running Alluxio, and is bundled with Alluxio Enterprise Edition. The Hub connects to a single co-located Alluxio cluster, and manages that instance of Alluxio only. Further instructions can be found in the deployment section below.
Once connected to an Alluxio cluster, the Hub can be used to modify the state of the Alluxio cluster, such as updating configuration and restarting processes. The following scenarios illustrate usage of the Hub web interface.
Scenario A: Managing an Alluxio cluster
The Hub can be used to view a dashboard to monitor the state of processes on the cluster, as well as update configuration and restart processes.
Monitor the status of an Alluxio cluster anywhere. You can start or stop cluster components from an intuitive UI.
Scenario B: Connecting to data sources across regions
Alluxio is used to connect a compute cluster with data sources across private data centers and public clouds, potentially over a wide area network. The Hub uses a self-guided, wizard-based approach to let users connect to data sources and catalogs in the same or remote data centers. Each wizard guides the user through the required configuration steps and validates the connection.
These wizards apply to multiple scenarios, including hybrid cloud, cross-data center, single cloud, and private data center deployments.
Connect Alluxio to all your data sources across multiple clouds, single cloud or on-premises using self-guided wizards.
Further usage scenarios and descriptions of the available tools can be found in the sections below.
Deployment
The Hub consists of the following components deployed on your Alluxio cluster.
- Hub Manager: The Hub Manager is the entrypoint for a user and the web server for the console. This process runs on the same node as an Alluxio Master by default, and provides the REST endpoints to serve UI requests. When using multiple Alluxio masters, any node can be chosen to deploy the Hub Manager.
- Hub Agent: The Hub agents are deployed on both Alluxio Masters and Alluxio Workers. These agent processes serve requests from the Hub Manager to make changes to the cluster without SSH access.
The following diagram illustrates the Hub architecture:
Hub Agents must be present on all managed nodes whereas the Hub Manager is a single instance.
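The placement rules above can be sketched as a small planning function. This is only an illustration, not an Alluxio tool: the function name, the node names, and the choice of the first master as the default Hub Manager node are assumptions made for the example.

```python
def plan_hub_deployment(masters, workers, manager_node=None):
    """Sketch of where Hub processes run, following the rules described above.

    masters, workers: lists of node names (hypothetical values).
    manager_node: optional explicit node for the single Hub Manager.
    """
    if not masters:
        raise ValueError("at least one Alluxio master is required")

    # Assumption: when no node is chosen, place the Hub Manager on the
    # first master (the text only says "an Alluxio Master by default").
    manager = manager_node if manager_node is not None else masters[0]

    # A Hub Agent runs on every managed node (masters and workers).
    # dict.fromkeys removes duplicates while keeping the original order.
    agents = list(dict.fromkeys([*masters, *workers]))

    return {"hub_manager": manager, "hub_agents": agents}


plan = plan_hub_deployment(["master-1", "master-2"], ["worker-1", "worker-2"])
print(plan["hub_manager"])
print(plan["hub_agents"])
```

The result always contains exactly one Hub Manager node and one Hub Agent entry per managed node, which mirrors the single-instance and all-nodes constraints stated above.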
Choose your compute environment to see how to deploy Data Orchestration Hub.
Configuration
For a complete list of properties applicable to the Hub, please search for properties prefixed with alluxio.hub on this page.
Note:
- alluxio.hub.manager.web.login.username and alluxio.hub.manager.web.login.password define the necessary credentials to sign in to the console.
- alluxio.hub.manager.rpc.hostname specifies the address of the Hub Manager for the agents to register. If using a single Alluxio master, this configuration is not required when the Hub Manager is co-located with the master process.
All other properties are optional. These properties should be set in alluxio-site.properties before starting the Hub processes. The mechanism for setting them varies with the compute environment chosen in the deployment section above.
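As a minimal sketch of what these settings look like in alluxio-site.properties, the helper below renders the three properties named in the note as key=value lines. The property names come from the text above; the helper name and all values are placeholders invented for illustration, and the sketch does not escape special characters in values.

```python
def render_hub_properties(username, password, manager_hostname=None):
    """Return alluxio-site.properties lines for the Hub settings named above.

    All argument values are placeholders supplied by the caller.
    """
    props = {
        "alluxio.hub.manager.web.login.username": username,
        "alluxio.hub.manager.web.login.password": password,
    }
    # The RPC hostname is only needed when agents cannot assume the
    # Hub Manager is co-located with a single Alluxio master.
    if manager_hostname is not None:
        props["alluxio.hub.manager.rpc.hostname"] = manager_hostname

    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Hypothetical values, for illustration only.
print(render_hub_properties("hub-admin", "change-me", "hub-manager.example.internal"), end="")
```

The printed lines can be added to alluxio-site.properties by whatever mechanism your compute environment uses, before the Hub processes are started.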
What next
Once deployed, you can visit the web console at port 30077 (default) on the node running the Hub Manager.
Sign in using the configured admin username and password. Default: Username = 'alluxio', Password = 'alluxio'.
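Before signing in, it can help to confirm that the console port is reachable from your machine. The following is a rough sketch using only the Python standard library. The port 30077 is the default stated above; the host name, the function names, and the http scheme are assumptions for the example.

```python
import socket

DEFAULT_HUB_WEB_PORT = 30077  # default console port stated above


def console_url(manager_host, port=DEFAULT_HUB_WEB_PORT, scheme="http"):
    """Build the console address; the scheme is an assumption."""
    return f"{scheme}://{manager_host}:{port}"


def is_console_reachable(manager_host, port=DEFAULT_HUB_WEB_PORT, timeout=3.0):
    """Return True if a TCP connection to the console port succeeds."""
    try:
        with socket.create_connection((manager_host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical host name, for illustration only.
print(console_url("hub-manager.example.internal"))
```

Calling is_console_reachable with the Hub Manager's host name only checks that the port accepts connections. It does not verify credentials or that the console itself is healthy.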
In the console you have access to the following:
- Process Management: Monitor the status of each process in the Alluxio cluster, and start or stop processes.
- Connect Data Storage: Connect Alluxio to your data sources across a hybrid cloud, single cloud or on-premises.
- Connect Data Catalog: Configure structured data catalogs for OLAP on Alluxio.
- Advanced Configuration: Customize your Alluxio cluster with advanced options.