Install Machine Learning Server using Cloudera Manager
Important
This content is being retired and may not be updated in the future. The support for Machine Learning Server will end on July 1, 2022. For more information, see What's happening to Machine Learning Server?
Applies to: Machine Learning Server 9.2.1 | 9.3 | 9.4
This article explains how to generate, deploy, and activate an installation parcel for Machine Learning Server on a Cloudera distribution of Apache Hadoop (CDH).
Cloudera offers a parcel installation methodology for adding services and features to a cluster. On a Hadoop cluster, Machine Learning Server runs on the edge node and all data nodes. You can use a parcel to distribute and activate the service on all nodes within your CDH cluster.
You can create a parcel generation script on any supported version of Linux, but running it requires CentOS or RHEL 7.0 as the native operating system.
The parcel generator excludes any Machine Learning Server features that it cannot install, such as operationalization.
If parcel installation is too restrictive, follow the instructions for a generic Hadoop installation instead.
This section explains how to obtain the parcel generation script and simulate parcel creation.
A package manager installation used for Linux or Hadoop won't provide the parcel generation scripts. To get the scripts, obtain a gzipped distribution of Machine Learning Server from Volume licensing.
- Log on as root or a user with super user privileges:
sudo su
- Switch to the /tmp/ directory (assuming it's the download location):
cd /tmp/
- Unpack the file:
tar zxvf en_microsoft_ml_server_947_for_hadoop_x64_<some-number>.tar.gz
The distribution is unpacked into a Hadoop folder at the download location. The distribution includes the following files:
File or folder | Description |
---|---|
install.sh | Script for installing Machine Learning Server. Do not use this for a parcel install. |
generate_mlserver_parcel.sh | Script for generating a parcel used for installing Machine Learning Server on CDH. |
EULA.txt | End-user license agreements for each separately licensed component. |
DEB folder | Contains Machine Learning packages for deployment on Ubuntu. |
RPM folder | Contains Machine Learning packages for deployment on CentOS/RHEL and SUSE. |
Parcel folder | Contains files used to generate a parcel for installation on CDH. |
The script includes a -n flag that simulates parcel generation. Start with a dry run to review the prompts.
The script downloads Microsoft R Open and builds a parcel by extracting information from RPM packages. You can append flags to run unattended setup or customize feature selections.
Switch to the Hadoop directory:
cd /tmp/Hadoop
Run the script with -n to simulate parcel generation:
bash generate_mlserver_parcel.sh -n
You are prompted to read and accept license agreements.
You are also asked to specify the underlying operating system. If the platform supports it, the parcel generator adds installation instructions for features having a dependency on .NET Core, such as Microsoft machine learning and operationalization features.
When the script finishes, it prints the locations of the parcel, checksum, and CSD to the console. Because this is a dry run, the files do not exist yet. Running the script without -n generates the files.
You can run the parcel generator with the following flags to suppress prompts or choose components.
Flag | Option | Description |
---|---|---|
-m | --distro-name [DISTRO] | Target Linux distribution for this parcel, one of: el6 el7 sles11 |
-a | --accept-eula | Accept all end-user license agreements. |
-d | --download-mro | Download Microsoft R Open for distribution to an offline system. |
-s | --silent | Perform a silent, unattended install. |
-t | --server-type [TYPE] | 9.4 only: Target server type for this parcel, one of: all python r. Default: all. |
-u | --unattended | Perform an unattended install. |
-n | --dry-run | Don't do anything, just show what would be done. |
-h | --help | Print this help text. |
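The flags in the table can be combined for an unattended run. The following is a minimal sketch, not the documented procedure: `build_parcel_args` is a hypothetical helper, and it assumes that `-m` takes the distribution name as its next argument and that `-a`, `-s`, and `-n` can be used together. It prints the command for review instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble flags for an unattended parcel build.
# Assumption: -m accepts one of the values listed in the table (el6, el7, sles11).
build_parcel_args() {
    local distro="$1" dry_run="${2:-yes}"
    case "$distro" in
        el6|el7|sles11) ;;
        *) echo "unsupported distro: $distro" >&2; return 1 ;;
    esac
    # -a accepts the license agreements, -s suppresses prompts.
    local args=(-a -s -m "$distro")
    # Keep -n by default so nothing is generated until the output is reviewed.
    if [ "$dry_run" = "yes" ]; then
        args+=(-n)
    fi
    printf '%s\n' "${args[*]}"
}

# Print the command rather than executing it.
echo "bash generate_mlserver_parcel.sh $(build_parcel_args el7)"
```

Passing `no` as the second argument drops `-n`, which corresponds to a real build.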
Repeat the command without the -n parameter to create the files: bash generate_mlserver_parcel.sh
- The parcel file name is MLServer-9.4.7-[DISTRO].parcel
- The CSD file name is MLServer-{version}-CONFIG.jar
Note
The parcel file name includes a placeholder for the distribution. Remember to replace it with a valid value before executing the copy commands.
This section explains how to place the generated parcel and CSD files in CDH.
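One way to avoid typing the placeholder by hand is to build the file names from shell variables. This is a sketch under assumptions: `el7` is only an example value for the distribution, and `parcel_name` is a hypothetical helper, not part of the distribution.

```shell
#!/usr/bin/env bash
# Assumed values for illustration; set DISTRO to the value used with the generator.
VERSION="9.4.7"
DISTRO="el7"

# Hypothetical helper: substitute the version and distribution into the parcel name.
parcel_name() {
    printf 'MLServer-%s-%s.parcel\n' "$1" "$2"
}

PARCEL="$(parcel_name "$VERSION" "$DISTRO")"
PARCEL_SHA="${PARCEL}.sha"

echo "parcel:   $PARCEL"
echo "checksum: $PARCEL_SHA"
```

The variables can then stand in for the file names in the copy commands.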
By default, Cloudera Manager finds parcels in the Cloudera parcel repository. In this step, copy the parcel you generated to the repository.
Copy MLServer-9.4.7-[DISTRO].parcel and MLServer-9.4.7-[DISTRO].parcel.sha to the Cloudera parcel repository, typically /opt/cloudera/parcel-repo, and make sure the permissions are set correctly.
cp ./MLServer-9.4.7-[DISTRO].parcel /opt/cloudera/parcel-repo/
cp ./MLServer-9.4.7-[DISTRO].parcel.sha /opt/cloudera/parcel-repo/
sudo chmod 644 /opt/cloudera/parcel-repo/MLServer-9.4.7-[DISTRO].parcel
sudo chmod 644 /opt/cloudera/parcel-repo/MLServer-9.4.7-[DISTRO].parcel.sha
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/MLServer-9.4.7-[DISTRO].parcel
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/MLServer-9.4.7-[DISTRO].parcel.sha
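After copying, it can be worth confirming that the parcel still matches its checksum file. The sketch below assumes the .sha file holds a SHA-1 hex digest of the parcel as its first field, which is the usual Cloudera convention but is not stated in this article. `verify_parcel_sha` is a hypothetical helper, and the default path uses `el7` only as an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a parcel's SHA-1 digest with its .sha file.
# Assumption: the .sha file contains the hex digest as its first field.
verify_parcel_sha() {
    local parcel="$1" sha_file="${1}.sha"
    local expected actual
    [ -f "$parcel" ] && [ -f "$sha_file" ] || return 2
    expected="$(awk '{print $1; exit}' "$sha_file")"
    actual="$(sha1sum "$parcel" | awk '{print $1}')"
    [ "$expected" = "$actual" ]
}

# Example path; adjust the distribution to match your parcel.
PARCEL="${PARCEL:-/opt/cloudera/parcel-repo/MLServer-9.4.7-el7.parcel}"
if [ -f "$PARCEL" ]; then
    if verify_parcel_sha "$PARCEL"; then
        echo "checksum matches: $PARCEL"
    else
        echo "checksum mismatch or missing .sha: $PARCEL" >&2
    fi
fi
```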
The Custom Service Descriptor (CSD) enables monitoring and administration from within Cloudera Manager. In this step, copy the CSD (a .jar file) to the Cloudera repository for CSD files.
Copy the CSD file MLServer-9.4.7-CONFIG.jar to the Cloudera CSD directory, typically /opt/cloudera/csd.
cp ./MLServer-9.4.7-CONFIG.jar /opt/cloudera/csd/
Modify the permissions of the CSD file as follows:
sudo chmod 644 /opt/cloudera/csd/MLServer-9.4.7-CONFIG.jar
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/MLServer-9.4.7-CONFIG.jar
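Before restarting the service, you might confirm that the mode and ownership took effect. This is a small sketch that assumes GNU `stat` (as shipped with CentOS and RHEL); `describe_file` is a hypothetical helper, and the path is the CSD location used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<octal mode> <owner>:<group>" for a file.
# Assumption: GNU stat, which supports the -c format option.
describe_file() {
    stat -c '%a %U:%G' "$1"
}

CSD="${CSD:-/opt/cloudera/csd/MLServer-9.4.7-CONFIG.jar}"
if [ -f "$CSD" ]; then
    actual="$(describe_file "$CSD")"
    # The chmod and chown commands above should yield exactly this string.
    if [ "$actual" = "644 cloudera-scm:cloudera-scm" ]; then
        echo "ok: $CSD"
    else
        echo "unexpected mode or owner: $actual" >&2
    fi
fi
```

The same check can be pointed at the parcel and .sha files in the parcel repository.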
Restart the cloudera-scm-server service:
sudo service cloudera-scm-server restart
In Cloudera Manager, click the parcel icon on the top right menu bar.
On the left, find and select MLServer-9.4.7 in the parcel list. If you don't see it, check the parcel-repo folder.
On the right, in the parcel details page, MLServer-9.4.7 should have a status of Downloaded with an option to Distribute. Click Distribute to roll out Machine Learning Server on available nodes.
The status changes to Distributed. Click the Activate button to make Machine Learning Server operational in the cluster.
You are finished with this task when status is "distributed, activated" and the next available action is Deactivate.
On the Cloudera Manager home page, click the down arrow by the cluster name and choose Add Service.
Find and select MLServer-9.4.7 and click Continue to start a wizard for adding services.
In the next page, add role assignments on all nodes used to run the service, both edge and data nodes. Click Continue.
On the last page, click Finish to start the service.
Machine Learning Server should now be deployed in the cluster.
You have the option of rolling back the active deployment in Cloudera Manager, perhaps to use an older version. You can have multiple versions in Cloudera, but only one can be active at any given time.
In Cloudera Manager, click the Parcel icon to open the parcel list.
Find MLServer-9.4.7 and click Deactivate.
The parcel still exists, but Machine Learning Server is not operational in the cluster.
The above steps apply to 9.3.0 and 9.4.7. If you have R Server (either 9.1 or 9.0.1), see Install R Server 9.1 on CDH and Install R Server 9.0.1 on CDH for release-specific documentation.
We recommend starting with How to use RevoScaleR with Spark or How to use RevoScaleR with Hadoop MapReduce.
For a list of functions that utilize Yarn and Hadoop infrastructure to process in parallel across the cluster, see Running a distributed analysis using RevoScaleR functions.
R solutions that execute on the cluster can call functions from any R package. To add new R packages, you can use any of these approaches:
- Create a new parcel using the generate_mlserver_parcel.sh script.
- Use the RevoScaleR rxExec function to add new packages.
- Manually run install.packages() on all nodes in the Hadoop cluster (using a distributed shell or some other mechanism).
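The last option could be scripted roughly as follows. This is a sketch under several assumptions that the article does not confirm: the nodes are reachable over passwordless `ssh`, `Rscript` is on the PATH of each node and points at the Machine Learning Server R installation, and the account can write to the R library. `install_pkg_on_nodes`, the hosts file, and the CRAN mirror URL are illustrative. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: install one R package on every host listed in a file
# (one host name per line).
install_pkg_on_nodes() {
    local pkg="$1" hosts_file="$2" host r_expr
    r_expr="install.packages('${pkg}', repos='https://cloud.r-project.org')"
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            printf 'ssh -n %s Rscript -e "%s"\n' "$host" "$r_expr"
        else
            # -n keeps ssh from consuming the rest of the hosts file on stdin.
            ssh -n "$host" Rscript -e "\"$r_expr\""
        fi
    done < "$hosts_file"
}

# Example: DRY_RUN=0 install_pkg_on_nodes data.table ./nodes.txt
```

Installing the same package version on the edge node and every data node keeps distributed jobs from failing on a node that lacks it.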