Set up a common identity on a Data Science Virtual Machine

On a Microsoft Azure virtual machine (VM), including a Data Science Virtual Machine (DSVM), you create local user accounts while provisioning the VM. Users then authenticate to the VM by using these credentials. If your users need access to multiple VMs, managing a separate set of credentials on each one quickly becomes cumbersome. A better approach is to manage common user accounts through a standards-based identity provider. With this approach, a single set of credentials gives access to multiple resources on Azure, including multiple DSVMs.

Active Directory is a popular identity provider and is supported on Azure both as a cloud service and as an on-premises directory. You can use Microsoft Entra ID or on-premises Active Directory to authenticate users on a standalone DSVM or a cluster of DSVMs in an Azure virtual machine scale set. You do this by joining the DSVM instances to an Active Directory domain.

If you already have Active Directory, you can use it as your common identity provider. If you don't have Active Directory, you can run a managed Active Directory instance on Azure through Microsoft Entra Domain Services.

The documentation for Microsoft Entra ID provides detailed management instructions, including guidance about connecting Microsoft Entra ID to your on-premises directory if you have one.

This article describes how to set up a fully managed Active Directory domain service on Azure by using Microsoft Entra Domain Services. You can then join your DSVMs to the managed Active Directory domain. This approach enables users to access a pool of DSVMs (and other Azure resources) through a common user account and credentials.

Set up a fully managed Active Directory domain on Azure

Microsoft Entra Domain Services makes it simple to manage your identities by providing a fully managed service on Azure. On this Active Directory domain, you manage users and groups. To set up an Azure-hosted Active Directory domain and user accounts in your directory, follow these steps:

  1. In the Azure portal, add the user to Microsoft Entra ID:

    1. Sign in to the Azure portal as a Global Administrator.

    2. Browse to Microsoft Entra ID > Users > All users.

    3. Select New user.

      The User pane opens.

    4. Enter details for the user, such as Name and User name. The domain name portion of the user name must be either the initial default domain name "[domain name].onmicrosoft.com" or a verified, non-federated custom domain name such as "contoso.com."

    5. Copy or otherwise note the generated user password so that you can provide it to the user after this process is complete.

    6. Optionally, you can open and fill out the information in Profile, Groups, or Directory role for the user.

    7. Under User, select Create.

    8. Securely distribute the generated password to the new user so that they can sign in.

  2. Create a Microsoft Entra Domain Services instance. Follow the instructions in Enable Microsoft Entra Domain Services using the Azure portal (the "Create an instance and configure basic settings" section). Make sure to update the existing user passwords in Microsoft Entra ID so that they are synced to Microsoft Entra Domain Services. Also make sure to configure DNS for Microsoft Entra Domain Services, as described under "Complete the fields in the Basics window of the Azure portal to create a Microsoft Entra Domain Services instance" in that section.

  3. Create a separate DSVM subnet in the virtual network created in the "Create and configure the virtual network" section of the preceding step.

  4. Create one or more DSVM instances in the DSVM subnet.

  5. Follow the instructions to add the DSVM to Active Directory.

  6. Mount an Azure Files share to host your home or notebook directory so that your workspace can be mounted on any machine. (If you need tight file-level permissions, you'll need Network File System [NFS] running on one or more VMs.)

    1. Create an Azure Files share.

    2. Mount this share on the Linux DSVM. When you select Connect for the Azure Files share in your storage account in the Azure portal, the command to run in the bash shell on the Linux DSVM appears. The command looks like this:

    sudo mount -t cifs //[STORAGEACCT].file.core.windows.net/workspace [Your mount point] -o vers=3.0,username=[STORAGEACCT],password=[Access Key or SAS],dir_mode=0777,file_mode=0777,sec=ntlmssp
    
  7. Create a directory for each of your users in the share. For example, if you mounted your Azure Files share in /data/workspace, create /data/workspace/user1, /data/workspace/user2, and so on. Then create a notebooks directory in each user's workspace.

  8. For each user, create a symbolic link at $HOME/userx/notebooks/remote that points to that user's notebooks directory on the share.
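
Steps 7 and 8 can be scripted once the share is mounted. The following is a minimal sketch, assuming the share is mounted at /data/workspace and that user home directories live under /home. The variables SHARE_ROOT and HOME_ROOT, the function name setup_user_workspace, and the exact link target are illustrative assumptions, not part of the DSVM tooling. Domain-joined accounts may use a different home directory layout, so adjust the paths to match your environment.

```shell
#!/usr/bin/env bash
# Sketch: create per-user workspace directories on the mounted Azure Files
# share and link them into each user's home directory.
# SHARE_ROOT and HOME_ROOT are assumed locations; override them as needed.
set -eu

SHARE_ROOT="${SHARE_ROOT:-/data/workspace}"  # mount point from the example above
HOME_ROOT="${HOME_ROOT:-/home}"              # assumed parent of home directories

setup_user_workspace() {
    local user="$1"
    local remote="$SHARE_ROOT/$user/notebooks"
    local link="$HOME_ROOT/$user/notebooks/remote"

    # Step 7: the user's directory on the share, with a notebooks directory in it.
    mkdir -p "$remote"

    # Step 8: make sure the local notebooks directory exists, then point
    # "remote" at the user's notebooks directory on the share.
    mkdir -p "$(dirname "$link")"
    ln -sfn "$remote" "$link"
}

# Process every user name passed on the command line, for example:
#   ./setup_workspaces.sh user1 user2
for user in "$@"; do
    setup_user_workspace "$user"
done
```

If you run this as root, the local notebooks directory it creates is owned by root, so you would likely follow up with a chown to the corresponding user.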

You now have the users in your Active Directory instance hosted in Azure. By using Active Directory credentials, users can sign in to any DSVM (SSH or JupyterHub) that's joined to Microsoft Entra Domain Services. Because the user workspace is on an Azure Files share, users have access to their notebooks and other work from any DSVM when they're using JupyterHub.

For autoscaling, you can use a virtual machine scale set to create a pool of VMs that are all joined to the domain in this way, each with the Azure Files share mounted. Users can sign in to any available machine in the virtual machine scale set and have access to the share where their notebooks are saved.
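
For a pool of VMs, the share has to be remounted on every instance at boot. One common way to do that on Linux is an /etc/fstab entry. The sketch below only builds the entry text, reusing the mount options from the command shown earlier. The function name fstab_entry and the credentials file path are hypothetical. The credentials file is assumed to be a root-only file that contains username= and password= lines, which keeps the storage key out of /etc/fstab.

```shell
#!/usr/bin/env bash
# Sketch: print an /etc/fstab line that mounts an Azure Files share at boot.
# Arguments: storage account, share name, mount point, credentials file.
set -eu

fstab_entry() {
    local account="$1"
    local share="$2"
    local mount_point="$3"
    local cred_file="$4"

    # nofail lets the VM finish booting even if the share is unreachable.
    # The remaining options mirror the manual mount command.
    printf '//%s.file.core.windows.net/%s %s cifs nofail,vers=3.0,credentials=%s,dir_mode=0777,file_mode=0777,sec=ntlmssp 0 0\n' \
        "$account" "$share" "$mount_point" "$cred_file"
}
```

For example, `fstab_entry mystorageacct workspace /data/workspace /etc/smbcredentials/mystorageacct.cred` prints a single line that you could review and then append to /etc/fstab on the VM image or in a provisioning script.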
