Upgrade your Data Science Virtual Machine to Ubuntu 20.04

Caution

This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and planning accordingly. For more information, see the CentOS End Of Life guidance.

If you have a Data Science Virtual Machine running an older release such as Ubuntu 18.04 or CentOS, you should migrate your DSVM to Ubuntu 20.04. Migrating will ensure that you get the latest operating system patches, drivers, preinstalled software, and library versions. This document tells you how to migrate from either older versions of Ubuntu or from CentOS.

Prerequisites

  • Familiarity with SSH and the Linux command line

Overview

There are two possible ways to migrate:

  • In-place migration, also called "same server" migration. This migration upgrades the existing VM without creating a new virtual machine. In-place migration is the easier way to migrate from Ubuntu 18.04 to Ubuntu 20.04.
  • Side-by-side migration, also called "inter-server" migration. This migration transfers data from the existing virtual machine to a newly created VM. Side-by-side migration is the way to migrate from CentOS to Ubuntu 20.04. You may also prefer side-by-side migration for upgrading between Ubuntu versions if you feel your old install has become needlessly cluttered.

Snapshot your VM in case you need to roll back

In the Azure portal, use the search bar to find the Snapshots functionality.

Screenshot showing Azure portal and search bar, with **Snapshots** highlighted

  1. Select Add, which will take you to the Create snapshot page. Select the subscription and resource group of your virtual machine. For Region, select the same region in which the target storage exists. Select the DSVM storage disk and additional backup options. Standard HDD is an appropriate storage type for this backup scenario.

Screenshot showing 'Create snapshot' options

  2. Once all the details are filled in and validations pass, select Review + create to validate and create the snapshot. When the snapshot successfully completes, you'll see a message telling you the deployment is complete.
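
If you prefer the command line, the portal steps above can be approximated with the Azure CLI. The following is a minimal sketch, not the article's official procedure. It assumes `az` is installed and signed in and that the VM uses a managed OS disk. The helper name `snapshot_os_disk` and the resource names in the example call are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: snapshot the OS disk of a VM with the Azure CLI.
# Assumes a managed OS disk and that `az login` has already been run.
# snapshot_os_disk is a hypothetical helper name, not part of any tool.
snapshot_os_disk() {
  local resource_group="$1" vm_name="$2" snapshot_name="$3"
  local disk_id

  # Look up the resource ID of the VM's managed OS disk.
  disk_id="$(az vm show \
    --resource-group "$resource_group" \
    --name "$vm_name" \
    --query "storageProfile.osDisk.managedDisk.id" \
    --output tsv)" || return 1

  # Create the snapshot. Standard_LRS corresponds to Standard HDD storage.
  az snapshot create \
    --resource-group "$resource_group" \
    --name "$snapshot_name" \
    --source "$disk_id" \
    --sku Standard_LRS
}

# Example call (placeholder names, replace with your own):
# snapshot_os_disk my-resource-group my-dsvm my-dsvm-snapshot
```

The snapshot is created in the resource group you pass in. Substitute your own resource group, VM name, and snapshot name.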

In-place migration

If you're migrating an older Ubuntu release, you may choose to do an in-place migration. This migration doesn't create a new virtual machine and has fewer steps than a side-by-side migration. If you wish to do a side-by-side migration because you want more control or because you're migrating from a different distribution, such as CentOS, skip to the Side-by-side migration section.

  1. From the Azure portal, start your DSVM and sign in using SSH. To do so, select Connect and SSH and follow the connection instructions.

  2. Once connected to a terminal session on your DSVM, run the following upgrade command:

    sudo do-release-upgrade
    

The upgrade process will take a while to complete. When it's over, the program will ask for permission to restart the virtual machine. Answer Yes. You will be disconnected from the SSH session as the system reboots.

If necessary, regenerate SSH keys

Important

After upgrading and rebooting, you may need to regenerate your SSH keys.

After your VM has upgraded and rebooted, attempt to access it again via SSH. The IP address may have changed during the reboot, so confirm it before attempting to connect.

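If you receive the error REMOTE HOST IDENTIFICATION HAS CHANGED, the host key stored on your local machine no longer matches the upgraded VM, and you'll need to refresh it.

PowerShell screenshot showing remote host identification changed warning

To do so, on your local machine, run the following command, which removes the outdated host key for your VM from your local known_hosts file: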

ssh-keygen -R "your server hostname or ip"

You should now be able to connect with SSH. If you're still having trouble, in the Connect page follow the link to Troubleshoot SSH connectivity issues.

Side-by-side migration

If you're migrating from CentOS or want a clean OS install, you can do a side-by-side migration. This type of migration has more steps, but gives you control over exactly which files are carried over.

Migrations from other systems based on the same set of upstream source packages should be relatively straightforward, for example FAQ/CentOS3.

You may choose to upgrade the operating system parts of the filesystem and leave user directories, such as /home, in place. If you do leave the old user home directories in place, expect some problems with the GNOME/KDE menus and other desktop items. It may be easiest to create new user accounts and mount the old directories somewhere else in the filesystem for reference, copying, or linking users' material after the migration.

Migration at a glance

  1. Create a snapshot of your existing VM as described previously

  2. Create a disk from that snapshot

  3. Create a new Ubuntu Data Science Virtual Machine

  4. Recreate user account(s) on the new virtual machine

  5. Mount the disk of the snapshotted VM as a data disk on your new Data Science Virtual Machine

  6. Manually copy the wanted data

Create a disk from your VM snapshot

If you haven't already created a VM snapshot as described previously, do so.

  1. In the Azure portal, search for Disks and select Add, which will open the Disk page.

Screenshot of Azure portal showing search for Disks page and the Add button

  2. Set the Subscription, Resource group, and Region to the values of your VM snapshot. Choose a Name for the disk to be created.

  3. Select Source type as Snapshot and select the VM snapshot as the Source snapshot. Review and create the disk.

Screenshot of disk creation dialog showing options
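
As an alternative to the portal, a disk can be created from a snapshot with the Azure CLI. This is a hedged sketch that assumes `az` is installed and signed in, and that the snapshot lives in the same resource group as the new disk. The helper name `disk_from_snapshot` and the names in the example call are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a managed disk from an existing snapshot.
# disk_from_snapshot is a hypothetical helper name.
disk_from_snapshot() {
  local resource_group="$1" snapshot_name="$2" disk_name="$3"

  # --source accepts a snapshot name (same resource group) or a full resource ID.
  az disk create \
    --resource-group "$resource_group" \
    --name "$disk_name" \
    --source "$snapshot_name"
}

# Example call (placeholder names, replace with your own):
# disk_from_snapshot my-resource-group my-dsvm-snapshot my-old-dsvm-disk
```

If the snapshot is in a different resource group, pass its full resource ID instead of its name.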

Create a new Ubuntu Data Science Virtual Machine

Create a new Ubuntu Data Science Virtual Machine using the Azure portal or an ARM template.

Recreate user account(s) on your new Data Science Virtual Machine

Since you'll only be copying data from your old VM, you'll need to recreate whichever user accounts and software environments you want to use on the new machine.

Linux is flexible enough to allow you to customize directories and paths on your new installation to follow your old machine. In general, though, it's easier to use the modern Ubuntu's preferred layout and modify your user environment and scripts to adapt.

For more information, see Quickstart: Set up the Data Science Virtual Machine for Linux (Ubuntu).
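
To decide which accounts to recreate, it can help to list the regular login accounts on the old VM. The sketch below reads a passwd-format file and prints account names whose UID falls in the range commonly used for regular users. The helper name `list_login_users` is hypothetical, and the default minimum UID of 1000 is an assumption: it matches the usual default on Ubuntu and recent CentOS releases, but older CentOS releases started regular accounts at 500.

```shell
#!/usr/bin/env bash
# Sketch: list regular (non-system) accounts from a passwd-format file.
# list_login_users is a hypothetical helper name.
# Arguments: [passwd file, default /etc/passwd] [minimum UID, default 1000]
list_login_users() {
  local passwd_file="${1:-/etc/passwd}" min_uid="${2:-1000}"

  # Field 3 is the numeric UID. 65534 is conventionally the "nobody" account.
  awk -F: -v min="$min_uid" '$3 >= min && $3 < 65534 { print $1 }' "$passwd_file"
}

# Example: run on the old VM before migrating.
# list_login_users /etc/passwd
# Then, on the new VM, recreate each account you want to keep, for example:
# sudo adduser <username>
```

Once the old disk is mounted on the new VM, as described later in this article, the same helper can be pointed at the passwd file on that disk.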

Mount the disk of the snapshotted VM as a data disk on your new Data Science Virtual Machine

  1. In the Azure portal, make sure that your Data Science Virtual Machine is running.

  2. In the Azure portal, go to the page of your Data Science Virtual Machine. Choose the Disks blade on the left rail. Choose Attach existing disks.

  3. In the Disk name dropdown, select the disk that you created from your old VM's snapshot.

Screenshot of DSVM options page showing disk attachment options

  4. Select Save to update your virtual machine.

Important

Your VM should be running at the time you attach the data disk. If the VM isn't running, the disks may be added in an incorrect order, leading to a confusing and potentially non-bootable system. If you add the data disk with the VM off, choose the X beside the data disk, start the VM, and re-attach it.
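
The attach step can also be scripted with the Azure CLI. The following sketch checks that the VM reports a running power state before attaching the disk, in line with the note above. It assumes `az` is installed and signed in. The helper name `attach_disk_if_running` and the names in the example call are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: attach an existing managed disk only if the VM is running.
# attach_disk_if_running is a hypothetical helper name.
attach_disk_if_running() {
  local resource_group="$1" vm_name="$2" disk_name="$3"
  local power_state

  # --show-details adds instance-view fields such as powerState.
  power_state="$(az vm show \
    --show-details \
    --resource-group "$resource_group" \
    --name "$vm_name" \
    --query powerState \
    --output tsv)" || return 1

  if [ "$power_state" != "VM running" ]; then
    echo "VM is not running (state: $power_state). Start it before attaching the disk." >&2
    return 1
  fi

  az vm disk attach \
    --resource-group "$resource_group" \
    --vm-name "$vm_name" \
    --name "$disk_name"
}

# Example call (placeholder names, replace with your own):
# attach_disk_if_running my-resource-group my-new-dsvm my-old-dsvm-disk
```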

Manually copy the wanted data

  1. Sign in to your running virtual machine using SSH.

  2. Confirm that you've attached the disk created from your old VM's snapshot by running the following command:

    lsblk -o NAME,HCTL,SIZE,MOUNTPOINT | grep -i 'sd'
    

    The results should look something like the following image. In the image, disk sda1 is mounted at the root and sdb2 is the /mnt scratch disk. The data disk created from the snapshot of your old VM is identified as sdc1 but isn't yet available, as evidenced by the lack of a mount location. Your results might have different identifiers, but you should see a similar pattern.

    Screenshot of lsblk output, showing unmounted data drive

  3. To access the data drive, create a location for it and mount it. Replace /dev/sdc1 with the appropriate value returned by lsblk:

    sudo mkdir /datadrive && sudo mount /dev/sdc1 /datadrive
    
  4. Now, /datadrive contains the directories and files of your old Data Science Virtual Machine. Move or copy the directories or files you want from the data drive to the new VM as you wish.
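
One way to carry out the copy in step 4 is sketched below. It uses `cp -a`, which preserves timestamps and permissions, and copies the contents of a source directory (including hidden files) into a destination directory. The helper name `copy_from_old_disk` and the paths in the example call are hypothetical. The example assumes the old home directories live under /datadrive/home, which depends on how your old VM was laid out.

```shell
#!/usr/bin/env bash
# Sketch: copy the contents of a directory on the mounted old disk
# into a directory on the new VM.
# copy_from_old_disk is a hypothetical helper name.
copy_from_old_disk() {
  local source_dir="$1" dest_dir="$2"

  if [ ! -d "$source_dir" ]; then
    echo "Source directory not found: $source_dir" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1

  # The trailing "/." copies the directory's contents, including dotfiles,
  # instead of nesting the source directory inside the destination.
  cp -a "$source_dir/." "$dest_dir/"
}

# Example call (placeholder paths, replace with your own):
# copy_from_old_disk /datadrive/home/olduser/projects "$HOME/projects"
```

When run as a regular user, the copied files end up owned by that user. If you copy as root to keep the original numeric owners, check that the user IDs on the new VM match those on the old one, and adjust ownership with `chown` where they don't.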

For more information, see Use the portal to attach a data disk to a Linux VM.

Connect and confirm version upgrade

Whether you did an in-place or side-by-side migration, confirm that you've successfully upgraded. From a terminal session, run:

cat /etc/os-release

You should see that you're running Ubuntu 20.04.
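
If you want to check the version from a script rather than by reading the output, the fields in /etc/os-release can be tested directly. This is a minimal sketch. The helper name `is_ubuntu_release` is hypothetical, and it relies on the standard `ID` and `VERSION_ID` fields of the os-release format.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the os-release file describes the expected Ubuntu release.
# is_ubuntu_release is a hypothetical helper name.
# Arguments: <expected VERSION_ID> [os-release file, default /etc/os-release]
is_ubuntu_release() {
  local expected="$1" os_release_file="${2:-/etc/os-release}"

  # Source the file in a subshell so its variables don't leak into the caller.
  (
    . "$os_release_file" &&
      [ "$ID" = "ubuntu" ] &&
      [ "$VERSION_ID" = "$expected" ]
  )
}

# Example call:
# if is_ubuntu_release 20.04; then
#   echo "Upgrade confirmed: Ubuntu 20.04"
# else
#   echo "This VM is not running Ubuntu 20.04" >&2
# fi
```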

Screenshot of Ubuntu terminal showing OS version data

The change of version is also shown in the Azure portal.

Screenshot of portal showing DSVM properties including OS version

Next steps