Troubleshoot Linux virtual machine boot issues due to filesystem errors
This article provides guidance to troubleshoot Linux virtual machine (VM) boot issues caused by filesystem errors.
You can't connect to an Azure Linux virtual machine (VM) by using the Secure Shell Protocol (SSH), or the VM Agent status in the Azure portal isn't Ready. When you run the Boot diagnostics in the Azure portal or connect to the Serial Console, you see log entries that resemble the following examples:
- Not all examples will be present.
- A mounting failure doesn't always result in a VM entering emergency mode. If the issue is with certain critical filesystems, the VM may not use emergency mode.
Example 1: Fail to mount ext4 filesystem
EXT4-fs (sda1): INFO: recovery required on readonly filesystem
EXT4-fs (sda1): write access will be enabled during recovery
EXT4-fs warning (device sda1): ext4_clear_journal_err:4531: Filesystem error recorded from previous mount: IO failure
EXT4-fs warning (device sda1): ext4_clear_journal_err:4532: Marking fs in need of filesystem check.
Example 2: Fail to mount ext4 Logical Volume Manager (LVM) device
[ 14.382472] EXT4-fs error (device dm-0): ext4_iget:4398: inode #8: comm mount: bad extra_isize 4060 (inode size 256)
[ 14.389648] EXT4-fs (dm-0): no journal found
<snipped>
[FAILED] Failed to mount /opt/data.
Example 3: Fail to mount xfs filesystem
[ 8.543984] XFS (sdc1): Metadata CRC error detected at xfs_agi_read_verify+0xd0/0xf0 [xfs], xfs_agi block 0x10
[ 8.553867] XFS (sdc1): Unmount and run xfs_repair
[ 8.558993] XFS (sdc1): First 128 bytes of corrupted metadata buffer:
[ 8.564893] 00000000: 58 41 47 49 00 00 00 01 00 00 00 00 00 1f ff c0 XAGI............
[ 8.572847] 00000010: 00 00 00 40 00 00 00 06 00 00 00 01 00 00 00 3d ...@...........=
[ 8.580476] 00000020: 00 00 00 60 ff ff ff ff ff ff ff ff ff ff ff ff ...`............
[ 8.588219] 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 8.596280] 00000040: ff 07 f8 ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 8.603575] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 8.610849] 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 8.619261] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 8.629731] XFS (sdc1): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x10 len 8 error 74
[ 8.637799] XFS (sdc1): xfs_imap_lookup: xfs_ialloc_read_agi() returned error -117, agno 0
[FAILED] Failed to mount /data.
See 'systemctl status data.mount' for details.
[DEPEND] Dependency failed for Local filesystems.
Example 4: Boot into emergency mode
You are in emergency mode. After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" or "exit" to boot into default mode.
Give root password for maintenance
(or press Control-D to continue):
The log entries above indicate disk corruption. In certain situations, disk corruption will prevent the VM from fully booting. Various issues can cause disk corruption, such as Linux kernel problems, driver errors, errors in the underlying physical or virtual hardware, and so on.
To resolve the Linux VM boot issues caused by filesystem errors, recover the VM by repairing the disk corruption. To repair the disk corruption, follow these steps:
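Identify which disk is corrupted.
Identify the filesystem type.
Select a recovery mode.
Prepare the recovery environment according to the recovery mode you select: online recovery or offline recovery.
Use command-line tools to repair the problematic filesystem on the disk.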
- It's important to back up critical data because data loss may occur on the recovered disk.
- Before you make changes to a disk, take a snapshot to preserve the current state of the disk, even if it's in an error state. Fixing the disk corruption will change the data on the disk, which will carry risk.
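The snapshot step can be scripted. The following is a minimal sketch, assuming the Azure CLI (az) is installed and signed in. The helper names (run, snapshot_disk), the "-pre-repair" name suffix, and the resource group and disk names in the usage line are hypothetical placeholders, not values from this article. By default the helper only prints the command (DRY_RUN=1) so that it can be reviewed before anything is created.

```shell
# Print a command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# snapshot_disk RESOURCE_GROUP DISK_NAME [SNAPSHOT_NAME]
# Assumption: the disk lives in RESOURCE_GROUP, so its name is enough for --source.
snapshot_disk() {
  rg=$1
  disk=$2
  snap=${3:-}
  # Hypothetical default naming convention for the snapshot.
  [ -n "$snap" ] || snap="${disk}-pre-repair"
  run az snapshot create --resource-group "$rg" --name "$snap" --source "$disk"
}
```

For example, `snapshot_disk myResourceGroup myOSDisk` prints the az command that would be run, and `DRY_RUN=0 snapshot_disk myResourceGroup myOSDisk` runs it.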
Identify which disk is corrupted
To determine which disk is corrupted, download the serial log for your VM by using the Serial Console or Boot diagnostics, examine the log entries during boot up, and then look for the specific error calling out which disk or mount is failing.
Here are three log entry examples. In these examples, note the text in parentheses, which reports the corrupted device.
In the following example, the corrupted device is sdc1.
[ 14.285807] XFS (sdc1): Mounting V5 Filesystem
[ 14.426283] XFS (sdc1): Metadata CRC error detected at xfs_agi_read_verify+0xde/0x100 [xfs], xfs_agi block 0x10
[ 14.426284] XFS (sdc1): Unmount and run xfs_repair
<snipped>
[FAILED] Failed to mount /opt/parent.
In the following example, the partition where a filesystem error occurs is sda1.
EXT4-fs (sda1): INFO: recovery required on readonly filesystem
EXT4-fs (sda1): write access will be enabled during recovery
EXT4-fs warning (device sda1): ext4_clear_journal_err:4531: Filesystem error recorded from previous mount: IO failure
EXT4-fs warning (device sda1): ext4_clear_journal_err:4532: Marking fs in need of filesystem check.
<snipped>
[FAILED] Failed to mount /boot.
In the following example, the corrupted device is dm-2. It's a Linux Device Mapper device, which indicates an LVM volume.
[ 18.014318] EXT4-fs (dm-2): VFS: Can't find ext4 filesystem
[FAILED] Failed to mount /home.
See 'systemctl status home.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Mark the need to relabel after reboot.
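Reading the device name out of the parentheses can be automated for a downloaded serial log. The following is a minimal sketch, assuming the log uses the EXT4-fs and XFS message formats shown in the examples. The function name failing_filesystems and the keyword filter are illustrative choices, not part of any Azure tooling, and other filesystems or message formats would need extra patterns.

```shell
# failing_filesystems: read a serial log on stdin and print one
# "MODULE DEVICE" pair for each device named in an EXT4-fs or XFS
# message that looks like an error or warning.
failing_filesystems() {
  # Keep only lines that look like problems (the keyword list is an assumption).
  grep -Ei 'error|warning|can.t find|xfs_repair|recovery required' |
    # Capture the reporting module and the device inside the parentheses,
    # with or without the "device " prefix.
    sed -nE 's/.*(EXT4-fs|XFS)[^(]*\((device )?([A-Za-z0-9_-]+)\).*/\1 \3/p' |
    LC_ALL=C sort -u
}
```

For example, `failing_filesystems < serial.log` lists each affected device once, together with the kernel module that reported it, which also gives a first indication of the filesystem type.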
If the disk device being called out uses a name of the format "sdXN" where X is a letter from a-z and N is an optional partition number, it means that the disk is raw and can be operated on by using the /dev/sdXN path.
If the disk device being mounted uses a name such as /dev/mapper/vgname-lvname, /dev/vgname/lvname, or dm-N, it means that an LVM device is used. Take care to identify all physical volumes (PVs) that may be in use.
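The two naming rules above can be expressed as a small helper. This is a sketch under the assumptions stated in the text (sdXN names are raw, Device Mapper and VG/LV style names are LVM). The function name classify_device is hypothetical, and names that fit neither rule, such as NVMe devices or /dev/disk/by-uuid paths, are reported as unknown rather than guessed.

```shell
# classify_device NAME
# Prints "raw", "lvm", or "unknown" based only on the device name.
classify_device() {
  name=${1#/dev/}                 # accept names with or without the /dev/ prefix
  case "$name" in
    disk/*)                 echo unknown ;;  # by-uuid or by-label links say nothing about LVM
    dm-[0-9]*|mapper/*|*/*) echo lvm ;;      # dm-N, /dev/mapper/..., /dev/vgname/lvname
    sd[a-z]|sd[a-z][0-9]*)  echo raw ;;      # sdX or sdXN
    *)                      echo unknown ;;
  esac
}
```

For example, `classify_device dm-2` prints lvm and `classify_device /dev/sdc1` prints raw.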
An LVM volume group (VG) that contains the OS disk together with one or more data disks isn't supported, because this configuration carries a high risk of data loss. However, a VG that spans multiple data disks is permissible.
When determining the mapping of OS disk references to Azure disk objects:
- For marketplace images, the root filesystem (/), /boot, and /boot/efi are located on the OS disk.
- For LVM based images, many other system mounts may exist such as /home, /tmp, /usr, /var, /var/log, and /opt.
- Extra filesystems created for applications are located on data disks, for example, /data, /datadisk, or /sap. Configure them properly so that the system can boot even if there's an error. If a failure on a data disk causes the VM to boot into emergency mode, see Prevent boot failure.
Identify filesystem type
During initial identification, the only way to determine the filesystem type is to use the serial log, as previously examined in Identify which disk is corrupted. When the disk device is reported in the serial log, the errors are displayed by the Linux kernel module for that filesystem. In the preceding XFS examples, note each line where XFS is specified. For any other filesystem type, the name appears in the same position in the log. The filesystem noted in the log entries is determined by the /etc/fstab file. Take care to verify that the specified format is correct when performing a repair.
Once you have access to an interactive shell, run the lsblk command with the -f flag as follows to show devices, paths (if the filesystem is mounted), and the filesystem type that's read from the disk itself.
[root@localhost ~]# lsblk -f
NAME                FSTYPE      LABEL UUID                                   MOUNTPOINT
sda
|-sda1              vfat              93DA-8C20                              /boot/efi
|-sda2              xfs               d5da486e-fdfe-4ad8-bc01-aa72b91fd47d   /boot
|-sda3
`-sda4              LVM2_member       pdSI2Q-ZEzV-oT6P-R2JG-ZW3h-cmnf-iRN6pU
  |-rootvg-tmplv    xfs               9098eb05-0176-4997-8132-9152a7bef207   /tmp
  |-rootvg-usrlv    xfs               2f9ff36c-742d-4914-b463-d4152801b95d   /usr
  |-rootvg-optlv    xfs               aeacea8e-3663-4569-af25-c52357f8a0a3   /opt
  |-rootvg-homelv   xfs               a79e43dc-7adc-41b4-b6e1-4e6b033b15c0
  |-rootvg-varlv    xfs               c7cb68e9-7865-4187-b3bd-e9a869779d86   /var
  `-rootvg-rootlv   xfs               d8dc4d62-ada5-4952-a0d9-1bce6cb6f809   /
sdb
`-sdb1              ext4              1dac7c4c-bf8e-4964-8a59-7359eef53d0a   /mnt
sdc                 LVM2_member       CRWEZQ-iLhH-ev0b-BAaA-dfLD-nbPT-GgtG0r
`-vgapp-lvapp       xfs               733e25ee-565f-4bfa-a2a1-2451efd25cd1
sdd
`-sdd1              ext4              704d9fb1-2207-4bb9-998c-029f776dc6d2   /opt/data
Here are some important points in the output:
- By using the ASCII art display, you can see that there are LVM volumes present because there's an LVM2_member FSTYPE for sda4 containing objects with names such as rootvg-homelv.
- rootvg-homelv isn't mounted, which is denoted by the empty MOUNTPOINT field.
- rootvg-homelv has filesystem type XFS. This contrasts with the EXT4 mount error reported during boot. If the filesystem type is inconsistent, trust the lsblk output rather than the contents of fstab.
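The comparison between the type recorded in fstab and the type read from the disk can be sketched as follows. This is a minimal example, assuming the on-disk information was saved with `lsblk -P -o NAME,FSTYPE,UUID` (key="value" pairs, one device per line). The function name fstype_mismatches is hypothetical, and only fstab entries written as UUID=..., /dev/NAME, or /dev/mapper/NAME are matched. Entries written as /dev/vgname/lvname or by label are silently skipped.

```shell
# fstype_mismatches LSBLK_FILE [FSTAB]
# LSBLK_FILE holds the output of: lsblk -P -o NAME,FSTYPE,UUID
# Prints one line per fstab entry whose type differs from the on-disk type.
fstype_mismatches() {
  awk '
    NR == FNR {
      # First file: split on double quotes so q[2]=NAME, q[4]=FSTYPE, q[6]=UUID.
      split($0, q, "\"")
      if (q[4] != "") {
        ondisk["/dev/" q[2]] = q[4]
        ondisk["/dev/mapper/" q[2]] = q[4]
        if (q[6] != "") ondisk["UUID=" q[6]] = q[4]
      }
      next
    }
    NF == 0 || $1 ~ /^#/ { next }          # skip blank lines and comments
    ($1 in ondisk) && $3 != "auto" && ondisk[$1] != $3 {
      print $2 ": fstab has " $3 ", disk has " ondisk[$1]
    }
  ' "$1" "${2:-/etc/fstab}"
}
```

For example, after `lsblk -P -o NAME,FSTYPE,UUID > /tmp/lsblk.txt`, running `fstype_mismatches /tmp/lsblk.txt` prints nothing when every matched entry agrees with the disk.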
Select recovery mode
You can recover a VM online through emergency mode or single-user mode, or offline by using a rescue VM.
Requirements for online recovery
Serial Console access to the VM is available.
If emergency mode is used, the Serial Console must display an emergency mode prompt, the root account must be unlocked, and the password must be known.
If single-user mode is used, the root password isn't needed. Single-user mode may be used when the affected filesystem isn't a required system partition such as root (/).
Requirements for offline recovery
If the Serial Console requirements for online recovery can't be met, perform offline recovery by using a rescue VM. To perform offline recovery, the ability to create a VM and manage disks in Azure is required. Alternatively, you can use a functioning Linux VM with Azure-level access to the corrupted disks.
Prepare environment for online recovery
When the emergency mode is displayed in the sign-in prompt as follows, enter the root password:
Welcome to emergency mode! After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to try again to Give root password for maintenance (or press Control-D to continue):
If the root password isn't known, or the root account is locked, as in the following output, use single-user mode:
Welcome to emergency mode! After logging in, typ Cannot open access to console, the root account is locked. See sulogin(8) man page for more details. Press Enter to continue.
If the online recovery environment is unusable, proceed to offline recovery.
Prepare environment for offline recovery
In single disk VMs, or when the failing mount is a system partition such as the root filesystem (/) or /usr, the most reliable method to repair the disk is by using a rescue VM to gain access to the disk. You can create a rescue VM automatically or manually.
For automated creation of a rescue VM, see Azure Virtual Machine Repair. For manual creation of a rescue VM, see creating a recovery VM. In either case, don't mount the volumes from the problem disk because a filesystem must not be mounted for repair utilities to operate.
Perform filesystem repair
Before repairing the filesystem, ensure that the following steps have been completed:
- The problem disk and partition, or LVM volume structure, has been identified.
- The filesystem type has been determined.
- (Optional) A copy of the problem disk, or disks in a spanned LVM volume group, has been attached to a rescue VM.
- An interactive shell that has access to the disk has been secured.
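Because the repair utilities need the filesystem to be unmounted, one more check is worth running before you start. The following sketch reads the kernel mount table. The function name is_mounted is hypothetical, and the comparison is a plain string match, so pass the device name exactly as the mount table shows it (for LVM volumes this is usually the /dev/mapper/VG-LV form, which is an assumption to confirm on your system).

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit status 0) if DEVICE appears as a source in the mount table.
is_mounted() {
  awk -v dev="$1" '
    $1 == dev { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "${2:-/proc/mounts}"
}
```

For example, `if is_mounted /dev/sdc1; then umount /dev/sdc1; fi` unmounts the filesystem only when it's currently mounted.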
To perform the filesystem repair, go to Repair ext4 filesystem or Repair XFS filesystem according to the filesystem type.
No matter what recovery mode is used, the commands to perform the filesystem repair are the same. The emergency shell may have limitations. If the commands aren't available in an emergency mode environment, or there are errors about unknown filesystem types, see Prepare environment for offline recovery.
The commands to repair the filesystem may not fix all errors. They work around disk corruptions, but data loss still may occur. Once the command output states that the filesystem is clean, reassemble the original VM with the repaired disk, and boot the VM to verify data.
In the following sections, /dev/sdc1 is the corrupted filesystem in raw mode, and the LV homelv in the VG rootvg is the LVM volume. Replace these values with those of the actual corrupted filesystem in all instances.
Repair ext4 filesystem
Use the fsck [-y] FILESYSTEM command to repair an ext4 filesystem. Specify the filesystem as a disk partition for a raw filesystem, for example /dev/sdc1, or as the LVM logical volume path, for example /dev/rootvg/homelv.
Here's a command output example:
[root@vm1dev ~]# fsck /dev/sdc1
fsck from util-linux 2.23.2
e2fsck 1.42.9 (28-Dec-2013)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
/dev/sdc1 was not cleanly unmounted, check forced.
Resize inode not valid. Recreate<y>? yes
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (23508, counted=23509).
Fix<y>? yes
Free blocks count wrong (8211645, counted=8211646).
Fix<y>? yes
/dev/sdc1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdc1: 11/2097152 files (0.0% non-contiguous), 176706/8388352 blocks
[root@vm1dev ~]#
The output shows that confirmation to modify the filesystem is requested three times. If there are many requests, press CTRL+C and restart fsck with the -y flag to assume "yes" to all questions. If any files are reported as being placed in lost+found, manually identify them and move them to their proper locations.
If some errors occur and are subsequently fixed, run the fsck command again. Repeat until the fsck command exits with the clean status. Refer to the following output as an example:
[root@vm1dev ~]# fsck /dev/sdc1
fsck from util-linux 2.23.2
e2fsck 1.42.9 (28-Dec-2013)
/dev/sdc1: clean, 11/2097152 files, 176706/8388352 blocks
[root@vm1dev ~]#
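The repeat-until-clean procedure can be sketched as a loop that relies on the exit status documented in fsck(8): 0 means no errors, 1 means errors were corrected, 2 means a reboot is advised, and 4 or higher means uncorrected errors or an operational problem. The function name repair_ext4 and the default limit of three runs are assumptions for illustration. The filesystem must be unmounted, and the snapshot advice given earlier still applies, because -y answers "yes" to every prompt.

```shell
# repair_ext4 DEVICE [MAX_RUNS]
# Runs "fsck -y" until it reports a clean filesystem or MAX_RUNS is reached.
repair_ext4() {
  dev=$1
  max=${2:-3}          # hypothetical default limit
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    rc=0
    fsck -y "$dev" || rc=$?
    case "$rc" in
      0) echo "clean after $attempt run(s)"
         return 0 ;;
      1|2|3) ;;        # errors were corrected: check again
      *) echo "fsck exit status $rc: manual attention needed" >&2
         return "$rc" ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "errors still reported after $max runs" >&2
  return 1
}
```

For example, `repair_ext4 /dev/sdc1` stops as soon as a run exits with status 0, and gives up early if fsck reports errors that it couldn't correct.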
Repair XFS filesystem
Here are commands to repair an XFS filesystem:
xfs_repair [-n] FILESYSTEM
xfs_repair [-L] FILESYSTEM
mount FILESYSTEM MOUNTPOINT
To repair an XFS filesystem, follow these steps:
Check filesystem errors by using the xfs_repair -n command, as follows:
xfs_repair -n /dev/rootvg/homelv
If the check succeeds, continue with the repair mode by removing the -n flag, which will try to fix any encountered errors, as follows:
xfs_repair /dev/rootvg/homelv
For XFS filesystems, journaled but uncommitted changes are dealt with by mounting the filesystem. If you encounter the following error during the troubleshooting, attempt a mount and view the results.
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed
If a recovery VM is used, create a directory for a temporary mount point, such as /recovery, and mount the filesystem. If the recovery environment is in emergency or single-user mode, mount the filesystem on its intended location. Refer to the following commands as examples:
mount /dev/rootvg/homelv /recovery
If the journaled changes aren't written when you mount the filesystem, use the -L flag to discard the journal, and then mount the filesystem as if all changes were successfully completed. When the -L flag is used, data loss will occur because the incomplete file operations recorded in the log are discarded.
xfs_repair -L /dev/rootvg/homelv
mount /dev/rootvg/homelv /recovery
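The XFS steps above can be combined into one cautious sequence. The following is a sketch, not a definitive procedure: the function name repair_xfs is hypothetical, the mount point is assumed to exist already, and any non-zero exit status from xfs_repair is treated as a reason to try a log replay by mounting and unmounting. The sketch never runs xfs_repair -L on its own, because that step discards data and is better left as a deliberate manual decision.

```shell
# repair_xfs DEVICE MOUNTPOINT
# Check, repair, and if the repair doesn't complete, try a log replay
# by mounting and unmounting before repairing once more.
repair_xfs() {
  dev=$1
  mnt=$2
  # Report problems without modifying anything; the status is ignored here.
  xfs_repair -n "$dev" || true
  rc=0
  xfs_repair "$dev" || rc=$?
  if [ "$rc" -ne 0 ]; then
    if mount "$dev" "$mnt" && umount "$mnt"; then
      rc=0
      xfs_repair "$dev" || rc=$?
    else
      echo "log replay by mount failed; xfs_repair -L would discard the log" >&2
      return 2
    fi
  fi
  return "$rc"
}
```

For example, on a recovery VM you might run `mkdir -p /recovery` followed by `repair_xfs /dev/rootvg/homelv /recovery`, and fall back to the -L flag only if the function reports that the log replay failed.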
Prevent boot failure
If the nofail option is specified when mounting filesystems, the corruption of a non-critical filesystem may not prevent Linux from booting fully. For more information about
nofail, see Mount the disk. Most mounts, aside from critical system mounts such as the root (/) and /var, can be done with nofail.
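To review which mounts could still block booting, you can list the fstab entries that don't use nofail. The following is a minimal sketch: the function name mounts_without_nofail is hypothetical, and the list of mount points that are skipped as system-critical (/, /boot, /boot/efi, /usr, /var) is an assumption that you should adjust to your image.

```shell
# mounts_without_nofail [FSTAB]
# Prints the mount point of each non-system entry that lacks the nofail option.
mounts_without_nofail() {
  awk '
    NF == 0 || $1 ~ /^#/ { next }     # skip blank lines and comments
    $3 == "swap" { next }             # swap entries are not filesystems to mount
    # Assumed system-critical mounts, which are left alone.
    $2 == "/" || $2 == "/boot" || $2 == "/boot/efi" || $2 == "/usr" || $2 == "/var" { next }
    ("," $4 ",") !~ /,nofail,/ { print $2 }
  ' "${1:-/etc/fstab}"
}
```

For example, `mounts_without_nofail` with no argument reads /etc/fstab, and an entry such as `UUID=<uuid> /data ext4 defaults,nofail 0 2` isn't listed because it already carries the option.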
Contact us for help
If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure community support.