适用于:✔️ Linux VM
错误的 HugePages 配置可能会导致 Linux 虚拟机(VM)中的启动问题,从而导致内存分配问题。 本文提供了启动问题的原因和解决方案。
先决条件
- 访问 Azure 串行控制台。
- 熟悉 Linux 命令和系统管理。
现象
由于 HugePages 配置不正确,可能会出现以下问题:
将 Linux VM 从本地迁移到 Azure 后,VM 无法正确启动。
缩小 Azure Linux VM 的大小后,它无法正确启动。
使用某些服务启动 Azure Linux VM 时,它无法正常启动。 登录提示显示缓慢,无法登录。
查看各种 Linux VM(Red Hat、Oracle、SUSE 或 Ubuntu)的串行控制台日志时,通常会发现以下问题:
- 无法启动 udev 内核设备管理器。
- 等待特定设备超时。
- 本地文件系统和紧急模式的依赖项故障。
- 由 systemd-udevd 服务或其他进程导致的内存不足错误。
- 无法启动各种网络和系统服务。
- 系统在登录或启动时挂起。
下面是一些日志示例:
输出 1
[FAILED] Failed to start udev Kernel Device Manager. See 'systemctl status systemd-udevd.service' for details. [FAILED] Failed to start udev Kernel Device Manager. See 'systemctl status systemd-udevd.service' for details. [ TIME ] Timed out waiting for device dev-di...\x2db5ea\x2dcc7e4fcafaad.device. [DEPEND] Dependency failed for /boot. [DEPEND] Dependency failed for Local File Systems. [DEPEND] Dependency failed for Relabel all filesystems, if necessary. [DEPEND] Dependency failed for Mark the need to relabel after reboot. [DEPEND] Dependency failed for Migrate local... structure to the new structure. [DEPEND] Dependency failed for /boot/efi. [ TIME ] Timed out waiting for device dev-ttyS0.device. [ TIME ] Timed out waiting for device dev-di...-azure_resource\x2dpart1.device. [DEPEND] Dependency failed for File System C...disk/cloud/azure_resource-part1. [ TIME ] Timed out waiting for device dev-disk-by\x2duuid-D147\x2d2497.device. [ OK ] Reached target Timers. [ OK ] Reached target Login Prompts. [ OK ] Reached target Cloud-init target. [ OK ] Reached target Network (Pre). [ OK ] Reached target Cloud-config availability. [ OK ] Reached target Network. [ OK ] Reached target Network is Online. [FAILED] Failed to start Crash recovery kernel arming. See 'systemctl status kdump.service' for details. [ OK ] Reached target Paths. [ OK ] Reached target Sockets. [FAILED] Failed to start Import network configuration from initramfs. See 'systemctl status rhel-import-state.service' for details. [FAILED] Failed to start Create Volatile Files and Directories. See 'systemctl status systemd-tmpfiles-setup.service' for details. [FAILED] Failed to start Emergency Shell. See 'systemctl status emergency.service' for details. [DEPEND] Dependency failed for Emergency Mode. [FAILED] Failed to start Tell Plymouth To Write Out Runtime Data. See 'systemctl status plymouth-read-write.service' for details. [FAILED] Failed to start Security Auditing Service. See 'systemctl status auditd.service' for details. [FAILED] Failed to start Update UTMP about System Boot/Shutdown. See 'systemctl status systemd-update-utmp.service' for details. [DEPEND] Dependency failed for Update UTMP about System Runlevel Changes. [ OK ] Started Monitoring of LVM2 mirrors,...ng dmeventd or progress polling. [ OK ] Reached target Local File Systems (Pre).
输出 2
[ 385.665208][ T709] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 385.675431][ T709] Call Trace: [ 385.678028][ T709] <TASK> [ 385.680574][ T709] dump_stack_lvl+0x45/0x5b [ 385.684222][ T709] dump_header+0x4a/0x220 [ 385.695420][ T709] oom_kill_process+0xe8/0x140 [ 385.710672][ T709] out_of_memory+0x113/0x580 [ 385.716986][ T709] __alloc_pages_slowpath.constprop.112+0x980/0xc70 [ 385.723770][ T709] __alloc_pages+0x2d9/0x320 [ 385.761856][ T709] pagecache_get_page+0x1a8/0x480 [ 385.774844][ T709] filemap_fault+0x4c1/0xa30 [ 385.779807][ T709] ? next_uptodate_page+0x11e/0x270 [ 385.785288][ T709] __xfs_filemap_fault+0x5c/0x280 [xfs 2321eaa9ab2ad44a628250c07d85c25988bf91a0] [ 385.805707][ T709] __do_fault+0x31/0xc0 [ 385.809628][ T709] __handle_mm_fault+0xd64/0x1220 [ 385.838902][ T709] ? switch_fpu_return+0x49/0xd0 [ 385.843779][ T709] handle_mm_fault+0xd7/0x290 [ 385.848516][ T709] do_user_addr_fault+0x1eb/0x730 [ 385.853696][ T709] exc_page_fault+0x67/0x150 [ 385.858809][ T709] ? asm_exc_page_fault+0x8/0x30 [ 385.865062][ T709] asm_exc_page_fault+0x1e/0x30 [ 385.879856][ T709] RIP: 0033:0x55dc84c10950 [ 385.891682][ T709] Code: Unable to access opcode bytes at RIP 0x55dc84c10926. [ 385.898248][ T709] RSP: 002b:00007ffeaf98d1e8 EFLAGS: 00010213 [ 385.904334][ T709] RAX: 00007f3402d98ce0 RBX: 000055dc84f47ff0 RCX: 00007ffeaf98d2e0 [ 385.918600][ T709] RDX: 0000000000000001 RSI: 0000000000000003 RDI: 000055dc84f47ff0 [ 385.926905][ T709] RBP: 00007ffeaf98d210 R08: 0000000016fae5c5 R09: 0000000000000002 [ 385.933967][ T709] R10: 00007ffeaf98d110 R11: 011d994f1f52633a R12: 00007ffeaf98d208 [ 385.942056][ T709] R13: 000055dc84f47620 R14: 0000000000000000 R15: 0000000000000001 [ 385.949403][ T709] </TASK> [ 385.952126][ T709] Mem-Info: [ 385.955048][ T709] active_anon:167 inactive_anon:10940 isolated_anon:0 [ 385.955048][ T709] active_file:15 inactive_file:362 isolated_file:0 [ 385.955048][ T709] unevictable:788 dirty:0 writeback:0 [ 385.955048][ T709] slab_reclaimable:7132 slab_unreclaimable:6898 [ 385.955048][ T709] mapped:448 shmem:5085 pagetables:634 bounce:0 [ 385.955048][ T709] free:29370 free_pcp:1347 free_cma:0 [ 385.993037][ T709] Node 0 active_anon:668kB inactive_anon:43760kB active_file:60kB inactive_file:3304kB unevictable:3152kB isolated(anon):0kB isolated(file):0kB mapped:2140kB dirty:0kB writeback:0kB shmem:20340kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4096kB writeback_tmp:0kB kernel_stack:2352kB pagetables:2536kB all_unreclaimable? yes [ 386.023392][ T709] Node 0 DMA free:14304kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 386.050587][ T709] lowmem_reserve[]: 0 888 7864 7864 7864 [ 386.056325][ T709] Node 0 DMA32 free:35408kB boost:0kB min:7616kB low:9520kB high:11424kB reserved_highatomic:0KB active_anon:0kB inactive_anon:12kB active_file:44kB inactive_file:60kB unevictable:0kB writepending:0kB present:1032128kB managed:966592kB mlocked:0kB bounce:0kB free_pcp:296kB local_pcp:296kB free_cma:0kB [ 386.083493][ T709] lowmem_reserve[]: 0 0 6976 6976 6976 [ 386.088634][ T709] Node 0 Normal free:64832kB boost:0kB min:59836kB low:74792kB high:89748kB reserved_highatomic:4096KB active_anon:668kB inactive_anon:43748kB active_file:56kB inactive_file:4392kB unevictable:3152kB writepending:0kB present:7340032kB managed:7151896kB mlocked:80kB bounce:0kB free_pcp:5096kB local_pcp:4848kB free_cma:0kB [ 386.119264][ T709] lowmem_reserve[]: 0 0 0 0 0 [ 386.124090][ T709] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14304kB [ 386.139306][ T709] Node 0 DMA32: 12*4kB (UME) 10*8kB (UME) 5*16kB (UME) 4*32kB (UME) 4*64kB (UME) 4*128kB (UME) 4*256kB (UME) 1*512kB (E) 2*1024kB (ME) 3*2048kB (UME) 6*4096kB (M) = 35408kB [ 386.158584][ T709] Node 0 Normal: 284*4kB (UME) 246*8kB (UME) 268*16kB (UME) 212*32kB (UME) 141*64kB (UME) 77*128kB (ME) 36*256kB (UME) 30*512kB (UME) 9*1024kB (UM) 0*2048kB 0*4096kB = 66848kB [ 386.183411][ T709] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 386.194414][ T709] Node 0 hugepages_total=3833 hugepages_free=3833 hugepages_surp=0 hugepages_size=2048kB [ 386.205473][ T709] 5679 total pagecache pages [ 386.210473][ T709] 0 pages in swap cache [ 386.214075][ T709] Swap cache stats: add 0, delete 0, find 0/0 [ 386.221319][ T709] Free swap = 0kB [ 386.226581][ T709] Total swap = 0kB [ 386.230305][ T709] 2097038 pages RAM [ 386.234587][ T709] 0 pages HighMem/MovableOnly [ 386.239784][ T709] 63576 pages reserved [ 386.243950][ T709] 0 pages cma reserved [ 386.248253][ T709] 0 pages hwpoisoned [ 386.252967][ T709] Tasks state (memory values in pages): [ 386.259648][ T709] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 386.270760][ T709] [ 709] 0 709 19824 674 188416 0 -250 systemd-journal [ 386.283947][ T709] [ 720] 0 720 21762 503 192512 0 -1000 systemd-udevd [ 386.293745][ T709] [ 800] 0 800 24367 93 86016 0 -1000 auditd [ 386.303305][ T709] [ 816] 499 816 12221 186 139264 0 -900 dbus-daemon [ 386.313051][ T709] [ 818] 0 818 1943 180 61440 0 0 hv_kvp_daemon [ 386.325447][ T709] [ 820] 0 820 23486 65 86016 0 0 irqbalance [ 386.336826][ T709] [ 825] 489 825 69833 98 159744 0 0 nscd [ 386.347420][ T709] [ 854] 0 854 17766 240 180224 0 0 systemd-logind [ 386.357018][ T709] [ 973] 0 973 10708 313 131072 0 0 wickedd-auto4 [[0;32m OK [[ 386.369228][ T709] [ 985] 0 985 10709 317 122880 0 0 wickedd-dhcp4 [ 386.380633][ T709] [ 986] 0 986 10709 314 122880 0 0 wickedd-dhcp6 0m] Started [0;[ 386.393786][ T709] [ 989] 0 989 10738 345 131072 0 0 wickedd 1;39mBackup of /[ 386.404982][ T709] [ 1002] 0 1002 10715 326 126976 0 0 wickedd-nanny etc/sysconfig[0[ 386.417223][ T709] [ 1048] 0 1048 3847 79 73728 0 0 bash m.[ 386.428454][ T709] [ 1053] 0 1053 23636 109 192512 0 0 gc_linux_servic [ 386.441650][ T709] [ 1063] 0 1063 123508 1761 143360 0 -1000 auomscollect [ 386.452738][ T709] [ 1080] 0 1080 3114 75 69632 0 0 netconfig [ 386.465579][ T709] [ 1082] 0 1082 3408 364 69632 0 0 netconfig [ 386.480799][ T709] [ 1103] 0 1103 3408 364 61440 0 0 netconfig [ 386.491063][ T709] [ 1104] 0 1104 3213 172 73728 0 0 dns-resolver [ 386.508797][ T709] [ 1105] 0 1105 3213 172 65536 0 0 dns-resolver [ 386.520964][ T709] [ 1106] 0 1106 3213 173 65536 0 0 dns-resolver [ 386.531198][ T709] [ 1107] 0 1107 7028 29 86016 0 0 systemctl [ 386.542406][ T709] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=netconfig,pid=1082,uid=0 [ 386.559765][ T709] Out of memory: Killed process 1082 (netconfig) total-vm:13632kB, anon-rss:1456kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:68kB oom_score_adj:0 [ 386.582247][ T32] oom_reaper: reaped process 1082 (netconfig), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [[0;32m OK [0m] Started [0;1;39mCheck if mainboard battery is Ok[0m.
输出 3
[ 0.847506] Kernel panic - not syncing: System is deadlocked on memory [ 0.847506] [ 0.851071] CPU: 4 PID: 1 Comm: systemd Not tainted 4.18.0-553.22.1.el8_10.x86_64 #1 [ 0.854342] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 08/23/2024 [ 0.859012] Call Trace: [ 0.860073] dump_stack+0x41/0x60 [ 0.861499] panic+0xe7/0x2ac [ 0.862980] out_of_memory.cold.36+0x5e/0x7e [ 0.864883] __alloc_pages_slowpath+0xbf0/0xcd0 [ 0.867007] __alloc_pages_nodemask+0x2e2/0x330 [ 0.868934] new_slab+0x3f4/0x4f0 [ 0.870369] ? srso_alias_return_thunk+0x5/0xfcdfd [ 0.872421] ? pcpu_block_refresh_hint+0x7f/0xb0 [ 0.874357] ___slab_alloc+0x3a3/0x950 [ 0.875968] ? __kernfs_new_node+0x5d/0x1e0 [ 0.877790] ? srso_alias_return_thunk+0x5/0xfcdfd [ 0.880110] ? __kernfs_iattrs.isra.7+0x3b/0xc0 [ 0.882048] ? kernfs_xattr_get+0x1d/0x50 [ 0.883778] ? __kernfs_new_node+0x5d/0x1e0 [ 0.885589] kmem_cache_alloc+0x252/0x280 [ 0.887343] __kernfs_new_node+0x5d/0x1e0 [ 0.889081] ? string+0x44/0x60 [ 0.890452] ? srso_alias_return_thunk+0x5/0xfcdfd [ 0.892504] ? vsnprintf+0x340/0x520 [ 0.894049] kernfs_new_node+0x31/0x50 [ 0.895869] __kernfs_create_file+0x30/0xc0 [ 0.897924] cgroup_addrm_files+0x143/0x340 [ 0.900250] ? alloc_shrinker_info+0xca/0x100 [ 0.902675] css_populate_dir+0x86/0x140 [ 0.904900] cgroup_apply_control_enable+0xed/0x310 [ 0.907558] cgroup_mkdir+0x1c2/0x490 [ 0.909521] kernfs_iop_mkdir+0x5a/0xa0 [ 0.911781] vfs_mkdir+0x11b/0x1e0 [ 0.913579] do_mkdirat+0xec/0x120 [ 0.915356] do_syscall_64+0x5b/0x1a0 [ 0.917315] entry_SYSCALL_64_after_hwframe+0x66/0xcb [ 0.919947] RIP: 0033:0x7fd43f794d4b [ 0.921829] Code: 73 01 c3 48 8b 0d 3d 71 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 71 39 00 f7 d8 64 89 01 48 [ 0.932128] RSP: 002b:00007ffd7e0fac28 EFLAGS: 00000206 ORIG_RAX: 0000000000000053 [ 0.936149] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd43f794d4b [ 0.939935] RDX: 00007ffd7e0faaf0 RSI: 00000000000001ed RDI: 000055b707c3c410 [ 0.945738] RBP: 00007ffd7e0fac70 R08: 0000000000000022 R09: 002f70756f726763 [ 0.949630] R10: 0000000000000100 R11: 0000000000000206 R12: 000055b707c1e500 [ 0.953284] R13: 00000000000000a0 R14: 00007fd441021a7e R15: 0000000000000020 [ 0.958110] Kernel Offset: 0x35000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 0.964040] fbcon: Taking over console [ 0.966128] ---[ end Kernel panic - not syncing: System is deadlocked on memory [ 0.966128] ]--
原因
出现此问题的原因是所有可用内存都保留给 HugePages,因此服务器没有足够的可用内存来启动和执行进程。
例如,如果在 /etc/sysctl.conf 文件中将 HugePages 配置参数vm.nr_hugepages
设置为 65536,而 VM 只有 16 GB 的总内存,则此配置消耗 134,217,728 KB 内存,超过 VM 内存,导致内存不足(OOM)条件。
解决方法
重要
查看 VM 具有多少内存以及设置了多少 HugePages。 例如,在某些情况下,如果系统需要 16 GB 的 HugePages,并且 VM 只有 16 GB RAM,则内存不足导致 VM 无法启动。 在这种情况下,建议将 VM 大小升级到至少 32 GB RAM。 有关 VM 和数据库需要的 HugePages 的详细信息,请参阅 Oracle Community。
要解决该问题,请执行以下步骤:
使用以下方法之一访问 VM:
如果串行控制台正常工作,请使用 Azure 串行控制台 在单用户模式下启动 VM。
使用 Azure VM 修复命令修复 Linux VM。
(脱机方法) 创建救援 VM 以手动修复 VM 启动问题。
如果在步骤 1 中使用 Azure VM 修复命令或脱机方法,请按照 Linux 救援 VM 中的 Chroot 环境中引入的 chroot 过程进行操作。
使用文本编辑器打开 /etc/sysctl.conf 或 /etc/sysctl.conf.d/*/ 文件,例如
vi
:sudo vi /etc/sysctl.conf
找到设置
vm.nr_hugepages
参数的位置:vm.nr_hugepages=<the number of HugePages>
注释掉行或调整
vm.nr_hugepages
值以平衡性能和内存分配。保存更改并退出文本编辑器。
根据操作系统分发手动 重新生成缺少的 initramfs。
重新启动 VM 以应用更改:
sudo reboot
监视 VM,确保其正确启动,服务按预期启动。
联系我们寻求帮助
如果你有任何疑问或需要帮助,请创建支持请求或联系 Azure 社区支持。 你还可以将产品反馈提交到 Azure 反馈社区。