主机环境 :
Ubuntu 12.04 x64
root@digoal-PowerEdge-R610:/var/log/libvirt/qemu# free
total used free shared buffers cached
Mem: 98997784 11981272 87016512 0 207876 7051404
-/+ buffers/cache: 4721992 94275792
Swap: 8385924 0 8385924
配置了8个虚拟机, 每个分配了8GB内存.
前段时间启动正常. 8个虚拟机可以同时启动.
但是今天突然之间不行了, 只能同时启动5台.
后面三台启动会报错 :
root@digoal-PowerEdge-R610:/var/log/libvirt/qemu# virsh start centos-5.9-x64-03
error: Failed to start domain centos-5.9-x64-03
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/10
Failed to allocate 8589934592 B: Cannot allocate memory
root@digoal-PowerEdge-R610:/var/log/libvirt/qemu# less centos-5.9-x64-03.log
2013-04-08 13:10:16.867+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 8192 -smp 1,sockets=1,cores=1,threads=1 -name centos-5.9-x64-03 -uuid b819df34-d4a1-4ffb-8f8e-d05d2652fd26 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos-5.9-x64-03.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/data02/vm/centos-5.9-x64_03.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:fa:be:c5,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:5 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/10
Failed to allocate 8589934592 B: Cannot allocate memory
2013-04-08 13:10:17.446+0000: shutting down
root@digoal-PowerEdge-R610:/var/log/libvirt# less libvirtd.log
2013-04-08 13:08:10.714+0000: 1449: error : qemuMonitorOpenUnix:295 : failed to connect to monitor socket: No such process
2013-04-08 13:08:10.714+0000: 1449: error : qemuProcessWaitForMonitor:1301 : internal error process exited while connecting to monitor: char device redirected to /dev/pts/10
Failed to allocate 8589934592 B: Cannot allocate memory
2013-04-08 13:10:17.446+0000: 1448: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-04-08 13:10:21.105+0000: 1448: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-04-08 13:10:24.590+0000: 1448: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
less /var/log/syslog
Apr 8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.939326] device vnet5 entered promiscuous mode
Apr 8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.973190] virbr0: topology change detected, propagating
Apr 8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.973200] virbr0: port 6(vnet5) entered forwarding state
Apr 8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.973215] virbr0: port 6(vnet5) entered forwarding state
Apr 8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.559270] virbr0: port 6(vnet5) entered disabled state
Apr 8 21:10:17 digoal-PowerEdge-R610 avahi-daemon[1283]: Withdrawing workstation service for vnet5.
Apr 8 21:10:17 digoal-PowerEdge-R610 NetworkManager[1335]: SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/vn
et5, iface: vnet5)
Apr 8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.563422] virbr0: port 6(vnet5) entered disabled state
Apr 8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.563686] device vnet5 left promiscuous mode
Apr 8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.563690] virbr0: port 6(vnet5) entered disabled state
Apr 8 21:10:18 digoal-PowerEdge-R610 kernel: [31272.486100] type=1400 audit(1365426618.372:73): apparmor="STATUS" operation="profil
e_remove" name="libvirt-b819df34-d4a1-4ffb-8f8e-d05d2652fd26" pid=20960 comm="apparmor_parser"
Apr 8 21:10:20 digoal-PowerEdge-R610 kernel: [31274.438750] type=1400 audit(1365426620.329:74): apparmor="STATUS" operation="profil
e_load" name="libvirt-f4a8170a-436d-a731-a5c5-8cc4b59acb72" pid=20967 comm="apparmor_parser"
Apr 8 21:10:20 digoal-PowerEdge-R610 NetworkManager[1335]: SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/vnet
5, iface: vnet5)
Apr 8 21:10:20 digoal-PowerEdge-R610 NetworkManager[1335]: SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/vnet5
, iface: vnet5): no ifupdown configuration found.
Apr 8 21:10:20 digoal-PowerEdge-R610 NetworkManager[1335]: <warn> /sys/devices/virtual/net/vnet5: couldn't determine device driver;
ignoring...
原因是设置了oom.
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
由于主机的内存+swap总计=104GB
5台虚拟机消耗40GB, 本地还开了一个PostgreSQL, shared_buffers消耗4GB.
开第六台虚拟机时申请8GB内存, 因此总计消耗52G. 加上系统其他的开销, 申请第六台虚拟机的8GB内存时已经超过了50%,
所以虚拟机就开不起来了.
将vm.overcommit_memory改为0, 恢复正常.
sysctl -w vm.overcommit_memory=0
修改/etc/sysctl.conf确保重启系统后依然生效.
具体的解释可参考 :
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
==============================================================
overcommit_memory:
This value contains a flag that enables memory overcommitment.
When this flag is 0, the kernel attempts to estimate the amount
of free memory left when userspace requests more memory.
When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.
When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.
This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.
The default value is 0.
See Documentation/vm/overcommit-accounting and
security/commoncap.c::cap_vm_enough_memory() for more information.
==============================================================
overcommit_ratio:
When overcommit_memory is set to 2, the committed address
space is not permitted to exceed swap plus this percentage
of physical RAM. See above.