virsh start kvm Failed to allocate 8589934592 B: Cannot allocate memory

Host environment:

Ubuntu 12.04 x64

root@digoal-PowerEdge-R610:/var/log/libvirt/qemu# free

             total       used       free     shared    buffers     cached

Mem:      98997784   11981272   87016512          0     207876    7051404

-/+ buffers/cache:    4721992   94275792

Swap:      8385924          0    8385924

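Eight virtual machines are configured on this host, each allocated 8 GB of memory.

Until recently they all started normally, and all eight could run at the same time.

Today that suddenly stopped working: only five can run at once.

Starting any of the remaining three fails with this error: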

root@digoal-PowerEdge-R610:/var/log/libvirt/qemu# virsh start centos-5.9-x64-03

error: Failed to start domain centos-5.9-x64-03
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/10
Failed to allocate 8589934592 B: Cannot allocate memory

root@digoal-PowerEdge-R610:/var/log/libvirt/qemu# less centos-5.9-x64-03.log

2013-04-08 13:10:16.867+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 8192 -smp 1,sockets=1,cores=1,threads=1 -name centos-5.9-x64-03 -uuid b819df34-d4a1-4ffb-8f8e-d05d2652fd26 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos-5.9-x64-03.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/data02/vm/centos-5.9-x64_03.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:fa:be:c5,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:5 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/10
Failed to allocate 8589934592 B: Cannot allocate memory
2013-04-08 13:10:17.446+0000: shutting down

root@digoal-PowerEdge-R610:/var/log/libvirt# less libvirtd.log 

2013-04-08 13:08:10.714+0000: 1449: error : qemuMonitorOpenUnix:295 : failed to connect to monitor socket: No such process
2013-04-08 13:08:10.714+0000: 1449: error : qemuProcessWaitForMonitor:1301 : internal error process exited while connecting to monitor: char device redirected to /dev/pts/10
Failed to allocate 8589934592 B: Cannot allocate memory
2013-04-08 13:10:17.446+0000: 1448: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-04-08 13:10:21.105+0000: 1448: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-04-08 13:10:24.590+0000: 1448: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer

less /var/log/syslog

Apr  8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.939326] device vnet5 entered promiscuous mode
Apr  8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.973190] virbr0: topology change detected, propagating
Apr  8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.973200] virbr0: port 6(vnet5) entered forwarding state
Apr  8 21:10:16 digoal-PowerEdge-R610 kernel: [31270.973215] virbr0: port 6(vnet5) entered forwarding state
Apr  8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.559270] virbr0: port 6(vnet5) entered disabled state
Apr  8 21:10:17 digoal-PowerEdge-R610 avahi-daemon[1283]: Withdrawing workstation service for vnet5.
Apr  8 21:10:17 digoal-PowerEdge-R610 NetworkManager[1335]:    SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/vnet5, iface: vnet5)
Apr  8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.563422] virbr0: port 6(vnet5) entered disabled state
Apr  8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.563686] device vnet5 left promiscuous mode
Apr  8 21:10:17 digoal-PowerEdge-R610 kernel: [31271.563690] virbr0: port 6(vnet5) entered disabled state
Apr  8 21:10:18 digoal-PowerEdge-R610 kernel: [31272.486100] type=1400 audit(1365426618.372:73): apparmor="STATUS" operation="profile_remove" name="libvirt-b819df34-d4a1-4ffb-8f8e-d05d2652fd26" pid=20960 comm="apparmor_parser"
Apr  8 21:10:20 digoal-PowerEdge-R610 kernel: [31274.438750] type=1400 audit(1365426620.329:74): apparmor="STATUS" operation="profile_load" name="libvirt-f4a8170a-436d-a731-a5c5-8cc4b59acb72" pid=20967 comm="apparmor_parser"
Apr  8 21:10:20 digoal-PowerEdge-R610 NetworkManager[1335]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/vnet5, iface: vnet5)
Apr  8 21:10:20 digoal-PowerEdge-R610 NetworkManager[1335]:    SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/vnet5, iface: vnet5): no ifupdown configuration found.
Apr  8 21:10:20 digoal-PowerEdge-R610 NetworkManager[1335]: <warn> /sys/devices/virtual/net/vnet5: couldn't determine device driver; ignoring...

The cause turned out to be the kernel's memory overcommit settings:

vm.overcommit_memory = 2

vm.overcommit_ratio = 50

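The host has about 104 GB of RAM plus swap in total, but with vm.overcommit_memory = 2 the kernel caps committed address space at swap plus overcommit_ratio percent of physical RAM. With the values from the free output above, that is roughly 8 GB + 50% x 94.4 GB, or about 55 GB.

The five running VMs commit 40 GB, and a PostgreSQL instance on the same host takes another 4 GB for shared_buffers.

Starting the sixth VM requests 8 GB more, bringing the total to 52 GB. Once the rest of the system's committed memory is added, the request for the sixth VM's 8 GB exceeds the limit,

so the VM cannot start.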
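
As a rough check of the numbers, the sketch below recomputes the commit limit from the free output above, using the rule quoted from the kernel documentation further down (swap plus overcommit_ratio percent of physical RAM). It assumes no huge pages are reserved, and it only counts the commitments named in this post (8 GiB per VM, 4 GiB for PostgreSQL); the function names are illustrative, and whatever else the host has committed is not shown in the post.

```python
# Values in KiB, copied from the `free` output earlier in the post.
MEM_TOTAL_KIB = 98997784
SWAP_TOTAL_KIB = 8385924
OVERCOMMIT_RATIO = 50  # vm.overcommit_ratio

KIB_PER_GIB = 1024 * 1024

# Commitments named in the post (assumed to be the dominant ones).
VM_GIB = 8
POSTGRES_GIB = 4


def commit_limit_kib(mem_total_kib, swap_total_kib, ratio_percent):
    """Commit limit under vm.overcommit_memory=2: swap + ratio% of RAM."""
    return swap_total_kib + mem_total_kib * ratio_percent // 100


def committed_gib(vm_count):
    """Memory committed by vm_count guests plus PostgreSQL, in GiB."""
    return vm_count * VM_GIB + POSTGRES_GIB


limit_kib = commit_limit_kib(MEM_TOTAL_KIB, SWAP_TOTAL_KIB, OVERCOMMIT_RATIO)
limit_gib = limit_kib / KIB_PER_GIB

print(f"commit limit: {limit_gib:.1f} GiB")
for vm_count in (5, 6):
    used = committed_gib(vm_count)
    print(f"{vm_count} VMs + PostgreSQL: {used} GiB, headroom {limit_gib - used:.1f} GiB")
```

Under these assumptions the limit comes out at about 55.2 GiB, so a sixth 8 GiB guest would leave only about 3.2 GiB for everything else on the host. That is consistent with the allocation failing once other processes are counted.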

Changing vm.overcommit_memory to 0 restored normal behavior.

sysctl -w vm.overcommit_memory=0

Update /etc/sysctl.conf as well so that the setting persists across reboots.
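
The persistent form of the setting is a line such as vm.overcommit_memory = 0 in /etc/sysctl.conf. A small helper like the one below can confirm that the file really carries the intended value. It is a minimal sketch: the function name is hypothetical, it handles only plain key = value lines and # or ; comments, and it does not look at files under /etc/sysctl.d.

```python
def sysctl_conf_value(conf_text, key):
    """Return the last value assigned to key in sysctl.conf-style text.

    Later assignments override earlier ones, matching the order in
    which the file is applied. Returns None if the key is not set.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()
    return value


def read_sysctl_conf(path="/etc/sysctl.conf"):
    """Read the configuration file as text (path is the usual default)."""
    with open(path) as conf_file:
        return conf_file.read()
```

For example, sysctl_conf_value(read_sysctl_conf(), "vm.overcommit_memory") should return "0" after the file has been edited. If it returns "2" or None, the old behavior may come back after a reboot.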

For a detailed explanation, see:

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

==============================================================

overcommit_memory:

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel attempts to estimate the amount
of free memory left when userspace requests more memory.

When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.

The default value is 0.

See Documentation/vm/overcommit-accounting and
security/commoncap.c::cap_vm_enough_memory() for more information.

==============================================================

overcommit_ratio:

When overcommit_memory is set to 2, the committed address
space is not permitted to exceed swap plus this percentage
of physical RAM.  See above.
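
On a running host, the kernel reports this limit and the current commitment as the CommitLimit and Committed_AS fields of /proc/meminfo (CommitLimit is shown in every mode but is only enforced when overcommit_memory is 2). The sketch below estimates whether an allocation of a given size would still fit. The function names are illustrative, and the check is approximate because the kernel also holds back small reserves that are not modeled here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(meminfo):
    """KiB that can still be committed before CommitLimit is reached."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def can_commit(meminfo, request_bytes):
    """True if request_bytes fits under the limit (approximate check)."""
    return request_bytes <= commit_headroom_kib(meminfo) * 1024


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live values from the running kernel."""
    with open(path) as meminfo_file:
        return parse_meminfo(meminfo_file.read())
```

Before starting another guest, can_commit(read_meminfo(), 8589934592) gives a quick indication of whether the 8589934592-byte allocation from the error message is likely to succeed in strict mode.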
Time: 2024-08-01 14:31:31
