PostgreSQL 内存OOM控制策略导致数据库无法启动的诊断一例(如何有效避免oom)

你可能遇到过类似的数据库无法启动的问题，
postgres@digoal-> FATAL:  XX000: could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 3322716160 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
LOCATION:  CreateAnonymousSegment, pg_shmem.c:398

通过查看meminfo可以得到原因。
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
              this is the total amount of  memory currently available to
              be allocated on the system. This limit is only adhered to
              if strict overcommit accounting is enabled (mode 2 in
              'vm.overcommit_memory').
              The CommitLimit is calculated with the following formula:
              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
              overcommit_ratio / 100 + [total swap pages]
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
              For more details, see the memory overcommit documentation
              in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
              The committed memory is a sum of all of the memory which
              has been allocated by processes, even if it has not been
              "used" by them as of yet. A process which malloc()'s 1G
              of memory, but only touches 300M of it will show up as
              using 1G. This 1G is memory which has been "committed" to
              by the VM and can be used at any time by the allocating
              application. With strict overcommit enabled on the system
              (mode 2 in 'vm.overcommit_memory'),allocations which would
              exceed the CommitLimit (detailed above) will not be permitted.
              This is useful if one needs to guarantee that processes will
              not fail due to lack of memory once that memory has been
              successfully allocated.

依据vm.overcommit_memory设置的值，
当vm.overcommit_memory=0时，不允许普通用户overcommit, 但是允许root用户轻微的overcommit。
当vm.overcommit_memory=1时，允许overcommit.
当vm.overcommit_memory=2时，Committed_AS不能大于CommitLimit。
commit 限制 计算方法
              The CommitLimit is calculated with the following formula:
              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
              overcommit_ratio / 100 + [total swap pages]
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
[root@digoal postgresql-9.4.4]# free
             total       used       free     shared    buffers     cached
Mem:       1914436     713976    1200460      72588      32384     529364
-/+ buffers/cache:     152228    1762208
Swap:      1048572     542080     506492
[root@digoal ~]# cat /proc/meminfo |grep Commit
CommitLimit:     2005788 kB
Committed_AS:     132384 kB
这个例子的2G就是以上公式计算得来。

overcommit限制的初衷是malloc后，内存并不是立即使用掉，所以如果多个进程同时申请一批内存的话，不允许OVERCOMMIT可能导致某些进程申请内存失败，但实际上内存是还有的。所以Linux内核给出了几种选择，2是比较靠谱或者温柔的做法。1的话风险有点大，因为可能会导致OOM。

所以当数据库无法启动时，要么你降低一下数据库申请内存的大小（例如降低shared_buffer或者max conn），要么就是修改一下overcommit的风格。

[参考]
1. kernel-doc-2.6.32/Documentation/filesystems/proc.txt
    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
              bits and the kernel binary code)
     MemFree: The sum of LowFree+HighFree
MemAvailable: An estimate of how much memory is available for starting new
              applications, without swapping. Calculated from MemFree,
              SReclaimable, the size of the file LRU lists, and the low
              watermarks in each zone.
              The estimate takes into account that the system needs some
              page cache to function well, and that not all reclaimable
              slab will be reclaimable, due to items being in use. The
              impact of those factors will vary from system to system.
              This line is only reported if sysctl vm.meminfo_legacy_layout = 0
     Buffers: Relatively temporary storage for raw disk blocks
              shouldn't get tremendously large (20MB or so)
      Cached: in-memory cache for files read from the disk (the
              pagecache).  Doesn't include SwapCached
  SwapCached: Memory that once was swapped out, is swapped back in but
              still also is in the swapfile (if memory is needed it
              doesn't need to be swapped out AGAIN because it is already
              in the swapfile. This saves I/O)
      Active: Memory that has been used more recently and usually not
              reclaimed unless absolutely necessary.
    Inactive: Memory which has been less recently used.  It is more
              eligible to be reclaimed for other purposes
   HighTotal:
    HighFree: Highmem is all memory above ~860MB of physical memory
              Highmem areas are for use by userspace programs, or
              for the pagecache.  The kernel must use tricks to access
              this memory, making it slower to access than lowmem.
    LowTotal:
     LowFree: Lowmem is memory which can be used for everything that
              highmem can be used for, but it is also available for the
              kernel's use for its own data structures.  Among many
              other things, it is where everything from the Slab is
              allocated.  Bad things happen when you're out of lowmem.
   SwapTotal: total amount of swap space available
    SwapFree: Memory which has been evicted from RAM, and is temporarily
              on the disk
       Dirty: Memory which is waiting to get written back to the disk
   Writeback: Memory which is actively being written back to the disk
   AnonPages: Non-file backed pages mapped into userspace page tables
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
      Mapped: files which have been mmaped, such as libraries
        Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
  PageTables: amount of memory dedicated to the lowest level of page
              tables.
NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
              storage
      Bounce: Memory used for block device "bounce buffers"
WritebackTmp: Memory used by FUSE for temporary writeback buffers
 CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
              this is the total amount of  memory currently available to
              be allocated on the system. This limit is only adhered to
              if strict overcommit accounting is enabled (mode 2 in
              'vm.overcommit_memory').
              The CommitLimit is calculated with the following formula:
              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
              overcommit_ratio / 100 + [total swap pages]
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
              For more details, see the memory overcommit documentation
              in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
              The committed memory is a sum of all of the memory which
              has been allocated by processes, even if it has not been
              "used" by them as of yet. A process which malloc()'s 1G
              of memory, but only touches 300M of it will show up as
              using 1G. This 1G is memory which has been "committed" to
              by the VM and can be used at any time by the allocating
              application. With strict overcommit enabled on the system
              (mode 2 in 'vm.overcommit_memory'),allocations which would
              exceed the CommitLimit (detailed above) will not be permitted.
              This is useful if one needs to guarantee that processes will
              not fail due to lack of memory once that memory has been
              successfully allocated.
VmallocTotal: total size of vmalloc memory area
 VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free

2. kernel-doc-2.6.32/Documentation/vm/overcommit-accounting
The Linux kernel supports the following overcommit handling modes

0       -       Heuristic overcommit handling. Obvious overcommits of
                address space are refused. Used for a typical system. It
                ensures a seriously wild allocation fails while allowing
                overcommit to reduce swap usage.  root is allowed to
                allocate slighly more memory in this mode. This is the
                default.

1       -       Always overcommit. Appropriate for some scientific
                applications.

2       -       Don't overcommit. The total address space commit
                for the system is not permitted to exceed swap + a
                configurable amount (default is 50%) of physical RAM.
                Depending on the amount you use, in most situations
                this means a process will not be killed while accessing
                pages but will receive errors on memory allocation as
                appropriate.

The overcommit policy is set via the sysctl `vm.overcommit_memory'.

The overcommit amount can be set via `vm.overcommit_ratio' (percentage)
or `vm.overcommit_kbytes' (absolute value).

The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.

Gotchas
-------

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much but it's a corner case if you really really care

In mode 2 the MAP_NORESERVE flag is ignored. 

How It Works
------------

The overcommit is based on the following rules

For a file backed map
        SHARED or READ-only     -       0 cost (the file is the map not swap)
        PRIVATE WRITABLE        -       size of mapping per instance

For an anonymous or /dev/zero map
        SHARED                  -       size of mapping
        PRIVATE READ-only       -       0 cost (but of little use)
        PRIVATE WRITABLE        -       size of mapping per instance

Additional accounting
        Pages made writable copies by mmap
        shmfs memory drawn from the same pool

Status
------

o       We account mmap memory mappings
o       We account mprotect changes in commit
o       We account mremap changes in size
o       We account brk
o       We account munmap
o       We report the commit status in /proc
o       Account and check on fork
o       Review stack handling/building on exec
o       SHMfs accounting
o       Implement actual limit enforcement

To Do
-----
o       Account ptrace pages (this is hard)

时间： 2025-01-07 04:01:51

PostgreSQL 内存OOM控制策略导致数据库无法启动的诊断一例(如何有效避免oom)的相关文章

oracle 11.1.0.7版本也会出现access$表丢失导致数据库无法启动

下面我们来看看oracle 11.1.0.7版本也会出现access$表丢失导致数据库无法启动,有需要了解的朋友可以进入看看吧. 有网友咨询数据库启动报ora-01092:ORACLE 实例终止.强制断开连接,请求帮忙处理数据库版本 Trace file d:appadministratordiagrdbmsorclorcltraceorcl_ora_5648.trc Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64b

SPFILE 错误导致数据库无法启动(ORA-01565)

--========================================== --SPFILE 错误导致数据库无法启动(ORA-01565) --========================================== SPFILE错误导致数据库无法启动 SQL> startup ORA-01078: failure in processing system parameters ORA-01565: error in identifying

内存安装错误导致系统无法启动

在开机的时候,系统会报警,然后电脑无法启动. 系统报警,可以通过报警声来初步判断故障所在.报警声得知是内存故障,打开机箱检查,内存和插槽没有明显故障,内存安装到其他电脑上工作正常.将主板上的内存更换插槽,故障就排除了. 有一些主板对内存的安装有限制,必须插在特定的插槽中才能正常使用,要留意主板对内存的要求.

memory_target设置不当导致数据库无法启动的问题

今天在做一个问题排查的时候碰到了另外一个有些"奇怪的"问题. 我们在测试库中已经禁用了SGA自动存储管理,结果在spfile文件里丢掉了shared_pool_size的配置测试环境的参数类似下面的样子 sga_max_size big integer 12000M sga_target big integer 0 shared_pool_size

oracle库存储问题导致数据库重启后无法正常启动恢复教程

某客户的核心数据库存储问题导致数据库重启后无法正常启动,根据客户反馈最开始在启动数据库时报错控制文件IO错误,如下: Sun Mar 15 11:59:37 2015 Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc: ORA-00204: error in reading (block 1, # blocks 1) of control file ORA-00202: control file:

oracle数据库强制关机导致数据库无法正常启动恢复例子

有客户qq找到我,说有朋友推荐,让我帮他们恢复数据库.由于强制关机后,数据库无法正常启动. 数据库recover database失败 Mon Mar 28 10:20:33 2016 ALTER DATABASE RECOVER database Media Recovery Start started logmerger process Parallel Media Recovery started with 32 slaves Mon Mar 28 10:20:36 2016 Rec

阿里云HybridDB for PostgreSQL内存与负载管理(resource queue)实践

标签 PostgreSQL , Greenplum , 阿里云HybridDB for PostgreSQL , 内存管理 , OOM , 操作系统内核参数 , 资源队列 , 数据库内存保护参数背景 Greenplum是一个重计算和重资源的MPP数据库,可谓有多少资源就能消耗多少资源,带来的好处是处理速度变快了,坏处就是容易用超. CPU.网络.硬盘用超的话,关系不大,因为大不了就是到硬件瓶颈,但是内存用超的话会带来较大的负面影响,例如操作系统OOM用户进程,导致数据库崩溃等. 如果要达到非常

PostgreSQL 9.6 引领开源数据库攻克多核并行计算难题

PostgreSQL 9.6 引领开源数据库攻克多核并行计算难题作者 digoal 日期 2016-10-01 标签 PostgreSQL , 9.6 , 并行计算 , 多核计算背景经过多年的酝酿(从支持work process到支持动态fork共享内存,再到内核层面支持并行计算),PostgreSQL 的多核并行计算功能终于在2016年发布的9.6版本中正式上线,为PG的scale up能力再次拔高一个台阶,标志着开源数据库已经攻克了并行计算的难题. 相信有很多小伙伴已经开始测试了. 在

[转]Apache+PHP3+PostgreSQL作为基于Web的数据库平台的安装

apache|web|数据|数据库发信人: xiaoning (我要飞), 信区: Linux 标题: Apache+PHP3+PostgreSQL作为基于Web的数据库平台的安发信站: BBS 水木清华站 (Thu Jan 20 16:00:28 2000) 以下将说明使用Apache+PHP3+PostgreSQL作为基于Web的数据库平台的安装和配置方法 . 关于Apache.PHP3和PostgreSQL的更多内容可以从软件的附带文档.Linux的HOWTO文件以及以下