Memory Controller Error Messages [Memo]

Error messages from the system log:

[root@hh-yun-compute-130125 ~]# cat /var/log/messages | grep -i error
Mar  1 04:58:05 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 04:58:06 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x16113a9000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 10:27:08 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 10:27:09 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x15e1c49000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 13:52:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 13:52:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x160e949000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a61000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a79000 => socket=1, Channel=2(mask=4), rank=0
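All of the EDAC lines above point at the same DIMM label. A small bash sketch like the following, which is an illustration and not part of the original workflow, can total the EDAC corrected-error reports per DIMM label from a syslog stream. It assumes the message layout shown above (`label "<name>": <n> ...`), and the function name `summarize_edac_ce` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the EDAC corrected-error (CE) reports per DIMM label.
# Assumes kernel lines shaped like:
#   EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): ...
# Usage: summarize_edac_ce < /var/log/messages
summarize_edac_ce() {
    awk '
        /EDAC MC[0-9]+: CE / {
            # Locate the quoted label; skip lines that do not carry one.
            if (match($0, /label "[^"]*"/) == 0) next
            label = substr($0, RSTART + 7, RLENGTH - 8)
            # The number right after the closing quote is the error count of this report.
            rest = substr($0, RSTART + RLENGTH)
            n = 1
            if (match(rest, /^: [0-9]+/)) n = substr(rest, 3, RLENGTH - 2) + 0
            total[label] += n
        }
        # One "label<TAB>total" line per DIMM.
        END { for (l in total) printf "%s\t%d\n", l, total[l] }
    ' | sort
}
```

Fed the five EDAC lines shown above, it should report a total of 5 for `CPU_SrcID#1_Channel#2_DIMM#0`; the `sbridge: HANDLING MCE MEMORY ERROR` lines are ignored.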

Reference information 2 (EDAC correctable-error counters):

[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc?/ce*count
0
0
8
0
[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc1/ce_count
8
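`cat mc?/ce*count` prints bare numbers without file names. Assuming the glob matched both `ce_count` and `ce_noinfo_count` in `mc0` and `mc1` (which would account for the four values, with the `8` belonging to `mc1/ce_count`, as the second command confirms), a labelled variant could look like the sketch below. The function `show_ce_counts` and its optional root argument are hypothetical; the argument only exists so the function can be pointed at a copy of the tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each EDAC memory-controller counter with its path,
# mirroring "cat .../mc?/ce*count" but keeping the file names.
# $1: EDAC sysfs root (defaults to the real location).
show_ce_counts() {
    local root="${1:-/sys/devices/system/edac/mc}"
    local f
    for f in "$root"/mc*/ce*count; do
        [ -r "$f" ] || continue   # the glob stays literal when nothing matches
        printf '%s %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
}
```

Called with no argument, `show_ce_counts` reads the live sysfs tree.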

Module information

[root@hh-yun-compute-130125 ~]# modinfo sb_edac
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/sb_edac.ko
description:    MC Driver for Intel Sandy Bridge and Ivy Bridge memory controllers -  Ver: 1.1.0
author:         Red Hat Inc. (http://www.redhat.com)
author:         Mauro Carvalho Chehab <mchehab@redhat.com>
license:        GPL
srcversion:     01CFEEBE911D55B6FE660BE
alias:          pci:v00008086d00002FA0sv*sd*bc*sc*i*
alias:          pci:v00008086d00000EA8sv*sd*bc*sc*i*
alias:          pci:v00008086d00003CA8sv*sd*bc*sc*i*
depends:        edac_core
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           edac_op_state:EDAC Error Reporting state: 0=Poll,1=NMI (int)

[root@hh-yun-compute-130125 ~]# modinfo edac_core
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/edac_core.ko
description:    Core library routines for EDAC reporting
author:         Doug Thompson www.softwarebitmaker.com, et al
license:        GPL
srcversion:     C21E296292A2174839A086C
depends:
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           check_pci_errors:Check for PCI bus parity errors: 0=off 1=on (int)
parm:           edac_pci_panic_on_pe:Panic on PCI Bus Parity error: 0=off 1=on (int)
parm:           edac_mc_panic_on_ue:Panic on uncorrected error: 0=off 1=on (int)
parm:           edac_mc_log_ue:Log uncorrectable error to console: 0=off 1=on (int)
parm:           edac_mc_log_ce:Log correctable error to console: 0=off 1=on (int)
parm:           edac_mc_poll_msec:Polling period in milliseconds
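`modinfo` only lists the parameters a module accepts. Their current values are normally exposed under `/sys/module/<module>/parameters/` for parameters registered with non-zero permissions, so a sketch like the one below can show, for example, whether `edac_mc_log_ce` (per its description, logging of correctable errors) is enabled. The function name `show_module_params` and its optional root argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print name=value for every readable parameter of a module.
# $1: module name (e.g. edac_core); $2: sysfs module root (defaults to /sys/module).
show_module_params() {
    local mod="$1" root="${2:-/sys/module}"
    local p
    for p in "$root/$mod/parameters"/*; do
        [ -r "$p" ] || continue   # skip the literal pattern left when nothing matches
        printf '%s=%s\n' "${p##*/}" "$(cat "$p")"
    done
}
```

For this host, `show_module_params edac_core` and `show_module_params sb_edac` would be the natural calls.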

Official explanation (kernel EDAC documentation):

Total Correctable Errors count attribute file:

	'ce_count'

	This attribute file displays the total count of correctable
	errors that have occurred on this csrow. This
	count is very important to examine. CEs provide early
	indications that a DIMM is beginning to fail. This count
	field should be monitored for non-zero values and report
	such information to the system administrator.
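Following the documentation's advice to watch `ce_count` for non-zero values, a cron-friendly check might look like the sketch below. It assumes the legacy per-csrow directories (`mcN/csrowM/ce_count`) exist on this kernel, which is consistent with the `CE row 2` wording in the log but not verified here; the function name `check_csrow_ce` is hypothetical. It returns 1 when any csrow has recorded a correctable error, so the caller can raise an alert.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn about every csrow whose correctable-error count is non-zero.
# $1: EDAC sysfs root (defaults to the real location).
# Returns 1 if at least one csrow has CEs, 0 otherwise.
check_csrow_ce() {
    local root="${1:-/sys/devices/system/edac/mc}"
    local f n rc=0
    for f in "$root"/mc*/csrow*/ce_count; do
        [ -r "$f" ] || continue   # no csrow directories matched
        n="$(cat "$f")"
        if [ "$n" -gt 0 ]; then
            echo "WARNING: ${f#"$root"/} = $n"
            rc=1
        fi
    done
    return "$rc"
}
```

A cron entry could then run `check_csrow_ce || <notify the administrator>`, with the notification mechanism left to the site.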

Enabling mcelog

[root@hh-yun-compute-130125 ~]# service mcelogd restart
Stopping mcelog                                     [  OK  ]
Starting mcelog daemon                              [  OK  ]
[root@hh-yun-compute-130125 ~]# mcelog
mcelog: Family 6 Model 3e CPU: only decoding architectural errors

Checking the log

[root@hh-yun-compute-130125 ~]# tail /var/log/mcelog
mcelog: failed to prefill DIMM database from DMI data
mcelog: mcelog server already running

Assessment

This is a harmless warning message. The DIMM database prefill relies on a specific non-standard format of the DIMMs in the DMI BIOS tables. If this format is not used by the BIOS, mcelog will only discover DIMMs as they get their first error (if the CPU reports DIMMs in machine check errors). Please understand for the most part, mcelog should be ignored.
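Since the prefill depends on what the BIOS puts in its DMI tables, one way to see what is actually there is to look at the SMBIOS memory-device records (type 17), e.g. with `dmidecode -t 17` run as root. The awk filter below is a sketch that lists populated slots as `Locator<TAB>Bank Locator<TAB>Size`; it assumes the usual `dmidecode` text layout (one `Handle` line per record, indented `Field: value` lines), and the function name `list_dimm_slots` is hypothetical. How these locators correspond to EDAC labels such as `CPU_SrcID#1_Channel#2_DIMM#0` is board-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list populated DIMM slots from "dmidecode -t 17" output.
# Usage: dmidecode -t 17 | list_dimm_slots
list_dimm_slots() {
    awk -F': ' '
        # Emit the record collected so far, skipping empty slots.
        function flush() {
            if (loc != "" && size !~ /No Module/) printf "%s\t%s\t%s\n", loc, bank, size
            loc = ""; bank = ""; size = ""
        }
        /^Handle /             { flush(); next }
        /^[ \t]+Size:/         { size = $2 }
        /^[ \t]+Locator:/      { loc = $2 }
        /^[ \t]+Bank Locator:/ { bank = $2 }
        END { flush() }
    '
}
```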

We therefore decided to ignore this warning.



Time: 2024-09-20 14:42:33
