问题描述
- oracle 11g ORA-00445:每个月数据库都会死,pl/sql登录不上
-
每个月数据库都会无缘无故的死掉,重启数据库又正常了,下面是alert_orcl.log里面的日志,window服务器内存8G ,oracle版本11gR2,这个是什么原因导致的呢
Tue Oct 28 20:18:25 2014
Errors in file d:oraclediagrdbmsorclorcltraceorcl_cjq0_2628.trc (incident=65106):
ORA-00445: background process "J000" did not start after 120 seconds
Incident details in: d:oraclediagrdbmsorclorclincidentincdir_65106orcl_cjq0_2628_i65106.trc
kkjcre1p: unable to spawn jobq slave process
Errors in file d:oraclediagrdbmsorclorcltraceorcl_cjq0_2628.trc:
Tue Oct 28 20:18:28 2014
Trace dumping is performing id=[cdmp_20141028201828]
Tue Oct 28 20:20:16 2014
Thread 1 cannot allocate new log, sequence 6830
Private strand flush not complete
Current log# 1 seq# 6829 mem# 0: D:ORACLEORADATAorclREDO01.LOG
Thread 1 advanced to log sequence 6830 (LGWR switch)
Current log# 2 seq# 6830 mem# 0: D:ORACLEORADATAorclREDO02.LOG
下面是重启的参数:
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Using parameter settings in server-side spfile D:ORACLEPRODUCT11.2.0DBHOME_1DATABASESPFILEorcl.ORA
System parameters with non-default values:
processes = 1000
sga_max_size = 3104M
memory_target = 3504M
memory_max_target = 3504M
control_files = "D:ORACLEORADATAorclCONTROL01.CTL"
control_files = "D:ORACLEFLASH_RECOVERY_AREAorclCONTROL02.CTL"
db_block_size = 8192
compatible = "11.2.0.0.0"
db_recovery_file_dest = "D:oracleflash_recovery_area"
db_recovery_file_dest_size= 3912M
undo_tablespace = "UNDOTBS1"
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
audit_file_dest = "D:ORACLEADMINorclADUMP"
audit_trail = "DB"
db_name = "orcl"
open_cursors = 300
diagnostic_dest = "D:ORACLE"
Tue Oct 28 20:50:40 2014
解决方案
Anything that can go wrong will go wrong. :“凡事只要有可能出错,那就一定会出错。”
墨菲定律再次被验证,早上数据库宕机,日志如下:
Dump continued from file: /opt/ora11g/diag/rdbms/prodb/ORABJ/trace/ORABJ_cjq0_4189.trc
ORA-00445: background process "J000" did not start after 120 seconds
*** 2013-07-10 09:13:24.852
*** SESSION ID:(618.5) 2013-07-10 09:13:24.852
*** CLIENT ID:() 2013-07-10 09:13:24.852
*** SERVICE NAME:(SYS$BACKGROUND) 2013-07-10 09:13:24.852
*** MODULE NAME:() 2013-07-10 09:13:24.852
*** ACTION NAME:() 2013-07-10 09:13:24.852
Dump continued from file: /opt/ora11g/diag/rdbms/prodb/ORABJ/trace/ORABJ_cjq0_4189.trc
ORA-00445: background process "J000" did not start after 120 seconds
【ID 1379200.1】中对这个错误的描述:
What does this message mean ?
The message indicates that we failed to spawn a new process at the Operating System level to serve the request. There are various causes for this issue. This typically occurs when there is a shortage or misconfiguration in Operating System Resources, and thereby the problem should be investigated from an OS perspective. However there are a few causes related to the Oracle Database as well.
The default 120 seconds (after which Oracle times out) can be extended dynamically (without a database restart) by setting the following event:
$ sqlplus / as sysdba
alter system set events '10281 trace name context forever, level xxx';
-- where xxxxxx is the number of seconds to timeout at.
eg: alter system set events '10281 trace name context forever, level 300';
的确我们的硬件是有些问题,每隔半年就要重启一次,否则就会操作系统就会hang住。前几
次到半年人工重启了机器,没造成事故。这次过了半年,由于停机申请还在走审批流程,可
是机器等不到那一天了,于是最后一根稻草压垮了它。庆幸的是在我休假前它挂了,如果我
在火车上它挂掉了,后果就……
按照文档【ID 1379200.1】中所说的检查了操作系统的参数设置,发现有些参数设置有问题。
但这些参数的调整需要经过严格的测试和验证,才能在生产上进行。况且这篇文章最后的
更新日期是2013-5-13,说明Oracle也是刚刚发现这个问题不久,所以它推荐的方法也不能
轻易的尝试。
- kernel.randomize_va_space
Issues caused by the Linux feature Address Space Layout Randomization (ASLR
This problem is reported in Redhat 5 and Oracle 11.2.0.2. You can verify whether ASLR is being used as follows:
/sbin/sysctl -a | grep randomize
kernel.randomize_va_space = 1
If the parameter is set to any value other than 0 then ASLR is in use. Refer the document for details:
Note 1345364.1: ORA-00445: Background Process "xxxx" Did Not Start After 120 Seconds
The solution will be to disable ASLR
- Setting PGA_AGGREGATE_TARGET=TRUE
The parameter pga_aggregate_target is a numeric value not a boolean value and therefore must be set to a number for it to function correctly. By specifying it to a text string, we will try to convert it to a meaningful value but which may be insufficient for your environment
Solution: Properly set PGA_AGGREGATE_TARGET to a numeric value.
- Setting the PRE_PAGE_SGA to TRUE or Altering SGA_SIZE with PRE_PAGE_SGA set to TRUE
PRE_PAGE_SGA instructs Oracle to read the entire SGA into active memory at instance startup. Operating system page table entries are then prebuilt for each page of the SGA. This setting can increase the amount of time necessary for instance startup, but it is likely to decrease the amount of time necessary for Oracle to reach its full performance capacity after startup. PRE_PAGE_SGA can increase the process startup duration, because every process that starts must access every page in the SGA, this can cause the PMON process to take longer to start and exceed the timeout which is by default 120 seconds causing the instance startup to fail.
Setting PRE_PAGE_SGA to TRUE can increase the process startup duration, because every process that starts must access every page in the SGA, however overhead can be significant if your system frequently creates and destroys processes by, for example, continually logging on and logging off.
Check whether PRE_PAGE_SGA is set to TRUE
--OR--
Verify the generate trace for the occurance of function: ksmprepage()
Solution: Setting PRE_PAGE_SGA to FALSE will avoid this code executing so pages are only touched as needed rather than touching every single page when the process starts. This can avoid or minimize the problem from occuring however the underlying cause is still an Operating System resource shortage
参考文档:
-
Troubleshooting Guide (TSG) - ksvcreate: Process(xxxx) creation failed / ORA-00445: background process "xxxx" did not start after n seconds [ID 1379200.1]
-
Bug 9871302 - Windows: Cannot make new connection to database on Windows platforms with TNS-12560 [ID 9871302.8]
-
ORA-00445: Background Process "xxxx" Did Not Start After 120 Seconds [ID 1345364.1]
解决方案二:
参考链接
http://blog.csdn.net/lwei_998/article/details/9296001