BUG 10008092 caused instance crash

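This morning a colleague emailed me about a two-node RAC in which the instance on one node had been restarted. The alert log shows:
###### Node 1, 02:23:55 2011 ######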

Sat Dec  3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec  3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec  3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec  3 02:23:56 2011
###### Node 1 crash ######

Sat Dec  3 02:23:57 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_smon_12876.trc:
ORA-00469: CKPT process terminated with error
Sat Dec  3 02:23:58 2011
Shutting down instance (abort)
License high water mark = 55
Sat Dec  3 02:24:00 2011

From the above, the checkpoint process CKPT ran into a problem, which caused the instance to crash.
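To make that reading of the alert log easier to repeat, here is a minimal Python sketch that groups alert-log lines by timestamp and picks out the first entry reporting ORA-00469. The excerpt is copied from the log above; the function name `summarize_alert_log` and the regular expressions are my own assumptions about the 10.2 alert-log layout, not an Oracle-provided tool.

```python
import re

# Excerpt copied from the alert log shown above.
ALERT_LOG = """\
Sat Dec  3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec  3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec  3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec  3 02:23:58 2011
Shutting down instance (abort)
"""

# Assumed line shapes, inferred from the excerpt above.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
TRACE_FILE = re.compile(r"^Errors in file (\S+):$")
ORA_CODE = re.compile(r"^(ORA-\d{5}):")


def summarize_alert_log(text):
    """Return one dict per timestamped entry: time, trace file, ORA codes."""
    entries = []
    current = None
    for line in text.splitlines():
        if TIMESTAMP.match(line):
            # A timestamp line starts a new entry.
            current = {"time": line, "trace": None, "codes": []}
            entries.append(current)
            continue
        if current is None:
            continue
        m = TRACE_FILE.match(line)
        if m:
            current["trace"] = m.group(1)
            continue
        m = ORA_CODE.match(line)
        if m:
            current["codes"].append(m.group(1))
    return entries


entries = summarize_alert_log(ALERT_LOG)
first = next(e for e in entries if "ORA-00469" in e["codes"])
print(first["time"], first["trace"])
```

The first ORA-00469 entry points at the PMON trace file, which is why that trace is the next thing to read.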
###### Node 1 PMON process trace: ######

*** 2011-12-03 02:23:55.731
Background process CKPT found dead
Oracle pid = 24
OS pid (from detached process) = 12869
OS pid (from process state) = 12869
dtp = c000000040016e40, proc = c0000004950057c8
Dump of memory from 0xC000000040016E40 to 0xC000000040016E88
C000000040016E40 00000076 00000000 C0000004 950057C8  [...v..........W.]
C000000040016E50 00000000 00000000 00000000 434B5054  [............CKPT]
C000000040016E60 00020000 00000000 00003245 00000000  [..........2E....]
....................
....................
....................
....................
        Repeat 13 times
C000000495005CF0 6F726163 6C650000 00000000 00000000  [oracle..........]
C000000495005D00 00000000 00000000 00000000 00000000  [................]
C000000495005D10 00000000 00000006 6A6C6372 6D310000  [........jlcrm1..]
C000000495005D20 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
C000000495005D50 00000000 00000000 00000000 00000006  [................]
C000000495005D60 554E4B4E 4F574E00 00000000 00000000  [UNKNOWN.........]
C000000495005D70 00000000 00000000 00000000 00000000  [................]
C000000495005D80 00000000 00000008 31323836 39000000  [........12869...]
C000000495005D90 00000000 00000000 00000000 00000000  [................]
C000000495005DA0 00000000 00000005 6F726163 6C65406A  [........oracle@j]
C000000495005DB0 6C63726D 31202843 4B505429 00000000  [lcrm1 (CKPT)....]
C000000495005DC0 00000000 00000000 00000000 00000000  [................]
....................
....................
....................
....................
C000000495005FA0 00000000 00000000 00000000 00001308  [................]
C000000495005FB0 00000006 00000000                    [........]

error 469 detected in background process
ORA-00469: CKPT process terminated with error
*** 2011-12-03 02:24:07.798
ksuitm: waiting up to [5] seconds before killing DIAG
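
The bracketed column in the memory dump is just the ASCII rendering of the hex words, which is how the process name can be read straight out of the dump. A small Python sketch of that decoding follows; `decode_dump_line` is a hypothetical helper, and it assumes the bytes read left to right as they do in this trace (dumps from other platforms may order bytes differently).

```python
def decode_dump_line(line):
    """Decode one 'ADDRESS word word word word [ascii]' dump line.

    Returns (address, text), with non-printable bytes shown as '.'.
    The layout is inferred from the PMON trace above.
    """
    # Drop the trailing "[....]" rendering, then split address from hex words.
    hex_part = line.split("[", 1)[0]
    address, *words = hex_part.split()
    raw = bytes.fromhex("".join(words))
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return address, text


# Two lines copied from the trace above.
lines = [
    "C000000040016E50 00000000 00000000 00000000 434B5054  [............CKPT]",
    "C000000495005DB0 6C63726D 31202843 4B505429 00000000  [lcrm1 (CKPT)....]",
]
for line in lines:
    address, text = decode_dump_line(line)
    print(address, text)
```

Both lines decode to text containing CKPT, consistent with the "Background process CKPT found dead" message at the top of the trace.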

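My colleague confirmed that no DIAG trace, and not even a CKPT trace, was generated. This closely matches the description of bug 10008092; both the version and the diagnostic analysis line up. The rough sequence is:

CKPT process dies (possibly a hang) --> PMON cleanup --> CKPT is a critical background process, so PMON crashes the instance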
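
The "no CKPT or DIAG trace" check can be scripted. The sketch below is only an illustration: it assumes trace files follow the `<sid>_<process>_<pid>.trc` naming seen in the alert log above, `find_traces` is a hypothetical helper, and the demo runs against a throwaway directory rather than a real bdump directory.

```python
import tempfile
from pathlib import Path


def find_traces(bdump_dir, sid, process):
    """List trace files named <sid>_<process>_<pid>.trc in bdump_dir."""
    return sorted(p.name for p in Path(bdump_dir).glob(f"{sid}_{process}_*.trc"))


# Demo directory that mimics the situation in this post: PMON, SMON and
# a job process left traces, CKPT and DIAG did not.
with tempfile.TemporaryDirectory() as bdump:
    for name in ("crmdb1_pmon_12765.trc", "crmdb1_smon_12876.trc", "crmdb1_j000_8539.trc"):
        (Path(bdump) / name).touch()
    report = {
        proc: find_traces(bdump, "crmdb1", proc)
        for proc in ("pmon", "smon", "ckpt", "diag")
    }

for proc, files in report.items():
    print(proc, files or "no trace file")
```

On a real system the first argument would be the instance's bdump directory, for example the `/oracle/admin/crmdb/bdump` path that appears in the alert log.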

With that in mind, the following information from the alert log is easy to explain:
*** SESSION ID:(1089.34028) 2011-12-03 02:23:56.172
kgefec: fatal error 0
*** 2011-12-03 02:23:56.172
ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LCK0' unexpectedly terminated with error 469
ORA-00469: CKPT process terminated with error
ORA-00469: CKPT process terminated with error
Current SQL statement for this session:
TRUNCATE TABLE DINF.TEMP1_IN_PDT_CM_USER
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
c00000006e89ab60       200  procedure DINF.P_IN_PDT_CM_USER
c00000009e142850         1  anonymous block
----- Call Stack Trace -----
Why is that? Because TRUNCATE TABLE triggers an object checkpoint.

The bug in question:


Bug 10008092: INSTANCE CRASH WITH ORA-00469: CKPT PROCESS TERMINATED WITH ERROR


Comments

  1. Could you post the description of this bug and what triggers it?
    I have no way to look up bugs myself. Thanks!


  3. Bug 10008092: INSTANCE CRASH WITH ORA-00469: CKPT PROCESS TERMINATED WITH ERROR
    
    --------------------------------------------------------------------------------
    
    
     Bug properties
    
    --------------------------------------------------------------------------------
    Type                B - Defect                         Fixed in product version  -
    Severity            2 - Severe Loss of Service         Product version           10.2.0.4
    Status              45 - Vendor OS Problem, to Filer   Platform                  197 - HP-UX Itanium
    Created             09-Aug-2010                        Platform version          11.31
    Updated             10-Aug-2010                        Base bug                  -
    Database version    10.2.0.4
    Affected platforms  Port-Specific
    Product source      Oracle
    
    
     Related products
    
    --------------------------------------------------------------------------------
    Product line        Oracle Database Products           Family                    Oracle Database
    Area                Oracle Database                    Product                   5 - Oracle Server - Enterprise Edition
    
    
    Hdr: 10008092 10.2.0.4 RDBMS 10.2.0.4 VOS BACK PROC PRODID-5 PORTID-197
    Abstract: INSTANCE CRASH WITH ORA-469: CKPT PROCESS TERMINATED WITH ERROR

    *** 08/09/10 02:37 am ***
    ----3-1979465411

    PROBLEM:
    --------
    Instance P2BL26C terminated by PMON with below errors in alert.log

    Sat Jul 31 22:28:41 2010
    Errors in file /logs/ORACLE/P2BL26C/bdump/p2bl26c_pmon_3936.trc:
    ORA-469: CKPT process terminated with error
    Sat Jul 31 22:28:41 2010
    PMON: terminating instance due to error 469
    Sat Jul 31 22:28:41 2010
    Errors in file /logs/ORACLE/P2BL26C/bdump/p2bl26c_smon_4077.trc:
    ORA-469: CKPT process terminated with error
    Instance terminated by PMON, pid = 3936

    Another instance DLSREP on same host terminated in similar fashion after 4 days of P2BL26C crash mentioned above.

    DIAGNOSTIC ANALYSIS:
    --------------------
    From the trace files it is evident that PMON terminated the instance after finding CKPT is dead. There was no response from CKPT. Nothing significant in the trace files to find out why CKPT was dead.

    p2bl26c_pmon_3936.trc shows:
    Background process CKPT found dead
    Oracle pid = 19
    OS pid (from detached process) = 4069
    OS pid (from process state) = 4069
    ....
    error 469 detected in background process
    ORA-469: CKPT process terminated with error

    p2bl26c_smon_4077.trc shows:
    error 469 detected in background process
    ORA-469: CKPT process terminated with error

    WORKAROUND:
    -----------
    Unknown

    RELATED BUGS:
    -------------
    8622257, 6489596

    REPRODUCIBILITY:
    ----------------
    Unknown

    TEST CASE:
    ----------

    STACK TRACE:
    ------------
    Not available

    SUPPORTING INFORMATION:
    -----------------------

    24 HOUR CONTACT INFORMATION FOR P1 BUGS:
    ----------------------------------------

    DIAL-IN INFORMATION:
    --------------------

    IMPACT DATE:
    ------------

    *** 08/09/10 02:38 am *** (CHG: Sta->16)
    *** 08/09/10 02:41 am ***
    *** 08/09/10 02:41 am ***
    *** 08/09/10 05:15 am ***
    *** 08/09/10 04:53 pm ***
    *** 08/09/10 09:36 pm ***
    *** 08/10/10 02:52 pm *** (CHG: Sta->45 G/P->P Asg->NEW OWNER)
    *** 08/10/10 02:52 pm ***
    
    
