lmon terminating the instance due to error 481

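Today a customer reported that the database instance behind one of their business systems had crashed and restarted. Analysis of the logs showed the following errors: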

Tue Oct 18 21:34:34 2022
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc  (incident=512156):
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xxxx/xxxx1/incident/incdir_512156/xxxx1_lmon_2491932_i512156.trc
Tue Oct 18 21:34:44 2022
Dumping diagnostic data in directory=[cdmp_20221018213444], requested by (instance=1, osid=2491932 (LMON)), summary=[incident=512156].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
LMON (ospid: 2491932): terminating the instance due to error 481
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (7210544) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (5047352) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (10618076) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
ORA-1092 : opitsk aborting process
Tue Oct 18 21:34:44 2022
ORA-1092 : opitsk aborting process
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (5441064) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
ORA-1092 : opitsk aborting process
Tue Oct 18 21:34:49 2022
Instance terminated by LMON, pid = 2491932
Tue Oct 18 21:34:53 2022
Starting ORACLE instance (normal)
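
In a long alert log, the line that names the terminating process is easy to miss. The following is a minimal Python sketch, not an Oracle-provided tool, that picks such lines out of alert log text. The function name `find_terminations` and the regular expression are assumptions based only on the line format shown in the excerpt above.

```python
import re

# Matches alert-log lines of the form seen above, for example:
#   LMON (ospid: 2491932): terminating the instance due to error 481
TERMINATION_RE = re.compile(
    r"(?P<process>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<error>\d+)"
)


def find_terminations(alert_log_text):
    """Return (process, ospid, error) tuples, one per termination line."""
    results = []
    for line in alert_log_text.splitlines():
        m = TERMINATION_RE.match(line.strip())
        if m:
            results.append(
                (m.group("process"), int(m.group("ospid")), int(m.group("error")))
            )
    return results


# A few lines copied from the alert log excerpt above.
sample = """\
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc:
LMON (ospid: 2491932): terminating the instance due to error 481
Tue Oct 18 21:34:49 2022
Instance terminated by LMON, pid = 2491932
"""

print(find_terminations(sample))
```

Run against the full alert log, this gives a quick list of which background process brought the instance down and with which error number, which is the starting point for picking the right trace file.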

As shown above, the instance was abnormally terminated by the LMON process. For the details we need to look further into the LMON trace:

*** SERVICE NAME:(SYS$BACKGROUND) 2022-10-18 21:34:34.668
*** MODULE NAME:() 2022-10-18 21:34:34.668
*** ACTION NAME:() 2022-10-18 21:34:34.668
Dump continued from file: /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 512156 (ORA 600 [kghstack_underflow_internal_2]) ========
*** 2022-10-18 21:34:34.691
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+40        bl       0000000109B4CD24     000000000 ? 000000001 ?
000000003 ? 000000000 ?
000000000 ? 000000001 ?
000000003 ? 000000000 ?
ksedst1()+112        call     skdstdst()           171F2D30C8558AB1 ?
4844284100000000 ?
FFFFFFFFFFF6500 ?
28E4DEBE4CBF3 ? 10A81AD8C ?
000000000 ? 11072A8C0 ?
2050033FFFF6508 ?
ksedst()+40          call     ksedst1()            000000000 ? 00000000A ?
000003000 ? 10A5BFFA8 ?
000000000 ? 000000000 ?
000002004 ? 000000001 ?
dbkedDefDump()+1516  call     ksedst()             000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ? 300000003 ?
ksedmp()+72          call     dbkedDefDump()       31072A8C0 ? 110000A60 ?
FFFFFFFFFFF6D10 ? 1106AC1B8 ?
100125838 ? FFFFFFFFFFF7730 ?
1000F0D94 ? 1106AC1B8 ?
ksfdmp()+100         call     ksedmp()             000000002 ? 000000000 ?
000000002 ? 10AAE5CB0 ?
10A07CFD0 ? 000000000 ?
1109D3E30 ? 11072A8C0 ?
dbgexPhaseII()+1904  call     ksfdmp()             000000000 ? 00000000A ?
000000002 ? 000000000 ?
000000002 ? 10A07CFC8 ?
000000000 ? 001050005 ?
dbgexProcessError()  call     dbgexPhaseII()       11072A8C0 ? 1109D2040 ?
+1556                                              00007D09C ? 200000000 ?
FFFFFFFFFFF7C28 ? 000000082 ?
000000000 ? 000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  11072A8C0 ? 1109D3E30 ?
()+72                                              1FFFFB6A0 ? 000000001 ?
000000703 ? 000000011 ?
000000006 ? 1109D5B78 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  000000000 ? 00A4D1050 ?
2044                          ()                   FFFFFFFFFFFFB210 ?
00A4D1050 ? 000000000 ?
90000000D6969D8 ? 000000000 ?
110000C58 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000003000 ? 10A5BFFA8 ?
68                                                 25800000002 ? 109E85570 ?
000000000 ? 000000000 ?
FFFFFFFFFFFBEE0 ? 11113A600 ?
kgeadse()+380        call     dbkePostKGE_kgsf()   102DA1484 ? 100000000 ?
FFFFFFFFFFFC0D8 ? 000000000 ?
110AED1A0 ? 1108EA610 ?
000000002 ? 700000000013680 ?
kgerinv_internal()+  call     kgeadse()            000000000 ? 000000000 ?
48                                                 000000000 ? 1700000010 ?
100000000 ? 000003000 ?
110D33350 ? 1108EA610 ?
kgerinv()+48         call     kgerinv_internal()   8311AABF3BAF ? 8311AABF3FD4 ?
8311AABF3BAF ? 8311AABF3BAF ?
000000000 ? 10A5A3090 ?
000000000 ? 000000000 ?
kgeasnmierr()+72     call     kgerinv()            000000000 ? 000000023 ?
000000001 ? 000000004 ?
000000000 ? 000000001 ?
110D33350 ? 110AED398 ?
kghstack_underflow_  call     kgeasnmierr()        000000000 ? FFFFFFFFFFFC100 ?
internal()+280                                     00000001E ? 100000001 ?
000000002 ? 1108E6388 ?
000000000 ? 000000000 ?
kghstack_free()+716  call     kghstack_underflow_  000000001 ? 08DBD1E85 ?
internal()           700011351BB7B48 ? 0000F4240 ?
000000000 ? 00000000A ?
000003000 ? 10A5BFFA8 ?
kccgrd()+264         call     kghstack_free()      FFFFFFFFFFFC0C0 ?
4224282B00000000 ?
103D2C888 ? 000004000 ?
500000005 ? C0000000C ?
400003000 ? 10A5BFFA8 ?
kjxgrf_rr_read()+66  call     kccgrd()             1FFFD02FAFF35E5 ? 110A5BD70 ?
0                                                  FFFFFFFFFFFC180 ? 000000000 ?
110A5BD70 ? 110FBCF48 ?
0037D6E50 ? 1106AC1B8 ?
kjxgrDD_rr_read()+1  call     kjxgrf_rr_read()     110A032D0 ? 700011342677E98 ?
04                                                 000000000 ? 000000001 ?
FFFFFFFFFFFC6A4 ? 110A03B38 ?
FFFFFFFFFFFC630 ?
42245280FFFFC790 ?
kjxgrimember()+124   call     kjxgrDD_rr_read()    000003000 ? 10A5BFFA8 ?
000000002 ? 700000000013680 ?
11011EAD0 ? FFFFFFFFFFFCD80 ?
000000001 ? 218DBD1E85 ?
kjxggpoll()+804      call     kjxgrimember()       FFFFFFFFFFFC6D0 ? 0000186A0 ?
101FECE90 ? 8311AABD8A40 ?
70000000000C0D0 ? 000000000 ?
000001568 ? 100000000 ?
kjfmact()+508        call     kjxggpoll()          000000000 ? 000000000 ?
000000000 ? 000000000 ?
FFFFFFFFFFFC7A0 ? 000000000 ?
1037BB124 ? 000000000 ?
kjfdact()+32         call     kjfmact()            11011EAD0 ? FFFFFFFFFFFCD80 ?
000000001 ? 000000000 ?
002050000 ? 001160000 ?
10896F91E ?
14616E27FFFFC930 ?
kjfcln()+2240        call     kjfdact()            000000000 ? 10A5B3014 ?
700011351BB7B48 ? 000000002 ?
700011351BB7B54 ? 000000004 ?
2FFFFF570 ? 200000002 ?
ksbrdp()+2216        call     kjfcln()             700000000013198 ?
7000000000131B4 ? 048245028 ?
000000E00 ? 1108B2310 ?
100638128 ? 000000001 ?
700000007 ?
opirip()+1620        call     ksbrdp()             FFFFFFFFFFFFEA7 ? 10B2ADCF0 ?
FFFFFFFFFFFDE50 ? 000000000 ?
000000001 ? 000000000 ?
01099067F ? 000000001 ?
opidrv()+608         call     opirip()             10AE1FAC0 ? 410134198 ?
FFFFFFFFFFFEFC0 ?
2F7530312F ? 108354684 ?
1106AC1B8 ?
7264626D732F6462 ?
1106AC1B8 ?
sou2o()+136          call     opidrv()             32067E1DB0 ? 4FFFFF388 ?
FFFFFFFFFFFEFC0 ?
25001D022C0000 ? 000000010 ?
1106AC1B8 ? 000000000 ?
000000000 ?
opimai_real()+188    call     sou2o()              FFFFFFFFFFFF030 ?
5524445B00000001 ?
9000000000DC64C ?
BADC0FFEE0DDF00D ?
000000003 ? 9001000A008DB98 ?
A0000000A000000 ? 10B6B6F40 ?
ssthrdmain()+276     call     opimai_real()        FFFFFFFFFFFF110 ?
9001000A0092DC0 ?
FFFFFFFFFFFF130 ? 10B6F72B8 ?
90000000008AB0C ?
9001000A008DB98 ?
FFFFFFFFFFFF110 ?
9001000A008DB98 ?
main()+204           call     ssthrdmain()         3F0003720 ? FFFFFFFFFFFF478 ?
FFFFFFFFFFFF4E0 ?
9FFFFFFF000D6F0 ?
9FFFFFFF00009E0 ? 000000000 ?
000000000 ? 9FFFFFFF000D6F0 ?
__start()+112        call     main()               000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
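
Comparing a call stack like the one above with a known bug signature by eye is error-prone, because long function names wrap onto a second line. Below is a minimal Python sketch, under stated assumptions, that extracts the calling function names from such a trace and checks whether a list of names appears in the same order. The helper names `extract_frames` and `matches_signature` are hypothetical, the regular expressions are inferred only from the layout of this one trace, and the sample keeps just the first argument columns of each frame.

```python
import re

# A frame line starts with the calling function, then "call" or "bl".
# The "()" may be missing when the name wraps onto the next line.
FRAME_RE = re.compile(r"(?P<name>[A-Za-z_]\w*)(?P<paren>\(\))?\S*\s+(?:call|bl)\s")
# A wrapped continuation starts with the rest of the name (maybe empty) and "()".
CONT_RE = re.compile(r"(?P<rest>[A-Za-z_]\w*)?\(\)")


def extract_frames(trace_text):
    """Return calling-function names in trace order (top of stack first)."""
    lines = trace_text.splitlines()
    frames = []
    for i, line in enumerate(lines):
        m = FRAME_RE.match(line)
        if not m:
            continue
        name = m.group("name")
        if not m.group("paren"):
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            c = CONT_RE.match(nxt)
            if c is None or FRAME_RE.match(nxt):
                continue  # not a real frame, e.g. the column header row
            name += c.group("rest") or ""
        frames.append(name)
    return frames


def matches_signature(frames, signature):
    """True if every signature name occurs in frames, in the same order."""
    remaining = iter(frames)
    return all(name in remaining for name in signature)


# Excerpt of the trace above (argument columns shortened).
SAMPLE_TRACE = """\
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
kghstack_underflow_  call     kgeasnmierr()        000000000 ? FFFFFFFFFFFC100 ?
internal()+280                                     00000001E ? 100000001 ?
kghstack_free()+716  call     kghstack_underflow_  000000001 ? 08DBD1E85 ?
internal()           700011351BB7B48 ? 0000F4240 ?
kccgrd()+264         call     kghstack_free()      FFFFFFFFFFFC0C0 ?
kjxgrf_rr_read()+66  call     kccgrd()             1FFFD02FAFF35E5 ? 110A5BD70 ?
0                                                  FFFFFFFFFFFC180 ? 000000000 ?
kjxgrDD_rr_read()+1  call     kjxgrf_rr_read()     110A032D0 ? 700011342677E98 ?
04                                                 000000000 ? 000000001 ?
kjxgrimember()+124   call     kjxgrDD_rr_read()    000003000 ? 10A5BFFA8 ?
kjxggpoll()+804      call     kjxgrimember()       FFFFFFFFFFFC6D0 ? 0000186A0 ?
kjfmact()+508        call     kjxggpoll()          000000000 ? 000000000 ?
kjfdact()+32         call     kjfmact()            11011EAD0 ? FFFFFFFFFFFCD80 ?
kjfcln()+2240        call     kjfdact()            000000000 ? 10A5B3014 ?
ksbrdp()+2216        call     kjfcln()             700000000013198 ?
"""

# Function names to look for, in stack order.
SIGNATURE = (
    "kghstack_underflow_internal kghstack_free kccgrd kjxgrf_rr_read "
    "kjxgrDD_rr_read kjxgrimember kjxggpoll kjfmact kjfdact kjfcln ksbrdp"
).split()

frames = extract_frames(SAMPLE_TRACE)
print(frames)
print(matches_signature(frames, SIGNATURE))
```

The order check deliberately allows other frames in between, since published signatures are usually described as "similar to" a stack rather than an exact match.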

Following the call stack above, it is easy to locate the bug below. For details, see the corresponding MOS (My Oracle Support) article:

SYMPTOMS

  1. The LMON or LMS process crash the instance with an error like:
    ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x110A10838], [], [], [], [], [], [], [], [], [], []

    ORA-1092 : opitsk aborting process
    Instance terminated by LMS1, pid = 14024818

  2. Review of the generated tracefiles reveals a call stack similar to:
    … kghstack_underflow_internal kghstack_free kccgrd kjxgrf_rr_read kjxgrDD_rr_read kjxgrimember kjxggpoll kjfmact kjfdact kjfcln ksbrdp …

     
    – OR –
     

    … kghstack_underflow_internal kghstack_free ktundo kturcrbackoutonechg ktrgcm ktrget3 ktrget2 kclgcr …

      

CHANGES

CAUSE

The cause of this problem has been identified in a.o.:
Bug 18687067 – ORA-600 [KGHSTACK_UNDERFLOW_INTERNAL_2]
closed as duplicate of Bug 20675347 – ORA-07445 [KGHSTACK_OVERFLOW_INTERNAL()+644]

The bug is caused by an AIX compiler issue causing volatile variables in the Oracle kernel not to be handled properly.

The bug is a regression introduced in 11.2.0.4.
The issue does not reproduce in later versions, i.e. 12.1.

 

SOLUTION

To solve the issue, use any of below alternatives:

  • Upgrade to 12.1

    – OR –

  • Apply interim patch 20675347, if available for your platform and Oracle version.

    To check for conflicting patches, please use the MOS Patch Planner Tool
    Please refer to
    Note 1317012.1 – How To Use MOS Patch Planner To Check And Request The Conflict Patches?

    If no patch exists for your version, please contact Oracle Support for a backport request.

 

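According to the document, this problem is fairly common on 11.2.0.4. The main reason in this case is that the customer had not installed the corresponding PSU. The issue is relatively simple, so this is just a short note for future reference.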
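
To confirm whether a home already carries the fix, one option is to search the text output of `opatch lsinventory` for the patch number. The Python sketch below is an assumption-laden illustration: the function name `patch_listed` is hypothetical, the sample inventory line is made up for the demo, and the check only tells you whether the number appears anywhere in the output. It does not prove that the fix is active.

```python
import re


def patch_listed(inventory_text, patch_id):
    """True if patch_id appears as a whole number in the inventory text."""
    pattern = rf"\b{re.escape(str(patch_id))}\b"
    return re.search(pattern, inventory_text) is not None


# Illustrative text only; real `opatch lsinventory` output is much longer.
sample_inventory = """\
Interim patches (1) :

Patch  20675347     : applied on Tue Oct 25 10:00:00 CST 2022
"""

print(patch_listed(sample_inventory, 20675347))
print(patch_listed("There are no Interim patches installed.", 20675347))
```

In practice the inventory text would come from running `$ORACLE_HOME/OPatch/opatch lsinventory` as the Oracle software owner and capturing its output, for example with `subprocess.run(..., capture_output=True, text=True)`.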

