Oracle 19c RAC instance crash due to ORA-00600 [kghuclientasp_03] and ORA-00600 [17112]

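Over the past few days, a customer has been overhauling its core business system, migrating data from other systems into a zdata appliance environment and processing it there. The database is Oracle RAC 19.14 with 4 compute nodes and 5 zdata storage nodes (all-flash), so overall performance is quite strong.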

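In this migration, however, nearly all of the business logic is implemented in PL/SQL, every node runs dozens of jobs at the same time, and a large number of nologging operations are used. This eventually caused the instance on one node to crash, as shown below: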

2022-05-25T18:41:31.513484+08:00
Thread 1 advanced to log sequence 33773 (LGWR switch),  current SCN: 17721853016325
  Current log# 2 seq# 33773 mem# 0: +DG_DATA01/SCSBGJB/ONLINELOG/group_2.275.1098292431
2022-05-25T18:41:56.795929+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc  (incident=2727144) (PDBNAME=PDBSCSB04):
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc  (incident=2727145) (PDBNAME=PDBSCSB04):
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc  (incident=2727146) (PDBNAME=PDBSCSB04):
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2727146/xxxxx1_j00c_45595_i2727146.trc
2022-05-25T18:41:58.696027+08:00
Thread 1 advanced to log sequence 33774 (LGWR switch),  current SCN: 17721853454395
  Current log# 3 seq# 33774 mem# 0: +DG_DATA01/xxxxx/ONLINELOG/group_3.273.1098292435
2022-05-25T18:42:01.188468+08:00
......
2022-05-25T18:44:02.245773+08:00
PDBSCSB00(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_89170.trc  (incident=2732186) (PDBNAME=PDBSCSB00):
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CEBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CCBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB00(3):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2732186/xxxxx1_ora_89170_i2732186.trc
2022-05-25T18:44:06.010875+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc  (incident=2726200) (PDBNAME=PDBSCSB00):
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
PDBSCSB00(3):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2726200/xxxxx1_cl04_82221_i2726200.trc
PDBSCSB00(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:06.510726+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc  (incident=2735720) (PDBNAME=PDBSCSB02):
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735720/xxxxx1_p01v_60845_i2735720.trc
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:08.899072+08:00
opidrv aborting process CL04 ospid (82221) as a result of ORA-600
2022-05-25T18:44:08.899220+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc:
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
2022-05-25T18:44:12.455063+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc  (incident=2735721) (PDBNAME=PDBSCSB02):
ORA-00600: 内部错误代码, 参数: [kghfrmrg:nxt], [0x0FB6CFBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735721/xxxxx1_p01v_60845_i2735721.trc
2022-05-25T18:44:15.338840+08:00
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc  (incident=2735722) (PDBNAME=PDBSCSB02):
ORA-00700: 软内部错误, 参数: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrmrg:nxt], [0x0FB6CFBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735722/xxxxx1_p01v_60845_i2735722.trc
2022-05-25T18:44:18.354945+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl02_82217.trc  (incident=2726160) (PDBNAME=PDBSCSB02):
ORA-00600: internal error code, arguments: [17112], [0x0DB4CDBC0], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2726160/xxxxx1_cl02_82217_i2726160.trc
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:18.414790+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_90359.trc  (incident=2727648) (PDBNAME=PDBSJQY):
ORA-00600: internal error code, arguments: [17147], [0x0DB4CABC0], [], [], [], [], [], [], [], [], [], []
PDBSJQY(8):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2727648/xxxxx1_ora_90359_i2727648.trc
PDBSJQY(8):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:21.695996+08:00
opidrv aborting process CL02 ospid (82217) as a result of ORA-600
2022-05-25T18:44:21.696114+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl02_82217.trc:
ORA-00600: internal error code, arguments: [17112], [0x0DB4CDBC0], [], [], [], [], [], [], [], [], [], []
2022-05-25T18:44:21.702634+08:00
Dumping diagnostic data in directory=[cdmp_20220525184421], requested by (instance=1, osid=82217 (CL02)), summary=[incident=2726160].
2022-05-25T18:44:21.787634+08:00
PMON (ospid: 72253): terminating the instance due to ORA error 12752
2022-05-25T18:44:21.787800+08:00

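The alert log of this node shows a large number of ORA-00600 and ORA-07445 errors (the excerpt above contains the ORA-00600 and ORA-00700 entries). Of these, [kghuclientasp_03] is relatively rare, whereas [ksepop:1 ksepop recursion], [kghfrmrg:nxt] and [kghfrh:ds] are ones I have seen many times. The last two are both related to Oracle memory management.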
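To see quickly which internal errors dominate an alert log like this one, the first bracketed argument of each ORA-00600 / ORA-00700 line can be tallied. The following is a minimal Python sketch rather than an Oracle tool: the function name tally_internal_errors and the sample lines are illustrative assumptions, and the regular expression assumes only the "ORA-00600: <message>: [arg], ..." layout seen in the excerpt (the message text may be localized).

```python
import re
from collections import Counter

# Captures the error number and the first bracketed argument, e.g.
# "ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], ..."
# The message text between them is skipped because it can be localized.
ORA_LINE = re.compile(r"^(ORA-00(?:600|700)):[^\[]*\[([^\]]*)\]")


def tally_internal_errors(lines):
    """Count (error code, first argument) pairs in alert-log lines."""
    counts = Counter()
    for line in lines:
        m = ORA_LINE.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, pass an open alert log file instead.
    sample = [
        "ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], []",
        "ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], []",
        "ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CCBD8], [], []",
        "PMON (ospid: 72253): terminating the instance due to ORA error 12752",
    ]
    for (code, arg), n in tally_internal_errors(sample).most_common():
        print(code, arg, n)
```

Keying on the first argument only is a deliberate simplification: the later arguments are mostly addresses that differ from incident to incident.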

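In fact, for kghuclientasp_03 the kgh prefix of the function name (the same KGH heap-management layer that appears in the other errors) suggests that it, too, must be memory-related.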

One of the trace files contains the following:

Chunk 7fa8d2191340 sz=     1600  alloc     "pmuccst: adt/re"
  Chunk 7fa8d2191980 sz=     1600  alloc     "pmuccst: adt/re"
ERROR, BATCH-HEAP MISMATCH for batch 68 [7fa8d1e90000][7fa8d1e9a060]
BATCH HEADER 68 addr=7fa8d1ea4aa8 (prv=7fa8d218e020 nxt=7fa8d2316688)
  Chunk 7fa8d1ea4ad0 sz=      104  alloc     "pl/sql vc2     "
ERROR, ZERO-SIZED CHUNK addr=7fa8d1ea4b38
BATCH HEADER 69 addr=7fa8d2316670 (prv=7fa8d1ea4ac0 nxt=7fa8d23176c8)
  Chunk 7fa8d2316698 sz=       72  alloc     "pl/sql vc2     "
  Chunk 7fa8d23166e0 sz=       64  alloc     "pl/sql vc2     "

This output makes it clear that heap errors such as BATCH-HEAP MISMATCH were indeed reported, and that the chunks involved all belong to PL/SQL operations.
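A heap dump in a real trace file can run to many thousands of lines, so it helps to pull out only the ERROR lines and to total the chunk sizes per allocation comment. The sketch below assumes nothing beyond the line layout visible in the excerpt above; the function name summarize_heap_dump is hypothetical.

```python
import re
from collections import Counter

# e.g.   Chunk 7fa8d1ea4ad0 sz=      104  alloc     "pl/sql vc2     "
CHUNK = re.compile(r'Chunk\s+([0-9a-fA-F]+)\s+sz=\s*(\d+)\s+(\S+)\s+"([^"]*)"')
# e.g. ERROR, ZERO-SIZED CHUNK addr=7fa8d1ea4b38
ERROR = re.compile(r"^ERROR,\s*(.*)$")


def summarize_heap_dump(lines):
    """Return (list of error messages, bytes per allocation comment)."""
    errors = []
    bytes_by_comment = Counter()
    for line in lines:
        text = line.strip()
        err = ERROR.match(text)
        if err:
            errors.append(err.group(1))
            continue
        chunk = CHUNK.search(text)
        if chunk:
            # Comments are space-padded in the dump, so normalize them.
            bytes_by_comment[chunk.group(4).strip()] += int(chunk.group(2))
    return errors, bytes_by_comment
```

Applied to the excerpt, this reports the two ERROR lines and shows that the surrounding chunks carry the comments "pmuccst: adt/re" and "pl/sql vc2", which is what ties the corruption to PL/SQL memory.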

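Further searching shows that the operations of the affected processes were almost all INSERT /*+ nologging */ INTO statements. Because they were invoked concurrently, with dozens of jobs running at the same time, we have to suspect that nologging combined with excessive concurrency can lead to heap errors (NOLOGGING is, strictly speaking, a segment attribute rather than a documented optimizer hint, so this remains a suspicion rather than a proven cause).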

As for the instance crash itself, further analysis shows that what actually brought the instance down was the failure of the CL02 and CL04 processes.
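The sequence of events can be read off the alert-log timestamps. As a small worked example, the Python sketch below computes how long after the first ORA-00600 each key event occurred; the event labels are my own shorthand, and the timestamps are copied from the alert-log excerpt earlier in this post.

```python
from datetime import datetime

# (label, timestamp) pairs copied from the alert log excerpt.
EVENTS = [
    ("first ORA-00600 [kghuclientasp_03]", "2022-05-25T18:41:56.795929+08:00"),
    ("opidrv aborts CL04", "2022-05-25T18:44:08.899072+08:00"),
    ("opidrv aborts CL02", "2022-05-25T18:44:21.695996+08:00"),
    ("PMON terminates the instance", "2022-05-25T18:44:21.787634+08:00"),
]


def offsets_from_first(events):
    """Return (label, seconds since the first event) for each event."""
    parsed = [(label, datetime.fromisoformat(ts)) for label, ts in events]
    start = parsed[0][1]
    return [(label, (when - start).total_seconds()) for label, when in parsed]


if __name__ == "__main__":
    for label, seconds in offsets_from_first(EVENTS):
        print(f"{seconds:10.3f}s  {label}")
```

The instance is terminated within roughly a tenth of a second of the CL02 abort, which is consistent with the cleanup helper processes being treated as fatal background processes.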

Starting with Oracle 12c, CLMN (the Cleanup Main Process) and CLnn (the Cleanup Helper Processes) were introduced. These two types of process take a great deal of pressure off PMON.

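The CLnn processes are, in essence, core processes of Oracle RAC as well. Oracle's core processes (that is, the ones that must not be killed) can be listed with the following query: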

SQL> SELECT indx,ksuprpnm,TO_CHAR(ksuprflg,'XXXXXXXXXXXXXXXX'),KSUPROSID
2  FROM x$ksupr
3  WHERE BITAND(ksuprflg,4) = 4 ORDER BY indx
4  /
INDX KSUPRPNM                                         TO_CHAR(KSUPRFLG, KSUPROSID
---------- ------------------------------------------------ ----------------- ------------------------
2 oracle@dbser11 (PMON)                                            E 92141
3 oracle@dbser11 (CLMN)                                            E 92145
4 oracle@dbser11 (PSP0)                                            6 92149
5 oracle@dbser11 (IPC0)                                            6 92156
6 oracle@dbser11 (VKTM)                                            6 92161
7 oracle@dbser11 (GEN0)                                            6 92167
8 oracle@dbser11 (MMAN)                                            6 92171
15 oracle@dbser11 (DBRM)                                            6 92186
19 oracle@dbser11 (ACMS)                                            6 92196
20 oracle@dbser11 (PMAN)                                            6 92200
22 oracle@dbser11 (LMON)                                            6 92206
23 oracle@dbser11 (LMD0)                                            6 92210
24 oracle@dbser11 (LMS0)                                            6 92212_92222
26 oracle@dbser11 (LMS1)                                            6 92214_92224
28 oracle@dbser11 (LMS2)                                            6 92216_92227
30 oracle@dbser11 (LMS3)                                            6 92219_92239
32 oracle@dbser11 (LMS4)                                            6 92223_92238
34 oracle@dbser11 (LMS5)                                            6 92226_92242
36 oracle@dbser11 (LMS6)                                            6 92231_92247
38 oracle@dbser11 (LMS7)                                            6 92236_92254
40 oracle@dbser11 (LMS8)                                            6 92241_92264
42 oracle@dbser11 (LMS9)                                            6 92245_92266
44 oracle@dbser11 (LMSA)                                            6 92251_92270
46 oracle@dbser11 (LMSB)                                            6 92258_92283
48 oracle@dbser11 (LMSC)                                            6 92262_92285
50 oracle@dbser11 (LMSD)                                            6 92267_92293
52 oracle@dbser11 (LMSE)                                            6 92271_92297
54 oracle@dbser11 (LMSF)                                            6 92275_92298
56 oracle@dbser11 (LMSG)                                            6 92279_92306
58 oracle@dbser11 (LMSH)                                            6 92282_92312
60 oracle@dbser11 (LMSI)                                            6 92286_92316
62 oracle@dbser11 (LMSJ)                                            6 92288_92335
64 oracle@dbser11 (LMSK)                                            6 92295_92327
66 oracle@dbser11 (LMSL)                                            6 92300_92358
68 oracle@dbser11 (LMSM)                                            6 92304_92344
70 oracle@dbser11 (LMSN)                                            6 92310_92351
72 oracle@dbser11 (LMSO)                                            6 92314_92362
74 oracle@dbser11 (LMSP)                                            6 92317_92374
76 oracle@dbser11 (LMSQ)                                            6 92321_92373
78 oracle@dbser11 (LMSR)                                            6 92325_92380
80 oracle@dbser11 (LMSS)                                            6 92329_92359
82 oracle@dbser11 (LMST)                                            6 92332_92391
84 oracle@dbser11 (LMSU)                                            6 92334_92387
86 oracle@dbser11 (LMSV)                                            6 92337_92360
88 oracle@dbser11 (LMSW)                                            6 92342_92372
90 oracle@dbser11 (LMSX)                                            6 92346_92365
92 oracle@dbser11 (LMSY)                                            6 92348_92361
94 oracle@dbser11 (LMSZ)                                            6 92353_92397
96 oracle@dbser11 (LM10)                                            6 92355_92401
98 oracle@dbser11 (LMD1)                                            6 92357
99 oracle@dbser11 (LMD2)                                            6 92390
100 oracle@dbser11 (LMD3)                                            6 92409
101 oracle@dbser11 (LMD4)                                            6 92414
102 oracle@dbser11 (RMS0)                                            6 92418
104 oracle@dbser11 (LCK1)                                            6 92424
105 oracle@dbser11 (DBW0)                                            6 92429
106 oracle@dbser11 (DBW1)                                            6 92433
107 oracle@dbser11 (DBW2)                                            6 92437
108 oracle@dbser11 (DBW3)                                            6 92441
109 oracle@dbser11 (DBW4)                                            6 92445
110 oracle@dbser11 (DBW5)                                            6 92449
111 oracle@dbser11 (DBW6)                                            6 92453
112 oracle@dbser11 (DBW7)                                            6 92457
113 oracle@dbser11 (DBW8)                                            6 92461
114 oracle@dbser11 (DBW9)                                            6 92465
115 oracle@dbser11 (DBWA)                                            6 92469
116 oracle@dbser11 (DBWB)                                            6 92474
117 oracle@dbser11 (DBWC)                                            6 92478
118 oracle@dbser11 (DBWD)                                            6 92482
119 oracle@dbser11 (DBWE)                                            6 92486
120 oracle@dbser11 (DBWF)                                            6 92490
121 oracle@dbser11 (DBWG)                                            6 92498
122 oracle@dbser11 (DBWH)                                            6 92502
123 oracle@dbser11 (DBWI)                                            6 92506
124 oracle@dbser11 (DBWJ)                                            6 92512
125 oracle@dbser11 (DBWK)                                            6 92516
126 oracle@dbser11 (DBWL)                                            6 92520
127 oracle@dbser11 (DBWM)                                            6 92524
128 oracle@dbser11 (DBWN)                                            6 92530
129 oracle@dbser11 (DBWO)                                            6 92534
130 oracle@dbser11 (CR00)                                            6 92214_92535
131 oracle@dbser11 (DBWP)                                            6 92540
133 oracle@dbser11 (DBWQ)                                            6 92546
134 oracle@dbser11 (RS01)                                            6 92214_92551
135 oracle@dbser11 (DBWR)                                            6 92550
136 oracle@dbser11 (LGWR)                                            6 92555
137 oracle@dbser11 (CKPT)                                            6 92559
138 oracle@dbser11 (CR00)                                            6 92216_92560
139 oracle@dbser11 (SMON)                                           16 92564
140 oracle@dbser11 (CR00)                                            6 92219_92565
143 oracle@dbser11 (CR00)                                            6 92226_92571
144 oracle@dbser11 (CR00)                                            6 92212_92574
145 oracle@dbser11 (LREG)                                            6 92576
146 oracle@dbser11 (CR00)                                            6 92231_92577
147 oracle@dbser11 (CR00)                                            6 92223_92578
148 oracle@dbser11 (RS02)                                            6 92216_92599
150 oracle@dbser11 (RBAL)                                            6 92584
151 oracle@dbser11 (ASMB)                                            6 92588
152 oracle@dbser11 (FENC)                                            6 92592
155 oracle@dbser11 (CR00)                                            6 92241_92602
156 oracle@dbser11 (CR00)                                            6 92245_92603
158 oracle@dbser11 (CR00)                                            6 92251_92604
159 oracle@dbser11 (RS03)                                            6 92219_92607
160 oracle@dbser11 (CR00)                                            6 92236_92608
161 oracle@dbser11 (CR00)                                            6 92262_92609
162 oracle@dbser11 (CR00)                                            6 92288_92610
163 oracle@dbser11 (CR00)                                            6 92275_92611
165 oracle@dbser11 (RS05)                                            6 92226_92614
166 oracle@dbser11 (CR00)                                            6 92271_92615
167 oracle@dbser11 (CR00)                                            6 92337_92616
168 oracle@dbser11 (CR00)                                            6 92286_92617
169 oracle@dbser11 (CR00)                                            6 92258_92620
170 oracle@dbser11 (CR00)                                            6 92332_92621
171 oracle@dbser11 (RS00)                                            6 92212_92623
172 oracle@dbser11 (CR00)                                            6 92282_92624
173 oracle@dbser11 (CR00)                                            6 92329_92625
174 oracle@dbser11 (RS06)                                            6 92231_92626
175 oracle@dbser11 (CR00)                                            6 92300_92627
176 oracle@dbser11 (CR00)                                            6 92353_92628
177 oracle@dbser11 (CR00)                                            6 92334_92629
178 oracle@dbser11 (RS04)                                            6 92223_92630
179 oracle@dbser11 (CR00)                                            6 92325_92631
180 oracle@dbser11 (CR00)                                            6 92295_92632
181 oracle@dbser11 (CR00)                                            6 92346_92633
182 oracle@dbser11 (CR00)                                            6 92310_92634
183 oracle@dbser11 (CR00)                                            6 92267_92635
184 oracle@dbser11 (CR00)                                            6 92279_92636
185 oracle@dbser11 (CR00)                                            6 92317_92637
186 oracle@dbser11 (CR00)                                            6 92304_92638
187 oracle@dbser11 (CR00)                                            6 92342_92639
188 oracle@dbser11 (CR00)                                            6 92355_92640
189 oracle@dbser11 (CR00)                                            6 92314_92641
190 oracle@dbser11 (CR00)                                            6 92321_92642
191 oracle@dbser11 (CR00)                                            6 92348_92643
192 oracle@dbser11 (RS08)                                            6 92241_92644
193 oracle@dbser11 (RS09)                                            6 92245_92645
194 oracle@dbser11 (RS0A)                                            6 92251_92646
195 oracle@dbser11 (RS07)                                            6 92236_92647
196 oracle@dbser11 (RS0C)                                            6 92262_92648
197 oracle@dbser11 (RS0J)                                            6 92288_92650
198 oracle@dbser11 (RS0F)                                            6 92275_92651
200 oracle@dbser11 (RS0E)                                            6 92271_92654
201 oracle@dbser11 (RS0V)                                            6 92337_92655
202 oracle@dbser11 (RS0I)                                            6 92286_92656
203 oracle@dbser11 (RS0B)                                            6 92258_92657
204 oracle@dbser11 (RS0T)                                            6 92332_92658
205 oracle@dbser11 (RS0H)                                            6 92282_92661
206 oracle@dbser11 (RS0S)                                            6 92329_92662
207 oracle@dbser11 (RS0L)                                            6 92300_92665
208 oracle@dbser11 (RS0Z)                                            6 92353_92666
209 oracle@dbser11 (RS0U)                                            6 92334_92667
210 oracle@dbser11 (RS0R)                                            6 92325_92668
211 oracle@dbser11 (RS0K)                                            6 92295_92669
212 oracle@dbser11 (RS0X)                                            6 92346_92670
213 oracle@dbser11 (RS0N)                                            6 92310_92673
214 oracle@dbser11 (RS0D)                                            6 92267_92674
215 oracle@dbser11 (RS0G)                                            6 92279_92675
216 oracle@dbser11 (RS0P)                                            6 92317_92676
217 oracle@dbser11 (RS0M)                                            6 92304_92678
218 oracle@dbser11 (RS0W)                                            6 92342_92679
219 oracle@dbser11 (RS10)                                            6 92355_92680
220 oracle@dbser11 (RS0O)                                            6 92314_92687
221 oracle@dbser11 (RS0Q)                                            6 92321_92703
222 oracle@dbser11 (RS0Y)                                            6 92348_92706
224 oracle@dbser11 (IMR0)                                            6 92712
226 oracle@dbser11 (LCK0)                                            6 92855
229 oracle@dbser11 (CL00)                                            E 124232
275 oracle@dbser11 (CL01)                                            E 124256
277 oracle@dbser11 (CL02)                                            E 124268
278 oracle@dbser11 (CL03)                                            E 124280
281 oracle@dbser11 (CL04)                                            E 124294
171 rows selected.
SQL>
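The third column of this output is ksuprflg rendered in hexadecimal, and the WHERE clause keeps only rows whose value has bit 0x4 set. As a small worked example, the Python sketch below decodes the three values that appear in the listing (E, 6 and 16). x$ksupr is an internal, undocumented structure, so the reading "bit 0x4 marks a process the instance cannot survive losing" is this post's working assumption rather than documented behavior, and the helper name decode_flag is hypothetical.

```python
FATAL_MASK = 0x4  # the bit tested by BITAND(ksuprflg, 4) = 4 in the query


def decode_flag(hex_flag, mask=FATAL_MASK):
    """Return (integer value, binary string, whether the mask bit is set)."""
    value = int(hex_flag.strip(), 16)
    return value, format(value, "b"), (value & mask) == mask


if __name__ == "__main__":
    # PMON/CLMN/CLnn show E, most background processes 6, SMON 16.
    for flag in ("E", "6", "16"):
        value, bits, is_set = decode_flag(flag)
        print(f"{flag:>2} = {value:2d} = 0b{bits:>5}  bit 0x4 set: {is_set}")
```

All three values have bit 0x4 set, which is why PMON, CLMN, the CLnn helpers, SMON and the other background processes all pass the filter.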

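Once the core processes CL02 and CL04 fail, the instance is certain to crash; there is no doubt about that. The trace of the CL04 process shows that it, too, ran into a heap error: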

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.0.0
Build label:    RDBMS_19.14.0.0.0DBRU_LINUX.X64_211224.3
ORACLE_HOME:    /u01/app/oracle/product/19.0.0/dbhome_1
System name:	Linux
Node name:	xxxxx
Release:	3.10.0-1160.el7.x86_64
Version:	#1 SMP Tue Aug 18 14:50:17 EDT 2020
Machine:	x86_64
Instance name: xxxx
Redo thread mounted by this instance: 1
Oracle process number: 281
Unix process pid: 82221, image: oracle@xxxxx (CL04)
......
......
[TOC00000]
Jump to table of contents
Dump continued from file: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc
[TOC00001]
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
[TOC00001-END]
[TOC00002]
========= Dump for incident 2726200 (ORA 600 [17112]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
********** Internal heap ERROR 17112 addr=0xdb4ccbc0 *********
***** Dump of memory around addr 0xdb4ccbc0:
0DB4CBBC0 20202020 20202020 20202020 20202020  [                ]
Repeat 511 times
Decoding of possible comments in or near previous range
[0xdb4caa68] = 0x14edadd8 ==> kkqcscpfro:kglhd2
[0xdb4d2bf8] = 0x13f5fff0 ==> audRegFro:audtab

Looking further at the call stack, we can see the following:

Error Descriptor: ORA-600 [17112] [0x0DB4CCBC0] [] [] [] [] [] [] [] [] [] []
Error class: 0
Problem Key # of args: 1
Number of actions: 18
----- Incident Context Dump -----
Address: 0x7f8997316de0
Incident ID: 2726200
Problem Key: ORA 600 [17112]
Error: ORA-600 [17112] [0x0DB4CCBC0] [] [] [] [] [] [] [] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: kgherror_flag [KGH]<-- Signaling
[03]: kgherror_quar_chk [KGH]
[04]: kghfre [KGH]
[05]: kghfrh_internal [KGH]
[06]: kksFreeHeap [cursor]
[07]: kksLockRecovery [cursor]
[08]: kgxCleanup []
[09]: kksClearMutexSessionState [cursor]
[10]: kksCleanSessionState [cursor]
[11]: ksudlp_int [ksu]
[12]: ksudlp [ksu]
[13]: kss_del_cb [state_object]
[14]: kssxdl [state_object]
[15]: kssdel [state_object]
[16]: ksuxdl [ksu]
[17]: ksucln_dpc_cleanup [ksu]
[18]: ksucln_dpc_dfs [ksu]
[19]: ksucln_dpc_main [ksu]
[20]: ksucln_slave_main [ksu]
[21]: ksbdispatch [background_proc]
[22]: opirip [OPI]
[23]: opidrv [OPI]
[24]: sou2o []
[25]: opimai_real [OPI]
[26]: ssthrdmain []
[27]: main []
[28]: __libc_start_main []
MD [00]: 'SID'='6899.5088' (0x2)
MD [01]: 'ProcId'='281.2' (0x2)
MD [02]: 'Service'='SYS$BACKGROUND' (0x200)
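The frame list above can be pulled apart mechanically to show which component signaled the error and which caller handed control to it. This is a minimal Python sketch that assumes only the "[NN]: function [component]" layout of the incident context dump; the helper names parse_frames and signaling_path are hypothetical.

```python
import re

# e.g. "[02]: kgherror_flag [KGH]<-- Signaling" or "[08]: kgxCleanup []"
FRAME = re.compile(r"^\[(\d+)\]:\s+(\S+)\s+\[([^\]]*)\](.*)$")


def parse_frames(lines):
    """Parse incident-context frames into dictionaries, innermost first."""
    frames = []
    for line in lines:
        m = FRAME.match(line.strip())
        if m:
            frames.append({
                "depth": int(m.group(1)),
                "func": m.group(2),
                "component": m.group(3),
                "signaling": "Signaling" in m.group(4),
            })
    return frames


def signaling_path(frames):
    """Return (functions in the signaling component, first outside caller)."""
    start = next(i for i, f in enumerate(frames) if f["signaling"])
    component = frames[start]["component"]
    inside = []
    for frame in frames[start:]:
        if frame["component"] != component:
            return inside, frame["func"]
        inside.append(frame["func"])
    return inside, None
```

For the stack shown here, the signaling frames all sit in the KGH component, and the first caller outside it is kksFreeHeap in the cursor layer, reached from the session cleanup code.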

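We can see that Oracle detected the heap error while calling kghfre to free memory, and that is what ultimately triggered this problem. According to a colleague, a similar problem had already occurred during earlier testing, had also brought the instance down with the same errors, and had been traced to the same piece of business logic. In the end I recommended that the application team remove all nologging hints from that business logic. Checking again afterwards, the errors seem to have dropped sharply; at least judging by the alert log, no similar errors have appeared since.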

For one of the ORA-00600 errors above, a search of Oracle MOS also turns up a similar bug description, given here for reference:

Bug 28276054 – Various ORA-600 / ORA-7445 Internal Errors Raised When Using PLSQL Reset Session (Doc ID 28276054.8)

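However, the note offers no workaround for this bug, and it appears to say that the issue was fixed as of 19.7, whereas we are running 19.14. Evidently a similar problem still exists.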

Bug 28276054  Various ORA-600 / ORA-7445 Internal Errors Raised When Using PLSQL Reset Session

 This note gives a brief overview of bug 28276054.
The content was last updated on: 08-MAR-2022

Affects:

Product (Component) Oracle Server (Xdb)
Range of versions believed to be affected Versions BELOW 20.1
Versions confirmed as being affected
Platforms affected Generic (all / most platforms affected)

Fixed:

The fix for 28276054 is first included in

Symptoms:

Related To:

Description

Various internal errors can occur due to a memory corruption that happens when we try to clear
a handle resource for a DOM.
REDISCOVERY INFORMATION:
If various ORA-600 / ORA-7445 errors occur due to a bad pointer "kgiobicd", then you may be hitting
this issue. This has been known to be associated with inconsistencies in "qmxdpls_subhea" subheaps
when trying to free UGA memory while reinitializing a session from PLSQL.
Workaround
None.

Finally, as to whether this crash was caused by an Oracle bug, I think the likelihood is very high.

