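Over the past few days, a customer's core business system underwent a full overhaul, with data from other systems migrated to, and processed in, a zdata all-in-one appliance environment. The database is Oracle RAC 19.14 with 4 compute nodes and 5 zdata storage nodes (all-flash), so overall performance is quite strong.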
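In this migration, however, almost all of the business logic is implemented in PL/SQL, every node runs dozens of Jobs at the same time, and the code makes heavy use of nologging operations. This eventually caused the instance on one node to crash, as shown below: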
2022-05-25T18:41:31.513484+08:00
Thread 1 advanced to log sequence 33773 (LGWR switch), current SCN: 17721853016325
  Current log# 2 seq# 33773 mem# 0: +DG_DATA01/SCSBGJB/ONLINELOG/group_2.275.1098292431
2022-05-25T18:41:56.795929+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc (incident=2727144) (PDBNAME=PDBSCSB04):
ORA-00600: internal error code, arguments: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc (incident=2727145) (PDBNAME=PDBSCSB04):
ORA-00600: internal error code, arguments: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc (incident=2727146) (PDBNAME=PDBSCSB04):
ORA-00600: internal error code, arguments: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2727146/xxxxx1_j00c_45595_i2727146.trc
2022-05-25T18:41:58.696027+08:00
Thread 1 advanced to log sequence 33774 (LGWR switch), current SCN: 17721853454395
  Current log# 3 seq# 33774 mem# 0: +DG_DATA01/xxxxx/ONLINELOG/group_3.273.1098292435
2022-05-25T18:42:01.188468+08:00
......
2022-05-25T18:44:02.245773+08:00
PDBSCSB00(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_89170.trc (incident=2732186) (PDBNAME=PDBSCSB00):
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CEBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CCBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB00(3):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2732186/xxxxx1_ora_89170_i2732186.trc
2022-05-25T18:44:06.010875+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc (incident=2726200) (PDBNAME=PDBSCSB00):
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
PDBSCSB00(3):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2726200/xxxxx1_cl04_82221_i2726200.trc
PDBSCSB00(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:06.510726+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc (incident=2735720) (PDBNAME=PDBSCSB02):
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735720/xxxxx1_p01v_60845_i2735720.trc
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:08.899072+08:00
opidrv aborting process CL04 ospid (82221) as a result of ORA-600
2022-05-25T18:44:08.899220+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc:
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
2022-05-25T18:44:12.455063+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc (incident=2735721) (PDBNAME=PDBSCSB02):
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CFBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735721/xxxxx1_p01v_60845_i2735721.trc
2022-05-25T18:44:15.338840+08:00
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc (incident=2735722) (PDBNAME=PDBSCSB02):
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CFBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735722/xxxxx1_p01v_60845_i2735722.trc
2022-05-25T18:44:18.354945+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl02_82217.trc (incident=2726160) (PDBNAME=PDBSCSB02):
ORA-00600: internal error code, arguments: [17112], [0x0DB4CDBC0], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2726160/xxxxx1_cl02_82217_i2726160.trc
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:18.414790+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_90359.trc (incident=2727648) (PDBNAME=PDBSJQY):
ORA-00600: internal error code, arguments: [17147], [0x0DB4CABC0], [], [], [], [], [], [], [], [], [], []
PDBSJQY(8):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2727648/xxxxx1_ora_90359_i2727648.trc
PDBSJQY(8):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:21.695996+08:00
opidrv aborting process CL02 ospid (82217) as a result of ORA-600
2022-05-25T18:44:21.696114+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl02_82217.trc:
ORA-00600: internal error code, arguments: [17112], [0x0DB4CDBC0], [], [], [], [], [], [], [], [], [], []
2022-05-25T18:44:21.702634+08:00
Dumping diagnostic data in directory=[cdmp_20220525184421], requested by (instance=1, osid=82217 (CL02)), summary=[incident=2726160].
2022-05-25T18:44:21.787634+08:00
PMON (ospid: 72253): terminating the instance due to ORA error 12752
2022-05-25T18:44:21.787800+08:00
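The alert log on this node shows a large number of ORA-00600 and ORA-07445 errors (the excerpt above contains the ORA-00600 and ORA-00700 entries). Among the arguments, kghuclientasp_03 is relatively rare, whereas ksepop:1 ksepop recursion, kghfrmrg:nxt and kghfrh:ds are ones I have seen many times. The last two are both related to Oracle memory management.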
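When an alert log is this noisy, it can help to tally the first ORA-00600 argument before reading individual incidents. The following is only a minimal sketch: the function name `count_ora600_arguments` and the `SAMPLE` text (a few lines copied from the excerpt above) are illustrative, and the regular expression simply skips whatever message text sits between the error code and the first bracket, so it should work for both English and localized messages.

```python
import re
from collections import Counter

# First bracketed argument of an ORA-00600 entry. The message text between
# the error code and the first "[" is skipped, whatever language it is in.
ORA600_ARG = re.compile(r"ORA-00600:[^\[]*\[([^\]]*)\]")

# A few lines taken from the alert log excerpt (illustrative sample only).
SAMPLE = """\
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CEBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CCBD8], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
"""


def count_ora600_arguments(alert_log_text: str) -> Counter:
    """Count how often each ORA-00600 first argument appears in the text."""
    return Counter(
        match.group(1).strip() for match in ORA600_ARG.finditer(alert_log_text)
    )


if __name__ == "__main__":
    for argument, occurrences in count_ora600_arguments(SAMPLE).most_common():
        print(f"{occurrences:3d}  {argument}")
```

On a real system the text would come from the alert log file itself rather than from an embedded sample.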
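As for the kghuclientasp_03 function, the kgh prefix it shares with those two arguments suggests that it, too, must be memory-related.
One of the trace files contains the following: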
Chunk 7fa8d2191340 sz= 1600 alloc "pmuccst: adt/re"
Chunk 7fa8d2191980 sz= 1600 alloc "pmuccst: adt/re"
ERROR, BATCH-HEAP MISMATCH for batch 68 [7fa8d1e90000][7fa8d1e9a060]
BATCH HEADER 68 addr=7fa8d1ea4aa8 (prv=7fa8d218e020 nxt=7fa8d2316688)
Chunk 7fa8d1ea4ad0 sz= 104 alloc "pl/sql vc2 "
ERROR, ZERO-SIZED CHUNK addr=7fa8d1ea4b38
BATCH HEADER 69 addr=7fa8d2316670 (prv=7fa8d1ea4ac0 nxt=7fa8d23176c8)
Chunk 7fa8d2316698 sz= 72 alloc "pl/sql vc2 "
Chunk 7fa8d23166e0 sz= 64 alloc "pl/sql vc2 "
This dump does contain ERROR entries (BATCH-HEAP MISMATCH and ZERO-SIZED CHUNK), and the chunks around them all belong to PL/SQL operations.
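Heap dumps in real trace files run to thousands of lines, so a small script can surface the ERROR entries and the allocation comments near them. The sketch below assumes the dump lines look like the excerpt above; `summarize_heap_dump` and `HEAP_SAMPLE` are hypothetical names introduced for illustration, not part of any Oracle tooling.

```python
import re
from collections import Counter

# "ERROR, <UPPERCASE DESCRIPTION>" as it appears in the heap dump excerpt.
ERROR_RE = re.compile(r"ERROR, ([A-Z][A-Z -]*[A-Z])")
# The allocation comment that follows the word alloc, in double quotes.
ALLOC_RE = re.compile(r'alloc "([^"]*)"')

# Lines copied from the trace excerpt (illustrative sample only).
HEAP_SAMPLE = """\
Chunk 7fa8d2191340 sz= 1600 alloc "pmuccst: adt/re"
Chunk 7fa8d2191980 sz= 1600 alloc "pmuccst: adt/re"
ERROR, BATCH-HEAP MISMATCH for batch 68 [7fa8d1e90000][7fa8d1e9a060]
BATCH HEADER 68 addr=7fa8d1ea4aa8 (prv=7fa8d218e020 nxt=7fa8d2316688)
Chunk 7fa8d1ea4ad0 sz= 104 alloc "pl/sql vc2 "
ERROR, ZERO-SIZED CHUNK addr=7fa8d1ea4b38
BATCH HEADER 69 addr=7fa8d2316670 (prv=7fa8d1ea4ac0 nxt=7fa8d23176c8)
Chunk 7fa8d2316698 sz= 72 alloc "pl/sql vc2 "
Chunk 7fa8d23166e0 sz= 64 alloc "pl/sql vc2 "
"""


def summarize_heap_dump(text: str):
    """Return (error kinds, allocation comments), each as a Counter."""
    errors = Counter(ERROR_RE.findall(text))
    comments = Counter(comment.strip() for comment in ALLOC_RE.findall(text))
    return errors, comments


if __name__ == "__main__":
    errors, comments = summarize_heap_dump(HEAP_SAMPLE)
    print("errors:", dict(errors))
    print("allocation comments:", dict(comments))
```

A summary like this only shows which allocation comments sit near the corruption; it does not by itself identify the code that caused it.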
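Searching further, I found that the processes involved were almost all running INSERT /*+ nologging */ INTO statements. Because they were invoked concurrently, with dozens of Jobs running at once, we have to suspect that nologging combined with excessive concurrency can lead to heap errors.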
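As for the instance crash itself, further analysis shows that the immediate cause was the failure of the CL02 and CL04 processes, which ultimately brought the instance down.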
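The sequence "background process aborted, then instance terminated" can be pulled out of the alert log mechanically. This is a rough sketch under the assumption that the messages have exactly the wording seen in the alert log excerpt; `crash_summary` and `CRASH_SAMPLE` are illustrative names.

```python
import re

# Message formats as they appear in the alert log excerpt.
ABORT_RE = re.compile(
    r"opidrv aborting process (\w+) ospid \((\d+)\) as a result of (ORA-\d+)"
)
TERMINATE_RE = re.compile(
    r"(\w+) \(ospid: (\d+)\): terminating the instance due to ORA error (\d+)"
)

# Lines copied from the alert log (illustrative sample only).
CRASH_SAMPLE = """\
2022-05-25T18:44:08.899072+08:00
opidrv aborting process CL04 ospid (82221) as a result of ORA-600
2022-05-25T18:44:21.695996+08:00
opidrv aborting process CL02 ospid (82217) as a result of ORA-600
2022-05-25T18:44:21.787634+08:00
PMON (ospid: 72253): terminating the instance due to ORA error 12752
"""


def crash_summary(text: str):
    """Return (aborted processes, terminating process or None)."""
    aborted = [
        (m.group(1), int(m.group(2)), m.group(3)) for m in ABORT_RE.finditer(text)
    ]
    match = TERMINATE_RE.search(text)
    terminator = None
    if match:
        terminator = (match.group(1), int(match.group(2)), int(match.group(3)))
    return aborted, terminator


if __name__ == "__main__":
    aborted, terminator = crash_summary(CRASH_SAMPLE)
    for name, ospid, error in aborted:
        print(f"aborted: {name} (ospid {ospid}) due to {error}")
    if terminator:
        print(f"instance terminated by {terminator[0]}, ORA error {terminator[2]}")
```

In this incident the output would list CL04 and CL02 as aborted, followed by PMON terminating the instance, which matches the reading above.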
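Starting with Oracle 12c, Oracle introduced CLMN (the Cleanup Main Process) and CLnn (the Cleanup Helper Processes). These two kinds of process greatly reduce the load on the PMON process.
The CL processes are themselves core processes of Oracle RAC. Core Oracle processes (that is, processes that must not be killed) can be listed with the following script: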
SQL> SELECT indx,ksuprpnm,TO_CHAR(ksuprflg,'XXXXXXXXXXXXXXXX'),KSUPROSID
     FROM x$ksupr
     WHERE BITAND(ksuprflg,4) = 4 ORDER BY indx
     /

INDX  KSUPRPNM               TO_CHAR(KSUPRFLG,  KSUPROSID
----  ---------------------  -----------------  -----------
   2  oracle@dbser11 (PMON)  E                  92141
   3  oracle@dbser11 (CLMN)  E                  92145
   4  oracle@dbser11 (PSP0)  6                  92149
   5  oracle@dbser11 (IPC0)  6                  92156
   6  oracle@dbser11 (VKTM)  6                  92161
   7  oracle@dbser11 (GEN0)  6                  92167
   8  oracle@dbser11 (MMAN)  6                  92171
  15  oracle@dbser11 (DBRM)  6                  92186
  19  oracle@dbser11 (ACMS)  6                  92196
  20  oracle@dbser11 (PMAN)  6                  92200
  22  oracle@dbser11 (LMON)  6                  92206
  23  oracle@dbser11 (LMD0)  6                  92210
  24  oracle@dbser11 (LMS0)  6                  92212_92222
  26  oracle@dbser11 (LMS1)  6                  92214_92224
  28  oracle@dbser11 (LMS2)  6                  92216_92227
  30  oracle@dbser11 (LMS3)  6                  92219_92239
  32  oracle@dbser11 (LMS4)  6                  92223_92238
  34  oracle@dbser11 (LMS5)  6                  92226_92242
  36  oracle@dbser11 (LMS6)  6                  92231_92247
  38  oracle@dbser11 (LMS7)  6                  92236_92254
  40  oracle@dbser11 (LMS8)  6                  92241_92264
  42  oracle@dbser11 (LMS9)  6                  92245_92266
  44  oracle@dbser11 (LMSA)  6                  92251_92270
  46  oracle@dbser11 (LMSB)  6                  92258_92283
  48  oracle@dbser11 (LMSC)  6                  92262_92285
  50  oracle@dbser11 (LMSD)  6                  92267_92293
  52  oracle@dbser11 (LMSE)  6                  92271_92297
  54  oracle@dbser11 (LMSF)  6                  92275_92298
  56  oracle@dbser11 (LMSG)  6                  92279_92306
  58  oracle@dbser11 (LMSH)  6                  92282_92312
  60  oracle@dbser11 (LMSI)  6                  92286_92316
  62  oracle@dbser11 (LMSJ)  6                  92288_92335
  64  oracle@dbser11 (LMSK)  6                  92295_92327
  66  oracle@dbser11 (LMSL)  6                  92300_92358
  68  oracle@dbser11 (LMSM)  6                  92304_92344
  70  oracle@dbser11 (LMSN)  6                  92310_92351
  72  oracle@dbser11 (LMSO)  6                  92314_92362
  74  oracle@dbser11 (LMSP)  6                  92317_92374
  76  oracle@dbser11 (LMSQ)  6                  92321_92373
  78  oracle@dbser11 (LMSR)  6                  92325_92380
  80  oracle@dbser11 (LMSS)  6                  92329_92359
  82  oracle@dbser11 (LMST)  6                  92332_92391
  84  oracle@dbser11 (LMSU)  6                  92334_92387
  86  oracle@dbser11 (LMSV)  6                  92337_92360
  88  oracle@dbser11 (LMSW)  6                  92342_92372
  90  oracle@dbser11 (LMSX)  6                  92346_92365
  92  oracle@dbser11 (LMSY)  6                  92348_92361
  94  oracle@dbser11 (LMSZ)  6                  92353_92397
  96  oracle@dbser11 (LM10)  6                  92355_92401
  98  oracle@dbser11 (LMD1)  6                  92357
  99  oracle@dbser11 (LMD2)  6                  92390
 100  oracle@dbser11 (LMD3)  6                  92409
 101  oracle@dbser11 (LMD4)  6                  92414
 102  oracle@dbser11 (RMS0)  6                  92418
 104  oracle@dbser11 (LCK1)  6                  92424
 105  oracle@dbser11 (DBW0)  6                  92429
 106  oracle@dbser11 (DBW1)  6                  92433
 107  oracle@dbser11 (DBW2)  6                  92437
 108  oracle@dbser11 (DBW3)  6                  92441
 109  oracle@dbser11 (DBW4)  6                  92445
 110  oracle@dbser11 (DBW5)  6                  92449
 111  oracle@dbser11 (DBW6)  6                  92453
 112  oracle@dbser11 (DBW7)  6                  92457
 113  oracle@dbser11 (DBW8)  6                  92461
 114  oracle@dbser11 (DBW9)  6                  92465
 115  oracle@dbser11 (DBWA)  6                  92469
 116  oracle@dbser11 (DBWB)  6                  92474
 117  oracle@dbser11 (DBWC)  6                  92478
 118  oracle@dbser11 (DBWD)  6                  92482
 119  oracle@dbser11 (DBWE)  6                  92486
 120  oracle@dbser11 (DBWF)  6                  92490
 121  oracle@dbser11 (DBWG)  6                  92498
 122  oracle@dbser11 (DBWH)  6                  92502
 123  oracle@dbser11 (DBWI)  6                  92506
 124  oracle@dbser11 (DBWJ)  6                  92512
 125  oracle@dbser11 (DBWK)  6                  92516
 126  oracle@dbser11 (DBWL)  6                  92520
 127  oracle@dbser11 (DBWM)  6                  92524
 128  oracle@dbser11 (DBWN)  6                  92530
 129  oracle@dbser11 (DBWO)  6                  92534
 130  oracle@dbser11 (CR00)  6                  92214_92535
 131  oracle@dbser11 (DBWP)  6                  92540
 133  oracle@dbser11 (DBWQ)  6                  92546
 134  oracle@dbser11 (RS01)  6                  92214_92551
 135  oracle@dbser11 (DBWR)  6                  92550
 136  oracle@dbser11 (LGWR)  6                  92555
 137  oracle@dbser11 (CKPT)  6                  92559
 138  oracle@dbser11 (CR00)  6                  92216_92560
 139  oracle@dbser11 (SMON)  16                 92564
 140  oracle@dbser11 (CR00)  6                  92219_92565
 143  oracle@dbser11 (CR00)  6                  92226_92571
 144  oracle@dbser11 (CR00)  6                  92212_92574
 145  oracle@dbser11 (LREG)  6                  92576
 146  oracle@dbser11 (CR00)  6                  92231_92577
 147  oracle@dbser11 (CR00)  6                  92223_92578
 148  oracle@dbser11 (RS02)  6                  92216_92599
 150  oracle@dbser11 (RBAL)  6                  92584
 151  oracle@dbser11 (ASMB)  6                  92588
 152  oracle@dbser11 (FENC)  6                  92592
 155  oracle@dbser11 (CR00)  6                  92241_92602
 156  oracle@dbser11 (CR00)  6                  92245_92603
 158  oracle@dbser11 (CR00)  6                  92251_92604
 159  oracle@dbser11 (RS03)  6                  92219_92607
 160  oracle@dbser11 (CR00)  6                  92236_92608
 161  oracle@dbser11 (CR00)  6                  92262_92609
 162  oracle@dbser11 (CR00)  6                  92288_92610
 163  oracle@dbser11 (CR00)  6                  92275_92611
 165  oracle@dbser11 (RS05)  6                  92226_92614
 166  oracle@dbser11 (CR00)  6                  92271_92615
 167  oracle@dbser11 (CR00)  6                  92337_92616
 168  oracle@dbser11 (CR00)  6                  92286_92617
 169  oracle@dbser11 (CR00)  6                  92258_92620
 170  oracle@dbser11 (CR00)  6                  92332_92621
 171  oracle@dbser11 (RS00)  6                  92212_92623
 172  oracle@dbser11 (CR00)  6                  92282_92624
 173  oracle@dbser11 (CR00)  6                  92329_92625
 174  oracle@dbser11 (RS06)  6                  92231_92626
 175  oracle@dbser11 (CR00)  6                  92300_92627
 176  oracle@dbser11 (CR00)  6                  92353_92628
 177  oracle@dbser11 (CR00)  6                  92334_92629
 178  oracle@dbser11 (RS04)  6                  92223_92630
 179  oracle@dbser11 (CR00)  6                  92325_92631
 180  oracle@dbser11 (CR00)  6                  92295_92632
 181  oracle@dbser11 (CR00)  6                  92346_92633
 182  oracle@dbser11 (CR00)  6                  92310_92634
 183  oracle@dbser11 (CR00)  6                  92267_92635
 184  oracle@dbser11 (CR00)  6                  92279_92636
 185  oracle@dbser11 (CR00)  6                  92317_92637
 186  oracle@dbser11 (CR00)  6                  92304_92638
 187  oracle@dbser11 (CR00)  6                  92342_92639
 188  oracle@dbser11 (CR00)  6                  92355_92640
 189  oracle@dbser11 (CR00)  6                  92314_92641
 190  oracle@dbser11 (CR00)  6                  92321_92642
 191  oracle@dbser11 (CR00)  6                  92348_92643
 192  oracle@dbser11 (RS08)  6                  92241_92644
 193  oracle@dbser11 (RS09)  6                  92245_92645
 194  oracle@dbser11 (RS0A)  6                  92251_92646
 195  oracle@dbser11 (RS07)  6                  92236_92647
 196  oracle@dbser11 (RS0C)  6                  92262_92648
 197  oracle@dbser11 (RS0J)  6                  92288_92650
 198  oracle@dbser11 (RS0F)  6                  92275_92651
 200  oracle@dbser11 (RS0E)  6                  92271_92654
 201  oracle@dbser11 (RS0V)  6                  92337_92655
 202  oracle@dbser11 (RS0I)  6                  92286_92656
 203  oracle@dbser11 (RS0B)  6                  92258_92657
 204  oracle@dbser11 (RS0T)  6                  92332_92658
 205  oracle@dbser11 (RS0H)  6                  92282_92661
 206  oracle@dbser11 (RS0S)  6                  92329_92662
 207  oracle@dbser11 (RS0L)  6                  92300_92665
 208  oracle@dbser11 (RS0Z)  6                  92353_92666
 209  oracle@dbser11 (RS0U)  6                  92334_92667
 210  oracle@dbser11 (RS0R)  6                  92325_92668
 211  oracle@dbser11 (RS0K)  6                  92295_92669
 212  oracle@dbser11 (RS0X)  6                  92346_92670
 213  oracle@dbser11 (RS0N)  6                  92310_92673
 214  oracle@dbser11 (RS0D)  6                  92267_92674
 215  oracle@dbser11 (RS0G)  6                  92279_92675
 216  oracle@dbser11 (RS0P)  6                  92317_92676
 217  oracle@dbser11 (RS0M)  6                  92304_92678
 218  oracle@dbser11 (RS0W)  6                  92342_92679
 219  oracle@dbser11 (RS10)  6                  92355_92680
 220  oracle@dbser11 (RS0O)  6                  92314_92687
 221  oracle@dbser11 (RS0Q)  6                  92321_92703
 222  oracle@dbser11 (RS0Y)  6                  92348_92706
 224  oracle@dbser11 (IMR0)  6                  92712
 226  oracle@dbser11 (LCK0)  6                  92855
 229  oracle@dbser11 (CL00)  E                  124232
 275  oracle@dbser11 (CL01)  E                  124256
 277  oracle@dbser11 (CL02)  E                  124268
 278  oracle@dbser11 (CL03)  E                  124280
 281  oracle@dbser11 (CL04)  E                  124294

171 rows selected.

SQL>
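The filter BITAND(ksuprflg,4) = 4 in the query tests a single bit, and the third column prints the flag in hexadecimal. The short sketch below checks that the three flag values seen in the output (E, 6 and 16) all have that bit set. The names `CORE_PROCESS_BIT` and `has_core_bit` are made up for illustration, and reading this bit as "core process" follows this article's usage; x$ksupr is an undocumented structure.

```python
# Mask used by the query: BITAND(ksuprflg, 4) = 4.
# Treating this bit as "core process" is the article's reading, not an
# officially documented meaning.
CORE_PROCESS_BIT = 0x4


def has_core_bit(hex_flag: str) -> bool:
    """Return True if the hexadecimal flag string has the 0x4 bit set."""
    return (int(hex_flag, 16) & CORE_PROCESS_BIT) == CORE_PROCESS_BIT


if __name__ == "__main__":
    # E, 6 and 16 are the flag values printed in the query output.
    for flag in ("E", "6", "16"):
        value = int(flag, 16)
        print(f"0x{flag:>2} = {value:2d} = {value:05b}b  bit set: {has_core_bit(flag)}")
```

For example, E is 14 in decimal (binary 1110) and 16 is 22 in decimal (binary 10110), so both pass the same test as the plain value 6 (binary 110).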
Once the core processes CL02 and CL04 fail, the instance is certain to crash; that is beyond doubt. The CL04 trace shows that it, too, ran into a heap error:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.0.0
Build label: RDBMS_19.14.0.0.0DBRU_LINUX.X64_211224.3
ORACLE_HOME: /u01/app/oracle/product/19.0.0/dbhome_1
System name: Linux
Node name: xxxxx
Release: 3.10.0-1160.el7.x86_64
Version: #1 SMP Tue Aug 18 14:50:17 EDT 2020
Machine: x86_64
Instance name: xxxx
Redo thread mounted by this instance: 1
Oracle process number: 281
Unix process pid: 82221, image: oracle@xxxxx (CL04)
......
......
[TOC00000]
Jump to table of contents
Dump continued from file: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc
[TOC00001]
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
[TOC00001-END]
[TOC00002]
========= Dump for incident 2726200 (ORA 600 [17112]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
********** Internal heap ERROR 17112 addr=0xdb4ccbc0 *********
***** Dump of memory around addr 0xdb4ccbc0:
0DB4CBBC0 20202020 20202020 20202020 20202020 [ ]
Repeat 511 times
Decoding of possible comments in or near previous range
[0xdb4caa68] = 0x14edadd8 ==> kkqcscpfro:kglhd2
[0xdb4d2bf8] = 0x13f5fff0 ==> audRegFro:audtab
Looking further at the call stack, we can see the following:
Error Descriptor: ORA-600 [17112] [0x0DB4CCBC0] [] [] [] [] [] [] [] [] [] []
Error class: 0
Problem Key # of args: 1
Number of actions: 18
----- Incident Context Dump -----
Address: 0x7f8997316de0
Incident ID: 2726200
Problem Key: ORA 600 [17112]
Error: ORA-600 [17112] [0x0DB4CCBC0] [] [] [] [] [] [] [] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: kgherror_flag [KGH]<-- Signaling
[03]: kgherror_quar_chk [KGH]
[04]: kghfre [KGH]
[05]: kghfrh_internal [KGH]
[06]: kksFreeHeap [cursor]
[07]: kksLockRecovery [cursor]
[08]: kgxCleanup []
[09]: kksClearMutexSessionState [cursor]
[10]: kksCleanSessionState [cursor]
[11]: ksudlp_int [ksu]
[12]: ksudlp [ksu]
[13]: kss_del_cb [state_object]
[14]: kssxdl [state_object]
[15]: kssdel [state_object]
[16]: ksuxdl [ksu]
[17]: ksucln_dpc_cleanup [ksu]
[18]: ksucln_dpc_dfs [ksu]
[19]: ksucln_dpc_main [ksu]
[20]: ksucln_slave_main [ksu]
[21]: ksbdispatch [background_proc]
[22]: opirip [OPI]
[23]: opidrv [OPI]
[24]: sou2o []
[25]: opimai_real [OPI]
[26]: ssthrdmain []
[27]: main []
[28]: __libc_start_main []
MD [00]: 'SID'='6899.5088' (0x2)
MD [01]: 'ProcId'='281.2' (0x2)
MD [02]: 'Service'='SYS$BACKGROUND' (0x200)
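We can see that Oracle detected the heap error while calling kghfre to free memory, which ultimately triggered this problem. According to a colleague, similar problems had already appeared during earlier testing, including outages with the same errors, and they were traced to the same piece of business logic. In the end I recommended that the application team remove every nologging hint from that business logic. Checking again afterwards, the errors appear to have dropped sharply; at least the alert log no longer shows errors of this kind.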
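To read a stack like the one in the CL04 incident dump more systematically, the frames can be parsed and the signaling frame and the [KGH] frames picked out. This is a minimal sketch that assumes the frame format shown in the incident dump; `parse_stack` and `STACK_SAMPLE` are hypothetical names used only for this example.

```python
import re

# One frame looks like "[04]: kghfre [KGH]", optionally followed by
# "<-- Signaling" on the frame that raised the error.
FRAME_RE = re.compile(r"\[(\d{2})\]: (\w+) \[(\w*)\](\s*<-- Signaling)?")

# The first frames of the CL04 incident stack (illustrative sample only).
STACK_SAMPLE = """\
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: kgherror_flag [KGH]<-- Signaling
[03]: kgherror_quar_chk [KGH]
[04]: kghfre [KGH]
[05]: kghfrh_internal [KGH]
[06]: kksFreeHeap [cursor]
[07]: kksLockRecovery [cursor]
[08]: kgxCleanup []
"""


def parse_stack(text: str):
    """Return the frames as dicts with depth, function, component, signaling."""
    return [
        {
            "depth": int(depth),
            "function": function,
            "component": component,
            "signaling": bool(marker),
        }
        for depth, function, component, marker in FRAME_RE.findall(text)
    ]


if __name__ == "__main__":
    frames = parse_stack(STACK_SAMPLE)
    signaling = [f["function"] for f in frames if f["signaling"]]
    heap_frames = [f["function"] for f in frames if f["component"] == "KGH"]
    print("signaling frame:", signaling)
    print("KGH frames:", heap_frames)
```

Read from the bottom up, the sample shows the cursor cleanup code (kksFreeHeap) calling into the heap layer (kghfrh_internal, kghfre), where the check that raised the error was hit.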
For one of the ORA-00600 errors above, a search of Oracle MOS turns up a similar bug description, for reference:
Bug 28276054 – Various ORA-600 / ORA-7445 Internal Errors Raised When Using PLSQL Reset Session (Doc ID 28276054.8)
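However, that bug note offers no workaround, and it appears to say the issue is fixed from 19.7 onward, whereas we are running 19.14. Evidently a similar problem still exists.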
Finally, as to whether this crash was caused by an Oracle bug, I think that is highly likely.