11.2.0.4版本仍然存在的一个未修复Bug ora-00600 [13013]

之前某客户的报表系统在运行业务期间遭遇Ora-00600 [13013]错误;该错误本质上来讲是非常常见的;如果大家搜索也会发现有部分bug的存在。然而根据的要求,必须要有一个最终结论。熟悉我们来看下报错:

ORA-00600: internal error code, arguments: [13013], [5001], [3636415], [98890439], [54], [98890439], [17], [], [], [], [], []

针对该错误,Oracle mos有文档进行了解释,这里不多说:

ORA-600 [13013] [a] [b] [c] [d] [e] [f]

Arg [a] Passcount
Arg [b] Data Object number
Arg [c] Tablespace Relative DBA of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Relative DBA of block being updated (should be same as [c])
Arg [f] Code

根据查询我们可以很容易定位到是什么表:

SQL> select owner,object_name,object_type,object_id,data_object_id from dba_objects where  data_object_id=3636415;

OWNER      OBJECT_NAME                    OBJECT_TYPE          OBJECT_ID DATA_OBJECT_ID
---------- ------------------------------ ------------------- ---------- --------------
FSD        PT_PARTY_FIN_REPT_DTL_H_200404 TABLE                  1061458        3636415

这里为什么表名称带时间呢?我们进一步看下报错的业务代码:

UPDATE FSD.PT_PARTY_FIN_REPT_DTL_H DEST
   SET EDATE = :B2
 WHERE NOT EXISTS (SELECT REPORT_NO, REPORT_DATA_DATE
          FROM TMPTBL.PT_PARTY_FIN_REPT_DTL_H_TMP01 SOUR
         WHERE DEST.REPORT_NO = SOUR.REPORT_NO
           AND DEST.REPORT_DATA_DATE = SOUR.REPORT_DATA_DATE
           AND DEST.COMPANY = SOUR.COMPANY)
   AND :B2 BETWEEN DEST.SDATE AND DEST.EDATE
   AND :B1 = DEST.EDATE是

其中上述代码为存储过程FSD.PR_PT_PARTY_FIN_REPT_H 中一部分代码;这里之所以出现了表名称带月份的原因是因为存储过程在执行时传入了相关参数;这一点从报错的trace文件中也可以验证这一点:

ORA-00600: internal error code, arguments: [13013], [5001], [3636415], [98890439], [54], [98890439], [17], [], [], [], [], []

……

al8sqlp: at 0x7ffee7411240

[49474542 5250204E 5F54505F 54524150]  [BEGIN PR_PT_PART]

[49465F59 45525F4E 445F5450 485F4C54]  [Y_FIN_REPT_DTL_H]

[2C313A28 3B29323A 444E4520 DE54003B]  [(:1,:2); END;.T.]

我们进一步来看下trace文件的堆栈信息:

----- Incident Context Dump -----
Address: 0x7ffee74055b0
Incident ID: 1001549
Problem Key: ORA 600 [13013]
Error: ORA-600 [13013] [5001] [3636415] [98890439] [54] [98890439] [17] [] [] [] [] []
[00]: dbgexProcessError [diag_dde]
[01]: dbgeExecuteForError [diag_dde]
[02]: dbgePostErrorKGE [diag_dde]
[03]: dbkePostKGE_kgsf [rdbms_dde]
[04]: kgeade []
[05]: kgeriv_int []
[06]: kgeriv []
[07]: kgesiv []
[08]: ksesic6 [KSE]
[09]: updThreePhaseExe [DML]<-- Signaling
[10]: updexe [DML]
[11]: opiexe []
[12]: opipls []
[13]: opiodr []
[14]: rpidrus []
[15]: skgmstack []
[16]: rpiswu2 []
[17]: rpidrv []
[18]: psddr0 [PLSQL_PSD_Generic]
[19]: psdnal [PLSQL_PSD_Generic]
[20]: pevm_EXECC [PLSQL_Code_Execution]
[21]: pfrinstr_EXECC [PLSQL_Code_Execution]
[22]: pfrrun_no_tool [PLSQL_Code_Execution]
[23]: pfrrun [PLSQL_Code_Execution]
[24]: plsql_run [PLSQL_PSD_Standalones]
[25]: peicnt [PLSQL_Code_Execution]
[26]: kkxexe []
[27]: opiexe []
[28]: kpoal8 []
[29]: opiodr []
[30]: ttcpip []
[31]: opitsk []
[32]: opiino []
[33]: opiodr []
[34]: opidrv []
[35]: sou2o []
[36]: opimai_real []
[37]: ssthrdmain []
[38]: main []

从堆栈来看在update时报错。由于该问题发生的时间比较久了,用户之前已经对表进行了重建,因此现在在客户环境无法再重现了。因此我们只能自己想办法了。

经过分析Oracle 有2个bug的描述是非常像的。

Bug 16086769 – ORA-600 [13011] ORA-600 [13013] when executing a DML if the WHERE clause includes an added column with a default value (Doc ID 16086769.8)

Bug 12345717 – ORA-600 [13013] or hang/spin from MERGE into table with added column (Doc ID 12345717.8)

其中下面这篇文档虽然提及到的是merge报错,但是文中提到的堆栈信息基本上一致;同时该文档提供了脚本的测试验证脚本;这里我们在测试环境中(版本和PSU跟客户环境一致)进行了测试并复现了该问题。如下是简单的测试过程:

SQL> CREATE TABLE TAB1 ( ID_NACE NUMBER(5) );
insert into TAB1 values(0);

ALTER TABLE TAB1 ADD (

Table dropped.

SQL>
Table created.

SQL>
1 row created.

SQL> SQL>   2     ID_INDUSTRY  NUMBER(5) DEFAULT -1 NOT NULL
  3   );


Table altered.

SQL> SQL>  ALTER TABLE TAB1 ADD (
  2     ID_ind2  varchar2(1) DEFAULT 'U' NOT NULL
  3   );

Table altered.

SQL>
SQL> ALTER TABLE TAB1 ADD (
  2     err  varchar2(30)
  3   );

Table altered.

SQL>
SQL>  insert into TAB1 values(1, 1, 'C',null);

1 row created.

SQL>  insert into TAB1 values(2, 2, 'C',null);

1 row created.

SQL>  drop table tab2;
 CREATE TABLE TAB2 (
   ID_NACE      NUMBER(5),

Table dropped.

SQL>   2    3     ID_INDUSTRY  NUMBER(5)
  4   );

Table created.

SQL>  insert into TAB2 values(1, 3);

1 row created.

SQL>  commit;

Commit complete.

SQL>
SQL> oradebug setmypid
Statement processed.
SQL> alter session set events 'TRACE[DML] disk=highest';

Session altered.

SQL> update tab1
  2  set err = 'xxxxx'
  3  where id_ind2 = 'C'
  4    and not exists (select 1 from tab2 where tab2.id_nace = tab1.id_nace and
  5  tab2.ID_INDUSTRY = tab1.ID_INDUSTRY);
update tab1
       *
ERROR at line 1:
ORA-00600: internal error code, arguments: [13013], [5001], [80978],
[20972967], [1], [20972967], [17], [], [], [], [], []

进一步分析上述trace文件发现,确实存在针对列的compcol操作:

kflag

[0] CMPCOL

cmpp (2) c1 02

[1] CMPCOL

cmpp (2) c1 03

[2] CMPCOL

cmpp (1) 43

[3] UPDCOL

updp (5) 78 78 78 78 78

updrowFastPath: kddlkr objn 80978 table 0  rowid 00013c52.014005a7.1 code 17

updThreePhaseExe:objn=80978 pass=4999 stat=2 err=17

updThreePhaseExe:began locking pass 5000

这里tab1测试表前面2个列是含非null default的。因此存在cmpcol操作。对于第三个不含not null default的,则操作为updcol。

下面我们来调整数据库参数规避该问题:

SQL> alter system set "_add_col_optim_enabled"=false;

System altered.

SQL> @hide
SQL> set echo off
Enter Search Parameter (i.e. max|all) : add_col_optim

PARAMETER                                SESSION_VALUE        INSTANCE_VALUE       DESCRIPTION
---------------------------------------- -------------------- -------------------- ------------------------------------------------------------
_add_col_optim_enabled                   FALSE                FALSE                Allows new add column optimization

SQL> drop table tab1;
CREATE TABLE TAB1 ( ID_NACE NUMBER(5) );
insert into TAB1 values(0);

ALTER TABLE TAB1 ADD (
   ID_INDUSTRY  NUMBER(5) DEFAULT -1 NOT NULL
 );


Table dropped.

SQL>  ALTER TABLE TAB1 ADD (

Table created.

SQL>    ID_ind2  varchar2(1) DEFAULT 'U' NOT NULL
 );

1 row created.

SQL> SQL>   2    3
Table altered.

SQL> SQL>   2    3

Table altered.

SQL> SQL> ALTER TABLE TAB1 ADD (
  2     err  varchar2(30)
  3   );


Table altered.

SQL> SQL>  insert into TAB1 values(1, 1, 'C',null);

1 row created.

SQL>  insert into TAB1 values(2, 2, 'C',null);

1 row created.

SQL>  drop table tab2;
 CREATE TABLE TAB2 (
   ID_NACE      NUMBER(5),
   ID_INDUSTRY  NUMBER(5)

Table dropped.

SQL>   2    3    4   );

Table created.

SQL>  insert into TAB2 values(1, 3);

1 row created.

SQL>  commit;

Commit complete.

SQL>
SQL> oradebug setmypid
Statement processed.
SQL> alter session set events 'TRACE[DML] disk=highest';

Session altered.

SQL> update tab1
  2  set err = 'xxxxx'
  3  where id_ind2 = 'C'
  4    and not exists (select 1 from tab2 where tab2.id_nace = tab1.id_nace and
  5  tab2.ID_INDUSTRY = tab1.ID_INDUSTRY);

2 rows updated.

此时的dml trace中的内容如下:

updSetExecCmpColInfo: not RHS: objn=80976, cid=1
  kduukcmpf=0x7ffff50942da, kduukcmpl=0x7ffff50942d8, kduukcmpp=(nil)
updSetExecCmpColInfo: not RHS: objn=80976, cid=2
  kduukcmpf=0x7ffff5094322, kduukcmpl=0x7ffff5094320, kduukcmpp=(nil)
updSetExecCmpColInfo: not RHS: objn=80976, cid=3
  kduukcmpf=0x7ffff509436a, kduukcmpl=0x7ffff5094368, kduukcmpp=(nil)
updThreePhaseExe: objn=80976 phase=NOT LOCKED
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x0000.000c17b5  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  96sch: scn: 0x0000.00000000  mascn: (scn: 0x0000.000c167f) env: (scn: 0x0000.00000000  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  512sch: scn: 0x0000.00000000  mascn: (scn: 0x0000.00000000)

可见此时是没有任何问题的,通过调整该参数成功避免了该问题。

该问题本质上是Oracle 11.1引入的一个新特性;然而也引入了一些Bug。从测试来看在11.2.0.4.180717 这个版本都仍然存在。不过我们测试12.2.0.1版本已经不存在这个问题(既然参数保持默认值)。


评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注