今天一同事遇到一个600错误,比较怪异,发过来看一下,如下:
SQL> select inst_id,count(1) from gv$session group by inst_id;
select inst_id,count(1) from gv$session group by inst_id
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [],
[], []
SQL> select count(*) from gv$session group by inst_id;
select count(*) from gv$session group by inst_id
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [],
[], []
对于的trace call stack信息如下:
*** 2011-07-03 21:24:47.324
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Current SQL statement for this session:
select inst_id,count(:"SYS_B_0") from gv$session group by inst_id
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+64 call ksedst1() 000000000 ? 000000001 ?
ksedmp()+2176 call ksedst() 000000000 ?
C000000000000C9F ?
4000000003FEDB60 ?
000000000 ? 000000000 ?
000000000 ?
ksfdmp()+48 call ksedmp() 000000003 ?
kgerinv()+304 call ksfdmp() C000000000000612 ?
000000003 ?
4000000009441D90 ?
000030223 ? 000000000 ?
000000000 ?
kgeasnmierr()+144 call kgerinv() 6000000000031340 ?
40000000019EF530 ?
60000000000323F8 ?
40000000019EF530 ?
9FFFFFFFFFFF35C0 ?
$cold_kgeade()+64 call kgeasnmierr() 6000000000031340 ?
9FFFFFFFBF3A6310 ?
9FFFFFFFBF3A6320 ?
6000000000032770 ?
9FFFFFFFBF45E568 ?
C0000002EED97058 ?
000000000 ? 000000002 ?
kgerev()+96 call $cold_kgeade() 6000000000031340 ?
60000000000314C0 ?
9FFFFFFFBF3A6310 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ?
kserec0()+160 call kgerev() 6000000000031340 ?
9FFFFFFFBF3A6310 ?
000000000 ?
6000000000032760 ?
9FFFFFFFFFFF3688 ?
$cold_kxfpg1sg()+35 call kserec0() 000000000 ? 000000002 ?
52 9FFFFFFFFFFF35A0 ?
60000000000B5C78 ?
C0000000000017B7 ?
40000000044D3F70 ?
00003C9A3 ?
60000000000A4390 ?
kxfpgsg()+4320 call $cold_kxfpg1sg() C0000002EED97058 ?
000000000 ? 000000002 ?
C0000000000026D5 ?
60000000000A59F8 ?
C0000002E61BA758 ?
9FFFFFFFBF3D9990 ?
9FFFFFFFBF3D3D68 ?
kxfrAllocSlaves()+6 call kxfpgsg() 9FFFFFFFFFFF3FE0 ?
40 4000000003DAF660 ?
000000002 ? 000030021 ?
9FFFFFFFFFFF3A50 ?
000000000 ?
C000000000000B9F ?
000000000 ?
kxfrialo()+3680 call kxfrAllocSlaves() 0001A58D9 ?
9FFFFFFFBF3F68F0 ?
000000001 ?
9FFFFFFFFFFF3FF8 ?
000000002 ? 000000002 ?
9FFFFFFFBF39F7F0 ?
9FFFFFFFBF39F7F8 ?
kxfralo()+704 call kxfrialo() 9FFFFFFFFFFF4740 ?
4000000003E2D480 ?
00003832D ?
C0000000000017B7 ?
9FFFFFFFFFFF41C0 ?
9FFFFFFFFFFF4088 ?
0000010E4 ?
60000000000C2630 ?
qerpx_rowsrc_start( call kxfralo() 0001A58D9 ? 000000001 ?
)+2256 000000001 ?
C0000002ACADB728 ?
9FFFFFFFBF3F6998 ?
000000000 ?
C0000002ACADB6E8 ?
000000000 ?
qerpxStart()+1184 call qerpx_rowsrc_start( 9FFFFFFFBF3D5CA0 ?
) 9FFFFFFFFFFF4DD0 ?
60000000000B5C78 ?
9FFFFFFFFFFF53A0 ?
C000000000000CA1 ?
4000000003D39A40 ?
000038327 ? 000000000 ?
qergsStart()+1312 call qerpxStart() C0000002ACADBA70 ?
000000001 ? 000000000 ?
C000000000000DA3 ?
4000000002F7AC20 ?
000000000 ?
9FFFFFFFBF3D3F90 ?
9FFFFFFFBF3D3F8C ?
selexe()+1920 call qergsStart() 4000000001B008C0 ?
000000001 ?
60000000000B5C78 ?
opiexe()+7984 call selexe() C0000002ACB48C38 ?
9FFFFFFFFFFF5720 ?
000004678 ?
60000000000B5C78 ?
C0000000000026D5 ?
4000000002E626B0 ?
000038269 ? 000000000 ?
kpoal8()+3872 call opiexe() 9FFFFFFFFFFF70C0 ?
4000000002A89D80 ?
00001E915 ?
9FFFFFFFFFFF5440 ?
60000000000B5C78 ?
C0000000000012AD ?
60000000000C2638 ?
60000000000314C0 ?
opiodr()+2128 call kpoal8() 9FFFFFFFFFFF77F0 ?
C000000000001530 ?
9FFFFFFFFFFF9ED0 ?
9FFFFFFFFFFF7110 ?
60000000000B5C78 ?
9FFFFFFFBF3D3D90 ?
ttcpip()+1680 call opiodr() 00000005E ? 000000017 ?
4000000001B03730 ?
0000046B0 ?
9FFFFFFFFFFF7800 ?
opitsk()+2336 call ttcpip() 600000000003D0C0 ?
000000001 ?
9FFFFFFFFFFF9ED0 ?
000000001 ?
9FFFFFFFFFFFA040 ?
9FFFFFFFFFFF9E34 ?
4000000001BE9730 ?
000000000 ?
opiino()+1840 call opitsk() 000000000 ? 000000000 ?
60000000000B5C78 ?
40000000027FBFB0 ?
00001804D ?
4000000001B03748 ?
opiodr()+2128 call opiino() 00000003C ?
9FFFFFFFFFFFC890 ?
9FFFFFFFFFFFF030 ?
9FFFFFFFFFFFBD50 ?
60000000000B5C78 ?
C000000000001530 ?
opidrv()+1088 call opiodr() 00000003C ? 000000004 ?
4000000001B031E0 ?
0000046B0 ?
9FFFFFFFFFFFC8A0 ?
60000000000B5C78 ?
sou2o()+336 call opidrv() 00000003C ?
9FFFFFFFFFFFF030 ?
60000000000C2630 ?
opimai_real()+224 call sou2o() 9FFFFFFFFFFFF050 ?
00000003C ? 000000004 ?
9FFFFFFFFFFFF030 ?
main()+368 call opimai_real() 000000000 ?
9FFFFFFFFFFFF080 ?
main_opd_entry()+80 call main() 000000002 ?
9FFFFFFFFFFFF538 ?
60000000000B5C78 ?
C000000000000004 ?
在trace中搜索Plan Table,找到如下信息:
============
Plan Table
============
--------------------------------------------+-----------------------------------+-------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib |
--------------------------------------------+-----------------------------------+-------------------------+
| 0 | SELECT STATEMENT | | | | | | | | |
| 1 | SORT GROUP BY | | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | | | | |:Q1000| P->S |QC (RANDOM)|
| 4 | VIEW | GV$SESSION| | | | |:Q1000| PCWP | |
| 5 | MERGE JOIN | | | | | |:Q1000| PCWP | |
| 6 | FIXED TABLE FULL | X$KSLED | | | | |:Q1000| PCWP | |
| 7 | SORT JOIN | | | | | |:Q1000| PCWP | |
| 8 | FIXED TABLE FULL | X$KSUSE | | | | |:Q1000| PCWP | |
--------------------------------------------+-----------------------------------+-------------------------+
Content of other_xml column
===========================
db_version : 10.2.0.4
parse_schema : SYS
plan_hash : 303308595
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
OPT_PARAM('optimizer_index_cost_adj' 50)
RBO_OUTLINE
OUTLINE_LEAF(@"SEL$2")
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$2" "E"@"SEL$2")
FULL(@"SEL$2" "S"@"SEL$2")
LEADING(@"SEL$2" "E"@"SEL$2" "S"@"SEL$2")
USE_MERGE(@"SEL$2" "S"@"SEL$2")
END_OUTLINE_DATA
*/
从执行计划来看,上面报错的sql语句走了并行。执行如下的语句也会报错,
*** 2011-07-03 21:33:13.485
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Current SQL statement for this session:
select sid from gv$session_wait where event like :"SYS_B_0"
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+64 call ksedst1() 000000000 ? 000000001 ?
ksedmp()+2176 call ksedst() 000000000 ?
C000000000000C9F ?
4000000003FEDB60 ?
000000000 ? 000000000 ?
000000000 ?
ksfdmp()+48 call ksedmp() 000000003 ?
kgerinv()+304 call ksfdmp() C000000000000612 ?
000000003 ?
4000000009441D90 ?
000030223 ? 000000000 ?
000000000 ?
kgeasnmierr()+144 call kgerinv() 6000000000031340 ?
40000000019EF530 ?
60000000000323F8 ?
40000000019EF530 ?
9FFFFFFFFFFF3690 ?
$cold_kgeade()+64 call kgeasnmierr() 6000000000031340 ?
9FFFFFFFBF3B0040 ?
9FFFFFFFBF3B0050 ?
6000000000032770 ?
9FFFFFFFBF45E568 ?
C0000002E61C0BE0 ?
000000000 ? 000000002 ?
kgerev()+96 call $cold_kgeade() 6000000000031340 ?
60000000000314C0 ?
9FFFFFFFBF3B0040 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ?
kserec0()+160 call kgerev() 6000000000031340 ?
9FFFFFFFBF3B0040 ?
000000000 ?
6000000000032760 ?
9FFFFFFFFFFF3758 ?
$cold_kxfpg1sg()+35 call kserec0() 000000000 ? 000000002 ?
52 9FFFFFFFFFFF3670 ?
60000000000B5C78 ?
C0000000000017B7 ?
40000000044D3F70 ?
00003C9A3 ?
60000000000A4390 ?
kxfpgsg()+4320 call $cold_kxfpg1sg() C0000002E61C0BE0 ?
000000000 ? 000000002 ?
C0000000000026D5 ?
60000000000A59F8 ?
C0000002E61CCCF0 ?
9FFFFFFFBF3D9990 ?
9FFFFFFFBF3D4A48 ?
kxfrAllocSlaves()+6 call kxfpgsg() 9FFFFFFFFFFF40B0 ?
40 4000000003DAF660 ?
000000002 ? 000030021 ?
9FFFFFFFFFFF3B20 ?
000000000 ?
C000000000000B9F ?
000000000 ?
kxfrialo()+3680 call kxfrAllocSlaves() 0001A50DC ?
9FFFFFFFBF3F3A68 ?
000000001 ?
9FFFFFFFFFFF40C8 ?
000000002 ? 000000002 ?
9FFFFFFFBF39F7F0 ?
9FFFFFFFBF39F7F8 ?
kxfralo()+704 call kxfrialo() 9FFFFFFFFFFF4810 ?
4000000003E2D480 ?
00003832D ?
C0000000000017B7 ?
9FFFFFFFFFFF4290 ?
9FFFFFFFFFFF4158 ?
0000010E4 ?
60000000000C2630 ?
qerpx_rowsrc_start( call kxfralo() 0001A50DC ? 000000001 ?
)+2256 000000001 ?
C0000002CFE9A518 ?
9FFFFFFFBF3F3B10 ?
000000000 ?
C0000002CFE9A4D8 ?
000000000 ?
qerpxStart()+1184 call qerpx_rowsrc_start( 9FFFFFFFBF3D7CE0 ?
) 9FFFFFFFFFFF4EA0 ?
60000000000B5C78 ?
9FFFFFFFFFFF5470 ?
C000000000000CA1 ?
4000000003D39A40 ?
000038329 ? 000000000 ?
selexe()+1920 call qerpxStart() C0000002CFE9AB20 ?
000000001 ?
9FFFFFFFBF3D7A14 ?
opiexe()+7984 call selexe() C0000002CFEC3380 ?
9FFFFFFFFFFF57D0 ?
000004678 ?
60000000000B5C78 ?
C0000000000026D5 ?
4000000002E626B0 ?
000038269 ? 000000000 ?
opiall0()+2992 call opiexe() 9FFFFFFFFFFF7170 ?
4000000003021E60 ?
00001F201 ?
9FFFFFFFFFFF54F0 ?
60000000000B5C78 ?
C000000000001736 ?
60000000000C2638 ?
60000000000314C0 ?
opial7()+928 call opiall0() 9FFFFFFFFFFF7930 ?
4000000002419500 ?
000010207 ?
60000000000A3CE0 ?
000000040 ?
9FFFFFFFFFFF7A50 ?
000000000 ?
C000000000000593 ?
opiodr()+2128 call opial7() C000000000001530 ?
4000000002E34A00 ?
9FFFFFFFFFFF7970 ?
60000000000B5C78 ?
000018287 ?
9FFFFFFFFFFFA1C0 ?
ttcpip()+1680 call opiodr() 000000047 ? 00000000F ?
4000000001B03398 ?
0000046B0 ?
9FFFFFFFFFFF7AB0 ?
opitsk()+2336 call ttcpip() 600000000003D0C0 ?
000000001 ?
9FFFFFFFFFFFA180 ?
000000001 ?
9FFFFFFFFFFFA2F0 ?
9FFFFFFFFFFFA0E4 ?
4000000001BE9730 ?
000000000 ?
opiino()+1840 call opitsk() 000000000 ? 000000000 ?
60000000000B5C78 ?
40000000027FBFB0 ?
0000180CD ?
4000000001B033B0 ?
opiodr()+2128 call opiino() 00000003C ?
9FFFFFFFFFFFCB40 ?
9FFFFFFFFFFFF2E0 ?
9FFFFFFFFFFFC000 ?
60000000000B5C78 ?
C000000000001530 ?
opidrv()+1088 call opiodr() 00000003C ? 000000004 ?
4000000001B031E0 ?
0000046B0 ?
9FFFFFFFFFFFCB50 ?
60000000000B5C78 ?
sou2o()+336 call opidrv() 00000003C ?
9FFFFFFFFFFFF2E0 ?
60000000000C2630 ?
opimai_real()+224 call sou2o() 9FFFFFFFFFFFF300 ?
00000003C ? 000000004 ?
9FFFFFFFFFFFF2E0 ?
main()+368 call opimai_real() 000000000 ?
9FFFFFFFFFFFF330 ?
main_opd_entry()+80 call main() 000000002 ?
9FFFFFFFFFFFF7E0 ?
60000000000B5C78 ?
C000000000000004 ?
对于的执行计划如下:
============
Plan Table
============
------------------------------------------------+-----------------------------------+-------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib |
------------------------------------------------+-----------------------------------+-------------------------+
| 0 | SELECT STATEMENT | | | | | | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | | | | |:Q1000| P->S |QC (RANDOM)|
| 3 | VIEW | GV$SESSION_WAIT| | | | |:Q1000| PCWP | |
| 4 | MERGE JOIN | | | | | |:Q1000| PCWP | |
| 5 | FIXED TABLE FULL | X$KSLED | | | | |:Q1000| PCWP | |
| 6 | SORT JOIN | | | | | |:Q1000| PCWP | |
| 7 | FIXED TABLE FULL | X$KSUSECST | | | | |:Q1000| PCWP | |
------------------------------------------------+-----------------------------------+-------------------------+
Content of other_xml column
===========================
db_version : 10.2.0.4
parse_schema : ITNMAGT
plan_hash : 3419475036
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
OPT_PARAM('optimizer_index_cost_adj' 50)
RBO_OUTLINE
OUTLINE_LEAF(@"SEL$3")
OUTLINE_LEAF(@"SEL$F5BB74E1")
MERGE(@"SEL$2")
OUTLINE(@"SEL$1")
OUTLINE(@"SEL$2")
FULL(@"SEL$3" "E"@"SEL$3")
FULL(@"SEL$3" "S"@"SEL$3")
LEADING(@"SEL$3" "E"@"SEL$3" "S"@"SEL$3")
USE_MERGE(@"SEL$3" "S"@"SEL$3")
END_OUTLINE_DATA
*/
2次的call stakc完全一样,这样来看,似乎是查询gv$视图就好出现这个600错误,怪哉。
查metalink,看了差不多10多篇文章,都分别对比了一下,发现如下2个比较靠谱:
Bug 8339221: ORA-600 [KGEADE_IS_0], [], [],
Bug 7504296: ORA-600 [KGEADE_IS_0] ERRORS IF BOTH INSTANCES ARE UP
特别的下面这个bug,基本上call stack都完全符合。
另外在一篇文章中发现了可能还跟参数parallel_execution_message_size 有关系,解释如下:
Symptoms
Query against gv$ views on RAC system like
select INST_ID, NAME, VALUE
from gv$parameter
where NAME ='parallel_threads_per_cpu';
.
Would fail with
.
ERROR at line 1:
ORA-12850: Could not allocate slaves on all specified instances: 2 needed, 1
allocated
ORA-12801: error signaled in parallel query server PZ99, instance xxxx
ORA-00000: normal, successful completion
or instead of the ORA-0 with a ORA-600 [Kgeade_is_0]
There was a internal bug fix that a ORA-0 should be reported as ORA-600 [Kgeade_is_0].
And this was merged into the 10.2.0.4 patchset.
Cause
Often the cause is a different size of parallel_execution_message_size. If parallel_execution_message_size is not the same
on all nodes in a cluster, a parallel query can not spawn on all nodes parallel execution servers what is need for a query
against a gv$ view.
A different size of parallel_execution_message_size as example can be caused by different setting of parallel_automatic_tuning.
This parameter implicit change the default of parallel_execution_message_size.
Solution
To fix the you can explicit set parallel_execution_message_size to the same value cluster-wide
恰恰这套rac的2个节点目前该参数的值不一样,节点1是2152,节点2是16384,(也就是默认值)。
不过根据经验,不见得一定是这个参数导致,根据同事的描述,该库以前是正常的,而且以前该值也是
出现问题以后才调整过的,所以这个参数的可能性不大。
总结一下,我非常怀疑是后面的那个bug,不过这一切还需要进一步验证。
补充:同事最后调整这个参数以后,测试ok了,难道真是这个参数?
发表回复