cursor_sharing+outline的问题

昨天同事遇到一个有些怪异的问题,使用了outlines,但是SQLID执行仍然出现执行计划错误的情况,
导致大量的read by other session等待,严重影响业务,最后通过revoke和grant来解决了。
下面的问题就是如何去找到当时的问题所在?换句话说,当时的那个sql语句为什么不走index scan而
选择低效的full table scan呢?

错误的执行计划如下:
Plan hash value: 1414349908

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                        |       |       |       |   100K(100)|          |       |       |
|   1 |  SORT ORDER BY                   |                        |   157K|    16M|    35M|   100K  (2)| 00:20:03 |       |       |
|   2 |   HASH UNIQUE                    |                        |   157K|    16M|    35M| 96389   (2)| 00:19:17 |       |       |
|   3 |    HASH JOIN                     |                        |   157K|    16M|       | 92578   (2)| 00:18:31 |       |       |
|   4 |     TABLE ACCESS FULL            | ACCT_ITEM_TYPE         |  2817 | 73242 |       |    15   (0)| 00:00:01 |       |       |
|   5 |     HASH JOIN                    |                        |   157K|    12M|       | 92561   (2)| 00:18:31 |       |       |
|   6 |      TABLE ACCESS FULL           | BILLING_CYCLE          |   156 |  3900 |       |    15   (0)| 00:00:01 |       |       |
|   7 |      HASH JOIN                   |                        |   158K|  8647K|       | 92544   (2)| 00:18:31 |       |       |
|   8 |       TABLE ACCESS BY INDEX ROWID| A_PAY_OWE_STATE        |     3 |    21 |       |     2   (0)| 00:00:01 |       |       |
|   9 |        INDEX RANGE SCAN          | INDX_PAYMENT_METHOD_01 |     3 |       |       |     1   (0)| 00:00:01 |       |       |
|  10 |       PARTITION HASH SINGLE      |                        |   368K|    17M|       | 92538   (2)| 00:18:31 |   KEY |   KEY |
|  11 |        TABLE ACCESS FULL         | ACCT_ITEM_OWE          |   368K|    17M|       | 92538   (2)| 00:18:31 |   KEY |   KEY |
-----------------------------------------------------------------------------------------------------------------------------------
下面是正确的执行计划:
Plan hash value: 2474252999

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                        |       |       |   198 (100)|          |       |       |
|   1 |  SORT ORDER BY                           |                        |    52 |  5304 |   198   (3)| 00:00:03 |       |       |
|   2 |   HASH UNIQUE                            |                        |    52 |  5304 |   197   (2)| 00:00:03 |       |       |
|   3 |    HASH JOIN                             |                        |    52 |  5304 |   196   (2)| 00:00:03 |       |       |
|   4 |     HASH JOIN                            |                        |    52 |  4004 |   181   (2)| 00:00:03 |       |       |
|   5 |      TABLE ACCESS BY INDEX ROWID         | A_PAY_OWE_STATE        |     3 |    21 |     2   (0)| 00:00:01 |       |       |
|   6 |       INDEX RANGE SCAN                   | INDX_PAYMENT_METHOD_01 |     3 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      MERGE JOIN                          |                        |   121 |  8470 |   178   (1)| 00:00:03 |       |       |
|   8 |       TABLE ACCESS BY INDEX ROWID        | BILLING_CYCLE          |    70 |  1470 |     4   (0)| 00:00:01 |       |       |
|   9 |        INDEX FULL SCAN                   | PK_BILLING_CYCLE       |   157 |       |     1   (0)| 00:00:01 |       |       |
|  10 |       SORT JOIN                          |                        |   132 |  6468 |   174   (1)| 00:00:03 |       |       |
|  11 |        TABLE ACCESS BY GLOBAL INDEX ROWID| ACCT_ITEM_OWE          |   132 |  6468 |   173   (0)| 00:00:03 | ROW L | ROW L |
|  12 |         INDEX RANGE SCAN                 | IDX_ACCT_ITEM_ACCT_ID  |   132 |       |     9   (0)| 00:00:01 |       |       |
|  13 |     TABLE ACCESS FULL                    | ACCT_ITEM_TYPE         |  2845 | 71125 |    15   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------------------
我们可以发现,区别就在是否使用了index IDX_ACCT_ITEM_ACCT_ID,进而导致整个执行计划差别非常之大,
还可以看到正常的情况下使用了merge join。

处理问题的思路是这样的:
1. 首先查看outline是否正常使用

SQL> select owner,name,used,version,ENABLED from DBA_OUTLINES
  2  where owner='ACCT';

OWNER     NAME                           USED VERSION       ENABLED
--------- ------------------------------ ------------------- --------
ACCT      SYS_OUTLINE_11091615175537003  USED 10.2.0.4.0    ENABLED

-- outline正常。
2. 检查outline定义是否包含index hint IDX_ACCT_ITEM_ACCT_ID.

SQL> select ol_name, hint#, hint_text
  2  from OUTLN.ol$hints
  3  order by ol_name, hint#;

OL_NAME                        HINT# HINT_TEXT
------------------------------ ----- ------------------------------------------------------------------------------------------
SYS_OUTLINE_11091615175537003      1 USE_HASH(@"SEL$5DA710D3" "D"@"SEL$1")
SYS_OUTLINE_11091615175537003      2 USE_HASH(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2")
SYS_OUTLINE_11091615175537003      3 USE_MERGE(@"SEL$5DA710D3" "A"@"SEL$1")
SYS_OUTLINE_11091615175537003      4 LEADING(@"SEL$5DA710D3" "H"@"SEL$1" "A"@"SEL$1" "A_PAY_OWE_STATE"@"SEL$2" "D"@"SEL$1")
SYS_OUTLINE_11091615175537003      5 FULL(@"SEL$5DA710D3" "D"@"SEL$1")
SYS_OUTLINE_11091615175537003      6 INDEX_RS_ASC(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2" ("A_PAY_OWE_STATE"."PAYMENT_METHOD"))
SYS_OUTLINE_11091615175537003      7 INDEX_RS_ASC(@"SEL$5DA710D3" "A"@"SEL$1" ("ACCT_ITEM_OWE"."ACCT_ID"))
SYS_OUTLINE_11091615175537003      8 INDEX(@"SEL$5DA710D3" "H"@"SEL$1" ("BILLING_CYCLE"."BILLING_CYCLE_ID"))
SYS_OUTLINE_11091615175537003      9 OUTLINE(@"SEL$2")
SYS_OUTLINE_11091615175537003     10 OUTLINE(@"SEL$1")
SYS_OUTLINE_11091615175537003     11 UNNEST(@"SEL$2")
SYS_OUTLINE_11091615175537003     12 OUTLINE_LEAF(@"SEL$5DA710D3")
SYS_OUTLINE_11091615175537003     13 ALL_ROWS
SYS_OUTLINE_11091615175537003     14 OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
SYS_OUTLINE_11091615175537003     15 IGNORE_OPTIM_EMBEDDED_HINTS

15 rows selected.

从INDEX_RS_ASC(@"SEL$5DA710D3" "A"@"SEL$1" ("ACCT_ITEM_OWE"."ACCT_ID"))看,outline是有的。
3. 检查相关的对象在故障期间之前是否做过相应的ddl操作

SQL> select OWNER,OBJECT_NAME,SUBOBJECT_NAME,LAST_DDL_TIME,STATUS
  2  from dba_objects
  3  where OBJECT_NAME='ACCT_ITEM_OWE';

OWNER    OBJECT_NAME      SUBOBJECT_NAME         LAST_DDL_TIM STATUS
-------- ---------------- ---------------------- ------------ -------
COMM     ACCT_ITEM_OWE                           02-MAR-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_0      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_1      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_2      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_3      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_4      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_5      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_6      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_7      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_8      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE    P_ACCT_ITEM_OWE_9      28-FEB-09    VALID
ACCT     ACCT_ITEM_OWE                           19-SEP-11    VALID

12 rows selected.

SQL> select OWNER,OBJECT_NAME,SUBOBJECT_NAME,LAST_DDL_TIME,STATUS from
  2  dba_objects where OBJECT_NAME='IDX_ACCT_ITEM_ACCT_ID';

OWNER       OBJECT_NAME             SUBOBJECT_NAME    LAST_DDL_TIM STATUS
----------- ----------------------- ----------------- ------------ -------
ACCT        IDX_ACCT_ITEM_ACCT_ID                     28-FEB-09    VALID

检查发现, 9月18号并无DDL操作。
4. 查询该SQLID是否存在多个child number或存在多个执行计划
   昨天下午查询v$sql已经找不到那个SQLID了,通过如下查询发现:

SQL> select sql_id,CHILD_NUMBER,SQL_TYPE_MISMATCH,OPTIMIZER_MISMATCH,OUTLINE_MISMATCH,BIND_MISMATCH,STATS_ROW_MISMATCH
  2  from V$SQL_SHARED_CURSOR
  3  where SQL_ID='&id';

Enter value for id: 4u9y643v2k8fw
old   2: from V$SQL_SHARED_CURSOR where SQL_ID='&id'
new   2: from V$SQL_SHARED_CURSOR where SQL_ID='4u9y643v2k8fw'

SQL_ID        CHILD_NUMBER S O O B S
------------- ------------ - - - - -
4u9y643v2k8fw            0 N N N N N
4u9y643v2k8fw            1 N N Y N N
4u9y643v2k8fw            2 N N N N N
4u9y643v2k8fw            3 N N Y N N
4u9y643v2k8fw            4 N N Y N N
4u9y643v2k8fw            5 N N Y N N
4u9y643v2k8fw            6 N N N N N
4u9y643v2k8fw            7 N N N N N
4u9y643v2k8fw            8 N N N N N
4u9y643v2k8fw            9 N N N N N
4u9y643v2k8fw           10 N N N N N
4u9y643v2k8fw           11 N N N N N
4u9y643v2k8fw           12 N N N N N
4u9y643v2k8fw           13 N N N N N
4u9y643v2k8fw           14 N N N N N
4u9y643v2k8fw           15 N N N N N
4u9y643v2k8fw           16 N N N N N
4u9y643v2k8fw           17 N N N N N
4u9y643v2k8fw           18 N N N N N
4u9y643v2k8fw           19 N N N N N
4u9y643v2k8fw           20 N N N N N

21 rows selected.

备注:这里的第2个O是指OUTLINE_MISMATCH。

可以发现该sqlid出现了多个child cursor,另外检查发现参数cursor_sharing设置为simlair了。
该sqlid sql version肯定也较高。

这里有一点让我不解的是,通过DBA_HIST_*相关视图查询,发现当时出问题的时候的sqlid的child number
跟V$SQL_SHARED_CURSOR并不相同,当然这也不能完全说明问题,很可能sql已经被flush out多次了。

通过查hist视图,可以发现当时故障期间出现大量的read by other session等待:

  EVENT                    COUNT(*) SESSION_ID SESSION SQL_ID        SQL_CHILD_NUMBER SQL_PLAN_HASH_VALUE FORCE_MATCHING_SIGNATURE TIME_WAITED
----------------------- --------- ---------- ------- ------------- ---------------- ------------------- ------------------------ -----------
db file scattered read          1       1382 WAITING 4u9y643v2k8fw               18          1414349908                        0        4502
read by other session           1       1398 WAITING 4u9y643v2k8fw               18          1414349908                        0        8099
read by other session           1       1404 WAITING 4u9y643v2k8fw               18          1414349908                        0          89
db file scattered read          1       1424 WAITING 4u9y643v2k8fw               18          1414349908                        0        8483
read by other session           1       1633 WAITING 4u9y643v2k8fw               18          1414349908                        0       10470
read by other session           1       2081 WAITING 4u9y643v2k8fw               18          1414349908               4.3893E+18        2047
read by other session           1       2239 WAITING 4u9y643v2k8fw               18          1414349908                        0        5403
read by other session           1        685 WAITING 4u9y643v2k8fw               18          1414349908                        0       11577
read by other session           1       1094 WAITING 4u9y643v2k8fw               18          1414349908                        0        6340
read by other session           1       1168 WAITING 4u9y643v2k8fw               18          1414349908               4.3893E+18        1427
read by other session           1       1304 WAITING 4u9y643v2k8fw               18          1414349908                        0       12656
5. 查看sqlid是否使用了outline

SQL> select SQL_ID,CHILD_NUMBER
  2  from v$sql_plan
  3  where sql_id='4u9y643v2k8fw';

SQL_ID        CHILD_NUMBER
------------- ------------
4u9y643v2k8fw           12
4u9y643v2k8fw           12
.....
4u9y643v2k8fw           12
4u9y643v2k8fw            2
4u9y643v2k8fw            2
4u9y643v2k8fw            2
......
4u9y643v2k8fw            2
4u9y643v2k8fw            2

28 rows selected.

下午查询时只有number为2 12

SQL> SELECT info outline_name
  2    FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
  3           FROM v$sql_plan p,
  4                TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
  5                                             '/other_xml/info'
  6                                            )
  7                                   )
  8                      ) d
  9          WHERE other_xml IS NOT NULL
  10           AND ID = 1
  11           AND sql_id = '4u9y643v2k8fw'
  12           AND child_number = 3)
  13 WHERE r = 4;

no rows selected

SQL> SELECT info outline_name
  2    FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
  3            FROM v$sql_plan p,
  4                 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
  5                                              '/other_xml/info'
  6                                             )
  7                                    )
  8                       ) d
  9           WHERE other_xml IS NOT NULL
  10            AND ID = 1
  11            AND sql_id = '4u9y643v2k8fw'
  12            AND child_number = 4)
  13  WHERE r = 4;

no rows selected

SQL> SELECT info outline_name
  2    FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
  3            FROM v$sql_plan p,
  4                 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
  5                                              '/other_xml/info'
  6                                             )
  7                                    )
  8                       ) d
  9           WHERE other_xml IS NOT NULL
  10            AND ID = 1
  11            AND sql_id = '4u9y643v2k8fw'
  12            AND child_number = 5)
  13  WHERE r = 4;

no rows selected

SQL> SELECT info outline_name
  2    FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
  3            FROM v$sql_plan p,
  4                 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
  5                                              '/other_xml/info'
  6                                             )
  7                                    )
  8                       ) d
  9           WHERE other_xml IS NOT NULL
  10            AND ID = 1
  11            AND sql_id = '4u9y643v2k8fw'
  12            AND child_number = 2)
  13  WHERE r = 4;

OUTLINE_NAME
--------------------------------------------------------------------------------
"SYS_OUTLINE_11091615175537003"

SQL> SELECT info outline_name
  2    FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
  3            FROM v$sql_plan p,
  4                 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
  5                                              '/other_xml/info'
  6                                             )
  7                                    )
  8                       ) d
  9           WHERE other_xml IS NOT NULL
  10            AND ID = 1
  11            AND sql_id = '4u9y643v2k8fw'
  12            AND child_number = 12)
  13  WHERE r = 4;

OUTLINE_NAME
--------------------------------------------------------------------------------
"SYS_OUTLINE_11091615175537003"

可以看到,该sql的当前的2个child cursor都是使用了outline的。
但是这说明不了问题,既然该sql存在多个child cursor,那么必然存在多个plan hash value值。
这里我们需要去查询那个出问题的sqlid对应的plan_hash_value所对应的执行计划。
虽然说是有awr快照,但是通过这个方式已经查不到信息了。

select * from table(dbms_xplan.display_awr('&sql_id','&child_number',null,'advanced'));

我这里使用的如下的查询方式:

SQL> set long 1000000
SQL> set linesize 200

SQL> SELECT plan_hash_value, XMLTYPE (other_xml).EXTRACT ('/*')
  2    FROM dba_hist_sql_plan
  3   WHERE sql_id = '4u9y643v2k8fw'
  4     AND plan_hash_value IN (1414349908)
  5     AND DBMS_LOB.SUBSTR (other_xml, 1000) LIKE '%
  10.2.0.4
  
  1414349908
  
    c10c
    c103
    313042

PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------
    313044
    c6481504081c43
  
  
    
    
    
    
    

PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------
    
    
    
    
    
    
    
    
    

PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------
    
    
    
  


从上面的查询来看,当时是使用了outline,但是问题出在这里:
    
    
    
也就是说正常情况下,这3条当中至少有1条是ACCT_ID的index rang scan才对。

今天上午,和同事沟通,说又出问题了,同事将参数cursor_sharing调为exact了,即默认值。
虽然这样会导致很多sql语句的硬解析,但是听他说上午调整以后到11点时,还没出现问题。
跟itpub版主棉花糖ONE交流了一下,他认为很可能是outline和cursor_sharing的bug。

但是我仍然觉得cursor_sharing可能不是真正的原因,因为ACCT_ID列是应用绑定变量列,
并不是通过参数cursor_sharing的设置来实现绑定变量的。

最后总结一下:

9i以后都不推荐再用outline了,感觉不太稳定,推荐使用sql profile来固定执行计划。
至于最后的结论到底是outline的bug还是cursor_sharing=similar的bug,现在还确定不了,只是可能性非常大。

评论

  1. CrazyTony 的头像
    CrazyTony

    精彩!Roger的BLOG越来越好了。

  2. 第二个执行计划有很大问题

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注