案例:Troubleshooting library cache lock(SQL AREA BUILD)

本案例来自南区某客户,数据库版本为rac 19c。故障现象为2节点大约凌晨2点开始,堆积了大量等待library cache lock和library cache:mutex X的session。一直持续了几分钟自动恢复正常。

Snap Id Snap Time Sessions Cursors/Session Instances
Begin Snap: 4250 18-Oct-22 02:00:20 1101 12.5 2
End Snap: 4251 18-Oct-22 03:00:13 1106 12.0 2
Elapsed: 59.89 (mins)
DB Time: 14,394.56 (mins)

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Avg Wait % DB time Wait Class
library cache lock 1,931,334 569.8K 295.05ms 66.0 Concurrency
library cache: mutex X 3,686,589 205.1K 55.63ms 23.7 Concurrency
DB CPU 36K 4.2

从awr的一些基本信息来看,确实lc lock和lc mutex x等待非常严重。通常关于mutex类的争用,可以去看看awr中Mutex Sleep Summary部分。

Mutex Sleep Summary

  • ordered by number of sleeps desc
Mutex Type Location Sleeps Wait Time (ms)
Library Cache kgllkal2 81 18,969,541 64,710,767
Library Cache kgllkdl1 85 5,079,086 26,691,704
Library Cache kglhdgn2 106 1,383,344 6,754,413
  • kgllkdl – kernel generic library cache management library cache lock delete
  • kgllkal – kernel generic library cache management library cache lock allocate
  • kglhdgn – kernel generic library cache management object handle get handle, if it doesn’t exist, create it

根据sleep的情况来看,mutex的争用主要来自kgllkal、kgllkdl、kglhdgn,分别对应lc lock的分配和释放。而kglhdgn对应的是library cache handle。

我们知道在父游标句柄中保存有SQL文本。搜索hash链表的目的,就是对比每个父游标句柄中的SQL文本,找到目标父游标句柄。找到之后,要再次以独占方式申请持有类型为“Library Cache”的Mutex,如果遇到竞争,此处的等待事件也是library cache: mutex X,为了加以区分,此处的Mutex Miss通常是“kglhdgn2 106”。
在此处的Mutex保护下,进程将进一步获得父游标句柄上的Library cache lock。Library cache lock获得成功,Mutex才会释放。这也是为什么library cache lock和library cache:mutex X会在top event同时出现的原因。

lc handle上的Mutex,代替的是以前版本的Library Cache Lock Latch。由于要依靠Library Cache Lock实现依赖链,Mutex并没有取代Library Cache Lock,只是把Library Cache Lock Latch替代了。

分析到这里,可以判断问题的根源是library cache lock。此前多篇lc lock的文章都提到过lc lock的分析主要是分析P3参数。通过awr中的ash信息可以查看。

Top Event P1/P2/P3 Values

  • Top Events by DB Time and the top P1/P2/P3 values for those events.
  • % Event is the percentage of DB Time due to the event
  • % Activity is the percentage of DB Time due to the event with the given P1,P2,P3 Values.
Event % Event P1, P2, P3 Values % Activity Parameter 1 Parameter 2 Parameter 3
library cache lock 69.38 “215109965320”,”208720642768″,”5373954″ 0.05 handle address lock address 100*mode+namespace
library cache: mutex X 25.23 “1370341983”,”4514010628096″,”5374033″ 1.54 idn value where

对P3进行换算之后可以查出对应的namespace和请求mode

SQL> select to_char('5373954','xxxxxxxxxxxxxxxxxxxxx') from dual;

TO_CHAR('5373954','XXX
----------------------
                520002

SQL> select KGLSTDSC from x$kglst where INDX=(select to_number('52','xxxxxxxxxxxxxxxxx') from dual);

KGLSTDSC
----------------------------------------------------------------
SQL AREA BUILD

lc lock等待是发生在SQL AREA BUILD的handle上的,并且请求模式是S,说明当时肯定有session持有X的lc lock或者有X的lc lock请求。

首先说说SQL AREA BUILD,该namespace的lc lock只会发生在硬解析时生成子游标中,Oracle 在11.2 版本引入了Cursor Build Lock 机制,这一机制使得在某个父游标下创建子游标的工作串行化。当获取Build Lock 时,需要持有对应SQL AREA BUILD的Library Cache Lock。

通过_kgl_debug跟踪发现,当产生子游标时,在SQL AREA BUILD上,kgllkal的lockMode为S,而kgllkdl的lockMode为X。其中name部分LXY.$BUILD$.8af744b82f4cff89中的8af744b82f4cff89为sql的full hash value。

<KGLTRACE>
  <Timestamp>2022-10-31 22:30:00.732</Timestamp>
  <SID>31</SID>
  <Function>kgllkal</Function>
  <Reason>TRACELOCK</Reason>
  <Param1>0x6bf307f8</Param1>
  <Param2>0</Param2>
  <LibraryHandle>
    <Address>0x6125cd30</Address>
    <Hash>c5b6f396</Hash>
    <LockMode>S</LockMode>
    <PinMode>0</PinMode>
    <LoadLockMode>0</LoadLockMode>
    <Status>0</Status>
    <Subpool>1</Subpool>
    <ObjectName>
      <Name>LXY.$BUILD$.8af744b82f4cff89  </Name>
      <FullHashValue>0db08ca1c94cf646b31eb140c5b6f396</FullHashValue>
      <Namespace>SQL AREA BUILD(82)</Namespace>
      <Type>CURSOR(00)</Type>
      <ContainerId>3</ContainerId>
      <ContainerUid>2360028483</ContainerUid>
      <Identifier>0</Identifier>
      <OwnerIdn>0</OwnerIdn>
    </ObjectName>
  </LibraryHandle>
  <LibraryObjectLock>
    <Address>0x6bf307f8</Address>
    <Handle>0x6125cd30</Handle>
    <Mode>S</Mode>
  </LibraryObjectLock>
</KGLTRACE>

<KGLTRACE>
  <Timestamp>2022-10-31 22:30:00.780</Timestamp>
  <SID>31</SID>
  <Function>kgllkdl</Function>
  <Reason>TRACELOCK</Reason>
  <Param1>0x6bf307f8</Param1>
  <Param2>1</Param2>
  <LibraryHandle>
    <Address>0x6125cd30</Address>
    <Hash>c5b6f396</Hash>
    <LockMode>X</LockMode>
    <PinMode>0</PinMode>
    <LoadLockMode>0</LoadLockMode>
    <Status>0</Status>
    <Subpool>1</Subpool>
    <ObjectName>
      <Name>LXY.$BUILD$.8af744b82f4cff89  </Name>
      <FullHashValue>0db08ca1c94cf646b31eb140c5b6f396</FullHashValue>
      <Namespace>SQL AREA BUILD(82)</Namespace>
      <Type>CURSOR(00)</Type>
      <ContainerId>3</ContainerId>
      <ContainerUid>2360028483</ContainerUid>
      <Identifier>0</Identifier>
      <OwnerIdn>0</OwnerIdn>
    </ObjectName>
  </LibraryHandle>
  <LibraryObjectLock>
    <Address>0x6bf307f8</Address>
    <Handle>0x6125cd30</Handle>
    <Mode>X</Mode>
  </LibraryObjectLock>
</KGLTRACE>

那么可以判断如果发生了SQL AREA BUILD上的lc lock争用,那么当时肯定有新的子游标产生。剩下的问题就是分析为何会产生该子游标了。

SQL> select child_number,plan_hash_value,FIRST_LOAD_TIME,last_load_time,IS_BIND_SENSITIVE,is_bind_aware,is_shareable from gv$sql where inst_id=2 and sql_id='a2v3du5a69074' ;

CHILD_NUMBER PLAN_HASH_VALUE FIRST_LOAD_TIME                          LAST_LOAD_TIME                           IS IS IS
------------ --------------- ---------------------------------------- ---------------------------------------- -- -- --
           5      3403358360 2022-06-03/18:52:40                      2022-10-19/04:35:49                      Y  N  Y
           8      3403358360 2022-06-03/18:52:40                      2022-10-18/02:05:41                      Y  N  Y
          25      3403358360 2022-06-03/18:52:40                      2022-10-19/02:22:05                      Y  N  Y
          26      3403358360 2022-06-03/18:52:40                      2022-10-19/03:30:07                      Y  N  Y
          27      3403358360 2022-06-03/18:52:40                      2022-10-19/03:39:51                      Y  N  Y
          28      3403358360 2022-06-03/18:52:40                      2022-10-19/05:00:52                      Y  N  Y
          32      3403358360 2022-06-03/18:52:40                      2022-10-09/09:19:23                      Y  N  Y

7 rows selected.

可以看到故障前实例中存在的子游标是32,而子游标8正是故障时段产生的。故障是2点开始,而该子游标2点05分才被load成功。说明花了5分钟才生成该子游标。

从sql的执行情况来看,该sql本身就是并发非常高的sql。对于产生新子游标这种事情出问题的概率就非常高了。

+------------------------------------------------------------------------+
| information from v$sql                 |
+------------------------------------------------------------------------+
PLAN  CHI USER        CPU(MS)  ELA(MS)     DISK          GET        ROWS      ROWS APPLI(MS) CONCUR(MS) CLUSTER(MS) USER_IO(MS) FIRST_LOAD_TIME
EXEC          HASH VALUE  NUM NAME       PRE EXEC PRE EXEC PRE EXEC     PRE EXEC    PRE EXEC PRE FETCH  PER EXEC   PER EXEC    PER EXEC    PER EXEC LAST_LOAD_TIME
---------- ------------- ---- ---------- -------- -------- -------- ------------ ----------- --------- --------- ---------- ----------- ----------- ----------------------
4192782       3403358360    5 APHMES            0        0        0            4           1         1         0          0           0           0 06-03/18:5.10-19/04:3
1989093       3403358360    8 APHMES            0       24        0            4           1         1         0         23           0           0 06-03/18:5.10-18/02:0
19208800      3403358360   25 APHMES            0        0        0            4           1         1         0          0           0           0 06-03/18:5.10-19/02:2
23192453      3403358360   26 APHMES            0        0        0            4           1         1         0          0           0           0 06-03/18:5.10-19/03:3
30173         3403358360   27 APHMES            0        0        0            5           1         1         0          0           0           0 06-03/18:5.10-19/03:3
1440766       3403358360   28 APHMES            0        0        0            4           1         1         0          0           0           0 06-03/18:5.10-19/05:0
24842454      3403358360   32 APHMES            0        0        0            4           1         1         0          0           0           0 06-03/18:5.10-09/09:1
7 rows selected.
+------------------------------------------------------------------------+
| information from awr   sysdate-7                                       |
+------------------------------------------------------------------------+
PLAN                     GET     DISK    WRITE              ROWS      ROWS USER_IO(MS)  ELA(MS)  CPU(MS) CLUSTER(MS)    PLSQL
END_TI I NAME               HASH VALUE EXEC           PRE EXEC PRE EXEC PER EXEC ROW_P    PRE EXEC PRE FETCH    PER EXEC PRE EXEC PRE EXEC    PER EXEC PER EXEC
------ - --------------- ------------- ---------- ------------ -------- -------- ----- ----------- --------- ----------- -------- -------- ----------- --------
14 23  2 APHMES             3403358360 250W                  4        0        0 250W            0         1           0       34       24           0        0
15 00  2 APHMES             3403358360 265W                  4        0        0 265W            0         1           0       34       24           0        0
15 01  2 APHMES             3403358360 244W                  4        0        0 244W            0         1           0       34       24           0        0
15 02  2 APHMES             3403358360 239W                  4        0        0 239W            0         1           0       33       23           0        0
15 03  2 APHMES             3403358360 256W                  4        0        0 256W            0         1           0       33       24           0        0
15 04  2 APHMES             3403358360 253W                  4        0        0 253W            0         1           0       34       24           0        0
15 05  2 APHMES             3403358360 248W                  4        0        0 248W            0         1           0       33       24           0        0
15 06  2 APHMES             3403358360 223W                  4        0        0 223W            0         1           0       33       24           0        0
15 07  2 APHMES             3403358360 111W                  4        0        0 111W            0         1           0       35       25           0        0
15 08  2 APHMES             3403358360 72.W                  4        0        0 72.W            0         1           0       35       25           0        0
15 09  2 APHMES             3403358360 263W                  4        0        0 263W            0         1           0       34       24           0        0
15 10  2 APHMES             3403358360 310W                  4        0        0 310W            0         1           0       34       25           0        0
15 11  2 APHMES             3403358360 307W                  4        0        0 307W            0         1           0       34       24           0        0
15 12  2 APHMES             3403358360 292W                  4        0        0 292W            0         1           0       34       24           0        0
15 13  2 APHMES             3403358360 265W                  4        0        0 265W            0         1           0       32       23           0        0
15 14  2 APHMES             3403358360 274W                  4        0        0 274W            0         1           0       33       25           0        0
15 15  2 APHMES             3403358360 307W                  4        0        0 307W            0         1           0       34       26           0        0
15 16  2 APHMES             3403358360 307W                  4        0        0 307W            0         1           0       35       26           0        0
15 17  2 APHMES             3403358360 303W                  4        0        0 303W            1         1           0       35       26           0        0
15 18  2 APHMES             3403358360 276W                  4        0        0 276W            0         1           0       34       26           0        0
15 19  2 APHMES             3403358360 202W                  4        0        0 202W            0         1           0       35       26           0        0
15 20  2 APHMES             3403358360 118W                  4        0        0 118W            0         1           0       34       26           0        0
15 21  2 APHMES             3403358360 239W                  4        0        0 239W            0         1           0       33       25           0        0
15 22  2 APHMES             3403358360 271W                  4        0        0 271W            1         1           0       33       25           0        0
15 23  2 APHMES             3403358360 273W                  4        0        0 273W            0         1           0       34       25           0        0
16 00  2 APHMES             3403358360 277W                  4        0        0 277W            0         1           0       34       25           0        0
16 01  2 APHMES             3403358360 260W                  4        0        0 260W            0         1           0       33       25           0        0
16 02  2 APHMES             3403358360 259W                  4        0        0 259W            0         1           0       33       25           0        0
16 03  2 APHMES             3403358360 265W                  4        0        0 265W            0         1           0       33       25           0        0
16 04  2 APHMES             3403358360 266W                  4        0        0 266W            0         1           0       34       25           0        0
16 05  2 APHMES             3403358360 268W                  4        0        0 268W            0         1           0       34       25           0        0
16 06  2 APHMES             3403358360 198W                  4        0        0 198W            0         1           0       33       25           0        0
16 07  2 APHMES             3403358360 127W                  4        0        0 127W            0         1           0       35       26           0        0
16 08  2 APHMES             3403358360 82.W                  4        0        0 82.W            0         1           0       34       26           0        0
16 09  2 APHMES             3403358360 279W                  4        0        0 279W            0         1           0       34       25           0        0
16 10  2 APHMES             3403358360 299W                  4        0        0 299W            0         1           0       34       26           0        0
16 11  2 APHMES             3403358360 312W                  4        0        0 312W            0         1           0       34       26           0        0
16 12  2 APHMES             3403358360 316W                  4        0        0 316W            0         1           0       35       26           0        0
16 13  2 APHMES             3403358360 292W                  4        0        0 292W            0         1           0       34       26           0        0
16 14  2 APHMES             3403358360 299W                  4        0        0 299W            0         1           0       34       26           0        0
16 15  2 APHMES             3403358360 310W                  4        0        0 310W            1         1           0       35       27           0        0
16 16  2 APHMES             3403358360 314W                  4        0        0 314W            0         1           0       35       27           0        0
16 17  2 APHMES             3403358360 302W                  4        0        0 302W            1         1           0       35       27           0        0
16 18  2 APHMES             3403358360 243W                  4        0        0 243W            0         1           0       35       27           0        0
16 19  2 APHMES             3403358360 157W                  4        0        0 157W            0         1           0       35       26           0        0
16 20  2 APHMES             3403358360 80.W                  4        0        0 80.W            0         1           0       35       26           0        0
16 21  2 APHMES             3403358360 233W                  4        0        0 233W            0         1           0       34       26           0        0
16 22  2 APHMES             3403358360 255W                  4        0        0 255W            0         1           0       33       25           0        0
16 23  2 APHMES             3403358360 259W                  4        0        0 259W            0         1           0       33       25           0        0
17 00  2 APHMES             3403358360 274W                  4        0        0 274W            1         1           0       34       26           0        0
17 01  2 APHMES             3403358360 252W                  4        0        0 252W            0         1           0       33       25           0        0
17 02  2 APHMES             3403358360 247W                  4        0        0 247W            0         1           0       33       25           0        0
17 03  2 APHMES             3403358360 258W                  4        0        0 258W            0         1           0       33       25           0        0
17 04  2 APHMES             3403358360 267W                  4        0        0 267W            0         1           0       34       25           0        0
17 05  2 APHMES             3403358360 257W                  4        0        0 257W            0         1           0       33       25           0        0
17 06  2 APHMES             3403358360 202W                  4        0        0 202W            0         1           0       33       25           0        0
17 07  2 APHMES             3403358360 120W                  4        0        0 120W            0         1           0       34       25           0        0
17 08  2 APHMES             3403358360 80.W                  4        0        0 80.W            0         1           0       34       25           0        0
17 09  2 APHMES             3403358360 262W                  4        0        0 262W            0         1           0       33       25           0        0
17 10  2 APHMES             3403358360 314W                  4        0        0 314W            0         1           0       33       25           0        0
17 11  2 APHMES             3403358360 304W                  4        0        0 304W            0         1           0       33       25           0        0
17 12  2 APHMES             3403358360 301W                  4        0        0 301W            0         1           0       33       25           0        0
17 13  2 APHMES             3403358360 281W                  4        0        0 281W            0         1           0       32       25           0        0
17 14  2 APHMES             3403358360 288W                  4        0        0 288W            0         1           0       33       25           0        0
17 15  2 APHMES             3403358360 315W                  4        0        0 315W            0         1           0       33       26           0        0
17 16  2 APHMES             3403358360 314W                  4        0        0 314W            0         1           0       34       26           0        0
17 17  2 APHMES             3403358360 309W                  4        0        0 309W            0         1           0       34       26           0        0
17 18  2 APHMES             3403358360 250W                  4        0        0 250W            0         1           0       34       26           0        0
17 19  2 APHMES             3403358360 162W                  4        0        0 162W            0         1           0       35       26           0        0
17 20  2 APHMES             3403358360 89.W                  4        0        0 89.W            0         1           0       35       26           0        0
17 21  2 APHMES             3403358360 245W                  4        0        0 245W            0         1           0       33       25           0        0
17 22  2 APHMES             3403358360 267W                  4        0        0 267W            0         1           0       34       26           0        0
17 23  2 APHMES             3403358360 275W                  4        0        0 275W            0         1           0       34       26           0        0
18 00  2 APHMES             3403358360 274W                  4        0        0 274W            0         1           0       34       26           0        0
18 01  2 APHMES             3403358360 249W                  4        0        0 249W            0         1           0       33       25           0        0
18 02  2 APHMES             3403358360 244W                  4        0        0 244W            0         1           0       36       26           0        0
18 03  2 APHMES             3403358360 185W                  4        0        0 185W            0         1           0  229,010      184           0        0
18 04  2 APHMES             3403358360 262W                  4        0        0 262W            0         1           0       34       25           0        0
18 05  2 APHMES             3403358360 253W                  4        0        0 253W            0         1           0       34       25           0        0
18 06  2 APHMES             3403358360 213W                  4        0        0 213W            0         1           0       33       25           0        0
18 07  2 APHMES             3403358360 121W                  4        0        0 121W            0         1           0       34       26

通常会通过v$sql_shared_cursor去分析子游标不能共享的原因。子游标不能共享的原因会按发现有cursor不能共享的时间以xml的格式添加在reason字段中。

当时32号子游标为啥不能共享的原因可以看到是Bind mismatch。

              32         Bind mismatch(17):                                           <ChildNode>
<ChildNumber>32</ChildNumber>
<ID>39</ID>
<reason>Bind mismatch(17)</reason>
<size>4x8</size>
<bind_position>0000000100000000 </bind_position>
<original_oacflg>0000000300000000 </original_oacflg>
<original_oacflg>0000000300000000 </original_oacflg>
<new_oacflg>0000001300000000 </new_oacflg>
</ChildNode>

通过DBMS_SQLTUNE包可以查看子游标的bind值。

childnumber:32

SQL> select position, datatype_string,last_captured,value_string
2    FROM TABLE (SELECT DBMS_SQLTUNE.extract_binds (bind_data)
3                  FROM v$sql
4                WHERE sql_id='a2v3du5a69074'  AND child_number = 32);
POSITION DATATYPE_STRING                LAST_CAPTURED       VALUE_STRING
---------- ------------------------------ ------------------- ----------------------------------------
1 VARCHAR2(8192)                 2022-10-14 01:34:22 GRW237303Z01XYYAU+014
2 VARCHAR2(128)                  2022-10-14 01:34:22 e396691574a049208bec0d520f40efd7

childnumber:8

SQL> select position, datatype_string,last_captured,value_string
2    FROM TABLE (SELECT DBMS_SQLTUNE.extract_binds (bind_data)
3                  FROM v$sql
4                WHERE sql_id='a2v3du5a69074'  AND child_number = 8);
POSITION DATATYPE_STRING                LAST_CAPTURED       VALUE_STRING
---------- ------------------------------ ------------------- ----------------------------------------
1 VARCHAR2(32767)                2022-10-19 04:50:41 F8Y243105421LRHAH+235
2 VARCHAR2(32)                   2022-10-19 04:50:41 e396691574a049208bec0d520f40efd7

可以看到绑定变量长度差距很大,明显32和128的varchar2是不能共享的。

后面又发现在故障前,shared pool中大量内存kglhd(handle)和sqla(heap 6)被释放,初步怀疑被flush了shared pool。

 

对于该故障我的建议就是对于高并发的sql,应用尽量使用绑定变量长度一致的绑定变量,避免此类问题发生。

over!!!


评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注