troubleshooting LGWR wait library cache lock instance lock on standby

本案例来自东区某金融客户,数据库版本为19c,故障现象为备库每天早上8点40左右,备库的LGWR都会被阻塞。从而其他应用因为请求不到instance lock也被LGWR阻塞,让客户在第二天重现的时候收集了systemstate dump进行分析。

LGWR PROCESS STATE OBJECT:

PROCESS 14: LGWR
  ----------------------------------------
  SO: 0x3d21de9858, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x3d21de9858, name=process, file=ksu.h LINE:12721, pg=0
  (process) Oracle pid:14, ser:2, calls cur/top: 0x3ec22fda30/0x3ec22fda30
            flags : (0x6) SYSTEM
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 26
              last post received-location: ksa2.h LINE:285 ID:ksasnd
              last process to post me: 0x3da1e98788 1 0
              last post sent: 0 0 259
              last post sent-location: kgl.h LINE:8873 ID:kgllkdl: post after freeing latch
              last process posted by me: 0x3da1eb2980 171 0
    (latch info) wait_event=0 bits=0x0
    Process Group: DEFAULT, pseudo proc: 0x3d21ec8180
    O/S info: user: oracle, term: UNKNOWN, ospid: 172824
    OSD pid info: Unix process pid: 172824, image: oracle@FMS11DG (LGWR)
    Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2066<-kslwaitctx()+163<-ksfwaitctx()+14<-kglLockWait()+871<-kgllkal()+1166<-kglLock()+1118<-kglLockInstance()+259<-kksExclusiveParseLock()+47<-krdrsb_mdflush_local()+126<-krdrsb_adv_qscn()+1652<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
    ----------------------------------------

LGWR SESSION STATE OBJECT:

SO: 0x3d21f74f20, type: 4, owner: 0x3d21de9858, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3d21de9858, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 337 ser: 3 trans: (nil), creator: 0x3d21de9858
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: , short-term DID:
              txn branch: (nil)
              edition#: 0              oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0x3e5ff630f8, lock address=0x3e7ff41450, 100*mode+namespace=0x1004a0003
        wait_id=9628259 seq_num=60150 snap_id=1
        wait times: snap=1.306065 sec, exc=1.306065 sec, total=1.306065 sec
        wait times: max=5.500000 sec, heur=1 min 8 sec
        wait counts: calls=2 os=2
        in_wait=1 iflags=0x15a2

LGWR SID为337,等待lc lock,namespace一看就是instance lock,请求模式为X。

这里偷了个懒,很碰巧看到了final blocking session,就省的去搜索哪个session持有了instance lock了。

    SO: 0x3d42261708, type: 4, owner: 0x3da1eae6a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3da1eae6a0, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 2045 ser: 1 trans: (nil), creator: 0x3da1eae6a0
              flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40009) -/-/INC
              DID: , short-term DID:
              txn branch: (nil)
              edition#: 2605814              oct: 3, prv: 0, sql: 0x3e9fb65bd0, psql: 0x3e5e5b1860, user: 141/DBAAS_SYS
    ksuxds FALSE at location: 0
    service name: fmsdg
    client details:
      O/S info: user: zcloud, term: , ospid: 7395
      machine: VCOM01PXY program: zoramon_collector@VCOM01PXY (TNS V1-V3)
      application name: zoramon_collector@VCOM01PXY (TNS V1-V3), hash value=2183771371
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0x3e5ff630f8, lock address=0x3e1fbb8dd0, 100*mode+namespace=0x1004a0002
        wait_id=5523383 seq_num=18444 snap_id=1
        wait times: snap=1.336710 sec, exc=1.336710 sec, total=1.336710 sec
        wait times: max=15 min 0 sec, heur=1.336710 sec
        wait counts: calls=2 os=2
        in_wait=1 iflags=0x15a2
    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 337, ser: 3
      Dumping final blocker:
        inst: 1, sid: 1976, ser: 29501

final blocker为session 1976

final blocker session state object:

   SO: 0x3dc221a830, type: 4, owner: 0x3ec1e41a68, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3ec1e41a68, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 1976 ser: 29501 trans: (nil), creator: 0x3ec1e41a68
              flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40009) -/-/INC
              DID: , short-term DID:
              txn branch: (nil)
              edition#: 2605814              oct: 3, prv: 0, sql: 0x3e5e197d50, psql: 0x3e9f487090, user: 142/FIX_DBQUERY
    ksuxds FALSE at location: 0
    service name: SYS$USERS
    client details:
      O/S info: user: afaim, term: unknown, ospid: 1234
      machine: VFIX02APP program: JDBC Thin Client
      application name: JDBC Thin Client, hash value=2546894660
    Current Wait Stack:
      Not in wait; last wait ended 1 min 16 sec ago
    There are 2 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 337, ser: 3
      wait event: 'library cache lock'
        p1: 'handle address'=0x3e5ff630f8
        p2: 'lock address'=0x3e7ff41450
        p3: '100*mode+namespace'=0x1004a0003
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 0 secs, waiter_cache_ver: 24873

final blocker没有任何等待,正在执行的sql为:

  SO: 0x3df82faf90, type: 78, owner: 0x3dc221a830, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0x3ec1e41a68, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0

      LibraryObjectLock:  Address=0x3df82faf90 Handle=0x3e5e197d50 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

        User=0x3dc221a830 Session=0x3dc221a830 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=6552c1ec
      LibraryHandle:  Address=0x3e5e197d50 Hash=3584acbe LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
        ObjectName:  Name=select
        s.VC_INTER_CODE,
        s.C_MARKET_NO,
        s.L_BUY_UNIT,
        s.VC_TRADE_CURRENCY_NO,
        s.EN_YESTERDAY_CLOSE_PRICE,
        s.VC_MIXED_TYPE,
        s.C_STOCK_TYPE,
        s.C_ASSET_CLASS,
        s.C_STOP_FLAG,
        s.VC_REPORT_CODE
        from trade.TSTOCKINFO s
         WHERE s.VC_INTER_CODE in (

                    :1
                ,
                    :2
                ,
                    :3
                ,
                    :4

          FullHashValue=dbbe27bcd13925c46366f2fc3584acbe Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=897887422 OwnerIdn=142
        Statistics:  InvalidationCount=12 ExecutionCount=2804 LoadCount=14 ActiveLocks=1 TotalLockCount=58 TotalPinCount=1
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=57 HandleInUse=57 HandleReferenceCount=0
        Concurrency:  DependencyMutex=0x3e5e197e00(0, 135, 0, 0) Mutex=0x3e5e197e90(0, 209907, 0, 0)
        Flags=RON/PIN/TIM/PN0/DBN/[10012841]
        WaitersLists:
          Lock=0x3e5e197de0[0x3e5e197de0,0x3e5e197de0]
          Pin=0x3e5e197dc0[0x3e5e197dc0,0x3e5e197dc0]
          LoadLock=0x3e5e197e38[0x3e5e197e38,0x3e5e197e38]
        Timestamp:  Current=10-29-2023 08:40:12
        HandleReference:  Address=0x3e5e198110 Handle=(nil) Flags=[00]
        LibraryObject:  Address=0x3c7e8a0e00 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
          ChildTable:  size='16'
            Child:  id='0' Table=0x3c7e8a1cb0 Reference=0x3c7e8a16f8 Handle=0x3e5e2ea8f0
            Child:  id='1' Table=0x3c7e8a1cb0 Reference=0x3c7e8972c0 Handle=0x3e5de512f0
        NamespaceDump:
          Parent Cursor:  sql_id=66trkzhus9b5y parent=0x3c7e8a0ea0 maxchild=2 plk=y ppn=n

有两个子游标handle,

  • Handle=0x3e5e2ea8f0
  • Handle=0x3e5de512f0

查看子游标lc state object:

      SO: 0x3df82fa9f8, type: 78, owner: 0x3dc221a830, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0x3ec1e41a68, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0

      LibraryObjectLock:  Address=0x3df82fa9f8 Handle=0x3e5e2ea8f0 Mode=N CanBeBrokenCount=23 Incarnation=23 ExecutionCount=0
        ClusterLock=0x3dfe9ca0a8 Context=0x7f59809d8030 User=0x3dc221a830 Session=0x3dc221a830 ReferenceCount=1
        Flags=CBK/[0020] SavepointNum=0
      LibraryHandle:  Address=0x3e5e2ea8f0 Hash=0 LockMode=N PinMode=X LoadLockMode=0 Status=VALD
        Name:  Namespace=SQL AREA(00) Type=CURSOR(00)
        Statistics:  InvalidationCount=11 ExecutionCount=0 LoadCount=13 ActiveLocks=1 TotalLockCount=59 TotalPinCount=2652
        Counters:  BrokenCount=23 RevocablePointer=23 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
        Concurrency:  DependencyMutex=0x3e5e2ea9a0(0, 0, 0, 0) Mutex=0x3e5e197e90(0, 209907, 0, 0)
        Flags=RON/PIN/PN0/EXP/CHD/[10012111]
        WaitersLists:
          Lock=0x3e5e2ea980[0x3e5e2ea980,0x3e5e2ea980]
          Pin=0x3e5e2ea960[0x3e5e2ea960,0x3e5e2ea960]
          LoadLock=0x3e5e2ea9d8[0x3e5e2ea9d8,0x3e5e2ea9d8]
        LibraryObject:  Address=0x3c7f36d0b0 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
          DataBlocks:
            Block:  #='0' name=KGLH0^3584acbe pins=0 Change=NONE
              Heap=0x3e5ce16cb0 Pointer=0x3c7f36d150 Extent=0x3c7f36d030 Flags=I/-/P/A/-/-
              FreedLocation=0 Alloc=64.843750 Size=67.601562 LoadTime=6923111190
            Block:  #='6' name=SQLA^3584acbe pins=0 Change=NONE
              Heap=0x3c7e8a14c8 Pointer=0x3024187ea8 Extent=0x3024187268 Flags=I/-/P/A/-/E
              FreedLocation=0 Alloc=69385.406250 Size=70401.898438 LoadTime=0
        NamespaceDump:
          Child Cursor:  Heap0=0x3c7f36d150 Heap6=0x3024187ea8 Heap0 Load Time=11-14-2023 08:40:13 Heap6 Load Time=11-14-2023 08:40:13       ----------------------------------------
      KGX Atomic Operation Log 0x3dfe9ca0a8
       Mutex 0x3c7e8a1420(1976, 0) idn 3584acbe oper LONG_EXCL(18)
       Cursor Pin uid 1976 efd 0 whr 1 slp 0
       opr=3 pso=0x3df82fa9f8 flg=0
       pcs=0x3c7e8a1388 nxt=0x3c7e897020 flg=34 cld=0 hd=0x3e5e2ea8f0 par=0x3c7e8a0ea0
       ct=c hsh=0 unp=(nil) unn=0 hvl=7e897d28 nhv=1 ses=0x3dc221a830
       hep=0x3c7e8a1420 flg=80 ld=1 ob=0x3c7f36d0b0 ptr=0x3024187ea8 fex=0x3024187268

load时间与故障时间吻合,并且长时间的X模式持有cursor pin mutex和lc pin,这里可以看到heap 6占用了大量的空间,不由得想去看看这个sql的执行计划。

Plan hash value: 3616047354

------------------------------------------------------
| Id  | Operation                    | Name          |
------------------------------------------------------
|   0 | SELECT STATEMENT             |               |
|   1 |  CONCATENATION               |               |
|   2 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|*  3 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|   4 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|*  5 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|   6 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|*  7 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|   8 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|*  9 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|  10 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|* 11 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|  12 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|* 13 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|  14 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
...
...
|1995 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|1996 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|1997 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|1998 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|1999 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
|2000 |   TABLE ACCESS BY INDEX ROWID| TSTOCKINFO    |
|2001 |    INDEX UNIQUE SCAN         | PK_TSTOCKINFO |
------------------------------------------------------

看了吓一跳,首先优化器选用的RBO,很简单的sql,但是in里面变量达到1000个,正好还走了or展开。1000个变量意味着有1000个or分支,难怪heap 6那么大。

看看blocker 的stack信息:

 SO: 0x3ec1e41a68, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x3ec1e41a68, name=process, file=ksu.h LINE:12721, pg=0
  (process) Oracle pid:178, ser:231, calls cur/top: 0x3d6b429b98/0x3d6b514a00
            flags : (0x0) -
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 160
              last post received-location: kcb2.h LINE:4243 ID:kcbzww
              last process to post me: 0x3ec1e57980 1 0
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0x0
    Process Group: DEFAULT, pseudo proc: 0x3d21ec8180
    O/S info: user: oracle, term: UNKNOWN, ospid: 301884
    OSD pid info: Unix process pid: 301884, image: oracle@FMS11DG
    Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+246<-kghalp()+94<-qcopCreateLog()+54<-qkebCreateUnaryLog()+43<-apaclg()+205<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apasor()+281<-apaqba()+1122<-apaqbdDescendents()+496<-apaqbdList()+76<-apaqbd()+14<-apadrv()+922<-opitca()+2089<-kksFullTypeCheck()+69<-rpiswu2()+1776<-kksSetBindType()+2299<-kksfbc()+11192<-opiexe()+2330<-kpoal8()+2380<-opiodr()+917<-ttcpip()+1255<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
    ----------------------------------------

大量的循环了apaclg函数。  (apa)clg – SQL Access Path Analysis ??

分析到此基本可以给出建议了。3个建议同时调整:

  • 减少in变量为100个
  • 使用all_rows hint让执行计划走inlist
  • keep sql执行计划

 


评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注