案例:troubleshooting Large Waits With The Wait Event “resmgr:cpu quantum”

本案例来自西区某客户,数据库版本为11.2.0.4,客户反馈应用异常缓慢,几乎处于不可用的状态。

查看历史活动会话信息发现,从2022-05-01 09:43开始,活动会话开始异常增长。

2022-05-01 09:40               1         39
2022-05-01 09:40               2         83
2022-05-01 09:41               1         38
2022-05-01 09:41               2         75
2022-05-01 09:42               1         50
2022-05-01 09:42               2        199
2022-05-01 09:43               1         22
2022-05-01 09:43               2       1130
2022-05-01 09:44               1         20
2022-05-01 09:44               2       1766
2022-05-01 09:45               1         26
2022-05-01 09:45               2       2027
2022-05-01 09:46               1         88
2022-05-01 09:46               2       3462
2022-05-01 09:47               1         97
2022-05-01 09:47               2       9765
2022-05-01 09:48               1        103
2022-05-01 09:48               2      12402
2022-05-01 09:49               1         78
2022-05-01 09:49               2      17489
2022-05-01 09:50               1         96
2022-05-01 09:50               2      16880
2022-05-01 09:51               1        101
2022-05-01 09:51               2      19659
2022-05-01 09:52               1        126
2022-05-01 09:52               2      18511
2022-05-01 09:53               1        124
2022-05-01 09:53               2      15975
2022-05-01 09:54               1        100
2022-05-01 09:54               2      17142
2022-05-01 09:55               1         94
2022-05-01 09:55               2      20933
2022-05-01 09:56               1        115
2022-05-01 09:56               2      17159
2022-05-01 09:57               1         72
2022-05-01 09:57               2      19454
2022-05-01 09:58               1         33
2022-05-01 09:58               2      15774
2022-05-01 09:59               1         28
2022-05-01 09:59               2      20397

大量异常的活动会话几乎全是等待resmgr:cpu quantum

2022-05-01 09:38                                                                                2        214
2022-05-01 09:39                                                                                2         84
2022-05-01 09:40                                                                                2         80
2022-05-01 09:41                                                                                2         69
2022-05-01 09:42                                                                                2        192
2022-05-01 09:43 resmgr:cpu quantum                                                             2        424
2022-05-01 09:43                                                                                2        689
2022-05-01 09:44 resmgr:cpu quantum                                                             2        810
2022-05-01 09:44                                                                                2        922
2022-05-01 09:45 resmgr:cpu quantum                                                             2       1114
2022-05-01 09:45                                                                                2        880
2022-05-01 09:46 resmgr:cpu quantum                                                             2       2679
2022-05-01 09:46                                                                                2        757
2022-05-01 09:47 resmgr:cpu quantum                                                             2       8831
2022-05-01 09:47                                                                                2        901
2022-05-01 09:48 resmgr:cpu quantum                                                             2      11566
2022-05-01 09:48                                                                                2        802
2022-05-01 09:49 resmgr:cpu quantum                                                             2      16565
2022-05-01 09:49                                                                                2        887
2022-05-01 09:50 resmgr:cpu quantum                                                             2      16118
2022-05-01 09:50                                                                                2        737
2022-05-01 09:51 resmgr:cpu quantum                                                             2      18701
2022-05-01 09:51                                                                                2        926
2022-05-01 09:52 resmgr:cpu quantum                                                             2      17589
2022-05-01 09:52                                                                                2        895
2022-05-01 09:53 resmgr:cpu quantum                                                             2      15183
2022-05-01 09:53                                                                                2        770
2022-05-01 09:54 resmgr:cpu quantum                                                             2      16357
2022-05-01 09:54                                                                                2        752
2022-05-01 09:55 control file sequential read                                                   2         11
2022-05-01 09:55 resmgr:cpu quantum                                                             2      19959
2022-05-01 09:55                                                                                2        926
2022-05-01 09:56 resmgr:cpu quantum                                                             2      16361
2022-05-01 09:56                                                                                2        765
2022-05-01 09:57 control file sequential read                                                   2         12
2022-05-01 09:57 latch: cache buffers chains                                                    2       2734
2022-05-01 09:57 resmgr:cpu quantum                                                             2      15686
2022-05-01 09:57                                                                                2        985
2022-05-01 09:58 control file sequential read                                                   2         11
2022-05-01 09:58 resmgr:cpu quantum                                                             2      14989
2022-05-01 09:58                                                                                2        755
2022-05-01 09:59 control file sequential read                                                   2         12
2022-05-01 09:59 resmgr:cpu quantum                                                             2      19440
2022-05-01 09:59                                                                                2        910

resmgr:cpu quantum等待事件的含义在WAITEVENT: “resmgr:cpu quantum” Reference Note (Doc ID 2097889.1) 中有说明

Event 'resmgr: cpu quantum' is a standard event used by resource manager to control the allocation of CPU to processes. When a session waits for 'resmgr: cpu quantum' that session is waiting to be allocated a quantum of CPU time.
This wait occurs when the resource manager is enabled and is throttling CPU consumption. To reduce the occurrence of this wait event, increase the CPU allocation for the session's current consumer group.

简单来说就是,当resource manager启用对CPU的限制时,进程对应消费组所占用的CPU达到限额时,该进程将以等待resmgr: cpu quantum的形式进入等待,以保证该消费组的cpu消耗不超过限额。

从osw的vmstat可以看到刚刚出现大量resmgr: cpu quantum的时段cpu使用率仅仅为50%。

zzz ***Sun May 1 09:40:59 CST 2022
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
39  0      0 315695840  72296 21418404    0    0   624   164    0    0  2  1 96  0  0
12  0      0 315666016  72304 21419448    0    0    67 14298 96858 95762  8  2 91  0  0
 7  0      0 315674624  72312 21418280    0    0    83 16846 89062 98593  6  1 93  0  0
zzz ***Sun May 1 09:41:29 CST 2022
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
17  0      0 315677088  72696 21420468    0    0   624   164    0    0  2  1 96  0  0
12  0      0 315655424  72704 21421572    0    0  8436   213 78322 83804  4  1 95  0  0
32  0      0 315662432  72736 21421272    0    0    91  8558 110771 99331 17  1 82  0  0
zzz ***Sun May 1 09:41:59 CST 2022
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
23  1      0 315660384  73020 21422720    0    0   624   164    0    0  2  1 96  0  0
12  1      0 315610208  73028 21419544    0    0 76604  4974 171824 162508  9  3 87  1  0
15  2      0 315620288  73036 21423504    0    0  5859 17278 126332 131118  8  2 90  0  0
zzz ***Sun May 1 09:42:29 CST 2022
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
100  2      0 315660832  73400 21424484    0    0   624   164    0    0  2  1 96  0  0
70  2      0 315639616  73408 21425508    0    0 41699  9383 198482 143324 48  2 49  1  0
86  3      0 315636544  73424 21425092    0    0 43388 17133 193937 144246 47  2 50  1  0

查看参数resource_manager_plan,发现启用了DEFAULT_MAINTENANCE_PLAN

SQL> show parameter resource_manager_plan

NAME                                 TYPE       VALUE
------------------------------------ ---------- ------------------------------
resource_manager_plan                string     SCHEDULER[0x32DF]:DEFAULT_MAIN
                                                TENANCE_PLAN

看到DEFAULT_MAINTENANCE_PLAN应该非常熟悉,这就是11g自动任务维护时间窗口默认将会启用的resource mangager plan。

WINDOW_NAME          REPEAT_INTERVAL                                                         DURATION             ENABL ACTIV
-------------------- ----------------------------------------------------------------------  -------------------- ----- -----
MONDAY_WINDOW        freq=daily;byday=MON;byhour=22;byminute=0; bysecond=0                   +000 04:00:00        TRUE  FALSE
TUESDAY_WINDOW       freq=daily;byday=TUE;byhour=22;byminute=0; bysecond=0                   +000 04:00:00        TRUE  FALSE
WEDNESDAY_WINDOW     freq=daily;byday=WED;byhour=22;byminute=0; bysecond=0                   +000 04:00:00        TRUE  FALSE
THURSDAY_WINDOW      freq=daily;byday=THU;byhour=22;byminute=0; bysecond=0                   +000 04:00:00        TRUE  FALSE
FRIDAY_WINDOW        freq=daily;byday=FRI;byhour=22;byminute=0; bysecond=0                   +000 04:00:00        TRUE  FALSE
SATURDAY_WINDOW      freq=daily;byday=SAT;byhour=6;byminute=0; bysecond=0                    +000 20:00:00        TRUE  FALSE
SUNDAY_WINDOW        freq=daily;byday=SUN;byhour=6;byminute=0; bysecond=0                    +000 20:00:00        TRUE  TRUE
WEEKNIGHT_WINDOW     freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0; bysecond=0   +000 08:00:00        FALSE FALSE
WEEKEND_WINDOW       freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0                     +002 00:00:00        FALSE FALSE

今天51劳动节,正好是周日,该库时间窗口启用时间为早上6点,从alert也可以看到6点时启用了该时间窗口的resource manager plan

2022-05-01T06:00:00.111668+08:00
Setting Resource Manager plan SCHEDULER[0x4D52]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter

DEFAULT_MAINTENANCE_PLAN对应消费组以及限制在dba_rsrc_plan_directives中查看

GROUP_OR_SUBPLAN               TYPE                              MGMT_P1    MGMT_P2    MGMT_P3 COMMENTS                                           MAX_UTILIZATION_LIMIT
------------------------------ ------------------------------ ---------- ---------- ---------- -------------------------------------------------- ---------------------
ORA$AUTOTASK_SUB_PLAN          PLAN                                    0         25          0 Directive for automated maintenance tasks                             90
ORA$DIAGNOSTICS                CONSUMER_GROUP                          0          5          0 Directive for automated diagnostic tasks                              90
SYS_GROUP                      CONSUMER_GROUP                         75          0          0 Directive for system operations
OTHER_GROUPS                   CONSUMER_GROUP                          0         70          0 Directive for all other operations

自动任务维护属于ORA$AUTOTASK_SUB_PLAN,SYS_GROUP和OTHER_GROUPS分别代表SYS/SYSTEM会话消费组和业务会话消费组。CPU限额按照MGMT_P*定义的优先级进行百分比分配,该限额是会动态调整的。假如此时几乎只有ORA$AUTOTASK_SUB_PLAN在运行,ORA$AUTOTASK_SUB_PLAN的限额可能会被调高,最大限制是MAX_UTILIZATION_LIMIT默认90%,而对应真正运行业务的OTHER_GROUPS就会降低限额。

在周日的早上6点开启的自动任务维护窗口,很可能几乎只有自动维护任务的相关session会被执行,所以有可能会提高ORA$AUTOTASK_SUB_PLAN,并且降低OTHER_GROUPS的限额。如果假设9点开始业务开始增加,那么就很容易导致resmgr: cpu quantum。

此外发现当时等待resmgr: cpu quantum的sql基本是同一个sql。

2022-05-01 09:40 b81avsvs9ccpp               2         76
2022-05-01 09:41 b81avsvs9ccpp               2         66
2022-05-01 09:42 b81avsvs9ccpp               2        181
2022-05-01 09:43 3t74kg8jmw4dg               2         14
2022-05-01 09:43 b81avsvs9ccpp               2       1087
2022-05-01 09:44 3t74kg8jmw4dg               2         14
2022-05-01 09:44 b81avsvs9ccpp               2       1692
2022-05-01 09:44 fm8d2gv7gz1bq               2         12
2022-05-01 09:44                             2         17
2022-05-01 09:45 3t74kg8jmw4dg               2         22
2022-05-01 09:45 b81avsvs9ccpp               2       1945
2022-05-01 09:45 fm8d2gv7gz1bq               2         12
2022-05-01 09:45                             2         18
2022-05-01 09:46 3t74kg8jmw4dg               2         40
2022-05-01 09:46 b81avsvs9ccpp               2       3351
2022-05-01 09:46                             2         23
2022-05-01 09:47 3t74kg8jmw4dg               2         48
2022-05-01 09:47 aunqkadqdumxj               2         16
2022-05-01 09:47 b81avsvs9ccpp               2       9598
2022-05-01 09:47 by3xu361c3mfy               2         12
2022-05-01 09:47                             2         31
2022-05-01 09:48 3t74kg8jmw4dg               2         48
2022-05-01 09:48 4z322qbnykw3q               2         13
2022-05-01 09:48 b81avsvs9ccpp               2      12238
2022-05-01 09:48 by3xu361c3mfy               2         13
2022-05-01 09:48                             2         33
2022-05-01 09:49 3t74kg8jmw4dg               2         77
2022-05-01 09:49 4z322qbnykw3q               2         13
2022-05-01 09:49 6zd1tmy35rf7b               2         11
2022-05-01 09:49 b81avsvs9ccpp               2      17280
2022-05-01 09:49 by3xu361c3mfy               2         23
2022-05-01 09:49                             2         34
2022-05-01 09:50 3t74kg8jmw4dg               2         47
2022-05-01 09:50 4z322qbnykw3q               2         16
2022-05-01 09:50 aunqkadqdumxj               2         16
2022-05-01 09:50 b81avsvs9ccpp               2      16688
2022-05-01 09:50 by3xu361c3mfy               2         28
2022-05-01 09:50                             2         16
2022-05-01 09:51 3t74kg8jmw4dg               2         59
2022-05-01 09:51 4z322qbnykw3q               2         46
2022-05-01 09:51 6zd1tmy35rf7b               2         13
2022-05-01 09:51 b81avsvs9ccpp               2      19415
2022-05-01 09:51 by3xu361c3mfy               2         28
2022-05-01 09:51                             2         30
2022-05-01 09:52 3t74kg8jmw4dg               2         62
2022-05-01 09:52 4z322qbnykw3q               2         43
2022-05-01 09:52 b81avsvs9ccpp               2      18304
2022-05-01 09:52 by3xu361c3mfy               2         22
2022-05-01 09:52 fm8d2gv7gz1bq               2         12
2022-05-01 09:52                             2         13
2022-05-01 09:53 3t74kg8jmw4dg               2         57
2022-05-01 09:53 4z322qbnykw3q               2         21
2022-05-01 09:53 b81avsvs9ccpp               2      15802
2022-05-01 09:53 by3xu361c3mfy               2         21
2022-05-01 09:54 3t74kg8jmw4dg               2        102
2022-05-01 09:54 4z322qbnykw3q               2         16
2022-05-01 09:54 b81avsvs9ccpp               2      16922
2022-05-01 09:54 by3xu361c3mfy               2         26
2022-05-01 09:54 fm8d2gv7gz1bq               2         14
2022-05-01 09:55 3t74kg8jmw4dg               2        106
2022-05-01 09:55 4z322qbnykw3q               2         31
2022-05-01 09:55 6zd1tmy35rf7b               2         15
2022-05-01 09:55 b81avsvs9ccpp               2      20637
2022-05-01 09:55 by3xu361c3mfy               2         38
2022-05-01 09:55 fm8d2gv7gz1bq               2         18
2022-05-01 09:55                             2         31
2022-05-01 09:56 3t74kg8jmw4dg               2         92
2022-05-01 09:56 4z322qbnykw3q               2         16
2022-05-01 09:56 b81avsvs9ccpp               2      16934
2022-05-01 09:56 by3xu361c3mfy               2         32
2022-05-01 09:56 fm8d2gv7gz1bq               2         15
2022-05-01 09:56                             2         20
2022-05-01 09:57 3t74kg8jmw4dg               2        111
2022-05-01 09:57 4z322qbnykw3q               2         34
2022-05-01 09:57 b81avsvs9ccpp               2      19171
2022-05-01 09:57 by3xu361c3mfy               2         30
2022-05-01 09:57 fm8d2gv7gz1bq               2         18
2022-05-01 09:57                             2         34
2022-05-01 09:58 3t74kg8jmw4dg               2        104
2022-05-01 09:58 4z322qbnykw3q               2         21
2022-05-01 09:58 b81avsvs9ccpp               2      15548

怀疑有sql的并发的比之前的要高的因素存在

****************************************************************************************
PLAN STAT FROM ASH
****************************************************************************************
SQL_ID  b81avsvs9ccpp, child number 4
-------------------------------------
select a.offer_code,T.FIELD_VALUE FIELD_VALUE from pm_ext_cha t,
pm_offer a where t.from_table_name = 'TD_B_DISCNT' and t.field_name =
'DISCNT_LABLE' and t.object_id = a.offer_id and a.offer_type = 'D' and
a.OFFER_CODE  =:1  order by A.EXPIRE_DATE desc
Plan hash value: 14354385
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                            |       |       |  1126 (100)|          |latch: cache buffers chai(1)(0%)     |
|                                                                                                             |CPU(68)(.01%)                        |
|                                                                                                             |SQL*Net message to client(1)(0%)     |
|                                                                                                             |CPU(1)(0%)                           |
|   1 |  SORT ORDER BY                   |                            |     1 |    70 |  1126   (1)| 00:00:01 |CPU(10)(0%)                          |
|   2 |   NESTED LOOPS                   |                            |     1 |    70 |  1125   (1)| 00:00:01 |CPU(39)(0%)                          |
|   3 |    NESTED LOOPS                  |                            |   485 |    70 |  1125   (1)| 00:00:01 |CPU(125)(.02%)                       |
|*  4 |     MAT_VIEW ACCESS FULL         | MV_PM_EXT_CHA              |   485 | 22310 |   349   (1)| 00:00:01 |CPU(5816)(.72%)                      |
|                                                                                                             |resmgr:cpu quantum(656886)(81.55%)   |
|                                                                                                             |resmgr:internal state cha(5)(0%)     |
|                                                                                                             |latch: cache buffers chai(332)(.04%) |
|*  5 |     INDEX UNIQUE SCAN            | SYS_C_SNAP$_202PK_PM_OFFER |     1 |       |     1   (0)| 00:00:01 |latch: cache buffers chai(57227)(7.1%|
|                                                                                                             |wait list latch free(25)(0%)         |
|                                                                                                             |CPU(79731)(9.9%)                     |
|*  6 |    MAT_VIEW ACCESS BY INDEX ROWID| MV_PM_OFFER                |     1 |    24 |     2   (0)| 00:00:01 |latch: cache buffers chai(485)(.06%) |
|                                                                                                             |CPU(4787)(.59%)                      |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("T"."FIELD_NAME"='DISCNT_LABLE' AND "T"."FROM_TABLE_NAME"='TD_B_DISCNT'))
5 - access("T"."OBJECT_ID"="A"."OFFER_ID")
6 - filter(("A"."OFFER_CODE"=TO_NUMBER(:1) AND "A"."OFFER_TYPE"='D'))
PL/SQL procedure successfully completed.
+------------------------------------------------------------------------+
| information from awr   sysdate-7                                       |
+------------------------------------------------------------------------+
PLAN                     GET     DISK    WRITE              ROWS      ROWS USER_IO(MS)  ELA(MS)  CPU(MS) CLUSTER(MS)    PLSQL
END_TI I NAME               HASH VALUE EXEC           PRE EXEC PRE EXEC PER EXEC ROW_P    PRE EXEC PRE FETCH    PER EXEC PRE EXEC PRE EXEC    PER EXEC PER EXEC
------ - --------------- ------------- ---------- ------------ -------- -------- ----- ----------- --------- ----------- -------- -------- ----------- --------
01 00  2 UOP_PARAM            14354385 88.W              1,225        0        0 22.W            0         1           0 ########  172,251           0        0
01 00  2 UOP_PARAM            14354385 59.W                  0        0        0 20.W            0         0           0  191,574   23,961           0        0
01 01  2 UOP_PARAM            14354385 19.W             10,877        0        0 4.8W            0         0           0    8,776    8,277           0        0
01 01  2 UOP_PARAM            14354385 31.W                  0        0        0 8.0W            0         0           0    9,409    8,907           0        0
01 02  2 UOP_PARAM            14354385 10.W             10,893        0        0 2.6W            0         0           0    8,394    7,901           0        0
01 02  2 UOP_PARAM            14354385 13.W                  0        0        0 3.3W            0         0           0    8,466    7,968           0        0
01 03  2 UOP_PARAM            14354385 7.5W             10,899        0        0 1.9W            0         0           0    8,314    7,824           0        0
01 03  2 UOP_PARAM            14354385 6.6W             10,910        0        0 1.6W            0         0           0    8,188    7,697           0        0
01 04  2 UOP_PARAM            14354385 6.7W             10,898        0        0 2.0W            0         0           0    8,156    7,661           0        0
01 04  2 UOP_PARAM            14354385 6.4W                  0        0        0 1.7W            0         0           0    8,183    7,693           0        0
01 05  2 UOP_PARAM            14354385 9.3W             10,894        0        0 2.5W            0         0           0    8,342    7,844           0        0
01 05  2 UOP_PARAM            14354385 8.3W             10,900        0        0 2.4W            0         0           0    8,271    7,778           0        0
01 06  2 UOP_PARAM            14354385 10.W             10,897        0        0 2.8W            0         0           0   10,480    9,976           0        0
01 06  2 UOP_PARAM            14354385 12.W                  0        0        0 3.1W            0         0           0    8,416    7,911           0        0
01 07  2 UOP_PARAM            14354385 28.W             10,881        0        0 7.6W            0         0           0   11,449   10,956           0        0
01 07  2 UOP_PARAM            14354385 35.W                  0        0        0 9.9W            0         0           0    9,500    9,010           0        0
01 08  2 UOP_PARAM            14354385 41.W                441        0        0 11.W            0         0           0    9,562    9,071           0        0
01 08  2 UOP_PARAM            14354385 54.W                  0        0        0 15.W            0         0           0   10,766   10,276           0        0
01 09  2 UOP_PARAM            14354385 101W                  0        0        0 27.W            0         0           0   91,885   52,828           0        0
01 09  2 UOP_PARAM            14354385 59.W              3,687        0        0 16.W            0         0           0   11,327   10,832           0        0
01 10  2 UOP_PARAM            14354385 141W              1,843        0        0 38.W            0         1           0 ########  209,143          40        0
01 10  2 UOP_PARAM            14354385 109W                  0        0        0 29.W            0         0           0 ########  136,456           0        0
01 11  2 UOP_PARAM            14354385 72.W                  0        0        0 20.W            0         0           0  278,573  113,357          86        0
01 11  2 UOP_PARAM            14354385 118W                  4        0        0 34.W            0         0           0  194,557   85,951           1        0
01 12  2 UOP_PARAM            14354385 98.W                  0        0        0 30.W            0         0           0   44,274   32,320           0        0
01 12  2 UOP_PARAM            14354385 111W              3,182        0        0 33.W            0         0           0  110,326   57,733           0        0
01 13  2 UOP_PARAM            14354385 101W                  0        0        0 30.W            0         0           0   17,081   16,574           0        0
29 08  2 UOP_PARAM            14354385 28.W             10,850        0        0 9.5W            0         0           0    9,262    8,763           0        0
29 08  2 UOP_PARAM            14354385 40.W                208        0        0 13.W            0         0           0    9,965    9,464           0        0
29 09  2 UOP_PARAM            14354385 42.W                833        0        0 13.W            0         0           0    9,988    9,487           0        0
29 09  2 UOP_PARAM            14354385 55.W                  0        0        0 17.W            0         0           0   10,919   10,412           0        0
29 10  2 UOP_PARAM            14354385 64.W                  0        0        0 21.W            0         0           0   11,627   11,119           1        0
29 10  2 UOP_PARAM            14354385 58.W              3,417        0        0 18.W            0         0           0   11,142   10,639           0        0
29 11  2 UOP_PARAM            14354385 73.W                  0        0        0 23.W            0         0           0   12,627   12,118           1        0
29 11  2 UOP_PARAM            14354385 70.W                  0        0        0 22.W            0         0           0   12,308   11,795           0        0
29 12  2 UOP_PARAM            14354385 70.W                  0        0        0 22.W            0         0           0   12,261   11,749           0        0
29 12  2 UOP_PARAM            14354385 70.W              4,754        0        0 22.W            0         0           0   12,485   11,970           0        0
29 13  2 UOP_PARAM            14354385 65.W                  0        0        0 22.W            0         0           0   11,713   11,202           0        0
29 13  2 UOP_PARAM            14354385 70.W                  0        0        0 22.W            0         0           0   12,102   11,591

通过对比可以看到故障期间的sql并发确实比之前要多1倍,并且对于执行次数如此之高的sql,sql性能并不算太好,因为对于一个非聚合的sql来说,平均返回行数非常少的情况下,消耗了几千甚至10000多的逻辑读,这是不合理的。假如该sql足够优化,可能也不会导致此次故障。

我们知道对于resource manager的cpu限制,是根据cpu_count参数来进行百分比配额的,而该库的cpu_count为136,实际的lcpu数量为160,并且该主机只有1个实例,这也是为何主机的cpu使用率并没有100%使用完,始终在85-90%之间的原因。虽然cpu_count与lcpu差距不大但是也或多或少算是此次故障的另一个因素。

最终建议:

  1. 假如周末白天也有业务的系统,建议调整自动维护任务时间窗口的时间,比如都放在晚上22点进行。
  2. 禁用自动维护窗口默认启用的resource manager plan
  3. 优化该执行次数如此高但并不够优化的sql
  4. 建议在确认主机只有一个实例存在的情况下,cpu_count与实际的lcpu保持一致。该参数参与了大量资源与后台进程个数的运算。如LMS、LMD、ADG pr进程等等。

 

 


评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注