Oracle instance hung due to gc buffer busy acquire/gc current request

一客户某系统反馈在上周深夜出现严重性能问题,整个系统出现了短暂hung,导致无法使用。通过用户提供的awr元数据,这里做了简单分析:

SQL> select a.instance_number,
2         TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss'),
3         b.EVENT_NAME,
4         count(1)
5    from WRH$_ACTIVE_SESSION_HISTORY a,WRH$_EVENT_NAME b
6   where  a.dbid=3866720943
7   and a.event_id=b.event_id
8     and TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss') > '2022-11-18 23:23:00'
9     and TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss') < '2022-11-18 23:33:00'
10         ---and b.WAIT_CLASS<> 'Idle'
and b.event_name not like '%SQL*Net%'
11   12     and b.event_name not like  'gc%%'
group by a.instance_number,
TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss'),
13   14   15            b.EVENT_NAME
16   order by 1, 2;
INSTANCE_NUMBER TO_CHAR(A.SAMPLE_TIME,'YYYY-MM-DDHH24: EVENT_NAME                                 COUNT(1)
--------------- -------------------------------------- ---------------------------------------- ----------
1 2022-11-18 23:27:59                    CGS wait for IPC msg                              2
1 2022-11-18 23:27:59                    buffer busy waits                                58
1 2022-11-18 23:27:59                    db file scattered read                            2
1 2022-11-18 23:27:59                    db file sequential read                           2
1 2022-11-18 23:27:59                    enq: SQ - contention                            308
1 2022-11-18 23:27:59                    enq: TX - index contention                      150
1 2022-11-18 23:27:59                    gc buffer busy acquire                          400
1 2022-11-18 23:27:59                    gc current block busy                             4
1 2022-11-18 23:27:59                    gc current request                              378
1 2022-11-18 23:27:59                    row cache lock                                   78
1 2022-11-18 23:28:09                    buffer busy waits                                58
1 2022-11-18 23:28:09                    enq: SQ - contention                            308
1 2022-11-18 23:28:09                    enq: TX - index contention                      150
1 2022-11-18 23:28:09                    gc buffer busy acquire                          402
1 2022-11-18 23:28:09                    gc cr multi block request                         2
1 2022-11-18 23:28:09                    gc current block busy                             4
1 2022-11-18 23:28:09                    gc current request                              378
1 2022-11-18 23:28:09                    row cache lock                                   78
1 2022-11-18 23:28:19                    buffer busy waits                                58
1 2022-11-18 23:28:19                    db file scattered read                            2
1 2022-11-18 23:28:19                    enq: SQ - contention                            308
1 2022-11-18 23:28:19                    enq: TX - index contention                      150
1 2022-11-18 23:28:19                    gc buffer busy acquire                          404
1 2022-11-18 23:28:19                    gc current block busy                             6
1 2022-11-18 23:28:19                    gc current request                              378
1 2022-11-18 23:28:19                    row cache lock                                   78
1 2022-11-18 23:28:30                    buffer busy waits                                58
1 2022-11-18 23:28:30                    db file scattered read                            2
1 2022-11-18 23:28:30                    enq: SQ - contention                            308
1 2022-11-18 23:28:30                    enq: TX - index contention                      150
1 2022-11-18 23:28:30                    gc buffer busy acquire                          416
1 2022-11-18 23:28:30                    gc current block busy                            16
1 2022-11-18 23:28:30                    gc current request                              378
1 2022-11-18 23:28:30                    row cache lock                                   78
1 2022-11-18 23:28:40                    buffer busy waits                                58
1 2022-11-18 23:28:40                    db file scattered read                            2
1 2022-11-18 23:28:40                    enq: SQ - contention                            308
1 2022-11-18 23:28:40                    enq: TX - index contention                      150
1 2022-11-18 23:28:40                    gc buffer busy acquire                          416
1 2022-11-18 23:28:40                    gc current request                              394
1 2022-11-18 23:28:40                    row cache lock                                   78
1 2022-11-18 23:28:50                    buffer busy waits                                58
1 2022-11-18 23:28:50                    control file sequential read                      2
1 2022-11-18 23:28:50                    db file scattered read                            2
1 2022-11-18 23:28:50                    enq: SQ - contention                            310
1 2022-11-18 23:28:50                    enq: TX - index contention                      150
1 2022-11-18 23:28:50                    gc buffer busy acquire                          414
1 2022-11-18 23:28:50                    gc current block busy                            16
1 2022-11-18 23:28:50                    gc current request                              378
1 2022-11-18 23:28:50                    reliable message                                  2
1 2022-11-18 23:28:50                    row cache lock                                   78
1 2022-11-18 23:29:01                    buffer busy waits                                58
1 2022-11-18 23:29:01                    db file scattered read                            2
1 2022-11-18 23:29:01                    enq: SQ - contention                            312
1 2022-11-18 23:29:01                    enq: TX - index contention                      150
1 2022-11-18 23:29:01                    gc buffer busy acquire                          416
1 2022-11-18 23:29:01                    gc current block busy                            18
1 2022-11-18 23:29:01                    gc current request                              378
1 2022-11-18 23:29:01                    row cache lock                                   78
1 2022-11-18 23:29:12                    SQL*Net message to client                         2
1 2022-11-18 23:29:12                    buffer busy waits                                58
1 2022-11-18 23:29:12                    db file scattered read                            2
1 2022-11-18 23:29:12                    enq: SQ - contention                            314
1 2022-11-18 23:29:12                    enq: TX - index contention                      150
1 2022-11-18 23:29:12                    gc buffer busy acquire                          416
1 2022-11-18 23:29:12                    gc current block busy                            20
1 2022-11-18 23:29:12                    gc current request                              378
1 2022-11-18 23:29:12                    row cache lock                                   78
1 2022-11-18 23:29:22                    buffer busy waits                                58
1 2022-11-18 23:29:22                    db file scattered read                            2
1 2022-11-18 23:29:22                    enq: SQ - contention                            314
1 2022-11-18 23:29:22                    enq: TX - index contention                      150
1 2022-11-18 23:29:22                    gc buffer busy acquire                          416
1 2022-11-18 23:29:22                    gc current block 2-way                            2
1 2022-11-18 23:29:22                    gc current block busy                            20
1 2022-11-18 23:29:22                    gc current request                              378
1 2022-11-18 23:29:22                    row cache lock                                   78
1 2022-11-18 23:29:32                    buffer busy waits                                58
1 2022-11-18 23:29:32                    db file scattered read                            2
1 2022-11-18 23:29:32                    enq: SQ - contention                            316
1 2022-11-18 23:29:32                    enq: TX - index contention                      150
1 2022-11-18 23:29:32                    gc buffer busy acquire                          416
1 2022-11-18 23:29:32                    gc current block busy                            20
1 2022-11-18 23:29:32                    gc current request                              378
1 2022-11-18 23:29:32                    row cache lock                                   78
1 2022-11-18 23:29:43                    buffer busy waits                                58
1 2022-11-18 23:29:43                    db file scattered read                            2
1 2022-11-18 23:29:43                    enq: SQ - contention                            318
1 2022-11-18 23:29:43                    enq: TX - index contention                      150
1 2022-11-18 23:29:43                    gc buffer busy acquire                          418
1 2022-11-18 23:29:43                    gc cr multi block request                         2
1 2022-11-18 23:29:43                    gc current block busy                            26
1 2022-11-18 23:29:43                    gc current request                              380
1 2022-11-18 23:29:43                    row cache lock                                   78
1 2022-11-18 23:29:53                    buffer busy waits                                58
1 2022-11-18 23:29:53                    control file sequential read                      4
1 2022-11-18 23:29:53                    db file scattered read                            2
1 2022-11-18 23:29:53                    enq: SQ - contention                            318
1 2022-11-18 23:29:53                    enq: TX - index contention                      150
1 2022-11-18 23:29:53                    gc buffer busy acquire                          418
1 2022-11-18 23:29:53                    gc current block busy                            26
1 2022-11-18 23:29:53                    gc current request                              380
1 2022-11-18 23:29:53                    row cache lock                                   78
1 2022-11-18 23:30:04                    buffer busy waits                                58
1 2022-11-18 23:30:04                    db file scattered read                            2
1 2022-11-18 23:30:04                    enq: SQ - contention                            318
1 2022-11-18 23:30:04                    enq: TX - index contention                      150
1 2022-11-18 23:30:04                    gc buffer busy acquire                          418
1 2022-11-18 23:30:04                    gc current block busy                            26
1 2022-11-18 23:30:04                    gc current request                              378
1 2022-11-18 23:30:04                    row cache lock                                   78
1 2022-11-18 23:30:14                    buffer busy waits                                58
1 2022-11-18 23:30:14                    db file scattered read                            2
1 2022-11-18 23:30:14                    enq: SQ - contention                            318
1 2022-11-18 23:30:14                    enq: TX - index contention                      150
1 2022-11-18 23:30:14                    gc buffer busy acquire                          418
1 2022-11-18 23:30:14                    gc current block busy                            26
1 2022-11-18 23:30:14                    gc current request                              378
1 2022-11-18 23:30:14                    gc current retry                                  2
1 2022-11-18 23:30:14                    row cache lock                                   78
1 2022-11-18 23:30:24                    buffer busy waits                                58
1 2022-11-18 23:30:24                    db file scattered read                            2
1 2022-11-18 23:30:24                    enq: SQ - contention                            318
1 2022-11-18 23:30:24                    enq: TX - index contention                      150
1 2022-11-18 23:30:24                    gc buffer busy acquire                          418
1 2022-11-18 23:30:24                    gc current block busy                            26
1 2022-11-18 23:30:24                    gc current request                              378
1 2022-11-18 23:30:24                    gc current retry                                  2
1 2022-11-18 23:30:24                    row cache lock                                   78
1 2022-11-18 23:30:35                    buffer busy waits                                58
1 2022-11-18 23:30:35                    enq: SQ - contention                            318
1 2022-11-18 23:30:35                    enq: TX - index contention                      150
1 2022-11-18 23:30:35                    gc buffer busy acquire                          418
1 2022-11-18 23:30:35                    gc cr multi block request                         2
1 2022-11-18 23:30:35                    gc cr request                                     2
1 2022-11-18 23:30:35                    gc current block busy                            26
1 2022-11-18 23:30:35                    gc current request                              380
1 2022-11-18 23:30:35                    row cache lock                                   80
1 2022-11-18 23:30:45                    buffer busy waits                                58
1 2022-11-18 23:30:45                    db file scattered read                            2
1 2022-11-18 23:30:45                    enq: SQ - contention                            318
1 2022-11-18 23:30:45                    enq: TX - index contention                      150
1 2022-11-18 23:30:45                    gc buffer busy acquire                          418
1 2022-11-18 23:30:45                    gc cr block 2-way                                 2
1 2022-11-18 23:30:45                    gc current block 2-way                            2
1 2022-11-18 23:30:45                    gc current block busy                            26
1 2022-11-18 23:30:45                    gc current request                              380
1 2022-11-18 23:30:45                    row cache lock                                   80
1 2022-11-18 23:30:55                    buffer busy waits                                58
1 2022-11-18 23:30:55                    db file scattered read                            2
1 2022-11-18 23:30:55                    db file sequential read                           2
1 2022-11-18 23:30:55                    enq: SQ - contention                            318
1 2022-11-18 23:30:55                    enq: TX - index contention                      150
1 2022-11-18 23:30:55                    gc buffer busy acquire                          418
1 2022-11-18 23:30:55                    gc current block busy                            26
1 2022-11-18 23:30:55                    gc current request                              380
1 2022-11-18 23:30:55                    row cache lock                                   80
1 2022-11-18 23:31:06                    buffer busy waits                                58
1 2022-11-18 23:31:06                    db file parallel write                            2
1 2022-11-18 23:31:06                    db file scattered read                            2
1 2022-11-18 23:31:06                    enq: SQ - contention                            318
1 2022-11-18 23:31:06                    enq: TX - index contention                      150
1 2022-11-18 23:31:06                    gc buffer busy acquire                          418
1 2022-11-18 23:31:06                    gc current block busy                            26
1 2022-11-18 23:31:06                    gc current request                              378
1 2022-11-18 23:31:06                    gc current retry                                  2
1 2022-11-18 23:31:06                    row cache lock                                   80
1 2022-11-18 23:31:17                    buffer busy waits                                58
1 2022-11-18 23:31:17                    enq: SQ - contention                            318
1 2022-11-18 23:31:17                    enq: TX - index contention                      150
1 2022-11-18 23:31:17                    gc buffer busy acquire                          418
1 2022-11-18 23:31:17                    gc cr multi block request                         2
1 2022-11-18 23:31:17                    gc current block busy                            26
1 2022-11-18 23:31:17                    gc current request                              378
1 2022-11-18 23:31:17                    gc current retry                                  2
1 2022-11-18 23:31:17                    row cache lock                                   80
1 2022-11-18 23:31:27                    buffer busy waits                                58
1 2022-11-18 23:31:27                    db file scattered read                            2
1 2022-11-18 23:31:27                    enq: SQ - contention                            318
1 2022-11-18 23:31:27                    enq: TX - index contention                      150
1 2022-11-18 23:31:27                    gc buffer busy acquire                          418
1 2022-11-18 23:31:27                    gc current block busy                            26
1 2022-11-18 23:31:27                    gc current request                              378
1 2022-11-18 23:31:27                    row cache lock                                   80
2 2022-11-18 23:30:25                    buffer busy waits                               346
2 2022-11-18 23:30:25                    control file sequential read                      2
2 2022-11-18 23:30:25                    db file parallel write                            4
2 2022-11-18 23:30:25                    enq: FB - contention                              2
2 2022-11-18 23:30:25                    enq: SQ - contention                            186
2 2022-11-18 23:30:25                    enq: TX - index contention                        2
2 2022-11-18 23:30:25                    gc buffer busy release                           32
2 2022-11-18 23:30:25                    log file switch (checkpoint incomplete)        3560
2 2022-11-18 23:30:25                    row cache lock                                   14
2 2022-11-18 23:30:25                    write complete waits                             46
2 2022-11-18 23:30:35                    SQL*Net more data to client                       2
2 2022-11-18 23:30:35                    buffer busy waits                               346
2 2022-11-18 23:30:35                    control file sequential read                      2
2 2022-11-18 23:30:35                    db file parallel write                            4
2 2022-11-18 23:30:35                    enq: FB - contention                              2
2 2022-11-18 23:30:35                    enq: SQ - contention                            186
2 2022-11-18 23:30:35                    enq: TX - index contention                        2
2 2022-11-18 23:30:35                    gc buffer busy release                           34
2 2022-11-18 23:30:35                    log file switch (checkpoint incomplete)        3562
2 2022-11-18 23:30:35                    row cache lock                                   14
2 2022-11-18 23:30:35                    write complete waits                             46
2 2022-11-18 23:30:46                    buffer busy waits                               346
2 2022-11-18 23:30:46                    db file parallel write                            4
2 2022-11-18 23:30:46                    enq: FB - contention                              2
2 2022-11-18 23:30:46                    enq: SQ - contention                            186
2 2022-11-18 23:30:46                    enq: TX - index contention                        2
2 2022-11-18 23:30:46                    gc buffer busy release                           34
2 2022-11-18 23:30:46                    log file switch (checkpoint incomplete)        3572
2 2022-11-18 23:30:46                    row cache lock                                   14
2 2022-11-18 23:30:46                    write complete waits                             46
2 2022-11-18 23:30:57                    CSS operation: action                             2
2 2022-11-18 23:30:57                    buffer busy waits                               346
2 2022-11-18 23:30:57                    control file sequential read                      2
2 2022-11-18 23:30:57                    db file parallel write                            4
2 2022-11-18 23:30:57                    enq: FB - contention                              2
2 2022-11-18 23:30:57                    enq: SQ - contention                            186
2 2022-11-18 23:30:57                    enq: TX - index contention                        2
2 2022-11-18 23:30:57                    gc buffer busy release                           32
2 2022-11-18 23:30:57                    latch: redo writing                               4
2 2022-11-18 23:30:57                    log file switch (checkpoint incomplete)        3578
2 2022-11-18 23:30:57                    row cache lock                                   14
2 2022-11-18 23:30:57                    write complete waits                             50
2 2022-11-18 23:31:07                    buffer busy waits                               346
2 2022-11-18 23:31:07                    db file parallel write                            2
2 2022-11-18 23:31:07                    enq: FB - contention                              2
2 2022-11-18 23:31:07                    enq: SQ - contention                            188
2 2022-11-18 23:31:07                    enq: TX - index contention                        2
2 2022-11-18 23:31:07                    gc buffer busy release                           32
2 2022-11-18 23:31:07                    log file switch (checkpoint incomplete)        3584
2 2022-11-18 23:31:07                    row cache lock                                   14
2 2022-11-18 23:31:07                    write complete waits                             50
2 2022-11-18 23:31:18                    buffer busy waits                               346
2 2022-11-18 23:31:18                    control file sequential read                      2
2 2022-11-18 23:31:18                    db file parallel write                            4
2 2022-11-18 23:31:18                    enq: FB - contention                              2
2 2022-11-18 23:31:18                    enq: SQ - contention                            192
2 2022-11-18 23:31:18                    enq: TX - index contention                        2
2 2022-11-18 23:31:18                    gc buffer busy release                           32
2 2022-11-18 23:31:18                    log file switch (checkpoint incomplete)        3586
2 2022-11-18 23:31:18                    row cache lock                                   14
2 2022-11-18 23:31:18                    write complete waits                             50
2 2022-11-18 23:31:29                    buffer busy waits                               346
2 2022-11-18 23:31:29                    db file parallel write                            4
2 2022-11-18 23:31:29                    enq: FB - contention                              2
2 2022-11-18 23:31:29                    enq: SQ - contention                            192
2 2022-11-18 23:31:29                    enq: TX - index contention                        2
2 2022-11-18 23:31:29                    gc buffer busy release                           32
2 2022-11-18 23:31:29                    log file switch (checkpoint incomplete)        3586
2 2022-11-18 23:31:29                    row cache lock                                   14
2 2022-11-18 23:31:29                    write complete waits                             50
254 rows selected.

从上面的查询接过来有如下几个疑点:

1、节点1实例从23:27:59开始突然出现大量gc等待,以及enq: SQ – contention,包括buffer busy waits等。

2、23:23 — 23:27:59之间,节点1 无任何相关等待记录

3、用户反馈的日志切换无法完成即log file switch(checkpoint incomplete)在节点2的上是突然出现的,时间点是23:30:25

4、节点2在23:30之前到23:23分之间这7分钟,没有任何性能数据记录。

从上述等待的情况和规律,我们可以大致得到如下几个基本结论:

log file switch等待属于叠加现象,并非根本原因,因为在此之前系统已经运行极为缓慢了,其次节点1在27分突然出现大量gc等待,可排除SQL本身性能影响,如执行计划突然改变导致性能变差,进而产生大量gc,引发该问题。尽管如此,这里我仍然查了一下gc top SQL:

SQL> SQL>   select a.instance_number,
2         TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss'),
b.event_name,
a.sql_id,
count(1)
from WRH$_ACTIVE_SESSION_HISTORY a,WRH$_EVENT_NAME b
where  a.dbid=3866720943
and a.event_id=b.event_id
3    4    5    6    7    8    9     and TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss') = '2022-11-18 23:27:59'
10     ---and TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss') < '2022-11-18 23:31:30'
11     and b.WAIT_CLASS<> 'Idle'
12     and b.event_name like 'gc%'
13     ---and a.INSTANCE_NUMBER=1
14   group by a.instance_number,
15            TO_CHAR(a.sample_time, 'yyyy-mm-dd hh24:mi:ss'),
b.event_name,
16   17            a.sql_id
order by 1, 2; 18
INSTANCE_NUMBER TO_CHAR(A.SAMPLE_TIME,'YYYY-MM-DDHH24: EVENT_NAME                               SQL_ID                       COUNT(1)
--------------- -------------------------------------- ---------------------------------------- -------------------------- ----------
1 2022-11-18 23:27:59                    gc buffer busy acquire                   070x0h38a9wdj                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   14tfmqd4n7nm5                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   161cd6mnw8ng6                       6
1 2022-11-18 23:27:59                    gc buffer busy acquire                   311un5s5ug5yz                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   41uxzsq73sqya                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   42ck1jx53qubd                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   44xyf34792k2d                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   575n36005r5u3                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   5m068a02bdzm3                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   66cg7zfgzgkc2                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   6pyp8r5zvk2fp                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   6tzsyrszbuwj4                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   79wbq83vc3ty4                     112
1 2022-11-18 23:27:59                    gc buffer busy acquire                   7wrp7q9027usg                      46
1 2022-11-18 23:27:59                    gc buffer busy acquire                   7y9dhybsfd028                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   8hpztpr1vx9gp                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   97khkmrn4kp09                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   atuz9t4aqvdmm                     156
1 2022-11-18 23:27:59                    gc buffer busy acquire                   azr364vxnbwm2                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   cd3dxusq5djgp                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   cmnp0rx2k4yz3                       6
1 2022-11-18 23:27:59                    gc buffer busy acquire                   d1sh0yu458vfr                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   d93suk0kjny04                       6
1 2022-11-18 23:27:59                    gc buffer busy acquire                   dpsxw3rkuvkqq                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   f88a91rzaaj5t                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   fdabpgn8h14ba                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   fjqnh3937fk06                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                   fzcvkj0mp6wmc                       4
1 2022-11-18 23:27:59                    gc buffer busy acquire                   gh6t7aztdn8qu                       2
1 2022-11-18 23:27:59                    gc buffer busy acquire                                                       2
1 2022-11-18 23:27:59                    gc current block busy                    5m068a02bdzm3                       4
1 2022-11-18 23:27:59                    gc current request                       00pkrmq6sxw9r                       2
1 2022-11-18 23:27:59                    gc current request                       038d01h60v2zt                       2
1 2022-11-18 23:27:59                    gc current request                       062j8smyzkhwh                       2
1 2022-11-18 23:27:59                    gc current request                       070x0h38a9wdj                       2
1 2022-11-18 23:27:59                    gc current request                       09xfwud5jxbw8                       2
1 2022-11-18 23:27:59                    gc current request                       0dcjdxx0n8cw6                       2
1 2022-11-18 23:27:59                    gc current request                       0fkgdvsz385af                       2
1 2022-11-18 23:27:59                    gc current request                       161cd6mnw8ng6                       4
1 2022-11-18 23:27:59                    gc current request                       17n64vj7dcbyt                       6
1 2022-11-18 23:27:59                    gc current request                       18ard9mx5qvng                       2
1 2022-11-18 23:27:59                    gc current request                       1abtyr0fgr9df                       2
1 2022-11-18 23:27:59                    gc current request                       1qqvtj9u6kyuq                       4
1 2022-11-18 23:27:59                    gc current request                       1qyp832vkzmdy                       2
1 2022-11-18 23:27:59                    gc current request                       1vukxmd490wqd                       2
1 2022-11-18 23:27:59                    gc current request                       2czdn0wnnpjg2                       2
1 2022-11-18 23:27:59                    gc current request                       2gw2xmvr7vtv1                       2
1 2022-11-18 23:27:59                    gc current request                       3fgygmw80svq4                       2
1 2022-11-18 23:27:59                    gc current request                       3gh06zjqtdu2m                       8
1 2022-11-18 23:27:59                    gc current request                       3nht95b2va65x                       6
1 2022-11-18 23:27:59                    gc current request                       3uz6p5nqf6304                       4
1 2022-11-18 23:27:59                    gc current request                       3z3fgx3y6k3gv                       2
1 2022-11-18 23:27:59                    gc current request                       44xyf34792k2d                       4
1 2022-11-18 23:27:59                    gc current request                       4hzdv022v5cp6                       6
1 2022-11-18 23:27:59                    gc current request                       4ncqpqyqhzw7m                       2
1 2022-11-18 23:27:59                    gc current request                       53cz8x8znkmcv                       6
1 2022-11-18 23:27:59                    gc current request                       5amw15qxajduk                       2
1 2022-11-18 23:27:59                    gc current request                       5b4hjw5y8c3pn                       6
1 2022-11-18 23:27:59                    gc current request                       5ktnsgz6zt67h                       2
1 2022-11-18 23:27:59                    gc current request                       5m068a02bdzm3                       8
1 2022-11-18 23:27:59                    gc current request                       65pm9rq1x691x                       2
1 2022-11-18 23:27:59                    gc current request                       667z1rnng9pub                       2
1 2022-11-18 23:27:59                    gc current request                       66cg7zfgzgkc2                       6
1 2022-11-18 23:27:59                    gc current request                       6d7wmntzx7x5b                       2
1 2022-11-18 23:27:59                    gc current request                       6tfq2fcm1rvqy                       2
1 2022-11-18 23:27:59                    gc current request                       6ttzmtxpcx7tw                       2
1 2022-11-18 23:27:59                    gc current request                       6tzsyrszbuwj4                      20
1 2022-11-18 23:27:59                    gc current request                       737yqgwjnvxxf                       6
1 2022-11-18 23:27:59                    gc current request                       79wbq83vc3ty4                       8
1 2022-11-18 23:27:59                    gc current request                       7c2tgwdf07av1                       4
1 2022-11-18 23:27:59                    gc current request                       7fths5kjcw7za                       2
1 2022-11-18 23:27:59                    gc current request                       7jb8htya1uumm                       4
1 2022-11-18 23:27:59                    gc current request                       7rh4jsd9pt3tn                       6
1 2022-11-18 23:27:59                    gc current request                       7wrp7q9027usg                      10
1 2022-11-18 23:27:59                    gc current request                       7y9dhybsfd028                       4
1 2022-11-18 23:27:59                    gc current request                       86369w0pb0rhc                       2
1 2022-11-18 23:27:59                    gc current request                       8hpztpr1vx9gp                      16
1 2022-11-18 23:27:59                    gc current request                       8jbd7zcqr8kzg                       8
1 2022-11-18 23:27:59                    gc current request                       8uz1jman9rqf2                       4
1 2022-11-18 23:27:59                    gc current request                       94fr6pbh6yd6q                       4
1 2022-11-18 23:27:59                    gc current request                       94z0rmj029ad9                       4
1 2022-11-18 23:27:59                    gc current request                       97khkmrn4kp09                      10
1 2022-11-18 23:27:59                    gc current request                       9cbxmhp0r4jqg                       6
1 2022-11-18 23:27:59                    gc current request                       9xdd22c1guxsj                       2
1 2022-11-18 23:27:59                    gc current request                       avph9wxh79p42                       4
1 2022-11-18 23:27:59                    gc current request                       aw43z1x20qp6w                       2
1 2022-11-18 23:27:59                    gc current request                       azr364vxnbwm2                      20
1 2022-11-18 23:27:59                    gc current request                       b36hnuv6fzt97                       4
1 2022-11-18 23:27:59                    gc current request                       cc6dknr8grn7s                       4
1 2022-11-18 23:27:59                    gc current request                       chrhs04u5yd9p                       6
1 2022-11-18 23:27:59                    gc current request                       cmnp0rx2k4yz3                       2
1 2022-11-18 23:27:59                    gc current request                       cz5q8x68gg32j                      12
1 2022-11-18 23:27:59                    gc current request                       d0z8qt8nas9vu                       2
1 2022-11-18 23:27:59                    gc current request                       d93suk0kjny04                      20
1 2022-11-18 23:27:59                    gc current request                       f7f8g58wgmjpf                       2
1 2022-11-18 23:27:59                    gc current request                       f88a91rzaaj5t                      10
1 2022-11-18 23:27:59                    gc current request                       f94mgcvs2as5b                       2
1 2022-11-18 23:27:59                    gc current request                       fav0q2jkywa7v                       2
1 2022-11-18 23:27:59                    gc current request                       fdabpgn8h14ba                      10
1 2022-11-18 23:27:59                    gc current request                       fjqnh3937fk06                       2
1 2022-11-18 23:27:59                    gc current request                       fvdyxajud739z                       8
1 2022-11-18 23:27:59                    gc current request                       fzcvkj0mp6wmc                      22
1 2022-11-18 23:27:59                    gc current request                       g97hgk75m2cd0                       8
1 2022-11-18 23:27:59                    gc current request                       gnmuju4p9bumr                       2

对于上述几个top SQL的 gc等待,让客户查询了相关执行计划,plan hash value并没有改变过,跟ash数据表现一致。

到这里就比较有意思了,既然并非SQL本身性能问题,为什么会突然出现400个gc 等待呢?(因为如果是SQL性能问题,通常是逐步叠加等待,并非突然一下就会涌现400个出来)。

分析到这里,我们认为有几个可能性:

1、在23:27分这个时间点,集群心跳有问题,导致gc 操作受到较大影响

2、数据库实例节点1,或者节点2 在23:27:59之前出现了异常,比如核心进程hung,实例hung,挂起等状态,进而影响数据库gc

3、数据库实例1,或者实例2 在23:27:59之前cpu资源异常,如系统cpu 100%等情况,导致lgwr,lms等进程无法获取cpu资源,进而导致该问题。

对于第一个可能性,让客户检查了zabbix监控,并没有发现异常,只是从监控来看,心跳流量较高,高峰期维持在60-70Mb/s,对于比较老旧的小型机环境来讲,相对有一些偏高,但还不不足以产生这么大的影响。

同样监控也发现cpu 资源相对是正常的,因此我们可以排除第1,3 两种可能性原因。

带着这种疑问和猜测,远程了客户环境,检查了节点2 trace目录,发现在当天晚上确实产生了一个diag dump,找了如下内容:

*** 2022-11-18 23:25:09.055
HM: Short Stack of immediate waiter session ID 12642, OSPID 16261 of hang ID 67
*** 2022-11-18 23:25:09.191
Short stack dump:
ksedsts()+592<-ksdxfstk()+48<-ksdxcb()+3376<-sspuser()+688<-<kernel><-_pw_wait()+48<-pw_wait()+112<-sskgpwwait()+432<-skgpwwait()+336<-ksliwat()+3552<-kslwaitctx()+304<-kcrfws()+864<-kcbchg1_main()+26976<-kcbchg1()+352<-ktubgn()+2224<-knledir()+1424<-knleEva()+160<-evaopn2()+1792<-qesltcEvalOutofLineCols()+1568<-qesltcBeforeRowProcessing()+1184<-qerltcNoKdtBufferedInsRowCBK()+464<-qerltcSingleRowLoad()+640<-qerltcFetch()+720<-insexe()+1600<-opiexe()+16336<-kpoal8()+4832<-opiodr()+1920<-ttcpip()+1744<-opitsk()+2992<-opiino()+1696<-opiodr()+1920<-opidrv()+1616<-sou2o()+256<-opimai_real()+624<-ssthrdmain()+608<-main()+336<-main_opd_entry()+80
HM: Short Stack of root session ID 4510, OSPID 3151 of hang ID 67
Short stack dump:
ksedsts()+592<-ksdxfstk()+48<-ksdxcb()+3376<-sspuser()+688<-<kernel><-_read_sys()+48<-_read()+224<-$cold_skgfqio()+880<-ksfd_skgfqio()+416<-ksfd_io()+1184<-ksfdread()+992<-kfk_ufs_sync_io()+1568<-kfk_submit_ufs_io()+416<-kfk_submit_io()+224<-kfk_io1()+928<-kfkRequest()+48<-kfk_transitIO()+2608<-kfioSubmitIO()+3248<-kfioRequestPriv()+208<-kfioRequest()+880<-ksfd_kfioRequest()+800<-ksfd_osmio()+624<-$cold_ksfd_io()+464<-ksfdread()+992<-kccrbp()+736<-kccbmp_get()+608<-kccsed_rbl()+432<-kccocx()+2016<-kcc_begin_txn_internal()+272<-kcrfwl()+544<-ksbabs()+1520<-ksbrdp()+2832<-opirip()+1312<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+608<-main()+336<-main_opd_entry()+80

可以看到在23:25:09分,显示节点2 的Session 4510 会话出于hang的状态。那么该会话是什么呢?

Blocking Sid (Inst) % Activity Event Caused % Event User Program # Samples Active XIDs
4510, 1( 2) 57.91 log file switch (checkpoint incomplete) 54.11 SYS oracle@fdcdb2 (LGWR) 41/120 [ 34%] 0
log file sync 3.77
6953, 707( 2) 2.19 enq: SQ – contention 2.19 FDCMGR JDBC Thin Client 7/120 [ 6%] 0
3771,39737( 2) 1.71 buffer busy waits 1.71 FDCMGR JDBC Thin Client 7/120 [ 6%] 0

从ASH 报告我们可以得知4510 session 就是我们的LGWR 进程。可见在23:25:09  节点2 lgwr进程就出现了异常。

那么这就能解释通了。。。

我们继续带着疑问分析这个问题,因为过一段时间就自动恢复了,我始终怀疑是应用可能有些不正常,因此让客户检查了之前提到的2个TOP gc 等待的SQL,发现该sql在最近一周的执行计划,分别是13号,16号,18号。 其中这3个时间点都是业务出问题的时间点。唯一不同的是:

13号sql执行次数是9w次,16号是15w次,18号是17w次。根据用户的反馈,13号和16号也出现了类似问题,但是影响要比18号要小,尤其是13号,影响比较短暂。

结合实际情况和现象来看,我认为问题本身还是出现了这个业务模块上,可能是非固定时间执行,当业务量过大时,进而产生了大量gc,最终出现了该问题;实际上从统计来看,随着执行次数增加,sql的单次平均执行时间逐渐拉长。

最后,不得不说,对于这个比较大型的业务系统来讲,单个节点Session在3800-4000左右,每秒redo 在30M/s(单节点).  这么古老的小型机改换了。。。。

本以为故事即将终结,今天客户反馈早上又出现了该问题,虽然影响不大,仍然需要再次分析,从用户提供的diag trace文件来看,跟之前的问题看上去有一些不同:

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (xxxxx.xxxxx1)
os id: 5621
process id: 3840, oracle@xxxxx1
session id: 17
session serial #: 62917
}
is waiting for 'log file switch (checkpoint incomplete)' with wait info:
{
time in wait: 17.585836 sec (last interval)
time in wait: 44.028985 sec (total)
timeout after: never
wait id: 13
blocking: 0 sessions
wait history:
* time between current wait and wait #1: 0.000000 sec
1.       event: 'latch: redo writing'
time waited: 0.008565 sec
wait id: 19              p1: 'address'=0xc000000087dfeb48
p2: 'number'=0xcf
p3: 'tries'=0x0
* time between wait #1 and #2: 0.000000 sec
2.       event: 'log file switch (checkpoint incomplete)'
time waited: 3.465187 sec (last interval)
time waited: 26.434584 sec (total)
wait id: 13
* time between wait #2 and #3: 0.000000 sec
3.       event: 'latch: redo writing'
time waited: 0.008045 sec
wait id: 18              p1: 'address'=0xc000000087dfeb48
p2: 'number'=0xcf
p3: 'tries'=0x0
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (xxxxx.xxxxx1)
os id: 26277
process id: 20, oracle@xxxxx1 (LGWR)
session id: 4510
session serial #: 1
}
which is waiting for 'control file sequential read' with wait info:
{
p1: 'file#'=0x1
p2: 'block#'=0x1
p3: 'blocks'=0x1
time in wait: 0.000000 sec
timeout after: never
wait id: 2047504176
blocking: 1249 sessions
wait history:
* time between current wait and wait #1: 0.000013 sec
1.       event: 'control file sequential read'
time waited: 0.000157 sec
wait id: 2047504175      p1: 'file#'=0x0
p2: 'block#'=0x1
p3: 'blocks'=0x1
* time between wait #1 and #2: 0.000074 sec
2.       event: 'KSV master wait'
time waited: 0.000013 sec
wait id: 2047504174
* time between wait #2 and #3: 0.000034 sec
3.       event: 'control file sequential read'
time waited: 0.000242 sec
wait id: 2047504173      p1: 'file#'=0x0
p2: 'block#'=0x2e
p3: 'blocks'=0x1
}
Chain 1 Signature: 'control file sequential read'<='log file switch (checkpoint incomplete)'
Chain 1 Signature Hash: 0xf23230fd

从hung 分析结果来看,所有会话几乎均被lgwr进程阻塞,被阻塞会话超过1200个。其中lgwr在等待control file sequential read。似乎有一些奇怪?

由于是同一个trace文件,可以看到上次问题也就18号故障时间点的情况:

*** 2022-11-18 23:24:27.017
===============================================================================
END OF HANG ANALYSIS
===============================================================================
*** 2022-11-18 23:25:08.510
Suspected Hangs in the System
Root       Chain Total               Hang
Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution
ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
67 HANG    VALID    2  4510     4  1446   HIGH GLOBAL Terminate Process
inst# SessId  Ser#     OSPID Event
----- ------ ----- --------- -----
1     54 16663      9690 enq: TX - index contention
1   6946 47773      9701 gc current request
2  12642 11983     16261 log file switch (checkpoint incomplete)
2   4510     1      3151 control file sequential read
......
*** 2022-11-18 23:25:34.711
Suspected Hangs in the System
Root       Chain Total               Hang
Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution
ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
74 HANG    VALID    2  4510     4  2402   HIGH GLOBAL Terminate Process
inst# SessId  Ser#     OSPID Event
----- ------ ----- --------- -----
1     11 15447     18842 enq: TX - index contention
1   4763 31315     10670 gc current request
2  11642    39     17120 log file switch (checkpoint incomplete)
2   4510     1      3151 not in wait
*** 2022-11-18 23:25:41.065
HM: Matched Hang ID=71 first occurred 11/18/2022 23:25:07, matched 1 times
2/345/44037/latch: redo writing <- 2/228/36601/buffer busy waits
*** 2022-11-18 23:25:41.065
HM: Matched Hang ID=72 first occurred 11/18/2022 23:25:07, matched 1 times
2/17224/35815/latch: redo writing <- 2/12442/46953/buffer busy waits

可以看到lgwr进程(session 4510) 也是在等待control file sequential read或者就是not in wait。可见本质上这几次问题是一样的。

此时怀疑检查点方面可能出现了问题,导致长时间没有触发,进一步分析之前的awr发现部分数据确实有一些异常:

physical writes from cache	20,738,580	5,733.14	6.66
physical writes non checkpoint	17,877,771	4,942.28	5.74

其中physical writes from cache  – physical writes not checkpoint 的部分就是Oracle 增量检查点触发的dbwr physical writes操作。由此可见,增量检查点触发的写比例非常低,也该也就15%。 换句话讲其他85%可能就是db cache自身写脏等操作触发的。

Oracle从 8i引入了增量检查点,10g 版本引入了自动调节增量检查点机制,主要有如下几个相关参数:

NAME                                               VALUE                DESCRIB
-------------------------------------------------- -------------------- ------------------------------------------------------------
_selftune_checkpoint_write_pct                     3                    Percentage of total physical i/os for self-tune ckpt
_log_checkpoint_recovery_check                     0                    # redo blocks to verify after checkpoint
log_checkpoint_interval                            0                    # redo blocks checkpoint threshold
log_checkpoint_timeout                             1800                 Maximum time interval between checkpoints in seconds
_gc_global_checkpoint_scn                          TRUE                 if TRUE, enable global checkpoint scn
_disable_incremental_checkpoints                   FALSE                Disable incremental checkpoints for thread recovery
_disable_selftune_checkpointing                    FALSE                Disable self-tune checkpointing
_selftune_checkpointing_lag                        300                  Self-tune checkpointing lag the tail of the redo log
log_checkpoints_to_alert                           FALSE                log checkpoint begin/end to alert file
_kdli_checkpoint_flush                             FALSE                do not invalidate cache buffers after write

Oracle 10g版本开始,默认启用了增量检查点自动调节机制。除了上述跟dbwr写相关的参数之外,还有跟redo log size相关的参数:

NAME                                               VALUE                DESCRIB
-------------------------------------------------- -------------------- ------------------------------------------------------------
_target_rba_max_lag_percentage                     81                   target rba max log lag percentage

该参数在11.2版本中默认值为81%,在10g版本中为90%。 该参数大致意思指最大限制情况下,当所有redo log大小写满到其81%的情况下才触发检查点。从mos上文章也看到了一些描述,某些特定情况下增量检查点机制是有待完善的,与cpu_count,log_buffer大小,redo log size,以及mttr有关。

对于用户这个问题,我们建议用户通过如下2个方式调整之后进行观察:

1、log_buffer 调整为300m

2、fast_start_mttr_target 设置为600秒

 


评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注