案例:troubleshooting ORA-04030 in MGA on 19c

本案例来自东区某客户,db版本为19.12,业务应用程序频繁报ORA-04030导致业务中断。

[TOC00000]
Jump to table of contents
Dump continued from file: /oracle/app/oracle/diag/rdbms/hsuatdb/hsuatdb1/trace/hsuatdb1_ora_208220.trc
[TOC00001]
ORA-04030: out of process memory when trying to allocate 538800 bytes (KSIPC Top Loca,ksipc pga chnk)

[TOC00001-END]

继续分析trace查看报错的heap

[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
Dumping MGA handle on ORA-4030 failure.
Alloc chunk = ksipc pga chnk, req size = 538800, min size = 538800
----------------------------------------------------
ksm_mga_dump_stats : dumping type stats
----------------------------------------------------
  Type        = Default
  namespaces  = 2
  total segs  = 10
  total size  = 2684354560
   1. pagesize(    4k) = 2684354560
----------------------------------------------------
ksm_mga_dump_stats : dumping namespace type stats
----------------------------------------------------
 Dumping stats for KSIPC_MGA_NAMESPACE
  namespaces  = 2
  total segs  = 10
  total size  = 2684354560
   1. pagesize(    4k) = 2684354560
ksm_mga_dump_stats : dumped 1 namespace type stats
----------------------------------------------------
ksm_mga_dump_stats : dumping namespace stats
----------------------------------------------------
 Dumping stats for KSIPC_MGA_NMSPC_1_0
  namespaces  = 1
  total segs  = 10
  total size  = 2684354560
   1. pagesize(    4k) = 2684354560
--------------------------------------------------
Dumping name space heap : [ KSIPC_MGA_NMSPC_1_0 ] [ sctx = 0x1caff50c0 ]
--------------------------------------------------
heap [0] : heapds = 0x1868e3008 : heaplt = 0x1caff5330 : ltobj = 0x1caff53d0
******************************************************

报错与MGA有关,MGA是12的新特性,mos文档MGA (Managed Global Area) Reference Note (Doc ID 2638904.1)中有比较详细的描述。本文就不作过多的介绍。

About MGA (Managed Global Area):

The MGA allows a “smaller set” of processes (or even all processes) to share an address space for the duration (typically a query).  The MGA is made up of namespaces which contain segments and heaps with space management either directly or through a heap manager (KGH).

Processes can attach to their namespace for the duration they require.  For e.g., in a parallel query environment, PQs participating with the QC can share the namespace from QC to share the results.  Once done, they will detach from the same and the namespace can be tiered down.

The MGA is allocated dynamically which gives us more flexibility to create, control and share.  The sharing is dynamic in the sense that processes coordinate before sharing.  So in that sense, MGA sits between the SGA and PGA.  In addition, since the MGA is a shared memory area, a latch (latch: MGA) is used to control access and protect it.

Difference between MGA/PGA/SGA:

The MGA is not the SGA or PGA.  The PGA is private and the SGA is completely shared by all processes and is not elastic.  The MGA is elastic (processes can create and drop their MGA segments dynamically), shared between a set of processes or all processes, and counted under the PGA target/limit values (in v$PGASTAT).  The SGA is typically created once while the MGA is created on demand.

There can be multiple MGAs in an instance while we have just one SGA.

 

MGA Clients and Use Cases:

The MGA is a general feature which has multiple consumers and the benefit is based on the consumers.  MGA consumers are:  IPC, MGA hash joinIMCDT and PQ in current releases.  PQ is not a client by default.  When a query uses features like MGA hash join or IMCDT, PQs internally attach to the namespace.  (In other words, MGA hash join and IMCDT are actually executed by PQs with MGA being shared.)

从trace中可以看到总共2个namespace,总共分配了10个mga segments,对于MGA内存可以看到并没有使用大页,仍然使用的4k页面,关于MGA的segment大小与是否使用大页都有参数来控制

SQL> @sp mga

-- show parameter by sp

-- show hidden parameter by sp
old   3: where x.indx=y.indx and ksppinm like '_%&p%'
new   3: where x.indx=y.indx and ksppinm like '_%mga%'

NAME                                     VALUE      DESC
---------------------------------------- ---------- ------------------------------------------------------------------------------------------
_ksipc_mga_segment_size                  268435456  KSIPC MGA Segment Size
_use_large_pages_for_mga                 FALSE      MGA largepage enabled
_use_fallocate_for_mga                   FALSE      MGA fallocate enabled
_mga_large_page_path                                large page path

可以看到每个mga segment大小都是一致的,都是256M,由参数_ksipc_mga_segment_size控制,参数_use_large_pages_for_mga控制MGA是否使用大页,默认是关闭的。这10个mga segment都能从trace的process map中看到,位于/dev/shm下。

400000000000-400010000000 rw-s 00000000 00:13 1819687176                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_0.dat
400010000000-400020000000 rw-s 00000000 00:13 1819710049                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_1.dat
400020000000-400030000000 rw-s 00000000 00:13 1819876463                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_2.dat
400030000000-400040000000 rw-s 00000000 00:13 1820690206                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_3.dat
400040000000-400050000000 rw-s 00000000 00:13 1824880267                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_4.dat
400050000000-400060000000 rw-s 00000000 00:13 1834118750                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_5.dat
400060000000-400070000000 rw-s 00000000 00:13 1939490055                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_6.dat
400070000000-400080000000 rw-s 00000000 00:13 2076262097                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_7.dat
400080000000-400090000000 rw-s 00000000 00:13 2096523313                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_8.dat
400090000000-4000a0000000 rw-s 00000000 00:13 2117674979                 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_9.dat

从heapdump上看,基本TOP heap都是MGA。

Heapdump Analyzer v1.03        by Tanel Poder(https://blog.tanelpoder.com
Total size #Chunks        Chunk_size,         _heap,        Chunk_type, Alloc_reason
1267680024   1209        1048536 ,     KSIPC MGA NMSP,    recreate,    KSIPC Top Loca
1234277528   1177        1048664        KSIPC MGA NMSP,    freeable,    KSIPC Top Loca
262934400     488        538800         KSIPC Top Loca,    freeable,    ksipc pga chnk
233373448     457        510664         KSIPC Top Loca,        perm,
...
...

官方文档中介绍如果启用了MGA,那么PGA_AGGREGATE_LIMIT的大小需要考虑到processes,大概的公式为:

PGA_AGGREGATE_LIMIT = (original PGA_AGGREGATE_LIMIT value) + ((maximum number of connected processes) * 4M)

继续分析trace,发现MGA大概分配到PGA_AGGREGATE_LIMIT的20%时,就会报出ORA-04030

going to retry allocation
ksipc_memallocb: retry allocation on alt nmspc(dflt) 0 pref 0
KSIPC MGA segment precreate succeeded: pga_aggr_limit 16106127360 IPC limit:40% curr allocation:2684354560 new allocation 2952790016 segsize 268435456 hdl 0x7ffcbfae3bf8

MGA heap extent alloc failed : req = 8388552, min = 8388552
Actual req size = 8388552, min size = 8388552, sctx = 0x1caff50c0, segsz = 268435456

发现匹配BUG:32521805 – ORA-4030 IN MGA, MGA 20% OF PGA_AGGREGATE_LIMIT,从bug文档中并没有找到该bug的fixed版本。解决方法有两种都比较消极:

  • 加大PGA_AGGREGATE_LIMIT
  • 禁用MGA(“_ksipc_service_mask”=0)

评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注