Some Cases

Oracle  ASM Recover Case:

Tue Mar 04 18:09:59 CST 2014
Errors in file /home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_lgwr_15943.trc:
ORA-00345: redo log write error block 68145 count 5
ORA-00312: online log 6 thread 2: '+DATA/xxxx/onlinelog/o2_t2_redo3.log'
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Mar 04 18:09:59 CST 2014
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was dismounted
Tue Mar 04 18:10:00 CST 2014
Errors in file /home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_lmon_15892.trc:
ORA-00202: control file: '+DATA/xxxx/controlfile/o1_mf_4g1zr1yo_.ctl'
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Mar 04 18:10:00 CST 2014
KCF: write/open error block=0x1f41e online=1
 file=31 +DATA/xxxx/datafile/apps_ts_queues.310.692585175
 error=15078 txt: ''
Tue Mar 04 18:10:00 CST 2014
KCF: write/open error block=0x47d5d online=1
 file=51 +DATA/xxx/datafile/apps_ts_tx_data.353.692593409
 error=15078 txt: ''
Tue Mar 04 18:10:00 CST 2014
Errors in file /home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_dbw2_15939.trc:
ORA-00202: control file: '+DATA/prod/controlfile/o1_mf_4g1zr1yo_.ctl'
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Mar 04 18:10:00 CST 2014
KCF: write/open error block=0x47d5b online=1
 file=51 +DATA/prod/datafile/apps_ts_tx_data.353.692593409
 error=15078 txt: ''
Tue Mar 04 18:10:00 CST 2014

asm instance alert log as fellow:

NOTE: SMON starting instance recovery for group 1 (mounted)
Tue Mar 04 18:10:04 CST 2014
WARNING: IO Failed.  au:0 diskname:/dev/raw/raw5
 rq:0x200000000207b518 buffer:0x200000000235c600 au_offset(bytes):0 iosz:4096 operation:0
 status:2
WARNING: IO Failed.  au:0 diskname:/dev/raw/raw5
 rq:0x200000000207b518 buffer:0x200000000235c600 au_offset(bytes):0 iosz:4096 operation:0
 status:2
NOTE: F1X0 found on disk 0 fcn 0.160230519
WARNING: IO Failed.  au:33 diskname:/dev/raw/raw5
 rq:0x60000000002d64f0 buffer:0x400405df000 au_offset(bytes):0 iosz:4096 operation:0
 status:2
WARNING: cache failed to read gn 1 fn 3 blk 10752 count 1 from disk 2
ERROR: cache failed to read fn=3  blk=10752 from disk(s): 2
ORA-15081: failed to submit an I/O operation to a disk
NOTE: cache initiating offline of disk 2  group 1
WARNING: process 12863 initiating offline of disk 2.2526420198 (DATA_0002) with mask 0x3 in group 1
NOTE: PST update: grp = 1, dsk = 2, mode = 0x6
Tue Mar 04 18:10:04 CST 2014
ERROR: too many offline disks in PST (grp 1)
Tue Mar 04 18:10:04 CST 2014
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA
Tue Mar 04 18:10:04 CST 2014
WARNING: Disk 2 in group 1 in mode: 0x7,state: 0x2 was taken offline
Tue Mar 04 18:10:05 CST 2014
NOTE: halting all I/Os to diskgroup DATA
NOTE: active pin found: 0x0x40045bb0fd0
Tue Mar 04 18:10:05 CST 2014
Abort recovery for domain 1
Tue Mar 04 18:10:05 CST 2014
NOTE: cache dismounting group 1/0xD916EC16 (DATA)
Tue Mar 04 18:10:06 CST 2014

Oracle kfed read disk:

WARNING: cache read a corrupted block gn=1 dsk=2 blk=1 from disk 2
OSM metadata block dump:
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
 CE: (0x0x400417ee4e0)  group=1 (DATA) obj=2 (disk)  blk=1
 hashFlags=0x0002  lid=0x0002  lruFlags=0x0000  bastCount=1
 redundancy=0x11  fileExtent=-2147483648  AUindex=0 blockIndex=1
 copy #0:  disk=2  au=0
 BH: (0x0x40041795000)  bnum=4586 type=reading state=reading chgSt=not modifying
 flags=0x00000000  pinmode=excl  lockmode=share  bf=0x0x40041400000
 kfbh_kfcbh.fcn_kfbh = 0.0  lowAba=655.8572  highAba=0.0
 last kfcbInitSlot return code=null cpkt lnk is null

Finally, we use the self-developed database extraction tool to recover all the Oracle data files.

Distributed storage failure:

Tue Mar 03 11:40:49 2020
ERROR: too many offline disks in PST (grp 1)
Tue Mar 03 11:40:49 2020
......
NOTE: cache closing disk 122 of grp 1: (not open) DG_DATA01_0122
ERROR: disk 35(DG_DATA01_0035) in group 1(DG_DATA01) cannot be offlined because all disks [35(DG_DATA01_0035), 83(DG_DATA01_0083)] with mirrored data would be offline.
Tue Mar 03 11:40:49 2020
ERROR: too many offline disks in PST (grp 1)
Tue Mar 03 11:40:49 2020
NOTE: cache dismounting (not clean) group 1/0xFF593AAC (DG_DATA01)
NOTE: messaging CKPT to quiesce pins Unix process pid: 94856, image: oracle@zhjqc01 (B000)
Tue Mar 03 11:40:49 2020
NOTE: cache closing disk 0 of grp 1: (not open) DG_DATA01_0000
Tue Mar 03 11:40:49 2020
NOTE: halting all I/Os to diskgroup 1 (DG_DATA01)
Tue Mar 03 11:40:49 2020
NOTE: cache closing disk 80 of grp 1: (not open) DG_DATA01_0080
Tue Mar 03 11:40:49 2020
NOTE: cache closing disk 82 of grp 1: (not open) DG_DATA01_0082
Tue Mar 03 11:40:49 2020
WARNING: Offline of disk 35 (DG_DATA01_0035) in group 1 and mode 0x7f failed on ASM inst 1
Tue Mar 03 11:40:49 2020
NOTE: cache closing disk 83 of grp 1: (not open) DG_DATA01_0083
Tue Mar 03 11:40:50 2020
......
NOTE: LGWR doing non-clean dismount of group 1 (DG_DATA01) thread 2
NOTE: LGWR sync ABA=583.8239 last written ABA 583.8239
Tue Mar 03 11:40:50 2020
NOTE: initiating dirty detach from lock domain 1
Tue Mar 03 11:40:50 2020
.......
Tue Mar 03 11:40:50 2020
Dirty Detach Reconfiguration complete (total time 0.3 secs) 
ASM Health Checker found 1 new failures
Tue Mar 03 11:40:51 2020
ERROR: ORA-15130 in COD recovery for diskgroup 1/0xff593aac (DG_DATA01)
ERROR: ORA-15130 thrown in RBAL for group number 1
Tue Mar 03 11:40:51 2020
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_55928.trc:
ORA-15130: diskgroup "DG_DATA01" is being dismounted
Tue Mar 03 11:40:54 2020

The following error message is displayed when you manually mount the diskgroup:

ERROR: diskgroup DG_DATA01 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "35" in group "DG_DATA01" may result in a data loss
ORA-15042: ASM disk "35" is missing from group number "1" 
ORA-15080: synchronous I/O operation failed to read block 0 of disk 13 in disk group 
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: 4294967295
Additional information: 4096

We manually fixed the above error through kfed, and the solution was tested as follows:

SQL> alter diskgroup test mount force;
alter diskgroup test mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "1" in group "TEST" may result in a data loss
ORA-15042: ASM disk "1" is missing from group number "2"
 
=============== PST ==================== 
grpNum:    2 
state:     2 
callCnt:   91 
(lockvalue) valid=1 ver=0.0 ndisks=2 flags=0x3 from inst=0 (I am 1) last=0
--------------- HDR -------------------- 
next:    46 
last:    46 
pst count:       2 
pst locations:   2  0 
incarn:          45 
dta size:        6 
version:         1 
ASM version:     186646528 = 11.2.0.0.0
contenttype:     1
partnering pattern:      [ ]
--------------- LOC MAP ---------------- 
0: dirty 0       cur_loc: 0      stable_loc: 0
1: dirty 1       cur_loc: 1      stable_loc: 1
--------------- DTA -------------------- 
0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2429200834 parts: 5 (amp) 4 (amp) 3 (amp) 2 (amp) 
1: sts v v(-w) p(-w) a(-) d(-) fg# = 1 addTs = 2429200834 parts: 4 (amp) 5 (amp) 2 (amp) 3 (amp) 
2: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 1 (amp) 4 (amp) 5 (amp) 0 (amp) 
3: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 5 (amp) 0 (amp) 1 (amp) 4 (amp) 
4: sts v v(--) p(--) a(-) d(-) fg# = 3 addTs = 2429203972 parts: 1 (amp) 0 (amp) 2 (amp) 3 (amp) 
5: sts v v(--) p(--) a(-) d(-) fg# = 3 addTs = 2429203972 parts: 0 (amp) 1 (amp) 3 (amp) 2 (amp) 
 
 
[grid@Prod ~]$ kfed read /dev/asm-diske aun=1 blkn=2|grep "status"|grep -v "I=0"
kfdpDtaEv1[0].status:               127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[1].status:               127 ; 0x030: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[2].status:               127 ; 0x060: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[3].status:               127 ; 0x090: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[4].status:                21 ; 0x0c0: I=1 V=0 V=1 P=0 P=1 A=0 D=0
kfdpDtaEv1[5].status:                21 ; 0x0f0: I=1 V=0 V=1 P=0 P=1 A=0 D=0
 
[grid@Prod ~]$ kfed read /dev/asm-diskc aun=1 blkn=2|grep "status"|grep -v "I=0" 
kfdpDtaEv1[0].status:               127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[1].status:               127 ; 0x030: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[2].status:               127 ; 0x060: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[3].status:               127 ; 0x090: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[4].status:                21 ; 0x0c0: I=1 V=0 V=1 P=0 P=1 A=0 D=0
kfdpDtaEv1[5].status:                21 ; 0x0f0: I=1 V=0 V=1 P=0 P=1 A=0 D=0
 
 
[grid@test ~]$ kfed read /dev/asm-diskh aun=1 blkn=2|grep "status"|grep -v "I=0" 
kfdpDtaEv1[0].status:               127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[1].status:               127 ; 0x030: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[2].status:               127 ; 0x060: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[3].status:               127 ; 0x090: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[4].status:               127 ; 0x0c0: I=1 V=0 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[5].status:               127 ; 0x0f0: I=1 V=1 V=1 P=1 P=1 A=1 D=1
 
[grid@test ~]$ kfed read /dev/asm-diskh aun=1 blkn=2|grep "status"|grep -v "I=0" > repair.txt
[grid@test ~]$ kfed merge /dev/asm-diske aun=1 blkn=2 text=repair.txt                        
[grid@test ~]$ kfed merge /dev/asm-diskc aun=1 blkn=2 text=repair.txt 
[grid@test ~]$ kfed merge /dev/asm-diske aun=1 blkn=3 text=repair.txt                        
[grid@test ~]$ kfed merge /dev/asm-diskc aun=1 blkn=3 text=repair.txt 
 
SQL> alter diskgroup test mount force;
 
Diskgroup altered.

 

Storage array damage

Mon Apr 27 02:01:15 2020
Read of datafile '+DATADG/orclsk/datafile/sysaux.260.956077231' (fno 2) header failed with ORA-01207
Rereading datafile 2 header from mirror side 'DATADG_0000' failed with ORA-01207
Rereading datafile 2 header from mirror side 'DATADG_0006' failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/orclsk/orclsk1/trace/orclsk1_ckpt_4893.trc:
ORA-63999: data file suffered media failure
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATADG/orclsk/datafile/sysaux.260.956077231'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ckpt_4893.trc:
ORA-63999: data file suffered media failure
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATADG/xxxxx/datafile/sysaux.260.956077231'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 4893): terminating the instance due to error 63999

The customer restarted the database environment and found that a large number of errors were still reported.

Mon Apr 27 02:02:03 2020
Read of datafile '+DATADG/xxxxx/datafile/mscsz.dbf' (fno 40) header failed with ORA-01207
Rereading datafile 40 header from mirror side 'DATADG_0000' failed with ORA-01207
Rereading datafile 40 header from mirror side 'DATADG_0005' failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_dbw0_115709.trc:
ORA-01186: file 40 failed verification tests
ORA-01122: database file 40 failed verification check
ORA-01110: data file 40: '+DATADG/xxxxx/datafile/mscsz.dbf'
ORA-01207: file is more recent than control file - old control file
File 40 not verified due to error ORA-01122
.......
Mon Apr 27 02:02:07 2020
QMNC started with pid=46, OS id=115887 
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_115771.trc  (incident=299881):
ORA-01578: ORACLE data block corrupted (file # 1, block # 3620)
ORA-01110: data file 1: '+DATADG/xxxxx/datafile/system.259.956077225'
Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_299881/xxxxx1_ora_115771_i299881.trc
Mon Apr 27 02:02:16 2020
Dumping diagnostic data in directory=[cdmp_20200427020216], requested by (instance=1, osid=115771), summary=[incident=299881].
Mon Apr 27 02:02:17 2020
Sweep [inc][299881]: completed
Mon Apr 27 02:02:18 2020
Completed: ALTER DATABASE OPEN /* db agent *//* {0:13:1899} */
Mon Apr 27 02:02:18 2020
ORA-21700 encountered when generating server alert SMG-4120
ORA-21700 encountered when generating server alert SMG-4121
ORA-21700 encountered when generating server alert SMG-4121
Mon Apr 27 02:02:20 2020
Starting background process CJQ0
Mon Apr 27 02:02:20 2020
CJQ0 started with pid=54, OS id=115990 
Mon Apr 27 02:02:20 2020
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_m004_115984.trc  (incident=300025):
ORA-01578: ORACLE data block corrupted (file # 2, block # 101561)
ORA-01110: data file 2: '+DATADG/xxxxx/datafile/sysaux.260.956077231'
Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_300025/xxxxx1_m004_115984_i300025.trc
Dumping diagnostic data in directory=[cdmp_20200427020222], requested by (instance=1, osid=115984 (M004)), summary=[incident=300025].
Mon Apr 27 02:02:22 2020
Sweep [inc][300025]: completed
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_m004_115984.trc  (incident=300026):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_300026/xxxxx1_m004_115984_i300026.trc
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_m004_115984.trc  (incident=300027):
ORA-01578: ORACLE data block corrupted (file # , block # )
ORA-06512: at "SYS.PRVT_ADVISOR", line 6144
ORA-06512: at line 1
Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_300027/xxxxx1_m004_115984_i300027.trc
Dumping diagnostic data in directory=[cdmp_20200427020224], requested by (instance=1, osid=115984 (M004)), summary=[incident=300026].
Dumping diagnostic data in directory=[cdmp_20200427020224], requested by (instance=1, osid=115984 (M004)), summary=[incident=300027].
Mon Apr 27 02:02:46 2020
Hex dump of (file 1, block 24895) in trace file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_116134.trc
Corrupt block relative dba: 0x0040613f (file 1, block 24895)
Bad header found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00405c3f
 last change scn: 0x0000.0002a412 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xa4120601
 check value in block header: 0xa486
 computed block checksum: 0x0
Reading datafile '+DATADG/xxxxx/datafile/system.259.956077225' for corruption at rdba: 0x0040613f (file 1, block 24895)
Read datafile mirror 'DATADG_0001' (file 1, block 24895) found same corrupt data (no logical check)
Read datafile mirror 'DATADG_0003' (file 1, block 24895) found valid data
Hex dump of (file 1, block 24895) in trace file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_116134.trc
Repaired corruption at (file 1, block 24895)

Finally, we successfully restored the normal operation of this business system through some means.