Restore Database with ORA-01861

今天一个客户问我,有个异机恢复的问题,原库是生产,还在运行,需要将NBU中的备份恢复到另外一个主机上,数据文件均为raw。首先我们来看一下错误:

RMAN> run{
2>
3> allocate channel t1 type 'sbt_tape';
4> allocate channel t2 type 'sbt_tape';
5> send 'NB_ORA_CLIENT=sth05v01';
6> set until time "to_date('2016-11-25 06:50:00','yyyy-mm-dd hh24:mi:ss')";
7> restore database;
8> release channel t1;
9> release channel t2;}

allocated channel: t1
channel t1: sid=540 devtype=SBT_TAPE
channel t1: Veritas NetBackup for Oracle - Release 7.5 (2012020801)

allocated channel: t2
channel t2: sid=539 devtype=SBT_TAPE
channel t2: Veritas NetBackup for Oracle - Release 7.5 (2012020801)

sent command to channel: t1
sent command to channel: t2

executing command: SET until clause

Starting restore at 01-DEC-16

released channel: t1
released channel: t2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/01/2016 14:46:30
ORA-01861: literal does not match format string

大家不难看出,rman执行报错了。从上述错误来看,感觉像是nls_date_format设置的问题。的确,如果我们在rman命令行中使用了until time操作,那么建议设置nls_date_Format参数,可以设置os环境变量也可以在rman中执行sql执行。但是这里其实并不是这种情况。实际上我无论是在os设置还是rman命令行中设置,均不会起作用。

针对该问题,Oracle mos有一篇文档RMAN Recovery Session Fails with ORA-1861 (文档 ID 852723.1) 记录十分详细,里面提到了一些解决方案,其中也提到了一些检测判断是否为文章中提到的bug的方式。

不幸的是,我参考文章的的方式,执行了如下脚本发现根本不返回记录,也就说明不是文档中描述的这个问题。

于是我进一步使用rman target / debug trace=/tmp/rman.trc进行跟踪,检查发现rman在执行脚本的时候,在运行如下的SQL时报错:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/01/2016 17:38:21
RMAN-06003: ORACLE error from target database:
ORA-01861: literal does not match format string
RMAN-06097: text of failing SQL statement: select fhscn ,to_date(fhtim,'MM/DD/RR HH24:MI:SS','NLS_CALENDAR=Gregorian') ,fhcrs ,fhrls ,to_date(fhrlc,'MM/DD/RR HH24:MI:SS','NLS_CALENDAR=Gregorian') ,fhafs ,fhrfs ,fhrft ,hxerr ,fhfsz ,fhsta into :b1,:b2:b3,:b4,:b5,:b6:b7,:b8:b9,:b10:b11,:b12:b13,:b14,:b15,:b16  from x$kcvfhall where hxfil=:b17
RMAN-06099: error occurred in source file: krmk.pc, line: 5655
DBGMISC:        ENTERED krmkursr [17:38:21.493]

DBGSQL:          EXEC SQL AT TARGET select decode(open_mode,'MOUNTED',0,'READ WRITE',1,'READ ONLY',1,0) into :b1  from v$database  [17:38:21.493]
DBGSQL:             sqlcode=0 [17:38:21.494]
DBGSQL:                :b1 = 0

从上面的内容来看,上述的select fhscn语句是报错的关键。将该SQL挑出来,单独执行,发现果然有问题。如下是我的测试结果:

SQL> select fhscn,
  2         to_date(fhtim, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
  3         fhcrs,
  4         fhrls,
  5         to_date(fhrlc, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
  6         fhafs,
  7         fhrfs,
  8         fhrft,
  9         hxerr,
 10         fhfsz,
 11         fhsta
 12    from x$kcvfhall
 13   where hxfil =252;
       to_date(fhtim, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
               *
ERROR at line 2:
ORA-01861: literal does not match format string

SQL> l
  1  select fhscn,
  2         to_date(fhtim, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
  3         fhcrs,
  4         fhrls,
  5         to_date(fhrlc, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
  6         fhafs,
  7         fhrfs,
  8         fhrft,
  9         hxerr,
 10         fhfsz,
 11         fhsta
 12    from x$kcvfhall
 13*  where hxfil <252 and hxfil > 252
SQL> /

FHSCN            TO_DATE(FHTIM,'MM FHCRS            FHRLS            TO_DATE(FHRLC,'MM FHAFS            FHRFS            FHRFT                     HXERR      FHFSZ      FHSTA
---------------- ----------------- ---------------- ---------------- ----------------- ---------------- ---------------- -------------------- ---------- ---------- ----------
408027070532     02/23/88 02:19:50 0                0                                  313537986655                                                    8          0          0
339307855959     02/21/88 13:55:10 0                0                                  266292363311                                                    8          0          0
231931576374     02/07/88 04:01:52 0                0                                  206161641523                                                    8          0          0
214751510576     02/09/88 10:38:52 0                0                                  206161772597                                                    8          0          0
287767330911     03/02/88 17:11:37 0                0                                  313537593439                                                    8          0          0
287769034817     03/11/88 01:26:40 0                0                                  335013675081                                                    8          0          0
356487069774     03/11/88 01:26:33 0                0                                  335013675081                                                    8          0          0
257701183536     02/27/88 15:33:21 0                0                                  300653936719                                                    8          0          0
137441050656     01/25/88 06:33:04 0                0                                  137441050656                                                    8          0          0
0                                  0                0
......

从测试结果来看,我们可以断定第252号文件在控制文件中的信息可能有点问题。这个问题我也是第一次遇到。不过我将controlfile 的内容dump出来之后,过滤定位到file=252的内容,发现似乎并没有异常的地方:

sth14n01$(/oracle/app/admin/SETTLE/bdump)cat /oracle/app/product/10.2.0/db_1/rdbms/log/settle1_ora_10832.trc|grep 'File=252'
 RECID #39423 Recno 1071 Record timestamp  11/25/16 00:51:28 File=252 Incremental backup level=0
 RECID #38917 Recno 565 Record timestamp  11/17/16 20:53:38 File=252 Incremental backup level=0

RECID #39423 Recno 1071 Record timestamp  11/25/16 00:51:28 File=252 Incremental backup level=0
  File is part of the incremental strategy
  Backup set key: stamp=928792972, count=17980
  Creation checkpointed at scn: 0x0caf.169d9ef4 10/28/14 12:19:55
  File header checkpointed at scn: 0x0dc6.4f902943 11/24/16 22:02:53
  Resetlogs scn and time scn: 0x0000.0007c754 12/31/13 10:28:23
  Incremental Change scn: 0x0000.00000000
  Absolute Fuzzy scn: 0x0000.00000000
  Newly-marked media corrupt blocks  0 Total media corrupt blocks 0
  Total logically corrupt blocks 0  Block images written to backup 1048456
  File size at backup time 1048575  Block size 8192
  Low Offline Range Recid 0
  Number of blocks read during backup 1048575

   RECID #38917 Recno 565 Record timestamp  11/17/16 20:53:38 File=252 Incremental backup level=0
  File is part of the incremental strategy
  Backup set key: stamp=928173622, count=17846
  Creation checkpointed at scn: 0x0caf.169d9ef4 10/28/14 12:19:55
  File header checkpointed at scn: 0x0dc4.fb8c934d 11/17/16 18:00:23
  Resetlogs scn and time scn: 0x0000.0007c754 12/31/13 10:28:23
  Incremental Change scn: 0x0000.00000000
  Absolute Fuzzy scn: 0x0000.00000000
  Newly-marked media corrupt blocks  0 Total media corrupt blocks 0
  Total logically corrupt blocks 0  Block images written to backup 1048456
  File size at backup time 1048575  Block size 8192
  Low Offline Range Recid 0
  Number of blocks read during backup 1048575

不难看出,上述的内容似乎没有什么问题。从整体现象来看与Oracle mos文档中提到的bug确实很像,几乎完全一样。但是通过mos文档中提到的脚本检查缺没有问题,这让我感觉很奇怪。

不过这里有点不确认的是,我是在生产库进行检测的,同时执行select语句也没有任何问题。说明原生产库的controlfile本身就是ok的。按理说应该在新环境中进行检测。

简而言之,这个问题确实有些怪异。不过还好的是,要解决或者绕过这个问题并不难,比如可以restore tablespace  xx,xxx  skip datafile 252即可。最后单独备份这个252文件进行restore,然后即可。


评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注