Focus On Oracle

Installing, Backup & Recovery, Performance Tuning,
Troubleshooting, Upgrading, Patching, Zero-Downtime Upgrade, GoldenGate

Oracle Exadata ,Oracle ODA, Oracle ZDLRA


当前位置: 首页 » 技术文章 » 内部原理

如何用BBED使Offline的数据文件Online

环境:IBM AIX两节点RAC,数据库版本10.2.0.4(使用了ASM) 

故障原因:

经了解,目前此数据库的容量为6T,ASM磁盘组空间几乎用尽,在向磁盘组增加磁盘时,由于种种原因最终导致2个数据文件offline。由于数据库比较大,数据库没有备份,可怜的是,归档日志是定期清除的,当发现这个问题时,所需的归档日志已被清除,想通过常规手段使文件online已不可能,幸运的时,通过BBED最终使文件online成功,虽然后续还要一些问题,最终得以解决。

alert日志内容
Mon Jul  4 09:55:25 2016
KCF: write/open error block=0x63a093 online=1
     file=11 +ASM_DG02/abdd/datafile/dth_img.261.769872303
     error=15078 txt: ''
Automatic datafile offline due to write error on
file 11: +ASM_DG02/abdd/datafile/dth_img.261.769872303
Mon Jul  4 09:55:38 2016
Errors in file /u01/oracle/admin/abdd/bdump/abdd1_j000_3605220.trc:
ORA-12012: error on auto execute of job 10303
ORA-01115: IO error reading block from file 67 (block # 42653)
ORA-01110: data file 67: '+ASM_DG02/abdd/datafile/data02.dbf'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-01115: IO error reading block from file 67 (block # 198500)
ORA-01110: data file 67: '+ASM_DG02/abdd/datafile/data02.dbf'

恢复思路:
A.安装BBED(由于是10.2.0.4的库,自身就有bbed编译所需的文件)
B.找出2个Offline文件在磁盘上的位置
C.通过dd生成备份/恢复这两个数据文件头的命令
D.正常关闭数据库
E.用dd复制出2个正常的数据文件头部和2个Offline的数据文件头部
   注意:这2个offline的文件头部备份2份,因为后面要修改。复制出2个正常的数据文件头部用作参考。
F.用bbed查看正常文件的头部在偏移量484到512的数值
G.用bbed修改2个offline文件头部在偏移量484到512的数值,确保Offline文件和正常文件头部的数值是一致的
H.用sqlplus连接其中一节点并启动数据库到mount
I.恢复数据文件
   recover datafile 67;
   recover datafile 11;
J.已只读模式打开数据库
   alter database open read only;
   alter database datafile 11 online;
   alter database datafile 67 online;
   select distinct status from v$datafile;
   select * from v$recover_file;
K.关闭数据库,以正常模式打开数据库
   shutdown immediate
   startup
L.经过一段时间观察,没有出现ora-600等异常情况
M.关闭数据库,用srvctl start database

注意:在操作前,注意备份system表空间,offline的文件,当前的控制文件,在线日志文件

打开链接参考生成备份ASM数据文件头部的脚本

dd if=/dev/rhdisk5 of=/home/oracle/backup/1_1_264.FH bs=1048576 skip=283 count=1
dd if=/dev/rhdisk8 of=/home/oracle/backup/1_2_262.FH bs=1048576 skip=116 count=1
dd if=/dev/rhdisk17 of=/home/oracle/backup/2_8_261.FH bs=1048576 skip=53055 count=1
dd if=/dev/rhdisk12 of=/home/oracle/backup/2_2_264.FH bs=1048576 skip=3616 count=1

BBED具体操作
BBED> info all
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /home/oracle/backup/1_1_264.FH                                       0
     2  /home/oracle/backup/1_2_262.FH                                       0
     3  /home/oracle/backup/2_2_264.FH                                       0
     4  /home/oracle/backup/2_8_261.FH                                       0 

BBED> set dba 1,1
        DBA             0x00400001 (4194305 1,1)

BBED> p kcvfhckp
struct kcvfhckp, 160 bytes                  @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x36a28488
      ub2 kscnwrp                           @488      0x0d58
   ub4 kcvcptim                             @492      0x369faee7
   ub2 kcvcpthr                             @496      0x0002
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00015a93
         ub4 kcrbabno                       @504      0x00006b8d
         ub2 kcrbabof                       @508      0x0010
为了方便查看,后面部分省略
BBED> set dba 2,1 DBA 0x00800001 (8388609 2,1) BBED> p kcvfhckp struct kcvfhckp, 160 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x36a28488 ub2 kscnwrp @488 0x0d58 ub4 kcvcptim @492 0x369faee7 ub2 kcvcpthr @496 0x0002 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x00015a93 ub4 kcrbabno @504 0x00006b8d ub2 kcrbabof @508 0x0010
为了方便查看,后面部分省略
BBED> set dba 3,1 DBA 0x00c00001 (12582913 3,1) BBED> p kcvfhckp struct kcvfhckp, 160 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x36195936 ub2 kscnwrp @488 0x0d58 ub4 kcvcptim @492 0x369dbc5a ub2 kcvcpthr @496 0x0002 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x000159e9 ub4 kcrbabno @504 0x00000002 ub2 kcrbabof @508 0x0010 为了方便查看,后面部分省略 BBED> set dba 4,1 DBA 0x01000001 (16777217 4,1) BBED> p kcvfhckp struct kcvfhckp, 160 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x36195936 ub2 kscnwrp @488 0x0d58 ub4 kcvcptim @492 0x369dbc5a ub2 kcvcpthr @496 0x0002 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x000159e9 ub4 kcrbabno @504 0x00000002 ub2 kcrbabof @508 0x0010
为了方便查看,后面部分省略

从上面我们可以看到,文件1,2头部是一样的,这2个文件是正常的,后面2个文件是Offline的文件,我们需要做的就是修改checkpoint的信息以及RBA的信息

BBED操作

注意:AIX是大端,修改时顺序无需调换

先查看偏移量484到508的相信信息

BBED> set count 26
        COUNT           26
BBED> d offset 484 dba 1,1
 File: /home/oracle/backup/1_1_264.FH (1)
 Block: 1                Offsets:  484 to  509           Dba:0x00400001
------------------------------------------------------------------------
 36a28488 0d580000 369faee7 0002a054 00015a93 00006b8d 0010 

 <32 bytes per line>

BBED> d offset 484 dba 2,1
 File: /home/oracle/backup/1_2_262.FH (2)
 Block: 1                Offsets:  484 to  509           Dba:0x00800001
------------------------------------------------------------------------
 36a28488 0d580000 369faee7 0002a054 00015a93 00006b8d 0010 

 <32 bytes per line>

BBED> d offset 484 dba 3,1
 File: /home/oracle/backup/2_2_264.FH (3)
 Block: 1                Offsets:  484 to  509           Dba:0x00c00001
------------------------------------------------------------------------
 36195936 0d580cf8 369dbc5a 0002de80 000159e9 00000002 0010 

 <32 bytes per line>

BBED> d offset 484 dba 4,1
 File: /home/oracle/backup/2_8_261.FH (4)
 Block: 1                Offsets:  484 to  509           Dba:0x01000001
------------------------------------------------------------------------
 36195936 0d580cf8 369dbc5a 0002de80 000159e9 00000002 0010 

 <32 bytes per line>

BBED>
修改文件3,4在偏移量484到508的相关信息,修改完成后,确认1,2,3,4偏移量在484到508的信息是一致的
BBED> set dba 3,1
        DBA             0x00c00001 (12582913 3,1)

BBED> m /x 0x0d580000 offset 488
 File: /home/oracle/backup/2_2_264.FH (3)
 Block: 1                Offsets:  488 to  513           Dba:0x00c00001
------------------------------------------------------------------------
 0d580000 369faee7 0002de80 00015a93 00006b8d 00100000 0600 

 <32 bytes per line>

BBED> m /x 0x0002a054 offset 496
 File: /home/oracle/backup/2_2_264.FH (3)
 Block: 1                Offsets:  496 to  521           Dba:0x00c00001
------------------------------------------------------------------------
 0002a054 00015a93 00006b8d 00100000 06000000 00000000 0000 

 <32 bytes per line>

BBED> sum
Check value for File 3, Block 1:
current = 0x44a7, required = 0x368b

BBED> sum apply
Check value for File 3, Block 1:
current = 0x368b, required = 0x368b

BBED> set dba 4,1
        DBA             0x01000001 (16777217 4,1)

BBED> m /x 0x0d580000 offset 488
 File: /home/oracle/backup/2_8_261.FH (4)
 Block: 1                Offsets:  488 to  513           Dba:0x01000001
------------------------------------------------------------------------
 0d580000 369faee7 0002de80 00015a93 00006b8d 00100000 0600 

 <32 bytes per line>

BBED> m /x 0x0002a054 offset 496
 File: /home/oracle/backup/2_8_261.FH (4)
 Block: 1                Offsets:  496 to  521           Dba:0x01000001
------------------------------------------------------------------------
 0002a054 00015a93 00006b8d 00100000 06000000 00000000 0000 

 <32 bytes per line>

BBED> sum
Check value for File 4, Block 1:
current = 0x8ef7, required = 0xfcdb

BBED> sum apply
Check value for File 4, Block 1:
current = 0xfcdb, required = 0xfcdb

BBED> d offset 484 dba 1,1
 File: /home/oracle/backup/1_1_264.FH (1)
 Block: 1                Offsets:  484 to  509           Dba:0x00400001
------------------------------------------------------------------------
 36a28488 0d580000 369faee7 0002a054 00015a93 00006b8d 0010 

 <32 bytes per line>

BBED> d offset 484 dba 2,1
 File: /home/oracle/backup/1_2_262.FH (2)
 Block: 1                Offsets:  484 to  509           Dba:0x00800001
------------------------------------------------------------------------
 36a28488 0d580000 369faee7 0002a054 00015a93 00006b8d 0010 

 <32 bytes per line>

BBED> d offset 484 dba 3,1
 File: /home/oracle/backup/2_2_264.FH (3)
 Block: 1                Offsets:  484 to  509           Dba:0x00c00001
------------------------------------------------------------------------
 36a28488 0d580000 369faee7 0002a054 00015a93 00006b8d 0010 

 <32 bytes per line>

BBED> d offset 484 dba 4,1
 File: /home/oracle/backup/2_8_261.FH (4)
 Block: 1                Offsets:  484 to  509           Dba:0x01000001
------------------------------------------------------------------------
 36a28488 0d580000 369faee7 0002a054 00015a93 00006b8d 0010 

 <32 bytes per line>
用dd还原修改后的数据文件头部
dd  if=/home/oracle/backup/2_8_261.FH of=/dev/rhdisk17 bs=1048576 seek=53055 count=1 conv=notrunc
dd  if=/home/oracle/backup/2_2_264.FH of=/dev/rhdisk12 bs=1048576 seek=3616 count=1 conv=notrunc

正常打开后数据库alert日志

ALTER DATABASE RECOVER  datafile 11  
Tue Jul  5 23:03:43 2016
Media Recovery Start
Tue Jul  5 23:03:43 2016
SUCCESS: diskgroup ASM_DG02 was mounted
Tue Jul  5 23:03:45 2016
 parallel recovery started with 15 processes
Tue Jul  5 23:03:45 2016
Media Recovery Complete (abdd1)
Tue Jul  5 23:03:45 2016
SUCCESS: diskgroup ASM_DG02 was dismounted
Tue Jul  5 23:03:45 2016
Completed: ALTER DATABASE RECOVER  datafile 11  
Tue Jul  5 23:03:54 2016
ALTER DATABASE RECOVER  datafile 67  
Media Recovery Start
Tue Jul  5 23:03:54 2016
SUCCESS: diskgroup ASM_DG02 was mounted
Tue Jul  5 23:03:54 2016
 parallel recovery started with 15 processes
Tue Jul  5 23:03:54 2016
Media Recovery Complete (abdd1)
Tue Jul  5 23:03:54 2016
SUCCESS: diskgroup ASM_DG02 was dismounted
Tue Jul  5 23:03:54 2016
Completed: ALTER DATABASE RECOVER  datafile 67  
Tue Jul  5 23:04:16 2016
alter database open read only
Tue Jul  5 23:04:16 2016
This instance was first to open
Tue Jul  5 23:04:16 2016
SUCCESS: diskgroup ASM_DG02 was mounted
Tue Jul  5 23:04:24 2016
Picked broadcast on commit scheme to generate SCNs
Tue Jul  5 23:04:32 2016
SMON: enabling cache recovery
Tue Jul  5 23:04:32 2016
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 16
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open read only
Tue Jul  5 23:04:34 2016
Errors in file /u01/oracle/admin/abdd/udump/abdd1_ora_5571420.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
ORA-06512: at line 2
alter database datafile 11 online
Tue Jul  5 23:08:05 2016
Starting control autobackup
Control autobackup written to DISK device
	handle '+ASM_DG01/abdd/autobackup/2016_07_05/s_916434663.16892.916441687'
Completed: alter database datafile 11 online
Tue Jul  5 23:08:17 2016
alter database datafile 67 online
Tue Jul  5 23:08:17 2016
Starting control autobackup
Control autobackup written to DISK device
	handle '+ASM_DG01/abdd/autobackup/2016_07_05/s_916434663.16727.916441699'
Completed: alter database datafile 67 online
Tue Jul  5 23:09:01 2016
Thread 1 advanced to log sequence 134864 (LGWR switch)
  Current log# 5 seq# 134864 mem# 0: +ASM_DG01/abdd/onlinelog/group_5.408.776366043
  Current log# 5 seq# 134864 mem# 1: +ASM_DG01/abdd/onlinelog/group_5.420.776366045
Wed Jul  6 00:01:36 2016
Thread 1 advanced to log sequence 134865 (LGWR switch)
  Current log# 7 seq# 134865 mem# 0: +ASM_DG01/abdd/onlinelog/group_7.314.776366093
  Current log# 7 seq# 134865 mem# 1: +ASM_DG01/abdd/onlinelog/group_7.325.776366093
Wed Jul  6 00:10:50 2016
Thread 1 advanced to log sequence 134866 (LGWR switch)
  Current log# 2 seq# 134866 mem# 0: +ASM_DG01/abdd/onlinelog/group_2.260.769366221
  Current log# 2 seq# 134866 mem# 1: +ASM_DG01/abdd/onlinelog/group_2.261.769366221
Wed Jul  6 00:16:49 2016
Thread 1 advanced to log sequence 134867 (LGWR switch)
  Current log# 1 seq# 134867 mem# 0: +ASM_DG01/abdd/onlinelog/group_1.258.769366219
  Current log# 1 seq# 134867 mem# 1: +ASM_DG01/abdd/onlinelog/group_1.259.769366219
Wed Jul  6 00:24:23 2016
Thread 1 advanced to log sequence 134868 (LGWR switch)
  Current log# 5 seq# 134868 mem# 0: +ASM_DG01/abdd/onlinelog/group_5.408.776366043
  Current log# 5 seq# 134868 mem# 1: +ASM_DG01/abdd/onlinelog/group_5.420.776366045
Wed Jul  6 00:32:29 2016
Thread 1 advanced to log sequence 134869 (LGWR switch)
  Current log# 7 seq# 134869 mem# 0: +ASM_DG01/abdd/onlinelog/group_7.314.776366093
  Current log# 7 seq# 134869 mem# 1: +ASM_DG01/abdd/onlinelog/group_7.325.776366093
Wed Jul  6 00:47:23 2016

解决ktsplbfmb-sync错误

ORA-00600: internal error code, arguments: [ktsplbfmb-sync], [], [], [], [], [], [], []

通过dbv校验数据文件,发现之前有坏块,根据file id,block id可以查到,坏块涉及的对象有2个:一个为Lob Index(一个块),一个为Lob Segment(多个块)

尝试通过expdp导出这2个表,Lob Index损坏的表可以正常导出,然后通过move table,应用程序端出现的错误消失,Lob Segment损坏的表,无法通过expdp导出。最终通过找到损坏的表的rowid,忍痛割爱通过empty_blob()重新初始化,好在损坏的行不多,只有2行,至此问题圆满解决。


小结:在数据库上不论做什么操作,都要认真去分析调查,小心无大错。


Reference

Export Receives The Errors ORA-1555 ORA-22924 ORA-1578 ORA-22922 (Doc ID 787004.1)

Export Fails With Errors ORA-2354 ORA-1555 ORA-22924 And How To Confirm LOB Segment Corruption Using Export Utility (Doc ID 833635.1)



关键词:recover block bbed 

相关文章

RMAN备份脚本样本
ORA-600 【2662】
Oracle Database Recover Check
如何用BBED使Offline的数据文件Online
ORA-00338/ORA-00312
如何使用BBED
Interpret The Datafile OS Header/Block Zero
如何计算Oracle Block的校验值
10g,11g,12c X$KCVFH(一)
如何安装BBED
Top