Focus On Oracle


Oracle RAC cannot start because of AIX install_assist

Environment: AIX 6.1 + 11gR2 RAC

Description: After the server was rebooted, the Oracle RAC cluster services would not start.


1. Starting the stack with crsctl fails with the following errors:

# /oracle/grid_home/bin/crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

#

2. Checking the background processes shows that only ohasd.bin reboot is running:
# ps -ef|grep d.bin
    root  9175248 31457456   0 18:11:19  pts/3  0:00 grep d.bin
    root 10092704        1   0 18:07:34      -  0:00 /oracle/grid_home/bin/ohasd.bin reboot

3. crsctl_root.log shows the following:
Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
2015-12-21 18:47:49.957: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 18:47:49.957: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2015-12-21 18:47:49.958: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]
2015-12-21 19:08:04.063: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 19:08:04.063: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2015-12-21 19:08:04.063: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]
2015-12-21 19:25:27.773: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 19:25:27.773: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
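
The excerpt above repeats the same connection failure at three different times. As a rough sketch (the regular expression and the function name `refused_by_minute` are illustrative, not part of any Oracle tool), the refused connections can be grouped by minute to confirm that every start attempt failed in the same way:

```python
import re

# Log lines copied from the crsctl_root.log excerpt above.
LOG = """\
2015-12-21 18:47:49.957: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 18:47:49.957: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2015-12-21 18:47:49.958: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]
2015-12-21 19:08:04.063: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 19:08:04.063: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2015-12-21 19:08:04.063: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]
2015-12-21 19:25:27.773: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 19:25:27.773: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
"""

# Hypothetical pattern: timestamp (captured to the minute), component, thread id, message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3}: \[\s*(\w+)\]\[\d+\](.*)$"
)


def refused_by_minute(log_text):
    """Count gipcretConnectionRefused messages per minute, in log order."""
    counts = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and "gipcretConnectionRefused" in match.group(3):
            minute = match.group(1)
            counts[minute] = counts.get(minute, 0) + 1
    return counts


attempts = refused_by_minute(LOG)
for minute, count in attempts.items():
    print(minute, count)
```

Each minute listed corresponds to one attempt in which the connection was refused.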

4. Running truss against the ohasd.bin process shows the following:
open("/tmp/.oracle/npohasd", O_WRONLY|O_NONBLOCK) = -1 ENXIO (No such device or address)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
open("/tmp/.oracle/npohasd", O_WRONLY|O_NONBLOCK) = -1 ENXIO (No such device or address)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
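
The repeated ENXIO result is worth decoding. Assuming npohasd is a named pipe (FIFO), as its name suggests, POSIX specifies that a write-only, non-blocking open of a FIFO fails with ENXIO when no process has the FIFO open for reading. The sketch below reproduces that behaviour on a throwaway FIFO in a temporary directory; the file name `npohasd_demo` and the helper `try_open_writer` are made up for the demonstration, and nothing under /tmp/.oracle is touched.

```python
import errno
import os
import tempfile


def try_open_writer(path):
    """Open path write-only and non-blocking; return None on success or the errno name."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


with tempfile.TemporaryDirectory() as tmp:
    fifo = os.path.join(tmp, "npohasd_demo")  # hypothetical stand-in for the real pipe
    os.mkfifo(fifo)

    # No reader yet: the writer's open is expected to fail.
    no_reader = try_open_writer(fifo)

    # Attach a reader, then retry the same open.
    reader_fd = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
    with_reader = try_open_writer(fifo)
    os.close(reader_fd)

print("without reader:", no_reader)
print("with reader:", with_reader)
```

Read this way, the trace is consistent with ohasd.bin retrying once per second while nothing is listening on the other end of the pipe.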

5. ohasd.log shows the following:
 Created alert : (:OHAS00117:) :  TIMED OUT WAITING FOR OHASD MONITOR

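Analysis: The network and storage checked out fine, and there were no zombie processes. The logs showed no specific error either; after a while ohasd.log reported a timeout, which suggested that ohasd was hung. The cause turned out to be the installation assistant (install_assist) entry in /etc/inittab. The Oracle cluster startup entry runs at the same run level as install_assist, and install_assist is started before the cluster services. The installation assistant waits for operator input on the console (we were connected through an SSH client, so we never saw the prompt and never answered it). It therefore blocked the inittab entries after it, which in turn blocked the normal startup of the cluster.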
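
The check described in the analysis can be sketched as a small scan of inittab text: list every run-level-2 entry with the `wait` (or `bootwait`) action that appears before the cluster startup entry, since init waits for such a command to finish before moving on. In the sample below only the install_assist line comes from this case; the other lines, including the `h1` entry for /etc/init.ohasd, are assumptions about what a typical file looks like, and the function names are hypothetical.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "ident runlevels action command")

# Illustrative inittab; only the install_assist line is taken from this case.
SAMPLE_INITTAB = """\
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console
install_assist:2:wait:/usr/sbin/install_assist </dev/console >/dev/console 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 </dev/null
"""


def parse_inittab(text):
    """Parse Identifier:RunLevel:Action:Command lines, skipping blanks and ':' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(":"):
            continue
        parts = line.split(":", 3)  # the command itself may contain colons
        if len(parts) == 4:
            entries.append(Entry(*parts))
    return entries


def blocking_candidates(entries, target="init.ohasd", runlevel="2"):
    """Return ids of wait-type entries that init processes before the target entry."""
    found = []
    for entry in entries:
        if target in entry.command:
            return found
        applies = not entry.runlevels or runlevel in entry.runlevels
        if entry.action in ("wait", "bootwait") and applies:
            found.append(entry.ident)
    raise ValueError("no entry matching %r found" % target)


candidates = blocking_candidates(parse_inittab(SAMPLE_INITTAB))
print(candidates)
```

Every identifier returned is only a candidate. Whether one of them is actually stuck still has to be confirmed on the host, for example by looking for the command in the ps output.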


Solution: Disable the installation assistant by commenting out the following line in /etc/inittab, then reboot the server.

# grep install /etc/inittab
install_assist:2:wait:/usr/sbin/install_assist </dev/console >/dev/console 2>&1
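
As a minimal sketch of the edit, assuming the AIX convention that an inittab entry is commented out by putting a colon at the start of its line (worth confirming against the inittab documentation for your AIX level), the change can be previewed on a copy of the file contents before touching the real file. The function name `comment_out_entry` and the sample text are illustrative. On AIX the `rmitab` command is another way to remove an entry.

```python
def comment_out_entry(inittab_text, ident):
    """Prefix the entry whose identifier is `ident` with ':'; return (new_text, changed)."""
    out = []
    changed = False
    for line in inittab_text.splitlines(keepends=True):
        if line.startswith(ident + ":"):
            out.append(":" + line)  # assumed AIX comment marker
            changed = True
        else:
            out.append(line)
    return "".join(out), changed


# Works on a string only; the real /etc/inittab is never opened here.
sample = (
    "rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console\n"
    "install_assist:2:wait:/usr/sbin/install_assist </dev/console >/dev/console 2>&1\n"
)

updated, changed = comment_out_entry(sample, "install_assist")
print(updated, end="")

# Running it again finds nothing left to change.
_, changed_again = comment_out_entry(updated, "install_assist")
```

Keeping a backup of /etc/inittab before editing it is prudent, because a malformed file can prevent the system from booting normally.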


For details on install_assist, see the following link:

http://www.ibm.com/developerworks/cn/aix/redbooks/test191-3/

