Focus On Oracle


Oracle RAC cannot start because of AIX install_assist

Environment: AIX 6.1 + 11gR2 RAC

Description: After the server was rebooted, the Oracle RAC cluster services failed to start.


1. Starting the stack with crsctl reports the following errors

# /oracle/grid_home/bin/crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

#

2. Checking the background processes shows only ohasd.bin reboot running
# ps -ef|grep d.bin
    root  9175248 31457456   0 18:11:19  pts/3  0:00 grep d.bin
    root 10092704        1   0 18:07:34      -  0:00 /oracle/grid_home/bin/ohasd.bin reboot

3. crsctl_root.log shows the following
Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
2015-12-21 18:47:49.957: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 18:47:49.957: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2015-12-21 18:47:49.958: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]
2015-12-21 19:08:04.063: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 19:08:04.063: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2015-12-21 19:08:04.063: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]
2015-12-21 19:25:27.773: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2015-12-21 19:25:27.773: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]

4. Tracing the ohasd.bin process with truss shows the following
open("/tmp/.oracle/npohasd", O_WRONLY|O_NONBLOCK) = -1 ENXIO (No such device or address)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
open("/tmp/.oracle/npohasd", O_WRONLY|O_NONBLOCK) = -1 ENXIO (No such device or address)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0

5. ohasd.log shows the following
 Created alert : (:OHAS00117:) :  TIMED OUT WAITING FOR OHASD MONITOR

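Analysis: Troubleshooting showed that the network and storage were healthy and that there were no zombie processes. The logs contained no distinctive error either; after a while ohasd.log reported a timeout, which suggested that ohasd was hung. The cause turned out to be the installation assistant (install_assist) entry in /etc/inittab. The Oracle Clusterware startup entry and install_assist run at the same run level, and install_assist is started before the cluster services. The installation assistant waits for manual input on the console (we were connected through an SSH client, so the prompt was never seen or answered), so it blocked the entries that follow it and, in turn, the normal startup of the cluster.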
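
The ordering problem described in the analysis can be checked with a small script. The following is a minimal sketch, not a definitive tool: it lists the inittab entries with action wait at a given run level that appear before a given entry, since init does not move on until each of them finishes. The function name wait_entries_before is hypothetical, and the Clusterware entry identifier h1 used in the usage note is an assumption that should be confirmed in the local /etc/inittab. Only entries that name the run level explicitly are considered.

```sh
#!/bin/sh
# Sketch: list inittab entries that init must wait for before it reaches
# a given entry at a given run level.
# Assumptions (not from the article): the function name is hypothetical,
# and only entries that name the run level explicitly are matched.

wait_entries_before() {
    inittab_file=$1   # file to inspect, e.g. /etc/inittab
    target_id=$2      # entry that should eventually start
    run_level=$3      # run level to check, e.g. 2

    awk -F: -v target="$target_id" -v level="$run_level" '
        /^:/ || /^[ \t]*$/ { next }      # skip commented-out and blank lines
        $1 == target { exit }            # stop once the target entry is reached
        $3 == "wait" && index($2, level) > 0 { print $1 }
    ' "$inittab_file"
}
```

A call such as `wait_entries_before /etc/inittab h1 2` prints every candidate, one identifier per line, not only the culprit. On a real system the list will likely include ordinary boot scripts as well, so each entry still has to be judged by whether it can wait for console input.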


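Solution: Keep the installation assistant from starting by commenting out the line below in /etc/inittab, then reboot the server.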

# grep install /etc/inittab
install_assist:2:wait:/usr/sbin/install_assist </dev/console >/dev/console 2>&1
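
Commenting out the entry can be rehearsed on a copy of the file before touching /etc/inittab. The sketch below assumes that a leading colon marks a comment, which is the AIX inittab convention; the function name comment_out_entry is hypothetical. It writes the modified content to standard output and leaves the input file untouched.

```sh
#!/bin/sh
# Sketch: print an inittab file with one entry commented out.
# Assumption: a leading colon marks a comment (AIX inittab convention).
# The input file is not modified; the result goes to standard output.

comment_out_entry() {
    inittab_file=$1   # file to read, e.g. a copy of /etc/inittab
    entry_id=$2       # identifier in the first field, e.g. install_assist

    awk -F: -v id="$entry_id" '
        $1 == id { print ":" $0; next }  # prefix the matching entry with a colon
        { print }                        # pass every other line through unchanged
    ' "$inittab_file"
}
```

For example, `comment_out_entry /tmp/inittab.copy install_assist > /tmp/inittab.new` produces a file that can be compared with the original before it is put in place (both paths are illustrative). AIX also provides the lsitab, chitab and rmitab commands for inspecting and changing inittab entries, which may be preferable to editing the file by hand.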


For details on install_assist, see the following link

http://www.ibm.com/developerworks/cn/aix/redbooks/test191-3/

