What's New in Oracle RAC Administration and Deployment?
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/whatsnew.htm#RACAD000
Administrator managed (Admin-Managed)
Database administrators define on which servers a database resource should run, and place resources manually as needed. This is the management strategy used in previous releases.
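The database administrator decides, as needed, which servers the database resources run on, and places them manually. This is called the administrator-managed approach, and it is how earlier releases worked. Before 11gR2, when you created a database with DBCA you chose the nodes on which to create it (at least 2 nodes, possibly more). After creation, unless nodes are added or removed, the database keeps running on exactly those nodes. In other words, once the administrator has finished the installation, the set of nodes does not change automatically; this is what Admin-Managed means. An Admin-Managed database does in fact have a server pool, but with this approach the pool cannot be modified.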
[orgrid@ohs1 ~]$ srvctl config database -d pgold
Database unique name: pgold
Database name: pgold
Oracle home: /ordb/oracle/product/112
Oracle user: oracle
Spfile: +DATA_PGOLD/pgold/spfilepgold.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: pgold
Database instances: pgold1,pgold2
Disk Groups: DATA_PGOLD,SYSTEMDG
Mount point paths:
Services:
Type: RAC
Database is administrator managed
[orgrid@ohs2 ~]$
[orgrid@ohs2 ~]$ srvctl config srvpool -g pgold
PRKO-3160 : Server pool pgold is internally managed as part of administrator-managed database configuration and therefore cannot be queried directly via srvpool object.
[orgrid@ohs2 ~]$
Policy managed (Policy-Managed)
Database administrators specify the server pool (excluding generic or free) in which the database resource runs. Oracle Clusterware places the database resource on a server.
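The database administrator specifies the server pool (other than Generic or Free) in which the database runs; in other words, this approach is built on server pools. We can create a server pool, add servers to it, and define a policy (that is, which servers the database instances run on and how many instances run). As requirements change, we can adjust the number of instances and the hosts they run on. Instance names also differ between the two approaches: Admin-Managed instances take a plain numeric suffix (racdb1, racdb2, racdb3), while Policy-Managed instances take an underscore and a number (racdb_1, racdb_2, racdb_3). The advantages of Policy-Managed become apparent when the cluster has many servers.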
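The naming convention can be summed up in a small shell function. It is only an illustration of the rule just described (it matches pgold1/pgold2 for the Admin-Managed database and racdb_1 for the Policy-Managed ones in the transcripts); instance_name is a hypothetical helper, not an Oracle tool, and Oracle assigns the names itself.

```shell
#!/usr/bin/env bash
# instance_name DB_NAME N MODE
# Hypothetical helper: prints the instance name that the naming convention
# gives instance number N. MODE is "admin" or "policy".
instance_name() {
  local db=$1 n=$2 mode=$3
  case "$mode" in
    admin)  printf '%s%d\n' "$db" "$n" ;;   # Admin-Managed: racdb1, racdb2, ...
    policy) printf '%s_%d\n' "$db" "$n" ;;  # Policy-Managed: racdb_1, racdb_2, ...
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, `instance_name racdb 2 policy` prints racdb_2, and `instance_name pgold 1 admin` prints pgold1.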
Cardinality
The number of servers on which a resource can run, simultaneously.
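This is the number of nodes on which a resource can run at the same time. For example, if a Policy-Managed database is created with this parameter set to 3 on a 4-node cluster, DBCA creates at most three instances.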
RAC One Node
Oracle Real Application Clusters One Node (Oracle RAC One Node) is a single instance of an Oracle Real Application Clusters (Oracle RAC) database that runs on one node in a cluster. This option adds to the flexibility that Oracle offers for database consolidation.
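RAC One Node: literally, a RAC database on one node. It is new in 11.2: a single instance of the RAC database runs in the cluster, on one and only one node (with 2 nodes, if the database is running normally on node 1 and you then start it on node 2, the database on node 1 is shut down), and it also supports failover. This resembles using another vendor's cluster software, such as IBM HACMP, to manage a single-instance Oracle database. Both are ways of making a database highly available, but RAC One Node is implemented with Oracle's own clusterware, so the database is managed almost exactly like a normal database (the srvctl and crsctl commands are nearly the same). Because the whole stack is Oracle's, administration, maintenance, and troubleshooting are simple; if you build HA on a vendor's cluster software, you also have to learn that software (IBM HACMP, HP MC/SG).
Advantages and characteristics of RAC One Node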
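A. Only one node in the cluster runs the database. Because of this, compared with a RAC database, the time spent on messages and data-block requests between RAC instances, and on GC waits, is reduced.
B. If the node currently running the database needs maintenance, the database can be relocated manually to a standby server, which reduces business downtime.
C. It is easy to convert to a RAC database, and the conversion can be done online without stopping the database.
D. Multiple RAC One Node databases can be created in one Grid, running on different nodes, which improves hardware utilization.
E. It can be created as either an Admin-Managed or a Policy-Managed database.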
RAC databases and RAC One Node databases can be converted into each other:
srvctl convert database -d db_unique_name -c RACONENODE [-i instance_name -w timeout]
srvctl convert database -d db_unique_name -c RAC [-n node_name]
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/onenode.htm
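What is a Server Pool?
A server pool is a logical grouping of the servers in a cluster, which can host applications, databases, or both. At any given time a server can belong to only one specific pool. There are three kinds of pool: the Free pool, the Generic pool, and user-defined pools. Each pool has three parameters, IMPORTANCE, MIN_SIZE, and MAX_SIZE, defined as follows: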
MIN_SIZE
The minimum number of servers the server pool should contain.
The minimum number of servers in the server pool.
MAX_SIZE
The maximum number of servers the server pool should contain.
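The maximum number of servers in the server pool; in practice it is limited by the actual number of nodes.
IMPORTANCE
A number from 0 to 1000 (0 being least important) that ranks a server pool among all other server pools in a cluster.
Ranges from 0 to 1000 and indicates how critical the pool is: the larger the value, the more critical the pool, and the earlier its MIN_SIZE is satisfied. The default is 0.
Server Pool types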
There are three types of server pool: the Free pool, the Generic pool, and user-defined pools.
Free Pool:
It contains servers that are not assigned to any other server pools. The attributes of the Free server pool are restricted, as follows:
SERVER_NAMES, MIN_SIZE, and MAX_SIZE cannot be edited by the user
IMPORTANCE and ACL can be edited by the user
Every server that has not been assigned to a server pool is in this pool. The pool's SERVER_NAMES, MIN_SIZE, and MAX_SIZE attributes cannot be modified; IMPORTANCE can.
Generic Pool:
It stores pre-11g release 2 (11.2) Oracle Databases and administrator-managed databases that have fixed configurations.
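It contains pre-11gR2 Oracle databases and Admin-Managed databases, because their configuration is fixed.
Testing a Policy-Managed RAC database
A 2-node RAC database using the Policy-Managed approach. We set the server pool's maximum size to 1 and observe how the database changes.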
[orgrid@hs1 ~]$ srvctl config database -d racdb
Database unique name: RACDB
Database name: RACDB
Oracle home: /u01/ordb/oracle/product/112
Oracle user: oracle
Spfile: +DATA_PGOLD/RACDB/spfileRACDB.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: racp
Database instances:
Disk Groups: DATA_PGOLD
Mount point paths:
Services:
Type: RAC
Database is policy managed
[orgrid@ohs1 ~]$ srvctl config serverpool
Server pool name: Free
Importance: 0, Min: 0, Max: -1
Candidate server names:
Server pool name: Generic
Importance: 0, Min: 0, Max: -1
Candidate server names:
Server pool name: racp
Importance: 0, Min: 0, Max: 2
Candidate server names:
[orgrid@ohs1 ~]$
We defined the racp pool ourselves when creating the database with DBCA. Min: 0, Max: 2 means the pool may contain as few as 0 and as many as 2 servers.
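To check these values without reading the whole listing, the output can be reduced to one line per pool. This is a minimal sketch that assumes srvctl's usual layout of one attribute per line, each starting at column 1 ("Server pool name: ...", then "Importance: ..., Min: ..., Max: ..."); parse_serverpools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# parse_serverpools
# Reads `srvctl config serverpool` output on stdin and prints one line per
# pool: NAME IMPORTANCE MIN MAX (MAX of -1 means unlimited).
parse_serverpools() {
  awk '
    /^Server pool name:/ { name = $4 }   # 4th field is the pool name
    /^Importance:/ {
      gsub(",", "")                      # "Importance: 0, Min: 0, Max: 2" -> no commas
      print name, $2, $4, $6             # importance, min, max
    }
  '
}
```

Typical use on a cluster node would be `srvctl config serverpool | parse_serverpools`; for the listing above it reports racp with importance 0, min 0, and max 2.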
ohs1 is in the racp pool:
[orgrid@ohs1 ~]$ srvctl status server -n ohs1 -a
Server name: ohs1
Server state: ONLINE
Server active pools: ora.racp
Server state details:
ohs2 is in the racp pool as well:
[orgrid@ohs1 ~]$ srvctl status server -n ohs2 -a
Server name: ohs2
Server state: ONLINE
Server active pools: ora.racp
Server state details: [orgrid@ohs1 ~]$
Change the maximum size of the server pool (2 --> 1):
[orgrid@ohs1 ~]$ srvctl modify srvpool -h
Modifies the configuration for the server pool.
Usage: srvctl modify srvpool -g <pool_name> [-l <min>] [-u <max>] [-i <importance>] [-n "<server_list>"] [-f]
    -g <pool_name>           Server pool name
    -l <min>                 Minimum size of the server pool
    -u <max>                 Maximum size of the server pool, -1 for unlimited maximum size
    -i <importance>          Importance of the server pool
    -n "<server_list>"       Comma separated list of candidate server names
    -f                       Force the operation even though some resource(s) will be stopped
    -h                       Print usage
[orgrid@ohs1 ~]$
[orgrid@ohs1 ~]$ srvctl modify srvpool -g racp -l 1 -u 1 -i 100
PRCS-1011 : Failed to modify server pool racp
CRS-2736: The operation requires stopping resource 'ora.racdb.db' on server 'ohs1'
CRS-2738: Unable to modify server pool 'ora.racp' as this will affect running resources, but the force option was not specified
Adding -f forces the change to the maximum number of servers the racp pool can hold:
[orgrid@ohs1 ~]$ srvctl modify srvpool -g racp -l 1 -u 1 -i 100 -f
[orgrid@ohs1 ~]$
[orgrid@ohs1 ~]$ srvctl config serverpool
Server pool name: Free
Importance: 0, Min: 0, Max: -1
Candidate server names:
Server pool name: Generic
Importance: 0, Min: 0, Max: -1
Candidate server names:
Server pool name: racp
Importance: 100, Min: 1, Max: 1
Candidate server names:
[orgrid@ohs1 ~]$
[orgrid@ohs1 ~]$ ps -ef|grep pmon
oracle 5765 1 0 00:37 ? 00:00:00 asm_pmon_+ASM1
oracle 10621 7687 0 01:07 pts/1 00:00:00 grep pmon
ohs1 has been removed from the racp pool and automatically added to the Free pool:
[orgrid@ohs1 ~]$ srvctl status server -n ohs1 -a
Server name: ohs1
Server state: ONLINE
Server active pools: Free
Server state details:
ohs2 is still in the racp pool:
[orgrid@ohs1 ~]$ srvctl status server -n ohs2 -a
Server name: ohs2
Server state: ONLINE
Server active pools: ora.racp
Server state details:
[orgrid@ohs1 ~]$
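We can see that the database instance on node 1 was shut down (no ora_pmon process is left), node 1 was removed from server pool racp, and node 1 now belongs to the Free pool.
Testing a Policy-Managed RAC One Node database
A 2-node RAC One Node database using the Policy-Managed approach. We kill the pmon process on node 1 to test failover.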
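One way to reason about why ohs1, rather than nothing, ends up in Free: assume Clusterware first fills user-defined pools up to MIN_SIZE in order of IMPORTANCE, then grows them towards MAX_SIZE in the same order, and leaves every remaining server in Free. The function below is a simplified model under that assumption only; it ignores the Generic pool, candidate server lists, and which particular server is moved, and pool_sizes is a hypothetical name.

```shell
#!/usr/bin/env bash
# pool_sizes TOTAL_SERVERS NAME:IMPORTANCE:MIN:MAX ...
# Simplified model (not Oracle's implementation) of how servers are spread
# over user-defined server pools. MAX of -1 means "no upper limit".
pool_sizes() {
  local left=$1; shift
  local -a names=() mins=() maxs=() sizes=()
  local name imp min max i
  # Visit pools by IMPORTANCE, highest first (-s keeps ties in input order).
  while IFS=: read -r name imp min max; do
    names+=("$name"); mins+=("$min"); maxs+=("$max"); sizes+=(0)
  done < <(printf '%s\n' "$@" | sort -s -t: -k2,2nr)
  # Pass 1: bring every pool up to its MIN_SIZE.
  for i in "${!names[@]}"; do
    while (( left > 0 && sizes[i] < mins[i] )); do
      (( sizes[i]++, left-- ))
    done
  done
  # Pass 2: grow pools towards MAX_SIZE, again in IMPORTANCE order.
  for i in "${!names[@]}"; do
    while (( left > 0 && (maxs[i] < 0 || sizes[i] < maxs[i]) )); do
      (( sizes[i]++, left-- ))
    done
  done
  # Report pool sizes; whatever is left over lands in the Free pool.
  for i in "${!names[@]}"; do
    printf '%s=%d\n' "${names[i]}" "${sizes[i]}"
  done
  printf 'Free=%d\n' "$left"
}
```

With the original settings, `pool_sizes 2 racp:0:0:2` gives racp=2 and Free=0; after the change, `pool_sizes 2 racp:100:1:1` gives racp=1 and Free=1, which matches what srvctl status server showed above.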
[orgrid@ohs1 ~]$ srvctl config database -d racdb
Database unique name: racdb
Database name: racdb
Oracle home: /ordb/oracle/product/112
Oracle user: oracle
Spfile: +DATA_PGOLD/racdb/spfileracdb.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: onp
Database instances:
Disk Groups: DATA_PGOLD
Mount point paths:
Services: ap
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: racdb
Candidate servers:
Database is policy managed
[orgrid@ohs1 ~]$ srvctl config service -s ap -d racdb
Service name: ap
Service is enabled
Server pool: onp
Cardinality: SINGLETON
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Service is enabled on nodes:
Service is disabled on nodes:
[orgrid@ohs1 ~]$
[orgrid@ohs1 ~]$ srvctl config serverpool -g onp
Server pool name: onp
Importance: 0, Min: 0, Max: 2
Candidate server names:
[orgrid@ohs1 ~]$ srvctl status server -n ohs2 -a
Server name: ohs2
Server state: ONLINE
Server active pools: ora.onp
Server state details:
[orgrid@ohs1 ~]$ srvctl status server -n ohs1 -a
Server name: ohs1
Server state: ONLINE
Server active pools: ora.onp
Server state details:
[orgrid@ohs1 ~]$
[orgrid@ohs1 ~]$ ps -ef|grep pmon
orgrid 5772 1 0 14:16 ? 00:00:00 asm_pmon_+ASM1
oracle 6645 1 0 14:17 ? 00:00:00 ora_pmon_racdb_1
[orgrid@ohs1 ~]$
[root@ohs2 ~]# ps -ef|grep pmon
orgrid 5820 1 0 14:16 ? 00:00:00 asm_pmon_+ASM2
root 12907 9205 0 15:14 pts/1 00:00:00 grep pmon
[root@ohs2 ~]#
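Kill the pmon process on node 1. After a kill, Clusterware first tries to restart the instance; only if that fails does it switch to the other node. In this test the database moved to the second node only after pmon had been killed twice.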
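When repeating this test, the PID to kill has to be picked out of the ps listing, skipping the ASM pmon and the grep line itself. A minimal sketch (pmon_pid is a hypothetical helper) that matches the whole command name in the last column of `ps -ef` output:

```shell
#!/usr/bin/env bash
# pmon_pid SID
# Reads `ps -ef` output on stdin and prints the PID (2nd column) of the
# ora_pmon_<SID> process. Comparing the whole last field skips both
# asm_pmon_+ASM1 and the "grep pmon" line.
pmon_pid() {
  awk -v proc="ora_pmon_$1" '$NF == proc { print $2 }'
}
```

On a test system only, the kill above could then be written as `kill -9 "$(ps -ef | pmon_pid racdb_1)"`.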
[orgrid@ohs1 ~]$ ps -ef|grep pmon
orgrid 5772 1 0 14:16 ? 00:00:00 asm_pmon_+ASM1
oracle 6645 1 0 14:17 ? 00:00:00 ora_pmon_racdb_1
oracle 14861 10475 0 15:14 pts/1 00:00:00 grep pmon
[orgrid@ohs1 ~]$ kill -9 6645
[orgrid@ohs1 ~]$ ps -ef|grep pmon
orgrid 5772 1 0 14:16 ? 00:00:00 asm_pmon_+ASM1
oracle 14903 1 0 15:14 ? 00:00:00 ora_pmon_racdb_1
oracle 15027 10475 0 15:14 pts/1 00:00:00 grep pmon
[orgrid@ohs1 ~]$ kill -9 14903
[orgrid@ohs1 ~]$ ps -ef|grep pmon
orgrid 5772 1 0 14:16 ? 00:00:00 asm_pmon_+ASM1
oracle 15213 10475 0 15:15 pts/1 00:00:00 grep pmon
[orgrid@ohs1 ~]$
The database has started successfully on the second node; note that the SID is racdb_1:
[root@ohs2 ~]# ps -ef|grep ora_pmon
oracle 13224 1 0 15:15 ? 00:00:00 ora_pmon_racdb_1
root 13432 9205 0 15:16 pts/1 00:00:00 grep pmon
[root@ohs2 ~]#
The database is now running on node 2. We can use relocate to move it back to node 1; note that the SID changes after the relocation:
[orgrid@ohs1 ~]$ srvctl relocate database -d racdb -n ohs1
[orgrid@ohs1 ~]$ ps -ef|grep pmon
orgrid 5772 1 0 14:16 ? 00:00:00 asm_pmon_+ASM1
oracle 17548 1 0 15:29 ? 00:00:00 ora_pmon_racdb_2
oracle 17909 10475 0 15:30 pts/1 00:00:00 grep pmon
[orgrid@ohs1 ~]$
[root@ohs2 ~]# ps -ef|grep pmon
orgrid 5820 1 0 14:16 ? 00:00:00 asm_pmon_+ASM2
root 15915 9205 0 15:33 pts/1 00:00:00 grep pmon
[root@ohs2 ~]#
RAC One Node summary
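If the instance terminates unexpectedly (this can be called failover), for example when the database crashes because of kill -9 or some other unknown cause, GI first tries to restart the instance on the same node. If that fails, it starts the instance on another node. If the instance is named racdb_1, it is still named racdb_1 after failover. This is why, in the test above, the instance restarted on the same node after the first pmon kill, and why its name did not change when it started on the second node.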
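The failover rule above can be written as a toy decision function. Clusterware resources do have a RESTART_ATTEMPTS attribute that limits local restarts, but this sketch does not read it: the restart budget is a plain counter passed in by the caller, and a budget of 1 is an assumption chosen only to reproduce the two kills in the test. on_failure is a hypothetical name.

```shell
#!/usr/bin/env bash
# on_failure SID FAILURE_NUMBER RESTART_BUDGET
# Toy model: the first RESTART_BUDGET failures restart the instance on the
# same node; the next one fails it over to another node. Either way the
# SID stays the same.
on_failure() {
  local sid=$1 failures=$2 budget=$3
  if (( failures <= budget )); then
    printf 'restart %s on the same node\n' "$sid"
  else
    printf 'start %s on another node\n' "$sid"
  fi
}
```

With a budget of 1, `on_failure racdb_1 1 1` restarts racdb_1 locally and `on_failure racdb_1 2 1` moves it to another node, which is the sequence seen in the test.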
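During normal operation we can move the instance manually (this can be called switchover). Suppose the instance is running on node 2 and we use the relocate command to move the database to node 1. Oracle creates a pfile on node 1; because the current SID is already in use, it uses a new SID. Once the instance on node 1 has started successfully, the instance on node 2 is shut down. This is why the instance name changed after relocate in the test above.
For an Admin-Managed RAC One Node database the configuration is fixed, so the instance name does not change.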
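The SID choice during relocation amounts to "take the first PREFIX_N that is not already running". A minimal sketch of that idea, assuming a simple first-free numbering rather than Oracle's actual internal logic; next_sid is a hypothetical helper.

```shell
#!/usr/bin/env bash
# next_sid PREFIX SID_IN_USE...
# Prints the first PREFIX_N (N = 1, 2, ...) that is not among the SIDs in
# use. During online relocation the old instance is still up, so its SID
# is taken and the new instance gets a different one.
next_sid() {
  local prefix=$1; shift
  local n=1
  while printf '%s\n' "$@" | grep -qxF "${prefix}_${n}"; do
    (( n++ ))
  done
  printf '%s_%d\n' "$prefix" "$n"
}
```

For the relocation above, `next_sid racdb racdb_1` prints racdb_2, the SID that appeared on ohs1.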
Glossary
http://docs.oracle.com/cd/E11882_01/rac.112/e41959/glossary.htm#CWADD91440
What's New in Oracle RAC Administration and Deployment?
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/whatsnew.htm#RACAD000
Understanding Server pool
http://docs.oracle.com/cd/E11882_01/install.112/e41961/concepts.htm#CWLIN2966
Administering Oracle RAC One Node
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/onenode.htm#RACAD7894
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/srvctladmin.htm#RACAD005
Administering Database Instances and Cluster Databases
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/admin.htm#RACAD900
Administering Oracle Clusterware
http://docs.oracle.com/cd/E11882_01/rac.112/e41959/admin.htm#CWADD838