Focus On Oracle

Installing, Backup & Recovery, Performance Tuning,
Troubleshooting, Upgrading, Patching


RAC LoadBalance and Failover

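Any discussion of Oracle RAC has to cover high availability, which shows up as two capabilities: load balancing and failover. Depending on whether a connection pool is involved, these split into connect-time and run-time cases, and both load balancing and failover can be implemented in each. The related building blocks are ONS, TAF, FAN, FCF, and LBA.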

TAF
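TAF stands for Transparent Application Failover. Starting with 10gR2, the TAF policy can be configured through the dbms_service package. TAF works only for OCI-based clients. With the TAF type set to SELECT, a SELECT statement that was in flight continues from the point where it was interrupted; INSERT, UPDATE, and DELETE transactions are not resumed and have to be run again from the beginning. When using TAF with an OCI client, also enable FCF, which makes failover faster.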

Note: TAF does not support JDBC thin, because the JDBC thin driver is not OCI-based.


FAN

FAN (Fast Application Notification) is an Oracle RAC feature that pushes notifications proactively. When cluster state changes (node up/down, instance up/down, database up/down) or node load changes, Oracle publishes the change as FAN events. Clients subscribed to FAN events receive them right away and can react accordingly. There are two event types: HA events and LBA events.

FCF

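FCF works through Oracle clients and applications that support FAN events. When a Down event (an HA event) arrives, the connections associated with that instance or service are marked invalid and cleaned up; when an Up event arrives, new connections can be created. FCF differs substantially from TAF: it cannot resume interrupted work the way TAF does, not even a simple SELECT. A running transaction is aborted immediately and rolled back. Once the application catches the error raised by the aborted transaction, it can either return the error to the end user or take a valid connection from the pool and re-execute the transaction. Tomcat and JBoss can take advantage of FCF. With an Oracle connection pool, use either UCP (Universal Connection Pool for Java) or ICC (JDBC Implicit Connection Cache). UCP is recommended, because ICC will be deprecated in a future release.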
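The reaction to Down and Up events described here can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Oracle's implementation: `FcfPoolSketch` and all of its methods are hypothetical names, and a real pool receives these events through ONS rather than through direct method calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of how a FAN-aware pool reacts to HA events. */
class FcfPoolSketch {

    /** A pooled connection, reduced to the instance it is attached to. */
    static final class PooledConnection {
        final String instance;
        boolean valid = true;

        PooledConnection(String instance) {
            this.instance = instance;
        }
    }

    private final List<PooledConnection> idle = new ArrayList<>();

    /** Adds one idle connection that points at the given instance. */
    void addConnection(String instance) {
        idle.add(new PooledConnection(instance));
    }

    /** Down event: invalidate and remove every connection to that instance. */
    int onInstanceDown(String instance) {
        int removed = 0;
        for (Iterator<PooledConnection> it = idle.iterator(); it.hasNext();) {
            PooledConnection c = it.next();
            if (c.instance.equals(instance)) {
                c.valid = false;
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Up event: the pool may create connections to the instance again. */
    void onInstanceUp(String instance) {
        addConnection(instance);
    }

    /** Number of idle connections currently held. */
    int size() {
        return idle.size();
    }

    public static void main(String[] args) {
        FcfPoolSketch pool = new FcfPoolSketch();
        pool.addConnection("prod1");
        pool.addConnection("prod1");
        pool.addConnection("prod2");
        System.out.println("removed on down: " + pool.onInstanceDown("prod1"));
        pool.onInstanceUp("prod1");
        System.out.println("pool size after up: " + pool.size());
    }
}
```

The sketch removes only idle connections; what happens to a connection that is in use when the Down event arrives (the aborted transaction) is left to the application, as the paragraph describes.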

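How the three relate
ONS --> sends and receives messages locally and remotely
FAN --> uses ONS to tell other processes about service changes and load changes (HA events, LBA events)
FCF --> uses the FAN event information to work together with Java connection pools and other connection pools
The relationship can be written as ONS --> FAN --> FCF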

A.Connect-Time Load Balancing

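This is load balancing at the moment a client connects to the Oracle database. It comes in two forms: client-side load balancing and server-side load balancing.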

    1. Client-side load balancing:
    10g tnsnames.ora configuration
    ORDB=(DESCRIPTION=
           (ADDRESS_LIST=
            (LOAD_BALANCE=ON)
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb1-vip)(PORT=1526))
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb2-vip)(PORT=1526))
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb3-vip)(PORT=1526))
           )
           (CONNECT_DATA = (SERVICE_NAME=ordb))
          )

    11g tnsnames.ora configuration
    PROD=(DESCRIPTION =
          (LOAD_BALANCE=ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = prod-scan)(PORT = 1526))
          (CONNECT_DATA = (SERVICE_NAME =prod))
         )

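    Client-side connect-time load balancing is easy to set up. Before 11gR2, you list several VIPs in tnsnames.ora and add LOAD_BALANCE=ON; Oracle then picks one of them at random for the connection. 11gR2 introduced SCAN (Single Client Access Name). With DNS/GNS resolution there are at most three SCAN VIPs and listeners; with hosts-file resolution there is at most one SCAN VIP and listener. Clients connect to the database through the SCAN. Strictly speaking, when the SCAN is resolved through the hosts file this is not client-side connect-time load balancing, because there is only one SCAN VIP. Since remote_listener points to the SCAN by default in 11gR2, it counts only as server-side load balancing.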
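As a rough illustration of the random choice that LOAD_BALANCE=ON makes, the sketch below shuffles an address list in plain Java. It is not Oracle Net code: `ClientLoadBalanceSketch` and `connectOrder` are hypothetical names, and the host names are simply the ones from the 10g example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of client-side connect-time load balancing. */
class ClientLoadBalanceSketch {

    /** Returns a shuffled copy of the addresses; a client would try them in this order. */
    static List<String> connectOrder(List<String> addresses, Random rng) {
        List<String> shuffled = new ArrayList<>(addresses);
        Collections.shuffle(shuffled, rng);
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> vips = Arrays.asList("ordb1-vip:1526", "ordb2-vip:1526", "ordb3-vip:1526");
        Random rng = new Random();
        for (int i = 1; i <= 5; i++) {
            // Each new connection starts with a randomly chosen address.
            System.out.println("connection " + i + " starts with " + connectOrder(vips, rng).get(0));
        }
    }
}
```

Because every client shuffles independently, new connections spread roughly evenly over the listed addresses, but the choice knows nothing about actual node load. That gap is what server-side load balancing fills.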


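    2. Server-side load balancing

    Server-side load balancing is implemented through local_listener (set to the node's VIP address) and remote_listener (the prod-scan:1526 form in 11gR2; VIP addresses in earlier releases).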

    10g local_listener and remote_listener configuration
    Node 1: local_listener= (ADDRESS = (PROTOCOL = TCP)(HOST = ordb1-vip)(PORT = 1526))
    Node 2: local_listener= (ADDRESS = (PROTOCOL = TCP)(HOST = ordb2-vip)(PORT = 1526))
    Node 3: local_listener= (ADDRESS = (PROTOCOL = TCP)(HOST = ordb3-vip)(PORT = 1526))
    Nodes 1, 2, and 3:
    remote_listener=
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = ordb1-vip)(PORT = 1526))
        (ADDRESS = (PROTOCOL = TCP)(HOST = ordb2-vip)(PORT = 1526))
        (ADDRESS = (PROTOCOL = TCP)(HOST = ordb3-vip)(PORT = 1526))
      )

    11g local_listener and remote_listener configuration
    Node 1: local_listener= (ADDRESS = (PROTOCOL = TCP)(HOST = prod1-vip)(PORT = 1526))
    Node 2: local_listener= (ADDRESS = (PROTOCOL = TCP)(HOST = prod2-vip)(PORT = 1526))
    Node 3: local_listener= (ADDRESS = (PROTOCOL = TCP)(HOST = prod3-vip)(PORT = 1526))

    Nodes 1, 2, and 3: remote_listener=prod-scan:1526
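The idea behind server-side load balancing, that a listener knows about every registered instance and hands the connection to the least loaded one, can be sketched as follows. This is a simplified model under loud assumptions: the single integer "load" per local listener is hypothetical (the real metric depends on the service's connection load balancing goal), and `ServerSideLoadBalanceSketch` is not an Oracle API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a listener redirecting to the least loaded instance. */
class ServerSideLoadBalanceSketch {

    /** Returns the local listener whose instance reports the lowest load (first wins on ties). */
    static String redirectTarget(Map<String, Integer> loadByLocalListener) {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : loadByLocalListener.entrySet()) {
            if (best == null || entry.getValue() < bestLoad) {
                best = entry.getKey();
                bestLoad = entry.getValue();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no instance has registered with the listener");
        }
        return best;
    }

    public static void main(String[] args) {
        // Load figures are made up for the demo.
        Map<String, Integer> load = new LinkedHashMap<>();
        load.put("prod1-vip:1526", 40);
        load.put("prod2-vip:1526", 10);
        load.put("prod3-vip:1526", 25);
        System.out.println("redirect to " + redirectTarget(load));
    }
}
```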

B.Run-time Connection Load Balancing

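This is load balancing at the moment an application takes an existing connection from a connection pool. With a pool, server-side load balancing alone cannot guarantee that a borrowed connection points to a lightly loaded node. The connection may well have been created when the pool was first initialized, so it reflects node load only at that point in time. Load can change considerably as time passes, so the pool's connections are probably no longer balanced.

FAN solves this problem with its two event types: HA events and LBA events.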

HA Event
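A connection pool that supports FAN subscribes to HA events. This guarantees that a valid connection borrowed from the pool never points to a node whose service is down or whose instance has crashed, because the connections tied to the failed instance or service are marked invalid and removed. Clients that subscribe to FAN HA events include: JDBC Implicit Connection Cache, OCI, ODP.NET Connection Pools, Listener, Server Side Callouts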

LBA Event
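By subscribing to LBA events, a pool learns the actual load on each RAC node. When the application asks for an existing connection, the pool can hand out one that points to a genuinely less loaded RAC node, which is true Runtime Connection Load Balancing. Clients that subscribe to LBA events include: JDBC Implicit Connection Cache, ODP.NET Connection Pools, Listener, OCI Session Pools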
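One way a pool could use the advisory is to pick an instance with probability proportional to the percentage each LBA event assigns to it. The sketch below shows that weighted choice in plain Java. It is an assumption about how such a pool might behave, not a description of UCP or ICC internals, and `RuntimeLoadBalanceSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of a weighted pick driven by load balancing advisory percentages. */
class RuntimeLoadBalanceSketch {

    /** Picks an instance with probability proportional to its (non-negative) percentage. */
    static String pickInstance(Map<String, Integer> percentByInstance, Random rng) {
        int total = 0;
        for (int percent : percentByInstance.values()) {
            total += percent;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("at least one instance needs a positive percentage");
        }
        int roll = rng.nextInt(total); // 0 <= roll < total
        for (Map.Entry<String, Integer> entry : percentByInstance.entrySet()) {
            roll -= entry.getValue();
            if (roll < 0) {
                return entry.getKey();
            }
        }
        throw new AssertionError("unreachable when percentages are non-negative");
    }

    public static void main(String[] args) {
        // Percentages are made up for the demo.
        Map<String, Integer> advisory = new LinkedHashMap<>();
        advisory.put("prod1", 70);
        advisory.put("prod2", 30);
        Random rng = new Random();
        for (int i = 1; i <= 5; i++) {
            System.out.println("borrow " + i + " -> " + pickInstance(advisory, rng));
        }
    }
}
```

A weighted random pick, rather than always choosing the top instance, keeps some traffic flowing to every healthy node, so the next advisory still has data for all of them.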

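Taking the commonly used JDBC Thin driver (JDBC Runtime Connection Load Balancing) as the example, configuration takes two steps: first configure FCF, then configure the service attributes.

1. Configure FCF in the JDBC data source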
    http://docs.oracle.com/cd/E11882_01/java.112/e16548/fstconfo.htm#JJDBC28831
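    // needs java.sql.Connection, java.sql.SQLException, oracle.jdbc.pool.OracleDataSource
    // declare datasource (ctx is an existing JNDI naming context)
    OracleDataSource ods = new OracleDataSource();
    ods.setURL(
        "jdbc:oracle:thin:@(DESCRIPTION="
      + "(ADDRESS=(PROTOCOL=TCP)(HOST=cluster_alias)(PORT=1526))"
      + "(CONNECT_DATA=(SERVICE_NAME=AP)))");
    ods.setUser("scott");
    ods.setConnectionCachingEnabled(true);       // enable the implicit connection cache
    ods.setFastConnectionFailoverEnabled(true);  // enable FCF
    ctx.bind("myDS", ods);
    OracleDataSource ds = (OracleDataSource) ctx.lookup("myDS");
    try {
        Connection conn = ds.getConnection();  // transparently creates and accesses cache
    } catch (SQLException se) {
        // handle or log the failure
    }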

2. Configure the service:
    http://docs.oracle.com/cd/E11882_01/java.112/e16548/rlb.htm#JJDBC28747
    The service goal must be set to one of the following:
       DBMS_SERVICE.SERVICE_TIME
       DBMS_SERVICE.THROUGHPUT
    
    The connection balancing goal must be set to SHORT (the default is LONG).
       srvctl add service -d prod -s GL -B SERVICE_TIME -j SHORT -r prod1,prod2
    
11g srvctl add service help output
[orgrid@11grac1 ~]$ srvctl add service -h
Adds a service configuration to the Oracle Clusterware.
Usage: srvctl add service -d <db_unique_name> -s <service_name> {-r "<preferred_list>" [-a "<available_list>"] [-P {BASIC | NONE | PRECONNECT}] | -g <pool_name> [-c {UNIFORM | SINGLETON}] } [-k <net_num>] [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]] [-y {AUTOMATIC | MANUAL}] [-q {TRUE|FALSE}] [-x {TRUE|FALSE}] [-j {SHORT|LONG}] [-B {NONE|SERVICE_TIME|THROUGHPUT}] [-e {NONE|SESSION|SELECT}] [-m {NONE|BASIC}] [-z <failover_retries>] [-w <failover_delay>] [-t <edition>] [-f]
    -d <db_unique_name>      Unique name for the database
    -s <service>             Service name
    -r "<preferred_list>"    Comma separated list of preferred instances
    -a "<available_list>"    Comma separated list of available instances
    -g <pool_name>           Server pool name
    -c {UNIFORM | SINGLETON} Service runs on every active server in the server pool hosting this service (UNIFORM) or just one server (SINGLETON)
    -k <net_num>             network number (default number is 1)
    -P {NONE | BASIC | PRECONNECT}        TAF policy specification
    -l <role>                Role of the service (primary, physical_standby, logical_standby, snapshot_standby)
    -y <policy>              Management policy for the service (AUTOMATIC or MANUAL)
    -e <Failover type>       Failover type (NONE, SESSION, or SELECT)
    -m <Failover method>     Failover method (NONE or BASIC)
    -w <integer>             Failover delay
    -z <integer>             Failover retries
    -t <edition>             Edition (or "" for empty edition value)
    -j <clb_goal>  Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
    -B <Runtime Load Balancing Goal>     Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
    -x <Distributed Transaction Processing>  Distributed Transaction Processing (TRUE or FALSE)
    -q <AQ HA notifications> AQ HA notifications (TRUE or FALSE)
Usage: srvctl add service -d <db_unique_name> -s <service_name> -u {-r "<new_pref_inst>" | -a "<new_avail_inst>"} [-f]
    -d <db_unique_name>      Unique name for the database
    -s <service>             Service name
    -u                       Add a new instance to service configuration
    -r <new_pref_inst>       Name of new preferred instance
    -a <new_avail_inst>      Name of new available instance
    -f                       Force the add operation even though a listener is not configured for a network
    -h                       Print usage
[orgrid@11grac1 ~]$ 

12c srvctl add service help output
[orgrid@db1 ~]$ srvctl add service -h
Adds a service configuration to be managed by Oracle Restart.

Usage: srvctl add service -db <db_unique_name> -service <service_name> [-role [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]] [-policy {AUTOMATIC | MANUAL}][-notification {TRUE|FALSE}] [-clbgoal {SHORT|LONG}] [-rlbgoal {NONE|SERVICE_TIME|THROUGHPUT}][-failovertype {NONE|SESSION|SELECT|TRANSACTION}] [-failovermethod {NONE|BASIC}][-failoverretry <failover_retries>] [-failoverdelay <failover_delay>] [-edition <edition>] [-pdb <pluggable_database>] [-global <TRUE|FALSE>] [-maxlag <max_lag_time>] [-sql_translation_profile <sql_translation_profile>] [-commit_outcome {TRUE|FALSE}] [-retention <retention>] [-replay_init_time <replay_initiation_time>] [-session_state {STATIC|DYNAMIC}] [-force]
    -db <db_unique_name>           Unique name for the database
    -service <service>             Service name
    -role <role>                   Role of the service (primary, physical_standby, logical_standby, snapshot_standby)
    -policy <policy>               Management policy for the service (AUTOMATIC or MANUAL)
    -failovertype                  (NONE | SESSION | SELECT | TRANSACTION)      Failover type
    -failovermethod                (NONE | BASIC)     Failover method
    -failoverdelay <failover_delay> Failover delay (in seconds)
    -failoverretry <failover_retries> Number of attempts to retry connection
    -edition <edition>             Edition (or "" for empty edition value)
    -pdb <pluggable_database>      Pluggable database name
    -maxlag <maximum replication lag> Maximum replication lag time in seconds (Non-negative integer, default value is 'ANY')
    -clbgoal                       (SHORT | LONG)                   Connection Load Balancing Goal. Default is LONG.
    -rlbgoal                       (SERVICE_TIME | THROUGHPUT | NONE)     Runtime Load Balancing Goal
    -notification                  (TRUE | FALSE)  Enable Fast Application Notification (FAN) for OCI connections
    -global <global>               Global attribute (TRUE or FALSE)
    -sql_translation_profile <sql_translation_profile> Specify a database object for SQL translation profile
    -commit_outcome                (TRUE | FALSE)          Commit outcome
    -retention <retention>         Specifies the number of seconds the commit outcome is retained
    -replay_init_time <replay_init_time> Seconds after which replay will not be initiated
    -session_state <session_state> Session state consistency (STATIC or DYNAMIC)
    -force                         Force the add operation even though a listener is not configured for a network
    -verbose                       Verbose output
    -help                          Print usage
[orgrid@db1 ~]$ 
 
C.Connect-time failover

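This is failover at the moment a client connects to the database. Configure it in TNS by listing several listener addresses (VIPs) and specifying FAILOVER=ON. The connection is not taken from a connection pool. The format is as follows: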

    10g tnsnames.ora configuration
    ORDB=(DESCRIPTION=
           (FAILOVER=ON)
           (ADDRESS_LIST=
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb1-vip)(PORT=1526))
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb2-vip)(PORT=1526))
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb3-vip)(PORT=1526))
           )
           (CONNECT_DATA = (SERVICE_NAME=ordb))
          )
    When the connection to the first node fails, the client tries the second node, and so on until every node has been tried.

    11g tnsnames.ora configuration
    PROD=(DESCRIPTION =
          (FAILOVER=ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = prod-scan)(PORT = 1526))
          (CONNECT_DATA = (SERVICE_NAME =prod))
         )

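    With DNS/GNS and FAILOVER=ON, if one SCAN VIP cannot be reached the client tries another SCAN VIP, up to three in total.

    Note: in 11gR2, when the SCAN VIP is defined in the hosts file, the RAC environment has only one SCAN VIP, but failover still exists. When the node hosting the SCAN VIP goes down, the SCAN VIP and the SCAN listener relocate together to another node, and clients can connect only after the SCAN VIP has started there, so failover is slower in this case.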
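The connect-time failover behaviour (try the listed addresses one after another until one accepts the connection) can be sketched as a simple loop. This is an illustrative model, not Oracle Net code: `ConnectTimeFailoverSketch` is a hypothetical name, and the `canConnect` predicate stands in for a real connection attempt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of connect-time failover over an address list. */
class ConnectTimeFailoverSketch {

    /** Tries each address in order and returns the first one that accepts the connection. */
    static String connect(List<String> addresses, Predicate<String> canConnect) {
        for (String address : addresses) {
            if (canConnect.test(address)) {
                return address;
            }
            // This address failed: fall through to the next one in the list.
        }
        throw new IllegalStateException("all addresses failed: " + addresses);
    }

    public static void main(String[] args) {
        List<String> vips = Arrays.asList("ordb1-vip:1526", "ordb2-vip:1526", "ordb3-vip:1526");
        // Simulate node 1 being down.
        Predicate<String> nodeOneDown = address -> !address.startsWith("ordb1");
        System.out.println("connected via " + connect(vips, nodeOneDown));
    }
}
```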

D.Run-time failover

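Here failover concerns connections that are already established, either borrowed from a connection pool or opened by an OCI client (such as sqlplus). If something fails on the database side (the service goes down, the instance crashes, the session is broken), existing connections are interrupted, and the question is how to fail over so that the database stays highly available. There are two ways to implement run-time connection failover: TAF (Transparent Application Failover) and FCF (Fast Connection Failover).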

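A. Steps to configure TAF
    TAF has the following characteristics:
    a. It can be defined on the client in tnsnames.ora or on the server in the service; the service-side setting overrides the client configuration
    b. With the TAF TYPE set to SELECT, a plain SELECT can continue from the point of interruption after failover
    c. DML (INSERT, DELETE, UPDATE) cannot be resumed after TAF. A transaction that fails over with TAF cannot continue from the point of interruption and must be executed again from the start
    d. TAF works only for clients and connection pools that connect through OCI (such as sqlplus and the JDBC-OCI driver)

    Note: the JDBC thin driver is not based on OCI, so it does not support TAF

    The client can be configured like this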
    ORDB=(DESCRIPTION=
           (FAILOVER=ON)
           (ADDRESS_LIST=
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb1-vip)(PORT=1526))
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb2-vip)(PORT=1526))
            (ADDRESS=(PROTOCOL=TCP)(HOST=ordb3-vip)(PORT=1526))
           )
           (CONNECT_DATA = (SERVICE_NAME=AP))
           (FAILOVER_MODE= (TYPE=SELECT)(METHOD=BASIC)(RETRIES=50)(DELAY=5))
          )
    The server can be configured like this
    srvctl modify service -d EBS -s AP -q TRUE -P BASIC -e SELECT -z 100 -w 5 -j LONG      

B. Steps to configure FCF
FCF stands for Fast Connection Failover. It is implemented on the client by subscribing to FAN HA events. The steps below configure JDBC Fast Connection Failover (FCF) for the JDBC thin driver:
http://docs.oracle.com/cd/E11882_01/java.112/e16548/fstconfo.htm#JJDBC28825

Enable the implicit connection cache

ods.setConnectionCachingEnabled(true);

Enable Fast Connection Failover
ods.setFastConnectionFailoverEnabled(true);

It is best to also set a TCP timeout in the Java program

The application can call the isFatalConnectionError() API on OracleConnectionCacheManager to decide whether a caught SQLException is fatal
If the API returns true, retry getConnection on DataSource.xxxxxx

try {
    conn = getConnection();
} catch (SQLException e) {
    handleSQLException(e);
}

void handleSQLException(SQLException e) {
    if (OracleConnectionCacheManager.isFatalConnectionError(e)) {
        // fatal connection error: retry getConnection()
    }
}
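The retry step can be made concrete with a small bounded loop. The sketch below uses only the JDK so it runs without the Oracle driver. `FcfRetrySketch`, `withRetry`, and `isConnectionError` are hypothetical names, and the SQLSTATE class "08" check (the standard connection-exception class) is only a stand-in for the Oracle isFatalConnectionError() decision, not a reimplementation of it.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical sketch of retrying work after a fatal connection error. */
class FcfRetrySketch {

    /** Stand-in check: SQLSTATE class "08" is the standard connection-exception class. */
    static boolean isConnectionError(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    /** Runs the work, retrying up to maxAttempts times while the failure counts as fatal. */
    static <T> T withRetry(Callable<T> work, Predicate<SQLException> isFatal, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                if (!isFatal.test(e)) {
                    throw e; // not a connection failure: retrying would not help
                }
                last = e; // connection failure: get a fresh connection and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new SQLException("connection lost", "08003");
            }
            return "done";
        }, FcfRetrySketch::isConnectionError, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real application the `work` callable would borrow a connection from the pool and re-run the whole transaction, since FCF rolls back whatever was in flight.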


Reference
The ONS Daemon Explained In Oracle Clusterware/RAC Environment (Doc ID 759895.1)
                                                                                
Application Failover with Oracle Database 11g
http://www.oracle.com/technetwork/database/app-failover-oracle-database-11g-173323.pdf

Client Failover Best Practices for Highly Available Oracle Databases: Oracle Database 10g Release 2
http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_ClientFailoverBestPractices.pdf

Oracle Universal Connection Pool for JDBC Developer's Guide,
http://download.oracle.com/docs/cd/E11882_01/java.112/e12265/rac.htm#CHDCDFAC

http://docs.oracle.com/cd/E11882_01/java.112/e12265/rac.htm#CHDHCGGG


10gR2/11gR1/11gR2 Oracle Database JDBC Developer's Guide and Reference

Fast Connection Failover
http://download-west.oracle.com/docs/cd/B19306_01/java.102/b14355/fstconfo.htm
http://download.oracle.com/docs/cd/B28359_01/java.111/b31224/fstconfo.htm#CIHJBFFC
http://docs.oracle.com/cd/E11882_01/java.112/e16548/fstconfo.htm#CIHJBFFC
http://docs.oracle.com/cd/E11882_01/java.112/e16548/apxracfan.htm#JJDBC28934

Enabling Advanced Features of Oracle Net Services

http://docs.oracle.com/database/121/NETAG/advcfg.htm#NETAG013

Oracle  Call Interface Programmer's Guide

http://docs.oracle.com/cd/E11882_01/appdev.112/e10646/toc.htm


Glossary
connection pooling
A resource utilization and user scalability feature that enables you to maximize the number of sessions over a limited number of protocol connections to a shared server.

client load balancing
Load balancing, whereby if more than one listener services a single database, a client can randomly choose between the listeners for its connect requests. This randomization enables all listeners to share the burden of servicing incoming connect requests.

connection load balancing
The method for balancing the number of active connections for the same service across the instances and dispatchers. Connection load balancing enables listeners to make routing decisions based on how many connections for each dispatcher and the load on the nodes.

Runtime Connection Load Balancing
Enables Oracle Database to make intelligent service connection decisions based on the connection pool that provides the optimal service for the requested application based on current workloads. The JDBC, ODP.NET, and OCI clients are integrated with the load balancing advisory; you can use any of these client environments to provide runtime connection load balancing.

Connect-time failover
A client connect request is forwarded to another listener if a listener is not responding. Connect-time failover is enabled by service registration, because the listener knows if an instance is running to attempt a connection.

SCAN: single client access name (SCAN)
Oracle Database 11g database clients use SCAN to connect to the database. SCAN can resolve to multiple IP addresses, reflecting multiple listeners in the cluster handling public client connections.

FAN: Fast Application Notification
Applications can use FAN to enable rapid failure detection, balancing of connection pools after failures, and re-balancing of connection pools when failed components are repaired. The FAN notification process uses system events that Oracle Database publishes when cluster servers become unreachable or if network interfaces fail.

FCF: Fast Connection Failover
Fast Connection Failover provides high availability to FAN integrated clients, such as clients that use JDBC, OCI, or ODP.NET. If you configure the client to use fast connection failover, then the client automatically subscribes to FAN events and can react to database UP and DOWN events. In response, Oracle Database gives the client a connection to an active instance that provides the requested database service.

TAF: Transparent Application Failover
A runtime failover for high-availability environments, such as Oracle RAC and Oracle RAC Guard, TAF refers to the failover and re-establishment of application-to-service connections. It enables client applications to automatically reconnect to the database if the connection fails, and optionally resume a SELECT statement that was in progress. This reconnect happens automatically from within the Oracle Call Interface library.

ONS: Oracle Notification Service
A publish and subscribe service for communicating information about all FAN events.

policy-managed database
A database that you define as a cluster resource. Management of the database is defined by how you configure the resource, including on which servers the database can run and how many instances of the database are necessary to support the expected workload.

About ONS

1. purpose of the ons daemon

The Oracle Notification Service daemon is started by Oracle Clusterware as part of the nodeapps. There is one ons daemon per cluster node.

The Oracle Notification Service daemon receives a subset of published clusterware events
via the local evmd and racgimon clusterware daemons and forwards those events to application subscribers and to the local listeners, in order to facilitate:

a. the FAN (Fast Application Notification) feature, which allows applications to respond
to database state changes. Fast Connection Failover (FCF) is the client mechanism that uses the
FAN feature to achieve this. FCF clients/subscribers are JDBC, OCI, and ODP.NET in 10gR2.

b. the Load Balancing Advisory (the RLB feature), which permits load balancing
across RAC nodes depending on the load on each node. The RDBMS MMON process creates
an advisory for distribution of work every 30 seconds and forwards it via racgimon and ONS
to listeners and applications.

5. ons clients/subscribers

clients or subscribers connected to the ons are viewable via the SUBS column of the client connections
output of the 'onsctl debug', e.g.

Client connections:
    ID           IP        PORT    FLAGS    SENDQ     WORKER   BUSY  SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
         2 127.000.000.001  6101 0001001a          0               1     0
         5 127.000.000.001  6101 0001001a          0               1     1
5.1 the listeners are subscribers for the ons daemon

When the ons is started, the listeners will register to the ons as client subscribers to all FAN and RLB events.

Parameter SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=ON need to be set in the
listener.ora files. When that parameter is set and TRACE_LEVEL_<listener_name>=16 is set,
then a problem to subscribe to the locally running ONS can be viewed in the listener.log via messages like

WARNING: Subscription for node down event still pending

It is normally due to note:284602.1 and  bug:4417761. When you start the listener using lsnrctl, environment variable ORACLE_CONFIG_HOME = {Oracle Clusterware HOME}
need to be set prior to 10.2.0.4 (settable in the $ORACLE_HOME/bin/racgwrap scripts).
5.2 application clients/subscribers

With Oracle Database 10g Release 1, JDBC clients (both thick and thin driver) are integrated
with FAN by providing FCF. With Oracle Database 10g Release 2, ODP.NET and OCI clients
have been added. note:433827.1 can be used to setup an FCF client.

6. the FAN and RLB events

There are two types of events that ONS handles. The FAN events (or HA events) are meant for FAN processing. The RLB events are meant for workload management. When loglevel is set to 9, the events can be checked in the <crs home>/opmn/logs/ons.log files.
6.1 the FAN events

The FAN events (event type=database/event/service) are forwarded by the racgimon (for pre 11gR2 databases) or by the 11gR2 agent and evmd clusterware processes to the ons daemon. Main bug:13879428 needs to be fixed in this area (see note:1489751.1), as does bug:6760284 RACGEVTF SOMTIMES DOES NOT SEND ONS EVENT.
6.1.1 FAN events forwarded to the ons daemon by 11gR2 agent or pre-11gR2 racgimon

The clusterware forward instance and service up/down events to the ons daemon.

e.g. ../opmn/logs> grep -E "body|VERSION" ons.log (loglevel=9)
09/01/07 13:56:24 [8] Connection 2,127.0.0.1,6101 body:
VERSION=1.0 service= instance=ASM1 database= host=hostname1 status=up reason=boot
09/01/07 13:56:45 [8] Connection 4,127.0.0.1,6101 body:
VERSION=1.0 service=H102 instance=H1021 database=H102 host=hostname1 status=up reason=boot
09/01/07 13:56:46 [8] Connection 4,127.0.0.1,6101 body:
VERSION=1.0 service=ALL instance=H1021 database=H102 host=hostname1 status=up card=1 reason=boot
...
09/01/07 14:20:26 [8] Connection 5,140.87.216.64,6200 body:
VERSION=1.0 service=H102 instance=H1022 database=H102 host=hostname2 status=down reason=user
09/01/07 14:20:26 [8] Connection 5,140.87.216.64,6200 body:
VERSION=1.0 service=ALL instance=H1022 database=H102 host=hostname2 status=down reason=failure
09/01/07 14:20:27 [8] Connection 5,140.87.216.64,6200 body:
VERSION=1.0 service=ALL instance=H1022 database=H102 host=hostname2 status=not_restarting reason=UNKNOWN
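The event bodies in this log are space-separated key=value pairs, so a subscriber can turn one into a map before deciding how to react. The parser below is a minimal sketch in plain Java based only on the lines shown in the log. `FanEventParserSketch` is a hypothetical name, and the assumption that values never contain spaces holds for these samples but is not guaranteed for every payload.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch that parses the key=value body of a FAN HA event. */
class FanEventParserSketch {

    /** Splits the body on whitespace and stores each key=value token in order. */
    static Map<String, String> parse(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : body.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                // An empty value such as "service=" is kept as an empty string.
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    /** True when the event reports status=down. */
    static boolean isDown(Map<String, String> event) {
        return "down".equals(event.get("status"));
    }

    public static void main(String[] args) {
        String body = "VERSION=1.0 service=H102 instance=H1022 database=H102 "
                + "host=hostname2 status=down reason=user";
        Map<String, String> event = parse(body);
        System.out.println(event.get("instance") + " down? " + isDown(event));
    }
}
```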
6.1.2 FAN events forwarded from the evmd daemon to the ons daemon

It concerns the node down and public network down events. Main bug:6083726 (see note:6083726.8), Bug:9538932 REBOOTED SERVER NODE ONS DOES NOT SEND EVENTS TO CLIENT ONS and bug:6760284 RACGEVTF SOMTIMES DOES NOT SEND ONS EVENT need to be fixed in this area.

When there is a node down event or a public vip network down event, the evmd posts an event to the ONS, i.e. when the vip is stopped on a preferred node, a public network down event originates from the failing node. This evm event is received by the evmd on all other surviving nodes via the interconnect. The evmd on each remote node then publishes the event to the local ONS daemon.

e.g. ons.log with level=9 showing

VERSION=1.0 host=hostname incarn=100 status=nodedown reason=member_leave

The FAN events are a subset of the EVM events (logged in the $CRS_HOME/evm/log/<hostname>_evmlog.<date> files). All evm events can be viewed via:
evmshow -t "@timestamp @@"  <hostname>_evmlog.<date>
6.2 the RLB events

The RLB events (event type=database/event/servicemetrics/<service_name>) sent by the racgimon on MMON background process request

e.g. Notification Type "database/event/servicemetrics/ALL" set via
exec DBMS_SERVICE.MODIFY_SERVICE (service_name => 'ALL', goal => DBMS_SERVICE.GOAL_THROUGHPUT, clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT)

Querying sys$service_metrics_tab shows the MMON events logged every 30 seconds, e.g.

SELECT user_data from SYS.SYS$SERVICE_METRICS_TAB order by 1 ;
USER_DATA(SRV, PAYLOAD)
--------------------------------------------------------------------------------
SYS$RLBTYP('ALL', 'VERSION=1.0 database=H102 service=ALL { {instance=H1021 percent=100 flag=UNKNOWN} } timestamp=2009-01-07 21:39:18')

grep -E 'body|percent' ons.log (with loglevel=9) show the same events
VERSION=1.0 database=H102 { {instance=H1021 percent=100 flag=UNKNOWN} } timestamp=2009-01-07 21:39:48
09/01/07 21:39:48 [9] Worker Thread 2 sending body [20:1073838448]: connection 5,140.87.216.64,6200
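The per-instance percentages can be pulled out of such a payload with a regular expression. The sketch below is based only on the single-instance sample shown here. The class name `RlbPayloadParserSketch` is hypothetical, and the multi-instance layout used in the demo (several `{instance=... percent=... flag=...}` groups inside the outer braces) is an assumption extrapolated from that sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch that extracts instance percentages from an RLB event payload. */
class RlbPayloadParserSketch {

    // Matches one "{instance=NAME percent=NN flag=...}" group.
    private static final Pattern INSTANCE_ENTRY =
            Pattern.compile("\\{instance=(\\S+) percent=(\\d+) flag=([^}]*)\\}");

    /** Returns instance name -> percentage, in the order the groups appear. */
    static Map<String, Integer> percentByInstance(String payload) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher m = INSTANCE_ENTRY.matcher(payload);
        while (m.find()) {
            result.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "VERSION=1.0 database=H102 service=ALL "
                + "{ {instance=H1021 percent=100 flag=UNKNOWN} } timestamp=2009-01-07 21:39:18";
        System.out.println(percentByInstance(sample));

        // Assumed layout for two instances; the values are made up.
        String assumed = "{ {instance=A percent=60 flag=GOOD} {instance=B percent=40 flag=GOOD} }";
        System.out.println(percentByInstance(assumed));
    }
}
```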


Common JDBC connection strings

jdbc:oracle:oci:@TNS_ALIAS

jdbc:oracle:oci:@(DESCRIPTION= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=host1) (PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=host2)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=service_name)))

jdbc:oracle:oci:@(DESCRIPTION= (ADDRESS=(PROTOCOL=TCP)(HOST=cluster_alias) (PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=service_name)))

jdbc:oracle:thin:@//host:port/service_name

jdbc:oracle:thin:@//cluster-alias:port/service_name

jdbc:oracle:thin:@(DESCRIPTION= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=host1) (PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=host2)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=service_name)))

jdbc:oracle:thin:@(DESCRIPTION= (ADDRESS=(PROTOCOL=TCP)(HOST=cluster_alias) (PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=service_name)))
