This White Paper assumes that:
You are familiar with Sybase Replication
Server. This White Paper does not explain the steps necessary to
install Sybase Replication Server.
You are familiar with Sun Cluster HA. This White
Paper does not explain the steps necessary to install Sun Cluster.
You have two-node cluster hardware with Sun Cluster 2.2 installed.
For more information, see these documents:
Sun Cluster 2.2 Software Planning
and Installation Guide
Sun Cluster 2.2 System Administration Guide
Configuring Sybase Adaptive Server Enterprise
12.0 Server for High Availability: Sun Cluster HA (see
White Papers under www.sybase.com/products/databaseservers/ase).
Replication Server documentation (see Product Manuals at www.sybase.com).
This document uses these terms:
Cluster – multiple systems,
or nodes, that work together as a single entity to provide applications,
system resources, and data to users.
Cluster node – a physical machine that
is part of a Sun Cluster. Also called a physical host.
Data service – an application that provides
client service on a network and implements read and write access
to disk-based data. Replication Server and Adaptive Server Enterprise
are examples of data services.
Disk group – a well-defined group of multihost
disks that move as a unit between two servers in an HA configuration.
Fault monitor – a daemon that probes data services and initiates a local restart or a failover when it detects a failure.
High availability (HA) – very low downtime.
Computer systems that provide HA usually provide 99.999% availability,
or roughly five minutes of unscheduled downtime per year.
Logical host – a group of resources including
a disk group, logical host name, and logical IP address. A logical
host resides on (or is mastered by) a physical host (or node) in
a cluster machine. It can move as a unit between physical hosts
on a cluster.
Master – the node with exclusive read and
write access to the logical host’s disk group and with the logical
IP address mapped to its Ethernet address. The current master of
the logical host runs the logical host’s data services.
Multihost disk – a disk configured for
potential accessibility from multiple nodes.
Failover – the event triggered by a node
or a data service failure, in which logical hosts and the data services
on the logical hosts move to another node.
Failback – a planned event, in which a logical
host and its data services are moved back to the original host.
Sun Cluster HA is a hardware- and software-based high availability
solution. It provides high availability support on a cluster machine
and automatic data service failover in just a few seconds. It accomplishes
this by adding hardware redundancy, software monitoring, and restart capabilities.
Sun Cluster provides cluster management tools for a System
Administrator to configure, maintain, and troubleshoot HA installations.
The Sun Cluster configuration tolerates single-point
failures. When such a failure occurs, the HA software fails over the
affected logical hosts to another node and restarts their data
services on that node.
Sybase Replication Server is implemented as a data service
on a logical host on the cluster machine. The HA fault monitor for
Replication Server periodically probes Replication Server. If Replication
Server is down or hung, the fault monitor attempts to restart Replication
Server locally. If Replication Server fails again within a configurable
period of time, the fault monitor fails over the logical host
so that Replication Server is restarted on the second node.
To Replication Server clients, it appears as though the original
Replication Server has experienced a reboot. The fact that it has
moved to another physical machine is transparent to the users. Replication
Server is affiliated with a logical host, not the physical machine.
As a data service, the Replication Server includes a set of
scripts registered with Sun Cluster as callback methods. Sun Cluster
calls these methods at different stages of failover:
FM_STOP – to shut
down the fault monitor for the data service to be failed over.
STOP_NET – to shut down the data service on the current node.
START_NET – to start the data
service on the new node.
FM_START – to start the fault
monitor on the new node for the data service.
Each Replication Server is registered as a data service using
the hareg command. If you have multiple Replication
Servers running on the cluster, you must register each of them.
Each data service has its own fault monitor as a separate process.
For detailed information about the hareg command,
see the appropriate Sun Cluster documentation.
Configuring Replication Server for high availability
This section describes the tasks required to configure a Replication
Server for HA on Sun Cluster (assuming a two-node cluster machine).
Configuring Sun Cluster
The system should have the following components:
Two homogeneous Sun Enterprise servers
with similar configurations in terms of resources such as CPU and
memory. The servers should be configured with a cluster interconnect,
which is used to maintain cluster availability and synchronization.
The system should be equipped with a set of multihost
disks. The multihost disk holds the data (partitions) for a highly
available Replication Server. A node can access data on a multihost
disk only when it is the current master of the logical host to which
the disk belongs.
The system should have Sun Cluster HA software installed,
with automatic failover capability. The multihost disks should have
unique path names across the system.
For disk failure protection, disk mirroring (not
provided by Sybase) should be used.
Logical hosts should be configured. Replication
Server runs on a logical host.
Make sure the logical host for the Replication Server
has enough disk space in its multihosted disk groups for the partitions,
and that any potential master for the logical host has enough memory
for the Replication Server.
Installing Replication Server
During Replication Server installation, you need to perform
these tasks in addition to the tasks described in the Replication
Server installation guide:
As a Sybase user, load
Replication Server either on a shared disk or on the local disk.
If it is on a shared disk, the release cannot be accessed from both machines
concurrently. If it is on a local disk, make sure the release paths are
the same for both machines. If they are not the same, use a symbolic link,
so they will be the same. For example, if the release is on /node1/repserver on
node1, and /node2/repserver on
node2, link them to /repserver on
both nodes so the $SYBASE environment
variable is the same across the system.
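A minimal sketch of the linking commands for this example follows. On node1:
ln -s /node1/repserver /repserver
On node2:
ln -s /node2/repserver /repserver
Then, on both nodes (C shell):
setenv SYBASE /repserver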
Add entries for Replication Server, RSSD server,
and primary/replicate data servers to the interfaces file
in the $SYBASE directory on both machines.
Use the logical host name for Replication Server in the interfaces file.
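For illustration only, an interfaces file entry for a hypothetical Replication Server REP1 listening on port 5001 of logical host loghost1 might look like the following; the exact transport fields vary by platform, so use a utility such as dsedit or dscp to create the entries:
REP1
	master tcp ether loghost1 5001
	query tcp ether loghost1 5001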
Start the RSSD server.
Follow the installation guide for your platform
to install Replication Server on the node that is currently the
master of the logical host. Make sure that you:
Set the environment
variables SYBASE, SYBASE_REP, and SYBASE_OCS:
setenv SYBASE /REPSERVER1210
setenv SYBASE_REP REP-12_1
setenv SYBASE_OCS OCS-12_0
where /REPSERVER1210 is the release directory in which Replication Server is installed.
Choose a run directory for the Replication Server
that will contain the Replication Server run file, configuration
file, and log file. The run directory should exist on both nodes
and have exactly the same path on both nodes (the path can be linked).
Choose the multihosted disks for the Replication Server partitions.
Run the rs_init command from the run directory.
Make sure that Replication Server is started.
As a Sybase user, copy the run file and the configuration
file to the other node in the same path. Edit the run file on the
second node to make sure it contains the correct path of the configuration
and log files, especially if links are used.
The run file name must be RUN_repserver_name,
where repserver_name is the name of
the Replication Server. You can define the configuration and log file locations in the run file.
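For illustration only, a run file for a hypothetical Replication Server REP1 installed under /repserver might look like the following (rs_init generates the actual file, and its exact contents vary by release):
#!/bin/sh
/repserver/REP-12_1/bin/repserver -SREP1 \
-C/repserver/REP-12_1/install/REP1.cfg \
-E/repserver/REP-12_1/install/REP1.log \
-I/repserver/interfaces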
Installing Replication Server
as a data service
You also need to perform these specialized tasks to install
Replication Server as a data service:
As root, create the
directory /opt/SUNWcluster/ha/repserver_name on both
cluster nodes, where repserver_name is
the name of your Replication Server. Each Replication Server must
have its own directory with the server name in the path. Copy the
HA scripts from the Replication Server installation directory $SYBASE/$SYBASE_REP/sample/ha
to /opt/SUNWcluster/ha/repserver_name on both cluster nodes.
If the scripts already exist on the local machine as part
of another Replication Server data service, you can create
/opt/SUNWcluster/ha/repserver_name as a link to that script directory instead.
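For example, assuming the scripts are already installed for a hypothetical Replication Server REP1 and you are adding a second Replication Server REP2:
ln -s /opt/SUNWcluster/ha/REP1 /opt/SUNWcluster/ha/REP2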
As root, create the directory /var/opt/repserver on
both nodes if it does not exist.
As root, create a file /var/opt/repserver/repserver_name on
both nodes for each Replication Server you want to install as a
data service on Sun Cluster, where repserver_name is
the name of your Replication Server. This file should contain
only two lines, with no blank spaces, and should be readable only
by root. The two lines hold these values:
repserver – the
Replication Server name.
logicalHost – the
logical host on which Replication Server runs.
RunFile – the complete
path of the runfile.
releaseDir – the $SYBASE release directory.
SYBASE_OCS – the $SYBASE
subdirectory where the connectivity library is located.
SYBASE_REP – the $SYBASE
subdirectory where the Replication Server is located.
probeCycle – the number
of seconds between the start of two probes by the fault monitor.
probeTimeout – time,
in seconds, after which a running Replication Server probe is aborted
by the fault monitor, and a timeout condition is set.
restartDelay – minimum
time, in seconds, between two Replication Server restarts. If, in
less than restartDelay seconds after a Replication Server restart,
the fault monitor again detects a condition that requires a restart,
it triggers a switchover to the other host instead. This handles
situations in which a restart alone does not solve the problem.
login/password – the
login/password the fault monitor uses to ping Replication Server.
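The exact layout of the two lines is not reproduced here. Purely as an illustration, assuming the values appear in the order listed above, separated by colons, with the login/password on the second line, the file for a hypothetical Replication Server REP1 on logical host loghost1 might look like:
REP1:loghost1:/repserver/REP-12_1/install/RUN_REP1:/repserver:OCS-12_0:REP-12_1:60:30:300
sa/sa_password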
To change probeCycle, probeTimeout, restartDelay, or login/password
for the probe after Replication Server is installed as a data service,
send SIGINT(2) to the monitor process (repserver_fm) to
refresh its memory.
kill -2 monitor_process_id
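To find the monitor process ID, you can, for example, run:
ps -ef | grep repserver_fm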
As root, create a file /var/opt/repserver/repserver_name.mail on
both nodes, where repserver_name is
the name of your Replication Server. This file lists the UNIX login
names of the Replication Server administrators. The login names
should all be on one line, separated by single spaces.
If the fault monitor encounters any problems that need intervention,
this is the list to which it sends mail.
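For example, a mail file for two hypothetical administrators with UNIX logins dba1 and dba2 contains the single line:
dba1 dba2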
Register the Replication Server as a data service
on Sun Cluster:
hareg -r repserver_name \
-b "/opt/SUNWcluster/ha/repserver_name" \
-m START_NET="/opt/SUNWcluster/ha/repserver_name/repserver_start_net" \
-t START_NET=60 \
-m STOP_NET="/opt/SUNWcluster/ha/repserver_name/repserver_stop_net" \
-t STOP_NET=60 \
-m FM_START="/opt/SUNWcluster/ha/repserver_name/repserver_fm_start" \
-t FM_START=60 \
-m FM_STOP="/opt/SUNWcluster/ha/repserver_name/repserver_fm_stop" \
-t FM_STOP=60 \
[-d sybase] -h logical_host
where -d sybase is required if the RSSD
is under HA on the same cluster, and repserver_name is
the name of your Replication Server and must appear in the path of
the callback scripts.
Turn on the data service using the hareg -y command, as described in the next section.
Administering Replication Server as a data service
This section describes how to start and shut down Replication
Server as a data service, and useful logs for monitoring and troubleshooting.
Data service start/shutdown
Once a Replication Server is registered as a data service, use:
hareg -y repserver_name
to start Replication Server as a data service. This starts
Replication Server if it is not already running, and also starts
the fault monitor for Replication Server.
To shut down Replication Server, use:
hareg -n repserver_name
The fault monitor restarts or fails over this Replication
Server if it is shut down or stopped (killed) in any other way.
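To check the state of the cluster and its registered data services, you can, for example, use the Sun Cluster hastat command:
hastat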
There are several logs you can use for debugging:
Replication Server log – the
Replication Server logs its messages here. Use the log to find informational
and error messages from Replication Server. The log is located in
the Replication Server run directory.
Script log – the data service START and
STOP scripts log messages here. Use the log to find informational
and error messages that result from running the scripts. The log
is located in /var/opt/repserver/harep.log.
Console log – the operating system logs
messages here. Use this log to find informational and error messages
from the hardware. The log is located in /var/adm/messages.
CCD log – the Cluster Configuration Database,
which is part of the Sun Cluster configuration, logs messages here.
Use this log to find informational and error messages about the
Sun Cluster configuration and health. The log is located in /var/opt/SUNWcluster/ccd/ccd.log.
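For example, to watch the script log while testing a failover:
tail -f /var/opt/repserver/harep.log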