Saturday 29 December 2018

HANA Active Active System Replication – Configuration, Failover & Failback

Having worked with an Active-Active Read Enabled (R/E) System Replication scenario, we wanted to share our experiences. The official documentation is good, but it offers few diagrams and little on the overall setup, the process flows for failover and failback, or how read-only queries can be handled. This work was done with a long-time colleague, Paul Barker, as part of a short evaluation of the Active-Active (Read Enabled) capability.

System Replication Prerequisites

◈ Two or more HANA systems (we used HANA 2.00.20, i.e. HANA 2 SP2)
◈ Same size systems
◈ Different host names

1. Initial Landscape Configuration


1a. Acquire an Environment

We used a Cloud Appliance Library (CAL) HANA instance for quick and easy access to HANA environments.

1b. Clone Environment, Rename Host

We cloned the HANA instance. With cloud providers such as GCP and AWS, this is a quick way of duplicating an existing environment.


We did experience an issue whereby networking was broken after pausing the system.  Upon investigation we found that CAL has some clever start-up scripts that map host names and IPs automatically.  These needed to be disabled to preserve changes made to the OS configuration.  If you are experimenting with CAL, you will need to modify these scripts as well.

## Tier2 (Secondary)


We now have 2 systems with the same SID but different host names.  We need to tell HANA on the cloned system about its new host name, which can be achieved with this command.

## Tier2 (Secondary)
/hana/shared/HDB/hdblcm/hdblcm --action=rename_system --hostmap=vhcalhdbdb=tier2

1c. Configure System Replication

To enable system replication, we need to tell both the primary and secondary nodes about this configuration.  The secondary needs to be stopped before the registration command is issued on it.  When the secondary is re-started it will automatically sync all data with the primary node.

## Tier1 (Primary)
hdbnsutil -sr_enable --name=tier1

## Tier2 (Secondary)
hdbnsutil -sr_register --force_full_replica --remoteHost=vhcalhdbdb --remoteInstance=00 --replicationmode=syncmem --name=tier2 --operationMode=logreplay_readaccess

HDB start
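
The registration command above comes back twice more during failback (steps 3b and 3h), with only the remote host and the site name changing. As a minimal sketch, a small helper can print the command for a given node so that it can be reviewed before it is run. The function name `sr_register_cmd` is our own invention for illustration; the flags are the ones used in this post.

```shell
# Hypothetical helper (not part of HANA): print the registration command.
# Usage: sr_register_cmd REMOTE_HOST SITE_NAME [INSTANCE]
# INSTANCE defaults to 00, the instance number used throughout this post.
sr_register_cmd() {
    printf 'hdbnsutil -sr_register --force_full_replica --remoteHost=%s --remoteInstance=%s --replicationmode=syncmem --name=%s --operationMode=logreplay_readaccess\n' \
        "$1" "${3:-00}" "$2"
}

# Example, on the stopped secondary:
#   sr_register_cmd vhcalhdbdb tier2      # review the printed command
#   eval "$(sr_register_cmd vhcalhdbdb tier2)" && HDB start
```

Printing rather than executing keeps the helper harmless when run on the wrong node by mistake.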

1d. Networking – Virtual IPs

To hide the physical deployment from applications and client tools we can use Virtual IPs to connect to our environment.  To make this possible we need to add a secondary network interface to each HANA node.  We also need to configure the Linux routing tables for each of the network interfaces, as adding the 2nd interface also affects the 1st one.

## Tier1 (Primary) & Tier2 (Secondary)

## Map the new network card (NIC) to eth2 
udevadm trigger --subsystem-match=net -c add -y eth2 

## Verify we now have 2 NICs
sid-hdb:~ # ifconfig -a
eth0      Link encap:Ethernet  HWaddr 0E:FA:20:F0:EE:D2  
         inet addr:  Bcast:  Mask:
         inet6 addr: fe80::cfa:20ff:fef0:eed2/64 Scope:Link
         RX packets:34438 errors:0 dropped:0 overruns:0 frame:0
         TX packets:25235 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000 
         RX bytes:92844006 (88.5 Mb)  TX bytes:5562258 (5.3 Mb)

eth2      Link encap:Ethernet  HWaddr 0E:B1:BF:1E:04:96  
         inet addr:  Bcast:  Mask:
         inet6 addr: fe80::cb1:bfff:fe1e:496/64 Scope:Link
         RX packets:38 errors:0 dropped:0 overruns:0 frame:0
         TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000 
         RX bytes:1560 (1.5 Kb)  TX bytes:1590 (1.5 Kb)

## Define the default routes for each NIC
sid-hdb:~ # ip route add default via dev eth0 tab 1
sid-hdb:~ # ip route add default via dev eth2 tab 2

sid-hdb:~ # ip route show table 1
default via dev eth0 
sid-hdb:~ # ip route show table 2
default via dev eth2
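
The gateway addresses are blanked out in the listing above. Note also that Linux only consults a custom routing table when a rule points traffic at it, and this post does not show that step, so the `ip rule` line below is our assumption about what a complete setup needs. The sketch prints the commands for one NIC; the helper name and the 192.0.2.x addresses are placeholders only.

```shell
# Hypothetical helper: print the policy-routing commands for one NIC.
# Usage: nic_routing_cmds DEVICE GATEWAY SOURCE_IP TABLE
nic_routing_cmds() {
    # Default route for this NIC, kept in its own routing table.
    printf 'ip route add default via %s dev %s table %s\n' "$2" "$1" "$4"
    # Rule (assumed, not shown in the post): traffic sent from this NIC's
    # address is looked up in that table.
    printf 'ip rule add from %s table %s\n' "$3" "$4"
}

# Example with placeholder addresses:
#   nic_routing_cmds eth0 192.0.2.1 192.0.2.10 1
#   nic_routing_cmds eth2 192.0.2.1 192.0.2.20 2
```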

2. Failover in DR/HA Scenario


2a. Active-Active

We can now access each HANA instance via either the original IP or the new virtual IP address (VIP).  The primary (Tier1) allows any type of query and it can also pass read-only queries to the secondary.  We can also connect directly to the secondary, if we wish to use it purely for read-only analytics.  We can verify our current configuration is as expected.

## Tier1 (Primary) or Tier2 (Secondary)
hdbnsutil -sr_state
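
For scripted checks it can help to pull a single value out of the state output. The sketch below assumes the output contains plain `key: value` lines such as `mode: primary` and `site name: tier1`; check that against the output on your own revision before relying on it. The helper name `sr_field` is hypothetical.

```shell
# Hypothetical helper: read `hdbnsutil -sr_state` output on stdin and print
# the value of the first line that starts with "KEY: " (assumed layout).
# Usage: hdbnsutil -sr_state | sr_field mode
sr_field() {
    awk -v key="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)          # ignore leading indentation
        prefix = key ": "
        if (index(line, prefix) == 1) {   # line starts with "KEY: "
            print substr(line, length(prefix) + 1)
            exit
        }
    }'
}

# Example: refuse to continue unless this node reports itself as primary.
#   [ "$(hdbnsutil -sr_state | sr_field mode)" = "primary" ] || exit 1
```

Matching on the start of the line keeps `mode` from also matching an `operation mode` line.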

2b. Simulate Primary Failure

In a fail-over scenario the primary could stop unexpectedly.  We can simulate this with a kill.

## Tier1 (Primary)
HDB kill -9

2c. Secondary Takeover

We now tell the secondary (Tier2) to become the primary.

## Secondary (Tier2)
hdbnsutil -sr_takeover

2d. Swap Virtual IP to new Primary

Tier2 is now the primary, but queries are still being sent to the now dead Tier1 node.  We used the AWS CLI to swap the VIP from the Tier1 node to Tier2.  The command was generated using the AWS Console, but executing it via the CLI helps avoid manual errors.  Here we are associating a network interface with a private IP.

## Windows, Mac or Linux with AWS Client Tools
aws ec2 associate-address --allocation-id "eipalloc-0b18c02cfc0694674" --network-interface-id "eni-00858248469

2e. Failover Completed

The process is now complete: we have swapped our primary HANA node from Tier1 to Tier2.
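
Steps 2c and 2d are order-sensitive: the VIP should only move once the takeover has succeeded. As a rough sketch of how the two could be chained, the script below wraps each command in a `run` helper that only prints it when `DRY_RUN=1` is set. The function names and the example IDs are placeholders of ours, and it assumes the takeover host has the AWS CLI and credentials available, which was not necessarily the case in our setup.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper. Usage: failover ALLOCATION_ID NETWORK_INTERFACE_ID
failover() {
    # 2c: promote this node (the secondary) to primary; stop if that fails.
    run hdbnsutil -sr_takeover || return 1
    # 2d: only then point the virtual IP at this node's network interface.
    run aws ec2 associate-address \
        --allocation-id "$1" \
        --network-interface-id "$2" \
        --allow-reassociation
}

# Rehearse without touching anything:
#   DRY_RUN=1; failover eipalloc-EXAMPLE eni-EXAMPLE
```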

3. Failback to original configuration

The failback process is similar, but first we need to re-sync our old primary (Tier1) with any changes that have taken place while it was offline.  The names primary and secondary are now very confusing: the actual nodes are reversed, but the roles themselves remain.


3a. Original Primary Down, Secondary Now Primary

We start with just a single active node (Tier2).

3b. Make old Primary Secondary

Before re-starting Tier1, we need to tell it that it is now a secondary node.

## Failed Primary (Tier1)
hdbnsutil -sr_register --force_full_replica --remoteHost=tier2 --remoteInstance=00 --replicationmode=syncmem --name=tier1 --operationMode=logreplay_readaccess

3c. Start new Secondary (old primary)

When Tier1 re-starts it will now sync all changes made during the time it was not running.  We can also verify the status of our system replication configuration.

## Tier1 Failed Primary, becoming a Secondary
HDB start
hdbnsutil -sr_state

3d. Secondary re-sync with Primary

Initially the new secondary will not be available. The time before it becomes operational depends upon the volume of changes while it was off-line. With the re-sync completed we now have 2 nodes as before, but their roles are reversed.
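
Rather than checking by hand, the wait can be scripted as a polling loop. The sketch below is a generic "retry until a given exit code" helper with a hypothetical name. The usage comment assumes that `systemReplicationStatus.py`, run on the current primary, exits with code 15 once all services are in sync; that is our understanding of the script, so please verify both the path and the exit code on your revision.

```shell
# Hypothetical helper: re-run a command until it exits with the wanted code.
# Usage: wait_for_rc WANTED_RC MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the code is seen, 1 if MAX_TRIES is exhausted.
wait_for_rc() {
    wanted=$1; tries=$2; pause=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1
        [ "$?" -eq "$wanted" ] && return 0
        tries=$((tries - 1))
        # Sleep only if another attempt follows.
        [ "$tries" -gt 0 ] && sleep "$pause"
    done
    return 1
}

# Example on the current primary (assumed path and exit code, see above):
#   wait_for_rc 15 60 10 python "$DIR_INSTANCE/exe/python_support/systemReplicationStatus.py" \
#       && echo "secondary is in sync"
```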

3e. Stop Primary

To promote Tier1 back to primary we need to stop the current primary.

## Tier2 (now Primary)
HDB stop

3f. Promote Secondary to Primary

We can now tell Tier1 that it is the primary node.  It will automatically check Tier2 is not active and then take over.

## Tier1 Switching from Secondary to Primary
hdbnsutil -sr_takeover

3g. Swap Virtual IPs

The networking needs to be updated to reflect the changes in our deployment.  We point VIP1 at our new primary and VIP2 back at the stopped primary (soon to become secondary).

## Windows, Mac or Linux with AWS Client Tools
## Switch Primary Virtual IP to Tier1 as Primary Node
aws ec2 associate-address --allocation-id "eipalloc-0b18c02cfc0694674" --network-interface-id "eni-059611a76ccc2c7b4" --allow-reassociation --private-ip-address "" --region us-east-1
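
Before re-associating an address it can be worth checking where it currently points, so that the swap is skipped when nothing needs to change. This is a sketch under the assumption that the VIP is an Elastic IP allocation, as in the command above. The helper names and example IDs are hypothetical; the query uses the standard `describe-addresses` call.

```shell
# Hypothetical helper: print the network interface an allocation is attached to.
# Usage: current_eni ALLOCATION_ID REGION
current_eni() {
    aws ec2 describe-addresses --allocation-ids "$1" --region "$2" \
        --query 'Addresses[0].NetworkInterfaceId' --output text
}

# Pure comparison, kept separate so it can be exercised without AWS access.
# Usage: vip_needs_swap CURRENT_ENI TARGET_ENI  (succeeds if they differ)
vip_needs_swap() {
    [ "$1" != "$2" ]
}

# Example with placeholder IDs:
#   if vip_needs_swap "$(current_eni eipalloc-EXAMPLE us-east-1)" eni-EXAMPLE; then
#       echo "VIP points elsewhere, swap required"
#   fi
```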

3h. Revert Tier2 to Secondary

We now need to tell Tier2 it is a Secondary node again.

## Tier2 revert to Secondary
hdbnsutil -sr_register --force_full_replica --remoteHost=vhcalhdbdb --remoteInstance=00 --replicationmode=syncmem --name=tier2 --operationMode=logreplay_readaccess

3i. Restart and Re-sync Secondary

When we re-start the secondary it will re-sync with the primary.

## Tier2 re-starting as a Secondary
HDB start

3j. Failback completed, both servers are restored.

We finish the process as we began, with 2 HANA servers in an Active-Active configuration.  We can verify all is configured as expected.

## Either HANA Node
hdbnsutil -sr_state
