Saturday 4 September 2021

Near Zero Downtime Maintenance

Maintenance windows for SAP systems can be very limited because companies cannot afford extended downtime: some of these systems, such as Supply Chain Management or Customer Relationship Management, are key to production.

Any SAP Basis, OS or infrastructure engineer knows how challenging it can be to complete maintenance on the company's SAP servers successfully and on time. On top of that, maintenance on production hosts has to be carried out outside business hours, which adds to the effort.

Another challenge IT departments often face with the SAP estate is getting buy-in from the business so that the servers can be kept up to date regularly with the recommendations from SAP Notes, security fixes, patches, etc. They often get push-back because the users cannot afford frequent maintenance windows, and the systems risk becoming non-compliant or vulnerable.

Achieving Near Zero Downtime

The solution we will analyze makes it possible to perform maintenance on SAP servers without the users noticing any real disruption and without SAP background jobs being cancelled.

This solution is infrastructure-agnostic, so it is equally valid for servers on premise, on public cloud, on private cloud and on hybrid cloud. The SAP hosts need to run on RHEL and be covered by the RHEL for SAP Solutions subscription, which includes the elements necessary for the implementation (the RHEL HA Add-On and Satellite). The Ansible Automation Platform is also needed, but it is not included in the subscription.

Figure: Logical design

The infrastructure management piece shown in the picture above consists of Red Hat Satellite and Red Hat Ansible Automation Platform. They can be deployed in the same data center or cloud as the SAP servers, or in any other location.

Red Hat Satellite

Red Hat Satellite manages the lifecycle of the SAP hosts and makes sure there is consistency across the SAP estate, with the same level of patches, security fixes, etc., in all the servers.

It takes care of the following aspects:

◉ Content Management. It uses a content repository which is curated prior to its distribution to hosts.

◉ Patch Management. Satellite reports on hosts that need patches, fixes or enhancements and applies them automatically when approved.

◉ Provision Management. It provisions to bare metal, private, public and hybrid clouds and uses Ansible roles to automate post-provisioning steps.

◉ Subscription Management. It centrally manages the subscriptions of all the SAP hosts and keeps track of their subscription consumption.

RHEL High Availability Add-On

The RHEL HA Add-On makes it possible to create clusters on both the DB and the application side. It features lock management, cluster management, fencing mechanisms (STONITH and SBD) and specific resources for ASCS and ERS instances and for all the SAP-supported DBs, including SAP HANA, which is the one used in this solution.

Ansible Automation Platform

Red Hat Ansible Automation Platform is the component that orchestrates the solution, using Ansible playbooks that automate the whole maintenance process (OS kernel upgrade, OS parameter change, package update, security fix application, SAP HANA revision update, SAP HANA parameter change, SAP kernel upgrade, etc.).

It is also the central point from which the whole SAP estate can be managed following the Infrastructure as Code approach, with inventories for the different types of servers, departments in the company, etc. Role Based Access Control adds a very granular layer of security.

Implementation of the solution

All the SAP hosts need to run on RHEL and be registered with the RHEL for SAP Solutions subscription as mentioned earlier. All of them are connected to Satellite and to the Ansible Automation Platform.

Figure: Solution implementation with data flow

If the intervention consists of applying patches or security updates, changing the OS kernel, etc., it is done by Satellite. If it consists of upgrading the SAP HANA revision or version, changing the SAP kernel, or changing OS or DB parameters, it is done with Ansible playbooks written specifically for these purposes.
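The split between Satellite and Ansible playbooks can be pictured as a simple lookup. The following is only an illustrative sketch: the intervention names, the `Tool` enum and the `choose_tool` function are hypothetical and do not correspond to any Satellite or Ansible API; they just restate the rule from the paragraph above.

```python
from enum import Enum


class Tool(Enum):
    SATELLITE = "satellite"
    ANSIBLE_PLAYBOOK = "ansible_playbook"


# Hypothetical labels for the interventions named in the text.
SATELLITE_INTERVENTIONS = {
    "os_patch",
    "security_update",
    "os_kernel_change",
}
PLAYBOOK_INTERVENTIONS = {
    "hana_revision_upgrade",
    "hana_version_upgrade",
    "sap_kernel_change",
    "os_parameter_change",
    "db_parameter_change",
}


def choose_tool(intervention: str) -> Tool:
    """Return which component performs the given intervention."""
    if intervention in SATELLITE_INTERVENTIONS:
        return Tool.SATELLITE
    if intervention in PLAYBOOK_INTERVENTIONS:
        return Tool.ANSIBLE_PLAYBOOK
    # Unknown interventions are rejected rather than guessed at.
    raise ValueError(f"unknown intervention: {intervention!r}")
```

In either case it is Ansible Automation Platform that triggers the work; the lookup only decides whether it delegates to Satellite or runs a dedicated playbook.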

The resources running on the servers that undergo maintenance are clustered with the RHEL HA Add-On: for maintenance on the SAP HANA hosts there is a SAP HANA cluster, and for maintenance on the application servers there is an ASCS/ERS cluster.

This is how the solution works. We take the example of SAP NetWeaver or SAP S/4HANA with a SAP HANA scale-up implementation using SAP HANA System Replication, and consider maintenance on the SAP HANA hosts:

Figure: Steps of the process

1. The virtual IP resource of the cluster, which the application uses to connect to the DB, initially points to the primary SAP HANA node. Ansible Automation Platform triggers the intervention on the secondary SAP HANA node, either from Satellite or, if it is not one of the interventions Satellite performs, by running the playbook that performs the intervention itself.

2. Once the intervention is finished on the secondary SAP HANA node, Ansible Automation Platform triggers the failover of the cluster's virtual IP resource so that it now points to the node that has been maintained. It also promotes the SAP HANA DB on this node to primary and reverses the direction of the SAP HANA System Replication. Thanks to the connectivity suspend feature introduced in SAP NetWeaver 7.40 SP 5, the users do not perceive any disconnection while the cluster resources are failed over and promoted/demoted.

3. Ansible Automation Platform triggers the maintenance on the former primary SAP HANA node. Once it is finished, we can either return to the initial situation by failing the resources back, or keep the current one.
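To make the order of operations concrete, the three steps above can be modelled as a small in-memory simulation. This is a sketch under stated assumptions, not the actual playbooks: `HanaNode`, `HanaCluster`, `rolling_maintenance` and the host names are hypothetical, and no real cluster, Satellite or Ansible calls are made. The one rule it encodes is that maintenance only ever runs on the node that is currently secondary.

```python
from dataclasses import dataclass, field


@dataclass
class HanaNode:
    name: str
    role: str  # "primary" or "secondary"
    maintained: bool = False


@dataclass
class HanaCluster:
    nodes: dict          # node name -> HanaNode
    virtual_ip_on: str   # name of the node the virtual IP points to
    log: list = field(default_factory=list)

    def primary(self) -> HanaNode:
        return next(n for n in self.nodes.values() if n.role == "primary")

    def secondary(self) -> HanaNode:
        return next(n for n in self.nodes.values() if n.role == "secondary")

    def maintain(self, node_name: str) -> None:
        """Simulate the intervention; refuse to touch the primary."""
        node = self.nodes[node_name]
        if node.role == "primary":
            raise RuntimeError("maintenance must run on the secondary node")
        node.maintained = True
        self.log.append(f"maintained {node_name}")

    def takeover(self) -> None:
        """Swap roles, reversing replication, and move the virtual IP."""
        old_primary, new_primary = self.primary(), self.secondary()
        new_primary.role = "primary"
        old_primary.role = "secondary"
        self.virtual_ip_on = new_primary.name
        self.log.append(f"takeover to {new_primary.name}")


def rolling_maintenance(cluster: HanaCluster, fail_back: bool = False) -> HanaCluster:
    cluster.maintain(cluster.secondary().name)  # step 1: secondary first
    cluster.takeover()                          # step 2: fail over and promote
    cluster.maintain(cluster.secondary().name)  # step 3: former primary
    if fail_back:                               # optional return to the start
        cluster.takeover()
    return cluster


if __name__ == "__main__":
    demo = HanaCluster(
        nodes={
            "hana1": HanaNode("hana1", "primary"),
            "hana2": HanaNode("hana2", "secondary"),
        },
        virtual_ip_on="hana1",
    )
    rolling_maintenance(demo)
    print(demo.log)
```

With `fail_back=True` a second takeover returns the virtual IP and the primary role to the original node, which corresponds to the "return to the initial situation" option in step 3.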

Near Zero Downtime interventions are highly desirable: they can save a lot of headaches and money, and they help build trust in IT departments, giving them more freedom in deciding how to manage the SAP host estate.
