SAP HANA Tutorial, Material and Certification Guide

Wednesday 23 March 2016

How to screw up your HANA database in 5 seconds

I like the SAP HANA database. I really do. Writing demanding SQL statements has never been as much fun as it is now that I throw them at SAP HANA. And the database simply answers, really quickly. While the database itself works fine, from time to time I stumble upon strange issues around HANA administration that remind me that SAP HANA is still quite a new database. In certain cases the database is in real danger, so I want to share a perfidious trap with you.

You remember that starting with SAP HANA revision 93, a revision update automatically switched the database from the standalone statisticsserver to the embedded statisticsserver? In theory you could keep the standalone statisticsserver, but I believe no one actually did. So did you ever wonder why the systemOverview.py script gives this irritating warning?

I double-checked this on revision 111. The warning is still there. Now you could say that this is a harmless warning and should be ignored. Since SPS09, running a standalone statisticsserver goes against SAP's clear recommendation. However, what if a less experienced HANA administrator sees this message, takes it seriously and tries to start the standalone statisticsserver anyway?

TL;DR: DO NOT DO THIS!

First of all, SAP has not yet removed the hdbstatisticsserver binary from the IMDB_SERVER.SAR packages. It is still available, even in revision 112.

However, it should not be possible to run it if you use the embedded statisticsserver, right? Starting the standalone statisticsserver in this scenario should result in an error message and no harm done? Well, not quite. So far the topology for my HANA instance looks like this:

And now I screw up my HANA database via one simple command:

Oh no! What have I done? The trace file of this new process shows that it detects the embedded statistics server and disables itself, but only after the topology has already been botched:

[31147]{-1}[-1/-1] 2016-03-22 10:16:36.813528 i StatsServ StatisticsServerStarter.cpp(00081) : new StatisticsServer active. Disabling myself...

[31147]{-1}[-1/-1] 2016-03-22 10:16:36.834024 i StatsServ StatisticsServerStarter.cpp(00096) : new StatisticsServer active. Disabling myself DONE.

[31147]{-1}[-1/-1] 2016-03-22 10:16:36.836820 i assign TREXIndexServer.cpp(01793) : assign to volume 5 finished

So I stop the ominous process asap:

However, in M_SERVICES I still see the "new" service! This is not nice. How do I clean up this mess?
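A leftover entry like this is easy to overlook, so it can help to check for it programmatically. The following is only a minimal sketch: it assumes the rows were already fetched with a query such as `SELECT HOST, PORT, SERVICE_NAME, ACTIVE_STATUS FROM M_SERVICES` and handed over as dictionaries. The helper name `inactive_services` and the sample rows (including the port numbers) are made up for illustration.

```python
from typing import Dict, List


def inactive_services(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the M_SERVICES rows whose ACTIVE_STATUS is anything but 'YES'."""
    return [
        row for row in rows
        if row.get("ACTIVE_STATUS", "").upper() != "YES"
    ]


# Hypothetical sample rows: the host name is the one used in this post,
# the ports and status values are invented for illustration.
sample_rows = [
    {"HOST": "eahhan01", "PORT": "31003",
     "SERVICE_NAME": "indexserver", "ACTIVE_STATUS": "YES"},
    {"HOST": "eahhan01", "PORT": "31005",
     "SERVICE_NAME": "statisticsserver", "ACTIVE_STATUS": "NO"},
]

for row in inactive_services(sample_rows):
    print(row["HOST"], row["SERVICE_NAME"], row["ACTIVE_STATUS"])
```

With the sample data, only the statisticsserver row is reported. Such a check merely makes the stale entry visible; it does not repair anything.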

This is not just a cosmetic issue. Important systems are protected by HANA system replication. Now this new (but inactive) service breaks the system replication! This is really bad:

How can we fix the system replication? Let's try the obvious way on the secondary site:

HDB stop

hdbnsutil -sr_unregister

hdbnsutil -sr_register --name=site2 --mode=sync --remoteHost=eahhan01 --remoteInstance=10

HDB start

The procedure seems to work. Unfortunately it does not really reinitialize the replication, because when I try a takeover I get this error:

I cannot even perform a backup on the primary site, because that stupid statisticsserver is not active. Dang!

If you have been curious and screwed up your crash&burn instance, you can try to fix the situation with commands like these. Proceed at your own risk:

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini','host','eahhan01') UNSET ('statisticsserver','instances') WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION ('topology.ini','system') UNSET ('/host/eahhan01','statisticsserver') WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION ('topology.ini','system') UNSET ('/volumes','5') WITH RECONFIGURE
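The three statements hard-code my host name and the volume number 5, which appears to be the volume the stray process was assigned to in the trace above ("assign to volume 5 finished"). As a rough sketch, and purely as an assumption about how you might adapt them, the hypothetical helper below builds the same statements for another host and volume. It only produces strings and executes nothing against the database.

```python
from typing import List


def cleanup_statements(host: str, volume_id: int,
                       service: str = "statisticsserver") -> List[str]:
    """Build the three cleanup statements from this post for a given host and volume."""
    # Refuse quotes so the values cannot break out of the SQL string literals.
    for value in (host, service):
        if "'" in value:
            raise ValueError("quotes are not allowed in host or service names")
    suffix = "WITH RECONFIGURE"
    return [
        # Remove the service entry from the host-specific daemon.ini layer
        f"ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini','host','{host}') "
        f"UNSET ('{service}','instances') {suffix}",
        # Remove the service from the host section of the topology
        f"ALTER SYSTEM ALTER CONFIGURATION ('topology.ini','system') "
        f"UNSET ('/host/{host}','{service}') {suffix}",
        # Remove the volume that was assigned to the stray service
        f"ALTER SYSTEM ALTER CONFIGURATION ('topology.ini','system') "
        f"UNSET ('/volumes','{volume_id}') {suffix}",
    ]


for statement in cleanup_statements("eahhan01", 5):
    print(statement)
```

Called with `"eahhan01"` and `5`, it prints exactly the statements shown above. Take the host and volume values from your own trace file and topology rather than copying mine.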

For more details, have a look at SAP notes 1697613, 2222249, 1950221.

Now the Python script shows that the system replication looks fine again:

IMPORTANT: Never rely solely on the output of this check script or on what HANA studio shows you about system replication. I recommend testing a takeover after every change to the topology. It can happen that all lights are green and the takeover nevertheless fails after a topology change.

Source: scn.sap.com
