Thursday 13 February 2020

Data Provisioning in HANA from Various Sources using Python

Why Python 


We have known that Smart Data Access (SDA) is much popular as a Data Provisioning method with SAP HANA to connect to various popular third party sources whether its SQL server, RDBMS etc. You can access data virtually as well as you can persist based on requirements. Flowgraphs plays an important role here.

But most of the times, in Real Life projects, we faced many requirements where there is a need to integrate HANA and do Data provisioning with the sources which are very specific and particular to any process/scenarios such as third party or custom APIs, Web pages, ODATA services, others. In these scenarios, we end up struggling to find ways on how to integrate HANA  with these sources as well as do the regular updates/processing of data.

Here, Python plays an important role as it brings lot of flexibility in integrating multiple sources along with powerful and easy to understand scripts. Also, provides lot of methods/functions to process your data.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Learning, SAP HANA Exam Prep, SAP HANA Prep

Pre-requisites 


In Python, we have multiple methods and so there is different need on what all things need to install.

Such as below

◉ HANA client,

◉ Python any version

◉ Any development environment supported by Python such as PyCharm, Spyder, Jupyter Notebooks etc.

◉ Source details depends upon from where you are loading the data. For example, if we are looking from APIs, then we need complete details like API urls, keys, parameters etc depend upon API nature.

◉ HANA database with proper privileges

Methods in Python for HANA


To integrate different sources of data mentioned, Python provides various inbuilt methods for integration.

As per my experience, Some of them which are popular as mentioned below.

IMPORT DBAPI

IMPORT PYHDB

FROM SQLALCHEMY IMPORT CREATE_ENGINE

Here, in this blog, we will see these methods along with few example of short script. Also, I would share my experience with each methods.

To use all the methods, you need to install specific methods, if it is not there available by default under python dictionary.

DBAPI Method


For Using this method, we need to install HANA Client along with few changes with files and folder provided details below. Sample script below.

IMPORT DBAPI

This will import required methods while executing python scripts

CONNECTION = DBAPI.CONNECT(SERVER, PORT, USER, PASSWORD)

Here you need to provide HANA system details along with credentials, Try to have some background User and password which can be used for all jobs.

CURSOR = CONNECTION.CURSOR()

QUERY = INSERT INTO TABLE VALUES(‘INSERT TEST SUCCESSFUL ‘)

CURSOR. EXECUTE(QUERY)

Under Query variables, you can provide any SQL script which you would like to execute in HANA. you can use above statements to execute any DML/DDL statements.

After above line execution, you can check, data will be inserted successfully into HANA tables.

PYHDB Method

This method is very much similar to DBAPI, only difference which I found using PYHDB is, it has more robust mechanism and also it provides good integration with HANA DB.

Syntax looks similar as DBAPI like below.

IMPORT REQUEST

IMPORT PYHDB

This will import required methods while executing python scripts

CONNECTION = PYHDB.CONNECT(HOST ,PORT ,USER NAME ,PASSWORD)

Here you need to provide HANA system details along with credentials. Try to have some background User and password which can be used for all jobs.

CURSOR = CONNECTION.CURSOR()

QUERY = INSERT INTO TABLE VALUES(‘INSERT TEST SUCCESSFUL ‘)

CURSOR. EXECUTE(QUERY)

Under Query variables, you can provide any SQL script which you would like to execute in HANA. you can use above statements to execute any DML/DDL statements

After above line execution, you can check, data can be inserted successfully into HANA tables.

As per my experience, both the above methods has some limitation due to which we faced couple of challenges while inserting data into HANA. Also it does not have much flexibility to provide more details about HANA like schema name etc. To achieve this, we need to go for some more custom scripts.

Also, these methods include too much SQL statements to process and insert data into HANA along with issues with bulk loads.

FROM SQLALCHEMY IMPORT CREATE_ENGINE Method


As per my experience, this is the most robust method and also provides good experience while using it for multiple scenarios. It also uses the PYHDB method, which makes it more integrated. Below are few of the reasons to use this method.

◉ You can make use of python inbuilt methods like Pandas, Dataframe which is very much effective and useful while dealing with structured data and also supports good volume of data loading

◉ You can reduce unnecessary scripting

◉ Flexibility in defining more details about HANA system like Schema name etc.

◉ Flexibility in dealing with data in-discrepancies

Sample script below

IMPORT PANDAS AS PD

IMPORT REQUESTS

IMPORT JSON

FROM SQLALCHEMY IMPORT CREATE_ENGINE

IMPORT PYHDB

All of the above are required ones for this method as each supports different purpose

RESPONSE = << API Details need to provide like keys etc and capture the response >>

Above statement is very specific to source details, as per source details you need to capture the response and process accordingly and store in any variable like below ‘Table’

TABLE = RESPONSE.JSON()[‘TEST’]

Converting the response in JSON format

DF = PD.DATAFRAME(TABLE)

Using Pandas, Dataframe to make the source data in more readable format

CONNECTION = CREATE_ENGINE(‘HANA+PYHDB://'<<USERNAME>>:<<PASSWORD>>@<<HOSTNAME>>:<<PORT>>)

Provide parameters same as applicable for above other methods

DF.TO_SQL(‘<<HANATABLENAME>>’,CONNECTION,IF_EXISTS = ‘APPEND’,)

Passing Dataframe values to hana tables, here there is no need to write SQL like others, also gives you flexibility to define schemas and other details

After above line execution, you can check, data can be inserted successfully into HANA tables.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Learning, SAP HANA Exam Prep, SAP HANA Prep

Scheduling Python Scripts


In all Data integration scenarios, there is always a need to make the complete load process more robust and without manual intervention.

Python provides you flexibility to schedule its scripts using Windows task scheduler itself.

You can easily schedule any scripts using scheduler by creating Basic Task along with your program details, time etc and monitor also.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Learning, SAP HANA Exam Prep, SAP HANA Prep

No comments:

Post a Comment