Friday 17 April 2020

Creating a HANA (Cloud Foundry) Connection with SAP Data Intelligence and Applying Random Forest

Data is one of the most important assets of any enterprise, so its exploration and analysis are crucial.

SAP Data Intelligence is a very powerful tool that lets you do this kind of complex processing on your data.

What is SAP Data Intelligence and how does it relate to Data Hub?

In this blog, you will connect a HANA database-as-a-service with SAP Data Intelligence, explore the data via the Metadata Explorer, and apply a Random Forest classifier to it.

For this you will need a HANA database-as-a-service running on SAP Cloud Platform (Cloud Foundry) and a running instance of SAP Data Intelligence.

So let's get started.

Open the SAP Cloud Platform Cockpit, navigate to your global account, then to the subaccount, and finally to the space where your HANA instance is running, and open the HANA dashboard.


Click on Edit and then allow all IP addresses; this makes sure your SAP Data Intelligence instance can access the HANA instance.


It's time to log in to your SAP Data Intelligence tenant, navigate to Connection Management, and create a connection of type HANA_DB.


User, Password – the username and password for logging in to the HANA database

Host, Port – the direct SQL connectivity host and port, which can be found on the HANA DB dashboard from the step above
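
Before saving the connection, you can verify these details from any Python environment with a quick hdbcli check. This is a minimal sketch; the host, port, user and password below are placeholders you replace with the values from your HANA dashboard.

from hdbcli import dbapi

# placeholder connection details - use the direct SQL host/port from the HANA dashboard
conn = dbapi.connect(
    address="<direct-sql-host>",
    port=21015,  # example port; take the real one from the dashboard
    user="<user>",
    password="<password>",
    encrypt='true',
    sslValidateCertificate='false'
)
print(conn.isconnected())  # True means the details are correct
conn.close()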


Now we are going to create a Jupyter notebook.

For the analysis, my database table looks like this (an excerpt; Purchased is the 0/1 target column, and some of its values are omitted here):

User ID  Gender  Age  Salary  Purchased
1        Male    19   19000
2        Male    25   24000
3        Male    36   25000
4        Female  37   87000
5        Female  29   89000
6        Female  27   90000   1

For the analysis I will use only the Age and Salary columns to predict the Purchased column.
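
In case the table does not exist yet, it can be created and filled from an hdbcli connection like the one in the connectivity check above; a sketch is below. The column names AGE, SALARY and PURCHASED match the keys the pipeline reads later, while USER_ID, GENDER and the data types are assumptions you should adjust to your actual schema.

cursor = conn.cursor()
# table layout assumed from the excerpt above
cursor.execute("""
    CREATE COLUMN TABLE ML_TEST.PURCHASE (
        USER_ID   INTEGER,
        GENDER    NVARCHAR(10),
        AGE       INTEGER,
        SALARY    INTEGER,
        PURCHASED INTEGER
    )
""")
# insert one sample row (parameterized to avoid quoting issues)
cursor.execute("INSERT INTO ML_TEST.PURCHASE VALUES (?, ?, ?, ?, ?)",
               (6, 'Female', 27, 90000, 1))
conn.commit()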

Now open a Jupyter notebook from the ML Scenario Manager and install these libraries one by one:

pip install scikit-learn
pip install hdbcli
pip install matplotlib

Code for the Jupyter notebook (note: if you have any library missing, install it using the step above).

Two things to configure:

1. HANA connection ID – line 2
2. Table name (Schema.TableName) – line 13

import notebook_hana_connector.notebook_hana_connector
di_connection = notebook_hana_connector.notebook_hana_connector.get_datahub_connection(id_="hana") # enter the ID of the connection
from hdbcli import dbapi
conn = dbapi.connect(
    address=di_connection["contentData"]['host'],
    port=di_connection["contentData"]['port'],
    user=di_connection["contentData"]['user'],
    password=di_connection["contentData"]["password"],
    encrypt='true',
    sslValidateCertificate='false'
)
cursor = conn.cursor()
path="ML_TEST.PURCHASE" # enter the table name (Schema.TableName)
sql = 'SELECT * FROM '+path
cursor.execute(sql)
X=[]
y=[]
for row in cursor:
    d_r=[]
    # the table has 5 columns: row[2]=Age, row[3]=Salary, row[4]=Purchased
    d_r.append(row[2])
    d_r.append(row[3])
    y.append(row[4])
    X.append(d_r)
    
    
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred.tolist())
print(cm)
arrx=np.array(X_train)
y_set=np.array(y_train)

from matplotlib.colors import ListedColormap
X1, X2 = np.meshgrid(np.arange(start = arrx[:, 0].min() - 1, stop = arrx[:, 0].max() + 1, step = 0.1),
                     np.arange(start = arrx[:, 1].min() - 1, stop = arrx[:, 1].max() + 1, step = 1000))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(arrx[y_set == j, 0], arrx[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.xlabel('Age')
plt.ylabel('Salary')
plt.legend()
plt.show()

You should be able to view the results in a graph.
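
If you also want a single headline number from the notebook, the accuracy can be derived from the confusion matrix printed above; a small sketch:

# the diagonal of cm holds the correctly classified samples
accuracy = np.trace(cm) / np.sum(cm)
print("accuracy:", accuracy)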


Now let us create a pipeline from the ML Scenario Manager for creating the model.

First, let us create a pipeline from the template Python Producer (there are some changes in the components) to get the data from HANA.


1. Constant Generator – to feed in the SQL query; please see the configuration below. In this case the query is:

SELECT * FROM ML_TEST.PURCHASE

2. HANA Client – to connect to HANA. Things to note: Connection and Table name; and if you scroll down, set the Column headers option to None.


3. JS Operator – to extract only the body of the message, i.e. the rows:

$.setPortCallback("input",onInput);

function isByteArray(data) {
    switch (Object.prototype.toString.call(data)) {
        case "[object Int8Array]":
        case "[object Uint8Array]":
            return true;
        case "[object Array]":
        case "[object GoArray]":
            return data.length > 0 && typeof data[0] === 'number';
    }
    return false;
}

function onInput(ctx,s) {
    var msg = {};

    var inbody = s.Body;
    var inattributes = s.Attributes;

    // convert the body into string if it is bytes
    if (isByteArray(inbody)) {
        inbody = String.fromCharCode.apply(null, inbody);
    }

    msg.Attributes = {};
    msg.Body = inbody;
   

    $.output(msg.Body);
}
4. To String Converter – use the inInterface for sending the data from the JS operator to the Python operator.

Python file for training the model and saving it:

# Example Python script to perform training on input data & generate Metrics & Model Blob
def on_input(data):
    import json

    dataset = json.loads(data)

    # build the feature matrix (Age, Salary) and the target vector (Purchased)
    X=[]
    y=[]
    for j in dataset:
        x_temp=[]
        x_temp.append(j["AGE"])
        x_temp.append(j["SALARY"])
        y.append(j["PURCHASED"])
        X.append(x_temp)

    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

    # Fitting Random Forest Classification to the Training set
    from sklearn.ensemble import RandomForestClassifier
    classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Making the Confusion Matrix
    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_test, y_pred.tolist())

    # to send metrics to the Submit Metrics operator, create a Python dictionary of key-value pairs
    metrics_dict = {"confusion matrix": str(cm)}

    # send the metrics to the output port - the Submit Metrics operator will use this to persist the metrics
    api.send("metrics", api.Message(metrics_dict))

    # create & send the model blob to the output port - the Artifact Producer operator will use this to persist the model and create an artifact ID
    import pickle
    model_blob = pickle.dumps(classifier)
    api.send("modelBlob", model_blob)

api.set_port_callback("input", on_input)
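
Outside of Data Intelligence, the training logic above can be smoke-tested locally by faking the api object and the input message. Everything in this sketch (the StubAPI class and the sample records) is a made-up test harness, not part of the DI runtime; define the stub before pasting the script so its last line resolves.

import json

class StubAPI:
    """Minimal stand-in for the Data Intelligence api object (local testing only)."""
    class Message:
        def __init__(self, body, attributes=None):
            self.body, self.attributes = body, attributes or {}
    @staticmethod
    def send(port, value):
        print("->", port, str(value)[:80])
    @staticmethod
    def set_port_callback(port, callback):
        pass

api = StubAPI()

# sample records in the shape the JS operator forwards (the PURCHASED values are invented)
sample = json.dumps([{"AGE": 19, "SALARY": 19000, "PURCHASED": 0},
                     {"AGE": 27, "SALARY": 90000, "PURCHASED": 1}] * 4)
on_input(sample)  # prints what goes to the metrics and modelBlob ports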

Wiretaps have been used to check the output; you may skip those blocks.

For running the pipeline, you may need a Dockerfile.

Content of the Dockerfile:

FROM python:3.6.4-slim-stretch

RUN pip install tornado==5.0.2
RUN python3.6 -m pip install numpy==1.16.4
RUN python3.6 -m pip install pandas==0.24.0
RUN python3.6 -m pip install scikit-learn

RUN groupadd -g 1972 vflow && useradd -g 1972 -u 1972 -m vflow
USER 1972:1972
WORKDIR /home/vflow
ENV HOME=/home/vflow

Now create tags for the Dockerfile (a custom tag blogFile is created); tag your Python operator with this tag as well. Then build the Dockerfile.


Now we can run the pipeline and store the artifact (please provide a name).


Now we have to create another pipeline to expose the model as an API, so that it can be consumed. For this case, use the template Python Consumer.


As done in the step above, tag the Python operator and update the script:

import json
import io
import numpy as np
import pickle

# Global vars to keep track of model status
model = None
model_ready = False

# Validate input data is JSON
def is_json(data):
  try:
    json_object = json.loads(data)
  except ValueError as e:
    return False
  return True

# When Model Blob reaches the input port
def on_model(model_blob):
    global model
    global model_ready
    
    model = pickle.loads(model_blob)
    model_ready=True
   

# Client POST request received
def on_input(msg):
    error_message = ""
    success = False
    try:
        api.logger.info("POST request received from Client - checking if model is ready")
        if model_ready:
            api.logger.info("Model Ready")
            api.logger.info("Received data from client - validating json input")
            
            user_data = msg.body.decode('utf-8')
            # Received message from client, verify json data is valid
            if is_json(user_data):
                api.logger.info("Received valid json data from client - ready to use")
              

                # obtain your results
                feed = json.loads(user_data)
                data_to_predict = np.array(feed['data'])
                api.logger.info(str(data_to_predict))
                
                # check path
                prediction = model.predict(data_to_predict)
                prediction = (prediction > 0)

                success = True
            else:
                api.logger.info("Invalid JSON received from client - cannot apply model.")
                error_message = "Invalid JSON provided in request: " + user_data
                success = False
        else:
            api.logger.info("Model has not yet reached the input port - try again.")
            error_message = "Model has not yet reached the input port - try again."
            success = False
    except Exception as e:
        api.logger.error(e)
        error_message = "An error occurred: " + str(e)
    
    if success:
        # apply carried out successfully, send a response to the user
        result = json.dumps({'Results': str(prediction)})
    else:
        result = json.dumps({'Error': error_message})
    
    request_id = msg.attributes['message.request.id']
    response = api.Message(attributes={'message.request.id': request_id}, body=result)
    api.send('output', response)

api.set_port_callback("model", on_model)
api.set_port_callback("input", on_input)

Now you can deploy the pipeline. Once it is done, you will get a URL which you can use for testing your model; make sure to append /v1/uploadjson/ to the URL.

Deployment of the pipeline can take a while.

By POSTing data, you can test the model.

Headers of the call; Authorization is HTTP Basic with your username and password:

[{"key":"X-Requested-With","value":"XMLHttpRequest","description":""},{"key":"Authorization","value":"Add your authentication here":""},{"key":"Content-Type","value":"application/json","description":""}]

Body of the request, containing Age and Salary:

{
  "data": [[47, 25000]]
}
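
Alternatively, the same call can be made with a short Python requests sketch; the deployment URL and the credentials below are placeholders, and the tenant\user form for Basic authentication is the usual Data Intelligence convention:

import requests

# placeholders - replace with your deployment URL and credentials
url = "<deployment-url>" + "/v1/uploadjson/"
headers = {
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/json",
}
body = {"data": [[47, 25000]]}

resp = requests.post(url, json=body, headers=headers,
                     auth=("<tenant>\\<user>", "<password>"))  # HTTP Basic auth
print(resp.status_code, resp.text)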

