Friday 17 March 2017

How to have Predictive Analytics functions [PAL] embedded in sql scripted views and applied on data in HANA Modeling

Here we are going through how we can use the Predictive Analytics functions provided via PAL in HANA in the Scripted Calculated views .

We are following the SAP provided SAP HANA Predictive Analytics Library ( PAL ) guide to do this.

PAL provides a vast business opportunity and immense business value to cater the needs of Analytics domain in projects implemented on SAP HANA platform.


Pre-requisites :


[1] SAP HANA SPS 12

We can check this here:

Go to HANA System installed in HANA Studio/ Eclipse-> Configuration and Mapping -> Open Administration

SAP HANA Tutorials, SAP HANA Materials, SAP HANA Certifications

[2] Application Function Library (AFL) is installed .

Check it by executing these queries  :

SELECT * FROM "SYS"."AFL_AREAS" WHERE AREA_NAME = 'AFLPAL';
SELECT * FROM "SYS"."AFL_PACKAGES" WHERE AREA_NAME = 'AFLPAL';
SELECT * FROM "SYS"."AFL_FUNCTIONS" WHERE AREA_NAME = 'AFLPAL';

[3] Access is granted to the user for the below roles :

AFL__SYS_AFL_AFLPAL_EXECUTE
AFL__SYS_AFL_AFLPAL_EXECUTE_WITH_GRANT_OPTION
AFLPM_CREATOR_ERASER_EXECUTE

[4] Script Server is Active as seen in Administration Section below

SAP HANA Tutorials, SAP HANA Materials, SAP HANA Certifications

Here are the steps we follow to apply a PAL algorithm on a business data in a scripted view in HANA Modeling : 

In simplified sense , below is what we will be doing to apply the PAL functions on business data ; We are however not covering the UI work here:

SAP HANA Tutorials, SAP HANA Materials, SAP HANA Certifications

Business Problem : Group  different materials based on their mutual similarities in their physical attributes for a better material management

Steps in Detail: 


[ In this blog , we created / configured all the needed table types that we define below and tables needed in the schema “VMARADA” ]

[1] Study the business problem and identify the appropriate statistical algorithm suitable to address the requirement .

[2] Go to SAP HANA Predictive Analytics Library ( PAL ) guide to know the programming syntax of the statistical algorithm

[3] Understand the specific input data needs for the PAL algorithm that we want to use and process the input business data we have got into that format before making it get consumed in the PAL procedure

[4] Create a procedure that wraps the PAL function . These are called Wrapper procedures as we are hiding the actual Statistical algorithm being used but just using its interface to consume our business data and output the expected statistical insight .

[5] Create a Scripted view where we Call this Wrapper procedure to apply on the business data we have  to get the predictive insight on that data to get computed.

In this blog , we are seeing how Affinity Propagation algorithm of PAL can be leveraged to derive analytics out of data that is processed in SAP HANA .

What is Affinity Propagation ?


It is a form of Clustering technique which falls in Unsupervised learning category in Statistical Algorithms.

Affinity propagation   simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges by running multiple iterations.

In this example, we  are taking a  data-set on Material with some of the associated physical parameters of Materials like Strength, Stiffness, Hardness, and Ductility etc. for our understanding on using PAL in a scripted view.

SAP HANA Tutorials, SAP HANA Materials, SAP HANA Certifications

This data-set can be via. any HANA Model that is built or from a table.

There are these below pre-requisites to use AF algorithm of PAL on a data-set :

=> No missing or null data in the inputs.

=> The data is numeric, not categorical.

If the initial data-set is not in this format , we need to process the data of the original data-set in our modeling to get it into this format.

In this example data-set we have below , we are going to define the row number  on Material Id to get a unique row number to every record. This will be done in the scripted view that is there later in this blog.

This row number will later help us to link back with original record of data set .

We will represent the data of the data-set to PAL Algorithm on Affinity Propagation via this row number as row number is a numeric form of data.

Below steps [A] , [B] and [C] in sequence are to write the code :

[A] CREATION OF THE NEEDED TABLES/TABLE TYPES

To use the PAL function on this data-set , we have to create few table types .

These are as follows :

[A][1] Source data :

This is the initial data set on which we are going to apply  the PAL algorithm .

We don’t need to create this table/ table type if we already have it via. any other table / HANA model for us.

For our example , I am creating a table for this and then loading it with data via. Flat File

/* SOURCE DATA */

CREATE COLUMN TABLE "VMARADA"."PAL_AP_DATA_MATID" (
"MANDT" NVARCHAR(3) ,
"MATID" NVARCHAR(8),
"STRENGTH" DOUBLE,
"STIFFNESS" DOUBLE,
"HARDNESS" DOUBLE,
"DUCTILITY" DOUBLE
);"

PFA of the file  “PAL_input.csv” where we have the input data .

For our example , We loaded it  into this new table we created with the above code.

[A][2] /* SIGNATURE TABLE FOR THE PREDICTIVE ANALYTICS METHOD*/

This table is needed for the PAL function to work on the input data

[A][2][i] Create the table

CREATE COLUMN TABLE "VMARADA".PAL_MATID_AP_SIGNATURE (
"POSITION" INT,
"SCHEMA_NAME" NVARCHAR(256),
"TYPE_NAME" NVARCHAR(256),
"PARAMETER_TYPE" VARCHAR(7)
);

[A][2][ii]Insert below records into this table:

INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (1,'VMARADA','AP_MATID_SOURCE_TYPE', 'IN');
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (2,'VMARADA','AP_MATID_SEED_TYPE', 'IN');
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (3,'VMARADA','PAL_CONTROL_T', 'IN');
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (4,'VMARADA','AP_MATID_RESULT_TYPE', 'OUT');

[A][3] TABLE TYPES : These are all used in the Wrapper procedure we are going to create for PAL application and in the Scripted code that we call this wrapper procedure to apply on the initial data set:

/* [A][3][1] SOURCE DATA TABLE TYPE */

CREATE TYPE "VMARADA".AP_MATID_SOURCE_TYPE AS TABLE (
ID INTEGER,
STRENGTH DOUBLE,
STIFFNESS DOUBLE ,
HARDNESS DOUBLE,
DUCTILITY DOUBLE
);

/* [A][3][2] Input Seed table Type */

CREATE TYPE "VMARADA".AP_MATID_SEED_TYPE AS TABLE (
ID INTEGER,
SEED INTEGER
);

/*[A][3][3] Result data table type */

CREATE TYPE "VMARADA".AP_MATID_RESULT_TYPE AS TABLE(
ID INTEGER,
RESULT INTEGER
);

/*[A][3][4]Control Table type*/

CREATE TYPE "VMARADA".PAL_CONTROL_T AS TABLE(
NAME VARCHAR(50),
INTARGS INTEGER,
DOUBLEARGS DOUBLE,
STRINGARGS VARCHAR(100)
);

[B] Creating the wrapper procedure to apply PAL

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'AP', 'VMARADA', 'PAL_AP_MATID',
"VMARADA".PAL_MATID_AP_SIGNATURE);

[C] Creating the Scripted view  to process the initial data set to the needed format , calling PAL and outputting the insight derived

HINT : After the script is written , mark the type of the output field CLUSTER in the Semantics section as Attribute as shown below.

SAP HANA Tutorials, SAP HANA Materials, SAP HANA Certifications

Below is the code in the scripted view :


 /********* Begin Procedure Script ************/
 BEGIN
 DECLARE vc "VMARADA".PAL_CONTROL_T ;
DECLARE   vs "VMARADA".AP_MATID_SOURCE_TYPE ;
DECLARE   vseed "VMARADA".AP_MATID_SEED_TYPE ;
DECLARE   vr "VMARADA".AP_MATID_RESULT_TYPE ;
/* COntrol table */
vc = SELECT 'THREAD_NUMBER' AS NAME,
                           2 AS INTARGS ,
                           null AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY
                           UNION ALL
                   SELECT 'MAX_ITERATION' AS NAME,
                           500 AS INTARGS ,
                           null AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY
                           UNION ALL
                   SELECT 'CON_ITERATION' AS NAME,
                           100 AS INTARGS ,
                           null AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY
                           UNION ALL
                   SELECT 'DAMP' AS NAME,
                           null AS INTARGS ,
                           0.9 AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY
                           UNION ALL
                   SELECT 'PREFERENCE' AS NAME,
                           null AS INTARGS ,
                           0.5 AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY
                           UNION ALL
                  SELECT 'DISTANCE_METHOD' AS NAME,
                           2 AS INTARGS ,
                           null AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY
                           UNION ALL
                   SELECT 'CLUSTER_NUMBER' AS NAME,
                           0 AS INTARGS ,
                           null AS DOUBLEARGS ,
                           null AS STRINGARGS
                           FROM DUMMY;
                          
/* Source Table */
                                 
vs0 = SELECT
            ROW_NUMBER() over ( order by MATID ) as ID,
            MANDT ,
            MATID ,
            STRENGTH ,
            STIFFNESS  ,
            HARDNESS ,
            DUCTILITY
            FROM "VMARADA"."PAL_AP_DATA_MATID" ;
           
vs = SELECT ID ,
            STRENGTH ,
            STIFFNESS  ,
            HARDNESS ,
            DUCTILITY
            FROM :vs0 ; 
           
/* Seed table */
vseed = SELECT 0 as ID ,
               0 as SEED 
               FROM DUMMY ;       
vseed = SELECT * FROM :vseed where ID > 0 ; 
/* PAL Procedure call */
CALL "VMARADA"."PAL_AP_MATID"(:vs, :vseed, :vc,:vr) ;         
var_out = SELECT a.MANDT ,
                 a.MATID ,
                 b.RESULT + 1 as CLUSTER ,
                 a.STRENGTH ,
                 a.STIFFNESS  ,
                 a.HARDNESS ,
                 a.DUCTILITY
                 FROM
                 :vs0 a
                 inner join
                 :vr b on
                 a.ID = b.ID ;   
                
var_out = SELECT * FROM :var_out order by to_integer(CLUSTER) asc ;                 
END /********* End Procedure Script ************/

The final output of this scripted view will be as follows where we have Materials clustered into groups based on their affinity with each-other with respect to the physical parameters like Strength, Stiffness, Hardness, and Ductility etc as seen in the attached file  “PAL_OUTPUT.csv”.

Below is a brief screen-shot of this output.

SAP HANA Tutorials, SAP HANA Materials, SAP HANA Certifications

Based on the Column ‘CLUSTER’ here  , we can get the grouping of different materials.

The different materials based on their mutual similarities in their physical attributes get clustered into different clusters.

This can further get consumed in UI for a more interactive insight for further understanding and analysis .

PFA of all the sql syntax and scripted view created .[ Attachment = “PAL_MATNR_AP.txt” ]

No comments:

Post a Comment