Friday 22 February 2019

Machine Learning with SAP HANA

AI and machine learning are the hottest trends in the current IT market. Everyone is talking about them, and customers are adopting these technologies in day-to-day processes. As a result, there is a need for systems that allow these processes to be scaled, governed and kept compliant with current business needs.

As part of digital transformation efforts, customers currently running SAP ERP applications are implementing innovative solutions to enhance operations. These solutions range from robotic process automation (RPA) to machine learning and enhanced analytics, leading to an intelligent ERP (iERP).

This blog assumes that the audience is familiar with SAP HANA technology, both as a database and as an application platform, along with the engines that are available to perform various tasks. The blog covers the use of SAP HANA as a scalable machine learning platform for enterprises.

We will cover the business applications and technical aspects of the following HANA components:
1) PAL – HANA Predictive Analysis Library
2) HANA-R – Integration between HANA and R
3) HANA EML – External Machine Learning Library
4) AFM – Application Function Modeler

Along with the components above, HANA also enables the use of external APIs via the XS engine.

The diagram below covers the components available within the HANA platform which can be utilized for machine learning.

[Diagram: HANA platform components that can be used for machine learning]

SAP HANA Predictive Analysis Library (PAL)

Let’s look at some technical aspects of PAL before turning to its applications. PAL is one of the components of the Application Function Library (AFL) in HANA. It defines functions that can be called from within SQLScript procedures in HANA to run advanced analytic algorithms. SAP provides a host of classic and universal algorithms with PAL. Paired with HANA’s ability to host execution engines and perform calculations locally, in memory and in parallel, PAL provides a unique capability to accelerate machine learning models. PAL is available in the AFL with every HANA license (from HANA 1.0 SPS06 onward) and on the cloud platform. PAL algorithms are divided into 10 categories, including clustering, classification, association and time series functions. All of the PAL procedures can be seen below.

[Figure: PAL procedures grouped by category]
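To make the "called from SQLScript" idea more concrete, here is a minimal Python sketch that only assembles text: the rows of a PAL-style parameter table and a matching CALL statement. It does not connect to HANA. It assumes the parameter table layout (PARAM_NAME, INT_VALUE, DOUBLE_VALUE, STRING_VALUE) and the `_SYS_AFL` schema used on HANA 2.0 systems. The helper names `pal_param` and `build_pal_call`, the table names, the parameter names and the number of output tables are illustrative assumptions that should be checked against the PAL reference for your release.

```python
import re
from typing import NamedTuple, Optional, Sequence, Union

# Simple guard so only plain SQL identifiers end up in the generated text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


class PalParam(NamedTuple):
    """One row of a PAL-style parameter table."""
    name: str
    int_value: Optional[int]
    double_value: Optional[float]
    string_value: Optional[str]


def pal_param(name: str, value: Union[int, float, str]) -> PalParam:
    """Place a setting in the column that matches its Python type."""
    if isinstance(value, bool):
        raise TypeError("pass 0 or 1 instead of a bool")
    if isinstance(value, int):
        return PalParam(name, value, None, None)
    if isinstance(value, float):
        return PalParam(name, None, value, None)
    if isinstance(value, str):
        return PalParam(name, None, None, value)
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def build_pal_call(procedure: str, input_tables: Sequence[str], n_outputs: int) -> str:
    """Return a CALL statement with '?' placeholders for the output tables."""
    for ident in (procedure, *input_tables):
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"not a plain identifier: {ident!r}")
    args = list(input_tables) + ["?"] * n_outputs
    return f"CALL _SYS_AFL.{procedure}({', '.join(args)})"


# Hypothetical k-means setup: table names, parameter names and the
# output count are assumptions, not values taken from this article.
params = [pal_param("GROUP_NUMBER", 3), pal_param("EXIT_THRESHOLD", 1e-6)]
statement = build_pal_call("PAL_KMEANS", ["SALES_DATA", "KMEANS_PARAMS"], 5)
print(statement)
```

The point of the sketch is the shape of the interaction: the algorithm is selected by procedure name, tuned through rows in a parameter table, and fed with tables that already live in HANA, so the data never leaves the database.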

PAL also includes several algorithms that learn and continuously update to enable dynamic predictions, allowing companies the ability to use current data and instantly adapt to the changing conditions and behaviors of their clients.

The goal of HANA PAL is to enable the majority of the most common predictive use cases. Because it is paired with the fast in-memory performance of HANA, many choose it as their predictive tool. However, even with all of the algorithms offered, you may need an external R server for more advanced algorithms. (The HANA-R integration is covered below.)

Use Case

Clients should ideally start with the algorithms that PAL provides out of the box and explore external algorithms only if there is a clear need.

PAL alone is best for those who need the algorithms above, already use SAP HANA and have SAP HANA Studio installed. Those who want more flexibility and customization, such as data scientists and mathematicians, should deploy PAL and R together. Note, however, that PAL requires experience in SQLScript and/or the predictive methods used.

HANA-R Integration 

R is an open source programming language for statistical computing that is widely used for advanced data analysis. R integration enables HANA to consume and execute the open source algorithms available in R. The HANA database recognizes embedded R code and submits the script to an external R server.

The goal of integrating SAP HANA with R is to customize algorithms beyond what the standard libraries offer via PAL. SAP HANA uses an external R environment to execute the R code. The application developer can embed R function definitions and calls within HANA SQLScript and submit the entire code as a database query.

This opens up new possibilities because the full extent of R’s capabilities can be applied to your data in HANA.
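As a rough illustration of embedding R in a database procedure, the Python sketch below only generates the text of a `CREATE PROCEDURE ... LANGUAGE RLANG` statement. It does not talk to HANA or Rserve. The helper `rlang_procedure`, the procedure name, the table types `SALES_T` and `PREDICTION_T`, and the column names are all hypothetical. The convention assumed here is that the input table arrives in R as a data frame named after the IN parameter and that the result is returned by assigning a data frame to the OUT parameter name.

```python
import textwrap


def rlang_procedure(name: str, input_type: str, output_type: str, r_body: str) -> str:
    """Wrap an R script in the DDL for an R-language stored procedure."""
    # Normalize indentation so the R code sits neatly inside BEGIN ... END.
    body = textwrap.indent(textwrap.dedent(r_body).strip(), "  ")
    return (
        f"CREATE PROCEDURE {name}(IN input1 {input_type}, OUT result {output_type})\n"
        "LANGUAGE RLANG AS\n"
        "BEGIN\n"
        f"{body}\n"
        "END;"
    )


# Hypothetical R script: fit a linear model on the input table and
# return one prediction per row in the output data frame.
r_code = """
    fit <- lm(REVENUE ~ AD_SPEND, data = input1)
    result <- data.frame(ID = input1$ID, PREDICTED = predict(fit, input1))
"""

ddl = rlang_procedure("PREDICT_REVENUE", "SALES_T", "PREDICTION_T", r_code)
print(ddl)
```

Once such a procedure exists, it is called like any other database procedure, which is what lets the R logic be submitted as part of a database query.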

Use Case

Integration of R code is suitable when an SAP HANA-based modeling and consumption application or developer wants to use the R environment for specific statistical functions. It allows a developer to use their creativity, choose from thousands of R packages and script some very agile data analysis and predictions.

SAP does not include the R environment with an SAP HANA license, since R is open source and available under the GNU General Public License. Similarly, SAP does not provide support for R. To use the SAP HANA integration with R, you need to download R from the open source community and configure it. You also need Rserve, a TCP/IP server that allows other programs to use the facilities of R without having to initialize R or link against the R library. Note that this integration requires prior knowledge of and expertise with R code.

HANA External Machine Learning Library (EML)

SAP introduced HANA EML with HANA 2.0 SPS02. It gives you the ability to use HANA data for scoring against pre-built machine learning models (scoring is the process of applying a predictive model to a set of data).

What does this mean? In this case HANA is not used for training and building machine learning models. Models are built in a Python environment with TensorFlow, a Python package offered by Google for building complex deep learning models. TensorFlow is not limited to deep learning; it can also be used to build common machine learning models.

How does this work? Build the models (train, test and validate) in TensorFlow, then make them available through TensorFlow Serving (a separate server needed for the HANA EML and TensorFlow integration). Then build the HANA SQLScript procedures that call the models served by TensorFlow Serving and pass in the data for scoring (predicting).
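The split between building a model in one place and scoring with it in another can be sketched in a few lines of plain Python. This is a conceptual illustration only: a least-squares line stands in for a TensorFlow model, and ordinary function calls stand in for the TensorFlow Serving and HANA EML plumbing. The function names and the data are made up for the example.

```python
from typing import List, Sequence, Tuple

Model = Tuple[float, float]  # (slope, intercept)


def train_linear_model(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Training step: fit y = slope * x + intercept by ordinary least squares.

    In the workflow described above this happens outside HANA.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model: Model, rows: Sequence[float]) -> List[float]:
    """Scoring step: apply an already trained model to new data.

    This is the only part that the database side needs to trigger.
    """
    slope, intercept = model
    return [slope * x + intercept for x in rows]


# Train once on historical data, then reuse the frozen model for scoring.
model = train_linear_model([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predictions = score(model, [5.0, 6.0])
print(predictions)
```

The scoring function never sees the training data, only the finished model. That mirrors the EML setup, where HANA supplies the rows to be scored and the served model supplies the predictions.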

Use Case

EML is already included as part of SAP HANA in the AFL, and users can connect TensorFlow Serving to HANA. With HANA EML, TensorFlow models can be integrated into clients’ enterprise applications and business processes.

HANA Application Function Modeler (AFM)

HANA AFM is an extension of SAP HANA Studio used to create flowgraph models (tables, views, procedures, R scripts, PAL functions) without writing SQLScript code. Think of it as a drag-and-drop interface that allows less experienced developers to build complex procedures quickly.

A major advantage of AFM is that it eliminates the need to code PAL and BFL (Business Function Library) algorithms. Another benefit for businesses is that model objects can be checked in and out of development so that multiple users can access and develop simultaneously.

Use Case

AFM is for users who are less experienced in SQLScript but still want to perform predictive analytics using the HANA database. The advantages are that the algorithms are available out of the box, there is no need to send data out of the system, and integration with in-process applications and transactions in SAP is easier.

Key Takeaways

The table below compares the four products’ use cases, skillsets and prerequisites.

[Table: comparison of PAL, HANA-R, EML and AFM by use case, skillset and prerequisites]

PAL and AFM can be used out of the box with HANA Enterprise without any additional licenses, servers or components. This can jump-start clients in building machine learning use cases in SAP. For very complex use cases you can then enable R integration and EML.

In conclusion, both data scientists and business analysts should start their analysis with SAP HANA’s automated predictive capabilities whenever possible. Automated machine learning applies to a growing number of scenarios and produces valid results in seconds or minutes. As a result, those who are not data scientists can now answer their own questions and act quickly on the results, while data scientists and mathematicians gain an automated way of quickly analyzing problems.