Sunday 28 August 2022

Load Data from A Local File into HANA Cloud, Data Lake

Overview

Ever wondered how to load data from a file on your local machine into HANA Cloud, Data Lake?

Then you are in the right place. This blog provides a step-by-step guide to loading data from your local machine into HANA Cloud, Data Lake using the Data Lake File store.

Step-by-step process:

First, provision a Data Lake instance from SAP HANA Cloud Central through the SAP BTP Cockpit. The following tutorial walks through the process – Provision a Standalone Data Lake in SAP HANA Cloud | Tutorials for SAP Developers

If you haven’t had a chance to try out SAP HANA Cloud and its built-in data lake, you can do that now by signing up for the Free trial – https://www.sap.com/cmp/td/sap-hana-cloud-trial.html

(This tutorial teaches you how to provision a standalone Data Lake instance. One can also provision a HANA-managed data lake (HDLRE) – it is just an additional step when provisioning a HANA instance: simply select the data lake option while doing so.) For an overview of provisioning a Data Lake instance in Cloud Central and connecting to it from the SAP HANA Database Explorer, one could also watch the video – HDL_DEMO_pt1.mp4 (sapjam.com).

So, in this blog, we will first learn how to upload a local data file to the HANA Data Lake File store, and then load it from there into a SQL table in HANA Data Lake.

Step 1:

First, download the SAP HANA Data Lake Client. It can be downloaded from Software Downloads – SAP ONE Support Launchpad. Download the most recent build of the Data Lake Client.


Unzip the folder, run the setup.exe installer, and proceed with installing the Data Lake Client on your machine.

Step 2:

To make a secure connection between the HDLFSCLI and a HANA Data Lake file container, a few certificates must be generated. Follow step 3 in Getting Started with Data Lake Files HDLFSCLI | Tutorials for SAP Developers to generate the certificates, and step 4 to add trust and authorizations in your HDL (HANA Data Lake) instance.
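For reference, here is a minimal sketch of the kind of OpenSSL commands the linked tutorial uses to create those certificates (file names and the validity period are illustrative assumptions; follow the tutorial for the exact steps):

# Create a certificate authority (CA) key and self-signed certificate
openssl req -x509 -newkey rsa:2048 -days 365 -sha256 -nodes -keyout ca.key -out ca.crt

# Create a client key and a certificate signing request
openssl req -newkey rsa:2048 -nodes -keyout client.key -out client.csr

# Sign the client certificate with the CA
openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt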

Step 3:

To put files into the file store, we will use a tool known as the HANA Data Lake File Store command line interface, or HDLFSCLI for short.

We will now verify the configuration we did in Step 2.

Note: The following command won't return anything in the output (the file container is still empty at this point).

hdlfscli -cert <PATH>\client.crt -key <PATH>\client.key -cacert <PATH>\ca.crt -k -s https://<REST API Endpoint> -filecontainer <Instance ID> ls

The <REST API Endpoint> and <Instance ID> can be found in the SAP HANA Cloud Central Cockpit. <PATH> is the path to the corresponding certificate.
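For example, with the certificates stored under C:\certs (a hypothetical location; keep your own endpoint and instance ID), the call would look like this:

hdlfscli -cert C:\certs\client.crt -key C:\certs\client.key -cacert C:\certs\ca.crt -k -s https://<REST API Endpoint> -filecontainer <Instance ID> ls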

[Optional]: Create a configuration file to make using the CLI simpler.

Note: The configuration file will be placed in the user's home directory. It is saved as a JSON file that can be modified in any text editor.

hdlfscli -cert <PATH>\client.crt -key <PATH>\client.key -k -s <REST API Endpoint> -config myconfig -dump-config ls

Test the configuration that was just created.

hdlfscli -config myconfig ls

Enter this into your command line to use the connection configuration you created above (I called it myconfig).


The ls command will display the files that are already present in your Data Lake file container.

Step 4:

Upload a file to the SAP HANA Data Lake file container. Ensure you know the path to the data files you wish to load into the Data Lake File container.

To upload a new file, you will need to use the HDLFSCLI upload command:

hdlfscli -config myconfig upload customer_as_pipe.csv customer_demo_csv.csv
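To confirm the upload went through, you can list the container contents again with the same configuration profile:

hdlfscli -config myconfig ls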


You can also upload the file directly into the File Container through the Database Explorer connection. Right click on your Data Lake File Container connection and select the Upload File option.


The next step is to give the Relative Path and browse to the local file. The relative path is the path under which the file will appear in your Data Lake File Container, and the path prefix is '/'. For example, a relative path of /customer_demo_csv.csv places the file at the root of the container.


Step 5:

The following section demonstrates how to load data from the file we just uploaded to the Data Lake File store into a table in HANA Data Lake.

Open a SQL console and make sure you are connected to the HANA Data Lake as the HDLADMIN user.

Let's create a standard table to illustrate how the loading works.

CREATE TABLE "CUSTOMER" (
   C_CUSTKEY            integer                        not null,
   C_NAME               varchar(25)                    not null,
   C_ADDRESS            varchar(40)                    not null,
   C_NATIONKEY          integer                        not null,
   C_PHONE              varchar(15)                    not null,
   C_ACCTBAL            decimal(15,2)                  not null,
   C_MKTSEGMENT         varchar(10)                    not null,
   C_COMMENT            varchar(117)                   not null,
   primary key (C_CUSTKEY)
);
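For context, each row of the pipe-delimited file must match the column order above. A hypothetical line of customer_as_pipe.csv could look like this (sample values for illustration only):

1|Customer#000000001|IVhzIApeRb|15|25-989-741-2988|711.56|BUILDING|regular deposits sleep furiously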

Following is the syntax for loading data into the CUSTOMER table that we just created.

LOAD TABLE CUSTOMER(
   C_CUSTKEY,
   C_NAME,
   C_ADDRESS,
   C_NATIONKEY,
   C_PHONE,
   C_ACCTBAL,
   C_MKTSEGMENT,
   C_COMMENT
)
FROM 'hdlfs:///customer_demo_csv.csv'
DELIMITED BY '|'
FORMAT CSV
ESCAPES OFF;

'hdlfs:///customer_demo_csv.csv' – this indicates the path in the HANA Data Lake File store to the file we just uploaded.
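If the file had been uploaded under a subdirectory instead (say, a hypothetical relative path of /TPCH/customer_demo_csv.csv), the FROM clause would reference that path in the same way:

FROM 'hdlfs:///TPCH/customer_demo_csv.csv'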

Execute the CREATE TABLE and LOAD TABLE statements in the HDLRE and let them run for a while.

To verify that the data has been imported into the table, browse through

Your HDL Connection –> Catalog –> Tables –> CUSTOMER table.

Right click on the table definition and select "Open Data", and you should see the data loaded successfully in the table.
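You can also verify from the SQL console with a couple of quick sanity-check queries (not part of the original steps, just a convenience):

SELECT COUNT(*) FROM CUSTOMER;
SELECT TOP 5 * FROM CUSTOMER;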


Note: The above method shows how to load a file into HANA Data Lake using the HANA Data Lake File store (i.e., loading data from a local file).

One can also load data into the Data Lake from Amazon S3 buckets and Azure Data Lake.

Step 6: Optional

Let's see how one can load data using Amazon S3 buckets (there's not much of a difference).

You need to follow the same steps concerning Provisioning a Data Lake instance in Cloud Central.  

Once that is done, open a SQL console connected to the HANA Data Lake as the HDLADMIN user.

Create the CUSTOMER table:

CREATE TABLE "CUSTOMER" (
   C_CUSTKEY            integer                        not null,
   C_NAME               varchar(25)                    not null,
   C_ADDRESS            varchar(40)                    not null,
   C_NATIONKEY          integer                        not null,
   C_PHONE              varchar(15)                    not null,
   C_ACCTBAL            decimal(15,2)                  not null,
   C_MKTSEGMENT         varchar(10)                    not null,
   C_COMMENT            varchar(117)                   not null,
   primary key (C_CUSTKEY)
);

Once the table is created, run the following load statement to load the data into the Data Lake table. 

BEGIN
LOAD TABLE CUSTOMER(
   C_CUSTKEY,
   C_NAME,
   C_ADDRESS,
   C_NATIONKEY,
   C_PHONE,
   C_ACCTBAL,
   C_MKTSEGMENT,
   C_COMMENT
)
USING FILE 's3://hdl-test-data-eu-central-1/tpch-sf1000/customer.tbl'
DELIMITED BY '|'
FORMAT CSV
ACCESS_KEY_ID '<Access Key>'
SECRET_ACCESS_KEY '<Secret Access Key>'
REGION '<Region>'
ESCAPES OFF
QUOTES OFF;
END;

The above statement will load the data into the table from the test Amazon S3 bucket.

The path it loads from is – 's3://hdl-test-data-eu-central-1/tpch-sf1000/customer.tbl'

ACCESS_KEY_ID, SECRET_ACCESS_KEY, and REGION are parameters of the AWS account the data is pulled from; you need to supply all three to access the bucket.
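For the sample bucket above, the REGION would presumably be 'eu-central-1', as the bucket name suggests; the access key pair comes from AWS IAM credentials that have read access to the bucket.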
