Wednesday 28 December 2022

Accessing SAP HANA Cloud, data lake Files from Python

Overview:


In this blog, we will learn how to use the SAP HANA Cloud, data lake Files REST API to create/write, access/read, and list your files from a Python script. The REST API reference documentation (SAP HANA Cloud, Data Lake Files REST API) describes the endpoints that can be used to access the file containers of the SAP HANA data lake; the Python demonstrations that follow use only some of the most common ones. We will use Python’s http.client module to send an HTTP request, check the response status, and read the response body, using request methods such as GET, POST, PUT, and DELETE. Let’s get started.

Step 1: Making an HTTP connection to the data lake

Copy your client.key and client.crt into the directory where your Python script is placed (the home directory of the Jupyter Notebook).

The first step is to import the http.client package for making HTTP requests and to set some commonly reused variables for the API calls. Add the following at the top of your Python script and populate the variables with the proper information for your SAP HANA data lake file container.

We use the HTTP client to get a response and a status from the URL (i.e., the Files REST API).

The client certificate and client key are validated when the connection is made, so I recommend moving both files into the same directory as your script and using relative paths.

Use “./<certname.crt>” as the file path of the certificate when it is in the same directory as your script.

Use “./<keyname.key>” as the file path of the key when it is in the same directory as your script.
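As a minimal sketch of this setup, the variables and the connection could look like the following. The placeholder values match the full listing at the end of this post; the helper names make_connection and request_headers are assumptions introduced here for illustration only. The client certificate is loaded through an ssl.SSLContext because the key_file and cert_file arguments of http.client.HTTPSConnection are no longer available from Python 3.12 onward.

```python
import http.client
import ssl

# Placeholders: replace with the values for your own file container.
FILES_REST_API = "<REST API ENDPOINT>"  # host name only, without "https://"
CONTAINER = "<INSTANCE ID>"
CRT_PATH = "./client.crt"
KEY_PATH = "./client.key"


def make_connection(host=FILES_REST_API, crt_path=CRT_PATH, key_path=KEY_PATH):
    """Hypothetical helper: HTTPS connection that presents the client certificate."""
    context = ssl.create_default_context()
    # Raises an error if the certificate or key file is missing or unreadable.
    context.load_cert_chain(certfile=crt_path, keyfile=key_path)
    # No network traffic happens here; the socket opens on the first request.
    return http.client.HTTPSConnection(host, port=443, context=context)


def request_headers(container=CONTAINER, content_type="application/json"):
    """Hypothetical helper: headers that select the file container for a call."""
    return {"x-sap-filecontainer": container, "Content-Type": content_type}
```

Calling make_connection() once and reusing the returned object for several requests avoids loading the certificate for every call.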


Step 2: Write/Create a file to the data lake File Store

To write/create a file in the data lake File Store, we will use the PUT request method supported by HTTP.

PUT HTTP Method

◉ PUT requests are used to change data on the server. A PUT request replaces the entire content at a specific location with the data from the body payload. If no resource matches the request, one is created.

◉ The PUT method requests that the enclosed entity be stored under the supplied URI. If the URI refers to an already existing resource, that resource is modified; if the URI does not point to an existing resource, the server can create the resource with that URI.

The following code sets up the API call to the CREATE endpoint and will upload a file to the folder specified in your SAP HANA data lake File Store.
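The call can be sketched as a small function. This is an illustration based on the CREATE request in the full listing at the end of this post, not a definitive implementation: the function name create_file and its parameters are assumptions, and conn stands for an http.client.HTTPSConnection that was set up with your client certificate as described in Step 1.

```python
def create_file(conn, container, remote_path, data):
    """PUT `data` to `remote_path` with op=CREATE and return the HTTP status code."""
    # Stripping the leading slash avoids a doubled "//" in the URL
    # (a choice made for this sketch).
    url = "/webhdfs/v1/" + remote_path.lstrip("/") + "?op=CREATE&data=true"
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/octet-stream",
    }
    conn.request(method="PUT", url=url, body=data, headers=headers)
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    return response.status
```

For example, create_file(conn, CONTAINER, '/test/MYFIRSTAPIFILE', 'Welcome to the SAP blog about Accessing SAP HANA Cloud, data lake Files from Python') would send the same request as the listing. Returning the status code lets the caller decide how to react to a failed upload.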


The above code will create a file named “MYFIRSTAPIFILE” under the “test” directory of the data lake File Store. The content of the file is the string assigned to the variable file, i.e., “Welcome to the SAP blog about Accessing SAP HANA Cloud, data lake Files from Python”.

DBX Screenshot


Step 3: Access/Read the file that was created above in the data lake File Store

To access/read a file from the data lake File Store, we will use the GET request method supported by HTTP.

The GET method sends a GET request to the specified URL. GET is the most common request method and is used to obtain the requested data from the server.

Here, the GET request returns the contents of the file named in the file path (f_path).

The following code sets up the API call to the OPEN endpoint and will print your file contents.
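A hedged sketch of such a call is shown below. The function name read_file is an assumption made for illustration, conn stands for an http.client.HTTPSConnection configured with your client certificate, and the headers mirror the GET requests in the full listing at the end of this post. Treating every 2xx status as success and decoding the body as UTF-8 are choices of this sketch.

```python
def read_file(conn, container, remote_path):
    """GET `remote_path` with op=OPEN and return the file content as text."""
    url = "/webhdfs/v1/" + remote_path.lstrip("/") + "?op=OPEN"
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/json",
    }
    conn.request(method="GET", url=url, body=None, headers=headers)
    response = conn.getresponse()
    body = response.read()
    # Fail loudly instead of returning an error document as file content.
    if not 200 <= response.status < 300:
        raise RuntimeError(f"OPEN failed: {response.status} {response.reason}")
    return body.decode("utf-8")
```

With the file created in Step 2, print(read_file(conn, CONTAINER, '/test/MYFIRSTAPIFILE')) would print the text that was uploaded.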


Output will be:


One can also download the file from DBX and open it in Notepad to view its content.


Step 4: Print the list of directories/files in your data lake File Store

The GET method is also used to list the directories and the files present within them.

Set the file path (f_path) to “/” so that the GET request starts at the root and fetches the directories and files that are present in the data lake File Store.

The following code sets up the API call to the LISTSTATUS endpoint and will print the list of directories and files in your SAP data lake File Store.

It will also display the type of each entry, i.e., whether it’s a file or a directory.
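The listing can be turned into (name, type) pairs with a short function like the one below. This is a sketch under a stated assumption: it expects the response body to follow the WebHDFS-style JSON layout, in which the entries sit under "FileStatuses" and "FileStatus" and each entry carries "pathSuffix" and "type" fields. Check one raw response from your own file container before relying on these field names. The function name list_status is hypothetical, and conn stands for an http.client.HTTPSConnection configured with your client certificate.

```python
import json


def list_status(conn, container, remote_path="/"):
    """GET `remote_path` with op=LISTSTATUS and return a list of (name, type) pairs."""
    url = "/webhdfs/v1/" + remote_path.lstrip("/") + "?op=LISTSTATUS"
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/json",
    }
    conn.request(method="GET", url=url, body=None, headers=headers)
    response = conn.getresponse()
    payload = json.loads(response.read())
    # Assumed layout: {"FileStatuses": {"FileStatus": [{...}, ...]}}
    entries = payload.get("FileStatuses", {}).get("FileStatus", [])
    return [(entry.get("pathSuffix"), entry.get("type")) for entry in entries]
```

Looping over the result, for example with for name, kind in list_status(conn, CONTAINER): print(kind, name), prints one line per entry together with its type.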


The output will be:


The following code sets up the API call to the LISTSTATUS_RECURSIVE endpoint and will print the list of directories and files in your SAP HANA Cloud, data lake File storage.
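Because a recursive listing can be long, it helps to pretty-print it rather than print the raw bytes. The sketch below assumes only that the response body is valid JSON; it makes no assumption about the field names inside it. The function name list_recursive is hypothetical, and conn stands for an http.client.HTTPSConnection configured with your client certificate.

```python
import json


def list_recursive(conn, container, remote_path="/"):
    """GET `remote_path` with op=LISTSTATUS_RECURSIVE and return indented JSON text."""
    url = "/webhdfs/v1/" + remote_path.lstrip("/") + "?op=LISTSTATUS_RECURSIVE"
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/json",
    }
    conn.request(method="GET", url=url, body=None, headers=headers)
    response = conn.getresponse()
    raw = response.read()
    # Re-serialize with indentation so nested directories are easy to read.
    return json.dumps(json.loads(raw), indent=2)
```

Printing the returned string shows each nesting level on its own indented lines.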


Output:


Step 5: Delete an entire directory and its file contents from your data lake File Store

The DELETE method is used to delete an entire directory and the files it contains within a data lake File Store.

The following code sets up the API call to the DELETE endpoint and will delete the directory or file named in the file path from your SAP data lake File Store.

Set f_path to the exact path of the file or directory you wish to delete.

Please see the below code sample.
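As an illustration, the DELETE call can be wrapped in a function that refuses an empty path, so that a typo cannot target the root of the file container. The function name delete_path and the guard are assumptions of this sketch and are not part of the REST API; conn stands for an http.client.HTTPSConnection configured with your client certificate.

```python
def delete_path(conn, container, remote_path):
    """Send DELETE with op=DELETE for `remote_path`; return (status, body bytes)."""
    cleaned = remote_path.strip("/")
    if not cleaned:
        # Safety guard added in this sketch: never send a delete for "/" or "".
        raise ValueError("refusing to delete the root of the file container")
    url = "/webhdfs/v1/" + cleaned + "?op=DELETE"
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/json",
    }
    conn.request(method="DELETE", url=url, body=None, headers=headers)
    response = conn.getresponse()
    body = response.read()
    return response.status, body
```

For example, delete_path(conn, CONTAINER, '/test/MYFIRSTAPIFILE') targets the file created in Step 2.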


The above code will delete the “test” directory as well, since “MYFIRSTAPIFILE” is the only file inside the “test” folder.


Now let us create another file within the test folder and run the DELETE operation.


The above code block will create a second file, “MYSECONDAPIFILE”, in the same “test” directory.


The above code will read “MYSECONDAPIFILE”, and its output is displayed.

DBX Screenshot before the DELETE operation:


The following code will delete the “MYSECONDAPIFILE” that was created in the file container.


DBX Screenshot:


The entire code for accessing SAP HANA Cloud, data lake Files from Python

import http.client
import ssl

FILES_REST_API = '<REST API ENDPOINT>'
CONTAINER = '<INSTANCE ID>'
CRT_PATH = './client.crt'
KEY_PATH = './client.key'

# The key_file/cert_file arguments of HTTPSConnection are not available from
# Python 3.12 onward, so the client certificate and key are loaded into an
# SSL context that every connection below reuses.
context = ssl.create_default_context()
context.load_cert_chain(certfile=CRT_PATH, keyfile=KEY_PATH)

# -- Write/Create a directory and a file within that directory in the data lake File Store

place = '/test/'
file_name = 'MYFIRSTAPIFILE'
file = 'Welcome to the SAP blog about Accessing SAP HANA Cloud, data lake Files from Python'
request_url = '/webhdfs/v1/' + place + file_name + '?op=CREATE&data=true'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/octet-stream'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="PUT", url=request_url, body=file, headers=request_headers)
response = conn.getresponse()
print(response.status, response.reason)
response.close()
conn.close()

# -- Will print the list of directories/files in your data lake File Store

f_path = '/'
request_url = f'/webhdfs/v1/{f_path}?op=LISTSTATUS_RECURSIVE'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/json'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="GET", url=request_url, body=None, headers=request_headers)
response = conn.getresponse()
print(response.read())
response.close()
conn.close()

# -- Will delete the file or directory named in f_path (here, a single file)

f_path = '/test/MYSECONDAPIFILE'
request_url = f'/webhdfs/v1/{f_path}?op=DELETE'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/json'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="DELETE", url=request_url, body=None, headers=request_headers)
response = conn.getresponse()
print(response.read())
response.close()
conn.close()

# -- Will upload a local file to the data lake File Store

local_path = 'C:/Users/I567343/OneDrive - SAP SE/Documents/REST API Blog/Orders'
# Target path inside the file container (an assumed example; choose your own).
f_path = '/test/Orders'
request_url = f'/webhdfs/v1/{f_path}?op=CREATE&data=true'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/octet-stream'
}
# A local path cannot serve as the request URL, so the file is read from disk
# and its bytes are sent as the request body, as in the CREATE call above.
with open(local_path, 'rb') as local_file:
    file_bytes = local_file.read()
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="PUT", url=request_url, body=file_bytes, headers=request_headers)
response = conn.getresponse()
print(response.read())
response.close()
conn.close()
