Saturday 5 December 2020

Track ISS with SAP Data Intelligence and SAP HANA: Ingest and transform data

The purpose of this post is purely educational. Using SAP Data Intelligence and SAP HANA for what we will achieve is overkill. But learning how to build data pipelines while having fun and learning something about space (which we geeks all love) should be reasonable.

SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Study Material, SAP HANA Career

This selfie in a NASA astronaut’s suit is as far into space as I have been so far

To the point. This example is used as a demo in session INT105 – Build Data Pipelines with SAP Data Intelligence at SAP TechEd 2020. You are more than welcome to join that session to watch the demo, but it is not a prerequisite for following this post.

The scenario


We want to collect data about the locations on the Earth’s globe that are directly beneath the International Space Station (ISS). This will allow us to plot these locations and get a visualization like the one below.

We will use an API https://isstracker.spaceflight.esa.int/tledata.txt as the source of the ISS location. It returns a TLE (Two-line Element) record “encoding a list of orbital elements of an Earth-orbiting object for a given point in time, the epoch.” (source: Wikipedia)
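
To make the record format more concrete, here is a minimal sketch of reading a few fixed-width fields from a TLE record in plain Python. The sample record is a widely published example for the ISS, not output captured from the API above, and the names `SAMPLE_TLE`, `checksum_ok` and `parse_tle` are made up for this illustration. The Skyfield package used later in this post does the real parsing for us.

```python
# Sample three-line record for the ISS (a widely published example,
# NOT captured from the API above): a name line plus two element lines.
SAMPLE_TLE = """ISS (ZARYA)
1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927
2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537
"""


def checksum_ok(line: str) -> bool:
    """Validate the modulo-10 checksum stored in column 69 of a TLE line.

    Digits count at face value, each minus sign counts as 1,
    and every other character counts as 0.
    """
    total = sum(
        int(ch) if ch.isdigit() else (1 if ch == "-" else 0)
        for ch in line[:68]
    )
    return total % 10 == int(line[68])


def parse_tle(record: str) -> dict:
    """Extract a few fixed-width fields from a three-line TLE record."""
    name, line1, line2 = record.strip().splitlines()
    if not (checksum_ok(line1) and checksum_ok(line2)):
        raise ValueError("TLE checksum mismatch")
    return {
        "name": name.strip(),
        "catalog_number": int(line1[2:7]),
        "epoch_year": int(line1[18:20]),             # two-digit year
        "epoch_day": float(line1[20:32]),            # day of year with fraction
        "inclination_deg": float(line2[8:16]),
        "eccentricity": float("0." + line2[26:33]),  # decimal point is implied
        "mean_motion_rev_per_day": float(line2[52:63]),
    }


fields = parse_tle(SAMPLE_TLE)
print(fields)
```

The epoch fields show why a TLE alone is not a location: it describes the orbit at one moment in time, and the position for any other moment has to be computed from it.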

We will need to transform this TLE data into an Earth-based location for the current time, and for that we will use Python’s Skyfield package: https://pypi.org/project/skyfield/.

SAP HANA db will be our destination to store the data for some further analysis and visualization.

The setup


I already had an instance of SAP HANA, express edition (ver. 2.0.45 at the time), deployed in Google Cloud Platform. Therefore I deployed an instance of SAP Data Intelligence trial (ver. 3.0.0 at the time) to the same cloud provider in the same region to have both systems co-located.

Please note that the deployment options and the choice of an IaaS vendor are not relevant in this scenario, as long as the instance of SAP Data Intelligence can communicate with the instance of SAP HANA and you can access data in the SAP HANA db from an external client.

For the sake of simplicity I will use SYSTEM users in both systems. Needless to say, this is very bad practice. Never do this in Production!

Prerequisites


I would recommend you go through the following tutorials, if not yet familiar with SAP Data Intelligence at all:


For the steps that follow I assume you have some familiarity with the basics of SAP Data Intelligence. I will not detail every small task and icon to click below.

Initial pipeline in SAP Data Intelligence Modeler


Create a new graph in the Modeler application.

Switch to the JSON view of the graph and paste the following code.

{
    "description":"ISS locations",
    "processes":
    {
        "python3operator1":{"component":"com.sap.system.python3Operator","metadata":{"label":"Python3 Operator","extensible":true,"config":{"script":"def on_tle(tle_data):\n    api.send(\"location\", tle_data)\n\napi.set_port_callback(\"tle\", on_tle)"},"additionalinports":[{"name":"tle","type":"string"}],"additionaloutports":[{"name":"location","type":"message"},{"name":"debug","type":"string"}]}},
        "httpclient1":{"component":"com.sap.http.client2","metadata":{"label":"HTTP Client","config":{"pollingEnabled":true,"postConnection":{},"getConnection":{"connectionProperties":{"host":"isstracker.spaceflight.esa.int","port":443,"protocol":"HTTPS","authenticationType":"NoAuth"},"path":"tledata.txt","configurationType":"Manual"},"getPeriodInMs":10000}}},
        "wiretap1":{"component":"com.sap.util.wiretap","metadata":{"label":"Wiretap","ui":"dynpath","config":{}}},
        "tostringconverter1":{"component":"com.sap.util.toStringConverter","metadata":{"label":"ToString Converter","config":{}}},
        "wiretap3":{"component":"com.sap.util.wiretap","metadata":{"label":"Wiretap","ui":"dynpath","config":{}}}
    },
    "connections":[{"src":{"port":"out","process":"httpclient1"},"tgt":{"port":"in","process":"wiretap1"}},{"src":{"port":"out","process":"wiretap1"},"tgt":{"port":"ininterface","process":"tostringconverter1"}},{"src":{"port":"outstring","process":"tostringconverter1"},"tgt":{"port":"tle","process":"python3operator1"}},{"src":{"port":"location","process":"python3operator1"},"tgt":{"port":"in","process":"wiretap3"}}],"inports":{},"outports":{},
    "groups":[],
    "properties":{}
}

Now switch back to the Diagram view and click the Auto-Layout icon.

In the imported pipeline check the configuration of the HTTP Client operator:

◉ It is already set to poll every 10 seconds…
◉ …from https://isstracker.spaceflight.esa.int/tledata.txt using the GET method.
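
For readers who want to see the same idea outside of the Modeler, the polling done by the HTTP Client operator can be approximated with a few lines of plain Python. This is only a rough sketch of the behavior, not how the operator is implemented. The function names `fetch_tle` and `poll` and their parameters are assumptions made for this example, and the endpoint may not be reachable from every network.

```python
import time
import urllib.request

TLE_URL = "https://isstracker.spaceflight.esa.int/tledata.txt"


def fetch_tle(url: str = TLE_URL, timeout_s: float = 10.0) -> str:
    """GET the TLE record and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


def poll(fetch, handle, period_s: float = 10.0, max_polls: int = 3,
         sleep=time.sleep) -> int:
    """Call fetch() every period_s seconds and pass each result to handle().

    Returns the number of successful polls. A failed request is reported
    and skipped, so one bad response does not stop the loop.
    """
    succeeded = 0
    for i in range(max_polls):
        try:
            handle(fetch())
            succeeded += 1
        except OSError as error:  # urllib.error.URLError subclasses OSError
            print(f"poll {i + 1} failed: {error}")
        if i < max_polls - 1:
            sleep(period_s)
    return succeeded


# Usage (performs real HTTP requests, so it is left commented out):
# poll(fetch_tle, print, period_s=10.0, max_polls=3)
```

Passing `fetch` and `sleep` in as parameters keeps the loop easy to try without network access or real waiting.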

Ingest data from the API


Save the graph with the name community.sample.iss, creating a new category Community Samples.

Once the graph is saved (and only then) you can run it.

It might take a bit longer the first time, while SAP Data Intelligence builds containers to run the graph. Once you see the status change to Running, right-click on the first Wiretap to open its UI.

You should see new TLE records polled from the source API every 10 seconds.

Transform data with Python operator


If you look into the UI of the second Wiretap, then you should see exactly the same records there. That’s because in this initial graph the Python operator simply sends to the output port location exactly the same information it receives on the input port tle.

def on_tle(tle_data):
    api.send("location", tle_data)

api.set_port_callback("tle", on_tle)
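
If you would like to try this pass-through callback outside of SAP Data Intelligence, the `api` object that the operator provides at run time can be replaced by a small stand-in. The sketch below is an assumption-laden illustration: the class `StubApi` and its `feed` helper are hypothetical and only mimic the two calls used in the script above, not the real operator API.

```python
class StubApi:
    """Hypothetical stand-in for the `api` object available inside the operator."""

    def __init__(self):
        self.callbacks = {}   # input port name -> callback function
        self.sent = []        # (output port name, data) pairs, in send order

    def set_port_callback(self, port, callback):
        self.callbacks[port] = callback

    def send(self, port, data):
        self.sent.append((port, data))

    def feed(self, port, data):
        """Test helper (not part of the real api): deliver data to an input port."""
        self.callbacks[port](data)


api = StubApi()


# The operator script from above, unchanged
def on_tle(tle_data):
    api.send("location", tle_data)

api.set_port_callback("tle", on_tle)

# Simulate one record arriving on the "tle" input port
api.feed("tle", "some TLE record")
print(api.sent)
```

The printed list contains the same string on the location port, which is exactly what the second Wiretap shows in the running graph.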

Let’s modify the code to transform a TLE record into a record that contains a timestamp plus the latitude and longitude of the ISS location projected onto the Earth’s surface.

Stop the running graph and open the script of the Python3 Operator. Replace the existing code with the following.

# Import dependencies
from skyfield.api import load, EarthSatellite
import datetime

# Decode the incoming TLE record and format the output message
def on_tle(tle_data):

    # The record has three lines: satellite name, then the two element lines
    l1, l2, l3 = tle_data.strip().splitlines()
    ts = load.timescale()
    satellite = EarthSatellite(l2, l3, l1, ts)

    # Current UTC time as a Python datetime and as a Skyfield time
    pytime_now = datetime.datetime.now(datetime.timezone.utc)
    ts_now = ts.from_datetime(pytime_now)

    # Point on the Earth's surface directly beneath the satellite
    geocentric = satellite.at(ts_now)
    subpoint = geocentric.subpoint()

    location = [
        {
            "TSTMP": pytime_now.strftime('%Y-%m-%d %H:%M:%S.%f'),
            "LAT": subpoint.latitude.degrees,
            "LON": subpoint.longitude.degrees,
            "ALT": int(subpoint.elevation.m)
        }
    ]

    # api.send("debug", str(satellite))
    api.send(
        "location",
        api.Message(location, {"Satelite": str(l1)})
    )

# Callback(s)
api.set_port_callback("tle", on_tle)
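
To get a feeling for what the body of the message sent to the location port looks like, the record can be built with made-up values in plain Python, without Skyfield. The time, latitude, longitude and altitude below are hypothetical and chosen only to illustrate the layout and the timestamp format.

```python
import datetime
import json

# Hypothetical sample values, chosen only to illustrate the record layout
sample_time = datetime.datetime(
    2020, 12, 5, 12, 0, 0, 123456, tzinfo=datetime.timezone.utc
)

location = [
    {
        "TSTMP": sample_time.strftime('%Y-%m-%d %H:%M:%S.%f'),
        "LAT": 51.5,
        "LON": -0.12,
        "ALT": int(419312.7),  # altitude in meters, truncated to an integer
    }
]

print(json.dumps(location))
# [{"TSTMP": "2020-12-05 12:00:00.123456", "LAT": 51.5, "LON": -0.12, "ALT": 419312}]
```

The body is a list with a single dictionary, and the timestamp is a plain string with microsecond precision.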

Save the graph. But it is too early to run it yet. If you try to run this graph, it will fail with the error:

Graph failure: operator.com.sap.system.python3Operator:python3operator1: Error while executing Python Operator's user provided script: No module named 'skyfield' [line 2]
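
The same kind of problem can be detected in any Python environment before a script is run. As a small illustration (the helper name `is_installed` is made up for this example), the standard library can tell whether a top-level module is available without actually importing it:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# datetime ships with Python; skyfield is only found if it has been installed
for name in ("datetime", "skyfield"):
    print(name, "available" if is_installed(name) else "MISSING")
```

In the default run-time environment of the Python3 Operator a check like this would report skyfield as missing, which is what the error above tells us.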

Build and use a container image


The Python package skyfield is missing in the run-time container environment. We need to build a container image that includes the required package and tell the Python operator to use it at run time.

Go to Repository and right-click on dockerfiles. Choose Create Docker File.

Call it samples.skyfield and include the following code.

FROM $com.sap.sles.base
RUN pip3 install --user skyfield

In the Configuration panel add a tag skyfield.

Save it and refresh the Repository panel. You should see two files, Dockerfile and Tags.json, created under the dockerfiles.samples.skyfield folder.

Go back to the graph’s diagram and add the Python operator to a group.

In the configuration of the group add the same tag as assigned to the Dockerfile, i.e. skyfield.

Run the graph. It will take a few minutes longer to start as it needs to build a new container image based on the Dockerfile we created.

Once the graph is running, open the UI of the second Wiretap to see the transformed messages.
