Tuesday 28 April 2020

Create graphs on SAP HANA Cloud

Start there to learn how to create your own SAP HANA cloud instance and develop with SAP Web IDE Full-Stack.

Our goal is to create a graph workspace that represents routes travelled between airports by different airlines.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

In computing, a graph database uses nodes, edges, and properties to represent and store data.

The graph describes the relationships(=edges) between the different entities(=nodes) and all the properties of these relationships and entities.

The relationships allow data to be linked together directly and retrieved in one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database.

Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

Graph databases are a type of NoSQL database, they can be very flexible, as data does not have to reside within pre-fixed schemas. Users can add or remove properties from nodes and edges as they go.

SAP HANA is not a NoSQL database. All data in SAP HANA reside within a schema. So how do we manage to offer graph capabilities ?

All data are stored in the vertex table (set of vertices) and the edge table (set of edges). Vertex attributes match to columns of the vertex table. Edge attributes match to columns of the edge table.

One of the vertex attributes (the vertex key) uniquely identifies vertices.
One of the edge attributes (the edge key) uniquely identifies edges.
The edge table contains two additional columns referencing the source vertex and the target vertex of each edge.

Now, I call them tables, but they do not have to be physical tables. They can be based on views which aggregate several different tables without data replication. They can also be based on virtual tables which access data stored in another database.

Relational storage allows all the functions of SAP HANA to be applied to the graph data: access control, backup and recovery, etc. It also allows all SAP HANA Graph functions to be applied to the graph data stored in relational format coming from business applications. SAP HANA Graph provides a dedicated catalog object, which is referred to as a graph workspace, for defining a graph in terms of the existing SAP HANA tables.

Here is an example with articles and their authors :

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications


SAP HANA’s native graph engine

◉ Property graph model embedded in a relational model : full transaction properties (ACID)
◉ Built-in functions : shortest path(1-1, 1-all, top k) and strongly connected components, Neighbors, Breadth First Search
◉ Support for pattern matching using openCypher
◉ GraphScript to develop custom graph algorithms
◉ Graph viewer, SAP HANA Database Explorer

Benefits

◉ Store and analyze connected data in real-time
◉ Combine text, spatial, and advanced analytics with graph intelligence
◉ Tightly integrated in SAP HANA operations (security, backup/restore, scale-out, import/export etc.)

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Create an edge and a vertex table


These two tables will become our edge and vertex tables.

Start by importing some data.

Insert data in csv format for airports and routes.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Import that data to tables with .hdbtabledata files

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Build your project and you will have data within your two tables. You can check them in the Database Explorer.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Create a graph workspace


Now that you have a vortex table(AIRPORTS) and an edge table (ROUTES), you can create a graph workspace with a .hdbgraphworkspace file.

Airport code will be the key for airports.

Route ID will be the key for routes, and each route will have an origin and a destination.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Once you build your graph workspace, you can use all HANA graph algorithms on your data.

Let’s start by exploring our data through the Database Explorer.

By clicking your graph workspace, you can see an overview of the edge and vertex table. Press the mark on the upper-right to explore the graph.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Explore the graph algorithms


This opens a graphical view of the data, from here you can see the details of each node and edge by clicking them and filter the data.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

You can test algorithms directly within the graphical view of Database Explorer on SAP HANA, on-premise. This features which is not yet available on SAP HANA Cloud as of April 2020.

You can use the Shortest Path algorithm to find the shortest path between two airports. By selecting the weight column, you can choose between distance and time, for example.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

You can find all nodes which are related to a particular node with the Neighborhood algorithm. You can set up the depth of search, and whether it searches for incoming or outgoing edges.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Find clusters within your data with the Strongly Connected Components alogrithm. In this case, all airports are interconnected, so they form one big cluster.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

SAP HANA supports two languages to perform graph computation : GraphScript and openCypher.

I prepared a procedure using the LANGUAGE GRAPH in a .hdbprocedure file.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

This takes the origin and destination airports as input, and outputs the total number of segment, the total distance, the total duration and routing. It uses the shortest path algorithm to compute the most efficient path between nodes.

In the database explorer, right-click the procedure and select “Generate CALL Statement with UI”.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

This opens a UI where you can input the parameters. In this case we’ll run the procedure from Dubai to Paris.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

There is one stop in Amsterdam, and you can see the airline, distance and duration for each flight.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

In order to find nodes related to any particular node, you can use the Traversal Breadth-First Search algorithm.

In this example, we compute how many airports have direct flights to Atlanta, how many airports are connected through one transfer, or two transfers.

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Pattern matching with openCypher


SAP HANA supports openCypher for pattern matching. Let’s use Cypher to match this pattern : All flights from A to B, with filters on : the country, the distance and the altitude of the target, as well as a custom filter.

In openCypher, nodes are represented within parenthese : (a)
Edges are represented as arrows : -[e]->

We select all flights coming from the United States, to any destination within 4000km, with an altitude less than 50m.

We add a custom filter : the airline name must contain Airline, but the search is fuzzy, which means that similar words such as Air line or Airlines are also returned.

SELECT * FROM OPENCYPHER_TABLE( GRAPH WORKSPACE "TRAVEL"."Flights" QUERY
    '
MATCH (a)-[e]->(b)
WHERE a.country = ''United States'' AND e.distance> 4000 AND b.altitude < 50 AND SYS.TEXT_CONTAINS(e.airlineName, ''Airline'', ''FUZZY(0.4)'')
RETURN a.airportCode AS airportCodeFrom, b.airportCode AS airportCodeTo, e.airlineName AS airlineName, e.distance AS distance, b.altitude AS altitude
ORDER BY b.altitude, e.distance DESC
    '
)

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Let’s see another example of openCypher. We match 3 different patterns in one query.

MATCH (a)-[e1]->(b)
MATCH (b)-[e2]->(c)
MATCH (c)-[e3]->(d)

Then we apply filters to search for flights from NTE to PDX.

SELECT * FROM OPENCYPHER_TABLE( GRAPH WORKSPACE "TRAVEL"."Flights" QUERY
    '
MATCH (a)-[e1]->(b)
MATCH (b)-[e2]->(c)
MATCH (c)-[e3]->(d)
WHERE a.airportCode = ''NTE'' AND d.airportCode = ''PDX'' 
RETURN e1.airlineName AS airlineName1, b.airportCode AS transferAirportCode1, e2.airlineName AS 
airlineName2, c.airportCode AS transferAirportCode2, e3.airlineName AS airlineName3
    '
)

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

Use case : Supply chain risk analysis


Who is affected when a supplier shuts down ?
Which supplier was responsible for quality issues ?

These are some questions that customers answer with SAP HANA graph. In this use case, the customer stores purchasing data, engineering data, and sales data in different systems. To support risk analysis, the data is consolidated in a single SAP HANA system.

SAP HANA Graph is used to support a global “where-used list”, essentially providing cross-system “reachability analysis”. Which customer depends on which suppliers?

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

The graph has around 250k Nodes (supplier, material, customer) and 1 million edges (supplies, is-used-in, purchases). The impact analysis implementation with GraphScript takes about 10 lines of code using build-in BFS traversal operation with filter conditions. There is no data redundancy, the graph is based on views.

It takes –100 ms to aggregate all reachable materials and customers from a single supplier.

It takes 2 seconds to identify/count all customers for all suppliers

SAP HANA Cloud, SAP HANA Tutorial and Material, SAP HANA Exam Prep, SAP HANA Certifications

No comments:

Post a Comment