Wednesday 22 June 2016

SAP HANA Database as a Graph Store - Creating Graph Database Objects

This Document is part of the series on HANA Database as a Graph Store in SAP HANA SPS 12 : SAP HANA Database as a Graph Store - Introduction

After getting the brief understanding on Graph Database in the previous discussion, let us proceed ahead in creating the needed Graph Database objects for the discussed Insurance-Fraud scenario.

Let us create Graph Database tables for the edges and vertices  discussed in the previous Insurance-Fraud scenario using the below SQL's :
 Vertex Table:

CREATE COLUMN TABLE "INSURANCE_FRAUD"."MEMBERS" (
  "NAME" VARCHAR(100) PRIMARY KEY,
  "TYPE" VARCHAR(100)
);

Similarly let us create database table for edges that defines various roles played by the above vertices:

Edge Table:

CREATE COLUMN TABLE "INSURANCE_FRAUD"."RELATIONSHIPS" (
  "KEY" INT UNIQUE NOT NULL,
  "SOURCE" VARCHAR(100) NOT NULL
    REFERENCES "INSURANCE_FRAUD"."MEMBERS" ("NAME")
    ON UPDATE CASCADE ON DELETE CASCADE,
  "TARGET" VARCHAR(100) NOT NULL
    REFERENCES "INSURANCE_FRAUD"."MEMBERS" ("NAME")
    ON UPDATE CASCADE ON DELETE CASCADE,
  "TYPE" VARCHAR(100)
);

Here the above created edge table must have a column as Primary Key and two other columns as Foreign Key's referring the Primary Key column of the vertex table and inferring source and target nodes of a particular edge/relationship.

Let us now insert the data into Vertex table to define the set of involved people, cars and events in the fraud-insurance act :

INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('John', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Williams', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Jane', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Tony', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Peter', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Robert', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Maruti', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Benz', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('BMW', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Audi', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Accident1', 'Accident');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Accident2', 'Accident');

Now let us insert data into the Edge table to define various relationships between the vertexes like Drives, medicates, witnesses etc so as to figure out the multiple fake roles(edges) played by same people(vertex) in the act.

insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(1,'John','Maruti','Drives');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(2,'John','BMW','Witnesses');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(3,'Williams','Benz','Drives');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(4,'Williams','BMW','Is_Passenger');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(5,'Jane','Audi','Drives');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(6,'Tony','John','Advocates');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(7,'Tony','Jane','Advocates');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(8,'Peter','Jane','Medicate');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(9,'Robert','Audi','Repairs');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(10,'Maruti','Accident1','Involves');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(11,'Benz','Accident1','Involves');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(12,'BMW','Accident2','Involves');
insert into "INSURANCE_FRAUD"."RELATIONSHIPS" values(13,'Audi','Accident2','Involves');

The maximum number of attributes is bound by the maximum number of columns for the above table. One of the vertex attributes must uniquely identify vertices. This attribute is also referred to as a vertex key. Similarly, one of the edge attributes must uniquely identify edges and is referred to as edge key. The edge table contains two additional columns referencing the key column of the vertex table. One of them identifies the source vertex and the other identifies the target vertex of an edge.

SAP HANA Graph provides a dedicated catalog object, called graph workspace, to define a graph in terms of the above created SAP HANA tables.

Graph WorkSpaces :

A graph workspace is a catalog object that defines a graph in terms of tables and columns:

  • vertex table
  • edge table
  • key column in the vertex table
  • key column in the edge table
  • source vertex column in the edge table
  • target vertex column in the edge table

Both vertex key and edge key columns need to be flagged as unique and NOT NULL.

A graph workspace can be uniquely identified by the database schema it resides in and the workspace name. An SAP HANA instance can contain multiple graph workspaces in the same schema (with different workspace names) or different database schemas.

Graph workspace information is stored in the GRAPH_WORKSPACES system view.

We shall create the graph database by defining the  above mentioned GraphWorkspace that helps us to connect the already created vertex and edges , there by to derive out various insights needed :

CREATE GRAPH WORKSPACE "INSURANCE_FRAUD"."GRAPH"
EDGE TABLE "INSURANCE_FRAUD"."RELATIONSHIPS"
SOURCE COLUMN "SOURCE"
TARGET COLUMN "TARGET"
KEY COLUMN "KEY"
VERTEX TABLE "INSURANCE_FRAUD"."MEMBERS"
KEY COLUMN "NAME";

With the created Graph Database, let us fetch the information about a person(Vertex)  as whether he is repeatedly involved(connected/Edge) or not in the accidents(vertex).

Neighborhood Search Algorithm :

To achieve the above requirement, let us create a calculation scenario with “Neighborhood Search” Algorithm to figure out the existence of connection between person(John) and accidents with the below create statement

CREATE CALCULATION SCENARIO "INSURANCE_FRAUD"."GET_NEIGHBORHOOD_EXAMPLE" USING '
<?xml version="1.0"?>
<cubeSchema version="2" operation="createCalculationScenario" defaultLanguage="en">
<calculationScenario schema="INSURANCE_FRAUD" name="GET_NEIGHBORHOOD_EXAMPLE">
<calculationViews>
<graph name="get_neighborhood_node" defaultViewFlag="true" schema="INSURANCE_FRAUD" workspace="GRAPH" action="GET_NEIGHBORHOOD">
<expression>
<![CDATA[{
"parameters": {
"startVertices": ["John"],
"direction":"outgoing",
"minDepth": 0,
"maxDepth": 2
}
}]]>
</expression>
<viewAttributes>
<viewAttribute name="NAME" datatype="string"/>
<viewAttribute name="DEPTH" datatype="int"/>
</viewAttributes>
</graph>
</calculationViews>
</calculationScenario>
</cubeSchema>
' WITH PARAMETERS ('EXPOSE_NODE'=('get_neighborhood_node', 'GET_NEIGHBORHOOD_EXAMPLE'));

In the above calculation scenario we define the name of the action(Graph Algo),Parameters such as the start vertex from which we must get the nearest vertex with a depth of connection existing and also the direction to finalize whether it’s an incoming edge or an outgoing edge to be considered to find the nearest vertex.

This creates a calculation scenario with a single graph node with the GET_NEIGHBORHOOD action on the graph workspace “INSURANCE_FRAUD”.”GRAPH”.

Here we can traverse the underlying graph using all outgoing edges starting from vertex ‘John’ and returns vertices with minimum depth 0 and maximum depth of 2 from the start vertex.

Execute the below query on the above created Calc Scenario to  find the relationship of John/Person(Vertex) with Other Vertex(Cars and Accidents) :

SELECT * FROM "INSURANCE_FRAUD"."GET_NEIGHBORHOOD_EXAMPLE" ORDER BY "DEPTH";

SAP HANA Database as a Graph Store - Creating Graph Database Objects

Thus with Neighborhood search algorithm we are able to find the nearest vertices from the start vertex for a given depth.

In our example we were able to fetch the result of Cars and Accidents the Person ‘John’ is connected to in the fraud-insurace ring.

Result-set says, John has a direct relationship to Maruti and Audi Cars(As the Depth value is 1) and  is having indirect relationship to accidents 1 and 2 via another vertex(As the Depth value is 2).

Let us proceed ahead and understand the other possible ways to uncover fraud ring using available Graph Algorithms in HANA database in the next discussion : Uncovering the network using Graph Algorithms

Source: scn.sap.com

No comments:

Post a Comment