Wednesday 15 May 2019

Split table column value into multiple rows using SQL in SAP HANA

Introduction


In this post I would like to focus on the similar scenario, but this time the challenge will be to do the same not for variable but for table column. SAP HANA enables STRING_AGG function for concatenating values from multiple lines into single string, however doing opposite is quite challenging.

Scenario


In my scenario I will use my test table containing contact person details. Its structure is as follows:

CREATE COLUMN TABLE CONTACTS
(
ID INT,
COUNTRY VARCHAR(2),
FULL_NAME VARCHAR(100),
PHONE_NUMBERS VARCHAR(200)
);

Existing CONTACTS table has PHONE_NUMBERS column which stores comma delimited numbers. The purpose is to display phone numbers as separate rows and keep other contact information as in the source table. Single phone number length may vary and count of phone numbers can be also different for each record.
BW SAP HANA Modeling Tools (Eclipse), SAP HANA, SAP HANA Studio, SAP HANA Study Materials

Split column values into multiple lines


To implement the logic for splitting column values into multiple lines, following steps are required

1. Retrieve from the string values delimited by separators
2. Dynamically define maximum number of values for all concatenated strings
3. Generate multiple lines for each record
4. Distribute retrieved values to generated rows

Step 1. Retrieve from the string values delimited by separators

In this step I will use string function, which allows to distinguish values from comma separated string. For this purpose I will use SUBSTR_REGEXPR SQL function. This function allows to retrieve substring from specific string based on regular expression. It also allows to specify which occurrence of the matching substring we want to display.

Following expression allows to retrieve first occurrence of the string of any characters excluding commas

SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1 )

Knowing that in my scenario there is up to 3 phone numbers concatenated within single value, let’s add the expression for remaining numbers:

Query:

SELECT 
"ID",
"COUNTRY",
"FULL_NAME",
"PHONE_NUMBERS",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1) AS "PHONE_NUMBER1",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 2) AS "PHONE_NUMBER2",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 3) AS "PHONE_NUMBER3"
FROM 
CONTACTS;

Result:

BW SAP HANA Modeling Tools (Eclipse), SAP HANA, SAP HANA Studio, SAP HANA Study Materials

Step 2. Dynamically define maximum number of values for all concatenated strings

In this step we want to define what is the maximum number of phone number values in single string. For this purpose I will use OCCURRENCES_REGEXPR function to count number of separators within the string.

Then I will add +1, because number of commas is always less by 1 than the number of phone numbers in the string:

OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1

Now we have count of phone number occurrences for each string.

Query:

SELECT 
"ID",
"COUNTRY",
"FULL_NAME",
"PHONE_NUMBERS",
OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 AS "OCCURRENCES_COUNT"
FROM 
CONTACTS;

Result:

BW SAP HANA Modeling Tools (Eclipse), SAP HANA, SAP HANA Studio, SAP HANA Study Materials

Finally I want to see the maximum value to know, how many lines I need to generate. This value will be assigned to the variable MAX_NR_OCCURRENCES and will be used in Step 3. For the purpose of creating variable, I used anonymous block:

Query:

DO
BEGIN

DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;

END

Step 3. Generate multiple lines for each record

For generating multiple lines for each record I will cross join CONTACT table with series of 3 records (because in my case there are max 3 phone numbers in string). To generate N records I used SERIES_GENERATE_INTEGER function. Variable defined in Step 2 will be used as input parameter for this function to define number of records to be generated:

Query:

DO
BEGIN

DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;

SELECT * FROM SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES);

END

Result:

BW SAP HANA Modeling Tools (Eclipse), SAP HANA, SAP HANA Studio, SAP HANA Study Materials

Now let’s cross join the series with result set from Step 1. This way each record will be copied 3 times:

Query:

DO
BEGIN

DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;

SELECT 
CNT."ID",
CNT."COUNTRY",
CNT."FULL_NAME",
CNT."PHONE_NUMBERS",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1) AS "PHONE_NUMBER1",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 2) AS "PHONE_NUMBER2",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 3) AS "PHONE_NUMBER3",
SERIES."ELEMENT_NUMBER"
FROM 
CONTACTS CNT
CROSS JOIN SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES) SERIES;

END

Result:

BW SAP HANA Modeling Tools (Eclipse), SAP HANA, SAP HANA Studio, SAP HANA Study Materials

Step 4. Distribute retrieved values to generated rows

In this step we will apply final query adjustments, to move distinguished phone numbers from columns to consecutive rows. To achieve that we can use ELEMENT_NUMBER column from SERIES_GENERATE_INTEGER function, which returns consecutive numbers for each line within specific contact person. This column will be consumed by OCCURRENCE parameter. By having consecutive numbers in ELEMENT_NUMBER column we can dynamically substring separated values one by one, and display them in consecutive rows. We also need to remember that initially for each record we generated three lines, so at the end we also need to filter out empty rows (for these cases where there are less than 3 phone numbers in string)

Query:

DO
BEGIN

DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;

SELECT 
CNT."ID",
CNT."COUNTRY",
CNT."FULL_NAME",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE SERIES."ELEMENT_NUMBER") AS "PHONE_NUMBER"
FROM 
CONTACTS CNT
CROSS JOIN SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES) SERIES
WHERE
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE SERIES."ELEMENT_NUMBER") IS NOT NULL;
END

Result:

BW SAP HANA Modeling Tools (Eclipse), SAP HANA, SAP HANA Studio, SAP HANA Study Materials

No comments:

Post a Comment