Monday 2 December 2019

Split string into multiple rows using SQL in SAP HANA

Introduction


In this post I would like to describe options for splitting concatenated string into multiple rows using SQL in SAP HANA. Depending on which version of HANA system you are working you can choose the one which is valid.

Creating custom function (when SAP HANA system version is 2.0 SP02 or lower)


If your HANA system is of version 2.0 SP2 or older, then there is no in-built function for splitting strings, that’s why you will need to create custom function.

Scenario 1.

You have the comma separated string ‘A1,A2,A3,A4,A5,A6’ which you want to display as multiple rows. Below there is a code for table function splitting comma separated string into multiple lines.

FUNCTION "_SYS_BIC"."TMP::TF_SPLIT_STRING" ( INPUT_STRING VARCHAR(5000) )
RETURNS TABLE
(
"OUTPUT_SPLIT" VARCHAR(5000)
)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN

DECLARE COUNTER INT := 1;
SPLIT_VALUES = SELECT SUBSTR_BEFORE(:INPUT_STRING,',') SINGLE_VAL FROM DUMMY;
SELECT SUBSTR_AFTER(:INPUT_STRING,',') || ',' INTO INPUT_STRING FROM DUMMY;

WHILE( LENGTH(:INPUT_STRING) > 0 )
DO

   SPLIT_VALUES =
 
    SELECT SUBSTR_BEFORE(:INPUT_STRING,',') SINGLE_VAL FROM DUMMY  
   UNION  
SELECT SINGLE_VAL FROM :SPLIT_VALUES;

   SELECT SUBSTR_AFTER(:INPUT_STRING,',') INTO INPUT_STRING FROM DUMMY;
   
END WHILE;

RETURN

SELECT SINGLE_VAL AS "OUTPUT_SPLIT" FROM :SPLIT_VALUES;

END

*If your string is separated by different delimiter simply replace ',' with other symbol.

Now let’s query the function for the string which we need to split:

DO
BEGIN

DECLARE TEST_STRING VARCHAR(50) := 'A,B,C,D,E,F,G,H';

SELECT "OUTPUT_SPLIT" FROM "_SYS_BIC"."TMP::TF_SPLIT_STRING" (:TEST_STRING);

END

Result:

SAP HANA Study Materials, SAP HANA Guides, SAP HANA Tutorial and Material, SAP HANA Learning

Using SQLSCRIPT built-in Library (recommended when SAP HANA version is 2.0 SP03 or higher)


Version 2.0 SP03 of SAP HANA offers new enhancement of SQLScript library – SQLSCRIPT_STRING. Functions within this library enables easy way of splitting string into multiple rows. Additionally developer has more flexibility to define splitting rules.

Scenario 1


You have the comma separated string

'A1,A2,A3,A4,A5,A6'

which you want to display as multiple rows.

Use following code:

SAP HANA Study Materials, SAP HANA Guides, SAP HANA Tutorial and Material, SAP HANA Learning

I. Explicitly declare usage of SQLSCRIPT_STRING library. Assign alias for the library (in this example name of the alias is “LIBRARY”, but you can assign any other name)

II. Explicitly declare output table

III. Use SPLIT_TO_TABLE function. Combine it with the alias assigned in point I. As input for SPLIT_TO_TABLE function use string which you want to split, and after comma define delimiter (in this example comma is delimiter). At the end you need to assign output of that function to the table variable defined in point II.

IV. Query the table variable

After running following the query string will be splitted into multiple rows:

DO
BEGIN

USING SQLSCRIPT_STRING AS LIBRARY;
DECLARE TEST_OUTPUT TABLE(RESULT NVARCHAR(5000));
DECLARE TEST_STRING VARCHAR(50) := 'A,B,C,D,E,F,G,H';

TEST_OUTPUT = LIBRARY:SPLIT_TO_TABLE(:TEST_STRING,',');
SELECT * FROM :TEST_OUTPUT;

END

Result:

SAP HANA Study Materials, SAP HANA Guides, SAP HANA Tutorial and Material, SAP HANA Learning

Scenario 2


There is a string which combines first and last name, phone and the address:

'Sarah Blake, 98 921 29 30, 270 Sycamore Street Brookfield, WI 53045, US' 

You want to split this string into three values: full name, phone, address.

Use following code:

DO
BEGIN

USING SQLSCRIPT_STRING AS LIBRARY;
DECLARE TEST_OUTPUT TABLE(RESULT NVARCHAR(5000));
DECLARE TEST_STRING VARCHAR(100) := 'Sarah Blake, 98 921 29 30, 270 Sycamore Street Brookfield, WI 53045, US';

TEST_OUTPUT = LIBRARY:SPLIT_TO_TABLE(:TEST_STRING,',',2);
SELECT * FROM :TEST_OUTPUT;

END

Result:

SAP HANA Study Materials, SAP HANA Guides, SAP HANA Tutorial and Material, SAP HANA Learning

Here I used third optional parameter (MAXSPLIT) which is available for SPLIT_TO_TABLE and assigned value of 2. This way I will limit the split to only first two comma occurrences (subsequent commas will not be considered).

Besides defining delimiter for split, you can also specify maximum number of splits. Function takes first n-number of delimiter occurences, and remaining part is being displayed in the last row. As you can see in the string from the example there are 4 commas, so if this parameter would not be specified, then in output there would be 5 rows.

Scenario 3


There is a string which combines number, date and time. It’s separated by #, DATE, TIME string:

'10000000123#20190101DATE115737TIME'
You want to split this string into three strings: number, date, time.

Use following code:

DO
BEGIN

USING SQLSCRIPT_STRING AS LIBRARY;
DECLARE TEST_OUTPUT TABLE(RESULT NVARCHAR(5000));
DECLARE TEST_STRING VARCHAR(50) := '10000000123#20190101DATE115737TIME';

TEST_OUTPUT = LIBRARY:SPLIT_REGEXPR_TO_TABLE(:TEST_STRING, '\#|[A-Z]*[A-Z]') ;
SELECT * FROM :TEST_OUTPUT WHERE "RESULT" != '';

END

Output:

SAP HANA Study Materials, SAP HANA Guides, SAP HANA Tutorial and Material, SAP HANA Learning

Here I used function SPLIT_REGEXPR_TO_TABLE. It can be combined with regular expressions, which gives user even more options for defining logic for splitting strings. Regular expression from the example splits string after each occurencs of # symbol or uppercase alphabet string.

No comments:

Post a Comment