Saturday 17 March 2018

Custom Rule Set – SAP HANA Text Search

In this blog, I’ll discuss how to create custom rule set in SAP HANA.  To implement certain custom use cases, customers have to implement their own rule set for performing Text Search Operations.

Search Rule Set


Figure 1 below shows the structure of Rule Sets stored in XML/Tree Like Formation.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Figure 1: Rule Set Structure

Figure 2 below shows the steps to configure and use a Search Rule Set.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Step 1: Add View


First step while configuring the rule set is to define a view. Search operation can be performed on Attribute Information views, Column views of type Join, and SQL views. Other database objects, such as row store tables, column store tables, calculation views, or analytic views, are not supported.

1.1 Create a column table and insert the records in column table as specified below:

CREATE COLUMN TABLE employee
(
  id            INTEGER          PRIMARY KEY,
  firstname     SHORTTEXT(100)   FUZZY SEARCH INDEX ON,
  lastname      SHORTTEXT(100)   FUZZY SEARCH INDEX ON,
  address    NVARCHAR(100)    FUZZY SEARCH INDEX ON,
  postcode      NVARCHAR(20)     ,
  cityname      NVARCHAR(100)    ,
  countrycode   NVARCHAR(2),
);

INSERT INTO employee VALUES(1, 'Michael', 'Milliken', '3999 WEST CHESTER PIKE', '001', 'NEWTON SQUARE', 'PA');

1.2 Create and Activate an Attribute View by selecting the customer table and projecting all the columns in output of attribute view as shown in Figure3.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Figure 3: Attribute View

Step 2: Add Stop Words and Term Mappings


Second step is to configure the nodes “Stopwords (table-based)” and “Term Mappings (table-based)” by creating two tables Stopwords and Termmapping.

2.1 Stopwords

Stopwords are the terms which are less significant terms, therefore these terms are not used to generate the result set. However, these terms do influence score calculations.  A record with stopwords identical to the user input gets a higher score than a record with differing or missing stopwords.

Stopwords can be defined either as single terms or as stopword phrases consisting of multiple terms.

Syntax for creating a term mapping table via SQL:

CREATE COLUMN TABLE stopwords
(
stopword_id VARCHAR(32) PRIMARY KEY,
list_id VARCHAR(32) NOT NULL,
language_code CHAR(2) NOT NULL,
term NVARCHAR(200) NOT NULL
);

Stopwords are stored in a column-store table with the following format:

-- to be able to use stopwords, a stopword table is needed:
CREATE COLUMN TABLE stopwords
(
  stopword_id    VARCHAR(32)    PRIMARY KEY,
  list_id        VARCHAR(32)    NOT NULL,
  language_code  VARCHAR(2),
  term           NVARCHAR(200)  NOT NULL
);

INSERT INTO stopwords VALUES('1', 'firstname', 'en', 'Dr');
INSERT INTO stopwords VALUES('2', 'firstname', 'en', 'Mr');
INSERT INTO stopwords VALUES('3', 'firstname', 'en', 'Mrs');
INSERT INTO stopwords VALUES('4', 'firstname', 'en', 'Sir');
INSERT INTO stopwords VALUES('5', 'firstname', 'en', 'Prof');

2.2 Term Mappings

Term mappings can be used to extend the search by adding additional search terms to the user input. When the user enters a search term, the search term is expanded, and synonyms, hypernyms, hyponyms, and so on are added. Term mappings are defined in a column table and can be changed at any time.

Syntax for creating a term mapping table via SQL:

CREATE COLUMN TABLE termmappings
(
    mapping_id    VARCHAR(32)   PRIMARY KEY,
    list_id       VARCHAR(32)   NOT NULL,
    language_code VARCHAR(2),
    term_1        NVARCHAR(200) NOT NULL,
    term_2        NVARCHAR(200) NOT NULL,
    weight        DECIMAL       NOT NULL
);

Term mappings are defined as a unidirectional replacement. For a term mapping definition of ‘term1’ -> ‘term2’, ‘term1’ is replaced with ‘term2’, but ‘term2’ is not replaced with ‘term1’. Term mappings are language-dependent.

Term mappings are stored in a column-store table with the following format:

-- and for term mappings another table:
CREATE COLUMN TABLE termmappings
(
  mapping_id    VARCHAR(32)   PRIMARY KEY,
  list_id       VARCHAR(32)   NOT NULL,
  language_code VARCHAR(2),
  term_1        NVARCHAR(255) NOT NULL,
  term_2        NVARCHAR(255) NOT NULL,
  weight        DECIMAL       NOT NULL
);

INSERT INTO termmappings VALUES('1', 'firstname', 'en', 'Michael', 'Mike', '0.9');
INSERT INTO termmappings VALUES('2', 'firstname', 'en', 'Mike', 'Michael',  '0.9');
INSERT INTO termmappings VALUES('3', 'firstname', 'en', 'Michael', 'Miky',  '0.8');
INSERT INTO termmappings VALUES('4', 'firstname', 'en', 'Miky', 'Michael',  '0.8');
INSERT INTO termmappings VALUES('5', 'lastname', 'en', 'Milly', 'Milliken', '0.9');
INSERT INTO termmappings VALUES('6', 'lastname', 'en', 'Mille', 'Milliken',  '0.9');

Step 3: Add Rule


Next Step is to create a General Project in ABAP perspective. Under that Project create a search rule set file as shown in Figure 4. After creation of search rule set, validate and activate this search rule.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Figure 4: Create a Search Rule Set

Step 4: Final Step is to perform the search operation.


New Built in function SYS.EXECUTE_SEARCH_RULE_SET is available in HANA to execute defined rule set.

Execute the custom created search rule set with below mentioned command.

CALL SYS.EXECUTE_SEARCH_RULE_SET('
<query>
 <ruleset name="ZSearch_RuleProject:Search_Rule.searchruleset" />
 <column name="FIRSTNAME">Prof. Mike</column>
 <column name="LASTNAME">Milly</column>
 <resultsetcolumn name="_SCORE" />
 <resultsetcolumn name="_RULE_ID" />
 <resultsetcolumn name="_RULE_NUMBER" />
 <resultsetcolumn name="FIRSTNAME" />
 <resultsetcolumn name="LASTNAME" />
</query>
');

This command will give user-defined result set by specifying the result set columns name in call procedure command. Figure 5 below shows the Output of call procedure.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Figure 5: Result Set

While executing the below procedure, in case the below error messages appears in SQL console:

Could not execute ‘CALL SYS.EXECUTE_SEARCH_RULE_SET(‘ <query> <ruleset …’

SAP DBTech JDBC: [258]: insufficient privilege: Not authorized

Recommendation would be to provide authorization of SYS.EXECUTE_SEARCH_RULE_SET to _SYS_REPO.

An alternate way to hold the result set is by transferring the records in result table by using below mentioned command.

CREATE COLUMN TABLE RESULT_STORE (
_SCORE FLOAT,
_RULE_ID VARCHAR(255),
"FIRSTNAME" TEXT,
"LASTNAME" TEXT,
"ADDRESS" NVARCHAR(100),
"POSTCODE" NVARCHAR(20),
"CITYNAME" NVARCHAR(100),
"COUNTRYCODE" NVARCHAR(2)
);

CALL SYS.EXECUTE_SEARCH_RULE_SET('
<query>
 <ruleset name="ZSearch_RuleProject:Search_Rule.searchruleset" />
 <resulttablename name="RESULT_STORE"/>
 <column name="FIRSTNAME">Prof. Mike</column>
 <column name="LASTNAME">Milly</column>
</query>
');
 
Select * from RESULT_STORE;

Figure 6 below shows the output by querying the result set table RESULT_STORE.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Figure 6: Result Set

Step 5. Dynamic Rule Set


There is a possibility to create a dynamic rule set by specifying the XML code in call procedure statement.

CALL SYS.EXECUTE_SEARCH_RULE_SET('
<query>
   <ruleset scoreSelection="firstRule">
      <attributeView name="TRAINING::ZAT_SEARCH_RULESET">
        <keyColumn name="ID"/>
      </attributeView>
      <termMappingsTableBased schema="SYSTEM" table="TERMMAPPINGS">
        <column name="FIRSTNAME">
          <list id="FIRSTNAME"/>
        </column>
        <column name="LASTNAME">
          <list id="LASTNAME"/>
        </column>
      </termMappingsTableBased>
      <rule name="Rule 1">
        <column minFuzziness="0.8"  name="FIRSTNAME">
          <ifMissing action="skipColumn"/>
        </column>
        <column minFuzziness="0.8"  name="LASTNAME">
          <ifMissing action="skipColumn"/>
        </column>
      </rule>
  </ruleset>
 <column name="FIRSTNAME">Mike</column>
 <column name="LASTNAME">Milly</column>
<resultsetcolumn name="_SCORE" />
 <resultsetcolumn name="_RULE_ID" />
 <resultsetcolumn name="_RULE_NUMBER" />
 <resultsetcolumn name="FIRSTNAME" />
 <resultsetcolumn name="LASTNAME" />
</query>
');

Figure 7 shows the output while specifying the dynamic rule set while calling the procedure.

SAP HANA Tutorials and Materials, SAP HANA Learning, SAP HANA Guides, SAP HANA Certifications, SAP HANA

Figure 7: Result Set

No comments:

Post a Comment