SAP HANA Text Analysis

Mining social media data for customer feedback is one of the greatest untapped opportunities for customer analysis in many organizations today.

Many twenty-first century corporations face a data problem. They have been accurately and comprehensively storing data for years, in a variety of forms such as social media posts, email, blogs, news articles, customer feedback, tweets and business documents.

The challenge is to extract meaningful information from this data without having to read every single sentence. So what counts as meaningful information? The extraction process should identify the "who", "what", "where", "when" and "how much" (among other things) in the data.
For example, you might use social media data to find out:
  • What people are saying about my brand or products?
  • How many people recommend my brand vs. advocate against it?
Text Analysis is the solution to these problems.
In this article we will explain:
  • What is Text Analysis?
  • Why is Text Analysis so important for business?
  • How does SAP HANA support text analysis?
Before looking at Text Analysis, you first need to understand the difference between structured and unstructured data.

Structured and Unstructured Data:

Structured Data:
Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets.
For example, data stored in database tables is structured data.

Structured data has the advantage of being easily entered, stored, queried and analyzed.


Unstructured Data:
The phrase "unstructured data" usually refers to information that doesn't reside in a traditional row-column database. 

Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. 


Digging through unstructured data can be cumbersome and costly. Email is a good example of unstructured data. It's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured. Other examples of unstructured data include books, documents, medical records, and social media posts.
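
To make the email example concrete, here is a minimal Python sketch using the standard-library `email` module. The message is made up for illustration. The header fields come back as named values that can be queried directly, while the body is just a block of free text.

```python
from email import message_from_string

# A made-up sample message: header lines are structured fields,
# and everything after the blank line is the free-text body.
RAW_EMAIL = """\
From: customer@example.com
To: support@example.com
Date: Mon, 15 Jan 2024 09:30:00 +0000
Subject: Delayed baggage

Hi, my bag did not arrive on flight 123 yesterday.
I would like to know when it will be delivered.
"""

msg = message_from_string(RAW_EMAIL)

# Structured part: each header is a named field (sender, recipient, date/time, subject).
structured = {field: msg[field] for field in ("From", "To", "Date", "Subject")}

# Unstructured part: the body is a plain string; finding the flight number
# or the customer's actual request inside it requires text analysis.
body = msg.get_payload()

print(structured)
print(body)
```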

Why is unstructured data so important for business?
Experts estimate that 80 to 90 percent of the data in a typical organization is unstructured, and the amount of unstructured data in enterprises is growing significantly, often many times faster than structured data.

The challenge lies in extracting meaningful information from that unstructured data.

What is Text Analysis?

Text Analysis is the process of analyzing unstructured text, extracting relevant information and then transforming that information into structured information that can be leveraged in different ways.

Text Analysis combines Natural Language Processing, which understands the text linguistically, with statistical techniques that refine the results.
With the help of text analysis we can model and structure the information content of unstructured data for the purpose of business analysis, research and investigation.


(Figure: Mapping business needs to Text Analysis)

(Figure: Example of meaning extraction from a sentence)

A few important techniques are used in Text Analysis:
  • Full Text Search
  • Full Text Indexing
  • Fuzzy Search
Let's look at them one by one.

Full Text Search:

Full-text search is designed to perform linguistic (language-based) searches against text and documents stored in your database, and its primary function is to make such searches efficient.
In a full-text search, the search engine examines all of the words in every stored document as it tries to match search criteria (text specified by a user). 

Full Text Indexing:

When dealing with a small number of documents, the full-text search engine can scan the contents of the documents directly with each query, a strategy called "serial scanning". This is what some rudimentary tools, such as grep, do when searching.
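
A minimal Python sketch of serial scanning, assuming a naive word match (lower-casing and splitting on word characters, with no stemming or other linguistic processing). The function names and sample documents are hypothetical. Note that every word of every document is examined on each query, so the cost grows with the total amount of text.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def serial_scan(documents, query):
    """Return ids of documents that contain every query word.

    Each call re-reads and re-tokenizes every document.
    """
    query_words = set(tokenize(query))
    hits = []
    for doc_id, text in documents.items():
        if query_words <= set(tokenize(text)):
            hits.append(doc_id)
    return hits


docs = {
    1: "SAP HANA supports text analysis",
    2: "Fuzzy search finds approximate matches",
    3: "Full text search in SAP HANA",
}
print(serial_scan(docs, "SAP HANA"))  # [1, 3]
```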

However, when the number of documents to search is potentially large, the problem of full-text search is often divided into two tasks: indexing and searching. 

The indexing stage will scan the text of all the documents and build a list of search terms (often called an index). In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents. 
The indexer will make an entry in the index for each term or word found in a document, and possibly note its relative position within the document. 
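
The two-stage approach can be sketched with an inverted index. In this minimal Python example (the function names and sample documents are hypothetical, and tokenizing is a plain word split), the indexing stage records each term's positions per document, and the search stage answers queries from the index alone without rereading the documents.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Indexing stage: map each term to {doc_id: [positions in that doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def search(index, query):
    """Search stage: return sorted ids of docs containing every query term.

    Only the index is consulted; the original documents are not read.
    """
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return sorted(result)


docs = {
    1: "SAP HANA supports text analysis",
    2: "Text search and fuzzy search",
    3: "Full text search in SAP HANA",
}
index = build_index(docs)
print(search(index, "text search"))  # [2, 3]
print(dict(index["search"]))         # {2: [1, 4], 3: [2]}
```

Storing positions is what makes later refinements such as phrase or proximity matching possible, at the cost of a larger index.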

Conceptually, full-text indexes support searching on columns in the same way that indexes support searching through books. 

Fuzzy Search:

Fuzzy search, also known as approximate string matching, is the technique of finding strings that match a pattern approximately rather than exactly.
It finds matches even when users misspell words or enter only part of a word in the search.
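
One common way to measure how closely two strings match is edit (Levenshtein) distance: the number of single-character insertions, deletions or substitutions needed to turn one into the other. The sketch below is illustrative only; it is not SAP HANA's own fuzzy scoring, and the `max_distance` threshold of 2 is an arbitrary assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidates, max_distance=2):
    """Return candidates within max_distance edits of term (case-insensitive)."""
    term = term.lower()
    return [c for c in candidates if levenshtein(term, c.lower()) <= max_distance]


print(levenshtein("tutorl", "tutorial"))                          # 2
print(fuzzy_match("Tutorl", ["Tutorial", "Tutor", "Training"]))  # ['Tutorial', 'Tutor']
```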

A Real World Example:
If a user types "SAP HANA Tutorl" into Yahoo or Google (both of which use fuzzy matching), a list of hits is returned along with the question "Did you mean SAP HANA Tutorial?"
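
A "Did you mean" suggestion can be sketched with Python's standard-library `difflib.get_close_matches`, which ranks candidates by a similarity ratio (a different measure from edit distance). The list of known queries and the 0.8 cutoff are assumptions for illustration; this is not a description of how Yahoo or Google actually generate suggestions.

```python
import difflib

# Hypothetical vocabulary of known queries; a real engine would draw on
# its index or query logs.
KNOWN_QUERIES = ["SAP HANA Tutorial", "SAP HANA Training", "SAP BW Tutorial"]


def did_you_mean(query, known=KNOWN_QUERIES, cutoff=0.8):
    """Suggest the most similar known query, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(did_you_mean("SAP HANA Tutorl"))  # SAP HANA Tutorial
```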


How Businesses Can Leverage Text Analysis:

All that tech talk is fine, but how can Text Analysis help companies make more money?

Below are a few real-world examples.

Automate the process of customer response:
An airline company wanted to automate the process of responding to customer requests sent via email. Using SAP Text Analysis technology, it can classify incoming emails and respond to requests accurately and efficiently. This also helps it reduce call-center costs.

Automate document categorization, search and retrieval:
A financial services company uses SAP Text Analysis technology as the backbone of its automatic content enrichment platform. It uses Text Analysis to discover metadata in incoming text data feeds, making document categorization, search and retrieval a seamless process.

Find public intent to buy a product from Twitter:
Suppose your company is planning to launch a new product (say, a smartphone or a bike) in the market.
You can run text analysis on Twitter data to find out:
  • How many people are showing interest in buying this product?
  • How frequently are people talking about this new product?
  • Are there any negative comments or rumors going around about this product?
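
As a rough illustration of these three questions, the sketch below counts tweets that mention a product, express purchase intent or contain negative words. It assumes the tweets are already collected as plain strings; the product name "PhoneX" and the keyword lists are hypothetical, and simple substring matching stands in for real linguistic analysis.

```python
# Hypothetical keyword lists; plain substring matching stands in for
# the linguistic analysis a real text-analysis engine would perform.
INTENT_PHRASES = ["want to buy", "going to buy", "pre-order"]
NEGATIVE_WORDS = ["overpriced", "broken", "fake", "disappointed"]


def analyze_tweets(tweets, product):
    """Count product mentions, purchase-intent tweets and negative tweets."""
    product = product.lower()
    # Keep only tweets that mention the product (case-insensitive).
    mentions = [t.lower() for t in tweets if product in t.lower()]
    intent = sum(any(p in t for p in INTENT_PHRASES) for t in mentions)
    negative = sum(any(w in t for w in NEGATIVE_WORDS) for t in mentions)
    return {"mentions": len(mentions), "purchase_intent": intent, "negative": negative}


tweets = [
    "I really want to buy the new PhoneX!",
    "PhoneX looks overpriced to me",
    "Just had lunch",
    "Going to pre-order the PhoneX tomorrow",
]
print(analyze_tweets(tweets, "PhoneX"))
# {'mentions': 3, 'purchase_intent': 2, 'negative': 1}
```
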
Top Business Use Cases of Text Analysis:
  • Brand/ Product/ Reputation Management
    • Market research and social media monitoring, i.e., what are people saying about my brand or products?
  • Voice of the Customer/ Customer Experience Management
    • Do I need to step in and offer customer service?
    • How many people recommend my brand vs. advocate against it?
  • Search, Information Access, or Question Answering
    • Which bloggers are negative towards US policies?
    • Which hotels in India get great reviews for their room service?
  • Competitive Intelligence
    • What competing products are people considering and why?
    • Is competitors' media spend generating purchase intent?
Implementation of Text Analysis in SAP HANA:

Text Analysis is one of the coolest features of SAP HANA.
Text analysis is supported starting with SAP HANA SP05.

SAP HANA Text Analysis has market-leading, out-of-the-box predefined entity types that are packaged as part of the platform. Looking at a clause, sentence, paragraph, or document, the technology can identify the "who", "what", "where", "when" and "how much" and classify it accordingly. 
For example, in the sentence "India celebrates Independence Day on 15th August", the analysis can identify the country, the holiday and the month using HANA's predefined core extraction.
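
To show what turning unstructured text into structured information means, here is a toy dictionary-based extractor in Python that emits (token, type, offset) rows for the example sentence. The dictionaries and type names are hypothetical and tiny; this does not reproduce SAP HANA's predefined core extraction, which relies on much richer linguistic resources.

```python
import re

# Hypothetical, tiny dictionaries and type names for illustration only.
ENTITY_DICTIONARY = {
    "india": "COUNTRY",
    "independence day": "HOLIDAY",
}
MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]


def extract_entities(sentence):
    """Return (token, type, offset) rows found in the sentence, ordered by offset."""
    text = sentence.lower()  # case-insensitive matching; offsets stay valid for ASCII text
    rows = []
    for phrase, entity_type in ENTITY_DICTIONARY.items():
        for match in re.finditer(rf"\b{re.escape(phrase)}\b", text):
            rows.append((sentence[match.start():match.end()], entity_type, match.start()))
    month_pattern = r"\b(?:" + "|".join(MONTHS) + r")\b"
    for match in re.finditer(month_pattern, text):
        rows.append((sentence[match.start():match.end()], "MONTH", match.start()))
    return sorted(rows, key=lambda row: row[2])


for row in extract_entities("India celebrates Independence day on 15th August"):
    print(row)
# ('India', 'COUNTRY', 0)
# ('Independence day', 'HOLIDAY', 17)
# ('August', 'MONTH', 42)
```

Each row is now structured data that can be stored in a table and queried like any other.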

If you have read this far, you should have a clear understanding of Text Analysis.
If you have any doubts or questions, please leave a comment.
