Saturday 16 July 2016

A closer look at tables that contain LOB data type columns

Introduction


Since SPS06, SAP HANA offers a flexibility around how it manages data type LOB (Large Object Types) columns, namely Hybrid LOBs. Data types, CLOB, NCLOB and BLOB are used to store a large amount of data such as text documents and images. The current maximum size for an LOB on SAP HANA is 2GB.

In the following blog, we will go through some examples of how Hybrid LOBS are defined, specifically around the memory threshold value and the effect that has on keeping the LOB column data on disk versus allowing it to be loaded into memory.

In-memory databases like SAP HANA, ideally you would only want to load data into memory that is actually required for processing. As SAP HANA will not compress LOB columns regardless of whether it resides in disk or in-memory, it is essential that any possible compression algorithm logic (e.g. gzip) are applied at the application layer on writing/reading from the database.

As it is expected that LOB column data will be more than likely stored on disk and temporarily loaded into memory for processing (depending on memory threshold value, 1000 bytes by default), it is worth doing specific analysis and sizing on your tables that contain LOB data columns to see what works best in your environment.

Definition


The following guideline is key for the memory threshold value.

  • If <memory_threshold_value> is not provided, a hybrid lob with memory threshold 1000 is created as default.
  • If <memory_threshold_value> is provided and its LOB size is bigger than memory threshold value, LOB data is stored on disk.
  • If <memory_threshold_value> is provided and its LOB size is equal or less than memory threshold value, LOB data is stored in memory.
  • If <memory_threshold_value> is NULL, all LOB data is stored in memory.
  • If <memory_threshold_value> is 0, all LOB data is stored in disk.

Create Table with LOB column


Let's try it out, create a simple table with 3 columns

CREATE COLUMN TABLE "HOLLAS"."SH_TEST_XML" (
  "XML_ID" VARBINARY(16) CS_RAW NOT NULL ,
  "LINE_NO" NVARCHAR(10) DEFAULT '0000000000' NOT NULL ,
  "XML_STRING" NCLOB,
  PRIMARY KEY (
  "XML_ID",
  "LINE_NO"))

As we left the memory_threshold_value blank on the create, we expect the default to have the 1000 bytes value. Lets confirm.

A closer look at tables that contain LOB data type columns

(Aside, it wasn't altogether obvious what table/column I would find the value for memory_threshold, but it's always good to check table_columns or view_columns for column_name (in this case like '%MEM%THRES%'), to see what tables/views may have a column with this name).

I had a similar table with xml string values to hand, so for this example, I inserted 10K rows into our test xml table. The original XML string value had been truncated into 1kb row segments, with the last row in the group holding the overflow value as below.

A closer look at tables that contain LOB data type columns

So for our example, I wanted to unload the data, alter the threshold value to 0, reload the table and confirm that the LOB column data does not reside in memory.

unload "HOLLAS"."SH_TEST_XML";  
alter table "HOLLAS"."SH_TEST_XML" alter ("XML_STRING" NCLOB MEMORY THRESHOLD 0);  
load "HOLLAS"."SH_TEST_XML" all;  

Looking at the definition of the table post Load, the memory consumption level is as expected very small compared to overall size of the table, we presume here that the first two non-LOB data columns are loaded into memory. As we had set the memory threshold value to 0, this forced the LOB column rows of the table to move out of memory and stored on disk.

A closer look at tables that contain LOB data type columns

Now, let's set it back to the default of 1000bytes, our expectation now is that the memory consumption here will only marginally increase, based on the fact that only 37 rows out of the 10k rows had a length < 1000.

Statement 'Select x.*, length(xml_string) from "HOLLAS"."SH_TEST_XML" as x where length(xml_string) < 1000'
successfully executed in 485 ms 503 µs  (server processing time: 272 ms 18 µs)
Fetched 37 row(s) in 111 ms 384 µs (server processing time: 0 ms 299 µs)


unload "HOLLAS"."SH_TEST_XML";  
alter table "HOLLAS"."SH_TEST_XML" alter ("XML_STRING" NCLOB MEMORY THRESHOLD 1000);  
load "HOLLAS"."SH_TEST_XML" all;  

Refreshing the table definition, the memory consumption increase is marginal as expected.

A closer look at tables that contain LOB data type columns

Aside: One thing we've noticed with tables with LOB columns, the disk size space shown above is well above the initial size, it appears to increment after every alter command, as if it's keeping a copy of the table before & after, anyway not entirely sure why that is showing up like that, but not going to spend time on it now, near the end of this blog 

One more test, let's set the threshold above the max value for the LOB column size, in this case 1024bytes.

unload "HOLLAS"."SH_TEST_XML";  
alter table "HOLLAS"."SH_TEST_XML" alter ("XML_STRING" NCLOB MEMORY THRESHOLD 1024);  
load "HOLLAS"."SH_TEST_XML" all;  

As expected, the entire table is now loaded into memory, as the table definition is reflected below.

A closer look at tables that contain LOB data type columns

Just a note on the hybrid LOB container values, the idea with the container or group of LOB column values, I believe it is possible to have a small LOB table be part of the same container, and get loaded in/out of memory at the same time. I was not able to prove this out, but would welcome some documentation links on how to do same.

You can see the container value information for the tables in M_TABLE_LOB_FILES.

A closer look at tables that contain LOB data type columns

Source: scn.sap.com

No comments:

Post a Comment