Monday 14 June 2021

Piecewise Linear Trend with Automated Time Series Forecasting (APL)

If you are a user of APL time series, you probably have seen models fitting a linear trend or a quadratic trend to your data. With version 2113 the Automated Predictive Library introduces an additional method called Piecewise Linear that can detect breakpoints in your series. You don’t have to do anything new to take advantage of this functionality, the trend is detected automatically as shown in the example below.

For SAP Analytics Cloud users, note that Piecewise Linear Trend is coming with  the 2021.Q3 QRC (August release).

This article presents two ways of using APL: Python notebook, and SQL script. Let’s start with Python.

First, we connect to the HANA database.

# Connect using the HANA secure user store

from hana_ml import dataframe as hd

conn = hd.ConnectionContext(userkey='MLMDA_KEY')

You may want to check that the Automated Predictive Library on your HANA server is recent enough.

import hana_ml.algorithms.apl.apl_base as apl_base

df = apl_base.get_apl_version(conn)

v = df[df.name.eq('APL.Version.ServicePack')].iloc[0]['value']

print('APL Version is ' + v)

SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Preparation, SAP HANA Guides

Don’t forget to sort the series over time before giving it to APL.

sql_cmd = 'SELECT * FROM "APL_SAMPLES"."BOSTON_MARATHON" ORDER BY "THE_DATE"'

hdf_in = hd.DataFrame(conn, sql_cmd)

df = hdf_in.collect()

This is how the series looks like.

import matplotlib.pyplot as plt

plt.figure(figsize=(12,5))

plt.plot(df.THE_DATE, df.WINNING_TIME)

plt.title('Boston Marathon Winning Time')

plt.xlabel('Year')

plt.ylabel('Men Times in minutes')

plt.grid()

plt.show()

SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Preparation, SAP HANA Guides

We ask APL to build a time series model and make a forecast 3 years ahead.

from hana_ml.algorithms.apl.time_series import AutoTimeSeries
model = AutoTimeSeries(time_column_name= 'THE_DATE', target= 'WINNING_TIME', horizon= 3)
hdf_out = model.fit_predict(hdf_in)

And then we display the forecasted values.

df = hdf_out.collect()

import pandas as pd
df['THE_DATE'] = pd.to_datetime(df['THE_DATE'])
df = df.set_index('THE_DATE')

plt.figure(figsize=(12,5))
ax1 = df.ACTUAL.plot(color='royalblue', label='Actual')
ax2 = df.PREDICTED.plot(color='darkorange', label='Forecast', linestyle='dashed')
h1, l1 = ax1.get_legend_handles_labels()
plt.legend(h1, l1, loc=1)
plt.title('Boston Marathon Winning Time')
plt.xlabel('Year')
plt.ylabel('Men Times in minutes')
plt.grid()
plt.show()

SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Preparation, SAP HANA Guides

The forecast line shows a piecewise trend with two breakpoints. This is confirmed by the components information.

d = model.get_model_components()
components_df = pd.DataFrame(list(d.items()), columns=["Component", "Value"])
components_df.style.hide_index()

SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Preparation, SAP HANA Guides

We display some forecasting accuracy indicators from our APL model.

import numpy as np
d = model.get_performance_metrics()
# Average each indicator across the horizon time window
apm = []
for k, v in d.items():
   apm.append((k, np.mean(v)))

metric = [apm for apm in apm if apm[0] =='L1'][0][1]
print("MAE  is {:0.3f}".format(metric))
metric = [apm for apm in apm if apm[0] =='MAPE'][0][1] *100
print("MAPE is {:0.2f}%".format(metric))
metric = [apm for apm in apm if apm[0] =='L2'][0][1]
print("RMSE is {:0.3f}".format(metric))

SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Preparation, SAP HANA Guides

We are done with our Python notebook. We said we will run the same example using SQL.

SQL code to check what the version of APL is.

call SAP_PA_APL."sap.pa.apl.base::PING"(?)

Sample SQL code to build the forecasting model and query some of the output tables.

--- Input Series
create view TS_SORTED as select * from APL_SAMPLES.BOSTON_MARATHON order by "THE_DATE" asc;
--- Output Series
create local temporary table #FORECAST_OUT (
"THE_DATE" DATE,
"WINNING_TIME" DOUBLE,
"kts_1" DOUBLE );

DO BEGIN
    declare header "SAP_PA_APL"."sap.pa.apl.base::BASE.T.FUNCTION_HEADER";
    declare config "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_CONFIG_EXTENDED";   
    declare var_desc "SAP_PA_APL"."sap.pa.apl.base::BASE.T.VARIABLE_DESC_OID";      
    declare var_role "SAP_PA_APL"."sap.pa.apl.base::BASE.T.VARIABLE_ROLES_WITH_COMPOSITES_OID";      
    declare out_log   "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_LOG";      
    declare out_sum   "SAP_PA_APL"."sap.pa.apl.base::BASE.T.SUMMARY";      
    declare out_indic "SAP_PA_APL"."sap.pa.apl.base::BASE.T.INDICATORS";      
    declare out_debrief_metric "SAP_PA_APL"."sap.pa.apl.base::BASE.T.DEBRIEF_METRIC_OID";      
    declare out_debrief_property "SAP_PA_APL"."sap.pa.apl.base::BASE.T.DEBRIEF_PROPERTY_OID";      

    :header.insert(('Oid', 'Marathon'));

    :config.insert(('APL/Horizon', '3',null));
    :config.insert(('APL/TimePointColumnName', 'THE_DATE',null));

    :var_role.insert(('THE_DATE', 'input',null,null,'Marathon'));
    :var_role.insert(('WINNING_TIME', 'target',null,null,'Marathon'));

    "SAP_PA_APL"."sap.pa.apl.base::FORECAST_AND_DEBRIEF"(:header,:config,:var_desc,:var_role,'USER_APL','TS_SORTED', 'USER_APL','#FORECAST_OUT',out_log,out_sum,out_indic,out_debrief_metric,out_debrief_property);
 
select  * from "SAP_PA_APL"."sap.pa.apl.debrief.report::TimeSeries_Components"(:out_debrief_property, :out_debrief_metric);
    select  * from "SAP_PA_APL"."sap.pa.apl.debrief.report::TimeSeries_Performance"(:out_debrief_property, :out_debrief_metric) where "Partition" = 'Validation';
END;

select THE_DATE as "Date", WINNING_TIME as "Actual", round("kts_1",3) as "Forecast" 
from #FORECAST_OUT 
order by 1;

No comments:

Post a Comment