Model Training via Python
SQream’s Model Training via Python module extends its machine learning capabilities beyond the currently embedded Linear Regression and XGBoost algorithms. It lets you register and use a broader range of Python-based ML algorithms directly within SQream’s GPU-accelerated environment, reducing development time for new models.
Syntax
-- Algorithm registration
REGISTER ALGORITHM 'algorithm_name'
OPTIONS
( alg_path = 'path_to_alg_file'
[, train_method := Text -- defaults to 'train'
, predict_method := Text -- defaults to 'predict'
]);
-- Create and train model
CREATE [OR REPLACE] MODEL [database.schema.]model_name
OPTIONS(model_option_list)
AS {query_statement};
model_option_list:
MODEL_PATH := Text -- path to save the model to
MODEL_TYPE := Text -- algorithm name as registered
MODEL_PARAMS := Text -- MODEL_PARAMS map of Strings
-- Inference
SELECT model_predict(
[database.schema.]model_name,
feature_col1 [, feature_col2, ...])
FROM {query_statement}; --either <table_name> or (select_query) in parentheses
-- Drop model
DROP MODEL [database.schema.]model_name;
-- Drop algorithm registration
UNREGISTER ALGORITHM 'algorithm_name';
Usage notes
The label column is the last column in the chunk’s input for training.
Distinguish between alg_path in the REGISTER ALGORITHM command and model_path in the CREATE MODEL command: the former specifies the path to the Python script, while the latter specifies the path to the model output.
The Python file referred to in alg_path should contain a function for training and a function for inference. Both should accept the input data in cudf.DataFrame form and a map of string parameters. The model_path parameter in the train function contains the path the model should be saved to; the predict function later receives that path as train_result_path.
Once registered, algorithms persist until removed with the UNREGISTER ALGORITHM command.
The Python version must be compatible with SQream’s prerequisites (Python 3.11).
Embedded algorithms cannot be unregistered.
train_method and predict_method indicate the function names in the Python file; the default values are train and predict, respectively.
Training parameters are passed by the user and parsed at runtime by the Python code.
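The notes above can be illustrated with a minimal sketch of an algorithm file. It is not SQream's reference implementation: it assumes each function is called with the input data and the parameter map as its two arguments, that model_path and train_result_path arrive as keys in that map, and that model_path names a single file. The round_digits parameter and the mean-only "model" are hypothetical and exist only to keep the example short. The code touches only the columns, column selection, mean() and len() parts of the DataFrame interface, which cudf.DataFrame provides.

```python
import json
from pathlib import Path


def train(data, params):
    """Fit a trivial model that always predicts the mean of the label column.

    `data` is expected to behave like a cudf.DataFrame; `params` is a map of
    strings, so every value is converted explicitly before use.
    """
    columns = list(data.columns)
    feature_cols, label_col = columns[:-1], columns[-1]  # label is the last column

    # Hypothetical user parameter, parsed from its string form at runtime.
    round_digits = int(params.get("round_digits", "4"))

    model = {
        "features": feature_cols,
        "label_mean": round(float(data[label_col].mean()), round_digits),
    }

    # Assumption: the save location is delivered under the "model_path" key.
    Path(params["model_path"]).write_text(json.dumps(model))


def predict(data, params):
    """Load the saved model and return one prediction per input row."""
    # Assumption: the saved location is delivered under "train_result_path".
    model = json.loads(Path(params["train_result_path"]).read_text())
    return [model["label_mean"]] * len(data)
```

Because the functions are named train and predict, a file like this would match the default train_method and predict_method values, so neither option would need to be set at registration.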