Model Training via Python
SQream’s Model Training via Python module extends its machine learning capabilities beyond the currently embedded Linear Regression and XGBoost algorithms. This feature lets you integrate and use a broader range of Python-based ML algorithms directly within SQream’s GPU-accelerated environment, which reduces development time for new models.
Syntax
-- Algorithm registration
REGISTER ALGORITHM 'algorithm_name'
OPTIONS
( alg_path = 'path_to_alg_file'
[, train_method := Text -- defaults to 'train'
, predict_method := Text -- defaults to 'predict'
]);
-- Create and train model
CREATE [OR REPLACE] MODEL [database.schema.]model_name
OPTIONS(model_option_list)
AS {query_statement};
model_option_list:
MODEL_PATH := Text -- path to save the model to
MODEL_TYPE := Text -- algorithm name as registered
MODEL_PARAMS := Text -- MODEL_PARAMS map of Strings
-- Inference
SELECT model_predict(
[database.schema.]model_name,
feature_col1 [,feature_column2, ...])
FROM {query_statement}; --either <table_name> or (select_query) in parentheses
-- Drop model
DROP MODEL [database.schema.]model_name;
-- Drop algorithm registration
UNREGISTER ALGORITHM 'algorithm_name';
Usage notes
The label column is the last column in the chunk’s input for training.
Distinguish between alg_path in the REGISTER ALGORITHM command and model_path in the CREATE MODEL command: the former specifies the path to the Python script, while the latter specifies the path to the model output.
The Python file referred to in alg_path should contain a function for training and a function for inference. Both should accept the input data as a cudf.DataFrame, along with a map of string parameters. The model_path parameter of the train function contains the path the model should be saved to; that path is later made available to the predict function as train_result_path.
Once registered, an algorithm persists until it is removed with the UNREGISTER ALGORITHM command.
The Python version is the one required by SQream’s prerequisites: 3.11.
Embedded algorithms cannot be unregistered.
train_method and predict_method indicate the function names in the Python file; the default values are train and predict, respectively.
Parameters for training are passed by the user and parsed at runtime by the Python code.