Validation Functions
Validation functions provide tools to evaluate the performance of trained machine learning models against test datasets, offering metrics like Mean Square Error (MSE), Root Mean Square Error (RMSE) and Area Under the Curve (AUC).
How to Use Model Validation Functions
Use these functions in a SELECT statement to evaluate a model’s performance against a test dataset.
MSE / RMSE (for Regression)
Calculates the Root Mean Square Error for Linear Regression models.
Syntax
SELECT validation_functions.rmse(model_name, features, label) FROM test_table;
Parameters
'model_name'
: The name of your trained Linear Regression model.features
: The column(s) the model uses for prediction.labels
: The column with the true numeric values.
Example:
SELECT
validation_functions.rmse(
house_price_model,
(bedrooms, square_feet),
actual_price
)
FROM
housing_test_data;
Area under curve (for XGBoost)
Calculates the Area Under the Curve (AUC) for XGBoost binary classification models.
Syntax
SELECT validation_functions.area_under_curve(model_name, features, labels)
FROM test_table;
Parameters
'model_name'
: The name of your trained XGBoost model.features
: The column(s) the model uses for prediction.labels
: The column with the true binary labels (0 or 1).
Example:
SELECT
validation_functions.area_under_curve(
customer_churn_model,
account_age, monthly_spend,
has_churned
)
FROM customer_test_data;
Usage notes
These functions, like
SUM()
orAVG()
, return a single value for the entire dataset and cannot be used with aGROUP BY
clause.Feature and label columns must be numeric; rows containing
NULL
values will be ignored.When selecting a model type, use
rmse
for Linear Regression andarea_under_curve
for XGBoost.Using the incorrect function for a model type will result in an error.