Introduction

Machine learning capabilities inside MySQL HeatWave allow you to build, train, and evaluate models directly within the database environment. Traditionally, model evaluation required exporting data into external tools such as Python or specialized ML platforms. This approach often introduced additional complexity, data movement overhead, and security concerns.

HeatWave ML removes that friction by enabling full machine learning workflows, from training through evaluation, directly inside MySQL without external tools such as Python.

In my previous blog, How to Use ML_PREDICT in MySQL AI, we explored how to generate predictions directly within MySQL using trained models. However, generating predictions is only part of the machine learning lifecycle.

With MySQL HeatWave ML, the entire lifecycle can be handled in one place:

  • Training models
  • Generating predictions
  • Evaluating model performance

Among these, evaluation is a critical step because it determines whether a model can be trusted in real-world scenarios. A model that produces predictions without validation can lead to inaccurate decisions.

For the evaluation step, MySQL provides the sys.ML_SCORE procedure, which lets you measure model performance directly within the database.

MySQL AI Machine Learning Lifecycle

The complete machine learning workflow in MySQL follows a structured pipeline:

Raw Data → ML_TRAIN → Model → ML_PREDICT → Predictions → ML_SCORE → Evaluation

Each step has a defined role:

  • ML_TRAIN builds the model using historical data
  • ML_PREDICT generates predictions based on the model
  • ML_SCORE evaluates how accurate those predictions are

This lifecycle ensures that models are not only built but also validated before being used in production.

What is sys.ML_SCORE?

The sys.ML_SCORE procedure is a built-in MySQL HeatWave ML routine designed to evaluate machine learning models.

It performs the following operations:

  • Takes a test dataset containing actual (ground truth) values
  • Uses the trained model to generate predictions
  • Compares predicted values with actual values
  • Calculates a performance score based on the selected metric

This evaluation process helps determine how well the model generalizes to unseen data.

Why Model Evaluation is Important

Model evaluation is not optional; it is essential for ensuring reliability.

Without evaluation:

  • The accuracy of predictions cannot be verified
  • Errors remain hidden
  • Poor models may be deployed into production

Even a model with high accuracy can fail in critical scenarios. For example, in classification problems with imbalanced data, accuracy alone may be misleading.

ML_SCORE provides deeper insights through multiple evaluation metrics, allowing you to assess model quality from different perspectives.

Syntax of sys.ML_SCORE

CALL sys.ML_SCORE(
    'schema_name.table_name',
    'target_column_name',
    @model_handle,
    'metric_name',
    @score,
    NULL
);

SELECT @score AS model_score;

Explanation of Parameters:

  • schema_name.table_name → Test dataset containing the ground-truth labels
  • target_column_name → Column holding the actual values
  • @model_handle → Model handle returned by ML_TRAIN
  • metric_name → Evaluation metric (accuracy, r2, etc.)
  • @score → Session variable that receives the computed score
  • NULL → Optional JSON options; NULL uses the defaults

Supported Evaluation Metrics

Classification Metrics

  • accuracy
  • balanced_accuracy
  • precision / precision_macro / precision_micro / precision_weighted
  • recall / recall_macro / recall_micro / recall_weighted
  • f1 / f1_macro / f1_micro / f1_weighted

Regression Metrics

  • r2
  • neg_mean_squared_error
  • neg_mean_absolute_error
  • neg_mean_squared_log_error
  • neg_median_absolute_error

These metrics provide a comprehensive understanding of model performance depending on the problem type.

Real-Time Example: Loan Approval Prediction (Classification)

Business Scenario

A financial institution wants to predict whether a loan application should be approved or rejected based on customer attributes such as income, credit score, and loan history.

Step 1: Train the Model

Before evaluating a model, it must first be trained.

CALL sys.ML_TRAIN(
    'bank_data.loan_train',
    'Approved',
    JSON_OBJECT('task', 'classification'),
    @loan_model
);

Explanation

  • The model is trained using historical loan data
  • The target column is Approved
  • The task is defined as classification
  • The trained model reference is stored in @loan_model

Step 2: Evaluate the Model Using ML_SCORE

Once the model is trained, the next step is evaluation using a test dataset.

CALL sys.ML_SCORE(
    'bank_data.loan_test',
    'Approved',
    @loan_model,
    'accuracy',
    @accuracy_score,
    NULL
);

SELECT @accuracy_score AS model_accuracy;

Sample Output

model_accuracy
--------------
0.872

Interpretation

An accuracy of 0.872 means that the model correctly predicts outcomes for 87.2% of the test data.

However, accuracy alone does not provide a complete picture, especially when data is imbalanced.

Improved Evaluation Using balanced_accuracy

CALL sys.ML_SCORE(
    'bank_data.loan_test',
    'Approved',
    @loan_model,
    'balanced_accuracy',
    @bal_score,
    NULL
);

SELECT @bal_score AS balanced_accuracy;

Balanced accuracy accounts for class imbalance and provides a more reliable evaluation.
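To build intuition for the difference, here is a small Python sketch comparing the two metrics on a deliberately imbalanced, made-up label set (illustrative only, not HeatWave internals):

```python
# Hypothetical imbalanced test set: 9 rejected loans, 1 approved loan.
# A model that always predicts "rejected" looks strong on plain accuracy
# but is useless for finding approvable applications.
actual    = ["rejected"] * 9 + ["approved"]
predicted = ["rejected"] * 10

def accuracy(y_true, y_pred):
    # Fraction of rows predicted correctly, regardless of class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall: each class counts equally,
    # no matter how many rows it has.
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

print(accuracy(actual, predicted))           # 0.9 -- looks strong
print(balanced_accuracy(actual, predicted))  # 0.5 -- exposes the bias
```

The always-reject model scores 90% accuracy but only 50% balanced accuracy, because it never finds the minority class at all.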

Real-Time Example: House Price Prediction (Regression)

Business Scenario

A real estate platform wants to predict house prices based on features such as location, size, and number of rooms.

Evaluate Regression Model

CALL sys.ML_SCORE(
    'real_estate.house_test',
    'price',
    @price_model,
    'r2',
    @r2_score,
    NULL
);

SELECT @r2_score AS r2_score;

Interpretation

An R² score of, say, 0.89 would indicate that the model explains 89% of the variance in house prices, which reflects strong predictive performance.
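For reference, R² compares the model's squared error against a baseline that always predicts the mean. A minimal Python sketch with made-up prices (not the actual house data):

```python
def r2_score(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    # 1.0 is a perfect fit; 0.0 is no better than predicting the mean;
    # negative values are worse than the mean baseline.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

actual    = [300_000, 450_000, 250_000, 400_000]
predicted = [310_000, 440_000, 260_000, 390_000]
print(round(r2_score(actual, predicted), 4))  # -> 0.984
```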

Evaluate Error Metrics

CALL sys.ML_SCORE(
    'real_estate.house_test',
    'price',
    @price_model,
    'neg_mean_absolute_error',
    @mae,
    NULL
);

SELECT @mae AS mean_absolute_error;

Error metrics help quantify prediction deviation and provide additional insight into model accuracy.
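Note the neg_ prefix on these metric names: the error is sign-flipped so that "higher is better" holds uniformly across all metrics (the same convention scikit-learn uses for its scorer names). A small Python sketch with hypothetical prices shows how to read such a score:

```python
def neg_mean_absolute_error(y_true, y_pred):
    # Average absolute deviation, sign-flipped: a perfect model scores 0,
    # and worse models score increasingly negative.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return -mae

actual    = [200_000, 150_000, 320_000]
predicted = [210_000, 140_000, 300_000]
score = neg_mean_absolute_error(actual, predicted)
print(score)   # about -13333.33
print(-score)  # flip the sign to read it in price units: ~13,333 off on average
```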

Understanding Evaluation Metrics

  • Accuracy
    Measures the overall correctness of predictions.
  • Precision
    Indicates how many predicted positive values are actually correct.
  • Recall
    Measures how many actual positive cases are correctly identified by the model.
  • F1 Score
    Balances precision and recall, making it useful for evaluating models on imbalanced datasets.
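
The definitions above can be sketched in a few lines of Python for the binary case (hypothetical labels, with "approved" as the positive class):

```python
def precision_recall_f1(y_true, y_pred, positive="approved"):
    # Binary-case definitions of the metrics described above.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)  # of predicted positives, how many are right
    recall    = tp / (tp + fn)  # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

actual    = ["approved", "approved", "rejected", "rejected", "approved"]
predicted = ["approved", "rejected", "rejected", "approved", "approved"]
print(precision_recall_f1(actual, predicted))  # each value is 2/3 here
```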

Practical Use Cases

Banking & Finance

  • Loan approval prediction
  • Credit risk analysis

E-commerce

  • Customer churn prediction
  • Recommendation systems

Healthcare

  • Disease prediction
  • Risk assessment models

Manufacturing

  • Predictive maintenance
  • Failure prediction

Best Practices for Model Evaluation

  • Always use a separate test dataset
  • Evaluate models using multiple metrics
  • Monitor performance regularly
  • Retrain models when accuracy drops
  • Store evaluation results for tracking

Final Thoughts

Model evaluation is an important step in the machine learning lifecycle. Without it, we cannot know if a model is reliable.

MySQL HeatWave ML makes this simple using sys.ML_SCORE, allowing you to evaluate models directly inside the database.

ML_TRAIN → ML_PREDICT → ML_SCORE

This approach reduces data movement, improves performance, and keeps everything secure and easy to manage. A model that is trained can give predictions, but a model that is evaluated can be trusted.
