Introduction
Machine learning capabilities inside MySQL HeatWave allow you to build, train, and evaluate models directly within the database environment. Traditionally, model evaluation required exporting data into external tools such as Python or specialized ML platforms. This approach often introduced additional complexity, data movement overhead, and security concerns.
MySQL HeatWave eliminates the need for such external tools, enabling the full machine learning workflow directly inside the database.
In my previous blog, How to Use ML_PREDICT in MySQL AI, we explored how to generate predictions directly within MySQL using trained models. However, generating predictions is only part of the machine learning lifecycle.
With MySQL HeatWave ML, the entire lifecycle can be handled in one place:
- Training models
- Generating predictions
- Evaluating model performance
Among these, evaluation is a critical step because it determines whether a model can be trusted in real-world scenarios. A model that produces predictions without validation can lead to inaccurate decisions.
To make evaluation straightforward, MySQL provides the sys.ML_SCORE procedure, which lets you measure model performance directly within the database.
MySQL AI Machine Learning Lifecycle

The complete machine learning workflow in MySQL follows a structured pipeline:
Raw Data → ML_TRAIN → Model → ML_PREDICT → Predictions → ML_SCORE → Evaluation
Each step has a defined role:
- ML_TRAIN builds the model using historical data
- ML_PREDICT generates predictions based on the model
- ML_SCORE evaluates how accurate those predictions are
This lifecycle ensures that models are not only built but also validated before being used in production.
What is sys.ML_SCORE?
The sys.ML_SCORE procedure is a built-in MySQL HeatWave ML routine designed to evaluate machine learning models.
It performs the following operations:
- Takes a test dataset containing actual (ground truth) values
- Uses the trained model to generate predictions
- Compares predicted values with actual values
- Calculates a performance score based on the selected metric
This evaluation process helps determine how well the model generalizes to unseen data.
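Conceptually, the comparison ML_SCORE performs is simple: line up predictions against ground-truth labels and reduce them to a single number. A minimal Python sketch of the accuracy case, using hypothetical loan-approval labels (ML_SCORE does this internally on HeatWave):

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical labels: 1 = approved, 0 = rejected
actual    = [1, 0, 1, 1, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 1, 1]
print(accuracy(actual, predicted))  # 6 of 8 match -> 0.75
```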
Why Model Evaluation is Important
Model evaluation is not optional; it is essential for ensuring reliability.
Without evaluation:
- The accuracy of predictions cannot be verified
- Errors remain hidden
- Poor models may be deployed into production
Even a model with high accuracy can fail in critical scenarios. For example, in classification problems with imbalanced data, accuracy alone may be misleading.
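The imbalance problem is easy to demonstrate. In this hypothetical Python sketch, a model that always predicts "rejected" scores 90% accuracy on a 9:1 dataset, while balanced accuracy (the mean of per-class recall) exposes that it never detects the minority class:

```python
def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def balanced_accuracy(actual, predicted):
    """Mean of per-class recall, so minority classes count equally."""
    classes = set(actual)
    recalls = []
    for c in classes:
        idx = [i for i, a in enumerate(actual) if a == c]
        recalls.append(sum(predicted[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced data: 9 rejections, 1 approval.
actual    = [0] * 9 + [1]
predicted = [0] * 10          # model always predicts "rejected"
print(accuracy(actual, predicted))           # 0.9 (looks good)
print(balanced_accuracy(actual, predicted))  # 0.5 (reveals the problem)
```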
ML_SCORE provides deeper insights through multiple evaluation metrics, allowing you to assess model quality from different perspectives.
Syntax of sys.ML_SCORE
CALL sys.ML_SCORE('schema_name.table_name', 'target_column_name', @model_handle, 'metric_name', @score, NULL);
SELECT @score AS model_score;
Explanation of Parameters:
- schema_name.table_name → Test dataset
- target_column_name → Actual values (ground truth)
- @model_handle → Model generated using ML_TRAIN
- metric_name → Evaluation metric (accuracy, r2, etc.)
- @score → Output variable storing the result
- NULL → Optional JSON options; pass NULL to use the defaults
Supported Evaluation Metrics
Classification Metrics
- accuracy
- balanced_accuracy
- precision / precision_macro / precision_micro / precision_weighted
- recall / recall_macro / recall_micro / recall_weighted
- f1 / f1_macro / f1_micro / f1_weighted
Regression Metrics
- r2
- neg_mean_squared_error
- neg_mean_absolute_error
- neg_mean_squared_log_error
- neg_median_absolute_error
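The neg_ prefix means the error is reported negated, so that higher values are always better across every metric. The two most common regression metrics can be sketched in Python with hypothetical house prices (in thousands):

```python
def r2(actual, predicted):
    """Coefficient of determination: share of variance the model explains."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def neg_mean_absolute_error(actual, predicted):
    # Negated so "higher is better" holds, matching the neg_ naming.
    return -sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical prices (in thousands)
actual    = [200, 250, 300, 350]
predicted = [210, 240, 310, 340]
print(r2(actual, predicted))                        # 0.968
print(neg_mean_absolute_error(actual, predicted))   # -10.0
```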
These metrics provide a comprehensive understanding of model performance depending on the problem type.
Real-Time Example: Loan Approval Prediction (Classification)
Business Scenario
A financial institution wants to predict whether a loan application should be approved or rejected based on customer attributes such as income, credit score, and loan history.
Step 1: Train the Model
Before evaluating a model, it must first be trained.
CALL sys.ML_TRAIN( 'bank_data.loan_train', 'Approved', JSON_OBJECT('task', 'classification'), @loan_model);
Explanation
- The model is trained using historical loan data
- The target column is Approved
- The task is defined as classification
- The trained model reference is stored in @loan_model
Step 2: Evaluate the Model Using ML_SCORE
Once the model is trained, the next step is evaluation using a test dataset.
CALL sys.ML_SCORE('bank_data.loan_test', 'Approved', @loan_model, 'accuracy', @accuracy_score, NULL);
SELECT @accuracy_score AS model_accuracy;
Sample Output
model_accuracy
--------------
0.872
Interpretation
An accuracy of 0.872 means that the model correctly predicts outcomes for 87.2% of the test data.
However, accuracy alone does not provide a complete picture, especially when data is imbalanced.
Improved Evaluation Using balanced_accuracy
CALL sys.ML_SCORE('bank_data.loan_test', 'Approved', @loan_model, 'balanced_accuracy', @bal_score, NULL);
SELECT @bal_score AS balanced_accuracy;
Balanced accuracy accounts for class imbalance and provides a more reliable evaluation.
Real-Time Example: House Price Prediction (Regression)
Business Scenario
A real estate platform wants to predict house prices based on features such as location, size, and number of rooms.
Evaluate Regression Model
CALL sys.ML_SCORE('real_estate.house_test', 'price', @price_model, 'r2', @r2_score, NULL);
SELECT @r2_score AS r2_score;
Interpretation
For example, an R² score of 0.89 would indicate that the model explains 89% of the variance in house prices, which reflects strong predictive performance.
Evaluate Error Metrics
CALL sys.ML_SCORE('real_estate.house_test', 'price', @price_model, 'neg_mean_absolute_error', @mae, NULL);
SELECT @mae AS mean_absolute_error;
Error metrics help quantify prediction deviation and provide additional insight into model accuracy.
Understanding Evaluation Metrics

- Accuracy → Measures the overall correctness of predictions.
- Precision → Indicates how many predicted positive values are actually correct.
- Recall → Measures how many actual positive cases are correctly identified by the model.
- F1 Score → Balances precision and recall, making it useful for evaluating models on imbalanced datasets.
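These four metrics all derive from the same counts of true positives, false positives, and false negatives. A small Python sketch with hypothetical labels makes the relationships concrete:

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    precision = tp / (tp + fp)              # correct among predicted positives
    recall = tp / (tp + fn)                 # found among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical labels: 1 = positive class
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
print(precision_recall_f1(actual, predicted))  # (0.75, 0.75, 0.75)
```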
Practical Use Cases
Banking & Finance
- Loan approval prediction
- Credit risk analysis
E-commerce
- Customer churn prediction
- Recommendation systems
Healthcare
- Disease prediction
- Risk assessment models
Manufacturing
- Predictive maintenance
- Failure prediction
Best Practices for Model Evaluation
- Always use a separate test dataset
- Evaluate models using multiple metrics
- Monitor performance regularly
- Retrain models when accuracy drops
- Store evaluation results for tracking
Final Thoughts
Model evaluation is an essential step in the machine learning lifecycle; without it, there is no way to know whether a model is reliable.
MySQL HeatWave ML makes this simple using sys.ML_SCORE, allowing you to evaluate models directly inside the database.
ML_TRAIN → ML_PREDICT → ML_SCORE
This approach reduces data movement, improves performance, and keeps everything secure and easy to manage. A model that is trained can give predictions, but a model that is evaluated can be trusted.