
Explanation:
The correct answer is: validation.
In the Microsoft Azure AI Fundamentals (AI-900) study materials, a key concept in machine learning model development is splitting data into subsets for training, validation, and testing. A randomly extracted subset of data from a dataset is most commonly used for validation - that is, for evaluating the performance of the model during or after training.
Here's how this process works:
* Training set - This portion of the dataset is used to train the machine learning model. The model learns patterns, relationships, and parameters from this data.
* Validation set - This is a randomly selected subset (separate from the training data) used to fine-tune model hyperparameters and evaluate how well the model generalizes to unseen data. It helps detect overfitting - when the model performs well on training data but poorly on new data.
* Test set - A final, untouched dataset used to measure the model's real-world performance after all training and tuning are complete.
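The three-way split described above can be sketched as a random shuffle of sample indices. This is a minimal illustration using NumPy; the dataset size (50 samples) and the 60/20/20 split ratios are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the split is reproducible
n_samples = 50

# Randomly permute the sample indices, then carve the permutation
# into training (60%), validation (20%), and test (20%) subsets.
indices = rng.permutation(n_samples)
train_idx = indices[:30]
val_idx = indices[30:40]
test_idx = indices[40:]

print(len(train_idx), len(val_idx), len(test_idx))  # → 30 10 10
```

Because the indices are shuffled before slicing, each subset is a random sample of the full dataset, and no sample appears in more than one subset.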
By reserving a random subset for validation, data scientists ensure that the model's performance metrics reflect generalization, not memorization of the training data.
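As a small illustration of that point, overfitting typically shows up as a gap between training and validation metrics. The accuracy values and threshold below are invented purely for demonstration:

```python
# Hypothetical accuracy numbers (invented for illustration): a model that
# memorizes the training data scores high there but poorly on validation.
train_accuracy = 0.99
val_accuracy = 0.72

gap = train_accuracy - val_accuracy
overfitting_suspected = gap > 0.10  # threshold chosen arbitrarily here

print(f"gap={gap:.2f}, overfitting suspected: {overfitting_suspected}")
```

A small gap suggests the validation metrics reflect genuine generalization; a large gap is the memorization warning sign described above.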
Let's review the incorrect options:
* Algorithms - These are the mathematical frameworks or methods used to build models (e.g., decision trees, neural networks). They are not data subsets.
* Features - These are input variables (attributes) used by the model, not randomly selected data subsets.
* Labels - These are target values or outcomes the model predicts; again, not data subsets.
Therefore, in alignment with Azure AI-900's machine learning fundamentals, the correct completion is:
"A randomly extracted subset of data from a dataset is commonly used for validation of the model."