Machine Learning

The process of training machines; teaching computers to learn from data and detect patterns. It supports decision making.
Consists of exploratory analysis(10%), data cleaning (20%), feature engineering(25%), algorithm selection(10%), and model training(15%).

How to Pick ML Algorithms

Algorithms are chosen based on intuition and practical benefits, rather than math and theory.

data scientists actually do spend most their time on the earlier steps:

    - Exploring the data.
    - Cleaning the data.
    - Engineering new features.

Again, that’s because better data beats fancier algorithms.

Training the Model

datasets should be split such that the largest portion is used for training the model, and the remaining smaller portion is used to test the model. It is important to mention that models are tested for their ability to predict new, unseen data. Therefore, different data sets should be used for the purpose of testing, as to have reliable models and avoid having overfit models.

Hyperparameters

Tuning “training” the model basically means tuning the hyperparatmeters
heyperparameters are different from model parameters, in the fact that they cannot be learned directly from the training data.
Model parameters:

learned attributes that define individual models.
Hyperparameters

express “higher-level” structural settings for algorithms.

Cross-Validation

a method for getting a reliable estimate of model performance using only your training data.

Select Winning Model

Selecting the best performing model using testing datasets, according to perfromance metrics: - For regression tasks, we recommend Mean Squared Error (MSE) or Mean Absolute Error (MAE). (Lower values are better) - For classification tasks, we recommend Area Under ROC Curve (AUROC). (Higher values are better)