Lesson 6: Supervised Learning Models

Supervised learning is the most widely used machine learning paradigm: a model learns from labeled examples so it can predict labels for new inputs. In this lesson, you'll work through four key algorithms (linear regression, logistic regression, decision trees, and random forests), each suited to different problem types.

Linear Regression

Linear regression is one of the simplest supervised learning algorithms. It fits a straight line (or, with several features, a hyperplane) through the data to predict continuous values. The model assumes a linear relationship between the input features and the output.

When to use: Predicting continuous values such as house prices, temperature, or sales. Works best when the relationship between the features and the target is roughly linear.
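To make "fitting a straight line" concrete, here is a minimal sketch of ordinary least squares for a single feature, written in plain C# without ML.NET. The `FitLine` helper and the data are made up for illustration; the points lie exactly on y = 2x + 1, so the fit should recover a slope of 2 and an intercept of 1.

```csharp
using System;
using System.Linq;

// Fit y ≈ slope * x + intercept by ordinary least squares (single feature).
// FitLine is a hypothetical helper for this sketch, not an ML.NET API.
(double Slope, double Intercept) FitLine(double[] x, double[] y)
{
    double meanX = x.Average();
    double meanY = y.Average();

    double covariance = 0.0, varianceX = 0.0;
    for (int i = 0; i < x.Length; i++)
    {
        covariance += (x[i] - meanX) * (y[i] - meanY);
        varianceX += (x[i] - meanX) * (x[i] - meanX);
    }

    // slope = cov(x, y) / var(x); the line passes through (meanX, meanY).
    double slope = covariance / varianceX;
    return (slope, meanY - slope * meanX);
}

// Made-up data lying exactly on y = 2x + 1.
double[] squareMeters = { 1, 2, 3, 4 };
double[] price = { 3, 5, 7, 9 };

var fit = FitLine(squareMeters, price);
Console.WriteLine($"slope = {fit.Slope}, intercept = {fit.Intercept}");
```

Real datasets have several features and noisy targets, which is where a library trainer becomes the practical choice.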

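// Linear Regression with ML.NET
// Assumes the Microsoft.ML package and a HouseData class (hypothetical, not shown)
// that maps the CSV columns to a float Label and a float[] Features vector.
using System;
using Microsoft.ML;

var mlContext = new MLContext();
var data = mlContext.Data.LoadFromTextFile<HouseData>(
    "prices.csv", separatorChar: ',', hasHeader: true);
var splitData = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// Rescale the features to a common range, then train a linear model with SDCA
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .Append(mlContext.Regression.Trainers.Sdca());
var model = pipeline.Fit(splitData.TrainSet);

// Score the held-out 20% and report regression metrics
var predictions = model.Transform(splitData.TestSet);
var metrics = mlContext.Regression.Evaluate(predictions);
Console.WriteLine($"R²: {metrics.RSquared:F4}");  // Higher is better
Console.WriteLine($"RMSE: {metrics.RootMeanSquaredError:F4}");  // Lower is better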

Logistic Regression

Logistic regression adapts linear regression to binary classification. Instead of predicting a value directly, it predicts the probability that an example belongs to the positive class (1 rather than 0). A sigmoid function transforms the linear output into a probability between 0 and 1.

When to use: Binary classification like spam detection, disease diagnosis, or churn prediction.
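The sigmoid step can be sketched in a few lines of plain C#. The weights, bias, and feature values below are invented for illustration and do not come from a trained model; the point is only that a linear score of any size is squashed into the range 0 to 1 and then compared against a threshold.

```csharp
using System;

// Logistic (sigmoid) function: maps any real number into the interval (0, 1).
double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

// Hypothetical weights and bias for a two-feature classifier.
double[] weights = { 2.0, -1.0 };
double bias = -0.5;
double[] features = { 1.0, 1.0 };

// Linear score, computed the same way as a linear regression prediction.
double score = bias;
for (int i = 0; i < weights.Length; i++)
    score += weights[i] * features[i];

double probability = Sigmoid(score);        // estimated probability of class 1
bool predictedLabel = probability >= 0.5;   // common default decision threshold
Console.WriteLine($"score = {score}, P(class 1) = {probability:F4}, label = {predictedLabel}");
```

A score of 0 maps to a probability of exactly 0.5, so the sign of the linear score decides the class when the threshold is 0.5.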

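// Binary Classification with Logistic Regression
// Assumes mlContext and splitData are set up as in the linear regression example,
// but from a dataset whose Label column is Boolean.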
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .Append(mlContext.BinaryClassification.Trainers
        .SdcaLogisticRegression("Label", "Features"));
var model = pipeline.Fit(splitData.TrainSet);

var predictions = model.Transform(splitData.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions);
Console.WriteLine($"Accuracy: {metrics.Accuracy:F4}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F4}");  // 1.0 = Perfect

Decision Trees

Decision trees learn by recursively splitting the data on the feature values that best separate the classes. The result is a tree of yes/no questions that ends in a prediction at each leaf, which makes the model easy to understand and visualize.

Advantages: Interpretable, handles non-linear relationships, requires little preprocessing

Disadvantages: Prone to overfitting, and unstable (small changes in the training data can produce a very different tree)
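One common way to decide which split "best separates" the classes is Gini impurity; the lesson does not name a specific criterion, so treat this as one illustrative choice. The sketch below, in plain C# with made-up labels, compares the impurity of a node before a candidate split with the size-weighted impurity of the two children. A larger drop means a better split.

```csharp
using System;
using System.Linq;

// Gini impurity: 1 - sum over classes of p_k^2. A value of 0 means the node is pure.
double Gini(int[] labels)
{
    if (labels.Length == 0) return 0.0;
    return 1.0 - labels
        .GroupBy(l => l)
        .Sum(g => Math.Pow((double)g.Count() / labels.Length, 2));
}

// Size-weighted impurity of the two child nodes a candidate split produces.
double WeightedImpurity(int[] left, int[] right)
{
    int total = left.Length + right.Length;
    return (left.Length * Gini(left) + right.Length * Gini(right)) / total;
}

// Made-up labels (1 = approve, 0 = deny) for eight applicants.
int[] parent = { 1, 1, 1, 0, 0, 0, 0, 0 };
int[] highIncome = { 1, 1, 1, 0 };  // applicants above a hypothetical income threshold
int[] lowIncome = { 0, 0, 0, 0 };   // applicants at or below it

double before = Gini(parent);
double after = WeightedImpurity(highIncome, lowIncome);
Console.WriteLine($"Impurity before split: {before:F4}, after: {after:F4}");
```

A tree learner would evaluate many candidate splits this way, keep the one with the lowest weighted impurity, and then repeat the process inside each child node.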

Decision Tree Example: Predicting Credit Approval

Income > $50k?
├─ Yes: Credit Score > 700?
│ ├─ Yes: ✅ APPROVE
│ └─ No: ❌ DENY
└─ No: ❌ DENY
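
Once learned, a tree like the one above is just a set of nested conditions. As a minimal sketch, the same credit-approval rules can be written by hand in C#; the `ApproveCredit` function name and the sample applicants are made up for illustration.

```csharp
using System;

// The credit-approval tree above, written as nested conditionals.
bool ApproveCredit(double income, int creditScore)
{
    if (income > 50_000)
    {
        // Income passes, so the credit score decides.
        return creditScore > 700;
    }

    // Income at or below $50k is denied outright.
    return false;
}

Console.WriteLine(ApproveCredit(income: 80_000, creditScore: 720)); // True
Console.WriteLine(ApproveCredit(income: 80_000, creditScore: 650)); // False
Console.WriteLine(ApproveCredit(income: 30_000, creditScore: 800)); // False
```

This is what makes trees interpretable: every prediction can be traced back to a short chain of explicit comparisons.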

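// Decision Tree Classification (tree-based trainer)
// Note: ML.NET does not ship a single-decision-tree trainer. LightGBM
// (Microsoft.ML.LightGbm package) trains gradient-boosted trees; a small
// numberOfLeaves keeps each tree shallow. Expects a key-typed Label column.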
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .Append(mlContext.MulticlassClassification.Trainers
        .LightGbm("Label", "Features", numberOfLeaves: 4, minimumExampleCountPerLeaf: 2));
var model = pipeline.Fit(splitData.TrainSet);

Random Forests

Random forests combine the predictions of many decision trees. Each tree is trained on a random sample of the rows (typically drawn with replacement) and considers a random subset of the features. The final prediction is the average (regression) or the majority vote (classification) across all trees.

Advantages: More stable than single trees, handles non-linear data, reduces overfitting, provides feature importance

Best for: A strong general-purpose default, and often the first choice for tabular data.
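
The combining step is simple enough to sketch directly. The plain C# below assumes the individual trees have already produced their predictions (the values are made up) and shows only the aggregation: a majority vote for classification and a mean for regression.

```csharp
using System;
using System.Linq;

// Classification: return the label predicted by the most trees.
// On a tie, the label that appears first in the input wins (OrderByDescending is stable).
int MajorityVote(int[] treePredictions) =>
    treePredictions
        .GroupBy(p => p)
        .OrderByDescending(g => g.Count())
        .First()
        .Key;

// Regression: average the per-tree numeric predictions.
double AveragePrediction(double[] treePredictions) => treePredictions.Average();

// Hypothetical outputs from five trees for one input row.
int[] classVotes = { 1, 0, 1, 1, 0 };
double[] priceEstimates = { 310.0, 295.0, 305.0, 320.0, 300.0 };

Console.WriteLine(MajorityVote(classVotes));          // 1 (three of five trees)
Console.WriteLine(AveragePrediction(priceEstimates)); // 306
```

Because each tree sees different rows and features, their individual errors tend to differ, and aggregating them smooths out much of that variation.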

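// Random Forest with ML.NET (FastForest trainer, Microsoft.ML.FastTree package)
// FastForest produces uncalibrated scores, so evaluate it with EvaluateNonCalibrated.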
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .Append(mlContext.BinaryClassification.Trainers.FastForest(
        labelColumnName: "Label",
        featureColumnName: "Features",
        numberOfTrees: 100,
        numberOfLeaves: 20));
var model = pipeline.Fit(splitData.TrainSet);

Comparing Supervised Algorithms

Linear Regression

Speed: Very Fast | Accuracy: Medium | Interpretability: Excellent | Non-linearity: No

Logistic Regression

Speed: Very Fast | Accuracy: Medium | Interpretability: Good | Probability Scores: Yes

Decision Trees

Speed: Fast | Accuracy: Medium-High | Interpretability: Excellent | Overfitting Risk: High

Random Forests

Speed: Medium | Accuracy: High | Interpretability: Medium | Scalability: Good

Evaluating Classification Models

Different metrics evaluate different aspects of classification performance:

Accuracy: Overall correct predictions. Good for balanced datasets.
Precision: Of predicted positives, how many are actually correct? Important when false positives are costly.
Recall (Sensitivity): Of actual positives, how many did we find? Important when false negatives are costly.
F1-Score: Harmonic mean of precision and recall. Good for imbalanced datasets.
AUC-ROC: Area under the ROC curve. Measures performance across all thresholds (1.0 = perfect).
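
A small worked example shows why the choice of metric matters. The confusion-matrix counts below are invented for illustration: 1,000 examples of which only 50 are actually positive. The sketch computes the first four metrics by hand in plain C#.

```csharp
using System;

// Hypothetical confusion-matrix counts for an imbalanced problem:
// 1,000 examples, only 50 of them actually positive.
int truePositives = 30, falsePositives = 10;
int falseNegatives = 20, trueNegatives = 940;

int total = truePositives + falsePositives + falseNegatives + trueNegatives;

double accuracy = (double)(truePositives + trueNegatives) / total;            // 970 / 1000
double precision = (double)truePositives / (truePositives + falsePositives);  // 30 / 40
double recall = (double)truePositives / (truePositives + falseNegatives);     // 30 / 50
double f1 = 2 * precision * recall / (precision + recall);                    // harmonic mean

Console.WriteLine($"Accuracy:  {accuracy:F4}");
Console.WriteLine($"Precision: {precision:F4}");
Console.WriteLine($"Recall:    {recall:F4}");
Console.WriteLine($"F1:        {f1:F4}");
```

Accuracy comes out at 0.97 even though the model misses 40% of the actual positives (recall 0.60), which is why accuracy alone can mislead on imbalanced data.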

// Evaluating classification performance
var metrics = mlContext.BinaryClassification.Evaluate(predictions);
Console.WriteLine($"Accuracy: {metrics.Accuracy:F4}");
Console.WriteLine($"Precision: {metrics.PositivePrecision:F4}");
Console.WriteLine($"Recall: {metrics.PositiveRecall:F4}");
Console.WriteLine($"F1: {metrics.F1Score:F4}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F4}");

🧠 Quick Check — Lesson 6

Which algorithm is best suited for predicting a house price given square footage and location?

What is the main advantage of Random Forests over a single Decision Tree?

Lesson Summary

✅ Linear Regression predicts continuous values, assuming a linear relationship. It is fast and interpretable but limited to linear patterns.

✅ Logistic Regression predicts class probabilities for binary classification via a sigmoid transformation.

✅ Decision Trees recursively split on features to create interpretable rules but are prone to overfitting.

✅ Random Forests combine multiple trees for better stability and accuracy. Often the best choice for tabular data.

✅ Use appropriate evaluation metrics: accuracy, precision, recall, F1-score, and AUC-ROC, depending on your problem.
