by Sajjad Ahmed Niloy
Teaching computers to learn from data.
Machine Learning is a field of artificial intelligence where we don't write explicit rules for a program. Instead, we provide a computer with a large amount of data and let it learn patterns, relationships, and "rules" on its own. The goal is to create a **model** that can make predictions or decisions about new, unseen data.
Different ways a computer can learn.
**Supervised Learning:** The computer learns from data that is already labeled with the correct answers. It's like studying with flashcards where the question is on one side and the answer is on the other.
**Unsupervised Learning:** The computer is given unlabeled data and must find patterns or structures on its own. It's like being given a box of mixed fruits and asked to sort them into groups based on similarity, without being told what the fruits are.
**Reinforcement Learning:** The computer learns by performing actions and receiving rewards or penalties. It's like teaching a dog a new trick: you give it a treat (reward) when it does it right, and nothing when it does it wrong. Over time, it learns which actions lead to the most rewards.
Predicting a numerical value like a house price.
Linear Regression finds the best linear relationship between input features and an output value. With multiple features, the formula expands:

$$\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$
Here, $\hat{y}$ is the predicted price, the $x$ values are our features, the $w$ values are the learned **weights**, and $b$ is the **bias**.
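To make this concrete, here is a minimal sketch of that formula in plain Python (the function name `predict` is just for illustration; the article itself doesn't prescribe any code):

```python
def predict(features, weights, bias):
    """Compute y_hat = w1*x1 + w2*x2 + ... + wn*xn + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# With every weight and the bias at zero, every prediction is 0.
print(predict([3, 1500, 10], [0, 0, 0], 0))  # 0
```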
The model learns by minimizing error with **Gradient Descent**. Let's walk through the first step of training with our data to predict `Price` (in thousands of dollars):

| Bedrooms ($x_1$) | Sqft ($x_2$) | Age ($x_3$) | Price ($y$, in thousands) |
|---|---|---|---|
| 3 | 1500 | 10 | 300 |
| ... | ... | ... | ... |
Step 1: Initialize. We start with a guess. Let's say weights $w_1=0, w_2=0, w_3=0$ and bias $b=0$. We also choose a **Learning Rate**, $\alpha=0.0001$.
Step 2: Predict (First Row). For the first training row ($x_1=3$, $x_2=1500$, $x_3=10$): $\hat{y} = (0 \times 3) + (0 \times 1500) + (0 \times 10) + 0 = 0$.
Step 3: Calculate Error. $E = \text{Actual} - \text{Predicted} = 300 - 0 = 300$.
Step 4: Update Weights & Bias. We use the error to "nudge" the weights: $w_{\text{new}} = w_{\text{old}} - \alpha \times (-\text{feature} \times \text{error})$.
$w_1 \text{(new)} = 0 - 0.0001 \times (-3 \times 300) = 0.09$
$w_2 \text{(new)} = 0 - 0.0001 \times (-1500 \times 300) = 45.0$
$w_3 \text{(new)} = 0 - 0.0001 \times (-10 \times 300) = 0.3$
$b \text{(new)} = 0 - 0.0001 \times (-1 \times 300) = 0.03$
Step 5: Repeat. The algorithm repeats this for all training rows, over and over. After many passes, the final learned parameters will converge to optimal values.
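Putting Steps 1 through 5 together, here is a minimal Python sketch of one update. It uses only the first training row shown above, since the full dataset isn't listed; a real run would loop over every row for many epochs.

```python
# Gradient-descent sketch of Steps 1-4 (first training row only).
features, actual = [3, 1500, 10], 300    # Bedrooms, Sqft, Age -> Price (thousands)
weights, bias = [0.0, 0.0, 0.0], 0.0     # Step 1: initialize to zero
alpha = 0.0001                           # learning rate

y_hat = sum(w * x for w, x in zip(weights, features)) + bias  # Step 2: predict -> 0
error = actual - y_hat                                        # Step 3: 300 - 0 = 300

# Step 4: nudge each weight and the bias using the update rule.
weights = [w - alpha * (-x * error) for w, x in zip(weights, features)]
bias = bias - alpha * (-1 * error)

print(weights, bias)  # ~[0.09, 45.0, 0.3] 0.03, matching the hand calculation
```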
After training, the model has learned its final parameters. Let's say they are: $w_1 \approx 20.49$, $w_2 \approx 0.152$, $w_3 \approx -2.48$, and $b \approx 25.11$. Now we apply these to the **unseen** testing data.
Test Row: Bedrooms ($x_1$) = 5, Sqft ($x_2$) = 3000, Age ($x_3$) = 3. Actual Price = 580k.
Prediction Calculation:
$\hat{y} = (w_1 \times x_1) + (w_2 \times x_2) + (w_3 \times x_3) + b$
$\hat{y} = (20.49 \times 5) + (0.152 \times 3000) + (-2.48 \times 3) + 25.11$
$\hat{y} = 102.45 + 456 - 7.44 + 25.11 = 576.12$.
The predicted price is 576.12k. This is very close to the actual price of 580k!
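As a quick check, the same prediction in Python, using the learned parameters quoted above:

```python
w, b = [20.49, 0.152, -2.48], 25.11   # learned parameters from above
x = [5, 3000, 3]                      # test row: Bedrooms, Sqft, Age

y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
print(round(y_hat, 2))                # 576.12 (thousands); actual was 580
```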
This chart shows the relationship between `Sqft` (X-axis) and `Price` (Y-axis). The blue dots are the training data used to learn the line. The orange dots are the unseen test data. The green line is the final regression line our model learned.
Predicting a "Yes" or "No" category.
Logistic Regression is used when you want to predict which of two categories an item belongs to. Instead of a straight line, it finds an S-shaped curve (the Sigmoid function) that maps the input features to a probability between 0 and 1.
$$P(y=1) = \frac{1}{1 + e^{-z}}, \qquad z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$

Here, $P(y=1)$ is the probability of the outcome being "Yes" (or 1), and $z$ is the output of a linear equation. This formula ensures the output is always between 0 and 1, making it perfect for predicting probabilities.
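A one-line Python version of the Sigmoid makes this easy to play with (a sketch; the function name is my own):

```python
import math

def sigmoid(z):
    """Map any real-valued score z to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))    # 0.5    -> completely unsure
print(sigmoid(7.0))  # ~0.999 -> very confident "Yes"
```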
The model learns by minimizing its prediction errors. Let's walk through the first step of training with our data to predict if a student `Passed`.
| Hours_Studied ($x_1$) | Previous_Score ($x_2$) | Passed ($y$) |
|---|---|---|
| 2.5 | 55 | 0 |
| 9.0 | 90 | 1 |
| ... | ... | ... |
Step 1: Initialize. We start with a guess. Let's say weights $w_1=0, w_2=0$ and bias $b=0$. We choose a **Learning Rate**, $\alpha=0.1$.
Step 2: Calculate Linear Score (z). For the first row ($x_1=2.5$, $x_2=55$): $z = (0 \times 2.5) + (0 \times 55) + 0 = 0$.
Step 3: Calculate Probability (p). We pass $z$ through the Sigmoid function: $p = 1 / (1 + e^{-0}) = 1 / (1+1) = 0.5$. The model predicts a 50% chance of passing.
Step 4: Calculate Error. For logistic regression, the error is the difference between the actual label and the predicted probability: $E = y - p = 0 - 0.5 = -0.5$.
Step 5: Update Weights & Bias. The update rule is the same as linear regression: $w_{\text{new}} = w_{\text{old}} - \alpha \times (-\text{feature} \times \text{error})$.
$w_1 \text{(new)} = 0 - 0.1 \times (-2.5 \times -0.5) = -0.125$
$w_2 \text{(new)} = 0 - 0.1 \times (-55 \times -0.5) = -2.75$
$b \text{(new)} = 0 - 0.1 \times (-1 \times -0.5) = -0.05$
This process repeats for all training rows over many epochs until the weights converge.
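Here is the same first training step as a minimal Python sketch (it uses only the first table row above; variable names are my own):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x, y = [2.5, 55], 0        # first row: Hours_Studied, Previous_Score -> Passed
w, b = [0.0, 0.0], 0.0     # Step 1: initialize to zero
alpha = 0.1                # learning rate

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # Step 2: z = 0
p = sigmoid(z)                                 # Step 3: p = 0.5
error = y - p                                  # Step 4: 0 - 0.5 = -0.5

# Step 5: same update rule as linear regression.
w = [wi - alpha * (-xi * error) for wi, xi in zip(w, x)]
b = b - alpha * (-1 * error)

print(w, b)  # ~[-0.125, -2.75] -0.05, matching the hand calculation
```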
After training, let's assume the final learned parameters are: $w_1=1.5, w_2=0.1, b=-12$. We now apply these to our **unseen** test data.
| Hours_Studied ($x_1$) | Previous_Score ($x_2$) | Passed ($y$) |
|---|---|---|
| 7.0 | 85 | 1 |
| 6.0 | 88 | 1 |
Test Row 1: Hours_Studied ($x_1$) = 7.0, Previous_Score ($x_2$) = 85. Actual = 1 (Pass).
Prediction Calculation:
1. Calculate $z = (1.5 \times 7.0) + (0.1 \times 85) - 12 = 10.5 + 8.5 - 12 = 7.0$.
2. Calculate Probability $p = 1 / (1 + e^{-7.0}) \approx 0.999$.
3. Since $p > 0.5$, the model predicts **1 (Pass)**. This is a correct prediction!
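And the same test prediction as a quick Python check:

```python
import math

w, b = [1.5, 0.1], -12    # learned parameters from above
x = [7.0, 85]             # test row 1: Hours_Studied, Previous_Score

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # 7.0
p = 1 / (1 + math.exp(-z))                     # ~0.999
print("Pass" if p > 0.5 else "Fail")           # Pass
```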