An Introduction to Machine Learning & Logistic Regression

by Sajjad Ahmed Niloy

1. What is Machine Learning?

Teaching computers to learn from data.

Machine Learning is a field of artificial intelligence where we don't write explicit rules for a program. Instead, we provide a computer with a large amount of data and let it learn patterns, relationships, and "rules" on its own. The goal is to create a **model** that can make predictions or decisions about new, unseen data.

2. Types of Machine Learning

Different ways a computer can learn.

Machine Learning branches into three main types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

Supervised Learning

The computer learns from data that is already labeled with the correct answers. It's like studying with flashcards where the question is on one side and the answer is on the other.

Example Training Data:
Regression (Predicting House Price)

Size_sqft | Bedrooms | Price
------------------------------------
1500      | 3        | 300000
2200      | 4        | 450000

Classification (Predicting Spam)

Email_Length | Has_Link | Is_Spam
------------------------------------
1204         | 1        | 1 (Yes)
350          | 0        | 0 (No)

Unsupervised Learning

The computer is given unlabeled data and must find patterns or structures on its own. It's like being given a box of mixed fruits and asked to sort them into groups based on similarity, without being told what the fruits are.

Example Training Data (Note: No Label!):
Clustering (Grouping Customers)

Annual_Income | Spending_Score
------------------------------------
70000         | 85
35000         | 20
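To make this concrete, here is a minimal clustering sketch using scikit-learn's KMeans (one common clustering algorithm; the text doesn't prescribe a specific one). The two customers from the table are padded with a few made-up rows so that two groups can actually emerge; note that no labels are ever provided:

```python
from sklearn.cluster import KMeans
import numpy as np

# Unlabeled customer data: [Annual_Income, Spending_Score].
# The first two rows come from the table above; the rest are
# made up so that two groups are visible.
X = np.array([
    [70000, 85],
    [35000, 20],
    [72000, 90],
    [33000, 25],
    [68000, 80],
])

# Ask K-Means to find 2 groups on its own -- no answers are given
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)  # e.g. [0 1 0 1 0]: high-income/high-spending vs. the rest
```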

Reinforcement Learning

The computer learns by performing actions and receiving rewards or penalties. It's like teaching a dog a new trick: you give it a treat (reward) when it does it right, and nothing when it does it wrong. Over time, it learns which actions lead to the most rewards.

Example Training Data (Game AI):
Learning to Play a Game

State               | Action    | Reward
------------------------------------------------
(Player at A)       | Move_Up   | +10
(Player near Ghost) | Move_Left | -500
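As a rough sketch of the idea, the code below keeps a table of action values and nudges each value toward the rewards it observes. This is a heavily simplified cousin of Q-learning (no discounting or look-ahead), using the hypothetical states and rewards from the table:

```python
# Simplified reward-driven value update. The states, actions and
# reward values are the hypothetical ones from the table above.
q_table = {}    # (state, action) -> learned value
alpha = 0.1     # learning rate

def update(state, action, reward):
    old = q_table.get((state, action), 0.0)
    # Nudge the stored value toward the observed reward
    q_table[(state, action)] = old + alpha * (reward - old)

update("Player at A", "Move_Up", +10)
update("Player near Ghost", "Move_Left", -500)
print(q_table)  # Move_Up near A gains value; Move_Left near the ghost is penalized
```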

3. Linear Regression in Depth

Predicting a numerical value like a house price.

This model finds the linear relationship between input features and a numerical output that best fits the data. With a single feature this is the familiar straight line $\hat{y} = wx + b$; with multiple features, the formula expands:

$$ \hat{y} = w_1x_1 + w_2x_2 + w_3x_3 + b $$

Here, $\hat{y}$ is the predicted price, the $x$ values are our features, the $w$ values are the learned **weights**, and $b$ is the **bias**.
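In code, the prediction is just a weighted sum plus the bias; the weights and inputs below are made up purely for illustration:

```python
# Direct translation of the formula: y_hat = w1*x1 + w2*x2 + w3*x3 + b
def predict(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Made-up example: 3 bedrooms, 1500 sqft, 10 years old
print(predict([3, 1500, 10], w=[20.0, 0.15, -2.5], b=25.0))
```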

Phase 1: Training the Model

The model learns by minimizing error with **Gradient Descent**. Let's walk through the first step of training with our data to predict `Price`.

Step 1: Initialize. We start with a guess. Let's say weights $w_1=0, w_2=0, w_3=0$ and bias $b=0$. We also choose a **Learning Rate**, $\alpha=0.0001$.

Step 2: Predict (First Row). For the first training row (Bedrooms $x_1=3$, Sqft $x_2=1500$, Age $x_3=10$; actual price 300, in thousands):
$\hat{y} = (0 \times 3) + (0 \times 1500) + (0 \times 10) + 0 = 0$.

Step 3: Calculate Error. $E = \text{Actual} - \text{Predicted} = 300 - 0 = 300$.

Step 4: Update Weights & Bias. We use the error to "nudge" the weights: $w_{\text{new}} = w_{\text{old}} - \alpha \times (-\text{feature} \times \text{error})$. The term in parentheses is (up to a constant factor) the gradient of the squared error with respect to that weight, so each parameter moves in the direction that shrinks the error.
$w_1 \text{(new)} = 0 - 0.0001 \times (-3 \times 300) = 0.09$
$w_2 \text{(new)} = 0 - 0.0001 \times (-1500 \times 300) = 45.0$
$w_3 \text{(new)} = 0 - 0.0001 \times (-10 \times 300) = 0.3$
$b \text{(new)} = 0 - 0.0001 \times (-1 \times 300) = 0.03$

Step 5: Repeat. The algorithm repeats this update for every training row, over many passes (epochs). After enough passes, the parameters converge toward values that minimize the overall error; a minimal code version of a single update follows below.
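This sketch reproduces Steps 1 through 4 on the first training row in plain Python; the printed numbers match the walkthrough. One practical caveat not covered above: because the raw features live on very different scales (3 bedrooms vs. 1500 sqft), repeating these updates naively can make the weights diverge, which is why real implementations scale the features or tune the learning rate carefully.

```python
# One gradient-descent update on the first training row,
# reproducing Steps 1-4 of the walkthrough.
w = [0.0, 0.0, 0.0]            # Step 1: initialize the weights ...
b = 0.0                        # ... and the bias
alpha = 0.0001                 # learning rate

x, y = (3, 1500, 10), 300      # first row: Bedrooms, Sqft, Age -> Price (in 1000s)

y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b   # Step 2: predict -> 0
error = y - y_hat                                  # Step 3: error   -> 300

for i in range(len(w)):                            # Step 4: update each weight
    w[i] -= alpha * (-x[i] * error)
b -= alpha * (-1 * error)

print(w, b)   # ~[0.09, 45.0, 0.3] 0.03, matching the walkthrough
```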

Phase 2: Testing the Model

After training, the model has learned its final parameters. Let's say they are: $w_1 \approx 20.49$, $w_2 \approx 0.152$, $w_3 \approx -2.48$, and $b \approx 25.11$. Now we apply these to the **unseen** testing data.

Test Row: Bedrooms ($x_1$) = 5, Sqft ($x_2$) = 3000, Age ($x_3$) = 3. Actual Price = 580k.

Prediction Calculation:
$\hat{y} = (w_1 \times x_1) + (w_2 \times x_2) + (w_3 \times x_3) + b$
$\hat{y} = (20.49 \times 5) + (0.152 \times 3000) + (-2.48 \times 3) + 25.11$
$\hat{y} = 102.45 + 456 - 7.44 + 25.11 = 576.12$.
The predicted price is 576.12k. This is very close to the actual price of 580k!
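You can verify this arithmetic in a couple of lines, using the final parameters stated above:

```python
import numpy as np

# Quick check of the test-row calculation
w = np.array([20.49, 0.152, -2.48])   # learned weights
x = np.array([5, 3000, 3])            # Bedrooms, Sqft, Age
b = 25.11                             # learned bias

print(w @ x + b)   # ~576.12 (in 1000s), vs. an actual price of 580
```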

Python Code: From Data to Prediction

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Create the full dataset using a Pandas DataFrame
data = {
    'Bedrooms': [3, 4, 2, 3, 2, 5, 3, 4, 5, 2],
    'Sqft':     [1500, 2200, 1100, 1800, 1200, 2800, 1600, 2500, 3000, 1000],
    'Age':      [10, 5, 12, 7, 20, 2, 8, 1, 3, 15],
    'Price':    [300, 450, 230, 380, 250, 550, 330, 510, 580, 210]  # in 1000s
}
df = pd.DataFrame(data)

# Separate features (X) and target (y)
X = df[['Bedrooms', 'Sqft', 'Age']]
y = df['Price']

# Split data into training (8 rows) and testing (2 rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# Test the model on the unseen test set
score = model.score(X_test, y_test)
print(f"Model R-squared score on test data: {score:.2f}")
```

Visualization

This chart shows the relationship between `Sqft` (X-axis) and `Price` (Y-axis). The blue dots are the training data used to learn the line. The orange dots are the unseen test data. The green line is the final regression line our model learned.
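The chart itself is not embedded here, but a plot like the one described could be sketched with matplotlib, reusing `pd`, `df`, `X_train`, `X_test`, `y_train`, `y_test`, and `model` from the code above. Since the model was trained on three features, this sketch holds `Bedrooms` and `Age` at their mean values to project the fit onto the `Sqft` axis; that projection is one reasonable choice, not the only one.

```python
import numpy as np
import matplotlib.pyplot as plt

# Reuses pd, df, X_train, X_test, y_train, y_test and model
# from the previous code block.
plt.scatter(X_train['Sqft'], y_train, color='blue', label='Training data')
plt.scatter(X_test['Sqft'], y_test, color='orange', label='Test data')

# Project the 3-feature fit onto the Sqft axis by holding
# Bedrooms and Age at their mean values.
sqft = np.linspace(df['Sqft'].min(), df['Sqft'].max(), 100)
line_X = pd.DataFrame({
    'Bedrooms': df['Bedrooms'].mean(),
    'Sqft': sqft,
    'Age': df['Age'].mean(),
})
plt.plot(sqft, model.predict(line_X), color='green', label='Regression line')

plt.xlabel('Sqft')
plt.ylabel('Price (in 1000s)')
plt.legend()
plt.show()
```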

4. Logistic Regression in Depth

Predicting a "Yes" or "No" category.

Logistic Regression is used when you want to predict which of two categories an item belongs to. Instead of a straight line, it finds an S-shaped curve (the Sigmoid function) that maps the input features to a probability between 0 and 1.

$$ P(y=1) = \frac{1}{1 + e^{-z}} \quad \text{where} \quad z = w_1x_1 + w_2x_2 + b $$

Here, $P(y=1)$ is the probability of the outcome being "Yes" (or 1), and $z$ is the output of a linear equation. This formula ensures the output is always between 0 and 1, making it perfect for predicting probabilities.
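In code, the Sigmoid is a one-liner; a few sample scores show how it squashes any real number into a probability:

```python
import math

# The sigmoid maps any real-valued score z into the interval (0, 1)
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(-5))   # ~0.007 -> almost certainly "No"
print(sigmoid(0))    # 0.5    -> undecided
print(sigmoid(5))    # ~0.993 -> almost certainly "Yes"
```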

Phase 1: Training the Model

The model learns by minimizing its prediction errors (formally, a loss called the log loss). Let's walk through the first step of training with our data to predict whether a student `Passed`.

Training Set (8 rows)
Hours_Studied ($x_1$) | Previous_Score ($x_2$) | Passed ($y$)
-------------------------------------------------------------
2.5                   | 55                     | 0
9.0                   | 90                     | 1
...                   | ...                    | ...

Step 1: Initialize. We start with a guess. Let's say weights $w_1=0, w_2=0$ and bias $b=0$. We choose a **Learning Rate**, $\alpha=0.1$.

Step 2: Calculate Linear Score (z). For the first row ($x_1=2.5, x_2=55$):
$z = (0 \times 2.5) + (0 \times 55) + 0 = 0$.

Step 3: Calculate Probability (p). We pass $z$ through the Sigmoid function:
$p = 1 / (1 + e^{-0}) = 1 / (1+1) = 0.5$. The model predicts a 50% chance of passing.

Step 4: Calculate Error. For logistic regression, the error is the difference between the actual label and the predicted probability: $E = y - p = 0 - 0.5 = -0.5$.

Step 5: Update Weights & Bias. The update rule is the same as linear regression: $w_{\text{new}} = w_{\text{old}} - \alpha \times (-\text{feature} \times \text{error})$.
$w_1 \text{(new)} = 0 - 0.1 \times (-2.5 \times -0.5) = -0.125$
$w_2 \text{(new)} = 0 - 0.1 \times (-55 \times -0.5) = -2.75$
$b \text{(new)} = 0 - 0.1 \times (-1 \times -0.5) = -0.05$

This process repeats for all training rows over many epochs until the weights converge.
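Here is the same first update written as a minimal Python sketch; the printed values match the walkthrough:

```python
import math

# One update step on the first training row, reproducing Steps 1-5 above
w1, w2, b = 0.0, 0.0, 0.0      # Step 1: initialize
alpha = 0.1                    # learning rate

x1, x2, y = 2.5, 55, 0         # first row: Hours_Studied, Previous_Score, Passed

z = w1 * x1 + w2 * x2 + b      # Step 2: linear score -> 0
p = 1 / (1 + math.exp(-z))     # Step 3: probability  -> 0.5
error = y - p                  # Step 4: error        -> -0.5

w1 -= alpha * (-x1 * error)    # Step 5: update weights and bias
w2 -= alpha * (-x2 * error)
b  -= alpha * (-1 * error)

print(w1, w2, b)   # ~-0.125 -2.75 -0.05, matching the walkthrough
```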

Phase 2: Testing the Model

After training, let's assume the final learned parameters are: $w_1=1.5, w_2=0.1, b=-12$. We now apply these to our **unseen** test data.

Testing Set (2 rows)
Hours_Studied ($x_1$) | Previous_Score ($x_2$) | Passed ($y$)
-------------------------------------------------------------
7.0                   | 85                     | 1
6.0                   | 88                     | 1

Test Row 1: Hours_Studied ($x_1$) = 7.0, Previous_Score ($x_2$) = 85. Actual = 1 (Pass).

Prediction Calculation:
1. Calculate $z = (1.5 \times 7.0) + (0.1 \times 85) - 12 = 10.5 + 8.5 - 12 = 7.0$.
2. Calculate Probability $p = 1 / (1 + e^{-7.0}) \approx 0.999$.
3. Since $p > 0.5$, the model predicts **1 (Pass)**. This is a correct prediction!
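A quick way to check this calculation:

```python
import math

# Check of Test Row 1 using the final learned parameters above
w1, w2, b = 1.5, 0.1, -12
x1, x2 = 7.0, 85

z = w1 * x1 + w2 * x2 + b      # 7.0
p = 1 / (1 + math.exp(-z))     # ~0.999
print(p, "-> Pass" if p > 0.5 else "-> Fail")
```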


Python Code: From Data to Prediction

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Sample data: X = hours studied, y = passed (1) or failed (0)
data = {
    'Hours_Studied': [1, 2, 3, 4, 5, 6, 7, 8],
    'Passed':        [0, 0, 0, 1, 0, 1, 1, 1]
}
df = pd.DataFrame(data)

# Separate features (X) and target (y)
X = df[['Hours_Studied']]
y = df['Passed']

# Split the data for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict if a student who studied 4.5 hours will pass
hours_studied = pd.DataFrame({'Hours_Studied': [4.5]})
prediction = model.predict(hours_studied)
print(f"Prediction for 4.5 hours (0=Fail, 1=Pass): {prediction[0]}")
```