Machine Learning is a subset of AI where systems learn from data to make decisions without being explicitly programmed.
Traditional Programming: Rules + Data → Output
Machine Learning: Data + Output → Rules (Model)
Types of ML:
1. Supervised: learn from labeled data (regression, classification)
2. Unsupervised: find patterns in unlabeled data (clustering, dimensionality reduction)
3. Reinforcement: an agent learns by trial and error from rewards
Complete Workflow:
1. Collect data
2. Preprocess (handle missing values, scale features, encode categories)
3. Split into train/test sets
4. Train the model
5. Evaluate with metrics
6. Tune hyperparameters and deploy
import pandas as pd
from sklearn.impute import SimpleImputer
# Remove rows with missing data (dropna returns a copy, so reassign)
df = df.dropna(subset=['salary'])
# Fill with mean (numerical)
df['age'] = df['age'].fillna(df['age'].mean())
# Imputer
imputer = SimpleImputer(strategy='mean') # or 'median', 'most_frequent'
X = imputer.fit_transform(X)
Why? Algorithms like KNN, SVM, and Neural Networks are sensitive to feature scale.
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
# Standardization: (x - mean) / std → mean=0, std=1
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)
# Normalization: (x - min) / (max - min) → [0, 1]
scaler = MinMaxScaler()
# Robust (outlier-resistant): (x - median) / IQR
scaler = RobustScaler()
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Label Encoding (ordinal: low < medium < high)
le = LabelEncoder()
df['grade'] = le.fit_transform(df['grade']) # A=0, B=1, C=2
# One-Hot Encoding (nominal: no order)
pd.get_dummies(df, columns=['city'])
# Delhi → [1,0,0], Mumbai → [0,1,0], Pune → [0,0,1]
Predict continuous values. Best-fit line minimizing MSE.
y = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ
Cost Function: MSE = (1/n)Σ(y_pred - y_actual)²
Gradient Descent Update:
w = w - α * ∂(MSE)/∂w (α = learning rate)
Metrics: MAE, MSE, RMSE, R² (coefficient of determination)
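The gradient-descent update above can be sketched in plain NumPy. The toy data here (y = 2x + 1) is hypothetical, chosen so convergence is easy to verify:

```python
import numpy as np

# Hypothetical toy data: y = 2x + 1 exactly
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

w, b = 0.0, 0.0   # start weights at zero
alpha = 0.05      # learning rate α
n = len(X)

for _ in range(2000):
    y_pred = w * X + b
    # Gradients of MSE = (1/n)Σ(y_pred - y)²
    dw = (2 / n) * np.sum((y_pred - y) * X)
    db = (2 / n) * np.sum(y_pred - y)
    w -= alpha * dw   # w = w - α * ∂(MSE)/∂w
    b -= alpha * db
```

After enough iterations, w and b approach the true slope 2 and intercept 1. (sklearn's `LinearRegression` solves the same problem in closed form.)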
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Predict binary classes (0 or 1). Uses sigmoid function.
σ(z) = 1 / (1 + e^(-z)) Output: probability [0, 1]
Cost: Binary Cross-Entropy (Log Loss)
Decision boundary: P(y=1) > 0.5 → class 1
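A minimal sklearn sketch of logistic regression; the 1-D data (class 1 when x ≥ 5) is a made-up example for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: class 1 when x >= 5
X = np.arange(10).reshape(-1, 1)
y = (X.ravel() >= 5).astype(int)

model = LogisticRegression()
model.fit(X, y)

# predict_proba gives the sigmoid output P(y=1) in [0, 1];
# predict applies the 0.5 decision boundary
proba = model.predict_proba([[9]])[0, 1]
pred = model.predict([[9]])[0]
```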
Tree-like model of decisions. Splits data on feature that maximizes information gain.
Information Gain = Entropy(parent) - Σ(weighted Entropy(child))
Entropy(S) = -Σ p_i * log₂(p_i)
Gini Impurity = 1 - Σ p_i² (CART algorithm uses this)
Hyperparameters: max_depth (limits tree depth), min_samples_split, min_samples_leaf, criterion ('gini' or 'entropy')
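A quick sketch fitting a decision tree; the built-in iris dataset is assumed here purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion='gini' is the CART default; max_depth caps tree depth
# to limit overfitting
tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
tree.fit(X, y)
train_acc = tree.score(X, y)
```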
Ensemble of Decision Trees using Bagging (Bootstrap Aggregating).
1. Sample n subsets (with replacement) from training data
2. Train one Decision Tree on each subset
(with random subset of features at each split)
3. Predict by majority vote (classification) or average (regression)
Why better? Reduces variance (overfitting) by averaging uncorrelated trees.
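The bagging recipe above maps directly onto sklearn's `RandomForestClassifier`; the breast-cancer dataset and train/test split below are assumed for the sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators = number of bootstrapped trees;
# max_features='sqrt' = random feature subset at each split
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                            random_state=42)
rf.fit(X_train, y_train)
test_acc = rf.score(X_test, y_test)   # majority vote over trees
```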
Find hyperplane that maximally separates classes.
Margin = 2 / ||w|| (maximize margin)
Support Vectors = points closest to hyperplane
Kernel Trick (for non-linear data):
- Linear Kernel: K(x,y) = x·y
- RBF/Gaussian: K(x,y) = exp(-γ||x-y||²)
- Polynomial: K(x,y) = (x·y + c)^d
C (regularization): Low C = soft margin (more misclassification allowed)
High C = hard margin (tries to classify all points correctly, risking overfitting)
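A sketch of an SVM with the RBF kernel; the two-moons toy dataset is a hypothetical non-linear example where a linear kernel would fail:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical non-linearly-separable data
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# C trades off margin width vs misclassification;
# gamma corresponds to the γ in the RBF kernel formula
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X, y)
n_support = len(clf.support_vectors_)  # points defining the margin
acc = clf.score(X, y)
```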
Classify based on k nearest data points (majority vote).
Distance: Euclidean = √Σ(xᵢ - yᵢ)²
Manhattan = Σ|xᵢ - yᵢ|
Small k → complex boundary (overfitting)
Large k → smooth boundary (underfitting)
Optimal k → use cross-validation (typically odd, √n)
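Choosing k by cross-validation can be sketched like this, trying a few odd values on the iris dataset (assumed for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd k values and keep the one with the best CV accuracy
best_k, best_score = None, 0.0
for k in [1, 3, 5, 7, 9]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
```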
Algorithm:
1. Initialize k centroids randomly
2. Assign each point to nearest centroid
3. Recompute centroids as mean of cluster
4. Repeat until convergence
Choosing k: Elbow Method (plot inertia vs k, find elbow)
Limitation: Sensitive to outliers, assumes spherical clusters
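The elbow method above can be sketched by running K-Means for several k and recording inertia; `make_blobs` generates hypothetical spherical clusters where the true k is 3:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data with 3 spherical clusters
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Inertia = within-cluster sum of squared distances;
# plot these against k and look for the "elbow" (here at k=3)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)]
```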
Dimensionality Reduction — transform data to fewer dimensions while retaining variance.
1. Standardize data
2. Compute covariance matrix
3. Find eigenvalues + eigenvectors
4. Select top k eigenvectors (principal components)
5. Project data onto new space
Explained variance ratio: how much variance each PC captures
Choose k where cumulative variance > 95%
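The five PCA steps above collapse into a few sklearn calls; the iris dataset is assumed for the sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)   # step 1: standardize

pca = PCA(n_components=2)                   # keep top 2 components
X_pca = pca.fit_transform(X_std)            # steps 2-5 internally

# Variance each PC captures, and the cumulative total
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
```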
Input Layer → Hidden Layers → Output Layer
Each neuron: z = Σ(wᵢxᵢ) + b, output = activation(z)
| Function | Formula | Use Case |
|---|---|---|
| Sigmoid | 1/(1+e^-z) | Binary output |
| Tanh | (e^z - e^-z)/(e^z + e^-z) | Hidden layers |
| ReLU | max(0, z) | Hidden layers (most common) |
| Leaky ReLU | max(0.01z, z) | Fixes dead neurons |
| Softmax | e^zᵢ/Σe^zⱼ | Multi-class output |
Forward Pass: Compute prediction (y_hat)
Loss: L = -(y*log(y_hat) + (1-y)*log(1-y_hat))
Backward Pass: Compute gradients ∂L/∂w using chain rule
Update: w = w - α * ∂L/∂w
L1 Regularization: Loss + λΣ|w| → Sparse weights, feature selection
L2 Regularization: Loss + λΣw² → Small weights, prevents large weights
Dropout: Randomly zero out neurons during training (p=0.5 typical)
Batch Normalization: Normalize layer inputs for stable training
Early Stopping: Monitor validation loss, stop when it starts increasing
Confusion Matrix:

                     Predicted Positive   Predicted Negative
Actual Positive             TP                   FN
Actual Negative             FP                   TN
Accuracy = (TP + TN) / Total
Precision = TP / (TP + FP) (out of predicted positive, how many actually positive)
Recall = TP / (TP + FN) (out of actual positive, how many detected)
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
When to use what?
- Accuracy: classes are balanced
- Precision: false positives are costly (e.g. spam filtering)
- Recall: false negatives are costly (e.g. disease screening)
- F1: imbalanced classes, need a balance of precision and recall
ROC-AUC: Area Under ROC Curve. AUC=1 perfect, AUC=0.5 random.
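The formulas above can be checked with `sklearn.metrics` on small made-up label vectors (TP=3, FN=1, FP=1, TN=3):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# sklearn's confusion_matrix is laid out [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

acc = accuracy_score(y_true, y_pred)     # (TP+TN)/Total = 6/8 = 0.75
prec = precision_score(y_true, y_pred)   # TP/(TP+FP) = 3/4 = 0.75
rec = recall_score(y_true, y_pred)       # TP/(TP+FN) = 3/4 = 0.75
f1 = f1_score(y_true, y_pred)            # harmonic mean = 0.75
```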
MAE = (1/n)Σ|y - ŷ| (Mean Absolute Error)
MSE = (1/n)Σ(y - ŷ)² (Mean Squared Error)
RMSE = √MSE (Root MSE, same unit as y)
R² = 1 - SS_res/SS_tot (1 = perfect, 0 = mean only, can be negative)
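The regression metrics can likewise be verified on a tiny hypothetical example (errors of 0.5, 0, 0.5, 0):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mae = mean_absolute_error(y_true, y_pred)   # (0.5+0+0.5+0)/4 = 0.25
mse = mean_squared_error(y_true, y_pred)    # (0.25+0+0.25+0)/4 = 0.125
rmse = np.sqrt(mse)                         # same unit as y
r2 = r2_score(y_true, y_pred)               # 1 - SS_res/SS_tot = 0.975
```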
Complete ML notes: supervised/unsupervised learning, regression, classification, SVM, decision trees, neural networks, evaluation metrics, and interview questions for B.Tech CS Sem 6.
72 pages · 3.5 MB · Updated 2026-03-11
Supervised learning uses labeled data (input + correct output); the model learns to predict the output. Unsupervised learning finds patterns in unlabeled data (clustering, dimensionality reduction).
A model that performs very well on training data but poorly on test data is overfitting. To prevent it: regularization (L1/L2), dropout, cross-validation, more training data, or a simpler model.
High bias = underfitting (model too simple). High variance = overfitting (model too complex). Aim for a balance between the two. Bagging reduces variance; boosting reduces bias.
A Random Forest is an ensemble of many Decision Trees; it reduces overfitting through bagging plus random feature selection. A single Decision Tree overfits easily.
Gradient Descent is an optimization algorithm that updates model parameters (weights) to minimize the loss function, taking small steps in the direction of the negative gradient of the loss.