Machine Learning Specialization

May 10, 2026·Supervised machine learning: Regression and Classification·12 min read


Week01

Cost Functions

We call it a 'cost' function because it measures how much error the current model weights are costing us

Different cost functions give differently shaped cost graphs, and some shapes are easier to optimize than others

Why use Squared error instead of Absolute error?

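With squared error, outliers produce much taller bumps in the cost function graph, because each residual is squared. The gradient is therefore larger far from the minimum, which tends to speed up convergence. Squared error is also differentiable everywhere, whereas absolute error has no derivative at zero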
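A small sketch of this difference, comparing the gradient of each loss with respect to a single residual. The function names and residual values are illustrative, not from the course.

```python
import numpy as np

def squared_error_grad(residual):
    # Derivative of 0.5 * r**2 with respect to r: grows with the residual.
    return residual

def absolute_error_grad(residual):
    # Derivative of |r| with respect to r: only the sign, never the size.
    # np.sign returns 0 at r == 0, where the true derivative is undefined.
    return np.sign(residual)

# Two ordinary residuals and one outlier (illustrative values).
residuals = np.array([0.5, 2.0, 50.0])

squared_grads = squared_error_grad(residuals)
absolute_grads = absolute_error_grad(residuals)
```

Under squared error the outlier contributes a gradient 100 times larger than the smallest residual; under absolute error all three contribute the same magnitude.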

Partial derivatives of the absolute error and mean squared error costs:
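As a sketch, assuming a single-feature model f_{w,b}(x) = wx + b and m training examples (the 1/2m scaling of the squared error cost is an assumed convention), the partial derivatives can be written as:

```latex
% Mean squared error cost and its partial derivatives
J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2

\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)

% Mean absolute error cost and its partial derivatives (undefined where the residual is zero)
J_{\text{abs}}(w,b) = \frac{1}{m}\sum_{i=1}^{m}\left|f_{w,b}(x^{(i)}) - y^{(i)}\right|

\frac{\partial J_{\text{abs}}}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\operatorname{sign}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}
\qquad
\frac{\partial J_{\text{abs}}}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\operatorname{sign}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)
```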

Gradient descent visualization

Week02

Linear regression weight update equation
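A sketch of the update rule, assuming the single-feature model f_{w,b}(x) = wx + b, the squared error cost with 1/2m scaling, and a learning rate written as \alpha. Both parameters are updated simultaneously on each step:

```latex
w := w - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}

b := b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)
```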

Multiple linear regression weight update equation
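A sketch of the update rule for n features, assuming the model f_{\vec{w},b}(\vec{x}) = \vec{w}\cdot\vec{x} + b and the same squared error cost. Each weight w_j uses its own feature x_j, and all parameters are updated simultaneously:

```latex
w_j := w_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}
\qquad \text{for } j = 1,\dots,n

b := b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)
```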

Normal equation

The normal equation is an alternative technique to gradient descent for linear regression. It provides a closed-form solution for the weights (w) and bias (b) without iterations.

It becomes computationally expensive when the number of features is large (> 10,000).

The normal equation takes an analytical approach to solving linear regression problems

Whereas gradient descent is a numerical (iterative) approach
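A minimal sketch of the normal equation with NumPy. The function name `normal_equation` and the toy data are illustrative; the bias is handled by prepending a column of ones, which is one common convention and an assumption here.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least-squares fit. Returns (w, b)."""
    m = X.shape[0]
    # Prepend a column of ones so the bias is learned as theta[0].
    X_aug = np.hstack([np.ones((m, 1)), X])
    # theta = (X^T X)^+ X^T y; the pseudo-inverse tolerates a singular X^T X.
    theta = np.linalg.pinv(X_aug.T @ X_aug) @ X_aug.T @ y
    return theta[1:], theta[0]

# Toy data generated from y = 2x + 1 (illustrative values).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X[:, 0] + 1.0

w, b = normal_equation(X, y)
```

The matrix being inverted has one row and column per feature (plus one for the bias), which is why the cost grows quickly with the number of features.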

Feature scaling techniques
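As a sketch, three commonly used scaling techniques are min-max scaling, mean normalization, and z-score normalization. The function names and sample values below are illustrative, and each function assumes no feature column is constant (otherwise it would divide by zero).

```python
import numpy as np

def min_max_scale(X):
    # Rescale each feature column to the range [0, 1].
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def mean_normalize(X):
    # Center each column on its mean, then divide by its range.
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def z_score_normalize(X):
    # Center each column on its mean, then divide by its standard deviation.
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Two features on very different scales (illustrative values).
X = np.array([[300.0, 1.0], [1000.0, 3.0], [2000.0, 5.0]])

X_minmax = min_max_scale(X)
X_mean = mean_normalize(X)
X_zscore = z_score_normalize(X)
```

Whichever statistics are computed on the training set should be reused unchanged when scaling new inputs at prediction time.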

How to test if your model has converged?
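One common check is to record the cost after every iteration and stop once it decreases by less than a small threshold; plotting the recorded costs against the iteration number gives a learning curve. A minimal sketch, where `epsilon`, `alpha`, `max_iters`, and the toy data are assumed values chosen for illustration:

```python
import numpy as np

def compute_cost(X, y, w, b):
    # Squared error cost with the 1/2m scaling.
    err = X @ w + b - y
    return (err @ err) / (2 * len(y))

def gradient_descent(X, y, alpha=0.1, epsilon=1e-9, max_iters=10_000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    history = [compute_cost(X, y, w, b)]
    for _ in range(max_iters):
        err = X @ w + b - y
        # Simultaneous update of all weights and the bias.
        w = w - alpha * (X.T @ err) / m
        b = b - alpha * err.mean()
        history.append(compute_cost(X, y, w, b))
        # Automatic convergence test: stop when the cost barely changes.
        if history[-2] - history[-1] < epsilon:
            break
    return w, b, history

# Toy data generated from y = 2x + 1 (illustrative values).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b, history = gradient_descent(X, y)
```

If the recorded cost ever increases between iterations, that usually points to a learning rate that is too large or a bug in the gradient.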

Week03

Why does logistic regression classify even though it is called regression?

Regression: the act of going back to a previous state

We are still modelling the relationship between independent and dependent variables, but instead of predicting a continuous outcome, we predict the probability of a binary outcome

If we choose a different z (for example, a polynomial in the features), we can map complex decision boundaries
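A sketch of how the choice of z shapes the decision boundary. With squared features, z = x1^2 + x2^2 - 1 places the boundary on the unit circle. The helper names, weights, and sample points are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Maps any real z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(features, w, b, threshold=0.5):
    # Predict class 1 when the estimated probability reaches the threshold.
    return (sigmoid(features @ w + b) >= threshold).astype(int)

def squared_features(X):
    # Hypothetical feature map: replace each feature with its square.
    return X ** 2

# z = 1*x1^2 + 1*x2^2 - 1, so z = 0 is the unit circle.
w = np.array([1.0, 1.0])
b = -1.0

# One point inside the circle and two outside (illustrative values).
X = np.array([[0.1, 0.2], [2.0, 0.0], [0.0, -1.5]])
labels = predict(squared_features(X), w, b)
```

The model is still linear in its parameters; only the features fed into z have changed.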

Logistic regression loss function

With the sigmoid model, the squared error function gives a non-convex cost function, so gradient descent can get stuck in local minima

So we use the logistic loss function:
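Written out as a sketch, with f_{\vec{w},b}(\vec{x}) denoting the sigmoid output of the model (notation assumed here), the loss for a single training example is:

```latex
L\left(f_{\vec{w},b}(\vec{x}^{(i)}),\, y^{(i)}\right) =
\begin{cases}
-\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) & \text{if } y^{(i)} = 1 \\
-\log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right) & \text{if } y^{(i)} = 0
\end{cases}
```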

The cost function is the average loss over all training examples

We can simplify the above piecewise function into this:
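As a sketch in the same assumed notation, because y^{(i)} is always 0 or 1, the two cases collapse into a single expression in which one of the two terms always vanishes:

```latex
L\left(f_{\vec{w},b}(\vec{x}^{(i)}),\, y^{(i)}\right) =
- y^{(i)} \log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)
- \left(1 - y^{(i)}\right) \log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right)
```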

Therefore the cost function looks like this:
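Averaging the single-expression logistic loss over the m training examples gives, as a sketch in assumed notation where f_{\vec{w},b}(\vec{x}) is the sigmoid output:

```latex
J(\vec{w}, b) = -\frac{1}{m}\sum_{i=1}^{m}\left[
y^{(i)} \log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)
+ \left(1 - y^{(i)}\right) \log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right)
\right]
```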

Variance & Bias

The goal is to find a balance between bias and variance

A highly biased ML model makes wrong assumptions, leading to underfitting of the training data

If our ML model has high bias, it might make overly simplistic assumptions such as:

The model is biased towards specific features or patterns in the data and makes predictions based solely on those biases, rather than capturing the true relationship between the features and the target variable

A model with low bias does not rely on overly simplistic assumptions. Very low bias, however, can lead to overfitting (high variance)

How to solve overfitting?

Regularization

Shrink some of the w's towards zero

If we fit a higher-order polynomial to the dataset, we get an overly wiggly curve that overfits the data. New cost function:
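A sketch of the regularized cost for linear regression, assuming the squared error cost with 1/2m scaling, n weights, and a regularization parameter written as \lambda. The bias b is conventionally left out of the penalty:

```latex
J(\vec{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2
+ \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2
```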

The regularization term adds a penalty proportional to the sum of the squares of the weights

During optimization (gradient descent), this penalty encourages the algorithm to find a solution with smaller weights. Larger weights increase the cost, so minimizing the cost function shrinks the weights
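A sketch of this shrinking effect: the same polynomial model is fit with and without the penalty, and the size of the resulting weight vectors is compared. The function name `fit_regularized`, the synthetic data, and the values of `lambda_`, `alpha`, and `iters` are all assumptions made for illustration.

```python
import numpy as np

def fit_regularized(X, y, lambda_, alpha=0.1, iters=2000):
    # Gradient descent on squared error plus (lambda / 2m) * sum(w**2).
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        err = X @ w + b - y
        grad_w = (X.T @ err) / m + (lambda_ / m) * w  # penalty term acts on w only
        grad_b = err.mean()                           # bias is not regularized
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Noisy, roughly linear data fitted with a 4th-order polynomial (synthetic).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
X = np.column_stack([x, x**2, x**3, x**4])
y = 1.5 * x + rng.normal(scale=0.1, size=x.size)

w_plain, _ = fit_regularized(X, y, lambda_=0.0)
w_reg, _ = fit_regularized(X, y, lambda_=10.0)

norm_plain = np.linalg.norm(w_plain)
norm_reg = np.linalg.norm(w_reg)
```

Comparing `norm_reg` with `norm_plain` shows the regularized fit ending up with a smaller weight vector.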

Regularization on linear regression
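A sketch of the gradient descent update with the penalty included, assuming the model f_{\vec{w},b}(\vec{x}) = \vec{w}\cdot\vec{x} + b, learning rate \alpha, and regularization parameter \lambda. Only the weights receive the extra term:

```latex
w_j := w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m} w_j\right]

b := b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)
```

Rearranged, the weight update multiplies w_j by (1 - \alpha\lambda/m) before applying the usual gradient step, which is the shrinking effect in action.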

Regularization on logistic regression
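A minimal sketch of the regularized logistic regression cost and its gradients. The function name, the `eps` guard against log(0), and the toy data are assumptions for illustration; setting `lambda_=0` gives the unregularized cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_and_grad(X, y, w, b, lambda_=0.0):
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    eps = 1e-12  # assumed small constant to avoid log(0)
    loss = -(y * np.log(f + eps) + (1 - y) * np.log(1 - f + eps))
    # Average logistic loss plus the penalty on the weights (not the bias).
    cost = loss.mean() + (lambda_ / (2 * m)) * np.sum(w ** 2)
    err = f - y
    grad_w = (X.T @ err) / m + (lambda_ / m) * w
    grad_b = err.mean()
    return cost, grad_w, grad_b

# Toy binary classification data (illustrative values).
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

cost0, grad_w0, grad_b0 = logistic_cost_and_grad(X, y, np.zeros(2), 0.0, lambda_=1.0)
```

The gradients have the same form as in regularized linear regression; only the definition of the model output f changes from a linear function to the sigmoid of one.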