Machine Learning Specialization
Week01
Cost Functions
We call it a 'cost' function because it measures how much error the current model weights are costing us

Different cost functions shape the cost graph differently, which can make optimization easier


Why use Squared error instead of Absolute error?
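With squared error, large errors (such as outliers) are penalized much more heavily, so they produce taller bumps in the cost function graph. The slope is steeper far from the minimum, which results in faster convergence. Absolute error has a constant-magnitude slope and is not differentiable at zero
Absolute mean square error partial derivative: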
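The formula that followed this line did not survive, and the heading does not make clear whether it covered squared or absolute error. As a sketch, assuming a model f_{w,b}(x) = wx + b and m training examples (notation assumed here, not taken from these notes), the partial derivatives with respect to w for both costs are:

```latex
% Squared error cost and its partial derivative with respect to w
J_{sq}(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2
\qquad
\frac{\partial J_{sq}}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}

% Absolute error cost and its partial derivative (undefined where an error is exactly zero)
J_{abs}(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left| f_{w,b}(x^{(i)}) - y^{(i)} \right|
\qquad
\frac{\partial J_{abs}}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \operatorname{sign}\left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}
```

The squared error gradient scales with the size of the error, while the absolute error gradient only depends on its sign.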

Gradient descent visualization

Week02
Linear regression weight update equation
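The equation itself is not in the notes. A likely form, assuming the squared error cost with a 1/(2m) factor, a single feature x, and a learning rate written as \alpha (all assumptions):

```latex
% Both parameters are updated simultaneously on every step
w := w - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}
\qquad
b := b - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

% with the model
f_{w,b}(x) = w x + b
```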

Multiple linear regression weight update equation
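A minimal sketch of that update as batch gradient descent in NumPy. The function name, the learning rate `alpha`, the iteration count, and the toy data are illustrative assumptions, not values from the notes.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent for multiple linear regression.

    Update rule, applied simultaneously to every parameter:
        w_j := w_j - alpha * (1/m) * sum_i (f(x_i) - y_i) * x_ij
        b   := b   - alpha * (1/m) * sum_i (f(x_i) - y_i)
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        error = X @ w + b - y      # prediction error per example, shape (m,)
        grad_w = X.T @ error / m   # gradient for each weight, shape (n,)
        grad_b = error.mean()      # gradient for the bias
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Toy data generated from y = 2*x1 + 3*x2 + 1 (made up for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1
w, b = gradient_descent(X, y)
```

On this noise-free data the fitted `w` and `b` should land very close to the generating values; on real data the learning rate usually needs tuning.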

Normal equation
The normal equation is an alternative technique to gradient descent for linear regression. It provides a closed-form solution for the weights (w) and bias (b) without iterations.
It is computationally expensive when the number of features is large (> 10,000).
The normal equation takes an analytical approach to solving linear regression problems, whereas gradient descent is a numerical (iterative) approach.
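A small sketch of the closed-form solve, assuming the usual setup where a column of ones is prepended to X so the bias is learned together with the weights. The function name and toy data are hypothetical.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (A^T A) theta = A^T y, where A is X with a leading column of ones.

    Assumes A^T A is invertible (no duplicated or linearly dependent features).
    """
    m = X.shape[0]
    A = np.hstack([np.ones((m, 1)), X])
    # Solving the linear system avoids computing an explicit matrix inverse
    theta = np.linalg.solve(A.T @ A, A.T @ y)
    b, w = theta[0], theta[1:]
    return w, b

# Toy data generated from y = 2*x1 + 3*x2 + 1 (made up for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1
w, b = normal_equation(X, y)
```

There is no learning rate and no loop, but the solve works on an n-by-n system, which is where the cost for very large feature counts comes from.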
Feature scaling techniques
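The notes do not list the techniques. Three common ones are sketched below as an assumption about what this heading covered; the function names and sample values are made up.

```python
import numpy as np

def max_scale(X):
    """Divide each feature by its maximum absolute value."""
    return X / np.abs(X).max(axis=0)

def mean_normalize(X):
    """(x - mean) / (max - min), computed per feature."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def zscore_normalize(X):
    """(x - mean) / standard deviation, computed per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Two features on very different scales (illustrative values)
X = np.array([[300.0, 1.0], [1000.0, 3.0], [2000.0, 5.0]])
X_max = max_scale(X)
X_mean = mean_normalize(X)
X_z = zscore_normalize(X)
```

All three divide by a per-feature statistic, so a constant feature would cause a division by zero in the last two and needs handling separately.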
How to test if your model has converged?

Week03
Why does logistic regression classify despite being called regression?
Regression: the act of going back to a previous state
We are still modelling the relationship between independent and dependent variables; instead of predicting a continuous outcome, we predict the probability of a binary outcome


If we choose a different z, we can map complex decision boundaries
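To illustrate, here is a sketch with two hand-picked choices of z passed through the sigmoid. The coefficients are arbitrary assumptions chosen so the boundaries are easy to picture, not fitted values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_linear(x1, x2):
    # z = x1 + x2 - 3: the boundary z = 0 is the straight line x1 + x2 = 3
    z = x1 + x2 - 3.0
    return sigmoid(z) >= 0.5

def predict_circle(x1, x2):
    # z = x1^2 + x2^2 - 1: the boundary z = 0 is the unit circle
    z = x1**2 + x2**2 - 1.0
    return sigmoid(z) >= 0.5
```

The prediction flips exactly where z = 0, so adding polynomial terms to z changes the shape of the boundary while the sigmoid stays the same.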

Logistic regression loss function

The squared error function gives a non-convex cost function for logistic regression

So we use the logistic loss function:
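The standard two-case form, assuming f_{w,b}(x) denotes the sigmoid output of the model and y is 0 or 1:

```latex
L\left( f_{w,b}(x^{(i)}), y^{(i)} \right) =
\begin{cases}
-\log\left( f_{w,b}(x^{(i)}) \right) & \text{if } y^{(i)} = 1 \\
-\log\left( 1 - f_{w,b}(x^{(i)}) \right) & \text{if } y^{(i)} = 0
\end{cases}
```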


The cost function is the average loss over all training examples
We can simplify the piecewise loss function above into this:
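Assuming the usual two-case logistic loss (-log(f) when y = 1, -log(1 - f) when y = 0), the single-line form would be:

```latex
L\left( f_{w,b}(x^{(i)}), y^{(i)} \right) =
- y^{(i)} \log\left( f_{w,b}(x^{(i)}) \right)
- \left( 1 - y^{(i)} \right) \log\left( 1 - f_{w,b}(x^{(i)}) \right)
```

Because y is either 0 or 1, one of the two terms always vanishes, which recovers the two cases.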


Therefore the cost function looks like this:
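Averaging the logistic loss over m training examples (m and the f_{w,b} notation are assumed here) gives:

```latex
J(w,b) = -\frac{1}{m} \sum_{i=1}^{m} \left[
y^{(i)} \log\left( f_{w,b}(x^{(i)}) \right)
+ \left( 1 - y^{(i)} \right) \log\left( 1 - f_{w,b}(x^{(i)}) \right)
\right]
```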


Variance & Bias
The goal is to find a balance between bias and variance
A highly biased ML model makes wrong assumptions, leading to underfitting of the training data
If our ML model has high bias, it might make overly simplistic assumptions such as:
The model is biased towards specific features or patterns in the data and bases its predictions solely on them, rather than accurately capturing the true relationship between the features and the target variable

A model with low bias does not rely on overly simplistic assumptions. Very low bias often comes with high variance, which can introduce overfitting

How to solve overfitting?
Regularization
Shrink some of the weights (w) toward zero

If we fit a higher-order polynomial to the dataset without any penalty, we get an overly wiggly curve that overfits. New cost function:
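A likely form, assuming squared error with n weights and a regularization strength written as \lambda (symbols assumed, not from the notes):

```latex
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2
+ \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
```

The bias b is conventionally left out of the penalty.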

The regularization term adds a penalty proportional to the sum of squares of the weights
During optimization (gradient descent), this penalty pushes the algorithm toward solutions with smaller weights: larger weights increase the cost, so minimizing the cost function shrinks them
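The shrinking effect can be checked with a small sketch: the same gradient descent run with and without the penalty. The function name, `lam` values, learning rate, and toy data are all illustrative assumptions.

```python
import numpy as np

def fit_regularized_gd(X, y, lam, alpha=0.1, num_iters=5000):
    """Gradient descent on squared error plus (lam / 2m) * sum(w_j^2)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        error = X @ w + b - y
        # The extra (lam / m) * w term is the gradient of the penalty
        grad_w = X.T @ error / m + (lam / m) * w
        grad_b = error.mean()  # the bias is not regularized
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Toy data generated from y = 2*x1 + 3*x2 + 1 (made up for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

w_plain, _ = fit_regularized_gd(X, y, lam=0.0)
w_reg, _ = fit_regularized_gd(X, y, lam=5.0)
```

Comparing `np.linalg.norm(w_reg)` with `np.linalg.norm(w_plain)` shows the penalized weights are smaller, at the price of a worse fit on the training data.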
Regularization on Linear regression
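The update rule is missing here. Assuming a penalty of (\lambda / 2m) times the sum of squared weights and learning rate \alpha, differentiating gives:

```latex
w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right]
\qquad
b := b - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

% Equivalent view: each step first scales w_j by a factor slightly below 1
w_j := w_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```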

Regularization on logistic regression
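For logistic regression the same penalty is added to the logistic cost. A sketch under the same assumed notation, with f_{w,b} now the sigmoid output:

```latex
J(w,b) = -\frac{1}{m} \sum_{i=1}^{m} \left[
y^{(i)} \log\left( f_{w,b}(x^{(i)}) \right)
+ \left( 1 - y^{(i)} \right) \log\left( 1 - f_{w,b}(x^{(i)}) \right)
\right]
+ \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
```

The gradient descent update has the same form as in the linear case; only the definition of f_{w,b} changes.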
