m = number of training examples (the amount of training data)
Sigmoid Disadvantage
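For very large positive or negative inputs, the sigmoid curve flattens out, so its slope (gradient) is close to 0
This results in very small gradients
With small gradients, each gradient descent step changes the parameters only slightly, so training becomes very slow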
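A minimal Python sketch (an illustration added here, not from the original notes) of the sigmoid and its derivative, sigma'(z) = sigma(z) * (1 - sigma(z)). It shows that the gradient peaks at 0.25 when z = 0 and shrinks toward 0 as |z| grows. The function names `sigmoid` and `sigmoid_grad` are hypothetical helpers chosen for this example.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable sigmoid: avoids overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z: float) -> float:
    # Derivative of the sigmoid: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# The gradient is largest at z = 0 and becomes tiny as z moves far from 0,
# so a gradient descent update (learning_rate * gradient) becomes tiny too.
for z in [0.0, 2.0, 5.0, 10.0, -10.0]:
    print(f"z = {z:6.1f}  sigmoid = {sigmoid(z):.6f}  gradient = {sigmoid_grad(z):.2e}")
```

At z = 0 the gradient is exactly 0.25; at z = 10 or z = -10 it is on the order of 1e-5, so parameter updates in that region are tens of thousands of times smaller than near z = 0.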
Why deep learning algorithms are popular