SoniSiddharth/ML-Linear-Regression-from-scratch

Latest commit a5c5f95 · May 23, 2021

Linear Regression ⭐⭐

Directory Structure πŸ“

β”‚   collinear_dataset.py     
β”‚   compare_time.py
β”‚   contour_plot.gif
β”‚   degreevstheta.py
β”‚   gif1.gif
β”‚   gif2.gif
β”‚   linear_regression_test.py
β”‚   line_plot.gif
β”‚   Makefile
β”‚   metrics.py
β”‚   Normal_regression.py     
β”‚   plot_contour.py
β”‚   poly_features_test.py    
β”‚   README.md
β”‚   surface_plot.gif
β”‚
β”œβ”€β”€β”€images
β”‚       q5plot.png
β”‚       q6plot.png
β”‚       q8features.png       
β”‚       q8samples.png
β”‚
β”œβ”€β”€β”€linearRegression
β”‚   β”‚   linearRegression.py
β”‚   β”‚   __init__.py
β”‚   β”‚
β”‚   └───__pycache__
β”‚           linearRegression.cpython-37.pyc
β”‚           __init__.cpython-37.pyc
β”‚
β”œβ”€β”€β”€preprocessing
β”‚   β”‚   polynomial_features.py
β”‚   β”‚   __init__.py
β”‚   β”‚
β”‚   └───__pycache__
β”‚           polynomial_features.cpython-37.pyc
β”‚           __init__.cpython-37.pyc
β”‚
β”œβ”€β”€β”€temp_images
└───__pycache__
        metrics.cpython-37.pyc

Instructions to run πŸƒ

make help
make regression
make polynomial_features
make normal_regression
make poly_theta
make contour
make compare_time
make collinear

Stochastic GD (Batch size = 1) ☝️

  • Learning rate type = constant: RMSE 0.9119624181584616, MAE 0.7126923090787688

  • Learning rate type = inverse: RMSE 0.9049599308106121, MAE 0.7098334683036919

Vanilla GD (Batch size = N) βœ‹

  • Learning rate type = constant: RMSE 0.9069295672718122, MAE 0.7108301179089876

  • Learning rate type = inverse: RMSE 0.9607329070540364, MAE 0.7641616657610887

Mini Batch GD (Batch size between 1 and N; here 5) 🤘

  • Learning rate type = constant: RMSE 0.9046502501334435, MAE 0.7102161700019564

  • Learning rate type = inverse: RMSE 0.9268357442221973, MAE 0.7309246821952116
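The three variants above differ only in how many samples feed each gradient step, and the two learning-rate types differ only in whether the step size decays. The sketch below is a minimal NumPy version, not the repository's `linearRegression.py` implementation; the function name, default learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def gradient_descent(X, y, batch_size=5, lr=0.01, lr_type="constant",
                     n_iter=1000, seed=0):
    """Mini-batch gradient descent for linear regression (MSE loss).

    batch_size=1 gives stochastic GD, batch_size=len(y) gives vanilla GD.
    lr_type="inverse" decays the step size as lr / t.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for t in range(1, n_iter + 1):
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Gradient of mean squared error on the mini-batch
        grad = (2.0 / batch_size) * Xb.T @ (Xb @ theta - yb)
        step = lr if lr_type == "constant" else lr / t
        theta -= step * grad
    return theta
```

With the inverse schedule the steps shrink over time, which tames the noise of small batches but can slow convergence for full-batch GD, consistent with the RMSE pattern above.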

Polynomial Feature Transformation πŸ”°

  • The output for [[1, 2]] is [[1, 1, 2, 1, 2, 4]]

  • The output for [[1, 2, 3]] is [[1, 1, 2, 3, 1, 2, 3, 4, 6, 9]]

  • The outputs match those of sklearn's PolynomialFeatures fit_transform
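The transformation enumerates every monomial of the input features up to the given degree, bias term first. A minimal pure-Python sketch (not the repository's `preprocessing/polynomial_features.py`, whose API may differ) that reproduces the listed outputs:

```python
from itertools import combinations_with_replacement

def polynomial_features(X, degree=2):
    """All monomials of each row up to `degree`, in sklearn's
    PolynomialFeatures order: bias, then degree-1 terms, then degree-2, ..."""
    out = []
    for row in X:
        features = []
        for d in range(degree + 1):
            # combinations_with_replacement yields e.g. (x1, x1), (x1, x2), (x2, x2)
            for combo in combinations_with_replacement(row, d):
                prod = 1
                for v in combo:
                    prod *= v
                features.append(prod)
        out.append(features)
    return out

print(polynomial_features([[1, 2]]))     # [[1, 1, 2, 1, 2, 4]]
print(polynomial_features([[1, 2, 3]]))  # [[1, 1, 2, 3, 1, 2, 3, 4, 6, 9]]
```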

Theta vs degree πŸ“ˆ


  • Conclusion: As the degree of the polynomial increases, the norm of theta increases because of overfitting.

L2 Norm of Theta vs Degree of Polynomial for varying Sample size πŸ“ˆ


Conclusion

  • As the degree increases, the magnitude of theta increases due to overfitting of the data.
  • At the same degree, as the number of samples increases, the magnitude of theta decreases, because more samples reduce the overfitting to some extent.
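Both effects are easy to reproduce with a plain least-squares fit. In this sketch the target function, noise level, and sample sizes are illustrative assumptions, not taken from the repository:

```python
import numpy as np

def theta_norm(n_samples, degree, seed=0):
    """L2 norm of the least-squares polynomial coefficients for a noisy sine."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, n_samples)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n_samples)
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(theta)
```

Typically `theta_norm` grows quickly with `degree` at a fixed small sample size, and shrinks again when `n_samples` is increased at a fixed degree.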

Linear Regression line fit πŸ”₯


Linear Regression Surface plot πŸ”₯


Linear Regression Contour plot πŸ”₯


Time Complexities ⏳

  • Theoretical time complexity of the normal equation: O(N·D^2 + D^3)
  • Theoretical time complexity of gradient descent: O((t + N)·D^2)
  • Here N is the number of samples, D the number of features, and t the number of iterations.
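The O(N·D^2) term comes from forming XᵀX and the O(D^3) term from solving the resulting D×D system. A minimal sketch of the closed-form solution (not the repository's `Normal_regression.py`; it uses `np.linalg.solve` instead of an explicit matrix inverse, which computes the same theta more stably):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y.

    Forming X^T X costs O(N * D^2); solving the D x D system costs O(D^3).
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```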

Time vs Number of Features β³πŸ“Š


When the number of samples is kept constant, the normal-equation solution takes more time because its cost has a D^3 factor, whereas gradient descent's cost has only a D^2 factor.

Time vs Number of Samples β³πŸ“Š


When the number of features is kept constant and the number of samples is varied, the normal equation still takes longer than gradient descent because of its higher computational cost.

Multicollinearity in Dataset ❗ ❗

  • The gradient descent implementation still works in the presence of multicollinearity.
  • But as the multiplication factor between the collinear features increases, the RMSE and MAE values shoot up sharply.
  • Multicollinearity also reduces the precision of the estimated coefficients.
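One way to see why coefficient precision degrades: as one feature approaches an exact multiple of another, the condition number of XᵀX explodes, so tiny changes in the data produce large swings in the fitted coefficients. A hypothetical sketch (the repository's `collinear_dataset.py` may construct its data differently; the noise scale and factor here are assumptions):

```python
import numpy as np

def collinear_design(n=100, factor=2.0, noise=1.0, seed=0):
    """Design matrix [1, x1, x2] where x2 = factor * x1 + perturbation."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = factor * x1 + noise * rng.normal(size=n)
    return np.c_[np.ones(n), x1, x2]

# Smaller perturbation -> nearly collinear columns -> worse conditioning
for noise in (1.0, 1e-3, 1e-6):
    X = collinear_design(noise=noise)
    print(noise, np.linalg.cond(X.T @ X))
```

The printed condition numbers grow by orders of magnitude as the perturbation shrinks, which is exactly the regime where RMSE, MAE, and coefficient estimates become unstable.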