Daily log to track my progress on the 100 days of ML code challenge.
A 100-day ML challenge to learn and develop machine learning products. Since this is my second time taking on this challenge, this time around I will focus more on the production environment than on the concepts and theory behind ML/DL models. I will place heavy emphasis on the ML pipeline and the process of taking an ML model and applying it to a real-world application.
- Learn how to deploy ML models into production.
- Learn about scripting for MLOps
- Learn about the ML Pipeline
- TFX Components
- Orchestration tools
- Learn more about privacy & protection for ML applications
- Learn about CI/CD for ML
- Learn about CUDA and get hands-on experience
- Learn more about deep reinforcement learning
- Markov Chains
- Stationary distribution probabilities
- Markov Decision Processes
- Learn more about generative learning
- Learn about AWS microservices
[Additional]
- Multi-modal Systems
- Apache Flume
- Apache Beam
- Apache Airflow
- Using Amdahl's Law
- MySQL/Apache Spark
- PostgreSQL
- Optimizing/architecting software/hardware solutions for ML
Deploy Machine Learning Models to Production: With Flask, Streamlit, Docker, and Kubernetes on Google Cloud Platform
Building Machine Learning Powered Applications: Going from Idea to Product
Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow
Hands-On GPU Programming with Python and CUDA: Explore High-performance Parallel Computing with CUDA
@article{madewithml, title = "MLOps - Made With ML", author = "Goku Mohandas", url = "https://madewithml.com/courses/mlops/organization/", year = "2021"}
This whole challenge will be documented on YouTube during live streams. The link to the playlist: 100 Days of ML
- Develop a web application using Flask
- Code a linear regression model
- Deploy the trained model as a REST service
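A minimal sketch of what this deployment step looks like (the route, JSON field names, and `model.pkl` path are placeholders, not necessarily what I used on stream):

```python
# Minimal Flask REST endpoint around a pickled regression model.
# Assumes the trained model was saved to "model.pkl" (hypothetical path).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[5.1, 3.5]]}
    features = request.get_json()["features"]
    prediction = model.predict(features)
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```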
- Read the section about Streamlit
- Create UI using Streamlit (see the sketch below)
- Deploy the trained model as a REST service
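A bare-bones Streamlit UI sketch for a model like the one above (widget labels and the model file are assumptions); run it with `streamlit run app.py`:

```python
# Minimal Streamlit front-end for the pickled regression model.
import pickle

import streamlit as st

st.title("Linear Regression Demo")

with open("model.pkl", "rb") as f:  # hypothetical model path
    model = pickle.load(f)

x = st.number_input("Feature value", value=0.0)
if st.button("Predict"):
    st.write("Prediction:", float(model.predict([[x]])[0]))
```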
- Code the LSTM model (see the sketch below)
- Train the LSTM model
- Create UI using Streamlit
- Deploy the trained model as a REST service
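A rough Keras sketch of the kind of LSTM classifier this covers (vocabulary size, units, and the binary output head are placeholder choices, not the stream's exact values):

```python
# Minimal Keras LSTM text classifier; hyperparameters are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=64),  # token ids -> vectors
    tf.keras.layers.LSTM(64),                                    # sequence -> fixed vector
    tf.keras.layers.Dense(1, activation="sigmoid"),              # binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_sequences, train_labels, epochs=5)  # hypothetical training data
```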
- Read Chapter 4: ML Deployment using Docker
- Create a Dockerfile for the Flask app (see the sketch below)
- Create Docker image
- Push our Docker image to DockerHub
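A minimal Dockerfile sketch for containerizing the Flask app (file names and the base image tag are assumptions). Building and pushing then looks like `docker build -t <user>/flask-ml .` followed by `docker push <user>/flask-ml`:

```dockerfile
# Minimal image for the Flask app; paths and tags are placeholders.
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]
```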
- Read chapter 5: ML Deployment using Kubernetes
- Create GCP Project
- Enable and utilize the Kubernetes Engine API on GCP
- Read topics under scripting from MadeWithML
- Apply learning by adding it to current projects
- Read Ch1: Introduction
- Read Ch2: Introduction to TensorFlow Extended
- Read Ch2: Introduction to TensorFlow Extended
- Read Ch3: Data Ingestion
- Download and set up TFX
- Execute TFX Data Ingestion examples
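A minimal data-ingestion sketch with TFX's `CsvExampleGen` in an interactive notebook context (the data directory is a placeholder, and exact import paths vary across TFX versions):

```python
# Ingest a directory of CSVs into train/eval TFRecord splits with TFX.
from tfx.components import CsvExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

context = InteractiveContext()
example_gen = CsvExampleGen(input_base="data/")  # hypothetical CSV directory
context.run(example_gen)  # artifacts land in the context's pipeline root
```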
- Follow [TFX tutorial](https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple#install_tfx)
- Try using Colab
- Check online resources and try to debug the [TFX tutorial](https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple#install_tfx)
- Read Ch4: Data Validation
- Read online resources on TFX Data Validation
- Read online resources on TFX Data Validation
- Execute example code for TFDV
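The TFDV example code boils down to three calls: generate statistics, visualize them, and infer a schema (the CSV path here is a placeholder):

```python
# Generate, visualize, and validate dataset statistics with TFDV.
import tensorflow_data_validation as tfdv

stats = tfdv.generate_statistics_from_csv(data_location="data/train.csv")
tfdv.visualize_statistics(stats)              # renders inside a notebook
schema = tfdv.infer_schema(statistics=stats)  # initial schema to curate by hand
tfdv.display_schema(schema)
```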
- Read Ch5: Data Preprocessing
- Read online resources on TFX Data Preprocessing
- Read more about feature engineering
- Feature engineering vs ML engineering
- Execute example code for TF Transform
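The TF Transform examples center on a `preprocessing_fn`; a sketch with made-up feature names:

```python
# preprocessing_fn runs as a full-pass job at pipeline time and is baked
# into the serving graph, which avoids training/serving skew.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    return {
        # Scale a numeric feature to z-scores using dataset-wide statistics.
        "amount_scaled": tft.scale_to_z_score(inputs["amount"]),
        # Map a string feature to integer indices via a learned vocabulary.
        "category_index": tft.compute_and_apply_vocabulary(inputs["category"]),
    }
```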
- Read about production from [Made with ML](https://madewithml.com/courses/mlops/)
- Watch YouTube videos on CI/CD workflows
- Learn more about GitHub Actions
- Read Ch 6: Model Training
- Read online resources on TFX Trainer Component
- Read Ch 7: Model Analysis and Validation
- Read online resources on TF Model Analysis
- Read Ch 8: Model Deployment with TensorFlow Serving
- Continue reading Ch8: Model Deployment with TensorFlow Serving
- Read online resources on TF Serving
- Look into simple ML models to deploy
- Create the architecture for the pipeline
- Set up/decide on GitHub project organization
- Choose how to build front-end
- What orchestration tool to use?
- How to integrate CI/CD into project
- Flask web deployment vs. model server
- Create order of events
- Look into setting up GitHub Project
- List out dependencies
- Create Dockerfile
- Build Docker Image
- Run Docker container
- Check if GPU is being used
- Troubleshoot Dockerfile
- Build Docker Image
- Update README.md
- Set up TFX pipeline
- Design TFX architecture
- Understand the template code
- Clear out the template files and rewrite the pipeline
- Change permissions for files in /ml
- Add the IMDB dataset into /data
- Convert dataset to desired format
- Set up formatting/styling
- Black & flake8
- Set up GitHub Actions
- Fix features.py
- Generate statistics
- Visualize statistics
- Infer schema
- Build updated docker image
- Figure out why ExampleGen is not loading
- Change preprocessing.py
- Test Preprocessing
- Change preprocessing.py
- Test Preprocessing
- Test Preprocessing
- Add TFX Transform to the TFX pipeline components
- Prototype the TFX Trainer component in Jupyter
- Modify model.py
- Modify model.py
- Add the TFX Trainer component to the pipeline (see the sketch below)
- Test pipeline
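Wiring the Trainer into the pipeline looked roughly like this (the module file name and step counts are placeholders, and the exact constructor arguments differ between TFX versions):

```python
# Trainer consumes the Transform outputs and calls run_fn() in module_file.
from tfx.components import Trainer
from tfx.proto import trainer_pb2

trainer = Trainer(
    module_file="ml/model.py",  # hypothetical path; must define run_fn()
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=100),
)
```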
- Read Ch 12 of Building ML Pipelines - Kubeflow Pipelines
- Read Ch 11 of Building ML Pipelines - Apache Beam & Apache Airflow
- Read [Graph-based Neural Structured Learning in TFX](https://www.tensorflow.org/tfx/tutorials/tfx/neural_structured_learning#the_trainer_component)
- Figure out how the TFX IMDB example in the TensorFlow tutorial works
- Understand the use of custom TFX components in the tutorial
- Understand the preprocessing required for the model using the IMDB dataset
- Find a book to read for CUDA
- Search for alternative resources online
- Download a digital version of the book
- Start the introductory chapters
- Using Amdahl's Law (see the worked example below)
- Learn about the Mandelbrot set
- What are profilers? The cProfile module
- Setting up GPU programming environment in Linux
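A quick worked example of Amdahl's Law, which bounds the speedup from parallelizing a fraction p of a program across N processors:

```python
# Amdahl's Law: speedup = 1 / ((1 - p) + p / N), so the serial fraction
# (1 - p) caps the speedup at 1 / (1 - p) no matter how many processors.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.90, 8))     # ~4.71x on 8 processors
print(amdahl_speedup(0.90, 1024))  # ~9.91x, already close to the 10x ceiling
```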
- Search for research internships
- Create a spreadsheet with all the important dates listed
- Create and organize a small Google document with all relevant links and information
- Test on local environment
- Figure out dependencies
- Create Dockerfile
- Build Docker image
- Test docker container with all the dependencies for programming CUDA using Python
- Read chapter 3: Getting Started with PyCUDA
- CPU vs GPU timing (see the sketch below)
- Parallelizing the Mandelbrot set
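A small sketch of the CPU-vs-GPU timing exercise using `gpuarray` (the array size and the sin-squared workload are arbitrary stand-ins for the chapter's Mandelbrot kernel):

```python
# Compare a NumPy computation on the CPU against the same elementwise
# computation on the GPU with PyCUDA's gpuarray/cumath.
import time

import numpy as np
import pycuda.autoinit  # noqa: F401 - initializes the CUDA context
import pycuda.cumath as cumath
import pycuda.gpuarray as gpuarray

x = np.random.randn(10_000_000).astype(np.float32)

t0 = time.time()
cpu = np.sin(x) * np.sin(x)
print(f"CPU: {time.time() - t0:.3f}s")

x_gpu = gpuarray.to_gpu(x)
t0 = time.time()
y_gpu = cumath.sin(x_gpu)
gpu = (y_gpu * y_gpu).get()  # .get() copies the result back to the host
print(f"GPU: {time.time() - t0:.3f}s")
```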
- Re-implement the tokenization
- Use the Keras TextVectorization layer (see the sketch below)
- Build sentence sequences
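The layer Keras actually ships is `TextVectorization`; a sketch of using it for the tokenization step (vocabulary size, sequence length, and the toy corpus are placeholders):

```python
# Tokenize raw strings into padded integer sequences with Keras.
import tensorflow as tf

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10_000,           # cap on vocabulary size
    output_sequence_length=100,  # pad/truncate every sequence to 100 tokens
)
vectorizer.adapt(["the movie was great", "the movie was terrible"])  # toy corpus
print(vectorizer(["the movie was great"]))  # tensor of token ids, padded to 100
```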
- Code using sequential utilizing CPU
- Parallelize the code to run on GPU
- Functional programming
- Parallel scan and reduction kernel basics
- Kernels, Threads, Blocks, and Grids
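The canonical reduction example from the PyCUDA docs, a dot product built from a map expression and a reduce expression, which is the pattern the scan/reduction material builds on:

```python
# Map (x[i]*y[i]) then reduce (a+b) across the array on the GPU.
import numpy as np
import pycuda.autoinit  # noqa: F401
import pycuda.gpuarray as gpuarray
from pycuda.reduction import ReductionKernel

dot = ReductionKernel(
    np.float32,
    neutral="0",
    reduce_expr="a+b",
    map_expr="x[i]*y[i]",
    arguments="float *x, float *y",
)

x = gpuarray.to_gpu(np.random.randn(1024).astype(np.float32))
y = gpuarray.to_gpu(np.random.randn(1024).astype(np.float32))
print(dot(x, y).get())  # should match np.dot(x.get(), y.get())
```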
- Talk about all of the knowledge I've gained so far
- Current objectives
- Future tasks
- Finish up future tasks
- Include GitHub
- Link website
- Start with introduction
- Read chapter 1
- Continue chapter 1 - From Product Goal to ML Framing
- Start chapter 2 - Create a Plan
- Start reading ML Paper every week
- Finish chapter 2 - Create a Plan
- Review Part I. Find the Correct ML Approach
- Read more about model metrics
- Find an existing problem
- Determine if it can be solved using ML
- Look for datasets and determine what model would work
- Read Chapter 3 - Build Your First End-to-End Pipeline
- Read Chapter 4 - Acquire an Initial Dataset
- Continue Chapter 4
- Review Part II. Build a Working Pipeline
- Set up GitHub repo for the paper-a-week challenge
- Decide on a list of papers to read
- Start reading the first paper
- Read Part III - Iterate on Models
- Read Chapter 5 - Train and Evaluate Your Model
- Finish reading Ch 5
- Read Chapter 6 - Debug Your ML Problems
- Finish reading Chapter 6
- Think of ways to test preprocessing in ML-Pipelines project
- Build the model on Jupyter Notebook
- Finish Data Ingestion
- Finish filtering HTML tags from string
- Vectorize the filtered string
- Figure out how to create embeddings
- Create embeddings
- Analyze processed data
- What is the model doing?
- How to build the DL model?
- Build the DL model
- Run tests
- Read and take notes on data preprocessing - Tidy Data by Hadley Wickham
- Continue with section 3
- Revise sections 1-3
- Continue with section 4
- Finished reading Tidy Data
- Start Chapter 7 - Using Classifiers for Writing Recommendations
- Finish Chapter 7
- Review Part III - Iterate on Models
- Read Statistical Modeling: The Two Cultures - by Leo Breiman
- Finished Sections 3 and 4
- Read section 5 - the use of data models
- Start section 6 - the limitations of data models
- Read section 9
- Fix info on the slides
- Make up a presentation
- Add/subtract information
- Read section 10
- Read section 11
- Read section 12
- Finished Statistical Modeling: The Two Cultures by Leo Breiman
- Start reading A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning (Semenova et al.)
- Update the Paper a Week repository with annotated papers
- Checked out Ishan Misra and Yann LeCun's blog post on Self-supervised learning
- Finish reading [Self-supervised learning: The dark matter of intelligence](https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/)
- Continue with Paper a Week: Rashomon Curves and Volumes
- Read Paper a Week: Rashomon Curves and Volumes
- Explore the different statistical concepts introduced in Rashomon Curves & Volumes
- Continue reading Rashomon Curves and Volumes
- Learn about the different complexity measures introduced in the Rashomon Curves paper
- Read [How to do Research At the MIT AI Lab](https://dspace.mit.edu/bitstream/handle/1721.1/41487/AI_WP_316.pdf?sequence=4&isAllowed=y)
- Checked out reading lists for AI from [Stanford](http://i.stanford.edu/pub/cstr/reports/cs/tr/86/1093/CS-TR-86-1093.pdf) and [Berkeley](https://ml.berkeley.edu/reading-list/)
- Read the [EfficientNet](https://arxiv.org/pdf/1905.11946.pdf) paper
- Learn about FLOPS
- Read the [YOLO](https://arxiv.org/abs/1506.02640) paper
- Lay out the architecture of EfficientNet
- Figure out if it's possible to code it using TensorFlow
- Start implementation
- Understand what MBConv blocks are
- Learn about inverted residual convolution
- Get an understanding of the Bottleneck residual block
- What are dwise (depthwise) layers?
- Start by implementing a simple conv block
- Design and code the MBConv block (see the sketch below)
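My working sketch of an MBConv (inverted residual) block in Keras, following my reading of the MobileNetV2/EfficientNet papers; the squeeze-and-excitation step is left out for brevity:

```python
# MBConv: 1x1 expand -> depthwise conv -> 1x1 linear project (+ residual).
import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, filters_out, expansion=6, stride=1, kernel_size=3):
    filters_in = x.shape[-1]
    shortcut = x
    # 1x1 expansion to a wider representation.
    x = layers.Conv2D(filters_in * expansion, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("swish")(x)
    # Depthwise ("dwise") convolution over the expanded features.
    x = layers.DepthwiseConv2D(kernel_size, strides=stride, padding="same",
                               use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("swish")(x)
    # 1x1 linear bottleneck projection (no activation, per the paper).
    x = layers.Conv2D(filters_out, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    # Residual connection only when spatial size and channels match.
    if stride == 1 and filters_in == filters_out:
        x = layers.Add()([shortcut, x])
    return x
```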
- Learn about the architecture of PointNet
- Continue with the Berkeley reading list
- Continue reading the paper
- [DeepMind’s New Super Model: Perceiver IO is a Transformer that can Handle Any Dataset](https://pub.towardsai.net/deepminds-new-super-model-perceiver-io-is-a-transformer-that-can-handle-any-dataset-dfcffa85fe61)
- Briefly learn about multi-modal models
- [Learn about semi-supervised learning models](http://www.cs.cmu.edu/~10701/slides/17_SSL.pdf)
- What is the co-training algorithm?
- Learn about the co-training algorithm in depth (see the sketch after this list)
- Read [Combining Labeled and Unlabeled Data with Co-Training (Blum & Mitchell)](https://www.cs.cmu.edu/~avrim/Papers/cotrain.pdf)
- Review SSL with [Semi-Supervised Learning](http://pages.cs.wisc.edu/~jerryzhu/pub/SSL_EoML.pdf)
- Learn more about the functionality of multi-modal models
- Learn about the difference between SSL and WSL
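A compact sketch of the Blum & Mitchell co-training loop as I understand it, simplified to a shared confidence pool (assumes two feature views of the same points and integer class labels 0..C-1; `GaussianNB` is just a stand-in base learner):

```python
# Co-training: two classifiers on independent feature views repeatedly
# promote their most confident unlabeled predictions into the labeled set.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, X1_u, X2_u, rounds=10, k=5):
    clf1, clf2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        if len(X1_u) == 0:
            break
        clf1.fit(X1, y)
        clf2.fit(X2, y)
        p1 = clf1.predict_proba(X1_u)
        p2 = clf2.predict_proba(X2_u)
        # For each unlabeled point, trust whichever view is more confident.
        conf = np.maximum(p1.max(axis=1), p2.max(axis=1))
        labels = np.where(p1.max(axis=1) >= p2.max(axis=1),
                          p1.argmax(axis=1), p2.argmax(axis=1))
        # Promote the k most confident points into the labeled set.
        idx = np.argsort(conf)[-k:]
        X1 = np.concatenate([X1, X1_u[idx]])
        X2 = np.concatenate([X2, X2_u[idx]])
        y = np.concatenate([y, labels[idx]])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return clf1, clf2
```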
- Start on the Medium article
- Finish the Medium article