The aim of this project is to classify pictures of household items into six categories.
A CNN based on the EfficientNetV2 architecture with pretrained weights from ImageNet is used for transfer learning. Image augmentation is applied to create variations of the training dataset and increase generalizability.
The architecture used is shown in the diagram below:
The final model is deployed as a prediction service to Kubernetes.
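For illustration, here is a minimal sketch of this kind of setup in Keras. The EfficientNetV2-B0 variant, the 224x224 input size, and the specific augmentation layers are assumptions; the actual model in `02-model-training.ipynb` and `train.py` may differ:

```python
import tensorflow as tf

NUM_CLASSES = 6  # six kitchenware categories

# Augmentation layers create random variations of each training image.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Pretrained ImageNet backbone without its classification head.
base = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights='imagenet', pooling='avg')
base.trainable = False  # freeze the backbone for transfer learning

inputs = tf.keras.Input(shape=(224, 224, 3))
x = augmentation(inputs)     # only active during training
x = base(x, training=False)  # keep batch-norm statistics frozen
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer class labels
              metrics=['accuracy'])
```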
The data used for this project is taken from the Kitchenware Classification Competition on Kaggle.
The dataset has 9,367 images of kitchenware items. 60% of the images (5,560) are labeled. The aim is to use image recognition to categorize unlabeled images of kitchenware items.
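To get a first impression of the data, you can look at the class distribution of the labeled images. This is a small sketch that assumes the Kaggle layout with a `data/train.csv` file containing `Id` and `label` columns; adjust paths and column names if they differ:

```python
import pandas as pd

# Labeled training images; path and column names assume the Kaggle layout.
df = pd.read_csv('data/train.csv')
print(df['label'].value_counts())  # number of images per kitchenware class
```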
Root folder:
- `01-data-understanding.ipynb`: Analyse the content of the images. Characterize similarities and differences between the classes and infer reasonable data augmentation options.
- `02-model-training.ipynb`: Train the model and perform hyper-parameter tuning.
- `train.py`: Train the final CNN architecture.
- `data/`
  - `images/`: Folder will be created when downloading the data.
  - `test_labeled.csv`: 200 labeled images for testing locally without submitting to Kaggle.
- `models/`: Folder for trained models.
  - `train_history.csv`: Logging file for the performed training cycles.
- `deployment/`: Folder containing all configurations for the deployment.
  - `README.md`: Step-by-step explanation for deploying the model.
- `src/`
  - `generate_dataset_preview.py`: Script which generates the cover image for this repository.
  - `kitchenware_helper.py`: Helper functions for training on and analysing the data.
You can run the code either locally or on Saturn Cloud. Training the model on a GPU is significantly faster, and Saturn Cloud lets you train on a GPU for free.
Prerequisites:
- Saturn Cloud account. If you don't have one, sign up here for free.
- Kaggle account for downloading the data. If you don't have one, sign up here.
After clicking the button above, follow these steps:
- Create a Jupyter workspace
- Open the "Secrets" tab
- Add the kaggle.json secret:
  - Click "New secret"
  - Enter "kaggle.json" as the name
  - Paste the content of the kaggle.json API file as the value (see below how to download this API file)
  - Click "Save"
- Go to the "Overview" tab and click "Start" to start the Jupyter server
Note: These instructions are written for Unix-like shells. Please use Linux, macOS, or WSL on Windows.
Prerequisites:
- Have Python 3.9 and `pip` installed. Try running `python3.9 --version` and `pip3 --version`.
- Have `git` installed. Try running `git --version`.
- Kaggle account for downloading the data. If you don't have one, sign up here.
Install pipenv
Check if you have `pipenv` already:
```bash
pip list | grep pipenv
```
If the output is empty, follow this section; otherwise go to the next section for cloning the repository.
Install `pipenv` if you don't have it yet. This command can help you get started:
```bash
pip install --user pipenv
```
You can find more installation options for `pipenv` here.
Clone Repository
```bash
git clone https://github.com/LoHertel/lost-in-cupboard.git
cd lost-in-cupboard/
```
Create environment:
```bash
pipenv install --dev
```
Activate environment:
```bash
pipenv shell
```
Set up Kaggle
Go to your Kaggle account, scroll down to "API" and click "Create New API Token".
A file named `kaggle.json` gets downloaded. Move this file from your downloads folder into `~/.kaggle`.
If you are using WSL, you can run these commands in bash to copy the file:
```bash
mkdir -p ~/.kaggle/
cp /mnt/c/Users/<your Windows user>/Downloads/kaggle.json -t ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
```
Download Data
Note: You need a Kaggle account and API key for downloading the data. See above.
```bash
kaggle competitions download -c kitchenware-classification
mkdir data
unzip kitchenware-classification.zip -d data > /dev/null
rm kitchenware-classification.zip
```
Perform training
- Read the data understanding notebook
- Read the training notebook
- Execute the training of the final model:
```bash
python train.py 10
```
Note: `10` is the number of epochs that the training should run. The number can be changed in the bash command above.
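As a rough sketch, the epoch argument could be read like this (hypothetical; the actual argument handling in `train.py` may differ):

```python
import sys

# Hypothetical sketch; train.py in this repo may parse its argument differently.
epochs = int(sys.argv[1]) if len(sys.argv) > 1 else 10  # default to 10 epochs
print(f'Training for {epochs} epochs...')
# model.fit(train_ds, validation_data=val_ds, epochs=epochs)
```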
To deploy the prediction service to a Kubernetes cluster (cloud or local), please follow the deployment instructions here.
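Once the service is running, a prediction request could look like the following sketch. The endpoint path, port, and payload schema are assumptions; see the deployment README for the actual interface:

```python
import requests

# Assumed endpoint after port-forwarding the Kubernetes service; adjust as needed.
url = 'http://localhost:8080/predict'
payload = {'url': 'https://example.com/kitchenware-image.jpg'}  # hypothetical schema

response = requests.post(url, json=payload, timeout=30)
print(response.json())  # e.g. probabilities for the six kitchenware classes
```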
- Implement class-specific data augmentation rules.
- Try more CNN architectures besides EfficientNetV2.
This is my first capstone project for the Machine Learning Zoomcamp 2022.
I'd like to thank DataTalks.Club and Alexey Grigorev for hosting the Machine Learning Zoomcamp completely free of charge. If you want to upskill in machine learning, please check out their self-paced course. :)