DEPLOYMENTS AND HOW TO USE THEM (PLEASE READ)

https://docs.google.com/document/d/1R6nR_AweptKE9sJPMdnFxIeO3jDxQfkfBhI2Ld4GCDc/edit?usp=sharing

BRAND NEW CHATBOT UI: https://k8chatbot.vercel.app/

https://docs.google.com/presentation/d/1fE-f3UlMdvPvwsPu7tjNVFxIesq9U5uNW-DJhCRIQb4/edit?usp=sharing

LATEST UPDATE

The Gemini logging frontend lets a user view the logs in a clean dashboard UI. More details are given further down. To run the backend for the Gemini remediation and advice setup:

python3 src/server.py

To run the frontend

cd frontend/k8s-remediation-dashboard/src

npm run dev

See below for the screenshots-

Downloadable-

(Please note: the data is scraped from Prometheus every 5 minutes. The model should improve as more data accumulates over time.)
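The five-minute collection cadence mentioned above could be sketched as follows. This is only an illustration, not the project's actual collector: the function names, the `fetch` callback, and the added `timestamp` column are all assumptions made for the example.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict

SCRAPE_INTERVAL_SECONDS = 300  # five minutes, matching the note above


def append_snapshot(csv_path: Path, row: Dict[str, float]) -> None:
    """Append one metrics row, writing the header only if the file is new or empty."""
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=sorted(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def collect_periodically(
    fetch: Callable[[], Dict[str, float]],  # hypothetical callback returning one snapshot
    csv_path: Path,
    iterations: int,
    interval: float = SCRAPE_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call fetch() `iterations` times, sleeping `interval` seconds between calls."""
    for i in range(iterations):
        row = dict(fetch())
        row["timestamp"] = time.time()  # assumed column recording collection time
        append_snapshot(csv_path, row)
        if i < iterations - 1:
            sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without waiting for real time to pass. The sketch assumes every snapshot has the same keys, so the CSV header written on the first call stays valid.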

The Vercel app is the frontend (an additional feature) and is separate from the model training and the Gemini output.

We used Prometheus in Kubernetes, running on minikube, to scrape the data. We tried to expose it on a public IP, but because of security constraints and the limited free-tier cloud options, we decided to keep it local. If needed, you can run it against your own Prometheus and dataset (through src/fetch_live_metrics.py and data/k8s_live_metrics.csv).

The model is stored at models/k8s_failure_model_live.pkl and has been deployed online.

The Gemini output and remediation step lives in src/predictgemini.py and src/jsonextractor.py (predictgeministreamlit.py was used to test the Streamlit integration). All of this comes together in streamlitapp.py in the root directory.

(Please note: it works with a local IP because, as mentioned, we could not run Prometheus globally. Please do try it out. This is also why the fetched metrics currently cover only a small amount of data.)

Read below to know more about our project.

Kubernetes Failure Prediction

This project aims to build a machine learning model for predicting Kubernetes cluster failures using real-time and historical cluster metrics. The goal is to identify potential issues in a Kubernetes environment, such as pod/node failures, resource exhaustion, and network issues, before they occur.

The system leverages a variety of tools and libraries, including Prometheus for metrics collection, Python for data processing, and machine learning algorithms to predict failures.

Project Overview

Kubernetes clusters can face a variety of issues, from pod/node failures to resource exhaustion or network issues. Predicting these failures in advance can help maintain a more stable and efficient cluster. This project includes:

Data collection from Kubernetes clusters.
Feature engineering to prepare metrics for machine learning.
Training of a machine learning model to predict failures.
Deployment of the model in a Kubernetes environment.
Evaluation and visualization of the model's performance.

System Requirements

Python 3.7+
Prometheus (for fetching live Kubernetes metrics)
Docker (for containerizing the application)
Kubernetes (for deployment)
Machine Learning Libraries: scikit-learn, pandas, numpy, matplotlib, joblib, etc.

Project Structure

kubernetes-failure-prediction/
├── src/                         # Code for data collection, model training, and evaluation
│   ├── deployment.yaml          # Kubernetes deployment configuration
│   ├── generate_output.py       # Generates model output for analysis
│   ├── __pycache__/             # Compiled Python files
│   ├── feature_engineering.py   # Script for feature engineering
│   ├── jsonextractor.py         # Extracts JSON data for processing
│   ├── test_model.py            # Tests for evaluating model performance
│   └── external_data_link.txt    # External link to large datasets
├── docs/                         # Documentation files
│   └── README.md                 # This file
├── presentation/                 # Slides and recorded demo (YouTube/Drive link)
│   ├── slides.pptx               # Slides for the presentation
│   └── demo_link.txt             # Link to recorded demo (YouTube/Google Drive)
├── deployment/                   # Files for deploying the model to Kubernetes
│   ├── kubernetes_deploy.yaml    # Kubernetes deployment configuration
│   └── Dockerfile                # Dockerfile for containerizing the model
├── tests/                        # Unit and integration tests
├── requirements.txt              # Python dependencies
└── LICENSE                       # License information

Setup Instructions

1. Clone the repository

git clone https://github.com/CPPavithra/Kubernetes-Failure-Predictor.git
cd Kubernetes-Failure-Predictor

2. Create a virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows, use venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Set up Prometheus

Ensure that Prometheus is running and scraping Kubernetes metrics. This project ran Prometheus inside a local minikube cluster; see the Prometheus and Kubernetes documentation for setup details.

Usage

1. Collect Metrics

You can collect Kubernetes metrics by running:

python src/fetch_live_metrics.py

This will fetch live metrics from your Prometheus instance.
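As a rough illustration of what fetching from Prometheus involves, the sketch below runs an instant query against the Prometheus HTTP API endpoint `/api/v1/query` and flattens the response. It is not the contents of src/fetch_live_metrics.py: the base URL `http://localhost:9090` and both function names are assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: Prometheus is reachable locally, for example through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def parse_instant_vector(payload: dict) -> List[Dict[str, object]]:
    """Flatten an instant-query response into one dict per time series."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    rows = []
    for series in payload["data"]["result"]:
        timestamp, value = series["value"]  # [unix seconds, value encoded as a string]
        rows.append({**series["metric"], "timestamp": float(timestamp), "value": float(value)})
    return rows


def query_prometheus(promql: str, base_url: str = PROMETHEUS_URL) -> List[Dict[str, object]]:
    """Run an instant query and return the parsed rows."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

For example, `query_prometheus("up")` would return one row per scrape target, since `up` is the built-in metric Prometheus records for every target. Keeping the parser separate from the network call makes it possible to check the parsing against a saved response.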

2. Train the Model

To train the model on your dataset, use the following command:

python src/train_model_live.py

This will train the model on the collected data and write the trained model to the models/ directory. The model shipped with this repository is models/k8s_failure_model_live.pkl.
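A training step of this kind might look like the sketch below. It is not the contents of src/train_model_live.py. The README does not say which algorithm or label column the project uses, so `RandomForestClassifier`, the `failure` column name, and the function name are all assumptions for illustration.

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_save(csv_path: str, model_path: str, label_column: str = "failure") -> float:
    """Train a classifier on the numeric columns, save it, and return held-out accuracy."""
    frame = pd.read_csv(csv_path)
    # Assumption: every numeric column other than the label is a usable feature.
    features = frame.drop(columns=[label_column]).select_dtypes("number")
    labels = frame[label_column]
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42, stratify=labels
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(x_train, y_train)
    joblib.dump(model, model_path)  # reload later with joblib.load(model_path)
    return float(model.score(x_test, y_test))
```

With the paths named in this README, a call would be `train_and_save("data/k8s_live_metrics.csv", "models/k8s_failure_model_live.pkl")`. Stratifying the split keeps the ratio of failures to non-failures similar in both halves, which matters when failures are rare.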

3. Predict Failures

Once the model is trained, you can use it to predict failures in your Kubernetes cluster:

python src/predictgemini.py

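This script will load the trained model and predict potential failures based on real-time metrics. As noted in the update above, the Gemini output and remediation step is handled by this script together with src/jsonextractor.py.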
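Language models often wrap structured output in extra prose, so a JSON extraction step is a common companion to a remediation prompt. The sketch below shows one way to pull the first JSON object out of free-form text. It is not the contents of src/jsonextractor.py, and the function name and the example reply fields are hypothetical.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first valid JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


# Hypothetical model reply; the field names are made up for the example.
reply = 'Suggested fix: {"issue": "pod restarts", "steps": ["check logs"]} Hope this helps.'
print(extract_first_json_object(reply))
```

Scanning from each opening brace and letting the decoder decide where the object ends avoids fragile regular expressions and handles nested objects.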

Model Evaluation

To evaluate the model's performance, use the following script:

python src/test_model.py

This will test the model on a test dataset and display evaluation metrics such as accuracy, precision, recall, and F1 score.
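For reference, the four metrics named above can be computed from the confusion counts as in this sketch. It assumes binary labels where 1 means a failure, and the function name is made up for the example; scikit-learn's `sklearn.metrics` module provides equivalent functions.

```python
from typing import Dict, Sequence


def binary_classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1, treating 1 as the failure class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed failures
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy and predicted healthy
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

When failures are rare, accuracy alone can look high even if the model misses most failures, so recall and F1 are usually the more informative numbers here.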

Deployment

1. Dockerize the Application

Use the Dockerfile to containerize the application:

docker build -t k8s-failure-prediction .

2. Deploy to Kubernetes

You can deploy the model using the Kubernetes configuration in deployment.yaml:

kubectl apply -f deployment.yaml

This will deploy your model to a Kubernetes cluster. Make sure that your cluster has access to the necessary metrics from Prometheus.

Testing

Unit and integration tests are located in the tests/ directory. To run the tests, use:

pytest tests/

This will run all the unit and integration tests to ensure the code is working as expected.

Licenses

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Prometheus for real-time metrics collection.
scikit-learn for machine learning algorithms.
Kubernetes for orchestration and deployment.

Feel free to contribute to the project or suggest improvements via issues and pull requests. Happy coding!

Repository: CPPavithra/Kubernetes-Failure-Predictor
