GitHub - Aditya-professional-life/AI-Multi-Modal-Assistant

# Multi-Modal AI Voice Assistant

## Overview

The Multi-Modal AI Voice Assistant answers spoken or typed prompts and can use images and clipboard text as additional context. It relies on Groq and Google Generative AI to generate text and image-based responses, and it combines screenshot capture, webcam image analysis, and clipboard text extraction behind a single interface.

## Features

- **Advanced AI Integration**: Utilizes Groq and Google Generative AI for high-quality text and image-based responses.
- **Image Processing**: Includes capabilities for screenshot capture and webcam image analysis to provide detailed context.
- **Clipboard Extraction**: Extracts and incorporates clipboard text into responses for enriched user interaction.
- **Versatile Interface**: Supports both text input and manual command selection for flexible interaction with the assistant.

## Installation

To get started with the Multi-Modal AI Voice Assistant, follow these installation steps:

1. **Clone the Repository**:

   ```bash
   git clone https://github.com/Aditya-professional-life/AI-Multi-Modal-Assistant.git
   ```

2. **Navigate to the Project Directory**:

   ```bash
   cd AI-Multi-Modal-Assistant
   ```

3. **Install the Required Dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

Ensure that you have the following libraries installed:
- faster-whisper
- groq
- google-generativeai
- opencv-python
- Pillow
- pyperclip
- pyttsx3
- SpeechRecognition
- pytesseract
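
As a quick check before the first run, the list above can be verified from Python. The following is a minimal sketch, not part of the project: the mapping from pip package names to import names is an assumption based on each package's usual import name, and `is_importable` and `missing_packages` are hypothetical helpers.

```python
import importlib.util

# pip package name -> import name. Several differ, e.g. opencv-python is
# imported as cv2. This mapping is an assumption, not taken from the project.
PACKAGE_TO_MODULE = {
    "faster-whisper": "faster_whisper",
    "groq": "groq",
    "google-generativeai": "google.generativeai",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "pyperclip": "pyperclip",
    "pyttsx3": "pyttsx3",
    "SpeechRecognition": "speech_recognition",
    "pytesseract": "pytesseract",
}


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError if that parent is missing, so the error is caught.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def missing_packages(mapping: dict) -> list:
    """Return the pip names whose import name cannot be located."""
    return [pkg for pkg, mod in mapping.items() if not is_importable(mod)]


if __name__ == "__main__":
    missing = missing_packages(PACKAGE_TO_MODULE)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any package the script reports as missing can then be installed with `pip install <name>`.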

## Configuration

- **API Keys**: Configure your API keys for Groq and Google Generative AI by replacing the placeholders in the code with your actual keys.
- **Webcam and Screenshot Paths**: The application saves webcam captures as `webcam.jpg` and screenshots as `screenshot.jpg`. Ensure these paths are writable.
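
The two configuration points can also be checked in code before the assistant starts. The sketch below is one possible approach, not the project's implementation: it reads the keys from environment variables instead of editing placeholders, and the variable names `GROQ_API_KEY` and `GOOGLE_API_KEY`, as well as both helper functions, are hypothetical.

```python
import os
from pathlib import Path


def load_api_keys(env=os.environ) -> dict:
    """Read both API keys from the environment (variable names are assumed)."""
    keys = {}
    for name in ("GROQ_API_KEY", "GOOGLE_API_KEY"):
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(f"Missing environment variable: {name}")
        keys[name] = value
    return keys


def unwritable_outputs(directory=".", filenames=("webcam.jpg", "screenshot.jpg")) -> list:
    """Return the output files that cannot be created or overwritten."""
    base = Path(directory)
    blocked = []
    for name in filenames:
        target = base / name
        # An existing file must itself be writable; a new file needs a
        # writable directory. A missing directory counts as not writable.
        probe = target if target.exists() else base
        if not os.access(probe, os.W_OK):
            blocked.append(name)
    return blocked


if __name__ == "__main__":
    blocked = unwritable_outputs()
    if blocked:
        print("Cannot write: " + ", ".join(blocked))
    try:
        load_api_keys()
        print("Both API keys found.")
    except RuntimeError as err:
        print(err)
```

Keeping keys in the environment avoids committing them to the repository by accident. If the code expects the keys inline, the values returned by `load_api_keys` can be passed wherever the placeholders currently appear.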

## Usage

**Running the Assistant**:

Execute the script:

```bash
python assistant.py
```

**Interaction Modes**:

- **Voice Mode**: Say the wake word followed by your command to interact via voice.
- **Text Mode**: Type your prompt directly and see the response.
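
To illustrate how Voice Mode might separate the wake word from the command, here is a minimal sketch that works on an already transcribed string. The wake word `"assistant"` and the function `extract_prompt` are assumptions for illustration; the project's actual wake word and parsing may differ.

```python
import re
from typing import Optional


def extract_prompt(transcript: str, wake_word: str = "assistant") -> Optional[str]:
    """Return the text spoken after the wake word, or None if there is none."""
    # Match the wake word as a whole word, skip punctuation and spaces after
    # it, and capture the rest of the transcript as the command.
    pattern = rf"\b{re.escape(wake_word)}\b[\s,.!?:;-]*(.*)"
    match = re.search(pattern, transcript, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    prompt = match.group(1).strip()
    return prompt or None


if __name__ == "__main__":
    print(extract_prompt("Hey assistant, take a screenshot"))
    print(extract_prompt("what time is it"))
```

Returning `None` both when the wake word is absent and when nothing follows it lets the caller ignore background speech with a single check.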

**Commands**:

- "Take Screenshot": Captures a screenshot of your current screen.
- "Capture Webcam": Takes a photo using the webcam.
- "Extract Clipboard": Incorporates text from the clipboard into the response.
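
The commands above could be routed with a small lookup table. This is a sketch under stated assumptions: the three handler functions are placeholders that only return a label, standing in for the real capture and clipboard logic, and `run_command` is a hypothetical name.

```python
from typing import Callable, Dict, Optional


# Placeholder handlers: the real assistant would capture an image or read
# the clipboard here. They return labels so the routing can be tested.
def take_screenshot() -> str:
    return "screenshot.jpg"


def capture_webcam() -> str:
    return "webcam.jpg"


def extract_clipboard() -> str:
    return "<clipboard text>"


COMMANDS: Dict[str, Callable[[], str]] = {
    "take screenshot": take_screenshot,
    "capture webcam": capture_webcam,
    "extract clipboard": extract_clipboard,
}


def normalize(command: str) -> str:
    """Lowercase the command and collapse repeated whitespace."""
    return " ".join(command.lower().split())


def run_command(command: str) -> Optional[str]:
    """Run the handler for a known command; return None for unknown input."""
    handler = COMMANDS.get(normalize(command))
    return handler() if handler else None


if __name__ == "__main__":
    print(run_command("Take Screenshot"))
    print(run_command("do something else"))
```

A table like this keeps command names in one place, so adding a command means adding one entry and one function.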

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your proposed changes.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contact

For any questions or support, please reach out to [email protected].
