Skip to content

Aditya-professional-life/AI-Multi-Modal-Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Here’s a README.md file for your project:

# Multi-Modal AI Voice Assistant

## Overview

The Multi-Modal AI Voice Assistant is a sophisticated application designed to interact with users through advanced text and image-based responses. Utilizing cutting-edge technologies like Groq and Google Generative AI, this assistant provides comprehensive and contextually accurate answers based on user inputs. It integrates various functionalities, including image processing, clipboard text extraction, and a versatile user interface to enhance the overall user experience.

## Features

- **Advanced AI Integration**: Utilizes Groq and Google Generative AI for high-quality text and image-based responses.
- **Image Processing**: Includes capabilities for screenshot capture and webcam image analysis to provide detailed context.
- **Clipboard Extraction**: Extracts and incorporates clipboard text into responses for enriched user interaction.
- **Versatile Interface**: Supports both text input and manual command selection for flexible interaction with the assistant.

## Installation

To get started with the Multi-Modal AI Voice Assistant, follow these installation steps:

1. **Clone the Repository**:

   ```bash
   git clone https://github.com/Aditya-professional-life/multi-modal-ai-assistant.git
  1. Navigate to the Project Directory:

    cd multi-modal-ai-assistant
  2. Install the Required Dependencies:

    pip install -r requirements.txt

    Ensure that you have the following libraries installed:

    • faster-whisper
    • groq
    • google-generativeai
    • opencv-python
    • Pillow
    • pyperclip
    • pyttsx3
    • SpeechRecognition
    • pytesseract

Configuration

  1. API Keys: Configure your API keys for Groq and Google Generative AI by replacing the placeholders in the code with your actual keys.

  2. Webcam and Screenshot Paths: The application saves captured images and screenshots as webcam.jpg and screenshot.jpg respectively. Ensure these paths are writable.

Usage

  1. Running the Assistant:

    Execute the script:

    python main.py
  2. Interaction Modes:

    • Voice Mode: Say the wake word followed by your command to interact via voice.
    • Text Mode: Type your prompt directly and see the response.
  3. Commands:

    • "Take Screenshot": Captures a screenshot of your current screen.
    • "Capture Webcam": Takes a photo using the webcam.
    • "Extract Clipboard": Incorporates text from the clipboard into the response.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your proposed changes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or support, please reach out to [email protected].


Happy coding!


Feel free to adjust any specific details, like API key setup or paths, to match your actual project setup.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages