EchoPath is an innovative project that empowers blind and visually impaired individuals by providing real-time, AI-powered environmental guidance. By combining a fine-tuned computer vision model with natural language generation and audible feedback, EchoPath helps users navigate safely and confidently. This project won Harvard's Makeathon 2025 Most Interactive Design Award.
EchoPath integrates a YOLO11n vision model fine-tuned on the COCO128 dataset with a local Llama 3.2 model (via Ollama) to generate short, natural scene descriptions. The system then uses macOS's built-in say command to deliver audible guidance, providing real-time updates to the user through their earbuds.
- Real-Time Object Detection: Uses a fine-tuned YOLO11n model (trained on COCO128) to detect objects in live video.
- Natural Language Generation: Queries a local Llama 3.2 model to convert detected object lists into clear, 8–10 word sentences.
- Audible Feedback: Uses macOS's built-in say command to convert text into speech, delivering voice guidance.
- Live Camera Feed Broadcast: Supports broadcasting the live camera feed from a Raspberry Pi via GStreamer, which can then be used as an input source for the vision model.
The system consists of three primary components:
- Video Capture & Object Detection:
  - A camera captures live video.
  - The YOLO11n model processes each frame in real time to detect objects.
- Language Generation:
  - When a change in detected objects is noted, a prompt is sent to a locally hosted Llama 3.2 model using Ollama (a minimal sketch follows this list).
  - The model returns a concise, natural sentence (8–10 words) describing the scene.
- Text-to-Speech (TTS):
  - A dedicated TTS worker thread calls macOS's say command to speak the generated sentence, providing real-time audio feedback without interrupting the camera feed (see the sketch after the workflow diagram).
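For illustration, a minimal sketch of the language-generation step (not the project's exact code), assuming Ollama is serving llama3.2:latest on its default local endpoint and using the requests library; the helper names describe_scene and on_detections are placeholders:

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def describe_scene(labels):
    # Ask the local Llama 3.2 model for a short, natural description.
    prompt = (
        "Describe this scene for a blind pedestrian in 8-10 words. "
        "Objects detected: " + ", ".join(sorted(labels))
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.2:latest", "prompt": prompt, "stream": False},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

last_labels = set()

def on_detections(labels):
    # Query the language model only when the set of detected objects changes.
    global last_labels
    if set(labels) != last_labels:
        last_labels = set(labels)
        return describe_scene(last_labels)
    return None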
Below is an illustrative diagram of the workflow:
[Camera Feed] --> [YOLO11n Object Detection]
|
v
[Detected Object List]
|
v
[Llama 3.2 Language Model]
|
v
[Generated Sentence]
|
v
[macOS 'say' Command]
|
v
[Audible Guidance]
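The TTS stage can be kept off the main thread with a plain queue drained by a worker that shells out to say, so speech never blocks the capture loop. A minimal sketch (the queue and worker names are assumptions, not the repository's code):

import queue
import subprocess
import threading

speech_queue = queue.Queue()

def tts_worker():
    # Speak each queued sentence with macOS's built-in say command.
    while True:
        sentence = speech_queue.get()
        if sentence is None:  # sentinel value used to stop the worker
            break
        subprocess.run(["say", sentence])
        speech_queue.task_done()

threading.Thread(target=tts_worker, daemon=True).start()

# Elsewhere in the pipeline, enqueue each generated description:
# speech_queue.put("Crosswalk ahead, two pedestrians approaching from the left")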
When using the Raspberry Pi to broadcast the camera feed via GStreamer (as a source input for the vision model), you may hit issues with OpenCV's GStreamer backend: the prebuilt opencv-python wheels are not compiled with GStreamer support, so OpenCV cannot open the stream. Use the following instructions to broadcast the feed and rebuild OpenCV with GStreamer enabled:
# ON RASPBERRYPI:
libcamera-vid -t 0 --width 1280 --height 720 --framerate 30 --codec h264 -o - | \
gst-launch-1.0 fdsrc ! h264parse ! rtph264pay config-interval=1 pt=96 ! \
udpsink host=<IP ADDRESS> port=5000
# Replace <IP ADDRESS> with the appropriate IP.
# ON YOUR LAPTOP:
gst-launch-1.0 udpsrc port=5000 ! application/x-rtp, encoding-name=H264 ! \
rtph264depay ! avdec_h264 ! videoconvert ! autovideosink
# Fixing GStreamer in OpenCV:
# Navigate to your desired location for the opencv-python repo:
git clone --recursive https://github.com/skvark/opencv-python.git
cd opencv-python
export CMAKE_ARGS="-DWITH_GSTREAMER=ON"
pip install --upgrade pip wheel
# This build step can take from 5 minutes to over 2 hours depending on your hardware.
pip wheel . --verbose
# The wheel may be generated in the current directory or in dist/; install the generated wheel:
pip install opencv_python*.whl
Note: This GStreamer fix is only needed if you plan to broadcast the live camera feed from the Raspberry Pi to your vision model.
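Once OpenCV has been rebuilt with GStreamer support, the Raspberry Pi stream can be opened directly as the capture source for the vision model. A minimal sketch, mirroring the laptop-side receiving command above but ending in appsink so OpenCV can pull frames:

import cv2

# Same elements as the receiving command above, terminated with appsink
# so the frames are handed to OpenCV instead of a display window.
pipeline = (
    "udpsrc port=5000 caps=\"application/x-rtp, encoding-name=H264, payload=96\" ! "
    "rtph264depay ! avdec_h264 ! videoconvert ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("GStreamer pipeline failed to open; check the OpenCV build.")

ok, frame = cap.read()  # frames can now be fed to the YOLO model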
- Operating System: macOS (for TTS using the say command)
- Python: Version 3.8 or above
- Hardware: Raspberry Pi (optional, for GStreamer broadcast) or a default webcam
- Ollama CLI: Configured to run the llama3.2:latest model locally
- YOLO Model File: Fine-tuned YOLO11n model (e.g., yolo-tuned.pt) in the models/ directory
- Clone the Repository:
  git clone https://github.com/yourusername/EchoPath.git
  cd EchoPath
- Create and Activate a Virtual Environment:
  python3 -m venv venv
  source venv/bin/activate
- Install Dependencies:
  pip install -r requirements.txt
- (Optional) Fix GStreamer Issue:
  Follow the steps listed in the GStreamer Issue Fix section above if you plan to use a GStreamer pipeline.
- Run the Application:
  python main.py
- Operation:
  - The application opens a window with a live video feed.
  - It continuously processes frames, detects objects with the fine-tuned YOLO11n model, and queries the Llama 3.2 model when changes occur.
  - The generated description is spoken using macOS's say command.
  - Press q in the display window to exit the application.
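For orientation, the loop described above might look roughly like the following sketch rather than the repository's actual main.py; it assumes the ultralytics package and stubs out the Llama 3.2 / say step covered in the earlier sketches:

import cv2
from ultralytics import YOLO

model = YOLO("models/yolo-tuned.pt")   # fine-tuned YOLO11n weights from models/
cap = cv2.VideoCapture(0)              # default webcam, or the GStreamer pipeline above

last_labels = set()
while True:
    ok, frame = cap.read()
    if not ok:
        break

    results = model(frame, verbose=False)[0]
    labels = {model.names[int(c)] for c in results.boxes.cls}

    if labels != last_labels:
        last_labels = labels
        # Hand the detected labels to the Llama 3.2 / say pipeline sketched earlier.
        print("Scene changed:", ", ".join(sorted(labels)) or "nothing detected")

    cv2.imshow("EchoPath", results.plot())
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()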
EchoPath/
├── .git/ # Git repository files
├── README.md # Project documentation (this file)
├── assets/ # Project assets (images, icons, etc.)
├── datasets/ # Data for training or testing
├── main.py # Main application code
├── models/ # YOLO model files (fine-tuned YOLO11n on COCO128)
├── opencv-python/ # Custom OpenCV Python build (if used)
├── requirements.txt # Python package dependencies
└── voices.txt # List of available voices (optional)
- Team Name: EchoPath
- Awards: Winner of Harvard's Makeathon 2025 Most Interactive Design Award
- Taha Ababou: GitHub LinkedIn
- Aditya Rampal: GitHub LinkedIn
- Anton Garcia Abril Beca: GitHub LinkedIn
- Jake Garrett: GitHub LinkedIn
This project is open-source and available under the MIT License.