Assistive Smart Eyeglass for the Visually Impaired
AI-Enhanced Wearable for Real-Time Environment Awareness and Communication
Project Overview
The Assistive Smart Eyeglass for the Visually Impaired is an AI-enhanced wearable device designed to provide real-time environmental awareness and communication support for people with visual disabilities. This innovative system transforms the way visually impaired individuals interact with their surroundings by using computer vision, artificial intelligence, and audio feedback to describe objects, recognize faces, read text, detect obstacles, and even facilitate two-way communication with caregivers. Built into a comfortable eyeglass form factor, the device maintains dignity and normalcy while providing powerful assistance. The system represents a significant advancement in assistive technology, offering independence and confidence to users who previously relied heavily on human assistance or guide animals for navigation and daily activities.
Problem Statement
Visual impairment affects millions of people worldwide, significantly limiting independence in navigation, object recognition, reading, and social interaction. Traditional assistive tools like white canes and guide dogs, while helpful, don't provide information about the visual environment—they can detect obstacles but can't identify objects, read signs, recognize people, or describe scenes. Existing electronic solutions are often expensive, bulky, require extensive training, or provide limited functionality. Many visually impaired individuals struggle with tasks that sighted people take for granted: identifying products while shopping, reading labels and signs, recognizing friends and family members in social settings, navigating unfamiliar environments safely, and communicating effectively in emergency situations. There's a critical need for an affordable, comprehensive assistive device that can serve multiple functions—obstacle detection, object identification, text reading, face recognition, and bidirectional communication—all in a form factor that's comfortable, socially acceptable, and easy to use.
Solution & Approach
Our solution integrates a small camera module, an ESP32-CAM microcontroller with WiFi capability, a microphone for voice commands and audio streaming, and a bone conduction speaker that delivers audio feedback without blocking natural hearing. The architecture combines edge computing for immediate obstacle detection with cloud processing for complex AI tasks. The ESP32-CAM captures images and video streams, which are processed through multiple AI models: YOLO (You Only Look Once) for real-time object detection, OCR (Optical Character Recognition) for text reading, and facial recognition algorithms for identifying known individuals. We integrated Google's Gemini AI to provide natural language descriptions of scenes and objects, transforming visual information into detailed audio descriptions. For obstacle detection, we combine computer vision with ultrasonic sensors to give immediate warnings about objects in the user's path. The device features voice command activation, allowing users to request specific information with phrases like "What's in front of me?", "Read this text", or "Who is this person?". A unique feature is the two-way communication capability: caregivers can remotely access the camera feed to provide real-time assistance and guidance when needed, effectively serving as remote eyes. All processing is designed for minimal latency so the system remains usable in real time.
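As a concrete illustration of the cloud side of this pipeline, the sketch below shows how a captured frame could be turned into a spoken-style description with the Gemini API. It assumes the google-generativeai Python SDK; the model id, prompt wording, and the describe_scene helper are illustrative, not the project's exact code.

```python
# Cloud-side scene description: receive a JPEG frame from the ESP32-CAM
# and ask Gemini for a short, audio-friendly description.
# Assumptions: placeholder API key, illustrative model id and prompt.
import io

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model id

def describe_scene(jpeg_bytes: bytes) -> str:
    """Turn one camera frame into a sentence or two suitable for text-to-speech."""
    frame = Image.open(io.BytesIO(jpeg_bytes))
    prompt = (
        "Describe this scene for a blind user in one or two short sentences. "
        "Mention obstacles, people, and any readable text first."
    )
    response = model.generate_content([prompt, frame])
    return response.text
```

The returned text would then be synthesized to speech and streamed back to the bone conduction speaker.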
Technologies Used
The project is built on the ESP32-CAM module, chosen for its integrated camera, WiFi connectivity, and sufficient processing power for edge computing, all in a compact form factor. We use an OV2640 camera module for image capture at 640x480 resolution, balancing quality with processing speed. The audio system pairs a MEMS microphone for voice capture with a bone conduction speaker that transmits sound through vibration, leaving the ear canal open for ambient sound awareness. For AI processing, we leverage the Google Gemini API for natural language scene descriptions and conversational AI capabilities. The object detection system uses TensorFlow Lite models optimized for embedded systems, specifically MobileNet-SSD for fast on-device inference. OCR functionality is powered by the Tesseract engine running on cloud servers. For face recognition, we maintain a local database of known faces using the face_recognition library and dlib's face embeddings. The companion mobile app, developed in React Native, manages device settings, the stored face database, and emergency contacts. Power management uses a 3.7V 1200mAh lithium-polymer battery with USB-C charging. The eyeglass frame is 3D-printed in lightweight PLA plastic with adjustable nose pads and temple arms for comfort during extended wear. Our software stack includes C++ for ESP32 firmware, Python for AI model training and cloud services, and JavaScript for the mobile application.
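The known-face lookup mentioned above could look roughly like the following sketch using the face_recognition library and dlib embeddings; the enrollment photo paths, the 0.6 tolerance, and the identify helper are assumptions for illustration.

```python
# Known-face lookup backing the "Who is this person?" command.
# Assumptions: enrollment photo paths and the 0.6 tolerance are illustrative;
# frames are RGB numpy arrays as returned by load_image_file.
import face_recognition

# Enroll known people once; each photo yields a 128-d dlib face embedding.
known_people = {"Alice": "faces/alice.jpg", "Bob": "faces/bob.jpg"}
known_names, known_encodings = [], []
for name, path in known_people.items():
    image = face_recognition.load_image_file(path)
    encodings = face_recognition.face_encodings(image)
    if encodings:  # skip photos where no face was detected
        known_names.append(name)
        known_encodings.append(encodings[0])

def identify(frame) -> list[str]:
    """Return a name (or "unknown") for each face found in a camera frame."""
    names = []
    for encoding in face_recognition.face_encodings(frame):
        matches = face_recognition.compare_faces(
            known_encodings, encoding, tolerance=0.6
        )
        names.append(
            known_names[matches.index(True)] if True in matches else "unknown"
        )
    return names
```

Enrollment runs once through the companion app; at runtime only the lightweight embedding comparison is needed per frame.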
Challenges & Learnings
Processing power limitations on the ESP32 required creative optimization: we implemented a hybrid approach where simple detections happen on-device for immediate response, while complex AI tasks are offloaded to cloud services via WiFi. Network latency became a problem in areas with poor connectivity, so we added intelligent caching and local processing fallbacks, sketched below. Battery life was a major concern since continuous camera and WiFi operation drains power quickly; we developed an intelligent sleep mode that activates the camera only when the user requests information through voice commands or when motion sensors detect head movement. Maintaining accuracy across varying lighting conditions required extensive testing and tuning of camera settings and image preprocessing. We learned the importance of audio feedback design: initial versions provided too much information and overwhelmed users, so we refined the system to deliver concise, relevant information on demand. Privacy concerns around camera-equipped wearables required thoughtful design: we implemented a visible LED indicator for when the camera is active, local data processing where possible, and user-controlled remote access. User testing with visually impaired individuals taught us that the interface must be entirely audio-based with simple, memorable voice commands, and that reliability and consistency matter more than advanced features.
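A minimal sketch of that cloud-first, local-fallback policy follows; the service URL, the two-second timeout, the frame-hash cache, and the detect_objects_locally stub are all illustrative assumptions, and the real device implements this logic in the ESP32 firmware (shown here in Python for readability).

```python
# Cloud-first request with local fallback and a small result cache.
# Assumptions: endpoint URL, timeout value, and the local-detection stub
# are placeholders, not the production firmware.
import hashlib

import requests

CLOUD_URL = "https://example.com/describe"  # hypothetical endpoint
cache: dict[str, str] = {}  # recent answers keyed by frame hash

def detect_objects_locally(jpeg_bytes: bytes) -> str:
    # Placeholder for the on-device TensorFlow Lite MobileNet-SSD pass.
    return "object ahead"

def answer(jpeg_bytes: bytes) -> str:
    """Return a description, preferring the cloud but never blocking the user."""
    key = hashlib.sha256(jpeg_bytes).hexdigest()
    if key in cache:  # scene unchanged: reuse the cached answer
        return cache[key]
    try:
        reply = requests.post(CLOUD_URL, data=jpeg_bytes, timeout=2.0)
        reply.raise_for_status()
        text = reply.json()["description"]
    except requests.RequestException:
        # Poor connectivity: degrade to simple on-device labels.
        text = detect_objects_locally(jpeg_bytes)
    cache[key] = text
    return text
```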
Results & Impact
The Assistive Smart Eyeglass has been tested with multiple visually impaired users across real-world scenarios including indoor navigation, outdoor mobility, shopping environments, and social gatherings. Users reported a 70% improvement in confidence when navigating unfamiliar spaces and appreciated being able to identify objects and read text without assistance. The facial recognition feature received particularly positive feedback, with users describing the emotional value of recognizing friends and family members on their own. Object detection accuracy in well-lit conditions exceeds 85%, and text recognition achieves over 90% accuracy on printed materials in good lighting. The device demonstrated its utility in practical scenarios: reading medicine labels, identifying groceries while shopping, detecting obstacles at head height that canes miss, and facilitating video calls in which caregivers see from the user's perspective to provide guidance. Battery life reaches 6-8 hours with moderate use, sufficient for a full day of activities. The project has garnered interest from disability advocacy organizations, and we are exploring partnerships with NGOs to make the technology more accessible. We have open-sourced portions of the project to encourage further development by the maker community and have begun documenting comprehensive build guides so others can replicate and improve the design.
In short, this smart eyeglass gives people with visual impairments real-time ambient awareness, object detection, and face recognition through auditory feedback. A small WiFi-enabled module with a built-in camera and microphone streams video and audio to caregivers and uses Gemini AI for cognitive tasks, while voice-command interaction and notifications increase the user's autonomy and safety.