18 Mar 09:56

v1.3.0 Latest

What's Changed

Explore mode UX refinements by @AzizStark in #12
Explore mode and chat mode bug fixes by @AzizStark in #13
Feat: Omni parser V2 integration by @sanju-presidio in #14

Features

Integrated OmniParser v2: Delivers enriched, annotated screenshots to the LLM, enabling more informed decision-making for task execution.
Automated Playwright Installation: Playwright binaries now automatically install after npm install, streamlining the setup process.
Factifai Logo Added: Improved visual identity with the addition of the Factifai logo.
OpenAI Support for Explore Mode: Explore mode now supports OpenAI models, expanding LLM options and capabilities.
Chat History and Persistence: Added chat history tracking with file storage persistence, allowing users to revisit previous conversations.
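
As a rough illustration of the file-backed chat history described above, the sketch below round-trips a conversation through a JSON file. The `ChatMessage` shape, the `saveChat` and `loadChat` helpers, and the file location are hypothetical names chosen for this example; they are not taken from the project's code, which may store history differently.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical message shape, assumed for illustration only.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  timestamp: number;
}

// Hypothetical helper: write the whole conversation to a JSON file,
// creating the parent directory if it does not exist yet.
function saveChat(filePath: string, messages: ChatMessage[]): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(messages, null, 2), "utf8");
}

// Hypothetical helper: read a conversation back, or return an empty
// history when nothing has been saved at that path.
function loadChat(filePath: string): ChatMessage[] {
  if (!fs.existsSync(filePath)) return [];
  return JSON.parse(fs.readFileSync(filePath, "utf8")) as ChatMessage[];
}

// Example: persist one message to a temp location and read it back.
const demoFile = path.join(os.tmpdir(), "chat-history-demo", "chat.json");
saveChat(demoFile, [{ role: "user", content: "Open the pricing page", timestamp: Date.now() }]);
console.log(loadChat(demoFile).length);
```

Writing the full history on each save keeps the sketch simple; an append-only log or a database would likely suit long conversations better.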

Bug Fixes

LLM Context Isolation: Resolved context contamination between different operating modes, ensuring accurate and isolated responses.
Chat Context Management: Implemented context management to prevent exceeding LLM token limits on complex websites.
Explore Mode Graph Fix: Corrected a bug causing incorrect graph rendering in explore mode.
Seamless VNC Mode Switching: Resolved issues with VNC mode switching, ensuring a smoother user experience.
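
One common way to keep a conversation under an LLM token limit, as the Chat Context Management fix above aims to do, is to drop the oldest turns once a budget is exceeded. The sketch below shows that idea only; the `Message` type, the `trimToBudget` function, and the four-characters-per-token estimate are assumptions made for this example and do not describe the project's actual implementation.

```typescript
// Hypothetical message shape, assumed for illustration only.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
// A real tokenizer would give more accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep every system message, then add the newest remaining messages
// until the next one would push the total over maxTokens.
function trimToBudget(messages: Message[], maxTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: Message[] = [];

  // Walk from newest to oldest so the most recent turns survive.
  for (const m of [...rest].reverse()) {
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(m); // restore chronological order
  }
  return [...system, ...kept];
}
```

Dropping whole messages from the oldest end is the simplest policy; summarizing older turns instead is a possible alternative when early context still matters.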

Enhancements

UX Improvements: General refinements to improve usability across the interface.

Contributors

AzizStark and sanju-presidio

11 Mar 08:02

v1.2.0

What's Changed

Explore Chat: Specialized chat interface for click-through exploration of a website's interconnected content. #11
Graph View: Visual representation of web pages and their relationships, for easier navigation and understanding of site structure. #11
Page Node System: Interactive page nodes that display content and allow navigation between related pages. #11
Recent Chats: Easy access to previous explore mode conversations. #11

New Contributors

@AzizStark #11

Contributors

AzizStark

03 Mar 08:13

v1.1.0

Release Notes

Enhancements

Added a browser-centric approach to Puppeteer mode. (#9)
General improvements and bug fixes. (#9)

03 Mar 08:15

v1.0.0

Release Notes

Built-in support for leading vision-language models:

Claude: Anthropic's advanced vision and reasoning model
OpenAI: GPT-4o with visual understanding capabilities
Gemini: Google's multimodal AI for computer interaction
OmniParser: Screen-parsing tool for pure vision-based GUI agents

AI-Powered Computer Control

Intelligent element detection and navigation
Automated verification and validation
Comprehensive test documentation with automated screenshot capture for each step
Integrated test case export with visual step-by-step documentation
