Releases: presidio-oss/factif-ai
v1.3.0
What's Changed
- Explore mode UX refinements by @AzizStark in #12
- Explore mode and chat mode bug fixes by @AzizStark in #13
- Feat: Omni parser V2 integration by @sanju-presidio in #14
Features
- Integrated OmniParser v2: Delivers enriched, annotated screenshots to the LLM, enabling more informed decision-making for task execution
- Automated Playwright Installation: Playwright binaries now install automatically after `npm install`, streamlining the setup process.
- Factifai Logo Added: Improved visual identity with the addition of the Factifai logo.
- OpenAI Support for Explore Mode: Explore mode now supports OpenAI models, expanding LLM options and capabilities.
- Chat History and Persistence: Added chat history tracking with file storage persistence, allowing users to revisit previous conversations.
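To illustrate what file-backed chat history can look like, here is a minimal sketch, not the project's actual implementation. The `StoredChat` shape, the function names `saveChat` and `loadChats`, and the one-JSON-file-per-chat layout are all assumptions made for this example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical record shape; not taken from the factif-ai codebase.
interface StoredChat {
  id: string;
  title: string;
  messages: { role: string; content: string }[];
}

// Write one chat to <dir>/<id>.json, creating the directory if needed.
function saveChat(dir: string, chat: StoredChat): void {
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${chat.id}.json`);
  fs.writeFileSync(file, JSON.stringify(chat, null, 2), "utf8");
}

// Read every stored chat back; a missing directory means no history yet.
function loadChats(dir: string): StoredChat[] {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")) as StoredChat);
}
```

Storing each conversation in its own file keeps writes small and lets a single chat be removed by deleting one file, at the cost of a directory scan when listing history.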
Bug Fixes
- LLM Context Isolation: Resolved context contamination between different operating modes, ensuring accurate and isolated responses.
- Chat Context Management: Implemented context management to prevent exceeding LLM token limits on complex websites.
- Explore Mode Graph Fix: Corrected a bug causing incorrect graph rendering in explore mode.
- Seamless VNC Mode Switching: Resolved issues with VNC mode switching, ensuring a smoother user experience.
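The chat context management fix above can be pictured as trimming older turns so the conversation stays within a token budget. The following is a minimal sketch under stated assumptions, not the method used in this release: the `ChatMessage` shape, the `trimToBudget` name, and the four-characters-per-token estimate are all hypothetical, and a real tokenizer would give different counts.

```typescript
// Hypothetical message shape; not taken from the factif-ai codebase.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate of about 4 characters per token (a heuristic, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep all system messages, then as many of the newest other messages as fit.
function trimToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];

  // Walk backwards so the most recent turns are preserved first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```

Dropping the oldest turns first is the simplest policy; summarizing them instead would retain more context for a higher implementation cost.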
Enhancements
- UX Improvements: General UX enhancements to improve usability.
v1.2.0
What's Changed
- Explore Chat: Specialized chat interface for click-through exploration of a website's interconnected content. #11
- Graph View: Visual representation of web pages and their relationships for easier navigation and understanding of site structure #11
- Page Node System: Interactive page nodes that display content and allow navigation between related pages #11
- Recent Chats: Easy access to previous explore mode conversations #11
v1.1.0
v1.0.0
Release Notes
Built-in support for leading vision-language models:
- Claude: Anthropic's advanced vision and reasoning model
- OpenAI: GPT-4o with visual understanding capabilities
- Gemini: Google's multimodal AI for computer interaction
- OmniParser: Screen parsing tool for pure vision-based GUI agents
AI-Powered Computer Control
- Intelligent element detection and navigation
- Automated verification and validation
- Comprehensive test documentation with automated screenshot capture for each step
- Integrated test case export with visual step-by-step documentation