Playwright #50

Open
CakeRepository wants to merge 123 commits into master from playwright

@CakeRepository
Member

No description provided.

This error was caused mainly by reduced human readability; the fix avoids using a class that is also used by another library
Fixed confusing local chat history
Merge pull request #2 from CakeRepository/master
Merge pull request #5 from CakeRepository/multiagent
Multiagent - this should have been in main
… tool usage instructions

- Added Reset to Default buttons for Actioner, Planner, and Coordinator prompts
- Completely rewrote default system prompts with comprehensive tool usage guides
- Actioner prompt now includes detailed examples, correct tool call formats, and common mistakes to avoid
- Added step-by-step workflow examples for common tasks (opening apps, clicking buttons, browser automation)
- Emphasized mandatory use of window handles for all keyboard/mouse operations
- Added visual formatting with emojis for better readability
- Planner prompt now includes iterative planning approach (one step at a time)
- Coordinator prompt updated with clearer routing guidelines
- All prompts now emphasize observation-first approach (CaptureWholeScreen before acting)
- Added static methods in ToolConfig to retrieve default prompts: GetDefaultActionerPrompt(), GetDefaultPlannerPrompt(), GetDefaultCoordinatorPrompt()

This should significantly improve the agents' ability to use tools correctly.
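
The reset behaviour described in the list above can be sketched roughly as follows. This is a minimal Python illustration, not the project's code: the prompt texts are placeholders, and `get_default_prompt`, `PromptSettings`, and `reset_to_default` are hypothetical names standing in for the `GetDefault...Prompt()` methods and the Reset to Default buttons.

```python
from dataclasses import dataclass, field

# Placeholder texts only: the real default prompts are not shown on this page.
DEFAULT_PROMPTS = {
    "actioner": "Observe first (CaptureWholeScreen), then act using window handles.",
    "planner": "Plan iteratively, one step at a time.",
    "coordinator": "Route each request to the planner or the actioner.",
}


def get_default_prompt(role: str) -> str:
    """Hypothetical analogue of GetDefaultActionerPrompt() and its siblings."""
    return DEFAULT_PROMPTS[role]


@dataclass
class PromptSettings:
    """Holds the editable prompts; starts out as a copy of the defaults."""

    prompts: dict = field(default_factory=lambda: dict(DEFAULT_PROMPTS))

    def reset_to_default(self, role: str) -> None:
        # What a "Reset to Default" button handler would do for one prompt.
        self.prompts[role] = get_default_prompt(role)


settings = PromptSettings()
settings.prompts["planner"] = "custom planner prompt"
settings.reset_to_default("planner")
```

Keeping the defaults behind a single accessor means the UI and the agents read the same text, so a reset cannot drift from what the agents actually use.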
- Implemented `convert_omniparser_to_onnx.py` for converting YOLO model from PyTorch to ONNX format.
- Created `download_and_convert_all.py` to download both icon_detect and icon_caption_florence models and convert them to ONNX.
- Added PowerShell scripts: `download_omniparser_model.ps1` for downloading the icon_detect model, and `setup_omniparser_complete.ps1` for complete setup including model conversion.
- Introduced `setup_omniparser_full.ps1` for downloading and converting both detection and captioning models.
- Developed test scripts: `test_ocr_simple.ps1` for Tesseract OCR initialization and `test_simple_omniparser.ps1` for testing model loading and inference.
- Included model weights for icon_detect in the repository.
feat: Add OmniParser model downloader and converter scripts
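
For readers unfamiliar with the PyTorch to ONNX step, a conversion of this kind might look like the sketch below. It assumes the icon_detect weights load with the `ultralytics` package, which this page does not confirm; `convert_to_onnx` and the example path are hypothetical and are not taken from `convert_omniparser_to_onnx.py`.

```python
from pathlib import Path


def convert_to_onnx(pt_path: str, imgsz: int = 640) -> str:
    """Export YOLO weights (.pt) to ONNX and return the exported file path."""
    weights = Path(pt_path)
    if not weights.is_file():
        raise FileNotFoundError(f"PyTorch weights not found: {weights}")

    # Imported lazily so the path check above works without the dependency.
    # Assumption: the model is loadable through ultralytics.
    from ultralytics import YOLO

    model = YOLO(str(weights))
    # export() writes the ONNX file and returns its location.
    return str(model.export(format="onnx", imgsz=imgsz))


# Example call (hypothetical path):
# onnx_file = convert_to_onnx("weights/icon_detect/model.pt")
```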
- Updated action prompt in LMStudioActioner to include explicit execution instructions for users.
- Added a workflow diagram to the Multi-Agent Architecture documentation to illustrate agent interactions and processes.
- Improved logging for function invocation in the chat client setup.
Enhance action execution prompts and improve chat client configuration
Introduces Google Gemini provider support in AIProviderConfigForm and AIClientFactory, adds Clipboard and FileSystem plugins, and updates ToolConfig to enable these plugins. Refactors Actioner and MultiAgentActioner to use the new AIClientFactory for provider-agnostic chat client creation. Improves LM Studio config validation and UI, adds auto-detect for local AI endpoints, and updates OmniParserForm to reflect embedded ONNX model usage. Also includes minor fixes and enhancements to plugin APIs, OCR helper, and project file resource handling.
Add Gemini and local plugin support; UI and config updates
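
The idea of provider-agnostic client creation can be illustrated with a small factory. This is a Python sketch under stated assumptions, not the project's `AIClientFactory`: the `ChatClient` protocol, `EchoClient`, the provider keys, and `create_chat_client` are all hypothetical, and the stand-in client only echoes its input instead of calling a real API.

```python
from typing import Callable, Dict, Protocol


class ChatClient(Protocol):
    """Minimal interface the callers depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in client; a real one would call the provider's API."""

    def __init__(self, provider: str, endpoint: str) -> None:
        self.provider = provider
        self.endpoint = endpoint

    def complete(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"


# Hypothetical provider registry: one constructor per supported provider.
_FACTORIES: Dict[str, Callable[[str], ChatClient]] = {
    "lmstudio": lambda endpoint: EchoClient("lmstudio", endpoint),
    "gemini": lambda endpoint: EchoClient("gemini", endpoint),
}


def create_chat_client(provider: str, endpoint: str) -> ChatClient:
    """Return a client for the named provider, or fail with a clear error."""
    try:
        factory = _FACTORIES[provider.lower()]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider}") from None
    return factory(endpoint)


client = create_chat_client("gemini", "https://example.invalid")
reply = client.complete("hello")
```

With this shape, callers such as an actioner depend only on the `complete` method, so adding a provider means registering one more constructor rather than touching the callers.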
OmniParser support has been removed from the project, including all related source files, ONNX models, and UI references. The installer scripts and project configuration have been updated to reflect this change. Users are now directed to use Playwright for web automation instead.
te
te
te