v0.7.5 — Security Hardening + 13 Providers

Give your AI
eyes and hands

Connect any AI to your desktop. It sees your screen, moves the mouse, types, clicks — anything you can do, it can do. Works with Claude, GPT, Gemini, Llama, or any AI.

40 Tools · 8 Pipeline Layers · 3 Transport Modes · Any AI model
Use it your way

No app integrations. No API keys per service.

If it's on your screen, your AI can use it. Gmail, Slack, Figma, Jira, native apps, legacy software — anything with a UI.

🧑‍💻 Tell it what to do

Bring an API key and just describe what you want. clawdcursor figures out the steps, executes them, and verifies they worked. You stay in plain English.

clawdcursor doctor
clawdcursor start

🤖 Connect your AI directly

Already using Claude Code, Cursor, or Windsurf? Add clawdcursor and your AI gets full desktop control as a native tool — no extra setup, no extra API calls.

Claude Code · Cursor · Windsurf · Zed
Setup

Three ways to connect

One server. Three modes. They all give your AI the same desktop access.

Claude Code / Cursor / Windsurf

  1. Run clawdcursor consent once
  2. Add to your AI client's config (one JSON block)
  3. Desktop tools appear natively in your AI
  4. No extra API key, no extra setup
  5. Your AI decides what to do — clawdcursor does it
40 Tools available · $0 Extra cost
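The "one JSON block" from step 2 typically looks like the following. This is a sketch: it assumes the common `mcpServers` config shape used by MCP clients such as Claude Code and Cursor, wired to the `clawdcursor mcp` stdio command described in the Transport Modes section; exact key names vary by client.

```json
{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp"]
    }
  }
}
```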

Give it a task directly

  1. Run clawdcursor doctor to set up your AI
  2. Run clawdcursor start
  3. Type what you want in plain English
  4. clawdcursor figures out the steps and executes
  5. Works with 13+ providers: Anthropic, OpenAI, Gemini, Groq, Ollama, and more
8 Pipeline layers · $0 for simple tasks

Build with it

  1. Run clawdcursor start
  2. 40 tools available over HTTP on localhost (smart tools included)
  3. OpenAI function-calling format
  4. Call individual tools or send full tasks
  5. Browse all schemas at GET /tools
Localhost :3847 · Any HTTP client
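A tool call in OpenAI function-calling format can be built like this. The payload shape follows the OpenAI tools convention named above; the `/tools/call` endpoint path is an assumption (browse `GET /tools` on the running server for the real schemas), and `smart_click` is one of the smart tools listed later on this page.

```python
import json

BASE = "http://localhost:3847"  # default local port

def tool_call_payload(name: str, arguments: dict) -> dict:
    """Build one tool call in OpenAI function-calling shape.
    Arguments are JSON-encoded into a string, per the OpenAI convention."""
    return {
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }

payload = tool_call_payload("smart_click", {"target": "Save"})

# To execute against a running server (endpoint path is an assumption):
#   import urllib.request
#   req = urllib.request.Request(f"{BASE}/tools/call",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```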
3-Stage Pipeline

Text first. Vision last. Any model.

Every task flows through the cheapest path that can handle it. The pre-processor classifies intent, the router handles mechanical actions for free, and the AI stages only fire when reasoning is needed. Provider-agnostic — works with Claude, GPT, Gemini, Kimi, Llama, or any OpenAI-compatible model.

Stage 1 · SnapshotBuilder

FREE — no LLM cost

Parallel OCR + accessibility tree capture in one pass. Produces a unified perception object with every text element, button, input field, and their exact screen coordinates. Also runs spatial layout analysis — detecting toolbar zones, content area center, sidebars, and status bar from element clustering. The LLM gets a text-based map of the screen without needing vision.
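The "unified perception object" might look roughly like this. The types and field names below are hypothetical, not the project's actual data model; they only illustrate how OCR elements plus layout zones can render into a text map the LLM reads instead of a screenshot.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    text: str      # OCR text or accessibility label
    role: str      # "button", "input", "text", ...
    center: tuple  # (x, y) screen coordinates

@dataclass
class Snapshot:
    elements: list = field(default_factory=list)
    zones: dict = field(default_factory=dict)  # e.g. {"content_center": (960, 600)}

    def to_text(self) -> str:
        """Render the screen as a text map: one line per element, then zones."""
        lines = [f"{e.role} '{e.text}' at {e.center}" for e in self.elements]
        for name, xy in self.zones.items():
            lines.append(f"{name}: {xy}")
        return "\n".join(lines)

snap = Snapshot(
    elements=[Element("Save", "button", (1180, 42))],
    zones={"content_center": (960, 600)},
)
```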

🧠

Stage 2 · TextNavigator ★

CHEAP — any text model ($0.25/1M tokens with Haiku)

Primary reasoning stage. A cheap text LLM reads the OCR snapshot, spatial layout, and app-specific guide, then decides the next action: click, type, drag, key press, scroll, or done. One action per LLM call, verified by re-scanning. Handles 90% of tasks without ever taking a screenshot.
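The one-action-per-call loop can be sketched as below. The function and action names are illustrative: `scan` stands in for the snapshot re-scan, `llm` for the cheap text model, and `act` for the input executor.

```python
def run_task(task, scan, llm, act, max_steps=20):
    """Stage 2 sketch: one action per LLM call, verified by re-scanning.
    `scan()` returns a text snapshot of the screen,
    `llm(task, snapshot)` returns the next action as a dict,
    `act(action)` performs a click / type / drag / key / scroll."""
    for _ in range(max_steps):
        snapshot = scan()            # fresh ground truth before each decision
        action = llm(task, snapshot)
        if action["op"] == "done":
            return True
        act(action)
    return False                     # step budget exhausted without "done"
```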

🎯

Stage 3 · VisionFiller

EXPENSIVE — vision model, last resort only

Fires only when Stage 2 can't handle it (CAPTCHAs, complex spatial tasks, visual content). Takes screenshots, sends to a vision LLM, executes tool calls. Supports Anthropic native Computer Use and any OpenAI-compatible vision model (GPT-4o, Gemini, Kimi k2.5).

🔍

Pre-processor + Task Classifier

Before the pipeline runs: a regex parser decomposes compound tasks ("type X, then save as Y" → 2 subtasks). A zero-cost classifier categorizes each subtask — mechanical (router handles it for free), navigation (Ctrl+L + URL), reasoning (Stage 2), or spatial (needs vision). No LLM wasted on tasks a regex can route.
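The decomposition and zero-cost classification described above can be sketched like this. The split pattern and the routing keywords are illustrative, not the actual rules.

```python
import re

def decompose(task: str) -> list:
    """Split compound tasks on connectors like ', then ' into subtasks."""
    parts = re.split(r",\s*(?:and\s+)?then\s+", task, flags=re.I)
    return [p.strip() for p in parts if p.strip()]

def classify(subtask: str) -> str:
    """Route each subtask without an LLM. Keyword rules are illustrative."""
    t = subtask.lower()
    if re.match(r"(press|type|scroll|hotkey)\b", t):
        return "mechanical"   # router executes directly, free
    if re.match(r"(open|go to|visit)\b", t) and ("http" in t or "." in t):
        return "navigation"   # Ctrl+L + URL
    if any(w in t for w in ("draw", "drag to", "captcha")):
        return "spatial"      # needs vision (Stage 3)
    return "reasoning"        # default: text navigator (Stage 2)
```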

🗺️

Textual Scaffolding (Spatial Layout Analysis)

ClawdCursor's breakthrough: instead of sending screenshots, it analyzes WHERE elements are clustered on screen and builds a text-based spatial map. The LLM reads a line like "Content area: center = (1920, 1200) — CLICK HERE". This lets any text model, even a 7B-parameter local model, know exactly where to click in Google Docs, Excel, Paint, or any app. No vision required.
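A content-area center like the one quoted above can fall out of simple element clustering. The sketch below treats elements near the top edge as toolbar and averages the rest; the 12% cutoff and the averaging rule are illustrative assumptions, not the real algorithm.

```python
def content_center(elements, width, height):
    """Estimate the content-area center from element positions.
    `elements` is a list of (x, y) screen coordinates."""
    toolbar_cutoff = height * 0.12               # top band assumed to be toolbar
    content = [(x, y) for x, y in elements if y > toolbar_cutoff]
    if not content:
        return (width // 2, height // 2)         # fall back to screen center
    cx = sum(x for x, _ in content) // len(content)
    cy = sum(y for _, y in content) // len(content)
    return (cx, cy)
```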

What Makes It Different

Provider-agnostic. Community-driven. Universal.

v0.7.5 is fully provider-agnostic — no hardcoded model names, no provider-specific hacks. 13 providers auto-detected. App knowledge crowdsourced. Display-agnostic across any resolution and DPI.

📖

App Guides — 86+ apps

Community-contributed JSON instruction manuals. Keyboard shortcuts, workflows, UI layout hints, and tips. Excel (116 shortcuts), Paint, Notepad, Outlook, Spotify, Discord, Figma, and more. Install with clawdcursor guides install excel. Loaded automatically when the app is detected.

🖱️

Smart Tools (Blind Agent)

Click buttons by name, type into fields by label, read screen text — all without screenshots. smart_click, smart_type, smart_read use accessibility + OCR fallback automatically.

⌨️

Shortcuts Engine

Built-in keyboard shortcut database. shortcuts_list discovers available shortcuts for any app. shortcuts_execute fires them instantly. Zero LLM cost.

Ground-Truth Verifier

Re-scans the screen after every action. Done-state verification checks whether the task's keywords actually appear on screen. Drawing tasks get lenient verification. The agent can't hallucinate success.
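The keyword check can be sketched as follows. The word-length filter, hit threshold, and lenient rule are illustrative, not the project's actual heuristics.

```python
def looks_done(task: str, screen_text: str, lenient: bool = False) -> bool:
    """Ground-truth sketch: after the agent claims 'done', re-scan the screen
    and require task keywords to appear. `lenient` (drawing tasks) passes on
    any non-empty screen content instead of matching keywords."""
    if lenient:
        return bool(screen_text)
    words = [w for w in task.lower().split() if len(w) > 3]  # skip short filler
    hits = sum(1 for w in words if w in screen_text.lower())
    return hits >= max(1, len(words) // 2)                   # majority-ish match
```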

🔌

Three Transport Modes

clawdcursor start (full agent + REST), clawdcursor serve (tools only, bring your own brain), clawdcursor mcp (MCP stdio for Claude Code, Cursor, Windsurf, Zed).

🌍

Truly Universal

13 providers auto-detected. Display-agnostic (720p to 5K, any DPI). Cross-platform (Windows, macOS, Linux). Provider-agnostic — zero hardcoded model names. Community-driven app knowledge. If a human can do it on screen, clawdcursor can too.

Get Started

Two commands. That's it.

Install, then start. Providers are auto-detected on first run.

PowerShell
# Install
powershell -c "irm https://clawdcursor.com/install.ps1 | iex"

# Start (auto-detects your AI providers on first run)
clawdcursor start

Requires Node.js 20+. Consent is one-time and stored in ~/.clawdcursor/consent. Server binds to localhost only.

Give your AI a body.

Open source. Any model. Any client. Your desktop, controlled.

Star on GitHub