Connect any AI to your desktop. It sees your screen, moves the mouse, types, clicks — anything you can do, it can do. Works with Claude, GPT, Gemini, Llama, or any AI.
If it's on your screen, your AI can use it. Gmail, Slack, Figma, Jira, native apps, legacy software — anything with a UI.
Bring an API key and just describe what you want. clawdcursor figures out the steps, executes them, and verifies they worked. You stay in plain English.
Already using Claude Code, Cursor, or Windsurf? Add clawdcursor and your AI gets full desktop control as a native tool — no extra setup, no extra API calls.
One server. Three modes. They all give your AI the same desktop access.
Setup is three commands: clawdcursor consent (once), clawdcursor doctor (to set up your AI), then clawdcursor start. The REST tool list is served at GET /tools. Every task flows through the cheapest path that can handle it: the pre-processor classifies intent, the router handles mechanical actions for free, and the AI stages fire only when reasoning is needed. Provider-agnostic — works with Claude, GPT, Gemini, Kimi, Llama, or any OpenAI-compatible model.
FREE — no LLM cost
Parallel OCR + accessibility tree capture in one pass. Produces a unified perception object with every text element, button, input field, and their exact screen coordinates. Also runs spatial layout analysis — detecting toolbar zones, content area center, sidebars, and status bar from element clustering. The LLM gets a text-based map of the screen without needing vision.
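A rough sketch of what such a unified perception object could look like. The field names and the dedup threshold are assumptions for illustration, not clawdcursor's published schema; only the idea of merging the two capture passes into one coordinate-bearing element list comes from the text above.

```typescript
// Illustrative shape of a unified perception object: every element carries
// its visible text and exact screen coordinates, tagged with which capture
// pass found it.
interface ScreenElement {
  text: string;                          // label or visible text
  role: "button" | "input" | "text";     // coarse element kind
  x: number;                             // element center, in pixels
  y: number;
  source: "ocr" | "accessibility";
}

interface Perception {
  elements: ScreenElement[];
  capturedAt: number;                    // epoch ms of the scan
}

// Merge the two passes, preferring accessibility entries when an OCR hit
// lands within a few pixels of the same element (threshold is a guess).
function merge(ocr: ScreenElement[], ax: ScreenElement[]): Perception {
  const near = (a: ScreenElement, b: ScreenElement) =>
    Math.abs(a.x - b.x) < 8 && Math.abs(a.y - b.y) < 8;
  const deduped = ocr.filter((o) => !ax.some((a) => near(o, a)));
  return { elements: [...ax, ...deduped], capturedAt: Date.now() };
}
```

Serialized as text, a list like this gives the LLM labels plus coordinates without any image in the prompt.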
CHEAP — any text model ($0.25/1M tokens with Haiku)
Primary reasoning stage. A cheap text LLM reads the OCR snapshot, spatial layout, and app-specific guide, then decides the next action: click, type, drag, key press, scroll, or done. One action per LLM call, verified by re-scanning. Handles 90% of tasks without ever taking a screenshot.
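The one-action-per-call loop described above can be sketched as follows. `scan`, `decide`, and `execute` are placeholders for clawdcursor internals and your LLM client; only the loop shape (fresh perception before every decision, one action per LLM call, re-scan after each step) is taken from the text.

```typescript
// Minimal sketch of a Stage 2 loop under the assumptions stated above.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "key"; combo: string }
  | { kind: "done" };

async function runTask(
  task: string,
  scan: () => Promise<string>,               // OCR snapshot + spatial layout, as text
  decide: (task: string, snapshot: string) => Promise<Action>, // cheap text LLM
  execute: (a: Action) => Promise<void>,
  maxSteps = 20
): Promise<boolean> {
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = await scan();           // re-scan before every decision
    const action = await decide(task, snapshot);
    if (action.kind === "done") return true; // model says the task is complete
    await execute(action);                   // verified by the next scan
  }
  return false;                              // step budget exhausted
}
```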
EXPENSIVE — vision model, last resort only
Fires only when Stage 2 can't handle it (CAPTCHAs, complex spatial tasks, visual content). Takes screenshots, sends to a vision LLM, executes tool calls. Supports Anthropic native Computer Use and any OpenAI-compatible vision model (GPT-4o, Gemini, Kimi k2.5).
Before the pipeline runs: a regex parser decomposes compound tasks ("type X, then save as Y" → 2 subtasks). A zero-cost classifier categorizes each subtask — mechanical (router handles it for free), navigation (Ctrl+L + URL), reasoning (Stage 2), or spatial (needs vision). No LLM wasted on tasks a regex can route.
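A hedged sketch of that pre-processing step: split a compound task on sequence connectives, then bucket each subtask with pure string matching. The category names mirror the four routes above, but clawdcursor's actual patterns are not public, so these regexes are illustrative only.

```typescript
// Zero-cost task decomposition and routing, under the assumptions above.
type Route = "mechanical" | "navigation" | "reasoning" | "spatial";

function decompose(task: string): string[] {
  // "type X, then save as Y" -> ["type X", "save as Y"]
  return task
    .split(/\s*(?:,\s*then\s+|;\s*then\s+|\bthen\b)\s*/i)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function classify(subtask: string): Route {
  const t = subtask.toLowerCase();
  if (/^(?:go to|open|visit)\s+\S+\.\w{2,}/.test(t)) return "navigation"; // URL-like target
  if (/^(?:press|type|scroll|click)\b/.test(t)) return "mechanical";      // router replays it for free
  if (/\b(?:draw|drag|align|resize|captcha)\b/.test(t)) return "spatial"; // needs vision (Stage 3)
  return "reasoning";                                                     // default: cheap text stage
}
```

With this shape, the example in the text splits into a mechanical subtask (typed for free by the router) and a reasoning subtask (sent to Stage 2).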
clawdcursor's breakthrough: instead of sending screenshots, it analyzes WHERE elements are clustered on screen and builds a text-based spatial map. The LLM reads: Content area: center = (1920, 1200) — CLICK HERE. This lets any text model — even a 7B-parameter local model — know exactly where to click in Google Docs, Excel, Paint, or any app. No vision required.
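The clustering idea can be sketched in a few lines: separate toolbar elements from content elements by vertical position, then report the centroid of the content cluster as the click target. The zone threshold and output wording here are assumptions, not clawdcursor's exact logic.

```typescript
// Build a text-based spatial map from element positions (illustrative).
interface Labeled { x: number; y: number; label: string }

function spatialMap(elements: Labeled[], width: number, height: number): string {
  const topBar = elements.filter((e) => e.y < height * 0.1);  // assumed toolbar zone
  const body = elements.filter((e) => e.y >= height * 0.1);   // content elements
  // Content center = centroid of non-toolbar elements; fall back to screen center.
  const cx = body.length
    ? Math.round(body.reduce((s, e) => s + e.x, 0) / body.length)
    : Math.round(width / 2);
  const cy = body.length
    ? Math.round(body.reduce((s, e) => s + e.y, 0) / body.length)
    : Math.round(height / 2);
  return [
    `Toolbar: ${topBar.map((e) => e.label).join(", ") || "(none)"}`,
    `Content area: center = (${cx}, ${cy}) - CLICK HERE`,
  ].join("\n");
}
```

The output is plain text, so a model with no vision capability can still answer with exact coordinates.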
v0.7.5 is fully provider-agnostic — no hardcoded model names, no provider-specific hacks. 13 providers auto-detected. App knowledge crowdsourced. Display-agnostic across any resolution and DPI.
Community-contributed JSON instruction manuals. Keyboard shortcuts, workflows, UI layout hints, and tips. Excel (116 shortcuts), Paint, Notepad, Outlook, Spotify, Discord, Figma, and more. Install with clawdcursor guides install excel. Loaded automatically when the app is detected.
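For a sense of what such a guide might contain, here is a hypothetical entry. The field names are illustrative, not the published guide schema; only the categories (shortcuts, workflows, layout hints, tips) come from the text above.

```json
{
  "app": "notepad",
  "match": { "process": "notepad.exe" },
  "shortcuts": { "save": "Ctrl+S", "find": "Ctrl+F", "select_all": "Ctrl+A" },
  "layout_hints": ["Single text area fills the window below the menu bar"],
  "tips": ["Use File > Save As to write to a new filename"]
}
```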
Click buttons by name, type into fields by label, read screen text — all without screenshots. smart_click, smart_type, smart_read use accessibility + OCR fallback automatically.
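The accessibility-first, OCR-fallback ordering behind tools like smart_click can be sketched like this. Both finder functions are placeholders for clawdcursor internals; only the fallback chain is taken from the text above.

```typescript
// Resolve an element name to screen coordinates: accessibility tree first,
// OCR second, error if neither finds it (a sketch, not the real internals).
interface Target { x: number; y: number }
type Finder = (name: string) => Promise<Target | null>;

async function smartFind(name: string, byTree: Finder, byOcr: Finder): Promise<Target> {
  const fromTree = await byTree(name);   // accessibility tree: exact and cheap
  if (fromTree) return fromTree;
  const fromOcr = await byOcr(name);     // OCR: fuzzy match on rendered text
  if (fromOcr) return fromOcr;
  throw new Error(`no on-screen element matching "${name}"`);
}
```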
Built-in keyboard shortcut database. shortcuts_list discovers available shortcuts for any app. shortcuts_execute fires them instantly. Zero LLM cost.
Re-scans the screen after every action. Done verification checks if task keywords appear. Drawing tasks get lenient verification. The agent can't hallucinate success.
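Keyword-based done verification might look roughly like this. The tokenization, stop-word list, and thresholds are assumptions; the text above specifies only that task keywords are checked against the re-scanned screen and that drawing tasks get a lenient bar.

```typescript
// Check whether a task's salient words appear in the post-action screen text.
function verifyDone(task: string, screenText: string, lenient = false): boolean {
  const stop = new Set(["the", "and", "then", "into", "with", "for"]);
  const keywords = task
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 2 && !stop.has(w));
  if (keywords.length === 0) return true;
  const screen = screenText.toLowerCase();
  const hits = keywords.filter((w) => screen.includes(w)).length;
  // Assumed bars: drawing tasks need any keyword, others need half of them.
  const needed = lenient ? 1 : Math.ceil(keywords.length / 2);
  return hits >= needed;
}
```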
clawdcursor start (full agent + REST), clawdcursor serve (tools only, bring your own brain), clawdcursor mcp (MCP stdio for Claude Code, Cursor, Windsurf, Zed).
13 providers auto-detected. Display-agnostic (720p to 5K, any DPI). Cross-platform (Windows, macOS, Linux). Provider-agnostic — zero hardcoded model names. Community-driven app knowledge. If a human can do it on screen, clawdcursor can too.
Install, start. Providers are auto-detected on first run.
# Install
powershell -c "irm https://clawdcursor.com/install.ps1 | iex"
# Start (auto-detects your AI providers on first run)
clawdcursor start
Requires Node.js 20+. Consent is one-time and stored in ~/.clawdcursor/consent. Server binds to localhost only.