v0.7.5 — Security Hardening + 13 Providers

Give your AI
eyes and hands

Connect any AI to your desktop. It sees your screen, moves the mouse, types, clicks — anything you can do, it can do. Works with Claude, GPT, Gemini, Llama, or any AI.

40 Tools · 8 Pipeline Layers · 3 Transport Modes · Any AI model
Use it your way

No app integrations. No API keys per service.

If it's on your screen, your AI can use it. Gmail, Slack, Figma, Jira, native apps, legacy software — anything with a UI.

🧑‍💻 Tell it what to do

Bring an API key and just describe what you want. clawdcursor figures out the steps, executes them, and verifies they worked. You stay in plain English.

clawdcursor doctor
clawdcursor start

🤖 Connect your AI directly

Already using Claude Code, Cursor, or Windsurf? Add clawdcursor and your AI gets full desktop control as a native tool — no extra setup, no extra API calls.

Claude Code · Cursor · Windsurf · Zed
Setup

Three ways to connect

One server. Three modes. They all give your AI the same desktop access.

Claude Code / Cursor / Windsurf

  1. Run clawdcursor consent once
  2. Add to your AI client's config (one JSON block)
  3. Desktop tools appear natively in your AI
  4. No extra API key, no extra setup
  5. Your AI decides what to do — clawdcursor does it
40 Tools available · $0 Extra cost
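The "one JSON block" from step 2 typically looks like the following. This is a sketch: it assumes the common `mcpServers` config shape used by MCP clients such as Claude Code and Cursor, wired to the `clawdcursor mcp` stdio command described in the Transport Modes section; exact key names vary by client.

```json
{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp"]
    }
  }
}
```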

Give it a task directly

  1. Run clawdcursor doctor to set up your AI
  2. Run clawdcursor start
  3. Type what you want in plain English
  4. clawdcursor figures out the steps and executes
  5. Works with 13+ providers: Anthropic, OpenAI, Gemini, Groq, Ollama, and more
8 Pipeline layers · $0 for simple tasks

Build with it

  1. Run clawdcursor start
  2. 40 tools available over HTTP on localhost (smart tools included)
  3. OpenAI function-calling format
  4. Call individual tools or send full tasks
  5. Browse all schemas at GET /tools
Localhost :3847 · Any HTTP client
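A tool call in OpenAI function-calling format can be built like this. The payload shape follows the OpenAI tools convention named above; the `/tools/call` endpoint path is an assumption (browse `GET /tools` on the running server for the real schemas), and `smart_click` is one of the smart tools listed later on this page.

```python
import json

BASE = "http://localhost:3847"  # default local port

def tool_call_payload(name: str, arguments: dict) -> dict:
    """Build one tool call in OpenAI function-calling shape.
    Arguments are JSON-encoded into a string, per the OpenAI convention."""
    return {
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }

payload = tool_call_payload("smart_click", {"target": "Save"})

# To execute against a running server (endpoint path is an assumption):
#   import urllib.request
#   req = urllib.request.Request(f"{BASE}/tools/call",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```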
3-Stage Pipeline

Text first. Vision last. Any model.

Every task flows through the cheapest path that can handle it. The pre-processor classifies intent, the router handles mechanical actions for free, and the AI stages only fire when reasoning is needed. Provider-agnostic — works with Claude, GPT, Gemini, Kimi, Llama, or any OpenAI-compatible model.

Stage 1 · SnapshotBuilder

FREE — no LLM cost

Parallel OCR + accessibility tree capture in one pass. Produces a unified perception object with every text element, button, input field, and their exact screen coordinates. Also runs spatial layout analysis — detecting toolbar zones, content area center, sidebars, and status bar from element clustering. The LLM gets a text-based map of the screen without needing vision.
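The "unified perception object" might look roughly like this. The types and field names below are hypothetical, not the project's actual data model; they only illustrate how OCR elements plus layout zones can render into a text map the LLM reads instead of a screenshot.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    text: str      # OCR text or accessibility label
    role: str      # "button", "input", "text", ...
    center: tuple  # (x, y) screen coordinates

@dataclass
class Snapshot:
    elements: list = field(default_factory=list)
    zones: dict = field(default_factory=dict)  # e.g. {"content_center": (960, 600)}

    def to_text(self) -> str:
        """Render the screen as a text map: one line per element, then zones."""
        lines = [f"{e.role} '{e.text}' at {e.center}" for e in self.elements]
        for name, xy in self.zones.items():
            lines.append(f"{name}: {xy}")
        return "\n".join(lines)

snap = Snapshot(
    elements=[Element("Save", "button", (1180, 42))],
    zones={"content_center": (960, 600)},
)
```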

🧠

Stage 2 · TextNavigator ★

CHEAP — any text model ($0.25/1M tokens with Haiku)

Primary reasoning stage. A cheap text LLM reads the OCR snapshot, spatial layout, and app-specific guide, then decides the next action: click, type, drag, key press, scroll, or done. One action per LLM call, verified by re-scanning. Handles 90% of tasks without ever taking a screenshot.
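The one-action-per-call loop can be sketched as below. The function and action names are illustrative: `scan` stands in for the snapshot re-scan, `llm` for the cheap text model, and `act` for the input executor.

```python
def run_task(task, scan, llm, act, max_steps=20):
    """Stage 2 sketch: one action per LLM call, verified by re-scanning.
    `scan()` returns a text snapshot of the screen,
    `llm(task, snapshot)` returns the next action as a dict,
    `act(action)` performs a click / type / drag / key / scroll."""
    for _ in range(max_steps):
        snapshot = scan()            # fresh ground truth before each decision
        action = llm(task, snapshot)
        if action["op"] == "done":
            return True
        act(action)
    return False                     # step budget exhausted without "done"
```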

🎯

Stage 3 · VisionFiller

EXPENSIVE — vision model, last resort only

Fires only when Stage 2 can't handle it (CAPTCHAs, complex spatial tasks, visual content). Takes screenshots, sends to a vision LLM, executes tool calls. Supports Anthropic native Computer Use and any OpenAI-compatible vision model (GPT-4o, Gemini, Kimi k2.5).

🔍

Pre-processor + Task Classifier

Before the pipeline runs: a regex parser decomposes compound tasks ("type X, then save as Y" → 2 subtasks). A zero-cost classifier categorizes each subtask — mechanical (router handles it for free), navigation (Ctrl+L + URL), reasoning (Stage 2), or spatial (needs vision). No LLM wasted on tasks a regex can route.
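The decomposition and zero-cost classification described above can be sketched like this. The split pattern and the routing keywords are illustrative, not the actual rules.

```python
import re

def decompose(task: str) -> list:
    """Split compound tasks on connectors like ', then ' into subtasks."""
    parts = re.split(r",\s*(?:and\s+)?then\s+", task, flags=re.I)
    return [p.strip() for p in parts if p.strip()]

def classify(subtask: str) -> str:
    """Route each subtask without an LLM. Keyword rules are illustrative."""
    t = subtask.lower()
    if re.match(r"(press|type|scroll|hotkey)\b", t):
        return "mechanical"   # router executes directly, free
    if re.match(r"(open|go to|visit)\b", t) and ("http" in t or "." in t):
        return "navigation"   # Ctrl+L + URL
    if any(w in t for w in ("draw", "drag to", "captcha")):
        return "spatial"      # needs vision (Stage 3)
    return "reasoning"        # default: text navigator (Stage 2)
```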

🗺️

Textual Scaffolding (Spatial Layout Analysis)

ClawdCursor's breakthrough: instead of sending screenshots, it analyzes WHERE elements are clustered on screen and builds a text-based spatial map. The LLM reads a line like "Content area: center = (1920, 1200) — CLICK HERE". This lets any text model, even a 7B-parameter local model, know exactly where to click in Google Docs, Excel, Paint, or any app. No vision required.
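A content-area center like the one quoted above can fall out of simple element clustering. The sketch below treats elements near the top edge as toolbar and averages the rest; the 12% cutoff and the averaging rule are illustrative assumptions, not the real algorithm.

```python
def content_center(elements, width, height):
    """Estimate the content-area center from element positions.
    `elements` is a list of (x, y) screen coordinates."""
    toolbar_cutoff = height * 0.12               # top band assumed to be toolbar
    content = [(x, y) for x, y in elements if y > toolbar_cutoff]
    if not content:
        return (width // 2, height // 2)         # fall back to screen center
    cx = sum(x for x, _ in content) // len(content)
    cy = sum(y for _, y in content) // len(content)
    return (cx, cy)
```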

What Makes It Different

Provider-agnostic. Community-driven. Universal.

v0.7.5 is fully provider-agnostic — no hardcoded model names, no provider-specific hacks. 13 providers auto-detected. App knowledge crowdsourced. Display-agnostic across any resolution and DPI.

📖

App Guides — 86+ apps

Community-contributed JSON instruction manuals. Keyboard shortcuts, workflows, UI layout hints, and tips. Excel (116 shortcuts), Paint, Notepad, Outlook, Spotify, Discord, Figma, and more. Install with clawdcursor guides install excel. Loaded automatically when the app is detected.

🖱️

Smart Tools (Blind Agent)

Click buttons by name, type into fields by label, read screen text — all without screenshots. smart_click, smart_type, smart_read use accessibility + OCR fallback automatically.

⌨️

Shortcuts Engine

Built-in keyboard shortcut database. shortcuts_list discovers available shortcuts for any app. shortcuts_execute fires them instantly. Zero LLM cost.

Ground-Truth Verifier

Re-scans the screen after every action. Done-state verification checks whether the task's keywords actually appear on screen. Drawing tasks get lenient verification. The agent can't hallucinate success.
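The keyword check can be sketched as follows. The word-length filter, hit threshold, and lenient rule are illustrative, not the project's actual heuristics.

```python
def looks_done(task: str, screen_text: str, lenient: bool = False) -> bool:
    """Ground-truth sketch: after the agent claims 'done', re-scan the screen
    and require task keywords to appear. `lenient` (drawing tasks) passes on
    any non-empty screen content instead of matching keywords."""
    if lenient:
        return bool(screen_text)
    words = [w for w in task.lower().split() if len(w) > 3]  # skip short filler
    hits = sum(1 for w in words if w in screen_text.lower())
    return hits >= max(1, len(words) // 2)                   # majority-ish match
```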

🔌

Three Transport Modes

clawdcursor start (full agent + REST), clawdcursor serve (tools only, bring your own brain), clawdcursor mcp (MCP stdio for Claude Code, Cursor, Windsurf, Zed).

🌍

Truly Universal

13 providers auto-detected. Display-agnostic (720p to 5K, any DPI). Cross-platform (Windows, macOS, Linux). Provider-agnostic — zero hardcoded model names. Community-driven app knowledge. If a human can do it on screen, clawdcursor can too.

Get Started

Two commands. That's it.

Install, then start. Providers are auto-detected on first run.

PowerShell
# Install
powershell -c "irm https://clawdcursor.com/install.ps1 | iex"

# Start (auto-detects your AI providers on first run)
clawdcursor start

Requires Node.js 20+. Consent is one-time and stored in ~/.clawdcursor/consent. Server binds to localhost only.

Give your AI a body.

Open source. Any model. Any client. Your desktop, controlled.

Star on GitHub