macparakeet-cli after 1.0: local voice for Apple Silicon agents

macparakeet-cli is now installable through Homebrew and ready for Apple Silicon agents that need local speech-to-text, transcript search, and meeting artifacts without cloud STT.

There’s a quiet shift happening in personal AI: the agent moves out of the chat tab and into a daemon.

OpenClaw and Hermes Agent are both part of that shift. They are designed around long-running agents on user-controlled compute rather than one-off chatbot sessions. OpenAI hired OpenClaw’s creator, and OpenClaw remains an open-source project. ClawHub also now defines a standard skill packaging path: a folder with SKILL.md, optional supporting files, and runtime metadata under metadata.openclaw (see the ClawHub skill format).

The natural hardware for this kind of agent is often an Apple Silicon Mac mini: quiet, always-on, power-efficient, and strong enough to run local models. But that stack still needs a voice layer.

Whisper.cpp is mature, but it does not use the Neural Engine by default; its Core ML encoder path is an opt-in build. Cloud speech APIs are fast, but they are paid, metered, and not local-first. Python Parakeet ports are useful for experiments, but they do not give agents a stable Mac-native automation surface with persistence, prompts, and a shared app database.

That is the slot macparakeet-cli is built to fill.

What changed since 1.0

macparakeet-cli 1.0 was the stability commitment: semver, a public compatibility policy, JSON contracts, and enough documentation for agents to call the tool without guessing.

Since then, the standalone Homebrew channel went live:

brew install moona3k/tap/macparakeet-cli
macparakeet-cli --version   # 2.3.1
macparakeet-cli health --json

That matters because an OpenClaw or Hermes agent running on a headless Mac no longer needs MacParakeet.app installed just to get speech-to-text. The CLI can be installed like any other local tool, with Homebrew-managed ffmpeg and yt-dlp dependencies.
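An agent's bootstrap step can check the install before depending on it. The following is a minimal sketch, not part of the CLI: the function name and the MACPARAKEET_CLI override variable are hypothetical, and it assumes that health exits non-zero when the install is not usable.

```shell
# Hypothetical preflight helper, not part of macparakeet-cli itself.
# MACPARAKEET_CLI is an illustrative override for the binary name or path.
macparakeet_ready() {
  cli="${MACPARAKEET_CLI:-macparakeet-cli}"
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "not found: $cli (install: brew install moona3k/tap/macparakeet-cli)" >&2
    return 127
  fi
  # Assumption: `health` exits non-zero when the install is not usable.
  "$cli" health --json >/dev/null
}

# Usage: macparakeet_ready || exit 1
```

Returning 127 for a missing binary mirrors the shell's own "command not found" status, so callers can tell "not installed" apart from "installed but unhealthy".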

The repo integration docs have also been refreshed for agent consumers. They now explain the current CLI conventions:

  • --json for fixed-shape query/envelope commands
  • --format json for format-selecting commands
  • App-default transcription flags for GUI-parity smoke tests
  • Environment-variable API keys for prompt runs
  • The network boundaries around local STT, YouTube downloads, optional LLM calls, and telemetry

One timing note: the public Homebrew artifact today is macparakeet-cli 2.3.1. Current main also contains the next agent-facing conveniences, including macparakeet-cli spec --json and meetings results add for writing externally generated meeting output back as PromptResult rows. Those commands ship with the next app/CLI artifact; the docs are prepared for that surface, but marketplace-style discovery is still a separate publishing step.

How this composes with an agent

              Parakeet TDT 0.6B v3
                       |
              FluidAudio
              CoreML on the Apple Neural Engine
                       |
              +-----------------+
              | MacParakeetCore |
              | STT | DB | LLM  |
              +--------+--------+
                       |
          +------------+-------------+
          |                          |
          v                          v
   macparakeet-cli            MacParakeet.app
   public semver surface      SwiftUI app
          |
   +------+------+-------------------+
   |             |                   |
   v             v                   v
Homebrew   OpenClaw/Hermes     shell automation
install    integration docs     and test harnesses

The CLI is the load-bearing automation surface. The Mac app is one polished client of the same core library. An agent is another client: it shells out to macparakeet-cli, parses stdout, branches on exit code, and reads/writes the same local SQLite database.
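The shell-out pattern described above can be sketched as a small wrapper. This is an illustration under stated assumptions: the function name and the MACPARAKEET_CLI override are hypothetical, the sketch treats any non-zero exit code as failure without assuming specific codes, and it assumes the JSON payload arrives on stdout.

```shell
# Hypothetical wrapper: run one CLI command and branch on its exit code.
# MACPARAKEET_CLI is an illustrative override, not a documented variable.
run_cli() {
  cli="${MACPARAKEET_CLI:-macparakeet-cli}"
  if out=$("$cli" "$@"); then
    # Success: hand the captured stdout to the caller for JSON parsing.
    printf '%s\n' "$out"
  else
    code=$?
    echo "macparakeet-cli $1 failed with exit code $code" >&2
    return "$code"
  fi
}

# Usage: json=$(run_cli history transcriptions --json) || exit 1
```

Emitting stdout only on success means a caller never tries to parse partial output from a failed run.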

What an agent gets today

When you wire macparakeet-cli into an OpenClaw, Hermes, Codex, or generic shell-based agent, the agent gains:

  • Local transcription. Pass an audio/video path or a YouTube URL (downloaded through yt-dlp) and get transcript data back without a cloud STT service.
  • Persistent history. Transcriptions and dictations live in the shared MacParakeet SQLite database at ~/Library/Application Support/MacParakeet/macparakeet.db.
  • Search. Agents can search prior dictations and transcriptions instead of repeatedly asking the user to upload or re-transcribe files.
  • Prompt library access. Built-in and user-defined prompts can run against saved transcripts with a configured LLM provider.
  • Shared defaults. Agents can read and set speech engine, speaker detection, processing mode, audio retention, YouTube quality, and telemetry preferences through the same preferences suite the app uses.
  • No cloud STT. Parakeet TDT runs locally on Apple Silicon via FluidAudio and CoreML.

The basic vocabulary is intentionally boring:

macparakeet-cli health --json
macparakeet-cli transcribe "/path/to/audio.mp3" --format json
macparakeet-cli transcribe "https://www.youtube.com/watch?v=..." --format json
macparakeet-cli history transcriptions --json
macparakeet-cli history search-transcriptions "design review" --json
macparakeet-cli prompts list --json
macparakeet-cli prompts run "Action items" \
  --transcription "<id-or-prefix>" \
  --provider anthropic \
  --api-key-env ANTHROPIC_API_KEY \
  --model claude-sonnet-4-6 \
  --json

That is enough for a Mac mini agent to listen to files, remember transcript history, and produce structured meeting summaries without adding another transcription backend.
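For repeated use, the prompts run invocation above can be wrapped in a function. This is a sketch only: the function name, the MACPARAKEET_CLI override, and the early key check are hypothetical additions. The flags, provider, and model are copied from the example above.

```shell
# Hypothetical helper: run a saved prompt against a stored transcription.
# Usage: run_prompt <prompt-name> <transcription-id-or-prefix>
run_prompt() {
  cli="${MACPARAKEET_CLI:-macparakeet-cli}"
  # Fail early if the key variable is unset or empty.
  # Only the variable's name is passed on the command line, never its value.
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 2
  fi
  "$cli" prompts run "$1" \
    --transcription "$2" \
    --provider anthropic \
    --api-key-env ANTHROPIC_API_KEY \
    --model claude-sonnet-4-6 \
    --json
}
```

Quoting "$1" and "$2" keeps prompt names that contain spaces, such as "Action items", intact as single arguments.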

What is still not automatic

There is an important distinction between “usable by agents” and “discoverable from a marketplace.”

MacParakeet is usable by Apple Silicon agents today if the agent has shell access and the CLI is installed. But OpenClaw agents will not discover MacParakeet from ClawHub until we publish an actual skill folder with SKILL.md. Hermes has the same shape: a repo doc can tell a Hermes operator what to wire up, but a listing/submission is still needed for registry-style discovery.

So the current state is:

  • Usable today: yes, via Homebrew and the CLI docs.
  • Ready for OpenClaw/Hermes integration work: yes, the repo docs now describe the correct commands and conventions.
  • Published as a registry skill/listing: not yet.

That is the right order. The CLI contract should be boring and documented before the registry wrapper tells other agents to depend on it.

Privacy boundaries

The core promise is unchanged: speech-to-text runs locally.

The CLI does not send audio or transcripts to a cloud STT provider. Network egress is limited to explicit surfaces:

  • YouTube downloads through yt-dlp
  • Optional LLM provider calls when a user asks for generated output and supplies or configures a provider
  • Helper/model/download flows when requested
  • Privacy-safe CLI telemetry that can be disabled with MACPARAKEET_TELEMETRY=0, DO_NOT_TRACK=1, or macparakeet-cli config set telemetry off

That boundary is what makes MacParakeet useful in an agent stack. A local agent that can read your files, calendar, messages, or meetings should not need to upload raw voice to a transcription service just to understand what was said.
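For a long-running agent, the telemetry opt-out listed above can be pinned in the environment the agent starts from. This is a minimal sketch: either variable is described as sufficient on its own, so setting both is only a belt-and-braces choice.

```shell
# Opt out of CLI telemetry for this shell and every process it launches.
export MACPARAKEET_TELEMETRY=0
export DO_NOT_TRACK=1

# Persistent alternative, stored in the shared preferences:
#   macparakeet-cli config set telemetry off
```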

Where to go next

To file issues, submit integrations, or ask questions, go to github.com/moona3k/macparakeet.