Latest: v0.3.5 · April 17, 2026

Local LLMs on Apple Silicon,
done the macOS-native way.

Native SwiftUI app. A proper CLI. OpenAI-compatible API — always on. Powered by Apple's MLX. Zero cloud, zero telemetry, zero Electron.

Requires macOS 14 (Sonoma) or later · Apple Silicon only · Apache-2.0 licensed

Why macMLX

Not another Electron wrapper.

The only tool that gives newcomers a real SwiftUI app AND gives developers a real CLI — both talking to the same in-process MLX engine.

Feature                           macMLX           LM Studio    Ollama       oMLX
Native macOS GUI                  SwiftUI          Electron                  Web UI
MLX-native inference              ✓                GGUF only    GGUF only    ✓
Command-line interface            ✓ Swift-native
Resumable downloads + HF mirrors  ✓                partial      partial
OpenAI-compatible API             ✓ always on
Zero Python required              ✓
Three surfaces, one core

Built like a macOS app should be.

MacMLXCore is the Swift SPM package that owns all inference. The GUI, the CLI, and the HTTP server are thin shells over the same protocol.

macMLX.app

SwiftUI, macOS 14+, Apple Silicon only.

  • Onboarding wizard picks engine + model directory
  • HuggingFace browser with resumable downloads
  • Conversation sidebar: rename, delete, rewind-to-here
  • Parameters Inspector (⌘⌥I) — per-model persistence
  • Benchmark tab, Logs tab, Menu bar extra
  • Sparkle EdDSA-signed auto-update

macmlx (CLI)

swift-argument-parser · native ANSI dashboards.

  • pull · list · run · serve · ps · stop
  • Honours preferredEngine + per-model parameters from GUI
  • Unicode progress bars with sub-cell precision
  • Boxed startup banner · coloured REPL prompt
  • PIDFile coordination · graceful SIGTERM
  • JSON output on every command for scripting
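
The JSON output makes every subcommand scriptable. A sketch of the idea — note that the `--json` flag name and the `id` / `size_bytes` field names are assumptions for illustration, not documented CLI behaviour:

```shell
# Assumed sample of what `macmlx list --json` might emit; with the real CLI,
# pipe its output through the same jq filter.
sample='[
  {"id": "Qwen3-8B-4bit",          "size_bytes": 5100000000},
  {"id": "gemma-3-4b-it-qat-4bit", "size_bytes": 2600000000}
]'
# Print the ids of local models larger than 4 GB
echo "$sample" | jq -r '.[] | select(.size_bytes > 4e9) | .id'
```

With a live install this collapses to `macmlx list --json | jq -r '…'` in any script.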

OpenAI-compatible API

Hummingbird 2 · localhost:8000/v1 · SSE streaming.

  • POST /v1/chat/completions · GET /v1/models · GET /x/status
  • Cold-swap loads any local model on demand (v0.3.3)
  • Concurrent swap requests serialised actor-side
  • Drop-in for Cursor, Continue, Cline, Raycast, Zed, Open WebUI
  • Real RSS reported on /x/status (no fake 0 bytes)
Current release

v0.3.5 — native ANSI dashboards, cleaner deps.

Released April 17, 2026. Removed SwiftTUI (unmaintained, Swift 6 strict-concurrency incompatible) and PulseUI (its ConsoleView is unavailable on macOS), and rebuilt the three CLI dashboards (pull, serve, run) on a tiny in-house ANSI toolkit.

v0.3.5 2026-04-17
Added
  • CLITerm ANSI toolkit: colour, TTY detection, U+258x sub-cell progress bars, box-drawing headers
  • Unicode progress bar on macmlx pull with speed + ETA
  • Boxed startup banner on macmlx serve; tidier REPL header on macmlx run
Removed
  • SwiftTUI (unmaintained, Swift 6 strict-concurrency incompatible)
  • _PullDashboardView / _ServeDashboardView / _ChatTUIView stub types
  • PulseUI + PulseProxy (ConsoleView is #if !os(macOS)-gated)
Status
  • CLI tests: 16/16 green
  • MacMLXCore tests: 90/90 green
  • Issue #18 closed — real live CLI dashboards landed via CLITerm
Roadmap

Shipped. Shipping. Next.

Two products, one shared MacMLXCore. Twelve releases since v0.1. Each row below links to the actual tag or plan document.

Shipped
v0.1.0

Initial MVP

Native SwiftUI GUI · menu bar · CLI (serve / pull / run / list / ps / stop) · HuggingFace downloader · OpenAI-compatible API · Sparkle auto-update · memory-aware onboarding.

v0.2.0

Download + chat polish (10 issues)

Resumable downloads survive cancel + app quit · HF mirrors · Markdown rendering · message edit/regenerate · Parameters Inspector (⌘⌥I).

v0.3.0

Benchmark feature + cross-cutting gap-fix

Local benchmark tab (prefill + generation TPS, TTFT, peak RSS, history, Share-to-Community issue template) · 4 CRITICAL + 3 HIGH + 3 MEDIUM gap-fixes from an independent code review · bilingual README.

v0.3.1

Five UX fixes

macmlx list segfault fixed · chat banner flicker fixed · Markdown paragraph breaks preserved · manually-copied models auto-appear · chat toolbar model switcher actually works · max-tokens TextField replaces click-heavy Stepper.

v0.3.2

Conversation sidebar + rewind-to-here

Collapsible sidebar lists saved conversations · inline rename · delete with confirmation · right-click any message → Rewind drops every later message.

v0.3.3

API cold-swap model loading

/v1/chat/completions now auto-loads any locally-downloaded model by ID · concurrent swaps serialised actor-side · OpenAI-style 404 model_not_found error shape.

v0.3.4

Logs tab (native over Pulse LoggerStore)

SwiftUI Table with time / level badge / category / message · search field + level picker · Clear button wipes the on-disk store.

v0.3.5

Native ANSI CLI dashboards; SwiftTUI + PulseUI removed

In-house CLITerm toolkit replaces stub-linked SwiftTUI · PulseUI dropped (ConsoleView is iOS/iPadOS-only) · Logs tab keeps working via direct LoggerStore access.

In progress
v0.3.6

Small maintenance patch (~2h)

  1. macmlx --version auto-bumped via release.yml sed step (fixes perpetual 0.1.0 bug)
  2. macmlx search <query> subcommand — reuses HFDownloader.search, --sort=downloads|likes|recent, --limit, --json
  3. Binary slim-down: strip -S + dynamic Swift stdlib (~60 MB → ~45 MB)
  4. CLI --log-level + --log-stderr flags so Pulse logging is visible from the terminal
Next minor
Later
v0.5

LoRA adapters + conversation/dataset export

Drop in existing HuggingFace LoRA adapters (no training UI) · export conversations to JSONL for fine-tuning datasets.

v0.6

Speech I/O — ASR + TTS

WhisperKit (Core ML) for mic input in chat — upstream mlx-swift-lm doesn't ship audio models yet · AVSpeechSynthesizer for reading assistant replies aloud.

v0.7

Community Benchmarks service NEW

Today the Benchmark tab's Share to Community button pre-fills a GitHub issue. Tomorrow: an opt-in remote endpoint receives submissions, aggregates them by chip × model × quant × macOS version, and serves a public leaderboard — surfaced both inside the app and on this website. Inspired by omlx.ai's community benchmarks.

  • Submission: POST /v1/benchmarks with BenchmarkResult JSON + anonymised HardwareInfo
  • Opt-in — no data leaves the Mac unless the user explicitly clicks Share
  • Public browsable leaderboard on this website — filter by chip family, memory, model family, quant
  • GitHub-issue submission continues as a fallback for users who prefer not to run the remote service
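
What a v0.7 submission might look like on the wire — a sketch only: the `/v1/benchmarks` path comes from the plan above, but the host, the payload field names, and the `BenchmarkResult` shape are assumptions until the service ships:

```shell
# Hypothetical BenchmarkResult payload (all field names assumed)
payload='{
  "model": "Qwen3-8B-4bit",
  "prefill_tps": 182.4,
  "gen_tps": 45.1,
  "ttft_s": 0.32,
  "peak_rss_gb": 18.2,
  "hardware": {"chip": "Apple M3 Max", "memory_gb": 64, "macos": "15.4"}
}'
# Sanity-check the JSON locally -- nothing leaves the Mac at this point
echo "$payload" | jq -r '.hardware.chip'
# Then, once the (not-yet-existing) service is live:
# curl -X POST https://<benchmarks-host>/v1/benchmarks \
#   -H "Content-Type: application/json" -d "$payload"
```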
Deferred / blocked
#19

Signed + notarized DMG

Needs a paid Apple Developer account. Until then, the DMG is unsigned — Gatekeeper blocks first launch until users run xattr -cr on the app.

#12 / #13

Subprocess-based engines (SwiftLM, Python mlx-lm)

Closed as not planned — App Sandbox policy blocks spawning external binaries. Reopenable if sandbox policy is revisited or a Swift-native 100B+ MoE path appears.

#20

Homebrew tap for the CLI

Scheduled around v0.3.6–v0.4 once the CLI tarball lands as a release asset.

Benchmarks — today & tomorrow

From shared-issue to live leaderboard.

The Benchmark tab already measures prefill + generation tok/s, TTFT, peak RSS, load time, and stores history locally. Sharing is a one-click GitHub-issue pre-fill today. v0.7 plans to turn that submission into data you can query.

Today — v0.3.0

Share to Community

The result is encoded into a pre-filled GitHub issue using benchmark_submission.yml. Review it before submitting; nothing leaves the Mac until you click Create Issue.

[benchmark] Benchmark · Qwen3-8B-4bit · M3 Max 64GB
## System
Chip: Apple M3 Max
Memory: 64 GB
macOS: 15.4
Engine: MLX Swift (mlx-swift-lm 3.31.3)

## Result
Model:       Qwen3-8B-4bit
Prefill TPS: 182.4
Gen TPS:     45.1
TTFT:        0.32 s
Peak RSS:    18.2 GB
Load time:   4.8 s
Runs:        3 (median)
v0.7 — Community Benchmarks

Browsable leaderboard (preview)

Remote endpoint stores your opt-in submission, aggregates globally by chip × model × quantisation × macOS version, and publishes a filterable table on this site — plus inside the app so the Benchmark tab can show you how your Mac compares.

Chip     Mem    Model           Gen TPS   TTFT     N
M4 Max   128G   Qwen3-8B-4bit   62.3      0.21 s   47
M3 Max   64G    Qwen3-8B-4bit   45.1      0.32 s   118
M3 Pro   36G    Qwen3-8B-4bit   31.0      0.45 s   72
M2 Max   64G    Qwen3-8B-4bit   28.4      0.51 s   34
M1 Max   32G    Qwen3-8B-4bit   22.1      0.64 s   28

Mockup. Real data once v0.7 ships.

Architecture

One protocol. Three consumers. No leaky abstractions.

macMLX.app (SwiftUI)  ·  macmlx (CLI)  ·  HTTP clients (Cursor · Continue · curl)
        │                    │                        │
        └────────────────────┴────────────────────────┘
                             │
MacMLXCore — Swift SPM · @MainActor / actors · Swift 6 strict concurrency
  ├─ InferenceEngine protocol
  ├─ HummingbirdServer — localhost:8000/v1
  └─ MLXSwiftEngine — mlx-swift-lm 3.31.3 · MLXLLM + MLXVLM (v0.4) · in-process
                             │
Apple Silicon — Metal · ANE · Unified Memory
Quickstart

Running a 4-bit 8B model in 60 seconds.

  1. Install

     Download the DMG from Releases and drag macMLX.app to /Applications. On first launch, run xattr -cr /Applications/macMLX.app in Terminal (the DMG is not notarized yet — issue #19).

  2. Onboard

     The setup wizard picks ~/.mac-mlx/models as the default model directory and selects the MLX Swift engine. A memory check warns if your Mac has less than the model's recommended RAM.

  3. Download + chat

     Open the Models tab, switch to Hugging Face, search for a model (try mlx-community/Qwen3-8B-4bit), and click Download — the progress bar shows live speed and ETA, and downloads resume across app quits. Load the model from the Local tab, then head to Chat.

# install dev tools, clone, build the CLI
git clone https://github.com/magicnight/mac-mlx && cd mac-mlx
brew bundle
swift build --package-path macmlx-cli -c release

# download, run, serve
macmlx pull mlx-community/Qwen3-8B-4bit      # resumable
macmlx list                                  # local models
macmlx run Qwen3-8B-4bit "Hello, world"      # single prompt
macmlx run Qwen3-8B-4bit                     # interactive REPL
macmlx serve                                 # OpenAI API on :8000
macmlx ps                                    # is serve running?
macmlx stop                                  # graceful SIGTERM

# v0.3.6 preview
macmlx search qwen3 --sort likes --limit 10  # new in v0.3.6
macmlx serve --log-level debug --log-stderr  # new in v0.3.6
# anything OpenAI-compatible works. API key is ignored.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-8B-4bit",
    "messages": [{"role": "user", "content": "Hi"}],
    "stream": true
  }'

# cold-swap: ask for any locally-downloaded model by ID
# server loads it on demand (v0.3.3+), concurrent swaps serialised
curl http://localhost:8000/v1/chat/completions \
  -d '{"model":"gemma-3-4b-it-qat-4bit","messages":[...]}'

# real RSS reported on status
curl http://localhost:8000/x/status | jq
# { "state": "ready", "model": "Qwen3-8B-4bit", "rss_gb": 18.2, ... }
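
For scripting against the streaming endpoint, the SSE events can be reduced to plain text. A sketch assuming OpenAI-style `data: {...}` lines carrying `choices[0].delta.content` — the shape the /v1 route mimics, though the exact event format here is an assumption:

```shell
# A captured-style SSE transcript (shape assumed); with a live server, replace
# the printf with: curl -sN http://localhost:8000/v1/chat/completions ...
sse='data: {"choices":[{"delta":{"content":"Hel"}}]}
data: {"choices":[{"delta":{"content":"lo"}}]}
data: [DONE]'
printf '%s\n' "$sse" \
  | sed -n 's/^data: //p' \
  | grep -v '^\[DONE\]' \
  | jq -j '.choices[0].delta.content // empty'
echo
```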
Special thanks

Standing on shoulders.

macMLX wouldn't exist without these open-source projects. Click through and star them.

Apple · ml-explore

MLX

Apple's array framework for Apple Silicon. The engine under everything macMLX does.

Apple · ml-explore

mlx-swift-lm

Swift bindings + LLM/VLM model zoo. Pinned at 3.31.3. Ships MLXLLM (text) and MLXVLM (16 vision-language architectures, used for v0.4).

Hugging Face

swift-transformers

Tokenizers, Hub helpers, chat-template application in Swift. 1.3.x series (avoids argparse version conflicts).

hummingbird-project

Hummingbird

Swift-native, NIO-based HTTP server. Powers localhost:8000 and its OpenAI-compatible routes. 2.22.x.

sparkle-project

Sparkle

EdDSA-signed auto-update framework for Mac apps. Drives the Check for Updates… menu item and the appcast the release workflow pushes.

kean

Pulse

Structured logging framework with a Core Data-backed store. Backs LogManager and the native Logs tab. (PulseUI removed in v0.3.5 — ConsoleView is iOS/iPadOS-only; we read the store directly instead.)

Apple

swift-argument-parser

Every macmlx subcommand + flag is declared in pure Swift through ArgumentParser. 1.7.1.

Trans-N-ai

Swama

Swift-native MLX inference CLI that pioneered the in-process mlx-swift-lm pattern. macMLX took the architectural approach and added the GUI + OpenAI server layers.

jundot

oMLX

Reference for feature depth, community benchmark presentation, and MLX-ecosystem tool UX. Direct inspiration for the v0.7 Community Benchmarks plan.

SharpAI

SwiftLM

100B+ MoE inference path. Sandbox blocked the subprocess integration for now (issues #12/#13) — kept in the credits for pointing the way.

argmax

WhisperKit

Planned for v0.6 speech input — upstream mlx-swift-lm doesn't ship audio models yet, so WhisperKit's Core ML Whisper covers the UX in the meantime.

rensbreur · historical

SwiftTUI

Early CLI-dashboard candidate. Swift 6 strict-concurrency incompatibility led to an in-house ANSI toolkit in v0.3.5 (issue #18). Retained in credits; reopenable if upstream revives.

Full BibTeX citations in CITATIONS.bib.