Appearance
Local API Server
This page covers mere.run api serve, the local API surface exposed by the package.
Public surface
mere.run api servemere.run status
What it is for
The API server lets you expose supported local engines through a local process instead of shelling out to the CLI for every request. It is useful for:
- local automation
- editor tooling
- simple local integrations
- experimenting with the runtime through HTTP
It is not a hosted-service or relay layer. This repo keeps the server local and package-scoped.
Runtime entrypoints
CLI
Sources/MereRunCLI/Commands/APIServeCommand.swift
Supporting stack
Sources/MereRunCLI/Support/Hummingbirdpackage dependency declared inPackage.swift
Example
bash
swift run mere.run api serve --engine text-chat-gemma4In another terminal, confirm that the server is reachable and which model it reports:
bash
swift run mere.run statusNetwork-exposed example:
bash
export MERERUN_API_KEY=change-me
swift run mere.run api serve \
--engine text-chat-gemma4 \
--host 0.0.0.0 \
--port 11434 \
--api-key "$MERERUN_API_KEY" \
--rate-limit-per-minute 120Design notes
- the API server follows the same model-resolution and model-store rules as the rest of the CLI
mere.run statusis the preferred quick check before wiring an editor or agent to a local server- it is intentionally local-first
- it should not reintroduce relay, billing, or hosted-infrastructure concerns
- non-loopback binds require an API key, and the OpenAI-compatible chat route supports basic rate limiting
- chat requests must use
Content-Type: application/json; browser-simple form/text posts are rejected before the request body is processed - chat requests are validated before generation;
max_tokens,max_completion_tokens,temperature, andtop_pmust stay within bounded ranges - LoRA adapters are configured at server startup with
--lora; request bodies cannot select local LoRA paths - streaming and JSON error paths are sanitized so the local server does not reflect raw internal runtime details back to clients
OpenAI chat compatibility
POST /v1/chat/completions accepts the common Chat Completions request shape:
system,developer,user,assistant, andtoolmessages- string content, text content parts, nullable assistant content, and image content parts when the selected engine supports vision
- assistant
tool_callsand tool response messages tools,tool_choice,parallel_tool_callsresponse_formatstream_options.include_usagestop,seed, penalties, logprobs, reasoning controls, and provider-thinking controls as typed request fieldsmax_completion_tokensalongside legacymax_tokens
The server does not silently drop high-impact fields. Native engines either map supported fields into ChatRequest or return an OpenAI-style invalid_request_error before generation. Metadata-style fields such as metadata, user, and service_tier are accepted as request context but do not change local generation.
Engine compatibility:
text-chat-deepseek-v4-flash: raw-proxies the original request body tods4-server, preserving DS4's OpenAI-compatible behavior.text-chat-gemma4: accepts function tools and emits OpenAI tool-call responses when the model generates a tool call.text-chat-q35: accepts function tools and one image content part per message.text-chat-klein: supportsresponse_format: {"type":"json_object"}with local JSON retry behavior.text-code: accepts plain text chat requests and rejects tools, images, reasoning controls, logprobs, seed, stop sequences, and structured outputs with explicit errors.
Streaming responses only emit assistant content tokens. Local progress labels stay in logs/stderr, and stream_options.include_usage adds the final usage chunk before [DONE].
If you are working on this area, read CLI and Runtime Internals after the command source.
Troubleshooting
Start with:
bash
swift run mere.run statusserver: downmeans nothing answered the configured/healthURL.loaded models: unavailable (requires API key)means/healthworked but/v1/modelsneeds--api-keyorMERERUN_API_KEY.- A wrong loaded model usually means another server is already bound to that host and port.