Speech Runtime
This page covers speech synthesis, transcription, and voice-profile management.
Public surface
- `mere.run speech synthesize`
- `mere.run speech transcribe`
- `mere.run speech profile list`
- `mere.run speech profile create`
- `mere.run speech profile delete`
Model families
Text-to-speech
- `speech-tts-qwen3-nano`
- `speech-tts-qwen3-customvoice`
Speech-to-text
- `speech-asr-qwen3`
- `speech-asr-parakeet`
Typical workflows
Synthesize speech
```bash
swift run mere.run speech synthesize \
  "Hello from mere.run" \
  --output ./hello.wav
```

Transcribe audio
```bash
swift run mere.run speech transcribe ./hello.wav --backend auto
```

Manage voice profiles
```bash
swift run mere.run speech profile list
swift run mere.run speech profile create ./reference.wav --name narrator
swift run mere.run speech profile delete narrator
```

Runtime entrypoints
CLI
- `Sources/MereRunCLI/Commands/SpeechSynthesizeCommand.swift`
- `Sources/MereRunCLI/Commands/SpeechTranscribeCommand.swift`
- `Sources/MereRunCLI/Commands/SpeechProfileListCommand.swift`
- `Sources/MereRunCLI/Commands/SpeechProfileCreateCommand.swift`
- `Sources/MereRunCLI/Commands/SpeechProfileDeleteCommand.swift`
TTS runtime
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator.swift`
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator+Loading.swift`
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator+Generation.swift`
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator+Support.swift`
Tokenizer internals:
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSSpeechTokenizer.swift`
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSSpeechTokenizer+Encoder.swift`
- `Sources/AudioTTS/Qwen3TTS/Qwen3TTSSpeechTokenizer+Decoder.swift`
STT runtime
- `Sources/AudioSTT/Qwen3ASR/Qwen3ASRGenerator.swift`
- `Sources/AudioSTT/Parakeet/ParakeetGenerator.swift`
How speech synthesis flows
At a high level:
- the CLI resolves the chosen TTS model
- optional profile or voice configuration is loaded
- text is converted into the model’s intermediate token representation
- the generator produces waveform data
- audio is written to disk through the codec/output helpers
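From the command line, the steps above all happen inside a single `speech synthesize` invocation. The following is a minimal wrapper sketch that uses only the command and `--output` flag shown earlier on this page. `MERE_RUN` and `synthesize_to` are hypothetical names introduced for illustration; they are not part of mere.run.

```bash
# Hypothetical override hook (an assumption, not a mere.run feature):
# set MERE_RUN=echo to print the command instead of running it.
MERE_RUN="${MERE_RUN:-swift run mere.run}"

# synthesize_to TEXT OUTPUT_PATH
# Rejects empty text, then hands TEXT to the CLI, which resolves the model,
# generates the waveform, and writes OUTPUT_PATH.
synthesize_to() {
  local text="$1" output="$2"
  if [ -z "$text" ]; then
    echo "synthesize_to: text must not be empty" >&2
    return 1
  fi
  # $MERE_RUN is left unquoted on purpose: it holds a command plus arguments.
  $MERE_RUN speech synthesize "$text" --output "$output"
}
```

After sourcing the snippet, `synthesize_to "Hello from mere.run" ./hello.wav` should behave like the synthesis example in Typical workflows.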
How transcription flows
- audio input is decoded into the expected local format
- the selected backend loads its model components
- the backend produces a transcript
- the CLI prints or writes the output
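Because each transcription is one CLI call, batch jobs can be scripted around it. The sketch below transcribes every `.wav` file in a directory. It assumes the transcript is printed to standard output, so that a shell redirect can capture it; `MERE_RUN` and `transcribe_dir` are hypothetical names, not part of mere.run.

```bash
# Hypothetical override hook (an assumption, not a mere.run feature):
# set MERE_RUN=echo to print each command instead of running it.
MERE_RUN="${MERE_RUN:-swift run mere.run}"

# transcribe_dir DIR
# For each DIR/<name>.wav, write the CLI's standard output to DIR/<name>.txt.
transcribe_dir() {
  local dir="$1" wav
  for wav in "$dir"/*.wav; do
    # When nothing matches, the glob stays literal; skip that non-file.
    [ -e "$wav" ] || continue
    # Assumption: the transcript arrives on stdout, so a redirect captures it.
    $MERE_RUN speech transcribe "$wav" --backend auto > "${wav%.wav}.txt" || return 1
  done
}
```

The `--backend auto` value is the one used in the transcription example in Typical workflows; a specific backend could be substituted if the CLI accepts one.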
Notes for contributors
- speech code spans both `AudioTTS` and `AudioSTT`; do not assume it all lives under `MereRunCore`
- profile management is CLI-facing but depends on the same canonical model and model-store conventions as the rest of the repo
See Architecture Reading Map for a recommended reading order.