Speech Runtime

This page covers speech synthesis, transcription, and voice-profile management.

Public surface

mere.run speech synthesize
mere.run speech transcribe
mere.run speech profile list
mere.run speech profile create
mere.run speech profile delete

Model families

Text-to-speech

speech-tts-qwen3-nano
speech-tts-qwen3-customvoice

Speech-to-text

speech-asr-qwen3
speech-asr-parakeet
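
For contributors who need to branch on model family in code, the identifiers above can be grouped as in the following minimal sketch. The SpeechModelFamily and SpeechModel types are hypothetical names chosen for illustration and are not confirmed to exist in the repository; only the four identifier strings come from this page.

```swift
/// Hypothetical grouping of the two families named on this page.
enum SpeechModelFamily: String {
    case textToSpeech = "Text-to-speech"
    case speechToText = "Speech-to-text"
}

/// Hypothetical enum; the raw values are the identifiers listed above.
enum SpeechModel: String, CaseIterable {
    case ttsQwen3Nano = "speech-tts-qwen3-nano"
    case ttsQwen3CustomVoice = "speech-tts-qwen3-customvoice"
    case asrQwen3 = "speech-asr-qwen3"
    case asrParakeet = "speech-asr-parakeet"

    /// Family derived from the identifier prefix (an assumption about naming).
    var family: SpeechModelFamily {
        rawValue.hasPrefix("speech-tts-") ? .textToSpeech : .speechToText
    }
}

// Group identifiers by family, preserving declaration order.
let ttsModels = SpeechModel.allCases.filter { $0.family == .textToSpeech }
let sttModels = SpeechModel.allCases.filter { $0.family == .speechToText }
print(ttsModels.map(\.rawValue))
print(sttModels.map(\.rawValue))
```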

Typical workflows

The examples below run the CLI from a source checkout through swift run, which builds the package before executing the command.

Synthesize speech

```bash
swift run mere.run speech synthesize \
  "Hello from mere.run" \
  --output ./hello.wav
```

Transcribe audio

```bash
swift run mere.run speech transcribe ./hello.wav --backend auto
```

Manage voice profiles

```bash
swift run mere.run speech profile list
swift run mere.run speech profile create ./reference.wav --name narrator
swift run mere.run speech profile delete narrator
```

Runtime entrypoints

CLI

Sources/MereRunCLI/Commands/SpeechSynthesizeCommand.swift
Sources/MereRunCLI/Commands/SpeechTranscribeCommand.swift
Sources/MereRunCLI/Commands/SpeechProfileListCommand.swift
Sources/MereRunCLI/Commands/SpeechProfileCreateCommand.swift
Sources/MereRunCLI/Commands/SpeechProfileDeleteCommand.swift

TTS runtime

Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator.swift
Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator+Loading.swift
Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator+Generation.swift
Sources/AudioTTS/Qwen3TTS/Qwen3TTSGenerator+Support.swift

Tokenizer internals:

Sources/AudioTTS/Qwen3TTS/Qwen3TTSSpeechTokenizer.swift
Sources/AudioTTS/Qwen3TTS/Qwen3TTSSpeechTokenizer+Encoder.swift
Sources/AudioTTS/Qwen3TTS/Qwen3TTSSpeechTokenizer+Decoder.swift

STT runtime

Sources/AudioSTT/Qwen3ASR/Qwen3ASRGenerator.swift
Sources/AudioSTT/Parakeet/ParakeetGenerator.swift

How speech synthesis flows

At a high level:

1. The CLI resolves the chosen TTS model.
2. The optional profile or voice configuration is loaded.
3. Text is converted into the model’s intermediate token representation.
4. The generator produces waveform data.
5. Audio is written to disk through the codec/output helpers.
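
The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the repository's actual API: every type and function name here (VoiceProfile, TTSModel, ToyTTSModel, resolveModel, synthesize) is hypothetical, and the toy model emits silence so the example runs without model weights.

```swift
// All names below are hypothetical; they mirror the five steps, not the real API.

struct VoiceProfile {
    let name: String
}

protocol TTSModel {
    /// Step 3: text -> intermediate token representation.
    func tokenize(_ text: String, profile: VoiceProfile?) -> [Int]
    /// Step 4: tokens -> waveform samples.
    func generateWaveform(from tokens: [Int]) -> [Float]
}

/// Toy stand-in so the sketch runs without any model weights.
struct ToyTTSModel: TTSModel {
    func tokenize(_ text: String, profile: VoiceProfile?) -> [Int] {
        // One token per Unicode scalar; a real tokenizer is model-specific.
        text.unicodeScalars.map { Int($0.value) }
    }

    func generateWaveform(from tokens: [Int]) -> [Float] {
        // One silent sample per token; a real generator would run the model here.
        [Float](repeating: 0, count: tokens.count)
    }
}

enum SynthesisError: Error {
    case unknownModel(String)
}

/// Step 1: resolve a model identifier to a model instance.
func resolveModel(_ id: String) throws -> any TTSModel {
    guard id.hasPrefix("speech-tts-") else {
        throw SynthesisError.unknownModel(id)
    }
    return ToyTTSModel()
}

/// Runs the steps in order; `write` stands in for the codec/output helpers.
func synthesize(
    text: String,
    modelID: String,
    profile: VoiceProfile?,   // Step 2: already loaded by the caller, or nil.
    write: ([Float]) throws -> Void
) throws {
    let model = try resolveModel(modelID)                // Step 1
    let tokens = model.tokenize(text, profile: profile)  // Step 3
    let samples = model.generateWaveform(from: tokens)   // Step 4
    try write(samples)                                   // Step 5
}

/// Usage: capture the samples in memory instead of writing a file.
func demo() throws -> Int {
    var written: [Float] = []
    try synthesize(
        text: "Hello from mere.run",
        modelID: "speech-tts-qwen3-nano",
        profile: nil
    ) { written = $0 }
    return written.count
}

let sampleCount = try demo()
print("samples written:", sampleCount)
```

Passing the output step in as a closure keeps the pipeline testable without touching the file system.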

How transcription flows

1. Audio input is decoded into the expected local format.
2. The selected backend loads its model components.
3. The backend produces a transcript.
4. The CLI prints or writes the output.
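
The steps above can be sketched with a backend protocol and a selection function. All names here (STTBackend, ToyBackend, BackendChoice, selectBackend, decodePCM16) are hypothetical, as are the backend labels. Two further assumptions are made only for this sketch: that auto picks the first registered backend, and that the decoded format is normalized Float samples from 16-bit PCM. The CLI's real selection rule and audio format are not documented on this page.

```swift
// Hypothetical names throughout; this mirrors the four steps, not the real API.

protocol STTBackend {
    var name: String { get }
    /// Steps 2-3: a real backend would load its model, then produce a transcript.
    func transcribe(samples: [Float]) -> String
}

/// Toy backend that reports what it received instead of running a model.
struct ToyBackend: STTBackend {
    let name: String

    func transcribe(samples: [Float]) -> String {
        "[\(name)] \(samples.count) samples"
    }
}

enum BackendChoice {
    case auto
    case named(String)
}

enum TranscriptionError: Error {
    case noBackend(String)
}

/// Picks a backend. Treating `auto` as "first registered backend" is an
/// assumption made for this sketch only.
func selectBackend(
    _ choice: BackendChoice,
    from backends: [any STTBackend]
) throws -> any STTBackend {
    switch choice {
    case .auto:
        guard let first = backends.first else {
            throw TranscriptionError.noBackend("auto")
        }
        return first
    case .named(let wanted):
        guard let match = backends.first(where: { $0.name == wanted }) else {
            throw TranscriptionError.noBackend(wanted)
        }
        return match
    }
}

/// Step 1 stand-in: convert 16-bit PCM to Float samples in [-1, 1).
func decodePCM16(_ pcm: [Int16]) -> [Float] {
    pcm.map { Float($0) / 32768.0 }
}

func transcribe(
    pcm: [Int16],
    choice: BackendChoice,
    backends: [any STTBackend]
) throws -> String {
    let samples = decodePCM16(pcm)                           // Step 1
    let backend = try selectBackend(choice, from: backends)  // Step 2
    return backend.transcribe(samples: samples)              // Step 3
}

let backends: [any STTBackend] = [
    ToyBackend(name: "qwen3"),
    ToyBackend(name: "parakeet"),
]
let transcript = try transcribe(pcm: [0, 16384, -32768], choice: .auto, backends: backends)
print(transcript)  // Step 4: the caller prints or writes the result.
```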

Notes for contributors

Speech code spans both AudioTTS and AudioSTT; do not assume it all lives under MereRunCore.
Profile management is CLI-facing, but it depends on the same canonical model and model-store conventions as the rest of the repo.

See Architecture Reading Map for a recommended reading order.
