Text-to-Speech Configuration

Route: /tts
Sidebar Group: Capabilities
Access: Owner (Cross-tenant)

Overview

Text-to-Speech (TTS) lets GoClaw read agent responses aloud using a synthesized voice. It supports multiple providers and can attach audio to replies automatically under configurable conditions.

Configuration Guide

1. Select the Primary Provider

Go to /tts and select a provider from the list:

None (Disabled) — disable TTS
OpenAI — use OpenAI TTS API
ElevenLabs — use ElevenLabs API (high-quality voices)
Edge (Free) — use Microsoft Edge TTS (free, no API key required)
MiniMax — use MiniMax TTS API

2. Configure Auto-Apply Mode

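| Value | Behavior |
|---|---|
| `off` | No automatic audio; the agent can still call the TTS tool manually |
| `always` | Every reply gets audio |
| `inbound` | Audio only when the user's message is voice/audio |
| `tagged` | Audio only when the reply contains a special tag |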
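The mode is set with the `auto` key, as in the example on this page. A minimal fragment that attaches audio to every reply using the free Edge provider might look like this. Other settings are omitted here and presumably keep their defaults:

```json5
{
  tts: {
    provider: "edge",
    auto: "always",   // every reply gets audio; alternatives: "off", "inbound", "tagged"
    edge: {
      enabled: true   // Edge must be switched on to be used as the provider
    }
  }
}
```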

3. Reply Mode

Final only — only play audio for the final complete response
All — play audio for all turns in the conversation
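The reply mode is stored in the `mode` key (`"final"` or `"all"`, as used in the example and notes on this page). A fragment switching to audio for every turn, to be merged into an existing `tts` block:

```json5
{
  tts: {
    mode: "all"   // "final" limits audio to the final response; "all" covers every turn
  }
}
```

Keep the cost note at the end of this page in mind before choosing `"all"` with a paid provider.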

4. Limits and Timeout

Max Text Length — maximum text length before truncation (default: 1500 characters)
Timeout (ms) — maximum wait time for the provider API call (default: 30000 ms)
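These limits correspond to the `max_length` and `timeout_ms` keys used in the example on this page. A sketch that tightens both (the values are illustrative, not recommendations):

```json5
{
  tts: {
    max_length: 800,   // text longer than 800 characters is truncated
    timeout_ms: 15000  // stop waiting for the provider after 15 seconds
  }
}
```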

5. Per-Provider Settings

Each provider has its own parameters:

OpenAI TTS

API Key (from env OPENAI_API_KEY)
Model: default gpt-4o-mini-tts
Voice: alloy, echo, fable, onyx, nova, shimmer
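A sketch of an OpenAI setup. The provider identifier `"openai"` and the `openai` sub-keys (`model`, `voice`) are assumptions modeled on the `edge` block in the example on this page; confirm the exact names against the settings saved from the /tts page. The API key is deliberately absent because it is read from `OPENAI_API_KEY`.

```json5
{
  tts: {
    provider: "openai",          // assumed identifier
    openai: {                    // assumed sub-keys, modeled on the edge block
      model: "gpt-4o-mini-tts",
      voice: "nova"              // one of: alloy, echo, fable, onyx, nova, shimmer
    }
    // no API key here: it is read from the OPENAI_API_KEY environment variable
  }
}
```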

ElevenLabs

API Key (from env ELEVENLABS_API_KEY)
Voice ID: default pMsXgVXv3BLzUgSXRplE
Model ID: default eleven_multilingual_v2
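A comparable sketch for ElevenLabs. The identifier `"elevenlabs"` and the snake_case sub-keys `voice_id` and `model_id` are assumptions (snake_case mirrors `max_length` and `timeout_ms` in the example on this page); the values shown are the documented defaults.

```json5
{
  tts: {
    provider: "elevenlabs",              // assumed identifier
    elevenlabs: {                        // assumed sub-keys
      voice_id: "pMsXgVXv3BLzUgSXRplE",  // default voice
      model_id: "eleven_multilingual_v2" // default model
    }
    // no API key here: it is read from the ELEVENLABS_API_KEY environment variable
  }
}
```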

Edge TTS

Enabled toggle
Voice: default en-US-MichelleNeural
Speech Rate: reading speed (e.g., +10%, -20%)
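The Edge settings map to the `enabled`, `voice`, and `rate` keys used in the example on this page. A variant with the default English voice and slightly faster speech:

```json5
{
  tts: {
    provider: "edge",
    edge: {
      enabled: true,
      voice: "en-US-MichelleNeural",  // default voice
      rate: "+10%"                    // 10% faster than normal; "-20%" would slow it down
    }
  }
}
```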

MiniMax

API Key and Group ID
Model: default speech-02-hd
Voice ID: select a voice
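A sketch for MiniMax under the same assumptions as the other providers: the identifier `"minimax"` and the sub-keys `model` and `voice_id` are guesses, and `"<voice-id>"` is a placeholder for a voice chosen on the /tts page. This page does not say where the Group ID is stored, so it is left out.

```json5
{
  tts: {
    provider: "minimax",      // assumed identifier
    minimax: {                // assumed sub-keys
      model: "speech-02-hd",  // default model
      voice_id: "<voice-id>"  // placeholder: replace with the voice selected on /tts
    }
    // the API key is not stored in config.json (see Notes)
  }
}
```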

User Interface (UI)

The TTS configuration page includes:

Status card — displays the active provider and connection status
General Settings — primary provider, auto-apply mode, reply mode, max text length, timeout
Per-provider settings cards — each provider has a dedicated card with corresponding fields
Save button — save all settings
Refresh button — reload from server

Example

Configure TTS with Edge TTS (free) and play audio automatically only when the user sends a voice message:

```json5
{
  tts: {
    provider: "edge",
    auto: "inbound",
    mode: "final",
    max_length: 1500,
    timeout_ms: 30000,
    edge: {
      enabled: true,
      voice: "vi-VN-HoaiMyNeural",  // Vietnamese voice
      rate: "+0%"
    }
  }
}
```

Notes

TTS provider API keys are not stored in config.json — use environment variables or .env.local
Edge TTS does not require an API key but depends on internet connectivity
auto: "tagged" requires the agent to return a special tag in its response — check the corresponding agent documentation
With mode: "all", the number of TTS API calls rises significantly, so monitor costs when using OpenAI or ElevenLabs
