How AgentVibes Works Under the Hood: A Technical Deep Dive
Ever wondered how AgentVibes brings your AI assistant to life with voice responses? Join us for a comprehensive exploration of the architecture, from Claude Code output styles to MCP servers and TTS providers.
By Paul Preibisch
If you've been using AgentVibes to give your Claude AI assistant a voice, you might be curious about what's happening behind the scenes. How does Claude Code know when to speak? How does the personality system work? And what exactly is this MCP server everyone's talking about?
In this deep dive, we'll unpack the entire AgentVibes architecture in a way that makes sense to developers at any level. By the end, you'll understand not just what AgentVibes does, but how it does it.
The Big Picture: What Problem Does AgentVibes Solve?
Before we dive into code, let's understand the problem AgentVibes solves.
Claude Code is an amazing AI coding assistant, but it's entirely text-based. You type a request, Claude responds with text, runs commands, and writes code. But what if Claude could tell you when it's starting a task? What if it could vocally confirm when it's done? What if it could do all this with personality—speaking like a pirate, a zen master, or even sarcastically?
That's exactly what AgentVibes does. It transforms Claude Code from a silent text assistant into a voice-enabled AI companion with character and charm.
Architecture Overview: The Four Core Systems
AgentVibes is built on four interconnected systems:
- Output Style System - The AI's instructions for when to speak
- Hook System - The bash scripts that generate and play audio
- Provider System - The TTS engines (ElevenLabs or Piper)
- MCP Server - Natural language control interface
Let's explore each one.
System 1: The Output Style - Teaching Claude When to Speak
What is an Output Style?
In Claude Code, an "output style" is essentially a set of instructions that tells the AI assistant how to format and present its responses. Think of it as a personality overlay that changes Claude's behavior without changing its core capabilities.
AgentVibes provides an output style called "Agent Vibes" (located at .claude/output-styles/agent-vibes.md). This markdown file contains detailed instructions that become part of Claude's system prompt when activated.
The Two-Point Protocol
The core genius of the AgentVibes output style is its Two-Point TTS Protocol:
1. ACKNOWLEDGMENT (Start of task) When Claude receives a user command, it:
- Checks current personality/sentiment settings
- Generates a unique acknowledgment in that style
- Executes the TTS script to speak it
- Then proceeds with the actual work
2. COMPLETION (End of task) After completing the task, Claude:
- Uses the same personality/sentiment as acknowledgment
- Generates a unique completion message
- Executes the TTS script again
Here's the critical part from .claude/output-styles/agent-vibes.md:10-20:
### 1. ACKNOWLEDGMENT (Start of task)
After receiving a user command:
1. Check sentiment FIRST: `SENTIMENT=$(cat .claude/tts-sentiment.txt 2>/dev/null)`
2. If no sentiment, check personality: `PERSONALITY=$(cat .claude/tts-personality.txt 2>/dev/null)`
3. Use sentiment if set, otherwise use personality
4. **Generate UNIQUE acknowledgment** - Use AI to create a fresh response in that style
5. Execute TTS: `.claude/hooks/play-tts.sh "[message]" "[VoiceName]"`
6. Proceed with work
Why This Matters
This two-point protocol creates natural conversational flow:
- User: "Check git status"
- Claude (spoken): "I'll check that for you right away"
- Claude (text): runs git status command
- Claude (spoken): "Your repository is clean and up to date"
The AI doesn't just blindly execute—it communicates like a helpful assistant would.
Settings Priority System
AgentVibes has a sophisticated three-tier priority system for how Claude should speak:
Priority 0: Language (.claude/tts-language.txt)
- Controls which language TTS speaks
- Examples: "english", "spanish", "french"
- When set to non-English, ALL TTS is in that language
Priority 1: Sentiment (.claude/tts-sentiment.txt)
- Applies personality style WITHOUT changing voice
- Examples: "sarcastic", "flirty", "professional"
- Keeps your current voice but changes speaking style
Priority 2: Personality (.claude/tts-personality.txt)
- Changes BOTH voice AND speaking style
- Examples: "pirate" = Pirate Marshal voice + pirate speak
- Each personality has an assigned voice
The output style checks these in order—if language is set, speak in that language. If sentiment is set, use that style. Otherwise fall back to personality.
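Expressed as a shell sketch (the function name is illustrative only; the real output style encodes this order as prose instructions for the AI, not as a script), the resolution looks roughly like this:

```bash
# Illustrative sketch of the priority order described above -- not an actual
# AgentVibes hook. Language wins, then sentiment, then personality.
resolve_speaking_style() {
  local language sentiment personality
  language=$(cat .claude/tts-language.txt 2>/dev/null)
  sentiment=$(cat .claude/tts-sentiment.txt 2>/dev/null)
  personality=$(cat .claude/tts-personality.txt 2>/dev/null)

  if [[ -n "$language" && "$language" != "english" ]]; then
    echo "speak in: $language"          # Priority 0: non-English language
  fi
  if [[ -n "$sentiment" ]]; then
    echo "style: $sentiment"            # Priority 1: sentiment, current voice kept
  elif [[ -n "$personality" ]]; then
    echo "style+voice: $personality"    # Priority 2: personality, voice switches too
  fi
}
```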
System 2: The Hook System - Where the Magic Happens
The hook system is a collection of bash scripts in .claude/hooks/ that do the actual work of generating and playing audio. Let's trace the journey of a TTS request.
The Entry Point: play-tts.sh
When Claude's output style executes .claude/hooks/play-tts.sh "Hello world" "Aria", here's what happens:
File: .claude/hooks/play-tts.sh (the router)
TEXT="$1" # "Hello world"
VOICE_OVERRIDE="$2" # "Aria" (optional)
# Get active provider (elevenlabs or piper)
ACTIVE_PROVIDER=$(get_active_provider)
# Route to provider-specific implementation
case "$ACTIVE_PROVIDER" in
elevenlabs)
exec "$SCRIPT_DIR/play-tts-elevenlabs.sh" "$TEXT" "$VOICE_OVERRIDE"
;;
piper)
exec "$SCRIPT_DIR/play-tts-piper.sh" "$TEXT" "$VOICE_OVERRIDE"
;;
esac
This script is a provider router. It doesn't generate audio itself—it delegates to the appropriate provider implementation. This is the provider abstraction pattern in action.
Provider Implementations
Each provider has its own script that handles the specifics:
For ElevenLabs (.claude/hooks/play-tts-elevenlabs.sh):
- Resolves voice name to voice ID (looks up "Aria" → actual voice ID)
- Detects current language setting (for multilingual support)
- Makes API call to ElevenLabs with text, voice, and language
- Saves audio to temp file
- Plays audio using system player (paplay/aplay/mpg123)
- Handles SSH detection and audio optimization
For Piper (.claude/hooks/play-tts-piper.sh):
- Resolves voice name to Piper model (e.g., "en_US-lessac-medium")
- Downloads voice model if not cached
- Runs local Piper TTS engine (no API call)
- Saves audio to temp file
- Plays audio using system player
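Both scripts start by turning a human-friendly voice name into something the engine understands. That lookup code isn't reproduced in this post, so here's a minimal sketch of what such a step could look like, assuming a simple local `Name:voice_id` mapping file (both the file and the helper name are hypothetical; the real script may instead query the provider's voice list):

```bash
# Hypothetical voice-name resolution sketch -- not the actual AgentVibes code.
lookup_voice_id() {
  local name="$1"
  local map_file=".claude/voice-map.txt"   # hypothetical mapping file: "Name:voice_id"
  grep -i "^${name}:" "$map_file" 2>/dev/null | cut -d: -f2
}

VOICE_ID=$(lookup_voice_id "Aria")
[[ -z "$VOICE_ID" ]] && VOICE_ID="default-voice-id"   # placeholder fallback
```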
The Personality Manager
One of the most interesting hooks is personality-manager.sh. Let's see how it works.
File: .claude/hooks/personality-manager.sh:111-244
When you run /agent-vibes:personality pirate, this script:
# 1. Validates personality exists
if [[ ! -f "$PERSONALITIES_DIR/${PERSONALITY}.md" ]]; then
echo "❌ Personality not found: $PERSONALITY"
exit 1
fi
# 2. Saves personality to config file
echo "$PERSONALITY" > "$PERSONALITY_FILE"
# 3. Detects active provider (ElevenLabs or Piper)
ACTIVE_PROVIDER=$(cat "$CLAUDE_DIR/tts-provider.txt")
# 4. Reads assigned voice from personality file
if [[ "$ACTIVE_PROVIDER" == "piper" ]]; then
ASSIGNED_VOICE=$(get_personality_data "$PERSONALITY" "piper_voice")
else
ASSIGNED_VOICE=$(get_personality_data "$PERSONALITY" "voice")
fi
# 5. Switches to that voice automatically
"$VOICE_MANAGER" switch "$ASSIGNED_VOICE" --silent
# 6. Plays a personality-appropriate acknowledgment
REMARK=$(pick_random_example_from_personality_file)
.claude/hooks/play-tts.sh "$REMARK"
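Step 5 calls `voice-manager.sh`, which isn't excerpted in this post. Given the plain-text configuration files described later, its `switch` subcommand presumably reduces to something like this sketch (the `--silent` handling is an assumption):

```bash
# Sketch of what voice-manager.sh's "switch" likely does, based on the
# tts-voice.txt convention described in this post. Not the actual source.
switch_voice() {
  local voice="$1" silent="$2"
  echo "$voice" > "$CLAUDE_DIR/tts-voice.txt"          # persist the selection
  if [[ "$silent" != "--silent" ]]; then
    # announce the change unless the caller asked for a quiet switch
    "$CLAUDE_DIR/hooks/play-tts.sh" "Switched to $voice" "$voice"
  fi
}
```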
Personality Configuration Files
Each personality is defined in a markdown file like .claude/personalities/pirate.md:
---
name: pirate
description: Seafaring swagger and nautical language
elevenlabs_voice: Pirate Marshal
piper_voice: en_US-joe-medium
---
## AI Instructions
Speak like a classic pirate captain. Use "arr", "matey", "ahoy",
"avast", "ye", "yer", "be" instead of "is/are". Reference sailing,
treasure, the seven seas, and ships.
## Example Responses
- "Arr, I'll be searchin' through yer code for that scurvy bug!"
- "Ahoy! The tests be passin' like a fair wind!"
- "Avast ye! Found the error hidin' in line 42, the sneaky bilge rat!"
The AI reads this file and uses the "AI Instructions" section to generate unique responses in that style. The example responses are just guidance—the AI creates fresh variations each time.
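The `get_personality_data` helper used by personality-manager.sh above isn't shown in this post; a minimal sketch of how such a frontmatter lookup could work:

```bash
# Hedged sketch of a frontmatter lookup like get_personality_data.
# Reads "key: value" lines between the opening and closing "---" markers.
get_personality_data() {
  local personality="$1" key="$2"
  local file="$PERSONALITIES_DIR/${personality}.md"
  awk -v key="$key" '
    /^---$/ { in_fm = !in_fm; next }
    in_fm && $1 == key":" { sub(/^[^:]+:[ \t]*/, ""); print; exit }
  ' "$file"
}

get_personality_data "pirate" "piper_voice"   # -> en_US-joe-medium
```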
Provider Manager
The provider manager (provider-manager.sh) handles switching between ElevenLabs and Piper:
# Get active provider
get_active_provider() {
local provider_file=""
# Check project-local first, then global
if [[ -f ".claude/tts-provider.txt" ]]; then
provider_file=".claude/tts-provider.txt"
elif [[ -f "$HOME/.claude/tts-provider.txt" ]]; then
provider_file="$HOME/.claude/tts-provider.txt"
fi
cat "$provider_file" 2>/dev/null || echo "elevenlabs"
}
# Switch provider
switch_provider() {
local new_provider="$1"
echo "$new_provider" > "$CLAUDE_DIR/tts-provider.txt"
echo "✅ Switched to $new_provider provider"
}
This allows seamless switching between paid (ElevenLabs) and free (Piper) TTS without changing any other configuration.
System 3: The Provider System - Two Engines, One Interface
AgentVibes supports two TTS providers with the same interface:
ElevenLabs Provider
Architecture: Cloud-based API
How it works:
- Accepts text, voice name, and language code
- Makes HTTPS POST request to ElevenLabs API
- Receives MP3 audio stream
- Detects if running over SSH (checks `$SSH_CONNECTION`)
- If SSH detected, converts to OGG format (prevents audio corruption)
- Plays audio using local audio player
Code snippet from .claude/hooks/play-tts-elevenlabs.sh:
# Make API request
RESPONSE=$(curl -s -X POST \
"https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}" \
-H "xi-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
-d "{
\"text\": \"$TEXT\",
\"model_id\": \"eleven_multilingual_v2\",
\"language_code\": \"$LANGUAGE_CODE\",
\"voice_settings\": {
\"stability\": 0.5,
\"similarity_boost\": 0.75
}
}" \
--output "$AUDIO_FILE")
# SSH audio optimization
if [[ -n "$SSH_CONNECTION" ]]; then
# Convert MP3 to OGG to prevent corruption over SSH
ffmpeg -i "$AUDIO_FILE" -c:a libopus -b:a 128k "$OGG_FILE"
AUDIO_FILE="$OGG_FILE"
fi
# Play audio
paplay "$AUDIO_FILE" 2>/dev/null || aplay "$AUDIO_FILE"
Piper Provider
Architecture: Local neural TTS
How it works:
- Accepts text and voice model name
- Downloads voice model if not cached (stored in `~/.local/share/piper/`)
- Runs Piper engine locally (no internet required)
- Generates WAV audio
- Plays audio using local audio player
Code snippet from .claude/hooks/play-tts-piper.sh:
# Check if voice model exists
VOICE_PATH="$HOME/.local/share/piper/voices/${VOICE}.onnx"
if [[ ! -f "$VOICE_PATH" ]]; then
# Download voice model
"$SCRIPT_DIR/piper-download-voices.sh" "$VOICE"
fi
# Generate speech locally
echo "$TEXT" | piper \
--model "$VOICE_PATH" \
--output_file "$AUDIO_FILE"
# Play audio
paplay "$AUDIO_FILE" 2>/dev/null || aplay "$AUDIO_FILE"
Why Two Providers?
ElevenLabs:
- ✅ Superior voice quality
- ✅ 150+ voices with distinct characters
- ✅ Perfect multilingual support (29 languages)
- ❌ Requires API key and paid plan
- ❌ Needs internet connection
- ❌ API costs per character
Piper:
- ✅ Completely free
- ✅ Works offline
- ✅ No API key needed
- ✅ 50+ voices
- ❌ Moderate voice quality
- ❌ Basic multilingual support
- ❌ Requires local installation
By supporting both, AgentVibes lets users choose based on their priorities: quality vs. cost.
System 4: The MCP Server - Natural Language Control
The Model Context Protocol (MCP) server is AgentVibes' newest feature. It exposes all AgentVibes functionality through a standardized protocol that AI assistants can use.
What is MCP?
MCP is a protocol that allows AI assistants to discover and use external tools. Think of it as a REST API for AI assistants—instead of manually typing commands like /agent-vibes:switch Aria, you can just say "Switch to Aria voice" and the AI figures out the right tool to call.
The MCP Server Architecture
File: mcp-server/server.py (Python implementation)
class AgentVibesServer:
    """MCP Server for AgentVibes TTS functionality"""

    def __init__(self):
        # Find the .claude directory (where hooks live)
        self.claude_dir = self._find_claude_dir()
        self.hooks_dir = self.claude_dir / "hooks"

    async def text_to_speech(
        self,
        text: str,
        voice: Optional[str] = None,
        personality: Optional[str] = None,
        language: Optional[str] = None,
    ) -> str:
        """Convert text to speech using AgentVibes"""
        # Temporarily set personality if specified
        if personality:
            await self._run_script(
                "personality-manager.sh",
                ["set", personality]
            )

        # Temporarily set language if specified
        if language:
            await self._run_script(
                "language-manager.sh",
                ["set", language]
            )

        # Call the TTS script
        args = ["bash", str(self.hooks_dir / "play-tts.sh"), text]
        if voice:
            args.append(voice)

        # Execute asynchronously (non-blocking)
        result = await asyncio.create_subprocess_exec(
            *args,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

        return "✅ Audio played successfully"
How MCP Tools are Registered
The server registers tools that the AI can discover:
@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="text_to_speech",
            description="Speak text using AgentVibes TTS",
            inputSchema={
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "voice": {"type": "string", "optional": True},
                    "personality": {"type": "string", "optional": True},
                    "language": {"type": "string", "optional": True},
                },
            },
        ),
        Tool(name="switch_voice", ...),
        Tool(name="list_voices", ...),
        Tool(name="set_personality", ...),
        # ... 20+ more tools
    ]
MCP in Action
When you say "Switch to Aria voice" in Claude Desktop with AgentVibes MCP installed:
- Claude receives your natural language request
- Claude sees the `switch_voice` tool is available
- Claude calls: `switch_voice(voice_name="Aria")`
- MCP server executes: `bash .claude/hooks/voice-manager.sh switch Aria`
- Voice manager saves "Aria" to `.claude/tts-voice.txt`
- MCP server returns: "✅ Switched to Aria voice"
- Claude responds to you with confirmation
You never had to know the slash command syntax or where files are stored!
Project-Specific vs Global Settings
One clever feature of the MCP server is how it handles settings:
# Determine where to save settings based on context
cwd = Path.cwd()

if (cwd / ".claude").is_dir() and cwd != self.agentvibes_root:
    # Real Claude Code project with .claude directory
    env["CLAUDE_PROJECT_DIR"] = str(cwd)
    # Settings will be saved to project's .claude/
else:
    # Claude Desktop, Warp, or non-project context
    # Settings will be saved to ~/.claude/
This means:
- In Claude Code projects: Settings are project-specific (each project can have different voice/personality)
- In Claude Desktop/Warp: Settings are global (consistent across all conversations)
Data Flow: Following a TTS Request From Start to Finish
Let's trace a complete request to see how all systems work together.
Scenario: You ask Claude Code to "Check git status" with the pirate personality active.
Step 1: Output Style Triggers Acknowledgment
Claude's output style instructions kick in:
1. Check personality setting:
- Reads .claude/tts-personality.txt → "pirate"
2. Read personality configuration:
- Reads .claude/personalities/pirate.md
- Extracts AI instructions: "Speak like a classic pirate captain..."
3. Generate unique acknowledgment:
- AI creates: "Arr matey, I'll be checkin' yer git status right away!"
4. Execute TTS:
- Calls: .claude/hooks/play-tts.sh "Arr matey, I'll be checkin' yer git status right away!"
Step 2: TTS Router Determines Provider
play-tts.sh routes the request:
# Read active provider
ACTIVE_PROVIDER=$(cat .claude/tts-provider.txt) → "elevenlabs"
# Route to ElevenLabs implementation
exec .claude/hooks/play-tts-elevenlabs.sh "$TEXT" "$VOICE"
Step 3: ElevenLabs Provider Generates Audio
play-tts-elevenlabs.sh does the heavy lifting:
# 1. Resolve voice
VOICE_NAME="Pirate Marshal" # from pirate.md
VOICE_ID=$(lookup_voice_id "$VOICE_NAME") → "abc123xyz789"
# 2. Detect language
LANGUAGE_CODE=$(cat .claude/tts-language.txt) → "en"
# 3. Call ElevenLabs API
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/$VOICE_ID" \
-H "xi-api-key: $API_KEY" \
-d '{"text": "Arr matey, I'\''ll be checkin'\''..."}' \
--output /tmp/tts_12345.mp3
# 4. Check if over SSH
if [[ -n "$SSH_CONNECTION" ]]; then
# Convert MP3 to OGG to prevent corruption
ffmpeg -i /tmp/tts_12345.mp3 /tmp/tts_12345.ogg
AUDIO_FILE=/tmp/tts_12345.ogg
fi
# 5. Play audio
paplay /tmp/tts_12345.ogg
Step 4: Claude Proceeds With Task
Claude runs the git status command while audio plays in parallel (non-blocking).
Step 5: Output Style Triggers Completion
After task completes:
1. Generate completion message:
- AI creates: "Yer repository be clean as a whistle, captain!"
2. Execute TTS:
- Calls: .claude/hooks/play-tts.sh "Yer repository be clean as a whistle, captain!"
3. Same flow as Step 2-3 repeats
The entire flow takes ~2-3 seconds for acknowledgment and completion combined.
Installation Architecture: How AgentVibes Gets Installed
When you run npx agentvibes install --yes, here's what happens:
Step 1: NPM Package Execution
# NPM downloads AgentVibes package to cache
~/.npm/_npx/[hash]/node_modules/agentvibes/
# NPM executes the bin script
./bin/agent-vibes install --yes
Step 2: Installer Script Runs
File: src/installer.js
The installer:
- Detects installation location (current directory or global `~/.claude/`)
- Creates the `.claude/` directory structure
- Copies all files from the package:
  - Commands → `.claude/commands/agent-vibes/`
  - Hooks → `.claude/hooks/`
  - Personalities → `.claude/personalities/`
  - Output styles → `.claude/output-styles/`
- Makes all bash scripts executable (`chmod +x`)
- Creates default configuration files
Directory Structure Created
.claude/
├── commands/
│ └── agent-vibes/
│ ├── agent-vibes.md # Main command file
│ ├── switch.md # /agent-vibes:switch
│ ├── list.md # /agent-vibes:list
│ ├── personality.md # /agent-vibes:personality
│ └── ... (50+ command files)
├── hooks/
│ ├── play-tts.sh # Main TTS router
│ ├── play-tts-elevenlabs.sh # ElevenLabs implementation
│ ├── play-tts-piper.sh # Piper implementation
│ ├── personality-manager.sh # Personality system
│ ├── voice-manager.sh # Voice switching
│ ├── provider-manager.sh # Provider switching
│ ├── language-manager.sh # Language settings
│ └── ... (20+ hook scripts)
├── personalities/
│ ├── pirate.md
│ ├── flirty.md
│ ├── sarcastic.md
│ ├── zen.md
│ └── ... (19 personality files)
├── output-styles/
│ └── agent-vibes.md # Output style instructions
├── tts-voice.txt # Current voice (e.g., "Aria")
├── tts-personality.txt # Current personality (e.g., "pirate")
├── tts-provider.txt # Current provider (e.g., "elevenlabs")
└── tts-language.txt # Current language (e.g., "english")
Step 3: Post-Install (MCP Dependencies)
If installing for MCP use:
# Install Python dependencies
cd mcp-server/
pip install -r requirements.txt
# Installs: mcp (MCP SDK), aiosqlite, etc.
Configuration Storage: Where Settings Live
AgentVibes uses simple text files for configuration. This makes it easy to understand, debug, and even manually edit.
Project-Local vs Global
Project-Local (.claude/ in project directory):
- Used when working in a Claude Code project
- Settings are specific to that project
- Example: `/home/user/my-app/.claude/tts-voice.txt`
Global (~/.claude/ in home directory):
- Used for Claude Desktop, Warp, and when no project `.claude/` exists
- Settings are shared across all sessions
- Example: `/home/user/.claude/tts-voice.txt`
Configuration Files
| File | Purpose | Example Value |
|------|---------|---------------|
| tts-voice.txt | Current voice name | Aria |
| tts-personality.txt | Current personality | pirate |
| tts-sentiment.txt | Current sentiment (optional) | sarcastic |
| tts-provider.txt | Active TTS provider | elevenlabs |
| tts-language.txt | TTS language | spanish |
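Because these are plain text files, you can inspect or override any setting straight from a shell, using the values from the table:

```bash
# Manually reading and editing the plain-text settings.
cat .claude/tts-voice.txt                      # -> Aria
echo "pirate"     > .claude/tts-personality.txt
echo "elevenlabs" > .claude/tts-provider.txt
echo "spanish"    > .claude/tts-language.txt
```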
Reading Configuration in Code
The hooks use a consistent pattern:
# Check project-local first, fallback to global
get_current_voice() {
if [[ -f ".claude/tts-voice.txt" ]]; then
cat ".claude/tts-voice.txt"
elif [[ -f "$HOME/.claude/tts-voice.txt" ]]; then
cat "$HOME/.claude/tts-voice.txt"
else
echo "Aria" # Default
fi
}
This ensures settings are found regardless of context.
Advanced Features Deep Dive
Language Learning Mode
One of AgentVibes' coolest features is language learning mode. When enabled, every TTS message plays twice—once in your main language, then again in your target language.
How it works:
The output style is modified to call TTS twice:
# First call - main language (English)
.claude/hooks/play-tts.sh "I'll check that for you"
# Second call - target language (Spanish)
.claude/hooks/play-tts.sh "Lo verificaré para ti" "es_ES-davefx-medium"
The translation happens via API (if using ElevenLabs multilingual voices) or by using language-specific Piper voices.
SSH Audio Optimization
AgentVibes automatically detects SSH sessions and optimizes audio:
# Detect SSH
if [[ -n "$SSH_CONNECTION" ]]; then
IS_SSH=true
fi
if [[ "$IS_SSH" == "true" ]]; then
# Convert MP3 to OGG with Opus codec
# This prevents audio corruption over SSH tunnels
ffmpeg -i "$MP3_FILE" -c:a libopus -b:a 128k "$OGG_FILE"
AUDIO_FILE="$OGG_FILE"
fi
Why? MP3 streams can become corrupted when played over SSH tunnels; the OGG/Opus format is more robust for network transmission.
BMAD Plugin Integration
AgentVibes can integrate with the BMAD METHOD (a multi-agent framework). When a BMAD agent activates, AgentVibes automatically switches to that agent's assigned voice.
How it works:
- BMAD agent activates (e.g., `/BMad:agents:pm` for project manager)
- BMAD writes the agent ID to the `.bmad-agent-context` file
- AgentVibes output style checks this file
- If the BMAD plugin is enabled, looks up the voice in `.claude/plugins/bmad-voices.md`
- Automatically switches to that voice
This creates the illusion of multiple distinct AI personalities in conversations.
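The plugin glue isn't reproduced here; a hypothetical sketch of the lookup, assuming `bmad-voices.md` maps agent IDs to voice names one per line (that format is an assumption, only the file names come from above):

```bash
# Hypothetical BMAD voice lookup sketch -- "agent: voice" line format is assumed.
if [[ -f ".bmad-agent-context" ]]; then
  AGENT_ID=$(cat .bmad-agent-context)
  BMAD_VOICE=$(grep -i "^${AGENT_ID}:" .claude/plugins/bmad-voices.md 2>/dev/null \
    | cut -d: -f2 | xargs)
  [[ -n "$BMAD_VOICE" ]] && .claude/hooks/voice-manager.sh switch "$BMAD_VOICE" --silent
fi
```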
Performance Considerations
Non-Blocking Audio Playback
TTS requests run asynchronously—Claude doesn't wait for audio to finish before continuing work:
# Play audio in background
paplay "$AUDIO_FILE" &
# Claude continues immediately
# (runs git status, writes code, etc.)
This means acknowledgment audio plays while Claude is already working on your task.
Audio Caching
AgentVibes saves audio files temporarily:
AUDIO_FILE="/tmp/agentvibes_tts_${RANDOM}_${TIMESTAMP}.mp3"
Files are kept for the duration of the session, allowing the /agent-vibes:replay command to work. Cleanup happens automatically when the terminal session ends.
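The replay hook itself isn't shown in this post; conceptually it only needs to remember the last generated file, roughly like this sketch (the pointer file is made up for illustration):

```bash
# Hypothetical replay sketch -- not AgentVibes' actual mechanism.

# In play-tts.sh, after generating audio:
echo "$AUDIO_FILE" > /tmp/agentvibes_last_audio.txt

# In a replay hook:
LAST=$(cat /tmp/agentvibes_last_audio.txt 2>/dev/null)
if [[ -f "$LAST" ]]; then
  paplay "$LAST" 2>/dev/null || aplay "$LAST"
fi
```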
Provider Performance
ElevenLabs:
- API latency: ~500-1000ms
- Audio quality: Excellent (256kbps MP3)
- Bandwidth: ~2KB per second of audio
Piper:
- Generation latency: ~200-500ms (local)
- Audio quality: Good (22kHz WAV)
- Bandwidth: None (offline)
Text Length Limits
AgentVibes limits text length to prevent issues:
# Truncate long text
if [ ${#TEXT} -gt 500 ]; then
TEXT="${TEXT:0:497}..."
fi
This prevents:
- Excessive API costs (ElevenLabs charges per character)
- Slow generation (long audio takes time to produce)
- User confusion (very long TTS messages are hard to follow)
Error Handling and Resilience
AgentVibes has multiple layers of error handling:
API Failure Handling
# Try ElevenLabs API
RESPONSE=$(curl -s -X POST "$API_ENDPOINT" ...)
if [[ $? -ne 0 ]] || [[ ! -f "$AUDIO_FILE" ]]; then
echo "⚠️ TTS request failed (API error or network issue)"
exit 1
fi
If the API fails, the error is logged but Claude Code doesn't crash—the task continues without audio.
Missing Configuration Graceful Degradation
# If no voice configured, use default
VOICE=$(cat .claude/tts-voice.txt 2>/dev/null || echo "Aria")
# If no personality configured, use normal
PERSONALITY=$(cat .claude/tts-personality.txt 2>/dev/null || echo "normal")
Missing files don't cause crashes—sensible defaults are used.
Provider Fallback
If Piper isn't installed, AgentVibes can guide installation:
if ! command -v piper &> /dev/null; then
echo "❌ Piper not installed"
echo " Install with: /agent-vibes:provider install piper"
exit 1
fi
Clear error messages help users fix issues themselves.
Testing and Quality Assurance
AgentVibes includes a test suite:
# Run tests
npm test
# This executes
bats test/unit/*.bats
Test files validate:
- Voice resolution (name → ID mapping)
- Personality file parsing
- Provider switching logic
- Configuration file handling
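A test in that suite might look roughly like this bats sketch (illustrative, not copied from the repository); it exercises the plain-text provider file convention described earlier:

```bash
#!/usr/bin/env bats
# Illustrative sketch of a bats unit test -- not from AgentVibes' actual suite.

setup() {
  TEST_DIR="$(mktemp -d)"
  cd "$TEST_DIR"
  mkdir -p .claude
}

teardown() {
  rm -rf "$TEST_DIR"
}

@test "provider file round-trips the chosen provider" {
  echo "piper" > .claude/tts-provider.txt
  run cat .claude/tts-provider.txt
  [ "$status" -eq 0 ]
  [ "$output" = "piper" ]
}
```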
Conclusion: The Bigger Picture
AgentVibes demonstrates several important software engineering principles:
1. Separation of Concerns
- Output style (when to speak) is separate from hooks (how to speak)
- Provider abstraction (ElevenLabs vs Piper) is separate from voice management
- MCP server is separate from core functionality
2. Provider Pattern
- Multiple TTS engines behind a single interface
- Easy to add new providers (OpenAI TTS, Google TTS, etc.)
3. Configuration as Data
- Simple text files instead of complex databases
- Easy to version control, debug, and manually edit
4. Progressive Enhancement
- Core functionality works with minimal setup
- Advanced features (MCP, BMAD, language learning) layer on top
- Graceful degradation when features aren't available
5. User Experience First
- Natural language control (MCP) instead of memorizing commands
- Instant feedback (acknowledgment/completion)
- Personality makes it fun, not just functional
Whether you're building your own AI integrations, designing CLI tools, or just curious about how AgentVibes works, I hope this deep dive has given you a comprehensive understanding of the architecture.
The beauty of AgentVibes isn't just that it makes Claude talk—it's that it does so with a clean, maintainable, extensible architecture that other developers can learn from and build upon.
What's Next?
Now that you understand how AgentVibes works under the hood, you might want to:
- Create custom personalities - Edit `.claude/personalities/*.md` files (example below)
- Extend the MCP server - Add new tools in `mcp-server/server.py`
- Build custom output styles - Create your own instructions in `.claude/output-styles/`
- Contribute to the project - Submit PRs on GitHub
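As a starting point for the first item, a custom personality file following the same format as pirate.md shown earlier might look like this (the voice names are placeholders to swap for ones available on your provider):

```markdown
---
name: butler
description: Polite, formal, ever-so-slightly dry
elevenlabs_voice: YourElevenLabsVoiceName
piper_voice: en_GB-alan-medium
---

## AI Instructions
Speak like a formal English butler. Address the user as "sir" or "madam",
keep sentences crisp, and understate everything.

## Example Responses
- "Very good, sir. I shall inspect the repository at once."
- "The tests have passed, madam. One might even call it a triumph."
```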
Happy coding, and may your AI assistant always speak with personality! 🎤✨