Building go-elevenlabs

A Go SDK for AI Audio Generation

An AI-Assisted Development Case Study

Using Claude Opus 4.5 with Claude Code

Section 1

Introduction & Overview

What is ElevenLabs and how we approached the SDK

What is ElevenLabs? 🎙️

ElevenLabs is an AI audio platform for realistic audio generation

  • Text-to-Speech - Convert text to realistic speech with multiple voices
  • Speech-to-Text - Transcribe audio with speaker diarization
  • Speech-to-Speech - Voice conversion in real-time
  • Sound Effects - Generate sound effects from text descriptions
  • Music Composition - Generate music from text prompts
  • Voice Design - Create custom AI voices with specific characteristics
  • Real-Time APIs - WebSocket streaming + Twilio phone integration

Goal: Build a comprehensive Go SDK for AI audio and voice agents

Project Scope 📋

Category Services
Core Audio Text-to-Speech, Speech-to-Text, Sound Effects, Music
Voice Voices, Voice Design, Models, Speech-to-Speech
Processing Audio Isolation, Forced Alignment, Text-to-Dialogue
Content Projects, Pronunciation, Dubbing
Real-Time WebSocket TTS, WebSocket STT, Twilio, Phone Numbers
Utility History, User

OpenAPI Spec: 204 operations (~54K lines) | Generated Code: ~330K lines

Output: 44+ Go source files (~8K lines handwritten) + 19 test files

Architecture Overview 🏗️

go-elevenlabs/
├── client.go              # Main client with service accessors
├── texttospeech.go        # Text-to-Speech service wrapper
├── speechtotext.go        # Speech-to-Text + real-time STT
├── speechtospeech.go      # Voice conversion service
├── websockettts.go        # Real-time TTS streaming
├── websocketstt.go        # Real-time STT streaming
├── twilio.go              # Twilio + phone integration
├── music.go               # Music composition + stem separation
├── ttsscript/             # TTS script authoring package
├── voices/                # Voice reference package
├── internal/api/          # ogen-generated API client (~330K lines)
└── docsrc/                # MkDocs documentation site (32 pages)

Key Design Decisions 🎯

1. ogen for API Client Generation

  • Type-safe, no reflection
  • Handles optional/nullable fields correctly
  • Generated from OpenAPI spec (54K lines)

2. Wrapper Services Pattern

  • Clean, idiomatic Go interface
  • Hides ogen complexity from users
  • Provides simplified method signatures

3. Functional Options Pattern

client, err := elevenlabs.NewClient(
    elevenlabs.WithAPIKey("your-api-key"),
    elevenlabs.WithTimeout(5 * time.Minute),
)

Section 2

Implementation Deep Dive

Features, API Coverage, Testing & Documentation

19 Services Implemented ✨

Audio Generation

  • Text-to-Speech
  • Sound Effects
  • Music

Transcription

  • Speech-to-Text
  • Forced Alignment

Voice

  • Voices
  • Voice Design
  • Models
  • Speech-to-Speech

Processing

  • Audio Isolation
  • Text-to-Dialogue

Real-Time

  • WebSocket TTS ⚡
  • WebSocket STT ⚡
  • Twilio Integration
  • Phone Numbers

Content

  • Projects, Dubbing
  • Pronunciation
  • History, User

API Coverage 📊

Coverage Categories Methods
Full TTS, STT, S2S, Voices, Models, History, User, SFX, Alignment, Isolation, Dialogue, Music, Pronunciation ~55
Partial Voice Design, Projects, Dubbing, Phone/Twilio ~20
Not Covered PVC, ConvAI, Knowledge Base, Workspace, MCP ~129

Coverage Highlights

  • Core audio features: Fully covered (TTS, STT, Music, S2S)
  • Real-time streaming: WebSocket TTS + STT for voice agents
  • Phone integration: Twilio calls + phone number management
  • Enterprise features: Not yet covered (Conversational AI agents)

Documentation: Full coverage page with method-level details

Example: Text-to-Speech 💻

// Simple usage
audio, err := client.TextToSpeech().Simple(ctx, voiceID, "Hello world!")

// Full control
resp, err := client.TextToSpeech().Generate(ctx, &elevenlabs.TTSRequest{
    VoiceID: "21m00Tcm4TlvDq8ikWAM",
    Text:    "Hello with custom settings!",
    ModelID: "eleven_multilingual_v2",
    VoiceSettings: &elevenlabs.VoiceSettings{
        Stability:       0.6,
        SimilarityBoost: 0.8,
        Style:           0.1,
        SpeakerBoost:    true,
    },
    OutputFormat: "mp3_44100_192",
})

// Streaming for real-time playback
stream, err := client.TextToSpeech().GenerateStream(ctx, request)

Example: Text-to-Dialogue 🎭

// Generate multi-speaker conversation
audio, err := client.TextToDialogue().Simple(ctx, []elevenlabs.DialogueInput{
    {Text: "Welcome to the show!", VoiceID: hostVoice},
    {Text: "Thanks for having me.", VoiceID: guestVoice},
    {Text: "Let's dive into today's topic.", VoiceID: hostVoice},
})

// With timestamps for video sync
resp, err := client.TextToDialogue().GenerateWithTimestamps(ctx, &elevenlabs.DialogueRequest{
    Inputs: dialogueInputs,
})

for _, seg := range resp.VoiceSegments {
    fmt.Printf("Speaker %s: %.2fs - %.2fs\n", seg.VoiceID, seg.StartTime, seg.EndTime)
}

Use cases: Podcasts, audiobooks, educational content, demos

Testing Strategy 🧪

Test Coverage

Package Test Files Key Tests
Core SDK 10 files Client, TTS, Voices, Models, History
New Services 6 files STT, Alignment, Isolation, Dialogue, VoiceDesign, Music
Utilities 1 file Pronunciation rules, PLS export

Test Types

  • Validation Tests: Required fields, value ranges
  • Service Tests: Service accessibility and initialization
  • Response Tests: Struct initialization and field access
$ go test ./...
ok  github.com/agentplexus/go-elevenlabs    0.270s

$ golangci-lint run
0 issues

Documentation Created 📚

MkDocs Site Structure (28 pages)

  • Getting Started: Installation, configuration, quick start
  • Services (15 pages): All implemented services with examples
  • Guides: LMS courses, pronunciation rules, TTS script authoring
  • Utilities: voices, ttsscript, retryhttp docs
  • API Reference: Client, errors, coverage page

Utility Packages

  • voices/: Pre-made voice constants and metadata
  • ttsscript/: Multilingual script authoring
  • mogo retryhttp: HTTP retry with exponential backoff

Coverage Page

  • All 204 API methods categorized
  • Method-level coverage status with ✓/✗
  • SDK method mapping

Documentation Flow 📖

flowchart LR A["📚 Docs Home"] --> B["🚀 Getting Started"] A --> C["⚙️ Services (15)"] A --> D["📋 API Reference"] A --> E["📖 Guides"] A --> F["💡 Examples"] D --> G["✓/✗ Coverage"] style A fill:#667eea,stroke:#764ba2,color:#fff style B fill:#667eea,stroke:#764ba2,color:#fff style C fill:#667eea,stroke:#764ba2,color:#fff style D fill:#667eea,stroke:#764ba2,color:#fff style E fill:#667eea,stroke:#764ba2,color:#fff style F fill:#667eea,stroke:#764ba2,color:#fff style G fill:#764ba2,stroke:#667eea,color:#fff

Service Docs Include:

  • Basic usage examples
  • Full options with all parameters
  • Response structures
  • Multiple use case examples
  • Best practices

Utility Packages 📦

ttsscript - Script Authoring

script, _ := ttsscript.LoadScript("course.json")
compiler := ttsscript.NewCompiler()
segments, _ := compiler.Compile(script, "en")
jobs := formatter.Format(segments)

voices - Voice Reference

// Use constants instead of IDs
audio, _ := client.TextToSpeech().Simple(
    ctx, voices.Rachel, text)

retryhttp - Retry Transport

import "github.com/grokify/mogo/net/http/retryhttp"

rt := retryhttp.NewWithOptions(
    retryhttp.WithMaxRetries(3),
    retryhttp.WithInitialBackoff(1*time.Second),
    retryhttp.WithLogger(slog.Default()),
)
client, _ := elevenlabs.NewClient(
    elevenlabs.WithHTTPClient(rt.Client()),
)
// Auto-retry on 429, 5xx + injectable logging

Section 3

AI-Assisted Development

Claude Opus 4.5 performance, insights & lessons learned

Claude Opus 4.5 DevEx 🧠

Session Configuration

Setting Value
Model Claude Opus 4.5 (claude-opus-4-5-20251101)
Context Extended (with summarization)
Tools Full Claude Code toolset

Development Approach

  • Iterative implementation with immediate testing
  • Parallel file reads and writes for efficiency
  • Todo tracking for complex multi-step tasks
  • Continuous golangci-lint validation

Session Statistics 📊

Source Analysis

Category Count
OpenAPI Spec 54K lines
Generated Code 330K lines
API Methods 204

Output Created

Category Count
Go Source Files 44+
Handwritten Code ~8K lines
Test Files 19
Doc Pages 32
Services 19
Utility Packages 2 (+mogo)

What Claude Opus 4.5 Handled Well 💪

  1. ogen Type Handling

    • OptString, OptNilString
    • OptInt, OptNilInt
    • OptFloat64, OptNilFloat64
    • Complex oneOf response types
  2. Wrapper Service Design

    • Clean interface over generated code
    • Simplified method signatures
    • Consistent validation patterns
  1. Documentation Generation

    • 15 service documentation pages
    • Comprehensive code examples
    • Best practices sections
    • API coverage analysis
  2. Test Coverage

    • Validation tests
    • Service accessibility tests
    • Response struct tests

Challenges & Solutions 🔧

  1. Challenge 1: ogen Optional Types

    • Issue: Various OptXxx and OptNilXxx types
    • Solution: Careful use of NewOptString() vs NewOptNilString()
  2. Challenge 2: oneOf Response Types

    • Issue: API returns different response types
    • Solution: Type switches to handle variants
    switch r := resp.(type) {
    case *api.TextToSpeechOK:
        return r.Data, nil
    default:
        return nil, &APIError{Message: "unexpected response"}
    }
    
  3. Challenge 3: Large Generated Codebase

    • Issue: 330K lines of generated code
    • Solution: Targeted grep searches for method signatures

Key Takeaways 💡

AI-Assisted SDK Development Insights

  1. Wrapper services provide clean interfaces over generated code
  2. Document coverage explicitly - helps users understand what's available
  3. Test validation thoroughly - required fields, value ranges, error messages
  4. Write docs alongside code - service docs created with implementation
  5. Use todo tracking - essential for multi-file parallel tasks

Result

A production-ready Go SDK with 15 services, comprehensive documentation, and full test coverage

Section 4

Conclusion

Deliverables, future work & resources

Project Deliverables 📦

Deliverable Status
19 Service Wrappers ✅ Complete
Real-Time Services ✅ WebSocket TTS/STT, Twilio
ogen API Client ✅ Complete (204 methods)
Test Suite ✅ Complete (19 test files)
MkDocs Documentation ✅ Complete (32 pages)
API Coverage Page ✅ Complete

Repository: github.com/agentplexus/go-elevenlabs

Future Enhancements 🔮

Priority APIs to Add

  • Conversational AI Agents: Full agent management and conversations
  • Professional Voice Cloning: Train custom voices with samples
  • Voice Library: Discover and share community voices
  • Knowledge Base / RAG: Document management for agent context
  • Workspace Management: Enterprise team features

Community

  • Open for contributions
  • Issues and PRs welcome
  • MIT License

Resources 🔗

  • Repository: github.com/agentplexus/go-elevenlabs
  • Documentation: agentplexus.github.io/go-elevenlabs
  • ElevenLabs: elevenlabs.io/docs
  • Go Package: pkg.go.dev/github.com/agentplexus/go-elevenlabs

Contact

  • GitHub: @agentplexus

Thank You 🙏

go-elevenlabs

A Go SDK for AI Audio Generation

Built with Claude Opus 4.5 + Claude Code

Welcome to Building go-elevenlabs. <break time="500ms"/> A Go SDK for AI Audio Generation. <break time="700ms"/> This is an AI-Assisted Development Case Study, <break time="400ms"/> where we built an entire SDK using Claude Opus 4.5 with Claude Code. <break time="800ms"/>

Section 1: Introduction and Overview. <break time="600ms"/> Let's start by understanding what ElevenLabs is, <break time="300ms"/> and how we approached building this SDK. <break time="800ms"/>

What is ElevenLabs? <break time="500ms"/> ElevenLabs is an AI audio platform that provides cutting-edge audio generation capabilities. <break time="600ms"/> It offers several key features. <break time="400ms"/> Text-to-Speech for converting text to realistic speech with multiple voices. <break time="500ms"/> Speech-to-Text for transcribing audio with speaker diarization. <break time="500ms"/> Sound Effects for generating sound effects from text descriptions. <break time="500ms"/> Music Composition for generating music from prompts. <break time="500ms"/> Voice Design for creating custom AI voices. <break time="500ms"/> And Dubbing for translating and dubbing video content. <break time="600ms"/> Our goal was to build a comprehensive Go SDK wrapping the ElevenLabs API. <break time="800ms"/>

Let's look at the project scope. <break time="500ms"/> The SDK includes 15 service wrappers. <break time="400ms"/> Core audio services like Text-to-Speech, Speech-to-Text, Sound Effects, and Music. <break time="500ms"/> Voice management for Voices, Voice Design, and Models. <break time="500ms"/> Processing services like Audio Isolation, Forced Alignment, and Text-to-Dialogue. <break time="500ms"/> Content management with Projects, Pronunciation, and Dubbing. <break time="500ms"/> And utility services including History and User. <break time="600ms"/> The OpenAPI specification contains 204 API operations across 54,000 lines. <break time="500ms"/> The ogen generator produced over 330,000 lines of typed Go code. <break time="500ms"/> We wrote 37 Go source files with about 6,000 lines of handwritten code. <break time="800ms"/>

Here's the architecture overview. <break time="500ms"/> At the root level, we have the client.go which is the main entry point. <break time="500ms"/> Each service has its own file like texttospeech.go, voices.go, and so on. <break time="500ms"/> Error handling is centralized in errors.go. <break time="500ms"/> The internal/api directory contains the ogen-generated API client with over 330,000 lines. <break time="500ms"/> The docsrc directory contains the MkDocs documentation site. <break time="500ms"/> And the examples directory has usage examples. <break time="800ms"/>

Let me walk you through the key design decisions. <break time="600ms"/> First, we chose ogen for API client generation. <break time="500ms"/> ogen provides type-safe code with no reflection, <break time="400ms"/> and correctly handles optional and nullable fields which are common in the ElevenLabs API. <break time="600ms"/> Second, we used wrapper services over the generated code. <break time="500ms"/> This provides a clean, idiomatic Go interface while hiding ogen complexity. <break time="600ms"/> Third, we used the Functional Options pattern for configuration. <break time="500ms"/> This allows for clean, readable client initialization with optional parameters. <break time="800ms"/>

Section 2: Implementation Deep Dive. <break time="600ms"/> Now let's explore the features, API coverage, testing, and documentation. <break time="800ms"/>

Here are the 15 services we implemented. <break time="500ms"/> Text-to-Speech with streaming and timestamps. <break time="400ms"/> Speech-to-Text with diarization support. <break time="400ms"/> Voices for listing, getting, and managing voices. <break time="400ms"/> Voice Design for generating custom AI voices. <break time="400ms"/> Sound Effects for generating audio from descriptions. <break time="400ms"/> Music for composing music from prompts. <break time="400ms"/> Audio Isolation for extracting vocals. <break time="400ms"/> Forced Alignment for word-level timestamps. <break time="400ms"/> Text-to-Dialogue for multi-speaker conversations. <break time="400ms"/> Dubbing for video translation. <break time="400ms"/> Projects for long-form audio content. <break time="400ms"/> Pronunciation for dictionary management. <break time="400ms"/> History for generation history. <break time="400ms"/> Models for available AI models. <break time="400ms"/> And User for account information. <break time="800ms"/>

Let's look at the API coverage. <break time="500ms"/> The ElevenLabs API has 204 total methods across 25 categories. <break time="600ms"/> We have full coverage of 13 categories. <break time="400ms"/> Text-to-Speech, Speech-to-Text, Voices, Models, History, User, Sound Effects, Forced Alignment, Audio Isolation, Text-to-Dialogue, Music, and Pronunciation. <break time="700ms"/> We have partial coverage of 3 categories. <break time="400ms"/> Voice Design, Projects, and Dubbing. <break time="600ms"/> And 11 categories are not yet covered. <break time="400ms"/> These include Speech-to-Speech, Professional Voice Cloning, Conversational AI, Knowledge Base, and more. <break time="600ms"/> We created a detailed coverage page in the documentation. <break time="800ms"/>

Here's an example of the Text-to-Speech service. <break time="500ms"/> The simple method takes a voice ID and text and returns audio. <break time="500ms"/> The Generate method provides full control with voice settings. <break time="500ms"/> You can set stability, similarity boost, style, and speaker boost. <break time="500ms"/> Streaming methods are also available for real-time playback. <break time="800ms"/>

And here's how the Text-to-Dialogue service works. <break time="500ms"/> You provide an array of dialogue inputs, each with text and a voice ID. <break time="500ms"/> The service generates combined audio with different speakers. <break time="600ms"/> This is great for podcasts, audiobooks, educational content, and demos. <break time="800ms"/>

Our testing approach covers validation and service accessibility. <break time="500ms"/> We test request validation to ensure required fields are checked. <break time="500ms"/> We test service initialization to verify all 15 services are accessible. <break time="500ms"/> And we test response struct initialization. <break time="600ms"/> We have 17 test files covering the SDK. <break time="500ms"/> All tests pass with golangci-lint showing zero issues. <break time="800ms"/>

We created comprehensive documentation. <break time="500ms"/> The MkDocs site includes Getting Started guides for installation, configuration, and quick start. <break time="500ms"/> 15 Service pages covering all implemented services. <break time="500ms"/> API Reference with client documentation, error handling, and coverage details. <break time="500ms"/> Guides for LMS course production and pronunciation rules. <break time="500ms"/> And an Examples page with code samples. <break time="600ms"/> Total of 25 documentation pages created. <break time="800ms"/>

Here's the service documentation flow we created. <break time="500ms"/> Starting from the main documentation, users can navigate to Getting Started for setup. <break time="500ms"/> Then to Services for the 15 service wrappers. <break time="500ms"/> To API Reference for technical details including the coverage page. <break time="500ms"/> To Guides for use case tutorials. <break time="500ms"/> And to Examples for code samples. <break time="600ms"/> This provides a complete learning path for SDK users. <break time="800ms"/>

We also created three utility packages. <break time="500ms"/> The ttsscript package provides structured script authoring for multilingual TTS content. <break time="500ms"/> Instead of storing raw SSML, you author in JSON and compile to any TTS engine format. <break time="600ms"/> The voices package provides constants and metadata for all pre-made ElevenLabs voices. <break time="500ms"/> And the retryhttp package provides HTTP retry with exponential backoff. <break time="500ms"/> It works with any HTTP client and includes injectable logging via slog. <break time="800ms"/>

Section 3: AI-Assisted Development. <break time="600ms"/> Now let's look at Claude Opus 4.5's performance, <break time="300ms"/> and the insights and lessons we learned. <break time="800ms"/>

Let's look at the Claude Opus 4.5 developer experience. <break time="500ms"/> For the session configuration, we used Claude Opus 4.5 model, <break time="400ms"/> with Extended context and summarization to handle the large codebase. <break time="500ms"/> We had access to the full Claude Code toolset. <break time="600ms"/> Our development approach was iterative, <break time="400ms"/> implementing services with immediate testing. <break time="400ms"/> We leveraged parallel file reads and writes for efficiency, <break time="500ms"/> and used todo tracking for complex multi-step tasks. <break time="800ms"/>

Here are the session statistics. <break time="500ms"/> The OpenAPI spec was 54,000 lines. <break time="400ms"/> ogen generated 330,000 lines of typed Go code. <break time="500ms"/> We wrote 37 Go source files with about 6,000 lines of handwritten code. <break time="500ms"/> Created 17 test files. <break time="400ms"/> And 25 documentation pages. <break time="500ms"/> 15 service wrappers were implemented. <break time="500ms"/> 204 API methods were analyzed and categorized. <break time="600ms"/> The entire SDK was built iteratively over multiple sessions. <break time="800ms"/>

What did Claude Opus 4.5 handle particularly well? <break time="500ms"/> First, ogen type handling. <break time="400ms"/> Correctly working with OptString, OptNilString, OptInt, and other complex optional types. <break time="600ms"/> Second, wrapper service design. <break time="400ms"/> Creating clean interfaces that hide generated code complexity. <break time="600ms"/> Third, documentation generation. <break time="400ms"/> Creating comprehensive service docs with examples and best practices. <break time="600ms"/> Fourth, test coverage. <break time="400ms"/> Writing validation tests, service tests, and struct tests for all services. <break time="800ms"/>

Of course, there were challenges along the way. <break time="500ms"/> Challenge 1: ogen optional types. <break time="400ms"/> The generated code uses various OptXxx types that require careful handling. <break time="400ms"/> Solution was to use NewOptString, NewOptNilString appropriately based on the API. <break time="600ms"/> Challenge 2: oneOf response types. <break time="400ms"/> Some API endpoints return different response types. <break time="400ms"/> Solution was to use type switches to handle different response variants. <break time="600ms"/> Challenge 3: Large generated codebase. <break time="400ms"/> 330,000 lines of generated code to navigate. <break time="400ms"/> Solution was to use targeted grep searches and read specific method signatures. <break time="800ms"/>

Let's summarize the key takeaways for AI-assisted SDK development. <break time="500ms"/> First, wrapper services provide clean interfaces. <break time="400ms"/> Don't expose generated code directly to users. <break time="500ms"/> Second, document coverage explicitly. <break time="400ms"/> The coverage page helps users understand what's available. <break time="500ms"/> Third, test validation thoroughly. <break time="400ms"/> Required fields, value ranges, and error messages. <break time="500ms"/> Fourth, write documentation alongside code. <break time="400ms"/> Service docs were created with the implementation. <break time="500ms"/> Fifth, use todo tracking for multi-file tasks. <break time="400ms"/> Creating 6 service docs in parallel was tracked systematically. <break time="800ms"/>

Section 4: Conclusion. <break time="600ms"/> Let's wrap up with the deliverables, future work, and resources. <break time="800ms"/>

Here's a summary of the project deliverables. <break time="500ms"/> 15 Service Wrappers: Complete. <break time="300ms"/> ogen API Client: Complete with 204 methods. <break time="300ms"/> Test Suite: Complete with 17 test files. <break time="300ms"/> MkDocs Documentation: Complete with 25 pages. <break time="300ms"/> API Coverage Page: Complete with method-level details. <break time="300ms"/> CI/CD Pipeline: Complete with GitHub Actions. <break time="500ms"/> All deliverables are available in the repository. <break time="800ms"/>

What about future enhancements? <break time="500ms"/> There are several APIs we could add. <break time="400ms"/> Speech-to-Speech for voice conversion. <break time="400ms"/> Professional Voice Cloning for training custom voices. <break time="400ms"/> Voice Library for discovering community voices. <break time="400ms"/> Conversational AI for agent interactions. <break time="400ms"/> And Workspace Management for enterprise features. <break time="600ms"/> The project is open for contributions. <break time="400ms"/> Issues and pull requests are welcome. <break time="400ms"/> The SDK is released under the MIT License. <break time="800ms"/>

Here are the important links. <break time="500ms"/> The repository is at github.com/agentplexus/go-elevenlabs. <break time="500ms"/> Documentation is at agentplexus.github.io/go-elevenlabs. <break time="500ms"/> ElevenLabs official docs are at elevenlabs.io/docs. <break time="600ms"/> You can find me on GitHub at @agentplexus. <break time="800ms"/>

Thank you for joining this presentation. <break time="500ms"/> go-elevenlabs: A Go SDK for AI Audio Generation. <break time="600ms"/> Built with Claude Opus 4.5 and Claude Code. <break time="800ms"/> Thanks for watching! <break time="800ms"/>