Building go-elevenlabs

Category	Services
Core Audio	Text-to-Speech, Speech-to-Text, Sound Effects, Music
Voice	Voices, Voice Design, Models, Speech-to-Speech
Processing	Audio Isolation, Forced Alignment, Text-to-Dialogue
Content	Projects, Pronunciation, Dubbing
Real-Time	WebSocket TTS, WebSocket STT, Twilio, Phone Numbers
Utility	History, User

Coverage	Categories	Methods
Full ✓	TTS, STT, S2S, Voices, Models, History, User, SFX, Alignment, Isolation, Dialogue, Music, Pronunciation	~55
Partial ✓	Voice Design, Projects, Dubbing, Phone/Twilio	~20
Not Covered ✗	PVC, ConvAI, Knowledge Base, Workspace, MCP	~129

Package	Test Files	Key Tests
Core SDK	10 files	Client, TTS, Voices, Models, History
New Services	6 files	STT, Alignment, Isolation, Dialogue, VoiceDesign, Music
Utilities	1 file	Pronunciation rules, PLS export

Setting	Value
Model	Claude Opus 4.5 (`claude-opus-4-5-20251101`)
Context	Extended (with summarization)
Tools	Full Claude Code toolset

Category	Count
OpenAPI Spec	54K lines
Generated Code	330K lines
API Methods	204

Category	Count
Go Source Files	44+
Handwritten Code	~8K lines
Test Files	19
Doc Pages	32
Services	19
Utility Packages	2 (+mogo)

Deliverable	Status
19 Service Wrappers	Complete
Real-Time Services	WebSocket TTS/STT, Twilio
ogen API Client	Complete (204 methods)
Test Suite	Complete (19 test files)
MkDocs Documentation	Complete (32 pages)
API Coverage Page	Complete

Welcome to Building go-elevenlabs. <break time="500ms"/> A Go SDK for AI Audio Generation. <break time="700ms"/> This is an AI-Assisted Development Case Study, <break time="400ms"/> where we built an entire SDK using Claude Opus 4.5 with Claude Code. <break time="800ms"/>

Section 1: Introduction and Overview. <break time="600ms"/> Let's start by understanding what ElevenLabs is, <break time="300ms"/> and how we approached building this SDK. <break time="800ms"/>

What is ElevenLabs? <break time="500ms"/> ElevenLabs is an AI audio platform that provides cutting-edge audio generation capabilities. <break time="600ms"/> It offers several key features. <break time="400ms"/> Text-to-Speech for converting text to realistic speech with multiple voices. <break time="500ms"/> Speech-to-Text for transcribing audio with speaker diarization. <break time="500ms"/> Sound Effects for generating sound effects from text descriptions. <break time="500ms"/> Music Composition for generating music from prompts. <break time="500ms"/> Voice Design for creating custom AI voices. <break time="500ms"/> And Dubbing for translating and dubbing video content. <break time="600ms"/> Our goal was to build a comprehensive Go SDK wrapping the ElevenLabs API. <break time="800ms"/>

Let's look at the project scope. <break time="500ms"/> The SDK includes 15 service wrappers. <break time="400ms"/> Core audio services like Text-to-Speech, Speech-to-Text, Sound Effects, and Music. <break time="500ms"/> Voice management for Voices, Voice Design, and Models. <break time="500ms"/> Processing services like Audio Isolation, Forced Alignment, and Text-to-Dialogue. <break time="500ms"/> Content management with Projects, Pronunciation, and Dubbing. <break time="500ms"/> And utility services including History and User. <break time="600ms"/> The OpenAPI specification contains 204 API operations across 54,000 lines. <break time="500ms"/> The ogen generator produced over 330,000 lines of typed Go code. <break time="500ms"/> We wrote 37 Go source files with about 6,000 lines of handwritten code. <break time="800ms"/>

Here's the architecture overview. <break time="500ms"/> At the root level, client.go is the main entry point. <break time="500ms"/> Each service has its own file, such as texttospeech.go and voices.go. <break time="500ms"/> Error handling is centralized in errors.go. <break time="500ms"/> The internal/api directory contains the ogen-generated API client with over 330,000 lines. <break time="500ms"/> The docsrc directory contains the MkDocs documentation site. <break time="500ms"/> And the examples directory has usage examples. <break time="800ms"/>

Let me walk you through the key design decisions. <break time="600ms"/> First, we chose ogen for API client generation. <break time="500ms"/> ogen provides type-safe code with no reflection, <break time="400ms"/> and correctly handles optional and nullable fields which are common in the ElevenLabs API. <break time="600ms"/> Second, we used wrapper services over the generated code. <break time="500ms"/> This provides a clean, idiomatic Go interface while hiding ogen complexity. <break time="600ms"/> Third, we used the Functional Options pattern for configuration. <break time="500ms"/> This allows for clean, readable client initialization with optional parameters. <break time="800ms"/>

Section 2: Implementation Deep Dive. <break time="600ms"/> Now let's explore the features, API coverage, testing, and documentation. <break time="800ms"/>

Here are the 15 services we implemented. <break time="500ms"/> Text-to-Speech with streaming and timestamps. <break time="400ms"/> Speech-to-Text with diarization support. <break time="400ms"/> Voices for listing, getting, and managing voices. <break time="400ms"/> Voice Design for generating custom AI voices. <break time="400ms"/> Sound Effects for generating audio from descriptions. <break time="400ms"/> Music for composing music from prompts. <break time="400ms"/> Audio Isolation for extracting vocals. <break time="400ms"/> Forced Alignment for word-level timestamps. <break time="400ms"/> Text-to-Dialogue for multi-speaker conversations. <break time="400ms"/> Dubbing for video translation. <break time="400ms"/> Projects for long-form audio content. <break time="400ms"/> Pronunciation for dictionary management. <break time="400ms"/> History for generation history. <break time="400ms"/> Models for available AI models. <break time="400ms"/> And User for account information. <break time="800ms"/>

Let's look at the API coverage. <break time="500ms"/> The ElevenLabs API has 204 total methods across 25 categories. <break time="600ms"/> We have full coverage of 13 categories. <break time="400ms"/> Text-to-Speech, Speech-to-Text, Speech-to-Speech, Voices, Models, History, User, Sound Effects, Forced Alignment, Audio Isolation, Text-to-Dialogue, Music, and Pronunciation. <break time="700ms"/> We have partial coverage of 4 categories. <break time="400ms"/> Voice Design, Projects, Dubbing, and Phone and Twilio integration. <break time="600ms"/> The remaining categories are not yet covered. <break time="400ms"/> These include Professional Voice Cloning, Conversational AI, Knowledge Base, and more. <break time="600ms"/> We created a detailed coverage page in the documentation. <break time="800ms"/>

Here's an example of the Text-to-Speech service. <break time="500ms"/> The simple method takes a voice ID and text and returns audio. <break time="500ms"/> The Generate method provides full control with voice settings. <break time="500ms"/> You can set stability, similarity boost, style, and speaker boost. <break time="500ms"/> Streaming methods are also available for real-time playback. <break time="800ms"/>

And here's how the Text-to-Dialogue service works. <break time="500ms"/> You provide an array of dialogue inputs, each with text and a voice ID. <break time="500ms"/> The service generates combined audio with different speakers. <break time="600ms"/> This is great for podcasts, audiobooks, educational content, and demos. <break time="800ms"/>

Our testing approach covers validation and service accessibility. <break time="500ms"/> We test request validation to ensure required fields are checked. <break time="500ms"/> We test service initialization to verify all 15 services are accessible. <break time="500ms"/> And we test response struct initialization. <break time="600ms"/> We have 17 test files covering the SDK. <break time="500ms"/> All tests pass with golangci-lint showing zero issues. <break time="800ms"/>

We created comprehensive documentation. <break time="500ms"/> The MkDocs site includes Getting Started guides for installation, configuration, and quick start. <break time="500ms"/> 15 Service pages covering all implemented services. <break time="500ms"/> API Reference with client documentation, error handling, and coverage details. <break time="500ms"/> Guides for LMS course production and pronunciation rules. <break time="500ms"/> And an Examples page with code samples. <break time="600ms"/> In total, we created 25 documentation pages. <break time="800ms"/>

Here's the service documentation flow we created. <break time="500ms"/> Starting from the main documentation, users can navigate to Getting Started for setup. <break time="500ms"/> Then to Services for the 15 service wrappers. <break time="500ms"/> To API Reference for technical details including the coverage page. <break time="500ms"/> To Guides for use case tutorials. <break time="500ms"/> And to Examples for code samples. <break time="600ms"/> This provides a complete learning path for SDK users. <break time="800ms"/>

We also created three utility packages. <break time="500ms"/> The ttsscript package provides structured script authoring for multilingual TTS content. <break time="500ms"/> Instead of storing raw SSML, you author in JSON and compile to any TTS engine format. <break time="600ms"/> The voices package provides constants and metadata for all pre-made ElevenLabs voices. <break time="500ms"/> And the retryhttp package provides HTTP retry with exponential backoff. <break time="500ms"/> It works with any HTTP client and includes injectable logging via slog. <break time="800ms"/>

Section 3: AI-Assisted Development. <break time="600ms"/> Now let's look at Claude Opus 4.5's performance, <break time="300ms"/> and the insights and lessons we learned. <break time="800ms"/>

Let's look at the Claude Opus 4.5 developer experience. <break time="500ms"/> For the session configuration, we used the Claude Opus 4.5 model, <break time="400ms"/> with extended context and summarization to handle the large codebase. <break time="500ms"/> We had access to the full Claude Code toolset. <break time="600ms"/> Our development approach was iterative, <break time="400ms"/> implementing services with immediate testing. <break time="400ms"/> We leveraged parallel file reads and writes for efficiency, <break time="500ms"/> and used todo tracking for complex multi-step tasks. <break time="800ms"/>

Here are the session statistics. <break time="500ms"/> The OpenAPI spec was 54,000 lines. <break time="400ms"/> ogen generated 330,000 lines of typed Go code. <break time="500ms"/> We wrote 37 Go source files with about 6,000 lines of handwritten code. <break time="500ms"/> Created 17 test files. <break time="400ms"/> And 25 documentation pages. <break time="500ms"/> 15 service wrappers were implemented. <break time="500ms"/> 204 API methods were analyzed and categorized. <break time="600ms"/> The entire SDK was built iteratively over multiple sessions. <break time="800ms"/>

What did Claude Opus 4.5 handle particularly well? <break time="500ms"/> First, ogen type handling. <break time="400ms"/> Correctly working with OptString, OptNilString, OptInt, and other complex optional types. <break time="600ms"/> Second, wrapper service design. <break time="400ms"/> Creating clean interfaces that hide generated code complexity. <break time="600ms"/> Third, documentation generation. <break time="400ms"/> Creating comprehensive service docs with examples and best practices. <break time="600ms"/> Fourth, test coverage. <break time="400ms"/> Writing validation tests, service tests, and struct tests for all services. <break time="800ms"/>

Of course, there were challenges along the way. <break time="500ms"/> Challenge 1: ogen optional types. <break time="400ms"/> The generated code uses various OptXxx types that require careful handling. <break time="400ms"/> The solution was to use NewOptString or NewOptNilString, depending on how the API defines each field. <break time="600ms"/> Challenge 2: oneOf response types. <break time="400ms"/> Some API endpoints return different response types. <break time="400ms"/> The solution was to use type switches to handle the different response variants. <break time="600ms"/> Challenge 3: Large generated codebase. <break time="400ms"/> There were 330,000 lines of generated code to navigate. <break time="400ms"/> The solution was to use targeted grep searches and read specific method signatures. <break time="800ms"/>

Let's summarize the key takeaways for AI-assisted SDK development. <break time="500ms"/> First, wrapper services provide clean interfaces. <break time="400ms"/> Don't expose generated code directly to users. <break time="500ms"/> Second, document coverage explicitly. <break time="400ms"/> The coverage page helps users understand what's available. <break time="500ms"/> Third, test validation thoroughly. <break time="400ms"/> Required fields, value ranges, and error messages. <break time="500ms"/> Fourth, write documentation alongside code. <break time="400ms"/> Service docs were created with the implementation. <break time="500ms"/> Fifth, use todo tracking for multi-file tasks. <break time="400ms"/> Creating 6 service docs in parallel was tracked systematically. <break time="800ms"/>

Section 4: Conclusion. <break time="600ms"/> Let's wrap up with the deliverables, future work, and resources. <break time="800ms"/>

Here's a summary of the project deliverables. <break time="500ms"/> 15 Service Wrappers: Complete. <break time="300ms"/> ogen API Client: Complete with 204 methods. <break time="300ms"/> Test Suite: Complete with 17 test files. <break time="300ms"/> MkDocs Documentation: Complete with 25 pages. <break time="300ms"/> API Coverage Page: Complete with method-level details. <break time="300ms"/> CI/CD Pipeline: Complete with GitHub Actions. <break time="500ms"/> All deliverables are available in the repository. <break time="800ms"/>

What about future enhancements? <break time="500ms"/> There are several APIs we could add. <break time="400ms"/> Professional Voice Cloning for training custom voices. <break time="400ms"/> Voice Library for discovering community voices. <break time="400ms"/> Conversational AI for agent interactions. <break time="400ms"/> And Workspace Management for enterprise features. <break time="600ms"/> The project is open for contributions. <break time="400ms"/> Issues and pull requests are welcome. <break time="400ms"/> The SDK is released under the MIT License. <break time="800ms"/>

Here are the important links. <break time="500ms"/> The repository is at github.com/agentplexus/go-elevenlabs. <break time="500ms"/> Documentation is at agentplexus.github.io/go-elevenlabs. <break time="500ms"/> ElevenLabs official docs are at elevenlabs.io/docs. <break time="600ms"/> You can find me on GitHub at @agentplexus. <break time="800ms"/>

Thank you for joining this presentation. <break time="500ms"/> go-elevenlabs: A Go SDK for AI Audio Generation. <break time="600ms"/> Built with Claude Opus 4.5 and Claude Code. <break time="800ms"/> Thanks for watching! <break time="800ms"/>

Building go-elevenlabs

A Go SDK for AI Audio Generation

Section 1

Introduction & Overview

What is ElevenLabs?

Project Scope

Architecture Overview

Key Design Decisions

1. ogen for API Client Generation

2. Wrapper Services Pattern

3. Functional Options Pattern
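
The functional options pattern named on this slide can be sketched as follows. This is a minimal illustration, not the SDK's actual code: the `Client` fields, the `WithAPIKey`, `WithBaseURL`, and `WithHTTPClient` option names, and the default timeout are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Client is a hypothetical stand-in for the SDK client.
type Client struct {
	apiKey     string
	baseURL    string
	httpClient *http.Client
}

// Option mutates a Client while it is being constructed.
type Option func(*Client)

// WithAPIKey sets the API key (hypothetical option name).
func WithAPIKey(key string) Option {
	return func(c *Client) { c.apiKey = key }
}

// WithBaseURL overrides the default base URL, for example in tests.
func WithBaseURL(u string) Option {
	return func(c *Client) { c.baseURL = u }
}

// WithHTTPClient swaps in a caller-supplied HTTP client.
func WithHTTPClient(h *http.Client) Option {
	return func(c *Client) { c.httpClient = h }
}

// NewClient applies defaults first, then each option in order,
// so later options win over earlier ones.
func NewClient(opts ...Option) *Client {
	c := &Client{
		baseURL:    "https://api.elevenlabs.io",
		httpClient: &http.Client{Timeout: 30 * time.Second}, // assumed default
	}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := NewClient(
		WithAPIKey("YOUR_API_KEY"),
		WithBaseURL("http://localhost:8080"),
	)
	fmt.Println(c.baseURL, c.httpClient.Timeout)
}
```

Callers only name the settings they want to change, and new options can be added later without breaking existing call sites.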

Section 2

Implementation Deep Dive

19 Services Implemented

API Coverage

Coverage Highlights

Example: Text-to-Speech
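
The sketch below shows roughly what a Text-to-Speech call with voice settings has to send. It builds the request against the public ElevenLabs REST endpoint directly rather than through the SDK's wrapper, whose method signatures are not shown in this deck. The `NewTTSRequest` helper and the struct names are hypothetical, and the voice ID is a placeholder.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/url"
)

// VoiceSettings carries the four settings mentioned in the narration.
type VoiceSettings struct {
	Stability       float64 `json:"stability"`
	SimilarityBoost float64 `json:"similarity_boost"`
	Style           float64 `json:"style"`
	UseSpeakerBoost bool    `json:"use_speaker_boost"`
}

// ttsBody is the JSON payload for the text-to-speech endpoint.
type ttsBody struct {
	Text          string         `json:"text"`
	VoiceSettings *VoiceSettings `json:"voice_settings,omitempty"`
}

// NewTTSRequest is a hypothetical helper that validates the required
// fields and builds the HTTP request without sending it.
func NewTTSRequest(baseURL, apiKey, voiceID, text string, vs *VoiceSettings) (*http.Request, error) {
	if voiceID == "" || text == "" {
		return nil, errors.New("voice ID and text are required")
	}
	payload, err := json.Marshal(ttsBody{Text: text, VoiceSettings: vs})
	if err != nil {
		return nil, err
	}
	endpoint := baseURL + "/v1/text-to-speech/" + url.PathEscape(voiceID)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("xi-api-key", apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewTTSRequest(
		"https://api.elevenlabs.io", "YOUR_API_KEY", "voice-id-placeholder",
		"Hello from Go!",
		&VoiceSettings{Stability: 0.5, SimilarityBoost: 0.75, UseSpeakerBoost: true},
	)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request with an `http.Client` returns the audio bytes in the response body on success. A wrapper service would hide these details behind a simple method that takes a voice ID and text.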

Example: Text-to-Dialogue
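
As a rough illustration of the dialogue input described in the narration (an array of entries, each with text and a voice ID), the sketch below models the request and the kind of required-field validation the test suite checks. The type names, the `Validate` method, the JSON field names, and the voice IDs are assumptions for the example, not the SDK's actual definitions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// DialogueInput is one line of dialogue spoken by one voice.
type DialogueInput struct {
	Text    string `json:"text"`
	VoiceID string `json:"voice_id"`
}

// DialogueRequest is a hypothetical request type holding all lines in order.
type DialogueRequest struct {
	Inputs []DialogueInput `json:"inputs"`
}

// Validate checks the required fields before any network call is made.
func (r DialogueRequest) Validate() error {
	if len(r.Inputs) == 0 {
		return errors.New("at least one dialogue input is required")
	}
	for i, in := range r.Inputs {
		if in.Text == "" {
			return fmt.Errorf("inputs[%d]: text is required", i)
		}
		if in.VoiceID == "" {
			return fmt.Errorf("inputs[%d]: voice ID is required", i)
		}
	}
	return nil
}

func main() {
	req := DialogueRequest{Inputs: []DialogueInput{
		{Text: "Welcome to the show.", VoiceID: "host-voice-id"},
		{Text: "Thanks for having me.", VoiceID: "guest-voice-id"},
	}}
	if err := req.Validate(); err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	body, err := json.Marshal(req)
	if err != nil {
		fmt.Println("encode request:", err)
		return
	}
	fmt.Println(string(body))
}
```

Validating locally gives callers a clear error message without spending an API call on a request that cannot succeed.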

Testing Strategy

Test Coverage

Test Types

Documentation Created

MkDocs Site Structure (28 pages)

Utility Packages

Coverage Page

Documentation Flow

Utility Packages

ttsscript - Script Authoring

voices - Voice Reference

retryhttp - Retry Transport

Section 3

AI-Assisted Development

Claude Opus 4.5 DevEx

Session Configuration

Development Approach

Session Statistics

Source Analysis

Output Created

What Claude Opus 4.5 Handled Well

Challenges & Solutions

Challenge 1: ogen Optional Types
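
To make this challenge concrete, the sketch below shows how a wrapper might translate a plain Go field into an optional wrapper type. `OptString` and `NewOptString` are re-declared locally, modelled on the shape of ogen's generated types, so that the example compiles on its own. `Request`, `generatedRequest`, and `toGenerated` are hypothetical names.

```go
package main

import "fmt"

// OptString is a local stand-in modelled on ogen's generated optional
// string type: a value plus a flag saying whether it was set.
type OptString struct {
	Value string
	Set   bool
}

// NewOptString returns an OptString that is marked as set.
func NewOptString(v string) OptString {
	return OptString{Value: v, Set: true}
}

// Get returns the value and whether it was set.
func (o OptString) Get() (string, bool) {
	if !o.Set {
		return "", false
	}
	return o.Value, true
}

// generatedRequest stands in for an ogen-generated request type.
type generatedRequest struct {
	Text    string
	ModelID OptString
}

// Request is the hypothetical hand-written, user-facing type.
type Request struct {
	Text    string
	ModelID string // empty means "let the API pick its default"
}

// toGenerated sets optional fields only when the caller supplied a value,
// so unset fields are omitted from the request instead of sent as "".
func toGenerated(r Request) generatedRequest {
	g := generatedRequest{Text: r.Text}
	if r.ModelID != "" {
		g.ModelID = NewOptString(r.ModelID)
	}
	return g
}

func main() {
	withModel := toGenerated(Request{Text: "Hello", ModelID: "example-model"})
	withoutModel := toGenerated(Request{Text: "Hello"})
	fmt.Println(withModel.ModelID.Set, withoutModel.ModelID.Set)
}
```

Keeping this translation inside the wrapper means users work with ordinary strings and never see the generated optional types.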

Challenge 2: oneOf Response Types
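
The type-switch approach can be illustrated with a small, self-contained sketch. The `SpeechRes` interface and its two variants are hypothetical stand-ins for the kind of sealed response interface a code generator produces when one operation can return several response types.

```go
package main

import "fmt"

// SpeechRes stands in for a generated response interface. The unexported
// marker method keeps the set of implementations closed.
type SpeechRes interface {
	speechRes()
}

// SpeechOK is the success variant (hypothetical).
type SpeechOK struct {
	Audio []byte
}

// SpeechValidationError is the error variant (hypothetical).
type SpeechValidationError struct {
	Detail string
}

func (*SpeechOK) speechRes()              {}
func (*SpeechValidationError) speechRes() {}

// audioFrom turns the response union into a conventional (value, error) pair.
func audioFrom(res SpeechRes) ([]byte, error) {
	switch r := res.(type) {
	case *SpeechOK:
		return r.Audio, nil
	case *SpeechValidationError:
		return nil, fmt.Errorf("validation failed: %s", r.Detail)
	default:
		// Guards against variants added by a later regeneration.
		return nil, fmt.Errorf("unexpected response type %T", res)
	}
}

func main() {
	audio, err := audioFrom(&SpeechOK{Audio: []byte("fake-audio")})
	fmt.Println(len(audio), err)

	_, err = audioFrom(&SpeechValidationError{Detail: "text is empty"})
	fmt.Println(err)
}
```

The default branch matters in practice: regenerating the client from an updated spec can introduce new variants, and an explicit error is easier to diagnose than a silent nil.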

Challenge 3: Large Generated Codebase

Key Takeaways

AI-Assisted SDK Development Insights

Result

Section 4

Conclusion

Project Deliverables

Future Enhancements

Priority APIs to Add

Community

Resources

Links

Contact

Thank You

go-elevenlabs