VoxBird AI | Sr. Software Engineer & Voice Synthesis Lead

Introduction

I was initially hired at VoxBird AI as a Senior Web Developer on the strength of my experience with React, JavaScript, TailwindCSS, and AWS services such as Lambda and S3. Once the team learned of my machine learning background, my role quickly expanded.

Voice Synthesis Leadership

I was appointed to lead the voice synthesis teams, where I built datasets and trained voice models to replicate the voices of several high-profile celebrities:

Snoop Dogg, 50 Cent, Cardi B, Bailey Zimmerman, Donald Trump, Dr. Dre

Innovative Testing: "AI or Not?"

I designed a new way to test our voice models without bias. The system, which I called 'AI or Not?', presented users with a mix of audio samples and asked them to decide whether each one was AI-generated or an authentic recording.

I implemented a feedback system that collected user responses from the 'AI or Not?' tests and fed them directly back into our training pipeline. This continuous improvement loop let us systematically identify and eliminate the audio artifacts that gave away our AI-generated samples, resulting in increasingly natural-sounding voice synthesis.
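As a simplified illustration of the idea (not the production code), the aggregation step of such a feedback loop might look like the sketch below. The `Response` record, the function names, and the `threshold` and `min_votes` values are all hypothetical, chosen only to show how listener answers can be turned into a list of samples worth inspecting for artifacts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One listener answer from an 'AI or Not?' style test (hypothetical schema)."""
    sample_id: str    # identifier of the audio clip that was played
    is_ai: bool       # ground truth: was the clip synthesized?
    guessed_ai: bool  # the listener's answer


def detection_stats(responses):
    """Map each AI-generated sample to (share of listeners who spotted it, votes)."""
    spotted = defaultdict(int)
    votes = defaultdict(int)
    for r in responses:
        if not r.is_ai:
            continue  # authentic recordings serve only as controls here
        votes[r.sample_id] += 1
        spotted[r.sample_id] += int(r.guessed_ai)
    return {sid: (spotted[sid] / votes[sid], votes[sid]) for sid in votes}


def flag_for_review(responses, threshold=0.7, min_votes=3):
    """Return AI samples that listeners identified at or above `threshold`.

    Both parameters are illustrative assumptions, not tuned values.
    """
    stats = detection_stats(responses)
    return sorted(
        sid for sid, (rate, n) in stats.items()
        if n >= min_votes and rate >= threshold
    )
```

Samples returned by `flag_for_review` are the ones most listeners recognized as synthetic, which makes them natural candidates for artifact analysis before the next training round.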

Technical Innovations

Feedback loop system for model improvement

Fine-tuning methodology for artifact elimination

Custom ElevenLabs integration tuning system

ElevenLabs Integration Enhancement

Developed a specialized tuning system that went beyond a standard ElevenLabs integration:

Parameter optimization algorithms based on voice characteristics

Custom post-processing filters for naturalness

Voice-specific tuning profiles for maximum quality
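To make the idea of voice-specific tuning profiles concrete, here is a minimal sketch of a per-voice parameter search. It does not call the ElevenLabs API; `TuningProfile`, `best_profile`, and the `score` callback are hypothetical names, and the two settings shown (`stability` and `similarity_boost`, assumed to lie between 0 and 1) are illustrative stand-ins for whatever parameters a given voice exposes. The `score` function is assumed to return a higher value for more natural-sounding output, for example the share of listeners a sample fooled.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable


@dataclass(frozen=True)
class TuningProfile:
    """A candidate set of synthesis settings for one voice (hypothetical)."""
    voice_name: str
    stability: float         # assumed range: 0.0 to 1.0
    similarity_boost: float  # assumed range: 0.0 to 1.0


def best_profile(
    voice_name: str,
    score: Callable[[TuningProfile], float],
    stabilities: Iterable[float],
    similarity_boosts: Iterable[float],
) -> TuningProfile:
    """Exhaustively score every combination and return the highest-scoring one."""
    candidates = (
        TuningProfile(voice_name, s, b)
        for s, b in product(stabilities, similarity_boosts)
    )
    return max(candidates, key=score)
```

A plain grid search is used here only because it is easy to follow; with more parameters or an expensive scoring step, a smarter search strategy would be the more practical choice.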

Conclusion

After a productive year at VoxBird AI, I decided to resign from my position. I continue to wish the team the best of luck in their future endeavors and innovations in the AI space.