Resemble AI brings Andy Warhol’s voice back for new Netflix docu-series

Mar 11, 2022

A world first this week: AI voices were used extensively to deliver end-to-end dialogue within a major entertainment project.

Just two days ago, on March 9, Netflix released a docu-series on Andy Warhol, the American artist, film director, and producer who was a leading figure in the visual art movement known as pop art. Andy Warhol’s famous works include Campbell’s Soup Cans, the Cow Series, and the Marilyn Diptych. In fact, the Andy Warhol Museum in Pittsburgh hosts 900 paintings; approximately 100 sculptures; nearly 2,000 works on paper; more than 1,000 published and unique prints; 4,000 photographs; 60 feature films; 200 Screen Tests; and more than 4,000 videos.

“The Andy Warhol Diaries,” directed by Andrew Rossi and produced by Ryan Murphy, is based on a 1989 book of the same title edited by Pat Hackett. Almost every morning between November 24, 1976 and February 17, 1987, five days before Andy Warhol’s death, Hackett would call Warhol and transcribe what he had done the day before. The final result was an 807-page book.

Using Resemble AI’s generative voice technology, Andy Warhol’s voice was recreated to recite his own words from the diaries, creating an immersive six-part documentary on the artist’s life.

What does this mean?

With Resemble AI’s synthetic speech engine, Andy Warhol’s voice was not only recreated, but crafted to the performance and requirements of Emmy-nominated director Andrew Rossi. This is the first time that AI voices have been used extensively to deliver end-to-end dialogue within a major entertainment project.

Using Resemble AI’s web platform, Rossi and his team could generate and tune multiple iterations of each line in seconds, refining the delivery until it matched what they wanted.

Caption: Iterations of Andy Warhol’s synthetic voice with varying pitch and deliveries created through Resemble AI’s web-based editor.

How was this made possible?

Since most of Andy Warhol’s audio recordings are archived from the 70s and 80s, there isn’t an abundance of audio data available. After sifting through all of the data, Resemble AI accumulated just 3 minutes and 12 seconds of usable data.

Creating a Voice Model from 3 minutes

Resemble AI’s proprietary Deep Learning models are able to recreate voices with minimal data. With a large foundational model, and a modern Deep Learning architecture, Resemble AI’s model is able to adapt to new voices with just a handful of samples.

Andy’s data was cleaned and normalized through Resemble AI’s neural data pipeline. After a dataset is uploaded, the automated pipeline extracts various features and computes numerous metrics to decide which parts of the dataset to use for training. As with other machine learning pipelines, the quality of the input data has a significant impact on the quality of the output.

Resemble AI exposes the results of the analysis back to the user so that an attempt can be made to rectify as much data as possible.
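To make the filtering step concrete, here is a minimal Python sketch of a metric-based dataset filter. It is not Resemble AI's pipeline: the metrics (duration, RMS level, share of clipped samples), the thresholds, and the names `analyze_clip` and `filter_dataset` are all assumptions chosen for illustration. It scores each clip, keeps the ones that pass, and returns a per-clip report so that a user could see why a clip was rejected.

```python
import math
from dataclasses import dataclass


@dataclass
class ClipReport:
    name: str
    duration_s: float
    rms_db: float
    clipped_ratio: float
    keep: bool
    reasons: list


def analyze_clip(name, samples, sample_rate,
                 min_duration_s=1.0, min_rms_db=-40.0, max_clipped_ratio=0.01):
    """Compute simple quality metrics for one clip and decide whether to keep it.

    `samples` are floats in [-1.0, 1.0]. All thresholds are illustrative
    guesses, not values taken from any real product.
    """
    duration_s = len(samples) / sample_rate
    if samples:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        clipped_ratio = sum(1 for s in samples if abs(s) >= 0.999) / len(samples)
    else:
        rms = 0.0
        clipped_ratio = 0.0
    rms_db = 20 * math.log10(rms) if rms > 0 else float("-inf")

    # Collect every reason a clip fails, so the report is useful to the user.
    reasons = []
    if duration_s < min_duration_s:
        reasons.append("too short")
    if rms_db < min_rms_db:
        reasons.append("too quiet")
    if clipped_ratio > max_clipped_ratio:
        reasons.append("clipping")
    return ClipReport(name, duration_s, rms_db, clipped_ratio, not reasons, reasons)


def filter_dataset(clips, sample_rate):
    """Return (names of clips to train on, reports for every clip)."""
    reports = [analyze_clip(name, samples, sample_rate)
               for name, samples in clips.items()]
    kept = [r.name for r in reports if r.keep]
    return kept, reports
```

Returning the full list of reports, rather than only the surviving clips, mirrors the idea of showing the analysis to the user so that rejected recordings can be repaired and uploaded again.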

Adding performance with tunable knobs

Once Andy Warhol’s voice model was ready to be used, the creative team simply imported all of the lines from “The Andy Warhol Diaries” into the web-based editor. This enabled them to create a baseline of how the AI would predict a sentence.

Using the intuitive web authoring tool, the creative team behind the docu-series could go in and tweak the output to their liking. This could be anything from slowing down portions of the delivery, to creating rising or falling inflections.
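As a rough illustration of what such knobs could control, the Python sketch below treats a line as a list of per-unit durations and a per-frame pitch track. The representation and the function names `slow_down` and `add_inflection` are hypothetical and are not taken from Resemble AI's editor. One function stretches a portion of the delivery, and the other applies a rising or falling inflection.

```python
def slow_down(durations_ms, start, end, factor):
    """Return a copy of the durations with units in [start, end) scaled by factor.

    A factor above 1.0 slows that portion down; below 1.0 speeds it up.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [d * factor if start <= i < end else d
            for i, d in enumerate(durations_ms)]


def add_inflection(pitch_hz, semitones):
    """Return a copy of the pitch track with a linear pitch ramp applied.

    The shift grows from 0 at the first frame to `semitones` at the last.
    Positive values give a rising inflection, negative values a falling one.
    Unvoiced frames, marked here as 0.0, are left untouched.
    """
    n = len(pitch_hz)
    out = []
    for i, f in enumerate(pitch_hz):
        shift = semitones * (i / (n - 1)) if n > 1 else 0.0
        out.append(f * 2 ** (shift / 12) if f > 0 else f)
    return out
```

For example, `slow_down(durations, 1, 3, 1.5)` lengthens only the second and third units, and `add_inflection(track, -2)` lowers the end of the line by two semitones.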

Caption: A view of Resemble AI’s emotion editor used to create specific styles and tones for delivery.

Adding final touches with style transfer

Although the knobs were satisfactory for the creation of some lines, they weren’t enough to get the delivery exactly the way that the creative team wanted it. Some of the lines needed further tuning and tweaking to get the right emphasis and pronunciation.

This is where Resemble AI’s advanced style transfer technique came into play. The creative team could record reference audio clips of another speaker delivering a sentence and get the output in Andy Warhol’s voice. This flexibility increased productivity and allowed the creative team to add human-like imperfections to the output, which made it far more engaging.
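The post does not describe how the style transfer works internally. As a much simpler stand-in, the Python sketch below uses a classic voice conversion baseline, mean-variance transformation of log-F0: it keeps the shape of the reference speaker's pitch contour but moves it into the target speaker's pitch range. The function names and the convention that 0.0 marks an unvoiced frame are assumptions made for this example.

```python
import math
import statistics


def log_f0_stats(pitch_hz):
    """Mean and sample standard deviation of log pitch over voiced frames."""
    voiced = [math.log(f) for f in pitch_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.mean(voiced), statistics.stdev(voiced)


def transfer_contour(reference_hz, target_speaker_hz):
    """Map the reference delivery's pitch contour into the target's range.

    reference_hz: pitch track of the actor delivering the line.
    target_speaker_hz: any pitch track of the target voice, used only for
    its statistics. Unvoiced frames (0.0) stay unvoiced.
    """
    ref_mean, ref_std = log_f0_stats(reference_hz)
    tgt_mean, tgt_std = log_f0_stats(target_speaker_hz)
    out = []
    for f in reference_hz:
        if f <= 0:
            out.append(0.0)
            continue
        # Normalize against the reference speaker, then rescale to the target.
        z = (math.log(f) - ref_mean) / ref_std if ref_std > 0 else 0.0
        out.append(math.exp(tgt_mean + z * tgt_std))
    return out
```

The rises, falls, and emphasis of the reference performance survive the mapping, while the overall pitch level and spread follow the target voice. A production system would also need to carry over timing, energy, and timbre, which this sketch ignores.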

Caption: Style transfer enabled creatives to get an accurate delivery with Andy Warhol’s AI Voice.

The future of entertainment

“Generative audio is one of the most incredible and underutilized areas of AI. It has the ability to change the way that we create and interact with content,” says Resemble AI CEO Zohaib Ahmed. “It opens up new possibilities for entertainment and storytelling. Resemble’s Voice AI platform makes it possible to create entire movies, TV shows, and video games with AI-generated voices. This will allow for more creative freedom and new forms of expression, all rooted in consent and transparency.”

Resemble AI’s technology is being used by some of the largest media companies in the world to create content that was previously impossible. Whether it’s transferring a voice into dozens of other languages, creating thousands of dynamic personalized messages from celebrities, or creating unique real-time conversational agents, Resemble AI is changing how content is created.

With Resemble AI, creating engaging, high-quality voice content is now easier than ever. Content creators can add a whole new level of authenticity to their work, and audiences get a new level of immersion.

A guest post by Resemble AI.

Ubiquity Ventures — led by Sunil Nagaraj — is a seed-stage venture capital firm managing $100 million with a focus on startups transforming real-world physical problems into problems solved with "software beyond the screen". Ubiquity's portfolio includes B2B technology companies that utilize smart hardware or machine learning to solve business problems outside the reach of computers and smartphones.