Audio Generation

Summary: XBRUSH's audio generation provides four features: text-to-speech (TTS), AI music composition, sound effects, and lip-sync. Depending on the feature, the input is text, a video, or an image plus an audio track.

What is Audio Generation?

Audio generation is the XBRUSH feature that uses AI to create voice, music, sound effects, and lip-synced video. It has four sub-features: TTS, which reads text aloud; music composition in the genre you describe; automatic background sound generation for videos; and lip-sync, which combines an image with audio.

Audio Generation Overview

Figure: Audio generation overview, showing the key features and usage of the audio generation function.

Select the Audio Generation tab on the generation screen.
Depending on the feature, you may need to upload an image, audio, or video file.
Four sub-features are available: Voice / Music / Sound Effects / Lip-sync.
Voice reads text aloud.
Music composes a piece of music based on a text description.
Sound Effects generates background music that matches an uploaded video.
Lip-sync combines an image with a given audio track to create a video with synchronized lip movements.
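
The inputs each sub-feature expects, as described above, can be summarized in a small lookup table. The following is only an illustrative sketch: the names `SubFeature`, `SUB_FEATURES`, and `missing_inputs` are hypothetical and are not part of any XBRUSH API. It simply encodes the list above so you can check what to prepare before opening the tab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubFeature:
    """One audio generation sub-feature (hypothetical model, not an XBRUSH type)."""
    name: str
    inputs: tuple[str, ...]  # what the user supplies
    output: str              # what the feature produces


# Inputs and outputs taken from the feature descriptions in this guide.
SUB_FEATURES = {
    "voice": SubFeature("Voice", ("text",), "speech audio"),
    "music": SubFeature("Music", ("text",), "music"),
    "sound_effects": SubFeature("Sound Effects", ("video",), "background music"),
    "lip_sync": SubFeature("Lip-sync", ("image", "audio"), "lip-synced video"),
}


def missing_inputs(feature_key: str, provided: set[str]) -> list[str]:
    """Return the required input kinds that have not been provided yet."""
    required = SUB_FEATURES[feature_key].inputs
    return [kind for kind in required if kind not in provided]


print(missing_inputs("lip_sync", {"image"}))  # ['audio']
```
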

Voice (Text To Speech)

Figure: Voice generation, showing the key features and usage of the text-to-speech function.

Converts text to speech.
Multiple options are available, including male/female voices and voice tone.
Enter the text you want read aloud and click Generate.

Music

Figure: Music generation, showing the key features and usage of the music generation function.

Generates music from a text description.
Enter your desired genre, instruments, and mood as text, then generate.
For example, entering "Compose a fingerstyle piece for classical guitar with a melody" will produce a guitar instrumental.
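
If you write many music descriptions, it can help to assemble them from the three parts named above (genre, instruments, mood). The helper below is a minimal sketch for drafting the text you paste into the Music field. `build_music_prompt` is a hypothetical name, and the sentence template is an assumption, not a format XBRUSH requires.

```python
def build_music_prompt(genre: str, instruments: list[str], mood: str) -> str:
    """Assemble a one-sentence music description from its three parts."""
    if not genre or not instruments or not mood:
        raise ValueError("genre, instruments, and mood are all required")
    # Join multiple instruments into a readable phrase.
    instrument_text = " and ".join(instruments)
    return f"Compose a {mood} {genre} piece for {instrument_text}."


prompt = build_music_prompt("fingerstyle", ["classical guitar"], "calm")
print(prompt)  # Compose a calm fingerstyle piece for classical guitar.
```
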

Sound Effects

Figure: Sound effect generation, showing the key features and usage of the sound effect generation function.

Generates background music for a video.
Upload the video you want background music for, then click Generate.
The system creates background music that fits the uploaded video.

Lip-sync

Figure: Lip-sync generation, showing the key features and usage of the lip-sync generation function.

Combines audio with an image to create a synchronized video.
Upload an image and add the desired voice audio to generate a lip-synced video.

Next Steps

Video Generation

Editor — Create presentations using images and videos

Publish — Review visibility settings, content review, and important notes