SpeechCraft: AI-Powered Voice Conversion

Generate speech in a specific voice by providing text and a reference audio sample!

Overview

SpeechCraft is an AI tool that converts written text into spoken audio that mimics the voice characteristics of a reference audio sample. This is useful for creating voiceovers, narration, or dialogue in a consistent voice without recording the voice actor for every line.


Text Input

Provide the text you want to convert to speech.

Reference Audio

Upload a reference audio file, or provide a URL to one, containing the voice you want to use.

AI Voice Cloning

Generate new speech in the style and tone of the reference voice.

Customizable Options

Choose the model type and optionally remove silence for cleaner output.

How to Use SpeechCraft

Follow these steps to convert text to speech using a reference voice:

Enter Text to Convert

Input the text you want to convert into speech in the main text area. The estimated credit cost will update as you type.

Provide Reference Audio

Provide an audio sample of the voice you want to use, either as an uploaded file or as a URL:

  • Upload Reference Audio: Click Choose File to upload an audio file from your device. Any audio format (audio/*) is accepted, up to a maximum file size of 10MB.
  • Reference Audio URL: Enter the URL of an existing audio file online.
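The reference audio rules above can be checked before you submit. The sketch below is a hypothetical client-side helper, not part of SpeechCraft. The function name `check_reference_audio`, preferring the file when both inputs are given, requiring the URL to start with http:// or https://, and reading "10MB" as 10 * 1024 * 1024 bytes are all assumptions.

```python
import mimetypes
import os
from typing import Optional

# Assumption: "10MB" means 10 * 1024 * 1024 bytes; the service may use 10,000,000.
MAX_REFERENCE_BYTES = 10 * 1024 * 1024


def check_reference_audio(path: Optional[str] = None, url: Optional[str] = None) -> str:
    """Return "file" or "url" for the source that would be used, or raise ValueError.

    Hypothetical helper: one source is required, an uploaded file must have an
    audio/* type and fit within the size limit. If both are given, the file wins
    (an assumption, not documented behavior).
    """
    if path is None and url is None:
        raise ValueError("provide a reference audio file or a reference audio URL")
    if path is not None:
        mime, _ = mimetypes.guess_type(path)  # guesses from the file extension
        if mime is None or not mime.startswith("audio/"):
            raise ValueError(f"{path!r} does not look like an audio/* file")
        if os.path.getsize(path) > MAX_REFERENCE_BYTES:
            raise ValueError(f"{path!r} is larger than 10MB")
        return "file"
    # Assumed rule: only web URLs are accepted.
    if not url.startswith(("http://", "https://")):
        raise ValueError("reference audio URL must start with http:// or https://")
    return "url"
```

For example, `check_reference_audio(url="https://example.com/voice.mp3")` returns `"url"`, while calling it with neither argument raises `ValueError`.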

Enter Reference Text (Optional)

In the Reference Text field, you can optionally enter the text that is spoken in the reference audio. Providing this can sometimes help the AI better understand and replicate the voice.

Select Model Type

Choose the Model Type from the dropdown menu:

  • F5-TTS: A general-purpose TTS model.
  • E2-TTS: An alternative TTS model.

Experiment to see which model works best for your specific reference audio and text.

Remove Silence (Optional)

Check the Remove Silence box if you want the AI to automatically detect and remove periods of silence from the generated audio.
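SpeechCraft does not document how it detects silence. To illustrate the general idea only, the sketch below uses a common energy-threshold approach: it splits the audio into short frames and drops any frame whose RMS level falls below a threshold. The function `remove_silence`, the 20 ms frame length, and the -40 dBFS threshold are illustrative assumptions. The input is assumed to be mono float samples in [-1.0, 1.0].

```python
import math
from typing import List


def remove_silence(samples: List[float], sample_rate: int,
                   frame_ms: int = 20, threshold_db: float = -40.0) -> List[float]:
    """Drop frames whose RMS level is below threshold_db (dBFS).

    Generic energy-threshold technique, not SpeechCraft's actual algorithm.
    Assumes mono float samples in [-1.0, 1.0].
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))  # samples per frame
    threshold = 10 ** (threshold_db / 20)  # convert dBFS to linear amplitude
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:  # keep frames loud enough to count as sound
            kept.extend(frame)
    return kept
```

Real implementations usually keep a short pad of audio around speech so that word onsets and endings are not clipped. This sketch drops quiet frames outright.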

Convert to Speech

Make sure you have enough credits, then click the Convert to Speech button. The AI will process your text and reference audio to generate the new speech audio.

Input Parameters and Options

SpeechCraft requires text and a reference audio source (an uploaded file or a URL), plus the following adjustable parameters:

Text to convert
string
required

The written content you want to convert into spoken audio.

Upload Reference Audio
file

Upload an audio file containing the voice you want to replicate. Max size: 10MB.

Reference Audio URL
string
required if no file is uploaded

Enter the URL of an audio file containing the voice you want to replicate.

Reference Text
string

(Optional) The text spoken in the reference audio.

Model Type
enum

Select the AI model for speech synthesis (F5-TTS or E2-TTS).

Remove Silence
boolean

Toggle to automatically remove silent portions from the output audio.
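For scripting or integration work, the parameter list above can be modeled as a typed request object. The guide describes only the form fields, so the class names, field names, and default values below are hypothetical. The validation encodes the two rules the guide does state: the text is required, and a reference audio source (file or URL) must be supplied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelType(Enum):
    # The two model options offered in the Model Type dropdown.
    F5_TTS = "F5-TTS"
    E2_TTS = "E2-TTS"


@dataclass
class SpeechCraftRequest:
    # Hypothetical field names; the guide documents only the form labels.
    text: str
    reference_audio_file: Optional[str] = None  # local path of an uploaded file
    reference_audio_url: Optional[str] = None
    reference_text: Optional[str] = None        # optional transcript of the reference audio
    model_type: ModelType = ModelType.F5_TTS    # assumed default
    remove_silence: bool = False                # assumed default

    def validate(self) -> None:
        """Raise ValueError if a documented requirement is not met."""
        if not self.text.strip():
            raise ValueError("text to convert is required")
        if self.reference_audio_file is None and self.reference_audio_url is None:
            raise ValueError("a reference audio file or URL is required")
```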

Generated Audio History

SpeechCraft keeps a history of your generated audio conversions.

Credits

Converting text to speech with SpeechCraft costs 1 credit per 100 characters of the text you enter in the main text area. The estimated cost is displayed below the text input field.
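The credit rule comes down to simple arithmetic. The sketch below assumes two things the guide does not state: a partial block of 100 characters is rounded up to a full credit (so 150 characters cost 2 credits), and every character, including spaces and punctuation, counts. `estimate_credits` and `has_enough_credits` are hypothetical helper names. Always check against the estimate shown in the interface.

```python
import math


def estimate_credits(text: str, chars_per_credit: int = 100) -> int:
    """Estimate credits at 1 credit per 100 characters.

    Assumptions: partial blocks round up, and all characters count.
    """
    return math.ceil(len(text) / chars_per_credit)


def has_enough_credits(balance: int, text: str) -> bool:
    """Return True if the balance covers the estimated cost of converting text."""
    return balance >= estimate_credits(text)
```

Under this rounding assumption, a 250-character script would be estimated at 3 credits.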

Your current credit balance is displayed at the top left of the interface. Click the Buy More button to purchase additional credits if needed.

Tips for Best Results

High-Quality Reference Audio

Use a clear, high-fidelity audio sample with minimal background noise for the best voice replication.

Accurate Reference Text (Optional)

If providing reference text, ensure it accurately matches the audio sample.

Clear Text to Convert

Ensure the text you want to convert is free of typos and complex formatting.
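One way to follow this tip is to normalize text before pasting it in. The sketch below makes an assumption about what "complex formatting" means: it strips common Markdown symbols (asterisks, underscores, hash signs, and backticks) and collapses runs of spaces and line breaks into single spaces. `clean_text` is a hypothetical helper, not part of SpeechCraft, and it does not catch typos.

```python
import re


def clean_text(text: str) -> str:
    """Strip common Markdown symbols and collapse whitespace.

    The set of removed characters is an assumption about what counts as
    formatting; adjust it for your own text.
    """
    text = re.sub(r"[*_#`]", "", text)  # drop Markdown-style markup symbols
    text = re.sub(r"\s+", " ", text)    # collapse runs of spaces and newlines
    return text.strip()
```

For example, `clean_text("# Title\n\n**Hello**   world")` returns `"Title Hello world"`. Review the result before converting, since these characters are removed everywhere, including where they are intentional.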

Experiment with Models

Try both F5-TTS and E2-TTS models to see which provides a better result for your specific voice and text.

Troubleshooting

If you encounter issues with SpeechCraft, consider these solutions:

Conclusion

SpeechCraft provides a powerful and accessible way to generate speech in a desired voice using AI. By providing clear text and a quality reference audio sample, you can create custom voiceovers and audio content with ease.