SpeechCraft: AI-Powered Voice Conversion

Generate speech in a specific voice by providing text and a reference audio sample!

Overview

SpeechCraft is an AI tool that allows you to convert written text into spoken audio, mimicking the voice characteristics from a reference audio sample. This is useful for creating voiceovers, narration, or dialogue in a consistent voice without needing to record the voice actor for every line.

Text Input

Provide the text you want to convert to speech.

Reference Audio

Upload a reference audio file, or provide the URL of one, containing the voice you want to use.

AI Voice Cloning

Generate new speech in the style and tone of the reference voice.

Customizable Options

Select the model type and optionally remove silence from the generated audio.

How to Use SpeechCraft

Follow these steps to convert text to speech using a reference voice:

Enter Text to Convert

Input the text you want to convert into speech in the main text area. The estimated credit cost will update as you type.

Provide Reference Audio

Provide the audio sample of the voice you want to use:

Upload Reference Audio: Click Choose File to upload an audio file from your device (audio/* formats supported). Maximum file size is 10MB.
Reference Audio URL: Enter the URL of an existing audio file online.

Enter Reference Text (Optional)

In the Reference Text field, you can optionally enter the text that is spoken in the reference audio. Providing this can sometimes help the AI better understand and replicate the voice.

Select Model Type

Choose the Model Type from the dropdown menu:

F5-TTS: A general-purpose TTS model.
E2-TTS: An alternative TTS model.

Experiment to see which model works best for your specific reference audio and text.

Remove Silence (Optional)

Check the Remove Silence box if you want the AI to automatically detect and remove periods of silence from the generated audio.

Convert to Speech

Make sure you have enough credits, then click the Convert to Speech button. The AI processes your text and reference audio to generate the new speech audio.

Input Parameters and Options

SpeechCraft requires text and a reference audio source, along with adjustable parameters:

Text to convert

string

required

The written content you want to convert into spoken audio.

Upload Reference Audio

file

Upload an audio file containing the voice you want to replicate. Max size: 10MB.

Reference Audio URL

string

required if no file is uploaded

Enter the URL of an audio file containing the voice you want to replicate.

Reference Text

string

(Optional) The text spoken in the reference audio.

Model Type

enum

Select the AI model for speech synthesis (F5-TTS or E2-TTS).

Remove Silence

boolean

Toggle to automatically remove silent portions from the output audio.
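
The parameters above can be modeled as a small data structure to check inputs before submitting the form. This is a minimal sketch, not an official SpeechCraft API: the class name, field names, and the default model are hypothetical, chosen only to mirror the fields listed in this section.

```python
from dataclasses import dataclass
from typing import Optional

MODEL_TYPES = ("F5-TTS", "E2-TTS")  # the two options named in this guide


@dataclass
class SpeechCraftRequest:
    """Hypothetical container mirroring the form fields; not an official type."""
    text: str
    reference_audio_path: Optional[str] = None  # local file to upload
    reference_audio_url: Optional[str] = None   # alternative to uploading
    reference_text: Optional[str] = None        # optional transcript of the reference
    model_type: str = "F5-TTS"                  # default picked for illustration only
    remove_silence: bool = False

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the inputs look usable."""
        problems = []
        if not self.text.strip():
            problems.append("text to convert is required")
        if not self.reference_audio_path and not self.reference_audio_url:
            problems.append("provide a reference audio file or a URL")
        if self.model_type not in MODEL_TYPES:
            problems.append("model type must be F5-TTS or E2-TTS")
        return problems
```

For example, `SpeechCraftRequest(text="Hello", reference_audio_url="https://example.com/voice.wav").validate()` returns an empty list, while a request with blank text and no reference audio reports each missing piece.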

Generated Audio History

SpeechCraft keeps a history of your generated audio conversions.

View History

Your generated audio will appear in the Generated Audio History section below the main interface.

Playback

Listen to your generated audio directly in the history table.

Download

Download individual audio files (WAV format).

Clear All

Clear all items from your history.

Credits

Converting text to speech with SpeechCraft costs 1 credit per 100 characters of the text you enter in the main text area. The estimated cost is displayed below the text input field.

Your current credit balance is displayed at the top left of the interface. Click the Buy More button to purchase additional credits if needed.
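
The pricing rule can be sketched as a short calculation. The rate of 1 credit per 100 characters comes from this section; rounding partial blocks up to a whole credit, and counting every character including spaces, are assumptions. The estimate shown below the text input field remains the authoritative figure.

```python
import math

CHARS_PER_CREDIT = 100  # rate stated in this guide


def estimate_credits(text: str) -> int:
    """Estimate the credit cost of converting `text`.

    Assumption: a partial block of 100 characters is billed as a full credit.
    """
    return math.ceil(len(text) / CHARS_PER_CREDIT)
```

Under these assumptions, a 250-character script would be estimated at 3 credits.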

Tips for Best Results

High-Quality Reference Audio

Use a clear, high-fidelity audio sample with minimal background noise for the best voice replication.

Accurate Reference Text (Optional)

If providing reference text, ensure it accurately matches the audio sample.

Clear Text to Convert

Ensure the text you want to convert is free of typos and complex formatting.

Experiment with Models

Try both F5-TTS and E2-TTS models to see which provides a better result for your specific voice and text.
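
As an illustration of the "Clear Text to Convert" tip, the sketch below strips some common formatting from text before it is pasted into the text area. What counts as "complex formatting" is not defined in this guide, so the patterns here (HTML-like tags, a few Markdown markers, repeated whitespace) are assumptions, and `simplify_text` is a hypothetical helper rather than part of SpeechCraft. It does not correct typos.

```python
import re


def simplify_text(text: str) -> str:
    """Reduce text to plain sentences separated by single spaces."""
    text = re.sub(r"<[^>]+>", " ", text)   # replace HTML-like tags with a space
    text = re.sub(r"[*_`#>]+", "", text)   # drop common Markdown markers
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace and newlines
```

Note that removing underscores also joins identifiers such as `snake_case` into one word, so review the result before converting.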

Troubleshooting

If you encounter issues with SpeechCraft, consider these solutions:

Generated Voice Doesn't Match Reference

Ensure your reference audio is clear and the voice is prominent. Try a different reference audio sample. Experiment with the Model Type.

Processing Errors

If you are uploading a file, ensure it is in a supported audio format and under the 10MB size limit. Check your internet connection.

Audio Quality Issues

Try a different reference audio sample. Experiment with the Model Type.
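
The file checks mentioned under Processing Errors can be run locally before uploading. This is a rough sketch under stated assumptions: the format is guessed from the file extension with Python's standard `mimetypes` module, "10MB" is read as 10 × 1024 × 1024 bytes, and `check_reference_file` is a hypothetical helper, not something SpeechCraft provides.

```python
import mimetypes

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB


def check_reference_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of likely upload problems for a reference audio file."""
    problems = []
    mime, _ = mimetypes.guess_type(filename)  # guess from the extension only
    if mime is None or not mime.startswith("audio/"):
        problems.append("file does not look like an audio/* format")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than the 10MB limit")
    return problems
```

The size can be obtained with `os.path.getsize(path)`. An extension-based guess does not confirm that the file contents are valid audio, so a file that passes may still be rejected.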

Conclusion

SpeechCraft provides a powerful and accessible way to generate speech in a desired voice using AI. By providing clear text and a quality reference audio sample, you can create custom voiceovers and audio content with ease.
