Convert text to speech using a reference audio voice with advanced AI.
Generate speech in a specific voice by providing text and a reference audio sample!
SpeechCraft is an AI tool that allows you to convert written text into spoken audio, mimicking the voice characteristics from a reference audio sample. This is useful for creating voiceovers, narration, or dialogue in a consistent voice without needing to record the voice actor for every line.
Provide the text you want to convert to speech.
Upload a reference audio file or URL containing the voice you want to use.
Generate new speech in the style and tone of the reference voice.
Choose a model type and optionally remove silence to refine the output.
Follow these steps to convert text to speech using a reference voice:
Enter Text to Convert
Input the text you want to convert into speech in the main text area. The estimated credit cost will update as you type.
Provide Reference Audio
Provide an audio sample of the voice you want to use, either by uploading an audio file (max size: 10MB) or by entering the URL of an audio file.
Enter Reference Text (Optional)
In the Reference Text field, you can optionally enter the text that is spoken in the reference audio. Providing this can sometimes help the AI better understand and replicate the voice.
Select Model Type
Choose the Model Type from the dropdown menu: F5-TTS or E2-TTS.
Remove Silence (Optional)
Check the Remove Silence box if you want the AI to automatically detect and remove periods of silence from the generated audio.
Convert to Speech
Make sure you have enough credits, then click the Convert to Speech button. The AI will process your text and reference audio to generate the new speech audio.
SpeechCraft requires text and a reference audio source, along with adjustable parameters:
The written content you want to convert into spoken audio.
Upload an audio file containing the voice you want to replicate. Max size: 10MB.
Enter the URL of an audio file containing the voice you want to replicate.
(Optional) The text spoken in the reference audio.
Select the AI model for speech synthesis (F5-TTS or E2-TTS).
Toggle to automatically remove silent portions from the output audio.
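To show how these inputs fit together, here is a minimal Python sketch that checks a set of inputs before conversion. SpeechCraft is used through its web interface and this page does not describe a programmatic API, so every name in the sketch (SpeechRequest, validate, and the field names) is hypothetical. It also assumes that "10MB" means 10 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the 10MB upload limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
# The two model types named in this guide.
MODEL_TYPES = ("F5-TTS", "E2-TTS")


@dataclass
class SpeechRequest:
    """Hypothetical container mirroring the inputs listed above."""
    text: str                                  # text to convert to speech
    model_type: str                            # "F5-TTS" or "E2-TTS"
    reference_file_size: Optional[int] = None  # uploaded file size in bytes, if any
    reference_url: Optional[str] = None        # URL of the reference audio, if any
    reference_text: str = ""                   # optional transcript of the reference audio
    remove_silence: bool = False               # strip silent portions from the output


def validate(req: SpeechRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs look fine."""
    problems = []
    if not req.text.strip():
        problems.append("text is empty")
    if req.reference_file_size is None and not req.reference_url:
        problems.append("no reference audio (file or URL) provided")
    if req.reference_file_size is not None and req.reference_file_size > MAX_UPLOAD_BYTES:
        problems.append("reference file exceeds the 10MB limit")
    if req.model_type not in MODEL_TYPES:
        problems.append(f"unknown model type: {req.model_type}")
    return problems
```

For example, a request with text, a reference URL, and model type "E2-TTS" passes with no problems. A request with blank text and a file over the limit reports both problems.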
SpeechCraft keeps a history of your generated audio conversions.
View History
Your generated audio will appear in the Generated Audio History section below the main interface.
Playback
Listen to your generated audio directly in the history table.
Download
Download individual audio files (WAV format).
Clear All
Clear all items from your history.
Converting text to speech with SpeechCraft costs 1 credit per 100 characters of the text you enter in the main text area. The estimated cost is displayed below the text input field.
Your current credit balance is displayed at the top left of the interface. Click the Buy More button to purchase additional credits if needed.
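The pricing rule of 1 credit per 100 characters can be worked through with a short Python sketch. This guide does not say how partial blocks of 100 characters are charged. The sketch assumes they are rounded up, so treat the estimate shown in the interface as authoritative. The function name estimate_credits is hypothetical.

```python
import math

CHARS_PER_CREDIT = 100  # from the pricing rule: 1 credit per 100 characters


def estimate_credits(text: str) -> int:
    """Estimate the credit cost of converting `text`.

    Assumption: a partial block of 100 characters is rounded up to a full credit.
    """
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_CREDIT)
```

Under this assumption, a 250-character script would be estimated at 3 credits and a 100-character script at 1 credit.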
Use a clear, high-fidelity audio sample with minimal background noise for the best voice replication.
If providing reference text, ensure it accurately matches the audio sample.
Ensure the text you want to convert is free of typos and complex formatting.
Try both F5-TTS and E2-TTS models to see which provides a better result for your specific voice and text.
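If your script comes from a document or web page, one way to strip formatting before pasting it in is a small cleanup step like the Python sketch below. It is only an illustration and not part of SpeechCraft. The function name clean_text and the choice of characters to remove are assumptions, and it does not correct typos.

```python
import re


def clean_text(raw: str) -> str:
    """Strip common markup and collapse whitespace (illustrative only)."""
    # Replace HTML-like tags such as <b> or </p> with a space.
    text = re.sub(r"<[^>]+>", " ", raw)
    # Remove Markdown emphasis, heading, and code markers.
    # Note: this also removes underscores inside words such as snake_case.
    text = re.sub(r"[*_#`]+", "", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

The result is a single line of plain text that can be pasted into the main text area.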
If you encounter issues with SpeechCraft, consider these solutions:
Generated Voice Doesn't Match Reference
Ensure your reference audio is clear and the voice is prominent. Try a different reference audio sample. Experiment with the Model Type.
Processing Errors
If you are uploading a reference audio file, ensure it is in a supported format and under the 10MB size limit. Check your internet connection.
Audio Quality Issues
Try a different reference audio sample. Experiment with the Model Type.
SpeechCraft provides a powerful and accessible way to generate speech in a desired voice using AI. By providing clear text and a quality reference audio sample, you can create custom voiceovers and audio content with ease.