In this article we will discuss how to convert text to speech using Python.

Introduction

Text-to-speech (TTS) conversion, a form of speech synthesis, has become increasingly popular as programming communities have grown.

Several actively maintained Python libraries provide this functionality and continue to gain new features.

To follow this tutorial we will need one Python library: pyttsx3.

If you don’t have it installed, open “Command Prompt” (on Windows) or a terminal and install it with the following command:


pip install pyttsx3
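To confirm that the installation worked before going further, a small check like the one below can help. It is only a sketch: is_installed is a helper name made up for this illustration, and it relies on the standard importlib module rather than anything from pyttsx3.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported in this Python environment."""
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(module_name) is not None


print("pyttsx3 installed:", is_installed("pyttsx3"))
```

If this prints False, pip most likely installed the package into a different Python environment than the one running the script.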

Basic text to speech conversion using Python

The basic functionality of this library is simple to use. We import the library, initialize the speech engine, pass the text as a string, and run the text to speech conversion:


import pyttsx3

engine = pyttsx3.init()
engine.say('This is a test phrase.')
engine.runAndWait()

What you hear with the default settings depends on the voices installed on your system; in my case it was a female voice that pronounced the phrase quite fast. If you want to change the voice, the speech rate, or the volume, the library provides a lot of flexibility.

The engine instance we initialized has the .getProperty() method for reading the current settings and the .setProperty() method for changing them to the ones we want.

Now you can start to explore more features and learn more about how to convert text to speech using Python.
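Before moving on, it may help to see how .say() and .runAndWait() relate: .say() only queues a phrase, and .runAndWait() blocks until everything in the queue has been spoken. The sketch below wraps that pattern in a small helper. The name speak_all is made up for this illustration and is not part of pyttsx3.

```python
def speak_all(engine, phrases):
    """Queue every phrase, then play the whole queue in one go."""
    for phrase in phrases:
        engine.say(phrase)   # only adds the phrase to the engine's queue
    engine.runAndWait()      # blocks until all queued phrases are spoken
    return len(phrases)


# Usage with a real engine (requires pyttsx3 and an audio device):
#   import pyttsx3
#   speak_all(pyttsx3.init(), ["First phrase.", "Second phrase."])
```

Passing the engine in as an argument keeps the helper easy to reuse with an engine you have already configured.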


Changing voice

The engine instance we initialized in the previous section typically comes with two voices in the default configuration: a male voice and a female voice. The exact set depends on the voices installed on your operating system.

These can be retrieved by simply running:


voices = engine.getProperty('voices')

print(voices)

What you should get in return is a list of voice objects, printed as their memory locations. To try each of them, we run the basic text to speech code in a loop:


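for voice in voices:
    # switch to this voice, then queue the phrase to be spoken with it
    engine.setProperty('voice', voice.id)
    engine.say('This is a test phrase.')
# speak everything that was queued, one voice after another
engine.runAndWait()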

In my case, the male voice is stored in the list at index 0 and the female voice at index 1; the order may differ on your system.
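Printing the list itself shows little more than object references. Each voice object also carries descriptive attributes such as id and name, so a short listing makes it easier to tell which index is which. The helper below is a sketch; describe_voices is a name made up for this illustration.

```python
def describe_voices(voices):
    """Return one readable line per voice: its list index, name and id."""
    lines = []
    for index, voice in enumerate(voices):
        # fall back to "unknown" in case a driver leaves the name unset
        name = getattr(voice, "name", None) or "unknown"
        lines.append(f"{index}: {name} ({voice.id})")
    return lines


# Usage with a real engine:
#   for line in describe_voices(engine.getProperty('voices')):
#       print(line)
```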

To keep one voice for everything the engine says from now on, we use the .setProperty() method of the engine instance. It lets us specify which of the available voices the code should use.

Let’s say I want to switch the voice to the male one (remember, it’s at index 0):


engine.setProperty('voice', voices[0].id)

Now every phrase you run through this engine instance will use the male voice.
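Since the position of a voice in the list can vary between systems, selecting by name may be more robust than selecting by index. The sketch below assumes the voice names contain a recognizable keyword. Both pick_voice_id and use_voice are names made up for this illustration, and the keyword in the usage comment is a placeholder.

```python
def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name contains keyword, else None."""
    keyword = keyword.lower()
    for voice in voices:
        name = getattr(voice, "name", "") or ""
        if keyword in name.lower():
            return voice.id
    return None


def use_voice(engine, keyword):
    """Switch the engine to the matching voice; report whether one was found."""
    voice_id = pick_voice_id(engine.getProperty("voices"), keyword)
    if voice_id is None:
        return False   # no match: leave the current voice untouched
    engine.setProperty("voice", voice_id)
    return True


# Usage with a real engine ("david" is a placeholder; use a name from your system):
#   use_voice(engine, "david")
```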


Changing speech rate

After changing the voice, we may want to adjust how fast each phrase is spoken.

Using the .getProperty() method again, we will first find out what the current speech rate is:


rate = engine.getProperty('rate')
print(rate)

With the default settings the rate was 200 (measured in words per minute).

When I first listened to the engine I thought it was too fast, so I would like to decrease the rate to, say, 125 words per minute. Just as we did for the voice, we will use the .setProperty() method to change the speech rate and test it:


engine.setProperty('rate', 125)
engine.say('This is a test phrase.')
engine.runAndWait()

You should hear significantly slower speech that is more comfortable to listen to.

If instead you feel that the speech rate is too low, you can raise it and keep trying different values until you find one you are satisfied with.
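When experimenting with different values, it can be convenient to nudge the rate up or down relative to its current value instead of typing absolute numbers. The helper below is a sketch of that idea. The name change_rate and the bounds of 80 and 300 words per minute are assumptions chosen for this illustration, not limits defined by pyttsx3.

```python
def change_rate(engine, delta, lowest=80, highest=300):
    """Shift the speech rate by delta words per minute, kept within bounds."""
    current = engine.getProperty("rate")
    # clamp so repeated adjustments cannot drift to an unusable rate
    new_rate = max(lowest, min(highest, current + delta))
    engine.setProperty("rate", new_rate)
    return new_rate


# Usage with a real engine:
#   change_rate(engine, -75)   # e.g. from 200 down to 125
#   engine.say('This is a test phrase.')
#   engine.runAndWait()
```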


Changing volume

As with the speech rate, we can alter the volume of the voice we set.

Using the .getProperty() method again, we will first find out what the current volume is:


volume = engine.getProperty('volume')
print(volume)

With the default settings the volume was 1.0, which is the maximum; the valid range is between 0 and 1.

You can choose any value between 0 and 1 to hear how the volume changes. Just as we did for the speech rate, we will use the .setProperty() method to change the volume and test it:


engine.setProperty('volume', 0.5)
engine.say('This is a test phrase.')
engine.runAndWait()

Here we set the volume to half of what it was before; you should notice the difference when you listen to the test phrase.

This setting lets you match the volume to the context in which your text to speech output will be used.
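To compare several volume levels by ear, the same phrase can be queued once per level and then played back in a single run. The sketch below follows the same queue-then-run pattern as the voice loop earlier. The name volume_sweep and the default levels are assumptions made for this illustration.

```python
def volume_sweep(engine, text, levels=(1.0, 0.6, 0.3)):
    """Queue text once per volume level, then speak the whole queue."""
    for level in levels:
        # reject values outside the documented 0 to 1 range up front
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"volume must be between 0 and 1, got {level}")
    for level in levels:
        engine.setProperty("volume", level)   # applies to phrases queued after it
        engine.say(text)
    engine.runAndWait()
    return list(levels)


# Usage with a real engine:
#   volume_sweep(engine, 'This is a test phrase.')
```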


Save speech as mp3 file using Python

Another useful feature of this library is the ability to store our text to speech conversions as mp3 files that can be listened to later in any audio player.

The code is very simple and requires two things from the user: the text that will be converted to speech and the name for the output file:


engine.save_to_file('This is a test phrase.', 'test.mp3')
engine.runAndWait()

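The above code will save the output as test.mp3 in the current working directory, which is usually the folder you run your Python script from. You can of course change the destination by specifying a full path for the output file. Note that the audio encoding is chosen by the speech driver on your platform, so the file may not contain true MP3 data even when its name ends in .mp3.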
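If the output should go to a specific folder, the path can be prepared with pathlib before handing it to the engine. The helper below is a sketch; save_speech is a name made up for this illustration, and the folder in the usage comment is a placeholder.

```python
from pathlib import Path


def save_speech(engine, text, out_path):
    """Write text as speech to out_path, creating missing folders first."""
    target = Path(out_path).expanduser().resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    engine.save_to_file(text, str(target))
    engine.runAndWait()   # the file is written while the queue is processed
    return target


# Usage with a real engine ("audio/test.mp3" is a placeholder path):
#   saved = save_speech(engine, 'This is a test phrase.', 'audio/test.mp3')
#   print(saved)
```

Returning the resolved path makes it easy to print or log exactly where the file ended up.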


Conclusion

In this article we discussed how to convert text to speech using Python.

By working through this code, you should be able to convert full texts to speech with the required adjustments.

I also encourage you to check out my other posts on Python Programming.

You can learn more about the pyttsx3 library in its official documentation.

Feel free to leave comments below if you have any questions or have suggestions for some edits.