by Michael Ludvig

How to teach your projects to talk with AWS Polly

Photo by Jeff Sheldon on Unsplash

Voice output can take your project to a whole new level. Are you a maker building a new home automation tool or a professional developer working on a commercial gadget? Follow this tutorial to learn how to add a natural voice to your project with very little effort!

TTS = Text To Speech

Text To Speech systems are nothing new. The first time I heard a computer speak was sometime in the last century on my Commodore C64. It was a hardly intelligible, monotonic, robotic voice. Yet so exciting!!

Fast forward to 2019 and listen to the modern, state-of-the-art TTS systems. Alexa, Siri, Cortana and others all sound so natural! And they can all be easily mistaken for real human speakers. Wouldn’t it be nice to have your Raspberry Pi project talk to you like that?

Meet AWS Polly

Amazon’s cloud platform AWS offers many easy-to-use cloud-based solutions for various tasks: from database and computing services, through an IoT broker and various message queues, right up to ready-to-use image recognition. And, the topic of today’s article, the state-of-the-art Text To Speech service AWS Polly.

AWS Polly, the voice behind Amazon Alexa, at the moment supports 57 different voices across 19 languages. You can choose between male and female voices, children and adults, and different accents (G’day Australia!). For some extra experimentation, you can try having a Japanese voice say some English text, for example.

In this post we will create a simple Python 3 app with all the Text To Speech building blocks that you can then reuse in your project.

We are going to use a Raspberry Pi running Raspbian. That’s just to have some baseline platform: there is nothing Raspberry-specific in the code, and it will run just fine on any internet-connected device where you can get Python installed, be it Windows, Mac, or Linux.

Prerequisites

We can in general configure and consume AWS services in three different ways:

  • AWS Web console: the GUI
  • aws-cli: the command line client
  • AWS SDK: the Software Development Kit. For Python, it’s the boto3 library.

For starters, let’s install the latter two using pip3.

$ sudo pip3 install awscli boto3
Collecting awscli
[...]
Installing collected packages: docutils, pyasn1, rsa, urllib3, six, python-dateutil, jmespath, botocore, s3transfer, colorama, PyYAML, awscli, boto3
Successfully installed PyYAML-3.13 awscli-1.16.25 boto3-1.9.15 botocore-1.12.15 colorama-0.3.9 docutils-0.14 jmespath-0.9.3 pyasn1-0.4.4 python-dateutil-2.7.3 rsa-3.4.2 s3transfer-0.1.13 six-1.11.0 urllib3-1.23

Now we can verify that both aws-cli and boto3 work.

$ aws --version
aws-cli/1.16.25 Python/3.5.3 Linux/4.14.70-v7+ botocore/1.12.15
$ python3 -c "import boto3; print(boto3.__version__)"
1.9.15

We will also need the pygame library. It comes pre-installed in Raspbian but if you’re following this tutorial on some other platform you may need to install it using pip3 install pygame.

Testing audio output

As this tutorial is all about audio output, let’s test that pygame can actually play sounds. We’ll make use of one of the audio files from Scratch that comes with Raspbian, or you can play your own mp3, ogg, or wav file.

In your favourite text editor or in a Python IDE like Thonny open a new file audio_test.py and insert this code:
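
A minimal sketch of such a test script, assuming pygame is installed. The SOUND_FILE path is an assumption about where Raspbian keeps the Scratch sample sounds, and play_file is just an illustrative name; point the path at any mp3, ogg, or wav file you have.

```python
import os
import time

# Assumed location of a Scratch sample sound on Raspbian; adjust as needed.
SOUND_FILE = "/usr/share/scratch/Media/Sounds/Human/Laugh-male1.wav"


def play_file(path):
    """Play an audio file through pygame; return True if playback ran."""
    if not os.path.isfile(path):
        print("Audio file not found:", path)
        return False

    import pygame  # imported late so a missing file is reported first

    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    # play() returns immediately, so wait until the sound has finished.
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)
    return True


if __name__ == "__main__":
    play_file(SOUND_FILE)
```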

Save it and run it by pressing F5 in Thonny or from the command line with python3 audio_test.py. You should hear a man laughing.

It’s critical to get this working before we move on. No audio working = no Polly talking!

AWS Credentials

Amazon offers a free tier for most services so we can test them without paying a cent. With the Polly free tier we can convert up to 5 million characters per month in the first year of using the service. That should be plenty for most of us, as it’s roughly 5 days of continuous talking!
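
As a quick sanity check of that estimate, the short calculation below works out the speaking pace it implies. It uses only the two figures quoted above; the variable names are illustrative.

```python
FREE_TIER_CHARS = 5000000  # characters per month, as quoted above
DAYS_OF_TALKING = 5        # the rough estimate from the text

seconds = DAYS_OF_TALKING * 24 * 60 * 60
chars_per_second = FREE_TIER_CHARS / seconds

print("{} seconds of speech, about {:.1f} characters per second".format(
    seconds, chars_per_second))
```

So the estimate assumes a pace of somewhere around a dozen characters every second, which seems a plausible ballpark for spoken text.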

I want to keep this article focused on Polly, so please follow the steps in a side-post to create your AWS credentials. We will need them for the future demos.

AWS Polly credentials: Create credentials for AWS Polly (medium.com)

Before continuing make sure that the credentials are correctly configured.

$ aws polly describe-voices
{
    "Voices": [
        {
            "Gender": "Male",
            "Id": "Russell",
            "LanguageCode": "en-AU",
            "LanguageName": "Australian English",
            "Name": "Russell"
        },
        ... many more voices listed ...
    ]
}

If instead you see an error like this, go back to the Credentials article and double-check all the steps.

$ aws polly describe-voices
An error occurred (AccessDeniedException) when calling the DescribeVoices operation: User: arn:aws:iam::123456789012:user/polly is not authorized to perform: polly:DescribeVoices
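
The same check can be made from Python. This is only a sketch, assuming boto3 picks up the same credentials and region that aws-cli uses; the names hint_for and check_polly_access and the hint texts are made up for illustration.

```python
def hint_for(error_code):
    """Map an AWS error code to a short troubleshooting hint."""
    hints = {
        "AccessDeniedException":
            "the user lacks Polly permissions, review its IAM policy",
    }
    return hints.get(error_code, "double-check the credentials steps")


def check_polly_access():
    """Return (ok, message) describing whether Polly can be called."""
    try:
        import boto3
        from botocore.exceptions import BotoCoreError, ClientError
    except ImportError:
        return False, "boto3 is not installed"

    try:
        voices = boto3.client("polly").describe_voices()["Voices"]
    except ClientError as error:
        # AWS answered, but with an error such as AccessDeniedException.
        code = error.response["Error"]["Code"]
        return False, "{}: {}".format(code, hint_for(code))
    except BotoCoreError as error:
        # Raised by botocore itself, e.g. when no credentials or region
        # are configured or the endpoint cannot be reached.
        return False, "configuration problem: {}".format(error)
    return True, "Polly returned {} voices".format(len(voices))


if __name__ == "__main__":
    ok, message = check_polly_access()
    print("OK" if ok else "FAILED", "-", message)
```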

Hello Polly

With the access credentials in place and pygame audio working, we can finally get AWS Polly to say something.

Photo by Vladislav Klapin on Unsplash

The official AWS SDK (Software Development Kit) for Python is called boto3 and supports almost all AWS services, including Polly. It automatically handles authentication, request signing, response decoding and so on.

To synthesise speech through AWS Polly, we essentially need only one line of Python code. We’ll be calling polly.synthesize_speech() from boto3.

boto3.client('polly').synthesize_speech(
    OutputFormat='ogg_vorbis',
    VoiceId='Brian',
    Text='Hello, I am Polly! Even though I sound like Brian.')

Of course, to actually play the synthesised speech we will need a few more lines to initialise the pygame audio output. Save this code as audio_helper.py; we will use it later.
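
A minimal sketch of what such a helper could contain. The function names as_file_object and play_sound are assumptions, as is the idea of handing pygame an in-memory buffer rather than a file on disk; it relies on pygame.mixer.music.load() accepting file-like objects.

```python
import io
import time


def as_file_object(audio):
    """Wrap raw bytes, or the contents of a readable stream, in a buffer."""
    if isinstance(audio, (bytes, bytearray)):
        return io.BytesIO(audio)
    return io.BytesIO(audio.read())


def play_sound(audio):
    """Play audio given as bytes or as a stream such as Polly's AudioStream."""
    import pygame  # imported here so the module loads even without pygame

    if not pygame.mixer.get_init():
        pygame.mixer.init()
    pygame.mixer.music.load(as_file_object(audio))
    pygame.mixer.music.play()
    # Block until playback ends, otherwise the script may exit mid-sentence.
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)
```

With this in place, playing a Polly response would come down to something like play_sound(response['AudioStream']).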

With the boring audio stuff out of the way, the actual Polly-related code is a neat, short program. Save it as hello_polly.py.
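
A sketch of how hello_polly.py might look. It assumes audio_helper.py exposes a play_sound() function (a hypothetical name); if that import fails, the speech is saved to hello_polly.ogg instead. The build_request helper is also just an illustrative way to keep the Polly parameters in one place.

```python
def build_request(text, voice_id="Brian"):
    """Collect the keyword arguments for polly.synthesize_speech()."""
    return {
        "OutputFormat": "ogg_vorbis",
        "VoiceId": voice_id,
        "Text": text,
    }


def main():
    try:
        import boto3
        from botocore.exceptions import BotoCoreError, ClientError
    except ImportError:
        print("boto3 is not installed; try: pip3 install boto3")
        return

    request = build_request("Hello, I am Polly! Even though I sound like Brian.")
    try:
        response = boto3.client("polly").synthesize_speech(**request)
        audio = response["AudioStream"].read()
    except (BotoCoreError, ClientError) as error:
        print("Polly request failed:", error)
        return

    try:
        from audio_helper import play_sound  # assumed helper name
    except ImportError:
        # No helper available: keep the result so it can be played manually.
        with open("hello_polly.ogg", "wb") as output:
            output.write(audio)
        print("Saved speech to hello_polly.ogg")
    else:
        play_sound(audio)


if __name__ == "__main__":
    main()
```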

That’s it in a nutshell! Run it with python3 hello_polly.py and if the stars are aligned, audio unmuted, speakers connected, and AWS credentials valid, you should hear it speak.

Alternatively, listen to the output here: hello_polly.ogg

Advanced talking

Just like HTML enriches plain text with bold and italics, paragraphs, and images, SSML (Speech Synthesis Markup Language) introduces similar tags to create more engaging voice output by using different voices, changing tempo, pitch, volume, and so on.

To use SSML, simply wrap the text in <speak>…</speak> tags and add the TextType='ssml' parameter when calling synthesize_speech().

Let’s replace the plain text in hello_polly.py with a simple SSML text and save it as ssml_simple.py. Here are only the changed lines; the rest of the program remains the same.
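
A sketch of what those changes might look like. The SSML text, the choice of the Emma voice, and the helper names is_well_formed and build_ssml_request are assumptions for illustration; the break and prosody tags are taken from Polly's list of supported SSML tags. The small XML check is an extra that catches unbalanced tags locally before the text is sent to Polly.

```python
import xml.etree.ElementTree as ET

SSML_TEXT = (
    "<speak>"
    "Hello, I am Polly. "
    '<break time="500ms"/>'
    '<prosody rate="slow" volume="loud">I can speak slowly and loudly,</prosody> '
    '<prosody rate="fast" pitch="high">or quickly and in a higher voice.</prosody>'
    "</speak>"
)


def is_well_formed(ssml):
    """Cheap local check: the markup parses as XML with a <speak> root."""
    try:
        return ET.fromstring(ssml).tag == "speak"
    except ET.ParseError:
        return False


def build_ssml_request(ssml, voice_id="Emma"):
    """Same arguments as for plain text, plus TextType='ssml'."""
    if not is_well_formed(ssml):
        raise ValueError("SSML must be well-formed XML inside <speak> tags")
    return {
        "OutputFormat": "ogg_vorbis",
        "VoiceId": voice_id,
        "TextType": "ssml",
        "Text": ssml,
    }
```

The resulting dictionary can be passed on as boto3.client('polly').synthesize_speech(**build_ssml_request(SSML_TEXT)).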

The complete list of available SSML tags is documented on Amazon’s SSML Tags Supported by Amazon Polly page.

Listen to the output here: ssml_simple.ogg

I hear voices…

In the programs above, we only used the voices of Brian and Emma. Polly, however, knows many more voices that speak different languages with different accents: from English, German or French, through Japanese and Chinese, to the somewhat unexpected Icelandic or Romanian. That’s 19 languages in all, many with different accents, for example British, American, Australian and Indian English.

Photo by Nicholas Green on Unsplash

Listing the available voices is another simple call to the Polly API: polly.describe_voices().
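
A sketch of how that call could be wrapped. The helper names fetch_voices and group_by_language are illustrative, and following NextToken is a precaution in case the response arrives in more than one page; the dictionary keys are the ones shown in the describe-voices output earlier.

```python
from collections import defaultdict


def fetch_voices(polly):
    """Return all voices, following NextToken if the list is paginated."""
    voices = []
    kwargs = {}
    while True:
        page = polly.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        token = page.get("NextToken")
        if not token:
            return voices
        kwargs["NextToken"] = token


def group_by_language(voices):
    """Map each language name to the sorted names of its voices."""
    groups = defaultdict(list)
    for voice in voices:
        groups[voice["LanguageName"]].append(voice["Name"])
    return {language: sorted(names) for language, names in groups.items()}
```

Typical use would be group_by_language(fetch_voices(boto3.client('polly'))), which gives one entry per language with the voice names available for it.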

Once we receive the list of voices, we can get each voice to introduce itself. With a little bit of SSML we’ll make sure that the name is said in its native language, but the rest of the sentence is in English. Sometimes with a different accent! With SSML and different voices, we can create a truly multi-cultural experience.
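
One way to build such an introduction, as a sketch: the wording of the sentence, the function name intro_ssml, and the use of en-US for the English part are all assumptions. The lang tag switches the language for the English phrases, while the name is left outside it so the voice pronounces it natively.

```python
from xml.sax.saxutils import escape


def intro_ssml(voice, english="en-US"):
    """Build SSML: the name is spoken natively, the rest in English."""
    name = escape(voice["Name"])
    language = escape(voice["LanguageName"])
    if voice["LanguageCode"].startswith("en"):
        # English voices need no language switch and keep their own accent.
        template = "<speak>Hello, my name is {name}. I speak {language}.</speak>"
    else:
        template = (
            "<speak>"
            '<lang xml:lang="{english}">Hello, my name is</lang> '
            "{name}. "
            '<lang xml:lang="{english}">I speak {language}.</lang>'
            "</speak>"
        )
    return template.format(english=english, name=name, language=language)
```

Each result would then go to synthesize_speech() with VoiceId=voice['Id'] and TextType='ssml'.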

Listen to the output here: describe_voices.ogg

Now what?

Now it’s time to add voice to your projects. How about changing your Raspberry Pi based alarm clock from the boring beep beep beep to a personalised “Wake up Michael! Wake up, it’s 10AM already!!” Or how about upgrading your Twitter display to a Twitter reader? Or getting your door camera to welcome your visitors by name? Of course that would also need some face recognition. But don’t worry, we will get to that in one of the future articles.

At enterprise IT we can help you with AWS Polly integration into your commercial projects. Or with any other AWS services for that matter.

And by the way you can download all the code from this article from my AWS Polly GitHub repository and start playing now. Literally.

See you next time!