marstr Code Repo - Pilot2AWS/summary
 
description: Making Pilot2ATC sound more natural. With Amazon Polly.
last change: Sun, 21 Jul 2024 20:47:18 +0000 (22:47 +0200)
shortlog
2024-07-21 MarStr Added background noise and adjusted sample rates for... master
2024-07-09 MarStr Changes to repoinfo file
2024-07-09 MarStr Changes to repoinfo file
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Corrections to README
2024-07-09 MarStr Added README.md with details on the code.
2024-07-05 MarStr Initial commit of script to git repo
heads
6 months ago master
Description
Pilot2AWS
A Python script that lets you use the standard or neural voices of Amazon Polly in Pilot2ATC. It does so by monitoring Pilot2ATC's output and using Amazon Polly to generate voice responses that sound more natural.
Now with added immersion as the result sounds very much like an actual radio transmission, including background noise.
The voices that Pilot2ATC can use are those available on your system. On Windows this set is usually extremely limited, and on top of that the voices sound very robotic. Not what you'd want when talking to ATC in a flight. While there are solutions available that leverage AI (custom versions of ChatGPT), the costs are - in my opinion - too high. SayIntentions is my prime example: at 30 Dollars per month (at the time of this writing), it is not feasible for most.
I did have a look at ChatGPT itself for ATC, and other available tools. I came to the conclusion that Pilot2ATC is the best currently available. It does cost 55 Euros - but it is a one-off payment, you get to keep the tool forever.
Through this Python script, you can make Pilot2ATC sound more natural, providing better overall immersion.
Note: this works fine on Windows. Linux and Mac are untested - but I see no reason why it should not work on those two systems. It is Python after all.
Not entirely free!
IMPORTANT: Using this approach may end up not being free. You have a free allowance of 1 million characters per month, no matter which model you choose. After that, using the service will incur costs. How high the costs are depends on 1) the number of ATC calls you make (more calls = more characters) and 2) the selected model. Neural voices are much more natural, but they also require more power to generate - hence the higher costs.
To find out what is best for you, have a look at these pages first:
- Amazon Polly Pricing (Link)
- AWS Polly Cost Calculator (Link)
For example -
If you make 100 ATC calls per day with 150 characters on average, you end up with 3,000 calls per month and 450,000 characters in total. With the standard voice model, that comes to 1.80 USD per month for Standard Text-to-Speech, before any free allowance is applied.
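The arithmetic of that example can be sketched in a few lines of Python. The rate of 4.00 USD per million characters is simply derived from the figures above (1.80 USD for 450,000 characters); the function name and its parameters are made up for this illustration, and the pricing page remains the reference for current rates.

```python
# Rate per one million characters, derived from the example above
# (1.80 USD / 450,000 characters). Assumption: check the pricing page
# for the current rate of the model you use.
STANDARD_USD_PER_MILLION = 4.00


def monthly_polly_cost(calls_per_day, avg_chars_per_call,
                       usd_per_million=STANDARD_USD_PER_MILLION,
                       days_per_month=30, free_chars=0):
    """Return (characters per month, estimated cost in USD)."""
    chars = calls_per_day * days_per_month * avg_chars_per_call
    # Only characters beyond the free allowance are billed.
    billable = max(0, chars - free_chars)
    cost = billable / 1_000_000 * usd_per_million
    return chars, round(cost, 2)


chars, cost = monthly_polly_cost(100, 150)
print(f"{chars:,} characters per month, about {cost:.2f} USD")
```

This prints `450,000 characters per month, about 1.80 USD`. Passing `free_chars=1_000_000` shows the same usage while the free allowance still applies, which brings the estimate down to zero.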
However - I find AWS the best compromise between quality and cost. This is much better than 30 USD per month for SayIntentions.
Preparation
Needless to say, you will need an AWS account. If you do not have one, you will need to get one now. The account itself is free. Go here to create one: (Link)
Then, you will need to provide payment information; in my case it was a credit card. While you will not be charged if you stay within the free allowance, you will be charged if you exceed it. In our case with ATC responses it should, however, not break the bank.
Once this is done you will basically have access to AWS services. You can now log in with your specified email and password as the root user which has access to the AWS console.
Inside the console, go to the top right (your name), click on it, and choose Security Credentials.
In the middle you will find the section Access Keys. Click on Create Access Key and **WRITE DOWN** the key and secret somewhere, as the secret will not be shown again. You will also have the option to download a CSV with the credentials - you can do that too and save it to a safe and private location.
If you want to use AWS and its services elsewhere, you are of course free to install the AWS CLI tools. But for this script, it is not necessary.
Setup
You will need four Python modules: pygame, boto3, numpy and scipy. Install them like so:
pip install boto3
pip install pygame
pip install numpy
pip install scipy
Next, open up the Pilot2AWS.py script with your favorite editor and make the necessary adjustments as follows:
# -------------------------------------------------------------------
# Enter your access data and your AWS region
# -------------------------------------------------------------------
atc_aws_key = "YOUR_AWS_KEY"
atc_aws_secret = "YOUR_AWS_SECRET"
atc_aws_region = "SERVER REGION"
The first two are self-explanatory. These are the key and secret you have acquired in the previous step.
Most regions have access to Polly - selecting the correct region will accelerate the data transfer of audio data from AWS to your machine. Go to this (Link) and choose your region. For me that is eu-central-1.
If you want the script to show which voice it has selected and what the response is, leave
atc_show_responses = True
set to True, which is the default. Setting this to False disables this display per line spoken.
Next, you need to select your voice model. Default is standard.
atc_aws_voicemodel = 'standard'
Change this to 'neural' if you want to use the more sophisticated voice generation model. Again, keep in mind that this model is more expensive and can incur significantly higher costs should you exceed the free limits.
The next section defines where the text log that Pilot2ATC generates is located.
atc_pilot2atc_log = "C:\\Users\\windo\\Documents\\pilot2atclog.txt"
This is a default example I left in - you need to find the path of your file. This can also be on a network drive; what matters is that the Python script can access this location. Notice the double backslashes for the folder delimiters.
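The double backslashes are needed because a single backslash starts an escape sequence in a normal Python string. As a small illustration using the example path from above, a raw string or forward slashes describe the same file:

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows path.
escaped = "C:\\Users\\windo\\Documents\\pilot2atclog.txt"  # doubled backslashes
raw = r"C:\Users\windo\Documents\pilot2atclog.txt"         # raw string
forward = "C:/Users/windo/Documents/pilot2atclog.txt"      # forward slashes

# The escaped and raw spellings are the identical string.
assert escaped == raw
# Windows path semantics treat forward slashes as the same separators.
assert PureWindowsPath(forward) == PureWindowsPath(raw)

print(PureWindowsPath(raw).name)
```

Whichever spelling you prefer, the resulting path must match the file selected in Pilot2ATC.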
Enable text logging in Pilot2ATC
Logging to a text file is not enabled by default in the software. Start Pilot2ATC, and open the Config window. In the window, go to the Speech tab.
In the lower half, you will find a box titled Conversation Text File Path.
- Click on "Enable"
- Click on "New on Startup"
The second option ensures that the previous log is overwritten and ensures proper operation of this script.
Then, click on the three dots on the far right, and select a path and a file name. This must be identical to the line for the atc_pilot2atc_log setting above.
Turn off Windows/TTS voices from Pilot2ATC
Open the Config window again, should you have closed it already. For the best results, go to the Voices tab. There, reduce the volume of all voices to 0%. This is to prevent hearing the AWS voice and the Windows voice at the same time.
You may also mute the sound output of Pilot2ATC in your sound settings.
Finally: RUNNING
Once you have done all the above steps, open a command prompt and navigate to where the script is located. Then, simply run
python .\Pilot2AWS.py
You will be greeted with a small message, and the script begins to work immediately: it picks up the ATC messages it finds in the text file. Once an "update" is detected, the voice is generated and played.
The script can be on any machine you want; what matters is that the script has access to the text file Pilot2ATC generates.
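Detecting an "update" in a text file is commonly done by polling: remember how far the file has been read and fetch only what was appended since. The following is a minimal sketch of that idea, not the code from Pilot2AWS.py. The function names, the UTF-8 encoding and the one-message-per-line layout are assumptions made for this illustration.

```python
import os
import time


def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for data appended to path since offset."""
    try:
        with open(path, "rb") as handle:
            handle.seek(0, os.SEEK_END)
            size = handle.tell()
            if size < offset:
                offset = 0  # file is shorter than before, so it was recreated
            handle.seek(offset)
            data = handle.read()
    except OSError:
        return [], offset  # file missing or not readable yet
    # Assumption: the log is UTF-8 text with one message per line.
    text = data.decode("utf-8", errors="replace")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines, offset + len(data)


def watch(path, handle_line, interval=3.0, max_polls=None):
    """Poll path and pass every new line to handle_line."""
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            handle_line(line)
        polls += 1
        time.sleep(interval)


# Example (runs until interrupted):
# watch("C:\\Users\\windo\\Documents\\pilot2atclog.txt", print)
```

Reading in binary mode keeps the stored offset in bytes, so it can be compared directly with the file size when the log is recreated on startup.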
Detailed configuration
By default, the script picks a name at random from the defined voices for each model. A request is sent to AWS, and the current ATC sentence is synthesized and sent back as a binary stream.
If you want to use only one particular voice, you can either
- remove all other voices from the array for a voice model, or
- change the voice_to_use variable on line 104 to point to a particular array index for the voices, instead of the random number
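Both options can be pictured with a small sketch. The voice lists below are hypothetical (the arrays in Pilot2AWS.py may contain other names), `pick_voice` is a made-up helper, and `random.choice` stands in for the random index the script uses.

```python
import random

# Hypothetical voice lists for illustration only.
VOICES = {
    "standard": ["Joanna", "Matthew", "Brian", "Amy"],
    "neural": ["Joanna", "Matthew", "Brian", "Amy"],
}


def pick_voice(model, fixed_index=None, rng=random):
    """Return a voice for the given model, random unless fixed_index is set."""
    voices = VOICES[model]
    if fixed_index is not None:
        return voices[fixed_index]  # always use one particular voice
    return rng.choice(voices)       # different voice per call


print(pick_voice("standard"))                 # any voice from the list
print(pick_voice("standard", fixed_index=2))  # always the third entry
```

Reducing a list to a single entry has the same effect as a fixed index, since a random pick from one element always returns that element.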
The script waits 3 seconds before it enters its next processing loop. You can change the number in the last value to something else - however: a lower value performs loops in shorter intervals, but is also more resource-intensive. A higher value is less resource-intensive, but provides slower voice updates.
I find 3 seconds to be a good balance.
The sound format is OGG Vorbis. I recommend leaving it at that. It is the best compromise between quality and data transfer size.
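For orientation, this is roughly what a Polly request for OGG Vorbis audio looks like with boto3. It is a sketch under assumptions, not the script's own code: the helper names are invented here, and the optional 8 kHz sample rate is taken from the History notes rather than from the script itself.

```python
def make_polly_client(key, secret, region):
    """Create a Polly client from the three settings at the top of the script."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    return boto3.client(
        "polly",
        aws_access_key_id=key,
        aws_secret_access_key=secret,
        region_name=region,
    )


def synthesize(client, text, voice_id, engine="standard", sample_rate=None):
    """Return the synthesized speech for text as OGG Vorbis bytes."""
    params = {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,              # "standard" or "neural"
        "OutputFormat": "ogg_vorbis",
    }
    if sample_rate is not None:
        params["SampleRate"] = sample_rate  # passed as a string, e.g. "8000"
    response = client.synthesize_speech(**params)
    # AudioStream is a file-like object holding the binary audio data.
    return response["AudioStream"].read()


# Example (needs valid credentials and network access):
# client = make_polly_client(atc_aws_key, atc_aws_secret, atc_aws_region)
# audio = synthesize(client, "Cleared for takeoff", "Joanna", atc_aws_voicemodel)
```

Passing the client in as an argument keeps the request logic separate from the credentials and makes it easy to try out with a stand-in object.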
Future plans
I may need to figure out how to efficiently read X-Plane's dataref values so that I can further enhance realism. For example, only pick another voice if you have left a certain area or changed the type of contact. I will be looking into this at some point - for now I am happy with how this has turned out.
History
v1.02 - Implemented a mechanism that generates 8kHz white noise, and changed the voice generation to 8kHz as well. Implemented code to mix both sounds together
v1.01 - Updated loop mechanism for more efficiency and accuracy
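The noise mixing introduced in v1.02 can be illustrated with a dependency-free sketch. The script itself relies on numpy and scipy; the version below uses plain Python lists of samples in the range -1.0 to 1.0 and only shows the principle. The function names, the noise amplitude and the uniform noise distribution are assumptions made for this illustration.

```python
import random

SAMPLE_RATE = 8000  # Hz, the rate mentioned for v1.02


def white_noise(n, amplitude=0.05, seed=None):
    """Return n uniformly distributed noise samples in [-amplitude, amplitude]."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


def mix(voice, noise):
    """Add noise to voice sample by sample and clip to [-1.0, 1.0]."""
    mixed = []
    for i, sample in enumerate(voice):
        value = sample + noise[i % len(noise)]  # loop the noise if it is shorter
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed


# One second of silence plus noise, as a stand-in for a decoded voice clip.
voice = [0.0] * SAMPLE_RATE
result = mix(voice, white_noise(SAMPLE_RATE, seed=1))
print(len(result))
```

Both signals need the same sample rate before they are added, which is presumably why the voice is generated at 8kHz as well.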