From e7e8e5ec59174366e4a265acfb27060f9f4a86aa Mon Sep 17 00:00:00 2001
From: MarStr
Date: Tue, 9 Jul 2024 13:25:54 +0200
Subject: [PATCH] Added README.md with details on the code.

---
 README.md | 148 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..253de71
--- /dev/null
+++ b/README.md
@@ -0,0 +1,148 @@
+### Pilot2AWS
+
+A Python script that lets you use the standard or neural voices of Amazon Polly in Pilot2ATC. It monitors Pilot2ATC's output and uses Amazon Polly to generate voice responses that sound more natural.
+
+Pilot2ATC can only use the voices that are installed on your system, and on Windows this set is usually extremely limited. On top of that, they sound very robotic, which is not what you want when talking to ATC during a flight. While there are solutions available that leverage AI (custom versions of ChatGPT), the costs are - in my opinion - too high. SayIntentions is my prime example: at 30 dollars per month (at the time of this writing), it is not feasible for most.
+
+I did have a look at ChatGPT itself for ATC, and at other available tools. I came to the conclusion that Pilot2ATC is the best currently available. It does cost 55 euros - but it is a one-off payment, and you get to keep the tool forever.
+
+Through this Python script, you can make Pilot2ATC sound more natural, providing better overall immersion.
+
+Note: this works fine on Windows. Linux and Mac are untested - but I see no reason why it should not work on those two systems. It is Python after all.
+
+
+## Not entirely free!
+
+**IMPORTANT:** Using this approach may end up not being free. You have a free allowance of 1 million characters per month, no matter which model you choose. But after that, using the service will incur costs. How high the costs are depends on 1) the number of ATC calls you make.
+More calls = more characters. And 2) the selected model. Neural voices sound much more natural, but they also require more computing power to generate - hence the higher costs.
+
+To find out what is best for you, have a look at these pages first:
+- [Amazon Polly Pricing](https://aws.amazon.com/polly/pricing/)
+- [AWS Polly Cost Calculator](https://calculator.aws/#/createCalculator/polly)
+
+For example:
+
+If you make 100 ATC calls per day with 150 characters on average, you end up with 3,000 calls and 450,000 characters per 30-day month. With the standard voice model, that comes to **1.80 USD** per month (450,000 characters at 4 USD per 1 million characters, not counting the free allowance).
+
+However - I find AWS the best compromise between quality and cost. This is much better than 30 USD per month for SayIntentions.
+
+
+## Preparation
+
+Needless to say, you will need an AWS account. If you do not have one, you will need to get one now. The account itself is free. Go here to [create one](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html?nc2=h_ct&src=header_signup).
+
+Then, you will need to provide payment information; in my case it was a credit card. You will not be charged as long as you stay within the free allowance, but you will be charged if you exceed it. With ATC responses, however, it should not break the bank.
+
+Once this is done, you have access to AWS services. You can now log in with your email and password as the root user, which has access to the AWS console.
+
+Inside the console, go to the top right (your name), click on it, and choose Security Credentials.
+
+In the middle you will find the section **Access Keys**. Click on _Create Access Key_ and **WRITE DOWN** the key and secret somewhere, as the secret will not be shown again. You will also have the option to download a CSV with the credentials - you can do that too and save it to a safe and private location.
+
+If you want to use AWS and its services elsewhere, you are of course free to install the AWS CLI tools. But for this script, they are not necessary.
+
+
+## Setup
+
+You will need two Python modules: boto3 and pygame. Install them like so:
+
+```
+pip install boto3
+pip install pygame
+```
+
+Next, open up the Pilot2AWS.py script with your favorite editor and make the necessary adjustments as follows:
+
+
+```
+# -------------------------------------------------------------------
+# Enter your access data and your AWS region
+# -------------------------------------------------------------------
+atc_aws_key = "YOUR_AWS_KEY"
+atc_aws_secret = "YOUR_AWS_SECRET"
+atc_aws_region = "SERVER REGION"
+```
+
+The first two are self-explanatory. These are the key and secret you acquired in the previous step.
+
+Most regions have access to Polly - selecting a region close to you speeds up the transfer of audio data from AWS to your machine. Go [here](https://www.aws-services.info/polly.html) and choose your region. For me that is eu-central-1.
+
+The script can show which voice it has selected and what the response is. This is controlled by
+
+```
+atc_show_responses = True
+```
+
+The default is True. Setting this to False disables this display for each line spoken.
+
+Next, you need to select your voice model. Default is standard.
+
+```
+atc_aws_voicemodel = 'standard'
+```
+
+Change this to 'neural' if you want to use the more sophisticated voice generation model. Again, keep in mind that this model is more expensive and can incur significantly higher costs should you exceed the free limits.
+
+The next setting defines where the text log that Pilot2ATC generates is located.
+
+```
+atc_pilot2atc_log = "C:\\Users\\windo\\Documents\\pilot2atclog.txt"
+```
+
+This is a default example I left in - you need to find the path of your file. This can also be on a network drive - what matters is that the Python script can access this location.
+Notice the double backslashes for the folder delimiters.
+
+**Enable text logging in Pilot2ATC**
+
+Logging to a text file is not enabled by default in the software. Start Pilot2ATC, and open the Config window. In the window, go to the Speech tab.
+
+In the lower half, you will find a box titled _Conversation Text File Path_.
+
+- Click on "Enable"
+- Click on "New on Startup"
+
+The second option ensures that the previous log is overwritten on startup, which this script needs in order to work properly.
+
+Then, click on the three dots on the far right, and select a path and a file name. This must match the path in the atc_pilot2atc_log setting above.
+
+**Turn off Windows/TTS voices from Pilot2ATC**
+
+Open the Config window again, should you have closed it already. For the best results, go to the Voices tab. There, reduce the volume of all voices to 0%. This prevents you from hearing the AWS voice and the Windows voice at the same time.
+
+You may also mute the sound output of Pilot2ATC in your sound settings.
+
+
+## Finally: RUNNING
+
+Once you have done all the above steps, open a command prompt and navigate to where the script is located. Then, simply run
+
+```
+python .\Pilot2AWS.py
+```
+
+You will be greeted with a small message, and the script begins to work immediately: it picks up the ATC messages it finds in the text file. Once an "update" is detected, the voice is generated and played.
+
+The script can run on any machine you want, as long as it has access to the text file Pilot2ATC generates.
+
+
+## Detailed configuration
+
+By default, the script picks a voice name at random from the voices defined for the selected model. A request is sent to AWS, and the current ATC sentence is synthesized and sent back as a binary stream.
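
The random voice selection and the Polly request described above could look roughly like the sketch below. This is not the code from Pilot2AWS.py: the `VOICES` lists and the `pick_voice` and `synthesize` helpers are hypothetical names chosen for illustration, and the actual voice arrays in the script may differ. Only `boto3.client("polly")` and its `synthesize_speech` call are taken from boto3 itself.

```python
import random

# Hypothetical voice lists for illustration; the arrays in Pilot2AWS.py may differ.
VOICES = {
    "standard": ["Brian", "Amy", "Joey", "Joanna"],
    "neural": ["Brian", "Amy", "Matthew", "Joanna"],
}


def pick_voice(voice_model, rng=random):
    """Return a randomly chosen voice name for 'standard' or 'neural'."""
    voices = VOICES[voice_model]
    return voices[rng.randrange(len(voices))]


def synthesize(text, voice_model, key, secret, region):
    """Send one ATC sentence to Polly and return the OGG Vorbis audio as bytes."""
    # Imported here so that the voice selection above works without boto3.
    import boto3

    polly = boto3.client(
        "polly",
        aws_access_key_id=key,
        aws_secret_access_key=secret,
        region_name=region,
    )
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=pick_voice(voice_model),
        Engine=voice_model,  # 'standard' or 'neural'
        OutputFormat="ogg_vorbis",
    )
    # AudioStream is a streaming body; read() returns the binary audio data.
    return response["AudioStream"].read()
```

Calling `synthesize` needs valid credentials and network access, and every call counts against your character allowance. `pick_voice` can be tried on its own without an AWS account.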
+
+If you want to use only one particular voice, you can either
+
+- remove all other voices from the array for a voice model, or
+- change the voice_to_use variable on line 104 to point to a particular array index for the voices, instead of the random number
+
+The script waits 3 seconds before it enters its next processing loop. You can change the number in the last value to something else. Keep in mind, however, that a lower value runs the loop at shorter intervals but is also more resource-intensive, while a higher value is less resource-intensive but delivers voice updates more slowly.
+
+I find 3 seconds to be a good balance.
+
+The sound format is OGG Vorbis. I recommend leaving it at that. It is the best compromise between quality and data transfer size.
+
+
+## Future plans
+
+I may need to figure out how to efficiently read X-Plane's dataref values so that I can further enhance realism, for example by only picking another voice if you have left a certain area or changed the type of contact. I will be looking into this at some point - for now I am happy with how this has turned out.
+
+
+## History
+v1.01 - Updated loop mechanism for more efficiency and accuracy
-- 
2.30.2