As you can read in this article, I recently had some trouble with my email server and decided to outsource email administration to Amazon's Simple Email Service (SES).

The problem with that solution was that I had SES save new messages to an S3 bucket, and using the AWS Management Console to read files within S3 buckets gets old really fast.

So I decided to write a Bash script to automate the process of downloading, properly storing, and viewing new messages.

While I wrote this script for use on my Ubuntu Linux desktop, it wouldn't require too much fiddling to make it work on macOS or on a Windows 10 system through the Windows Subsystem for Linux.

Here's the complete script all in one piece. After you take a few moments to look it over, I'll walk you through it one step at a time.

#!/bin/bash
# Retrieve new messages from S3 and save to tmpemails/ directory:
aws s3 cp \
   --recursive \
   s3://bucket-name/ \
   /home/david/s3-emails/tmpemails/  \
   --profile myaccount

# Set location variables:
tmp_file_location=/home/david/s3-emails/tmpemails/*
base_location=/home/david/s3-emails/emails/

# Create new directory to store today's messages:
today=$(date +"%m_%d_%Y")
[[ -d ${base_location}/"$today" ]] || mkdir ${base_location}/"$today"

# Give the message files readable names:
for FILE in $tmp_file_location
do
   mv $FILE ${base_location}/${today}/email$(rand)
done

# Open new files in Gedit:
for NEWFILE in ${base_location}/${today}/*
do
   gedit $NEWFILE
done
The complete Bash script

We'll begin with the single command to download any messages currently residing in my S3 bucket (by the way, I've changed the names of the bucket and other filesystem and authentication details to protect my privacy).

aws s3 cp \
   --recursive \
   s3://bucket-name/ \
   /home/david/s3-emails/tmpemails/  \
   --profile myaccount

Of course, this will only work if you've already installed and configured the AWS CLI for your local system. Now's the time to do that if you haven't already.
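
If you haven't, and you're on Ubuntu like me, the setup looks something like this (the awscli package from the standard repositories is one easy route, and aws configure will prompt you for the access keys and default region to store under the profile):

sudo apt update
sudo apt install awscli
aws configure --profile myaccount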

The cp command stands for "copy," and --recursive tells the CLI to apply the operation to multiple objects rather than just one. s3://bucket-name/ points to my bucket (your bucket name will obviously be different), the /home/david... line is the absolute filesystem path to which I'd like the messages copied, and the --profile argument tells the CLI which of my multiple AWS accounts I'm referring to.

The next section sets two variables that will make it much easier for me to specify filesystem locations through the rest of the script.

tmp_file_location=/home/david/s3-emails/tmpemails/*
base_location=/home/david/s3-emails/emails/

Note how the value of the tmp_file_location variable ends with an asterisk. That's because I want to refer to the files within that directory, rather than the directory itself.
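
One side effect worth knowing about: if tmpemails/ happens to be empty when the script runs, Bash leaves the unmatched asterisk in place as literal text, and the mv further down will fail with a "No such file or directory" complaint. If that bothers you, Bash's nullglob option - a one-line addition near the top of the script, not something my version currently sets - makes an empty glob expand to nothing so the loop simply never runs:

# Optional: let an unmatched glob expand to nothing instead of itself
shopt -s nullglob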

I'll create a new permanent directory within the .../emails/ hierarchy to make it easier for me to find messages later. The name of this new directory will be the current date.

today=$(date +"%m_%d_%Y")
[[ -d ${base_location}/"$today" ]] || mkdir ${base_location}/"$today"

I first create a new shell variable named today that will be populated by the output of the date +"%m_%d_%Y" command. On its own, date outputs a full date/timestamp, but the format string that follows ("%m_%d_%Y") reduces that output to a simpler, more readable form.
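
If you're curious what that looks like, you can run the format string on its own; the output is just the month, day, and year separated by underscores (the date shown here is only an example):

$ date +"%m_%d_%Y"
02_27_2020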

I then test for the existence of a directory using that name - which would indicate that I've already received emails that day and, therefore, there's no need to recreate the directory. If no such directory exists (that's what the || handles), then mkdir will create it for me. If you don't run this test, mkdir will complain that the directory already exists each time you run the script again on the same day.
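
By the way, mkdir's -p flag accomplishes the same thing in a single step - it quietly does nothing when the directory already exists - so this one-liner would be an equivalent alternative to the test:

mkdir -p ${base_location}/"$today"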

Since Amazon SES gives ugly and unreadable names to each of the messages it drops into my S3 bucket, I'll now dynamically rename them while, at the same time, moving them over to their new home (in the dated directory I just created).

for FILE in $tmp_file_location
do
   mv $FILE ${base_location}/${today}/email$(rand)
done

The for...do...done loop will read each of the files in the directory represented by the $tmp_file_location variable and then move it to the directory I just created (represented by the $base_location variable in addition to the current value of $today).

As part of the same operation, I'll give it its new name, the string "email" followed by a random number generated by the rand command. You may need to install a random number generator: that'll be apt install rand on Ubuntu.
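
If you'd rather not install anything extra, Bash's built-in $RANDOM variable - which produces a number between 0 and 32767 - would do much the same job. Swapping it in would make the loop look like this sketch:

for FILE in $tmp_file_location
do
   mv $FILE ${base_location}/${today}/email$RANDOM
done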

An earlier version of the script created names differentiated by shorter, sequential numbers that were incremented using a count=1...count=$((count+1)) logic within the for loop. That worked fine as long as I didn't happen to receive more than one batch of messages on the same day. If I did, then the new messages would overwrite older files in that day's directory.

I guess it's mathematically possible that my rand command could assign overlapping numbers to two files but, given that the default range rand uses is between 1 and 32,576, that's a risk I'm willing to take.
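
And if even that small chance of a collision bothered me, mktemp could eliminate it by generating a guaranteed-unique filename for each message. A sketch of that variation:

for FILE in $tmp_file_location
do
   mv $FILE "$(mktemp ${base_location}/${today}/email.XXXXXX)"
done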

At this point, there should be files in the new directory with names like email3039, email25343, etc. for each of the new messages I was sent.

Running the tree command on my own system shows me that five messages were saved to my 02_27_2020 directory, and one more to 02_28_2020 (these files were generated using the older version of my script, so they're numbered sequentially).

There are currently no files in tmpemails - that's because the mv command moves files to their new location, leaving nothing behind.

$ tree
.
├── emails
│   ├── 02_27_2020
│   │   ├── email1
│   │   ├── email2
│   │   ├── email3
│   │   ├── email4
│   │   └── email5
│   └── 02_28_2020
│       └── email1
└── tmpemails

The final section of the script opens each new message in my favorite desktop text editor (Gedit). It uses a similar for...do...done loop, this time reading the name of each file in the new directory (referenced using the $today variable) and then opening the file in Gedit. Note the asterisk I added to the end of the directory location.

for NEWFILE in ${base_location}/${today}/*
do
   gedit $NEWFILE
done
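
Incidentally, Gedit will happily accept more than one filename in a single invocation, so if you'd prefer to skip the loop you could open the whole day's batch with one line (the trailing & runs the editor in the background so the script doesn't sit and wait for it to close):

gedit ${base_location}/${today}/* &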

There's still one more thing to do. If I don't clean out my S3 bucket, the script will download all the accumulated messages every time it runs, which will make things progressively harder to manage.

So, after successfully downloading my new messages, I run this short script to delete all the files in the bucket:

#!/bin/bash
# Delete all existing emails 

aws s3 rm --recursive s3://bucket-name/ --profile myaccount
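
Because that rm is destructive, it's worth double-checking before you pull the trigger. The aws s3 commands accept a --dryrun flag that prints what would be deleted without actually touching anything, so you can preview the operation and drop the flag once you're comfortable:

aws s3 rm --recursive --dryrun s3://bucket-name/ --profile myaccount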