One tool that comes in handy for managing files with S3 is s3cmd. If you’re an Ubuntu user, you can even use apt (Advanced Package Tool) to install it. Other distributions likely offer packages as well.
The logic for putting files to the Cloud is pretty simple. After configuring s3cmd, you can use the “put” command to send the file to its final resting place. On Ubuntu, s3cmd gets installed to /usr/bin, so my command to put the file big_recording_file.wav in the recordings folder of my bucket would look something like this:
/usr/bin/s3cmd put big_recording_file.wav s3://stephensBucket/recordings/
That’s fine for moving a single file. To move a number of files, you could write a simple shell script to find recordings, copy them to S3, then move the file to a “done” directory so that it doesn’t get sent again by accident, wasting time. Here’s a minimally viable shell script typical of the sort clients have used to move files:
#!/bin/bash

# This is information used to place the files in S3
S3BUCKETNAME=myBucket   # change this to whatever your bucket name is
FOLDER=myFolder         # the folder in the bucket
COUNT=1000              # The maximum number of files to move each run

# These are defaults, and should not normally be changed
S3DIR=/usr/bin
TEMP=/tmp
DATE=`date +%F`
WAV_LOC=/var/spool/asterisk/monitor
FINAL_DEST=$WAV_LOC/done

mkdir -p $FINAL_DEST    # creates the directory if it doesn't exist

# Find the first $COUNT files in the target directory that aren't:
#  * In the final destination directory
#  * unmixed
#  * newer than a day
# grep needs -E for the | alternation, and -- so that the pattern
# starting with "-in" is not read as a list of options.
echo "$DATE :: Find and upload WAV files older than 1 day.."
for i in `find $WAV_LOC -mtime +1 -iname "*.wav" | grep -Ev -- "-in|-out|$FINAL_DEST" | head -n $COUNT`
do
    # put the file in S3, naming it by file name only (not the full local path)
    $S3DIR/s3cmd put "$i" "s3://$S3BUCKETNAME/$FOLDER/$(basename "$i")"
    # check for an error placing the file
    if [ $? != 0 ]
    then
        echo "$DATE :: Command failed! See output for errors. Exiting."
        exit 255
    else
        echo -n "$DATE :: Removing backed up files... "
        # you can do other things here, too
        mv "$i" $FINAL_DEST && echo "Moved $i to $FINAL_DEST"
    fi
done
COUNT sets the maximum number of files to move per run. Note that in this example there is no locking, we’re not checking to make sure all the relevant directories exist, and each uploaded recording is simply moved to /var/spool/asterisk/monitor/done. Ultimately, something else will have to run periodically to remove the recordings from the server itself.
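As a rough idea of what that periodic cleanup could look like, here is a minimal sketch. The function name purge_done and the 30-day retention in the usage comment are my own placeholders, not part of the script above, and it assumes a find that supports -delete (GNU and BSD find do).

```shell
#!/bin/bash
# Sketch only: delete already-uploaded recordings once they are old enough.
# purge_done and its arguments are hypothetical names for illustration.

purge_done() {
    local done_dir=$1    # e.g. /var/spool/asterisk/monitor/done
    local keep_days=$2   # files older than this many days are removed

    # Refuse to run against a directory that does not exist.
    if [ ! -d "$done_dir" ]; then
        echo "purge_done: no such directory: $done_dir" >&2
        return 1
    fi

    # -mtime +N matches files whose age, rounded down to whole days,
    # is greater than N. -print logs each file name as it is deleted.
    find "$done_dir" -type f -iname '*.wav' -mtime +"$keep_days" -print -delete
}

# Usage (for example from a daily cron job; the 30 days is an assumption):
#   purge_done /var/spool/asterisk/monitor/done 30
```

Because it only looks inside the done directory, it should never touch a recording that has not been uploaded yet.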
It should also be noted that s3cmd isn’t highly efficient. If you have a lot of files and a lot of bandwidth, you will want to use concurrency to send multiple files simultaneously. When running a script like this, you should also capture its output by redirecting it to a log file.
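One simple way to get concurrency without leaving the shell is xargs with its -P option. The sketch below is only an illustration: upload_parallel, the job count of 4, and the log path are names and values I made up, and it leaves out the “-in”/“-out” filtering and the move to the done directory from the script above.

```shell
#!/bin/bash
# Sketch only: run several uploads at once with xargs -P.
# upload_parallel and its arguments are hypothetical names for illustration.

upload_parallel() {
    local src_dir=$1   # directory to search for recordings
    local jobs=$2      # how many uploads to run at the same time
    shift 2            # the rest is the upload command; {} marks the file

    # -maxdepth 1 skips subdirectories such as "done".
    # -print0 / -0 keep file names containing spaces intact.
    # -I {} runs the command once per file, substituting the file for {}.
    # xargs exits non-zero if any single invocation fails.
    find "$src_dir" -maxdepth 1 -type f -iname '*.wav' -print0 |
        xargs -0 -P "$jobs" -I {} "$@"
}

# Usage, with output captured to a log file (bucket, folder and log path
# are placeholders):
#   upload_parallel /var/spool/asterisk/monitor 4 \
#       /usr/bin/s3cmd put {} s3://myBucket/myFolder/ \
#       >> /var/log/s3_upload.log 2>&1
```

Passing the upload command as arguments keeps the function easy to try out with something harmless, such as cp, before pointing it at S3.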
If you or your software is keeping track of the recording location, you can add statements to update that location in the “# you can do other things here, too” area. For instance, Indosoft Q-Suite is call center software that keeps track of the location of the recording because it allows the use of multiple telephony servers, and presents recordings via the admin screens. Therefore Q-Suite installs sending files to S3 normally have to update the location in the database before finishing, so that the S3 recordings can be presented to the QA team.
If you’re interested in what a script with locking and multiple processes would look like, you can see an example on this page.
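For a quick feel of the locking half, here is a minimal sketch using flock from util-linux, which is one common approach on Linux. It is an assumption on my part, not necessarily what the linked example does, and with_lock, the lock file path and the script name in the usage comment are hypothetical.

```shell
#!/bin/bash
# Sketch only: make sure two runs of the upload job never overlap.
# with_lock is a hypothetical helper; flock(1) comes from util-linux.

with_lock() {
    local lock_file=$1
    shift   # the rest is the command to run while holding the lock

    (
        # Try to take an exclusive lock on file descriptor 9 without waiting.
        if ! flock -n 9; then
            echo "with_lock: $lock_file is held by another run, skipping" >&2
            exit 1
        fi
        "$@"
    ) 9>"$lock_file"   # the lock is released when the subshell exits
}

# Usage (paths are placeholders):
#   with_lock /tmp/s3_upload.lock /usr/local/bin/upload_recordings.sh
```

If a previous run is still busy when cron starts the next one, the new run prints a message and exits instead of sending the same files twice.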
Edit: I changed the s3cmd line slightly to specifically name the file at the end. I found that older versions of s3cmd don’t behave as we would like without the file name specified, and newer versions can accept the filename without an issue.