There is a simple, useful idiom to make your bash scripts more robust - ensuring they always perform necessary cleanup operations, even when something unexpected goes wrong. The secret sauce is a pseudo-signal provided by bash, called EXIT, that you can trap; commands or functions trapped on it will execute when the script exits for any reason. Let's see how this works.

The basic code structure is like this:

  1. #!/bin/bash

  2. function finish {

  3. # Your cleanup code here

  4. }

  5. trap finish EXIT

You place any code that you want to be certain to run in this "finish" function. A good common example: creating a temporary scratch directory, then deleting it after.

  1. #!/bin/bash

  2. scratch=$(mktemp -d -t tmp.XXXXXXXXXX)

  3. function finish {

  4. rm -rf "$scratch"

  5. }

  6. trap finish EXIT

You can then download, generate, slice and dice intermediate or temporary files to the $scratch directory to your heart's content. [1]

  1. # Download every linux kernel ever.... FOR SCIENCE!

  2. for major in {1..4}; do

  3. for minor in {0..99}; do

  4. for patchlevel in {0..99}; do

  5. tarball="linux-${major}-${minor}-${patchlevel}.tar.bz2"

  6. curl -q "http://kernel.org/path/to/$tarball" -o "$scratch/$tarball" || true

  7. if [ -f "$scratch/$tarball" ]; then

  8. tar jxf "$scratch/$tarball"

  9. fi

  10. done

  11. done

  12. done

  13. # magically merge them into some frankenstein kernel ...

  14. # That done, copy it to a destination

  15. cp "$scratch/frankenstein-linux.tar.bz2" "$1"

  16. # Here at script end, the scratch directory is erased automatically

Compare this to how you'd remove the scratch directory without the trap:

  1. #!/bin/bash

  2. # DON'T DO THIS!

  3. scratch=$(mktemp -d -t tmp.XXXXXXXXXX)

  4. # Insert dozens or hundreds of lines of code here...

  5. # All done, now remove the directory before we exit

  6. rm -rf "$scratch"

What's wrong with this? Plenty:

Keeping Services Up, No Matter What

Another scenario: Imagine you are automating some system administration task, requiring you to temporarily stop a server... and you want to be dead certain it starts again at the end, even if there is some runtime error. Then the pattern is:

  1. function finish {

  2. # re-start service

  3. sudo /etc/init.d/something start

  4. }

  5. trap finish EXIT

  6. sudo /etc/init.d/something stop

  7. # Do the work...

  8. # Allow the script to end and the trapped finish function to start the

  9. # daemon back up.

A concrete example: suppose you have MongoDB running on an Ubuntu server, and want a cronned script to temporarily stop the process for some regular maintenance task. The way to handle it is:

  1. function finish {

  2. # re-start service

  3. sudo service mongdb start

  4. }

  5. trap finish EXIT

  6. # Stop the mongod instance

  7. sudo service mongdb stop

  8. # (If mongod is configured to fork, e.g. as part of a replica set, you

  9. # may instead need to do "sudo killall --wait /usr/bin/mongod".)

Capping Expensive Resources

There is another situation where the exit trap is very useful: if your script initiates an expensive resource, needed only while the script is executing, and you want to make certain it releases that resource once it's done. For example, suppose you are working with Amazon Web Services (AWS), and want a script that creates a new image.

(If you're not familar with this: Servers running on the Amazon cloud are called "instances". Instances are launched from Amazon Machine Images, a.k.a. "AMIs" or "images". AMIs are kind of like a snapshot of a server at a specific moment in time.)

A common pattern for creating custom AMIs looks like:

  1. Run an instance (i.e. start a server) from some base AMI.
  2. Make some modifications to it, perhaps by copying a script over and then executing it.
  3. Create a new image from this now-modified instance.
  4. Terminate the running instance, which you no longer need.

That last step is really important. If your script fails to terminate the instance, it will keep running and accruing charges to your account. (In the worst case, you won't notice until the end of the month, when your bill is way higher than you expect. Believe me, that's no fun!)

If our AMI-creation is encapsulated in a script, we can set an exit trap to destroy the instance. Let's rely on the EC2 command line tools:

  1. #!/bin/bash

  2. # define the base AMI ID somehow

  3. ami=$1

  4. # Store the temporary instance ID here

  5. instance=''

  6. # While we are at it, let me show you another use for a scratch directory.

  7. scratch=$(mktemp -d -t tmp.XXXXXXXXXX)

  8. function finish {

  9. if [ -n "$instance" ]; then

  10. ec2-terminate-instances "$instance"

  11. fi

  12. rm -rf "$scratch"

  13. }

  14. trap finish EXIT

  15. # This line runs the instance, and stores the program output (which

  16. # shows the instance ID) in a file in the scratch directory.

  17. ec2-run-instances "$ami" > "$scratch/run-instance"

  18. # Now extract the instance ID.

  19. instance=$(grep '^INSTANCE' "$scratch/run-instance" | cut -f 2)

At this point in the script, the instance (EC2 server) is running [2]. You can do whatever you like: install software on the instance, modify its configuration programatically, et cetera, finally creating an image from the final version. The instance will be terminated for you when the script exits - even if some uncaught error causes it to exit early. (Just make sure to block until the image creation process finishes.)

Plenty Of Uses

I believe what I've covered in this article only scratches the surface; having used this bash pattern for years, I still find new interesting and fun ways to apply it. You will probably discover your own situations where it will help make your bash scripts more reliable.

Footnotes