Understanding Unix Environment Variables

shell
Posted on: 2014-02-12

Environment variables are an important concept that comes up a lot on Unix-like[1] systems. They're universal; whether you're programming Ruby, Haskell, or PHP, you use environment variables all the time.

Commonly used for configuration, they're really a general-purpose tool, allowing us to pass information between programs, even if they're written in different languages.

Today I'll explain how they work. But first, here are a few problems I've run into that boiled down to environment variables.

  • Q: Why can't Vim find this script that I can run from my shell? A: Because the shell adds the script's folder to my $PATH, but I didn't start Vim from the shell.
  • Q: Why is my Rake task that executes bundle install in some other directory using its own repo's Gemfile? A: Because $BUNDLE_GEMFILE is inherited by the subshell. (Bundler.with_clean_env {} fixes that.)
  • Q: How does Git know what editor to open to do a commit without my configuring that? A: It uses $EDITOR.

Maybe those examples made sense to you, maybe not. In either case, let's start with the basics.

Every running program is a process

On a Unix-like system, every running program is a process. This includes obvious programs (your editor, browser, music player, etc.) and things your system is doing in the background (running cron jobs, updating your clock, backing up files, etc.). Even simple commands, like ls, are really programs; calling ls creates a very short-lived process.

Every process (except for "process 0") is the child of another process. For instance, when I start iTerm, it starts a child zsh process. If I run vim from there, zsh creates a child vim process. In vim, if I execute a shell command like :!git add % to stage the current file, vim creates a short-lived zsh child process to carry out that command, and that process creates a short-lived child git process.

You can explore the lineage of your processes if you like; every process has both a "process id" (PID) and a "parent process id" (PPID). In a shell, echo $$ will give you its PID; ps -ef will list all processes with PID and PPID. So if echo $$ returns 37503, ps -ef | grep 37503 ought to have an entry matching bin/zsh. Repeat the process with its PPID to see what process spawned zsh, and keep walking upwards from there.
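Here's a rough sketch of that "keep walking upwards" step, so you don't have to grep by hand. The function name print_ancestry is my own invention, and it assumes your ps accepts the POSIX -o and -p options (the Linux and OS X versions I know of do).

```shell
# Print a process and each of its ancestors, one per line: PID, PPID, command.
# print_ancestry is a hypothetical helper name, not a standard tool.
print_ancestry() {
  pid=${1:-$$}
  while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
    # A trailing "=" after each column name suppresses the header line.
    ps -o pid= -o ppid= -o comm= -p "$pid" || break
    # Move up one generation: the parent's PID becomes the next PID to inspect.
    pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
  done
}

print_ancestry "$$"
```

The first line printed should be your shell itself, and the last should be the process at the top of the tree.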

Inheriting the environment

Every time a process creates a child, it gives it certain information. Environment variables are part of that.

The easiest way to see this is to experiment at the command line. In your shell, you can set a variable like this: HI=ho. You can see its value like this: echo $HI.

So this shell now has a variable called $HI. Let's start another shell as a child process and see if it has it. Type zsh (or bash), and in the new shell, echo $HI.

Nothing, huh? That's because a plain assignment creates a variable that is local to the current shell; we didn't specify that we wanted it placed in the environment that child processes inherit. Type exit to go back to the first shell.

We can pass an environment variable to the child shell in one of two ways:

  1. Set it as we launch the child, like this: HI=hee zsh. In the new shell, echo $HI will output hee.
  2. Export our value, like this: export HI=har. Now any process we launch will inherit $HI. So if we type zsh, then echo $HI, it will output har.
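If you'd rather not hop in and out of interactive shells, the same experiment can be sketched non-interactively. Here sh -c '...' stands in for "start a child shell and type a command"; the square brackets are only there to make an empty value visible.

```shell
unset HI   # start clean in case HI is already set or exported

# A plain assignment stays in the current shell.
HI=ho
sh -c 'echo "child sees: [$HI]"'          # child sees: []

# Way 1: set it only for the one command being launched.
HI=hee sh -c 'echo "child sees: [$HI]"'   # child sees: [hee]

# Way 2: export it, so every child launched from now on inherits it.
export HI=har
sh -c 'echo "child sees: [$HI]"'          # child sees: [har]
```

Note that way 1 leaves the parent's own $HI alone; only the launched command sees hee.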

Nothing prevents the child process from changing $HI to something else, and it can pass that change to its own children. But that change will never affect its parent process. If it did, environment variables would be a nasty form of global variables - ones that cross all programs on your system! Any dumb or evil process could change $EDITOR to nano, and the next time you typed git commit, you'd be editing with an unexpected program.
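A small sketch of that one-way flow, again using sh -c as a stand-in for a child process:

```shell
export HI=har

# The child overwrites its own copy, and its own child sees the new value...
sh -c 'HI=changed; sh -c "echo grandchild sees: \$HI"'   # grandchild sees: changed

# ...but the parent's value is untouched.
echo "parent still has: $HI"                              # parent still has: har
```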

(Note: if you want to use a script explicitly for the purpose of changing environment variables in your shell, you can source my_script, which executes it "in the current shell context", kind of like calling "eval" in many programming languages. See https://ss64.com/bash/source.html)
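To see the difference between running a script and sourcing it, here's a sketch that writes a throwaway script to a temp file. The variable name GREETING is made up for the demo, and I'm using ".", which is the portable spelling of source.

```shell
# A hypothetical one-line script, written to a temp file for the demo.
script=$(mktemp)
echo 'GREETING=hello' > "$script"

unset GREETING
sh "$script"                            # runs in a child process
echo "after running:  [$GREETING]"      # after running:  []

. "$script"                             # runs in the current shell
echo "after sourcing: [$GREETING]"      # after sourcing: [hello]

rm -f "$script"
```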

Crossing language lines

I've said that environment variables let us pass information between programs written in different languages. Here's a quick demo using the shell, Ruby, and PHP.

A small Ruby script:

# test_env.rb
puts "In Ruby, env var 'GROCERIES' is  '#{ENV['GROCERIES']}'"
ENV['GROCERIES'] = "#{ENV['GROCERIES']}, pickles"
puts %x{php ./test_env.php}

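A small PHP script (the original listing is missing here; this minimal version is reconstructed from the output shown below):

<?php
// test_env.php
echo "In PHP, env var 'GROCERIES' is '" . getenv('GROCERIES') . "'\n";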

Calling them from the shell:

GROCERIES=bread ruby test_env.rb
# In Ruby, env var 'GROCERIES' is  'bread'
# In PHP, env var 'GROCERIES' is 'bread, pickles'

More realistically, one might pass the URL of a database or the location of a file. (Although if the parent process already has the file open, the child process can just use the file handle; file descriptors get passed along to the child automatically.)
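The file descriptor aside can be sketched in the shell, too. This assumes descriptor 3 is free in your shell; the temp file and its contents are just for the demo.

```shell
tmp=$(mktemp)
echo "hello from the parent" > "$tmp"

exec 3< "$tmp"                   # the parent opens the file on descriptor 3
inherited=$(sh -c 'cat <&3')     # the child reads from the descriptor it inherited
exec 3<&-                        # close the descriptor in the parent
rm -f "$tmp"

echo "child read: $inherited"    # child read: hello from the parent
```

The child never learns the file's name; it just uses the already-open descriptor.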

Wrap up

You can do a lot to control your computer by setting up environment variables as you wish. The $PATH variable is a particularly important one; it tells the computer where to look for programs. For example, if you type nginx, where should it look to see if it has an executable by that name? The $PATH variable tells it the list of directories to look in.
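Roughly speaking, the lookup goes like this: split $PATH on colons, try each directory in order, and stop at the first executable file with the right name. Here's a sketch of that idea. The function find_in_path is a name I made up, and real shells also handle aliases, builtins, and caching, which this ignores.

```shell
# find_in_path is a hypothetical helper, not a standard command.
find_in_path() (
  # The parentheses run the body in a subshell, so changing IFS and
  # turning off globbing here does not leak into the caller.
  IFS=:
  set -f
  for dir in $PATH; do
    # An empty $PATH entry conventionally means the current directory.
    candidate="${dir:-.}/$1"
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      exit 0   # ends the subshell, so the function returns success
    fi
  done
  exit 1       # nothing found in any $PATH directory
)

find_in_path sh
```

If two directories both contain a matching executable, the one listed earlier in $PATH wins, which is why the order of entries matters.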

Because environment variables are so universal, they're also important in cloud applications. Heroku, for example, created a set of recommendations for "software as a service" applications called The Twelve Factor App. They say that using environment variables for app configuration is the best way to make code portable between your machine, a staging environment, and production.
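In practice, that often looks something like the sketch below: the app reads its settings from the environment and falls back to development defaults when nothing is set. The variable names and default values here are purely illustrative, not taken from the Twelve Factor document.

```shell
# Hypothetical settings: use whatever the environment provides,
# otherwise fall back to a local development default.
: "${DATABASE_URL:=postgres://localhost/myapp_dev}"
: "${PORT:=3000}"
export DATABASE_URL PORT

echo "Listening on port $PORT, using database $DATABASE_URL"
```

On a staging or production machine, you'd set DATABASE_URL and PORT before launching the app, and the same code would pick up those values instead.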

Whether you're building SaaS apps or just trying to make your own computer behave, understanding environment variables will serve you well.


Footnotes

1. Linux, OS X, etc.