Understanding the Unix Command Line

Posted on: February 15, 2016

I've been using the Unix command line since about 2008. When I started, I learned ls, cd, and a few basics like that. But even after becoming more proficient, it was a while before I understood what I was really doing when I typed those commands (and I still have much to learn).

If you're like I was - not a complete newbie, but pretty shallow in your command-line fu - let me fill in some of the gaps for you and show you some of the powerful things you can do when you fire up your terminal. But before that...

What's a Terminal?

I used to say "terminal" or "shell" interchangeably, but they're not the same thing. We generally open a terminal program and run a shell in it, but a terminal can display any text-based user interface. If you think about it, that's what editors like Vim and Nano have: a visual interface drawn with characters instead of pixels.

Terminal programs, like Terminal or iTerm2 or GNOME Terminal, handle things like color and cursor position, letting you scroll back through command output history, etc.

But the key thing is that terminals are for text-only interfaces. In a shell (like bash or zsh), we type text commands and get text output, so that's a perfect thing to do in a terminal.

Processes and PIDs

Another thing I didn't initially understand is that when I'm typing in a shell, that shell is a process being run by the operating system. In a Unix-like system, every running program is a process, and a shell is no exception.

Knowing a few things about processes will come in really handy.

Every process has a process ID (PID). You can see the shell's PID using echo $$.

When you're at the shell, you can use ps to see a list of running processes (by default, just the ones associated with your terminal).

In one shell, run echo $$, then ruby -e "(1..100).each {|i| puts i; sleep 1}". In another shell, run ps -f. This will list processes, each with its PID and its parent PID (PPID) - the ID of the process that started it. You'll see your Ruby program running. Notice that it has a PID, and that its PPID is the same as the PID of the shell where you started it.

You can ask the Ruby program to shut down by running kill followed by its PID - eg, kill 1234. The command kill sounds super harsh, but the name is kind of a historic relic; it really should be called signal. By default, it sends a signal called TERM, which means "would you please kindly shut down?" The process is expected to gracefully finish what it's doing. kill -KILL [pid] is a special case in that the program itself doesn't get the signal; it's interpreted more like "operating system, DESTROY THIS PROCESS!" Every signal is actually an integer, and the names are historic and arcane, but a particular program can mostly decide how to interpret them. kill -l will list them all. Nginx has some interesting signal responses - eg, kill -HUP tells it to reload its configuration.
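Since most signals are just requests that a program can interpret as it likes, you can see this for yourself with the shell's trap builtin. A minimal sketch (the messages and behavior here are my own invention, not from any real daemon):

```shell
#!/bin/sh
# trap_demo.sh - choose our own responses to TERM and HUP
trap 'echo "got TERM; shutting down gracefully"; exit 0' TERM
trap 'echo "got HUP; pretending to reload config"' HUP

echo "my PID is $$ - try kill -HUP $$ or kill $$ from another shell"
while true; do sleep 1; done
```

Run it, then from another shell: kill -HUP [pid] makes it print its reload message and keep running; a plain kill [pid] runs the TERM handler and exits; kill -KILL [pid] bypasses both, because KILL can't be trapped.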

STDIN, STDOUT, STDERR, and redirecting

Every process gets three "file descriptors": standard in, standard out, and standard error.

Standard Input

Standard in can be the keyboard: wc -l by itself will wait for you to type and press control + d to indicate the end of the "file", then it will count the lines you typed. Standard in can also come from another process: in cat somefile | wc -l, wc -l gets its standard in from the standard out of cat.

Standard Output

Standard out goes to the screen by default: cat somefile will show the contents on the screen. Standard out can also go to another process: in cat somefile | wc -l, standard out from cat is piped to wc -l.

Standard Error

Standard error is exactly like standard out, but for "other" messages, ones you wouldn't want piped to another program. Eg, if you do curl -v somesite.com, the site's HTML will go to standard out, and "metadata" like "connected to this IP on this port" and "got these headers" will go to standard error. In this example, both of those print on the screen, so they're indistinguishable. To see the difference, you have to redirect one or both of them.


Redirecting output

You can capture a program's standard output to a file with > or >>. Either one will create the file if it doesn't exist; if it does exist, > will overwrite the contents, whereas >> will append to it. 2> says what to do with a command's standard error ("file descriptor 2", where "1" is stdout). Some examples:

# HTML goes to the file, headers and messages print to screen
curl -v nathanmlong.com > nathanmlong_index.html

# append the HTML to this file
curl -v nathanmlong.com >> web_pages.html

# put HTML and metadata in separate files
curl -v nathanmlong.com > web_page.html 2> metadata.txt

# put stdout and stderr in the same file
curl -v nathanmlong.com > somefile 2>&1

# same thing; `1>` is the same as `>`
curl -v nathanmlong.com 1> somefile 2>&1

You can also use < to mean "read from this source"; sort < somefile.txt takes somefile.txt as input for sort.


Pipes

Pipes are awesome! They can be thought of as simply a way to string commands together to make bigger commands. We've seen a couple of small examples already, but I want you to see their true power.

# outputs several lines (in zsh; bash's echo needs -e to interpret \n)
echo "hello\nthere\nfaithful\nfriend"

# keep only the ones containing an e (could be any regex pattern)
echo "hello\nthere\nfaithful\nfriend" | grep 'e'

# keep only the ones NOT containing an e
echo "hello\nthere\nfaithful\nfriend" | grep -v 'e'

# keep only lines with 'e' and only characters 3-8
echo "hello\nthere\nfaithful\nfriend" | grep 'e' | cut -c 3-8

You could use a similar pipeline to find all the Rails log entries that contain the request parameters and snip out just those params.
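A sketch of that, assuming the log lines look the way Rails wrote them in my projects - something like `Parameters: {"id"=>"42"}` (the file path and line format here are assumptions):

```shell
# Keep only lines mentioning request params, then snip off
# everything up to and including "Parameters: "
grep 'Parameters:' log/development.log | sed 's/.*Parameters: //'
```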

Here's a more complex example (from Peter Sobot's blog) to answer the question: what's the longest word in the dictionary that contains the word "purple"?

# purple_finder.sh
# Read in the system's dictionary.
cat /usr/share/dict/words |

# Find words containing 'purple'
grep purple |                   

# Count the letters in each word
awk '{print length($1), $1}' |

# Sort lines ("${length} ${word}")
sort -n |                       

# Take the last line of the input
tail -n 1 |                     

# Take the second part of the line
cut -d " " -f 2 |               

# Output the results
# (this is just here so that any of the lines
# above can be commented out)

Paste that into purple_finder.sh and run zsh purple_finder.sh and you'll find out.

Now try commenting out every line but the first and last ones and re-run it. Then uncomment the grep purple and re-run, then uncomment the awk command and re-run, and so on. You'll see how the pipeline of commands gradually builds toward the final answer.

This runs really fast! There are 236k words in that dictionary (which I learned by running wc -l /usr/share/dict/words), and we get our answer in about a tenth of a second (which I learned by running time zsh purple_finder.sh). Let's make it slower so we can see what's happening.

# slow_purple_finder.sh
# Read in the system's dictionary.
cat /usr/share/dict/words |     

# Add some slowness to this whole process
ruby -e 'while l = STDIN.gets do; STDOUT.puts(l); sleep 0.00001; STDOUT.flush; end' |

# Find words containing 'purple'
grep purple |                   

# Count the letters in each word
awk '{print length($1), $1}' |

# Sort lines ("${length} ${word}")
sort -n |                       

# Take the last line of the input
tail -n 1 |                     

# Take the second part of the line
cut -d " " -f 2 |               

# Output the results
# (this is just here so that any of the lines
# above can be commented out)

Now before you run that, in another terminal, run watch -n 0.25 ps -f -o rss. That means "every quarter second, rerun this ps command that shows processes and their memory usage". While that's running, do ps -f in another terminal. See that awk, cut, etc are all their own processes? Input is passed from one to another like an assembly line with all workers working at the same time. That fact is what makes the whole thing really fast and efficient.

It's fast partly because each process may be running on a different CPU core; with 4 cores, the pipeline can run up to 4 times as fast as it would sequentially (if the work divides evenly).

It's memory efficient, too, because (eg) while cat is pulling lines of the file off the disk, grep is deciding whether to send each one along; none of them ever have to hold the entire dictionary in memory at one time. It's like drinking water through a straw; whether you drink an ounce or a gallon, the straw probably never contains more than a tablespoon.

The operating system keeps a buffer for each pipe, and if a buffer gets too full, it makes the process that's writing to it pause for a bit; this throttling also limits memory usage.
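You can watch a related effect with yes, which writes "y" forever:

```shell
# `yes` alone would run forever, but `head` exits after 3 lines;
# once the reading end of the pipe closes, `yes` is stopped too
# (it gets a SIGPIPE), so the pipeline finishes instantly.
yes | head -3
```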

Because these run in parallel, you can also use them for continuous output. Eg, to see lines that appear in your log file in real time, but only if they contain "DEBUG", do tail -f logfile | grep "DEBUG", and make sure all your debugging messages contain that string. If these didn't run in parallel, you'd have to stop the tail process so that grep could get the output and filter out what you want, but since they're parallel, you can get results in real time.

For lots more about pipelines, see Peter Sobot's awesome blog post.

Essential commands


  • less:
    • less some_huge_file lets you scroll and search in it without loading the whole thing into memory

Especially good for pipeline construction:

  • head:
    • head somefile shows the first few lines of it (imagine a massive logfile)
    • head -4 somefile shows the first 4 lines
  • tail:
    • tail somefile shows the last few lines of it (imagine a massive logfile)
    • tail -4 somefile shows the last 4 lines
    • tail -f somefile continually outputs as the file is appended (eg a logfile)
  • grep:
    • cat somefile | grep somestring outputs only matching lines
    • tail -f development.log | grep somestring outputs only matching lines
  • sort:
    • cat somefile | sort sorts the lines and outputs them. Flags control what kind of sorting (alphabetic, numeric, etc)
  • uniq:
    • cat somefile | sort | uniq throws away repeated lines (uniq only drops adjacent duplicates, so sort first)
  • cut:
    • cat somefile | head -2 | cut -c 1-20 gives first 20 chars of first 2 lines
  • sed:
    • cat somefile | sed 's/pickle/bear/g' # change all pickles to bears
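These small commands combine into handy idioms. For example, a classic way to count how often each distinct line occurs (the input here is made up):

```shell
# sort groups duplicate lines together, uniq -c prefixes each
# distinct line with its count, and sort -n orders by that count,
# so the most frequent item comes out last
printf 'apple\nbanana\napple\napple\nbanana\n' | sort | uniq -c | sort -n
```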

sed can do a ton more stuff, and awk can also do a ton of stuff - they are actually their own programming languages! But if you know Ruby, you can use it instead:

# When running `ruby`:
#   - `-e` means "execute this snippet of code instead of a file"
#   - `-n` means "run once for every line of STDIN"
#   - `-p` means "like `-n`, but also print every line (possibly after mutating it)"
# see `man ruby` for more about ruby's flags

# outputs even numbers from 1 to 10
seq 1 10 | ruby -ne 'puts $_ if $_.to_i % 2 == 0'

# outputs "HI" and "THERE" (must mutate $_ to see)
echo "hi\nthere" | ruby -pe '$_.upcase!'

Environment variables

Environment variables can be set like GREET=hi and read like echo $GREET. Any child process gets a copy of any of its parent's environment variables that have been exported - eg, export GREET=hi, then run ruby -e 'puts ENV["GREET"]'. Note that a child process gets a copy of its parent's environment variables; it can modify its copy, but not its parent's copy.
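You can see the exported/unexported difference with sh -c, which starts a child shell process (this assumes GREET isn't already exported in your shell):

```shell
GREET=hi                             # set, but not exported
sh -c 'echo "child sees: [$GREET]"'  # prints: child sees: []

export GREET                         # now child processes get a copy
sh -c 'echo "child sees: [$GREET]"'  # prints: child sees: [hi]
```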

$PWD is the "present working directory".

$PATH is a very important environment variable.
Every command runs a program - eg, ls runs the executable that which ls reports: the first file named ls, searching the $PATH directories in order, whose file permissions include execution.

$PATH controls where the shell looks for programs. PATH="" will break your shell, but you can just exit that shell. You can add your own script directories to PATH, like I did with ~/.dotfiles/scripts/md_preview.
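Adding your own directory just means prepending it to that colon-separated list - eg, in your shell config (the directory name here is only an example):

```shell
# Earlier entries win, so this directory is searched first
export PATH="$HOME/scripts:$PATH"
```

After that, any executable file in ~/scripts can be run by name, and which will tell you where a given command was found.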

See my blog post on environment variables and this one from honeybadger for more details.


Shell expansions

Bash does several passes through your command before running it.

  • ls ~/foo - tilde expansion (~ becomes your home directory)
  • echo $(whoami) - command substitution
  • echo $TERM - environment variable substitution
  • ls *.txt - glob expansion - turns into (eg) ls foo.txt bar.txt ...; quoting it, as in ls "*.txt", keeps it as one literal argument
  • touch foo{1,2,3}.txt - brace expansion
  • alias g="git"; g status - alias expansion

All of these happen before the command runs, so you can stick echo in front to see what they do; after the expansion, the shell just finds echo with some arguments and runs it.

# expands to `echo touch foo1.txt foo2.txt foo3.txt`
echo touch foo{1,2,3}.txt

# expands to  `echo rm foo.txt foo.rb` (if those exist)
echo rm foo.*

If you do set -x, it will show you these expansions as it runs them. set +x turns it off.

One related trick: <(some_command) lets you treat the output of that command as a file (the OS makes a temporary file). So diff <(grep '=' file1) <(grep '=' file2) will compare the two files, but only the lines that contain =.

Adding your own commands

One of my favorite things about the command line is how customizable it is.

You can add your own commands to your command line in one of several ways.

Functions and Aliases

You can write your own shell functions:

# Define a word
function define() { curl dict://dict.org/d:$1; }
define boat # prints a dictionary definition

# Recursively grep in files
function grepfiles() { grep -Ri $1 *; }

And alias existing commands, with or without options:

# less typing
alias g="git"

# use the new version I installed with Homebrew, not the system default
alias vi="vim"

# by default, colorize output and mark executables with *
alias ls="ls -GF"

# see rspec test output in documentation format
alias docspec="rspec --format doc --order default"

# start postgres the way I usually do
alias dbstart='pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres.log start'

# open some housekeeping files in an editor
alias work='cd ~/work; mvim -p billable_hours.csv general_todo.md learning.md'

# A no-op: `g s` means `git status` for me, but if I mistype it as `gs`,
# warn me instead of running ghostscript
alias gs='echo "if you really want ghostscript, \"where gs\" to find it"'

If you want aliases and functions to persist, put them in your shell's config file (eg, ~/.bashrc or ~/.zshrc). The config file is run every time you start a shell, as if you typed its contents yourself.

Executable programs somewhere in $PATH

If you save this as timestamp in a folder on your $PATH:

#!/usr/bin/env ruby
# This program is a bit silly because `date` is already a Unix command
require "time"
puts Time.now.strftime("%Y-%m-%d %H:%M:%S")

...then run chmod +x timestamp to make that file executable, and you'll be able to run timestamp from the command line. (If timestamp were in your current working directory but not on your $PATH, you'd have to say ./timestamp to help the shell find it.)

Exit Statuses

Every program exits with either 0 if it succeeded or another number if it failed somehow.

You can check the last exit status in a subsequent command; it's available as $?. Eg:

ls -l
echo $? # 0, because `-l` is a valid option
ls -z
echo $? # nonzero (1 or 2, depending on your ls), because `-z` is an invalid option

You can use this info to decide what to do next. && and || make it very succinct. && foo means "run foo if the last command was successful", and || bar means "run bar if the last command failed."

So this:

rspec spec && say "success" || say "failure"

...audibly tells you whether your tests passed (on a Mac, which has the say command - try espeak on Linux). This works because rspec correctly sets its exit status based on whether the tests all passed.

If you don't care about the exit status but still want audible notification, do:

rspec spec; say "done"

Other programming constructs

Bash/zsh are full programming languages, so you can do looping, conditionals, etc, if you want.

  • for i in apple banana cake; do touch $i.txt; done
  • for i in *.txt; do echo "This is a text file: $i"; done
  • for i in $(seq 10); do echo "I'm counting to 10 like: $i"; done
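For example, if treats exit status 0 as true, so conditionals pair naturally with commands like grep (the filename here is hypothetical):

```shell
# grep -q is quiet: it prints nothing and just sets an exit status
if grep -q "DEBUG" development.log; then
  echo "found debug lines"
else
  echo "no debug lines"
fi
```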

I don't tend to use conditionals and loops much in the shell; I tend to turn to Ruby for things like that. But you can read lots more on this if you're interested.

Using the command line from Vim

One of my favorite Vim tricks is to highlight some lines, then call out to the shell to transform them. For example, to sort some lines in Vim, highlight and !sort.

See "Making Vim, Unix and Ruby sing harmony" for more details.

(You can probably do this sort of thing in emacs, but I don't know it, so you emacs users will have to do your own research here.)

Reading man pages

"Unix will give you enough rope to shoot yourself in the foot. If you didn't think rope would do that, you should have read the man page." - https://twitter.com/mhoye/status/694646265657708544

man is a command to show the "manual page" for a program, if it has one. These are great for reference if you know how to read them. Here's the start of man ls on OS X:

LS(1)                     BSD General Commands Manual                    LS(1)

     ls -- list directory contents

     ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

When they put [ ] around something, it means it's optional. ... means you can put multiple things in that slot. So ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...] should be read as: "you can type ls by itself. You can also pass any of these flags to control what it outputs. You can also pass a file name, or more than one file name."

(By the way, there's nothing special about flags, they're just arguments that the program may decide to interpret a specific way. So ls -l .git just has two arguments.)

As is typical, man ls explains in detail what every possible flag will do, but only gives a single usage example.

See TLDR Pages (accessible through a Ruby client, among others - gem install tldrb) for an example-oriented help utility.


Command history

You may know that control + r searches your history of commands to re-execute one.

history lets you see your history directly - eg, history | tail -5 for the most recent 5 commands. From the numbered entries you see there, !1234 would re-run entry number 1234. You can configure how many items of history your shell remembers.


The command line is super powerful and sometimes even fun. I hope this has helped you get a better perspective on it, and inspired you to learn more.

Thanks to the nice folks at TMA for letting me develop this training for their development team. Particularly, thanks to Mark Cooke, Adam Hunter and Rod Turnham for their support, and to Simone and Kim for being enthusiastic listeners.