Chapter 4. Variable Vernacular

It is not uncommon to see an error message or an assignment statement that contains the idiom ${0##*/}, which looks to be some sort of reference to $0, but something more is going on. Let’s take a closer look at variable references and what some of these extra characters do for us. What we’ll find is a whole array of string manipulations that give you quite a bit of power in a few special characters.

Variable Reference

Referencing a variable’s value is very straightforward in most programming languages. You either just use the name of the variable or add a character to the name to explicitly say that you want to retrieve the value. That’s true with bash: you assign to the variable by name, VAR=something, and you retrieve the value with a dollar-sign prefix: $VAR. If you’re wondering why we need the dollar sign, consider that bash deals largely with strings, so:

MSG="Error: FILE not found"

will give you a simple literal string of the four words shown, whereas:

MSG="Error: $FILE not found"

will replace the $FILE with the value of that variable (which, presumably, would hold the name of the file that it was looking for).

Variable Interpolation

Be sure to use double quotes if you want this string substitution to occur. Using single quotes takes all characters literally, and no substitutions happen.

To avoid confusion over where the variable name ends (the spaces make it easy in this example), a more complete syntax for variable reference uses braces around the variable name, ${FILE}, and could have been used in our example.
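A minimal sketch of both points, quoting and braces, follows. The value of FILE and the _backup suffix are made up for illustration. Without braces, bash reads the longest valid variable name, so $FILE_backup refers to a different variable (unset here).

```bash
#!/usr/bin/env bash
# Illustrative values only; FILE and the _backup suffix are invented for this sketch.
FILE="report"
unset FILE_backup                     # make sure the "wrong" name really is unset

literal='Error: $FILE not found'      # single quotes: no substitution
expanded="Error: $FILE not found"     # double quotes: $FILE is replaced
wrong="$FILE_backup"                  # looks up a variable named FILE_backup
right="${FILE}_backup"                # braces mark where the name ends

echo "$literal"                       # Error: $FILE not found
echo "$expanded"                      # Error: report not found
echo "wrong=[$wrong] right=[$right]"  # wrong=[] right=[report_backup]
```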

This syntax, with the braces, is the foundation for much special syntax around variable references. For example, we can put a hash sign in front of the variable name, ${#VAR}, to return not its value but the string length of the value.

${VAR}                      ${#VAR}
oneword                     7
/usr/bin/longpath.txt       21
many words in one string    24
3                           1
2356                        4
1427685                     7
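If you want to check these lengths yourself, a short loop such as the following sketch will do it. The sample values are the ones listed above; the printf layout is only a convenience.

```bash
#!/usr/bin/env bash
# Print each sample value next to its length as reported by ${#VAR}.
for VAR in "oneword" "/usr/bin/longpath.txt" "many words in one string" 3 2356 1427685; do
    printf '%-26s %d\n' "$VAR" "${#VAR}"
done

VAR="many words in one string"
len=${#VAR}        # counts characters, including the spaces
echo "$len"        # 24
```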

But bash can do more than simply retrieve the value or its length.

Parameter Expansion

When retrieving the value of a variable, certain substitutions or edits can be specified, affecting the value that is returned (though not the value in the variable—except in one case). The syntax involves special sequences of characters inside the braces used to delineate the variable’s name, like the characters inside these braces: ${VAR##*/}. Here are a few such expansions worth knowing.

Shorthand for basename

When you invoke a script, you might use just its filename as the command, but that assumes that the script has execute permissions and is located in one of the directories in your PATH variable. You might invoke the script with ./scriptname if the script is in your current directory. You might invoke it with a full pathname, /home/smith/utilities/scriptname, or even a relative pathname if your current working directory is nearby.

Whichever way you invoke the script, $0 will contain the sequence of characters that you used to invoke the script—relative path or absolute path, however you expressed it.

When you want to print that script’s name out in a usage message, you likely want just the basename, the name of the file itself, not any of the path that got you there:

echo "usage: ${0##*/} namesfile datafile"

You might see it in a usage message, telling the user the correct syntax for running the script, or it might be the righthand side of an assignment to a variable. In that latter case, we hope that the variable is called something like PROGRAM or SCRIPT because that’s what this expression returns: the name of the script that is executing.

Let’s take a closer look at this particular parameter expansion on $0, one that you can use to get just the basename without all the other parts of the path.

Path or Prefix Removal

You can remove characters from the front (prefix or lefthand side) or the tail (suffix or righthand side) of that value. To remove a certain set of characters from the left side of a string, you add a # and a shell pattern onto the parameter reference, a pattern that matches those characters that you want to remove.

The expression ${MYVAL#img_} would remove the characters img_ if they were the first characters of the string in the MYVAL variable. Using a more complex pattern, we could write ${MYVAL#*_}. This would remove any sequence of characters up to, and including, the first underscore. (If the pattern doesn’t match, the full value is returned unaltered.)

A single # says that it will use the shortest match possible (nongreedy). A double ## says to use the longest match possible (greedy).

Now, perhaps, you can see what the expression ${0##*/} will do.

It will start with the value in $0, the pathname used to invoke the script. Then, from the lefthand side of the value, it will remove the longest match of any number of characters ending in a slash. Thus, it is removing all the parts of the path used in invoking the script, leaving just the name of the script itself.

Here are some possible values for $0 and this pattern we’ve discussed, to see how both the short (#) and long (##) match might differ in results:

Value in $0              Expression    Result returned
./ascript                ${0#*/}       ascript
./ascript                ${0##*/}      ascript
../bin/ascript           ${0#*/}       bin/ascript
../bin/ascript           ${0##*/}      ascript
/home/guy/bin/ascript    ${0#*/}       home/guy/bin/ascript
/home/guy/bin/ascript    ${0##*/}      ascript

Notice that the shortest matching pattern for */ can match just the slash by itself.
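The same comparisons can be tried in a script. In this sketch the variable SCRIPT_PATH is a stand-in for $0 (an assumption made so that the result doesn’t depend on how the sketch itself is invoked).

```bash
#!/usr/bin/env bash
# SCRIPT_PATH stands in for $0 so the sketch behaves the same however it is run.
SCRIPT_PATH="/home/guy/bin/ascript"

short=${SCRIPT_PATH#*/}      # shortest match of */ is the leading slash alone
long=${SCRIPT_PATH##*/}      # longest match of */ is everything through the last slash
nomatch=${SCRIPT_PATH#img_}  # pattern does not match, so the value is unchanged

echo "short:   $short"       # home/guy/bin/ascript
echo "long:    $long"        # ascript
echo "nomatch: $nomatch"     # /home/guy/bin/ascript
```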

Shell Patterns, Not Regular Expressions

The patterns used in parameter expansion are not regular expressions. They are only shell pattern matching, where * matches 0 or more characters, ? matches a single character, and [chars] matches any one of the characters inside the brackets.
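As a rough illustration of each pattern character, here is a sketch using a hypothetical filename. The last line shows one way shell patterns differ from regular expressions: a dot is just a dot.

```bash
#!/usr/bin/env bash
# Hypothetical filename used only to exercise each pattern character.
FN="img_07.jpeg"

star=${FN#*_}          # * matches any run of characters (here "img"), then "_"
question=${FN%.jp?g}   # ? matches exactly one character (here "e")
bracket=${FN#[a-z]}    # [a-z] matches one character from the set (here "i")
noregex=${FN#img.}     # "." is literal in a shell pattern, so nothing matches

echo "$star"       # 07.jpeg
echo "$question"   # img_07
echo "$bracket"    # mg_07.jpeg
echo "$noregex"    # img_07.jpeg
```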

Shorthand for dirname or Suffix Removal

Similar to how # will remove a prefix, that is, remove from the lefthand side, we can remove a suffix, that is, from the righthand side, by using %. A double percent sign indicates removing the longest possible match. Here are some examples that show how to remove a suffix. The first examples show a variable $FN, which holds the name of an image file. It might end in .jpg or .jpeg or .png or .gif. See how the different patterns remove various parts of the righthand side of the string. The last few examples show how to get something similar to dirname from the $0 parameter:

Value in shell variable    Expression    Result returned
img.1231.jpg               ${FN%.*}      img.1231
img.1231.jpg               ${FN%%.*}     img
./ascript                  ${0%/*}       .
./ascript                  ${0%%/*}      .
/home/guy/bin/ascript      ${0%/*}       /home/guy/bin
/home/guy/bin/ascript      ${0%%/*}      (empty string)

This parameter substitution for dirname isn’t an exact replica of the output from the command. It differs in the case where the path is /file because dirname would return just a slash, whereas our parameter substitution would remove it all. You can check for this with some additional logic in your script, ignore this case if you don’t expect to see it, or just add a slash to the end of the parameter, as in ${0%/*}/, so that all results end in a slash.
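If you do want the additional logic, it might look something like the following sketch. The function name dir_of is hypothetical, not a bash builtin. Besides the /file case, it also covers a path with no slash at all, where dirname reports a dot but ${0%/*} returns the value unchanged. It does not try to handle every case that dirname does (trailing slashes, for example).

```bash
#!/usr/bin/env bash
# dir_of is a hypothetical helper, not a bash builtin. It covers two cases
# where ${path%/*} and dirname disagree; it does not handle every case.
dir_of() {
    local path=$1
    case $path in
        */*) local dir=${path%/*}       # strip the last slash and what follows
             printf '%s\n' "${dir:-/}"  # "/file" leaves an empty string, so use "/"
             ;;
        *)   printf '.\n' ;;            # no slash at all: dirname reports "."
    esac
}

dir_of /home/guy/bin/ascript   # /home/guy/bin
dir_of /file                   # /
dir_of ./ascript               # .
dir_of ascript                 # .
```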

Prefix and Suffix Removal

You can remember that # removes the left part and % the right part because, at least on a standard US keyboard, # is shift-3, which is to the left of % at shift-5.

Other Modifiers

More than just # and %, there are a few other modifiers that can alter a value via parameter expansion. You can convert either the first character or all characters in a string to uppercase via ^ or ^^, respectively, or to lowercase via , or ,, as shown in these examples:

Value in shell variable TXT    Expression    Result returned
message to send                ${TXT^}       Message to send
message to send                ${TXT^^}      MESSAGE TO SEND
Some Words                     ${TXT,}       some Words
Do Not YELL                    ${TXT,,}      do not yell

You might also consider declare -u UPPER and declare -l lower, which declare these shell variables to have their content converted to upper- or lowercase, respectively, for any text assigned to those variables.
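Here is a small sketch that puts the case modifiers and the declare attributes side by side. It assumes bash 4.0 or later, where these features are available; the variable names and sample text are illustrative.

```bash
#!/usr/bin/env bash
# Assumes bash 4.0 or later; the variable names are illustrative.
TXT="message to send"
first=${TXT^}       # Message to send
all=${TXT^^}        # MESSAGE TO SEND

declare -u UPPER    # anything assigned to UPPER is uppercased
declare -l lower    # anything assigned to lower is lowercased
UPPER="Do Not Yell"
lower="Do Not Yell"

echo "$first / $all / $UPPER / $lower"
```

Note that the modifiers change only what the expansion returns, whereas the declare attributes change what gets stored in the variable.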

The most flexible modifier is the one that does a substitution anywhere in the string, not just at the front or tail of the string. Similar to the sed command, it uses the slash, /, to indicate what pattern to match and what value to replace it with. A single slash means a single substitution (of the first occurrence). Using two slashes means to replace every occurrence. Here are a few examples:

Value in shell variable FN          Expression     Result returned
FN="my filename with spaces.txt"    ${FN/ /_}      my_filename with spaces.txt
FN="my filename with spaces.txt"    ${FN// /_}     my_filename_with_spaces.txt
FN="my filename with spaces.txt"    ${FN// /}      myfilenamewithspaces.txt
FN="/usr/bin/filename"              ${FN//\// }    usr bin filename (with a leading space)
FN="/usr/bin/filename"              ${FN/\// }     usr/bin/filename (with a leading space)

No Trailing Slash

Note that there is no trailing slash like you would find in other similar commands like sed or vi. The closing brace ends the substitution.

Why not always use this substitution mechanism? Why bother with # or % removal from the ends of the string? Consider this filename: frank.gifford.gif, and suppose you wanted to change it to a jpg file using ImageMagick’s convert command (that’s another story). The plain / substitution is not anchored to either end of the string; it replaces the first match it finds. If you had read in the filename and tried to replace the .gif with .jpg, what you would end up with is frank.jpgford.gif. For situations like this, the % removal, which takes from the end of the string, works much better.
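The following sketch shows the difference on that filename. It also shows a third option that the text above doesn’t cover: bash lets the pattern in a / substitution begin with # or %, which requires the match to be at the start or the end of the value, respectively.

```bash
#!/usr/bin/env bash
FN="frank.gifford.gif"

unanchored=${FN/.gif/.jpg}   # first ".gif" found: frank.jpgford.gif
suffix="${FN%.gif}.jpg"      # strip trailing ".gif", append ".jpg": frank.gifford.jpg
anchored=${FN/%.gif/.jpg}    # pattern starting with % must match at the end

echo "$unanchored"
echo "$suffix"
echo "$anchored"
```

The % removal form has the advantage of being the more commonly seen idiom, so it may be the more readable choice for the next person.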

Another useful modifier will extract a substring of the variable. After the variable name, put a colon, then the offset to the first character of the substring that you want to extract. Since this is an offset, start at 0 for the first character of the string. Next, put another colon and the length of the substring you want. If you leave off this second colon and a length, then you get the whole rest of the string. Here are a few examples:

Value in shell variable FN    Expression    Result returned
/home/bin/util.sh             ${FN:0:1}     /
/home/bin/util.sh             ${FN:1:1}     h
/home/bin/util.sh             ${FN:3:2}     me
/home/bin/util.sh             ${FN:10:4}    util
/home/bin/util.sh             ${FN:10}      util.sh
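A quick sketch of substring extraction in a script follows. The last line goes slightly beyond the examples above: bash also accepts a negative offset, which counts from the end of the string, but it needs a space after the colon so that it isn’t mistaken for the :- default-value syntax.

```bash
#!/usr/bin/env bash
FN="/home/bin/util.sh"

first=${FN:0:1}      # offset 0, length 1: /
name=${FN:10:4}      # offset 10, length 4: util
rest=${FN:10}        # offset 10 to the end: util.sh
ext=${FN: -2}        # negative offset counts from the end (note the space): sh

echo "$first $name $rest $ext"
```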

Example 4-1 shows the use of parameter expansion to parse data out of some input to create and process specific fields to use when automatically creating a configuration for firewall rules. We’ve also included a larger table of bash parameter expansions in the code, as we do a lot in this book, as a “real code readability” example. The output follows in Example 4-2.

Example 4-1. Parsing using parameter expansions: code
#!/usr/bin/env bash
# parameter-expansion.sh: parameter expansion for parsing, and a big list
# Original Author & date: _bash Idioms_ 2022
# bash Idioms filename: examples/ch04/parameter-expansion.sh
#_________________________________________________________________________
# Does not work on Zsh 5.4.2!

customer_subnet_name='Acme Inc subnet 10.11.12.13/24'

echo ''
echo "Say we have this string: $customer_subnet_name"

customer_name=${customer_subnet_name% subnet*} # Trim from ' subnet' to end
subnet=${customer_subnet_name##* }             # Remove leading 'space*'
ipa=${subnet%/*}                               # Remove trailing '/*'
cidr=${subnet#*/}                              # Remove up to '/*'
fw_object_name=${customer_subnet_name// /_}    # Replace space with '_'
fw_object_name=${fw_object_name////-}          # Replace '/' with '-'
fw_object_name=${fw_object_name,,}             # Lowercase

echo ''
echo 'When the code runs we get:'
echo ''
echo "Customer name: $customer_name"
echo "Subnet:        $subnet"
echo "IPA            $ipa"
echo "CIDR mask:     $cidr"
echo "FW Object:     $fw_object_name"

# bash Shell Parameter Expansion: https://oreil.ly/Af8lw

# ${var#pattern}                Remove shortest (nongreedy) leading pattern
# ${var##pattern}               Remove longest (greedy) leading pattern
# ${var%pattern}                Remove shortest (nongreedy) trailing pattern
# ${var%%pattern}               Remove longest (greedy) trailing pattern

# ${var/pattern/replacement}    Replace first +pattern+ with +replacement+
# ${var//pattern/replacement}   Replace all +pattern+ with +replacement+

# ${var^pattern}                Uppercase first matching optional pattern
# ${var^^pattern}               Uppercase all matching optional pattern
# ${var,pattern}                Lowercase first matching optional pattern
# ${var,,pattern}               Lowercase all matching optional pattern

# ${var:offset}                 Substring starting at +offset+
# ${var:offset:length}          Substring starting at +offset+ for +length+

# ${var:-default}               Var if set, otherwise +default+
# ${var:=default}               Assign +default+ to +var+ if +var+ not already set
# ${var:?error_message}         Barf with +error_message+ if +var+ not set
# ${var:+replaced}              Expand to +replaced+ if +var+ _is_ set

# ${#var}                       Length of var
# ${!var[*]}                    Expand to indexes or keys
# ${!var[@]}                    Expand to indexes or keys, quoted

# ${!prefix*}                   Expand to variable names starting with +prefix+
# ${!prefix@}                   Expand to variable names starting with +prefix+, quoted

# ${var@Q}                      Quoted
# ${var@E}                      Expanded (better than `eval`!)
# ${var@P}                      Expanded as prompt
# ${var@A}                      Assign or declare
# ${var@a}                      Return attributes
Example 4-2. Parsing using parameter expansions: output
Say we have this string: Acme Inc subnet 10.11.12.13/24

When the code runs we get:

Customer name: Acme Inc
Subnet:        10.11.12.13/24
IPA            10.11.12.13
CIDR mask:     24
FW Object:     acme_inc_subnet_10.11.12.13-24

Conditional Substitutions

Some of these variable substitutions are conditional, that is, they happen only if certain conditions are met. You could accomplish the same thing using if statements around the assignments, but these idioms make for shorter code for certain common cases. These conditional substitutions are shown here with a colon and then another special character: a minus, plus, or equal sign. The condition that they check for is this: is the variable null or unset? A null variable is a variable whose value is the null string. An unset variable is one that hasn’t yet been assigned or was explicitly unset (think “discarded”) with the unset command. With positional parameters (like $1, $2, etc.), they are unset if the user doesn’t supply a parameter in that position.

If you don’t include the colon in these conditional substitutions, then they only consider the case of an unset variable; null values are returned as is.
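The difference the colon makes is easiest to see side by side. In this sketch, NOTSET and EMPTY are illustrative variable names.

```bash
#!/usr/bin/env bash
unset NOTSET         # unset: never assigned (or discarded)
EMPTY=""             # null: set, but to the empty string

a=${NOTSET:-default}   # default  (unset)
b=${NOTSET-default}    # default  (unset)
c=${EMPTY:-default}    # default  (null counts when the colon is present)
d=${EMPTY-default}     # empty    (without the colon, null is returned as is)

echo "a=[$a] b=[$b] c=[$c] d=[$d]"   # a=[default] b=[default] c=[default] d=[]
```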

Default Values

A common scenario is a script with a single, optional parameter. If the parameter isn’t supplied when the script is invoked, then a default value should be used. In bash, we might write something like this:

LEN=${1:-5}

This will set the variable LEN either to the value of the first parameter ($1)—if one was supplied—or else to the value 5. Here is an example script:

LEN="${1:-5}"
cut -d',' -f2-3 /tmp/megaraid.out | sort | uniq -c | sort -rn | head -n "$LEN"

It takes the second and third fields from a comma-separated values file called /tmp/megaraid.out, sorts those values, provides a count of the number of occurrences of each value pair, then shows the top 5 from the list. You can override the default value of 5 and show the top 3 or 10 (or however many you want) simply by specifying that count as the sole parameter to the script.

Comma-Separated Lists

Another conditional substitution, using the plus sign, also checks to see if the variable has a value and, if so, returns a different value. That is, it returns the specified different value only if the variable is not null. Yes, that does sound strange; if it has a value, why return a different value?

A handy use for this seemingly odd logic is to construct a comma-separated list. You typically construct such a list by repeatedly appending “,value” or “value,” for every value. When doing so, you usually need an if statement to avoid having an extra comma on the front or end of this list—but not when you use this join idiom:

for fn in * ; do
    S=${LIST:+,}            # S for separator
    LIST="${LIST}${S}${fn}"
done

See also Example 7-1.
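To see the result of the join idiom without depending on what files happen to be in your directory, here is the same loop as a sketch with a fixed set of illustrative values.

```bash
#!/usr/bin/env bash
# Same idiom, with a fixed set of illustrative values instead of filenames.
LIST=""
for item in alpha beta gamma; do
    S=${LIST:+,}                 # empty on the first pass, "," afterward
    LIST="${LIST}${S}${item}"
done
echo "$LIST"    # alpha,beta,gamma
```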

Modified Value

Up to now, none of these substitutions have modified the underlying value of the variable. There is, however, one that does. If we write ${VAR:=value}, it will act much like our preceding default value idiom, but with one big exception. If VAR is empty or unset, it will assign that value to the variable (hence, the equal sign) and return that value. (If VAR is already set, it will simply return its existing value.) Note, however, that this assigning of a value does not work for positional parameters (like $1), which is why you don’t see it used nearly as often.
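One common way to use this, sketched below, is to put the expansion after the : (no-op) command so that the assignment happens without running anything else. The variable name OUTDIR and the paths are illustrative.

```bash
#!/usr/bin/env bash
unset OUTDIR
: "${OUTDIR:=/tmp/output}"    # ":" does nothing; the expansion does the assignment
first=$OUTDIR
echo "$first"                 # /tmp/output

OUTDIR="/data"
: "${OUTDIR:=/tmp/output}"    # already set, so the value is left alone
echo "$OUTDIR"                # /data
```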

$RANDOM

Bash has a very handy $RANDOM variable. As the “Bash Variables” section in the Bash Reference Manual says:

Each time this parameter is referenced, a random integer between 0 and 32767 is generated. Assigning a value to this variable seeds the random number generator.

While this is not suitable for cryptographic functions, it’s useful for rolling the dice or adding a bit of noise into otherwise too-predictable operations. We use this later in “A Simple Word Count Example”.
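As a sketch of both behaviors the manual describes, the following rolls a six-sided die and then seeds the generator twice with the same (arbitrary) value to show that the sequence repeats. Because 32768 is not a multiple of 6, the modulo introduces a very slight bias, which is fine for casual use.

```bash
#!/usr/bin/env bash
roll=$(( RANDOM % 6 + 1 ))      # 1 through 6
echo "You rolled a $roll"

RANDOM=42                        # seed the generator (42 is an arbitrary choice)
first_run="$RANDOM $RANDOM $RANDOM"
RANDOM=42                        # same seed again
second_run="$RANDOM $RANDOM $RANDOM"
echo "first:  $first_run"
echo "second: $second_run"       # same three numbers as the first run
```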

As shown in Example 4-3, you can pick a random element out of a list.

Example 4-3. Pick a random list element
declare -a mylist
mylist=(foo bar baz one two "three four")

range=${#mylist[@]}
random=$(( $RANDOM % $range ))  # 0 to (list length - 1)

echo "range = $range, random = $random, choice = ${mylist[$random]}"

# Shorter but less readable 6 months from now:
# echo "choice = ${mylist[$(( $RANDOM % ${#mylist[@]} ))]}"

You may also see something like this:

TEMP_DIR="$TMP/myscript.$RANDOM"
[ -d "$TEMP_DIR" ] || mkdir "$TEMP_DIR"

However, that is subject to race conditions, and is obviously a simple pattern. It is also partly predictable, but sometimes you want to have a clue as to what code is cluttering up $TMP. Don’t forget to set a trap (see “It’s a Trap!”) to clean up after yourself. We recommend you consider using mktemp, though that’s a large issue outside the scope of bash idioms.
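For comparison, here is a minimal sketch of the mktemp approach. The myscript prefix is illustrative, and this sketch uses ${TMPDIR:-/tmp} as the parent directory (an assumption; the example above uses $TMP). It is a starting point, not a complete treatment of temporary-file handling.

```bash
#!/usr/bin/env bash
# Sketch only: the "myscript" prefix and the TMPDIR fallback are illustrative.
# mktemp -d creates the directory itself, so there is no check-then-create race.
TEMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/myscript.XXXXXX") || exit 1
trap 'rm -rf "$TEMP_DIR"' EXIT     # clean up when the script exits

echo "working in $TEMP_DIR"
```

The prefix still gives you a clue as to what is cluttering up the temporary directory, while the XXXXXX part is filled in by mktemp.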

$RANDOM and dash

$RANDOM is not available in dash, which is /bin/sh in some Linux distributions. Notably, current versions of Debian and Ubuntu use dash because it is smaller and faster than bash and thus helps to boot faster. But that means that /bin/sh, which used to be a symlink to bash, is now a symlink to dash instead, and various bash-specific features will not work. It does work in Zsh though.

Command Substitution

We’ve already used command substitution quite a bit in Chapter 2, but we haven’t talked about it. The old Bourne way to do it is `` (backticks/backquotes), but we prefer the more readable POSIX $() instead. You will see a lot of both forms, because it’s how you pull output into a variable; for example:

unique_lines_in_file="$(sort -u "$my_file" | wc -l)"

Note that these two produce the same result, but the second is handled inside bash and is faster:

for arg in $(cat /some/file)
for arg in $(< /some/file)     # Faster than shelling out to cat
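A self-contained sketch of the two forms follows. It creates a throwaway file (the contents are made up) so that you can confirm both forms capture the same text.

```bash
#!/usr/bin/env bash
# Create a small throwaway file so the sketch is self-contained.
tmp_file=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' > "$tmp_file"

with_cat=$(cat "$tmp_file")       # runs the external cat command
with_redirect=$(< "$tmp_file")    # bash reads the file itself

if [[ $with_cat == "$with_redirect" ]]; then
    echo "identical contents"
fi

words=0
for arg in $(< "$tmp_file"); do   # unquoted, so the text is split into words
    words=$(( words + 1 ))
done
echo "words: $words"              # words: 3
rm -f "$tmp_file"
```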

Command Substitution

Command substitution is critical to cloud and other DevOps automation because it allows you to gather and use all the IDs and details that only exist at runtime; for example:

instance_id=$(aws ec2 run-instances --image-id "$base_ami_id" ... \
  --output text --query 'Instances[*].InstanceId')

state=$(aws ec2 describe-instances --instance-ids $instance_id \
  --output text --query 'Reservations[*].Instances[*].State.Name')

Nesting Command Substitution

Nesting command substitution using `` gets very ugly, very fast, because you must escape the inner backticks in each nesting layer. It’s much easier to use $() if you can, as shown:

### Just Works
$ echo $(echo $(echo $(echo inside)))
inside

### Broken
$ echo `echo `echo `echo inside```
echo inside

### "Works" but very ugly
$ echo `echo \`echo \\\`echo inside\\\`\``
inside

Thanks to our reviewer Ian Miell for pointing this out and providing the example.

Style and Readability: Recap

When referencing a variable in bash, you have the opportunity to edit the value as you set or retrieve it. A few special characters at the end of the variable reference can remove characters from the front or end of the string value, alter its characters to upper- or lowercase, substitute characters, or give you just a substring of the original value. Common use of these handy features results in idioms for default values, basename and dirname substitutes, and the creation of a comma-separated list without using an explicit if statement.

Variable substitutions are a great feature in bash, and we recommend making good use of them. However, we also strongly recommend that you comment those statements to make it clear what sort of substitution you are attempting. The next reader of your code will thank you.
