Chapter 4. Managing MSH Scope and State

So far, we’ve seen a number of different aspects of MSH, including the idea of cmdlets, pipelines, and the shell language of variables, functions, and filters. In this chapter, we’ll look at the MSH infrastructure that brings all of these components together and allows them to work seamlessly with each other.

In the following pages, we’ll look at some of the services that MSH provides. The material we’ll cover here is applicable throughout the shell, and tools such as text matching and regular expressions will likely become an indispensable part of your toolkit.

Control Access to Variables and Functions

We’ve already touched on the idea that variables defined in functions might not always be accessible from other functions, scripts, or cmdlets. This limited visibility is known as scoping, and it is used by MSH to segregate data when several different script blocks and cmdlets are in play. It’s important to realize that scoping doesn’t offer privacy or security; instead, the ability to hide is used to simplify the authoring of scripts. As we’ll see, this behavior comes in handy with more complicated scripts and tasks as it reduces the potential for these script fragments to interfere with each other.

In general, MSH controls scope automatically; scripts and functions often “just work” as a result. However, it’s always wise to understand how scoping comes into play, especially in cases in which there’s a need to do something differently.

How Do I Do That?

In interactive mode, we are working in the global scope, in which any variables or functions defined are accessible from everywhere within the shell:

    MSH D:\MshScripts> function showA { write-host $a }
    MSH D:\MshScripts> $a = 10
    MSH D:\MshScripts> showA
    10

This little example seems obvious. Let’s see what happens when we define a variable and assign it a value inside a function:

    MSH D:\MshScripts> function doWork { $processes = get-process }
    MSH D:\MshScripts> doWork
    MSH D:\MshScripts> $processes.Count
    MSH D:\MshScripts>

After we’ve run the function, it might be reasonable to expect that $processes would contain some data, yet it remains undefined when we try to inspect it. This is actually a good thing: if $processes had been storing important data, its value would have been accidentally overwritten inside the function call.

In fact, what we’ve done here is define a variable in a local scope (one that is created just for the function). By the time we’re back at the prompt, any variables defined in that scope have disappeared. Generally speaking, this behavior helps functions keep their own activities to themselves and encourages information to be emitted explicitly as a result (as we saw in the previous chapter), rather than relying on variables to pass information around.

Because there are cases where we’d like variables to live longer than the lifetime of the function that defines them, MSH has syntax for working explicitly with variables in the global scope:

    MSH D:\MshScripts> function doWork { $global:processes = get-process }
    MSH D:\MshScripts> doWork
    MSH D:\MshScripts> $processes.Count
    29
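To see why the global: prefix matters, here is a minimal sketch of a counter that must survive between calls. The function name countCall and the variable $callCount are hypothetical and were chosen only for illustration.

```powershell
# Each call increments a variable that lives in the global scope, so the
# new value survives after the function's own local scope is discarded.
function countCall { $global:callCount++ }

$global:callCount = 0
countCall
countCall
countCall
"countCall was run $callCount times"    # countCall was run 3 times
```

Without the global: prefix, each call would assign the incremented value to a fresh local variable, and the global $callCount would stay at 0.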

This difference in scope can give rise to some unexpected behavior. In these cases, it’s possible for a variable to take on a value that we don’t expect or for an assignment to apparently fail. Let’s go back to the showA example and add another function that updates the value of $a before we display it:

    MSH D:\MshScripts> function setA { $a = 5 }
    MSH D:\MshScripts> $a=10
    MSH D:\MshScripts> setA
    MSH D:\MshScripts> showA
    10

As in the doWork example, here the setA function is making a change to its own $a variable in its local scope. Even though there’s a global variable $a already present, the setA function will not make any changes to it. Because showA has a completely different local scope—one in which no local $a is present—it uses the value of the global $a instead.

Fortunately, MSH provides multiple levels of scope. When one function is invoked from inside another, the invoked function can see the global scope and the scope of its parent; it also has its own separate local scope. If we call the showA function from within a scope in which the value has been changed, it will see the new value instead:

    MSH D:\MshScripts> function setAndShowA { $a = 5; showA }
    MSH D:\MshScripts> $a=10
    MSH D:\MshScripts> setAndShowA
    5
    MSH D:\MshScripts> $a
    10
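The lookup is not limited to the immediate parent. In the following sketch, showLevel finds $level two scopes up, in the local scope of top, because the nearer scope (middle) defines no $level of its own. The names showLevel, middle, and top are hypothetical.

```powershell
function showLevel { "level is $level" }

# middle defines nothing itself; it only passes control along
function middle { showLevel }

# top defines its own local $level before calling middle
function top { $level = "from top"; middle }

$level = "outermost"
top          # level is from top (found two scopes up)
showLevel    # level is outermost
$level       # outermost: top's assignment stayed in its own scope
```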

MSH offers several other explicit scope indicators in the same format as the global: syntax. As the local scope is always assumed, the following example is functionally equivalent to the previous setA definition, but the use of local: helps to convey the scope considerations (in other words, it clarifies how futile this setA function is):

    MSH D:\MshScripts> function setA { $local:a = 5 }

Scripts, like functions, are run within their own special script scope, which is created when the script starts and discarded when it ends. The script: prefix is convenient for modifying variables that are defined in a script but that are outside of the current function and are not global variables. For example, consider a script similar to the one in Example 4-1 that keeps a tally of failures during a series of checks and reports the number of problems at the end of the script.

Example 4-1. Use of script scope variables in a script
function checkProcessCount
{
    if ((get-process).Count -gt 50)
    {
        $script:failureCount++
    }
}

$failureCount = 0
checkProcessCount
...
"Script complete with $failureCount errors"

In cases such as running the profile, we don’t want a script to run in its own scope and would prefer it to affect the global scope. Instead of running the script by filename alone, we dot source it by typing a period, a space, and then the filename. Running a script in this way tells MSH to load the child scope into the parent scope when the script is complete (see Example 4-2).

Example 4-2. DotSourceExample.msh
$c = 20

Now, we can see the difference between the two methods of running the script:

    MSH D:\MshScripts> .\DotSourceExample.msh
    MSH D:\MshScripts> $c

    MSH D:\MshScripts> . .\DotSourceExample.msh
    MSH D:\MshScripts> $c
    20
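The same contrast can be sketched without creating a file. This sketch assumes the dot-source operator accepts a function name as well as a script filename. setC is a hypothetical function that mirrors DotSourceExample.msh.

```powershell
# Assumption: ". name" runs a function's body in the caller's scope,
# just as it does for a script file.
function setC { $c = 30 }

$c = 0
setC      # runs in a new local scope; the assignment is discarded
$c        # 0
. setC    # dot sourced: the assignment lands in the current scope
$c        # 30
```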

What Just Happened?

Scoping applies to all user-defined elements of the MSH language, including variables, functions, and filters. Fortunately, it follows a series of simple rules and is always predictable.

There are four categories of scope: global, local, script, and private.

Global scope

Only one global scope is created per MSH session when the shell is started. Global scopes are not shared between different instances of MSH.

Local scope

A new local scope is always created when a function, filter, or script is run. The new scope has read access to all scopes of its parent, its parent’s parent, and so on, up to the global scope. Because scopes are inherited downward in this fashion, children can read from the scope of their parents (but cannot write to it unless they use an explicit prefix such as global: or script:), yet parents cannot read from the scope of their children.

An alternative way of looking at this is to appreciate the lifetime of a scope (the time from its creation to the point at which it is discarded). Just as new scopes are created when entering a script block (or function, filter, etc.), they are discarded as soon as the script block is finished. Were a parent to try to access variables in a child’s scope before the script block had run, the variables wouldn’t exist yet; should it try afterward, the scope would have been discarded and all variables within it would be gone.

Script scope

A script scope is created whenever a script file is run, and it is discarded when the script finishes. All script files are subject to this behavior unless they are dot sourced, in which case their script scope is loaded into the scope of their parent when the script is complete. If one dot-sourced script (a.msh) dot sources another (b.msh), the same rules apply: when b.msh completes, its scope is loaded into the script scope of a.msh; when a.msh completes, their combined scopes are loaded into the parent scope.

Private scope

The private and local scopes are very similar, but they have one key difference: definitions made in the private scope are not inherited by any child scopes.

Table 4-1 summarizes the available scopes and their lifetimes.

Table 4-1. Scopes and their lifetimes

    Scope name   Lifetime
    ----------   --------
    Global       Entire MSH session
    Local        Current script block and any scripts/functions invoked from it
    Script       Current script file and any scripts/functions invoked from it
    Private      Current script block only; scripts/functions invoked from the
                 current block will not inherit variables defined in this scope

There are a few general rules about scoping that are useful to remember:

  • Unless a scope is explicitly specified, a variable can be changed only within the scope in which it was created. Child scopes can read it, but an assignment in a child creates a new local variable instead.

  • Scopes are inherited from parent to children. Children can access any data in their parents’ scope with the exception of privately scoped variables.

  • The local scope is always the current one, and any unqualified reference is assumed to refer to it. In other words, an assignment such as $a = 5 is interpreted as $local:a = 5. When $a is read and no local $a exists, MSH looks for it in the parent scopes.

  • The global, local, and private scopes are always available. In some cases, such as when working interactively at the prompt, the global and local scopes will be the same.

What About...

... What if I don’t want functions to inherit the scope of the block that calls them? Although rarely used, this is the primary function of the private scope, which can be used to hide data from children. Working with the earlier example, if we now define $a as a private variable, subsequent function calls will be unable to retrieve its value:

    MSH D:\MshScripts> $private:a = 5
    MSH D:\MshScripts> showA
    MSH D:\MshScripts>
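The same effect can be seen one level down. In this sketch, both functions define $secret before calling showSecret, but only the ordinary local definition is inherited. The names withLocal, withPrivate, and showSecret are hypothetical.

```powershell
# showSecret defines no $secret, so it relies on inheritance from its caller
function showSecret { "secret is [$secret]" }

function withLocal   { $secret = "visible"; showSecret }
function withPrivate { $private:secret = "hidden"; showSecret }

withLocal      # secret is [visible]
withPrivate    # secret is []
```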

... How does get-childitem Variable: deal with different scopes? As we’ve already seen, this special drive shows the variables defined for the current scope. Executing get-childitem Variable: from the prompt will show the content of the global scope. However, running the same command from within a script file or function may return a different list of results that will include all of the global variables plus any others that have been defined in the local scope.
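A quick way to check whether the current scope can see a particular variable is to run test-path against the Variable: drive. This minimal sketch uses a hypothetical function, probe, to show a variable that exists only while the function runs.

```powershell
function probe {
    $onlyHere = 1
    test-path Variable:onlyHere    # True: defined in this local scope
}

probe                              # True
test-path Variable:onlyHere        # False: the function's scope is gone
```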

Now, we’re going to put variables aside for a while and discuss how the hosting environment handles strings of text.

Work with Special Characters

Although MSH does a good job of taking input and parsing it to get an understanding of intent, there are times when it needs some help. For example, consider the simple copy-item cmdlet, which takes two parameters: source and destination. In the typical case, usage is very simple and easy to follow:

    MSH D:\MshScripts> copy-item file1 file2

But what happens when filenames contain spaces?

    MSH D:\MshScripts> copy-item my file1 file2

Does this mean copy my file1 to file2, copy my to file1 file2, or something else? Clearly, we need some way of identifying where one parameter ends and another begins. These quoting rules go beyond cmdlet parameters and are used consistently throughout the shell.

Next, we’ll look at some of the different types of strings available, their delimiters, and some special character sequences employed by MSH that allow us to express exactly what we mean.

How Do I Do That?

Let’s start with an easy example:

    MSH D:\MshScripts> "In double quotes"
    In double quotes
    MSH D:\MshScripts> 'In single quotes'
    In single quotes

Does this mean we can use single quotes and double quotes interchangeably? Not exactly. MSH makes a subtle but important distinction between the two: single quotes are used to represent a literal string (one that will be used exactly as is) whereas MSH looks through strings inside double quotes and replaces any variable names with their values in a process known as variable expansion:

    MSH D:\MshScripts> $myName = "Andy"
    MSH D:\MshScripts> "Hello, $myName"
    Hello, Andy
    MSH D:\MshScripts> 'Hello, $myName'
    Hello, $myName

Single quotes are allowed inside of strings enclosed in double quotes and vice versa. This is often convenient when quotation marks are needed within a string.

    MSH D:\MshScripts> $myName = "Andy"
    MSH D:\MshScripts> 'He said "Hello, $myName"'
    He said "Hello, $myName"
    MSH D:\MshScripts> "He said 'Hello, $myName'"
    He said 'Hello, Andy'

What if I really wanted to output He said "Hello, Andy" with double quotes instead? Nested quotation marks aren’t going to cut it here, so we somehow need to include the double quote character inside the string.

MSH enables us to do this by escaping the double quote character, which tells MSH to interpret it differently than usual. When the grave accent character (`), also known as a backquote or backtick, is used inside a string, MSH understands that the character immediately following it has a special meaning.

    MSH D:\MshScripts> "He said `"Hello, $myName`""
    He said "Hello, Andy"
    MSH D:\MshScripts> "Col 1`tCol 2`tCol 3"
    Col 1    Col 2    Col 3
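The following sketch puts the two escapes together. It builds both strings and then splits the tab-separated one back into columns with the .NET String.Split method. The variable names are illustrative.

```powershell
$myName = "Andy"

# `" places a literal double quote inside a double-quoted string
$quoted = "He said `"Hello, $myName`""

# `t inserts a tab; splitting on the tab character recovers the columns
$row = "Col 1`tCol 2`tCol 3"
$columns = $row.Split("`t")

$quoted           # He said "Hello, Andy"
$columns.Count    # 3
```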

What Just Happened?

Because a string can be defined in MSH in different ways, the quoting rules are used to instruct the shell exactly how it should work with the content between the quotation marks and whether it should be passed straight through or undergo some processing first. The difference between the two main cases is in how MSH treats the $ sign and any variable names that follow it. If single quotation marks are used, MSH does not inspect the string and uses it as is. By using double quotation marks, you are implicitly asking the shell to search for any variable names within the string and replace them with the current value of the variable.

Variable expansion in double-quoted strings consistently follows some simple rules. If the $ sign appears in the string, any legal characters following it are assumed to refer to a variable name. MSH will look forward until it hits something that doesn’t qualify (such as a space, newline, tab, comma, etc.) and use everything up to that point as the variable name. The shell then looks up the value of that variable, converts it to a string, and places the value into the original string:

    MSH D:\MshScripts> $alpha = 2
    MSH D:\MshScripts> $alphabet = 9
    MSH D:\MshScripts> "$alphabet"
    9                # matches $alphabet not $alpha, otherwise this would
                     # return "2bet"
    MSH D:\MshScripts> "some value=$undefinedVariableName"
    some value=

Because MSH looks forward until it hits a nonalphanumeric character, a special syntax is used for the expansion of more complex variables such as arrays and hashtables. Parentheses can be used immediately after the $ sign to enclose the entire variable name and any indexers necessary for correct evaluation:

    MSH D:\MshScripts> $arr = @("first","second")
    MSH D:\MshScripts> "$arr[0]"
    first second[0]            # Oops!
    MSH D:\MshScripts> "$($arr[0])"
    first
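The same $( ) syntax works for property access and for arbitrary expressions, not just for indexers. Here is a short sketch with illustrative variable names.

```powershell
$h = @{ name = "Andy" }
$arr = @("first", "second")

"Name: $h.name"           # only $h is expanded; the text .name is appended literally
"Name: $($h.name)"        # Name: Andy
"Count: $($arr.Count)"    # Count: 2
"Sum: $(2 + 3)"           # Sum: 5
```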

The ability to nest different styles of quotation marks inside each other is often a handy shortcut, but it is not a universal solution. For example, the string 'He said "I'm here"' is invalid because there is an uneven number of single quotation marks in the string. In anything but simple cases, it’s better to rely on escape characters for including quotation marks inside a string.

With both of the approaches available, should single or double quotes be used? This depends on several factors. In clear-cut cases, the decision is based on the rules: if the string contains a $ sign that needs to be expanded, use double quotes; if the $ sign is intended literally, single quotes can be used or the character can be escaped as `$. In the general case, however, it often comes down to personal preference. Most of the examples we’ve seen throughout the book so far have used double quotes, even when variable expansion is not expected. This makes it very easy to add a variable into a string at a later date and have it expanded automatically, with a negligible difference in performance.

The escape character (`) has a number of meanings depending on its location and usage. When used on the command line, it indicates that the character immediately following it should be passed on without substitution or processing. This is helpful when a character has meaning to MSH but isn’t meant to be used in that fashion. For example:

    MSH D:\MshScripts> write-host Use the -Object option
    write-host : A parameter cannot be found that matches parameter 'Use'.
    At line:1 char:11
    + write-host  <<<< Use the -Object option
    MSH D:\MshScripts> write-host Use the `-Object option
    Use the -Object option
    MSH D:\MshScripts> copy-item my` file1 file2
    # equivalent to copy-item "my file1" file2

When used within a double-quoted string, MSH replaces the escape character and the character immediately following it with a special character. Single-quoted strings are taken literally, so the escape character has no special meaning inside them. Table 4-2 lists some of the escape sequences you’re likely to need.

Table 4-2. Common MSH escape sequences

    Sequence   Meaning
    --------   -------
    `'         Single quote
    `"         Double quote
    ``         Grave accent
    `$         Dollar symbol
    `0         Null character (different from $( ))
    `a         Alert (beep)
    `b         Backspace
    `f         Form feed
    `n         Newline
    `r         Carriage return
    `t         Tab
    `v         Vertical tab
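Here is a short sketch of two of these sequences. It keeps a literal dollar sign next to an expanded variable and uses a newline to break a string into two lines. $price, $receipt, and $lines are illustrative names.

```powershell
$price = 5

# `$ produces a literal dollar sign, $price is still expanded,
# and `n starts a new line
$receipt = "Cost: `$$price`nPaid: yes"
$lines = $receipt.Split("`n")

$lines[0]       # Cost: $5
$lines.Count    # 2
```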

What About...

... Do Unix shells use the grave accent character for something else? Yes. In many shells, the grave accent character is used for command substitution, in which the output of a command is captured (for example, by assigning it to a variable) and used for further processing.

For example, in bash:

    PROCESSOR=`uname -p`
    NOW=`date`
    echo "This machine is a $PROCESSOR"
    echo "Current time is $NOW"

Command substitution is a critical tool when data is passed around as text without any definition of structure. Because MSH works more comfortably with structured data, it’s usually more convenient to draw information from the shell, another cmdlet, or the .NET Framework than it is to draw information from the textual output of another command.

The equivalent MSH script would read:

    $processor = $Env:PROCESSOR_IDENTIFIER
    $now = get-date
    write-host "This machine is a $processor"
    write-host "Current time is $now"

Of course, given that $processor and $now represent structured data, we’re in a position to work with information such as $now.Year without any further effort. In any case, the direct equivalent to the `...` syntax is $(...), which causes MSH to evaluate the expression in parentheses (e.g., "The current date is $(get-date)").

... What about “here strings”? The term here string refers to a third technique for defining strings. Two markers are used to represent the start and end of the string (@" and "@, respectively), and anything that stands between them, including newline characters, is included in the definition:

    MSH D:\MshScripts> $a = 10
    MSH D:\MshScripts> $longString = @"
    >>First line
    >>Second line
    >>Variable expansion $a
    >>"@
    >>
    MSH D:\MshScripts> $longString
    First line
    Second line
    Variable expansion 10
    MSH D:\MshScripts>
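Because anything between the markers is included as is (apart from variable expansion), here strings are a convenient way to avoid escaping nested quotation marks. Here is a minimal sketch with illustrative variable names. The TrimEnd call guards against a carriage return left behind when a script file uses Windows line endings.

```powershell
$name = "Andy"

# Both kinds of quotation mark can appear freely, and $name is still
# expanded because this is the @" "@ form of here string.
$note = @"
He said "Hello, $name"
It's fine to use 'single quotes' too
"@

# Split into lines and strip any trailing carriage return from each
$lines = $note.Split("`n") | foreach-object { $_.TrimEnd("`r") }
$lines[0]    # He said "Hello, Andy"
```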

Finally, returning to command substitution: for cases in which it is necessary to capture the textual output of a command, MSH does provide equivalent syntax:

    $pingTarget="127.0.0.1"
    $pingOutput = $(ping $pingTarget)

Use Wildcards to Define a Set of Items

Wildcard matching is one of the great shortcuts provided by almost all command shells available today. Instead of having to enumerate a list of files one by one, we can use some special characters that translate to “anything.” In this section, we’ll look at some of the evolutionary changes in MSH with regard to wildcard matching, and we’ll see how some of the new syntax can be used to list sets of files for processing more easily.

How Do I Do That?

When it comes to wildcards, MSH supports the familiar wildcard syntax, using the ? and * characters to represent any single character and any sequence of characters, respectively. For a quick refresher, let’s look at a few commands that make use of these wildcard characters:

    MSH D:\MshScripts> get-childitem *.msh


        Directory: FileSystem::D:\MshScripts


    Mode    LastWriteTime     Length Name
    ----    -------------     ------ ----
    -a---   Mar 20 21:55        491  createfiles.msh
    -a---   Mar 20 18:15        118  updatefiles.msh
    -a---   Mar 23 00:50        117  unittest1.msh
    -a---   Mar 28 22:04        243  unittest2.msh
    -a---   Mar 22 23:36         49  unittest4.msh
    -a---   Mar 22 23:36         88  unittestA.msh


    MSH D:\MshScripts> get-childitem *files.msh


        Directory: FileSystem::D:\MshScripts


    Mode    LastWriteTime     Length Name
    ----    -------------     ------ ----
    -a---   Mar 20 21:55        491  createfiles.msh
    -a---   Mar 20 18:15        118  updatefiles.msh

    MSH D:\MshScripts> get-childitem unittest?.msh


        Directory: FileSystem::D:\MshScripts


    Mode    LastWriteTime     Length Name
    ----    -------------     ------ ----
    -a---   Mar 23 00:50        117  unittest1.msh
    -a---   Mar 28 22:04        243  unittest2.msh
    -a---   Mar 22 23:36         49  unittest4.msh

MSH offers several other wildcard characters for more flexibility in wildcard matching. Square brackets ([]) can be used to specify a set of characters, any one of which can be used to make a match. In other words, square brackets behave somewhat like the question mark (?), but instead of matching any character, they match just the characters that appear inside the brackets. Sometimes, instead of writing out every possible character, it’s more convenient to define a range. The hyphen character (-) can be used between a start and an end character to indicate that anything in between is also valid. Because filename matching is not case-sensitive, the range [a-z] in the second example below also matches the uppercase A in unittestA.msh:

    MSH D:\MshScripts> get-childitem unittest[14].msh


        Directory: FileSystem::D:\MshScripts


    Mode    LastWriteTime     Length Name
    ----    -------------     ------ ----
    -a---   Mar 23 00:50        117  unittest1.msh
    -a---   Mar 22 23:36         49  unittest4.msh

    MSH D:\MshScripts> get-childitem unittest[a-z].msh


        Directory: FileSystem::D:\MshScripts


    Mode    LastWriteTime     Length Name
    ----    -------------     ------ ----
    -a---   Mar 22 23:36         88  unittestA.msh

What About...

... Using a wildcard character in your filenames? Although most common filesystems prohibit the use of ? and * in file and folder names, the [ and ] characters are generally available. If you have a file called default[12].htm, the command get-childitem default[12].htm won’t find it, because the wildcard rules tell the cmdlet to look for default1.htm and default2.htm. In this case, the escape character can be used to tell MSH not to expand the wildcard syntax; get-childitem default`[12`].htm will find the file.
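The -like operator, described in the next section, applies the same wildcard rules to strings. That makes it easy to try the escaping out without creating any files. This sketch assumes -like honors the same escape character as get-childitem. The patterns are in single quotes so that the backticks reach the wildcard matcher untouched.

```powershell
$file = "default[12].htm"

# Unescaped, [12] is a set that matches exactly one character: 1 or 2
$file -like 'default[12].htm'             # False
"default1.htm" -like 'default[12].htm'    # True

# Escaped with backticks, the brackets are matched literally
$file -like 'default`[12`].htm'           # True
```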

Where Can I Learn More?

The help page for the topic about_Wildcard contains more information about the changes in wildcard matches and introduces a couple of other special match characters, including ^ and $, which refer to the start and end of the filename, respectively.

With this flexibility at our fingertips when matching against filenames, it seems unjust that so far we’ve only been able to use the -eq comparison operator to test whether two strings are identical. It’s time to see how wildcards can be applied to text strings in comparisons.

Take String Comparison Beyond -eq, -lt, and -gt

In Chapter 3, we looked at a number of comparison operators, such as -eq, -lt, and -gt, which can be applied to many of the different types of data we work with in the shell. Each of these three operators works effectively on strings by checking for identical strings and giving some idea of relative alphabetical ordering. However, there are many cases in which we’d like to do a more meaningful comparison of the actual letters within a string—for example, to test whether the string contains a certain shorter string or whether it matches a certain format such as the aaa.bbb.ccc.ddd format of a numeric IP address.

We’ll look at two approaches to matching strings in this section. The first case uses the -like operator with some basic wildcard rules to see whether one string contains another. The second technique uses the -match operator and relies on regular expressions to communicate more complex matching rules. Before we begin, let’s look at some examples of regular expressions.

Regular Expressions

A regular expression describes a set of matching strings according to a series of rules. In this section, we’ll cover a few of the basic rules and look at some common examples, but it’s important to realize that regular expressions are a vast topic that won’t be covered exhaustively here. For a more complete picture of the topic, consider picking up a copy of Mastering Regular Expressions (O’Reilly).

There are three principles that are fundamental to understanding and effectively using regular expressions. The first is the concept of alternates: the idea that a single regular expression can express two or more different strings to match against. Alternates are separated by a vertical bar (|), which is the same symbol used for building a pipeline. For example, the regular expression w3svc|iisadmin|msftpsvc matches “w3svc”, “iisadmin”, “msftpsvc”, and the string “w3svc service is started but iisadmin is not.” Square brackets are often used as shorthand for specifying single-character alternates; for example, [aeiou] is equivalent to a|e|i|o|u. The hyphen can also be used inside brackets to cover a range; [a-m] matches any letter in the first half of the alphabet.

Second, different parts of a regular expression can be grouped together using parentheses. Grouping is useful when only part of a longer regular expression is subject to alternation or quantification. For example, the regular expression (w3|msftp)svc matches both “w3svc” and “msftpsvc.” Groups can be nested inside each other, provided every open parenthesis is matched to a closing one.

Quantification, the third key part of regular expressions, gives us the power to specify how many times a certain character or sequence must occur to constitute a match. For example, when matched against an entire string, the regular expression (domain\\)?user would match “user” and “domain\user” but not “domain\domain\user”. (As an unanchored search, it would still find “domain\user” inside the longer string.) Table 4-3 describes the quantifiers available for use.

Table 4-3. Common quantifiers for denoting quantity in regular expressions

    Quantifier   Matches the preceding expression...
    ----------   -----------------------------------
    *            Zero or more times
    +            One or more times
    ?            Zero times or once (at most once)
    {n}          Exactly n times
    {n,}         At least n times
    {n,m}        At least n and at most m times
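The following sketch exercises all three principles with the -match operator, which is covered later in this section. -match returns True when the regular expression matches. The ^ and $ characters (see Table 4-4) anchor each pattern to the whole string. The sample strings are illustrative.

```powershell
# Alternates: any one of the service names is enough
"w3svc service is started" -match 'w3svc|iisadmin|msftpsvc'    # True

# Grouping limits the alternation to the prefix
"msftpsvc" -match '^(w3|msftp)svc$'                           # True
"iissvc"   -match '^(w3|msftp)svc$'                           # False

# Quantifiers: ? makes the domain part optional, {n} demands an exact count
"domain\user"        -match '^(domain\\)?user$'               # True
"domain\domain\user" -match '^(domain\\)?user$'               # False
"ab12cd"             -match '^[a-z]{2}[0-9]{2}[a-z]{2}$'      # True
```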

Regular expressions can also use a set of special characters as shorthand for common matches. These special characters, shown in Table 4-4, are different from those covered earlier in this chapter, and they apply only to regular expressions.

Table 4-4. Common special characters used in regular expressions

    Special character   Meaning
    -----------------   -------
    .                   Any single character (except newline)
    ^                   Start of a string
    $                   End of a string
    \b                  Word boundary (the position between a word character
                        and a nonword character such as a space or newline)
    \d                  Digit (0-9)
    \n                  Newline
    \s                  Whitespace (space, tab, newline, etc.)
    \t                  Tab
    \w                  Word character (letters, digits, and underscore)

Many of these special characters have an inverse, written with the corresponding capital letter. For example, \S matches anything that isn’t whitespace, and \W matches anything that isn’t a word character.
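A few quick checks with -match (introduced later in this section) show how these shorthand characters behave. The sample strings are illustrative.

```powershell
"the cat sat" -match '\bcat\b'    # True: "cat" is a whole word
"concatenate" -match '\bcat\b'    # False: "cat" sits inside a longer word
"port 8080"   -match '\d{4}'      # True: four digits in a row
"abc_123"     -match '^\w+$'      # True: letters, digits, and underscore only
"abc-123"     -match '^\w+$'      # False: the hyphen is not a word character
"abc-123"     -match '\W'         # True: \W finds the hyphen
"no-spaces"   -match '\s'         # False: no whitespace at all
```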

To wrap up this short tour, Table 4-5 contains a few examples of simple regular expressions that we’ll rely on in the examples that follow.

Table 4-5. Simple regular expressions

    Windows username
        (\w*\\)?\w*

    IP address
        ^\d+\.\d+\.\d+\.\d+$

    Simple private IP addresses (RFC 1918 defines 10.x.x.x, 172.16-31.x.x,
    and 192.168.x.x; this simple pattern also accepts 172.10-15.x.x and
    172.32-39.x.x)
        ^(10\.\d+\.\d+\.\d+|172\.[1-3][0-9]\.\d+\.\d+|192\.168\.\d+\.\d+)$

    GUID (in the registry format of {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})
        ^{?[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}}?$
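The following sketch tries these expressions against some sample values. Two caveats apply. First, the username pattern is anchored here with ^ and $; unanchored, it would match any string, because \w* can match nothing at all. Second, the simple private-address pattern accepts some addresses outside the RFC 1918 ranges. The variable names and sample values are illustrative.

```powershell
$userRegex    = '^(\w*\\)?\w*$'
$ipRegex      = '^\d+\.\d+\.\d+\.\d+$'
$privateRegex = '^(10\.\d+\.\d+\.\d+|172\.[1-3][0-9]\.\d+\.\d+|192\.168\.\d+\.\d+)$'

"DOMAIN\andy"  -match $userRegex       # True
"not a user"   -match $userRegex       # False: spaces are not word characters
"192.168.1.20" -match $ipRegex         # True
"192.168.1"    -match $ipRegex         # False: only three parts
"192.168.1.20" -match $privateRegex    # True
"8.8.8.8"      -match $privateRegex    # False
"172.35.0.1"   -match $privateRegex    # True, although 172.35.x.x is not private
```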

With the basics in place, it’s time to match some strings.

How Do I Do That?

Let’s start by reviewing the -eq comparison operator. When used on strings, -eq does a case-insensitive test to see whether strings are identical—not close, but identical:

    MSH D:\MshScripts> "foo" -eq "foo"
    True
    MSH D:\MshScripts> "foo" -eq "bar"
    False
    MSH D:\MshScripts> "foo " -eq "foo"
    False
    MSH D:\MshScripts> "foo" -eq "FOO"
    True

The -like operator brings into play all of the wildcards we just saw. The behaviors of *, ?, [, and ] all follow the same rules as we saw when matching filenames:

    MSH D:\MshScripts> "foo" -like "foo"
    True
    MSH D:\MshScripts> "foobar" -like "foo*"
    True
    MSH D:\MshScripts> "foobar" -like "*ba?"
    True
    MSH D:\MshScripts> "gray" -like "gr[ae]y"
    True

-like has a related operator, -clike, that is used to perform case-sensitive matching. The two operators treat wildcards in almost exactly the same fashion; the only difference is that the -clike operator distinguishes between uppercase and lowercase letters:

    MSH D:\MshScripts> "foo" -like "FOO"
    True
    MSH D:\MshScripts> "foo" -clike "FOO"
    False

Both -like and -clike have inverse operators that return true when no match is made and false when a match is present. The -notlike operator is a handy shortcut for -not ("a" -like "b"):

    MSH D:\MshScripts> "foo" -notlike "FOO"
    False
    MSH D:\MshScripts> "foo" -cnotlike "FOO"
    True
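Combined with where-object, these operators filter a list of strings in the same way the wildcards filtered files earlier. This sketch reuses the filenames from the get-childitem examples as plain strings.

```powershell
$names = "createfiles.msh", "updatefiles.msh", "unittest1.msh", "unittest2.msh", "unittest4.msh", "unittestA.msh"

$anyTest   = $names | where-object { $_ -like "unittest?.msh" }        # 4 names
$digitTest = $names | where-object { $_ -like "unittest[0-9].msh" }    # 3 names
$upperOnly = $names | where-object { $_ -clike "unittest[A-Z].msh" }   # unittestA.msh
$others    = $names | where-object { $_ -notlike "unittest*" }         # 2 names
```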

Wildcard comparisons are a useful tool and can be applied to all types of string-matching tasks. However, there are types of strings that cannot be captured in sufficient detail with wildcards alone. For example, it’s possible to match one character (?) or any number of characters (*), yet there’s no way to express a match of, say, exactly four. Likewise, a wildcard match is wide open—letters, numbers, and punctuation are all allowed. For some more specific matches, it’s time to bring in the regular expressions.

MSH performs regular expression matching with the -match operator. As with -like, it, too, has related operators for case sensitivity (-cmatch) and negative matches (-notmatch and -cnotmatch).
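Here is a minimal sketch of the four variants side by side. The sample strings are illustrative.

```powershell
"IPV6.EXE"   -match     'exe$'      # True: case-insensitive
"IPV6.EXE"   -cmatch    'exe$'      # False: case-sensitive
"readme.txt" -notmatch  '\.exe$'    # True: no match, so the negation is True
"setup.EXE"  -cnotmatch 'exe$'      # True: the case-sensitive match fails
```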

Let’s look at a few simple regular expression matches. Although we’re looking at all of these examples as simple command-line Boolean tests, these ideas can easily be transferred to other places, such as the where-object cmdlet, taking wildcards to a whole new level:

    MSH C:\WINDOWS\system32> "ipv6.exe" -match ".*exe"
    True
    MSH C:\WINDOWS\system32> "ipv6.exe" -match ".*\d{1}.*exe"
    True
    MSH C:\WINDOWS\system32> "ipv6.exe" -match ".*\d{2}.*exe"
    False                # the regex requires two consecutive digits
    MSH C:\WINDOWS\system32> get-childitem | where-object { $_ -match ".*\d{2}.*exe" }


        Directory: FileSystem::C:\WINDOWS\System32


    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---          8/4/2004   5:00 AM      47104 cmdl32.exe
    -a---          8/4/2004   5:00 AM      39936 cmmon32.exe
    -a---          8/4/2004   5:00 AM      45568 drwtsn32.exe
    -a---          8/4/2004   5:00 AM      45568 extrac32.exe
    -a---          8/4/2004   5:00 AM      92224 krnl386.exe
    -a---          8/4/2004   5:00 AM     123392 mplay32.exe
    -a---          8/4/2004   5:00 AM       3252 nw16.exe
    -a---          8/4/2004   5:00 AM      32768 odbcad32.exe
    -a---          8/4/2004   5:00 AM       3584 regedt32.exe
    -a---          8/4/2004   5:00 AM      11776 regsvr32.exe
    -a---          8/4/2004   5:00 AM      33280 rundll32.exe
    ...

It’s worthwhile to compare the behavior of -like with -match to better understand their differences. Even the simplest cases turn up some surprises:

    MSH D:\MshScripts> "foobar" -like "foo"
    False
    MSH D:\MshScripts> "foobar" -match "foo"
    True

When used without any special characters, quantifiers, or alternates, regular expression matching is similar to wildcard matching, with one key difference. A -like pattern must match the entire string, so without any wildcards the two strings must be identical apart from case. A regular expression, by contrast, matches if it occurs anywhere in the string. Keep this in mind when writing regular expressions: to require a match against the whole string, start the expression with a caret (^) and end it with a dollar sign ($). The following example shows the different outcomes when we try to match an invalid dotted IP address against an unanchored and an anchored version of the same regular expression:

    MSH D:\MshScripts> "1.2.3.4.5" -match "\d+\.\d+\.\d+\.\d+"
    True        # No!
    MSH D:\MshScripts> "1.2.3.4.5" -match "^\d+\.\d+\.\d+\.\d+$"
    False       # That's better
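The difference between whole-string and anywhere-in-the-string matching can be sketched in Python. The helper names like and match below are hypothetical stand-ins for the MSH operators, not real APIs. fnmatch approximates MSH wildcards: it supports *, ?, and [...], and additionally accepts [!...], which MSH may not. Lowercasing both sides imitates the case-insensitive default.

```python
import fnmatch
import re

def like(text: str, wildcard: str) -> bool:
    # Approximates -like: the wildcard must match the WHOLE string, case-insensitively
    return fnmatch.fnmatchcase(text.lower(), wildcard.lower())

def match(text: str, regex: str) -> bool:
    # Approximates -match: the regex only has to occur SOMEWHERE in the string
    return re.search(regex, text, re.IGNORECASE) is not None

print(like("foobar", "foo"))                          # False: whole string must equal "foo"
print(match("foobar", "foo"))                         # True: "foo" occurs inside "foobar"
print(match("1.2.3.4.5", r"\d+\.\d+\.\d+\.\d+"))      # True: unanchored
print(match("1.2.3.4.5", r"^\d+\.\d+\.\d+\.\d+$"))    # False: anchored with ^ and $
```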

Let’s take a look at a slightly more involved example. We’ll write a regular expression for a GUID and verify that it works correctly against a sample GUID:

    MSH D:\MshScripts> $guidRegex = "^{?[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}}?$"
    MSH D:\MshScripts> $myGuid = [System.Guid]::NewGuid( ).ToString( )
    MSH D:\MshScripts> $myGuid
    496a3bc7-861d-4176-9778-e01f266ba835
    MSH D:\MshScripts> $myGuid -match $guidRegex
    True
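The same GUID check can be tried in Python. In this sketch uuid.uuid4() plays the part of [System.Guid]::NewGuid(), and the braces are escaped to make their literal meaning explicit. Like the MSH version, the pattern accepts an opening brace without a matching closing one; rejecting that would need an alternation.

```python
import re
import uuid

# Same structure as $guidRegex: optional braces, then 8-4-4-4-12 hex digits
GUID_REGEX = re.compile(
    r"^\{?[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}\}?$", re.IGNORECASE
)

my_guid = str(uuid.uuid4())  # analogous to [System.Guid]::NewGuid().ToString()
print(my_guid, GUID_REGEX.match(my_guid) is not None)
```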

For one last example, let’s turn our attention to IP addresses. To grab the current IP address, we’ll again dip into the .NET Framework. We’ll then run the address through a couple of regular expressions to confirm that it’s both well formed and outside the private address ranges:

    MSH D:\MshScripts> function get-ipaddress {
    >>$hostname = [System.Net.Dns]::GetHostName( )
    >>$hosts = [System.Net.Dns]::GetHostByName($hostname)
    >>$hosts.AddressList[0].ToString( )
    >>}
    >>
    MSH D:\MshScripts> $ipRegex = "^\d+\.\d+\.\d+\.\d+$"
    MSH D:\MshScripts> $privateIpRegex = "^(10\.\d+\.\d+\.\d+|172\.(1[6-9]|2[0-9]|3[01])\.\d+\.\d+|192\.168\.\d+\.\d+)$"
    MSH D:\MshScripts> $myIP = get-ipaddress
    MSH D:\MshScripts> $myIP
    169.254.136.191
    MSH D:\MshScripts> $myIP -match $ipRegex
    True
    MSH D:\MshScripts> $myIP -notmatch $privateIpRegex
    True
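The same two checks can be sketched in Python, assuming the address is already in hand as a string; the .NET DNS lookup is not reproduced. The private-range pattern uses 172.16 to 172.31 for the 172.16.0.0/12 block. The hypothetical is_well_formed helper also adds an octet-range check, since \d+ alone would accept 999.1.1.1. Note that 169.254.x.x is a link-local address: it is not in the RFC 1918 private ranges, which is why the example above reports it as non-private.

```python
import re

IP_REGEX = re.compile(r"^\d+\.\d+\.\d+\.\d+$")
# RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12 (172.16-172.31), 192.168.0.0/16
PRIVATE_IP_REGEX = re.compile(
    r"^(10\.\d+\.\d+\.\d+|172\.(1[6-9]|2[0-9]|3[01])\.\d+\.\d+|192\.168\.\d+\.\d+)$"
)

def is_well_formed(ip: str) -> bool:
    # The regex checks only the dotted shape; also require each octet to be 0-255
    return bool(IP_REGEX.match(ip)) and all(0 <= int(part) <= 255 for part in ip.split("."))

def is_private(ip: str) -> bool:
    return is_well_formed(ip) and bool(PRIVATE_IP_REGEX.match(ip))

for ip in ["169.254.136.191", "10.1.2.3", "172.20.0.5", "172.32.0.5", "999.1.1.1"]:
    print(ip, is_well_formed(ip), is_private(ip))
```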

What About...

... Why is -like needed? Can’t its behavior be achieved just by using the -match operator? While it’s true that regular expressions can be used to get the same results as wildcard matches, there are good reasons to have both. If the -like wildcard syntax makes a comparison easier to read, it usually makes long-term maintenance of scripts easier as well.

... Does variable expansion work here? Absolutely. As we saw earlier, MSH performs variable expansion on any string enclosed in double quotes. Make sure to use single or double quotes appropriately, depending on how you want MSH to handle your variables:

    MSH D:\MshScripts> $myVar = "Andy"
    MSH D:\MshScripts> "Hello, $myVar" -ilike "*andy*"
    True
    MSH D:\MshScripts> $myRegex = "\d{3}"
    MSH D:\MshScripts> "test133" -match "test$myRegex"
    True
    MSH D:\MshScripts> "test148" -match 'test$myRegex' # no expansion
    False
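Why the single-quoted version fails is easier to see when the two patterns are built explicitly. In this Python sketch, an f-string stands in for MSH's double-quoted expansion and a plain literal stands in for the single-quoted string. In the literal pattern, $ becomes an end-of-string anchor followed by the text "myRegex", which can never match.

```python
import re

my_regex = r"\d{3}"

# Like the double-quoted MSH string: the variable's value is spliced into the pattern
expanded = f"test{my_regex}"
print(expanded, re.search(expanded, "test133") is not None)   # test\d{3} True

# Like the single-quoted MSH string: "$" is an anchor, so "myRegex" after it can never match
literal = "test$myRegex"
print(literal, re.search(literal, "test148") is not None)     # test$myRegex False
```

If the spliced-in variable holds literal text rather than a pattern, escape it first: re.escape in Python, or System.Text.RegularExpressions.Regex.Escape in .NET.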

Where Can I Learn More?

We’ve only scratched the surface of regular expressions in this section. They are a very expressive tool for all kinds of text-matching scenarios, and they can be significantly more complex and powerful than the examples we’ve looked at here. The regular expression language covered here is precisely the same as the one offered by the .NET Framework. Information is available at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconregularexpressionslanguageelements.asp.

We’ve covered a number of distinct aspects of the MSH infrastructure in this chapter. We’ll wrap up with a discussion of the error-handling mechanisms built into MSH.

When Things Go Wrong

For anything but the simplest of tasks, there’s always a chance that something can go wrong. Aware that processes and scripts don’t always run as planned and can sometimes hit unexpected problems, MSH offers an error-handling system that gives the script author the ability to control what happens next in times of trouble.

Before we begin, it’s important to call out the two types of errors that can occur when processing a command. The first type, a non-terminating error, indicates that some problem has occurred but that execution can still continue. An example of a non-terminating error is an access problem that occurs when trying to read a protected resource or write to a read-only file. In contrast, a terminating error signifies a condition in which execution cannot possibly continue and the command is terminated.

The core of the error-handling system is exposed by the trap keyword. The keyword is always followed by a script block (optionally preceded by an error type, as we’ll see later) that contains instructions for what to do when an error occurs. To make use of this, we’ll also come across some additional ubiquitous parameters that specify what a cmdlet should do when an error occurs.

How Do I Do That?

Let’s start with a simple example that is guaranteed to cause a problem: division by zero. Dividing any number by zero generates an error message and causes MSH to complain:

    MSH D:\MshScripts> 100/0
    Attempted to divide by zero.
    At line:1 char:5
    + 100/0 <<<<

Whenever a runtime error occurs, MSH automatically updates the special $error array with information about the problem. The most recent error is in the first slot ([0]), the second most recent at [1], and so on:

    MSH D:\MshScripts> $error[0]
    Attempted to divide by zero.

The $error variable is useful for diagnosing errors after execution has finished, but suppose we’d like to take action as the problems arise. For this simple example, instead of just writing out the message to the screen, we want to write out a special message when a problem occurs. Let’s create a script, shown in Example 4-3, that contains a very simple error handler.

Example 4-3. SimpleTrap.msh
trap
{
    "In error handler"
}

100/0

Now, when we run the script, we’ll see that our own trap statement is run. This is just the beginning:

    MSH D:\MshScripts> SimpleTrap.msh
    In error handler
    : Attempted to divide by zero.
    At D:\MshScripts\SimpleTrap.msh:6 char:5
    + 100/0 <<<<

When inside the trap block, MSH automatically populates the special variable $_ with details of the problem that landed execution there. Now we’re in business. Example 4-4 contains the improved trap handler.

Example 4-4. ImprovedTrap.msh
trap
{
    "In error handler"
    "Problem:"+$_.Message
}

100/0

Dealing with division by zero cases probably isn’t typical of day-to-day problems. Let’s instead look at the task of copying a set of files where we know one will fail. For this scenario, let’s assume we have one folder, source, that contains files a.txt, b.txt, and c.txt, and we’re planning to copy them into the dest folder that already contains a write-protected copy of a.txt. We can set up this little structure from either an MSH or CMD prompt with the following commands:

    mkdir source
    "content" > source\a.txt
    "content" > source\b.txt
    "content" > source\c.txt

    mkdir dest
    copy source\a.txt dest\a.txt
    attrib +r dest\a.txt

Now that we’re set up, let’s try copying the contents of source to dest:

    MSH D:\MshScripts> copy-item source\* dest
    copy-item : Access to the path 'D:\MshScripts\dest\a.txt' is denied.

As expected, we see that the a.txt file could not be overwritten because it is write-protected. However, on closer inspection, look what made it into dest:

    MSH D:\MshScripts> get-childitem dest


        Directory: FileSystem::D:\MshScripts\dest


    Mode    LastWriteTime     Length Name
    ----    -------------     ------ ----
    -ar--   Apr 05 16:16          9  a.txt
    -a---   Apr 05 16:16          9  b.txt
    -a---   Apr 05 16:16          9  c.txt

Sure enough, the b.txt and c.txt files made it over. Although the copy-item cmdlet hit a problem, it kept on trying to copy the other files that matched the wildcard.

The cmdlet’s behavior in the face of a non-terminating error is controlled by the -ErrorAction parameter. By default, this takes a value of Continue. As you may have guessed, this instructs the cmdlet to notify the user that a problem occurred (by generating the “Access to the path ...” message in this case) and then continue processing any remaining items. By using another ErrorAction setting, we can change how the cmdlet deals with problems.

First, let’s reset the scenario by deleting the b.txt and c.txt files with a del dest\[bc].txt command. This time, we’ll tell MSH to ask us what to do if any problems arise by using the -ErrorAction Inquire setting:

    MSH D:\MshScripts> copy-item -ErrorAction Inquire source\* dest

    Confirm
    Access to the path 'D:\MshScripts\dest\a.txt' is denied.
    [Y] Yes  [A] Yes to All  [H] Halt Command  [S] Suspend  [?] Help
    (default is "Y"):

MSH will now wait for some user input about what to do next before it moves ahead.

Finally, let’s take a look at Stop, another ErrorAction setting. It effectively transforms any non-terminating errors into terminating errors and instructs the cmdlet to give up immediately and execute the trap handler, if present. In Example 4-5, we bring together a handful of the techniques we’ve learned so far to create a simple script for ROBOCOPY-like behavior that retries a file copy 10 times before giving up. To keep the scenario consistent, we’ll keep trying to overwrite the write-protected file, so none of the 10 attempts will succeed.

Example 4-5. RetryCopy.msh
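$retryCount=10

while ($retryCount -gt 0)
{
        # Assume this attempt succeeds until the trap says otherwise
        $success = $true

        # On failure: use up one retry, record the failure,
        # and carry on instead of terminating the script
        trap {
                $script:retryCount--
                $script:success = $false
                "Retrying.."
                continue
        }

        # Stop turns the copy's non-terminating error into a terminating
        # one, which is what sends execution into the trap block
        copy-item -ErrorAction Stop source\* dest

        # A clean copy ends the loop
        if ($success) { $retryCount = 0 }
}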

"Done"

When it comes time to run this script, we’ll see the script iterating through its loop before finally giving up:

    MSH D:\MshScripts> .\retryCopy.msh
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Retrying..
    Done
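The retry pattern itself is not specific to MSH. As a rough Python equivalent, the sketch below uses try/except where the script uses trap. It assumes failures surface as OSError, which is what a PermissionError from shutil.copy would be. It also adds an optional delay between attempts, which RetryCopy.msh does not have. The retry function and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable

def retry(action: Callable[[], None], attempts: int = 10, delay: float = 0.0) -> bool:
    """Run action until it succeeds or attempts run out; return True on success."""
    remaining = attempts
    while remaining > 0:
        try:
            action()
            return True          # like setting $retryCount = 0 after a clean copy
        except OSError:
            # Plays the role of the trap block: count the failure and go around again
            remaining -= 1
            print("Retrying..")
            if delay:
                time.sleep(delay)
    return False
```

A call such as retry(lambda: shutil.copy("source/a.txt", "dest/a.txt")) would mirror the script's copy. Passing the action in as a function keeps the retry policy separate from what is being retried.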

What Just Happened?

The trap keyword is a basic part of the MSH script language and is equal in standing to many other keywords such as while, if, and for. The trap keyword can be followed by an error type in square brackets to indicate that its handler should only be run if the error is of the specified type. For example, to catch only problems with division by zero, we would write the following:

    trap [DivideByZeroException]
    {
        "Divide by zero trapped"
        break
    }

A trap must always include a script block, which defines the instructions to run when a problem arises. The $_ special variable is always available within the script block to enable the script to figure out what went wrong (which is often useful for deciding what to do next).

trap blocks are subject to scoping rules just as variables are. A trap block is entered only when an error occurs at that level. A parent scope will never invoke the trap handlers of any children, but an error inside a child (such as a function, filter, or loop) will cause execution to jump to the nearest trap block. Each scope can contain several trap blocks; when more than one is present, each is executed in turn when a problem arises.

Once execution has finished inside the trap block, MSH still reports the error to the user, as the SimpleTrap.msh output showed. Placing a continue statement at the end of the trap block (as the last instruction before the closing brace) tells MSH to carry on past the error, without reporting it, instead of terminating execution.
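To make the control flow concrete, here is a loose Python model of a single trap, optionally limited to one error type. It is an illustration, not how MSH implements traps, and run_script and CONTINUE are hypothetical names. The handler receives the error, much as $_ is populated inside a trap. Following this chapter's description, a handler that returns CONTINUE lets the script move on to the next statement, while any other handler leaves the error reported and stops execution. Errors that don't match the type filter simply propagate, as they would with no trap at all.

```python
from typing import Callable, List, Optional, Type

CONTINUE = "continue"  # hypothetical stand-in for the continue keyword

def run_script(statements: List[Callable[[], None]],
               trap: Callable[[Exception], Optional[str]],
               trap_type: Type[Exception] = Exception) -> List[str]:
    """Run statements in order, loosely imitating an MSH trap block."""
    output: List[str] = []
    for statement in statements:
        try:
            statement()
        except trap_type as err:            # like trap [SomeException] { ... }
            if trap(err) == CONTINUE:       # handler ended with continue
                continue                    # move on to the next statement
            output.append(f"Error: {err}")  # otherwise the error is reported...
            break                           # ...and execution stops
        output.append("ok")
    return output
```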

The -ErrorAction parameter has a number of different settings that control how cmdlets behave when problems arise. In a pipeline, it’s valid to use different ErrorAction settings for different stages. Indeed, it’s this fine-grained control that gives MSH its flexibility in handling different types of errors at each stage of processing. Table 4-6 describes the valid ErrorAction settings and the effects of each one.

Table 4-6. ErrorAction values and their effects

ErrorAction value   Effect
-----------------   ------
Stop                Abort on failure and treat all non-terminating errors as terminating
Continue            Generate an error and continue
Inquire             Ask the user how to proceed (see below)
SilentlyContinue    Proceed to the next item without generating an error
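The non-interactive rows of the table can be modeled in a few lines of Python. This is a sketch under stated assumptions: process_items and StopError are hypothetical names, per-item failures are assumed to surface as OSError, and Inquire is left out because it needs a user at the keyboard. Continue collects an error and moves on, SilentlyContinue moves on quietly, and Stop turns the first failure into an exception that ends the whole operation.

```python
from typing import Callable, Iterable, List

VALID_ACTIONS = {"Stop", "Continue", "SilentlyContinue"}

class StopError(Exception):
    """Raised when ErrorAction is Stop: the failure becomes terminating."""

def process_items(items: Iterable[str], action: Callable[[str], None],
                  error_action: str = "Continue") -> List[str]:
    """Apply action to each item, handling failures per an ErrorAction-like setting."""
    if error_action not in VALID_ACTIONS:
        raise ValueError(f"unsupported ErrorAction: {error_action}")
    errors: List[str] = []
    for item in items:
        try:
            action(item)
        except OSError as err:
            if error_action == "Stop":
                raise StopError(f"{item}: {err}") from err  # abort the whole command
            if error_action == "Continue":
                errors.append(f"{item}: {err}")             # report, then keep going
            # "SilentlyContinue": say nothing and move on to the next item
    return errors
```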

The Inquire prompt is worth a short discussion. It is shown whenever a problem arises and user input is needed to determine the next step. The “Yes” option allows execution to resume but will produce a similar prompt for every failure that follows. “Yes to All” assumes that “Yes” will be the answer for any future failures, so no further questions are asked. “Halt Command” stops the cmdlet in its tracks so that no further processing is attempted. The “Suspend” option is especially useful: it starts a little sub-shell with all of the current settings and state, along with an MSH> prompt, and allows for browsing and troubleshooting. In the previous example, we could have used the sub-shell to run a quick attrib -r dest\a.txt to resolve the issue.

What About...

... Changing the default ErrorAction value? Yes, you can. Rather than requiring an -ErrorAction parameter on every cmdlet, MSH picks up the default value from a global variable called $ErrorActionPreference. If you would like MSH to ask how to proceed whenever a problem occurs, add a $ErrorActionPreference="Inquire" line to your profile.

What’s Next?

So far, we’ve made good progress and have covered all the basics of the MSH environment and its language features. In the next chapter, we’ll spend some time focusing on the pipeline, examining its behavior in more depth, and looking at how we can really make use of the rich data that passes through it.
