Chapter 4. Glob Patterns And Other Basics

In the last chapter, I showed some simple patterns that allow you to avoid having to specify exactly what you want to wait for. In this chapter, I will describe how to use patterns that you are already probably familiar with from the shell—glob patterns. I will also describe what happens when patterns do not match. I will go over some other basic situations such as how to handle timeouts. Finally I will describe what to do at theends of scripts and processes.

The * Wildcard

Suppose you want to match all of the input and the only thing you know about it is that hi occurs within it. You are not sure if there is more to it, or even if another hi might appear. You just want to get it all. To do this, use the asterisk (*). The asterisk is a wildcard that matches any number of characters. You can write:

expect "hi*"
send "$expect_out(0,string) $expect_out(buffer)"

If the input buffer contained "philosophic\n“, expect would match the entire buffer. Here is the output from the previous commands:

hilosophic
 philosophic

The pattern hi matched the literal hi while the * matched the string "losophic\n“. The first p was not matched by anything in the pattern so it shows up in expect_out(buffer) but not in expect_out(0,string).

Earlier I said that * matches any number of characters. More precisely, it tries to match the longest string possible while still allowing the pattern itself to match. With the input buffer of "philosophic\n“, compare the effects of the following two commands:

expect "hi*"
expect "hi*hi"

In the first one, the * matches losophic\n. This is the longest possible string that the * can match while still allowing the hi to match hi. In the second expect, the * only matches losop, thereby allowing the second hi to match. If the * matched anything else, the entire pattern would fail to match.

What happens with the following command in which there are two asterisks?

expect "*hi*"

This could conceivably match in two ways corresponding to the two occurrences of “hi” in the string.

 

* matches

hi matches

* matches

possibility (1)

philosop

hi

c\n

possibility (2)

p

hi

losophic\n

What actually happens is possibility (1). The first * matches philosop. As before, each * tries to match the longest string possible allowing the total pattern to match, but the *’s are matched from left to right. The leftmost *’s match strings before the rightmost *’s have a chance. While the outcome is the same in this case (that is, the whole pattern matches), I will show cases later where it is necessary to realize that pattern matching proceeds from left to right.

* At The Beginning Of A Pattern Is Rarely Useful

Patterns match at the earliest possible character in a string. In Chapter 3 (p. 74), I showed how the pattern hi matched the first hi in philosophic. However, in the example above, the subpattern hi matched the second hi. Why the difference?

The difference is that hi was preceded by "*“. Since the * is capable of matching anything, the leading * causes the match to start at the beginning of the string. In contrast, the earliest point that the bare hi can match is the first hi. Once that hi has matched, it cannot match anything else—including the second hi.

In practice, a leading * is usually redundant. Most patterns have enough literal letters that there is no choice in how the match occurs. The only remaining difference is that the leading * forces the otherwise unmatched leading characters to be stored in expect_out(0,string). However, the characters will already be stored in expect_out(buffer) so there is little merit on this point alone.[18]

* At The End Of A Pattern Can Be Tricky

When a * appears at the right end of a pattern, it matches everything left in the input buffer (assuming the rest of the pattern matches). This is a useful way of clearing out the entire buffer so that the next expect does not return a mishmash of things that were received previously and things that are brand new.

Sometimes it is even useful to say:

expect *

Here the * matches anything. This is like saying, “I don’t care what’s in the input. Throw it away.” This pattern always matches, even if nothing is there. Remember that * matches anything, and the empty string is anything! As a corollary of this behavior, this command always returns immediately. It never waits for new data to arrive. It does not have to since it matches everything.

In the examples demonstrating * so far, each string was entered by a person who pressed return afterwards. This is typical of most programs, because they run in what is called cooked mode. Cooked mode includes the usual line-editing features such as backspace and delete-previous-word. This is provided by the terminal driver, not the program. This simplifies most programs. They see the line only after you have edited it and pressed return.

Unfortunately, output from processes is not nearly so well behaved. When you watch the output of a program such as ftp or telnet (or cat for that matter), it may seem as if lines appear on your screen as atomic units. But this is not guaranteed. For example, in the previous chapter, I showed that when ftp starts up it looks like this:

% ftp ftp.uu.net
Connected to ftp.uu.net.
220 ftp.UU.NET FTP server (Version 6.34 Thu Oct 22 14:32:01 EDT 1992) ready.
Name (ftp.uu.net:don):

Even though the program may have printed "Connected to ftp.uu.net.\n" all at once—perhaps by a single printf in a C program—the UNIX kernel can break this into small chunks, spitting out a few characters eachtime to the terminal. For example, it might print out "Conn" and then”ecte" and then "d to" and so on. Fortunately, computers are so fast that humans do not notice the brief pauses in the middle of output. The reason the system breaks up output like this is that programs usually produce characters faster than the terminal driver can display them. The operating system will obligingly wait for the terminal driver to effectively say, “Okay, I’ve displayed that last bunch of characters. Send me a couple more.” In reality, the system does not just sit there and wait. Since it is running many other programs at the same time, the system switches its attention frequently to other programs. Expect itself is one such “other program” in this sense.

When Expect runs, it will immediately ask for all the characters that a program produced only to find something like "Conn“. If told to wait for a string that matches "Name*:“, Expect will keep asking the computer if there is any more output, and it will eventually find the output it is looking for.

As I said, humans are slow and do not notice this chunking effect. In contrast, Expect is so fast that it is almost always waiting. Thus, it sees most output come as chunks rather than whole lines. With this in mind, suppose you wanted to find out the version of ftp that a host is using. By looking back at the output, you can see that it is contained in the greeting line that begins "220" and ends with "ready.“. Naively, you could wait for that line as:

expect "220*"                     ;# dangerous

If you are lucky, you might get the entire line stored in $expect_out(0,string). You might even get the next line in there as well. But more likely, you will only get a fragment, such as "220 f" or "220 ftp.UU.NE“. Since the pattern 220* matches either of these, expect has no reason to wait further and will return. As I stated earlier, expect returns with whatever is the longest string that matches the pattern. The problem here is that the remainder of the line may not have shown up yet!

If you want to get the entire line, you must be more specific. The following pattern works:

"220*ready."

By specifying the text that ends the line, you force expect to wait for the entire line to arrive. The "." is not actually needed just to find the version identifier. You could just make the pattern:

"220*re"

Leaving off the e would be too short. This would allow the pattern to match the r in server rather than ready. It is possible to make the overall pattern even shorter by looking for more unusual patterns. But quite often you trade off readability. There is an art to choosing patterns that are correct, yet not too long but still readable. A good guideline is to give more priority to readability. The pattern matching performed by Expect is very inexpensive.



[18] The more likely reason to see scripts that begin many patterns with "*" is that prior to Expect version 4, all patterns were anchored, with the consequence that most patterns required a leading "*“.

Get Exploring Expect now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.