Inserting the current date and time automatically and in such a configurable format is pretty neat and probably beyond the ken of most text editors, but its usefulness is limited. Undoubtedly more useful would be the ability to store a writestamp in a file; that is, the date and/or time the file was last written to disk. A writestamp updates itself each time the file is saved anew.
The first thing we'll need is a way to run our writestamp-updating code each time the file is saved. As we discovered in the section Hooks in Chapter 2, the best way to associate some code with a common action (such as saving a file) is by adding a function to a hook variable, provided that a suitable hook variable exists. Using M-x apropos RET hook RET, we discover four promising hook variables: after-save-hook, local-write-file-hooks, write-contents-hooks, and write-file-hooks.
We can discard after-save-hook right away. We don't want our code executed, modifying writestamps, after the file is saved, because then it will be impossible to save an up-to-date version of the file!
The differences between the remaining candidates are subtle:
write-file-hooks
Code to execute for any buffer each time it is saved.
local-write-file-hooks
A buffer-local version of write-file-hooks. Recall from the Hooks section of Chapter 2 that a buffer-local variable is one that can have different values in different buffers. Whereas write-file-hooks pertains to every buffer, local-write-file-hooks can pertain to individual buffers. Thus, if you want to run one function while saving a Lisp file and another one when saving a text file, local-write-file-hooks is the one to use.
write-contents-hooks
Like local-write-file-hooks in that it's buffer-local and it contains functions to execute each time the buffer is saved to a file. However (and I warned you this was subtle), the functions in write-contents-hooks pertain to the buffer's contents, while the functions in the other two hooks pertain to the files being edited. In practice, this means that if you change the major mode of the buffer, you're changing the way the contents should be considered, and therefore write-contents-hooks reverts to nil but local-write-file-hooks doesn't. On the other hand, if you change Emacs's idea of which file is being edited, e.g. by invoking set-visited-file-name, then local-write-file-hooks reverts to nil and write-contents-hooks doesn't.
We'll rule out write-file-hooks because we'll want to invoke our writestamp-updater only in buffers that have writestamps, not every time any buffer is saved. And, hair-splitting semantics aside, we'll rule out write-contents-hooks because we want our chosen hook variable to be immune to changes in the buffer's major mode. That leaves local-write-file-hooks.
Now, what should the writestamp updater that we'll put in local-write-file-hooks do? It must locate each writestamp, delete it, and replace it with an updated one. The most straightforward approach is to surround each writestamp with a distinguishing string of characters that we can search for. Let's say that each writestamp is surrounded by the strings "WRITESTAMP((" on the left and "))" on the right, so that in a file it looks something like this:
... went into the castle and lived happily ever after. The end. WRITESTAMP((12:19pm 7 Jul 96))
Let's say that the stuff inside the WRITESTAMP((…)) is put there by insert-date (which we defined earlier) and so its format can be controlled with insert-date-format.
Now, supposing we have some writestamps in the file to begin with,[20] we can update it at file-writing time like so:
(add-hook 'local-write-file-hooks 'update-writestamps)

(defun update-writestamps ()
  "Find writestamps and replace them with the current time."
  (save-excursion
    (save-restriction
      (save-match-data
        (widen)
        (goto-char (point-min))
        (while (search-forward "WRITESTAMP((" nil t)
          (let ((start (point)))
            (search-forward "))")
            (delete-region start (- (point) 2))
            (goto-char start)
            (insert-date))))))
  nil)
There's a lot here that's new. Let's go through this function a line at a time.
First we notice that the body of the function is wrapped inside a call to save-excursion. What save-excursion does is memorize the position of the cursor, execute the subexpressions it's given as arguments, then restore the cursor to its original position. It's useful in this case because the body of the function is going to move the cursor all over the buffer, but by the time the function finishes we'd like the caller of this function to perceive no cursor motion. There'll be much more about save-excursion in Chapter 8.
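To see save-excursion at work, here is a minimal sketch that can be evaluated in a scratch buffer. The function name my-save-excursion-demo and the sample text are made up for illustration, and with-temp-buffer (which runs its body in a throwaway buffer) is used only to keep the experiment away from real files.

```elisp
;; Hypothetical demonstration, not part of update-writestamps.
(defun my-save-excursion-demo ()
  "Move point around inside `save-excursion' and return point afterward."
  (with-temp-buffer
    (insert "one two three")
    (goto-char 5)                     ; point is just before "two"
    (save-excursion
      (goto-char (point-min))         ; wander off to the beginning...
      (search-forward "three"))       ; ...and then to the very end
    (point)))                         ; point is back where it started

(my-save-excursion-demo)              ; returns 5
```

Even though the searches inside save-excursion leave point at the end of the buffer, the caller sees point exactly where it was before.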
Next is a call to save-restriction. This is like save-excursion in that it memorizes some information, then executes its arguments, then restores the information. The information in this case is the buffer's restriction, which is the result of narrowing. Narrowing is covered in Chapter 9. For now let's just say that narrowing refers to Emacs's ability to show only a portion of a buffer. Since update-writestamps is going to call widen, which undoes the effect of any narrowing, we need save-restriction in order to clean up after ourselves.
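The interplay between narrowing, widen, and save-restriction can be sketched as follows. The name my-save-restriction-demo and the ten-character sample text are assumptions made for this example; narrow-to-region is the real function that establishes a restriction.

```elisp
;; Hypothetical demonstration of save-restriction around widen.
(defun my-save-restriction-demo ()
  "Return the accessible size of a buffer before, during, and after `widen'."
  (with-temp-buffer
    (insert "0123456789")
    (narrow-to-region 3 6)            ; only "234" is accessible now
    (let ((before (- (point-max) (point-min)))
          during)
      (save-restriction
        (widen)                       ; the whole buffer is accessible here
        (setq during (- (point-max) (point-min))))
      ;; Leaving save-restriction puts the narrowing back.
      (list before during (- (point-max) (point-min))))))

(my-save-restriction-demo)            ; returns (3 10 3)
```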
Next is a call to save-match-data that, like save-excursion and save-restriction, memorizes some information, executes its arguments, then restores the information. This time the information in question is the result of the latest search. Each time a search occurs, information about the result of the search is stored in some global variables (as we will see shortly). Each search wipes out the result of the previous search. Our function will perform a search, but for the sake of other functions that might be calling ours, we don't want to disrupt the global match data.
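Here is a small sketch of what "not disrupting the global match data" means in practice. The function name my-save-match-data-demo and the sample text are hypothetical; match-beginning is introduced properly a little later in this section.

```elisp
;; Hypothetical demonstration of save-match-data.
(defun my-save-match-data-demo ()
  "Return where \"beta\" was found, before and after an inner search."
  (with-temp-buffer
    (insert "alpha beta gamma")
    (goto-char (point-min))
    (search-forward "beta")           ; match data now describes "beta"
    (let ((before (match-beginning 0)))
      (save-match-data
        (goto-char (point-min))
        (search-forward "gamma"))     ; overwrites the match data, temporarily
      ;; The match data for "beta" is back in place here.
      (list before (match-beginning 0)))))

(my-save-match-data-demo)             ; returns (7 7)
```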
Next is a call to widen. As previously mentioned, this undoes any narrowing in effect. It makes the entire buffer accessible, which is necessary if every writestamp is to be found and updated.
Next we move the cursor to the beginning of the buffer with (goto-char (point-min)) in preparation for the function's main loop, which is going to search for each successive writestamp and rewrite it in place. The function point-min returns the minimum value for point, normally 1. (The only time (point-min) might not be 1 is when there's narrowing in effect. Since we've called widen, we know narrowing is not in effect, so we could write (goto-char 1) instead. But it's good practice to use point-min where appropriate.)
The main loop looks like this:
(while (search-forward "WRITESTAMP((" nil t) ...)
This is a while loop, which works very much like while loops in other languages. Its first argument is an expression that is tested each time around the loop. If the expression evaluates to true, the remaining arguments are executed and the whole cycle repeats.
The expression (search-forward "WRITESTAMP((" nil t) searches for the first occurrence of the given string, starting from the current location of point. The nil means the search is not bounded except by the end of the buffer. This is explained in more detail later. The t means that if no match is found, search-forward should simply return nil. (Without the t, search-forward signals an error, aborting the current command, if no match is found.) If the search is successful, point is moved to the first character after the matched text, and search-forward returns that position. (It's possible to find where the match began using match-beginning, which is shown in Figure 4-1.)
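The return values just described can be checked with a short experiment. This is only a sketch: the function name my-search-forward-demo and the search string "NO-SUCH-TEXT" are invented for the example.

```elisp
;; Hypothetical demonstration of search-forward's return values.
(defun my-search-forward-demo ()
  "Return (FOUND START POINT MISSING) for a successful and a failed search."
  (with-temp-buffer
    (insert "The end. WRITESTAMP((12:19pm 7 Jul 96))")
    (goto-char (point-min))
    (let* ((found (search-forward "WRITESTAMP((" nil t)) ; position after the match
           (start (match-beginning 0))                   ; position where the match began
           (missing (search-forward "NO-SUCH-TEXT" nil t))) ; fails quietly with nil
      (list found start (point) missing))))

(my-search-forward-demo)              ; returns (22 10 22 nil)
```

The successful search returns 22, the position just past the opening delimiter, and the failed search returns nil without moving point.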
(let ((start (point))) ...)
This creates a temporary variable, start, that holds the location of point, which is the beginning of the date string inside the WRITESTAMP((…)) delimiters.
With start defined, the body of the let contains:
(search-forward "))")
(delete-region start (- (point) 2))
(goto-char start)
(insert-date)
This call to search-forward places point after the two closing parentheses. We still know the beginning of the timestamp, because this location is in start, as shown in Figure 4-2.
This time, only the first argument to search-forward, the search string, is given. Earlier we saw two additional arguments: the search bound, and whether to signal an error. When omitted, they default to nil (unbounded search) and nil (signal an error if the search fails).
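To watch the error being signaled without aborting anything, we can trap it with condition-case, a standard Lisp form for catching errors that this section has not introduced. The function name my-search-failure-demo is hypothetical.

```elisp
;; Hypothetical demonstration: a failed search-forward signals `search-failed'.
(defun my-search-failure-demo ()
  "Search for a closing delimiter that isn't there; return the error symbol."
  (with-temp-buffer
    (insert "WRITESTAMP((no closing delimiter here")
    (goto-char (point-min))
    (condition-case err
        (progn
          (search-forward "))")       ; no third argument, so failure is an error
          'found)
      (search-failed (car err)))))    ; the error symbol is the car of ERR

(my-search-failure-demo)              ; returns search-failed
```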
After search-forward succeeds (and if it fails, an error is signaled and execution of the function never gets past search-forward), delete-region deletes the text region that is the date in the writestamp, starting at position start and ending before position (- (point) 2) (two characters to the left of point), leaving the results shown in Figure 4-3.
Next, (goto-char start) positions the cursor inside the writestamp delimiters and, finally, (insert-date) inserts the current date.
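The delete-and-reinsert step can be tried in isolation. In this sketch the hypothetical function my-replace-between-delimiters takes the replacement text as an argument instead of calling insert-date, so that the result is predictable.

```elisp
;; Hypothetical demonstration of the body of the loop, run once.
(defun my-replace-between-delimiters (new-text)
  "Replace the text inside WRITESTAMP((...)) in a sample buffer with NEW-TEXT."
  (with-temp-buffer
    (insert "The end. WRITESTAMP((12:19pm 7 Jul 96))")
    (goto-char (point-min))
    (search-forward "WRITESTAMP((")
    (let ((start (point)))            ; just after the opening delimiter
      (search-forward "))")           ; point is now after the closing delimiter
      (delete-region start (- (point) 2))
      (goto-char start)
      (insert new-text))
    (buffer-string)))

(my-replace-between-delimiters "NOW") ; returns "The end. WRITESTAMP((NOW))"
```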
The while loop executes as many times as there are matches for the search string. It's important that each time a match is found, the cursor remains "to the right" of the place where the match began. Otherwise, the next iteration of the loop will find the same match for the search string!
When the while loop is done, save-match-data returns, restoring the match data; then save-restriction returns, restoring any narrowing that was in effect; then save-excursion returns, restoring point to its original location.
The final expression of update-writestamps, after the call to save-excursion, is
nil
This is the function's return value. The return value of a Lisp function is simply the value of the last expression in the function's body. (All Lisp functions return a value, but so far every function we've written has done its job via "side effects" instead of by returning meaningful values.) In this case we force it to be nil. The reason is that functions in local-write-file-hooks are treated specially. Normally, the return value of a function in a hook variable doesn't matter. But for functions in local-write-file-hooks (also in write-file-hooks and write-contents-hooks), a non-nil return value means, "This hook function has taken over the job of writing the buffer to a file." If the hook function returns a non-nil value, the remaining functions in the hook variables are not called, and Emacs does not write the buffer to a file itself after the hook functions run. Since update-writestamps is not taking over the job of writing the buffer to a file, we want to be sure it returns nil.
This approach to implementing writestamps works, but there are a few problems. First, by hardwiring the strings "WRITESTAMP((" and "))" we've doomed the user to an unaesthetic and inflexible way to distinguish writestamps in text. Second, the user's preference might not be to use insert-date for writestamps.
These problems are simple to fix. We can introduce three new variables: one that, like insert-date-format and insert-time-format, describes a time format to use; and two that describe the delimiters surrounding a writestamp.
(defvar writestamp-format "%C"
  "*Format for writestamps (c.f. 'format-time-string').")

(defvar writestamp-prefix "WRITESTAMP(("
  "*Unique string identifying start of writestamp.")

(defvar writestamp-suffix "))"
  "*String that terminates a writestamp.")
Now we can modify update-writestamps to be more configurable.
(defun update-writestamps ()
  "Find writestamps and replace them with the current time."
  (save-excursion
    (save-restriction
      (save-match-data
        (widen)
        (goto-char (point-min))
        (while (search-forward writestamp-prefix nil t)
          (let ((start (point)))
            (search-forward writestamp-suffix)
            (delete-region start (match-beginning 0))
            (goto-char start)
            (insert (format-time-string writestamp-format
                                        (current-time))))))))
  nil)
In this version of update-writestamps, we've replaced occurrences of "WRITESTAMP((" and "))" with writestamp-prefix and writestamp-suffix, and we've replaced insert-date with
(insert (format-time-string writestamp-format (current-time)))
We also changed the call to delete-region. Previously it looked like this:
(delete-region start (- (point) 2))
That was when we had the writestamp suffix hardwired to be "))", which is two characters long. But now that the writestamp suffix is stored in a variable, we don't know in advance how many characters long it is. We could certainly find out, by calling length:
(delete-region start (- (point) (length writestamp-suffix)))
but a better solution is to use match-beginning. Remember that before the call to delete-region is
(search-forward writestamp-suffix)
No matter what writestamp-suffix is, search-forward finds the first occurrence of it, if one exists, and returns the first position after the match. But extra data about the match, notably the position where the match begins, is stored in Emacs's global match-data variables. The way to access this data is with the functions match-beginning and match-end. For reasons that will become clear shortly, match-beginning needs an argument of 0 to tell you the position of the beginning of the match for the latest search. In this case, that happens to be the beginning of the writestamp suffix, which also happens to be the end of the date inside the writestamp, and therefore the end of the region to delete:
(delete-region start (match-beginning 0))
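A quick sketch shows that match-beginning finds the start of the suffix no matter how long the suffix is. The function name my-match-bounds-demo and the suffix " END" are made up for this example.

```elisp
;; Hypothetical demonstration of match-beginning and match-end.
(defun my-match-bounds-demo (suffix)
  "Search for SUFFIX after a sample date; return (BEGINNING END POINT)."
  (with-temp-buffer
    (insert "Written: 19 Aug 1996" suffix)
    (goto-char (point-min))
    (search-forward suffix)
    ;; After a successful search, point equals the end of the match.
    (list (match-beginning 0) (match-end 0) (point))))

(my-match-bounds-demo ".")            ; returns (21 22 22)
(my-match-bounds-demo " END")         ; returns (21 25 25)
```

In both cases the match begins at position 21, right where the date ends, which is why (match-beginning 0) is a safer end for the region than a hardwired character count.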
Suppose the user chooses "Written: " and "." as the writestamp-prefix and writestamp-suffix, so that writestamps appear like so: "Written: 19 Aug 1996." This is a perfectly reasonable preference, but the string "Written: " is less likely than "WRITESTAMP((" to be completely unique. In other words, the file may contain occurrences of "Written: " that aren't writestamps. When update-writestamps searches for writestamp-prefix, it might find one of these occurrences, then search for the next occurrence of a period and delete everything in between. Worse, this unwanted deletion takes place almost undetectably, just as the file is being saved, with the cursor location and other appearances preserved.
One way to solve this problem is to impose tighter constraints on how the writestamp may appear, making mismatches less likely. One natural restriction might be to require writestamps to appear alone on a line: in other words, a string is a writestamp only if writestamp-prefix is the first thing on the line and writestamp-suffix is the last thing on the line.
Now it won't suffice to use
(search-forward writestamp-prefix ...)
to find writestamps, because this search isn't constrained to find matches only at the beginnings of lines.
This is where regular expressions come in handy. A regular expression (called a regexp or regex for short) is a search pattern just like the first argument to search-forward. Unlike a normal search pattern, regular expressions have certain syntactic rules that allow more powerful kinds of searches. For example, in the regular expression '^Written: ', the caret (^) is a special character that means, "this pattern must match at the beginning of a line." The remaining characters in the regexp '^Written: ' don't have any special meaning in regexp syntax, so they match the same way ordinary search patterns do. Special characters are sometimes called metacharacters or (more poetically) magic.
Many UNIX programs use regular expressions, among them sed, grep, awk, and perl. The syntax of regular expressions tends to vary slightly from one application to another, unfortunately; but in all cases, most characters are non-"magic" (particularly letters and numbers) and can be used to search for occurrences of themselves; and longer regexps can be built up from shorter ones simply by stringing them together. Here is the syntax of regular expressions in Emacs.
- Backslash, followed by a magic character, matches that character literally. So, for example, \. matches a period. Since backslash itself is magic, \\ matches \ itself.
- A set of characters inside square brackets matches any one of the enclosed characters. So [aeiou] matches any occurrence of a or e or i or o or u. There are some exceptions to this rule; the syntax of square brackets in regular expressions has its own "subsyntax," as follows:
  - A range of consecutive characters, such as abcd, can be abbreviated a-d. Any number of such ranges can be included, and ranges can be intermixed with single characters. So [a-dmx-z] matches any a, b, c, d, m, x, y, or z.
  - If the first character is a caret (^), then the expression matches any character not appearing inside the square brackets. So [^a-d] matches any character except a, b, c, or d.
  - To include a right-square-bracket, it must be the first character in the set. So []a] matches ] or a. Similarly, [^]a] matches any character except ] and a.
  - To include a hyphen, it must appear where it can't be interpreted as part of a range; for example, as the first or last character in the set, or following the end of a range. So [a-e-z] matches a, b, c, d, e, -, or z.
  - To include a caret, it must appear someplace other than as the first character in the set.
  - Other characters that are normally "magic" in regexps, such as * and ., are not magic inside square brackets.
- A regexp x may have one of the following suffixes:
  - An asterisk, matching zero or more occurrences of x
  - A plus sign, matching one or more occurrences of x
  - A question mark, matching zero or one occurrence of x
  So a* matches a, aa, aaa, and even an empty string (zero a's);[21] a+ matches a, aa, aaa, but not an empty string; and a? matches an empty string and a. Note that x+ is equivalent to xx*.
- The regexp ^x matches whatever x matches, but only at the beginning of a line.
- The regexp x$ matches whatever x matches, but only at the end of a line. This means that ^x$ matches a line containing nothing but a match for x. In this case, you could leave out x altogether; ^$ matches a line containing no characters.
- Two regular expressions x and y separated by \| match whatever x matches or whatever y matches. So hello\|goodbye matches hello or goodbye.
- A regular expression x enclosed in escaped parentheses, \( and \), matches whatever x matches. This can be used for grouping complicated expressions. So \(ab\)+ matches ab, abab, ababab, and so on. Also, \(ab\|cd\)ef matches abef or cdef. As a side effect, any text matched by a parenthesized subexpression is called a submatch and is memorized in a numbered register. Submatches are numbered from 1 through 9 by counting occurrences of \( in a regexp from left to right. So if the regexp ab\(cd*e\) matches the text abcddde, then the one and only submatch is the string cddde. If the regexp ab\(cd\|ef\(g+h\)\)j\(k*\) matches the text abefgghjkk, then the first submatch is efggh, the second submatch is ggh, and the third submatch is kk.
- Backslash followed by a digit n matches the same text matched by the nth parenthesized subexpression from earlier in the same regexp. So the expression \(a+b\)\1 matches abab, aabaab, and aaabaaab, but not abaab (because ab isn't the same as aab).
- The empty string can be matched in a wide variety of ways.
  - \` matches the empty string that's at the beginning of the buffer. So \`hello matches the string hello at the beginning of the buffer, but no other occurrence of hello.
  - \' matches the empty string that's at the end of the buffer.
  - \= matches the empty string that's at the current location of point.
  - \b matches the empty string that's at the beginning or end of a word. So \bgnu\b matches the word "gnu" but not the occurrence of "gnu" inside the word "interregnum".
  - \B matches the empty string that's anywhere but at the beginning or end of a word. So \Bword matches the occurrence of "word" in "sword" but not in "words".
  - \< matches the empty string at the beginning of a word only.
  - \> matches the empty string at the end of a word only.
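Several of the rules above can be checked directly with string-match, which matches a regexp against a string rather than a buffer and returns the index (counting from 0) where the match starts. The helper my-submatches is a hypothetical convenience written for this sketch, and the regexps are written as Lisp strings, which is why every backslash appears doubled.

```elisp
;; Hypothetical helper: return the whole match and the first three
;; submatches of REGEXP in TEXT, or nil if REGEXP doesn't match.
(defun my-submatches (regexp text)
  (save-match-data
    (when (string-match regexp text)
      (list (match-string 0 text)
            (match-string 1 text)
            (match-string 2 text)
            (match-string 3 text)))))

(my-submatches "ab\\(cd*e\\)" "abcddde")
;; returns ("abcddde" "cddde" nil nil)

(my-submatches "ab\\(cd\\|ef\\(g+h\\)\\)j\\(k*\\)" "abefgghjkk")
;; returns ("abefgghjkk" "efggh" "ggh" "kk")

(my-submatches "\\(a+b\\)\\1" "abaab")        ; returns nil: ab isn't aab
(string-match "\\bgnu\\b" "interregnum gnu")  ; returns 12, the standalone word
(string-match "[^a-d]" "abcde")               ; returns 4, the index of e
(string-match "\\`hello" "say hello")         ; returns nil: not at the start
```

When matching against a string, \` anchors to the start of the string instead of the start of the buffer.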
As you can see, regular expression syntax uses backslashes for many purposes. So does Emacs Lisp string syntax. Since regexps are written as Lisp strings when programming Emacs, the two sets of rules for using backslashes can cause some confusing results. For example, the regexp ab\|cd, when expressed as a Lisp string, must be written as "ab\\|cd". Even stranger is when you want to match a single \ using the regexp \\: you must write the string "\\\\". Emacs commands that prompt for regular expressions (such as apropos and keep-lines) allow you to type them as regular expressions (not Lisp strings) when used interactively.
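Counting characters makes the two layers of backslash rules easier to believe. In this sketch, my-backslash-demo is a hypothetical name and "C:\\temp" is just a sample string that contains one backslash.

```elisp
;; Hypothetical demonstration of Lisp string syntax versus regexp syntax.
(defun my-backslash-demo ()
  (list
   (length "ab\\|cd")                 ; 6 characters: a b \ | c d
   (length "\\\\")                    ; 2 characters: the regexp \\
   (string-match "\\\\" "C:\\temp"))) ; the regexp \\ finds the \ at index 2

(my-backslash-demo)                   ; returns (6 2 2)
```

The Lisp reader halves the backslashes first; the regexp matcher then interprets whatever is left.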
Now that we know how to assemble regular expressions, it might seem obvious that the way to search for writestamp-prefix at the beginning of a line is to prepend a caret onto writestamp-prefix and append a dollar sign onto writestamp-suffix, like so:
(re-search-forward (concat "^" writestamp-prefix) ...)   ;wrong!
(re-search-forward (concat writestamp-suffix "$") ...)   ;wrong!
The function concat concatenates its string arguments into a single string. The function re-search-forward is the regular expression version of search-forward.
This is almost right. However, it contains a common and subtle error: either writestamp-prefix or writestamp-suffix may contain "magic" characters. In fact, writestamp-suffix does, in our example: it's ".". Since . matches any character (except newline), this expression:
(re-search-forward (concat writestamp-suffix "$") ...)
which is equivalent to this expression:
(re-search-forward ".$" ...)
matches any character at the end of a line, whereas we only want to match a period (.).
When building up a regular expression as in this example, using pieces such as writestamp-prefix whose content is beyond the programmer's control, it is necessary to "remove the magic" from strings that are meant to be taken literally. Emacs provides a function for this purpose called regexp-quote, which understands regexp syntax and can turn a possibly-magic string into the corresponding non-magic one. For example, (regexp-quote ".") yields "\\." as a string. You should always use regexp-quote to remove the magic from variable strings that are used to build up regular expressions.
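The difference regexp-quote makes can be seen by testing a line that does not end in a period. The function my-ends-with-p and its QUOTE argument are hypothetical, introduced only to compare the quoted and unquoted forms side by side.

```elisp
;; Hypothetical helper comparing a raw suffix with a regexp-quoted one.
(defun my-ends-with-p (suffix line quote)
  "Return t if the regexp built from SUFFIX matches at the end of LINE.
SUFFIX is passed through `regexp-quote' first when QUOTE is non-nil."
  (let ((regexp (concat (if quote (regexp-quote suffix) suffix) "$")))
    (if (string-match regexp line) t nil)))

(regexp-quote ".")                               ; returns "\\."
(my-ends-with-p "." "Written: 19 Aug 1996" nil)  ; returns t, a false match on "6"
(my-ends-with-p "." "Written: 19 Aug 1996" t)    ; returns nil, as it should
(my-ends-with-p "." "Written: 19 Aug 1996." t)   ; returns t
```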
We now know how to begin a new version of update-writestamps:
(defun update-writestamps ()
  "Find writestamps and replace them with the current time."
  (save-excursion
    (save-restriction
      (save-match-data
        (widen)
        (goto-char (point-min))
        (while (re-search-forward
                (concat "^" (regexp-quote writestamp-prefix))
                nil t)
          ...))))
  nil)
Let's finish our new version of update-writestamps by filling in the body of the while loop. Just after re-search-forward succeeds, we need to know whether the current line ends with writestamp-suffix. But we can't simply write
(re-search-forward (concat (regexp-quote writestamp-suffix) "$"))
because that could find a match several lines away. We're only interested in knowing whether the match is on the current line.
One solution is to limit the search to the current line. The optional second argument to search-forward and re-search-forward, if non-nil, is a buffer position beyond which the search may not go. If we plug in the buffer position corresponding to the end of the current line like so:
(re-search-forward (concat (regexp-quote writestamp-suffix)
                           "$")
                   end-of-line-position)
then the search is limited to the current line, and we'll have the answer we need. So how do we come up with end-of-line-position? We simply put the cursor at the end of the current line using end-of-line, then query the value of point. But after we do that and before re-search-forward begins, we must make sure to return the cursor to its original location since the search must start from there. Moving the cursor then restoring it is exactly what save-excursion is designed to do. So we could write:
(let ((end-of-line-position (save-excursion
                              (end-of-line)
                              (point))))
  (re-search-forward (concat (regexp-quote writestamp-suffix)
                             "$")
                     end-of-line-position))
which creates a temporary variable, end-of-line-position, that is used to limit re-search-forward; but it's simpler not to use a temporary variable if we don't really need it:
(re-search-forward (concat (regexp-quote writestamp-suffix)
                           "$")
                   (save-excursion
                     (end-of-line)
                     (point)))
Observe that the value of the save-excursion expression is, as with so many other Lisp constructs, the value of its last subexpression, (point).
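Here is a sketch of the bounded search in action on a two-line buffer. The names my-suffix-on-current-line-p and my-bounded-search-demo are hypothetical, and the sketch passes t as the third argument to re-search-forward so that a failed search returns nil instead of signaling an error.

```elisp
;; Hypothetical helper: is SUFFIX the last thing on the current line?
(defun my-suffix-on-current-line-p (suffix)
  (save-excursion
    (re-search-forward (concat (regexp-quote suffix) "$")
                       (save-excursion (end-of-line) (point)) ; the search bound
                       t)))

(defun my-bounded-search-demo ()
  "Test the first and second lines of a sample buffer for a trailing period."
  (with-temp-buffer
    (insert "Written: no suffix here\nWritten: 19 Aug 1996.\n")
    (goto-char (point-min))
    (let ((first (my-suffix-on-current-line-p ".")))
      (forward-line 1)
      (list first (if (my-suffix-on-current-line-p ".") t nil)))))

(my-bounded-search-demo)              ; returns (nil t)
```

Without the bound, the search from the first line would run on and find the period at the end of the second line.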
So update-writestamps can be written like this:
(defun update-writestamps ()
  "Find writestamps and replace them with the current time."
  (save-excursion
    (save-restriction
      (save-match-data
        (widen)
        (goto-char (point-min))
        (while (re-search-forward
                (concat "^" (regexp-quote writestamp-prefix))
                nil t)
          (let ((start (point)))
            (if (re-search-forward
                 (concat (regexp-quote writestamp-suffix) "$")
                 (save-excursion (end-of-line) (point))
                 t)
                (progn
                  (delete-region start (match-beginning 0))
                  (goto-char start)
                  (insert (format-time-string writestamp-format
                                              (current-time))))))))))
  nil)
Notice that both calls to re-search-forward have t as the optional third argument, meaning "if the search fails, return nil (as opposed to signaling an error)."
We have created a more or less straightforward translation of update-writestamps from its original form to use regular expressions, but it doesn't really exploit the power of regexps. In particular, the entire sequence of finding a writestamp prefix, checking for a matching writestamp suffix on the same line, and replacing the text in between can be reduced to just these two expressions:
(re-search-forward (concat "^"
                           (regexp-quote writestamp-prefix)
                           "\\(.*\\)"
                           (regexp-quote writestamp-suffix)
                           "$"))
(replace-match (format-time-string writestamp-format
                                   (current-time))
               t t nil 1)
The first expression, the call to re-search-forward, constructs a regexp that looks like this:
^prefix\(.*\)suffix$
where prefix and suffix are regexp-quoted versions of writestamp-prefix and writestamp-suffix. This regexp matches one entire line, beginning with the writestamp prefix, followed by any string (which is made a submatch by the use of \(…\)), and ending with the writestamp suffix.
The second expression is a call to replace-match, which replaces some or all of the matched text from a previous search. It's used like this:
(replace-match new-string preserve-case literal base-string subexpression)
The first argument is the new string to insert, which in this example is the result of format-time-string. The remaining arguments, which are all optional, have the following meanings:
- preserve-case
We set this to t, which tells replace-match to preserve alphabetic case in new-string. If it's nil, replace-match tries to intelligently match the case of the text being replaced.
- literal
We use t, which means "treat new-string literally." If it's nil, then replace-match interprets new-string according to some special syntax rules (for which see describe-function on replace-match).
- base-string
We use nil, which means "Modify the current buffer." If this were a string, then replace-match would perform the replacement in the string instead of in a buffer.
- subexpression
We use 1, which means "Replace submatch 1, not the entire matched string" (which would include the prefix and the suffix).
So after finding the writestamp with re-search-forward and "submatching" the text between the delimiters, our call to replace-match snips out the text between the delimiters and inserts a fresh new string formatted according to writestamp-format.
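A small sketch shows replace-match swapping out only the submatch. The function name my-replace-submatch-demo is hypothetical, and the prefix "Written: " and suffix "." are written straight into the regexp (with the period already escaped) to keep the example short.

```elisp
;; Hypothetical demonstration of replacing submatch 1 only.
(defun my-replace-submatch-demo (new-text)
  "Replace the date in a sample writestamp line with NEW-TEXT."
  (with-temp-buffer
    (insert "Written: 19 Aug 1996.\n")
    (goto-char (point-min))
    (when (re-search-forward "^Written: \\(.*\\)\\.$" nil t)
      ;; t t nil 1: keep case as is, treat NEW-TEXT literally,
      ;; work in the buffer, and replace submatch 1 only.
      (replace-match new-text t t nil 1))
    (buffer-string)))

(my-replace-submatch-demo "20 Aug 1996")  ; returns "Written: 20 Aug 1996.\n"
```

The prefix and the trailing period survive because they lie outside the parenthesized subexpression.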
As a final improvement to update-writestamps, we can observe that if we write
(while (re-search-forward (concat ...) ...) (replace-match ...))
then the concat function is called each time through the loop, constructing a new string each time even though its arguments never change. This is inefficient. It would be better to compute the desired string once, before the loop, and store it in a temporary variable. The best way to write update-writestamps is therefore:
(defun update-writestamps ()
  "Find writestamps and replace them with the current time."
  (save-excursion
    (save-restriction
      (save-match-data
        (widen)
        (goto-char (point-min))
        (let ((regexp (concat "^"
                              (regexp-quote writestamp-prefix)
                              "\\(.*\\)"
                              (regexp-quote writestamp-suffix)
                              "$")))
          (while (re-search-forward regexp nil t)
            (replace-match (format-time-string writestamp-format
                                               (current-time))
                           t t nil 1))))))
  nil)
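One way to try this logic without touching a real file is to run it on a temporary buffer. The sketch below is self-contained, so it repeats the function under hypothetical my- names to avoid clobbering the real definitions. The prefix "Written: ", the suffix ".", and the format "%Y-%m-%d" are assumptions chosen for the example, not defaults from the text.

```elisp
;; Hypothetical settings for this experiment only.
(defvar my-writestamp-prefix "Written: ")
(defvar my-writestamp-suffix ".")
(defvar my-writestamp-format "%Y-%m-%d")

(defun my-update-writestamps ()
  "Same logic as the final update-writestamps, using the my- variables."
  (save-excursion
    (save-restriction
      (save-match-data
        (widen)
        (goto-char (point-min))
        (let ((regexp (concat "^"
                              (regexp-quote my-writestamp-prefix)
                              "\\(.*\\)"
                              (regexp-quote my-writestamp-suffix)
                              "$")))
          (while (re-search-forward regexp nil t)
            (replace-match (format-time-string my-writestamp-format
                                               (current-time))
                           t t nil 1))))))
  nil)

(defun my-update-writestamps-demo ()
  "Run the updater on two sample lines and return the resulting text."
  (with-temp-buffer
    (insert "He had Written: a letter. Then he left\n" ; not a writestamp
            "Written: 19 Aug 1996.\n")                 ; a writestamp
    (my-update-writestamps)
    (buffer-string)))

(my-update-writestamps-demo)
```

The first line comes back unchanged because "Written: " is not at the start of the line; the second line comes back with today's date between the prefix and the period.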