Dates and Time 135

Example 6.3 Finding the Writing Style of a Writer

Let us look at one more example of text processing. Suppose you are interested in

comparing the writing styles of different writers. One of the measures that you are

going to use for comparison is the average sentence length in words because some writ-

ers use simple, short sentences and some have a tendency to use complex, long sentences.

You also want to measure the average word length in terms of how many characters

per word. You have collected several representative samples, in electronic form, of the

writers you are comparing. Having just learned about the text functions in Excel you

are eager to apply them for the writer comparison task.

Let us assume that you have copied and pasted a sample writing you are going to use for

comparison in cell A1. Let us now see how we can count the number of words in the

sample text in cell A1. Recall that the LEN function gives us the number of characters

in a string and the TRIM function removes all excess spaces. We can thus measure the

length of the sample text in cell A1 by the following formula:

=LEN(TRIM(A1))

We know that after trimming the words in the text are separated by single spaces. Thus,

if we were to get the length of the text after removing spaces, we can find the number

of words in the text. Therefore, we can use the following formula to measure the text

length after removing the spaces:

=LEN(SUBSTITUTE(TRIM(A1),” “,””))

In the above formula, we are using the SUBSTITUTE function to remove all spaces. The

difference between the above two lengths incremented by 1 then gives us the count of

the number of words in the text. Next, let us try to find the number of sentences in the

text. To find the number of sentences, we can use the same idea that we used for finding

words. We will remove all periods (.) from the text using the SUBSTITUTE function

and then measure the length. The difference in two lengths will tell us the number

of periods and hence the number of sentences. Thus, we use the following formula to

measure length after removing all periods:

=LEN(SUBSTITUTE(SUBSTITUTE(TRIM(A1),“ “,””),”.”,””))

Once we have the number of sentences, the total number of words, and the total number

of characters, we can easily get the average words per sentence and characters per word

for comparison.

Dates and Time

A date in Excel is simply a serial number that represents the number of days since

January 0, 1900 (a virtual “date” created for Windows). For example, 1/1/1900 is 1 and

1/15/1999 is serial number 36175. By treating dates as numbers, we can use them in

formulas and manipulate them. Excel recognizes several formats for entering dates in a

worksheet. These formats and their corresponding Excel interpretations are shown in

Chapter 6 Working with Text, Dates, and Times136

Table 6.2. Basically, Excel assumes you are referring to the current year if you do not

specify the year in your date entry.

When you enter data in a cell in one of the valid date formats, Excel expects the data to

represent a valid date. If the data does not convert to a valid date, Excel treats the data

as text. For example, if you enter 14/17/1979 in a cell, it will be treated as text because

months only go up to 12.

The idea of a date serial number is extended in Excel to represent time. For example, 12

noon, January 15, 1999 is represented internally as 36175.5, and 3:00 PM, January 15,

1999 as 36175.625. The smallest fraction of time unit in Excel is one one-thousandth

of a second which is equivalent to 0.00000001157. Table 6.3 shows different times of

day and their equivalent serial numbers.

Just like dates, Excel recognizes certain entries as time. For example, 10:30 is automati-

cally recognized as 10:30 AM and 15:30 as 3:30 PM. When you need to enter both

date and time, Excel expects you to enter the date first and then the time with at least a

Table 6.2 Excel Formats and Interpretations for Dates in a Worksheet

Cell Entry Excel’s Interpretation

10-17-79 October 17, 1979

10/17/79 October 17, 1979

10/17 October 17 of the current year

10-17 October 17 of the current year

1979/10/17 October 17, 1979

Oct 17 October 17 of the current year

Table 6.3 Times of Day and Excel’s Equivalent Serial Numbers

Time of Day Time Serial Number

12:00:00 AM

0.00000000

3:00:00 AM

0.12500000

6:00:00 AM

0.25000000

10:30:00 AM

0.43750000

1:30:00 PM

0.56250000

10:30:00 PM

0.93750000

Get *Computing with Excel and VBA* now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.