Chapter 4. Creating a Simple Page (HTML Overview)
Part I provided a general overview of the web design environment. Now that we’ve covered the big concepts, it’s time to roll up our sleeves and start creating a real web page. It will be an extremely simple page, but even the most complicated pages are based on the principles described here.
In this chapter, we’ll create a web page step by step so you can get a feel for what it’s like to mark up a document with HTML tags. The exercises allow you to work along.
This is what I want you to get out of this chapter:
Get a feel for how markup works, including an understanding of elements and attributes.
See how browsers interpret HTML documents.
Learn the basic structure of an HTML document.
Get a first glimpse of a style sheet in action.
Don’t worry about learning the specific text elements or style sheet rules at this point; we’ll get to those in the following chapters. For now, just pay attention to the process, the overall structure of the document, and the new terminology.
A Web Page, Step by Step
You got a look at an HTML document in Chapter 2, but now you’ll get to create one yourself and play around with it in the browser. The demonstration in this chapter has five steps that cover the basics of page production.
Step 1: Start with content. As a starting point, we’ll write up raw text content and see what browsers do with it.
Step 2: Give the document structure. You’ll learn about HTML element syntax and the elements that give a document its structure.
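Step 3: Identify text elements. You’ll mark up the content with text elements such as headings, paragraphs, and emphasized text, and learn about the proper way to use HTML.
Step 4: Add an image. By adding an image to the page, you’ll learn about attributes and empty elements.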
Step 5: Change the page appearance with a style sheet. This exercise gives you a taste of formatting content with Cascading Style Sheets.
By the time we’re finished, you will have written the source document for the page shown in Figure 4-1. It’s not very fancy, but you have to start somewhere.
We’ll be checking our work in a browser frequently throughout this demonstration—probably more than you would in real life. But because this is an introduction to HTML, it is helpful to see the cause and effect of each small change to the source file along the way.
Before We Begin, Launch a Text Editor
In this chapter and throughout the book, we’ll be writing out HTML documents by hand, so the first thing we need to do is launch a text editor. The text editor that is provided with your operating system, such as Notepad (Windows) or TextEdit (Macintosh), will do for these purposes. Other text editors are fine as long as you can save plain-text files with the .html extension. If you have a WYSIWYG web-authoring tool such as Dreamweaver, set it aside for now. I want you to get a feel for marking up a document manually (see the sidebar HTML the Hard Way).
This section shows how to open new documents in Notepad and TextEdit. Even if you’ve used these programs before, skim through for some special settings that will make the exercises go more smoothly. We’ll start with Notepad; Mac users can jump ahead.
Creating a new document in Notepad (Windows)
These are the steps to creating a new document in Notepad on Windows 7 (Figure 4-2):
Open the Start menu and navigate to Notepad (in Accessories).
Click on Notepad to open a new document window, and you’re ready to start typing.
Next, we’ll make the extensions visible. This step is not required to make HTML documents, but it will help make the file types clearer at a glance. In a Windows Explorer (folder) window, select “Folder Options...” from the Tools menu and select the View tab. Find “Hide extensions for known file types” and uncheck that option. Click OK to save the preference, and the file extensions will now be visible.
In Windows 7, hit the ALT key to reveal the menu to access Tools and Folder Options. In Windows Vista, it is labeled “Folder and Search Options.”
Creating a new document in TextEdit (Mac OS X)
By default, TextEdit creates “rich text” documents—that is, documents that have hidden style formatting instructions for making text bold, setting font size, and so on. You can tell that TextEdit is in rich-text mode when it has a formatting toolbar at the top of the window (plain-text mode does not). HTML documents need to be plain-text documents, so we’ll need to change the format, as shown in this example (Figure 4-3).
Use the Finder to look in the Applications folder for TextEdit. When you’ve found it, double-click the name or icon to launch the application.
TextEdit opens a new document. The text-formatting menu at the top shows that you are in Rich Text mode. Here’s how you change it.
Open the Preferences dialog box from the TextEdit menu.
There are three settings you need to adjust:
On the “New Document” tab, select “Plain text”.
On the “Open and Save” tab, select “Ignore rich text commands in HTML files” and turn off “Append ‘.txt’ extensions to plain text files”.
When you are done, click the red button in the top-left corner.
When you create a new document, the formatting menu will no longer be there and you can save your text as an HTML document. You can always convert a document back to rich text by selecting Format → Make Rich Text when you are not using TextEdit for HTML.
Step 1: Start with Content
Now that we have our new document, it’s time to get typing. A web page always starts with content, so that’s where we begin our demonstration. Exercise 4-1 | Entering content walks you through entering the raw text content and saving the document in a new folder.
Learning from step 1
Our content isn’t looking so good (Figure 4-5). The text is all run together—that’s not how it looked in the original document. There are a couple of things to be learned here. The first thing that is apparent is that the browser ignores line breaks in the source document. The sidebar What Browsers Ignore lists other information in the source that is not displayed in the browser window.
Second, we see that simply typing in some content and naming the document .html is not enough. While the browser can display the text from the file, we haven’t indicated the structure of the content. That’s where HTML comes in. We’ll use markup to add structure: first to the HTML document itself (coming up in Step 2), then to the page’s content (Step 3). Once the browser knows the structure of the content, it can display the page in a more meaningful way.
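To see the first lesson in miniature, suppose index.html contains only a few lines of plain text typed on separate lines, with no markup. The sample text below is illustrative and may not match Exercise 4-1 word for word.

```html
Black Goose Bistro
The Restaurant
The Black Goose Bistro offers casual lunch and dinner fare.
Catering Services
We can handle events of any size.
```

A browser collapses each line break into a single space. It would therefore display all of this as one run of text: “Black Goose Bistro The Restaurant The Black Goose Bistro offers casual lunch and dinner fare. Catering Services We can handle events of any size.”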
Step 2: Give the Document Structure
Back in Chapter 2, you saw examples of HTML elements with an opening tag (<p> for a paragraph, for example) and closing tag (</p>). Before we start adding tags to our document, let’s look at the anatomy of an HTML element (its syntax) and firm up some important terminology. A generic container element is labeled in Figure 4-6.
An element consists of both the content and its markup.
Elements are identified by tags in the text source. A tag consists of the element name (usually an abbreviation of a longer descriptive name) within angle brackets (< >). The browser knows that any text within brackets is markup, so it does not display that text in the browser window.
The element name appears in the opening tag (also called a start tag) and again in the closing (or end) tag preceded by a slash (/). The closing tag works something like an “off” switch for the element. Be careful not to use the similar backslash character in end tags (see the tip Introducing...HTML elements).
The tags added around content are referred to as the markup. It is important to note that an element consists of both the content and its markup (the start and end tags). Not all elements have content, however. Some are empty by definition, such as the img element used to add an image to the page. We’ll talk about empty elements a little later in this chapter.
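Here is a small sketch of the terminology above, with the parts labeled in HTML comments. Text between <!-- and --> is not displayed. The img line previews an empty element; its src and alt attributes are covered in Step 4, and photo.jpg is a hypothetical file name.

```html
<!-- Start tag: <p>   Content: the sentence   End tag: </p> -->
<!-- The start tag, content, and end tag together make one p element. -->
<p>The menu changes regularly.</p>

<!-- An empty element: no content and no end tag. -->
<img src="photo.jpg" alt="A sample photo">
```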
One last thing...capitalization. In HTML, the capitalization of element names is not important. So <img>, <Img>, and <IMG> are all the same as far as the browser is concerned. However, in XHTML (the stricter version of HTML) all element names must be all lowercase in order to be valid. Many web developers have come to like the orderliness of the stricter XHTML markup rules and stick with all lowercase, as I will do in this book.
HTML tags and URLs use the slash character (/). The slash character is found under the question mark (?) on the standard QWERTY keyboard.
It is easy to confuse the slash with the backslash character (\), which is found under the bar character (|). The backslash key will not work in tags or URLs, so be careful not to use it.
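Here is a side-by-side comparison using a hypothetical snippet. Only the forward slash closes the element.

```html
<!-- Correct: the end tag uses a slash (/) -->
<em>We'll handle the cooking!</em>

<!-- Incorrect: a backslash (\) is not treated as an end tag,
     so the em element is never closed -->
<em>We'll handle the cooking!<\em>
```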
Basic document structure
Figure 4-7 shows the recommended minimal skeleton of an HTML5 document. I say “recommended” because the only element that is required in HTML is the title. But I feel it is better, particularly for beginners, to explicitly organize documents with the proper structural markup. And if you are writing in the stricter XHTML, all of the following elements except meta must be included in order to be valid. Let’s take a look at what’s going on in Figure 4-7.
I don’t want to confuse things, but the first line in the example isn’t an element at all; it is a document type declaration (also called a DOCTYPE declaration) that identifies this document as an HTML5 document. I have a lot more to say about DOCTYPE declarations in Chapter 10, but for this discussion, suffice it to say that including it lets modern browsers know they should interpret the document as written according to the HTML5 specification.
The entire document is contained within an html element. The html element is called the root element because it contains all the elements in the document, and it may not be contained within any other element. It is used for both HTML and XHTML documents.
Within the html element, the document is divided into a head and a body. The head element contains descriptive information about the document itself, such as its title, the style sheet(s) it uses, scripts, and other types of “meta” information.
The meta elements within the head element provide information about the document itself. A meta element can be used to provide all sorts of information, but in this case, it specifies the character encoding (the standardized collection of letters, numbers, and symbols) used in the document. I don’t want to go into too much detail on this right now, but know that there are many good reasons for specifying the charset in every document, so I have included it as part of the minimal document structure.
Also in the head is the mandatory title element. According to the HTML specification, every document must contain a descriptive title.
Finally, the body element contains everything that we want to show up in the browser window.
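Putting those pieces together, the minimal skeleton described above (Figure 4-7) looks something like the following sketch. The title text and body placeholder are stand-ins. The sketch assumes utf-8 as the character encoding because it is the common choice.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Title here</title>
</head>
<body>
  Page content goes here.
</body>
</html>
```

The indentation is only for readability. Browsers ignore it, just as they ignore line breaks in the source.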
Are you ready to add some structure to the Black Goose Bistro home page? Open the index.html document and move on to Exercise 4-2 | Adding basic structure.
Not much has changed after structuring the document, except that the browser now displays the title of the document in the top bar or tab. If someone were to bookmark this page, that title would be added to his Bookmarks or Favorites list as well (see the sidebar Don’t Forget a Good Title). But the content still runs together because we haven’t given the browser any indication of how it should be structured. We’ll take care of that next.
Step 3: Identify Text Elements
With a little markup experience under your belt, it should be a no-brainer to add the markup that identifies headings and subheads (h1 and h2), paragraphs (p), and emphasized text (em) to our content, as we’ll do in Exercise 4-3 | Defining text elements. However, before we begin, I want to take a moment to talk about what we’re doing and not doing when marking up content with HTML.
The purpose of HTML is to add meaning and structure to the content. It is not intended to provide instructions for how the content should look (its presentation).
Your job when marking up content is to choose the HTML element that provides the most meaningful description of the content at hand. In the biz, we call this semantic markup. For example, the most important heading at the beginning of the document should be marked up as an h1 because it is the most important heading on the page. Don’t worry about what it looks like in the browser...you can easily change that with a style sheet. The important thing is that you choose elements based on what makes the most sense for the content.
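As a sketch of semantic markup, sample content for the bistro page might be marked up like this. The wording is illustrative rather than the exact text of Exercise 4-3. The point is that each element is chosen for what the content is, not for how it will look.

```html
<h1>Black Goose Bistro</h1>

<h2>The Restaurant</h2>
<p>The Black Goose Bistro offers casual lunch and dinner fare in a relaxed atmosphere.</p>

<h2>Catering Services</h2>
<p>You have fun. <em>We'll handle the cooking!</em></p>
```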
Although HTML was intended from its creation to be used strictly for meaning and structure, that mission was somewhat thwarted in the early years of the Web. With no style sheet system in place, HTML was extended to give authors ways to change the appearance of fonts, colors, and alignment using markup alone. Those presentational extras are still out there, so you may run across them if you view the source of older sites or a site made with old tools. In this book, however, we’ll focus on using HTML the right way, in keeping with the contemporary standards-based, semantic approach to web design.
OK, enough lecturing. It’s time to get to work on that content in Exercise 4-3 | Defining text elements.
Now we’re getting somewhere. With the elements properly identified, the browser can now display the text in a more meaningful manner. There are a few significant things to note about what’s happening in Figure 4-9.
Block and inline elements
Although it may seem like stating the obvious, it is worth pointing out that the heading and paragraph elements start on new lines and do not run together as they did before. That is because by default, headings and paragraphs display as block elements. Browsers treat block elements as though they are in little rectangular boxes, stacked up in the page. Each block element begins on a new line, and some space is also usually added above and below the entire element by default. In Figure 4-10, the edges of the block elements are outlined in red.
By contrast, look at the text we marked up as emphasized (em). It does not start a new line, but rather stays in the flow of the paragraph. That is because the em element is an inline element. Inline elements do not start new lines; they just go with the flow. In Figure 4-10, the inline em element is outlined in light blue.
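The following hypothetical fragment puts the two kinds side by side. With the browser’s default styles, the h2 and each p start on their own lines, while the em text stays inside its sentence.

```html
<!-- Block elements: each one starts on a new line -->
<h2>Location and Hours</h2>
<p>Open Monday through Saturday.</p>

<!-- Inline element: em stays in the flow of the paragraph -->
<p>The menu changes <em>regularly</em> to feature fresh ingredients.</p>
```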
The other thing that you will notice about the marked-up page in Figure 4-9 and Figure 4-10 is that the browser makes an attempt to give the page some visual hierarchy by making the first-level heading the biggest and boldest thing on the page, with the second-level headings slightly smaller, and so on.
How does the browser determine what an h1 should look like? It uses a style sheet! All browsers have their own built-in style sheets (called user agent style sheets in the spec) that describe the default rendering of elements. The default rendering is similar from browser to browser (for example, h1s are always big and bold), but there are some variations (long quotes may or may not be indented).
If you think the h1 is too big and clunky as the browser renders it, just change it with a style sheet rule. Resist the urge to mark up the heading with another element just to get it to look better, for example, using an h3 instead of an h1 so it isn’t as large. In the days before ubiquitous style sheet support, elements were abused in just that way. Now that there are style sheets for controlling the design, you should always choose elements based on how accurately they describe the content, and don’t worry about the browser’s default rendering.
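For example, a rule like the one below makes every h1 smaller while the markup stays semantic. It goes in a style element in the document’s head. The 1.5em value is an arbitrary choice for illustration; CSS itself is covered in Part III.

```html
<head>
  <meta charset="utf-8">
  <title>Black Goose Bistro</title>
  <style>
    /* Shrink the heading with CSS instead of changing h1 to h3 */
    h1 {
      font-size: 1.5em;
    }
  </style>
</head>
```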
We’ll fix the presentation of the page with style sheets in a moment, but first, let’s add an image to the page.
Step 4: Add an Image
What fun is a web page with no image? In Exercise 4-4 | Adding an image, we’ll add an image to the page using the img element. Images will be discussed in more detail in Chapter 7, but for now, it gives us an opportunity to introduce two more basic markup concepts: empty elements and attributes.
So far, nearly all of the elements we’ve used in the Black Goose Bistro home page have followed the syntax shown in Figure 4-6: a bit of text content surrounded by start and end tags.
A handful of elements, however, do not have text content because they are used to provide a simple directive. These elements are said to be empty. The image element (img) is an example of such an element; it tells the browser to get an image file from the server and insert it at that spot in the flow of the text. Other empty elements include the line break (br), thematic breaks (hr), and elements that provide information about a document but don’t affect its displayed content, such as the meta element that we used earlier.
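Below is a short, hypothetical fragment showing the empty elements mentioned above. None of them has an end tag. The meta line belongs in the head of a document; it appears here only for comparison.

```html
<!-- br: line break inside a paragraph -->
<p>Monday through Thursday: 11am to 9pm<br>
Friday and Saturday: 11am to midnight</p>

<!-- hr: thematic break between sections -->
<hr>

<!-- meta: information about the document; goes in the head -->
<meta charset="utf-8">
```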
Let’s get back to adding an image with the empty img element. Obviously, an <img> tag is not very useful by itself; there’s no way to know which image to use. That’s where attributes come in. Attributes are instructions that clarify or modify an element. For the img element, the src (short for “source”) attribute is required, and specifies the location (URL) of the image file.
Attributes go after the element name, separated by a space. In an empty element, they go inside its single tag; in non-empty elements, attributes go in the opening tag only:
<element attributename="value">
<element attributename="value">Content</element>
You can also put more than one attribute in an element in any order. Just keep them separated with spaces.
<element attribute1="value" attribute2="value">
For another way to look at it, Figure 4-12 shows an img element with its required attributes labeled.
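Here is a sketch of that in markup. The src attribute gives the location of the image file, and alt supplies a text alternative. The file name blackgoose.png is hypothetical. The optional width and height attributes (in pixels) are included only to show several attributes in one tag, in no particular order.

```html
<img src="blackgoose.png" alt="Black Goose logo">

<!-- Same element with more attributes; any order works -->
<img width="200" height="100" alt="Black Goose logo" src="blackgoose.png">
```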
Here’s what you need to know about attributes:
Attributes go after the element name in the opening tag only, never in the end tag.
There may be several attributes applied to an element, separated by spaces in the opening tag. Their order is not important.
Most attributes take values, which follow an equals sign (=). In HTML, some attribute values can be reduced to single descriptive words, for example, the checked attribute, which makes a checkbox checked when a form loads. In XHTML, however, all attributes must have explicit values (checked="checked"). You may hear this type of attribute called a Boolean attribute because it describes a feature that is either on or off.
A value might be a number, a word, a string of text, a URL, or a measurement, depending on the purpose of the attribute. You’ll see examples of all of these throughout this book.
Some values don’t have to be in quotation marks in HTML, but XHTML requires them. Many developers like the consistency and tidiness of quotation marks even when authoring HTML. Either single or double quotation marks are acceptable as long as they are used consistently; however, double quotation marks are the convention. Note that quotation marks in HTML files need to be straight (") not curly (“ ”).
Some attributes are required, such as the src and alt attributes in the img element.
The attribute names available for each element are defined in the HTML specifications; in other words, you can’t make up an attribute for an element.
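To illustrate the Boolean attribute described in the list, here is the checked attribute on a form checkbox written both ways. The input element and the name value newsletter are illustrative; forms are not otherwise covered in this chapter.

```html
<!-- HTML: a Boolean attribute can be reduced to a single word -->
<input type="checkbox" name="newsletter" checked>

<!-- XHTML: every attribute needs an explicit, quoted value,
     and empty elements are closed with " />" -->
<input type="checkbox" name="newsletter" checked="checked" />
```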
Now you should be more than ready to try your hand at adding the img element with its attributes to the Black Goose Bistro page in the next exercise. We’ll throw a few line breaks in there as well.
Step 5: Change the Look with a Style Sheet
Depending on the content and purpose of your website, you may decide that the browser’s default rendering of your document is perfectly adequate. However, I think I’d like to pretty up the Black Goose Bistro home page a bit to make a good first impression on potential patrons. “Prettying up” is just my way of saying that I’d like to change its presentation, which is the job of Cascading Style Sheets (CSS).
In Exercise 4-5 | Adding a style sheet, we’ll change the appearance of the text elements and the page background using some simple style sheet rules. Don’t worry about understanding them all right now; we’ll get into CSS in more detail in Part III. But I want to at least give you a taste of what it means to add a “layer” of presentation onto the structure we’ve created with our markup.
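To give a sense of what such rules look like, here is a sketch of a style element placed in the head of the document. These particular rules, colors, and fonts are illustrative and are not necessarily the ones used in Exercise 4-5.

```html
<head>
  <meta charset="utf-8">
  <title>Black Goose Bistro</title>
  <style>
    /* Page background and default font for all text */
    body {
      background-color: #faf2e4;
      font-family: Georgia, serif;
    }
    /* Center the main heading and give it a color */
    h1 {
      color: #c60;
      text-align: center;
    }
    /* Match the subheads to the main heading color */
    h2 {
      color: #c60;
    }
  </style>
</head>
```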
We’re finished with the Black Goose Bistro page. Not only have you written your first web page, complete with a style sheet, but you’ve also learned about elements, attributes, empty elements, block and inline elements, the basic structure of an HTML document, and the correct use of markup along the way. Not bad for one chapter!
When Good Pages Go Bad
The previous demonstration went smoothly, but it’s easy for small things to go wrong when typing out HTML markup by hand. Unfortunately, one missed character can break a whole page. I’m going to break my page on purpose so we can see what happens.
What if I had forgotten to type the slash (/) in the closing emphasis tag (</em>)? With just one character out of place (Figure 4-16), the remainder of the document displays in emphasized (italic) text. That’s because without that slash, there’s nothing telling the browser to turn “off” the emphasized formatting, so it just keeps going.
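Here is a hypothetical fragment that reproduces the mistake. The second tag is <em> rather than </em>, so the emphasis is never switched off. Browsers will typically carry the italics into the following paragraphs as well.

```html
<!-- Broken: "<em>" where "</em>" was intended -->
<p>You have fun. <em>We'll handle the cooking!<em></p>
<p>This paragraph will most likely render in italics too.</p>

<!-- Fixed: the slash turns the emphasis off -->
<p>You have fun. <em>We'll handle the cooking!</em></p>
<p>This paragraph renders normally.</p>
```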
Omitting the slash in the closing tag (or even omitting the closing tag itself) for block elements, such as headings or paragraphs, may not be so dramatic. Browsers interpret the start of a new block element to mean that the previous block element is finished.
I’ve fixed the slash, but this time, let’s see what would have happened if I had accidentally omitted a bracket from the end of the first <h2> tag (Figure 4-17).
See how the headline is missing? That’s because without the closing tag bracket, the browser assumes that all the following text, all the way up to the next closing bracket (>) it finds, is part of the <h2> opening tag. Browsers don’t display any text within a tag, so my heading disappeared. The browser just ignored the foreign-looking element name and moved on to the next element.
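Here is the same mistake as a hypothetical fragment. With the > missing, the browser reads everything up to the next > as part of a strange start tag, so the heading text never appears.

```html
<!-- Broken: the start tag is missing its closing ">" -->
<h2The Restaurant</h2>
<p>The Black Goose Bistro offers casual lunch and dinner fare.</p>

<!-- Fixed -->
<h2>The Restaurant</h2>
<p>The Black Goose Bistro offers casual lunch and dinner fare.</p>
```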
Making mistakes in your first HTML documents and fixing them is a great way to learn. If you write your first pages perfectly, I’d recommend fiddling with the code as I have here to see how the browser reacts to various changes. This can be extremely useful in troubleshooting pages later. I’ve listed some common problems in the sidebar Having Problems? Note that these problems are not specific to beginners. Little stuff like this goes wrong all the time, even for the pros.
Validating Your Documents
One way that professional web developers catch errors in their markup is to validate their documents. What does that mean? To validate a document is to check your markup to make sure that you have abided by all the rules of whatever version of HTML you are using (there is more than one, as we’ll discuss in Chapter 10). Documents that are error-free are said to be valid. It is strongly recommended that you validate your documents, especially for professional sites. Valid documents are more consistent on a variety of browsers, they display more quickly, and they are more accessible.
Right now, browsers don’t require documents to be valid (in other words, they’ll do their best to display them, errors and all), but any time you stray from the standard you introduce unpredictability in the way the page is displayed or handled by alternative devices.
So how do you make sure your document is valid? You could check it yourself or ask a friend, but humans make mistakes, and you aren’t really expected to memorize every minute rule in the specifications. Instead, you use a validator, software that checks your source against the HTML version you specify. These are some of the things validators check for:
The inclusion of a DOCTYPE declaration. Without it the validator doesn’t know which version of HTML or XHTML to validate against.
An indication of the character encoding for the document.
The inclusion of required elements and attributes.
Typos and other minor errors.
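For a concrete sense of what gets flagged, here is a hypothetical page with deliberate errors of the kinds listed above. A validator such as the one at validator.w3.org would be expected to report each commented problem, although the exact wording of its messages varies.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <!-- Error: the required title element is missing -->
</head>
<body>
  <!-- Error: img is missing its required alt attribute -->
  <img src="blackgoose.png">

  <!-- Error: the em element is never closed before the paragraph ends -->
  <p>Open daily for <em>lunch and dinner.</p>
</body>
</html>
```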
Developers use a number of helpful tools for checking and correcting errors in HTML documents. The W3C offers a free online validator at validator.w3.org. For HTML5 documents, use the online validator located at html5.validator.nu. Browser developer tools like the Firebug plug-in for Firefox or the built-in developer tools in Safari and Chrome also have validators so you can check your work on the fly. If you use Dreamweaver to create your sites, there is a validator built into that as well.
Now is a good time to make sure you understand the basics of markup. Use what you’ve learned in this chapter to answer the following questions. Answers are in Appendix A.
What is the difference between a tag and an element?
Write out the recommended minimal structure of an HTML5 document.
c. cooking home page.html
All of the following markup examples are incorrect. Describe what is wrong with each one, and then write it correctly.
<a href="file.html">linked text</a href="file.html">
<p>This is a new paragraph<\p>
How would you mark up this comment in an HTML document so that it doesn’t display in the browser window?
product list begins here
Element Review: Document Structure
This chapter introduced the elements that establish the structure of the document. The remaining elements introduced in the exercises will be treated in more depth in the following chapters.
body: Identifies the body of the document that holds the content
head: Identifies the head of the document that contains information about the document
html: The root element that contains all the other elements
meta: Provides information about the document
title: Gives the page a title