by Eric Ladd
Before you charge into marking up the contents of the document your end users see, you need to take a few moments to set up the document's internal structure. There are only a few tags you need to know to accomplish this; but, because many browsers process an HTML document without them, authors often forget to use these tags.
A document's structure is established through the use of four tags: <!DOCTYPE>, <HTML>, <HEAD>, and <BODY>.
The stand-alone <!DOCTYPE> tag is an optional element that is used to declare which level of HTML you're using to author your document. To indicate that you're using HTML 3.2 tags, the first line of your document should be
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
This indicates to the browsers that they should use the HTML 3.2 DTD, as specified by the W3C, to parse the document.
You can declare earlier versions of HTML as well. If you're sticking with HTML 2.0, your <!DOCTYPE> tag would read
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 2.0//EN">
Again, the <!DOCTYPE> tag is optional, so no browser chokes on a file that doesn't have one.
The first of the document structure tags that you should consider mandatory is the <HTML> tag. <HTML> is a container tag that works together with a closing </HTML> tag to contain information. In this instance, <HTML> and </HTML> enclose the entire HTML document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
...the rest of the document...
</HTML>
These tags simply say, "Everything between us is HTML."
| NOTE |
You may be wondering why the <!DOCTYPE> tag doesn't go between the <HTML> and </HTML> tags if these two tags contain everything in the document. This is because <!DOCTYPE> is technically an SGML tag that indicates what HTML DTD to use on the rest of the document. The document then becomes an HTML document once the <HTML> tag is encountered. |
The document head is one of the two major sections found between the <HTML> and </HTML> tags. Information in the document head is essential to the inner workings of the document, but typically has almost nothing to do with the content of the document. With the exception of the title, all information put forward in the document head is invisible to the end user.
This does not make the document head any less important, though. For an HTML document to work properly, you need several key pieces of information in place. The rest of this chapter examines the kinds of information that need to be placed in the document head.
The document head always begins with the <HEAD> tag, ends with the </HEAD> tag, and should immediately follow the <HTML> tag. Thus, your basic document structure so far appears as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
...elements in the document head...
</HEAD>
...the rest of the document...
</HTML>
| NOTE |
According to the HTML 3.2 DTD, all of the tags discussed in the remainder of this chapter are permitted only between the <HEAD> and </HEAD> tags. |
It is essential that you always give your document a descriptive title. When summarizing the features of HTML 3.2, the W3C said: "At the minimum, every HTML document must at least include the descriptive title element." So there you have it from the authority: You must have a title.
Putting a title on your document is simple. You merely place it between the <TITLE> and </TITLE> tags. For example:
<HEAD>
<TITLE>World Wide Web Frequently Asked Questions (FAQ)</TITLE>
...other elements in the document head...
</HEAD>
Titles should be detailed enough to give a sense of what the document is about, without being too wordy. A good rule of thumb is that titles should be 40 characters or fewer in length.
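To make the rule of thumb concrete, here is a minimal Python sketch that pulls the title out of a document head and checks its length. It uses the standard library's html.parser module; the names TitleParser, extract_title, and is_short_title are invented for this illustration, and the 40-character limit is only the guideline given above, not a requirement of HTML 3.2.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found between <TITLE> and </TITLE>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TITLE> arrives as "title".
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def extract_title(html_text):
    """Return the document title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


def is_short_title(title, limit=40):
    """Apply the rule of thumb; limit=40 is a guideline, not a spec value."""
    return len(title) <= limit


page = """<HEAD>
<TITLE>World Wide Web Frequently Asked Questions (FAQ)</TITLE>
</HEAD>"""

title = extract_title(page)
print(title, len(title), is_short_title(title))
```

A check like this could be run over a whole site to flag documents whose titles are missing or unusually long.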
There are several reasons why titles are important:
Figure 4.2 : Netscape Navigator also displays a document's title at the top of the window.
Figure 4.4 : It is easier to remember where you've been when history lists contain titles.
| World Wide Web Robots |
As previously noted, Web robots (sometimes called spiders) wander the Web and index the documents they find. Robots typically index the content of a document and return their results to online indexes for addition to their databases. Other robot functions include HTML validation, link validation, and detecting new content. Indexing robots go about their business in many different ways. Some just look for a document's title and possibly some author-supplied keywords. Others parse every single word in the document and count how many times each word occurs. This approach permits a quantifiable measure of what some keywords for the page might be. If the word "browser" constitutes 10% of the words in a given document, you can be pretty sure that the document has something to do with browsers. If you want your pages to be read by as many people as possible, you'll want to make them easy for robots to process and index. Once registered on an index like AltaVista, your documents can be returned as a result to queries posted by Web users. This can lead to increased traffic on your pages. As an HTML author, you have many ways to communicate with Web robots. Most of these approaches rely on HTML placed in the document head. The <TITLE> container tag you're reading about is one easy way. Later in the chapter, you'll learn about the <META> tag, which can be packed with all sorts of good information for robots. As you progress through the other HTML chapters in this book, look for more tips and suggestions for keeping your pages robot-friendly. |
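The word-counting approach described in the sidebar can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular robot works; the function name word_shares and the sample text are invented for the example.

```python
import re
from collections import Counter


def word_shares(text):
    """Return each word's share of all words in the text (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


sample = (
    "A browser fetches a page and the browser then displays it, "
    "so every browser needs a parser."
)

shares = word_shares(sample)
# List the five words that make up the largest share of the sample text.
for word, share in sorted(shares.items(), key=lambda item: -item[1])[:5]:
    print(word, round(share * 100, 1), "%")
```

A real indexer would also need to skip common words such as "a" and "the" before treating the most frequent words as keywords.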
Titles are an important end user service and can help increase traffic to your page. They're also very easy to place in your documents. For all of these reasons, make sure you title every document you put on the Web.
There are a number of instances when you'll need to specify a URL while you're coding an HTML page. Very often, these URLs point to documents on the exact same server and in the exact same directory. Other times, they point to documents on the same server but in a directory that is a level above or below the directory where the browser is currently looking. In either of these cases, it is more convenient to use relative URLs, which give only the directory levels and file names, for the simple reason that you won't have to type out the http:// and the server name each time.
You can establish a base URL in a document by using the <BASE> tag. <BASE> is a stand-alone tag that takes the HREF attribute. HREF is set equal to the base URL you want to use; in most cases, this is just the URL of the document you're authoring.
If the base URL concept seems a little confusing, consider the following example. You're marking up a corporate overview for your company in a file called overview.html, which resides in the geninfo directory of your server. The URL of the document you're working on is therefore:
http://www.yourfirm.com/geninfo/overview.html
If you specify this URL to be the base URL by putting the following <BASE> tag in the head of this document:
<BASE HREF="http://www.yourfirm.com/geninfo/overview.html">
then any other URL in the file can be specified relative to this base URL. Suppose you need to place your corporate logo on the page, and the logo is found in the images directory in the file logo.gif. You can then use the tag:
<IMG SRC="../images/logo.gif">
to place the image on the page. The double dots in the tag take you up one directory level from the geninfo directory. You then change to the images directory, where you find the file logo.gif. Note that you didn't have to type out the full URL of the logo file, as you would in the tag:
<IMG SRC="http://www.yourfirm.com/images/logo.gif">
This may seem like only a small savings in effort, but once you see how many times you're typing URLs in your HTML code, you'll develop a greater appreciation for it.
To reference the file jobs.html in the same geninfo directory, use the relative URL
jobs.html
Because the file is in the same directory, you only have to give the file name. If you're referencing the file ceo.html that is in the officers directory, a subdirectory of geninfo, you can use the relative URL
officers/ceo.html
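One way to check how the three relative URLs above resolve against the base URL is to use urljoin from Python's standard urllib.parse module. This is a sketch for experimenting, not a description of how any particular browser is implemented; the constant name BASE is chosen for this example.

```python
from urllib.parse import urljoin

# The base URL from the <BASE> tag in the example above.
BASE = "http://www.yourfirm.com/geninfo/overview.html"

for relative in ("../images/logo.gif", "jobs.html", "officers/ceo.html"):
    print(relative, "->", urljoin(BASE, relative))

# ../images/logo.gif -> http://www.yourfirm.com/images/logo.gif
# jobs.html -> http://www.yourfirm.com/geninfo/jobs.html
# officers/ceo.html -> http://www.yourfirm.com/geninfo/officers/ceo.html
```

Note that the file name at the end of the base URL (overview.html) is dropped before the relative path is applied; only the directory part of the base takes part in the resolution.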
| NOTE |
Many browsers automatically treat a document's URL as the base URL, so all references within the document can be made relative to the document's URL. For these browsers, specifying a base URL isn't really necessary, but it is still a matter of good style to include it. |
| CAUTION |
If you're referencing a file that is on a different server, a base URL won't help you. You'll have to use the full URL for that file. |
The <LINK> tag is a stand-alone tag that you can use to denote relationships between documents. This feature can be useful if you have to manage several files on a large Web site. You can also use it to point back to the original author of a document, which gives the document some degree of copyright protection.
The <LINK> tag can take the attributes listed in Table 4.1. The ones used most frequently are HREF, REL, and REV.
| Attribute | Function |
| HREF | Specifies the URL of the related document |
| NAME | Defines a link from an anchor or URL to the current document |
| METHODS | Provides a list of functions supported by the current document |
| REL | Defines the relationship between the current document and the document specified in the HREF |
| REV | Defines the reverse relationship between the current document and the document specified in the HREF (the opposite of REL, in some sense) |
| TITLE | Provides the title of the linked document |
| URN | Assigns a Uniform Resource Name for the current document |
Revisiting the example you read in the <BASE> tag section, suppose you're editing the file overview.html and your head section looks like the following:
<HEAD>
<TITLE>Corporate Overview</TITLE>
<BASE HREF="http://www.yourfirm.com/geninfo/overview.html">
<LINK HREF="officers/ceo.html" REL="precedes">
<LINK HREF="officers/ceo.html" TITLE="CEO Biography">
<LINK HREF="mailto:your_email@yourfirm.com" REV="made">
</HEAD>
The first <LINK> tag
<LINK HREF="officers/ceo.html" REL="precedes">
says that the file ceo.html in the officers subdirectory is preceded by the current file, overview.html.
The second <LINK> tag
<LINK HREF="officers/ceo.html" TITLE="CEO Biography">
provides the title of the document found in the file ceo.html. In this case, the title is CEO Biography.
The third <LINK> tag
<LINK HREF="mailto:your_email@yourfirm.com" REV="made">
tells where you can find more information about the author (REV="made") of overview.html, the current document. In this case, the HREF is an e-mail reference back to your e-mail address.
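To see how a program might read these relationships, the following Python sketch collects the <BASE> URL and every <LINK> tag from the example head, then resolves each HREF against the base. It relies only on the standard html.parser and urllib.parse modules; the class name HeadLinkParser is invented for this illustration, and real robots or browsers may process <LINK> quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadLinkParser(HTMLParser):
    """Record the <BASE> URL and the attributes of every <LINK> tag."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        attributes = dict(attrs)
        if tag == "base":
            self.base = attributes.get("href")
        elif tag == "link":
            self.links.append(attributes)


head = """<HEAD>
<TITLE>Corporate Overview</TITLE>
<BASE HREF="http://www.yourfirm.com/geninfo/overview.html">
<LINK HREF="officers/ceo.html" REL="precedes">
<LINK HREF="officers/ceo.html" TITLE="CEO Biography">
<LINK HREF="mailto:your_email@yourfirm.com" REV="made">
</HEAD>"""

parser = HeadLinkParser()
parser.feed(head)
parser.close()

for link in parser.links:
    # Relative HREFs are resolved against the base; mailto: URLs pass through.
    target = urljoin(parser.base, link["href"])
    details = {name: value for name, value in link.items() if name != "href"}
    print(target, details)
```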
| NOTE |
The <LINK> tag also has a role in providing style sheet information in HTML documents. If style information is in a separate file, you can link to that file with <LINK HREF="sitestyles.css" REL="stylesheet">. This <LINK> tag says that the file sitestyles.css provides style sheet information for the current file. The css extension stands for "cascading style sheet." |
Some longer HTML documents act as repositories of information. To tap into the repository, you need some kind of search interface to help you. The Find option on browsers, such as Netscape Navigator, is one way to search a document. This, however, is not an intelligent search because it only looks for the first instance of the search text you provide. There's no guarantee that it will be a match to the information you want.
HTML provides a few ways for you to index your documents for searching. The first approach, the <ISINDEX> tag, utilizes a very simple tag to set up a very simple search interface. The second approach provides an alphabetical list for users to click. This second approach is better because it provides the user with an interface that is intuitive and that quickly narrows the focus.
The <ISINDEX> Tag
The <ISINDEX> tag is a stand-alone tag that, in its simplest form, doesn't require any attributes. Placing the <ISINDEX> tag in the document head instructs a browser to place a query entry field on the page. Figures 4.5 and 4.6 show an <ISINDEX> field in Netscape Navigator and Microsoft Internet Explorer, respectively.
Figure 4.5 : <ISINDEX> fields allow users to enter search criteria right on the screen.
Figure 4.6 : Internet Explorer renders an <ISINDEX> field with slightly different prompting text.
Note the difference in the instructions that precede the query field in these two figures. Netscape Navigator uses the message "This is a searchable index. Enter search keywords," whereas Internet Explorer uses "You can search this index. Type the keyword(s) you want to search for." If you don't like either of these, or if you want a standard prompt, you can use the PROMPT attribute of the <ISINDEX> tag to set the prompt to whatever you like. For example, the tag
<ISINDEX PROMPT="Please enter a topic to search on:">
produces the <ISINDEX> field you see in Figure 4.7. The prompting text would be exactly the same if you looked at Netscape Navigator.
The <ISINDEX> tag seems like an easy way to put a search interface on one of your documents, but it only provides an interface. Along with the interface, you need to provide search functionality behind the page in the form of a CGI script or a search engine. For example, you might use the Ice search engine to index your site and set up the interface to the search with an <ISINDEX> field. The script or search engine takes the hand-off from the <ISINDEX> field, performs the requested search, and returns the results on a custom-generated HTML page.
| NOTE |
When the <ISINDEX> tag was first introduced, it was limited to appearing only in the document head. HTML 3.2, however, permits the <ISINDEX> tag to appear anywhere in a document. Direct your browser to http://www.w3.org/pub/WWW/MarkUp/Wilbur/ for details. There, you'll find that the <ISINDEX> tag is permissible as both a head and a body element. |
Other Ways to Index Your Documents
If there's no script or search engine on your server to back up your <ISINDEX> fields, you'll have to find some other way to index your documents. One simple approach is shown in Figure 4.8. A miniature table of contents at the top of a document shows the reader the major sections of the document. By clicking one of the section names, the reader jumps to that section, which spares them from scrolling down the page.
Another approach is to provide a list of letters of the alphabet that users can click to go to a list of keywords starting with the clicked letter (see Figure 4.9). This is particularly handy when searching a list of alphabetized names. As long as the user knows the desired name, it's easy to jump right to the part of the document where the name is found.
Figure 4.9 : Click "H" for HTML and other topics beginning with the letter "H".
As the previous few sections show, there's more than one way to index a page. <ISINDEX> is a fine way to do it, so long as you can provide the programming behind the page to perform a search. Indexing with letters of the alphabet is perhaps a bit more crude, in that users can't search on a specific topic, but it is much easier to implement if you don't have access to programming skills.
The <TITLE>, <LINK>, and <ISINDEX> tags that you've learned so far are all ways of building specific kinds of information into the document head. You can pack in even more information with the stand-alone <META> tag. The tag derives its name from the fact that it lets you specify document meta-information, that is, information about the document beyond what has already been specified, such as a title or a base URL. The <META> tag takes the attributes shown in Table 4.2.
| Attribute | Function |
| CONTENT | Assigns a value to a named property |
| HTTP-EQUIV | Binds the meta-information to an HTTP response header |
| NAME | Names a piece of meta-information; assumed to be the same as HTTP-EQUIV if not otherwise specified |
Every <META> tag has at least two attributes: either HTTP-EQUIV and CONTENT or NAME and CONTENT. In its most expanded form, <META> takes all three attributes simultaneously.
The <META> tag gives you the freedom to put lots of information into the head of a document. The following sections show some examples.
Document Expiration
Putting an expiration date on your document is a good idea if you want to write robot-friendly pages. A robot that parses a <META> tag with an expiration date knows when it should revisit the page to index fresh content. This helps Web indexes keep up-to-date with the changes you make to your pages.
To put an expiration date on your document, set HTTP-EQUIV to "Expires" and CONTENT equal to the date and time of expiration, in standard Internet format. For example, to set a document to expire on the first day of 1997, you would use the tag
<META HTTP-EQUIV="Expires" CONTENT="Wed, 01 Jan 1997 00:00:00 EST">
| NOTE |
If a server does not support an HTTP-EQUIV attribute like "Expires," it ignores it. |
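If you generate pages with a script, the expiration tag can be produced programmatically. The Python sketch below builds the date string with the standard email.utils.format_datetime function. The helper name expires_meta is invented for this example, and the sketch writes the time in GMT, the zone HTTP date headers conventionally use, rather than the EST shown in the tag above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def expires_meta(when):
    """Build an Expires <META> tag.

    `when` must be a timezone-aware datetime in UTC; format_datetime
    raises ValueError for other inputs when usegmt=True.
    """
    stamp = format_datetime(when, usegmt=True)
    return '<META HTTP-EQUIV="Expires" CONTENT="%s">' % stamp


print(expires_meta(datetime(1997, 1, 1, tzinfo=timezone.utc)))
# <META HTTP-EQUIV="Expires" CONTENT="Wed, 01 Jan 1997 00:00:00 GMT">
```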
Reply-to Address
You can furnish your name and e-mail address in the document by using the "Reply-To" HTTP header. The tag
<META HTTP-EQUIV="Reply-To" CONTENT="your_email@yourfirm.com">
accomplishes this. You can even put in your name parenthetically after the e-mail address.
Keywords
Putting keywords in your document is another way to communicate with the robots that index pages for online Web indexes. Many robots are programmed to look for <META> tags such as
<META HTTP-EQUIV="Keywords" CONTENT="ACME, Inc. corporate overview, balance sheet, stockholder information">
This lets the robots index your page with the keywords ACME, Inc. corporate overview, balance sheet, and stockholder information. When someone does a search on ACME, Inc. stockholder information, they are directed to your page.
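As a rough illustration of how a robot might pick up this meta-information, the Python sketch below gathers every <META> tag in a document head into a dictionary keyed by the HTTP-EQUIV (or NAME) value. The class name MetaParser is invented for the example, and actual robots may read the tags in other ways.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect <META> tags into a dictionary of property -> content."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # A property is named by HTTP-EQUIV or, failing that, by NAME.
        key = attributes.get("http-equiv") or attributes.get("name")
        if key and attributes.get("content") is not None:
            self.meta[key.lower()] = attributes["content"]


head = """<HEAD>
<TITLE>Corporate Overview</TITLE>
<META HTTP-EQUIV="Reply-To" CONTENT="your_email@yourfirm.com">
<META HTTP-EQUIV="Keywords" CONTENT="ACME, Inc. corporate overview, balance sheet, stockholder information">
</HEAD>"""

parser = MetaParser()
parser.feed(head)
parser.close()

for name, content in parser.meta.items():
    print(name, "=", content)
```

Property names are lowercased before they are stored, so a lookup such as parser.meta["keywords"] works however the author capitalized the attribute value.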
Bulletins
First Floor Software produces a program called SmartMarks, which works with Netscape Navigator, Microsoft Internet Explorer, and NCSA Mosaic. The program provides bookmark management support and proactive monitoring of selected sites. Users who instruct SmartMarks to monitor your site are notified whenever SmartMarks detects a change on one of your pages.
You can also incorporate bulletins into your pages. SmartMarks looks for bulletins and displays them to users in a special window. You can set up a SmartMarks bulletin using <META> tags as follows:
<META HTTP-EQUIV="Bulletin-Text" CONTENT="New product line unveiled!">
<META HTTP-EQUIV="Bulletin-Date" CONTENT="Fri, 2 Aug 1996, 08:00:00 EST">
These two <META> tags tell SmartMarks that at 8 a.m. on Friday, August 2, 1996, it should post a bulletin to interested users that a new product line is being rolled out. Users get this information in a timely manner and may visit your site right away to check it out.
| NOTE |
Check out First Floor's Web site at http://www.firstfloor.com/ to learn more about SmartMarks and how your site can become part of First Floor's "Get Smart!" Partnership program. |
Client Pull
Netscape introduced the idea of Client Pull as one of its approaches to dynamic Web documents. After a specified delay, the browser either reloads the current page or loads a completely different page. In this way, content can change without any action from the user.
A Client Pull can be set up in the document head using a <META> tag. To simply reload the current document after a delay of n seconds, you use
<META HTTP-EQUIV="Refresh" CONTENT="n">
If you want to load a new document after an n-second delay, you use
<META HTTP-EQUIV="Refresh" CONTENT="n; URL=url_of_next_document">
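To show what a browser has to extract from the CONTENT value, here is a small Python sketch that splits it into the delay and the optional address of the next document. The function name parse_refresh is invented for this illustration; it accepts the address with or without a leading URL= prefix, which is an assumption about what a tolerant parser might do rather than a description of Netscape's own code.

```python
def parse_refresh(content):
    """Split a Refresh CONTENT value into (delay_seconds, url_or_None)."""
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    target = rest.strip()
    # Accept an optional, case-insensitive "URL=" prefix before the address.
    if target[:4].lower() == "url=":
        target = target[4:].strip()
    # An empty target means "reload the current document".
    return delay, (target or None)


print(parse_refresh("5"))
print(parse_refresh("10; URL=http://www.yourfirm.com/geninfo/jobs.html"))
```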
Client Pull lends itself nicely to a kiosk setting, where a display loops through a prescribed set of pages when there are no users checking out the site. When a user approaches the kiosk and clicks a link, they jump out of the client-pull loop and move into the rest of the site.
| CAUTION |
Make sure that on each page in a Client Pull loop there is some kind of link that permits a user to jump out of the loop. Otherwise, the only way to stop the looping is to exit the browser. |
Using a Custom Cache
When a browser downloads a file (be it an HTML file, graphic, video clip, Word document, or whatever other kind of file you can download), it saves a copy of that file in its cache. Browsers typically have two caches: a memory cache and a disk cache. Files in the memory cache are held in RAM and disappear after you exit the browser. Your browser's disk cache is a directory on your hard drive that holds copies of all of these downloaded files so that, if the file needs to be referenced later, the browser can just use the copy in the cache, rather than downloading it again. This is true even if you shut down the browser and then restart; the disk cache is available over multiple browsing sessions. This has obvious timesaving benefits, especially in the case of large files like video clips and Director movies.
| NOTE |
As your disk cache fills up to a limit that you prescribe, the browser will start to delete files from it so that it doesn't "overflow." |
A disk cache is a great way to reduce the amount of time a user spends waiting for files to download, but even the files in the disk cache have to be downloaded once. This may not seem like much to ask until you consider that many larger downloadable files can be as big as 1M or more! On a 14.4Kbps connection, it takes quite a while for such a file to download.
With the release of Netscape Navigator 3.0, Netscape introduced its LiveCache functionality. You can use LiveCache to create a custom cache for a set of Web pages that you can store on a hard disk or a CD-ROM. By distributing the cache in advance of any browsing sessions, your users will have immediate access to all the files necessary for viewing your pages, without having to wait for anything to download.
Once you have a custom cache set up, you need to supply instructions in your HTML documents, telling Netscape Navigator to open and use the cache. You do this by using the <META> tag as follows:
<META HTTP-EQUIV="Ext-Cache" CONTENT="name=MyCache; instructions=user_instructions">
Setting HTTP-EQUIV to "Ext-Cache" tells the browser to get ready to use an external cache. The CONTENT attribute supplies the name of the custom cache and any special instructions you want displayed to the user when the cache is first opened. These instructions are followed by an Open File dialog box in which users can browse to, and open, the cache file.
| NOTE |
Users have to open the custom cache only once. After that, it's available until a different custom cache is opened or the user ends the browsing session. |
| TIP |
In your instructions to users, you will want to tell them to: |
If you plan to create a custom cache by using LiveCache, there are a few other issues you should know about:
When W3C put forward the HTML 3.2 standard, it reserved two tags for future use in the document head: the <STYLE> container tag for style sheets and the <SCRIPT> container tag for embedding client-side scripts.
Style Sheets
HTML style sheets give Web page authors a means of associating font and block element information with certain HTML tags. This gives the author far more control over how something looks on the browser screen. Font size, font color, line spacing, alignment, margins, and other characteristics can be built into style sheet information.
W3C reserved the <STYLE> container tag for use in supplying style sheet information right in the document head. Microsoft Internet Explorer 3 is already able to parse the <STYLE> tag and can apply the style information it contains. Figure 4.10 shows the main page of Microsoft's Web site and Figure 4.11 shows the style information in the document head that produces the different fonts, sizes, and other page effects you see in Figure 4.10.
| NOTE |
In addition to the <STYLE> tag, W3C is considering other ways to deliver style information to a browser. One way is to use the <LINK> tag to link to a separate file that contains the style information. Another way is to embed style information inside individual HTML tags. |
Scripting Languages
The <SCRIPT> container tag has been reserved by W3C to contain client-side script code that a browser interprets and executes when the page loads. Currently, JavaScript is a popular scripting language that achieves effects like scrolling messages along the status bar at the bottom of the Netscape Navigator window. Figure 4.12 shows a banner produced with JavaScript.
| NOTE |
A browser must be able to interpret the scripting language you're using in order to run scripts in that language. If you're going to use client-side scripts, make sure your audience has a browser that correctly processes them. |
The other major section of an HTML document is the body. The body is contained between the <BODY> and </BODY> tags and it is made up of the content the user actually sees in the browser window. All kinds of formatting are possible in the document body, as you'll see over the next several chapters.
The <BODY> tag should come immediately after the </HEAD> tag and the </BODY> tag should immediately precede the </HTML> tag.
Although the document structure tags are relatively easy to use, many HTML authors tend to leave them off. Sometimes this is due to genuine forgetting, but more often than not it is due to the author's unwillingness to make the effort.
No matter what you're using as an authoring environment, you should take a few minutes to set up the following basic template that you can use to start every new HTML file you work on:
<HTML>
<HEAD>
<TITLE>Document title</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>
This gives you a bare minimum starting point to add any other tags you want. All you need to do is fill in your title and you're ready to go!
| TIP |
Many HTML authoring programs have a template (like the preceding one) available when you start a new file. Check your authoring program to see if it has this handy feature. |