Chapter 7

Linking HTML Documents

by Eric Ladd


Putting links to Internet resources on your Web pages gives your audience access to a wealth of information, presented in a variety of formats: text, graphics, audio, video, newsgroups, interactive presentations, live chat ... the possibilities are almost endless! Linking your pages to related resources enhances a visitor's experience and helps support the message you're trying to communicate.

Yet, for all their power, links are remarkably simple to set up. You need only two things: the Internet address of the related resource and a means for the user to access the related resource. This chapter explores how Internet resources are addressed and how you can use HTML to place links on your pages.

Uniform Resource Locators (URLs)

A Web page's Uniform Resource Locator (URL) is its address on the Internet. URLs aren't unique to Web documents, however. You can access any type of Internet resource by having the right client program and knowing the resource's URL.

CAUTION
The directory path and file name in an URL are often case-sensitive (the protocol and server name are not), so pay attention to uppercase and lowercase letters when typing URLs.

So What's an URI?
You may have heard some talk about Uniform Resource Identifiers (URIs) and wondered how they're different from URLs. URLs encode access protocols and server locations for Internet resources. The URI Working Group of the Internet Engineering Task Force (IETF) is considering the implementation of Uniform Resource Names (URNs), to give resources a unique name, and Uniform Resource Characteristics (URCs), to describe resources by providing author, title, subject, and location (URL) information. A resource's URI, then, is the joining of its URL, URN, and URC.
The reason for making resource addressing so much more intricate is to separate a resource's name and location. This way, you can use your browser to request a resource by name and the browser can then check the URC characteristics for the resource's location (or locations, if there's more than one URL for the resource) and access it. The hope is that this process will help alleviate problems that have arisen from using URLs alone, such as overloaded servers, expired links, and Internet traffic across great distances.

Parts of an URL

Every URL is made up of the following four parts:

Protocol
Server name
Port
Directory path and file name

These elements come together in the form:

protocol://server_name:port/directory_path_and_file_name

Protocol  The protocol portion of an URL tells the client program which set of rules to use in retrieving the resource. The most common protocols are shown in Table 7.1.

Table 7.1  Common Internet Protocols

Protocol    Name
http        Hypertext Transfer Protocol (Web pages)
shttp       Secure Hypertext Transfer Protocol
ftp         File Transfer Protocol
gopher      Gopher
wais        Wide Area Information Servers
telnet      Telnet session
news        UseNet newsgroup protocol
mailto      Electronic mail

CAUTION
Not all browsers are "conversant" in all Internet protocols. Check your browser's documentation to learn which protocols your browser knows.

The news and mailto protocols are slightly different from the rest. A typical news URL looks like

news:rec.pets.dogs.breeds.boxer

The news protocol indicator is simply followed by a colon (:) and the name of the newsgroup.

A typical mailto URL looks like

mailto:president@whitehouse.gov

In this case, mailto is followed by a colon (:) and an e-mail address.
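The difference in form shows up clearly when these URLs are split into their parts. The following is a minimal sketch using Python's standard urllib.parse module; Python is chosen here only for illustration, and the http URL is a made-up placeholder included for comparison.

```python
from urllib.parse import urlsplit

# The news and mailto URLs come from the text; the http URL is a
# hypothetical placeholder used only for comparison.
news_url = urlsplit("news:rec.pets.dogs.breeds.boxer")
mail_url = urlsplit("mailto:president@whitehouse.gov")
web_url = urlsplit("http://www.your_server.com/index.html")

# news and mailto URLs have no "//server" part, so netloc is empty and
# everything after the colon ends up in the path.
for parts in (news_url, mail_url, web_url):
    print(parts.scheme, repr(parts.netloc), parts.path)
```

Only the http URL reports a server name; for the other two, the newsgroup name or e-mail address is all that follows the protocol.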

NOTE
For both the news and mailto protocols to work, your browser needs to know where to find your news and mail servers. You can usually set this up under the browser's Options or Configurations dialog box.

Server Name  Once the browser knows which protocol to use, it needs to know on what machine the resource resides. The server name can be the dotted English language name of the machine (such as ftp5.macromedia.com) or the machine's Internet Protocol (IP) address (such as 205.139.80.105). Technically, machines use IP addresses to find one another on the Internet, but it usually doesn't matter whether you use the English language name or the IP address. This is because most Internet Service Providers run a Domain Name System (DNS) server that translates English language names into IP addresses.

TIP
If there is a server you access frequently, it's a good idea to know both its English language name and its IP address. That way, if your DNS is down, you can still access the server. Many UNIX machines have a utility called nslookup that lets you look up a server's IP address.

Port  The port refers to the port number on the server where the client should connect. Port is an optional element and, if omitted, the default port for the protocol is used (port 80 for HTTP, for example).

Directory Path and File Name  The server name directs the browser to a certain machine and, once there, the browser needs to know in which directory it can find the desired resource and the name of the file that contains the resource. Directory path and file name information is specified in much the same way as it is for UNIX or DOS operating systems (though DOS users must use a forward slash (/) instead of a backslash (\)). A sample path and file name might be:

press_releases/1996/october/new_ceo.html

This directs the browser to the file new_ceo.html in the october subdirectory of the 1996 directory. The 1996 directory is a subdirectory of the press_releases directory.
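To see the four parts come apart, an URL can be split programmatically. This is a rough sketch in Python using the standard urllib.parse module; the server name and the port number 8080 are assumptions invented for the example, and only the path is the sample from the text.

```python
from urllib.parse import urlsplit

# Hypothetical URL for illustration: the server name and port are made up;
# the path is the sample path from the text.
url = "http://www.your_server.com:8080/press_releases/1996/october/new_ceo.html"
parts = urlsplit(url)

protocol = parts.scheme        # the part before "://"
server_name = parts.hostname   # the machine holding the resource
port = parts.port              # an integer, or None when the URL omits it

# Split the path at its last slash into directories and file name.
directories, _, file_name = parts.path.rpartition("/")

print(protocol, server_name, port, directories, file_name)
```

When the URL leaves out the port, parts.port is None and the client falls back to the default port for the protocol.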

Figure 7.1 : In the absence of a file served as the default, you get a listing of all files available in the directory you select.

NOTE
Some URLs specify a path but not a file name. In these instances, the server automatically delivers a default file (typically named something like index.html, home.html, home.htm, or default.htm). If no such file exists, a directory listing appears in which file names are hyperlinked (see Figure 7.1). Click one of the file names to open it.

NOTE
Occasionally, some URLs have search or query data appended to them after the directory path and file name. This information becomes input to scripts or programs running on the server.
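As an illustration of what query data looks like, the sketch below pulls it out of an URL with Python's urllib.parse module. The script name (cgi-bin/search) and the field names (topic, year) are hypothetical, invented for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: the script name and the query fields are invented.
url = "http://www.your_server.com/cgi-bin/search?topic=links&year=1996"
parts = urlsplit(url)

# Everything after the question mark is the query data.
print(parts.path)
print(parts.query)

# parse_qs turns the query into a dictionary of field names and value lists.
fields = parse_qs(parts.query)
print(fields)
```

A question mark separates the query data from the path, and ampersands separate the individual name=value pairs.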

Absolute URLs

An URL is said to be absolute or fully-qualified if it is made up of a protocol, server name (and port, if required), and a directory path and file name. An absolute URL is one that is spelled out in full. For example,

http://www.your_server.com/pub/WebDocs/investors/index.html

is an absolute URL.

Relative URLs

Because links within a site all point to files on the same server, it becomes tedious to type out the protocol and server name each time an URL is specified. To mitigate this tedium, HTML allows URLs to be given relative to an explicitly stated base URL. Such an URL is said to be relative or partially qualified.

NOTE
A base URL is formally declared by using the <BASE> tag in the HTML document head. The <BASE> tag is discussed in Chapter 4 "The Document Tags."

Suppose you declare a base URL of

http://www.your_server.com/pub/WebDocs/hr/jobs/analyst.html

and you need to specify the URL of the file index.html, located in the hr directory (one directory level above the jobs directory). You could type all of the absolute URL

http://www.your_server.com/pub/WebDocs/hr/index.html

or you could give the URL relative to your base URL

../index.html

The two dots followed by a forward slash (../) instruct the browser to go up one directory level. If you needed to specify the URL of the file salaries.html in the compensation directory (a subdirectory of the jobs directory), you could use the relative URL

compensation/salaries.html
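The way a browser combines a base URL with a relative URL can be imitated with the urljoin function in Python's standard urllib.parse module. This is only a sketch of the resolution rule, not a description of any particular browser's code; the URLs are the ones from the example above.

```python
from urllib.parse import urljoin

# Base URL from the example in the text.
base = "http://www.your_server.com/pub/WebDocs/hr/jobs/analyst.html"

# "../" climbs from the jobs directory up to the hr directory.
up_one = urljoin(base, "../index.html")

# A path with no leading slash is resolved inside the base's directory.
down_one = urljoin(base, "compensation/salaries.html")

print(up_one)
print(down_one)
```

In both cases the protocol and server name are carried over from the base URL, which is why relative URLs save typing only for documents on the same server.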

NOTE
Relative URLs are really useful only when referencing a document on the same server. If you're referencing a document on a different server, you must use an absolute URL to specify the server's name.

NOTE
If you don't specify a base URL in your HTML code, the browser will use the document's URL as the base URL.

Linking to Other HTML Documents

The key to placing links in your HTML documents is the <A> container tag. <A> and its companion closing tag </A> enclose the text a user clicks to follow the link. Such text is called a hypertext anchor or just hypertext.

The <A> tag takes the attributes shown in Table 7.2. The attribute you'll use most often is the HREF attribute. NAME is useful for setting up links within a document (see next section). REL, REV, and TITLE are supported as part of HTML 3.2, but few HTML authors use them.

Table 7.2  Attributes of the <A> Tag

Attribute   Purpose
HREF        Set equal to the URL of the resource being linked to
NAME        Establishes a named anchor within a document that can be targeted by an HREF
REL         Specifies a forward link relationship
REV         Specifies a reverse link relationship
TITLE       Supplies an advisory title for the linked document

To establish a hypertext link to a document with the URL

http://www.your_provider.net/your_name/homepage.html

you could use the following HTML:

Visit my <A HREF="http://www.your_provider.net/your_name/homepage.html">home page</A>.

Figure 7.2 shows what the link looks like on screen. Most browsers underline (or highlight in some way) hypertext. Typically it is also in a different color from the body text, though you can't appreciate that in a black and white figure.

Figure 7.2 : The mouse pointer changes to an upward-pointing hand when you pass it over a piece of hypertext.


TIP
If a hypertext anchor doesn't seem to be working quite right, check to make sure that the URL in the <A> tag is completely enclosed in quotes. Omitting the final quotation mark is a common mistake.

You can make any text on your pages a hypertext anchor. Body text, list items, headings, preformatted text, text marked up with either physical or logical styles, blockquotes, and addresses can all be contained between <A> and </A> tags to create hypertext.

NOTE
In Chapter 8 "Adding Graphics to HTML Documents," you'll learn that images can be hyperlink anchors, too.

Hypertext anchors should be limited to just a few key words that relate the current document to the linked document. Making anchors that are large blocks of text is visually distracting and may confuse the user as to the nature of the linked document.

One stylistic note that deserves some attention is making list items into hypertext. Ordinarily, it doesn't matter how you nest the <A> and </A> tags with other formatting tag pairs. For example, the HTML

<H2><A HREF="report.html">Annual Report</A></H2>

is equivalent to

<A HREF="report.html"><H2>Annual Report</H2></A>

The situation is different with list items, however. Consider the following HTML, which produces the list of links you see in Figure 7.3.

Figure 7.3 : Bullet characters are not generally part of the hypertext.

<UL>
<LI><A HREF="navy.html">Navy</A></LI>
<LI><A HREF="army.html">Army</A></LI>
<LI><A HREF="airforce.html">Air Force</A></LI>
</UL>

Notice in Figure 7.3 that the list items are linked, but the bullet characters are not. However, if you reverse the order of the <A> and <LI> container tags:

<UL>
<A HREF="navy.html"><LI>Navy</LI></A>
<A HREF="army.html"><LI>Army</LI></A>
<A HREF="airforce.html"><LI>Air Force</LI></A>
</UL>

you get the results you see in Figure 7.4. In this case, the bullet characters become hypertext, too, because the <LI> tag occurs inside of the <A> and </A> tags.

Figure 7.4 : You can link the bullets by having your <LI> tag inside your <A> tag.

The same is true with numbered lists. If the <LI> container tag occurs inside the <A> container tag, the numbers in the ordered list will be hypertext as well.

How you link your list items is entirely up to you, but it is generally better style if you do not link bullet characters in unordered lists and numbers in ordered lists. By not linking bullets and numbers, you get a greater contrast between them and the list items. This makes the list items stand out better. Also, bullets and numbers don't typically allude to the nature of the linked document, so linking them doesn't add any value for the user.

Linking Within a Given HTML Document

When a user clicks a hypertext link, the linked document loads into his or her browser, starting at the top of the document. You can target your links to specific points within a document by setting up named anchors with the NAME attribute of the <A> tag.

For example, if the linked document is rather long and you want to save users from having to do a lot of scrolling, set up named anchors at the start of each major section of the document. Then, when providing links to the long document, provide a link to each major section instead of a single link that always sends users to the top of the document.

If the long document consists of four major sections, you would set up a named anchor at the start of each one. The anchor for the third section might look as follows:

<A NAME="section_three"><H2>Section 3</H2></A>

This makes the level 2 heading, Section 3, into a named anchor that can be targeted by a hypertext link. To set up a link that targets this anchor, place a pound sign (#) and the anchor's name after the long document's URL:

<A HREF="longdoc.html#section_three">Section 3</A> discusses
previous approaches to solving the problem.

Clicking the hypertext, Section 3, instructs the browser to load the file longdoc.html and begin presenting material in the file, starting at the anchor with the name, section_three.
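The split between the file to load and the anchor to jump to can be shown with the urldefrag function from Python's standard urllib.parse module. This is a small sketch for illustration only; the HREF value is the one from the example above.

```python
from urllib.parse import urldefrag

# The HREF value from the example above.
href = "longdoc.html#section_three"

# urldefrag separates the document's URL from the anchor name after "#".
document, anchor = urldefrag(href)

print(document)
print(anchor)
```

The part before the pound sign identifies the file to load, and the part after it names the anchor within that file.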

NOTE
Setting up named anchors within a document is what permits creation of a table of contents at the top of a long document (see Figure 7.5). By setting up named anchors within the file as before, your links in the table of contents just have to point to the anchor:
View <A HREF="#section_three">Section 3</A>.
No file name is necessary because everything is contained within the same file.
When setting up a table of contents at the top, you should also set up links at the end of each major section that allow the user to jump back to the table of contents. You accomplish this by placing a named anchor on the table of contents heading:
<A NAME="toc"><H2>Table of Contents</H2></A>
and then placing a link back to the table of contents:
Return to the <A HREF="#toc">table of contents</A>.
at the end of each section (see Figure 7.6).
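A table of contents only works if every link points at an anchor that actually exists. The sketch below is one possible way to check this with Python's standard html.parser module. The AnchorCollector class is a hypothetical helper written for this example, and the sample page is assembled from the fragments in this note plus one deliberately broken link (#section_nine) that is not from the text.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects named anchors and same-page link targets from <A> tags."""

    def __init__(self):
        super().__init__()
        self.names = set()    # values of NAME attributes
        self.targets = set()  # anchor names used in HREF="#..." links

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for attr, value in attrs:
            if attr == "name" and value:
                self.names.add(value)
            elif attr == "href" and value and value.startswith("#"):
                self.targets.add(value[1:])


# Sample page: fragments from the note, plus one invented broken link.
page = """
<A NAME="toc"><H2>Table of Contents</H2></A>
View <A HREF="#section_three">Section 3</A>.
View <A HREF="#section_nine">Section 9</A>.
<A NAME="section_three"><H2>Section 3</H2></A>
Return to the <A HREF="#toc">table of contents</A>.
"""

collector = AnchorCollector()
collector.feed(page)

# Links whose target has no matching NAME anchor in the same file.
broken = sorted(collector.targets - collector.names)
print(broken)
```

Any name left in the broken list is a link that would leave the user at the top of the page instead of at the intended section.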

Figure 7.5 : A table of contents at the top of a long page points to named anchors throughout the file.

Figure 7.6 : Giving users a way back to the table of contents is an important navigational courtesy.


Linking to Other Internet Services

You aren't limited to just linking to other Web documents when setting up hypertext. Because HREF takes on the value of an URL, you can link to virtually any Internet service that is addressed with an URL.
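Because the protocol is always the first part of an URL, it is easy to tell which Internet service a given HREF points to. The sketch below sorts HREF values used as examples in this chapter by protocol, using Python's standard urllib.parse module; the protocol_of function is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit

# HREF values that appear in this chapter's examples.
hrefs = [
    "ftp://ftp.your_firm.com/pub/program.exe",
    "mailto:feedback@your_firm.com",
    "news:sci.math.applied.fluidflow",
    "gopher://gopher.umn.edu/",
    "wais://wais.your_firm.com/",
    "telnet://bbs.your_firm.com/",
    "report.html",
]


def protocol_of(href):
    """Return the protocol of an HREF, or 'relative' if none is given."""
    return urlsplit(href).scheme or "relative"


protocols = [protocol_of(h) for h in hrefs]
print(protocols)
```

An HREF with no protocol at all, such as report.html, is a relative URL and inherits its protocol from the base URL.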

FTP

File Transfer Protocol (FTP) was devised as a means of passing files back and forth over the Internet. An FTP client shows files and directories on both a local and a remote machine and facilitates the exchange of files between the two (see Figure 7.7).

Figure 7.7 : WS FTP shows both directories and files available on local and remote machines. Exchanging files is as easy as clicking one of the arrow buttons.

Many popular Web browsers are programmed to perform as FTP clients, but only in one direction. You can transfer a file from a remote server to your machine through a Web browser, but you can't use it to send files. It's rare for a Web surfer to send a file, so the inability to send is not a serious end-user issue.

To set up an FTP download link on one of your pages, you set it up much like a link to another Web page, except you use an FTP URL rather than a Web URL. For example,

You can download the <A HREF="ftp://ftp.your_firm.com/pub/program.exe">
self-extracting archive (3.2 MB)</A> from this page.

When a user clicks the hypertext "self-extracting archive (3.2 MB)," his or her browser downloads the file program.exe to a local disk drive.

TIP
Be sure to include the size of the file when you put up an FTP link so that users have a sense of how long it will take to download the file.

mailto

A mailto URL gives users a point-and-click way to send you electronic mail. The HTML code to set up such a link might be

We appreciate your <A HREF="mailto:feedback@your_firm.com">feedback</A>.

When users click the hypertext feedback, their browsers open a mail window where they can compose their messages (see Figure 7.8).

Figure 7.8 : Netscape Navigator provides a built-in electronic mail program for sending and receiving messages.


NOTE
To send a mail message using Netscape Navigator, select File, New Mail Message.

E-mail links set up with a mailto URL are a great way to collect feedback on your Web pages or Web site. Even if you're not looking for opinions, it's still a good idea to give visitors some means of contacting you in case there is a problem loading or viewing your pages.

CAUTION
Before you use Netscape Navigator to send e-mail, you need to specify the mail server you use. To do this, select Options, Mail and News Preferences, click the Servers tab, and enter the name or IP address of your outgoing mail (SMTP) server.

UseNet

UseNet newsgroups are forums for discussion on a particular topic. Internet users with a UseNet client and access to a news server can read and post responses to any newsgroup they find interesting. With over 17,000 newsgroups to choose from, there's bound to be at least one that pertains to content on your Web pages.

You can establish hypertext links on a Web page that will take users to a UseNet newsgroup via their browser's newsreader feature. Figure 7.9 shows the Netscape Navigator newsreader. Microsoft Internet Explorer also has an associated newsreader. You set up a link to a newsgroup just like any other in this chapter, except that you use a news URL:

For more information, read
<A HREF="news:sci.math.applied.fluidflow">sci.math.applied.fluidflow</A>.

Figure 7.9 : UseNet can put you in touch with people with Redskins tickets, among other things.

TIP
When creating a UseNet newsgroup link, remind users that they need to have their browsers configured to access a news server to read news.

CAUTION
Not all browsers support inline news viewing. What's more, not all services provide access to every UseNet newsgroup. Keep in mind that not every user is able to appreciate the news links you place on your pages.
For those that do support news, you'll have to specify the name of your news server. You can accomplish this by selecting Options, Mail and News Preferences, clicking the Servers tab, and entering the name or IP address of your news server in the news (NNTP) server field.

Gopher

The University of Minnesota developed Gopher as a means of presenting large amounts of information in a structured way. A Gopher client presents documents in a folder structure, making navigation a bit more intuitive than it would be on an FTP site (see Figure 7.10).

Figure 7.10 : Many of the activities you might associate with the World Wide Web Consortium correspond to hypertext links on its home page.

You'll note in the figure that the Gopher interface is much less visually exciting than a Web interface. It should be no surprise that the Web quickly overshadowed Gopher as the premier way of delivering information over the Internet.

Although it has taken something of a backseat, Gopher holes are still around and are valid sources of information. You can set up links to Gopher sites on your Web pages by using the <A> container tag and the appropriate Gopher URL:

Visit the <A HREF="gopher://gopher.umn.edu/">Mother Gopher</A>.

Clicking the hypertext "Mother Gopher" takes you to the original Gopher site at the University of Minnesota.

Most browsers have internal support for the Gopher protocol, so no helper applications are necessary. Unfortunately, the Gopher interface on a Web browser is rather plain (see Figure 7.11), so users may be disappointed if they link to a Gopher site after seeing a string of visually appealing Web pages.

Figure 7.11 : A Gopher site looks much less stimulating than your average Web page.

WAIS

WAIS (Wide Area Information Servers) sites are provided as gateways to large, searchable databases on the Internet. You can place links to WAIS sites on your Web pages as easily as any other type of link. You just need to set the HREF attribute in the <A> tag to the appropriate WAIS URL (see Figure 7.12).

Figure 7.12 : WAIS links have given way to the use of HTML forms as a front-end for database searching.

<A HREF="wais://wais.your_firm.com/">Search</A> our extensive databases.

Clicking the hypertext "Search" connects users to your WAIS server.

TIP
Browsers typically aren't programmed with built-in support for the WAIS protocol. Be sure to include information on pages with WAIS links, stating where users can download a WAIS client program.

Telnet

Telnet is an Internet application that permits users to log in to a remote computer. While most Web surfers do not need to do this, it may become necessary if you want to set up a link to an existing Bulletin Board System (BBS) with information that is relevant to your site.

To establish a Telnet link on your page, use the telnet protocol in the linked URL. For example:

Users are invited to check out our 
<A HREF="telnet://bbs.your_firm.com/">BBS</A> for more information.

The hypertext BBS then serves as a clickable link to the Telnet site. Most browsers don't have Telnet capabilities built in, so users will require some type of helper application (see Figure 7.13). UNIX and Windows NT users will find Telnet clients available to them as part of their operating systems. Other users must download a client program, so providing some suggestions on your pages is a helpful courtesy.

Figure 7.13 : QVTNet's Term program makes an excellent Telnet helper application.

TIP
Windows 95 users have a Telnet client built right into their operating system.