by Simeon M. Greene
Client Pull and Server Push are two recent additions to the HTML and CGI standards. With these two methods, you can extend the capabilities of both Web browsers and Web servers.
Client Pull sends information to the Web browser via the <META> tags in an HTML file and allows it to perform additional functions. Server Push similarly sends special information to the browser in an HTML file but relies on the server, rather than the browser, to send additional data as specified in the HTML document.
The functionality of these two features depends on the MIME standard and the HTTP response header.
Client Pull is a method of giving a Web browser additional instructions that the server it is browsing would not otherwise have sent. Client Pull is not a language, although it is usually implemented using HTML. You have probably seen Client Pull in action without recognizing it. A common implementation has the browser automatically move to a different page without the user clicking a hyperlink. For example, sites that have changed their URL might use Client Pull to tell a browser to load the new URL automatically. You can also instruct the browser to load the new URL only after a set time has elapsed. How exactly is this done? To understand what goes on in the background of Client Pull, you need to know about HTTP response headers and the <META> tag.
An HTTP response header is additional information that the Web server adds right before sending an HTML file. The header's main purpose is to give the browser the information it needs to prepare the document for proper display. Examples of the information contained in the header are the date the document was last modified, the name of the document's author, and the other types of files (GIF images, .Wav sound files, and so on) referenced from within the document. The header may also include information about the server itself, such as the version of the Web server, and other miscellaneous information. Because the information is optional to the browser, some of it is often ignored.
So what does all this have to do with Client Pull? Unfortunately, there's no easy way to determine exactly what the server puts in the HTTP response header, and there's no way to tell it to specify special headers for certain documents. This means that the header is predefined and of no use to the writer of the HTML document. What if you wanted the browser to handle your HTML document in a special way, and wanted the browser to know this before it receives the document? You cannot specify this information after the document is loaded. This would be a perfect job for an HTTP response header, but, as stated, the server does not let you alter the header while it is being written. The <META> tag allows the writer to extend the header from within the document itself, perhaps in the first few lines of the HTML file.
Within an HTML file, there is a special place reserved for information that is to be appended to the header and used before the entire document is displayed. This information is placed between the <HEAD> and </HEAD> tags. You typically put the title of the document between the <TITLE> and </TITLE> tags and place these tags within the <HEAD> pair so that the title of the Web page appears on the window's title bar. By doing this, you tell the browser what you want to name the document. By including it between the <HEAD> pair, you're telling the browser loading the document that there is additional information in the tags. You can also use the <META> tag between the <HEAD> pair of tags to give the browser extra handling instructions. A typical example of this is shown in Listing 38.1.
Listing 38.1 Refresh.html-Use the <META> Tag for Additional Information
<HTML>
<HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="20">
<TITLE>My Home Page (Updates every 20 seconds)</TITLE>
</HEAD>
<BODY>
The information on this site changes regularly. For that reason, this page will reload itself every twenty seconds. If you are not using Netscape or Internet Explorer, you will need to reload this page manually.<BR>
</BODY>
</HTML>
The example in Listing 38.1 shows an HTML document that reloads itself every 20 seconds. This is typically useful for sites that are frequently updated, such as news sites and weather sites. Examine how the <META> tag achieves this. In the example, the line
<META HTTP-EQUIV="Refresh" CONTENT="20">
is actually seen by the browser as
Refresh: 20
and that tells it to refresh (reload) the same document in 20 seconds. The browser knows this before the document loads and begins its count once the document is fully loaded. Another implementation of this instruction is to tell the browser to load a different document after a given period of time. Listing 38.2 shows this implementation.
Listing 38.2 Meta.html-Load a New Document with the <META> Tag
<HTML>
<HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="10; URL=http://www.mynewsite.com/">
<TITLE>New Site notification</TITLE>
</HEAD>
<BODY>
My homepage has moved to a new location. The new URL is <A HREF="http://www.mynewsite.com/">www.mynewsite.com</A> so go there and then update your bookmarks (or favorites for you Microsoft users). If you're using Netscape or Internet Explorer, relax, we'll be there in ten seconds.
</BODY>
</HTML>
Again, closely examine the instructions contained within the <META> tag. The HTTP-EQUIV attribute remains the same, meaning that the browser will refresh the document. The CONTENT attribute, however, now holds two values: the 10 seconds to wait before the document is refreshed, and a URL that specifies another document to replace the one currently being viewed. In this case, you are refreshing the document with another document. This is typically used for directing users to new URLs without having them click hyperlinks. In both examples, the browser is given additional information that it should act upon. However, this information, just like the HTTP response header, is optional and can be entirely ignored by the browser. In fact, the browser can ignore the entire <META> tag, in which case it does not support Client Pull. The two browsers that support Client Pull are Netscape Navigator version 1.1 or later and Internet Explorer version 2.0 or later. So that viewers of your Web site are not stuck on a page that was supposed to be replaced after a given time, always include a hyperlink to the new document for the sake of users whose browsers do not support Client Pull.
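The two forms of the tag can be generated from a pair of small Perl routines. This is only a sketch: the names meta_refresh and fallback_link are hypothetical helpers invented for illustration, not part of any library, and the URL is the placeholder used in the listings above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any library): build a Client Pull <META> tag.
# $seconds is the delay; $url is optional. Without a URL the same page reloads.
sub meta_refresh {
    my ($seconds, $url) = @_;
    my $content = defined $url ? "$seconds; URL=$url" : "$seconds";
    return qq{<META HTTP-EQUIV="Refresh" CONTENT="$content">};
}

# Hypothetical helper: a plain hyperlink for browsers that ignore <META>.
sub fallback_link {
    my ($url, $text) = @_;
    return qq{<A HREF="$url">$text</A>};
}

print meta_refresh(20), "\n";
print meta_refresh(10, "http://www.mynewsite.com/"), "\n";
print fallback_link("http://www.mynewsite.com/", "www.mynewsite.com"), "\n";
```

Keeping the fallback link next to the tag generator makes it harder to forget the hyperlink that users of other browsers depend on.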
Customizing the HTTP response header is another way to give the browser additional instructions on how to handle a specific document. But isn't this header predefined by the server? An HTTP response header is created by the Web server for HTML documents that are created as files and stored on the server. This header is sent, prior to the actual document, to the browser requesting the file. For dynamically generated HTML documents, however, this is not the case. Because the documents are not stored as a file, the Web server has no knowledge of their existence, so you must create the HTTP response header manually. This can easily be done using a CGI script. A CGI script, written in Perl, that automatically loads a new page after a specific time frame has expired, is shown in Listing 38.3.
Listing 38.3 Clientpull.pl-Load a New Page with a CGI Script
#!/usr/bin/perl
#this script will load a new page after 10 seconds.
print "Content-type: text/html\n";
print "Refresh: 10; URL=http://www.mynewsite.com/\n\n";
print "<HTML>\n";
print "<HEAD>\n";
#you could have included the META tag here instead, but that would take away from the power of writing
#your own header.
print "<TITLE>New site notification</TITLE>\n";
print "</HEAD>\n";
print "<BODY>\n";
print "Please wait... Loading my new home page. <A HREF=\"http://www.mynewsite.com/\">Click here</A> to go to my new site if you do not have the Netscape or Microsoft Internet Explorer browser.<BR>\n";
print "</BODY>\n";
print "</HTML>\n";
The declaration of the HTTP response header begins in line 3:
print "Content-type: text/html\n";
This line declares the MIME type of the document being requested by the browser. The type text/html tells the browser that the document is a text document in HTML format. This line appears in every server-generated HTTP response header for an HTML document, so you need to put it in yours as well. The newline escape character (\n) at the end of the line ends the header line, so that the next header starts on a line of its own. This escape character is used throughout the script and has the same effect. Line 4 of the script is where you tell the browser to pull another document after a lapse of 10 seconds:
print "Refresh: 10; URL=http://www.mynewsite.com/\n\n";
This line resembles the <META> tag except that there are no HTTP-EQUIV and CONTENT attributes. Because, in this case, you are writing the HTTP response header exactly as the browser would see it, the browser does not need to interpret any attributes. At the end of the line, notice that there are two newline escape characters. The first ends the header line, and the second inserts a blank line. This blank line is necessary because the header is sent to the browser just before the actual document, and the blank line is the gap that separates them. You need these two newlines whenever you end an HTTP header. Without them, the entire document is read as part of the header and never appears. Using CGI to implement Client Pull is useful in cases when there is no document to display, such as after a form has been filled out.
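The role of the blank line can be seen by assembling a response and then splitting it again at the first blank line, which is effectively what the browser does. The routine names cgi_response and split_response are hypothetical, made up for this sketch, and the sketch uses bare \n line endings as the listing does.

```perl
use strict;
use warnings;

# Hypothetical helper: join header lines and a body into one CGI response.
# Each header ends with one newline; one extra newline adds the blank line.
sub cgi_response {
    my ($headers, $body) = @_;          # $headers is an array reference
    my $head = join("", map { "$_\n" } @$headers);
    return $head . "\n" . $body;
}

# Hypothetical helper: split a response at the first blank line.
sub split_response {
    my ($response) = @_;
    my ($head, $body) = split(/\n\n/, $response, 2);
    return ($head, $body);
}

my $response = cgi_response(
    [ "Content-type: text/html",
      "Refresh: 10; URL=http://www.mynewsite.com/" ],
    "<HTML><BODY>Please wait...</BODY></HTML>\n",
);
my ($head, $body) = split_response($response);
print "HEADER:\n$head\n\nBODY:\n$body";
```

If the extra newline in cgi_response is left out, split_response finds no blank line and the HTML ends up in the header portion, which mirrors the "document never appears" symptom described above.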
After all you've learned about Client Pull, you're about to learn methods that apparently make it obsolete. The mechanics behind Client Pull are indeed fundamental, and that will never change. It's similar to the relationship between calculus and engineering: The raw theory may seem obsolete due to the advanced development of practical application, but in fact, it always remains as the basis of reference and the platform for learning how to develop newer, more developed applications. Client Pull has had a relatively short life since its introduction. Actually, if not used well, Client Pull becomes quite annoying. Today, with the popularity of Java, people are resorting to more creative ways of controlling the browser's actions. Java could easily implement Client Pull and even add some functions that can't be done easily with Client Pull.
For instance, if you had a Web page that referenced information that was updated at 3:00 am, 5:00 pm, and 8:00 pm, you could easily write a Java applet, embed it in your Web page using the <APPLET> tags, and monitor the page viewer's system clock. When the clock's time matches any one of these values, the applet retrieves the very same page again. With the ordinary <META> tag, you need to specify the refresh time in seconds. There is no way to pass a variable here, so it is impossible to use the tag for this example. The applications for Java as a replacement for Client Pull are endless. The only advantages that Client Pull has are that it is more accurate in some cases and that it is simpler (where the task is also simple and trivial). For example, when refreshing a Web page based on a given value in seconds, the response header starts the count once the actual document is loaded, whereas the Java applet only starts counting once the applet loads. How long the applet takes to load may vary with the computer's CPU.
The <META> tag was developed with the purpose of customization. Because the tag is optional anyway, it would not affect the browser if you modified the tag a little. This is useful for those in the business of developing proprietary browsers. Let's say you developed a browser that followed the normal HTML 3.0 standards, but in addition, you added some of your own fancy features to it. One of the features you added was that the browser would play a special .Wav file whenever it was about to load an HTML page. You created this .Wav file with a special HTML editor that you developed. This would be a perfect job for the <META> tag. You want the following line to be appended to the information that comes in the HTTP response header:
Editor type : mySpecialEditor
To do this, you want your special HTML editor to insert the following line into the Web page that the user is creating after it is saved:
<META HTTP-EQUIV="Editor type" Content="mySpecialEditor">
The browser gets this along with all the other information in the header. Once this information is known, the browser now issues a command to play a .Wav file, such as a "Thank you," from the author of the HTML editor. This will occur for all files that have this <META> information included, and for pages that don't, there will be no sound.
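To picture how a browser might pick these values out of a page, here is a rough Perl sketch that collects HTTP-EQUIV and CONTENT pairs into a hash. The routine name meta_headers is hypothetical, and the pattern assumes double-quoted attributes in exactly this order; a real browser parses HTML far more carefully.

```perl
use strict;
use warnings;

# Hypothetical helper: collect HTTP-EQUIV/CONTENT pairs from <META> tags.
# A simple regex sketch; it assumes double-quoted attributes in this order.
sub meta_headers {
    my ($html) = @_;
    my %headers;
    while ($html =~ /<META\s+HTTP-EQUIV="([^"]*)"\s+CONTENT="([^"]*)"\s*>/gi) {
        $headers{$1} = $2;
    }
    return %headers;
}

my $page = <<'END_HTML';
<HTML>
<HEAD>
<META HTTP-EQUIV="Editor type" Content="mySpecialEditor">
<META HTTP-EQUIV="Refresh" CONTENT="20">
<TITLE>Test page</TITLE>
</HEAD>
</HTML>
END_HTML

# Print each pair the way it would look as a header line.
my %headers = meta_headers($page);
foreach my $name (sort keys %headers) {
    print "$name: $headers{$name}\n";
}
```

A proprietary browser of the kind described above would then check this hash for its own entry and ignore the pairs it does not recognize.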
Server Push is similar to Client Pull in that it also includes extra information within the HTML document. Server Push does not rely on the browser to act on the information included in the document but, instead, relies on the server to push the additional information as scheduled. To understand how Server Push works, we must take a more in-depth look into the MIME standard and specifically the MIME content type multipart/mixed.
You have by now already read about MIME and realize how important it is to the Internet. In fact, without MIME, there would be no Internet. HTTP, as well as SMTP and POP, depends heavily on MIME to describe documents sent by the server to a recipient and from the recipient requesting information from the server. This description is in the form of a message. The HTTP response headers we discussed earlier are messages but can also be called MIME headers. All headers used by HTTP are based on MIME. You can find a more verbose and accurate description of MIME at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1521.html
and
http://www.cis.ohio-state.edu/htbin/rfc/rfc822.html.
As mentioned earlier, most of the information included in the header can be ignored by the browser. One of the things that cannot be ignored is the description of the document that is to follow the header (if any). The way to describe the document according to MIME is to supply a content type descriptor within the header. The syntax for this is
Content-type: Content-type/content-subtype
An actual implementation of this can be seen when sending an HTML document to the browser. To do this, you must specify in your header the content type and subtype suited for HTML documents
Content-type: text/html
and that's it! You have just told the browser that you are about to send a text file that is in HTML format. Obviously, this is not limited to HTML documents. If the document is an image, you simply specify the content type as
Content-type: image/gif
to inform the browser that you are sending an image in GIF format. If you intend to describe any document to a Web browser, you need to supply a MIME type with this descriptor. You can even invent your own MIME content types. Because browsers support only a fixed number of MIME content types, the only trick is getting the browser to understand what the new content type is. Netscape provides developers with an API that allows the creation of plug-ins that can extend Netscape's list of supported MIME content types. If you are interested in creating your own content type, you probably should develop a plug-in that allows Netscape Navigator to load it. The most interesting thing about MIME, and specifically the content-type descriptor, is the ability to describe a document that contains other types of documents. How is this done?
Simply by declaring a content type for such a document. The content type multipart/mixed does the job for us. It tells the browser that the document being sent contains other documents of different types. You are also able to tell the browser that you would like these documents to replace one another. Now, when the browser receives a message with the MIME content type of multipart/mixed, it attempts to extract all the documents. Because each document is allowed to be of a different type, each document must have a MIME header that describes itself to the browser. When a browser receives a document it checks for others and attempts to load them.
When a new document is loaded, the browser uses it to replace the current document. This replacing behavior has its own variant of the multipart/mixed content type, called multipart/x-mixed-replace. The x in this name, as in other MIME types, denotes that it is not an official MIME type and may be limited to certain browsers. You use multipart/x-mixed-replace in these Server Push examples and restrict yourself to using the Netscape Navigator 1.x and Internet Explorer 3.x browsers. In review, Server Push is a method by which a server describes to the browser a document that contains multiple documents, each of which may or may not be of a different type. This is done by declaring the document as a multipart type in the MIME header.
Because Server Push relies on information being in the MIME header, you must write a script that adds information to the header. You look at one Server Push script and improve it as you go along. You also look at some common uses of Server Push scripts, such as animation. A simple Server Push script is shown in Listing 38.4.
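The layout of such a message can be sketched in Perl before any timing or buffering concerns come into play. The routine name multipart_replace is hypothetical, and the boundary string is an arbitrary choice that simply must not occur inside any of the parts.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a list of parts in a multipart/x-mixed-replace
# message. Each part is [ content type, data ]; $boundary is any string
# that does not occur inside the parts.
sub multipart_replace {
    my ($boundary, @parts) = @_;
    my $out = "Content-type: multipart/x-mixed-replace;boundary=$boundary\n\n";
    foreach my $part (@parts) {
        my ($type, $data) = @$part;
        $out .= "--$boundary\n";             # boundary opens each part
        $out .= "Content-type: $type\n\n";   # part header, then blank line
        $out .= "$data\n";
    }
    $out .= "--$boundary--\n";               # trailing "--" closes the message
    return $out;
}

print multipart_replace(
    "DocumentBoundary",
    [ "text/html",  "<B>First page</B>" ],
    [ "text/plain", "Second page, which replaces the first" ],
);
```

Each part carries its own Content-type line, which is what lets one message mix HTML, plain text, and images.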
Listing 38.4 Servrpush.pl-A Simple Server Push Script
#!/usr/bin/perl
#this is a Server Push script
#Ask to include flush.pl with this script
require "flush.pl";
#next you should tell the server that this is a Server Push script by
#declaring the MIME type as multipart/x-mixed-replace, and setting a
#boundary for each document.
print "Content-type: multipart/x-mixed-replace;boundary=DocumentBoundary\n\n";
#This is the first document to be sent
print "--DocumentBoundary\n";
#Declare the document's MIME header. Because this is a MIME header, you
#need to end with two newline characters.
print "Content-type: text/html\n\n";
#You can write an HTML file here
print "<CENTER><B>Hello There</B></CENTER><BR>\n";
print "This page was created using Server-push, it will be replaced by another in approx. 5 seconds\n\n";
#The double newlines above were used to tell the browser that it had reached
#the end of the document.
#Now it's time to send the next document
print"--DocumentBoundary\n";
#Before we actually send our next document, let's flush the previous document
#which is still residing in the IO Buffer. (I'll explain this later.)
&flush(STDOUT);
#wait for 5 secs.
sleep(5);
#This time we will send a Gif file that Netscape will display.
print "Content-type: image/gif\n\n";
open(IMAGE,"</mydirectory/images/image.gif");
while(read(IMAGE, $buf, 1024)) {
print $buf;
}
close (IMAGE);
print "\n\n";
#You're done now so end with a boundary
print "--DocumentBoundary--\n";
If you are familiar with Perl, this script looks simple. If this book is your only reference for the language, however, the one part that may need explanation is the use of the Perl library file Flush.pl.
NOTE
Flush.pl should be available on all standard versions of Perl; WinPerl can be found on this book's accompanying CD-ROMs. For Windows 95 and Windows NT users: As long as the bin and lib directories of Perl are in the path, you should have no problem implementing the script in Listing 38.4.
The reason for using Flush.pl is to resolve the problem that you have due to I/O buffering.
When you write files to your hard disk or a network drive, the information is buffered until a reasonable chunk (the size may vary) is gathered. The stored information is then sent in one batch to the storage device. The same concept applies to file transfer over the Internet. The chunks are referred to as packets or datagrams, depending on the size. The buffer holds a file that is smaller than its capacity until another file or additional data arrives and pushes out the first file. As a result, the browser may receive all the documents one right after the other, despite the pause in your script, and it seems to display only the last document. In the case of your documents, you do not want the files to be buffered.
To work around this problem, you need a method to flush the output buffer so that your document is pushed out of the buffer without having to wait for more data to fill the buffer. The flush function contained in the library file Flush.pl does the job. You could write your own program to flush the output buffer, but why do this when there is already one available that flushes not only the output but input as well? The syntax
require "Flush.pl";
is used to tell the Perl compiler to include the file Flush.pl with your script, and the syntax
&flush(STDOUT);
calls the function flush from Flush.pl and passes the STDOUT as a parameter for the buffer to be flushed.
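As an alternative to calling flush after each document, Perl can be told to flush a filehandle after every print. The sketch below uses the autoflush method from the core IO::Handle module; the routine name unbuffer is a hypothetical wrapper, and this is a swapped-in technique rather than the Flush.pl approach the listing uses.

```perl
use strict;
use warnings;
use IO::Handle;   # core module; provides the autoflush method on filehandles

# Hypothetical helper: turn on autoflush for a filehandle so every print
# is written out immediately instead of waiting in the buffer.
sub unbuffer {
    my ($fh) = @_;
    $fh->autoflush(1);
    return $fh;
}

unbuffer(\*STDOUT);
print "first document\n";    # written out right away
sleep(1);                    # stands in for the delay between documents
print "second document\n";
```

With autoflush on, the special variable $| is set to 1 for that handle, so no explicit flush call is needed between documents.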
The only problem with your script is that you do not have full control. This is because you are not actually writing your own MIME header; you are merely inserting your own information that the server will later parse and include into its own header. The document size is another piece of header information that browsers do not ignore. The document size is specified by the Content-length in the MIME header. The syntax for this is
Content-length: length in bytes
This declarator is included by the server after calculating the length of the document you created. There is no standard size with multipart/mixed documents because the document consists of other documents with potentially varying sizes. This means that, to write an effective Server Push script, you cannot have the server including its own information in the MIME header. If you did, you would have the same buffering problem, with or without the flush function in your script. The browser would be expecting a document of a specific size to display and would wait until that size had been satisfied. To avoid the server interfering with your header, you must modify your Server Push script by making it an NPH (non-parsed header) script. NPH scripts are simply scripts that write their own complete header and bypass the server's processing. For your script to be recognized as an NPH script, its file name should begin with NPH-. This may differ for some Web servers, but it is pretty much a standard. Your previous script, now rewritten as an NPH script, is shown in Listing 38.5.
Listing 38.5 NPH-ServrPsh.pl-Listing 38.4's Script Rewritten as an NPH Script
#!/usr/bin/perl
#this is a Server Push script
#Ask to include flush.pl with this script
require "flush.pl";
#next you should tell the server that this is a NPH script by declaring
#the MIME type as multipart/x-mixed-replace, and setting a boundary for
#each document. You also need to add the version of the HTTP protocol
#supported because you are writing your own header.
print "HTTP/1.0 200\n";
print "Content-type: multipart/x-mixed-replace;boundary=DocumentBoundary\n\n";
#This is the first document to be sent
print "--DocumentBoundary\n";
#Declare the document's MIME header. Because this is a MIME header, you need
#to end with two newline characters.
print "Content-type: text/html\n\n";
#You can write an HTML file here
print "<CENTER><B>Hello There</B></CENTER><BR>\n";
print "This page was created using Server Push, it will be replaced by another in approx. 5 seconds\n\n";
print"--DocumentBoundary\n";
&flush(STDOUT);
#wait for 5 secs.
sleep(5);
#get an image to display.
print "Content-type: image/gif\n\n";
open(IMAGE,"</mydirectory/images/image.gif");
while(read(IMAGE, $buf, 1024)) {
print $buf;
}
close (IMAGE);
print "\n\n";
#You're done now so end with a boundary
print "--DocumentBoundary--\n";
The preceding script is very similar to a regular Server Push script except for the following line:
print "HTTP/1.0 200\n";
This line tells the browser that the server is using the HTTP protocol version 1.0. The status code 200 tells the browser that the request succeeded, so it can begin reading the rest of the header and the document that follows. Implementing this NPH script eliminates the annoying buffering problems and allows room for many uses of Server Push, including animation.
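An animation is the same idea repeated in a loop: send a frame, send a boundary so the browser knows the frame is complete, pause, and send the next frame. The sketch below is a rough illustration under stated assumptions: push_frames is a hypothetical routine, the frames are plain text so the example runs without image files, and autoflush from IO::Handle stands in for Flush.pl.

```perl
use strict;
use warnings;
use IO::Handle;   # core module; provides autoflush on filehandles

# Hypothetical helper: push a series of frames to $out, pausing $delay
# seconds between them. $out is normally STDOUT; frames here are text so
# the sketch runs without image files.
sub push_frames {
    my ($out, $delay, @frames) = @_;
    my $boundary = "DocumentBoundary";
    $out->autoflush(1);                        # no buffering between frames
    print $out "HTTP/1.0 200\n";               # NPH scripts write the status line
    print $out "Content-type: multipart/x-mixed-replace;boundary=$boundary\n\n";
    print $out "--$boundary\n";
    foreach my $frame (@frames) {
        print $out "Content-type: text/plain\n\n";
        print $out "$frame\n";
        print $out "--$boundary\n";            # marks the frame as complete
        sleep($delay);
    }
}

push_frames(\*STDOUT, 1, "Frame 1", "Frame 2", "Frame 3");
```

For a real animation, each frame would be the contents of an image file sent with a Content-type of image/gif, read in the same way as the image in Listing 38.5.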