by Greg Knauss
CGI scripts provide you with the powerful ability to extend the functionality of your Web server. However, written carelessly, they can also provide security holes through which hackers and thieves can crawl.
The vindictive hacker is a familiar figure in computer lore, especially on the Internet, and although most Web servers are programmed to protect against his bag of tricks, a single security mistake in a CGI script can give him complete access to your machine: your password file, your private data, anything.
But by following a few simple rules and by being constantly alert, even paranoid, you can make your CGI scripts proof against attack, giving you all their advantages and still allowing yourself a good night's sleep.
Shell scripts, Perl programs, and C executables are the most common forms that a CGI script takes, and each has advantages and disadvantages when security is taken into account. None is best in every case, though; depending on other considerations (such as speed and reuse), each has its place.
Though shell CGI programs are often the easiest to write (or even just to throw together), it can be difficult to fully control them since they usually do most of their work by executing other, external programs. This can lead to several possible pitfalls because your CGI script instantly inherits any of the security problems that those programs have. For instance, the common UNIX utility awk has some fairly restrictive limits on the amount of data it can handle, and your CGI program will be burdened with all those limits as well.
Perl is a step up from shell scripts. It has many advantages for CGI programming and is fairly secure, just in itself. But Perl can offer CGI authors just enough flexibility and peace of mind that they might be lulled into a false sense of security. For example, Perl is interpreted and this makes it easier for bad user data to be included as part of the code.
Finally, there's C. Though C is very popular for many uses, it is because of this popularity that many of its security problems are well-known and can be exploited fairly easily. For instance, C is very bad at string handling: it does no automatic allocation or clean-up, leaving coders to handle everything on their own. A lot of C programmers, when dealing with strings, will simply set up a predefined space and hope that it will be big enough to handle whatever the user enters. Robert T. Morris, the author of the infamous Internet Worm, exploited such a weakness in attacking the C-based fingerd program, overflowing a buffer to alter the stack and gain unauthorized access. The same could happen to your CGI program.
Almost all CGI security holes come from interaction with the user. By accepting input from an outside source, a simple, predictable CGI program suddenly takes on any number of new dimensions, each of which might have the smallest crack through which a hacker can slip. It is interaction with the user, through forms or file paths, that gives CGI scripts their power but also makes them the most potentially dangerous part of running a Web server.
Writing secure CGI scripts is largely an exercise in creativity and paranoia. You must be creative to think of all the ways that a user, either innocently or otherwise, can send you data that has the potential to cause trouble. And you must be paranoid because, somehow, they will try every one of them.
When users log on to your Web site and begin to interact with it, they can cause you headaches in two ways. One is by not following the rules, by bending or breaking every limit or restriction you've tried to build into your pages; the other is by doing just what you've asked them to do.
Most CGI scripts act as the back end to HTML forms, processing the information entered by users to provide some sort of customized output. This being the case, most CGI scripts are written to expect data in a very specific format. They rely on input from the user matching the information that the form was designed to collect. This, however, isn't always the case. A user can get around these predefined formats in many ways, sending your script seemingly random data. Your CGI programs must be prepared for it.
Secondly, users can send a CGI script exactly the type of data it expects, with each field in the form filled in, in the format you expect. This type of submission could be from an innocent user interacting with your site as you intended, or it could be from a malevolent hacker using his knowledge of your operating system and Web server software to take advantage of common CGI programming errors. These attacks, in which everything seems fine, are the most dangerous and the hardest to detect. The security of your Web site depends on preventing them.
One of the most common security mistakes made in CGI programming is to trust the data that has been passed to your script from a form. Users are an unruly lot, and they're likely to find the handful of ways to send data that you never expect, data that you think is impossible. All your scripts must take this into account. For instance, each of the following situations, and many more like them, is possible:
These situations can arise in several ways, some innocent, some not. For instance, your script could receive data that it doesn't expect because somebody else wrote a form (that requests input completely different from yours) and accidentally points the FORM ACTION to your CGI script. Perhaps they used your form as a template and forgot to edit the ACTION URL before testing it. This would result in your script getting data that it has no idea what to do with, possibly causing unexpected (and dangerous) behavior.
Or the user might have accidentally (or intentionally) edited the URL to your CGI script. When a browser submits form data to a CGI program, it simply appends the data entered into the form onto the CGI's URL (for GET methods) and, as easily as the user can type a Web page address into his browser, he can freely modify the data being sent to your script.
Finally, an ambitious hacker might write a program that connects to your server over the Web and pretends to be a Web browser. This program, though, could do things that no true Web browser would do, such as send a hundred megabytes of data to your CGI script. What would a CGI script do if it didn't limit the amount of data it read from a POST method because it assumed that the data came from a small form? It would probably crash and maybe crash in a way that allows access to the person who crashed it.
You can fight the unexpected input that can be submitted to your CGI scripts in several ways. You should use any or all of them when writing CGI.
First, your CGI script should set reasonable limits on how much
data it will accept, both for the entire submission and for each
NAME/VALUE pair in the submission. If your CGI
script reads the POST method, for instance, check the
size of the CONTENT_LENGTH environment variable to make
sure that it's something that you can reasonably expect. While
most Web servers set an arbitrary limit on the amount of data
that will be passed to your script via POST, you may
want to limit this size further. For instance, if the only input
your CGI script is designed to accept is a person's first name,
it might be a good idea to return an error if CONTENT_LENGTH
is more than 100 bytes. No reasonable first name will be that
long, and by imposing the limit, you've protected your script
from blindly reading anything that gets sent to it.
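The per-pair limit mentioned above can be checked before any decoding is done. The following is a minimal sketch, not a definitive implementation: pairs_Within_Limits and its parameters are hypothetical names, and the lengths are measured on the raw, still URL-encoded data, so an encoded character such as %20 counts as three bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the raw form data has no more than
# $max_Pairs NAME/VALUE pairs and every NAME and VALUE is within its
# limit; returns 0 otherwise. Lengths are measured before URL decoding.
sub pairs_Within_Limits
{
    my ($raw_Data, $max_Name, $max_Value, $max_Pairs) = @_;
    my @pairs = split(/[&;]/, $raw_Data);
    # Too many fields is as suspicious as fields that are too long
    return 0 if (@pairs > $max_Pairs);
    foreach my $pair (@pairs)
    {
        next if ($pair eq "");
        my ($name, $value) = split(/=/, $pair, 2);
        $value = "" unless defined($value);
        return 0 if (length($name) > $max_Name);
        return 0 if (length($value) > $max_Value);
    }
    return 1;
}
```

A script expecting only a first and last name might call pairs_Within_Limits($raw_Data, 16, 100, 10) and return an error page when the result is 0.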
| NOTE |
In most cases, you don't have to worry about limiting the data submitted through the GET method. GET is usually self-limiting and won't deliver more than approximately 1K of data to your script. The server automatically limits the size of the data placed into the QUERY_STRING environment variable, which is how GET sends information to a CGI program. Of course, hackers can easily circumvent this built-in limit simply by changing the METHOD of your FORM from GET to POST. At the very least, your program should check that data is submitted using the method you expect; at most, it should handle both methods correctly and safely. |
Next, make sure that your script knows what to do if it receives data that it doesn't recognize. If, for example, a form asks that a user select one of two radio buttons, the script shouldn't assume that just because one isn't clicked, the other is. The following Perl code makes this mistake.
if ($form_Data{"radio_choice"} eq "button_one")
{
# Button One has been clicked
}
else
{
# Button Two has been clicked
}
Your CGI script should anticipate unexpected or "impossible" situations and handle them accordingly. The previous example is pretty innocuous, but the same assumption elsewhere could easily be dangerous. An error should be printed instead, for example:
if ($form_Data{"radio_choice"} eq "button_one")
{
# Button One selected
}
elsif ($form_Data{"radio_choice"} eq "button_two")
{
# Button Two selected
}
else
{
# Error
}
Of course, an error may not be what you want your script to generate
in these circumstances. Overly picky scripts that validate every
field and produce error messages on even the slightest unexpected
data can turn users off. Having your CGI script recognize unexpected
data, throw it away, and automatically select a default is a possibility,
too.
| NOTE |
The balance between safety and convenience for the user is a careful one. Don't be afraid to consult with your users to find out what works best for them. |
For example, the following is C code that checks text input against several possible choices and sets a default if it doesn't find a match. This can be used to generate output that might better explain to the user what you are expecting.
if ((strcmp(help_Topic,"how_to_order.txt")) &&
(strcmp(help_Topic,"delivery_options.txt")) &&
(strcmp(help_Topic,"complaints.txt")))
{
strcpy(help_Topic,"help_on_help.txt");
}
On the other hand, your script might try to do users a favor and correct any mistakes rather than simply send an error or select a default. If a form asks users to enter the secret word, your script could automatically strip off any white space characters from the input before doing the comparison, like the following Perl fragment.
# Remove white space by replacing it with an empty string
$user_Input =~ s/\s//g;
if ($user_Input eq $secret_Word)
{
# Match!
}
| TIP |
Although it's nice to try to catch the user's mistakes, don't try to do too much. If your corrections aren't really what users wanted, they'll just be annoyed. |
| CAUTION |
You should also be aware that trying to catch every possible user-entry error will make your code huge and nearly impossible to maintain. Don't over-engineer. |
Finally, you might choose to go the extra mile and have your CGI script handle as many different forms of input as it can. Although you can't possibly anticipate everything that can be sent to a CGI program, there are often several common ways to do a particular thing, and you can check for each.
For example, just because the form you wrote uses the POST method to submit data to your CGI script, that doesn't mean that the data will come in that way. Rather than assume that the data will be on standard input (STDIN) where you're expecting it, you could check the REQUEST_METHOD environment variable to determine whether the GET or POST method was used and read the data accordingly. A truly well-written CGI script will accept data no matter what method was used to submit it and will be made more secure in the process. Listing 35.1 shows an example in Perl.
Listing 35.1 Cgi_read.pl: Robust Reading of Form Input
# Takes the maximum length allowed as a parameter
# Returns 1 and the raw form data, or "0" and the error text
sub cgi_Read
{
local($input_Max) = $_[0] || 1024;
local($input_Method) = $ENV{'REQUEST_METHOD'};
# Check each possible REQUEST_METHOD
if ($input_Method eq "GET")
{
# "GET"
local($input_Size) = length($ENV{'QUERY_STRING'});
# Check the size of the input
return (0, "Input too big") if ($input_Size > $input_Max);
# Read the input from QUERY_STRING
return (1,$ENV{'QUERY_STRING'});
}
elsif ($input_Method eq "POST")
{
# "POST"
local($input_Size) = $ENV{'CONTENT_LENGTH'};
local($input_Data);
# Check the size of the input
return (0,"Input too big") if ($input_Size > $input_Max);
# Read the input from stdin
return (0,"Could not read STDIN") unless (read(STDIN,$input_Data,$input_Size));
return (1,$input_Data);
}
# Unrecognized METHOD
return (0,"METHOD not GET or POST");
}
| TIP |
Many existing CGI programming libraries already offer good built-in security features. Rather than write your own routines, you may want to rely on some of the well-known, publicly available functions. |
Another type of data the user can alter is the PATH_INFO server environment variable. This variable is filled with any path information that follows the script's file name in a CGI URL. For instance, if sample.sh is a CGI shell script, the URL http://www.yourserver.com/cgi-bin/sample.sh/extra/path/info will cause /extra/path/info to be placed in the PATH_INFO environment variable when sample.sh is run.
If you use this PATH_INFO environment variable, you must be careful to completely validate its contents. Just as form data can be altered in any number of ways, so can PATH_INFO, accidentally or on purpose. A CGI script that blindly acts on the file path specified in PATH_INFO can allow malicious users to wreak havoc on the server.
For instance, if a CGI script is designed to simply print out the file that's referenced in PATH_INFO, a user who edits the CGI URL will be able to read almost any file on your computer, as in the following script:
#!/bin/sh
# Send the header
echo "Content-type: text/html"
echo ""
# Wrap the file in some HTML
echo "<HTML><HEAD><TITLE>File</TITLE></HEAD><BODY>"
echo "Here is the file you requested:<PRE>\n"
cat $PATH_INFO
echo "</PRE></BODY></HTML>"
Although this script works fine if the user is content clicking only predefined links (say, http://www.yourserver.com/cgi-bin/showfile.sh/public/faq.txt), a more creative (or spiteful) user could use it to receive any file on your server. If he were to jump to http://www.yourserver.com/cgi-bin/showfile.sh/etc/passwd, the preceding script would happily return your machine's password file, which is something you do not want to happen.
A much safer course is to use the PATH_TRANSLATED environment variable. It automatically appends the contents of PATH_INFO to the root of your server's document tree, meaning that any file specified by PATH_TRANSLATED is probably already accessible to browsers and, therefore, safe.
In one case, however, files that may not be accessible through a browser can be accessed if PATH_TRANSLATED is used within a CGI script. The .htaccess file, which can exist in each subdirectory of a document tree, controls who has access to the particular files in that directory. It can be used to limit the visibility of a group of Web pages to company employees, for example.
Whereas the server knows how to interpret .htaccess, and thus knows how to limit who can and who can't see these pages, CGI scripts don't. A program that uses PATH_TRANSLATED to access arbitrary files in the document tree may accidentally override the protection provided by the server.
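One way to sidestep both problems is to never treat the path as a file name at all and instead use it as a key into a fixed list of documents. The following is a minimal sketch under that assumption; the table contents, the directory /usr/local/www/docs, and the name lookup_Path_Info are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical table of the only documents this script will serve,
# keyed by the PATH_INFO value that requests each one.
my %allowed_Files = (
    "/public/faq.txt"   => "/usr/local/www/docs/public/faq.txt",
    "/public/order.txt" => "/usr/local/www/docs/public/order.txt",
);

# Returns the real file name for a PATH_INFO value, or undef if the
# request is not on the list.
sub lookup_Path_Info
{
    my ($path_Info) = @_;
    return undef unless defined($path_Info);
    return $allowed_Files{$path_Info};
}
```

A script would call lookup_Path_Info($ENV{'PATH_INFO'}) and print an error page when the result is undefined. Because nothing the user sends is ever used as part of a path, requests for /etc/passwd or for a protected directory simply fail the lookup.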
Now that you've seen several ways users can provide your CGI script with data that it didn't expect and what you can do about it, the larger issue remains of how to validate legitimate data that the user has submitted.
In most cases, correctly but cleverly written form submissions can cause you more problems than out-of-bounds data. It's easy to ignore nonsense input, but determining whether legitimate, correctly formatted input will cause you problems is a much bigger challenge.
File names, for example, are simple pieces of data that may be submitted to your CGI script and cause endless amounts of trouble if you're not careful (see Figure 35.1).
Anytime you try to open a file based on a name supplied by the user, you must rigorously screen that name for any number of tricks that can be played. If you ask the user for a file name and then try to open whatever was entered, you could be in big trouble.
For instance, what if the user enters a name that has path elements in it, such as directory slashes and double dots? Although you expect a simple file name (say, file.txt), you could end up with /file.txt or ../../../file.txt. Depending on how your Web server is installed and what you do with the submitted file name, you could be exposing any file on your system to a clever hacker.
Further, what if the user enters the name of an existing file or one that's important to the running of the system? What if the name entered is /etc/passwd or C:\WINNT\SYSTEM32\KERNEL32.DLL? Depending on what your CGI script does with these files, they may be sent out to the user or overwritten with garbage.
Under Windows 95 and Windows NT, if you don't screen for the backslash character, you might allow Web browsers to gain access to files that aren't even on your Web server through Universal Naming Convention file names. If the script that's about to run in Figure 35.2 doesn't carefully screen the file name before opening it, it might give the Web browser access to any machine in the domain or workgroup.
What might happen if the user puts an illegal character in a file name? Under UNIX, any file name beginning with a period (.) is invisible. Under Windows, both slashes are directory separators. If a file name begins with the pipe character (|), a carelessly written Perl program can end up executing an external program when you thought you were only opening a file. Even control characters (the Escape key or the Return key, for instance) can be sent to you as part of file names if the user knows how.
Worse yet, in a shell script, the semicolon ends one command and starts another. If your script is designed to cat the file the user enters, a user might enter file.txt;rm -rf / as a file name, causing file.txt to be returned and, subsequently, the entire hard disk to be erased, without confirmation.
To avoid all the dangers associated with bad input and close all the potential security holes they open, you should screen every file name the user enters. You must make sure that the input is what you expect.
The best way to do this is to compare each character of the entered file name against a list of acceptable characters and return an error if they don't match. This turns out to be much safer than trying to maintain a list of all the illegal characters and compare against that, because it's too easy to accidentally let something slip through.
Listing 35.2 is an example of how to do this comparison in Perl. It allows any letter of the alphabet (upper- or lowercase), any number, the underscore, and the period. It also checks to make sure that the file name doesn't start with a period. Thus, this fragment doesn't allow slashes to change directories, semicolons to put multiple commands on one line, or pipes to play havoc with Perl's open() call.
Listing 35.2 Making Sure that All Characters Are Legal
if (($file_Name =~ /[^a-zA-Z0-9_\.]/) || ($file_Name =~ /^\./))
{
# File name contains an illegal character or starts with a period
}
| NOTE |
When you have a commonly used test, such as the code in Listing 35.2, it's a good idea to make it into a subroutine, so you can call it repeatedly. This way, you can change it in only one place in your program if you think of an improvement. Continuing that thought, if the subroutine is used commonly among several programs, it's a good idea to put it into a library so that any improvements can be instantly inherited by all your scripts. |
| CAUTION |
Although the code in Listing 35.2 filters out most bad file names, your operating system may have restrictions it doesn't cover. Can a file name start with a digit, for instance? Or with an underscore? What if the file name has more than one period or if the period is followed by more than three characters? Is the entire file name short enough to fit within the restrictions of the file system? You must constantly be asking yourself these sorts of questions. The most dangerous thing you can do when writing CGI scripts is rely on the users to follow instructions. They won't. It's your job to make sure they don't get away with it. |
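As a sketch of the subroutine idea, the test can be packaged as follows. The name is_Safe_File_Name is hypothetical, and the 64-character limit is an assumption that should be replaced with whatever your file system actually allows.

```perl
use strict;
use warnings;

# Returns 1 if the name uses only letters, digits, underscores, and
# periods, does not start with a period, and is not too long.
# The 64-character limit is an assumed value, not a universal one.
sub is_Safe_File_Name
{
    my ($file_Name) = @_;
    return 0 unless (defined($file_Name) && length($file_Name) > 0);
    return 0 if (length($file_Name) > 64);
    # Any character outside the allowed list makes the name illegal
    return 0 if ($file_Name =~ /[^a-zA-Z0-9_.]/);
    # Names that start with a period are hidden files under UNIX
    return 0 if ($file_Name =~ /^\./);
    return 1;
}
```

A script can then write open(...) only inside if (is_Safe_File_Name($file_Name)) and send an error page otherwise.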
Another type of seemingly innocuous input that can cause you endless trouble is receiving HTML when you request text from the user. Listing 35.3 is a Perl fragment that simply customizes a greeting to whoever has entered a name in the $user_Name variable, for example, John Smith (see Figure 35.3).
Figure 35.3 : When the user enters what you requested, everything works well.
Listing 35.3 A Script that Sends a Customized Greeting
print("<HTML><TITLE>Greetings!</TITLE><BODY>\n");
print("Hello, $user_Name! It's good to see you!\n");
print("</BODY></HTML>\n");
But imagine if, rather than enter just a name, the user types <HR><H1><P ALIGN="CENTER">John Smith</P></H1><HR>. The result would be Figure 35.4, which is probably not what you wanted.
Figure 35.4 : Entering HTML when a script expects plain text can change a page in unexpected ways.
Or imagine if a hacker entered <IMG SRC="/secret/project/cutekid.gif"> when you requested the user's name. Again, if the code in Listing 35.3 were part of a CGI script with this HTML in the $user_Name variable, your Web server would happily show the hacker your secret adorable toddler picture! Figure 35.5 is an example.
Or what if The last signee!<FORM><SELECT> was entered as the user's name in a guest book? The <SELECT> tag would cause the Web browser to ignore everything between it and a nonexistent </SELECT>, including any names that were added to the list later. Even though 10 people signed the guest book shown in Figure 35.6, only the first three appear because the third name contains a <FORM> and a <SELECT> tag.
Figure 35.6 : Because the third signee used HTML tags in his name, nobody after him will show up.
But even more dangerous than entering simple HTML, a malicious hacker might enter a server-side include directive instead. If your Web server is configured to obey server-side includes, a user might type <!--#include file="/secret/project/plan.txt" --> instead of his name to see the complete text of your secret plans. Or he could enter <!--#include file="/etc/passwd" --> to get your machine's password file. And, probably worst of all, a hacker might input <!--#exec cmd="rm -rf /" -->, and the innocent code in Listing 35.3 would proceed to delete almost everything on your hard disk.
| CAUTION |
Server-side includes are very often disabled because of how they can be misused. Although much more information is available in Chapter 33, "Server-Side Includes," you might want to consider disabling them to truly secure your site against this type of attack. |
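There are two solutions to the problem of the user entering HTML rather than flat text. The quick solution is to remove the less-than (<) and greater-than (>) characters from the input altogether:
$user_Input =~ s/[<>]//g;
The more thorough solution is to translate the two characters into their HTML escape codes so that the browser displays them as text:
$user_Input =~ s/</&lt;/g;
$user_Input =~ s/>/&gt;/g;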
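Translating the characters is easiest to apply consistently when it lives in one subroutine. The following is a minimal sketch; escape_Html is a hypothetical name, and it also translates the ampersand and the double quote, an extension beyond the two characters discussed here.

```perl
use strict;
use warnings;

# Returns a copy of the text that a browser will display literally
# instead of interpreting as HTML.
sub escape_Html
{
    my ($text) = @_;
    # The ampersand must be translated first, or the escape codes
    # produced by the later lines would themselves be rewritten.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}
```

With this in place, the greeting in Listing 35.3 would print escape_Html($user_Name) rather than $user_Name. Because a server-side include directive begins with a less-than character, the same translation should also keep such directives from reaching the server intact.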
Another area where you must be careful is how your CGI script interfaces user input with any external processes. Because executing a program outside of your CGI script means that you have no control over what it does, you must do everything you can to validate the input you send to it before the execution begins.
For instance, shell scripts often make the mistake of concatenating a command-line program with form input and then executing them together. This works fine if the user has entered what you expected, but additional commands may be slipped in and unintentionally executed.
The following fragment of shell script commits this error:
FINGER_OUTPUT=`finger $USER_INPUT`
echo $FINGER_OUTPUT
If the user politely enters the e-mail address of a person to finger, everything works as it should. But if he enters an e-mail address followed by a semicolon and another command, that command will be executed as well. If the user enters webmaster@www.yourserver.com;rm -rf /, you're in considerable trouble.
| CAUTION |
You also must be careful to screen all the input you receive, not just form data, before using it in the shell. Web server environment variables can be set to anything by a hacker who has written his own Web client and can cause just as much damage as bad form data. If you execute the following line of shell script, thinking that it will simply add the referer to your log, you might be in trouble if HTTP_REFERER has been set to ;rm -rf /;echo "Ha ha".
echo $HTTP_REFERER >> ./referer.log |
Even if a hidden command isn't snuck into user data, innocent input may give you something you don't expect. The following line, for instance, will give an unexpected result-a listing of all the files in the directory-if the user input is an asterisk.
echo "Your input: " $USER_INPUT
When sending user data through the shell, as both of these code snippets do, it's a good idea to screen it for shell meta-characters. Such characters include the semicolon (which allows multiple commands on one line), the asterisk and the question mark (which perform file globbing), the exclamation point (which, under csh, references the command history), the back quote (which executes an enclosed command), and so on. Like filtering file names, maintaining a list of allowable characters is often easier than trying to catch each character that should be disallowed. The following Perl fragment crudely validates an e-mail address:
if ($email_Address =~ /[^a-zA-Z0-9_\-\+\@\.]/)
{
# Illegal character!
}
else
{
system("finger $email_Address");
}
If you decide that you must allow shell meta-characters in your input, there are ways to make their inclusion safer and ways that don't actually accomplish anything. Although you may be tempted to simply put quotation marks around unvalidated user input to prevent the shell from acting on special characters, this almost never works. Look at the following:
echo "Finger information:<HR><PRE>"
finger "$USER_INPUT"
echo "</PRE>"
Although the quotation marks around $USER_INPUT will prevent the shell from interpreting an included semicolon that would allow a hacker to simply piggyback a command, this script still has several severe security holes. For instance, the input might be `rm -rf /`, with the back quotes causing the hacker's command to be executed before finger is even considered.
A better way to handle special characters is to escape them so that the shell simply takes their values without interpreting them. By escaping the user input, all shell meta-characters are ignored and treated instead as just more data to be passed to the program.
The following line of Perl code does this for all non-alphanumeric characters.
$user_Input =~ s/([^\w])/\\$1/g;
Now, if this user input were appended to a command, each character (even the special characters) would be passed through the shell to finger.
But all told, validating user input (not trusting anything sent to you) will make your code easier to read and safer to execute. Rather than trying to defeat a hacker after you're already running commands, give data the once-over at the door.
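In Perl, another option is to keep the shell out of the picture entirely by passing the program name and its arguments as a list. The following is a minimal sketch, assuming Perl 5.8 or later for the list form of a piped open; run_Without_Shell is a hypothetical name.

```perl
use strict;
use warnings;

# Runs a program with its arguments passed as a list, so that no
# shell ever parses them. Returns the program's output, or undef
# if the pipe cannot be opened.
sub run_Without_Shell
{
    my (@command) = @_;
    my $output = "";
    # The list form of a piped open calls the program directly
    # instead of handing a command string to /bin/sh.
    open(my $pipe, "-|", @command) or return undef;
    while (my $line = <$pipe>)
    {
        $output .= $line;
    }
    close($pipe);
    return $output;
}
```

For example, run_Without_Shell("finger", $email_Address) hands the address to finger as a single argument, semicolons, asterisks, and back quotes included. This does not replace validation: the program being run still sees whatever the user typed, so input that begins with a dash, for instance, may still be read as a switch.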
| Handling Internal Functions |
With interpreted languages, such as the shell and Perl, the user can enter data that will actually change your program-data that causes errors that aren't present if the data is correct. If user data is being interpreted as part of the program's execution, anything he enters must adhere to the rules of the language or cause an error. For instance, the following Perl fragment may work fine or may generate an error depending on what the user enters.
In Perl, the eval() operator exists to prevent this. eval() allows for runtime syntax checking and determines whether an expression is valid Perl or not. The following code is an improved version of the preceding code:
Unfortunately, most shells (including the most popular, /bin/sh) have no easy way to detect errors such as this one, which is another reason to avoid them. |
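As a minimal sketch of this idea, assume the user's data is a search pattern that the script interpolates into a regular expression. An invalid pattern would normally stop the script with a fatal error; wrapping the compilation in eval traps the error instead. The name safe_Match is hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if $text matches the user's pattern, 0 if it does not,
# and undef if the pattern is not a valid regular expression.
sub safe_Match
{
    my ($text, $user_Pattern) = @_;
    # eval traps the fatal error that an invalid pattern raises
    my $compiled = eval { qr/$user_Pattern/ };
    return undef unless defined($compiled);
    return ($text =~ $compiled) ? 1 : 0;
}
```

A script can then tell a pattern that simply did not match from one that was never valid in the first place, and print an explanatory message in the second case.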
When executing external programs, you must also be aware of how the user input you pass to those programs will affect them. You may guard your own CGI script against hacker tricks, but it's all for naught if you blithely pass anything a hacker may have entered to external programs without understanding how those programs use that data.
For instance, many CGI scripts will send e-mail to a particular person, containing data collected from the user by executing the mail program.
This can be very dangerous because mail has many internal commands, any of which could be invoked by user input. For instance, if you send text entered by the user to mail and that text has a line that starts with a tilde (~), mail will interpret the next character on the line as one of the many commands it can perform. ~r /etc/passwd, for example, will cause your machine's password file to be read by mail and sent off to whomever the letter is addressed, perhaps even the hacker himself.
In an example such as this one, rather than use mail to send e-mail from UNIX machines, you should use sendmail, the lower-level mail program that lacks many of mail's features. But, of course, you should also be aware of sendmail's commands so that those can't be exploited.
As a general rule, when executing external programs, you should
use the one that fits your needs as closely as possible, without
any frills. The less an external program can do, the less it can
be tricked into doing.
| CAUTION |
Here's another problem with mail and sendmail: You must be careful that the address you pass to the mail system is a legal e-mail address. Many mail systems will treat an e-mail address starting with a pipe as a command to be executed, opening a huge security hole for any hacker who enters such an address. Again, always validate your data! |
Another example that demonstrates you must know your external programs well to use them effectively is grep. Most people will tell you that you can't get into much trouble with grep. However, grep can be fooled fairly easily, and how it fails is illustrative. The following code, which is supposed to perform a case-sensitive search for a user-supplied term among many files, is an example.
print("The following lines contain your term:<HR><PRE>");
$search_Term =~ s/([^\w])/\\$1/g;
system("grep $search_Term /public/files/*.txt");
print("</PRE>");
This all seems fine, unless you consider what happens if the user enters -i. It's not searched for but functions as a switch to grep, as would any input starting with a dash. This will cause grep to either hang while waiting for the search term to be typed into standard input or to error out when anything after the -i is interpreted as extra switch characters. This, undoubtedly, isn't what you wanted or planned for. In this case, it's not dangerous, but in others, it might be.
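One way to close this particular hole is to tell grep explicitly which argument is the pattern. The following is a minimal sketch; build_Grep_Command is a hypothetical name. It uses the standard -e switch to mark the pattern, -F to treat the term as a fixed string (matching the intent of the escaping above), and -- to mark the end of the switches.

```perl
use strict;
use warnings;

# Builds an argument list for grep in which the user's term can never
# be mistaken for a switch. -F treats the term as a fixed string, -e
# marks the next argument as the pattern even if it starts with a
# dash, and -- ends switch processing before the file names.
sub build_Grep_Command
{
    my ($search_Term, @files) = @_;
    return ("grep", "-F", "-e", $search_Term, "--", @files);
}
```

The list can be handed straight to the list form of system(), as in system(build_Grep_Command($search_Term, glob("/public/files/*.txt"))), which also avoids the shell because Perl expands the file names itself.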
There's no such thing as a harmless command, and each must be carefully considered from every angle. You should be as familiar as possible with every external program your CGI script executes. The more you know about the programs, the more you can do to protect them from bad data, both by screening that data and by disabling options or disallowing features.
| Security Beyond Your Own |
sendmail has an almost legendary history of security problems. Almost from the beginning, hackers have found clever ways to exploit sendmail and gain unauthorized access to the computers that run it. But sendmail is hardly unique. Dozens, if not hundreds, of popular, common tools have security problems, with more being discovered each year. The point is that it's not only the security of your own CGI script that you must worry about, but the security of all the programs your CGI script uses. Knowing sendmail's full range of documented capabilities is important, but perhaps more important is knowing which capabilities are not documented because they probably aren't intended to exist. Keeping up with security issues in general is a necessary step to maintain the ongoing integrity of your Web site. One of the easiest ways to do this is on Usenet, in the newsgroups comp.security.announce (where important information about computer security is broadcast) and comp.security.unix (which has a continuing discussion of UNIX security issues). A comprehensive history of security problems, including attack-prevention software, is available through the Computer Emergency Response Team (CERT) at ftp.cert.org. |
A common mistake in CGI security is to forget local users. Although
people browsing your site over the Web don't have access to local
security considerations, such as file permissions and owners,
local users of your Web server do, and you must guard against
these threats even more than those from the Web.
| CAUTION |
Local system security is a big subject and almost any reference on it will give you good tips on protecting the integrity of your machine from local users. As a general rule, if your system as a whole is safe, your Web site is safe, too. |
Most Web servers are installed to run CGI scripts as a special user. The CGI process runs as this user, and the permissions granted to that user limit what the script will be able to do.
Under UNIX, the server itself usually runs as root to allow it to use socket port 80 to communicate with browsers. When the server executes a CGI program, however, it should do so as an innocuous user, such as the commonly used nobody, and the ability to configure this behavior is available on many servers. It is very dangerous to run CGI scripts as root! The less powerful the user, the less damage a runaway CGI script can do.
You should also be aware of whether the setuid bit is set on your UNIX CGI scripts. If it is, the script will execute with the permissions of the file's owner, no matter what user the server runs programs as. This, of course, has major security implications-you could lose control over which user your script runs as.
Fortunately, the setuid bit is easy to disable. Executing chmod a-s on all your CGI scripts will guarantee that it's turned off, and your programs will run with the permissions you intended.
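The check and the fix can be sketched together as follows. The function names and the temporary demo directory are hypothetical; in practice you would pass your server's actual CGI directory. The sketch also looks for the related setgid bit, which chmod a-s clears as well.

```sh
#!/bin/sh

# List regular files under a directory that have the setuid (4000)
# or setgid (2000) bit set.
list_setid_files() {
    find "$1" -type f \( -perm -4000 -o -perm -2000 \) -print
}

# Clear both bits on every regular file under the directory.
clear_setid_bits() {
    find "$1" -type f -exec chmod a-s {} +
}

# Demonstration in a throwaway directory standing in for cgi-bin
demo_dir=$(mktemp -d)
: > "$demo_dir/test.cgi"
chmod 4755 "$demo_dir/test.cgi"

before=$(list_setid_files "$demo_dir")   # finds test.cgi
clear_setid_bits "$demo_dir"
after=$(list_setid_files "$demo_dir")    # finds nothing

echo "before: $before"
echo "after:  $after"

rm -rf "$demo_dir"
```

Listing the files first, rather than clearing the bit blindly, lets you notice any script that was deliberately installed setuid.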
Of course, in some situations you may want the setuid bit set-if your script needs to run as a specific user to access a database, for example. If this is the case, you should make doubly sure that the other file permissions on the program limit access to it to those users you intend.
A similar situation can occur under Windows NT. Microsoft's Internet Information Server (IIS) normally runs CGI scripts with the access control list (ACL) of IUSR_computer. However, by editing a Registry entry, IIS can be set to run scripts as SYSTEM. SYSTEM has much wider permissions than IUSR_computer and can cause correspondingly more damage if things go wrong. You should make sure that your server is configured the way you intend.
Another potential problem with the single, common user that Web server scripts execute as is that a single human being is not necessarily always in control of the server. If many people share control of a server, each may install CGI scripts that run as, for example, the nobody user. This allows any of these people to use a CGI program to gain access to parts of the machine that they may be restricted from, but that nobody is allowed to enter.
Probably the most common solution to this potential security problem is to restrict CGI control to a single individual. Although this may seem reasonable in limited circumstances, it's often impossible for larger sites. Universities, for example, have hundreds of students, each of whom wants to experiment with writing and installing CGI scripts.
When multiple people have CGI access, a better solution to the problem of deciding which user a script runs as is to use CGIWrap. CGIWrap, which is included on the CD-ROMs that accompany this book, is a simple wrapper that executes a CGI script as the user who owns the file instead of the user whom the server specifies. This simple precaution leaves the script owner responsible for the damage it can do.
For instance, if the user joanne owns a CGI script that's wrapped in CGIWrap, the server will execute the script with joanne's permissions. In this way, CGIWrap acts like a setuid bit but has the added advantage of being controlled by the Web server rather than the operating system. This means that anybody who sneaks through any security holes in the script will be limited to whatever joanne herself can do-the files she can read and delete, the directories she can view, and so on.
Because CGIWrap puts CGI script authors in charge of the permissions for their own scripts, it can be a powerful tool not only to protect important files owned by others but also to motivate people to write secure scripts. The realization that only their files would be in danger can be a powerful persuader to script authors.
You should also be aware of which users own CGI scripts and what file permissions they have. The permissions on the directories that contain the scripts are also very important.
If, for example, the cgi-bin directory on your Web server is world-writable, any local user will be able to delete your CGI script and replace it with another. If the script itself is world-writable, anybody will be able to modify the script to do anything they please.
Look at the following innocuous UNIX CGI script:
#!/bin/sh

# Send the header
echo "Content-type: text/html"
echo ""

# Send some HTML
echo "<HTML><HEAD><TITLE>Fortune</TITLE></HEAD>"
echo "<BODY>Your fortune:<HR><PRE>"
fortune
echo "</PRE></BODY></HTML>"
Now, imagine if the permissions on the script allowed a local user to change the program to the following:
#!/bin/sh

# Send the header
echo "Content-type: text/html"
echo ""

# Do some damage!
rm -rf /
echo "<HTML><TITLE>Got you!</TITLE><BODY>"
echo "<H1>Ha ha!</H1></BODY></HTML>"
The next user to access the script over the Web would cause huge amounts of damage, even though that person had done nothing wrong. Checking the integrity of user input over the Web is important but even more so is making sure that the scripts themselves remain unaltered and unalterable!
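A quick audit for this kind of exposure might look like the sketch below. The function name list_world_writable and the demo file names are hypothetical; the directory you pass in would be your own cgi-bin. The find test -perm -0002 matches anything whose "other" write bit is set.

```sh
#!/bin/sh

# Print every file or directory under $1 that any local user can write to.
list_world_writable() {
    find "$1" \( -type f -o -type d \) -perm -0002 -print
}

# Demonstration in a throwaway directory standing in for cgi-bin
demo_dir=$(mktemp -d)
: > "$demo_dir/safe.cgi"
chmod 755 "$demo_dir/safe.cgi"   # writable by the owner only
: > "$demo_dir/open.cgi"
chmod 777 "$demo_dir/open.cgi"   # writable by everybody

loose=$(list_world_writable "$demo_dir")
echo "$loose"

rm -rf "$demo_dir"
```

Anything the function reports is a candidate for chmod o-w, including the directory itself if it appears in the list.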
Equally important is the integrity of the files that your scripts create on the local hard disk. After you feel comfortable that you've got a good file name from the Web user, how you actually go about using that name is also important. Depending on which operating system your Web server is running, permissions and ownership information can be stored on the file along with the data inside it. Users of your Web server may be able to cause havoc depending on how various permission flags are set.
For instance, you should be aware of the permissions you give
a file when you create it. Most Web server software sets the umask,
or permission restrictions, to 0000, meaning that it's possible
to create a file that anybody can read or write. Although the
permissions on a file probably don't make any difference to people
browsing on the Web, people with local access can take advantage
of loose restrictions. You should always specify the most conservative
permissions possible, while still allowing your program the access
it needs when creating files.
| TIP |
This is a good idea not only for CGI programs but for all the code you write. |
The simplest way to make sure that each file-open call has a set of minimum restrictions is to set your script's umask. umask() is a UNIX call that restricts permissions on every subsequent file creation. The parameter passed to umask() is a number that's "masked" against the permissions mode of any later file creation. A umask of 0022 will cause any file created to be writable only by the user, no matter what explicit permissions are given to the group and other users during the actual open.
But even with the umask set, you should create files with explicit permissions, just to make sure that they're as restrictive as possible. If the only program that will ever be accessing a file is your CGI script, only the user that your CGI program runs as should be given access to the file-permissions 0600. If another program needs to access the file, try to make the owner of that program a member of the same group as your CGI script so that only group permissions need to be set-permissions 0660. If you must give the world access to the file, make it so the file can only be read, not written to-permissions 0644.
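In a shell CGI script, the two steps might be combined as in this sketch. The file name guestbook.dat and the temporary directory are hypothetical stand-ins for wherever your script keeps its data, and the umask of 077 is an assumption that fits the strictest case described above (only the script's own user needs the file).

```sh
#!/bin/sh

# Restrict every later file creation: no access for group or others.
umask 077

# Hypothetical data file in a throwaway directory
data_dir=$(mktemp -d)
data_file="$data_dir/guestbook.dat"

# Create the file, then set the permissions explicitly as well,
# so the result does not depend on the umask alone.
: > "$data_file"
chmod 600 "$data_file"

# Show the resulting permission string (first 10 characters of ls -l)
mode=$(ls -l "$data_file" | cut -c1-10)
echo "$mode"

rm -rf "$data_dir"
```

If a second program must share the file, a umask of 007 with chmod 660 would be the analogous choice for the group case.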
Finally, a local user can attack your Web server in one last way by fooling it into running an external program that he wrote instead of what you specified in your CGI script. The following is a simple program that shows a Web surfer a bit of wisdom from the UNIX fortune command.
#!/bin/sh

# Send the header
echo "Content-type: text/html"
echo ""

# Send the fortune
echo "<HTML><HEAD><TITLE>Fortune</TITLE></HEAD><BODY>"
echo "You crack open the cookie and the fortune reads:<HR><PRE>"
fortune
echo "</PRE></BODY></HTML>"
This script seems harmless enough. It accepts no input from the user, so he can't play any tricks on it that way. Because it's run only by the Web server, the permissions on the script itself can be set to be very restrictive, preventing a trouble-minded local user from changing it. And if the permissions on the directory in which it resides are set correctly, there's not much that can go wrong, is there?
Of course, there is. This code calls external programs, in this case, echo and fortune. Because these programs are called without explicit paths specifying where they are on the hard disk, the shell uses the PATH environment variable to search for them, and this can be dangerous. If, for example, the fortune program was installed in /usr/games, but PATH listed /tmp before it, then any program that happened to be named "fortune" and resided in the temporary directory would execute instead of the true fortune (see Figure 35.7).
This program can do anything its creator wants, from deleting files to logging information about the request and then passing the data on to the real fortune-leaving the user and you none the wiser.
You should always specify explicit paths when running external programs from your CGI scripts. The PATH environment variable is a great tool, but it can be misused just like any other.
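A hardened opening for the fortune script might look like the sketch below. It assumes fortune lives in /usr/games, as in the example above; the variable name FORTUNE is hypothetical, and the replacement PATH is only a suggestion to adjust for your system. Resetting PATH is an extra precaution beyond explicit paths, covering any command you forget to qualify.

```sh
#!/bin/sh

# Replace whatever PATH the server handed the script with a short,
# known list of system directories.
PATH=/bin:/usr/bin
export PATH

# Assumed install location; change this to where fortune really lives.
FORTUNE=/usr/games/fortune

# Run the program by its full path, never by bare name.
if [ -x "$FORTUNE" ]; then
    "$FORTUNE"
else
    echo "fortune is not installed at $FORTUNE"
fi
```

With the full path in place, a program named fortune dropped into /tmp is never consulted, whatever PATH the server supplies.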
On the Web, there are many, many helpful archives of CGI scripts-each stuffed with dozens of useful, valuable programs all free for the taking. But before you start haphazardly downloading all these gems and blindly installing them on your server, you should pause and consider a few things:
If the answer to either question is no, you could be opening yourself up to a huge con game, doing the hacker's work for him by installing a potentially dangerous CGI program on your own server. It's like bringing a bomb into your house because you thought it was a blender.
These Trojan horse scripts-so named because they contain hidden dangers-might be wonderful time savers, doing exactly what you need and functioning perfectly, until a certain time is reached or a certain signal is received. Then, they will spin out of your control and execute planned behavior that could range from silly to disastrous.
Before installing a CGI program that you didn't write yourself, you should take care to examine it closely for any potential dangers. If you don't know the language of the script or if its style is confusing, then you might be better off looking for a different solution. For example, look at this Perl fragment:
system("cat /etc/passwd") if ($ENV{"PATH_INFO"} eq "/send/passwd");
This single line of code could be hidden among thousands of others, waiting for its author or any surfer to enter the secret words that cause it to send him your password file.
If your knowledge of Perl is shaky, if you didn't take the time to completely review the script before installing it, or if a friend assured you that he's running the script with no problems, you could accidentally open your site to a huge security breach-one you may not know about. The most dangerous Trojan horses won't even let you know they've gone about their work. They will continue to work correctly, silently sabotaging all your site's security.
Occasionally, you may find precompiled C CGI scripts on the Web. These are even more dangerous than prewritten programs that include the source. Because precompiled programs don't give you any way of discovering what's actually going on, their "payload" can be much more complex and much more dangerous.
For instance, a precompiled program might make the effort not only to lie in wait for some hidden trigger but also to inform the hacker who wrote it where it is installed! A cleverly written CGI program might mail its author information about your machine and its users every time the script is run (see Figure 35.8), and you would never know because all that complexity is safely out of sight behind the precompiled executable.
Full-blown CGI scripts aren't the only code that can be dangerous when downloaded off the Web. Also, dozens of handy CGI libraries are available, and they pose exactly the same risks as full programs. If you never bother to look at what each library function does, you might end up writing the program that breaks your site's security yourself.
All a hacker needs is for you to execute one line of code that he wrote, and you've allowed him entry. You should review-and be sure that you understand-every line of code that will execute on your server as a CGI script.
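As a first pass before reading a downloaded script line by line, you could flag the places where it runs external commands or evaluates code. The sketch below is only an aid to review, not a substitute for it: the function name flag_risky_calls and the pattern list are assumptions, the pattern will miss anything obfuscated, and it will also flag harmless lines. The demo file reuses the Perl fragment shown earlier.

```sh
#!/bin/sh

# Print, with line numbers, every line that mentions a call commonly used
# to run external commands or evaluate code: system, exec, eval,
# backticks, or open(...).
flag_risky_calls() {
    grep -n -E 'system|exec|eval|`|open[[:space:]]*\(' "$1"
}

# Demonstration on a throwaway Perl script
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
print "Content-type: text/html\n\n";
system("cat /etc/passwd") if ($ENV{"PATH_INFO"} eq "/send/passwd");
print "<HTML><BODY>Hello</BODY></HTML>\n";
EOF

hits=$(flag_risky_calls "$demo_script")
echo "$hits"

rm -f "$demo_script"
```

Each flagged line is a place to slow down and work out exactly what is being executed and with whose data.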
Remember, always look a gift horse in the mouth!
| The Extremes of Paranoia and the Limits of Your Time |
Although sight-checking all the code you pull off the Web is often a good idea, it can take huge amounts of time, especially if the code is complex or difficult to follow. At some point, you may be tempted to throw caution to the wind and hope for the best, installing the program and firing up your browser. The reason you downloaded a CGI program in the first place was to save time. Right? If you do decide to give your paranoia a rest and just run a program that you didn't write, then reduce your risk by getting the CGI script from a well-known and highly regarded site. The NCSA httpd, for instance, is far too big for the average user to go over line by line, but downloading it from its home site at http://www.ncsa.uiuc.edu is as close to a guarantee of its integrity as you're likely to get. In fact, anything downloaded from NCSA will be pre-screened for you. In truth, dozens of well-known sites on the Web have already done most of the paranoia-induced code checking for you. Downloading code from any of them is just another layer of protection that you can use for your own benefit. Such sites include the following:
|