Chapter 46

Java and Security

by Ryan Sutter


Java has changed the face of the Web from a static publishing medium to an interactive application-development platform by providing executable "live" content embedded in HTML documents. This is a very frightening thought to most system administrators. After all, it's bad enough that people can download software that might contain viruses that could damage their machines. How can the network stay secure with programs coming in and running on the host machines all on their own? What is to keep somebody from reading sensitive data, wiping out hard drives, setting up backdoors to the network, or something worse? Fortunately, the folks at Sun gave this some thought and designed Java with security in mind from the ground up, starting with the language and continuing on through the compiler, compiled code, and runtime environment.

To understand Java's preventative measures, we'll start by reviewing the special security concerns that apply to interactive content. We'll then cover the types of attacks that unscrupulous programmers might attempt, and the kinds of security issues that could relate to a well-intentioned but poorly written program. Once we've covered the issues, we'll discuss the features of the Java language, the Java compiler, and the Java Virtual Machine that are designed to help ensure security. Then we'll talk about the remaining open issues related to Java security and what you can (and can't) do about them, as well as the new Security API being implemented in Java 1.1.

Executable Content and Security

In this section, we'll discuss briefly how interactivity on the Web has evolved and how security issues have changed with each new technique. We then focus on how live content, executing on host machines, poses the most challenging security issues of all.

We will only discuss the general security issues that relate to executable content on the Web as opposed to other means of interactivity. From there we outline the issues and illustrate possible attack scenarios.

Interactivity Versus Security

There is a direct correlation between interactivity and security:

The greater the level of interactivity, the greater the security risk.

The Internet allows information to be spread, but this is also what makes it potentially dangerous. This is especially the case when the information is executable code, like Java. An image that you download cannot execute instructions on your machine, but a Java applet can. As you will see, this relationship between interactivity and security holds on the server side as well as the client side.

Let's step back to the basic building block of the Web: HTTP. HTTP is a simple, stateless protocol. It is so simple, in fact, that it only allows data to travel one way, from server to client. When an HTTP server receives a request for a file, it simply hands that file over. There is no interaction between the server and client beyond the call and response. This is pretty close to the model of traditional print media. A receiver receives something from a transmitter. The only real difference is that instead of broadcasting, the server is narrowcasting. The transmitter is sending out whatever was specifically requested by a client, not just pumping out information to everyone. This, in itself, is a fairly secure model on both the client and server sides. The server controls what files and information the client has access to by choosing what it serves. The client is open to very little risk except maybe being overloaded by too much data from the server, but the client's operating system usually prevents that. Although this is quite reliable and more interactive than television, it is still a relatively passive medium.

Of course, the basic HTTP protocol leaves much to be desired in the way of interactivity, and people had to really fight it in order to create compelling interactive content. Still, interactivity techniques were developed, foremost among them forms and CGI programs.

The use of forms and CGI is still relatively secure on the client side but significantly less so on the server side. The process works like this. The browser on the client side receives an HTML form document. The form can contain combo boxes, radio buttons, check boxes, and text fields as well as buttons to post the form data. An end user fills out the form and submits its contents to the server. The form contents are passed as an argument to a program that executes on the server. This program is called a CGI (Common Gateway Interface) program. It can be written in any language that executes on the server and is commonly a UNIX shell script, a C program, a Visual Basic program, or a Perl script. The CGI program parses the parameter string supplied by the client and uses the data. For example, the program can store the data in a local database, e-mail it, and so forth. All access to the server is accomplished by the CGI program itself. There is never any direct access to the server by the client. The only real security risk in this arrangement is the possibility that a badly behaved CGI program could damage the server by depleting system resources, corrupting files, or doing anything else an executable program could do.
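To make the parsing step concrete, here is a minimal sketch of how a server-side program might split a submitted parameter string into name/value pairs. It is written in Java only for consistency with the rest of this chapter; the class name FormParameters and the sample field names are hypothetical, and a real CGI program would also have to validate every value before using it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a form-encoded parameter string into fields.
class FormParameters {
    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return fields;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            fields.put(decode(name), decode(value));
        }
        return fields;
    }

    // Undoes the encoding a browser applies to form data ("+" and "%xx").
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Sample input is illustrative only.
        System.out.println(parse("name=Ryan+Sutter&topic=Java%20security"));
    }
}
```

Notice that the client never touches the server directly; it only supplies a string, and everything else is up to the program that interprets it.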

For more information on CGI programs and security, see Chapters 23, "Key Web Access and Security Concerns for Webmasters," and 35, "CGI Security."

The next logical step in the evolution of interactivity on the Web was client-side executable content. This actually existed before Java in the guise of helper applications and plug-ins. Through the use of helper applications (and later helper applications that execute right in the browser called plug-ins), it is possible to view and interact with Web content using code that executes on your own machine. You simply need to download and install the helper software first (assuming it is available for your platform) and get the content later. The content itself is not executable but contains information about itself that tells the browser what program to use to interact with it. This is accomplished by use of a MIME (Multipurpose Internet Mail Extensions) type. This model creates a security breach on the client potentially worse than the Java model because there are no limits imposed on the application running on the client. The person using the helper application must trust that it won't do any harm. The content itself (images, sounds, and movies) is not executable, and an end user must explicitly install the viewer software. Hence, there is really no more risk than installing any other kind of application.

What about the Java model? It is one big step forward for interactivity, and one big step backward for security. Suddenly, both the client and the server are at risk because the client executes live code without knowing in advance what that code is or what it does. When browsing the Web, you can click on a link and receive a page that starts running on your machine. You may not get the chance to decide to trust the person sending the content. When the content itself is live, instead of static, it opens up whole new realms of interactivity but also raises some serious security questions. How can the end user be sure that they aren't going to download a page that may wipe out their hard drive, infect files with viruses, steal private information, or simply crash the machine? Let's now quantify the security issues.

The Security Problem

How is network-executable content any different from software installed and running on a local machine? Well, a piece of software (in order to serve any really useful purpose) needs to be able to access all of the system resources within the limits of what the operating system allows. It needs to save files, read information, and access the system's memory. Although there are bugs in software (sometimes accidental, sometimes malicious), the person installing the software generally makes a decision to trust the person who wrote the software. This is the traditional software model.

An application arriving over a network must also be able to make use of system resources to function. The only difference is that executable content arriving in a Web page does not need to be installed first. You may not even know where it is coming from, and you will not have the chance to decide if you trust the person on the other end. If the code was written by a hacker who wanted to damage your machine or violate your security, and the live content had all of the same freedoms a regular local application would have, you would have no warning and no protection.

How do we allow for a useful application and maintain a level of trust? It wouldn't make sense to completely restrict outside programs from doing anything on the local machine because this would severely limit the functionality of the network application. A better strategy is to develop limitations that hinder the malicious behavior but allow for the freedom to do the things that need to be done. There are six steps to defining this:

  1. Determine in advance all potential malicious behavior and attack scenarios.
  2. Reduce all potential attack scenarios to a basic set of behaviors that form the basis of all of them.
  3. Design a programming language and architecture that does not allow the basic set of behaviors that form that basis. Hopefully this will disallow the malicious behavior.
  4. Prove that the language and architecture are secure against the intended attack scenarios.
  5. Allow executable content using only this secure architecture.
  6. Design the language and architecture to be extensible so that new attack scenarios can be dealt with as they arrive, and that new counter-measures can be retrofitted into the existing security measures.

Java was designed with each of these steps in mind and addresses most, if not all, of these points. Before exploring Java's security architecture itself, let's discuss the types of potential attack scenarios.

Potential Attack Scenarios

There are two basic categories of attacks that people try to perpetrate. There are security breaches and nuisance attacks. The following are some examples of nuisance attacks:

These types of attacks may not necessarily open you up to a security breach because they do not leak private information about your company or yourself to any unauthorized third party. They can, however, do everything from making your computing experience very unpleasant to causing damage to your computer. The goal of these attacks is just to wreak havoc of one type or another.

The other more serious types of attacks are security breaches, where somebody may attempt to gain private or sensitive information about you or your business. There are more strategies used to accomplish this than can be covered in a single chapter of a book. In fact, there are several books available on the subject and I am certain more will be written. However, here are a few of the major strategies that people might try

The Java Approach to Security

Java is an object-oriented programming language, but it also is a cross-platform operating environment (the Java Virtual Machine) that is separate and independent of Java, the language. Java is also a compiler for the Java language that produces bytecode for the Java Virtual Machine. The Java VM could run bytecodes compiled from any language, not just Java, and the class files that make up Java objects could be created by any compiler that targeted the Virtual Machine. Therefore, security in Java needs to be implemented on each of these fronts separately: in the language, the compiler, and the Virtual Machine.

What is so special about the Java programming language that makes it more secure than other languages? After all, because the VM is a virtual processor and any language can theoretically be compiled for it, why develop the Java language in the first place? The Java language was designed to be several things:

Many of these requirements affected the way security was implemented in the language, the compiler, and the Virtual Machine. For example, the portability requirement meant that Java could not rely on any security measures built into an operating system because it needed to run on any system. In order for the language to be both easy to learn and secure, the security needed to be designed into the language itself and not left up to the good will of the programmer. Using C/C++ would not have worked.

The new language Sun developed, Java, has its roots in C++ and other object-oriented languages but reduces the complexity, platform-dependent variations, and potentially system-damaging capabilities of these languages. Some of the ways Java does this are

NOTE
Before we go too far into the Java security architecture, it is important to point out that we will be chiefly discussing Java applets, not applications. A Java application installed on a local machine has the same privileges and capabilities as any other program. The features built into the Java language that help enforce security in applets can also make for better-behaved applications. Even things like the automatic memory management, however, can be subverted by the linking in of native code that was written in a language that allows direct machine access. Therefore, it is important to note, applets may be considered secure, but an application written in Java should not be considered any more secure than an application written in any other language.

So how do these characteristics of the Java language affect the security issues we discussed earlier in this chapter? Let's go through each piece of the security problem to find out.

Visualize All Attack Scenarios  The Java security model is designed to protect the following resources from attack:

The types of attack that the Java model protects against are

Some types of nuisance attacks, such as the display of rude or offensive material or the starting up of processes that hog system resources, are difficult or impossible to stop and are not addressed in the Java security model. Still, applets are encouraged to be on their best behavior.

Construct a Basic Set of Malicious Behavior  As previously stated, Java security is implemented in several places. The language, the compiler, and the runtime environment are a few. Each is considered a potential security risk and security measures vary for each link in the chain. A more complete list is

Design Security Architecture Against Above Behavior Set  Each element of the Java system is designed to defend against potential, specific, malicious behavior. We will discuss the specifics later, but suffice it to say for now, that this step is satisfied by the Java language and runtime environment themselves.

Prove Security of Architecture  Even though there are limitations and precautions on each part of the Java system, these measures must be proven to be effective. After all, there are some pretty ingenious hackers out there. Sun has attempted to satisfy this criterion in a couple of different ways.

Restrict Executable Content to Proven Security Architecture  The class loader and the bytecode verifier both help to accomplish this objective. If a compiler creates a class file that violates security rules, the class loader and bytecode verifier will not allow it to execute.

NOTE
One point here. The Java architecture is not limited to the programming language of Java. Therefore, the type of security checks performed by the class loader and the bytecode verifier are general to the security restrictions of the Java virtual machine and not the language itself.

Make Security Architecture Extensible  The Java language is well designed for this purpose because it is an object-oriented language and allows for the addition of new security classes. The Java SecurityManager class helps implement enhancements to the security model.

Architecture of Java Security Mechanisms

We will discuss how security is implemented in the Java language, the compiler, and the Virtual Machine, respectively.

Security Built into the Java Language Itself

The Java language may have some of its roots in C++ but much of the complexity of C++ is gone. This is good for programmers attempting to learn the language but is also good for security. The reasons will become apparent as we cover the various points about the Java language that set it apart from C++ and also help make it secure.

No Pointer Arithmetic  The Java language does not have pointer arithmetic. There is no direct access to memory addresses at all. All references to classes and instance variables in a class file happen through the use of symbolic names. Memory management is taken care of by the Java Virtual machine. This not only eliminates an entire class of pesky hard-to-find bugs, but also means that a programmer cannot forge a pointer to memory or create magic offsets that just happen to point to the right place. Programmers cannot change system variables or access private information on the user's machine.
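A small experiment shows what the absence of raw memory access means in practice. The sketch below tries to read one element past the end of an array; the runtime checks the index and throws an exception rather than handing back whatever happens to sit in adjacent memory. The class and method names are ours, chosen for illustration.

```java
class BoundsCheckDemo {
    // Returns true when the runtime rejects an attempt to read past the array.
    static boolean readPastEnd(int[] data) {
        try {
            int value = data[data.length]; // one element beyond the last valid index
            return false;                  // not reached: the line above always throws
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println("out-of-bounds read rejected: " + readPastEnd(data));
    }
}
```

In C, the equivalent read would compile and quietly return whatever bytes follow the array.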

Automatic Garbage Collection  Along with memory management, the Java VM also provides for automatic Garbage Collection. This makes Java both more secure and robust. In C/C++, it is fairly common to do either of the following:

Memory management bugs are hard to track down and can cause many problems. Java keeps track of all objects in use and reclaims the memory as it is required. One nice thing about the Garbage Collection is that it runs as a background process in its own thread. The programmer never needs to think about memory management.

Well-Defined Language  The Java language is very strictly defined and is identical on every platform it runs on. This means

The platform and the compiler used in C and C++ affect how things are done in your code. Operations are not always performed in the same order, and primitive types can vary in size. This makes life more difficult for the programmer and increases the risk of dangerous bugs.

Strict Object-Oriented Language  With the exception of the primitive types, everything in Java is a basic object. This strict adherence to object-oriented methodology means that all of the theoretical benefits of OOP are realized in Java. This includes

Final Classes, Methods, and Variables  Classes, methods, and variables can be declared final. A final class cannot be subclassed, a final method cannot be overridden, and a final variable cannot be modified after it is initialized. This prevents malicious code from overriding trusted methods.
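The following sketch shows the effect on a subclass. The classes TrustedGate and UntrustedGate and the path rule they use are hypothetical; the point is only that the compiler refuses to let the subclass replace the final check, while ordinary methods remain open to overriding.

```java
// Hypothetical trusted class; the names and the path rule are illustrative only.
class TrustedGate {
    // final: no subclass can replace this check with a weaker one.
    final boolean isAllowed(String path) {
        return path.startsWith("/public/");
    }

    // Not final: subclasses may customize how a decision is described.
    String describe(String path) {
        return path + (isAllowed(path) ? " allowed" : " denied");
    }
}

class UntrustedGate extends TrustedGate {
    // Uncommenting the method below is a compile-time error,
    // because isAllowed is declared final in TrustedGate:
    //
    // boolean isAllowed(String path) { return true; }

    @Override
    String describe(String path) {
        return "[untrusted] " + super.describe(path);
    }
}

class FinalDemo {
    public static void main(String[] args) {
        TrustedGate gate = new UntrustedGate();
        System.out.println(gate.describe("/etc/passwd"));
    }
}
```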

Strong Typecasting  Java has to be a strongly typed language because it automatically manages memory. There are no loopholes in the Java type system:

Unique Object Handles  Every Java object has a unique hash code associated with it. This allows the current state of a Java program to be fully inventoried at any time.

The Security of Compiled Code

The Java compiler thoroughly checks the Java code for security violations. It is a very thorough, very stringent compiler that enforces the restrictions listed previously. However, it is possible that Java code could be compiled with a "fixed" compiler that would allow illegal operations. This is where the Java class loader and bytecode verifier come into play. There are various types of security enforced by the runtime system on compiled code.

Java Class Files Structure  Java applets and applications are made up of .class files, which contain compiled bytecode. Just briefly, let's cover the format of Java class files.

Each Java class file is transferred across the network separately: all classes used in a Java applet or application reside in their own separate class file. The class file is a series of 8-bit bytes. 16-bit and 32-bit values are formed by reading 2 or 4 of these bytes and joining them together. Each class file contains

The constant pool is how various constant information about the class is stored. It can be any of the following:

As previously mentioned, all references to variables and classes in the Java language are done through symbolic names, not pointers. This is true in the class file as well. Elsewhere in the class file, references to variables, methods, and objects are accomplished by referring to indices in this constant pool. Security is thus maintained inside the class file.
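To illustrate how the larger values are assembled from individual bytes, here is a small sketch that reads the first eight bytes of a class file: the 32-bit magic number 0xCAFEBABE followed by the 16-bit minor and major version numbers, all stored high byte first. That layout comes from the Java VM specification, not from this chapter; the sample header bytes (version 45.3) and the helper names are chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassFileHeader {
    // Joins two 8-bit bytes into one unsigned 16-bit value, high byte first.
    static int u2(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    // Joins four 8-bit bytes into one 32-bit value, high byte first.
    static int u4(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0xFF) << 24) | ((b1 & 0xFF) << 16) | ((b2 & 0xFF) << 8) | (b3 & 0xFF);
    }

    // True when the first four bytes hold the class file magic number.
    static boolean hasMagic(byte[] bytes) {
        return bytes.length >= 4 && u4(bytes[0], bytes[1], bytes[2], bytes[3]) == 0xCAFEBABE;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative header: magic, minor version 3, major version 45.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x03, 0x00, 0x2D};
        System.out.println("magic ok: " + hasMagic(header));
        System.out.println("major version: " + u2(header[6], header[7]));

        // DataInputStream performs the same high-byte-first joining for us.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        System.out.println("magic via readInt: " + Integer.toHexString(in.readInt()));
    }
}
```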

NOTE
An interesting thing to note here is that each method can have multiple code attributes. The CODE attribute signifies Java bytecode, but there are other code attributes, such as SPARC-CODE and 386-CODE, that allow for a machine code implementation of the method. This allows for faster execution of code but cannot be verified to be sound. For the most part, browsers use the Java code to retrieve executable content from a remote site because of this trust issue. However, in a full-fledged Java application, having multiple code attributes allows the programmer to write code that is both cross-platform and still capable of taking advantage of platform-specific techniques where possible.

The class file format has more features that are not really in use yet. One of these is the ability to allow authors to digitally sign their class files to guarantee to the end user that the file has not been modified by a third party. The user still needs to decide if they wish to trust the author, but at least they know what they are getting. This is likely to come into play as Microsoft pushes ActiveX and its Authenticode technology, which likewise allows authors to digitally sign their ActiveX controls.

The class loader of most current Java implementations, including Sun's own HotJava browser, considers any code that comes from a remote source to be potentially hostile and will not use any machine code contained in a Java class file. They will run machine code loaded from local class files, however. Expect this to change when there are ways to designate trusted and untrusted services.

More About Bytecodes  In addition to the actual bytecodes that execute a method, the CODE attribute also supplies other information about the method. This information is for the memory manager, the bytecode verifier, and the Java VM's exception handling. They are as follows:

There are six primitive types in the Java VM:

There are also several array types that the Java VM recognizes:

In the case of an array of handles, there is an additional type field that indicates the class of object that the array can store.

Each method has its own expression-evaluation stack and set of local registers. The registers must be 32-bit and hold any of the primitive types other than the double floats and the long integers. These are stored in two consecutive registers and the VM instructions, opcodes, address them using the index of the lower-numbered register.

The VM instruction set provides opcodes that operate on various data types and can be divided into several categories:

The bytecodes consist of a one byte opcode followed by zero or more bytes of additional operand information. With two exceptions, all instructions are fixed-length and based on the opcode.
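The opcode-plus-operands layout is easy to see in a toy decoder. The sketch below recognizes only five opcodes (iconst_1, bipush, sipush, iadd, and ireturn, with the numeric values the Java VM specification assigns them) and is nowhere near a real disassembler; the class name and the sample byte sequence are ours.

```java
import java.util.ArrayList;
import java.util.List;

class TinyDisassembler {
    // Decodes a handful of opcodes; anything else is rejected as unknown.
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x04: // iconst_1: no operand bytes
                    out.add("iconst_1");
                    pc += 1;
                    break;
                case 0x10: // bipush: one signed operand byte
                    requireOperands(code, pc, 1);
                    out.add("bipush " + code[pc + 1]);
                    pc += 2;
                    break;
                case 0x11: // sipush: two operand bytes, high byte first
                    requireOperands(code, pc, 2);
                    out.add("sipush " + (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF)));
                    pc += 3;
                    break;
                case 0x60: // iadd: no operand bytes
                    out.add("iadd");
                    pc += 1;
                    break;
                case 0xAC: // ireturn: no operand bytes
                    out.add("ireturn");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("unknown opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }

    // Fails when the operand bytes of the instruction at pc would run past the end.
    private static void requireOperands(byte[] code, int pc, int count) {
        if (pc + count >= code.length) {
            throw new IllegalArgumentException("code ends in the middle of an instruction");
        }
    }

    public static void main(String[] args) {
        byte[] code = {0x10, 0x2A, 0x04, 0x60, (byte) 0xAC};
        System.out.println(decode(code)); // prints [bipush 42, iconst_1, iadd, ireturn]
    }
}
```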

Next we move on to the bytecode verifier.

The Bytecode Verifier  The Bytecode Verifier is really the last line of defense against a bad Java applet. This is where the classes are checked for integrity, where the compiled code is checked for its adherence to the Java rules, and where a misbehaving applet will most likely be caught. If the compiled code was created with a "fixed" compiler to get around Java's restrictions, it will most likely fail the Verifier's checks and be stopped. This is one of the most interesting parts of the Java security mechanism, I think, because of the way it is designed to be thorough and general at the same time. The Bytecode Verifier does not have to work only on code created by a Java compiler, but on any bytecodes created for a Java VM, so it needs to be general. However, it also needs to catch any and all exceptions to the rules laid out for a Java applet or application and must therefore be thorough.

All bytecode goes through the Bytecode Verifier, which makes four passes over the code.

Pass 1  This is the most basic pass. The Verifier makes sure that the following criteria are met:

This pass finds any screwed up class files from a faulty compiler and may also catch class files that were damaged in transit. Assuming everything goes well, we get to the second pass.

Pass 2  This pass is a little more scrutinizing. It verifies almost everything without actually looking at the bytecodes themselves. Some of the things that Pass 2 uncovers are

On Pass 2, everything needs to look legal; that is to say, at face value all the classes appear to refer to classes that really exist, rules of inheritance aren't broken, and more. It does not check the bytecode itself; this is left up to later passes. Passes 3 and 4 check to see if the fields and methods actually exist in a real class and if the types refer to real classes.

Pass 3  On this pass, the actual bytecodes of each method are verified. Each method undergoes dataflow analysis to ensure that the following things are true:

The verifier does several things including verifying that the exception-handler offsets point to legitimate starting and ending offsets in the code and making sure the code does not end in the middle of an instruction.
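To give a feel for this kind of analysis, here is a deliberately tiny sketch of a verifier-style check. It walks straight-line code built from four opcodes, confirms that the code does not end in the middle of an instruction, and tracks how many values are on the expression stack so that it can reject underflow and overflow. The choice of checks, the ToyVerifier name, and the restriction to code without branches are our simplifying assumptions; the real verifier also follows branches and tracks the type of every stack slot and register.

```java
class ToyVerifier {
    // Returns null when the code passes, or a description of the first problem found.
    static String verify(byte[] code, int maxStack) {
        int depth = 0; // number of values currently on the expression stack
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            int length;
            int pops;
            int pushes;
            switch (opcode) {
                case 0x04: length = 1; pops = 0; pushes = 1; break; // iconst_1
                case 0x10: length = 2; pops = 0; pushes = 1; break; // bipush
                case 0x60: length = 1; pops = 2; pushes = 1; break; // iadd
                case 0xAC: length = 1; pops = 1; pushes = 0; break; // ireturn
                default:
                    return "unknown opcode at offset " + pc;
            }
            if (pc + length > code.length) {
                return "code ends in the middle of an instruction";
            }
            if (depth < pops) {
                return "stack underflow at offset " + pc;
            }
            depth = depth - pops + pushes;
            if (depth > maxStack) {
                return "stack overflow at offset " + pc;
            }
            pc += length;
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] good = {0x10, 0x2A, 0x04, 0x60, (byte) 0xAC}; // bipush 42, iconst_1, iadd, ireturn
        byte[] bad = {0x04, 0x60};                           // iadd with only one value on the stack
        System.out.println(verify(good, 2));
        System.out.println(verify(bad, 2));
    }
}
```

The important property is that the check happens before any of the code runs, so a method that would misuse the stack is never executed at all.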

Pass 4  Pass 4 happens as the code actually runs. During the third pass, the Bytecode Verifier does not load any classes unless it must do so to check the validity of the code. This is for efficiency's sake. On the fourth pass, the final checks are made the first time an instruction referencing a class executes. The Verifier then does the following:

Likewise, the first time an instruction calls a method, or accesses or modifies a field, the verifier does the following:

Namespace Encapsulation Using Packages  Java classes are defined within packages which give them unique names. The Java standard for naming packages is the domain the package originates from but in reverse order with the first part capitalized. If my domain is www.ryansutter.com, classes coming from my domain should be in the COM.ryansutter.www package.

What is the advantage to using packages? With packages, a class arriving over the network is distinguishable and therefore cannot impersonate a trusted local class. This is true even if they have the same names.
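You can see the same principle at work inside the standard library, which happens to contain two classes named Date. The sketch below uses them only as a convenient stand-in for a local class and a network class with the same short name; the helper name is ours.

```java
class PackageNamesDemo {
    // True when two classes share a simple name but are still different classes.
    static boolean sameSimpleNameDifferentClass(Class<?> a, Class<?> b) {
        return a.getSimpleName().equals(b.getSimpleName()) && !a.equals(b);
    }

    public static void main(String[] args) {
        Class<?> utilDate = java.util.Date.class;
        Class<?> sqlDate = java.sql.Date.class;
        System.out.println(utilDate.getName()); // prints java.util.Date
        System.out.println(sqlDate.getName());  // prints java.sql.Date
        System.out.println(sameSimpleNameDifferentClass(utilDate, sqlDate)); // prints true
    }
}
```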

Very Late Linking and Binding  The exact layout of runtime resources is one of the last things done by the Java VM. This is to prevent an unscrupulous programmer from making assumptions about the allocation of resources and utilizing these for security attacks.

Security in the Java Runtime System

As we have already discussed, classes can be treated differently when loaded locally as opposed to over a network. One of these differences is how the class is loaded into the runtime system. The default way for this to happen is to just load the class from a local class file. Any other way of retrieving a class requires the class to be loaded with an associated ClassLoader. The ClassLoader class is a subtype of a standard Java object that has the methods to implement many of the security mechanisms we have discussed so far. A lot of the attack scenarios that have been used against Java have involved getting around the ClassLoader.

The ClassLoader comes into play after pass 3 of the Bytecode Verifier as the classes are actually loaded on pass 4. The ClassLoader is fairly generic because it does not know for certain that it is loading classes written in Java. It could be loading classes written in C++ and compiled into bytecode.

The ClassLoader, therefore, has to check general rules for consistency within class files. If a class fails these checks, it isn't loaded and an attack on an end user's system fails. It is an important part of the Java security system.
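As a rough illustration of how a ClassLoader subclass can sit between a program and the classes it asks for, here is a minimal sketch that refuses to load anything whose name starts with a given prefix and delegates everything else to its parent. The RestrictingClassLoader name and the prefix-based policy are hypothetical; this is not how any browser's applet class loader is actually written, and a name filter alone is not a real security boundary.

```java
// Hypothetical loader: refuses any class whose name begins with a blocked prefix.
class RestrictingClassLoader extends ClassLoader {
    private final String blockedPrefix;

    RestrictingClassLoader(ClassLoader parent, String blockedPrefix) {
        super(parent);
        this.blockedPrefix = blockedPrefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.startsWith(blockedPrefix)) {
            throw new ClassNotFoundException("refused by policy: " + name);
        }
        // Everything else is delegated to the parent loader as usual.
        return super.loadClass(name, resolve);
    }

    // Convenience wrapper so callers can ask without handling the exception.
    boolean canLoad(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        RestrictingClassLoader loader =
                new RestrictingClassLoader(ClassLoader.getSystemClassLoader(), "java.sql.");
        System.out.println(loader.canLoad("java.lang.String")); // prints true
        System.out.println(loader.canLoad("java.sql.Date"));    // prints false
    }
}
```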

Automatic Memory Management and Garbage Collection  Although we discussed this previously as part of the language, we will revisit it here because it is implemented in the runtime environment. In C or C++, the programmer is responsible for allocating and deallocating memory and needs to keep track of the pointers to all of the objects in memory. This can result in memory leaks, dangling pointers, null pointers, and more bugs that are very difficult to find and fix.

By having automatic memory management, Java gets around these problems and makes life easier for the programmer. It does more than that, however. Leaving memory management up to the programmer can introduce new and interesting bugs or allow for a bit of mischief. Manual allocation and deallocation of memory opens the door for unauthorized replication of objects, impersonation of trusted objects, and attacks on data consistency.

Here is an example of how a programmer might go about impersonating a trusted class (for instance, the ClassLoader) if Java did not have automatic deallocation of memory. First, the program would create a legitimate object of class MyFakeClassLoader and a pointer to refer to that object. Now, with a little sleight-of-hand and knowledge of how allocation and deallocation work, the programmer removes the object from memory but leaves the memory pointer. He then instantiates a new instance of ClassLoader, which happens to be the exact same size, in the same memory space and voila! The pointer is now referring to the other class and the programmer has access to methods and variables that are supposed to be private. This scenario is not possible in Java, however, because of the automatic memory management.

The SecurityManager Class  The Java security model is open to extension when new holes are found. The key to this is the SecurityManager class. This class is a generic class for implementing security policies and providing security wrappers around other parts of Java. This class does not get used by itself; it is simply a base for implementing security in other classes. Actual implementation of security in other objects is accomplished through subclassing the SecurityManager class. Although not a comprehensive list, this class contains methods to

Everything discussed so far has been about Java as a whole. The language has no pointers whether you are working with applets or applications. The bytecode verifier and class-loading mechanisms still apply. Applications in Java function like applications in any other language, including direct memory access through use of native code. There are some limitations on applets, however, that do not apply to applications.
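Returning for a moment to the SecurityManager class, the subclassing it relies on can be sketched in a few lines. The ReadOnlySandbox class and its policy (reads allowed under one directory, deletes never) are hypothetical, and the sketch calls the check methods directly rather than installing the manager in a running system. Be aware that recent Java releases mark SecurityManager as deprecated, so this compiles with warnings there and is shown only to illustrate the idea.

```java
// Hypothetical policy: reads are allowed only under one directory, deletes never.
// SecurityManager is deprecated in recent Java releases; expect compiler warnings.
class ReadOnlySandbox extends SecurityManager {
    private final String readableRoot;

    ReadOnlySandbox(String readableRoot) {
        this.readableRoot = readableRoot;
    }

    @Override
    public void checkRead(String file) {
        if (!file.startsWith(readableRoot)) {
            throw new SecurityException("read denied: " + file);
        }
    }

    @Override
    public void checkDelete(String file) {
        throw new SecurityException("delete denied: " + file);
    }

    // Convenience wrappers that turn the exception into a yes/no answer.
    boolean mayRead(String file) {
        try {
            checkRead(file);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    boolean mayDelete(String file) {
        try {
            checkDelete(file);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ReadOnlySandbox sandbox = new ReadOnlySandbox("/public/");
        System.out.println(sandbox.mayRead("/public/index.html"));   // prints true
        System.out.println(sandbox.mayRead("/etc/passwd"));          // prints false
        System.out.println(sandbox.mayDelete("/public/index.html")); // prints false
    }
}
```

The design point is that the policy lives in one subclass, so tightening it when a new hole is found means changing one class rather than every caller.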

Applet Security

Java would not have made nearly the splash that it did just by being cross-platform and object-oriented. It was the Internet and applets that put it on the cover of Time magazine. It is also from the Internet that the biggest risks come for Java applets.

Applets are limited Java programs, extended from class Applet, that usually execute embedded in an HTML document. Applets usually load from remote machines and are subject to severe limitations on the client machine.

Restrictions on Applets Arriving Over a Network  Applets arriving on the client machine are subject to the following file system and network restrictions:

NOTE
This set of restrictions on the applet may vary from one implementation of Java to another. For instance, all of these apply when using Netscape Navigator 2.0 or later, but the JDK appletviewer allows you to designate an explicit list of files that can be accessed by applets. The HotJava browser, Netscape Navigator, and the JDK appletviewer (to name a few) all have minor differences too numerous to mention here ranging from handling of certain exceptions to access control lists for applets. If you want to know detailed limitations for a particular browser, I suggest you go to their web site and get the most up-to-date information.

There is some system information available to applets, however. Access to this information depends on the specific Java implementation. There is a method of the System object called getProperty. By calling System.getProperty(String key), an applet can learn about its environment. The information available to the applet is as follows:

Other pieces of information that may or may not be available depending on the implementation are

All the limitations discussed here apply to applets received over the network.
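Here is a minimal sketch of the System.getProperty call mentioned above. The keys it queries are standard system property names picked for illustration; which of them a given browser actually exposes to a remote applet is up to that implementation, so the helper treats a refusal as an ordinary outcome. The class name and the placeholder strings are ours.

```java
class EnvironmentProbe {
    // Reads a property, returning a placeholder when it is unset or access is refused.
    static String read(String key) {
        try {
            String value = System.getProperty(key);
            return value != null ? value : "(not set)";
        } catch (SecurityException e) {
            return "(access denied)";
        }
    }

    public static void main(String[] args) {
        // Standard property names, chosen for illustration.
        String[] keys = {"java.version", "java.vendor", "os.name", "os.arch", "file.separator", "user.home"};
        for (String key : keys) {
            System.out.println(key + " = " + read(key));
        }
    }
}
```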

Applets Loaded From the Local File System  When applets are loaded locally, they are no longer subject to the same restrictions as a remotely loaded applet. They have the same freedoms as an application, including the ability to

It is up to the implementation of Java to enforce the correct applet restrictions if it has been loaded from a remote source and then cached on a local disk.

Open Issues on Security

In the case of Java, the security is only as good as its runtime implementation. There have been many holes found in Java's security and many of them have been fixed in specific implementations, but these same issues may arise again in future implementations as Java is ported everywhere. After all, each version of the Java VM needs to be written in a platform-specific programming language like C and can have its own flaws and weaknesses.

Aside from that, there are many types of malicious behavior that are difficult (if not impossible) to avoid. For instance, no matter what is done to the Java security model, it will not stop someone from putting rude or obscene material in an applet or starting long, resource-intensive processes. These things are not bugs and will continue to be nuisances at times.

Some holes have been found in various implementations of Java. A couple of the attacks current as of this writing are

There have been many other techniques discovered to load native code, connect to hosts other than the one an applet was loaded from, read and write the local file system, and so on.

The next release of Java, version 1.1, introduces several new APIs to the core set of Java APIs, including a new Security API. Although not yet published by Sun at the time of this writing, the new API promises to allow for encryption, decryption, and digital signature capability in Java class files.

Every implementation of Java has its own open issues and Sun's is no exception. The best thing to do is to keep on top of the issues for the implementation you are using.

Further References on Java and Security

The following references will help you keep up with the changing world of Java security. This is by no means a comprehensive list, but it should give you some valuable starting places from which to research the topic further.

UseNet:

alt.2600

comp.risks

comp.lang.java

comp.infosystems.www.*

WWW:

Security bugs in Java by David Hopwood:

http://ferret.lmh.ox.ac.uk/~david/java/

Netscape Navigator Java Security FAQ:

http://www.netscape.com/

Low-Level Security in Java, Frank Yellin, Sun Microsystems:

http://java.sun.com

Java Security, Joseph A. Bank, MIT:

http://www-swiss.ai.mit.edu/~jbank/javapaper/javapaper.html