Chapter 18

The Microsoft Java Virtual Machine

by Bryan Morgan


Early in 1996, Microsoft and Sun Microsystems entered into an unusual agreement. The tremendous growth of Java pushed both companies to try to work together after competing against each other for years for sales of operating systems, development tools, and applications. Sun had already released the Java Developer's Kit (version 1.0) and had produced Java Virtual Machines (JVMs) for the Sun Solaris and Windows 95/NT platforms. Work was also under way on a JVM for the Apple Macintosh.

Microsoft had repeatedly said that it supported Java, but in fact it had nothing to show for this oral support. Through their agreement, Sun Microsystems agreed to license the JVM specification to Microsoft. Microsoft agreed to produce the reference implementation of the JVM for the Windows platforms. This essentially means that Microsoft has agreed to provide a complete implementation of the JVM, to be included with future versions of the Windows operating system. Any changes or additions made to the Microsoft Virtual Machine (VM) will be licensed back to Sun for possible inclusion in the official JVM specification.

The Microsoft VM

After several months of work, Microsoft has now released its JVM. It is currently available in Microsoft's Internet Explorer 3.0 (or higher) Web browser and is used by Visual J++ to compile and run Java code. This Virtual Machine has been dubbed the "Microsoft Windows Virtual Machine for Java." Besides being a mouthful of words, this title also hints that there is something different under the hood that separates this JVM from the Sun specification. (Those familiar with Microsoft have probably already suspected this.)

Before immediately dismissing the Microsoft VM and Visual J++ in general because of these differences, remember that the Microsoft VM is a complete JVM built according to the Sun specification. Every feature in a standard virtual machine is included in the Microsoft VM. All packages included in the Sun JVM (such as java.awt, java.io, java.net, and java.util) are present in the Microsoft VM. Because of Sun's licensing requirements, developers should feel secure that any Java code written using Visual J++ will run just fine within any Java-aware browser on any platform.

CAUTION
Although code written using Visual J++ and the Microsoft VM can run unmodified in any Web browser that supports Java, this is not to say that all Web browser implementations are perfect. The Sun VM specification has a few gray areas that have yet to be cleared up. This has resulted in Java code that runs fine under one browser (such as MSIE) but will not run in another (such as Netscape Navigator). This has nothing to do with the Visual J++ compiler; rather, the problem lies in the Web browser implementation itself.

In addition to providing a complete implementation of the JVM, Microsoft also chose to extend its virtual machine so that it could be used by all Windows applications, not just the Web browser. This extension gives the Microsoft VM the capability to load component object model (COM) classes and expose COM interfaces of Java classes. This means that the Microsoft VM allows Java classes to exist as both standard Java classes and COM objects. ActiveX controls are examples of COM objects that could be created using Java. (It should also be pointed out that the Microsoft VM itself is implemented as an ActiveX control.) In short, the Microsoft VM allows Java classes to be reused by nearly all popular Windows programming environments.

The remainder of this chapter introduces the Sun JVM specification and explains the Microsoft VM extensions. It does not cover the actual "plumbing" details; rather, it gives you an idea of what goes on when a JVM interprets and runs Java code.

The JVM Specification

As the creator of Java and the Virtual Machine, Sun Microsystems owns the specification for the JVM. Sun currently works with other virtual machine builders (such as Apple, IBM, and Microsoft) to ensure that all versions of the JVM remain true to the overall Sun specification. This specification lays out in detail the Java runtime architecture, bytecodes and their format, and Java class-file format. All capabilities described in the specification must be included by every licensee of the JVM.

NOTE
Although every feature must be implemented by a licensee, how that feature is implemented is generally left up to the individual licensees. For instance, the Microsoft VM performs garbage collection of objects in memory; however, it uses a "stop and copy" scheme instead of the standard "mark and sweep" paradigm.

Some abstractions are completely left up to the implementor, such as the following:

The initial specification also says that all compliant JVMs will initially provide the capability to interpret and run Java classes. Although this does not rule out the production of machine code by Java compilers, it does ensure the capability to run interpreted Java applets and applications. (Tools currently used to interpret and run Java applications include Sun's java interpreter and the Microsoft jview interpreter.)

Components of the JVM

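The JVM, like a true computer hardware machine, uses several components to manage and run programs. Readers familiar with microprocessor organization will notice some similarities between these machines and the JVM. The following components, each covered in its own section below, make up the JVM: the Java instruction set, the primitive data types, the registers, the stack, the garbage-collected heap, and the constant pool.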

Each of these components is used to track and manage Java code at runtime.

Java Instruction Set

Think of the Java instruction set as a set of bytecodes that are Java "machine code." Java compilers take Java source code and produce class files whose contents are made up of Java instruction set commands. A Java instruction set command consists of an opcode specifying the operation to be performed and zero or more operands supplying parameters that will be used by the operation. Instruction set opcodes are always one byte long. The operand's size may vary. If an operand's size is greater than one byte, it is stored in "big-endian" order with the high-order byte first. (The section "The Java Instruction Set" lists the common Java opcodes and their meanings.)
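To make the big-endian rule concrete, the following minimal sketch decodes the 2-byte operand of a sipush instruction from a byte array. The class and method names (OperandDecoder, readSignedShort) are hypothetical and exist only for this illustration; the opcode value 0x11 is the one the JVM specification assigns to sipush.

```java
// Hypothetical helper for illustration; not part of any JVM API.
final class OperandDecoder {
    // Opcode value that the JVM specification assigns to sipush.
    static final int SIPUSH = 0x11;

    /** Reads a signed 2-byte operand stored high-order byte first. */
    static int readSignedShort(byte[] code, int offset) {
        int high = code[offset];            // high-order byte, sign-extended
        int low = code[offset + 1] & 0xFF;  // low-order byte, treated as unsigned
        return (high << 8) | low;
    }

    public static void main(String[] args) {
        // sipush followed by the operand bytes 0x01 0x02
        byte[] code = { (byte) SIPUSH, 0x01, 0x02 };
        System.out.println(readSignedShort(code, 1)); // 0x0102 = 258
    }
}
```

Because the high-order byte comes first, the operand bytes 0x01 0x02 decode to 0x0102 rather than 0x0201.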

Primitive Data Types

Each JVM must natively support a set of primitive data types, including int, boolean, char, short, byte, float, double, and long. These data types are managed by the compiler; the programmer does not have to create them specifically. In other words, operations involving the primitive data types are supported at the opcode level.

JVM Registers

The JVM maintains a set of internal registers used to store the machine state while a program is executing. This set of registers is analogous to the hardware registers in a microprocessor. Some examples of the JVM registers follow:

Each register is defined to be 32 bits wide.

JVM Stack

The JVM stack is used to track the state of a single method invocation and is made up of local variables, an execution environment, and the operand stack.

Local variables are all 32 bits wide. In the case of long integers or double precision floating-point values (that is, 64-bit numbers), two local variables are addressed by the index of the first local variable. Specific opcodes are defined to load the values of local variables onto the stack and also to store variables from the stack into local variables. Local variables are addressed as indices from the vars register and therefore can be treated as an array.
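As a rough illustration of how a 64-bit value can occupy two 32-bit local variables, the sketch below models the local variables as an int array and splits a long across two adjacent slots. The names LocalSlots, storeLong, and loadLong are hypothetical, and placing the high-order word in the first slot is an assumption made for this example; a real virtual machine is free to lay the halves out differently.

```java
// Hypothetical model of 32-bit local variable slots; not a real JVM structure.
final class LocalSlots {
    /** Stores a 64-bit value in slots index and index + 1 (high word first, by assumption). */
    static void storeLong(int[] vars, int index, long value) {
        vars[index] = (int) (value >>> 32); // high-order 32 bits
        vars[index + 1] = (int) value;      // low-order 32 bits
    }

    /** Rebuilds the 64-bit value addressed by the index of its first slot. */
    static long loadLong(int[] vars, int index) {
        long high = ((long) vars[index]) << 32;
        long low = vars[index + 1] & 0xFFFFFFFFL; // mask to avoid sign extension
        return high | low;
    }

    public static void main(String[] args) {
        int[] vars = new int[4];
        storeLong(vars, 1, 0x123456789ABCDEF0L);
        System.out.println(Long.toHexString(loadLong(vars, 1)));
    }
}
```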

The execution environment is used to maintain the operations of the Java stack itself. The environment maintains pointers to the previous stack frame, its own local variables, and operand stack base and top.

When compiled, each Java method defines a list of "catch clauses." Each clause describes the instruction range for which it is active and the type of exception that it handles. The catch clause list for the current method is searched for a match each time an exception is thrown by the executing program. Before the system can branch to an exception handler, the exception must be verified against the catch clause: if the instruction that threw the exception lies within the clause's instruction range and the exception is the same type as, or a subtype of, the type that the clause handles, the program branches to the exception handler. When no handler is found, the stack frame is popped and the exception is raised again in the calling method.
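The catch clause search can be sketched in a few lines of Java. Everything here is a simplification for illustration: the CatchClause and CatchTable classes are hypothetical, the instruction range is assumed to include its start and exclude its end, and a return value of -1 stands in for "pop the frame and raise the exception again."

```java
// Hypothetical model of one catch clause; not the real class-file layout.
final class CatchClause {
    final int startPc;    // first instruction covered (inclusive)
    final int endPc;      // end of the covered range (exclusive, by assumption)
    final int handlerPc;  // where to branch when the clause matches
    final Class<? extends Throwable> catchType;

    CatchClause(int startPc, int endPc, int handlerPc,
                Class<? extends Throwable> catchType) {
        this.startPc = startPc;
        this.endPc = endPc;
        this.handlerPc = handlerPc;
        this.catchType = catchType;
    }
}

final class CatchTable {
    /** Returns the handler address of the first matching clause, or -1 if none matches. */
    static int findHandler(CatchClause[] clauses, int pc, Throwable thrown) {
        for (CatchClause clause : clauses) {
            boolean inRange = pc >= clause.startPc && pc < clause.endPc;
            // isInstance is true for the exact type and for any subtype.
            if (inRange && clause.catchType.isInstance(thrown)) {
                return clause.handlerPc;
            }
        }
        return -1; // caller would pop the frame and rethrow
    }
}
```

With clauses for ArithmeticException over instructions 0 to 9 and RuntimeException over instructions 0 to 19, an ArithmeticException thrown at instruction 15 falls outside the first range but still matches the second clause, because ArithmeticException is a subtype of RuntimeException.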

JVM Garbage-Collected Heap

The JVM garbage-collected heap is the area from which runtime objects are allocated. Unlike traditional system heaps of memory, the JVM monitors which objects are in use. When an object is determined to be unused throughout the remainder of the program, the JVM automatically frees the memory for that object. The JVM specification does not specify how the memory is to be freed up. Many garbage collection strategies use the mark-and-sweep algorithm, which goes through the following steps:

  1. Every object in memory is examined.
  2. Objects that are in use are marked.
  3. The virtual machine sweeps back through memory. All objects left unmarked are freed.
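The steps above can be sketched with a toy heap. This is only an illustration under stated assumptions: the MarkSweepDemo and Node names are hypothetical, "in use" is taken to mean "reachable from a set of root references," and a real collector works on raw memory rather than Java lists.

```java
import java.util.ArrayList;
import java.util.List;

// Toy mark-and-sweep over a list of nodes; not how a real JVM stores objects.
final class MarkSweepDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the given node. */
    static void mark(Node node) {
        if (node.marked) return; // already visited; also stops cycles
        node.marked = true;
        for (Node ref : node.refs) mark(ref);
    }

    /** Marks from the roots, then sweeps the heap; returns the names of freed nodes. */
    static List<String> collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) mark(root);
        List<String> freed = new ArrayList<>();
        List<Node> live = new ArrayList<>();
        for (Node node : heap) {
            if (node.marked) {
                node.marked = false; // reset for the next collection
                live.add(node);
            } else {
                freed.add(node.name); // unmarked, so it is freed
            }
        }
        heap.clear();
        heap.addAll(live);
        return freed;
    }
}
```

Note that the survivors stay where they were; the sketch never moves an object, which is why a scheme like mark and sweep can leave gaps between live objects.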

The Sun Microsystems implementation of the JVM uses this algorithm. Although it is fast, it often results in the fragmentation of memory. The Microsoft VM uses a different algorithm known as "stop and copy," which goes through the following steps:

  1. Every object in memory is examined.
  2. Objects that are in use are copied immediately to another area in memory.
  3. At the end, all objects that were not copied are freed.

Stop and copy requires a greater amount of memory to accomplish the same result achieved by the mark-and-sweep algorithm. However, it does not result in memory fragmentation.
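A toy version of the copying approach shows both properties. As before, this is a sketch under assumptions, not the Microsoft VM's implementation: the CopyingDemo and Obj names are hypothetical, "in use" is taken to mean "reachable from the roots," and the destination area is modeled as a Java list.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy copying collector; the second list plays the role of the other memory area.
final class CopyingDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Copies everything reachable from the roots into toSpace; returns the relocated roots. */
    static List<Obj> collect(List<Obj> roots, List<Obj> toSpace) {
        Map<Obj, Obj> forwarded = new IdentityHashMap<>(); // old object -> its copy
        List<Obj> newRoots = new ArrayList<>();
        for (Obj root : roots) newRoots.add(copy(root, forwarded, toSpace));
        return newRoots;
    }

    private static Obj copy(Obj old, Map<Obj, Obj> forwarded, List<Obj> toSpace) {
        Obj existing = forwarded.get(old);
        if (existing != null) return existing; // already copied
        Obj fresh = new Obj(old.name);
        forwarded.put(old, fresh); // record before recursing so cycles terminate
        toSpace.add(fresh);        // survivors are packed one after another
        for (Obj ref : old.refs) fresh.refs.add(copy(ref, forwarded, toSpace));
        return fresh;
    }
}
```

The survivors end up packed together in the destination area, so no gaps remain, but for the duration of the collection both the old and the new area must exist, which is where the extra memory cost comes from.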

The Constant Pool

When a Java applet or application is run, all the objects are loaded into the constant pool (or class constant pool). This pool of data contains the names of all fields and methods and other such information that is used by methods in the class. When the class is first read in from memory, the class structure has two fields related to the constant pool: nconstants, which indicates the number of constants contained in this class's constant pool, and constant_info.constants_offset, which contains an integer offset (in bytes) from the start of the class to the data that describes the constants in the class.

The constant pool is treated as an array named constant_pool. constant_pool[0] can be used by the implementor for anything. The remainder of the constant_pool array is used to store the sequence of bytes in the class object beginning at the constant_info.constants_offset byte location.

The Java Instruction Set

The Java instruction set, as described earlier, is a set of bytecodes that is equivalent to compiled languages' machine code. These bytecodes are interpreted by the JVM at runtime and are subsequently converted into the machine code for whichever platform the JVM is running on. Table 18.1 contains descriptions of some of the more common Java instructions.

Table 18.1. A partial listing of the Java instruction set.

Opcode Description
bipush Push 1-byte signed integer onto the stack
sipush Push 2-byte signed integer onto the stack
ldc1 Push 1-byte value from constant pool
ldc2 Push 2-byte value from constant pool
ldc2w Push long or double from constant pool
aconst_null Push null object
iconst_m1 Push integer constant -1 onto the stack
iload Load integer from local variable
lload Load long from local variable
fload Load float from local variable
dload Load double from local variable
aload Load local object variable
istore Store integer into local variable
lstore Store long into local variable
fstore Store float into local variable
dstore Store double into local variable
astore Store object reference into local variable
iinc Increment local variable by constant
newarray Allocate new array
anewarray Allocate new array of objects
multianewarray Allocate new multidimensional array
arraylength Get length of array
iaload Load integer from array
laload Load long from array
faload Load float from array
iastore Store into integer array
lastore Store into long array
fastore Store into float array
dastore Store into double array
nop Do nothing (no-op)
pop Pop top stack word
pop2 Pop top two stack words
dup Duplicate top stack word
dup2 Duplicate top two stack words
swap Swap top two stack words
iadd Integer add
ladd Long add
fadd Float add
dadd Double add
isub Integer subtract
lsub Long subtract
fsub Float subtract
dsub Double subtract
imul Integer multiply
lmul Long multiply
fmul Float multiply
dmul Double multiply
idiv Integer division
ldiv Long division
fdiv Float division
ddiv Double division
imod Integer modulus
lmod Long modulus
fmod Float modulus
dmod Double modulus
ineg Integer negate
lneg Long negate
fneg Float negate
dneg Double negate
ishl Integer shift left
ishr Integer shift right
iushr Integer logical shift right
lshl Long shift left
lshr Long shift right
lushr Long logical shift right
iand Integer boolean and
land Long boolean and
ior Integer boolean or
lor Long boolean or
ixor Integer boolean xor
lxor Long boolean xor
i2l Integer to long conversion
i2f Integer to float conversion
i2d Integer to double conversion
l2i Long to integer conversion
l2f Long to float conversion
l2d Long to double conversion
f2i Float to integer conversion
f2l Float to long conversion
f2d Float to double conversion
d2i Double to integer conversion
d2l Double to long conversion
d2f Double to float conversion
int2byte Integer to byte conversion
int2char Integer to char conversion
int2short Integer to short conversion
ireturn Return integer from method
lreturn Return long from method
freturn Return float from method
dreturn Return double from method
areturn Return object reference from method
return Return void from method
tableswitch Access jump table by index and jump
lookupswitch Access jump table by key match and jump
putfield Set field in object
getfield Get field in object
putstatic Set static field in class
getstatic Get static field in class
invokevirtual Invoke instance method (dispatched on the object's runtime type)
invokenonvirtual Invoke nonvirtual method
invokestatic Invoke static method
invokeinterface Invoke interface method
athrow Throws an exception
new Creates a new object
newfromname Creates a new object from a given name
checkcast Make sure object is of a given type
instanceof Determine whether object is of given type
verifystack Verify that stack is empty
breakpoint Calls the breakpoint handler

The instructions listed in Table 18.1 are used to manipulate primitive data types and objects (classes and interfaces) using the JVM registers, stack, and heap. When Java source code is compiled by Visual J++, each line of Java code is then converted into an associated set of instructions from the instruction set listed in Table 18.1. These instructions are combined and placed into a class file for the JVM to interpret into machine code.
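To give a feel for how such instructions drive the operand stack, here is a deliberately tiny interpreter that understands only four opcodes. It is a sketch, not a description of any real virtual machine: the TinyInterpreter class is hypothetical, the fixed stack size is arbitrary, and only the numeric opcode values (0x10 for bipush, 0x60 for iadd, 0x68 for imul, 0xAC for ireturn) are taken from the JVM specification.

```java
// Hypothetical four-opcode interpreter; real JVMs handle far more cases.
final class TinyInterpreter {
    static final int BIPUSH = 0x10, IADD = 0x60, IMUL = 0x68, IRETURN = 0xAC;

    /** Runs the bytecode and returns the value left for ireturn. */
    static int run(byte[] code) {
        int[] stack = new int[16]; // operand stack (size chosen arbitrarily)
        int optop = 0;             // index of the next free stack slot
        int pc = 0;                // index of the next instruction
        while (true) {
            int opcode = code[pc++] & 0xFF; // opcodes are one unsigned byte
            switch (opcode) {
                case BIPUSH:
                    stack[optop++] = code[pc++]; // 1-byte signed operand
                    break;
                case IADD: {
                    int b = stack[--optop];
                    int a = stack[--optop];
                    stack[optop++] = a + b;
                    break;
                }
                case IMUL: {
                    int b = stack[--optop];
                    int a = stack[--optop];
                    stack[optop++] = a * b;
                    break;
                }
                case IRETURN:
                    return stack[--optop];
                default:
                    throw new IllegalStateException("Unsupported opcode: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // bipush 2, bipush 3, iadd, bipush 4, imul, ireturn  =>  (2 + 3) * 4
        byte[] code = { 0x10, 2, 0x10, 3, 0x60, 0x10, 4, 0x68, (byte) 0xAC };
        System.out.println(run(code)); // prints 20
    }
}
```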

Class Files

Class files are so named for two reasons. The most obvious is that each Java class, when compiled, is stored in one of these files; the other is that these files always end with the .class extension. Each class or interface, whether public or not, is compiled into its own class file with the name objectname.class.

The JVM specification describes the actual format of a class file. Whereas other portions of the specification allow for some variance, the class file format is very exact in its specifications.

The format of this file groups class fields into a structure that closely resembles a C language struct. The types u1, u2, and u4 are used to designate unsigned 1-, 2-, or 4-byte quantities.

NOTE
Each class file begins with the magic field. For a JVM to interpret the class file, the value of this field must be 0xCAFEBABE.

The remainder of the class file contains a large number of fields used to describe the variables and methods used in the class. Also contained in the file are the opcodes that make up each method.
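Checking the magic field is easy to sketch. The example below assumes the first bytes of a file have already been read into a byte array; the MagicCheck class and its method name are hypothetical. It relies on java.nio.ByteBuffer, which reads multi-byte values in big-endian order by default, matching the byte order used in class files.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that tests only the 4-byte magic field.
final class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** Returns true if the data starts with the class-file magic value. */
    static boolean looksLikeClassFile(byte[] data) {
        if (data.length < 4) return false;          // too short to hold a u4
        return ByteBuffer.wrap(data).getInt() == MAGIC; // big-endian by default
    }

    public static void main(String[] args) {
        byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };
        System.out.println(looksLikeClassFile(header)); // prints true
    }
}
```

A matching magic value says only that the file claims to be a class file; a real virtual machine goes on to validate the rest of the structure.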

Limitations of the JVM

Because Java's designers intentionally force virtual machine implementors to work with a set of constraints, the JVM has a few restrictions. These restrictions are necessary to ensure a high level of portability of executable content across multiple platforms (the goal of Java). The following are some inherent restrictions:

Most of these restrictions are of no consequence to the average developer. (Have you ever passed more than 256 parameters to a method? If so, and you were working on a development team, you probably were not employed for long!) The only serious restrictions that could cause problems are perhaps the 256-method-per-class limit and the overall memory addressing limit of 4GB. These limits are mentioned here so that the Visual J++ developer is aware of the limits of the underlying Microsoft VM.

Integrating Java and COM Using the Microsoft VM

As mentioned at the beginning of the chapter, the Microsoft VM is the reference implementation of the JVM for the Windows platform. Any Java bytecodes produced by any compiler on any platform should be able to run without modification using the Microsoft JVM in Internet Explorer. (In reality, bugs in the Internet Explorer browser may prevent a few select operations from working properly. However, this is not because of any Microsoft extensions to the JVM.)

There are some major features in the Microsoft JVM that separate it from every other virtual machine built by all the other major manufacturers (IBM, Sun, Apple, Netscape, and so on). Because of the many similarities between COM and Java, the Microsoft VM also supports the integration of COM objects (such as ActiveX controls) and Java classes. This is possible because the Microsoft VM is capable of loading COM objects and exposing their interfaces, so COM objects can be created and used in Java classes. In fact, to the Java programmer, there is really no difference between the two. All the details involved with binding to the native COM code are hidden from the developer.

If COM objects can be used in Java code, it makes sense that the Microsoft VM would be able to expose Java classes as COM objects as well. (It does!) This capability allows Java classes to be exposed as COM objects and therefore to be reused by applications written in other programming languages such as Visual Basic, C++, or Delphi.

Combining COM and Java

The fact that COM objects can be used in Java programs (and vice versa) is a cause of concern to many professional software developers and Java enthusiasts. When COM and Java are mixed together using tools such as Visual J++ and the Microsoft VM, Java loses one of its greatest assets: platform independence. In return, the Java developer gains access to a huge base of existing code written in a variety of languages. Throughout this book, it has been (and will continue to be) emphasized that programmers who want to use ActiveX/COM and Java should think carefully about the potential users of their applications. In the intranet case, in which all users are using Windows 95/NT and the Internet Explorer browser, a strong argument can be made for a combination of ActiveX and Java. In the Internet case, in which a variety of platforms and browsers exist, Java alone allows the developer to reach the broadest audience.

Another way of looking at this same issue is to view Java as just a great programming language. This is the basis of Microsoft's Java strategy. If you have been actively involved with building Windows applications and plan to do so for some time, the Visual J++ compiler combined with the Microsoft VM gives you a powerful new tool to use in building these applications.

Using COM Objects in Java

Chapter 7, "Advanced Java Programming," introduces the concept of native methods. Native methods in Java allow Java code to indirectly call methods written in other languages such as C or C++. This native code interface is supplied by every JVM, including the Microsoft VM. The Microsoft VM also provides another method for reusing existing methods written in other programming languages: by using COM objects in Java.

From the programming language level, COM objects look identical to Java objects when used in Java code. This is because COM, like Java, supports the following:

Because of these similarities, it was clear early on that COM and Java could be integrated using extensions to the JVM. To start, the virtual machine must understand how to extract class and interface information from COM objects. Fortunately, COM object information is stored in a file known as a type library (similar in purpose to a Java class file). After the type library has been examined, it is converted to Java class files. Special attributes are added to the class files so that the Microsoft VM will know that the objects represented by these class files are COM objects, not simply Java objects. (This is an example of a Microsoft extension that would render a class file useless on other platforms.)

Identifying COM Objects

Recall that regular Java class files can be imported by a class using statements such as the following:


import java.awt.*;
import sun.tools.debug.*;
import activex.*;

When the Java compiler encounters the import keyword, it examines the CLASSPATH environment variable on the local machine to determine which directories contain Java classes. For these three statements, the class files would be located in the following directories:

$CLASSPATH/java/awt/*
$CLASSPATH/sun/tools/debug/*
$CLASSPATH/activex/*

If the classes are not found in the specified directory, most Java compilers would generate an error. The Microsoft Visual J++ compiler goes a step further, however, to try to locate potential COM objects using the following methodology:

  1. It examines the directories listed in the CLASSPATH environment variable for Java class files.
  2. If no class files are found, these same directories are searched for type library files (*.tlb) or interface definition language files (*.idl).
  3. If none of these files is found, the Windows Registry is examined to retrieve type library information about the referenced objects.

When the COM information is obtained from the type library, interface definition file, or Registry entries, corresponding Java class files are built that allow the COM objects to be used in Java code.
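The search order described above can be modeled as a simple fallback chain. This sketch is purely illustrative and does not reflect how the Visual J++ compiler is actually written: the ImportResolver class, the Source enum, and the idea of passing in sets of file paths and registered library names are all assumptions made so that the example is self-contained.

```java
import java.util.Set;

// Hypothetical model of the three-step lookup; not the compiler's real logic.
final class ImportResolver {
    enum Source { CLASS_FILE, TYPE_LIBRARY_OR_IDL, REGISTRY, NOT_FOUND }

    /**
     * @param packageName         imported package, such as "java.awt"
     * @param classpathFiles      paths relative to a CLASSPATH directory
     * @param registeredLibraries stand-in for type library entries in the Registry
     */
    static Source resolve(String packageName, Set<String> classpathFiles,
                          Set<String> registeredLibraries) {
        String dir = packageName.replace('.', '/') + "/";
        // Step 1: ordinary Java class files.
        if (anyMatch(classpathFiles, dir, ".class")) return Source.CLASS_FILE;
        // Step 2: type libraries or IDL files in the same directories.
        if (anyMatch(classpathFiles, dir, ".tlb") || anyMatch(classpathFiles, dir, ".idl")) {
            return Source.TYPE_LIBRARY_OR_IDL;
        }
        // Step 3: fall back to the Registry.
        if (registeredLibraries.contains(packageName)) return Source.REGISTRY;
        return Source.NOT_FOUND;
    }

    private static boolean anyMatch(Set<String> files, String dir, String suffix) {
        for (String file : files) {
            if (file.startsWith(dir) && file.endsWith(suffix)) return true;
        }
        return false;
    }
}
```

The point of the ordering is that plain Java classes always win; COM information is consulted only when no class file satisfies the import.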

Memory Management of COM Objects

Because Java frees you from having to deallocate memory when objects are no longer used, this same capability must be present for COM objects as well. Fortunately, COM objects support a similar memory management scheme known as reference counting. Reference counting is a COM object's process of maintaining an internal count of how many other objects are currently referencing it. An object's reference count is incremented whenever its AddRef() method is called and decremented whenever its Release() method is called. When the reference count goes to 0, the COM object will free itself. Java differs from this model in that the JVM, not the individual objects, is responsible for tracking which objects are still referenced.

The Microsoft VM performs double duty in this area. It still tracks the usage of all Java objects and frees them when they are no longer used. In addition, it also calls each COM object's Release() method when it detects that another object is no longer using it. When the COM object's reference count goes to 0, it removes itself from memory.
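The reference-counting idea itself is small enough to sketch in Java. The RefCounted class below is hypothetical and only imitates the behavior described here; real COM objects are native code, and the assumption that a new object starts with a count of 1 (held by its creator) is made for this example.

```java
// Hypothetical imitation of COM-style reference counting in plain Java.
final class RefCounted {
    private int count = 1;   // assumption: the creator holds the first reference
    private boolean freed;

    /** Called when another object starts using this one. */
    int addRef() {
        if (freed) throw new IllegalStateException("object already freed");
        return ++count;
    }

    /** Called when a user is finished; the object frees itself at zero. */
    int release() {
        if (freed) throw new IllegalStateException("object already freed");
        if (--count == 0) {
            freed = true; // a real COM object would release its memory here
        }
        return count;
    }

    boolean isFreed() { return freed; }
}
```

The difference from Java's own model shows up in who does the bookkeeping: here each user must remember to call release(), whereas the garbage collector discovers unused Java objects on its own. The Microsoft VM bridges the two by making those calls on the Java programmer's behalf.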

Limitations of Java/COM Integration

One of the primary reasons that Java and COM could be integrated so smoothly and quickly is that Java provides the capability for an object to implement multiple interfaces. As explained in Chapter 16, "Java and the Component Object Model," ActiveX relies heavily on interfaces that objects must implement so that other objects can reuse them. Without the notion of multiple interfaces, it would be very difficult to integrate Java and COM in a transparent manner.

Despite this advantage, Java does differ from COM in some respects. These differences put some restrictions on the level of integration between Java and COM. First, recall that the key to using Java and COM objects together is the COM objects' type library. The Microsoft VM imports the COM type library and subsequently constructs a Java class file. In theory, this is a wonderful concept. However, some COM objects cannot be described within a type library because of limitations of the type library model. Therefore, COM objects that fall into this category cannot be implemented in Java.

Also recall that Java only supports single inheritance of objects. COM supports a type of inheritance known as aggregation: An object can aggregate a set of interfaces obtained from another object and present them as its own. (Note that it did not inherit from these interfaces.) This set of interfaces is known as an aggregate. There can be multiple aggregates within a single COM object. To the Java programmer, this would appear as a single class inheriting methods from multiple classes, which is not allowed. Because this goes against the fundamental concepts of Java, multiple aggregation in imported COM objects is not allowed.

Summary

The true magic of Java is the underlying JVM. It is this virtual machine that converts the platform-independent bytecodes into platform-dependent machine code at runtime. The JVM specification is controlled by Sun Microsystems. This specification lists the capabilities that an implementor must support to produce a compliant virtual machine.

All virtual machines resemble hardware machines in some respects. They all include support for an instruction set, primitive data types, registers, program stack, and object heap.

The instruction set for a JVM contains a large group of instructions that are responsible for loading and unloading objects and their members. In addition, the JVM specification defines in detail the format of the Java class file. Any extensions to the specification in this area can be ignored by virtual machines that do not support these extensions. Therefore, Microsoft's class file extensions that provide information about COM objects will be ignored by other virtual machines that do not support these COM extensions. Unfortunately, it is impossible to predict how each environment will react when these extensions are encountered. Be prepared to witness your ActiveX-enabled Java code crash a browser (such as Netscape Navigator) that does not support ActiveX. (All it takes is one null memory address or a bad pointer reference and the whole browser can come crashing down.)

The Microsoft VM provides the standard functionality described in the JVM specification. In addition, Microsoft also allows Java objects to use COM objects, and vice versa, because the object models of Java and COM are extremely similar. Both support inheritance, multiple interfaces, and dynamically created and linked objects. Java class definitions are compiled and stored as bytecodes within class files; COM objects' definitions are compiled and stored as type libraries. One of the primary tasks of the Microsoft VM is to translate between class files and type libraries so that neither COM nor Java can tell the difference between the other's objects and its own.

For the developer, the advantage of this COM/Java integration is that he or she can easily reuse existing ActiveX controls in Java applications. Java classes can also be included as programming objects in standard Windows programming languages such as Visual Basic, C++, and Delphi. The result of this is that Visual J++ should not be viewed as just a Web development tool. Instead, the Microsoft VM allows Visual J++ to claim the same status and level of support as Microsoft's other flagship products: Visual Basic and Visual C++.