Dynamic Runtime Layer Developer's Guide

Revision History

Disclaimer

1. About this Document

1.1 Purpose

1.2 Intended Audience

1.3 Using This Document

1.4 Conventions and Symbols

2. VM Architecture

2.1 Overview

2.2 About Components

2.3 Major DRL Components

2.4 Data Structures

2.5 Initialization

2.6 Root Set Enumeration

2.7 Finalization

3. VM Core

3.1 Architecture

3.2 Class Support

3.3 Native Code Support

3.4 Stack Support

3.5 Thread Management

3.6 Kernel Classes

3.7 Kernel Class Natives

3.8 VM Services

3.9 Exception Handling

3.10 JVMTI Support

3.11 Verifier

3.12 Utilities

3.13 Public Interfaces

4. JIT Compiler

4.1 Architecture

4.2 Front-end

4.3 Optimizer

4.4 Code Selector

4.5 IA-32 Back-end

4.6 Utilities

4.7 Public Interfaces

4.8 Jitrino.JET

5. Execution Manager

5.1 Architecture

5.2 Recompilation Model

5.3 Profile Collector

5.4 Profiler Thread

5.5 Public Interfaces

6. Garbage Collector

6.1 Architecture

6.2 GC Procedure

6.3 Object Allocation

6.4 Public Interfaces

7. Interpreter

7.1 Characteristics

7.2 Internal Structure

7.3 Support Functions

8. Porting Layer

8.1 Characteristics

8.2 Component Manager

8.3 Public Interfaces

9. Class Libraries

9.1 Characteristics

9.2 Packaging Structure

10. Inter-component Optimizations

10.1 Fast Subtype Checking

10.2 Direct-call Conversion

10.3 Fast Constant-string Instantiation

10.4 Lazy Exceptions

11. References

Revision History

Version          Version Information                                     Date
Initial version  Intel, Nadya Morozova: document created.                November 16, 2005
Version 1.0      Intel, Nadya Morozova: document updated and expanded.   March 2, 2006

Disclaimer and Legal Information

Copyright 2005-2006 The Apache Software Foundation or its licensors, as applicable.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Portions, Copyright © 1991-2005 Unicode, Inc. The following applies to Unicode.

COPYRIGHT AND PERMISSION NOTICE

Copyright © 1991-2005 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html. Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that (a) the above copyright notice(s) and this permission notice appear with all copies of the Data Files or Software, (b) both the above copyright notice(s) and this permission notice appear in associated documentation, and (c) there is clear notice in each modified Data File or in the Software as well as in the documentation associated with the Data File(s) or Software that the data or software has been modified.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder.

2. Additional terms from the Database:

Copyright © 1995-1999 Unicode, Inc. All Rights reserved.

Disclaimer

The Unicode Character Database is provided as is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If this file has been purchased on magnetic or optical media from Unicode, Inc., the sole remedy for any claim will be exchange of defective media within 90 days of receipt. This disclaimer is applicable for all other data files accompanying the Unicode Character Database, some of which have been compiled by the Unicode Consortium, and some of which have been supplied by other sources.

Limitations on Rights to Redistribute This Data

Recipient is granted the right to make copies in any form for internal distribution and to freely use the information supplied in the creation of products supporting the UnicodeTM Standard. The files in the Unicode Character Database can be redistributed to third parties or other organizations (whether for profit or not) as long as this notice and the disclaimer notice are retained. Information can be extracted from these files and used in documentation or programs, as long as there is an accompanying notice indicating the source.

1. About This Document

1.1 Purpose

This document introduces DRL, the dynamic run-time layer, explains basic concepts and terms, and gives an overview of the product's structure and interfaces for inter-component communication. Special focus is given to the virtual machine, DRLVM. Use this document to focus on the DRLVM implementation specifics and to understand the internal peculiarities of the product.

The document describes version 1 of the DRL virtual machine donated in March 2006.

1.2 Intended Audience

The target audience for the document includes a wide community of engineers interested in using DRLVM and in working further with the product to contribute to its development.

1.3 Using This Document

This document consists of several major parts describing the key processes and components of the DRL virtual machine, as follows:

Part 2. VM Architecture provides an overview of the DRL component-based model and describes the major inter-component processes running inside the virtual machine, such as root set enumeration and object finalization. The part also comprises a section about data structures used in DRLVM. You can start with this part to learn about major principles of VM internal operation.

Part 3. VM Core gives an in-depth description of the core virtual machine and its subcomponents responsible for various functions of the virtual machine, including stack walking and thread management.

Part 4. JIT Compiler describes compilation paths and specific features of the DRL just-in-time compiler. Consult this part of the guide for details on optimizations implemented in the DRL JIT compiler and its code generation path.

Part 5. Execution Manager shows the details of the dynamic profile-guided optimization subsystem. In this part, you can find information on method profiles and recompilation logic.

Part 6. Garbage Collector focuses on object allocation and garbage collection processes. This part contains a description of the garbage collector component and of its interaction with the VM core.

Part 7. Interpreter has a description of the interpreter component and its debugging services.

Part 8. Porting Layer gives an overview of platform-dependent functionality used in DRL. The part also includes an overview of the component manager.

Part 9. Class Libraries gives information on the layout and characteristics of the Java* class libraries interacting with the DRL virtual machine.

Part 10. Inter-component Optimizations is devoted to performance-improving operations that involve multiple components.

Part 11. References lists links to external materials supporting this document. These materials include specifications, programming manuals, and articles on specific issues. Consult this part of the document for directions on investigating a specific problem or for alternative ways of implementing specific features.

1.4 Conventions and Symbols

This document uses the unified conventions for the DRL documentation kit.

The table below provides the definitions of all acronyms used in the document.

Acronym Definition
API Application Program Interface
APR Apache Portable Runtime Layer
CFG Control Flow Graph
CG Code Generator
CLI Common Language Infrastructure
DFG Data Flow Graph
DPGO Dynamic Profile-guided Optimizations
DRL Dynamic Run-time Layer
DRLVM Dynamic Run-time Layer Virtual Machine
EE Execution Engine
EM Execution Manager
FP Floating Point
GC Garbage Collector
HIR High-level Intermediate Representation
IR Intermediate Representation
J2SE* Java* 2 Standard Edition
JCL Java* Class Libraries
JIT Just-in-time Compiler
JNI Java* Native Interface
JVM Java* Virtual Machine
JVMTI JVM Tool Interface
LIR Low-level Intermediate Representation
LMF Last Managed Frame
LOB Large Object Block
LOS Large Object Space
OS Operating System
PC Profile Collector
SIMD Single Instruction Multiple Data
SOB Single Object Block
SSA Static Single Assignment
SSE, SSE2 Streaming SIMD Extensions (2)
STL Standard Template Library
TBS Time-based Sampling
TLS Thread Local Storage
TM Thread Manager
VM Virtual Machine, same as JVM in this document

2. VM Architecture

2.1 Overview

The Dynamic Runtime Layer (DRL) is a clean-room implementation of the Java* 2 Platform, Standard Edition (J2SE*) 1.5.0. This Java* run-time environment consists of the virtual machine (DRLVM), and a set of Java* class libraries (JCL). The product is released in open source. The virtual machine is written in C++ code and a small amount of assembly code. This document focuses on the virtual machine, and gives a short overview of the class libraries supporting it.

Key features of DRL include the following:

2.2 About Components

The DRL virtual machine reconciles high performance with the extensive use of well-defined interfaces between its components.

2.2.1 Components, Interfaces, and Instances

A component corresponds to one static or dynamic library, and several libraries linked statically or dynamically at run time make up the managed run-time environment. For details on component linking, see section 2.2.2 Linking Models.

DRLVM components communicate via functional interfaces. An interface is a pointer to a table of function pointers to pure C methods. Interfaces have string names, which unambiguously identify their function table layout. Each component exposes the default interface to communicate with the component manager, and one or more interfaces for communication with other components.

Note

In the current version, only the execution manager uses the component manager. Other components will migrate to this new model in further releases.
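
To illustrate this convention, the sketch below models a functional interface as a named table of pure C function pointers. This is a minimal illustration, not actual DRLVM source; the structure and function names are hypothetical.

typedef struct ExampleInterface {
    /* Every entry is a pointer to a pure C function. */
    const char* (*GetName)(void);  /* string name identifying the table layout */
    int (*DoWork)(void* component_instance, int argument);
} ExampleInterface;

static const char* example_get_name(void) { return "example_interface_v1"; }
static int example_do_work(void* component_instance, int argument) { return argument + 1; }

/* A component hands out a pointer to a statically initialized table. */
static ExampleInterface example_table = { example_get_name, example_do_work };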

DRL can also operate with co-existing component instances, as the Invocation API [7] requires. An instance of a component contains a pointer to its default interface and component-specific data. The porting layer always has exactly one instance, which allows a compiler to in-line calls to the porting layer functions. Other components have the same number of instances as the VM core does.

Background

In Java* programming, components, interfaces, and instances can be described in terms of classes, interfaces and objects. A VM component encapsulates common features, attributes, and properties of virtual machines, and maps to a Java* class. VM interfaces are tables of methods implemented and exposed by the class. If several virtual machines exist in the same address space, they all expose the same interfaces. These VM instances are instances of the VM class, or objects.
The component manager enables explicit creation of component instances by exposing the CreateNewInstance() function, which corresponds to the Java* new operator. Components with only one instance correspond to static class methods in Java*. All components are initialized at load time.

Subsequent sections define each component and provide information on public interfaces, dependencies and other component specifics.

2.2.2 Linking Models

Libraries corresponding to different DRL components are linked by one of the following models:

2.3 Major DRL Components

Figure 1 below displays the major DRL components and their interfaces.

Figure 1. Major DRL Components

Figure 1 demonstrates the DRL Java* virtual machine major components, and the class libraries that support the machine. These components are responsible for the following functions:

Depending on the configuration, you can use multiple execution engine components, for example, an interpreter and an optimizing JIT. Simultaneous use of multiple JIT compilers can provide different trade-offs between compilation time and code quality.

2.4 Data Structures

This section provides an overview of data structures in DRLVM, typical examples of data structures, and the exposed data layout of public data structures.

In DRLVM, all data structures are divided into the following groups:

For example, when compiling an access operation to an instance field, the JIT calls the public VM_JIT interface function to obtain the offset, and uses the result to generate the appropriate load instruction. Another example is the VM core internal representation of a class object.
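
As a hedged sketch of the first example, the compile-time query and the generated load could look like the code below; field_get_offset() and emit_load() are assumed names, not confirmed interface functions, and the typedefs are illustrative.

typedef void* Field_Handle;  /* opaque handle, in the style of the VM interfaces */
typedef int   Register;      /* hypothetical register identifier */

extern unsigned field_get_offset(Field_Handle fh);                   /* assumed query */
extern void emit_load(Register dst, Register base, unsigned offset); /* assumed emitter */

static void compile_getfield(Field_Handle field, Register obj_reg, Register dst_reg)
{
    unsigned offset = field_get_offset(field); /* ask the VM core at compile time */
    emit_load(dst_reg, obj_reg, offset);       /* load from [obj_reg + offset] */
}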

2.4.1 Object Layout

DRLVM exports data structures in accordance with the JNI [5] and JVMTI [4] standards. In addition to these structures, DRLVM shares information about an object layout across its components. In particular, the Java Native Interface does not specify the structure of jobject, and DRLVM defines it as illustrated below.

typedef struct ManagedObject {
  VTable *vt;
  uint32 obj_info;
  /* Class-specific data */
} ManagedObject;
struct _jobject { ManagedObject* object; };
typedef struct _jobject* jobject;

The jobject structure contains the following elements:

Class-specific instance fields immediately follow the vt and obj_info fields. Representation of array instances is shared between the garbage collector and the JIT compiler. The VM core determines the specific offsets to store the array length and the first element of the array. This way, the VM core makes these fields available for the garbage collector and the JIT via the VM interface.
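
The sketch below shows one possible layout of an int[] instance under these rules. It is illustrative only: the actual length and element offsets are whatever the VM core publishes through its interface, and the int32/uint32 typedefs follow the style of the jobject excerpt above.

typedef struct VectorOfInt {
    VTable* vt;           /* or a compressed 32-bit value, see section 2.4.2 */
    uint32  obj_info;     /* object header */
    int32   length;       /* array length, at a VM-core-defined offset */
    int32   elements[1];  /* first element; the rest follow contiguously */
} VectorOfInt;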

Example
The excerpt of code below illustrates the usage of an object structure in DRLVM for the GetBooleanField() JNI function.
typedef jobject ObjectHandle;

jboolean JNICALL GetBooleanField(JNIEnv *env,
                                 jobject obj,
                                 jfieldID fieldID)
{
    Field *f = (Field *) fieldID;
    /* Initialize the class if the field is accessed */
    if (!ensure_initialised(env, f->get_class())) {
        return 0; /* Error */
    }

    ObjectHandle h = (ObjectHandle) obj;
    unsigned offset = f->get_offset(); /* byte offset of the field in the object */

    tmn_suspend_disable();       //-- Do not allow GC suspension --v
    Byte *java_ref = (Byte *)h->object;
    jboolean val = *(jboolean *)(java_ref + offset);
    tmn_suspend_enable();        //--------------------------------^

    return val;
} // GetBooleanField

2.4.2 Compressed References

To decrease the memory footprint on 64-bit platforms [11], direct object and VTable pointers are compressed in the Java* heap to 32-bit values.

To calculate a direct heap pointer, the system adds the heap base pointer to the compressed value from the reference field. Similarly, a direct pointer to an object VTable equals the compressed value stored in the first 32 bits of the object plus the base VTable pointer. This limits the maximum heap size to 4 GB, but significantly reduces the average object size and the working set size, and improves cache performance.
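
In pseudo-C++, the decompression arithmetic amounts to the following sketch; the heap_base and vtable_base parameter names are assumptions.

ManagedObject* uncompress_ref(uint32 compressed, void* heap_base) {
    /* direct pointer = heap base + 32-bit compressed value */
    return (ManagedObject*)((char*)heap_base + compressed);
}

VTable* uncompress_vt(uint32 compressed, void* vtable_base) {
    /* direct VTable pointer = VTable base + first 32 bits of the object */
    return (VTable*)((char*)vtable_base + compressed);
}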

Apart from the basic assumptions about object layout and the VTable cache, all interaction between major DRLVM components is achieved through function calls.

2.5 Initialization

VM initialization is a sequence of operations performed at virtual machine start-up, before execution of user applications. Currently, DRLVM does not support the Invocation API [7], and initialization follows the sequence described below. Subsection 2.5.3 Destroying the VM also describes the virtual machine shutdown sequence.

The main(…) function is responsible for the major stages of the initialization sequence and does the following (see the sketch after this list):

  1. Initializes the logger
  2. Performs the first-pass argument parsing
  3. Creates a VM instance by calling the create_vm() function
  4. Runs the user application by calling the VMStarter.start() method
  5. Destroys the VM instance by calling the destroy_vm() function
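
In outline, the start-up sequence corresponds to the pseudo-C++ sketch below. Only create_vm(), destroy_vm(), and VMStarter.start() are named in the text; the other helpers and all signatures are assumptions.

extern void  init_logger(void);                        /* step 1, name assumed */
extern void  parse_arguments_first_pass(int, char**);  /* step 2, name assumed */
extern void* create_vm(void);                          /* step 3, named in the text */
extern void  run_vmstarter_start(void* vm);            /* step 4: invokes VMStarter.start() */
extern void  destroy_vm(void* vm);                     /* step 5, named in the text */

int main(int argc, char** argv) {
    init_logger();                          /* 1: initialize the logger */
    parse_arguments_first_pass(argc, argv); /* 2: split VM and application arguments */
    void* vm = create_vm();                 /* 3: create and initialize the VM instance */
    run_vmstarter_start(vm);                /* 4: run the user application */
    destroy_vm(vm);                         /* 5: destroy the VM instance */
    return 0;
}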

The subsequent sections describe these initialization stages in greater detail.

2.5.1 First-pass Argument Parsing

At this stage, the VM splits all command-line arguments into the following groups:

The virtual machine then creates the JavaVMInitArgs structure from <vm-arguments>.

2.5.2 Creating the VM

The create_vm() function is a prototype for JNI_CreateJavaVM() responsible for creating and initializing the virtual machine. This function does the following:

  1. For Linux* platforms, initializes the threading system.
    No actions are performed on Windows* platforms. Other steps apply to both operating systems.
  2. Attaches the current thread. This is the first step of the three-step procedure of attaching the thread to the VM. See steps 15 and 19 for further steps of the attaching procedure.
    1. Creates synchronization objects.
    2. Initializes the VM_thread structure and stores the structure in the thread local storage.
  3. Initializes the VM global synchronization locks.
  4. Creates the component manager.
  5. Loads the garbage collector and interpreter libraries.
  6. Initializes basic VM properties, such as java.home, java.library.path, and vm.boot.class.path, according to the location of the VM library.
    The list of boot class path .jar files is hard-coded into the VM library. Use -Xbootclasspath command-line options to change the settings.
  7. Initializes system signal handlers.
  8. Parses VM arguments.
  9. Initializes JIT compiler instances.
  10. Initializes the VM memory allocator.
  11. Initializes the garbage collector by calling gc_init().
  12. Preloads basic API native code dynamic libraries.

    Note

    The vm.other_natives_dlls property defines the list of libraries to be loaded.

  13. Initializes the JNI support VM core component.
  14. Initializes the JVMTI support functionality, loads agent dynamic libraries. At this step, the primordial phase starts.
  15. Attaches the current thread and creates the M2nFrame at the top of the stack (step 2).
  16. Initializes the bootstrap class loader.
  17. Preloads the classes required for further VM operation.
  18. Caches the class handles for the core classes into the VM environment.
  19. Attaches the current thread (step 3).
    1. Creates the java.lang.Thread object for the current thread.
    2. Creates the thread group object for the main thread group and includes the main thread in this group.
    3. Sets the system class loader by calling java.lang.ClassLoader.getSystemClassLoader().
  20. Sends the VMStart JVMTI event. This step begins the start phase.
  21. Sends the ThreadStart JVMTI event for the main thread, then sends the VMInit JVMTI event. At this stage, the live phase starts.
  22. Calls the VMStarter.initialize() method.

2.5.3 Destroying the VM

The destroy_vm() function is a prototype for JNI_DestroyJavaVM() responsible for terminating operation of a VM instance. This function calls the VMStarter.shutdown() method.

2.5.4 VMStarter class

This Java* class supports specific VM core tasks by providing the following methods:

initialize()
Called by the create_vm() function, does the following:
shutdown()
Called by the destroy_vm() function, does the following:
start()
Runs the user application:

2.6 Root Set Enumeration

DRLVM automatically manages the Java* heap by using tracing collection techniques.

2.6.1 About Roots

Root set enumeration is the process of collecting the initial set of references to live objects, the roots. Defining the root set enables the garbage collector to determine the set of all objects directly reachable from all running threads and to reclaim the rest of the heap memory. The set of all live objects includes objects referred to by roots and objects referred to by other live objects. This way, the set of all live objects can be constructed as the transitive closure of the objects referred to by the root set.

Roots consist of:

2.6.2 Black-box Method

In DRLVM, the black-box method is designed to accommodate precise enumeration of the set of root references. The GC considers everything outside the Java* heap as a black box, and has little information about the organization of the virtual machine. The GC relies on the support of the VM core to enumerate the root set. In turn, the VM considers the thread stack as the black box, and uses the services provided by the JIT and interpreter to iterate over the stack frames and enumerate root references in each stack frame.

Enumeration of a method stack frame is best described in terms of safe points and GC maps. The GC map is the data structure for finding all live object pointers in the stack frame. Typically, the GC map contains the list of method arguments and local variables of the reference type, as well as spilled registers, in the form of offsets from the stack pointer. The GC map is associated with a specific point in the method, the safe point. The JIT determines the set of safe points at method compilation time, and the interpreter does this at run time. Typically, call sites and backward branches serve as safe points. During method compilation, the JIT constructs a GC map for each safe point. The interpreter does not use GC maps, but keeps track of object references dynamically, at run time. With the black-box method, the VM has little data on the thread it needs to enumerate, only the register context.
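
As an illustration, a GC map entry could be pictured as below. This is a schematic sketch; the real DRLVM encoding is JIT-specific and more compact.

struct GCMapEntry {
    void*     code_address;    /* the safe point, for example a call site */
    unsigned  num_stack_refs;  /* how many stack slots hold live references */
    int*      stack_offsets;   /* reference slots as offsets from the stack pointer */
    unsigned  register_mask;   /* registers holding references, one bit each */
};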

2.6.3 Enumeration Procedure

When the GC decides to do garbage collection, it enumerates all roots as described below.

  1. The garbage collector calls the VM core function vm_enumerate_root_set_all_threads().

    Note

    Currently, the DRLVM implementation does not support concurrent garbage collectors.

  2. The VM core suspends all threads, see section 3.5.4 Safe Suspension.
  3. The VM core enumerates all the global and thread-local references in the run-time data structures: the VM enumerates each frame of each thread stack.
    For each frame produced by JIT-compiled code, it is necessary to enumerate the roots on that frame and to unwind to the previous frame. For that, the VM calls the functions JIT_get_root_set_from_stack_frame() and JIT_unwind_stack_frame(), as the sketch after this list shows.
    1. The VM identifies the method that owns the stack frame by looking up the instruction pointer value in the method code block tables.
    2. The VM passes the instruction pointer and the stack pointer registers to the JIT compiler.
    3. The JIT identifies the safe point and finds the GC map associated with the code address.
    4. The JIT consults the GC map for the safe point, and enumerates the root set for the frame. For that, the JIT calls the function gc_add_root_set_entry() for each stack location that contains a pointer to the Java* heap [12].
      The interpreter uses its own stack frame format and enumerates the whole thread stack when the interpreter function interpreter_enumerate_thread() is called.
  4. The VM core and the execution engine communicate the roots to the garbage collector by calling the function gc_add_root_set_entry(ManagedObject).

    Note

    The parameter points to the root, not to the object the root points to. This enables the garbage collector to update the root in case it has changed object locations during the collection.

  5. The VM core returns from vm_enumerate_root_set_all_threads(), so that the garbage collector has all the roots and proceeds to collect objects no longer in use, possibly moving some of the live objects.
  6. The GC determines the set of reachable objects by tracing the reference graph. In this graph, Java* objects are vertices, and directed edges connect objects to the objects they reference.
  7. The GC calls the VM function vm_resume_threads_after(). The VM core resumes all threads, and the allocation request that triggered garbage collection can proceed.
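
The per-thread part of step 3 can be summarized with the pseudo-C++ sketch below. The JIT interface functions named in the list are used as described; the context type, the simplified signatures, and the frame-classification helpers are assumptions.

typedef struct FrameContext FrameContext;  /* register context of a suspended thread */
extern int  stack_is_exhausted(FrameContext*);                /* helper name assumed */
extern int  is_jit_frame(FrameContext*);                      /* helper name assumed */
extern void skip_native_frames(FrameContext*);                /* uses the M2nFrame list */
extern void JIT_get_root_set_from_stack_frame(FrameContext*); /* simplified signature */
extern void JIT_unwind_stack_frame(FrameContext*);            /* simplified signature */

void enumerate_thread(FrameContext* context) {
    while (!stack_is_exhausted(context)) {
        if (is_jit_frame(context)) {
            JIT_get_root_set_from_stack_frame(context); /* reports each root via
                                                           gc_add_root_set_entry() */
            JIT_unwind_stack_frame(context);            /* move to the caller's frame */
        } else {
            skip_native_frames(context);
        }
    }
}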

2.7 Finalization

Finalization is the process of reclaiming unused system resources after garbage collection. The DRL finalization fully complies with the specification [1]. The VM core and the garbage collector cooperate inside the virtual machine to enable finalizing unreachable objects.

Note

In DRL, the virtual machine tries to follow the reverse finalization order, so that the object created last is the first to be finalized; however, the VM does not guarantee that finalization follows this or any specific order.

2.7.1 Finalization Procedure

As Figure 2 shows, several queues can store references to finalizable objects:

Figure 2. Finalization Framework (object queues in the VM and the GC)

The garbage collector uses these queues at different stages of the GC procedure to enumerate the root set and kick off finalization for unreachable objects, as follows.

  1. Object Allocation

    During object allocation, the garbage collector places references to finalizable objects into the live object queue, as shown in Figure 3. Functions gc_alloc() and gc_alloc_fast() register finalizable objects with the queue.

    Figure 3. Allocation of Finalizable Objects

  2. After Mark Scan

    After marking all reachable objects, the GC moves the remaining object references to the unmarked objects queue. Figure 4 illustrates this procedure: grey squares stand for marked object references, and white squares are the unmarked object references.

    Figure 4. Unmarked Objects Queue Usage (marked objects go to queue 1, unmarked objects to queue 2)

  3. Filling in the Finalizable Objects Queue

    From the buffering queue, the GC transfers unmarked object references to the VM queue, as shown in Figure 5. To place a reference into the queue, the garbage collector calls the vm_finalize_object() function for each reference until the unmarked objects queue is empty.

    Figure 5. Finalization Scheduling (unmarked objects move to queue 3 of finalizable objects)

  4. Activating the Finalizer Thread

    Finally, the GC calls the vm_hint_finalize() function that wakes up finalizer threads. All finalizer threads are pure Java* threads, see section 2.7.2 Work Balancing Subsystem.
    Each active thread takes one object to finalize and does the following:

    1. Gets a reference to an object from the VM queue
    2. Removes the reference from the queue
    3. Calls the finalize() function for the object

    If the number of active threads is greater than the number of objects, the threads that have nothing to finalize are transferred to the sleep mode, as shown in Figure 6.

    Figure 6. Finalizer Threads (active threads take objects from the finalizable objects queue or sleep)

2.7.2 Work Balancing Subsystem

The work balancing subsystem dynamically adjusts the number of running finalizer threads to prevent an overflow of the Java heap by finalizable objects. This subsystem operates with two kinds of finalizer threads: permanent and temporary. During normal operation with a limited number of finalizable objects, permanent threads can cover all objects scheduled for finalization. When permanent threads are no longer sufficient, the work balancing subsystem activates temporary finalizer threads as needed.

The work balancing subsystem operates in the following stages:

Stage 1: Permanent finalizer threads only
Object allocation starts. Only permanent finalizer threads run. The garbage collector uses the hint counter variable to track finalizable objects, and increases the value of the hint counter by 1 when allocating a finalizable object.
Stage 2: Temporary finalizer activated
The number of objects scheduled for finalization increases, and at some point, the hint counter value exceeds a certain threshold (currently set to 128).
At this stage, the garbage collector calls the vm_hint_finalize() function before performing the requested allocation; this function is also called after each garbage collection. The vm_hint_finalize() function checks whether any objects remain in the queue of objects to finalize. If the queue is not empty, the current number of finalizer threads is not enough, and the work balancing subsystem creates additional temporary finalizer threads. The number of created temporary threads corresponds to the number of CPUs. A sketch of this counter logic appears after the stage descriptions.
The check of the finalizable objects queue state is performed periodically. The number of running temporary threads can be greater than necessary, because the optimum number of finalizer threads is unknown.

Note

The work balancing subsystem checks whether the finalization queue is empty, but does not take into account the number of objects in the queue.

 
Stage 3: Temporary finalizer threads destroyed
At a certain point, there can be more finalizer threads than needed, and the number of objects to finalize starts decreasing. When the number of threads becomes two times greater than the optimal number, the finalizable objects queue should be empty, see the explanation below.
When the finalization queue is empty, temporary threads are destroyed and the work balancing cycle restarts.
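
The allocation-path bookkeeping of stages 1 and 2 reduces to a small amount of counter logic, sketched below in pseudo-C++. Only vm_hint_finalize() and the threshold value of 128 come from the description above; the counter and helper names are assumptions.

extern void vm_hint_finalize(void);     /* named in the text */

static int hint_counter = 0;            /* name assumed */
static const int HINT_THRESHOLD = 128;  /* threshold from the text */

void on_finalizable_allocation(void) {  /* called when a finalizable object is allocated */
    if (++hint_counter > HINT_THRESHOLD) {
        hint_counter = 0;
        vm_hint_finalize();  /* checks the queue; may create temporary threads */
    }
}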

WBS Internals

Assuming that N is the unknown optimum number of finalizer threads, you can make the following conclusions:

If N is less than or equal to the number of permanent finalizer threads, no temporary threads are created. Otherwise, the number of finalizer threads undergoes the following changes during WBS activity, in chronological order:

  1. When the hint counter exceeds the pre-set threshold, and the finalization queue is not empty, the work balance subsystem activates temporary threads as needed.
  2. When the number of temporary threads exceeds N, the number of objects starts decreasing; however, the number of finalizer threads continues to grow. By the time the number of finalizer threads reaches 2N, no objects remain in the queue, because an optimally sized finalization system would have finalized the same number of objects by this time.
  3. When the queue is empty, temporary threads are destroyed, as described in stage 3.

Figure 7 below demonstrates variations in the number of finalizer threads over time.

Figure 7. Variations in Number of Running Finalizer Threads

As a result, the number of running finalizer threads in the current work balancing subsystem can vary between 0 and 2N.

Note

The maximum value for 2N is 256 running finalization threads.

3. VM Core

3.1 Architecture

The core virtual machine is the central part of the overall VM design. The VM core consists of common VM blocks defined by the JVM specification [1] and of elements specific for the DRLVM implementation, as follows:

The structure of the virtual machine enables building stable interfaces for inter-block communication as well as public VM interfaces. These interfaces inherit platform independence from the VM specification [1]. Figure 8 shows the VM core overall structure and the internal logic of component interaction. For more details on available interfaces, see section 3.13 Public Interfaces.

Figure 8. VM Core Components

Red font indicates external interfaces.

3.2 Class Support

The class support component processes classes in accordance with the JVM specification [1], which includes class loading, class preparation, resolution, and initialization operations. This component also contains several groups of functions that other VM components use to get information on loaded classes and other class-related data structures. For example, JVMTI functions RedefineClasses() and GetLoadedClasses() use utility interfaces provided by class support.

The class support component has the following major goals:

3.2.1 Classification of Class Support Interfaces

Class support functions can be divided into the following groups:

Class Loading
Comprises functions for loading classes, searching for loaded classes inside VM structures, and JVMTI class redefinition. The functions obtain bytes from the Java* class libraries via the descendants of the java.lang.ClassLoader class or from the files and directories listed in the vm.boot.class.path property. These functions also bind loaded classes with the defining class loader and provide information on all loaded classes.
Class Manipulation
Provides access to class properties, such as the internal (VM) and the external (Java*) names, access and properties flags, the super-class and the defining class, as well as super interfaces, inner classes, methods, fields, and attributes.
Supports class resolution by resolving symbolic references in the run-time constant pool of the class.
Method Manipulation
Provides access to the properties of the methods of a class, such as the method name, descriptor, signature, access and properties flags, bytecode, local variables information, stack, exceptions, handlers, attributes, and the declaring class.
Functions of this group also enable adding new versions of JIT-compiled methods code and storing service information about compiled code.
Field Access
Contains functions that provide access to the properties of class fields, that is, to the name, descriptor, containing class, and the class of the field.
Type Access
Provides access to generalized information on classes for the JIT compiler and other DRLVM components. These can easily be adapted to work with non-Java* virtual machines, for example, with the ECMA Common Language Infrastructure. Type access functions provide descriptions of both Java* types, such as managed pointers, arrays, and primitive types, and non-Java* types, such as non-managed pointers, method pointers, vectors, unboxed data, and certain unsigned primitive types.

3.2.2 Internal Class Support Data Structures

The VM core stores information about every class, field, and method loaded as described below.

3.3 Native Code Support

The native code support component consists of two parts: execution of native methods used by Java* classes, and an implementation of the Java* Native Interface (JNI) API for native code. Execution of native methods is required by the Java* Language Specification [2], and JNI is required by the JNI specification [5].

3.3.1 Execution of Native Methods

The virtual machine calls native methods differently with the JIT and with the interpreter as described below.

JNI optimizations

The VM core generates specialized JNI wrappers to support the transition from managed to native code. A straightforward implementation of these wrappers calls a function to allocate storage and initialize JNI handles for each reference argument. However, most JNI methods have only a small number of reference parameters. To take advantage of this, an in-lined sequence of instructions allocates and initializes the JNI handles directly. This improves the performance of applications that make many JNI calls.
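
Conceptually, a specialized wrapper for a native method with one reference argument performs the sequence sketched below. Only tmn_suspend_enable(), tmn_suspend_disable(), and the ObjectHandle type appear elsewhere in this guide; the remaining helper names are assumptions.

extern void push_m2n_frame(void);              /* assumed helper */
extern void pop_m2n_frame(void);               /* assumed helper */
extern ObjectHandle inline_alloc_handle(void); /* stands for the in-lined sequence */
extern jint the_native_method(JNIEnv*, ObjectHandle, jint); /* the wrapped native */

jint wrapper_for_native_method(JNIEnv* env, ManagedObject* ref_arg, jint prim_arg)
{
    push_m2n_frame();                       /* record the last managed frame */
    ObjectHandle h = inline_alloc_handle(); /* handle storage and initialization
                                               done by in-lined instructions,
                                               not by a per-argument call */
    h->object = ref_arg;
    tmn_suspend_enable();                   /* native code runs as a safe region */
    jint result = the_native_method(env, h, prim_arg);
    tmn_suspend_disable();
    pop_m2n_frame();
    return result;
}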

3.3.2 JNI Support

The Java* Native Interface is a set of functions that enables native code to access Java* classes, objects, and methods, and all the functionality available to a regular method of a Java* class.

The JNI implementation mostly consists of wrappers to different components of the virtual machine. For example, class operations are wrappers for the class support component, method calls are wrappers that invoke the JIT or the interpreter, and object fields and arrays are accessed directly by using the known object layout.

Example

The following code is the implementation of the IsAssignableFrom() JNI function, which uses the class support interface:

#include "vm_utils.h"
#include "jni_utils.h"

jboolean JNICALL IsAssignableFrom(JNIEnv * UNREF env,
                                  jclass clazz1,
                                  jclass clazz2)
{
    TRACE2("jni", "IsAssignableFrom called");
    assert(tmn_is_suspend_enabled());
    Class* clss1 = jclass_to_struct_Class(clazz1);
    Class* clss2 = jclass_to_struct_Class(clazz2);

    Boolean isAssignable = class_is_subtype(clss1, clss2);
    if (isAssignable) {
        return JNI_TRUE;
    } else {
        return JNI_FALSE;
    }
} //IsAssignableFrom

3.4 Stack Support

3.4.1 About the Stack

The stack is a set of frames created to store local method information. The stack is also used to transfer parameters to the called method and to get back a value from this method. Each frame in the stack stores information about one method. Each stack corresponds to one thread.

Note

The JIT compiler can combine in-lined methods into one for performance optimization. In this case, all combined methods information is stored in one stack frame.

The VM uses native frames related to native C/C++ code and managed frames for Java* methods compiled by the JIT. Interaction between native methods is platform-specific. To transfer data and control between managed and native frames, the VM uses special managed-to-native frames, or M2nFrames.

Note

In the interpreter mode, the VM creates several native frames instead of one managed frame for a Java* method. These native frames store data for interpreter functions, which interpret the Java* method code step by step.

M2nFrames

M2nFrames contain the following:

3.4.2 Stack Walking

Stack walking is the process of going from one frame on the stack to another. Typically, this process is activated during exception throwing and root set enumeration. In DRLVM, stack walking follows different procedures depending on the type of the frame triggering iteration, as described below.

The system identifies whether the thread is in a managed or in a native frame and follows one of the scenarios described below.

Figure 11 below gives an example of a stack structure containing M2nFrames and managed frames.

Figure 9. Stack Walking from a Managed Frame

Figure 10. Stack Walking from a Native Frame

Figure 11. LMF List after the Call to a Native Method

The main component responsible for stack walking is the stack iterator.

3.4.3 Stack Iterator

The stack iterator enables moving through a list of native and Java* code frames. The stack iterator performs the following functions:

3.4.4 Stack Trace

The stack trace component converts stack information obtained from the iterator and transfers this data to the org.apache.harmony.vm.VMStack class.

Note

One frame indicated by the iterator may correspond to more than one line in the stack trace because of method in-lining (see the first note in About the Stack).

3.5 Thread Management

The thread management component provides threading functionality inside the virtual machine and the class libraries. The purpose of thread management is to hide platform specifics from the rest of the VM, and to adapt the OS threading functions to the Java* run-time environment. For example, thread management enables root set enumeration by making threads accessible for the garbage collector.

3.5.1 Interaction with Other Components

Thread management is used by the following components:

Note

The thread management code is currently written using a restricted set of the Win32 threading API. On Linux*, each Win32 thread management function is replaced with an appropriate adaptor implemented by using the POSIX threading API. This makes the thread management code portable across Windows* and Linux* platforms.

3.5.2 Structure

The central part of thread management is the VM_thread control structure, which holds all the data necessary to describe a Java* thread within the virtual machine. Instances of the VM_thread control structure are referred to as thread blocks in the code. All thread blocks, that is, all active Java* threads running inside the VM, are kept in a linked list. The list is traversed by means of iterators that the thread_manager.cpp module provides. Currently, the number of threads simultaneously running in the system is limited to 800. Each Java* thread in the system gets a unique index called thread_index, within the range of 0 to 2048, at creation time. The maximum of 2048 has been selected to ensure that the index range does not restrict the underlying system capabilities.

Note

Your system might exceed the approximate maximum number of 800 threads.
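
A traversal of the thread block list might look like the sketch below; the structure shape and the iteration code are illustrative only, since the actual iterators live in thread_manager.cpp.

struct VM_thread_sketch {           /* illustrative shape of a thread block */
    unsigned thread_index;          /* unique index assigned at creation time */
    struct VM_thread_sketch* next;  /* link in the thread block list */
};

void for_each_thread_block(struct VM_thread_sketch* head,
                           void (*process)(struct VM_thread_sketch*))
{
    for (struct VM_thread_sketch* block = head; block != NULL; block = block->next) {
        process(block);  /* each block describes one active Java* thread */
    }
}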

3.5.3 Thread Creation and Completion

At this time, no single API in thread management is responsible for the thread creation. However, the code that creates a thread does the following:

  1. Allocates a VM_thread structure with the help of the get_a_thread_block() function.
  2. Creates a new thread by using the appropriate system call.
  3. In the newly started thread, puts a reference to the VM_thread structure into the thread local storage.

The newly created thread is responsible for correct completion. The completion code is added to the thread procedure. The thread completion code does the following:

  1. If an exception has remained uncaught in the thread, calls the handler for uncaught exceptions.
  2. Invokes the notify_all() function on the java.lang.Thread object associated with the thread. This call activates any threads that have invoked the join() function for this thread.
  3. De-allocates the VM_thread structure by using the free_this_thread_block() function.

3.5.4 Safe Suspension

One of the key features of thread management in DRLVM is the safe suspension functionality. Safe suspension means that the thread is physically stopped only at certain safe points or at the end of safe regions. This ensures that the suspended thread holds no locks associated with operating system resources, such as the native memory heap. Safe suspension helps to avoid possible deadlocks in the DRL virtual machine, as described below.

At any moment of time, any thread is in one of the following states:

The suspension algorithm typically involves two threads, the suspender thread and the suspendee thread.

Functions Used

A safe region of code in a suspendee thread is marked by the functions tmn_suspend_enable() and tmn_suspend_disable(). Additionally, the code is marked by the thread_safe_point() function to denote a point where safe suspension is possible. A suspender thread can invoke the thread_suspend_generic() or thread_resume_generic() function, supplying the suspendee thread block as an argument. The thread_suspend_generic() function handles safe suspension when called on a thread as shown below, and the thread_resume_generic() function instructs the suspendee thread to wake up and continue execution.

Suspension Algorithm

This section describes the algorithm of safe suspension, as follows:

  1. The suspender thread calls the thread_suspend_generic() function, which sets a flag for the suspendee thread indicating a request for suspension.
  2. Depending on the state of the suspendee thread, one of the following mechanisms is activated:
    1. If the suspendee thread is currently running in a safe code region, the thread_suspend_generic() function immediately returns. The suspendee thread runs until it reaches the end of the safe region. After that, the thread is blocked until another thread calls the thread_resume_generic() function for it.
    2. If the suspendee thread is currently in an unsafe region, the thread_suspend_generic() function is blocked until the suspendee thread reaches the beginning of a safe region or a safe point. The thread state then changes, and the mechanism described in point a) above starts.
  3. The suspendee thread undergoes the following:
    1. A thread, while executing Java* code, periodically calls the thread_safe_point() function. In case of a suspension request set for this thread, this function notifies the requesting thread and waits until another thread calls the thread_resume_generic() function for this thread. In other words, the thread suspends itself at a safe point upon request.
    2. Once it enters a safe region, the thread calls the tmn_suspend_enable() function. This function sets the suspend_enabled state flag to true. In case a suspension request is set for this thread, the function notifies the requesting thread that a safe region is reached.
    3. When the thread leaves a safe region, it calls the tmn_suspend_disable() function. This function sets the suspend_enabled state flag to false and invokes the thread_safe_point() function.
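
The sketch below shows how the suspendee-side functions from this section are typically placed in code. Only tmn_suspend_enable(), tmn_suspend_disable(), and thread_safe_point() come from the description above; the surrounding bodies and helpers are illustrative.

extern void tmn_suspend_enable(void);
extern void tmn_suspend_disable(void);
extern void thread_safe_point(void);
extern void wait_for_external_event(void);  /* hypothetical blocking call */
extern int  has_more_work(void);            /* hypothetical */
extern void do_unit_of_work(void);          /* hypothetical */

void blocking_vm_helper(void) {
    tmn_suspend_enable();       /* enter a safe region: no VM locks are held */
    wait_for_external_event();
    tmn_suspend_disable();      /* leave the safe region; acts as a safe point */
}

void long_running_operation(void) {
    while (has_more_work()) {
        do_unit_of_work();
        thread_safe_point();    /* suspend here if a request is pending */
    }
}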

3.5.5 Monitors

Monitors are a central part of Java* thread synchronization. Any Java* object can serve as a monitor. In DRLVM, monitor-related information is kept in the header of the Java* object structure. An object header is 32 bits long and has the following structure:

Figure 12. Monitor Structure (bit distribution in the object header)

Where:

Note

In the current implementation, the contention bit is always set to 1.

Acquiring Monitors

The monitor_enter() and monitor_exit() functions manage monitors in accordance with the JNI specification [5].

The monitor_enter() operation for a specific object is shown below.

First, stack_key in the header of the object is examined. The header can be in one of the three states, which determine further actions, as follows.

Free: stack_key contains FREE_MONITOR
The current thread index is stored in stack_key. Read and update operations for stack_key are performed atomically to prevent possible race conditions.
Occupied by the current thread
The recursion count increases.
Occupied by another thread
The current thread does the following in a loop, until the monitor is successfully acquired:
  1. Puts the object into the mon_enter_array array, which holds a queue of contending threads waiting for the monitor to be released
  2. Waits (schedules off the CPU) until the thread holding the monitor sends the event_handle_monitor notification informing that the monitor has been released
  3. Attempts to acquire the monitor again

Note

The waiting thread is excluded from the system scheduling and does not get the CPU resources for its execution.
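
The state machine described above can be summarized with the following pseudo-C++ sketch. FREE_MONITOR, mon_enter_array, and event_handle_monitor come from the text; the accessor and helper names, and the FREE_MONITOR value, are assumptions.

#define FREE_MONITOR 0  /* value assumed for illustration */

extern unsigned get_stack_key(ManagedObject*);              /* accessor assumed */
extern int      cas_stack_key(ManagedObject*, unsigned, unsigned); /* atomic CAS */
extern unsigned current_thread_index(void);
extern void     increment_recursion_count(ManagedObject*);
extern void     enqueue_in_mon_enter_array(ManagedObject*); /* contender queue */
extern void     wait_for_event_handle_monitor(void);        /* sleep until release */

void monitor_enter(ManagedObject* object) {
    for (;;) {
        unsigned key = get_stack_key(object);
        if (key == FREE_MONITOR) {
            /* Atomic compare-and-swap prevents races on acquisition. */
            if (cas_stack_key(object, FREE_MONITOR, current_thread_index()))
                return;                            /* monitor acquired */
        } else if (key == current_thread_index()) {
            increment_recursion_count(object);     /* re-entrant acquisition */
            return;
        } else {
            enqueue_in_mon_enter_array(object);    /* join the contender queue */
            wait_for_event_handle_monitor();       /* sleep until release */
        }
    }
}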

During the monitor_exit() operation, the current thread does the following based on the recursion_count value:

3.6 Kernel Classes

The VM kernel classes link the virtual machine with the Java* class libraries (JCL), and consist of the Java* part and the native part. This section describes the Java* part of the kernel classes, whereas the native part is described in section 3.7 Kernel Class Natives.

Kernel classes are Java* API classes, members of which use or are used by the virtual machine. Because these classes have data on the VM internals, the kernel classes are delivered with the VM. Examples of kernel classes include java.lang.Object and java.lang.reflect.Field.

The current implementation is based on the Harmony Class Library Porting Documentation [20]. The DRL kernel classes have amendments to the porting documentation, as indicated in section 3.6.2 Implementation Specifics below.

3.6.1 Kernel Classes and VM

In DRLVM, the kernel classes communicate with the virtual machine through a Java* interface defining a strict set of static native methods implemented in the VM. The interface mainly consists of four package-private classes, java.lang.VMClassRegistry, java.lang.VMExecutionEngine, java.lang.VMMemoryManager, and java.lang.VMThreadManager, and two public classes, java.lang.Compiler and org.apache.harmony.vm.VMStack.

3.6.2 Implementation Specifics

This section describes the specifics of the kernel classes implementation in DRL.

  1. The set of kernel classes is extended with the java.lang.System class due to its close connection with the VM and the other kernel classes.
  2. The DRL implementation provides java.lang.String and java.lang.StringBuffer classes due to package private dependencies between them.
  3. The class java.util.concurrent.locks.LockSupport has been added to the kernel classes set to support J2SE 1.5.0.
  4. The DRL implementation of the java.lang.Class.getStackClasses() method does not completely correspond to the Harmony Porting Documentation. This method adds two frames to the bottom of the resulting array when stopAtPrivileged is specified, so that the caller of the privileged frame is the last included frame.
  5. DRLVM does not support the shutdown procedure as described in the specification. Namely, the com.ibm.oti.vm.VM.shutdown() method is not called upon VM shutdown.

3.7 Kernel Class Natives

The kernel class natives component is the part of the kernel classes (see section 3.6 Kernel Classes) serving as a bridge between the Java* part of the kernel classes and other VM components, namely, the garbage collector, class support, stack support, exception handling, and object layout support. The kernel class natives component also makes use of the thread management functionality. The interaction between the kernel classes and VM components is based on specific internal interfaces of the virtual machine.

Note

The current implementation of kernel class natives is based on JNI and uses JNI functions. As a result, kernel class natives functions are exported as ordinary native methods from the VM executable as specified by the JNI specification [5]. For example, when the VMThreadManager Java* class from the kernel classes component defines the method native static Thread currentThread(), the kernel class natives component implements the function Java_java_lang_VMThreadManager_currentThread().
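
Following the naming rule from the note, the exported function for the currentThread() example might be declared as below; the declaration is illustrative, derived from the standard JNI mangling scheme.

#include <jni.h>

/* static native methods receive the declaring class, not an instance */
extern "C" JNIEXPORT jobject JNICALL
Java_java_lang_VMThreadManager_currentThread(JNIEnv* env, jclass clazz);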

3.7.1 Structure

Currently, the kernel class natives component consists of the following:

3.8 VM Services

The VM services component provides the JIT compiler with functionality requiring close cooperation with the virtual machine. Below is the list of the VM services currently provided for the JIT.

3.8.1 Compile-time Services

During compilation time, the JIT compiler uses the following services:

Certain services are part of the class support interface, for example, type management and class resolution. For details, see section 3.2 Class Support.

3.8.2 Run-time Services

JIT-compiled code accesses the following groups of services at run time:

Both service types are described below.

Services with M2nFrame

The following services are called in the JNI-like way:

These services enable suspension in their code and push an M2nFrame onto the top of the stack; the frame stores the initial stack state for exceptional stack unwinding and local references for root enumeration purposes.

Services without M2nFrame

The following frequently used services are invoked without pushing an M2nFrame on the stack:

These services prevent thread suspension in their code. Most direct-call functions that implement operations or cast types only return the required result. Storing a reference to an array uses another convention: the function returns NULL on success, or the class handle of an exception to throw.

3.9 Exception Handling

The exceptions interface handles exceptions inside the VM. Exception handling can follow different paths depending on the execution engine mode, as indicated in subsequent sections.

The exceptions interface includes the following function groups:

In DRLVM, two ways of handling exceptions are available: exception throwing and raising exceptions, as described below.

3.9.1 Throwing Exceptions

Procedure

When an exception is thrown, the virtual machine tries to find the exception handler provided by the JIT and registered for the specified kind of exception and for the specified code address range. If the handler is available, the VM transfers control to it, otherwise, the VM unwinds the stack and transfers control to the previous native frame.

When to apply

In Java* code, only exception throwing is used, whereas in internal VM native code, raising an exception is also an alternative. Exception throwing is usually faster than raising exceptions because, with exception throwing, the VM transfers control directly by means of the stack unwinding mechanism.

3.9.2 Raising Exceptions

Procedure

When the VM raises an exception, a flag is set that an exception occurred, and the function exits normally. This approach is similar to the one used in JNI [5].

When to apply

Raising exceptions is used in internal VM functions during JIT compilation of Java* methods, in the interpreter, and in the Java* Native Interface in accordance with the specification [5]. This usage is especially important at start-up when no stack has been formed.

3.9.3 Choosing the Exception Handling Mode

DRLVM provides the following facilities to set the exception handling mode:

To get the current exception, use exn_get(); to check whether an exception is raised, use exn_raised(). The latter function returns only a boolean flag, not the exception object. This technique saves VM resources because the system does not need to create a new copy of the exception object.
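
A typical raising-convention call site, using exn_raised() as described above, could look like the following sketch; the surrounding function and do_vm_work() are illustrative.

extern void do_vm_work(void);  /* hypothetical call that may raise an exception */
extern int  exn_raised(void);  /* named in the text */

void some_internal_operation(void) {
    do_vm_work();
    if (exn_raised()) {   /* boolean check: no exception object is copied */
        return;           /* exit normally; the caller performs the same check */
    }
    /* ... continue normal processing ... */
}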

Note

Remember that in the interpreter mode, the VM can only raise exceptions, and not throw them.

3.10 JVMTI Support

In DRLVM, the JVMTI support component implements the standard JVMTI interface responsible for debugging and profiling.

The DRLVM implementation of JVMTI mostly consists of wrapper functions, which request service and information from other VM parts, such as the class loader, the JIT, the interpreter, and the thread management functionality.

Another part of JVMTI implementation is written for service purposes, and comprises agent loading and registration, events management, and API extensions support.

3.10.1 JVMTI functions

The JVMTI support component is responsible for the following groups of operations:

3.11 Verifier

According to the JVM specification [1], the verifier is activated during class loading, before the preparation stage, and consequently, before the start of class initialization. Verification of a class consists of the following passes:

  1. Verifying the class file structure
  2. Checking class data, that is, the logical structure of the data
  3. Verifying bytecode instructions for the methods of the class
  4. Linking the class at run time (handled by the resolver)

Subsequent sections present specifics of verification performed in DRLVM.

3.11.1 Optimized Verification Procedure

The current version of the verifier is optimized to minimize the performance impact of time-consuming bytecode verification. The improved verification procedure is described below.

Stage 1: When checking methods of a class, the verifier scans dependencies on other classes, methods, and fields. The verifier checks this information immediately only if the referenced element is already loaded. For the handling of unloaded elements, see Stage 2.

Stage 2: The verifier generates a list of constraints to be checked during the next stage. Constraints contain information on verification checks that cannot be performed because referenced elements have not been loaded. The verifier stores the list of constraints in the checked class data.

Stage 3: Before class initialization, the verifier goes over the list of previously generated constraints. If all constraints are satisfied, verification of the class completes successfully and initialization of the class begins.

The verifier releases the constraints data when the class is unloaded.

3.11.2 Verifications Classification

For optimization purposes, all verification procedures have been divided into the following groups:

Background

The control flow graph (CFG) is a data structure, which is an abstract representation of a procedure or program. Each node in the graph represents a basic block without jumps or jump targets. Directed edges represent jumps in the control flow.

The data flow graph (DFG) is a graph reflecting data dependencies between code instructions of a procedure or program. The data flow graph provides global information about how a procedure or a larger segment of a program manages its data.

Note

In addition, a group of classes is declared as trusted. The verifier skips these classes to minimize performance impact. The group of trusted classes mostly includes system classes.

Back to Top

3.12 Utilities

This layer provides common general-purpose utilities. The main requirements for these utilities are platform independence of DRLVM interfaces, thread safety, and stack unwinding safety. The following two main subcomponents constitute the utilities layer:

Note

This section describes VM utilities. For information on the compiler utilities, consult section 4.6 Utilities.

The utilities layer has the following key features:

3.12.1 Memory Management

This interface is responsible for allocating and freeing the memory used by other components. The current implementation provides two types of memory allocation mechanisms:

Memory management functionality is concentrated in port/include/port_malloc.h and port/include/tl/memory_pool.h.
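
The pool mechanism follows the common arena pattern: allocations are bump-pointer fast, and all memory is released at once when the pool is destroyed. The sketch below illustrates the idea only; the actual class and method names in tl/memory_pool.h differ:

#include <cstddef>
#include <cstdlib>

class MemoryPool {                      // illustrative arena, not the DRLVM class
    struct Chunk { Chunk* next; };      // chunk header; payload follows in memory
    Chunk*      chunks = nullptr;       // list of owned chunks
    char*       cursor = nullptr;       // bump pointer into the current chunk
    std::size_t remaining = 0;

public:
    void* alloc(std::size_t size) {
        size = (size + 7) & ~std::size_t(7);          // 8-byte alignment
        if (size > remaining) {                       // start a new chunk
            std::size_t chunk_size = size > 4096 ? size : 4096;
            Chunk* c = (Chunk*)std::malloc(sizeof(Chunk) + chunk_size);
            c->next = chunks; chunks = c;
            cursor = (char*)(c + 1);
            remaining = chunk_size;
        }
        void* p = cursor;
        cursor += size; remaining -= size;
        return p;
    }
    ~MemoryPool() {                                   // free everything at once
        while (chunks) { Chunk* n = chunks->next; std::free(chunks); chunks = n; }
    }
};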

3.12.2 Logger

The current logging system is based on the Apache log4cxx logger adapted to DRLVM needs by adding a C interface and improving the C++ interface. The port/include/logger.h header file describes the pure C programmatic logger interface. The cxxlog.h header file in the same directory contains a number of convenience macros that make the logger more effective to use from C++ code.

Each logging message has a header that may include its category, timestamp, location, and other information. Logging messages can be filtered by category and by logging level. You can use specific command-line options to configure the logger and make maximum use of its capabilities. See the launcher help message (ij -X) for details on the logger command-line options.

Back to Top

3.13 Public Interfaces

The VM core exports the following public interfaces:

Back to Top

3.13.1 VM Common Interface

The VM common interface is exported by the VM core for interaction with the JIT compiler and the garbage collector. This interface includes a large set of getter functions used to query properties of classes, methods, fields, and object data structures required by DRLVM components. Other functions of this interface do the following:

Back to Top

3.13.2 VM_JIT Interface

This VM core interface supports just-in-time compilation. Functions of this interface do the following:

Note

The VM core also exports the GC interface function table to support GC-related operations of the JIT, such as root set enumeration. For details, see section 6.4 Public Interfaces in the GC description.

For a description of functions that the JIT compiler exports to communicate with the VM core, see section 4.7.1 JIT_VM.

Back to Top

3.13.3 VM_EM Interface

The VM_EM interface of the VM core supports high-level management of execution engines. Functions of this interface do the following:

For a description of functions that the execution manager exports to interact with the VM core, see 5.5.1 EM_VM Interface.

Back to Top

3.13.4 VM_GC Interface

The VM core uses this interface to communicate with the garbage collector. The garbage collector also interacts with the just-in-time compiler via indirect calls to the VM_GC interface.

On the VM side, most functions of this interface are used for root set enumeration and for stopping and resuming all threads. To implement stop-the-world collections, DRLVM currently provides the functions vm_enumerate_root_set_all_threads() and vm_resume_threads_after().
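
A hedged sketch of a stop-the-world cycle built on the two functions named above; trace_and_reclaim() is a hypothetical placeholder for the actual collection work:

extern "C" void vm_enumerate_root_set_all_threads();  // suspends threads, reports roots
extern "C" void vm_resume_threads_after();            // resumes the suspended threads

void stop_the_world_collection() {
    vm_enumerate_root_set_all_threads();
    // trace_and_reclaim();   // hypothetical: mark, sweep, compact
    vm_resume_threads_after();
}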

Note

The current implementation does not support concurrent garbage collectors. However, DRLVM has been designed to make implementation of concurrent garbage collectors as convenient as possible.

For details on the VM_GC interface, see section 6.4 GC Public Interfaces.

Back to Top

3.13.5 VM Interpreter Interface

This interface enables the interpreter to use the VM core functionality. Functions of the interface do the following:

Back to Top

3.13.6 C VM Interface

The C VM interface together with the kernel classes is responsible for interaction between the VM and the class libraries. The C VM interface is used by the native part of the Java* class libraries (JCL) implementation as a gateway to different libraries, such as the port library, the VM local storage, and the zip cache pool. This component is required for the operation of JCL; see the Harmony Class Library Porting documentation [20].

C VM Specifics

The C VM interface has the following limitations:

The C VM interface is built as a separate dynamic library. This library is statically linked with the pool, zip support, and thread support libraries.

Back to Top

4. JIT Compiler

Jitrino is the code name for the just-in-time (JIT) compilers shipped with DRLVM [3]. Jitrino comprises two distinct JIT compilers: the Jitrino.JET baseline compiler and the optimizing compiler, Jitrino.opt. The two compilers share common source code and are packaged in a single library. This section is mostly devoted to the Jitrino.opt compiler. For details on the baseline compiler, see section 4.8 Jitrino.JET.

The optimizing JIT compiler features two intermediate representation (IR) types: the platform-independent high-level IR (HIR) and the platform-dependent low-level IR (LIR). Jitrino incorporates an extensive set of code optimizations for each IR type. This JIT compiler has a distinct internal interface between the front-end operating on HIR and the back-end operating on LIR, which enables easy re-targeting of Jitrino to different processors while preserving all optimizations done at the HIR level.

Key features of the JIT compiler include:

Jitrino also features:

Jitrino is notable for its clear and consistent overall architecture and a strong global optimizer, which runs high-level optimizations and deals with single or multiple methods instead of basic blocks.

In the current implementation, certain Jitrino features are implemented only partially or not enabled, namely:

Back to Top

4.1 Architecture

The Jitrino compiler provides a common strongly typed substrate, which helps developers optimize code distributed for Java* run-time environments and adapt it to different hardware architectures with a lower risk of errors. The architecture of the compiler is organized to support this flexibility, as illustrated in Figure 13. Paths connect the Java* and ECMA Common Language Interface (CLI) front-ends with every architecture-specific back-end, and propagate type information from the original bytecode to the architecture-specific back-ends.

For extensibility purposes, the Jitrino compiler contains language- and architecture-specific parts and language- and architecture-independent parts described in subsequent sections of this document. As a result, supporting a new hardware architecture requires implementation of a new back-end.

4.1.1 Compilation Paths

To optimize time spent on compilation, the Jitrino compiler can follow a fast or a slow compilation path. In most applications, only a few methods consume the majority of time. Overall performance benefits when Jitrino aggressively optimizes these methods.

At the initial compilation stage, methods are translated into machine code by the baseline compiler Jitrino.JET, which constitutes the Jitrino express compilation path. This compiler performs a very fast and simple compilation and applies no optimizations. Jitrino.JET also generates instrumentation counters to collect the run-time profile, which the execution manager later uses to determine whether recompilation is necessary. The main Jitrino compilation engine recompiles only hot methods.

4.1.2 Compilation Process

The process of compilation in Jitrino.opt follows a single path, as shown in Figure 13 below.

  1. The run-time environment bytecode is translated into the high-level IR by the individual front-ends for each supported source language. Currently, Jitrino incorporates the Java* bytecode translator, and provides the high-level IR and optimizer support for the Java* and Common Language Interface (CLI) bytecode. The language- and architecture-independent part comprises the HIR and the optimizer.
  2. After optimization, architecture-specific code generators translate HIR into architecture-specific intermediate representations, perform architecture-specific optimizations, register allocation, and finally emit the generated native code.

JIT Architecture

Figure 13. Jitrino Compiler Architecture

The subsequent sections provide an overview of the compiler subcomponents. Section 4.3 Optimizer provides details on the Jitrino high-level optimizations.

Back to Top

4.2 Front-end

The initial compilation step is the translation of bytecode into HIR, which goes in the following phases:

  1. The bytecode translator establishes the basic block boundaries and exception handling regions, and recovers type information for variables and operators. At this phase, the translator generates type information for variables and virtual Java* stack locations, similarly to the bytecode verification algorithm defined by the JVM specification [1].
  2. The bytecode translator generates the HIR and performs simple optimizations, including constant and copy propagation, folding, devirtualization and in-lining of method calls, elimination of redundant class initialization checks, and value numbering-based redundancy elimination [15].

Jitrino HIR, which is the internal representation of a lower level than the bytecode, breaks down complex bytecode operations into several simple instructions to expose more opportunities to later high-level optimization phases. For example, loading an object field is broken up into operations that perform a null check of the object reference, load the base address of the object, compute the address of the field, and load the value at that computed address.
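
The following hedged C++ analogy mirrors the four steps of that decomposition; the names Object, FIELD_OFFSET, and throw_null_pointer() are illustrative, not the actual HIR opcodes:

struct Object;                                  // opaque managed object
const int FIELD_OFFSET = 8;                     // hypothetical field offset
void throw_null_pointer();                      // hypothetical helper

int load_field(Object* obj) {
    if (obj == nullptr) throw_null_pointer();   // 1. null check of the reference
    char* base = (char*)obj;                    // 2. load the base address
    int*  addr = (int*)(base + FIELD_OFFSET);   // 3. compute the field address
    return *addr;                               // 4. load the value at that address
}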

Back to Top

4.3 Optimizer

The optimizer includes a set of compiler components independent of the original Java* bytecode and the hardware architecture. This part of the compiler comprises the high-level intermediate representation, the optimizer itself, and the architecture-independent part of the code selector. The code selector has a distinct interface level to set off the architecture-dependent part.

4.3.1 High-Level Intermediate Representation

Jitrino uses the traditional high-level intermediate representation, where the control flow is represented as a graph consisting of nodes and edges. The compiler also maintains dominator and loop structure information on HIR for use in optimization and code generation. HIR represents:

Explicit modeling of the exception control flow in the control flow graph (CFG) enables the compiler to optimize across throw-catch boundaries. For locally handled exceptions, the compiler can replace expensive throw-catch combinations with cheaper direct branches.

In more detail, each basic block node consists of a list of instructions, and each instruction includes an operator and a set of static single assignment (SSA) operands. The SSA form provides explicit use-def links between operands and their defining instructions, which simplifies and speeds up high-level optimizations. Each HIR instruction and each operand carry detailed type information that is propagated to the back-end at further compilation stages.
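
The SSA idea can be pictured in plain C++, where each name is assigned exactly once, so every use points to a single definition (a hedged illustration, not Jitrino HIR syntax):

int ssa_example(int a, int b) {
    const int x1 = a + b;    // first definition of x
    const int x2 = x1 + 1;   // a redefinition becomes a new name
    const int y1 = x2 * 2;   // the use-def link from y1 to x2 is unambiguous
    return y1;
}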

Back to Top

4.3.2 Optimizer

The Jitrino compiler uses a single optimization framework for Java* and CLI programs. The optimizer applies a set of classical object-oriented optimizations balancing the effectiveness of optimizations with their compilation time. Every high-level optimization is represented as a separate transformation pass over the HIR. These passes are grouped into four categories:

Note

In the current version, high-level optimizations are disabled by default. You can enable these via the command-line interface.

The optimization passes performed during compilation of a method constitute an optimization path. Each optimization pass has a unique string tag used internally to construct the optimization path represented as a character string. The default optimization path can be overridden on the command-line.

Note

Many optimizations can use dynamic profile information for greater efficiency, such as method and basic block hotness and branch probability. However, dynamic profile information creation is not currently enabled in Jitrino.

HIR Simplification Passes

The HIR simplification passes are a set of fast optimization passes that the Jitrino optimizer performs several times on the intermediate representation to reduce its size and complexity. Simplification passes improve the code quality and the efficiency of more expensive optimizations. The IR simplification consists of three passes:

  1. Propagation and folding performs constant, type, and copy propagation, and simplifies and folds expressions. For example, when a reference is defined by a new allocation, it is proven non-null, and the optimizer omits the run-time null check [15].
  2. Unreachable and dead code cleanup eliminates unreachable code by testing reachability via traversal from the control flow graph entry, and dead code by using a sparse liveness traversal over SSA-form use-def links [15].
  3. Fast high-level value numbering eliminates common sub-expressions [16]. This pass uses an in-order depth-first traversal of the dominator tree instead of the more expensive iterative data flow analysis done by traditional common sub-expression elimination. High-level value numbering effectively eliminates redundant address computation and check instructions. For example, chkzero(), chknull(), and chkcast() HIR instructions are redundant if guarded by explicit conditional branches.

Together, the IR simplification passes constitute a single cleanup pass performed at various points in the optimization process.
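
As a hedged illustration of value numbering, the two functions below show a redundant computation and its eliminated form; both additions in before() receive the same value number, so the optimizer can rewrite the code as in after():

int before(int a, int b) {
    int x = a + b;
    int y = a + b;      // receives the same value number as x
    return x * y;
}

int after(int a, int b) {
    int t = a + b;      // computed once
    return t * t;       // redundant computation eliminated
}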

Scope Enhancement Passes

The high-level optimization begins with a set of transformations to enhance the scope of further optimizations, as follows:

  1. Normalization of control flow by removing critical edges, and factoring entry and back edges of loops, which facilitates loop peeling, redundancy elimination passes and other operations.

    Note

    A critical edge is an edge from a node with multiple successors to a node with multiple predecessors.

  2. Loop transformations, including loop inversion, peeling, and unrolling.

    Note

    Loop peeling in combination with high-level value numbering provides a cheap mechanism to hoist loop-invariant computation and run-time checks.

  3. Guarded devirtualization of virtual method calls to reduce their run-time costs and to enable the compiler to in-line their targets.
    Given exact type information, this IR simplification pass can convert a virtual call into a more efficient direct call. When exact type information is not available, the most probable target of the virtual call can be predicted, and the scope enhancement pass devirtualizes the call by guarding it with a cheap run-time test that verifies that the predicted method is in fact the target, as sketched below.
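
The following minimal C++ analogy shows the shape of such a guard; the class names are made up, and the real JIT compares VTable pointers rather than using typeid:

#include <typeinfo>

struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() {}
};
struct Circle : Shape {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area() const override { return 3.14159 * r * r; }
};

double guarded_area(const Shape* s) {
    if (typeid(*s) == typeid(Circle)) {                        // cheap run-time guard
        return static_cast<const Circle*>(s)->Circle::area();  // direct, inlinable call
    }
    return s->area();                                          // fallback virtual dispatch
}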

Inliner

The central part of the scope enhancement passes is the inliner, which removes the overhead of a direct call and specializes the called method within the context of its call site. In-lining is an iterative process built around other scope enhancement and IR simplification passes. In-lining goes as follows:

  1. Inliner performs scope enhancement and IR simplification transformations on the original intermediate representation:
    1. Examines each direct call site in the IR, including those exposed by guarded devirtualization.
    2. Assigns a weight to the call by using heuristic methods.
    3. Registers the call in a priority queue if it exceeds a certain threshold.
  2. Inliner selects the top candidate, if any, for in-lining.
  3. Translator generates an intermediate representation for the method selected for in-lining.
  4. Inliner processes the new IR for further in-lining candidates and updates the priority queue, merges the new IR with the existing IR, selects a new candidate, and repeats the cycle.

The inliner halts when the queue is empty or when the IR reaches a certain size limit. When in-lining is completed, the optimizer performs a final IR simplification pass over the entire intermediate representation.

Redundancy Elimination Passes

The final set of optimization passes comprises optimizations to eliminate redundant and partially redundant computations and includes loop-invariant code motion and bounds-check elimination [15].

Bounds-check Elimination
The Jitrino optimizer uses demand-driven array bounds-check elimination analysis based on the ABCD algorithm [17]. Unlike the original ABCD algorithm, the Jitrino bounds-check elimination does not construct a separate constraint graph, but uses the SSA graph directly to derive constraints during an attempted proof.
The optimizer also handles symbolic constants to enable check elimination in more complicated cases. Check elimination transformations track the conditions used to prove that a check can be eliminated. At the code selector stage, the selector propagates the information about these conditions to the code generator, which can use it for generation of speculative loads.
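
A hedged illustration of the kind of redundancy this analysis removes; the array access pattern is the usual textbook case, not code from Jitrino:

int sum(const int* a, int len) {
    int s = 0;
    for (int i = 0; i < len; ++i) {
        // a per-access check "0 <= i && i < len" would be redundant here:
        // both facts follow from the loop bounds, which is exactly what an
        // ABCD-style analysis proves on the SSA graph
        s += a[i];
    }
    return s;
}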

Back to Top

4.4 Code Selector

The code selector translates the high-level intermediate representation to a low-level intermediate representation (currently, to the IA-32 representation). The component is designed so that code generators for different architectures can be plugged into the compiler. To be pluggable, a code generator (CG) must implement distinct code selector callback interfaces for each structural entity of a method. During code selection, the selector uses the callback interfaces to translate these entities from HIR to LIR.

Code selection is based on the HIR hierarchical structure of the compiled method illustrated by Figure 14.

Code Selector Workflow

Figure 14. Code Selector Structure

Grey indicates callback interfaces.

For every non-leaf structural element in Figure 14, the code selector defines:

Each class of the code selector defines a genCode() function, which takes the callback of this class as an argument. Every function in a callback receives a code selector class instance corresponding to the lower-level element of the method hierarchy. This way, control is transferred between the optimizer part of the code selector and the CG part.
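
The layered callback scheme can be sketched as follows; the interface and method names are illustrative, not the actual Jitrino declarations:

struct InstructionCallback {                // implemented on the code generator side
    virtual void select(/* one HIR instruction */) = 0;
    virtual ~InstructionCallback() {}
};

struct BlockCodeSelector {                  // optimizer side, one per basic block
    void genCode(InstructionCallback& cg) {
        // for each HIR instruction in this block, hand control to the CG:
        //     cg.select(instruction);
        (void)cg;
    }
};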

Back to Top

4.5 IA-32 Back-end

The Jitrino IA-32 code generator has the following key features:

4.5.1 Limitations

Code Generation Passes

The list below describes the passes of the code generation process used in the IA-32 code generator. The order of the passes mostly corresponds to the default code generation path. Each entry gives the pass tag, its full name, the implementing classes, and a description.

selector (code selector; classes MethodCodeSelector, CfgCodeSelector, InstCodeSelector)
  Builds the LIR based on the information from the optimizer. Based on the common code lowering framework.

i8l (8-byte instruction lowerer; class I8Lowerer)
  Performs additional lowering of operations on 8-byte integer values into processor instructions.

bbp (back-branch polling, that is, insertion of safe points; class BBPollingTransformer)
  Inserts safe points into loops to ensure that threads can be suspended quickly at GC-safe points.

gcpoints (GC safe points info; class GCPointsBaseLiveRangeFixer)
  Performs initial data flow analysis and changes the LIR for proper GC support: creates a mapping of interior pointers (pointers to object fields and array elements) to base pointers (pointers to objects) and extends the liveness of base pointers when necessary.

cafl (complex address form loader; class ComplexAddrFormLoader)
  Translates address computation arithmetic into IA-32 complex address forms.

early_prop (early propagation; class EarlyPropagation)
  Performs fast copy and constant propagation. This pass is fast and simple, though very conservative.

native (translation to native form; class InstructionFormTranslator)
  Performs a trivial transformation of LIR instructions from their extended form to their native form.

constraints (constraints resolver; class ConstraintsResolver)
  Checks the constraints that instructions impose on their operands and splits operands that cannot be used simultaneously in all the instructions they appear in.

dce (dead code elimination; class DCE)
  Performs simple liveness-based one-pass dead code elimination.

bp_regalloc-GP, bp_regalloc-XMM (bin-pack register allocator; class RegAlloc2)
  Performs bin-pack global register allocation for general-purpose and XMM registers.

spillgen (spill code generator; class SpillGen)
  Acts as the local register and stack allocator and the spill code generator. Similar to the constraints resolver, but addresses constraints caused by dependencies between operands; for example, this pass is used when an instruction allows only one memory operand. The pass requires no reserved registers, but tries to find register usage holes or to evict an already assigned operand from a desired register.

layout (code layout; class Layouter)
  Performs linearization of the CFG using topological, top-down, or bottom-up algorithms. In the current code drop, the default code layout algorithm is topological [21].

copy (copy pseudo-instruction expansion; class CopyExpansion)
  Converts copy pseudo-instructions into real instructions. Currently, the CG describes instructions that copy operands as copy pseudo-instructions and expands them once all operands are assigned to physical locations. This facilitates building other transformations.

stack (stack layout; class StackLayouter)
  Creates prolog and epilog code using register usage information, the calling convention of the method, and the stack depth required for stack-local operands. Also initializes the persistent stack information used in run-time stack unwinding.

emitter (code emitter; class CodeEmitter)
  Produces several streams: the stream of executable code for the method, and a data block with FP constant values and target offset tables for switch instructions. Also registers with the VM the exception try and catch regions for run-time handling, and direct calls for patching.

si_insts (StackInfo instruction registrar; class StackInfoInstRegistrar)
  Traverses call instructions to collect information on the stack layout at call sites and completes the persistent stack information.

gcmap (GC map creation; class GCMapCreator)
  Creates the GC map comprising the sets of locations with base and interior pointers that represent the GC root set for each call site.

info (creation of the method info block; class InfoBlockWriter)
  Serializes the persistent stack information and the GC map in the VM as an information block for later use during run-time stack iteration, exception throwing, and GC root set enumeration.

Back to Top

4.6 Utilities

The utilities used by the JIT compiler major components include:

Note

The JIT compiler utilities are similar to, but not identical with the VM utilities. For example, the JIT compiler and the VM core use different loggers.

4.6.1 Logging System

The Jitrino logging system facilitates debugging and detection of performance bottlenecks. The system is organized as a set of log categories structured into two hierarchical category trees, for compile-time and run-time logging. Run-time logging covers activities such as root set enumeration and stack iteration.

A log category corresponds to a particular compiler component, such as the front-end or the code generator, and has a number of levels providing different logging details. Figure 15 illustrates logging categories and levels, namely:

In the logging system, built-in timers measure the time spent by a particular compiler component or a code transformation pass during compilation.

Log Categories Hierarchy

Figure 15. Log Categories Trees

In DRL, all logging is off by default. To enable logging for a category on the command line, use the -Xjit LOG option and assign the particular level to it. For example, you can enable debug-level logging for the optimizer component and IR dumping for the code generator by using the following option:
-Xjit LOG=\"opt=debug,cg=ir,singlefile\"

On the command line, you can assign logging levels independently to the categories in different sub-trees. When redirected to the file system, separate logs for different threads are created unless the singlefile mode is specified.

CFG Visualization

Debugging the JIT requires information on how the control flow graph of the compiled method changes between compilation stages, including instructions and operands. For that, the Jitrino compiler enables generation of dot files representing the control flow graph at both IR levels. The text dot files can be converted into descriptive pictures that represent the CFG graphically; a variety of graph visualization tools are available for dot file conversion.

Back to Top

4.7 Public Interfaces

This section describes the interfaces that the JIT compiler exports to communicate with other components. The Jitrino compiler exposes all necessary interfaces to work as a part of the run-time environment. Jitrino explicitly supports precise moving garbage collectors requiring the JIT to enumerate live references.

4.7.1 JIT_VM Interface

Functions inside the JIT_VM interface can be grouped into the following categories:

Note

Root set enumeration and stack unwinding are run-time routines called only during execution of compiled code.

Bytecode Compilation

Functions in this set are responsible for the primary JIT compiler task of running just-in-time compilation to produce native executable code from a method bytecode. A request to compile a method can come from the VM core or the execution manager.

Root Set Enumeration

This set of functions supports the garbage collector by enumerating and reporting live object references. The JIT compiler uses these functions to report locations of object references and interior pointers that are live at a given location in the JIT-compiled code. The object references and interior pointers constitute the root set that the GC uses to traverse all live objects. The interface requires reporting locations of the values rather than the values, to enable a moving garbage collector to update the locations while moving objects.

Note

Unlike reference pointers, which always point to the object’s header, interior pointers point to a field inside the target object. If the JIT reports an interior pointer without the corresponding reference pointer, the burden is on the GC to reconstruct the reference pointer.

For more information, see sections 2.6 Root Set Enumeration and 6. Garbage Collector.

Stack Unwinding

The virtual machine requires support from the compiler to perform stack unwinding, that is, an iteration over the stack from a managed frame to the frame of the caller.

To facilitate stack walking, the JIT stack unwinding interface does the following:

For more information about the stack, see section 3.4 Stack Support.

JVMTI Support

The set of JIT functions responsible for JVMTI support is exported for interaction with the VM JVMTI component. These functions do the following:

The VM can request the JIT to compile a method and to support generation of specific JVMTI events in compiled code. To facilitate these actions, additional parameters are passed to the bytecode compilation interface.

For a description of functions that the VM core exports to interact with the JIT compiler, see section 3.13 Public Interfaces.

4.7.2 JIT_EM Interface

The JIT compiler exports this interface to support the execution manager. Functions of this set are responsible for the following operations:

For a description of the functions that the execution manager exports to interact with the JIT compiler, see section 5.5.2 EM_JIT Interface.

Back to Top

4.8 Jitrino.JET

The Jitrino.JET baseline compiler is the Jitrino subcomponent used for translating Java* bytecode into native code with practically no optimizations. The compiler emulates the operations of a stack-based machine using a combination of the native stack and registers.

Jitrino.JET performs two passes over the bytecode, as shown in Figure 16: during the first pass, it establishes basic block boundaries, and during the second, it generates native code.

Two-pass conversion from bytecode to native code.

Figure 16: Baseline Compilation Path

Subsequent sections provide a description of these passes.

4.8.1 Baseline Compilation: Pass 1

During the first pass over the method’s bytecode, the compiler finds basic block boundaries and counts references for these blocks.

Note

The reference count is the number of ways to reach a basic block (BB).

To find basic block boundaries, Jitrino.JET does a linear scan over the bytecode and analyzes instructions, as follows:

During the first pass, the compiler also finds the reference count for each block. Jitrino.JET then uses the reference count during code generation to reduce the number of memory transfers.

Example
Figure 17 illustrates an example with reference counts. The reference count ref_count for the second basic block (BB2) equals 1 because this block can only be reached from the first basic block (BB1). The reference count for the third basic block equals 2, because it can be reached as a branch target from BB1 or by fall-through from BB2.

Example of reference counts for basic blocks reached in different ways.

Figure 17: Basic Blocks Reference Count

Back to Top

4.8.2 Baseline Compilation: Pass 2

During the second pass, Jitrino.JET performs the code generation by doing the following:

  1. Walks over the basic blocks found at Pass 1 in the depth-first search order
  2. Mimics Java* operand stack
  3. Generates code for each bytecode instruction
  4. Matches the native code layout and the bytecode layout
  5. Updates relative addressing instructions, such as CALL and JMP.

During code generation, Jitrino.JET performs register allocation by using an original technique called the virtual cyclic register cache (vCRC), described below.

Back to Top

4.8.3 Virtual Cyclic Register Cache

As mentioned in the introduction, Jitrino.JET simulates stack-based machine operations with the aid of the virtual cyclic register cache (vCRC).

Because all operations involve only the top of the operand stack, keeping it in registers significantly reduces the number of memory access operations and improves performance. Even a small number of registers helps: for example, the average stack depth for methods executed in the SpecJVM98 benchmark is only 3.

vCRC: Basic Algorithm

This section is an overview of the major idea behind vCRC.

As shown in Figure 18, the position of the item is counted from the bottom of the stack and does not change over time. In contrast to the position, the depth of an item is counted from the top of the stack and is a relative value.

Operands on the Stack with their position and depth indicated

Figure 18: Depth and Position on the Operand Stack

The position provides the following mapping between a register array and operand positions on the operand stack:

POSITION % NUMBER_OF_REGISTERS => Register#

A simple tracking algorithm detects overflows and loads or unloads registers when necessary.
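
The mapping above can be pictured with the following small C++ sketch; the register array matches the example in Figure 19, and the helper names are illustrative:

#include <cstdio>

static const char* const REGISTERS[3] = {"EAX", "EBX", "ECX"};

const char* reg_for_position(int position) {
    return REGISTERS[position % 3];   // POSITION % NUMBER_OF_REGISTERS => Register#
}

int main() {
    for (int pos = 0; pos < 5; ++pos)   // push five operands onto the operand stack
        std::printf("position %d -> %s\n", pos, reg_for_position(pos));
    // positions 3 and 4 wrap around to EAX and EBX; the tracking algorithm
    // spills the previous contents of those registers to memory first
}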

Example
Figure 19 illustrates an example where the first 3 operations occupy the 3 registers allocated to keep the top of the operand stack. The next operation, iconst_4, has no register available, which causes the first item, int32 (1), to be spilled to memory. The register EAX is then free and can store the new item.

REGISTERS[3] = {EAX, EBX, ECX}
When more registers are required than are available, a register is freed to store another operand.

Figure 19: Saving Operand to Memory, Register Re-used

With this algorithm, the topmost items are always in registers regardless of the number of operands stored on the stack, and the system does not need to access memory to operate with the top of the stack.

Back to Top

vCRC: Virtual Stacks

The basic algorithm cannot be applied directly. For example, on IA-32, storing a 64-bit floating-point double value in general-purpose registers would needlessly occupy 2 registers, and operations on floating-point values held in general-purpose registers are non-trivial and mostly useless.

Technically speaking, vCRC tracks positions for operands of different types as if they were in different virtual stacks. Different virtual operand stacks are mapped onto different sets of registers, as follows:

Figure 20 provides an example of different operand types on the operand stack for an IA-32 platform.

Different Operand Types tracked separately on the operand stack

Figure 20: Virtual Operand Stacks

Back to Top

4.8.4 Java* Method Frame Mimic

During the code generation phase, the state of the method stack frame is mimicked as follows:

When code generation for a basic block starts, the reference count determines the actions taken, as indicated in the table below.

Reference Count  Actions
ref_count > 1    Local variables are taken from memory. The top of the stack is expected to be in the appropriate registers.
ref_count = 1    Information on the local variables state and the stack state is inherited from the previous basic block.
ref_count = 0    Dead code; this case must never occur.

Back to Top

4.8.5 Run-time Support for Generated Code

To support run-time operations, such as stack unwinding, root set enumeration, and mapping between bytecode and native code, a specific structure, the method info block, is prepared and stored for each method during Pass 2.

At run time, special fields are also pre-allocated on the native stack of the method to store GC information, namely the stack depth, stack GC map, and locals GC map.

The GC map shows whether the local variables or the stack slots contain an object. The GC map for local variables is updated on each defining operation with a local slot, as follows:

The GC map for the stack is updated only at GC points, that is, before an instruction that may lead to a GC event, for example, a call to a VM helper. The stack depth and the stack state calculated during method compilation are saved before the invocation: code is generated to save this state.

Back to Top

5. Execution Manager

The execution manager (EM) is the central part of the DRLVM dynamic optimization subsystem. Dynamic optimization is the process of modifying compilation and execution parameters of a system at run time. Optimization of compiled code may result in recompilation of managed method code. In this system, the execution manager makes optimization decisions based on the profiles collected by profile collectors. Every profile contains specific optimization data and is associated with the method code compiled by a particular JIT.

The key functions of the execution manager are the following:

The features of the DRL execution manager include the following:

5.1 Architecture

The VM core creates the execution manager before loading an execution engine. Depending on the configuration, the execution manager initializes execution engines and profile collectors.

During JIT compiler instantiation, the execution manager provides the JIT with a name and a run-time JIT handle. The JIT can use this name to distinguish its persistent settings from settings of other execution engines. The compiler can also use the handle to distinguish itself from other JIT compilers at run time.

The EM also configures the JIT to generate a new profile or to use an existing profile via the profile access interface. This interface enables access to profile collectors and their custom interfaces. Every profile collector uses its properties to check whether the JIT that generates a profile and the JIT that will use the generated profile are profile-compatible compilers. For example, for the edge profile, the profile collector can check compatibility using the feedback point and IR-level compatibility properties. During its initialization, the JIT compiler accepts or rejects profile collection and usage.

Interaction between Execution Manager, JIT, and VM

Figure 21. Execution Manager Interfaces

In the figure, several blocks of the same type identify instances of the same component, as in the case with profile collectors and JIT compilers. For details on interfaces displayed in the figure, see section 5.5 EM Public Interfaces.

Back to Top

5.2 Recompilation Model

The recompilation chain is the central entity of the EM recompilation model. This chain can connect multiple profile-compatible JIT compilers into a single recompilation queue. To compile a method for the first time, the execution manager calls the first JIT compiler in the chain. After profiling information about the method is collected, the next JIT in the chain is ready to recompile the method applying more aggressive optimizations. Data from the method profile can be used during recompilation to adjust custom optimization parameters.

If multiple recompilation chains co-exist at run time, the EM selects the appropriate recompilation chain to initially compile a method. Method filters associated with chains can configure the execution manager to use a specific chain for method compilation. Method filters can identify a method by its name, class name, signature or ordinal compilation number.

Within this model, the execution of a method goes as follows:

  1. The virtual machine calls the EM to execute a method.
  2. The execution manager uses method filters to select the appropriate recompilation chain.
  3. The execution manager instructs the first JIT in the chain to compile a method.
  4. After the method is compiled, the virtual machine proceeds with its execution.
  5. The EM checks whether the method is hot. For hot methods, the EM initiates recompilation by the next JIT in the compilation chain.

Note

A method is hot when a profile associated with it satisfies specific parameters in the PC configuration settings. For example, for an entry and back-edge profile collector, these parameters are the entry and back-edge counters' limits. When a counter value reaches the limit, the method becomes hot.

Back to Top

5.3 Profile Collector

The profile collector (PC) is the execution manager subcomponent that collects profiles for Java* methods compiled by the JIT or executed by the interpreter. The DRL EM instantiates and configures profile collectors according to the settings of its configuration file.

The profile collector can collect method profiles only for the methods compiled by the same JIT. To collect the same type of profile information for methods compiled by different JIT compilers, the EM uses different PC instances.

After the PC collects a method profile, subsequent JIT compilers in the recompilation chain can reuse this profile. An execution engine is allowed to use a method profile only if the configuration file indicates that this JIT can use the profile. The EM defines the JIT role, that is, configures the JIT compiler to generate or to use a specific profile, in the file include/open/em.h using the following format:

enum EM_JIT_PC_Role {
    EM_JIT_PROFILE_ROLE_GEN = 1,   /* the JIT generates the profile */
    EM_JIT_PROFILE_ROLE_USE = 2    /* the JIT uses the generated profile */
};

With this model, instances of the compiler work independently of each other at run time. The JIT compiler can always use the PC handle to access the profile data that is assigned to be collected or to be used by this JIT compiler.
The profile collector does not trigger method recompilation. Instead, the PC notifies the execution manager that a method profile is ready according to a configuration passed from the EM during profile collector initialization. After that, the EM initiates recompilation of the method, if necessary.

5.4 Profiler Thread

To check readiness of a method profile and to recompile hot methods, the execution manager requires a special thread created by the VM core. This thread must be an ordinary Java* thread, because method compilation may result in execution of JIT-compiled code during class resolution or side-effect analysis.

The execution manager uses the recompilation thread created by the VM after loading all core classes and before executing the main method. The EM configures this thread to call back at a specified time interval. During this callback, the EM can check profiles and run method recompilation as required.

Back to Top

5.5 Public Interfaces

The execution manager interacts with the virtual machine and JIT compilers by using specific interfaces. In addition to these external interfaces, the execution manager uses the internal interface to communicate with profile collectors.

5.5.1 EM_VM Interface

The execution manager exports this interface to provide the VM with method compilation and execution functions. The virtual machine sends requests to the EM to execute a method. For that, the VM passes the method handle and parameters to the execution manager. The EM selects the JIT for compiling the method and runs method compilation and execution.

5.5.2 EM_JIT Interface

The execution manager exports this interface to enable JIT compilers to access method profiles. Via this interface, the JIT can gain access to a method profile or to the instance of the profile collector assigned to this JIT during initialization. The major part of EM_JIT is the profile access interface. By using this interface, the JIT compiler can access a custom profiler interface specific to a family of profile collectors and then interact directly with a specific profile collector.

5.5.3 Internal EM interfaces

The internal EM interfaces handle interaction between the execution manager and the profile collector. Via the time-based sampling (TBS) support interface, the EM registers time-based sampling callbacks and configures the thresholds of method profiles. The profile collector checks through this interface, or through internal sampling, whether the thresholds of method profiles have been reached. When a method profile is ready, the PC reports to the execution manager via the profile-related events interface.

Back to Top

6. Garbage Collector

The garbage collector (GC) component is responsible for allocation and reclamation of Java* objects in the heap. The garbage collector uses tracing techniques to identify unreachable objects and automatically reclaims the objects that cannot be reached and thus cannot influence the program behavior. The VM can allocate new objects in the space recycled by the GC.

This component interacts with the VM core, the JIT compiler and the interpreter, the thread management functionality, and JIT-compiled code. The GC contacts the VM core to access data on the internal structure of objects, and uses several assumptions about data layout (see section 6.1.3 Data Layout Assumptions).

6.1 Architecture

When the heap memory is exhausted, the garbage collector instructs the VM core to safely suspend all managed threads, determines the set of root references [13], performs the actual collection, and then resumes the threads.

Note

The root set is the set of all pointers to Java* objects and arrays that are on the Java* thread stacks and in static variables. These pointers are called root references. See [13] for a detailed description of fundamentals of tracing garbage collection.

The garbage collector relies on the VM core to enumerate the root set. The VM core enumerates the global and thread-local references in the run-time data structures. The VM delegates the enumeration of the stack further to the execution engine, the JIT compiler or the interpreter. The GC then determines the set of reachable objects by tracing the reference graph.

To improve efficiency of heap tracing and object allocation, space in the VTable is reserved to cache frequently accessed GC information. In the current design, the GC caches the object field layout as the list of offsets for the fields of reference types in the VTable.

For details on garbage collection, see section 6.3 GC Procedure.

6.1.1 Block Structure of the Heap

The garbage collector divides the managed heap into 128-KB blocks and stores the mark tables in the first 4 kilobytes of each block. The mark table contains one bit for each possible object location in the block. Because objects are aligned on a 4-byte boundary, any 4-byte aligned address can be the start of an object; that is why a mark bit is allocated for each 4-byte aligned location in the block.

At the start of garbage collection, the mark tables are filled with zeroes. During heap trace, the garbage collector identifies all live objects and inserts value 1 in the mark tables for these objects only. The mark bit location is computed from the object address by subtracting the block start address and shifting to the right by 2 bits. Later, the garbage collector uses the mark table to reclaim space taken up by unreachable objects. Other GC data, such as free lists, is also stored in the first page of the block. See the definition of the block_info structure in gc/src/gc_header.h for more details.
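
The mark-bit index computation just described can be sketched as follows; the real code lives in gc/src/gc_header.h, and the function name here is illustrative:

#include <cstddef>

std::size_t mark_bit_index(const void* obj, const void* block_start) {
    // objects are 4-byte aligned, so one mark bit covers each 4-byte slot
    return (std::size_t)((const char*)obj - (const char*)block_start) >> 2;
}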

Objects greater than 62 KB are placed in a special kind of block: single-object blocks, allocated contiguously to fit the object. The part of the last block that results from the 128-KB alignment remains unused.

The collection of blocks used for large block allocation is the large object space (LOS).

Back to Top

6.1.2 Dynamic Heap Resize

In DRLVM, the size of the garbage collected heap can vary depending on system needs. The heap has three size characteristics: the current and the maximum heap size, and the committed size.

Heap Size

This parameter represents the amount of memory the GC uses for Java* object allocation. During normal operation, the GC starts garbage collection when the amount of allocated space reaches the heap size.

You can specify the initial value of heap size using the -Xms command-line option.

The default initial size is 64 MB. The GC can decide to change the current size after a garbage collection under the following conditions:

If at least one condition is true, the garbage collector increases the heap to the minimum size that eliminates the condition provided that the new heap size does not exceed the maximum value. Additional space is committed immediately on heap resize to ensure that the required amount of physical memory is available.

Maximum Heap Size

The garbage collector reserves the virtual memory corresponding to the maximum size at VM startup. You can specify the maximum size using the command-line option -Xmx. The default value is 256 MB.

Committed Size

This parameter indicates the physical memory used by the heap. The committed size equals zero at startup and grows dynamically as the garbage collector allocates new blocks in the heap memory. When the committed size reaches the current heap size, the allocation mechanism triggers garbage collection. The committed size grows block by block as the GC takes new memory blocks from the block store.

On Windows*, the function VirtualAlloc(…, MEM_COMMIT, …) performs the commit operation. On Linux*, this operation is a no-op because the operating system commits pages automatically on first access. The commit-as-you-go behavior ensures the smallest possible footprint when executing small applications.

Back to Top

6.1.3 Data Layout Assumptions

The GC interface includes an implicit agreement between the VM core and the garbage collector regarding the layout of certain data in memory. The garbage collector makes assumptions about object layout, described below in terms of the ManagedObject data type, so that it can load the VTable of an object without calling VM core functions.

6.1.4 GC Worker Threads

On multiprocessor machines, the GC can take advantage of multiple processors by parallelizing computationally intensive tasks. For that, the garbage collector uses special C worker threads. The GC creates worker threads during the initialization sequence, one worker thread for each available processor.

Most of the time, worker threads sleep waiting for a task. The GC controlling thread assigns tasks to worker threads by sending an event. Once a task is received, the worker threads start their activity. While the workers are busy, the GC controlling thread waits for completion of the task. When all worker threads complete the assigned task, the controlling thread resumes execution, and the worker threads get suspended waiting for a new task.

Back to Top

6.2 GC Procedure

In the current implementation, the garbage collector uses the mark-sweep-compact stop-the-world collection mechanism [18], [19]. The sections below provide details on garbage collection in DRLVM at each stage of the process.

  1. Triggering garbage collection

    Garbage collection can be triggered by exhaustion of memory in the current heap or forced by the System.gc() method.

  2. Obtaining the GC lock

    Each user thread that has determined that a collection needs to be performed tries to obtain the GC lock by calling the vm_gc_lock_enum() function. Only one thread succeeds and becomes the GC controlling thread. Other threads remain blocked at the vm_gc_lock_enum() call, that is, remain suspended in gc_alloc(), until the collection completes. After a thread obtains the GC lock, it checks whether another thread has already performed the collection.

  3. Enumerating the root set

    After the controlling thread gets the lock and checks that the collection is still necessary, the thread resets GC global data by clearing the root set array and reference lists. Next, the thread calls the vm_enumerate_root_set_all_threads() function to make the core virtual machine enumerate the root heap pointers. In response, the VM suspends all threads and enumerates the root set by using the functions gc_add_root_set_entry(), gc_add_compressed_root_set_entry(), and others. The garbage collector records the root pointers in the root set array as the enumeration proceeds.

  4. Verification trace

    If the collector is built with debugging code enabled, a verification trace is performed after the full root set enumeration. This operation verifies that the root set and the heap are in a consistent state, that is, that all pointers into the Java* heap point to valid Java* objects and no dangling pointers exist.
    The garbage collector marks traced objects using an object header bit instead of the regular mark tables in order to prevent interference with regular GC operation. The verification trace procedure uses the eighth bit (0x80) of the object lockword (the obj_info word) for marking. At the end of the verification trace, the GC prints the number of objects found to be strongly reachable during the trace operation.

  5. Mark scan

    During the mark scan stage, the GC identifies all objects reachable from the root set. The GC maintains the mark stack, a stack of objects reached during tracing but not yet scanned. The GC repeatedly removes an object from the mark stack and scans all reference fields of that object. For each reference field, the GC tries to mark the object that the field points to and, if the mark operation succeeds, adds the object pointer to the mark stack. The GC continues scanning objects from the mark stack until it is empty.

    Mark scan activity runs in parallel on worker threads, one thread per processor. Each worker thread atomically grabs one root from the root set array and traces all objects reachable from that root by using its own private mark stack. Worker threads mark objects by using an atomic compare-and-swap operation on the corresponding byte in the mark table. A failed mark operation indicates that another worker thread has marked the object earlier and is responsible for scanning it; a minimal sketch of this atomic mark appears after this list.

    Finalizable Objects and Weak References

    By the end of the mark scan operation, all strongly reachable objects are marked, and the garbage collector deals with collected reference objects and weak roots. Complying with the Java* API specification [6], the GC maintains the strength of weak references by going through soft, weak, and then phantom reference objects in this specific order.

    Note

    The current implementation treats soft references as weak references, and clears soft referents when they become softly reachable.

    To ensure that finalizable objects are not reclaimed before the finalize() method is run, the GC uses the finalizable queue as an additional root set and restarts the mark scan process to transitively mark objects reachable from finalizable objects. This increases the number of live objects during collection. For details on handling finalizable objects in the DRL virtual machine, see section 2.7 Finalization.

    Note

    According to the JVM specification [1], objects are only included into the finalizable queue during object allocation and not after they are scheduled for finalization.

    The finalize() method is not run during garbage collection. Instead, the GC puts finalizable objects to a separate queue for later finalization.

  6. Reclamation of unmarked objects

    Once all the references and finalizable objects have been visited, the GC has a list of all reachable objects. In other words, any object that remains unmarked at this stage can safely be reclaimed.

    Reclamation by Compaction and Sweep

    Reclamation of unmarked objects can be performed in two ways: compaction and sweep. Compaction is preferable because it improves object locality and keeps fragmentation at a low level. However, compaction covers neither objects declared pinned during enumeration nor large objects, because of the high cost of copying.

    Compaction is performed block-wise in several stages. All blocks that contain pinned objects and blocks with large objects are excluded, see the files include/open/gc.h and include/open/vm_gc.h for more information. During compaction, the GC does the following:

    1. Calculates the new locations of compacted objects and installs forwarding pointers in the object headers in the obj_info word.
      Non-zero values of obj_info are saved. The GC sets the new locations by sliding live objects to the lower addresses in order to maintain allocation ordering.

      Note

      The pointer to the new object location, which is written in the old object copy, is the forwarding pointer.

    2. Updates all slots pointing to compaction areas by using the forwarding pointers.
      The GC uses the lists of slots collected beforehand during the mark scan phase.
    3. Copies the objects to their target destinations.
    4. Restores the headers overwritten with forwarding pointers.

    Sweep is performed on blocks that were not compacted and objects in the large object space. The GC scans the array of mark bits to find sufficiently long zero sequences. The garbage collector ignores zero sequences that represent less than 2 KB of the Java* heap. Longer free areas are linked to the free list structure and used for object allocation. The GC does not perform the sweep operation during the stop-the-world pause. Instead, the GC sweeps the block just before using it for allocating new objects.

  7. Final stage of garbage collection

    At this stage, the object reclamation is complete. In the debugging mode, the GC performs the second verification trace and can optionally print the number of live objects. The GC provides the virtual machine with the list of objects scheduled for the finalize() method run and references scheduled for an enqueue() method run. Finally, the GC commands the VM to resume the user threads and finishes the collection.

  8. Iteration

    In case garbage collection was triggered by an allocation, the garbage collector retries the object allocation. If the allocation fails a second time, the garbage collector repeats the GC procedure. If the allocation still fails after two iterations, an OutOfMemoryError condition is reported. For details on allocation techniques, see section 6.3 Object Allocation.
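
The atomic mark mentioned in step 5 can be pictured as follows; C++11 atomics stand in here for the GC's operations on mark-table bytes:

#include <atomic>

// Returns true if this thread marked the object and therefore must scan it;
// false means another worker marked it earlier and owns the scanning.
bool try_mark(std::atomic<unsigned char>& mark_byte) {
    unsigned char expected = 0;
    return mark_byte.compare_exchange_strong(expected, 1);
}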

Back to Top

6.3 Object Allocation

The DRL garbage collector provides several object allocation mechanisms to improve performance. Below is the description of ordinary and optimized object allocation procedures.

The gc_alloc() and gc_alloc_fast() functions are the key functions in object allocation. The gc_alloc() function may trigger garbage collection to satisfy an allocation request, whereas the gc_alloc_fast() function is not allowed to do so. The garbage collector also provides other functions that handle specific types of object allocation or optimize the allocation procedure, as follows:

Back to Top

Allocation Procedure

The gc_alloc() and gc_alloc_fast() functions do the following in the specified order:

  1. Try to allocate the object from the current allocation area.

    Note

    An allocation area is a contiguous section of free space suitable for bump (frontier) allocation. The GC recreates allocation areas in the blocks after each garbage collection. Allocation areas range from 2 KB to 124 KB in size. Areas of size less than 2 KB are ignored to prevent degradation of allocation performance. The maximum allocation area size is determined by the block size.

  2. If allocation in the current area fails, move to the next allocation area and retry step 1.
  3. If allocation in the current block fails, move to the next block in the chunk and retry steps 1 and 2.

    Note

    A chunk is a linked list of blocks. After the garbage collection all memory chunks are freed. To avoid race conditions between multiple Java* threads, chunks are removed from the chunk queue atomically. These chunks can then be used by the owning thread without further synchronization.

  4. After exhausting the current thread chunk, the gc_alloc() and gc_alloc_fast() functions follow different procedures:
    1. The gc_alloc() function attempts to grab new chunks from the master chunk list. If this fails, the thread tries to take the GC lock, collect garbage, and restart the allocation procedure. After two garbage collections fail, the gc_alloc() function returns NULL, which results in an OutOfMemoryError.
    2. The gc_alloc_fast() function returns NULL immediately after allocation from the current chunk fails. The function never tries to use another memory chunk.

To start garbage collection from gc_alloc(), the VM needs to enumerate the root set for the thread that called gc_alloc(), and this requires pushing an M2nFrame on the stack. Because few allocations fail and trigger garbage collection, the effort of pushing an M2nFrame is usually wasted. To avoid this overhead, a thread can invoke the gc_alloc_fast() function without an M2nFrame; consequently, this function cannot start garbage collection.
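
A simplified C++ sketch of the fast allocation path follows. For brevity, it collapses blocks and allocation areas into a single per-chunk list, and all structure and function names are invented:

    #include <cstddef>
    #include <cstdint>

    struct Area  { uintptr_t free; uintptr_t ceiling; Area* next; };
    struct Chunk { Area* current_area; };

    // Bump (frontier) allocation from thread-local memory only; like
    // gc_alloc_fast(), this path never triggers garbage collection.
    void* alloc_fast(Chunk* chunk, size_t size) {
        for (Area* a = chunk->current_area; a != nullptr; a = a->next) {
            uintptr_t result = a->free;
            if (result + size <= a->ceiling) {   // bump the allocation frontier
                a->free = result + size;
                chunk->current_area = a;
                return reinterpret_cast<void*>(result);
            }
            // Current area exhausted: move on to the next allocation area.
        }
        // Chunk exhausted: gc_alloc_fast() returns NULL here, whereas
        // gc_alloc() would grab a new chunk or start garbage collection.
        return nullptr;
    }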

Back to Top

6.4 Public Interfaces

This section lists the interfaces the garbage collector exports for communication with other components.

6.4.1 GC Interface

The garbage collector provides a number of interface functions to support DRLVM activity at various stages, as follows:

Handshaking at DRLVM startup

The GC exposes several groups of functions that support different startup operations, as described below.

Initializing the Garbage Collector

The VM core calls the gc_init() function to initialize the garbage collector. In this function, the GC does the following:

  1. Uses the VM property mechanism to read GC-specific configuration options from the application command line. The GC calls the vm_get_property() function to query configuration options that may have been specified on the command line. For example, the initial and maximum heap sizes are passed to the GC as the values of the gc.ms and gc.mx properties respectively.
  2. Allocates the managed heap and prepares to serve memory allocation requests.
  3. Creates a GC worker thread for each available processor. GC worker threads wait until the controlling thread wakes them for parallel-mode collection activities, for example, mark scanning and compaction.

After these actions, the garbage collector returns from the gc_init() function, and the VM initialization sequence continues. After the call to gc_init(), the VM starts allocating Java* objects, but garbage collection is disabled until the end of the VM initialization procedure.
Finally, the virtual machine calls the gc_vm_initialized() function to inform the garbage collector that the VM core is sufficiently initialized to enumerate roots and that garbage collection is permitted. The VM can allocate a moderate number of objects between the calls to gc_init() and gc_vm_initialized().
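
The following condensed sketch illustrates this sequence. The vm_get_property() signature, the default heap sizes, and the helper functions are assumptions made for illustration:

    #include <cstddef>
    #include <cstdlib>

    extern "C" const char* vm_get_property(const char* name);  // assumed signature
    void* reserve_heap(size_t initial, size_t maximum);        // hypothetical helper
    void  spawn_worker_threads(int count);                     // hypothetical helper
    int   available_processors();                              // hypothetical helper

    extern "C" int gc_init() {
        // 1. Query heap sizing options from the command line
        //    (plain numeric values and default sizes assumed here).
        const char* ms = vm_get_property("gc.ms");
        const char* mx = vm_get_property("gc.mx");
        size_t initial = ms ? std::strtoul(ms, nullptr, 0) : 16u * 1024 * 1024;
        size_t maximum = mx ? std::strtoul(mx, nullptr, 0) : 256u * 1024 * 1024;

        // 2. Allocate the managed heap.
        if (!reserve_heap(initial, maximum)) return -1;

        // 3. One GC worker thread per available processor; the threads
        //    sleep until woken for parallel mark scanning or compaction.
        spawn_worker_threads(available_processors());
        return 0;
    }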

Class Loading and Creating a Thread

The VM core calls the functions gc_class_prepared() and gc_thread_init(). The gc_thread_init() function assigns a private allocation area to a thread. The gc_class_prepared() function caches the field layout information in the VTable structure.

Managed Object Allocation

To allocate a new object, JIT-compiled code or native methods call the gc_alloc() or gc_alloc_fast() functions of the GC interface. If the heap space is exhausted, the garbage collector stops all managed threads and performs a garbage collection. To allocate an object, the GC can use one of several available functions, as described in section 6.3 Object Allocation.
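
The typical call sequence can be sketched as below; the signatures are simplified relative to the real declarations in include/open/gc.h:

    #include <cstddef>

    extern "C" void* gc_alloc(size_t size, void* vtable, void* thread);       // simplified
    extern "C" void* gc_alloc_fast(size_t size, void* vtable, void* thread);  // simplified

    void* new_object(size_t size, void* vtable, void* thread) {
        // Fast path: no M2nFrame is pushed, so no collection can start.
        void* obj = gc_alloc_fast(size, vtable, thread);
        if (obj != nullptr) return obj;
        // Slow path: an M2nFrame is pushed (not shown) so that the root
        // set can be enumerated if gc_alloc() must collect garbage.
        return gc_alloc(size, vtable, thread);
    }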

Root Set Enumeration

The gc_add_root_set_entry() function and several similar functions support this operation on the GC side.

Virtual Machine Shutdown

The VM core calls the gc_wrapup() function to tell the GC that the managed heap is no longer required. In response, the GC frees the heap and other auxiliary data structures.

Other interface groups include functions for forcing garbage collection and querying information about available memory and details of the garbage collection operation.

Back to Top

6.4.2 VM_GC Interface

The VM implements several groups of functions to support GC operation, as described below. The functions of this interface are grouped by the period of operation during which they are used.

Collection Time

The VM core exposes the following functions to support garbage collection:

Finalization and Weak Reference Handling

The GC handles weak reference objects differently from regular objects. When preparing GC data about a class in gc_class_prepared(), the GC uses the class_is_reference() function to find out whether objects of this class require special handling. The GC calls class_get_referent_offset() to get the offset of the referent field with weak reference properties.

During garbage collection, the GC finds the objects that need to be finalized and resets weak references. The GC does not execute Java* code, but transfers the set of finalizable objects and reference objects to the VM by using the vm_finalize_object() and vm_enqueue_reference() functions.
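
A schematic sketch of reference-object handling is given below, using the interface functions named above with assumed signatures:

    #include <cstdint>

    extern "C" bool     class_is_reference(void* clss);         // assumed signature
    extern "C" unsigned class_get_referent_offset(void* clss);  // assumed signature
    extern "C" void     vm_finalize_object(void* obj);          // assumed signature
    extern "C" void     vm_enqueue_reference(void* ref);        // assumed signature

    void process_reference(void* ref, void* clss, bool referent_is_live) {
        if (!class_is_reference(clss)) return;
        // Locate the referent field inside the reference object.
        void** referent = reinterpret_cast<void**>(
            reinterpret_cast<uintptr_t>(ref) + class_get_referent_offset(clss));
        if (!referent_is_live) {
            *referent = nullptr;          // reset the weak reference
            vm_enqueue_reference(ref);    // transfer the reference to the VM
        }
    }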

Handshaking at VM Startup

At this stage, the GC uses the following functions exposed by the VM core:

Back to Top

7. Interpreter

The interpreter component executes Java* bytecode and is used in the VM interchangeably with the JIT compiler. The interpreter does the following:

The interpreter supports the following platforms: Windows* / IA-32, Linux* / IA-32, Linux* / Itanium® processor family and Linux* / Intel® EM64T.

Note

The DRL interpreter works on Intel® EM64T, but no JIT compiler is currently available for this platform.

Currently, the interpreter is closely tied to the VM core; see section 7.2.2 for details.

7.0.1 Interpreter and JIT

The interpreter differs from the JIT compiler in how it executes bytecode: the JIT compiler translates bytecode into native code and runs the produced native code, whereas the interpreter reads the original bytecode and executes a short sequence of corresponding C/C++ code for each instruction. Interpretation is simpler, but substantially slower than executing JIT-compiled code.
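
The contrast can be illustrated with a toy dispatch loop, in which each bytecode maps to a short C/C++ sequence. The opcodes and operand stack below are invented for illustration:

    #include <cstdint>
    #include <cstdio>

    enum Opcode : uint8_t { OP_ICONST_1, OP_IADD, OP_RETURN };

    int interpret(const uint8_t* bytecode) {
        int stack[16];
        int sp = 0;   // operand stack pointer
        for (const uint8_t* ip = bytecode; ; ++ip) {
            switch (*ip) {
            case OP_ICONST_1: stack[sp++] = 1; break;
            case OP_IADD:     --sp; stack[sp - 1] += stack[sp]; break;
            case OP_RETURN:   return stack[--sp];
            }
        }
    }

    int main() {
        const uint8_t code[] = { OP_ICONST_1, OP_ICONST_1, OP_IADD, OP_RETURN };
        std::printf("%d\n", interpret(code));   // prints 2
        return 0;
    }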

Back to Top

7.1 Characteristics

7.1.1 Calling Conventions

The interpreter is a C/C++ module, and its calling conventions correspond to those of the JNI specification.

Note

In native code, it is impossible to call a function whose number of arguments is unknown at compile time. That is why arguments of Java* methods are passed as pointers to arrays of arguments.

7.1.2 Stack Structure

The interpreter has its own Java* stack frame format. Each Java* frame contains the following fields:

The Java* frame is allocated on the C stack by using the alloca() function.

Back to Top

7.2 Internal Structure

The interpreter consists of the following major components:

7.2.1 Packaging structure

This section lists the file groups, located in the interpreter folder, that are responsible for the major functions of the interpreter.

src – Interpreter source files location.
|
interpreter.cpp
    – Major interpreter functionality, bytecode handlers.
interpreter_ti.cpp
    – Support for JVMTI in the VM.
interp_stack_trace.cpp
    – Stack trace retrieval and thread root set enumeration for the GC.
interp_vm_helpers.cpp
    – Wrappers around VM core functions that enable the interpreter to use them.
invokeJNI_*.asm
    – Platform-specific execution of JNI methods:
    the native function call constructed from the JNI function pointer and an array of arguments.
interp_native_*.cpp
    – Platform-specific execution of JNI methods:
    definition of the InvokeJNI() function for the Windows* / IA-32 platform.
interp_exports.cpp
    – Exporting interpreter interface functions to the virtual machine via the functions table.

7.2.2 Interfaces

The interpreter has a tightly coupled and complicated interface with the VM core: it is dynamically linked with the VM core and uses its internal interfaces. At the same time, the interpreter exports its enumeration, stack trace generation, and JVMTI support functions via a single method table. These functions make up the Interpreter interface.

Back to Top

7.3 Support Functions

7.3.1 JNI Support

VM support for execution of JNI methods relies on stub generation. This approach does not suit the interpreter because dynamically generated stub code is hard to debug. In the current implementation, the interpreter is mainly aimed at VM code debugging and uses its own code for handling JNI methods.

The DRL interpreter provides functions for each type of JNI method: static or virtual, executed from an interpreted frame or from interpreter invocation code. The interpreter executes JNI methods by performing the following actions, compressed into the sketch after this list:

  1. Locates the method's native code via a find_native_method() call.
  2. Pushes an M2nFrame on the stack to transfer control from managed to native code for compatibility with the JIT mode.
  3. Places all arguments in one or several arrays according to calling conventions specific for a platform:
    1. On the IA-32 platform, all arguments are on the stack and in one array.
    2. On the Itanium® processor family platform, a special handling mechanism is used for arguments placed in integer and floating point registers. A similar mechanism is used on the Intel® EM64T platform.
  4. Optionally, calls the JVMTI method entry event callback.
  5. For synchronized methods, acquires the corresponding monitor.
  6. Notifies the thread management component that the current thread can be suspended because at this stage, the code is GC-safe.
  7. Executes the JNI method via the invokeJNI stub, which converts the native code address and arguments array or arrays into a function call.
  8. Notifies the thread manager that the current thread cannot be suspended at an arbitrary point because the code is no longer GC-safe.
  9. For synchronized methods, releases the corresponding monitor.
  10. With JVMTI enabled, calls the method exit event callback.
  11. Destroys the M2nFrame.
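
The sequence above can be compressed into the following straight-line sketch. All helper names are invented, and the optional JVMTI callbacks (steps 4 and 10) are omitted:

    void* find_native_method(void* method);            // step 1
    void  push_m2n_frame();                            // step 2
    void* pack_arguments(void* method, void* frame);   // step 3
    void  monitor_enter(void* obj);                    // step 5
    void  thread_enable_suspend(bool enable);          // steps 6 and 8
    void* invokeJNI(void* fn, void* args);             // step 7
    void  monitor_exit(void* obj);                     // step 9
    void  pop_m2n_frame();                             // step 11

    void* interp_invoke_jni(void* method, void* frame, void* monitor) {
        void* fn = find_native_method(method);
        push_m2n_frame();
        void* args = pack_arguments(method, frame);
        if (monitor) monitor_enter(monitor);   // synchronized methods only
        thread_enable_suspend(true);           // the call below is GC-safe
        void* result = invokeJNI(fn, args);
        thread_enable_suspend(false);          // no longer GC-safe
        if (monitor) monitor_exit(monitor);
        pop_m2n_frame();
        return result;
    }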

7.3.2 JVMTI Support

The DRL interpreter assists the virtual machine in supporting JVMTI. The interpreter enables stack walking, stack frame examination, method entry and exit events, breakpoints, single step and PopFrame functions.

Back to Top

8. Porting Layer

8.1 Characteristics

This component provides unified interfaces to low-level system routines across different platforms. The porting layer mainly covers the following functional areas:

To maximize the benefits of the porting layer, other components interact with the underlying operating system and hardware via this component. Currently, most DRLVM code uses the Apache Portable Runtime (APR) library [14] as its base porting library, though certain parts have not been completely ported to APR and access the operating system directly. The DRL porting library also includes about 20 additional functions and macros designed as potential extensions to APR. These additions mostly relate to querying system information and virtual memory management.

8.2 Component Manager

The component manager is a subcomponent of the porting layer responsible for loading and subsequent initialization of VM components.

During the loading stage, the component manager queries the default interface from each loading component, and then makes this information available at the initialization stage via interface queries. The component manager also enables instance creation for interfaces. Currently, only the execution manager uses the component manager loading scheme.

8.3 Public Interfaces

The porting library is statically linked to the VM core component and exports its interfaces through this component.

Note

This implies that APR objects are compiled as exported but packaged as a static library (not linked as a self-contained dynamic shared library).

Other components may directly include porting library headers (APR or additional ones) and dynamically link with the VM core.

Back to Top

9. Class Libraries

The class libraries complement the DRLVM to provide a full J2SE*-compliant run-time environment. The class libraries contain all classes defined by the J2SE* specification [6] except for the set of kernel classes.

9.1 Characteristics

The DRL class libraries satisfy the following requirements:

Note

DRLVM does not require the full J2SE* API set in order to be functional.

At startup, DRLVM preloads approximately 20 classes, including the kernel classes. The minimal subset required for VM startup is defined by the dependencies of the preloaded classes, which can vary between implementations. You can get the exact list from the DRLVM sources, mainly from the vmcore\src\init\vm_init.cpp file.

9.1.1 Interaction

The class libraries interact with the VM through the following interfaces:

Note

The current implementation of VM accessors is built on top of JNI. Future implementations may utilize the VM-specific Fast (Raw) Native Interface or an intrinsic mechanism to achieve better performance.

Back to Top

9.2 Packaging Structure

In DRL, the class libraries are packaged according to the following structure on Windows* and Linux*:

ij -java.home
|
+-bin - java.library.path
|
+-lib - vm.boot.class.path

9.2.1 Java* Classes

The class libraries are packaged as .jar or .zip files and stored in the \lib directory. Each .jar file contains the classes that belong to a specific functional area [9]. By default, the VM boot class path points to the location of .jar and .zip archives, as listed above. You can set an alternate location of the boot class path on the command line by using the -Xbootclasspath command-line option.

9.2.2 Native Libraries

Native libraries used by the class libraries, .dll files on Windows* and .so files on Linux*, are placed in the \bin directory. You can set an alternate location for the native libraries on the command line by using the java.library.path property.

9.2.3 Resources

The class libraries typically use the java.home property to determine the location of necessary resources, for example, the java.security.policy file. By default, the java.home property is initialized to the parent directory of the ij executable, which is the \ij directory, as shown above. You can set an alternate value for the java.home property on the command line.

Back to Top

10. Inter-component Optimizations

In DRLVM, safety requirements and dynamic class loading affect the applicability and effectiveness of traditional compiler optimizations, such as null-check elimination or array-bounds check elimination. To improve performance, DRLVM applies inter-component optimizations that reduce or eliminate these safety overheads and ensure effective operation in the presence of dynamic loading.

Inter-component optimizations include various optimization techniques supported by more than one component in DRLVM, as described in the subsequent sections.

10.1 Fast Subtype Checking

Java* programs widely use inheritance. The VM needs to check whether an object is an instance of a specific supertype thousands of times per second. These type tests result from explicit checks in application code (for example, the Java* checkcast bytecode), as well as from implicit checks during array stores (for example, the Java* aastore bytecode). The array store checks verify that the types of objects being stored into arrays are compatible with the element types of the arrays. Although the checkcast(), instanceof(), and aastore() functions take up at most a couple of percent of the execution time for Java* benchmarks, that is enough to justify some degree of inlining. The VM core provides the VM_JIT interface to allow JIT compilers to perform a faster, inlined type check under certain commonly used conditions.
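
One common shape of such an inlined test is a fixed-depth superclass display, sketched below with invented structure names; this illustrates the general technique, not the exact DRLVM layout:

    const int MAX_FAST_DEPTH = 8;

    struct Class {
        int    depth;                         // distance from java.lang.Object
        Class* superclasses[MAX_FAST_DEPTH];  // ancestor at each depth; unused slots are null
    };

    struct VTable { Class* clss; };
    struct Object { VTable* vtable; };

    // Inlined test: is obj an instance of the (non-interface) class target?
    bool fast_instanceof(Object* obj, Class* target) {
        if (obj == nullptr) return false;
        Class* c = obj->vtable->clss;
        return target->depth < MAX_FAST_DEPTH &&
               c->superclasses[target->depth] == target;   // one load, one compare
    }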

10.2 Direct-call Conversion

In DRLVM, Java* virtual functions are called indirectly by using a pointer from a VTable even when the target method is precisely known. This is done because a method may not have been compiled yet, or it may be recompiled in the future. By using an indirect call, the JIT-compiled code for a method can easily be changed after the method is first compiled, or after it is recompiled.
Because indirect calls may require additional instructions (at least on the Itanium® processor family) and may put additional pressure on the branch predictor, converting them into direct calls is important. For direct-call conversion, the VM core includes a callback mechanism that enables the JIT compiler to patch direct calls when the targets change due to compilation or recompilation. When the JIT produces a direct call to a method, it calls a function to inform the VM core. When the target method is later compiled or recompiled, the VM core calls back into the JIT to patch and redirect the call.
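
The callback mechanism can be sketched as follows. The bookkeeping and names are invented, and a real implementation patches call instructions in generated code rather than function-pointer slots:

    #include <map>
    #include <vector>

    struct CallSite { void** patch_address; };

    // Direct call sites recorded per target method.
    static std::map<void*, std::vector<CallSite>> direct_call_sites;

    // JIT side: report a direct call emitted to 'method'.
    void jit_register_direct_call(void* method, void** patch_address) {
        direct_call_sites[method].push_back({patch_address});
    }

    // VM side: after (re)compiling 'method', redirect all recorded sites.
    void vm_method_compiled(void* method, void* new_entry_point) {
        for (CallSite& site : direct_call_sites[method])
            *site.patch_address = new_entry_point;   // patch the call target
    }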

10.3 Fast Constant-string Instantiation

Constant-string instantiation is common in Java* applications, and DRLVM loads constant strings at run time in a single load, as it does with static fields. To use this optimization, Jitrino calls the class loader interface function class_get_const_string_intern_addr() at compile time. This function interns the string and returns the address of a location pointing to the interned string. Note that the VM core reports this location as part of the root set during garbage collection.
Because string objects are created at compile time regardless of which control paths are actually executed, applying the optimization blindly to all JIT-compiled code might result in the allocation of a significant number of unnecessary string objects (a sketch of the fast path follows this paragraph). To avoid this, Jitrino applies the heuristic of not using fast strings in exception handlers.
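
The fast path reduces to a single load, as the minimal sketch below shows; the signature of class_get_const_string_intern_addr() is an assumption made for illustration:

    // Compile time: Jitrino interns the string and records the address of
    // a slot that the VM core reports as part of the root set.
    extern "C" void** class_get_const_string_intern_addr(void* class_handle,
                                                         unsigned cp_index);  // assumed

    // Run time: JIT-compiled code performs one load, as for a static field.
    void* load_const_string(void** interned_slot) {
        return *interned_slot;
    }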

10.4 Lazy Exceptions

Certain applications make extensive use of exceptions for control flow. Often, however, the exception object is not used in the exception handler. In such cases, the time spent on creating the exception object and on creating and recording its stack trace is wasted. The lazy exceptions optimization enables the JIT compiler and the VM core to cooperate in eliminating the creation of exception objects constructed with an ordinary constructor when these objects are not used later.

To implement lazy exceptions, the JIT compiler finds the exception objects that are used only in throw statements in the compiled method and analyzes their constructors for possible side effects. If a constructor has no side effects, the JIT removes the object construction instructions and replaces the throw statement with a call to a run-time function that performs the lazy exception throwing operation. During execution of this function, the VM core unwinds the stack to find the matching handler and does one of the following, depending on the exception object state:

The lazy exceptions technique significantly improves performance. For more information on exceptions in DRLVM, see section 3.9 Exception Handling.

Back to Top

11. References

This section lists the external references to various sources used in DRLVM documentation, and to standards applied to DRLVM implementation.

[1] Java* Virtual Machine Specification, http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html

[2] Java* Language Specification, Third Edition, http://java.sun.com/docs/books/jls/

[3] JIT Compiler Interface Specification, Sun Microsystems, http://java.sun.com/docs/jit_interface.html

[4] JVM Tool Interface Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html

[5] Java* Native Interface Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/jniTOC.html

[6] Java* API Specification, http://java.sun.com/j2se/1.5.0/docs/api

[7] Java* Invocation API Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/invocation.html

[8] Creating a Debugging and Profiling Agent with JVMTI tutorial, http://java.sun.com/developer/technicalArticles/Programming/jvmti/index.html

[9] Apache Harmony project, http://incubator.apache.org/harmony/.

[10] IA-32 Intel Architecture Software Developer's Manual, Intel Corp., http://www.intel.com/design

[11] Ali-Reza Adl-Tabatabai, Jay Bharadwaj, Michal Cierniak, Marsha Eng, Jesse Fang, Brian T. Lewis, Brian R. Murphy, and James M. Stichnoth, Improving 64-Bit Java* IPF Performance by Compressing Heap References, Proceedings of the International Symposium on Code Generation and Optimization (CGO’04), 2004, http://www.cgo.org/cgo2004/

[12] Stichnoth, J.M., Lueh, G.-Y. and Cierniak, M., Support for Garbage Collection at Every Instruction in a Java* Compiler, ACM Conference on Programming Language Design and Implementation, Atlanta, Georgia, 1999, http://www.cs.rutgers.edu/pldi99/

[13] Wilson, P.R., Uniprocessor Garbage Collection Techniques, in revision (accepted for ACM Computing Surveys). ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps

[14] Apache Portable Runtime library, http://apr.apache.org/

[15] S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, San Francisco, CA, 1997.

[16] P. Briggs, K.D., Cooper and L.T. Simpson, Value Numbering. Software-Practice and Experience, vol. 27(6), June 1997, http://www.informatik.uni-trier.de/~ley/db/journals/spe/spe27.html

[17] R. Bodik, R. Gupta, and V. Sarkar, ABCD: Eliminating Array-Bounds Checks on Demand, in proceedings of the SIGPLAN ’00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000, http://research.microsoft.com/~larus/pldi2000/pldi2000.htm

[18] Paul R. Wilson, Uniprocessor garbage collection techniques, Yves Bekkers and Jacques Cohen (eds.), Memory Management - International Workshop IWMM 92, St. Malo, France, September 1992, proceedings published as Springer-Verlag Lecture Notes in Computer Science no. 637.

[19] Bill Venners, Inside Java 2 Virtual Machine, http://www.artima.com/insidejvm/ed2/

[20] Harmony Class Library Porting Documentation, http://svn.apache.org/viewcvs.cgi/*checkout*/incubator/harmony/enhanced/classlib/trunk/doc/vm_doc/html/index.html?content-type=text%2Fplain

[21] Karl Pettis, Robert C. Hansen, Profile Guided Code Positioning, http://www.informatik.uni-trier.de/~ley/db/conf/pldi/pldi90.html

Back to Top

(C) Copyright 2005 Intel Corporation

* Other brands and names are the property of their respective owners.