Dynamic Runtime Layer Developer's Guide
10. Inter-component Optimizations
10.3 Fast Constant-string Instantiation
Version | Version Information | Date |
Initial version | Intel, Nadya Morozova: document created. | November 16, 2005 |
Version 1.0 | Intel, Nadya Morozova: document updated and expanded. | March 2, 2006 |
Copyright 2005-2006 The Apache Software Foundation or its licensors, as applicable.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Portions, Copyright © 1991-2005 Unicode, Inc. The following applies to Unicode.
COPYRIGHT AND PERMISSION NOTICE
Copyright © 1991-2005 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html. Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that (a) the above copyright notice(s) and this permission notice appear with all copies of the Data Files or Software, (b) both the above copyright notice(s) and this permission notice appear in associated documentation, and (c) there is clear notice in each modified Data File or in the Software as well as in the documentation associated with the Data File(s) or Software that the data or software has been modified.
THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder.
2. Additional terms from the Database:
Copyright © 1995-1999 Unicode, Inc. All Rights reserved.
Disclaimer
The Unicode Character Database is provided as is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If this file has been purchased on magnetic or optical media from Unicode, Inc., the sole remedy for any claim will be exchange of defective media within 90 days of receipt. This disclaimer is applicable for all other data files accompanying the Unicode Character Database, some of which have been compiled by the Unicode Consortium, and some of which have been supplied by other sources.
Limitations on Rights to Redistribute This Data
Recipient is granted the right to make copies in any form for internal distribution and to freely use the information supplied in the creation of products supporting the UnicodeTM Standard. The files in the Unicode Character Database can be redistributed to third parties or other organizations (whether for profit or not) as long as this notice and the disclaimer notice are retained. Information can be extracted from these files and used in documentation or programs, as long as there is an accompanying notice indicating the source.
This document introduces DRL, the dynamic run-time layer, explains basic concepts and terms, and gives an overview of the product's structure and interfaces for inter-component communication. Special focus is given to the virtual machine, DRLVM. Use this document to focus on the DRLVM implementation specifics and to understand the internal peculiarities of the product.
The document describes version 1 of the DRL virtual machine donated in March 2006.
The target audience for the document includes a wide community of engineers interested in using DRLVM and in working further with the product to contribute to its development.
This document consists of several major parts describing the key processes and components of the DRL virtual machine, as follows:
Part 2. VM Architecture provides an overview of the DRL component-based model and describes the major inter-component processes running inside the virtual machine, such as root set enumeration and object finalization. The part also comprises a section about data structures used in DRLVM. You can start with this part to learn about major principles of VM internal operation.
Part 3. VM Core gives an in-depth description of the core virtual machine and its subcomponents responsible for various functions of the virtual machine, including stack walking and thread management.
Part 4. JIT Compiler describes compilation paths and specific features of the DRL just-in-time compiler. Consult this part of the guide for details on optimizations implemented in the DRL JIT compiler and its code generation path.
Part 5. Execution Manager shows the details of the dynamic profile-guided optimization subsystem. In this part, you can find information on method profiles and recompilation logic.
Part 6. Garbage Collector focuses on object allocation and garbage collection processes. This part contains a description of the garbage collector component and of its interaction with the VM core.
Part 7. Interpreter has a description of the interpreter component and its debugging services.
Part 8. Porting Layer gives an overview of platform-dependent functionality used in DRL. The part also includes an overview of the component manager.
Part 9. Class Libraries gives information on the layout and characteristics of the Java* class libraries interacting with the DRL virtual machine.
Part 10. Inter-component Optimizations is devoted to performance-improving operations that involve multiple components.
Part 11. References lists the links to external materials supporting this document. These materials include specifications, programming manuals, and articles on specific issues. You can consult this part of the document for directions on investigating a specific problem or alternative ways of implementing specific features.
This document uses the unified conventions for the DRL documentation kit.
The table below provides the definitions of all acronyms used in the document.
Acronym | Definition |
API | Application Program Interface |
APR | Apache Portable Runtime Layer |
CFG | Control Flow Graph |
CG | Code Generator |
CLI | Common Language Infrastructure |
DFG | Data Flow Graph |
DPGO | Dynamic Profile-guided Optimizations |
DRL | Dynamic Run-time Layer |
DRLVM | Dynamic Run-time Layer Virtual Machine |
EE | Execution Engine |
EM | Execution Manager |
FP | Floating Point |
GC | Garbage Collector |
HIR | High-level Intermediate Representation |
IR | Intermediate Representation |
J2SE* | Java* 2 Standard Edition |
JCL | Java* Class Libraries |
JIT | Just-in-time Compiler |
JNI | Java* Native Interface |
JVM | Java* Virtual Machine |
JVMTI | JVM Tool Interface |
LIR | Low-level Intermediate Representation |
LMF | Last Managed Frame |
LOB | Large Object Block |
LOS | Large Object Space |
OS | Operating System |
PC | Profile Collector |
SIMD | Single Instruction Multiple Data |
SOB | Single Object Block |
SSA | Static Single Assignment |
SSE, SSE2 | Streaming SIMD Extensions (2) |
STL | Standard Template Library |
TBS | Time-based Sampling |
TLS | Thread Local Storage |
TM | Thread Manager |
VM | Virtual Machine, same as JVM in current document |
The Dynamic Runtime Layer (DRL) is a clean-room implementation of the Java* 2 Platform, Standard Edition (J2SE*) 1.5.0. This Java* run-time environment consists of the virtual machine (DRLVM), and a set of Java* class libraries (JCL). The product is released in open source. The virtual machine is written in C++ code and a small amount of assembly code. This document focuses on the virtual machine, and gives a short overview of the class libraries supporting it.
Key features of DRL include the following:
The DRL virtual machine reconciles high performance with the extensive use of well-defined interfaces between its components.
A component corresponds to one static or dynamic library, and several libraries linked statically or dynamically at run time make up the managed run-time environment. For details on components linking, see section 2.2.2 Linking Models.
DRLVM components communicate via functional interfaces. An interface is a pointer to a table of function pointers to pure C methods. Interfaces have string names, which unambiguously identify their function table layout. Each component exposes the default interface to communicate with the component manager, and one or more interfaces for communication with other components.
Note
In the current version, only the execution manager uses the component manager. Other components will migrate to this new model in further releases.
DRL can also operate with co-existing component instances, as required by the Invocation API [7]. An instance of a component contains a pointer to its default interface and component-specific data. The porting layer always has exactly one instance. This allows a compiler to in-line calls to the porting layer functions. Other components have the same number of instances as the VM core does.
Background

In Java* programming, components, interfaces, and instances can be described in terms of classes, interfaces, and objects. A VM component encapsulates common features, attributes, and properties of virtual machines, and maps to a Java* class. VM interfaces are tables of methods implemented and exposed by the class. If several virtual machines exist in the same address space, they all expose the same interfaces. These VM instances are instances of the VM class, or objects.

The component manager enables explicit creation of component instances by exposing the CreateNewInstance() function, which corresponds to the Java* operator new(). Components with only one instance correspond to static class methods in Java*. All components are initialized at load time.
Subsequent sections define each component and provide information on public interfaces, dependencies and other component specifics.
Libraries corresponding to different DRL components are linked by one of the following models:
Figure 1 below displays the major DRL components and their interfaces.
Figure 1. Major DRL Components
Figure 1 demonstrates the DRL Java* virtual machine major components, and the class libraries that support the machine. These components are responsible for the following functions:
The other components listed below are part of the DRL virtual machine.
Depending on the configuration, you can use multiple execution engine components, for example, an interpreter and optimizing JIT. Simultaneous use of multiple JIT compilers can provide different trade-offs between compilation time and code quality.
This section provides an overview of data structures in DRLVM, typical examples of data structures, and the exposed data layout of public data structures.
In DRLVM, all data structures are divided into the following groups:
For example, when compiling an access operation to an instance field, the JIT calls the public VM_JIT interface function to obtain the offset, and uses the result to generate the appropriate load instruction. Another example is the VM core internal representation of a class object.

DRLVM exports data structures in accordance with the JNI [5] and JVMTI [4] standards. In addition to these structures, DRLVM shares information about the object layout across its components. In particular, the Java Native Interface does not specify the structure of jobject, and DRLVM defines it as illustrated below.
typedef struct ManagedObject {
    VTable *vt;
    uint32 obj_info;
    /* Class-specific data */
} ManagedObject;

struct _jobject {
    ManagedObject* object;
};

typedef struct _jobject* jobject;
The jobject structure contains the following elements:

- The vt field points to the object virtual-method table.
Each class has one virtual-method table (VTable) with class-specific information to perform common operations, such as getting pointers to virtual methods. The VTable is shared across all instances of a class. During garbage collection, the VTable supplies such information, as the size of the object and the offset of each reference stored in the instance.
- The obj_info field is used during synchronization and garbage collection. This is a 32-bit value on all supported architectures. This field also stores the hash code of an instance.
Class-specific instance fields immediately follow the vt and obj_info fields. Representation of array instances is shared between the garbage collector and the JIT compiler. The VM core determines the specific offsets to store the array length and the first element of the array. This way, the VM core makes these fields available for the garbage collector and the JIT via the VM interface.

The code below illustrates how the shared object layout is used in the GetBooleanField() JNI function.
typedef jobject ObjectHandle;

jboolean JNICALL GetBooleanField(JNIEnv *env, jobject obj, jfieldID fieldID)
{
    Field *f = (Field *) fieldID;

    /* Initialize the class if the field is accessed */
    if (!ensure_initialised(env, f->get_class())) {
        return 0; /* Error */
    }

    ObjectHandle h = (ObjectHandle) obj;
    tmn_suspend_disable();  //-- Do not allow GC suspension --v
    Byte *java_ref = (Byte *)h->object;
    /* The field offset within the object comes from the field descriptor */
    jboolean val = *(jboolean *)(java_ref + f->get_offset());
    tmn_suspend_enable();   //--------------------------------^
    return val;
} // GetBooleanField
To decrease memory footprint on 64-bit platforms [11], direct object and VTable pointers are compressed in the Java* heap to 32-bit values.
To calculate a direct heap pointer, the system adds the pointer to the heap base to the compressed value from the reference field. Similarly, a direct pointer to an object VTable equals the compressed value stored in the first 32 bits of the object plus the base VTable pointer. This limits the maximum heap size to 4 GB, but significantly reduces the average object size and the working set size, and improves cache performance.
Apart from the basic assumptions about object layout and the VTable cache, all interaction between major DRLVM components is achieved through function calls.
VM initialization is a sequence of operations performed at the virtual machine start-up before execution of user applications. Currently, DRLVM does not support the invocation API [7], and initialization follows the sequence described below. The subsection 2.5.3 Destroying the VM below also describes the virtual machine shutdown sequence.
The main(…) function is responsible for the major stages of the initialization sequence and does the following:

- Calls the create_vm() function
- Invokes the VMStarter.start() method
- Calls the destroy_vm() function
The subsequent sections describe these initialization stages in greater detail.
At this stage, the VM splits all command-line arguments into the following groups:
- <vm-arguments> for initializing the VM instance
- <class-name | -jar jar-file> for the name of the class or .jar file
- <java-arguments> for the user application
The virtual machine then creates the JavaVMInitArgs structure from <vm-arguments>.
The create_vm() function is a prototype for JNI_CreateJavaVM() responsible for creating and initializing the virtual machine. This function does the following:
- Creates the VM_thread structure and stores the structure in the thread local storage.
- Sets the basic VM properties, such as java.home, java.library.path, and vm.boot.class.path, according to the location of the VM library. The list of bootstrap .jar files is hard-coded into the VM library. Use the -Xbootclasspath command-line options to change the settings.
- Initializes the garbage collector by calling gc_init().

Note
The vm.other_natives_dlls property defines the list of libraries to be loaded.

- Creates the java.lang.Thread object for the current thread.
- Initializes the system class loader by calling java.lang.ClassLoader.getSystemClassLoader().
- Sends the VMStart JVMTI event. This step begins the start phase.
- Sends the ThreadStart JVMTI event for the main thread.
- Sends the VMInit JVMTI event. At this stage, the live phase starts.
- Calls the VMStarter.initialize() method.
The destroy_vm() function is a prototype for JNI_DestroyJavaVM() responsible for terminating operation of a VM instance. This function calls the VMStarter.shutdown() method.
This Java* class supports specific VM core tasks by providing the following methods:

- The initialize() method, called from the create_vm() function
- The shutdown() method, called from the destroy_vm() function and from the System.exit() method
- The start() method, which loads the main class, finds the main() method via reflection, invokes the main() method, and catches exceptions from this method
DRLVM automatically manages the Java* heap by using tracing collection techniques.
Root set enumeration is the process of collecting the initial set of references to live objects, the roots. Defining the root set enables the garbage collector to determine the set of all objects directly reachable from all running threads and to reclaim the rest of the heap memory. The set of all live objects includes objects referred to by roots and objects referred to by other live objects. This way, the set of all live objects can be constructed as the transitive closure of the objects referred to by the root set.
Roots consist of:
In DRLVM, the black-box method is designed to accommodate precise enumeration of the set of root references. The GC considers everything outside the Java* heap as a black box, and has little information about the organization of the virtual machine. The GC relies on the support of the VM core to enumerate the root set. In turn, the VM considers the thread stack as the black box, and uses the services provided by the JIT and interpreter to iterate over the stack frames and enumerate root references in each stack frame.
Enumeration of a method stack frame is best described in terms of safe points and GC maps. The GC map is the data structure for finding all live object pointers in the stack frame. Typically, the GC map contains the list of method arguments and local variables of the reference type, as well as spilled registers, in the form of offsets from the stack pointer. The GC map is associated with a specific point in the method, the safe point. The JIT determines the set of safe points at method compilation time, and the interpreter does this at run time. This way, call sites and backward branches enter the list. During method compilation, the JIT constructs the GC maps for each safe point. The interpreter does not use GC maps, but keeps track of object references dynamically, at run time. With the black-box method, the VM has little data on the thread it needs to enumerate, only the register context.
When the GC decides to do garbage collection, it enumerates all roots as described below.
- The GC calls vm_enumerate_root_set_all_threads().
Note
Currently, the DRLVM implementation does not support concurrent garbage collectors.
- The VM core iterates over the stack frames of each thread by using the JIT functions JIT_get_root_set_from_stack_frame() and JIT_unwind_stack_frame(). The JIT calls gc_add_root_set_entry() for each stack location that contains pointers to the Java* heap [12]. In the interpreter mode, interpreter_enumerate_thread() is called instead.
- Each root is reported to the garbage collector by calling gc_add_root_set_entry(ManagedObject).
Note
The parameter points to the root, not to the object the root points to. This enables the garbage collector to update the root in case it has changed object locations during the collection.
- Control returns from vm_enumerate_root_set_all_threads(), so that the garbage collector has all the roots and proceeds to collect objects no longer in use, possibly moving some of the live objects.
- The garbage collector calls vm_resume_threads_after(). The VM core resumes all threads, so that the garbage collector can proceed with the allocation request that triggered garbage collection.
Finalization is the process of reclaiming unused system resources after garbage collection. The DRL finalization fully complies with the specification [1]. The VM core and the garbage collector cooperate inside the virtual machine to enable finalizing unreachable objects.
Note
In DRL, the virtual machine tries to follow the reverse finalization order, so that the object created last is the first to be finalized; however, the VM does not guarantee that finalization follows this or any specific order.
As Figure 2 shows, several queues can store references to finalizable objects:
Figure 2. Finalization Framework
The garbage collector uses these queues at different stages of the GC procedure to enumerate the root set and kick off finalization for unreachable objects, as follows.
Object Allocation
During object allocation, the garbage collector places references to finalizable objects into the live object queue, as shown in Figure 3. The gc_alloc() and gc_alloc_fast() functions register finalizable objects with the queue.
Figure 3. Allocation of Finalizable Objects
After Mark Scan
After marking all reachable objects, the GC moves the remaining object references to the unmarked objects queue. Figure 4 illustrates this procedure: grey squares stand for marked object references, and white squares are the unmarked object references.
Figure 4. Unmarked Objects Queue Usage
Filling in the Finalizable Objects Queue
From the buffering queue, the GC transfers unmarked object references to the VM queue, as shown in Figure 5. To place a reference into the queue, the garbage collector calls the vm_finalize_object() function for each reference until the unmarked objects queue is empty.
Figure 5. Finalization Scheduling
Activating the Finalizer Thread
Finally, the GC calls the vm_hint_finalize() function that wakes up finalizer threads. All finalizer threads are pure Java* threads, see section 2.7.2 Work Balancing Subsystem. Each active thread takes one object to finalize and does the following:

- Calls the finalize() function for the object
If the number of active threads is greater than the number of objects, the threads that have nothing to finalize are transferred to the sleep mode, as shown in Figure 6.
Figure 6. Finalizer Threads
The work balancing subsystem dynamically adjusts the number of running finalizer threads to prevent an overflow of the Java heap by finalizable objects. This subsystem operates with two kinds of finalizer threads: permanent and temporary. During normal operation with a limited number of finalizable objects, permanent threads can cover all objects scheduled for finalization. When permanent threads are no longer sufficient, the work balancing subsystem activates temporary finalizer threads as needed.
The work balancing subsystem operates in the following stages:
A thread calls the vm_hint_finalize() function before performing the requested allocation. This function is also called after each garbage collection. The vm_hint_finalize() function checks whether any objects remain in the queue of objects to finalize. If the queue is not empty, this means that the current number of finalizer threads is not sufficient. In this case, the work balancing subsystem creates additional temporary finalizer threads. The number of created temporary threads corresponds to the number of CPUs.

Note
The work balancing subsystem checks whether the finalization queue is empty, but does not take into account the number of objects in the queue.
WBS Internals
Assuming that N is the optimum number of finalizer threads, which is not known in advance, you can make the following conclusions:

If N is less than or equal to the number of permanent finalizer threads, no temporary threads are created. Otherwise, the number of finalizer threads undergoes the following changes during WBS activity, in chronological order:
Figure 7 below demonstrates variations in the number of finalizer threads over time.
Figure 7. Variations in Number of Running Finalizer Threads
As a result, the number of running finalizer threads in the current work balancing subsystem can vary between 0 and 2N.
Note
The maximum value for 2N is 256 running finalization threads.
The core virtual machine is the central part of the overall VM design. The VM core consists of common VM blocks defined by the JVM specification [1] and of elements specific for the DRLVM implementation, as follows:
java.lang
package.
The structure of the virtual machine enables building stable interfaces for inter-block communication as well as public VM interfaces. These interfaces inherit platform independence from the VM specification [1]. Figure 8 shows the VM core overall structure and the internal logic of components interaction. For more details on available interfaces, see 3.13 Interfaces.
Figure 8. VM Core Structure. Red font indicates external interfaces.
The class support component processes classes in accordance with the JVM specification [1], which includes class loading, class preparation, resolution, and initialization operations. This component also contains several groups of functions that other VM components use to get information on loaded classes and other class-related data structures. For example, the JVMTI functions RedefineClasses() and GetLoadedClasses() use utility interfaces provided by class support.
The class support component has the following major goals:
- Loading .class files and .jar archives from the bootstrap class loader
Class support functions can be divided into the following groups:
- Class loading functions, which load classes received from the java.lang.ClassLoader class or from the files and directories listed in the vm.boot.class.path property. These functions also bind loaded classes with the defining class loader and provide information on all loaded classes.
The VM core stores information about every class, field, and method loaded as described below.
The native code support component consists of two parts, execution of native methods used by Java* classes, and an implementation of the Java* Native Interface (JNI) API for native code. Execution of native methods is required by the Java* Language Specification [2] and JNI is required by JNI Specification [5].
The virtual machine calls native methods differently with the JIT and with the interpreter as described below.
JNI optimizations
The VM core generates specialized JNI wrappers to support the transition from managed to native code. A straightforward implementation of these wrappers calls a function to allocate storage and initialize JNI handles for each reference argument. However, most JNI methods have only a small number of reference parameters. To take advantage of this, the wrapper uses an in-line sequence of instructions to allocate and initialize the JNI handles directly. This improves the performance of applications that make many JNI calls.
The Java* Native Interface is a set of functions, which enable native code to access Java* classes, objects, methods, and all the functionality available for a regular method of a Java* class.
The JNI implementation mostly consists of wrappers to different components of the virtual machine. For example, class operations are wrappers for the class support component, method calls are wrappers that invoke the JIT or the interpreter, and object fields and arrays are accessed directly by using the known object layout.
The following code is the implementation of the IsAssignableFrom JNI function, which uses the class support interface:
#include "vm_utils.h"
#include "jni_utils.h"

jboolean JNICALL IsAssignableFrom(JNIEnv * UNREF env,
                                  jclass clazz1, jclass clazz2)
{
    TRACE2("jni", "IsAssignableFrom called");
    assert(tmn_is_suspend_enabled());

    Class* clss1 = jclass_to_struct_Class(clazz1);
    Class* clss2 = jclass_to_struct_Class(clazz2);
    Boolean isAssignable = class_is_subtype(clss1, clss2);

    if (isAssignable) {
        return JNI_TRUE;
    } else {
        return JNI_FALSE;
    }
} // IsAssignableFrom
The stack is a set of frames created to store local method information. The stack is also used to transfer parameters to the called method and to get back a value from this method. Each frame in the stack stores information about one method. Each stack corresponds to one thread.
The JIT compiler can combine in-lined methods into one method for performance optimization. In this case, information for all the combined methods is stored in one stack frame.
The VM uses native frames related to native C/C++ code and managed frames for Java* methods compiled by the JIT. Interaction between native methods is platform-specific. To transfer data and control between managed and native frames, the VM uses special managed-to-native frames, or M2nFrames.
Note
In the interpreter mode, the VM creates several native frames instead of one managed frame for a Java* method. These native frames store data for interpreter functions, which interpret the Java* method code step by step.
M2nFrames contain the following:
- A link to the list of M2nFrames stored in VM_thread. The list is terminated with a dummy frame with zero contents.
Stack walking is the process of going from one frame on the stack to another. Typically, this process is activated during exception throwing and root set enumeration. In DRLVM, stack walking follows different procedures depending on the type of the frame triggering iteration, as described below.
The system identifies whether the thread is in a managed or in a native frame and follows one of the scenarios described below.
Figure 11 below gives an example of a stack structure with M2nFrames and managed frames movement indicated.
Figure 9. Stack Walking from a Managed Frame

Figure 10. Stack Walking from a Native Frame

Figure 11. LMF List after the Call to a Native Method
The main component responsible for stack walking is the stack iterator.
The stack iterator enables moving through a list of native and Java* code frames. The stack iterator performs the following functions:
The stack trace converts stack information obtained from the iterator and transfers this data to the org.apache.harmony.vm.VMStack class.
Note
One frame indicated by the iterator may correspond to more than one line in the stack trace because of method in-lining (see the first note in About the Stack).
The thread management component provides threading functionality inside the virtual machine and the class libraries. The purpose of thread management is to hide platform specifics from the rest of the VM, and to adapt the OS threading functions to the Java* run-time environment. For example, thread management enables root set enumeration by making threads accessible for the garbage collector.
Thread management is used by the following components:
- Class libraries (java.lang.Object, java.lang.Thread, java.util.concurrent.locks.LockSupport)
Note
The thread management code is currently written by using the restricted set of the Win32 threading API. On Linux*, each Win32 thread management function is replaced with the appropriate adaptor implemented using the POSIX threading API. This provides portability of the thread management code across Windows* and Linux* platforms.
The central part of thread management is the VM_thread control structure, which holds all the data necessary to describe a Java* thread within the virtual machine. Instances of the VM_thread control structure are referred to as thread blocks in the code. All thread blocks, that is, all active Java* threads running inside the VM, are represented in a linked list. The list is traversed by means of iterators that the thread_manager.cpp module provides.

Currently, the number of threads simultaneously running in the system is limited to 800. Each Java* thread in the system gets a unique index called thread_index, within the range of 0 to 2048, at creation time. The maximum number of 2048 has been selected to ensure that it does not set any restrictions on underlying system capabilities.
Note
Your system might exceed the approximate maximum number of 800 threads.
At this time, no single API in thread management is responsible for thread creation. However, the code that creates a thread does the following:
1. Allocates a VM_thread structure with the help of the get_a_thread_block() function.
2. Puts the VM_thread structure into the thread local storage.
The newly created thread is responsible for its correct completion. The completion code is added to the thread procedure and does the following:
1. Calls the notify_all() function on the java.lang.Thread object associated with the thread. This call activates any threads that have invoked the join() function for this thread.
2. Frees the VM_thread structure by using the free_this_thread_block() function.
One of the key features of thread management in DRLVM is the safe suspension functionality. Safe suspension means that the thread is physically stopped only at certain safe points or at the end of safe regions. This ensures that the suspended thread holds no locks associated with operating system resources, such as the native memory heap. Safe suspension helps to avoid possible deadlocks in the DRL virtual machine, as described below.
At any moment of time, any thread is in one of the following states:
The suspension algorithm typically involves two threads, the suspender thread and the suspendee thread.
Functions Used
A safe region of code in a suspendee thread is marked by the tmn_suspend_enable() and tmn_suspend_disable() functions. Additionally, the code is marked by the thread_safe_point() function to denote a point where safe suspension is possible. A suspender thread can invoke the thread_suspend_generic() or thread_resume_generic() functions, supplying the suspendee thread block as an argument. The thread_suspend_generic() function handles safe suspension when called on a thread as shown below, and the thread_resume_generic() function instructs the suspendee thread to wake up and continue execution.
Suspension Algorithm
This section describes the algorithm of safe suspension, as follows:
1. The suspender thread calls the thread_suspend_generic() function, which sets a flag for the suspendee thread indicating a request for suspension.
a) If the suspendee thread is inside a safe region, the thread_suspend_generic() function immediately returns. The suspendee thread runs until it reaches the end of the safe region. After that, the thread is blocked until another thread calls the thread_resume_generic() function for it.
b) Otherwise, the thread that called the thread_suspend_generic() function is blocked until the suspendee thread reaches the beginning of a safe region or a safe point. The thread state then changes, and the mechanism described in point a) above starts.
2. The suspendee thread reaches a safe point by calling the thread_safe_point() function. In case of a suspension request set for this thread, this function notifies the requesting thread and waits until another thread calls the thread_resume_generic() function for this thread. In other words, the thread suspends itself at a safe point upon request.
3. The suspendee thread enters a safe region by calling the tmn_suspend_enable() function. This function sets the suspend_enabled state flag to true. In case a suspension request is set for this thread, the function notifies the requesting thread that a safe region is reached.
4. The suspendee thread leaves a safe region by calling the tmn_suspend_disable() function. This function sets the suspend_enabled state flag to false and invokes the thread_safe_point() function.
Monitors are a central part of Java* thread synchronization. Any Java* object can serve as a monitor. In DRLVM, monitor-related information is kept in the header of the Java* object structure. An object header is 32 bits long and has the following structure:
Figure 12. Monitor Structure
Where:
stack_key stores the index of the Java* thread that acquired the monitor, or equals FREE_MONITOR if no thread owns it.
recursion_count counts the number of times that the thread has acquired the monitor.
7 bit hash is the hash code for the object.
contention bit is set to one if multiple threads are contending for the monitor.
Note
In the current implementation, the contention bit is always set to 1.
Acquiring Monitors
The monitor_enter()
and monitor_exit()
functions manage monitors in accordance with the JNI specification [5].
The monitor_enter()
operation for a specific object is
shown below.
First, stack_key in the header of the object is examined. The header can be in one of three states, which determine further actions, as follows.
State of stack_key | Actions
Free: stack_key contains FREE_MONITOR | The current thread index is stored in stack_key. Read and update operations for stack_key are performed atomically to prevent possible race conditions.
Occupied by the current thread | The recursion count increases.
Occupied by another thread | The current thread does the following in a loop, until the monitor is successfully acquired:
Note
The waiting thread is excluded from the system scheduling and does not get the CPU resources for its execution.
During the monitor_exit() operation, the current thread does the following based on the recursion_count value:
- FREE_MONITOR is written into stack_key, indicating that the monitor is no longer occupied.
- A thread waiting in the mon_enter_array queue is notified by means of event_handle_monitor, which enables it to acquire the monitor; see the monitor_enter() operation description.
The VM kernel classes link the virtual machine with the Java* class libraries (JCL), and consist of the Java* part and the native part. This section describes the Java* part of the kernel classes, whereas the native part is described in section 3.7 Kernel Class Natives.
Kernel classes are Java* API classes, members of
which use or are used by the virtual machine. Because these classes
have data on the VM internals, the kernel classes are delivered with
the VM. Examples of kernel classes include
java.lang.Object
and
java.lang.reflect.Field
.
The current implementation is based on the Harmony Class Library Porting Documentation [20]. The DRL kernel classes have amendments to the porting documentation, as indicated in section 3.6.2 Implementation Specifics below.
In DRLVM, the kernel classes communicate with the virtual machine
through a Java* interface defining a strict set of
static native methods implemented in the VM. The interface mainly
consists of four package private classes:
java.lang.VMClassRegistry
,
java.lang.VMExecutionEngine
,
java.lang.VMMemoryManager
, and
java.lang.VMThreadManager
, and two public classes
java.lang.Compiler
and
org.apache.harmony.vm.VMStack
.
This section describes the specifics of the kernel classes implementation in DRL.
- The kernel classes set includes the java.lang.System class due to its close connection with the VM and the other kernel classes.
- The set includes the java.lang.String and java.lang.StringBuffer classes due to package private dependencies between them.
- The java.util.concurrent.locks.LockSupport class has been added to the kernel classes set to support J2SE 1.5.0.
- The java.lang.Class.getStackClasses() method does not completely correspond to the Harmony Porting Documentation. This method adds two frames to the bottom of the resulting array when stopAtPrivileged is specified, so that the caller of the privileged frame is the last included frame.
- The com.ibm.oti.vm.VM.shutdown() method is not called upon VM shutdown.
The kernel class natives component is the part of the 3.6 Kernel Classes serving as a bridge between the Java* part of the kernel classes and other VM components, namely, the garbage collector, class support, stack support, exception handling, and object layout support. The kernel class natives component also makes use of the thread management functionality. The interaction between the kernel classes and VM components is based on specific internal interfaces of the virtual machine.
Note
The current implementation of kernel class natives is based on JNI and
uses JNI functions. As a result, kernel class natives functions are
exported as ordinary native methods from the VM executable as
specified by the JNI specification [5]. For
example, when the VMThreadManager
Java*
class from the kernel classes component defines the method
native static Thread currentThread()
, the kernel class
natives component implements the function
Java_java_lang_VMThreadManager_currentThread()
.
Currently, the kernel class natives component consists of the following:
jobject Java_java_lang_VMThreadManager_currentThread(JNIEnv *jenv, jclass) {
    return thread_current_thread();
}
For the method VMClassRegistry.findLoadedClass(String name, ClassLoader loader), the wrapper checks the loader parameter and determines further activities. If this parameter has a non-null value, the corresponding class loader is used for class lookup. If it is null, the Java* execution stack is examined to obtain the context class loader, if any; otherwise, the system class loader is used.
- The lazy exception creation mechanism for java.lang.Throwable. When a Java* exception object is created, this mechanism prepares and stores a snapshot of the stack trace; this is a fast operation. Expensive operations, such as creation of the java.lang.StackTraceElement array, are only performed when the exception data is actually required, for example, when the printStackTrace() method of the Throwable class is called.
- Support for the java.lang.reflect package [6]. This mechanism maps between the java.lang.Object type and the VM internal type representation required for operations with fields and methods. This mechanism also communicates with the JIT or the interpreter to perform the actual execution of methods.
- A mechanism linking Java* objects, such as java.lang.Thread and java.lang.reflect.Field, with the corresponding VM internal data structures. When the classes are loaded, the VM class support component adds the fields to the classes, and the kernel class natives component uses the fields to store the links to the corresponding VM internal data. Note that these fields are not accessible from Java* code.
- The print method of the VMDebug class, which allows printing messages to the standard output before printing can be done through the java.lang.System.out/err channel.
The VM services component provides the JIT compiler with functionality requiring close cooperation with the virtual machine. Below is the list of the VM services currently provided for the JIT.
During compilation time, the JIT compiler uses the following services:
- dump services for dumping generated stubs and methods to a file in a human-readable form.
Certain services make a part of class support interface, for example, type management and class resolution. For details, see section 3.2 Class Support.
JIT-compiled code accesses the following groups of services at run time:
Both service types are described below.
Services with M2nFrame
The following services are called in the JNI-like way:
- the checkcast() and instanceof() checks
These services enable suspension in their code and push M2nFrames onto the top of the stack; an M2nFrame stores the initial stack state for exceptional stack unwinding and local references for root enumeration purposes.
Services without M2nFrame
The following frequently used services are invoked without pushing an M2nFrame on the stack:
These services prevent thread suspension in their code. Most direct call functions that implement operations or cast types only return a required result. Storing a reference to an array uses another convention because it returns NULL for success, or a class handle of the exception to throw.
The exceptions interface handles exceptions inside the VM. Exception handling can follow different paths depending on the execution engine mode, as indicated in subsequent sections.
The exceptions interface includes the following function groups:
In DRLVM, two ways of handling exceptions are available: exception throwing and raising exceptions, as described below.
Procedure
When an exception is thrown, the virtual machine tries to find the exception handler provided by the JIT and registered for the specified kind of exception and for the specified code address range. If the handler is available, the VM transfers control to it, otherwise, the VM unwinds the stack and transfers control to the previous native frame.
When to apply
In Java* code, only exception throwing is used, whereas in internal VM native code, raising an exception is also an alternative. Exception throwing is usually faster than raising exceptions because with exception throwing, the VM uses the stack unwinding mechanism.
Procedure
When the VM raises an exception, a flag is set that an exception occurred, and the function exits normally. This approach is similar to the one used in JNI [5].
When to apply
Raising exceptions is used in internal VM functions during JIT compilation of Java* methods, in the interpreter, and in the Java* Native Interface in accordance with the specification [5]. This usage is especially important at start-up when no stack has been formed.
DRLVM provides the following facilities to set the exception handling mode:
- the exn_throw() function
- the exn_throw_only() or exn_raise_only() functions; this way is faster
To get the current exception, use exn_get(), and to check whether an exception is raised, use exn_raised(). The latter function returns only a boolean flag, not the exception object. This technique saves VM resources because the system does not need to create a new copy of the exception object.
Note
Remember that in the interpreter mode, the VM can only raise exceptions, and not throw them.
In DRLVM, the JVMTI support component implements the standard JVMTI interface responsible for debugging and profiling.
The DRLVM implementation of JVMTI mostly consists of wrapper functions, which request service and information from other VM parts, such as the class loader, the JIT, the interpreter, and the thread management functionality.
Another part of JVMTI implementation is written for service purposes, and comprises agent loading and registration, events management, and API extensions support.
The JVMTI support component is responsible for the following groups of operations:
Related Links
According to the JVM specification [1], the verifier is activated during class loading, before the preparation stage, and consequently, before the start of class initialization. Verification of a class consists of the following passes:
Subsequent sections present specifics of verification performed in DRLVM.
The current version of the verifier is optimized to minimize performance impact of the time-consuming bytecode verification. The improved verification procedure is described below:
Stage 1: When checking methods of a class, the verifier scans dependencies on other classes, methods, and fields. The verifier only checks this information if the referenced element is loaded. For unloaded elements handling, see Stage 2.
Stage 2: The verifier generates a list of constraints to be checked during the next stage. Constraints contain information on verification checks that cannot be performed because referenced elements have not been loaded. The verifier stores the list of constraints in the checked class data.
Stage 3: Before class initialization, the verifier goes over the list of previously generated constraints. Provided all exit criteria are met, the verification of the class completes successfully and initialization of the class begins.
The verifier releases the constraints data when the class is unloaded.
For optimization purposes, all verification procedures have been divided into the following groups:
The verifier can perform these checks without constructing the control flow graph.
For these operations, the bytecode verifier analyzes the control and data flow graphs.
Background
The control flow graph (CFG) is a data structure, which is an abstract representation of a procedure or program. Each node in the graph represents a basic block without jumps or jump targets. Directed edges represent jumps in the control flow.
The data flow graph (DFG) is a graph reflecting data dependencies between code instructions of a procedure or program. The data flow graph provides global information about how a procedure or a larger segment of a program manages its data.
Note
In addition, a group of classes is declared as trusted. The verifier skips these classes to minimize performance impact. The group of trusted classes mostly includes system classes.
This layer provides common general-purpose utilities. The main requirements for these utilities include platform independence of the DRLVM interfaces, thread safety, and stack-unwind safety. The following two main subcomponents constitute the utilities layer:
Note
This section describes VM utilities. For information on the compiler utilities, consult section 4.6 Utilities.
The utilities layer has the following key features:
This interface is responsible for allocating and freeing the memory used by other components. The current implementation provides two types of memory allocation mechanisms:
- allocation from memory pools
- direct allocation via the malloc(), free(), and realloc() functions
Memory management functionality is concentrated in port/include/port_malloc.h and port/include/tl/memory_pool.h.
The current logging system is based on the Apache log4cxx
logger adapted to DRLVM needs by adding a C interface and improving
the C++ interface. The port/include/logger.h
header file describes the
pure C programmatic logger interface. The cxxlog.h
header
file in the same directory contains a number of convenience macros improving effectiveness
of the logger for C++ code.
Each logging message has a header that may include its category,
timestamp, location, and other information. Logging messages can be
filtered by category and by the logging level. You can use specific
command-line options to configure the logger and make maximum use of
its capabilities. See the ij -X help message for details on the logger command-line options.
The VM core exports the following public interfaces:
The VM common interface is exported by the VM core for interaction with the JIT compiler and the garbage collector. This interface includes a large set of getter functions used to query properties of classes, methods, fields, and object data structures required by DRLVM components. Other functions of this interface do the following:
This VM core interface supports just-in-time compilation. Functions of this interface do the following:
invokevirtual
method call. The location of the
VTable and the offset make up the start address of the
method.
For details, see section 3.8.1 Compile-time Services.
checkcast()
, instanceof()
,
monitorenter()
, monitorexit()
, or a GC write barrier operation. For details,
see section 3.8.2 Run-time
Services.
Note
The VM core also exports the GC interface function table to support GC-related operations of the JIT, such as root set enumeration. For details, see section 6.4 Public Interfaces in the GC description.
For a description of functions that the JIT compiler exports to communicate with the VM core, see section 4.7.1 JIT_VM.
The VM_EM
interface
of the VM core supports high-level management of execution engines. Functions of this
interface do the following:
For a description of functions that the execution manager exports to interact with the VM core, see 5.5.1 EM_VM Interface.
The VM core uses this interface to communicate with the garbage
collector. The garbage collector also interacts with the just-in-time
compiler via indirect calls to the VM_GC
interface.
On the VM side, most functions of this interface are used for root set
enumeration and for stopping and resuming all threads. To implement
stop-the-world collections, DRLVM currently provides the functions
vm_enumerate_root_set_all_threads()
and
vm_resume_threads_after()
.
Note
The current implementation does not support concurrent garbage collectors. However, DRLVM has been designed to make implementation of concurrent garbage collectors as convenient as possible.
For details on the VM_GC
interface, see section 6.4 GC Public Interfaces.
This interface enables the interpreter to use the VM core functionality. Functions of the interface do the following:
The C VM interface together with the kernel classes is responsible for interaction between the VM and the class libraries. The C VM interface is used by the native part of the Java* class libraries (JCL) implementation as a gateway to different libraries, such as the port library, the VM local storage, and the zip cache pool. This component is required for the operation of JCL, see the Harmony Class Library Porting documentation [20].
C VM Specifics
The C VM interface has the following limitations:
boot.class.path
property. For details, see section 2.5 Initialization.
The C VM interface is a separate dynamic library. This library is statically linked with the pool, the zip support, and the thread support libraries.
Jitrino is the code name for just-in-time (JIT) compilers shipped with DRLVM [3]. Jitrino comprises two distinct JIT compilers: the Jitrino.JET baseline compiler and the optimizing compiler, Jitrino.opt. These two compilers share common source code and are packaged in a single library. This section is mostly devoted to Jitrino.opt compiler. For details on the baseline compiler, see section 4.8 Jitrino.JET.
The optimizing JIT compiler features two intermediate representation (IR) types: platform independent high-level IR (HIR) and platform-dependent low-level IR (LIR). Jitrino incorporates an extensive set of code optimizations for each IR type. This JIT compiler has a distinct internal interface between the front-end operating on HIR and the back-end operating on LIR. This enables easy re-targeting of Jitrino to different processors and preserving all the optimizations done at the HIR level.
Key features of the JIT compiler include:
Jitrino also features:
Jitrino is notable for its clear and consistent overall architecture and a strong global optimizer, which runs high-level optimizations and deals with single or multiple methods instead of basic blocks.
In the current implementation, certain Jitrino features are implemented only partially or not enabled, namely:
The Jitrino compiler provides a common strongly typed substrate, which helps developers to optimize code distributed for Java* run-time environments and adapt it to different hardware architectures with a lower chance of flaws. The architecture of the compiler is organized to support this flexibility as illustrated in Figure 13. Paths connect the Java* and ECMA Common Language Interface (CLI) front-ends with every architecture-specific back-end, and propagate type information from the original bytecode to the architecture-specific back-ends.
For extensibility purposes, the Jitrino compiler contains language- and architecture-specific parts and language- and architecture-independent parts described in subsequent sections of this document. As a result, supporting a new hardware architecture requires implementation of a new back-end.
To optimize time spent on compilation, the Jitrino compiler can follow a fast or a slow compilation path. In most applications, only a few methods consume the majority of time. Overall performance benefits when Jitrino aggressively optimizes these methods.
The initial compilation stage is to translate methods into machine code by using the baseline compiler Jitrino.JET, the Jitrino express compilation path. This compiler performs a very fast and simple compilation and applies no optimizations. The main Jitrino compilation engine recompiles only hot methods. Jitrino.JET generates instrumentation counters to collect the run-time profile. Later, the execution manager uses this profile to determine recompilation necessity.
The process of compilation in Jitrino.opt follows a single path, as shown in Figure 13 below.
Figure 13. Jitrino Compiler Architecture
The subsequent sections provide an overview of the compiler subcomponents. Section 4.3 Optimizer provides details on the Jitrino high-level optimizations.
The initial compilation step is the translation of bytecode into HIR, which goes in the following phases:
Jitrino HIR, which is the internal representation of a lower level than the bytecode, breaks down complex bytecode operations into several simple instructions to expose more opportunities to later high-level optimization phases. For example, loading an object field is broken up into operations that perform a null check of the object reference, load the base address of the object, compute the address of the field, and load the value at that computed address.
The optimizer includes a set of compiler components independent of the original Java* bytecode and the hardware architecture. The optimizer comprises the high-level intermediate representation, the optimizer, and the architecture-independent part of the code selector. The code selector has a distinct interface level to set off the architecture-dependent part.
Jitrino uses the traditional high-level intermediate representation, where the control flow is represented as a graph consisting of nodes and edges. The compiler also maintains dominator and loop structure information on HIR for use in optimization and code generation. HIR represents:
Explicit modeling of the exception control flow in the control flow graph (CFG) enables the compiler to optimize across throw-catch boundaries. For locally handled exceptions, the compiler can replace expensive throw-catch combinations with cheaper direct branches.
To explain the same in greater detail, each basic block node consists of a list of instructions, and each instruction includes an operator and a set of single static assignment (SSA) operands. The SSA form provides explicit use-def links between operands and their defining instructions, which simplifies and speeds up high-level optimizations. Each HIR instruction and each operand have detailed type information propagated to the back-end at further compilation stages.
The Jitrino compiler uses a single optimization framework for Java* and CLI programs. The optimizer applies a set of classical object-oriented optimizations balancing the effectiveness of optimizations with their compilation time. Every high-level optimization is represented as a separate transformation pass over the HIR. These passes are grouped into four categories:
Note
In the current version, high-level optimizations are disabled by default. You can enable these via the command-line interface.
The optimization passes performed during compilation of a method constitute an optimization path. Each optimization pass has a unique string tag used internally to construct the optimization path represented as a character string. The default optimization path can be overridden on the command-line.
Note
Many optimizations can use dynamic profile information for greater efficiency, such as method and basic block hotness and branch probability. However, dynamic profile information creation is not currently enabled in Jitrino.
The HIR simplification passes are a set of fast optimization passes that the Jitrino optimizer performs several times on the intermediate representation to reduce its size and complexity. Simplification passes improve the code quality and the efficiency of more expensive optimizations. The IR simplification consists of three passes:
- For references that originate from a new allocation, the optimizer omits a run-time check for null references that are proven non-null [15].
- The chkzero(), chknull(), and chkcast() HIR instructions are redundant if guarded by explicit conditional branches.
Together, the IR simplification passes constitute a single cleanup pass performed at various points in the optimization process.
The high-level optimization begins with a set of transformations to enhance the scope of further optimizations, as follows:
Note
A critical edge is an edge from a node with multiple successors to a node with multiple predecessors.
Note
Loop peeling in combination with high-level value numbering provides a cheap mechanism to hoist loop-invariant computation and run-time checks.
Inliner
The central part of the scope enhancement passes is the inliner, which removes the overhead of a direct call and specializes the called method within the context of its call site. In-lining is an iterative process built around other scope enhancement and IR simplification passes. In-lining goes as follows:
The in-liner halts when the queue is empty or after the IR reaches a certain size limit. When in-lining is completed, the optimizer performs a final IR simplification pass over the entire intermediate representation.
The final set of optimization passes comprises optimizations to eliminate redundant and partially redundant computations and includes loop-invariant code motion and bounds-check elimination [15].
The code selector translates the high-level intermediate representation to a low-level intermediate representation (currently, to the IA-32 representation). The component is designed so that code generators for different architectures can be plugged into the compiler. To be pluggable, a code generator (CG) must implement distinct code selector callback interfaces for each structural entity of a method. During code selection, the selector uses the callback interfaces to translate these entities from HIR to LIR.
Code selection is based on the HIR hierarchical structure of the compiled method illustrated by Figure 14.
Figure 14. Code Selector Structure (grey indicates callback interfaces)
For every non-leaf structural element in Figure 14, the code selector defines:
Each class of the code selector defines a genCode()
function, which takes the callback of this class as an argument. Every
function in a callback receives a code selector class instance
corresponding to the lower-level element of the method hierarchy. This
way, control is transferred between the optimizer part of the code
selector and the CG part.
The Jitrino IA-32 code generator has the following key features:
- the Constraint class
- the CallingConvention interface to facilitate adapting the code generator to various extended compilation scenarios, such as ahead-of-time compilation and code caching
- the IRTransformer class
- the IRManager, Inst, and Opnd classes
Note
You can override the default hard-coded descriptions on the command line.
Code Generation Pass
The table below describes the passes of the code generation process used in the IA-32 code generator. The order of the passes in the table mostly corresponds to the default code generation path.
Name | Classes | Description |
selector (Code selector) | MethodCodeSelector, CfgCodeSelector, InstCodeSelector | Builds LIR based on the information from the optimizer. Based on the common code lowering framework. |
i8l (8-byte instructions lowerer) | I8Lowerer | Performs additional lowering of operations on 8-byte integer values into processor instructions. |
bbp (Back-branch polling, insertion of safe points) | BBPollingTransformer | Inserts safe points into loops to ensure that threads can be suspended quickly at GC-safe points. |
gcpoints (GC safe points info) | GCPointsBaseLiveRangeFixer | Performs initial data-flow analysis and changes the LIR for proper GC support: creates a mapping of interior pointers (pointers to object fields and array elements) to base pointers (pointers to objects) and extends the liveness of base pointers when necessary. |
cafl (Complex address form loader) | ComplexAddrFormLoader | Translates address computation arithmetic into IA-32 complex address forms. |
early_prop (Early propagation) | EarlyPropagation | Performs fast copy and constant propagation. This pass is fast and simple, though very conservative. |
native (Translation to native form) | InstructionFormTranslator | Performs trivial transformation of LIR instructions from their extended form to native form. |
constraints (Constraints resolver) | ConstraintsResolver | Checks instruction constraints imposed on operands and splits operands that cannot be used simultaneously in all the instructions that reference them. |
dce (Dead code elimination) | DCE | Performs simple liveness-based one-pass dead code elimination. |
bp_regalloc-GP, bp_regalloc-XMM (Bin-pack register allocator) | RegAlloc2 | Performs bin-pack global register allocation for general-purpose and XMM registers. |
spillgen (Spill code generator) | SpillGen | Acts as the local register and stack allocator and the spill code generator. Similar to the constraints resolver, but addresses constraints caused by dependencies between operands, for example, when an instruction allows only one memory operand. The pass requires no reserved registers; instead, it tries to find register usage holes or to evict an already assigned operand from a desired register. |
layout (Code layout) | Layouter | Performs linearization of the CFG using the topological, top-down, and bottom-up algorithms. In the current code drop, the default code layout algorithm is topological [21]. |
copy (Copy pseudo-instruction expansion) | CopyExpansion | Converts copy pseudo-instructions into real instructions. Currently, the CG describes instructions that copy operands as copy pseudo-instructions and expands them once all operands are assigned to physical locations. This facilitates building other transformations. |
stack (Stack layout) | StackLayouter | Creates prolog and epilog code using register usage information, the calling convention of the method, and the stack depth required for stack-local operands. Also initializes the persistent stack information used in run-time stack unwinding. |
emitter (Code emitter) | CodeEmitter | Produces several streams: |
si_insts (StackInfo inst registrar) | StackInfoInstRegistrar | Traverses call instructions to get information on the stack layout at call sites and completes the persistent stack information. |
gcmap (GC map creation) | GCMapCreator | Creates the GC map comprising, for each call site, the set of locations with base and interior pointers that represents the GC root set. |
info (Creation of method info block) | InfoBlockWriter | Serializes the persistent stack information and the GC map in the VM as an information block for later use during run-time stack iteration, exception throwing, and GC root set enumeration. |
The utilities used by the JIT compiler major components include:
Note
The JIT compiler utilities are similar to, but not identical to, the VM utilities. For example, the JIT compiler and the VM core use different loggers.
The Jitrino logging system facilitates debugging and performance bottleneck detection. The system is organized as a set of log categories structured into two hierarchical category trees, one for compile-time and one for run-time logging. Run-time logging is used in root set enumeration and stack iteration.
A log category corresponds to a particular compiler component, such as the front-end or the code generator, and has a number of levels providing different logging details. Figure 15 illustrates logging categories and levels, namely:
- The run-time category (rt) with two sub-categories for logging related to garbage collection and call address patching
- The compile-time category (root) with sub-categories for the front-end translator and HIR construction (fe), the optimizer (opt), and the code generator (cg)
In the logging system, built-in timers measure the time spent by a particular compiler component or a code transformation pass during compilation.
Figure 15. Log Categories Trees
In DRL, all logging is off by default. You can enable logging for a category and assign a particular level to it on the command line using the -Xjit LOG option. For example, you can enable debug-level logging for the optimizer component and IR dumping for the code generator by using the following option:
-Xjit LOG=\"opt=debug,cg=ir,singlefile\"
On the command line, you can assign logging levels independently to the categories in different sub-trees. When logging is redirected to the file system, separate logs are created for different threads unless the singlefile mode is specified.
CFG Visualization
Debugging the JIT requires information on how the control flow graph of the compiled method, including its instructions and operands, changes between compilation stages. For that, the Jitrino compiler can generate dot files representing the control flow graph at both IR levels. These text files can be converted into pictures that represent the CFG graphically; a variety of graph visualization tools can perform this conversion.
This section describes the interfaces that the JIT compiler exports to communicate with other components. The Jitrino compiler exposes all necessary interfaces to work as a part of the run-time environment. Jitrino explicitly supports precise moving garbage collectors requiring the JIT to enumerate live references.
Functions inside the JIT_VM
interface can be grouped into
the following categories:
Note
Root set enumeration and stack unwinding are run-time routines called only during execution of compiled code.
Functions in this set are responsible for the primary JIT compiler task of running just-in-time compilation to produce native executable code from a method bytecode. A request to compile a method can come from the VM core or the execution manager.
This set of functions supports the garbage collector by enumerating and reporting live object references. The JIT compiler uses these functions to report locations of object references and interior pointers that are live at a given location in the JIT-compiled code. The object references and interior pointers constitute the root set that the GC uses to traverse all live objects. The interface requires reporting locations of the values rather than the values, to enable a moving garbage collector to update the locations while moving objects.
Note
Unlike reference pointers, which always point to the object's header, interior pointers point to a field inside the target object. If the JIT reports an interior pointer without the corresponding reference pointer, the burden is on the GC to reconstruct the reference pointer.
For more information, see sections 2.6 Root Set Enumeration and 6. Garbage Collector.
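Although the actual enumeration goes through the JIT_VM interface, the principle of reporting locations rather than values can be shown with a minimal toy sketch in C. All names below, such as toy_add_root_set_entry and toy_move_object, are hypothetical and only echo the spirit of functions like gc_add_root_set_entry:

```c
#include <assert.h>
#include <stddef.h>

/* Toy illustration (not DRLVM code): the collector records the *address*
 * of each reference slot, so a moving collector can rewrite the slot
 * after relocating the object. */
#define MAX_ROOTS 16

typedef struct Object { int field; } Object;

static Object **root_slots[MAX_ROOTS];
static int root_count = 0;

/* Analogous in spirit to gc_add_root_set_entry(): store the slot, not the value. */
static void toy_add_root_set_entry(Object **slot) {
    root_slots[root_count++] = slot;
}

/* After moving an object, patch every root slot that pointed at it. */
static void toy_move_object(Object *from, Object *to) {
    *to = *from;
    for (int i = 0; i < root_count; i++)
        if (*root_slots[i] == from)
            *root_slots[i] = to;
}
```

If the interface reported values instead of locations, the collector could find the object but could never update the stack slot or static variable that referenced its old address.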
The virtual machine requires support from the compiler to perform stack unwinding, that is, an iteration over the stack from a managed frame to the frame of the caller.
To facilitate stack walking, the JIT stack unwinding interface does the following:
For more information about the stack, see section 3.4 Stack Support.
The set of JIT functions responsible for JVMTI support is exported for interaction with the VM JVMTI component. These functions do the following:
The VM can request the JIT to compile a method and to support generation of specific JVMTI events in compiled code. To facilitate these actions, additional parameters are passed to the bytecode compilation interface.
For a description of functions that the VM core exports to interact with the JIT compiler, see section 3.13 Public Interfaces .
The JIT compiler exports this interface to support the execution manager. Functions of this set are responsible for the following operations:
For a description of the functions that the execution manager exports to interact with the JIT compiler, see section 5.5.2 EM_JIT Interface.
The Jitrino.JET baseline compiler is the Jitrino subcomponent used for translating Java* bytecode into native code with practically no optimizations. The compiler emulates the operations of a stack-based machine using a combination of the native stack and registers.
Jitrino.JET performs two passes over the bytecode, as shown in Figure 16: during the first pass, it establishes basic block boundaries; during the second, it generates native code.
Figure 16: Baseline Compilation Path
Subsequent sections provide a description of these passes.
During the first pass over the method’s bytecode, the compiler finds basic block boundaries and counts references for these blocks.
Note
The reference count is the number of ways to reach a basic block (BB).
To find basic block boundaries, Jitrino.JET does a linear scan over the bytecode and analyzes instructions, as follows: athrow, return, goto, conditional branches, switches, ret, and jsr end a basic block.
During the first pass, the compiler also finds the reference count for each block. Jitrino.JET then uses the reference count during code generation to reduce the number of memory transfers.
In the example shown in Figure 17, the ref_count for the second basic block (BB2) is equal to 1 because this block can only be reached from the first basic block (BB1). The reference count for the third basic block is equal to 2, because that block can be reached as a branch target from BB1 or as a fall-through from BB2.
Figure 17: Basic Blocks Reference Count
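The first pass described above can be sketched as a toy linear scan. The Ins structure and Kind enum below are hypothetical simplifications for illustration, not Jitrino.JET types:

```c
#include <assert.h>

/* Toy first-pass sketch (not Jitrino.JET code): one linear scan marks
 * basic-block leaders and counts how many ways each leader can be reached. */
enum Kind { NORMAL, GOTO, BRANCH, RETURN };   /* BRANCH = conditional branch */

struct Ins { enum Kind kind; int target; };

#define MAX_INS 32
static int is_leader[MAX_INS];
static int ref_count[MAX_INS];   /* meaningful only at leader indices */

static void first_pass(const struct Ins *code, int n) {
    for (int i = 0; i < n; i++) { is_leader[i] = 0; ref_count[i] = 0; }
    is_leader[0] = 1;             /* the method entry starts a block */
    ref_count[0] = 1;
    for (int i = 0; i < n; i++) {
        enum Kind k = code[i].kind;
        if (k == GOTO || k == BRANCH) {       /* a branch target starts a block */
            is_leader[code[i].target] = 1;
            ref_count[code[i].target]++;
        }
        if ((k == GOTO || k == BRANCH || k == RETURN) && i + 1 < n)
            is_leader[i + 1] = 1;             /* instruction after a terminator */
        if (k == BRANCH && i + 1 < n)
            ref_count[i + 1]++;               /* conditional fall-through edge */
    }
    for (int i = 0; i + 1 < n; i++)
        if (is_leader[i + 1] && code[i].kind == NORMAL)
            ref_count[i + 1]++;               /* plain fall-through into a leader */
}
```

Running this on a three-block shape like Figure 17 (a conditional branch from BB1 over BB2 to BB3) yields ref_count 1 for BB2 and 2 for BB3.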
During the second pass, Jitrino.JET performs the code generation by doing the following:
CALL and JMP instructions.
During code generation, Jitrino.JET performs register allocation using an original technique, the virtual cyclic register cache (vCRC), as described below.
As mentioned in the introduction, Jitrino.JET simulates stack-based machine operations with the aid of the virtual cyclic register cache (vCRC).
Because all operations involve only the top of the operand stack, keeping it on registers significantly reduces the number of memory access operations and improves performance. Even a small number of registers makes a difference: for example, the average stack depth for methods executed during the SPECjvm98 benchmark is only 3.
vCRC: Basic Algorithm
This section is an overview of the major idea behind vCRC.
As shown in Figure 18, the position of the item is counted from the bottom of the stack and does not change over time. In contrast to the position, the depth of an item is counted from the top of the stack and is a relative value.
Figure 18: Depth and Position on the Operand Stack
The position provides the following mapping between a register array and an operand's position on the operand stack:
POSITION % NUMBER_OF_REGISTERS => Register#
A simple tracking algorithm detects overflows and loads or unloads registers when necessary.
For example, with the register array REGISTERS[3] = {EAX, EBX, ECX}, the instruction iconst_4 has no register available, which causes the first item, int32 (1), to be spilled to memory. The register EAX is then free and can store the new item.
Figure 19: Saving Operand to Memory, Register Re-used
With this algorithm, the topmost items are always on registers regardless of the number of operands stored on the stack, and the system does not need to access memory to operate with the top of the stack.
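A minimal sketch of the mapping and spilling behavior, assuming a three-register cache as in the figures. The function vcrc_push and the tracking arrays are hypothetical illustrations, not Jitrino.JET code:

```c
#include <assert.h>
#include <string.h>

/* Toy vCRC sketch (illustrative, not Jitrino.JET code): an operand's stack
 * POSITION maps to a register as POSITION % NUMBER_OF_REGISTERS; pushing past
 * the cache capacity spills the item that currently occupies the target register. */
#define NREGS 3
static const char *REGS[NREGS] = {"EAX", "EBX", "ECX"};

static int in_reg[64];      /* per position: is the item still cached in its register? */
static int depth = 0;       /* current operand stack depth */

static const char *vcrc_push(void) {
    int pos = depth++;
    if (pos >= NREGS)
        in_reg[pos - NREGS] = 0;   /* spill the previous occupant to memory */
    in_reg[pos] = 1;
    return REGS[pos % NREGS];      /* register now holding the new top item */
}
```

Pushing a fourth item reproduces Figure 19: position 3 maps back to EAX, so the item at position 0 is spilled while the topmost items stay on registers.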
vCRC: Virtual Stacks
The basic algorithm cannot be applied directly. For example, on IA-32, storing a 64-bit floating-point double value in general-purpose registers would uselessly occupy two registers, and operations on floating-point values in general-purpose registers are non-trivial and mostly useless.
Technically speaking, vCRC allows tracking positions for operands of the different types as if they were in different virtual stacks. Different virtual operand stacks are mapped on different sets of registers, as follows:
Figure 20 provides an example of different operand types on the operand stack for an IA-32 platform.
Figure 20: Virtual Operand Stacks
4.8.4 Java* Method Frame Mimic
During the code generation phase, the state of the method stack frame is mimicked:
When code generation for a basic block starts, the reference count determines the actions taken, as indicated in the table below.
Reference Count | Actions |
ref_count > 1 | Local variables are taken from memory. The top of the stack is expected to be on the appropriate registers. |
ref_count = 1 | Information on the local variables state and the stack state is inherited from the previous basic block. |
ref_count = 0 | Dead code; must not occur. |
4.8.5 Run-time Support for Generated Code
To support run-time operations, such as stack unwinding, root set enumeration, and mapping between bytecode and native code, a specific structure, the method info block, is prepared and stored for each method during Pass 2.
At run time, special fields are also pre-allocated on the native stack of the method to store GC information, namely the stack depth, stack GC map, and locals GC map.
The GC map shows whether the local variables or the stack slots contain an object. The GC map for local variables is updated on each defining operation with a local slot, as follows:
The GC map for the stack is updated only at GC points, that is, before an instruction that may lead to a GC event, for example, a call to a VM helper. The stack depth and the stack state calculated during method compilation are saved before the invocation: code is generated to save the state.
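As an illustration of the defining-operation updates described above, a one-bit-per-slot map for locals might be maintained as follows. The names and encoding here are assumptions for illustration, not the DRLVM layout:

```c
#include <assert.h>
#include <stdint.h>

/* Toy GC-map sketch (illustrative): one bit per local slot, updated at each
 * defining operation so the collector knows which slots hold object references. */
static uint32_t locals_gc_map;

static void define_local(int slot, int holds_object) {
    if (holds_object)
        locals_gc_map |= (uint32_t)1 << slot;    /* reference store (astore-like) */
    else
        locals_gc_map &= ~((uint32_t)1 << slot); /* non-reference store (istore-like) */
}

static int slot_holds_object(int slot) {
    return (locals_gc_map >> slot) & 1;
}
```

Because Java* bytecode may reuse the same local slot for values of different types, the bit must be cleared again when a non-reference value is stored.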
The execution manager (EM) is the central part of the DRLVM dynamic optimization subsystem. Dynamic optimization is the process of modifying compilation and execution-time parameters in a system at run time. Optimization of compiled code may result in recompilation of managed method code. In this system, the execution manager makes optimization decisions based on the profiles collected by profile collectors. Every profile contains specific optimization data and is associated with the method code compiled by a particular JIT.
The key functions of the execution manager are the following:
The features of the DRL execution manager include the following:
The VM core creates the execution manager before loading an execution engine. Depending on the configuration, the execution manager initializes execution engines and profile collectors.
During JIT compiler instantiation, the execution manager provides the JIT with a name and a run-time JIT handle. The JIT can use this name to distinguish its persistent settings from settings of other execution engines. The compiler can also use the handle to distinguish itself from other JIT compilers at run time.
The EM also configures the JIT to generate a new profile or to use an existing profile via the profile access interface. This interface enables accessing profile collectors and their custom interfaces. Every profile collector uses its properties to check whether the JIT generating a profile and the JIT that will use the generated profile are profile-compatible compilers. For example, for the edge profile, the profile collector can check compatibility using the feedback point and IR-level compatibility properties. During its initialization, the JIT compiler accepts or rejects profile collection and usage.
Figure 21. Execution Manager Interfaces
In the figure, several blocks of the same type identify instances of the same component, as in the case with profile collectors and JIT compilers. For details on interfaces displayed in the figure, see section 5.5 EM Public Interfaces.
The recompilation chain is the central entity of the EM recompilation model. This chain can connect multiple profile-compatible JIT compilers into a single recompilation queue. To compile a method for the first time, the execution manager calls the first JIT compiler in the chain. After profiling information about the method is collected, the next JIT in the chain is ready to recompile the method applying more aggressive optimizations. The data from the method profile can be used during method recompilation to adjust custom optimization parameters.
If multiple recompilation chains co-exist at run time, the EM selects the appropriate recompilation chain to initially compile a method. Method filters associated with chains can configure the execution manager to use a specific chain for method compilation. Method filters can identify a method by its name, class name, signature or ordinal compilation number.
Within this model, the execution of a method goes as follows:
Note
A method is hot when a profile associated with it satisfies specific parameters in the PC configuration settings. For example, for an entry and back-edge profile collector, these parameters are the entry and back-edge counters' limits. When a counter value reaches the limit, the method becomes hot.
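Assuming an entry/back-edge profile collector as in the note above, the hotness check can amount to a threshold comparison. The structure and function names in this sketch are hypothetical:

```c
#include <assert.h>

/* Toy hotness check (illustrative): with an entry/back-edge profile collector,
 * a method becomes hot once a counter reaches its configured limit. */
struct EBProfile { long entry, backedge; };

static int is_hot(const struct EBProfile *p, long entry_limit, long backedge_limit) {
    return p->entry >= entry_limit || p->backedge >= backedge_limit;
}
```

When is_hot() reports true, the EM would hand the method to the next JIT in the recompilation chain.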
The profile collector (PC) is the execution manager subcomponent that collects profiles for Java* methods compiled by the JIT or executed by the interpreter. The DRL EM instantiates and configures profile collectors according to the settings of its configuration file.
The profile collector can collect method profiles only for the methods compiled by the same JIT. To collect the same type of profile information for methods compiled by different JIT compilers, the EM uses different PC instances.
After the PC collects a method profile, subsequent JIT compilers in the recompilation chain can reuse this profile. An execution engine is allowed to use a method profile if the configuration file indicates that this JIT can use the profile. The EM defines the JIT role, that is, configures the JIT compiler to generate or to use a specific profile, in the file include/open/em.h using the following format:
enum EM_JIT_PC_Role { EM_JIT_PROFILE_ROLE_GEN=1, EM_JIT_PROFILE_ROLE_USE=2 };
With this model, instances of the compiler work independently of each
other at run time. The JIT compiler can always use the PC handle to
access the profile data that is assigned to be collected or to be used
by this JIT compiler.
The profile collector does not trigger method recompilation. Instead,
the PC notifies the execution manager that a method profile is ready
according to a configuration passed from the EM during profile
collector initialization. After that, the EM initiates recompilation
of the method, if necessary.
To check readiness of a method profile and to recompile hot methods, the execution manager requires a special thread created by the VM core. This thread must be an ordinary Java* thread, because method compilation may result in execution of JIT-compiled code during class resolution or side-effect analysis.
The execution manager uses the recompilation thread created by the VM after loading all core classes and before executing the main method. The EM configures this thread to call back in a specified period of time. During this callback, the EM can check profiles and run method recompilation as required.
The execution manager interacts with the virtual machine and JIT compilers by using specific interfaces. In addition to these external interfaces, the execution manager uses the internal interface to communicate with profile collectors.
The execution manager exports this interface to provide the VM with method compilation and execution functions. The virtual machine sends requests to the EM to execute a method. For that, the VM passes the method handle and parameters to the execution manager. The EM selects the JIT for compiling the method and runs method compilation and execution.
The execution manager exports this interface to enable JIT compilers to access method profiles. Via this interface, the JIT can gain access to a method profile or to the instance of the profile collector assigned to this JIT during initialization. The major part of EM_JIT is the profile access interface. By using this interface, the JIT compiler can access a custom profiler interface specific to a profile collector family and then interact directly with a specific profile collector.
The internal EM interfaces handle interaction between the execution manager and the profile collector. Via the time-based sampling support interface (TBS), the EM registers time-based sampling callbacks and configures thresholds of method profiles. The profile collector checks method profiles using this interface or the internal sampling method if the thresholds are reached. When the method profile is ready, the PC reports to the execution manager. The profile collector communicates with the EM via the profile-related events interface.
The garbage collector (GC) component is responsible for allocation and reclamation of Java* objects in the heap. Using tracing techniques to identify unreachable objects, the garbage collector automatically reclaims objects that cannot be reached and thus cannot influence program behavior. The VM can allocate new objects in the space recycled by the GC.
This component interacts with the VM core, the JIT compiler and the interpreter, the thread management functionality, and JIT-compiled code. The GC contacts the VM core to access data on the internal structure of objects, and uses several assumptions about data layout (see section 6.4 Data Layout Assumptions).
When the heap memory is exhausted, the garbage collector instructs the VM core to safely suspend all managed threads, determines the set of root references [13], performs the actual collection, and then resumes the threads.
Note
The root set is the set of all pointers to Java* objects and arrays that are on the Java* thread stacks and in static variables. These pointers are called root references. See [13] for a detailed description of fundamentals of tracing garbage collection.
The garbage collector relies on the VM core to enumerate the root set. The VM core enumerates the global and thread-local references in the run-time data structures. The VM delegates the enumeration of the stack further to the execution engine, the JIT compiler or the interpreter. The GC then determines the set of reachable objects by tracing the reference graph.
To improve efficiency of heap tracing and object allocation, space in the VTable is reserved to cache frequently accessed GC information. In the current design, the GC caches the object field layout as the list of offsets for the fields of reference types in the VTable.
For details on garbage collection, see section 6.3 GC Procedure.
The garbage collector divides the managed heap into 128-KB blocks and stores the mark tables in the first 4 kilobytes of each block. The mark table contains one bit for each possible object location in the block. Because objects are aligned at a 4-byte boundary, any 4-byte aligned address could be the start of an object; that is why a mark bit is allocated for each 4-byte aligned location in the block.
At the start of garbage collection, the mark tables
are filled with zeroes. During heap trace, the garbage collector identifies
all live objects and inserts value 1
in the mark tables
for these objects only. The mark bit location is computed from the object
address by subtracting the block start address and shifting to the right by 2 bits.
Later, the garbage collector uses the mark
table to reclaim space taken up by unreachable objects. Other GC data,
such as free lists, is also stored in the first page of the block. See
the definition of the block_info
structure in gc/src/gc_header.h
for more details.
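The mark-bit computation described above can be expressed directly; this sketch only restates the arithmetic from the text:

```c
#include <assert.h>
#include <stdint.h>

/* Mark-bit addressing as described above: objects are 4-byte aligned inside a
 * 128-KB block, so (object_addr - block_start) >> 2 indexes the mark table. */
#define GC_BLOCK_SIZE (128 * 1024)

static uint32_t mark_bit_index(uintptr_t block_start, uintptr_t obj_addr) {
    return (uint32_t)((obj_addr - block_start) >> 2);
}
```

Note the sizes are consistent: a 128-KB block has 128 * 1024 / 4 = 32768 possible 4-byte aligned object starts, exactly the 32768 bits held in the 4-KB mark table at the start of the block.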
Objects larger than 62 KB use a special kind of block: single-object blocks, allocated contiguously to fit the object. The part of the last block that results from the 128-KB alignment remains unused.
The collection of blocks used for large block allocation is the large object space (LOS).
In DRLVM, the size of the garbage collected heap can vary depending on system needs. The heap has three size characteristics: the current and the maximum heap size, and the committed size.
Heap Size
This parameter represents the amount of memory the GC uses for Java* object allocation. During normal operation, the GC starts garbage collection when the amount of allocated space reaches the heap size.
You can specify the initial value of heap size using the
-Xms
command-line option.
The default initial size is 64 MB. The GC can decide to change the current size after a garbage collection under the following conditions:
If at least one condition is true, the garbage collector increases the heap to the minimum size that eliminates the condition provided that the new heap size does not exceed the maximum value. Additional space is committed immediately on heap resize to ensure that the required amount of physical memory is available.
Maximum Heap Size
The garbage
collector reserves the virtual memory corresponding to the maximum
size at the VM startup.
You can specify the maximum size using the -Xmx command-line option. The default value is 256 MB.
Committed Size
This parameter indicates the physical memory used by the heap. The committed size equals zero at startup and grows dynamically as the garbage collector allocates new blocks in the heap memory. After the committed size reaches the current heap size, the allocation mechanism triggers garbage collection. The committed size grows block by block as the GC starts using new memory blocks from the block store.
On Windows*, the function
VirtualAlloc(…, MEM_COMMIT, …)
performs the
commit operation. On Linux*, this operation is no-op
because the operating system commits the pages automatically on the
first access. The commit-as-you-go behavior ensures the smallest
possible footprint when executing small applications.
The GC interface includes an implicit agreement between the VM core and the garbage collector regarding the layout of certain data in memory. The garbage collector makes assumptions about the layout of objects, described below in terms of the ManagedObject data type, so that it can load the VTable for an object without calling VM core functions.
The garbage collector uses the obj_info field to store a forwarding pointer while performing garbage collection and for other operations. This field is also used by the synchronization subsystem. During garbage collection, the original content of the field is saved; once GC completes, the original content is restored.
The VM core calls gc_class_prepared() so that the garbage collector can obtain the information it needs from the VM core through the VM interface and can store that information in the VTable.
The VM core calls the gc_thread_init() function to allow the garbage collector to initialize per-thread space. The garbage collector typically stores a pointer to per-thread allocation areas in this space.
On multiprocessor machines, the GC can take advantage of multiple processors by parallelizing computationally intensive tasks. For that, the garbage collector uses special C worker threads. The GC creates worker threads during its initialization sequence, one worker thread for each available processor.
Most of the time, worker threads sleep, waiting for a task. The GC controlling thread assigns tasks to worker threads by sending an event. Once the task is received, the worker threads start their activity. While workers are busy, the GC controlling thread waits for the completion of the task. When all worker threads complete the assigned task, the controlling thread resumes execution and the worker threads are suspended, waiting for a new task.
In the current implementation, the garbage collector uses the mark-sweep-compact stop-the-world collection mechanism [18], [19]. The sections below provide details on garbage collection in DRLVM at each stage of the process.
Triggering garbage collection
Garbage collection can be triggered by exhaustion of memory
in the current heap or forced by the
System.gc()
method.
Obtaining the GC lock
Each user thread that has determined that a collection needs to be performed tries to obtain the GC lock by calling the vm_gc_lock_enum() function. Only one thread succeeds and becomes the GC controlling thread. The other threads remain blocked at the vm_gc_lock_enum() call, that is, remain suspended in gc_alloc() until the collection completes. After a thread gets the GC lock, it checks whether another thread has already performed the collection.
Enumerating the root set
After the controlling thread gets the lock and checks that the collection is still necessary, the
thread resets GC global data by clearing the root set array and
reference lists. Next, the thread calls the
vm_enumerate_root_set_all_threads()
function to
make the core virtual machine enumerate the root heap pointers.
In response, the VM suspends all threads and enumerates the root
set by using the functions gc_add_root_set_entry()
,
gc_add_compressed_root_set_entry()
, and others. The
garbage collector records the root pointers in the root set
array as the enumeration proceeds.
Verification trace
If the collector is built with debugging code enabled, a verification trace is performed after the full root set enumeration. This operation verifies that the root set and heap are in a consistent state, that is, that all pointers into the Java* heap point to valid Java* objects and no dangling pointers exist.
The garbage collector marks traced objects using the object
header bit instead of regular mark tables in order to prevent
interference with regular GC operation. The verification trace
procedure uses the eighth bit
(0x80)
of the object lockword
(obj_info
word) for
marking. At the end of the verification trace, the GC prints the
number of objects found to be strongly reachable during the
trace operation.
During the mark-scan stage, the GC identifies all objects reachable from the root set. The GC maintains the mark stack, a stack of objects reached during tracing but not yet scanned. The GC repeatedly removes an object from the mark stack and scans all reference fields of that object. For each reference field, the GC tries to mark the object that the field points to and, if the mark operation succeeds, adds the object pointer to the mark stack. The GC continues scanning objects from the mark stack until it is empty.
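The mark-stack loop described above can be sketched as follows. This is a single-threaded toy version; the real collector marks objects via atomic operations on the mark table, and the structure below is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Toy mark-scan sketch (illustrative): pop an object, try to mark each child,
 * and push children whose mark succeeded, until the mark stack drains. */
#define MAX_CHILDREN 2
#define MAX_STACK 64

struct Obj { int marked; struct Obj *child[MAX_CHILDREN]; };

static void mark_scan(struct Obj *root) {
    struct Obj *stack[MAX_STACK];
    int top = 0;
    if (root && !root->marked) { root->marked = 1; stack[top++] = root; }
    while (top > 0) {
        struct Obj *o = stack[--top];           /* scan one reached object */
        for (int i = 0; i < MAX_CHILDREN; i++) {
            struct Obj *c = o->child[i];
            if (c && !c->marked) {              /* the mark succeeds only once */
                c->marked = 1;
                stack[top++] = c;
            }
        }
    }
}
```

Because an object is pushed only when its mark transitions from clear to set, each object is scanned exactly once, so cycles in the reference graph terminate naturally.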
Mark scan activity runs in parallel on worker threads, one thread per processor. Each worker thread atomically grabs one root from the root set array and traces all objects reachable from that root by using its own private mark stack. Worker threads mark objects by using the atomic compare and swap operation on the corresponding byte in the mark table. The mark operation failure indicates that another worker thread has marked the object earlier, and is responsible for scanning it.
By the end of the mark-scan operation, all strongly reachable objects are marked, and the garbage collector deals with collected reference objects and weak roots. Complying with the Java* API specification [6], the GC maintains the strength of weak references by going through soft, weak, and then phantom reference objects in this specific order.
Note
The current implementation treats soft references as weak references, and clears soft referents when they become softly reachable.
To ensure that finalizable objects are not reclaimed before the
finalize()
method is run, the GC uses the
finalizable queue as an additional root set and restarts the
mark scan process to transitively mark objects reachable from
finalizable objects. This increases the number of live objects
during collection. For details on handling finalizable objects
in the DRL virtual machine, see section 2.7 Finalization.
Note
According to the JVM specification [1], objects are only included into the finalizable queue during object allocation and not after they are scheduled for finalization.
The finalize()
method is not run during garbage
collection. Instead, the GC puts finalizable objects to a separate
queue for later finalization.
Reclamation of unmarked objects
Once all the references and finalizable objects have been visited, the GC has a list of all reachable objects. In other words, any object that remains unmarked at this stage can safely be reclaimed.
Reclamation of unmarked objects can be performed in two ways: compaction and sweep. Compaction is preferable because it improves object locality and keeps fragmentation at a low level. However, compaction does not cover objects declared pinned during enumeration, nor large objects, because of the high cost of copying.
Compaction is performed block-wise in several stages. All blocks that contain pinned objects and blocks with large objects are excluded; see the files include/open/gc.h and include/open/vm_gc.h for more information. During compaction, the GC records the new location of each live object in its obj_info word; the original contents of obj_info are saved. The GC sets the new locations by sliding live objects to lower addresses in order to maintain allocation ordering.
Note
The pointer to the new object location, which is written in the old object copy, is the forwarding pointer.
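The sliding of live objects toward lower addresses can be illustrated with a toy model, where slot indices stand in for addresses and the forwarding pointer is recorded as a new slot index. The names and layout are hypothetical:

```c
#include <assert.h>

/* Toy sliding-compaction sketch (illustrative): live objects get new addresses
 * assigned in order toward lower slots; the new index plays the role of the
 * forwarding pointer, through which references are later fixed up. */
#define HEAP 8
static int live[HEAP];      /* 1 = marked live */
static int forward[HEAP];   /* new slot index for each live object */

static int assign_new_locations(void) {
    int next = 0;                       /* slide toward lower addresses */
    for (int i = 0; i < HEAP; i++)
        if (live[i])
            forward[i] = next++;        /* allocation order is preserved */
    return next;                        /* new top of the compacted area */
}
```

Because the new locations are assigned in a single ascending scan, live objects keep their relative order, which is the allocation-ordering property the text mentions.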
Sweep is performed on blocks that were not compacted and on objects in the large object space. The GC scans the array of mark bits to find sufficiently long zero sequences. The garbage collector ignores zero sequences that represent less than 2 KB of the Java* heap. Longer free areas are linked to the free list structure and used for object allocation. The GC does not perform the sweep operation during the stop-the-world pause. Instead, the GC sweeps a block just before using it for allocating new objects.
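The zero-run scan described above can be sketched as follows. This is a toy model in which each mark bit is stored as one byte for simplicity; the 2-KB threshold and the 4 bytes covered per bit come from the text, the rest is hypothetical:

```c
#include <assert.h>

/* Toy sweep sketch (illustrative): scan mark bits for runs of zeroes; each bit
 * covers 4 bytes of heap, so a run is reclaimed only if it spans at least 2 KB. */
#define BYTES_PER_BIT 4
#define MIN_FREE_BYTES 2048

/* Returns the number of free areas big enough to link into the free list. */
static int count_free_areas(const unsigned char *bits, int nbits) {
    int areas = 0, run = 0;
    for (int i = 0; i <= nbits; i++) {
        if (i < nbits && bits[i] == 0) {
            run++;                      /* extend the current zero run */
        } else {
            if (run * BYTES_PER_BIT >= MIN_FREE_BYTES)
                areas++;                /* long enough: a usable free area */
            run = 0;                    /* a marked bit (or the end) breaks the run */
        }
    }
    return areas;
}
```

Short zero runs are skipped because linking tiny fragments into the free list would cost more than the space they return.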
Final stage of garbage collection
At this stage, the object reclamation is complete. In the
debugging mode, the GC performs the second verification trace
and can optionally print the number of live objects. The GC
provides the virtual machine with the list of objects scheduled
for the finalize()
method run and references
scheduled for an enqueue()
method run. Finally, the
GC commands the VM to resume the user threads and finishes the
collection.
Iteration
If garbage collection was triggered by allocation, the garbage collector retries the allocation. If the allocation fails a second time, the garbage collector repeats the GC procedure. If the allocation still fails after two iterations, an OutOfMemoryError condition is reported. For details on allocation techniques, see section 6.3 Object Allocation.
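The retry logic above can be sketched as a small loop. The two callbacks stand in for the real allocator and collector and are assumptions for illustration only:

```cpp
#include <cstddef>
#include <functional>

// Sketch of the allocation retry loop: try to allocate, run a collection
// on failure, retry, and give up after the second failed GC iteration.
// A nullptr result maps to an OutOfMemoryError in the caller.
void* alloc_with_retry(const std::function<void*()>& try_alloc,
                       const std::function<void()>& collect) {
    void* p = try_alloc();
    for (int gc_round = 0; p == nullptr && gc_round < 2; ++gc_round) {
        collect();         // stop-the-world garbage collection
        p = try_alloc();   // retry after reclaiming space
    }
    return p;              // nullptr here means OutOfMemoryError
}
```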
The DRL garbage collector provides several object allocation mechanisms to improve performance. Below is the description of ordinary and optimized object allocation procedures.
The gc_alloc()
and gc_alloc_fast()
functions
are the key functions in object allocation. The
gc_alloc()
function may trigger garbage collection to
satisfy allocation, and the gc_alloc_fast()
function is
not allowed to do so. The garbage collector also provides other
functions that handle specific types of object allocation or optimize
the allocation procedure, as follows:
The gc_alloc() function follows the ordinary allocation path with no optimizations. This function returns the pointer to the allocated object, or NULL if the object cannot be allocated even after garbage collection. The caller of the gc_alloc() function throws an OutOfMemoryError condition in case of a NULL return value. The potential necessity to enumerate the root set or to throw an exception requires the VM to push an M2nFrame before calling gc_alloc().
The gc_alloc_fast() function is called when the expensive overhead of an M2nFrame is not required. This is the common case because typically a request for new space is granted. Only if the function returns NULL does the VM push the M2nFrame and call the slower gc_alloc().
The gc_supports_frontier_allocation() function enables frontier allocation, the fastest allocation method. The in-lined allocation fast path optimizes allocation but, unlike gc_alloc() and gc_alloc_fast(), cannot handle finalizable objects correctly. A special trick with overloading the object size must be used.
Note
Currently, Jitrino does not use frontier allocation.
The gc_alloc_pinned() function allocates permanently pinned objects. The gc_pinned_malloc_noclass() function provides pinned allocation services at the VM initialization stage.
Note
The VM does not currently use the gc_alloc_pinned() and gc_pinned_malloc_noclass() functions.
Allocation Procedure
The gc_alloc()
and gc_alloc_fast()
functions
do the following in the specified order:
Note
An allocation area is a contiguous section of free space suitable for bump (frontier) allocation. The GC recreates allocation areas in the blocks after each garbage collection. Allocation areas range from 2 KB to 124 KB in size. Areas of size less than 2 KB are ignored to prevent degradation of allocation performance. The maximum allocation area size is determined by the block size.
Note
A chunk is a linked list of blocks. After the garbage collection all memory chunks are freed. To avoid race conditions between multiple Java* threads, chunks are removed from the chunk queue atomically. These chunks can then be used by the owning thread without further synchronization.
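The atomic removal of chunks from the shared queue can be sketched with a compare-and-swap loop. The Chunk layout is an assumption, and a production version would also have to guard against the ABA problem, which this sketch ignores:

```cpp
#include <atomic>

// Sketch of atomic chunk removal from the master chunk list: a chunk is
// claimed with compare-and-swap so that concurrent Java threads never
// take the same chunk. Once popped, the chunk belongs to the calling
// thread and needs no further synchronization.
struct Chunk {
    Chunk* next;   // link in the master chunk list
};

Chunk* pop_chunk(std::atomic<Chunk*>& head) {
    Chunk* old = head.load();
    while (old != nullptr &&
           !head.compare_exchange_weak(old, old->next)) {
        // on failure, compare_exchange_weak refreshed 'old'; retry
    }
    return old;    // nullptr when the list is empty
}
```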
The gc_alloc() and gc_alloc_fast() functions follow different procedures:
The gc_alloc() function attempts to grab new chunks from the master chunk list. If this fails, the thread tries to take the GC lock, collect garbage, and restart the allocation procedure. After two garbage collections fail, the gc_alloc() function returns NULL, which produces an out-of-memory error.
The gc_alloc_fast() function returns NULL immediately after allocation from the current chunk fails. The function never tries to use another memory chunk.
To start garbage collection from gc_alloc(), the VM needs to enumerate the root set for the thread that called gc_alloc(), and this requires pushing an M2nFrame on the stack. Because few allocations fail and trigger garbage collection, the effort of pushing an M2nFrame is often wasted. To avoid this, the thread invokes the gc_alloc_fast() function, which runs without an M2nFrame and therefore cannot start garbage collection.
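The fast/slow split described above can be sketched as a wrapper. The callbacks are stand-ins for the real DRLVM entry points and frame-manipulation helpers, not their actual signatures:

```cpp
#include <cstddef>
#include <functional>

// Sketch of the fallback: call gc_alloc_fast() with no M2nFrame first,
// and only when that fails pay for pushing the frame and call the
// slower gc_alloc(), which may trigger a collection.
void* allocate_object(const std::function<void*()>& gc_alloc_fast,
                      const std::function<void*()>& gc_alloc,
                      const std::function<void()>& push_m2n_frame,
                      const std::function<void()>& pop_m2n_frame) {
    void* obj = gc_alloc_fast();   // common case: no frame needed
    if (obj == nullptr) {
        push_m2n_frame();          // enables root-set enumeration
        obj = gc_alloc();          // may stop the world and collect
        pop_m2n_frame();
    }
    return obj;                    // NULL here means OutOfMemoryError
}
```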
This section lists the interfaces the garbage collector exports for communication with other components.
The garbage collector provides a number of interface functions to support DRLVM activity at various stages, as follows:
Handshaking at DRLVM startup
The GC exposes several groups of functions that support different startup operations, as described below.
gc_requires_barriers()
to determine whether the
garbage collector requires write barriers. The current GC
implementation does not require write barriers.
Write barriers can be used by generational, incremental, and concurrent garbage collection techniques to track the root sets of portions of the heap that are targeted for collection. If the garbage collector requires write barriers, the JIT generates calls to the GC function gc_write_barrier() after references are stored into an object field [13].
gc_supports_compressed_references() to find out whether the GC compresses reference fields and VTable pointers. If the configuration of the VM core or the JIT differs from the GC configuration, for example, the GC compresses references while the VM and the JIT do not, an error message is printed and the VM process is terminated. For details on compression techniques, see section 2.4.2 Compressed References.
gc_supports_frontier_allocation() function to find out whether the GC supports fast thread-local allocation. Fast thread-local allocation, also known as bump allocation or frontier allocation, increments the current pointer and compares it to the ceiling pointer. If the GC supports fast allocation, the function returns the offsets of the current and ceiling pointers in GC thread-local data.
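The frontier fast path just described is only a few instructions. This minimal sketch omits alignment and the fall-back to the slow path; the Frontier field names mirror the description above:

```cpp
#include <cstddef>

// Minimal sketch of frontier (bump) allocation: the fast path checks the
// requested size against the space left, returns the old frontier as the
// new object, and bumps the current pointer.
struct Frontier {
    char* current;   // next free byte in the thread-local area
    char* ceiling;   // end of the thread-local area
};

void* frontier_alloc(Frontier* f, size_t size) {
    if ((size_t)(f->ceiling - f->current) < size)
        return nullptr;       // area exhausted: take the slow path
    void* obj = f->current;   // object starts at the old frontier
    f->current += size;       // bump the frontier
    return obj;
}
```

Because only the offsets of current and ceiling are needed, a JIT can emit this sequence inline without calling into the GC at all.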
Initializing the Garbage Collector
The VM core calls the gc_init()
function to initialize
the garbage collector. In this function, the GC does the following:
vm_get_property() function to query configuration options that may have been specified on the command line. For example, the heap size parameters are passed to the GC as values of the gc.ms and gc.mx properties for the initial and maximum heap size, respectively.
After these actions, the garbage collector returns from function
gc_init()
, and the VM initialization sequence continues.
After the call to gc_init()
, the VM starts allocating
Java* objects, but garbage collection is disabled
until the end of the VM initialization procedure.
Finally, the virtual machine calls the
gc_vm_initialized()
function to inform the garbage
collector that the VM core is sufficiently initialized to enumerate
roots, and that garbage collection is permitted. The VM can allocate a moderate number of objects between the calls to gc_init() and gc_vm_initialized().
Class Loading and Creating a Thread
The VM core calls the functions gc_class_prepared()
and
gc_thread_init(). The function gc_thread_init() assigns a private allocation area to a thread. The function gc_class_prepared() caches the field layout information in the VTable structure.
Managed Object Allocation
To allocate a new object, JIT-compiled code or native methods call
the gc_alloc()
or gc_alloc_fast()
functions
in the GC interface. If the heap space is exhausted, the garbage
collector stops all managed threads and performs the garbage
collection. To allocate an object, the GC can use one of several
available functions, as described in section 6.3 Object Allocation.
Root Set Enumeration
The gc_add_root_set_entry()
function and several similar
functions support this operation on the GC side.
Virtual Machine Shutdown
The VM core calls the gc_wrapup()
function to tell the GC
that the managed heap is no longer required. In response, the GC frees
the heap and other auxiliary data structures.
Other interface groups include functions for forcing garbage collection and querying information about available memory and details of the garbage collection operation.
The VM implements several groups of functions to support GC operation, as described below. The functions of this interface are grouped by the period of operation during which they are activated.
Collection Time
The VM core exposes the following functions to support garbage collection:
vm_gc_lock_enum() and vm_gc_unlock_enum() provide global GC locking services. The global GC lock determines which thread runs the garbage collection. The VM also protects thread structure modifications by using this lock, because threads blocked on the global GC lock are considered safely suspended for the purpose of garbage collection.
vm_enumerate_root_set_all_threads()
suspends all
Java* threads and enumerates their root set, which
is used at the beginning of the stop-the-world phase of garbage
collection.
vm_resume_threads_after()
performs the opposite
operation: resumes the suspended threads and ends the
stop-the-world phase.
vm_classloader_iterate_objects()
notifies the VM that
the GC is ready to iterate the live objects during the
stop-the-world phase.
vm_hint_finalize() informs the VM core that objects awaiting finalization may have been found during the garbage collection. The vm_hint_finalize() function wakes up finalizer threads. The GC can also call this function during normal operation in case the number of finalizable objects is rapidly growing. The finalization subsystem performs load balancing by increasing the number of finalizer threads when the size of the finalization queue exceeds a pre-set threshold. The VM provides the is_it_finalize_thread() function to let the GC differentiate its service for different threads.
get_global_safepoint_status()
indicates whether
garbage collection is in progress.
Finalization and Weak Reference Handling
The GC handles weak reference objects differently from regular
objects. When preparing GC data about a class in
gc_class_prepared()
, the GC uses the function
class_is_reference()
to find out whether the objects of
this class require special handling. The GC calls
class_get_referent_offset()
to get the offset of the
referent field with weak reference properties.
During the garbage collection, the GC finds the objects that need to
be finalized and resets weak references. The GC does not execute
Java* code, but transfers the set of finalizable
objects and reference objects to the VM by using the functions
vm_finalize_object()
and
vm_enqueue_reference()
.
Handshaking at VM Startup
At this stage, the GC uses the following functions exposed by the VM core:
vm_number_of_gc_bytes_in_vtable()
returning the number
of bytes that the VM reserves for GC use at the beginning of each
VTable structure.
vm_number_of_gc_bytes_in_thread_local() returning the number of bytes that the VM reserves in the thread structure; the GC accesses this reserved area via vm_get_gc_thread_local().
The interpreter component executes Java* bytecode and is used in the VM interchangeably with the JIT compiler. The interpreter does the following:
The interpreter supports the following platforms: Windows* / IA-32, Linux* / IA-32, Linux* / Itanium® processor family and Linux* / Intel® EM64T.
Note
The DRL interpreter works on Intel® EM64T, but no JIT compiler is currently available for this platform.
Currently, the interpreter is closely tied to the VM core, see section 7.2.2 for details.
The interpreter differs from the JIT compiler. The JIT translates bytecode into native code and executes the produced native code. The interpreter reads the original bytecode and executes a short sequence of corresponding C/C++ code. Interpretation is simpler, but substantially slower than executing JIT-compiled code.
The interpreter is a C/C++ module and its calling conventions correspond to the JNI specification conventions.
Note
In native code, it is impossible to call a function with the number of arguments unknown at compile time. That is why, arguments of Java* methods are passed as pointers to arrays of arguments.
The interpreter has its own Java* stack frame format. Each Java* frame contains the following fields:
The Java* frame is allocated on the C stack by using
the alloca()
function.
The interpreter consists of the following major components:
This section lists the file groups responsible for major functions of
the interpreter located in the interpreter
folder.
src – Interpreter source files location:
interpreter.cpp – Major interpreter functionality, bytecode handlers.
interpreter_ti.cpp – Support for JVMTI in the VM.
interp_stack_trace.cpp – Stack trace retrieval and thread root set enumeration for the GC.
interp_vm_helpers.cpp – Wrappers around VM core functions that enable the interpreter to use them.
invokeJNI_*.asm – Platform-specific execution of JNI methods: the native function call is constructed from the JNI function pointer and an array of arguments.
interp_native_*.cpp – Platform-specific execution of JNI methods: definition of the InvokeJNI() function for the Windows* / IA-32 platform.
interp_exports.cpp – Exports the interpreter interface functions to the virtual machine via the function table.
The interpreter has a tightly bound and complicated interface for communication with the VM core. The interpreter is dynamically linked with the VM core and uses its internal interfaces. At the same time, the interpreter exports its enumeration, stack trace generation and JVMTI support functions via a single method table. These make up the Interpreter interface.
VM support for execution of JNI methods relies on stub generation. This does not suit the interpreter because dynamic code in the stubs is hardly fit for debugging. In the current implementation, the interpreter is mainly aimed at VM code debugging and uses its own code for handling JNI methods.
The DRL interpreter provides functions for each type of JNI function: static, virtual, executed from an interpreted frame or from interpreter invocation code. The interpreter executes a JNI method by finding its native code address with a find_native_method() call and then passing control to the invokeJNI stub, which converts the native code address and the array or arrays of arguments into a function call.
The DRL interpreter assists the virtual machine in supporting JVMTI. The interpreter enables stack walking, stack frame examination, method entry and exit events, breakpoints, single step and PopFrame functions.
This component provides unified interfaces to low-level system routines across different platforms. The porting layer mainly covers the following functional areas:
Note
For most components, high-level threading provided by the thread management interface suffices.
To maximize benefits of the porting layer, other components interact with the underlying operating system and hardware via this component. Currently, most DRLVM code uses the Apache Portable Runtime library as a base porting library, though certain parts are still not completely ported to APR, and access the operating system directly. The DRL porting library also includes about 20 additional functions and macros, designed as potential extensions to APR. These additions mostly relate to querying system information and virtual memory management.
The component manager is a subcomponent of the porting layer responsible for loading and subsequent initialization of VM components.
During the loading stage, the component manager queries the default interface from each loading component, and then makes this information available at the initialization stage via interface queries. The component manager also enables instance creation for interfaces. Currently, only the execution manager uses the component manager loading scheme.
The porting library is statically linked to the VM core component and exports its interfaces through this component.
Note
This implies that APR objects are compiled as exported but packaged as a static library (not linked as a self-contained dynamic shared library).
Other components may directly include porting library headers (APR or additional ones), and dynamically link with the VM core.
The class libraries complement the DRLVM to provide a full J2SE*-compliant run-time environment. The class libraries contain all classes defined by J2SE* specification [6] except for the set of kernel classes.
The DRL class libraries satisfy the following requirements:
Note
DRLVM does not require the full J2SE* API set in order to be functional.
At startup, DRLVM preloads approximately 20 classes, including the
kernel classes. The minimal subset for the VM startup is defined by
the dependencies of the preloaded classes, which can vary in different
implementations. You can get the exact list from the DRLVM sources, mainly from the vmcore\src\init\vm_init.cpp file.
The class libraries interact with the VM through the following interfaces:
Note
The current implementation of VM accessors is built on top of JNI. Future implementations may utilize the VM-specific Fast (Raw) Native Interface or an intrinsic mechanism to achieve better performance.
In DRL, the class libraries are packaged according to the following structure on Windows* and Linux*:
ij - java.home
+-bin - java.library.path
+-lib - vm.boot.class.path
The class libraries are packaged as .jar
or
.zip
files and stored in the \lib
directory.
Each .jar
file contains the classes that belong to a
specific functional area [9]. By default, the
VM boot class path points to the location of .jar
and
.zip
archives, as listed above. You can set an alternate
location of the boot class path on the command line by using the
-Xbootclasspath
command-line option.
Native libraries used by the class libraries, .dll files on Windows* and .so files on Linux*, are placed in the \bin directory. You can
set an alternate location for the native libraries on the command line
by using the java.library.path
property.
The class libraries would typically use the java.home
property in order to determine the location of the necessary
resources, for example, the location of the
java.security.policy
file. By default, the
java.home
property is initialized to the parent directory
of the ij
executable, which is the \ij
directory, as shown above. You can set an alternate value for the
java.home
property on the command line.
In DRLVM, safety requirements and dynamic class loading affect the applicability and effectiveness of traditional compiler optimizations, such as null-check or array-bounds-check elimination. To improve performance, DRLVM applies inter-component optimizations to reduce or eliminate these safety overheads and to ensure effective operation in the presence of dynamic loading.
Inter-component optimizations include various optimization techniques supported by more than one component in DRLVM, as described in the subsequent sections.
Java* programs widely use inheritance. The VM needs
to check whether an object is an instance of a specific super type
thousands of times per second. These type tests are the result of
explicit checks in application code (for example, the Java* checkcast
bytecode), as well as implicit
checks during array stores (for example, Java*
aastore
bytecode). The array store checks verify that the
types of objects being stored into arrays are compatible with the
element types of the arrays. Although functions
checkcast()
, instanceof()
, and
aastore()
take up at most a couple of percent of the
execution time for Java* benchmarks, that is enough
to justify some degree of in-lining. The VM core provides the VM_JIT
interface to allow JIT compilers to
perform a faster, in-lined type check under certain commonly used
conditions.
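One commonly used fast-path technique of the kind the VM_JIT interface enables is a "display" check: each class stores its superclass chain at fixed depths, so a subtype test against a supertype whose depth is known at JIT compile time becomes one load and one compare. The Class layout below is an assumption for illustration, not the DRLVM one; display entries beyond a class's own depth are null:

```cpp
#include <cstddef>

// Sketch of a constant-time, in-line-able subtype test. display[d] holds
// the class's superclass at depth d, with display[depth] == the class
// itself, and null entries past the class's own depth.
const size_t kMaxDepth = 8;

struct Class {
    size_t depth;                     // distance from java.lang.Object
    const Class* display[kMaxDepth];  // superclass chain by depth
};

bool is_subtype_fast(const Class* obj_class, const Class* super) {
    return super->depth < kMaxDepth &&
           obj_class->display[super->depth] == super;
}
```

A JIT can emit this pair of instructions inline for checkcast and instanceof when the supertype is a class with a statically known depth, falling back to a slower VM call otherwise.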
In DRLVM, Java* virtual functions are called
indirectly by using a pointer from a VTable even when the target
method is precisely known. This is done because a method may not have
been compiled yet, or it may be recompiled in the future. By using an
indirect call, the JIT-compiled code for a method can easily be
changed after the method is first compiled, or after it is
recompiled.
Because indirect calls may require additional instructions (at least
on the Itanium® processor family), and may put additional pressure
on the branch predictor, converting them into direct calls is
important. For direct-call conversion, the VM core includes a callback
mechanism to enable the JIT compiler to patch direct calls when the
targets change due to compilation or recompilation. When the JIT
produces a direct call to a method, it calls a function to inform the
VM core. If the target method is compiled, the VM core calls back into
the JIT to patch and redirect the call.
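The bookkeeping behind this callback mechanism can be sketched as a per-method registry of call sites. The CallSite representation is an illustrative assumption; a real JIT patches the call instruction bytes rather than a struct field:

```cpp
#include <vector>

// Sketch of direct-call patching: the JIT registers each direct call
// site for a method with the VM, and when the method is (re)compiled
// the VM calls back to redirect every registered site to the new code.
typedef void (*CodeAddr)();

struct CallSite {
    CodeAddr target;   // address the call currently jumps to
};

struct MethodCallSites {
    std::vector<CallSite*> sites;   // all direct calls to one method

    void register_site(CallSite* s) { sites.push_back(s); }

    // Invoked by the VM core when the method's code address changes.
    void patch_all(CodeAddr new_code) {
        for (CallSite* s : sites)
            s->target = new_code;
    }
};
```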
Constant-string instantiation is common in Java* applications, and DRLVM loads constant strings at run time in a single load, as it does with static fields. To use this optimization,
Jitrino calls the class loader interface function
class_get_const_string_intern_addr()
at compile time.
This function interns the string and returns the address of a location
pointing to the interned string. Note that the VM core reports this
location as part of the root set during garbage collection.
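The behavior of class_get_const_string_intern_addr() can be sketched with a small intern table. std::string stands in for the Java string object and the table layout is an assumption; the key property is that the returned slot address is stable, so JIT-compiled code can load the string with a single indirection:

```cpp
#include <map>
#include <string>

// Sketch of string interning with stable slot addresses: each literal is
// interned once, and the caller receives the address of a location that
// always points to the interned string (a GC root in the real VM).
struct InternTable {
    std::map<std::string, const std::string*> slots;

    const std::string** intern_addr(const std::string& literal) {
        auto it = slots.find(literal);
        if (it == slots.end())
            // Interned strings live for the VM lifetime (never freed).
            it = slots.insert({literal, new std::string(literal)}).first;
        return &it->second;   // stable: std::map never moves its nodes
    }
};
```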
Because string objects are created at compile time regardless of the control paths actually executed, applying the optimization blindly to all JIT-compiled code might result in the allocation of a significant number of unnecessary string objects. To avoid this, the heuristic of not using fast strings in exception handlers is applied.
Certain applications make extensive use of exceptions for control flow. Often, however, the exception object is not used in the exception handler. In such cases, the time spent on creating the exception object and creating and recording the stack trace in the exception object is wasted. The lazy exceptions optimization enables the JIT compiler and the VM core to cooperate on eliminating the creation of exception objects with an ordinary constructor in case these objects are not used later on.
To implement lazy exceptions, the JIT compiler finds the exception
objects that are used only in the throw
statements in the
compiled method. The JIT compiler analyzes the constructors of these
objects for possible side effects. If the constructor has no side
effects, the JIT removes the exception object construction
instructions and substitutes a throw
statement with a
call to a run-time function that performs the lazy exception throwing
operation. During execution of the new function, the VM core unwinds
the stack to find the matching handler, and does one of the following
depending on the exception object state:
The lazy exceptions technique significantly improves performance. For more information on exceptions in DRLVM, see section 3.9 Exception Handling.
This section lists the external references to various sources used in DRLVM documentation, and to standards applied to DRLVM implementation.
[1] Java* Virtual Machine Specification, http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html
[2] Java* Language Specification, Third Edition, http://java.sun.com/docs/books/jls/
[3] JIT Compiler Interface Specification, Sun Microsystems, http://java.sun.com/docs/jit_interface.html
[4] JVM Tool Interface Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html
[5] Java* Native Interface Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/jniTOC.html
[6] Java* API Specification, http://java.sun.com/j2se/1.5.0/docs/api
[7] Java* Invocation API Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/invocation.html
[8] Creating a Debugging and Profiling Agent with JVMTI tutorial, http://java.sun.com/developer/technicalArticles/Programming/jvmti/index.html
[9] Apache Harmony project, http://incubator.apache.org/harmony/.
[10] IA-32 Intel Architecture Software Developer's Manual, Intel Corp., http://www.intel.com/design
[11] Ali-Reza Adl-Tabatabai, Jay Bharadwaj, Michal Cierniak, Marsha Eng, Jesse Fang, Brian T. Lewis, Brian R. Murphy, and James M. Stichnoth, Improving 64-Bit Java* IPF Performance by Compressing Heap References, Proceedings of the International Symposium on Code Generation and Optimization (CGO’04), 2004, http://www.cgo.org/cgo2004/
[12] Stichnoth, J.M., Lueh, G.-Y. and Cierniak, M., Support for Garbage Collection at Every Instruction in a Java* Compiler, ACM Conference on Programming Language Design and Implementation, Atlanta, Georgia, 1999, http://www.cs.rutgers.edu/pldi99/
[13] Wilson, P.R., Uniprocessor Garbage Collection Techniques, in revision (accepted for ACM Computing Surveys). ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps
[14] Apache Portable Runtime library, http://apr.apache.org/
[15] S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, San Francisco, CA, 1997.
[16] P. Briggs, K.D., Cooper and L.T. Simpson, Value Numbering. Software-Practice and Experience, vol. 27(6), June 1997, http://www.informatik.uni-trier.de/~ley/db/journals/spe/spe27.html
[17] R. Bodik, R. Gupta, and V. Sarkar, ABCD: Eliminating Array-Bounds Checks on Demand, in proceedings of the SIGPLAN ’00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000, http://research.microsoft.com/~larus/pldi2000/pldi2000.htm
[18] Paul R. Wilson, Uniprocessor garbage collection techniques, Yves Bekkers and Jacques Cohen (eds.), Memory Management - International Workshop IWMM 92, St. Malo, France, September 1992, proceedings published as Springer-Verlag Lecture Notes in Computer Science no. 637.
[19] Bill Venners, Inside Java 2 Virtual Machine, http://www.artima.com/insidejvm/ed2/
[20] Harmony Class Library Porting Documentation, http://svn.apache.org/viewcvs.cgi/*checkout*/incubator/harmony/enhanced/classlib/trunk/doc/vm_doc/html/index.html?content-type=text%2Fplain
[21]Karl Pettis, Robert C. Hansen, Profile Guided Code Positioning, http://www.informatik.uni-trier.de/~ley/db/conf/pldi/pldi90.html
(C) Copyright 2005 Intel Corporation
* Other brands and names are the property of their respective owners.