
utf.h File Reference


Detailed Description

Manipulate UTF-8 CONSTANT_Utf8_info character strings.

There are three character string types in this program: null-terminated (rchar) strings as in the C language, UTF-8 (CONSTANT_Utf8_info) strings, and Unicode (jchar)[] strings.

Control

$URL: https://svn.apache.org/path/name/utf.h $ $Id: utf.h 0 09/28/2005 dlydick $

Copyright 2005 The Apache Software Foundation or its licensors, as applicable.

Licensed under the Apache License, Version 2.0 ("the License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and limitations under the License.

Version:
$LastChangedRevision: 0 $
Date:
$LastChangedDate: 09/28/2005 $
Author:
$LastChangedBy: dlydick $ Original code contributed by Daniel Lydick on 09/28/2005.

Reference

Definition in file utf.h.

Functions

 ARCH_COPYRIGHT_APACHE (utf, h,"$URL: https://svn.apache.org/path/name/utf.h $ $Id: utf.h 0 09/28/2005 dlydick $")
jbyte utf_classname_strcmp (CONSTANT_Utf8_info *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2)
 Compare a UTF string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool.
jvm_array_dim utf_get_utf_arraydims (CONSTANT_Utf8_info *inbfr)
 Report the number of array dimensions prefixing a Java type string.
jbyte utf_pcfs_strcmp (CONSTANT_Utf8_info *s1, ClassFile *pcfs, jvm_constant_pool_index cpidx2)
 Compare contents of UTF string to contents of a UTF string from a class file structure.
jbyte utf_prchar_classname_strcmp (rchar *s1, ClassFile *pcfs, jvm_constant_pool_index cpidx2)
 Compare a null-terminated string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool.
jbyte utf_prchar_pcfs_strcmp (rchar *s1, ClassFile *pcfs, jvm_constant_pool_index cpidx2)
 Compare contents of null-terminated string to contents of a UTF string from a class file structure.
rchar * utf_utf2prchar (CONSTANT_Utf8_info *src)
 Convert a UTF string from a (CONSTANT_Utf8_info *) into a null-terminated string by allocating heap and copying the UTF data.
rchar * utf_utf2prchar_classname (CONSTANT_Utf8_info *src)
 Convert an unformatted class name UTF string (of the type ClassName and not of type [[[LClassName) from a (CONSTANT_Utf8_info *) into a null-terminated string with Java class formatting items. Result is delivered in a heap-allocated buffer. When done with result, perform HEAP_FREE_DATA(result) to return that buffer to the heap.
jshort utf_utf2unicode (CONSTANT_Utf8_info *utf_inbfr, jchar *outbfr)
 Convert UTF8 buffer into Unicode buffer.
cp_info_dup * utf_utf2utf_unformatted_classname (cp_info_dup *inbfr)
 Strip a UTF string of any class formatting it contains and return result in a heap-allocated buffer.
rboolean utf_utf_isarray (CONSTANT_Utf8_info *inbfr)
rboolean utf_utf_isclassformatted (CONSTANT_Utf8_info *src)
 Verify if a UTF string contains class formatting or not.
jbyte utf_utf_strcmp (CONSTANT_Utf8_info *s1, CONSTANT_Utf8_info *s2)
 Compare two UTF strings from constant_pool, s1 minus s2.


Function Documentation

ARCH_COPYRIGHT_APACHE (utf, h, "$URL: https://svn.apache.org/path/name/utf.h $ $Id: utf.h 0 09/28/2005 dlydick $")

jshort utf_utf2unicode (CONSTANT_Utf8_info *utf_inbfr, jchar *outbfr)

Convert UTF8 buffer into Unicode buffer.

Parameters:
[in] utf_inbfr UTF string structure
[out] outbfr Buffer for resulting Unicode character string
Returns:
Two returns, one a buffer, the other a count:
*outbfr Unicode version of utf_inbfr string in outbfr

charcnvcount (Return value of function) Number of Unicode characters in outbfr. This equals utf_inbfr->length only when ALL UTF characters are ASCII; otherwise it is less than that.

SPEC AMBIGUITY: In case of invalid characters, a Unicode '?' is inserted and processing continues. The result string will still be invalid, but at least it will be proper Unicode. This may prove more than is necessary, but the spec says nothing at all about this matter. Since the NUL character may not appear in UTF-8, a NUL found within the first utf_inbfr->length bytes is taken to terminate the buffer. If a UTF8_FORBIDDEN_xxx character is read, it is also converted to a Unicode '?'.

Byte layouts checked during conversion, as annotated on the constants referenced below:
Any byte: a zero byte looks suspiciously like ASCII NUL and ends processing.
Single byte: values up to '\u007f'.
Double byte, first byte: top 3 bits are '110'; bottom 5 bits contain data bits 10-6 and are moved up to bits 10-6.
Double byte, second byte: top 2 bits are '10'; bottom 6 bits contain data bits 5-0.
Triple byte, first byte: top 4 bits are '1110'; bottom 4 bits contain data bits 15-12 and are moved up to bits 15-12.
Triple byte, second byte: top 2 bits are '10'; bottom 6 bits contain data bits 11-6 and are moved up to bits 11-6.
Triple byte, third byte: top 2 bits are '10'; bottom 6 bits contain data bits 5-0.

Definition at line 116 of file utf.c.

References CONSTANT_Utf8_info::bytes, MAP_INVALID_UTF8_TO_QUESTION_MARK, RETURN_IF_NUL_BYTE, UTF8_DOUBLE_FIRST_MASK0, UTF8_DOUBLE_FIRST_SHIFT, UTF8_DOUBLE_FIRST_VAL, UTF8_DOUBLE_SECOND_MASK0, UTF8_DOUBLE_SECOND_VAL, UTF8_SINGLE_MAX, UTF8_TRIPLE_FIRST_MASK0, UTF8_TRIPLE_FIRST_SHIFT, UTF8_TRIPLE_FIRST_VAL, UTF8_TRIPLE_SECOND_MASK0, UTF8_TRIPLE_SECOND_SHIFT, UTF8_TRIPLE_SECOND_VAL, UTF8_TRIPLE_THIRD_MASK0, and UTF8_TRIPLE_THIRD_VAL.
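
The conversion rules described above can be sketched as follows. This is a minimal illustration, not the code in utf.c: sketch_utf8 is a hypothetical stand-in for CONSTANT_Utf8_info that keeps only the length and bytes members, uint16_t stands in for jchar, and the masks are written as literals rather than the UTF8_xxx constants. A lead byte that matches none of the three forms, or whose continuation bytes are malformed or missing, is replaced by '?' one byte at a time, which may differ from how the real function resynchronizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;  /* number of bytes in bytes[] */
    const uint8_t *bytes;   /* UTF-8 data, not NUL-terminated */
} sketch_utf8;

/* Decode 1-, 2- and 3-byte forms into 16-bit code units.
 * Returns the number of code units written to out. */
uint16_t sketch_utf2unicode(const sketch_utf8 *in, uint16_t *out)
{
    uint16_t n = 0;
    size_t   i = 0;

    while (i < in->length) {
        uint8_t b0 = in->bytes[i];

        if (b0 == 0x00) {
            break;                      /* NUL byte: assume termination */
        }
        if (b0 <= 0x7f) {               /* single byte, up to '\u007f' */
            out[n++] = b0;
            i += 1;
        } else if ((b0 & 0xe0) == 0xc0 && i + 1 < in->length
                   && (in->bytes[i + 1] & 0xc0) == 0x80) {
            /* double: 110xxxxx 10xxxxxx */
            out[n++] = (uint16_t)(((b0 & 0x1f) << 6)
                                  | (in->bytes[i + 1] & 0x3f));
            i += 2;
        } else if ((b0 & 0xf0) == 0xe0 && i + 2 < in->length
                   && (in->bytes[i + 1] & 0xc0) == 0x80
                   && (in->bytes[i + 2] & 0xc0) == 0x80) {
            /* triple: 1110xxxx 10xxxxxx 10xxxxxx */
            out[n++] = (uint16_t)(((b0 & 0x0f) << 12)
                                  | ((in->bytes[i + 1] & 0x3f) << 6)
                                  | (in->bytes[i + 2] & 0x3f));
            i += 3;
        } else {
            out[n++] = (uint16_t)'?';   /* invalid byte: substitute '?' */
            i += 1;
        }
    }
    return n;
}
```

The caller is assumed to supply an outbfr with room for at least length code units, since the count can never exceed the input byte count.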

rchar* utf_utf2prchar (CONSTANT_Utf8_info *src)

Convert a UTF string from a (CONSTANT_Utf8_info *) into a null-terminated string by allocating heap and copying the UTF data.

When done with result, perform HEAP_FREE_DATA(result).

Parameters:
src Pointer to UTF string, most likely from constant pool
Returns:
Null-terminated string in heap or rnull if heap alloc error.

Definition at line 259 of file utf.c.

Referenced by class_load_primative(), opcode_run(), and utf_isarray().
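
The allocate-and-copy step can be sketched as follows. This is an illustration under stated assumptions: sketch_utf8 is a hypothetical stand-in for CONSTANT_Utf8_info, char stands in for rchar, and malloc()/free() stand in for the project's heap routines and HEAP_FREE_DATA().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;
    const uint8_t *bytes;
} sketch_utf8;

/* Copy the UTF bytes into a freshly allocated NUL-terminated buffer.
 * Returns NULL on allocation failure; the caller releases with free(). */
char *sketch_utf2prchar(const sketch_utf8 *src)
{
    char *result = malloc((size_t)src->length + 1);  /* +1 for the NUL */

    if (result == NULL) {
        return NULL;
    }
    memcpy(result, src->bytes, src->length);
    result[src->length] = '\0';
    return result;
}
```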

rchar* utf_utf2prchar_classname (CONSTANT_Utf8_info *src)

Convert an unformatted class name UTF string (of the type ClassName and not of type [[[LClassName) from a (CONSTANT_Utf8_info *) into a null-terminated string with Java class formatting items. Result is delivered in a heap-allocated buffer. When done with result, perform HEAP_FREE_DATA(result) to return that buffer to the heap.

This function will also work on formatted class names ([[[LClassName;) and the difference is benign, but that is not its purpose.

Parameters:
src Pointer to UTF string, most likely from constant pool
Returns:
Null-terminated string LClassName; in heap or rnull if heap alloc error. The leading L marks an instance of class '/class/name' and the trailing ; is the terminator for an instance of class.

Definition at line 687 of file utf.c.
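
Wrapping an unformatted name as LClassName; can be sketched as follows. This is an illustration only: sketch_utf8 is a hypothetical stand-in for CONSTANT_Utf8_info, malloc() stands in for the project's heap allocation, and the sketch always adds the L and ; without inspecting the input, which may not match how utf.c treats names that are already formatted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;
    const uint8_t *bytes;
} sketch_utf8;

/* Build "L" + name + ";" in a new NUL-terminated heap buffer.
 * Returns NULL on allocation failure; the caller releases with free(). */
char *sketch_utf2prchar_classname(const sketch_utf8 *src)
{
    size_t len    = src->length;
    char  *result = malloc(len + 3);   /* 'L' + name + ';' + NUL */

    if (result == NULL) {
        return NULL;
    }
    result[0] = 'L';                   /* instance-of-class marker */
    memcpy(result + 1, src->bytes, len);
    result[len + 1] = ';';             /* instance-of-class terminator */
    result[len + 2] = '\0';
    return result;
}
```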

jbyte utf_utf_strcmp (CONSTANT_Utf8_info *s1, CONSTANT_Utf8_info *s2)

Compare two UTF strings from constant_pool, s1 minus s2.

Parameters:
s1 First of two UTF strings to compare
s2 Second of two UTF strings to compare
Returns:
lexicographical value of first difference in strings, else 0.

Definition at line 379 of file utf.c.

References CP_THIS_STRLEN, PTR_CP_THIS_STRNAME, and s1_s2_strncmp().
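
An s1-minus-s2 comparison of two length-counted strings can be sketched as follows. This is an illustration, not s1_s2_strncmp(): sketch_utf8 is a hypothetical stand-in for CONSTANT_Utf8_info, the result is an int rather than a jbyte, and the handling of a string that is a prefix of the other (shorter sorts first) is an assumption that the reference above does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;
    const uint8_t *bytes;
} sketch_utf8;

/* Return the difference (s1 minus s2) at the first differing byte,
 * or 0 when both strings have the same length and contents. */
int sketch_utf_strcmp(const sketch_utf8 *s1, const sketch_utf8 *s2)
{
    size_t n = s1->length < s2->length ? s1->length : s2->length;

    for (size_t i = 0; i < n; i++) {
        if (s1->bytes[i] != s2->bytes[i]) {
            return (int)s1->bytes[i] - (int)s2->bytes[i];
        }
    }
    /* Assumption: when one string is a prefix of the other,
     * the shorter one sorts first. */
    if (s1->length == s2->length) {
        return 0;
    }
    return s1->length < s2->length ? -1 : 1;
}
```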

jbyte utf_prchar_pcfs_strcmp (rchar *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2)

Compare contents of null-terminated string to contents of a UTF string from a class file structure.

Parameters:
s1 Null-terminated string name
pcfs2 ClassFile where UTF string is found
cpidx2 Index in pcfs2 constant_pool of UTF string
Returns:
lexicographical value of first difference in strings, else 0.

Definition at line 402 of file utf.c.

References CONSTANT_Utf8_info::bytes, CP_THIS_STRLEN, CONSTANT_Utf8_info::length, PTR_CP_THIS_STRNAME, and s1_s2_strncmp().

jbyte utf_pcfs_strcmp (CONSTANT_Utf8_info *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2)

Compare contents of UTF string to contents of a UTF string from a class file structure.

Parameters:
s1 UTF string name
pcfs2 ClassFile where UTF string is found
cpidx2 Index in pcfs2 constant_pool of UTF string
Returns:
lexicographical value of first difference in strings, else 0.

Definition at line 433 of file utf.c.

References BASETYPE_CHAR_L_TERM, CP_THIS_STRLEN, CONSTANT_Class_info::name_index, nts_prchar_isclassformatted(), PTR_CP_ENTRY_CLASS, PTR_CP_THIS_STRNAME, and rtrue.

Referenced by attribute_name_common_find(), field_find_by_cp_entry(), and method_find_by_cp_entry().

jbyte utf_prchar_classname_strcmp (rchar *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2)

Compare a null-terminated string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool.

Parameters:
s1 Null-terminated string to compare, containing formatted or unformatted class name (utf_prchar_classname_strcmp() only).
pcfs2 ClassFile structure containing second string (containing an unformatted class name)
cpidx2 constant_pool index of CONSTANT_Class_info entry whose name will be compared (by getting its name_index and the UTF string name of it)
Returns:
lexicographical value of first difference in strings, else 0.

Definition at line 537 of file utf.c.

References CONSTANT_Utf8_info::bytes, CONSTANT_Utf8_info::length, and utf_common_classname_strcmp().

Referenced by opcode_run().

jbyte utf_classname_strcmp (CONSTANT_Utf8_info *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2)

Compare a UTF string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool.

Parameters:
s1 UTF string to compare, containing formatted or unformatted class name.
pcfs2 ClassFile structure containing second string (containing an unformatted class name)
cpidx2 constant_pool index of CONSTANT_Class_info entry whose name will be compared (by getting its name_index and the UTF string name of it)
Returns:
lexicographical value of first difference in strings, else 0.

Definition at line 571 of file utf.c.

References BASETYPE_CHAR_ARRAY, CONSTANT_Utf8_info::bytes, and CONSTANT_MAX_ARRAY_DIMS.

jvm_array_dim utf_get_utf_arraydims (CONSTANT_Utf8_info *inbfr)

Report the number of array dimensions prefixing a Java type string.

No overflow condition is reported since it is assumed that inbfr is formatted with correct length. Notice that because this logic checks only for array specifiers and does not care about the rest of the string, it may be used to evaluate field descriptions, which will not contain any class formatting information.

If there is even a remote possibility that more than CONSTANT_MAX_ARRAY_DIMS dimensions will be found, compare the result of this function with the result of utf_isarray(). If there is a discrepancy, then there was an overflow here. Properly formatted class files will never contain code with this condition.

Note:
This function is identical to nts_get_arraydims() except that it works on (CONSTANT_Utf8_info *) instead of (rchar *).
Parameters:
inbfr CONSTANT_Utf8_info string.
Returns:
Number of array dimensions in string. For example, this string contains three array dimensions:
[[[Lsome/path/name/filename;

If more than CONSTANT_MAX_ARRAY_DIMS are located, the result is zero; no other error is reported.

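Annotations on the constants involved: '[' is the reference to one array dimension; CONSTANT_MAX_ARRAY_DIMS is the highest number of array dimensions; one related value is marked "not stated in spec, but implied".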

Definition at line 617 of file utf.c.

Referenced by class_load_primative().
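
Counting the array-dimension prefix can be sketched as follows. This is an illustration only: sketch_utf8 is a hypothetical stand-in for CONSTANT_Utf8_info, unsigned stands in for jvm_array_dim, and SKETCH_MAX_ARRAY_DIMS is a hypothetical stand-in whose value of 255 is an assumption, since this page does not give the value of CONSTANT_MAX_ARRAY_DIMS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;
    const uint8_t *bytes;
} sketch_utf8;

/* Hypothetical stand-in for CONSTANT_MAX_ARRAY_DIMS; value is assumed. */
#define SKETCH_MAX_ARRAY_DIMS 255u

/* Count leading '[' characters. The rest of the string is ignored.
 * Returns 0 when the count exceeds the limit; no other error is reported. */
unsigned sketch_get_arraydims(const sketch_utf8 *inbfr)
{
    unsigned dims = 0;

    for (size_t i = 0; i < inbfr->length && inbfr->bytes[i] == '['; i++) {
        dims++;
        if (dims > SKETCH_MAX_ARRAY_DIMS) {
            return 0;                  /* overflow: report zero */
        }
    }
    return dims;
}
```

Because a zero result is ambiguous between "not an array" and "too many dimensions", a caller that cannot rule out overflow would need a separate is-array check, as described above.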

rboolean utf_utf_isarray (CONSTANT_Utf8_info *inbfr)

rboolean utf_utf_isclassformatted (CONSTANT_Utf8_info *src)

Verify if a UTF string contains class formatting or not.

Parameters:
src Pointer to UTF string, most likely from constant pool
Returns:
rtrue if string is formatted as LClassName; (it may also have an array descriptor prefixed, thus [[LClassName;), but rfalse otherwise.
Note:
This function works just like nts_prchar_isclassformatted() except that it works on (CONSTANT_Utf8_info) strings rather than on (rchar *) strings.

Annotations on the constants involved: '[' is the reference to one array dimension, L marks an instance of class '/class/name', and ; is the terminator for an instance of class.

Definition at line 759 of file utf.c.

References BASETYPE_CHAR_L_TERM, and rtrue.
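
The class-formatting test can be sketched as follows. This is an illustration only: sketch_utf8 is a hypothetical stand-in for CONSTANT_Utf8_info, int stands in for rboolean, and the sketch requires the ; to be the last byte of the string, which may be stricter or looser than the check in utf.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;
    const uint8_t *bytes;
} sketch_utf8;

/* Return 1 for LClassName; with optional leading '[' characters,
 * 0 otherwise. */
int sketch_isclassformatted(const sketch_utf8 *src)
{
    size_t i = 0;

    /* Skip any array descriptor prefix. */
    while (i < src->length && src->bytes[i] == '[') {
        i++;
    }
    /* Need an 'L' here and at least one more byte for the ';'. */
    if (i + 1 >= src->length || src->bytes[i] != 'L') {
        return 0;
    }
    return src->bytes[src->length - 1] == ';';
}
```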

cp_info_dup* utf_utf2utf_unformatted_classname (cp_info_dup *inbfr)

Strip a UTF string of any class formatting it contains and return result in a heap-allocated buffer.

When done with this result, perform HEAP_FREE_DATA(result) to return buffer to heap.

Parameters:
inbfr Pointer to UTF string that is potentially formatted as LClassName; and which may also have array descriptor prefixed, thus [[LClassName; . This will typically be an entry from the constant_pool.
Returns:
heap-allocated buffer containing ClassName with no formatting, regardless of input formatting or lack thereof.
Note:
This function works just like nts_prchar2prchar_unformatted_classname() except that it takes a (CONSTANT_Utf8_info) string rather than a (rchar *) string and returns a (CONSTANT_Utf8_info *).

Definition at line 843 of file utf.c.
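
Stripping the formatting can be sketched as follows. This is an illustration only: sketch_utf8 is a hypothetical stand-in for the UTF string inside cp_info_dup, the result is returned as a NUL-terminated char buffer from malloc() rather than as a cp_info_dup from the project's heap, and input that is not class formatted (including primitive array descriptors) is copied unchanged, which is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CONSTANT_Utf8_info: only length and bytes. */
typedef struct {
    uint16_t       length;
    const uint8_t *bytes;
} sketch_utf8;

/* Return ClassName from [[LClassName; or LClassName; in a new heap
 * buffer. Returns NULL on allocation failure; release with free(). */
char *sketch_unformatted_classname(const sketch_utf8 *inbfr)
{
    size_t start = 0;
    size_t end   = inbfr->length;

    /* Skip any array descriptor prefix. */
    while (start < end && inbfr->bytes[start] == '[') {
        start++;
    }
    if (end - start >= 2
        && inbfr->bytes[start] == 'L' && inbfr->bytes[end - 1] == ';') {
        start++;                       /* drop the leading 'L' */
        end--;                         /* drop the trailing ';' */
    } else {
        start = 0;                     /* not class formatted: copy as-is */
        end   = inbfr->length;
    }

    char *result = malloc(end - start + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, inbfr->bytes + start, end - start);
    result[end - start] = '\0';
    return result;
}
```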


Generated on Fri Sep 30 18:50:38 2005 by  doxygen 1.4.4