There are three character string types in this program: null-terminated (rchar) strings as in the C language, UTF-8 (CONSTANT_Utf8_info) strings, and Unicode (jchar)[] strings.
Convert one or more UTF-8 (jbyte) bytes to and from Unicode (jchar) characters, plus related functions such as comparison and string length.
Why are these functions called utf_XXX() instead of utf8_XXX()? Originally, they were called such, but when the JDK 1.5 class file spec, section 4, was reviewed (after working with the 1.2/1.4 versions), it was discovered that certain other UTF-xx formats were also provided in the spec, even if not accurately defined. (Due to errors in the revised class file specification, the 21-bit UTF characters (6 bytes) will not be implemented until a definitive correction is located. However, in anticipation of this correction, the functions are now named utf_XXX() without respect to character bit width.) Notice, however, that the spec, section 4, defines a CONSTANT_Utf8 and a CONSTANT_Utf8_info. Therefore, these designations will remain in the code unless changed in the spec.
Copyright 2005 The Apache Software Foundation or its licensors, as applicable.
Licensed under the Apache License, Version 2.0 ("the License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
Definition in file utf.c.
#include "arch.h"
#include <string.h>
#include "jvmcfg.h"
#include "cfmacros.h"
#include "classfile.h"
#include "nts.h"
#include "util.h"
Defines | |
#define | MAP_INVALID_UTF8_TO_QUESTION_MARK |
#define | RETURN_IF_NUL_BYTE |
Functions | |
static jbyte | s1_s2_strncmp (u1 *s1, int l1, u1 *s2, int l2) |
Compare two strings of any length, and potentially neither null-terminated, that is, could be a UTF string. | |
static void | utf_c_dummy (void) |
jbyte | utf_classname_strcmp (CONSTANT_Utf8_info *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2) |
Compare a UTF string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool. | |
static jbyte | utf_common_classname_strcmp (u1 *s1, int l1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2) |
Common generic comparison, all parameters regularized. | |
jvm_array_dim | utf_get_utf_arraydims (CONSTANT_Utf8_info *inbfr) |
Report the number of array dimensions prefixing a Java type string. | |
rboolean | utf_isarray (CONSTANT_Utf8_info *inbfr) |
Test whether a Java type string is an array. | |
jbyte | utf_pcfs_strcmp (CONSTANT_Utf8_info *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2) |
Compare contents of UTF string to contents of a UTF string from a class file structure. | |
jbyte | utf_prchar_classname_strcmp (rchar *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2) |
Compare a null-terminated string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool. | |
jbyte | utf_prchar_pcfs_strcmp (rchar *s1, ClassFile *pcfs2, jvm_constant_pool_index cpidx2) |
Compare contents of null-terminated string to contents of a UTF string from a class file structure. | |
rchar * | utf_utf2prchar (CONSTANT_Utf8_info *src) |
Convert a UTF string from a (CONSTANT_Utf8_info *) into a null-terminated string by allocating heap and copying the UTF data. | |
rchar * | utf_utf2prchar_classname (CONSTANT_Utf8_info *src) |
Convert an unformatted class name UTF string (of the type ClassName and not of type [[[LClassName) from a (CONSTANT_Utf8_info *) into a null-terminated string with Java class formatting items. Result is delivered in a heap-allocated buffer. When done with result, perform HEAP_FREE_DATA(result) to return that buffer to the heap. | |
jshort | utf_utf2unicode (CONSTANT_Utf8_info *utf_inbfr, jchar *outbfr) |
Convert UTF8 buffer into Unicode buffer. | |
cp_info_dup * | utf_utf2utf_unformatted_classname (cp_info_dup *inbfr) |
Strip a UTF string of any class formatting it contains and return result in a heap-allocated buffer. | |
rboolean | utf_utf_isclassformatted (CONSTANT_Utf8_info *src) |
Verify if a UTF string contains class formatting or not. | |
jbyte | utf_utf_strcmp (CONSTANT_Utf8_info *s1, CONSTANT_Utf8_info *s2) |
Compare two UTF strings from constant_pool, s1 minus s2. | |
Variables | |
static char * | utf_c_copyright = "\0" "$URL: https://svn.apache.org/path/name/utf.c $ $Id: utf.c 0 09/28/2005 dlydick $" " " "Copyright 2005 The Apache Software Foundation or its licensors, as applicable." |
|
MAP_INVALID_UTF8_TO_QUESTION_MARK
Store '?' when an invalid UTF state is found; adjust return code.
Value: *outbfr++ = (jchar) '?'; \ inbfr++
Definition at line 79 of file utf.c. Referenced by utf_utf2unicode(). |
|
RETURN_IF_NUL_BYTE
Value: if (UTF8_FORBIDDEN_ZERO == *inbfr) \ { return(charcnvcount); }
Definition at line 83 of file utf.c. Referenced by utf_utf2unicode(). |
|
Convert UTF8 buffer into Unicode buffer.
charcnvcount (Return value of function) Number of Unicode characters in outbfr. This will only be the same as length when ALL UTF characters are ASCII. It will otherwise be less than that.
SPEC AMBIGUITY: In case of invalid characters, a Unicode '?' is stored in the output (see MAP_INVALID_UTF8_TO_QUESTION_MARK).
The conversion recognizes these byte layouts:
Single byte: values up to '\u007f'. A forbidden zero byte (UTF8_FORBIDDEN_ZERO), which looks suspiciously like ASCII NUL, ends the conversion (see RETURN_IF_NUL_BYTE).
Double byte: the first byte has top 3 bits '110' and its bottom 5 bits contain data bits 10-6; the second byte has top 2 bits '10' and its bottom 6 bits contain data bits 5-0.
Triple byte: the first byte has top 4 bits '1110' and its bottom 4 bits contain data bits 15-12; the second byte has top 2 bits '10' and its bottom 6 bits contain data bits 11-6; the third byte has top 2 bits '10' and its bottom 6 bits contain data bits 5-0.
Definition at line 116 of file utf.c. References CONSTANT_Utf8_info::bytes, MAP_INVALID_UTF8_TO_QUESTION_MARK, RETURN_IF_NUL_BYTE, UTF8_DOUBLE_FIRST_MASK0, UTF8_DOUBLE_FIRST_SHIFT, UTF8_DOUBLE_FIRST_VAL, UTF8_DOUBLE_SECOND_MASK0, UTF8_DOUBLE_SECOND_VAL, UTF8_SINGLE_MAX, UTF8_TRIPLE_FIRST_MASK0, UTF8_TRIPLE_FIRST_SHIFT, UTF8_TRIPLE_FIRST_VAL, UTF8_TRIPLE_SECOND_MASK0, UTF8_TRIPLE_SECOND_SHIFT, UTF8_TRIPLE_SECOND_VAL, UTF8_TRIPLE_THIRD_MASK0, and UTF8_TRIPLE_THIRD_VAL. |
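The one-, two-, and three-byte decoding described for utf_utf2unicode() can be sketched roughly as follows. This is a minimal illustration, not the code in utf.c: the names sketch_jchar, sketch_u1, and sketch_utf2unicode are hypothetical stand-ins, the masks are written as literals rather than the UTF8_* macros, and the choice to skip exactly one byte after an invalid sequence is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the project's jchar and u1 types. */
typedef uint16_t sketch_jchar;
typedef uint8_t  sketch_u1;

/*
 * Decode 'len' bytes of class-file UTF-8 into 16-bit characters.
 * Returns the number of characters stored in 'out', which must have
 * room for at least 'len' entries.  Invalid bytes are stored as '?'.
 * A zero byte ends the conversion early.
 */
static size_t sketch_utf2unicode(const sketch_u1 *in, size_t len,
                                 sketch_jchar *out)
{
    size_t i = 0;
    size_t count = 0;

    while (i < len) {
        sketch_u1 b0 = in[i];

        if (b0 == 0x00) {
            /* Forbidden zero byte: stop and report what was converted. */
            return count;
        }
        if (b0 <= 0x7f) {
            /* Single byte: 0xxxxxxx */
            out[count++] = b0;
            i += 1;
        } else if ((b0 & 0xe0) == 0xc0 && i + 1 < len &&
                   (in[i + 1] & 0xc0) == 0x80) {
            /* Double byte: 110xxxxx 10xxxxxx (data bits 10-6, 5-0) */
            out[count++] = (sketch_jchar)(((b0 & 0x1f) << 6) |
                                          (in[i + 1] & 0x3f));
            i += 2;
        } else if ((b0 & 0xf0) == 0xe0 && i + 2 < len &&
                   (in[i + 1] & 0xc0) == 0x80 &&
                   (in[i + 2] & 0xc0) == 0x80) {
            /* Triple byte: 1110xxxx 10xxxxxx 10xxxxxx
             * (data bits 15-12, 11-6, 5-0) */
            out[count++] = (sketch_jchar)(((b0 & 0x0f) << 12) |
                                          ((in[i + 1] & 0x3f) << 6) |
                                          (in[i + 2] & 0x3f));
            i += 3;
        } else {
            /* Invalid state: map to '?' and skip one byte (assumption). */
            out[count++] = (sketch_jchar)'?';
            i += 1;
        }
    }
    return count;
}
```

Because every multi-byte sequence collapses into one output character, the returned count equals the input length only when all input bytes are ASCII, which matches the description of charcnvcount.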
|
Convert a UTF string from a (CONSTANT_Utf8_info *) into a null-terminated string by allocating heap and copying the UTF data. When done with result, perform HEAP_FREE_DATA(result).
Definition at line 259 of file utf.c. Referenced by class_load_primative(), opcode_run(), and utf_isarray(). |
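The allocate-and-copy step can be illustrated with a short sketch. It assumes a simplified, hypothetical structure (sketch_utf8_info) in place of CONSTANT_Utf8_info, and uses the standard malloc() and free() in place of the project's heap macros, whose exact signatures are not shown on this page.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for CONSTANT_Utf8_info. */
typedef struct {
    uint16_t       length;  /* number of bytes in 'bytes' */
    const uint8_t *bytes;   /* UTF data, NOT null-terminated */
} sketch_utf8_info;

/*
 * Copy the UTF bytes into a freshly allocated buffer and append a NUL.
 * The caller owns the result and releases it with free() (the real
 * code would use the project's heap facility instead).
 */
static char *sketch_utf2prchar(const sketch_utf8_info *src)
{
    char *result = malloc((size_t)src->length + 1);

    if (result == NULL) {
        return NULL;
    }
    memcpy(result, src->bytes, src->length);
    result[src->length] = '\0';
    return result;
}
```

The extra byte in the allocation is what makes the result safe to pass to ordinary C string functions, which the length-prefixed UTF form is not.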
|
Compare two strings of any length, and potentially neither null-terminated, that is, could be a UTF string.
If strings are of equal length, this function is equivalent to ...
This function should be used on ALL string comparisons that potentially involve lack of NUL termination, namely, anything to do with UTF strings of any sort. It is recommended also for any null-terminated string just so all string comparisons work exactly alike, no matter whether (rchar *) or UTF, whether of equal length or not.
Definition at line 315 of file utf.c. Referenced by utf_prchar_pcfs_strcmp(), and utf_utf_strcmp(). |
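A length-aware comparison of two buffers that may lack NUL termination might look like the sketch below. The function name sketch_s1_s2_cmp is hypothetical, and the ordering rule for strings of unequal length (the shorter string sorts first when it is a prefix of the longer one) is an assumption; this page does not state how s1_s2_strncmp() handles that case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare s1 (l1 bytes) with s2 (l2 bytes).  Neither buffer needs to
 * be null-terminated.  Returns -1, 0, or 1 for s1 < s2, s1 == s2,
 * s1 > s2.
 */
static int sketch_s1_s2_cmp(const unsigned char *s1, int l1,
                            const unsigned char *s2, int l2)
{
    int common = (l1 < l2) ? l1 : l2;
    int rc = memcmp(s1, s2, (size_t)common);

    if (rc != 0) {
        return (rc < 0) ? -1 : 1;
    }
    if (l1 == l2) {
        return 0;
    }
    /* Common prefix is identical: shorter string sorts first
     * (assumption, see lead-in). */
    return (l1 < l2) ? -1 : 1;
}
```

Routing every comparison through one such helper means a (rchar *) string and a UTF string are ordered by the same rule, which is the consistency the description recommends.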
|
Compare two UTF strings from constant_pool, s1 minus s2.
Definition at line 379 of file utf.c. References CP_THIS_STRLEN, PTR_CP_THIS_STRNAME, and s1_s2_strncmp(). |
|
Compare contents of null-terminated string to contents of a UTF string from a class file structure.
Definition at line 402 of file utf.c. References CONSTANT_Utf8_info::bytes, CP_THIS_STRLEN, CONSTANT_Utf8_info::length, PTR_CP_THIS_STRNAME, and s1_s2_strncmp(). |
|
Compare contents of UTF string to contents of a UTF string from a class file structure.
Definition at line 433 of file utf.c. References BASETYPE_CHAR_L_TERM, CP_THIS_STRLEN, CONSTANT_Class_info::name_index, nts_prchar_isclassformatted(), PTR_CP_ENTRY_CLASS, PTR_CP_THIS_STRNAME, and rtrue. Referenced by attribute_name_common_find(), field_find_by_cp_entry(), and method_find_by_cp_entry(). |
|
Common generic comparison, all parameters regularized. Compare a UTF or null-terminated string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool. Compare s1 minus s2, but skipping, where applicable, the s1 initial BASETYPE_CHAR_L and the terminating BASETYPE_CHAR_L_TERM, plus any array dimension modifiers. The second string is specified by a constant_pool index. Notice that there are NO formatted class string names in the (CONSTANT_Class_info) entries of the constant_pool because such would be redundant. (Such entries are the formal definition of the class.)
Definition at line 479 of file utf.c. Referenced by utf_prchar_classname_strcmp(). |
|
Compare a null-terminated string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool.
Definition at line 537 of file utf.c. References CONSTANT_Utf8_info::bytes, CONSTANT_Utf8_info::length, and utf_common_classname_strcmp(). Referenced by opcode_run(). |
|
Compare a UTF string containing a formatted or unformatted class name with an unformatted UTF string from constant_pool.
Definition at line 571 of file utf.c. References BASETYPE_CHAR_ARRAY, CONSTANT_Utf8_info::bytes, and CONSTANT_MAX_ARRAY_DIMS. |
|
Report the number of array dimensions prefixing a Java type string. No overflow condition is reported since it is assumed that inbfr is formatted with correct length. Notice that because this logic checks only for array specifiers and does not care about the rest of the string, it may be used to evaluate field descriptions, which will not contain any class formatting information. If there is even a remote possibility that more than CONSTANT_MAX_ARRAY_DIMS dimensions will be found, compare the result of this function with the result of utf_isarray(). If there is a discrepancy, then there was an overflow here. Properly formatted class files will never contain code with this condition.
Example: [[[Lsome/path/name/filename;
If more than CONSTANT_MAX_ARRAY_DIMS dimensions are located, the result is zero; no other error is reported.
Definition at line 617 of file utf.c. Referenced by class_load_primative(). |
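Counting the leading array specifiers can be sketched as below. The names sketch_arraydims and SKETCH_MAX_ARRAY_DIMS are hypothetical, and the limit of 255 is an assumption standing in for CONSTANT_MAX_ARRAY_DIMS, whose value is not given on this page.

```c
#include <stddef.h>

/* Assumed stand-in for CONSTANT_MAX_ARRAY_DIMS. */
#define SKETCH_MAX_ARRAY_DIMS 255

/*
 * Count the '[' characters that prefix a Java type string of 'length'
 * bytes.  Returns 0 when there are none, and also 0 when the count
 * exceeds the limit; no other error is reported.
 */
static int sketch_arraydims(const unsigned char *bytes, size_t length)
{
    size_t dims = 0;

    while (dims < length && bytes[dims] == '[') {
        dims++;
    }
    if (dims > SKETCH_MAX_ARRAY_DIMS) {
        return 0;
    }
    return (int)dims;
}
```

Since a zero result is ambiguous between "not an array" and "too many dimensions", a caller that cares about overflow would cross-check against a separate is-array test, as the description suggests.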
|
Test whether a Java type string is an array.
Definition at line 660 of file utf.c. References HEAP_GET_DATA, CONSTANT_Utf8_info::length, nts_prchar_isclassformatted(), rfalse, rnull, and utf_utf2prchar(). |
|
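Convert an unformatted class name UTF string (of the type ClassName and not of type [[[LClassName) from a (CONSTANT_Utf8_info *) into a null-terminated string with Java class formatting items. Result is delivered in a heap-allocated buffer. When done with result, perform HEAP_FREE_DATA(result) to return that buffer to the heap.
This function will work on formatted class names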
|
Verify if a UTF string contains class formatting or not.
Definition at line 759 of file utf.c. References BASETYPE_CHAR_L_TERM and rtrue. |
|
Strip a UTF string of any class formatting it contains and return result in a heap-allocated buffer. When done with this result, perform HEAP_FREE_DATA(result) to return buffer to heap.
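Stripping class formatting can be sketched as follows. This is an illustration under several assumptions: "class formatting" is taken to mean any leading '[' array specifiers, the 'L' prefix, and the ';' terminator; a string without the 'L' ... ';' pair is copied unchanged; the function works on a plain byte buffer rather than cp_info_dup; and malloc() and free() stand in for the project's heap macros. The name sketch_unformat_classname is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated, null-terminated copy of the class name in
 * 'src' ('len' bytes) with any "[[...L" prefix and ';' terminator
 * removed.  Input without that formatting is copied as-is.
 * The caller releases the result with free().
 */
static char *sketch_unformat_classname(const char *src, size_t len)
{
    size_t start = 0;
    const char *body = src;
    size_t n = len;
    char *result;

    /* Skip array dimension specifiers. */
    while (start < len && src[start] == '[') {
        start++;
    }
    /* Strip only when both the 'L' prefix and ';' terminator exist. */
    if (start + 1 < len && src[start] == 'L' && src[len - 1] == ';') {
        body = src + start + 1;
        n = len - start - 2;
    }

    result = malloc(n + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, body, n);
    result[n] = '\0';
    return result;
}
```

Requiring both the prefix and the terminator keeps an unformatted name that merely begins with the letter L, such as a class called List, from being truncated.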
|