String Functions

Syntax	Description	Example	Normal Query	Pushdown Query	Defined as Computed Column	Suggested as Computed Column
CHAR_LENGTH(string)	Returns the number of characters in a character string	`CHAR_LENGTH('Kylin')` = 5	✔️	✔️	✔️	✔️
CHARACTER_LENGTH(string)	As CHAR_LENGTH(string)	`CHARACTER_LENGTH('Kylin')` = 5	✔️	✔️	✔️	✔️
UPPER(string)	Returns a character string converted to upper case	`UPPER('Kylin')` = KYLIN	✔️	✔️	✔️	✔️
LOWER(string)	Returns a character string converted to lower case	`LOWER('Kylin')` = kylin	✔️	✔️	✔️	✔️
POSITION(string1 IN string2)	Returns the position of the first occurrence of string1 in string2	`POSITION('Kyli' IN 'Kylin')` = 1	✔️	✔️	✔️	✔️
TRIM( { BOTH \ LEADING\ TRAILING } string1 FROM string2)	Removes the longest string containing only the characters in string1 from the both ends/start/end of string1	Example1: `TRIM(BOTH '6' FROM '666Kylin66')` = Kylin Example 2: `TRIM(LEADING '6' FROM '666Kylin66')` = Kylin66 Example 3: `TRIM(TRAILING '6' FROM '666Kylin66')` = 666Kylin	✔️	✔️	✔️	❌
OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])	Replaces a substring of string1 with string2 starting at the integer bit	`OVERLAY('666' placing 'KYLIN' FROM 2 for 2)` = 6KYLIN	✔️	✔️	✔️	✔️
SUBSTRING(string FROM integer)	Returns a substring of a character string starting at a given point	`SUBSTRING('Kylin' FROM 5)` = n	✔️	✔️	✔️	✔️
SUBSTRING(string FROM integer1 FOR integer2)	Returns a substring of a character string starting at a given point with a given length	`SUBSTRING('KYlin' from 5 for 2)` = n	✔️	✔️	✔️	✔️
INITCAP(string)	Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.	`INITCAP('kylin')` = Kylin	✔️	✔️	✔️	✔️
REPLACE(string, search, replacement)	Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed	`REPLACE('Kylin','Kyli','Kyliiiiiii')` = Kyliiiiiiin	✔️	✔️	✔️	✔️
BASE64(bin)	Converts the argument from a binary `bin` to a base 64 string	`BASE64('Spark SQL')` = U3BhcmsgU1FM	✔️	✔️	✔️	✔️
DECODE(bin, charset)	Decodes the first argument using the second argument character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null	`DECODE(ENCODE('abc', 'utf-8'), 'utf-8')` = abc	✔️	✔️	✔️	✔️
ENCODE(str, charset)	Encodes the first argument using the second argument character set(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null	`ENCODE('abc', 'utf-8')` = `[B@2b442236`	✔️	✔️	❌	❌
FIND_IN_SET(str, str_array)	Returns the index (1-based) of the given string (`str`) in the comma-delimited list (`str_array`).Returns 0, if the string was not found or if the given string (`str`) contains a comma	`FIND_IN_SET('ab','abc,b,ab,c,def')` = 3	✔️	✔️	✔️	✔️
LCASE(string)	Returns `string` with all characters changed to lowercase	`LCASE('SparkSql')` = sparksql	✔️	✔️	✔️	✔️
LEVENSHTEIN(str, str)	Returns the Levenshtein distance between the two given strings	`LEVENSHTEIN('kitten', 'sitting')` = 3	✔️	✔️	✔️	✔️
LOCATE(substr, str[, pos])	Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based	`LOCATE('bar', 'foobarbar')` = 4 `LOCATE('bar', 'foobarbar', 5)` =7	✔️	✔️	✔️	✔️
LPAD(str, len, pad)	Returns `str`, left-padded with `pad` to a length of `len`.If `str` is longer than `len`, the return value is shortened to `len` characters.	`LPAD('hi', 5, '??')` = ???hi	✔️	✔️	✔️	✔️
RPAD(str, len, pad)	Returns `str`, right-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters	`RPAD('hi', 5, '??')` = hi???	✔️	✔️	✔️	✔️
RTRIM(trimStr, str)	Removes the trailing string which contains the characters from the trim string from the `str`	`RTRIM('KR', 'SPARK')` = SPA	✔️	✔️	✔️	✔️
SENTENCES(str)	Splits `str` into an array of array of words.	`SENTENCES('Hi there! Good morning.')` = [["Hi","there"],["Good","morning"]]	✔️	✔️	❌	❌
SUBSTRING_INDEX(str, delim, count)	Returns the substring from `str` before `count` occurrences of the delimiter `delim`. If `count` is positive, everything to the left of the final delimiter (counting from the left) is returned. If `count` is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for `delim`	`SUBSTRING_INDEX('www.apache.org', '.', 1)` = www	✔️	✔️	✔️	✔️
UCASE(str)	Returns `str` with all characters changed to uppercase.	`UCASE('SparkSql')` = SPARKSQL	✔️	✔️	✔️	✔️
UNBASE64(str)	Converts the argument from a base 64 string `str` to a binary.	`UNBASE64('U3BhcmsgU1FM')` = `[B@7c66820d`	✔️	✔️	❌	❌
ASCII(str)	Convert character to corresponding ascii code	`ASCII('a')` = 97	✔️	❌	✔️	✔️
CHR(str)	Convert ascii code to corresponding character	`CHR(97)` = a	✔️	❌	✔️	✔️
SPACE(len)	Generate `len` number of continuous space	space(2) =	✔️	❌	❌	❌
SPLIT_PART(str, separator, index)	Split `str` with `separator` and return the `index`-th token. `index` counts from 1. when `index` is negative, tokens are counted starting from the right.	`split_part('a-b-c', '-', 1)` = a, `split_part('a-b-c', '-', -1)` = c,	✔️	❌	✔️	✔️
CONCAT(any, [any]*)	Concatenates multiple data of any type into a string	`concat('Kyl', 'in')` = Kylin	✔️	✔️	✔️	✔️
REPEAT(str,n)	Repeat `str` `n` times and return string. When querying the model, `str` supports passing in constants, columns and expressions, and `n` only supports passing in constants	`repeat('kylin',2)` ='kylinkylin'	✔️	✔️	✔️	✔️
LEFT(str,n)	Return the `n` characters from the left of `str`. When querying the model, `str` supports passing in constants, columns, and `n` only supports passing in constants	`left('kylin',2)` ='ky'	✔️	✔️	✔️	✔️