String Functions
Syntax | Description | Example | Normal Query | Pushdown Query | Defined as Computed Column | Suggested as Computed Column |
---|---|---|---|---|---|---|
CHAR_LENGTH(string) | Returns the number of characters in a character string | CHAR_LENGTH('Kylin') = 5 | ✔️ | ✔️ | ✔️ | ✔️ |
CHARACTER_LENGTH(string) | As CHAR_LENGTH(string) | CHARACTER_LENGTH('Kylin') = 5 | ✔️ | ✔️ | ✔️ | ✔️ |
UPPER(string) | Returns a character string converted to upper case | UPPER('Kylin') = KYLIN | ✔️ | ✔️ | ✔️ | ✔️ |
LOWER(string) | Returns a character string converted to lower case | LOWER('Kylin') = kylin | ✔️ | ✔️ | ✔️ | ✔️ |
POSITION(string1 IN string2) | Returns the position of the first occurrence of string1 in string2 | POSITION('Kyli' IN 'Kylin') = 1 | ✔️ | ✔️ | ✔️ | ✔️ |
TRIM( { BOTH \ LEADING\ TRAILING } string1 FROM string2) | Removes the longest string containing only the characters in string1 from the both ends/start/end of string1 | Example1: TRIM(BOTH '6' FROM '666Kylin66') = Kylin Example 2: TRIM(LEADING '6' FROM '666Kylin66') = Kylin66 Example 3: TRIM(TRAILING '6' FROM '666Kylin66') = 666Kylin | ✔️ | ✔️ | ✔️ | ❌ |
OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ]) | Replaces a substring of string1 with string2 starting at the integer bit | OVERLAY('666' placing 'KYLIN' FROM 2 for 2) = 6KYLIN | ✔️ | ✔️ | ✔️ | ✔️ |
SUBSTRING(string FROM integer) | Returns a substring of a character string starting at a given point | SUBSTRING('Kylin' FROM 5) = n | ✔️ | ✔️ | ✔️ | ✔️ |
SUBSTRING(string FROM integer1 FOR integer2) | Returns a substring of a character string starting at a given point with a given length | SUBSTRING('KYlin' from 5 for 2) = n | ✔️ | ✔️ | ✔️ | ✔️ |
INITCAP(string) | Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. | INITCAP('kylin') = Kylin | ✔️ | ✔️ | ✔️ | ✔️ |
REPLACE(string, search, replacement) | Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed | REPLACE('Kylin','Kyli','Kyliiiiiii') = Kyliiiiiiin | ✔️ | ✔️ | ✔️ | ✔️ |
BASE64(bin) | Converts the argument from a binary bin to a base 64 string | BASE64('Spark SQL') = U3BhcmsgU1FM | ✔️ | ✔️ | ✔️ | ✔️ |
DECODE(bin, charset) | Decodes the first argument using the second argument character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null | DECODE(ENCODE('abc', 'utf-8'), 'utf-8') = abc | ✔️ | ✔️ | ✔️ | ✔️ |
ENCODE(str, charset) | Encodes the first argument using the second argument character set(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null | ENCODE('abc', 'utf-8') = [B@2b442236 | ✔️ | ✔️ | ❌ | ❌ |
FIND_IN_SET(str, str_array) | Returns the index (1-based) of the given string (str ) in the comma-delimited list (str_array ).Returns 0, if the string was not found or if the given string (str ) contains a comma | FIND_IN_SET('ab','abc,b,ab,c,def') = 3 | ✔️ | ✔️ | ✔️ | ✔️ |
LCASE(string) | Returns string with all characters changed to lowercase | LCASE('SparkSql') = sparksql | ✔️ | ✔️ | ✔️ | ✔️ |
LEVENSHTEIN(str, str) | Returns the Levenshtein distance between the two given strings | LEVENSHTEIN('kitten', 'sitting') = 3 | ✔️ | ✔️ | ✔️ | ✔️ |
LOCATE(substr, str[, pos]) | Returns the position of the first occurrence of substr in str after position pos . The given pos and return value are 1-based | LOCATE('bar', 'foobarbar') = 4 LOCATE('bar', 'foobarbar', 5) =7 | ✔️ | ✔️ | ✔️ | ✔️ |
LPAD(str, len, pad) | Returns str , left-padded with pad to a length of len .If str is longer than len , the return value is shortened to len characters. | LPAD('hi', 5, '??') = ???hi | ✔️ | ✔️ | ✔️ | ✔️ |
RPAD(str, len, pad) | Returns str , right-padded with pad to a length of len . If str is longer than len , the return value is shortened to len characters | RPAD('hi', 5, '??') = hi??? | ✔️ | ✔️ | ✔️ | ✔️ |
RTRIM(trimStr, str) | Removes the trailing string which contains the characters from the trim string from the str | RTRIM('KR', 'SPARK') = SPA | ✔️ | ✔️ | ✔️ | ✔️ |
SENTENCES(str) | Splits str into an array of array of words. | SENTENCES('Hi there! Good morning.') = [["Hi","there"],["Good","morning"]] | ✔️ | ✔️ | ❌ | ❌ |
SUBSTRING_INDEX(str, delim, count) | Returns the substring from str before count occurrences of the delimiter delim . If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim | SUBSTRING_INDEX('www.apache.org', '.', 1) = www | ✔️ | ✔️ | ✔️ | ✔️ |
UCASE(str) | Returns str with all characters changed to uppercase. | UCASE('SparkSql') = SPARKSQL | ✔️ | ✔️ | ✔️ | ✔️ |
UNBASE64(str) | Converts the argument from a base 64 string str to a binary. | UNBASE64('U3BhcmsgU1FM') = [B@7c66820d | ✔️ | ✔️ | ❌ | ❌ |
ASCII(str) | Convert character to corresponding ascii code | ASCII('a') = 97 | ✔️ | ❌ | ✔️ | ✔️ |
CHR(str) | Convert ascii code to corresponding character | CHR(97) = a | ✔️ | ❌ | ✔️ | ✔️ |
SPACE(len) | Generate len number of continuous space | space(2) = | ✔️ | ❌ | ❌ | ❌ |
SPLIT_PART(str, separator, index) | Split str with separator and return the index -th token. index counts from 1. when index is negative, tokens are counted starting from the right. | split_part('a-b-c', '-', 1) = a, split_part('a-b-c', '-', -1) = c, | ✔️ | ❌ | ✔️ | ✔️ |
CONCAT(any, [any]*) | Concatenates multiple data of any type into a string | concat('Kyl', 'in') = Kylin | ✔️ | ✔️ | ✔️ | ✔️ |
REPEAT(str,n) | Repeat str n times and return string. When querying the model, str supports passing in constants, columns and expressions, and n only supports passing in constants | repeat('kylin',2) ='kylinkylin' | ✔️ | ✔️ | ✔️ | ✔️ |
LEFT(str,n) | Return the n characters from the left of str . When querying the model, str supports passing in constants, columns, and n only supports passing in constants | left('kylin',2) ='ky' | ✔️ | ✔️ | ✔️ | ✔️ |