提示
Hive SQL 教程 欢迎使用。提供建议、纠错、催更等加作者微信: gr99123(备注:sql )和关注公众号「盖若」ID: gairuo。跟作者学习,请进入 Python学习课程。欢迎关注作者出版的书籍:《深入浅出Pandas》 和 《Python之光》。
Hive支持以下内置的 String 类型数据处理的函数:
返回类型 |
用法 |
描述 |
||||||||
---|---|---|---|---|---|---|---|---|---|---|
int |
ascii(string str) |
返回 str 的第一个字符的 ascii 编码数值。 |
||||||||
string |
base64(binary bin) |
将参数从二进制转换为 64 进制字符串。(始于 Hive 0.12.0). |
||||||||
int | character_length(string str) | 返回 str 中包含的 UTF-8 字符数 (始于 Hive 2.2.0). 可简写为 char_length。 | ||||||||
string | chr(bigint|double A) | 返回 ASCII 编码为 A 的字符。(始于 1.3.0 和 2.1.0)。如果 A 大于 256 则相当于 chr(A % 256),例如 select chr(88) 返回 "X" | ||||||||
string |
concat(string|binary A, string|binary B...) |
返回按顺序连接作为参数传入的字符串或字节所产生的字符串或字节。例如,concat('foo', 'bar') 结果为 'foobar'。此函数可以接受任意数量的输入字符串。 |
||||||||
array<struct<string,double>> |
context_ngrams(array<array<string>>, array<string>, int K, int pf) |
给定一个上下文字符串,从一组标记化句子中返回 top-k 上下文 N-gram。有关更多信息,请参阅 StatisticsAndDataMining。 |
||||||||
string |
concat_ws(string SEP, string A, string B...) |
与上面的 concat() 类似,但带有自定义分隔符 SEP。 |
||||||||
string |
concat_ws(string SEP, array<string>) |
与上面的 concat_ws() 类似,但使用字符串数组。(始于 Hive 0.9.0) |
||||||||
string |
decode(binary bin, string charset) |
使用提供的字符集将第一个参数解码为字符串 (以下编码之一 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'),如果任一参数为 null,则结果也将为 null. (始于 0.12.0.) |
||||||||
string | elt(N int,str1 string,str2 string,str3 string,...) |
返回索引号处的字符串。例如,elt(2,'hello','world') 返回 'world'。如果 N 小于 1 或大于参数数,则返回 NULL。(参见) |
||||||||
binary |
encode(string src, string charset) |
使用提供的字符集将第一个参数编码为二进制 (以下编码之一 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'),如果任一参数为 null,则结果也将为 null. (始于 0.12.0.) |
||||||||
int | field(val T,val1 T,val2 T,val3 T,...) |
返回 val 在 val1,val2,val3,... 列的索引,不存在为 0,比如 field('world','say','hello','world') 返回 3。支持所有基元类型,使用 str.equals(x) 比较参数。如果 val 为 NULL,则返回值为 0。(参见) |
||||||||
int |
find_in_set(string str, string strList) |
返回 strList 中第一个出现的 str,其中 strList 是逗号分隔的字符串。如果任一参数为 null,则返回 null。如果第一个参数包含任何逗号,则返回 0。 如 find_in_set('ab', 'abc,b,ab,c,def') 返回 3。 |
||||||||
string |
format_number(number x, int d) |
将数字 X 格式化为形如 '#,###,###.##' ,舍入到 D 位小数,并以字符串形式返回结果。如果 D 为 0,则结果没有小数点或小数部分。(始于 0.10.0; Hive 0.14.0 修正了浮点类型的错误, Hive 0.14.0 添加了 decimal 类型支持) |
||||||||
string |
get_json_object(string json_string, string path) |
根据指定的 json 路径从 json 字符串中提取 json 对象,并返回提取的 json 对象的 json 字符串。 如果输入的 json 字符串无效,它将返回 null。 |
||||||||
boolean |
in_file(string str, string filename) |
如果字符串 str 在文件名中显示为整行,则返回 true。 |
||||||||
int |
instr(string str, string substr) |
返回字符 |
||||||||
int |
length(string A) |
返回字符串的长度。 |
||||||||
int |
locate(string substr, string str[, int pos]) |
返回字符在 |
||||||||
string |
lower(string A) lcase(string A) |
返回将 A 的所有字符转换为小写所产生的字符串。如 lower('fOoBaR') 为 'foobar'. |
||||||||
string |
lpad(string str, int len, string pad) |
返回 str,左填充 pad,长度为 len。如果 str 长于len,则返回值将缩短为 len 个字符。如果填充字符串为空,则返回值为空。 |
||||||||
string |
ltrim(string A) |
返回从 A 开头(左侧)修剪空格所产生的字符串。如 ltrim(' foobar ') 为 'foobar '. |
||||||||
array<struct<string,double>> |
ngrams(array<array<string>>, int N, int K, int pf) |
返回一组标记化句子中的 top-k N-gram,例如那些由 sentences() UDAF 返回的句子。见 StatisticsAndDataMining |
||||||||
int | octet_length(string str) | 返回以 UTF-8 编码保存字符串 str 所需的八位字节数 (始于 Hive 2.2.0). 请注意,octet_length(str) 可以大于 character_length(str)。 | ||||||||
string |
parse_url(string urlString, string partToExtract [, string keyToExtract]) |
从 URL 返回指定的部分。partToExtract 的有效值包括 HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. 如 parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') 返回 'facebook.com'. 此外,还可以通过提供键作为第三个参数来提取查询中特定键的值,如 parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') 返回 'v1'. |
||||||||
string |
printf(String format, Obj... args) |
返回根据 printf 样式格式字符串格式化的输入 (始于 Hive 0.9.0). |
||||||||
string | quote(String text) |
返回带引号的字符串 (包括任何单引号的转义字符,HIVE-4.0.0)
|
||||||||
string |
regexp_extract(string subject, string pattern, int index) |
返回使用模式提取的字符串。例如, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) 返回 'bar.',请注意,在使用预定义的字符类时需要谨慎:使用'\s'作为第二个参数将匹配字母s;'\\s'是匹配空格等所必需的。index参数是Java regex Matcher group() 方法索引。 参见 docs/api/java/util/regex/Matcher.html 有关“index” 或 Java regex group() 方法的更多信息。 |
||||||||
string |
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) |
Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. |
||||||||
string |
repeat(string str, int n) |
Repeats str n times. |
||||||||
string | replace(string A, string OLD, string NEW) | Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab". | ||||||||
string |
reverse(string A) |
Returns the reversed string. |
||||||||
string |
rpad(string str, int len, string pad) |
Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null. |
||||||||
string |
rtrim(string A) |
Returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar'. |
||||||||
array<array<string>> |
sentences(string str, string lang, string locale) |
Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The 'lang' and 'locale' are optional arguments. For example, sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ). |
||||||||
string |
space(int n) |
Returns a string of n spaces. |
||||||||
array |
split(string str, string pat) |
Splits str around pat (pat is a regular expression). |
||||||||
map<string,string> |
str_to_map(text[, delimiter1, delimiter2]) |
Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2. |
||||||||
string |
substr(string|binary A, int start) substring(string|binary A, int start) |
Returns the substring or slice of the byte array of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar' (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]). |
||||||||
string |
substr(string|binary A, int start, int len) substring(string|binary A, int start, int len) |
Returns the substring or slice of the byte array of A starting from start position with length len. For example, substr('foobar', 4, 1) results in 'b' (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]). |
||||||||
string | substring_index(string A, string delim, int count) | Returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'. | ||||||||
string |
translate(string|char|varchar input, string|char|varchar from, string|char|varchar to) |
Translates the input string by replacing the characters present in the Char/varchar support added as of Hive 0.14.0. |
||||||||
string |
trim(string A) |
Returns the string resulting from trimming spaces from both ends of A. For example, trim(' foobar ') results in 'foobar' |
||||||||
binary |
unbase64(string str) |
Converts the argument from a base 64 string to BINARY. (As of Hive 0.12.0.) |
||||||||
string |
upper(string A) ucase(string A) |
Returns the string resulting from converting all characters of A to upper case. For example, upper('fOoBaR') results in 'FOOBAR'. |
||||||||
string | initcap(string A) | Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by whitespace. (As of Hive 1.1.0.) | ||||||||
int | levenshtein(string A, string B) | Returns the Levenshtein distance between two strings (as of Hive 1.2.0). For example, levenshtein('kitten', 'sitting') results in 3. | ||||||||
string | soundex(string A) | Returns soundex code of the string (as of Hive 1.2.0). For example, soundex('Miller') results in M460. |
Hive支持以下内置数据屏蔽功能:
Return Type |
Name(Signature) |
Description |
---|---|---|
string |
mask(string str[, string upper[, string lower[, string number]]]) |
Returns a masked version of str (as of Hive 2.1.0). By default, upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example mask("abcd-EFGH-8765-4321") results in xxxx-XXXX-nnnn-nnnn. You can override the characters used in the mask by supplying additional arguments: the second argument controls the mask character for upper case letters, the third argument for lower case letters and the fourth argument for numbers. For example, mask("abcd-EFGH-8765-4321", "U", "l", "#") results in llll-UUUU-####-####. |
string | mask_first_n(string str[, int n]) | Returns a masked version of str with the first n values masked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_first_n("1234-5678-8765-4321", 4) results in nnnn-5678-8765-4321. |
string | mask_last_n(string str[, int n]) | Returns a masked version of str with the last n values masked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_last_n("1234-5678-8765-4321", 4) results in 1234-5678-8765-nnnn. |
string | mask_show_first_n(string str[, int n]) | Returns a masked version of str, showing the first n characters unmasked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_show_first_n("1234-5678-8765-4321", 4) results in 1234-nnnn-nnnn-nnnn. |
string | mask_show_last_n(string str[, int n]) | Returns a masked version of str, showing the last n characters unmasked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_show_last_n("1234-5678-8765-4321", 4) results in nnnn-nnnn-nnnn-4321. |
string | mask_hash(string|char|varchar str) | Returns a hashed value based on str (as of Hive 2.1.0). The hash is consistent and can be used to join masked values together across tables. This function returns null for non-string types. |
更新时间:2024-11-23 15:19:50 标签:hql 字符 sql