Text types#
HeavyDB supports two text encoding options: TEXT ENCODING NONE and TEXT ENCODING DICT.
TEXT ENCODING NONE stores each string as-is, without compression, while
TEXT ENCODING DICT uses dictionary-based encoding to reduce storage
requirements: each distinct string is stored once in a dictionary, and rows hold
an integer ID that refers to it.
The choice depends on data characteristics and the trade-off between storage
space and encoding/decoding overhead.
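To make the trade-off concrete, here is a minimal pure-Python sketch of dictionary encoding. It is an illustration of the general idea only, not HeavyDB's implementation, and the helper names dict_encode and dict_decode are hypothetical.

```python
def dict_encode(values):
    """Map each distinct string to a small integer ID (illustrative only)."""
    dictionary = {}  # string -> ID
    ids = []         # one ID per row
    for v in values:
        # Assign the next free ID the first time a string is seen,
        # otherwise reuse the ID it already has.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, ids


def dict_decode(dictionary, ids):
    """Recover the original strings from their IDs."""
    reverse = {i: s for s, i in dictionary.items()}
    return [reverse[i] for i in ids]


column = ["red", "green", "red", "red", "blue", "green"]
dictionary, ids = dict_encode(column)
print(dictionary)  # {'red': 0, 'green': 1, 'blue': 2}
print(ids)         # [0, 1, 0, 0, 2, 1]
```

With many repeated values, the column holds one small integer per row and each distinct string only once. With mostly unique values, the dictionary saves little and every read still pays for a lookup.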
Defining a UDF with text types#
Encoding dict#
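# Requires HeavyDB server v6.3 or newer
# `heavydb` is assumed to be a connected rbc.heavydb.RemoteHeavyDB instance
@heavydb('TextEncodingDict(RowFunctionManager, TextEncodingDict)')
def text_copy(mgr, t):
    # Identify the dictionary that backs argument 0 of 'text_copy'
    db_id: int = mgr.get_dict_db_id('text_copy', 0)
    dict_id: int = mgr.get_dict_id('text_copy', 0)
    # Decode the dictionary ID into a Python string
    s: str = mgr.get_string(db_id, dict_id, t)
    # Register the string in the transient dictionary and return its ID
    return mgr.get_or_add_transient(
        mgr.TRANSIENT_DICT_DB_ID,
        mgr.TRANSIENT_DICT_ID,
        s)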
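The flow in text_copy (decode an ID to a string, then register the result and return its new ID) can be modelled in plain Python. The ToyManager class below is a hypothetical stand-in written for this illustration. It is not the RBC RowFunctionManager, and the dictionary keys and ID values it uses are arbitrary.

```python
class ToyManager:
    """Hypothetical stand-in for a row function manager (illustration only)."""

    # Arbitrary placeholder keys for the transient dictionary.
    TRANSIENT_DICT_DB_ID = 0
    TRANSIENT_DICT_ID = 0

    def __init__(self, dictionaries):
        # (db_id, dict_id) -> list of strings; the list index is the string ID.
        self._dicts = dict(dictionaries)
        key = (self.TRANSIENT_DICT_DB_ID, self.TRANSIENT_DICT_ID)
        self._dicts.setdefault(key, [])

    def get_string(self, db_id, dict_id, string_id):
        return self._dicts[(db_id, dict_id)][string_id]

    def get_or_add_transient(self, db_id, dict_id, s):
        strings = self._dicts[(db_id, dict_id)]
        if s not in strings:
            strings.append(s)
        return strings.index(s)


def text_copy(mgr, db_id, dict_id, t):
    # Same two steps as the UDF: decode, then re-encode in the transient dictionary.
    s = mgr.get_string(db_id, dict_id, t)
    return mgr.get_or_add_transient(
        mgr.TRANSIENT_DICT_DB_ID, mgr.TRANSIENT_DICT_ID, s)


mgr = ToyManager({(1, 7): ["foo", "bar"]})
out_id = text_copy(mgr, 1, 7, 1)
print(out_id, mgr.get_string(0, 0, out_id))  # 0 bar
```

In this toy model the input ID (1) and the output ID (0) differ even though both decode to "bar", because an ID is only meaningful relative to the dictionary it belongs to.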
Encoding none#
from rbc.heavydb import TextEncodingNone

@heavydb('TextEncodingNone(TextEncodingNone)')
def text_duplicate(t):
    s: str = t.to_string()
    return TextEncodingNone(s + s)
Converting a Text Encoding None to a string#
TextEncodingNone objects provide a to_string() method that converts the object
into a Python Unicode string (str).
from rbc.heavydb import TextEncodingNone

@heavydb('TextEncodingNone(TextEncodingNone)')
def text_capitalize(t):
    s: str = t.to_string()
    return TextEncodingNone(s.capitalize())
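UDFs are compiled with Numba, so only the string methods that Numba supports are available. See the Numba documentation on Read the Docs for the list of supported string methods.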
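When developing UDFs such as text_duplicate and text_capitalize, it can help to check the expected results in plain Python first. The sketch below assumes the compiled string methods behave like their CPython counterparts for these inputs. The expected_* helper names are hypothetical and exist only for this comparison.

```python
def expected_duplicate(s: str) -> str:
    # Mirrors the body of text_duplicate: concatenate the string with itself.
    return s + s


def expected_capitalize(s: str) -> str:
    # Mirrors the body of text_capitalize: first character upper-cased,
    # the rest lower-cased.
    return s.capitalize()


print(expected_duplicate("ab"))            # abab
print(expected_capitalize("hello World"))  # Hello world
```

Comparing these values with what the UDF returns from the server is a quick way to spot a string method that is unsupported or behaves differently once compiled.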
Converting a Text Encoding Dict to a string#
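See the text_copy example at the top of this page, which uses mgr.get_string(db_id, dict_id, t) to decode a dictionary ID into a Python string.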