Text types

HeavyDB supports two text encoding options: TEXT ENCODING NONE and TEXT ENCODING DICT.

TEXT ENCODING NONE stores each string as-is, without compression. TEXT ENCODING DICT stores each distinct string once in a dictionary and keeps only an integer ID per row, which reduces storage when values repeat. The choice depends on the data (for example, how often values repeat) and on the trade-off between storage space and encoding/decoding overhead.
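To make the idea of dictionary encoding concrete, here is a minimal plain-Python sketch. It is only an illustration of the general technique, not HeavyDB's actual implementation, and the helper name `toy_dict_encode` is hypothetical.

```python
# Toy illustration of dictionary encoding; not HeavyDB's real storage format.
from typing import Dict, List, Tuple


def toy_dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, ids): each distinct string is stored once,
    and every row keeps only the integer ID of its string."""
    dictionary: List[str] = []
    ids_by_string: Dict[str, int] = {}
    encoded: List[int] = []
    for value in values:
        if value not in ids_by_string:
            # First occurrence: assign the next free ID.
            ids_by_string[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(ids_by_string[value])
    return dictionary, encoded


column = ["red", "green", "red", "red", "green"]
dictionary, encoded = toy_dict_encode(column)

# Decoding is a plain index lookup into the dictionary.
decoded = [dictionary[i] for i in encoded]
```

With many repeated values the dictionary stays small while the per-row IDs are fixed-size integers, which is where the storage saving comes from.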

Defining a UDF with Text types

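Encoding dict

From test_udf_text of rbc/tests/heavydb/test_howtos.py:
# Requires HeavyDB server v6.3 or newer
# `heavydb` is assumed to be an rbc RemoteHeavyDB connection created elsewhere
@heavydb('TextEncodingDict(RowFunctionManager, TextEncodingDict)')
def text_copy(mgr, t):
    # Locate the dictionary that backs argument 0 of 'text_copy'
    db_id: int = mgr.get_dict_db_id('text_copy', 0)
    dict_id: int = mgr.get_dict_id('text_copy', 0)
    # Resolve the dictionary-encoded value to its string
    s: str = mgr.get_string(db_id, dict_id, t)
    # Store the string in the transient dictionary and return its ID
    return mgr.get_or_add_transient(
        mgr.TRANSIENT_DICT_DB_ID,
        mgr.TRANSIENT_DICT_ID,
        s)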
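The lookup-then-insert flow of text_copy can be modelled in plain Python. This is a rough sketch under the assumption that a string dictionary behaves like an ID-to-string table; `ToyStringDictionary` and `toy_text_copy` are hypothetical names and do not reproduce the RowFunctionManager API.

```python
# Plain-Python model of the flow in text_copy; all names are hypothetical.
from typing import Dict, List


class ToyStringDictionary:
    """Minimal ID <-> string table standing in for a string dictionary."""

    def __init__(self, strings: List[str]) -> None:
        self._strings: List[str] = list(strings)
        self._ids: Dict[str, int] = {s: i for i, s in enumerate(self._strings)}

    def get_string(self, string_id: int) -> str:
        return self._strings[string_id]

    def get_or_add(self, value: str) -> int:
        # Return the existing ID, or append the string and return its new ID.
        if value not in self._ids:
            self._ids[value] = len(self._strings)
            self._strings.append(value)
        return self._ids[value]


def toy_text_copy(source: ToyStringDictionary,
                  transient: ToyStringDictionary,
                  string_id: int) -> int:
    s = source.get_string(string_id)   # ID -> string in the source dictionary
    return transient.get_or_add(s)     # string -> ID in the transient dictionary


column_dict = ToyStringDictionary(["apple", "banana"])
transient_dict = ToyStringDictionary([])
new_id = toy_text_copy(column_dict, transient_dict, 1)
```

The sketch shows why the UDF cannot return its input ID unchanged: an ID is only meaningful relative to the dictionary that issued it, so the same string may carry a different ID in the output dictionary.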

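Encoding none

From test_udf_text of rbc/tests/heavydb/test_howtos.py:
from rbc.heavydb import TextEncodingNone

# `heavydb` is assumed to be an rbc RemoteHeavyDB connection created elsewhere
@heavydb('TextEncodingNone(TextEncodingNone)')
def text_duplicate(t):
    # Convert to a Python str, then wrap the result back into TextEncodingNone
    s: str = t.to_string()
    return TextEncodingNone(s + s)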

Converting a Text Encoding None to a string

TextEncodingNone objects provide a to_string() method that converts the object into a Python str.

From test_udf_text of rbc/tests/heavydb/test_howtos.py:
from rbc.heavydb import TextEncodingNone

# `heavydb` is assumed to be an rbc RemoteHeavyDB connection created elsewhere
@heavydb('TextEncodingNone(TextEncodingNone)')
def text_capitalize(t):
    s: str = t.to_string()
    # Any supported str method can be applied before wrapping the result
    return TextEncodingNone(s.capitalize())

See the Numba documentation for the list of supported str methods.
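Because to_string() yields an ordinary str, the string logic of a UDF can be checked in plain Python before it is registered. The sketch below mirrors the bodies of text_duplicate and text_capitalize; the helper names `duplicate_logic` and `capitalize_logic` are hypothetical, and the check does not confirm that a method is supported by Numba.

```python
# Plain-Python check of the string logic used in the UDFs above.
# Helper names are hypothetical and are not part of rbc.


def duplicate_logic(s: str) -> str:
    # Same operation as the body of text_duplicate.
    return s + s


def capitalize_logic(s: str) -> str:
    # Same operation as the body of text_capitalize.
    # str.capitalize() upper-cases the first character and
    # lower-cases the rest of the string.
    return s.capitalize()


duplicated = duplicate_logic("ab")
capitalized = capitalize_logic("heavyDB rocks")
```

Note that capitalize() also lower-cases everything after the first character, so "heavyDB rocks" becomes "Heavydb rocks".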

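Converting a Text Encoding Dict to a string

See the text_copy example under "Encoding dict" above: mgr.get_string(db_id, dict_id, t) returns the string for a dictionary-encoded value.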