Text types

HeavyDB supports two text encoding options: TEXT ENCODING NONE and TEXT ENCODING DICT.

TEXT ENCODING NONE stores each string as-is, without compression. TEXT ENCODING DICT uses dictionary encoding: each distinct string is stored once in a dictionary, and the column holds a compact integer ID in its place, which reduces storage when values repeat. The choice depends on the data (mainly how often values repeat) and on the trade-off between storage space and encoding/decoding overhead.
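To make the storage trade-off concrete, here is a minimal pure-Python sketch of dictionary encoding. It is an illustration only, not HeavyDB's storage code: the helper names `dict_encode` and `dict_decode` are hypothetical, and the sketch assumes whole column values are mapped to integer IDs in order of first appearance.

```python
# Illustrative model only; NOT HeavyDB's implementation.
# dict_encode / dict_decode are hypothetical helper names.

def dict_encode(values):
    """Return (dictionary, codes): each distinct string gets one integer ID."""
    dictionary = []  # ID -> string
    index = {}       # string -> ID
    codes = []       # one integer per row
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


column = ["red", "green", "red", "blue", "green", "red"]
dictionary, codes = dict_encode(column)
print(dictionary)  # ['red', 'green', 'blue']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

With many repeated values, the column shrinks to one small integer per row plus one copy of each distinct string. With mostly unique values, the dictionary saves little and the lookups are pure overhead.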

Defining a UDF with Text types

Encoding dict

from test_udf_text of rbc/tests/heavydb/test_howtos.py

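# Requires HeavyDB server v6.3 or newer
@heavydb('TextEncodingDict(RowFunctionManager, TextEncodingDict)')
def text_copy(mgr, t):
    # Identify the dictionary that backs argument 0 of 'text_copy'
    db_id: int = mgr.get_dict_db_id('text_copy', 0)
    dict_id: int = mgr.get_dict_id('text_copy', 0)
    # Resolve the dictionary-encoded value to a string
    s: str = mgr.get_string(db_id, dict_id, t)
    # Return the string's ID in the transient dictionary, adding it if missing
    return mgr.get_or_add_transient(
        mgr.TRANSIENT_DICT_DB_ID,
        mgr.TRANSIENT_DICT_ID,
        s)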
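The data flow in text_copy can be modelled in plain Python without a server. The sketch below is a toy stand-in, not rbc's RowFunctionManager: the class `ToyManager`, the function `text_copy_model`, the `(db_id, dict_id)` keys, and the value 0 used for both transient constants are all assumptions made for illustration. Only the method names mirror the calls used in the example above.

```python
class ToyManager:
    """Hypothetical stand-in for the row function manager (not rbc's class)."""

    # Toy values chosen for this sketch; the real constants may differ.
    TRANSIENT_DICT_DB_ID = 0
    TRANSIENT_DICT_ID = 0

    def __init__(self, dictionaries):
        # Maps (db_id, dict_id) -> list of strings indexed by string ID.
        self._dicts = {key: list(strings) for key, strings in dictionaries.items()}
        transient_key = (self.TRANSIENT_DICT_DB_ID, self.TRANSIENT_DICT_ID)
        self._dicts.setdefault(transient_key, [])

    def get_string(self, db_id, dict_id, string_id):
        # Resolve an integer ID to the string it stands for.
        return self._dicts[(db_id, dict_id)][string_id]

    def get_or_add_transient(self, db_id, dict_id, s):
        # Return the ID of s, appending it to the dictionary if it is new.
        strings = self._dicts[(db_id, dict_id)]
        if s not in strings:
            strings.append(s)
        return strings.index(s)


def text_copy_model(mgr, db_id, dict_id, t):
    """Same two steps as text_copy: decode the ID, then re-encode the string."""
    s = mgr.get_string(db_id, dict_id, t)
    return mgr.get_or_add_transient(
        mgr.TRANSIENT_DICT_DB_ID, mgr.TRANSIENT_DICT_ID, s)


mgr = ToyManager({(1, 7): ["apple", "banana", "cherry"]})
out_id = text_copy_model(mgr, 1, 7, 2)
print(out_id, mgr.get_string(0, 0, out_id))  # 0 cherry
```

The point of the model is that a dictionary-encoded argument reaches the UDF as an integer ID, not as a string, so the UDF has to look the string up and return an ID for its result.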

Encoding none

from test_udf_text of rbc/tests/heavydb/test_howtos.py

from rbc.heavydb import TextEncodingNone

@heavydb('TextEncodingNone(TextEncodingNone)')
def text_duplicate(t):
    s: str = t.to_string()
    return TextEncodingNone(s + s)

Converting a TextEncodingNone to a string

TextEncodingNone objects provide a to_string() method that converts the object to a Python str.

from test_udf_text of rbc/tests/heavydb/test_howtos.py

from rbc.heavydb import TextEncodingNone

@heavydb('TextEncodingNone(TextEncodingNone)')
def text_capitalize(t):
    s: str = t.to_string()
    return TextEncodingNone(s.capitalize())

See the Numba documentation for the list of supported string methods.
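Because to_string() yields an ordinary Python string, the string logic of these UDFs can be checked locally before registering them. The sketch below uses plain CPython and assumes the compiled versions follow the same semantics. The helper names `duplicate` and `capitalize` are hypothetical and only mirror the bodies of text_duplicate and text_capitalize.

```python
# Plain-Python equivalents of the string logic used in the UDFs above.
# These helpers are hypothetical; they run locally without HeavyDB or rbc.

def duplicate(s: str) -> str:
    # Same expression as in text_duplicate: concatenate the string with itself.
    return s + s


def capitalize(s: str) -> str:
    # Same call as in text_capitalize.
    return s.capitalize()


print(duplicate("ab"))             # abab
print(capitalize("heavyDB text"))  # Heavydb text
```

Note that str.capitalize() uppercases the first character and lowercases the rest, so mixed-case input such as "heavyDB" does not keep its inner capitals.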

Converting a TextEncodingDict to a string

See the first example on this page (text_copy, under "Encoding dict"), which resolves the value with mgr.get_string().