How to use index?

1. Create index

The first step in using an index is to create one. You can create an index with SQL (Data Definition Language) or with the Tajo API (Tajo Client API). For example, the following SQL statement creates a BST index on the l_orderkey column of the lineitem table.

default> create index l_orderkey_idx on lineitem (l_orderkey);

If the index is created successfully, you can see the index information as follows:

default> \d lineitem

table name: default.lineitem
table path: hdfs://localhost:7020/tpch/lineitem
store type: TEXT
number of rows: unknown
volume: 753.9 MB
Options:
      'text.delimiter'='|'

schema:
l_orderkey    INT8
l_partkey     INT8
l_suppkey     INT8
l_linenumber  INT8
l_quantity    FLOAT4
l_extendedprice       FLOAT4
l_discount    FLOAT4
l_tax FLOAT4
l_returnflag  TEXT
l_linestatus  TEXT
l_shipdate    DATE
l_commitdate  DATE
l_receiptdate DATE
l_shipinstruct        TEXT
l_shipmode    TEXT
l_comment     TEXT


Indexes:
"l_orderkey_idx" TWO_LEVEL_BIN_TREE (l_orderkey ASC NULLS LAST )

For more information about index creation, please refer to the Data Definition Language and Tajo Client API documents linked above.

2. Enable/disable index scans

Reading data through an index is disabled by default. To make use of the index you created, you need one more step: enable ‘index scan’ as follows:

default> \set INDEX_ENABLED true

If you no longer want to use index scan, you can disable it as follows:

default> \set INDEX_ENABLED false

Note

Once index scan is enabled, Tajo performs an ‘index scan’ whenever possible. In some cases this may degrade performance, so for the best performance you should enable or disable ‘index scan’ according to the selectivity of your queries. Usually, the performance gain from an index increases as selectivity decreases, that is, as the query condition matches a smaller fraction of the rows.
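
As an illustration of the note above, the following session sketches one way to toggle index scan per query. It assumes the l_orderkey_idx index from step 1 exists. The predicates (l_orderkey = 1 and l_orderkey > 0) are hypothetical and chosen only to contrast a narrow filter with a broad one; actual selectivity depends on your data. The lines starting with -- are annotations for the reader and can be dropped if your shell does not accept SQL comments.

```sql
-- Narrow filter: an equality condition on the indexed column is expected to
-- match only a small fraction of lineitem, so index scan is enabled first.
default> \set INDEX_ENABLED true
default> select l_orderkey, l_linenumber, l_quantity from lineitem where l_orderkey = 1;

-- Broad filter: this condition is expected to match most rows, where an
-- index scan may be slower than a full scan, so index scan is disabled first.
default> \set INDEX_ENABLED false
default> select count(*) from lineitem where l_orderkey > 0;
```

Comparing the running time of the same query with INDEX_ENABLED set to true and to false is a simple way to check which setting suits a given workload.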

3. Index backup and restore

For indexes, Tajo currently provides only catalog backup and restore. Please refer to Backup and Restore Catalog for more information.