Session Variables¶

Once a tajo client connects to the Tajo master, it assigns a unique session. This session is kept until the client is disconnected or it is expired.

For the sake of more convenient user configuration, Tajo provides session variables. With session variables, different configurations are allowed for each session.

tsql provides the meta command \set to manipulate session variables. Just \set command shows all session variables.

default> \set
'name1'='val1'
'name2'='val2'
'name3'='val3'
     ...

\set key val will set the session variable named key with the value val.

default> \set
'CURRENT_DATABASE'='default'

default> \set key1 val1

default> \set
'CURRENT_DATABASE'='default'
'key1'='val1'

Also, \unset key will unset the session variable named key.

Currently, tajo provides the following session variables.

BROADCAST_NON_CROSS_JOIN_THRESHOLD

A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won’t exceed this threshold.

Configuration name: tajo.dist-query.broadcast.non-cross-join.threshold-kb

Property value: Integer

Unit: KB

Default value: 5120

Example

\set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120

BROADCAST_CROSS_JOIN_THRESHOLD

A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won’t exceed this threshold.

Configuration name: tajo.dist-query.broadcast.cross-join.threshold-kb

Property value: Integer

Unit: KB

Default value: 1024

Example

\set BROADCAST_CROSS_JOIN_THRESHOLD 1024

Warning

In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.

JOIN_TASK_INPUT_SIZE

The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of the parallel processing of the join query.

Configuration name: tajo.dist-query.join.task-volume-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set JOIN_TASK_INPUT_SIZE 64

JOIN_PER_SHUFFLE_SIZE

The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.

Configuration name: tajo.dist-query.join.partition-volume-mb

Property value: Integer

Unit: MB

Default value: 128

Example

\set JOIN_PER_SHUFFLE_SIZE 128

HASH_JOIN_SIZE_LIMIT

This value provides the criterion to decide the algorithm to perform a join in a task. If the input data is smaller than this value, join is performed with the in-memory hash join. Otherwise, the sort-merge join is used.

Configuration name: tajo.executor.join.common.in-memory-hash-threshold-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set HASH_JOIN_SIZE_LIMIT 64

Warning

This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully.

INNER_HASH_JOIN_SIZE_LIMIT

This value provides the criterion to decide the algorithm to perform an inner join in a task. If the input data is smaller than this value, the inner join is performed with the in-memory hash join. Otherwise, the sort-merge join is used.

Configuration name: tajo.executor.join.inner.in-memory-hash-threshold-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set INNER_HASH_JOIN_SIZE_LIMIT 64

Warning

This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully.

OUTER_HASH_JOIN_SIZE_LIMIT

This value provides the criterion to decide the algorithm to perform an outer join in a task. If the input data is smaller than this value, the outer join is performed with the in-memory hash join. Otherwise, the sort-merge join is used.

Configuration name: tajo.executor.join.outer.in-memory-hash-threshold-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set OUTER_HASH_JOIN_SIZE_LIMIT 64

Warning

This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully.

JOIN_HASH_TABLE_SIZE

The initial size of hash table for in-memory hash join.

Configuration name: tajo.executor.join.hash-table.size

Property value: Integer

Default value: 100000

Example

\set JOIN_HASH_TABLE_SIZE 100000

SORT_TASK_INPUT_SIZE

The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of the parallel processing of the sort query.

Configuration name: tajo.dist-query.sort.task-volume-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set SORT_TASK_INPUT_SIZE 64

EXTSORT_BUFFER_SIZE

A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.

Configuration name: tajo.executor.external-sort.buffer-mb

Property value: Integer

Unit: MB

Default value: 200

Example

\set EXTSORT_BUFFER_SIZE 200

SORT_LIST_SIZE

The initial size of list for in-memory sort.

Configuration name: tajo.executor.sort.list.size

Property value: Integer

Default value: 100000

Example

\set SORT_LIST_SIZE 100000

GROUPBY_MULTI_LEVEL_ENABLED

A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. Otherwise, 2-phase aggregation algorithm is used.

Configuration name: tajo.dist-query.groupby.multi-level-aggr

Property value: Boolean

Default value: true

Example

\set GROUPBY_MULTI_LEVEL_ENABLED true

GROUPBY_PER_SHUFFLE_SIZE

The aggregation is executed in two stages. When an aggregation query is executed, this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.

Configuration name: tajo.dist-query.groupby.partition-volume-mb

Property value: Integer

Unit: MB

Default value: 256

Example

\set GROUPBY_PER_SHUFFLE_SIZE 256

GROUPBY_TASK_INPUT_SIZE

The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of the parallel processing of the aggregation query.

Configuration name: tajo.dist-query.groupby.task-volume-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set GROUPBY_TASK_INPUT_SIZE 64

HASH_GROUPBY_SIZE_LIMIT

This value provides the criterion to decide the algorithm to perform an aggregation in a task. If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. Otherwise, the sort-based aggregation is used.

Configuration name: tajo.executor.groupby.in-memory-hash-threshold-mb

Property value: Integer

Unit: MB

Default value: 64

Example

\set HASH_GROUPBY_SIZE_LIMIT 64

Warning

This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully.

AGG_HASH_TABLE_SIZE

The initial size of hash table for in-memory aggregation.

Configuration name: tajo.executor.aggregate.hash-table.size

Property value: Integer

Default value: 10000

Example

\set AGG_HASH_TABLE_SIZE 10000

TIMEZONE

Refer to Time Zone.

Configuration name: tajo.timezone

Property value: Time zone id

Default value: Default time zone of JVM

Example

\set TIMEZONE GMT+9

DATE_ORDER

Date order specification.

Configuration name: tajo.datetime.date-order

Property value: One of YMD, DMY, MDY.

Default value: YMD

Example

\set DATE_ORDER YMD

PARTITION_NO_RESULT_OVERWRITE_ENABLED

If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.

Configuration name: tajo.partition.overwrite.even-if-no-result

Property value: Boolean

Default value: false

Example

\set PARTITION_NO_RESULT_OVERWRITE_ENABLED false

TABLE_PARTITION_PER_SHUFFLE_SIZE

In Tajo, storing a partition table is executed in two stages. This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.

Configuration name: tajo.dist-query.table-partition.task-volume-mb

Property value: Integer

Unit: MB

Default value: 256

Example

\set TABLE_PARTITION_PER_SHUFFLE_SIZE 256

ARITHABORT

A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.

Configuration name: tajo.behavior.arithmetic-abort

Property value: Boolean

Default value: false

Example

\set ARITHABORT false

MAX_OUTPUT_FILE_SIZE

Maximum per-output file size. 0 means infinite.

Property value: Integer

Unit: MB

Default value: 0

Example

\set MAX_OUTPUT_FILE_SIZE 0

SESSION_EXPIRY_TIME

Session expiry time.

Property value: Integer

Unit: seconds

Default value: 3600

Example

\set SESSION_EXPIRY_TIME 3600

CLI_COLUMNS

Sets the width for the wrapped format.

Property value: Integer

Default value: 120

Example

\set CLI_COLUMNS 120

CLI_NULL_CHAR

Sets the string to be printed in place of a null value.

Property value: String

Default value: ‘’

Example

\set CLI_NULL_CHAR ''

CLI_PAGE_ROWS

Sets the number of rows for paging.

Property value: Integer

Default value: 100

Example

\set CLI_PAGE_ROWS 100

CLI_PAGING_ENABLED

Enable paging of result display.

Property value: Boolean

Default value: true

Example

\set CLI_PAGING_ENABLED true

CLI_DISPLAY_ERROR_TRACE

Enable display of error trace.

Property value: Boolean

Default value: true

Example

\set CLI_DISPLAY_ERROR_TRACE true

CLI_FORMATTER_CLASS

Sets the output format class to display results.

Property value: Class name

Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter

Example

\set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter

ON_ERROR_STOP

tsql will exit if an error occurs.

Property value: Boolean

Default value: false

Example

\set ON_ERROR_STOP false

NULL_CHAR

Null char of text file output. This value is used when the table property text.null is not specified.

Property value: String

Default value: ‘\N’

Example

\set NULL_CHAR '\\N'

DEBUG_ENABLED

A flag to enable debug mode.

Property value: Boolean

Default value: false

Example

\set DEBUG_ENABLED false

FETCH_ROWNUM

The number of rows to be fetched from Master each time.

Property value: Integer

Default value: 200

Example

\set FETCH_ROWNUM 200