Release Notes - Tajo - Version 0.11.0
Changes since Tajo 0.10.0
Sub-task
- [TAJO-921] - Add STDDEV_SAMP and STDDEV_POP window functions
- [TAJO-923] - Add VAR_SAMP and VAR_POP window functions
- [TAJO-1260] - Add ALTER TABLE ADD/DROP PARTITION statement to parser
- [TAJO-1284] - Add alter partition method to CatalogStore
- [TAJO-1329] - Improve Schema class to support nested structs
- [TAJO-1337] - Implement common modules to handle RESTful API
- [TAJO-1338] - Define RESTful API for Clients
- [TAJO-1345] - Implement logical plan part and DDL executor for alter partition.
- [TAJO-1346] - Create dynamic partitions to CatalogStore by running insert query or CTAS query.
- [TAJO-1351] - Resolve findbugs warnings on Tajo Common Module
- [TAJO-1353] - Nested record support in CREATE TABLE statement
- [TAJO-1357] - Resolve findbugs warnings on Tajo Catalog Modules
- [TAJO-1359] - Add nested field projector and language extension to project nested record
- [TAJO-1362] - Resolve findbugs warnings on Tajo Core Module
- [TAJO-1392] - Resolve findbugs warnings on Tajo Plan Module
- [TAJO-1393] - Resolve findbugs warnings on Tajo Cli Module
- [TAJO-1437] - Resolve findbugs warnings on Tajo JDBC Module
- [TAJO-1464] - Add ORCFileScanner to read ORCFile table
- [TAJO-1465] - Add ORCFileAppender to write into ORCFile table
- [TAJO-1484] - Apply on ColPartitionStoreExec
- [TAJO-1493] - Make partition pruning based on catalog information
- [TAJO-1496] - Remove legacy CSVFile
- [TAJO-1514] - Distinguish UNION and UNION ALL
- [TAJO-1525] - Implement INTERSECT [ALL] physical operator
- [TAJO-1529] - Implement json_extract_path_text(string, string) function
- [TAJO-1599] - Implement NodeResourceManager and Status updater
- [TAJO-1603] - Refactor StorageManager
- [TAJO-1613] - Rename StorageManager to Tablespace
- [TAJO-1614] - Configuration format proposal for generic storage support
- [TAJO-1615] - Implement TaskManager
- [TAJO-1616] - Implement TablespaceManager to load Tablespaces
- [TAJO-1617] - Storage configuration reader
- [TAJO-1641] - Add window function documentation
- [TAJO-1652] - Automatic metadata registration from underlying storages
- [TAJO-1658] - Filter push down to underlying storages
- [TAJO-1663] - Change the variable name storeType to dataFormat
- [TAJO-1670] - Refactor client errors and exceptions
- [TAJO-1673] - Implement recover partitions
- [TAJO-1684] - CREATE EXTERNAL TABLE should allow just a path.
- [TAJO-1691] - Refactor visitors and planner to throw TajoException
- [TAJO-1693] - Rearrange metric names
- [TAJO-1694] - Add cluster resource metrics
- [TAJO-1723] - INSERT INTO statement should allow nested fields as target columns
- [TAJO-1735] - Implement MetadataProvider and LinkedMetadataManager
- [TAJO-1737] - Implement SQL Parser rule for Map type
- [TAJO-1739] - Add a statement for adding partition to TajoDump
- [TAJO-1748] - Refine client APIs to throw specific exceptions
- [TAJO-1749] - Refine JDBC exceptions to better handle exceptional cases.
- [TAJO-1754] - Implement several functions to handle json array
- [TAJO-1758] - Some TajoRuntimeException should be restored as TajoException in client side
- [TAJO-1787] - Remove unused and legacy exceptions
- [TAJO-1824] - Remove partition_keys table from information_schema
- [TAJO-1826] - Revert 'refining code for Parquet 1.8.1'
- [TAJO-1841] - Eliminate explicit diamond expressions in tajo-{algebra,rpc}
- [TAJO-1853] - Add tablespace syntax to the CREATE TABLE section of DDL page
- [TAJO-1887] - Disable the alter table add partition statement temporarily.
Bug
- [TAJO-1146] - TajoWorkerResources are occasionally not released after the query completion relating to semaphore condition.
- [TAJO-1147] - Simple query doesn't work in Web UI
- [TAJO-1277] - GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins
- [TAJO-1283] - ORDER BY with the first descending order causes wrong results
- [TAJO-1316] - NPE occurs when performing window functions after join
- [TAJO-1324] - Remove warehouse directory rewriting in Unit Test
- [TAJO-1325] - Invalid history cleaner timeout
- [TAJO-1341] - The result type of modular operation must be integer.
- [TAJO-1356] - Race conditions in QueryInProgress
- [TAJO-1360] - VALUES_ field in OPTIONS table of catalog store should be longer.
- [TAJO-1365] - Suppress release audit warnings on Jenkins builds
- [TAJO-1368] - Exceptions during processing nested union queries
- [TAJO-1370] - TUtils.checkEquals() is not consistent with description in javadoc
- [TAJO-1384] - Duplicated output file path problem
- [TAJO-1386] - CURRENT_DATE generates parsing errors sometimes.
- [TAJO-1387] - Correct error message for EXISTS clause
- [TAJO-1396] - Unexpected IllegalMonitorStateException can be thrown in QueryInProgress
- [TAJO-1399] - TajoResourceAllocator might hang on network error
- [TAJO-1405] - Fix some illegal usages of connection pool
- [TAJO-1413] - tsql should not print a stack trace for non-critical exceptions
- [TAJO-1414] - Two RemoteException in rpc module
- [TAJO-1434] - Fix supporting version of Hadoop
- [TAJO-1440] - Some tests fail in parallel test environment in TestKillQuery
- [TAJO-1445] - Optimizer removes some filter in where clause.
- [TAJO-1446] - Comparison of boolean datum is not valid
- [TAJO-1449] - TestSelectQuery.testExplainSelect() fails
- [TAJO-1467] - Parenthesis at the start of SQL query is ignored
- [TAJO-1468] - Integration test does not terminate occasionally
- [TAJO-1469] - allocateQueryMaster can leak resources if it times-out (3sec, hardcoded)
- [TAJO-1479] - NPE during startup CatalogStore
- [TAJO-1481] - Numeric conversion of Inet4 type should be considered as unsigned
- [TAJO-1485] - Datum 'Char' returned only 1 byte.
- [TAJO-1497] - RPC client does not share a connection
- [TAJO-1510] - Change a function name from getFileCunks to getFileChunks
- [TAJO-1512] - Fix various Expr cloning bugs
- [TAJO-1522] - NPE making stage history before task scheduler is initialized
- [TAJO-1534] - DelimitedTextFile returns null instead of a NullDatum
- [TAJO-1536] - Fix minor issues in QueryMaster
- [TAJO-1538] - TajoWorkerResourceManager.allocatedResourceMap is increasing forever
- [TAJO-1540] - RpcCallback must be able to handle TimeoutException or cancel.
- [TAJO-1541] - Connection timeout in netty client is not working
- [TAJO-1556] - "insert into select" with reordered column list does not work.
- [TAJO-1558] - HBASE_LIB/hbase-server-*.jar should be included in the CLASSPATH
- [TAJO-1560] - HashShuffle report should be ignored when succeeded tasks are not included
- [TAJO-1564] - TestFetcher fails occasionally
- [TAJO-1569] - BlockingRpcClient can make other request fail
- [TAJO-1574] - Fix NPE on natural join
- [TAJO-1580] - Error line number is incorrect
- [TAJO-1581] - Does not update last state of query stage in non-hash shuffle
- [TAJO-1582] - Occasional resource leak in RawFile during test
- [TAJO-1586] - TajoMaster HA startup failure on Yarn.
- [TAJO-1593] - Add missing stop condition to Taskrunner
- [TAJO-1596] - TestPythonFunctions occasionally fails
- [TAJO-1597] - Problem of ignoring theta join condition
- [TAJO-1598] - TableMeta should change equals mechanism
- [TAJO-1600] - Invalid query planning for distinct group-by
- [TAJO-1601] - '\d information_schema' does not work
- [TAJO-1605] - Fix master build failure on jdk 1.6
- [TAJO-1606] - IF NOT EXISTS is not checked for CTAS queries
- [TAJO-1610] - Cannot find column when the same name is used for table and database
- [TAJO-1620] - random() in an SQL should generate RANDOM numbers
- [TAJO-1621] - Compilation error with hadoop 2.7.0
- [TAJO-1622] - UniformRangePartition occasionally causes IllegalStateException
- [TAJO-1630] - Test failure after TAJO-1130
- [TAJO-1634] - REST API: fix error when offset is zero
- [TAJO-1642] - CatalogServer needs to check meta table first.
- [TAJO-1644] - When inserting empty data into a partitioned table, existing data would be removed.
- [TAJO-1650] - TestQueryResource.testGetAllQueries() occasionally fails
- [TAJO-1674] - Validation of CTAS schema mismatch
- [TAJO-1676] - Queries on information_schema.partitions table return invalid result
- [TAJO-1679] - Client APIs should validate identifiers for database object names
- [TAJO-1681] - Fix TajoDump invalid null check for database name
- [TAJO-1689] - Metrics file reporter prints histogram metric without group name.
- [TAJO-1690] - Create table using text file fails in HiveCatalogStore.
- [TAJO-1697] - RCFile progress causes NPE occasionally
- [TAJO-1702] - Fix race condition in finished query cache
- [TAJO-1706] - querytasks.jsp in Worker causes NPE when a running EB page is open.
- [TAJO-1707] - Rack local count can be more than actual number of tasks.
- [TAJO-1712] - querytasks.jsp throws NPE occasionally when tasks are running.
- [TAJO-1716] - Repartitioner.makeEvenDistributedFetchImpl() does not distribute fetches evenly
- [TAJO-1725] - TaskContainer can hang when RuntimeException occurs in TaskImpl.
- [TAJO-1726] - Error in a running TaskAttempt may cause invalid event.
- [TAJO-1727] - Avoid creating external table using TableSpace
- [TAJO-1731] - With a task failure, query processing hangs after first retry
- [TAJO-1732] - During filter push down phase, join conditions are not set properly
- [TAJO-1733] - Finished query occasionally does not appear in Web-UI
- [TAJO-1741] - Two tables having same time zone display different timestamps
- [TAJO-1742] - Remove hadoop dependency in DatumFactory
- [TAJO-1752] - NameResolver cannot find nested records properly
- [TAJO-1763] - tpch/*.tbl files cannot be found in maven modules except for core-tests.
- [TAJO-1776] - Fix Invalid column type in JDBC
- [TAJO-1777] - JsonLineDeserializer returns invalid unicode text, if contains control character
- [TAJO-1779] - Remove "DFSInputStream has been closed already" messages in DelimitedLineReader
- [TAJO-1781] - Join condition is still not found when it exists in OR clause
- [TAJO-1782] - Check ON_ERROR_STOP flag in TSQL when an error occurs
- [TAJO-1783] - Query result is not returned by invalid output path
- [TAJO-1790] - TestTajoClientV2::testExecuteQueryAsyncWithListener occasionally fails
- [TAJO-1796] - Count all query for Parquet crashes
- [TAJO-1797] - Workers cannot bind web server address when it starts
- [TAJO-1798] - Dynamic partitioning occasionally fails.
- [TAJO-1799] - Fix incorrect event handler when kill-query failed
- [TAJO-1800] - WHERE clause is ignored with UNION
- [TAJO-1801] - Table name is not unique in tableDescMap of QueryMasterTask
- [TAJO-1802] - PythonScriptEngine copies controller and tajo util whenever it is initialized
- [TAJO-1808] - Wrong table type problem in catalog
- [TAJO-1811] - Catalog server address must be set dynamically during test
- [TAJO-1815] - Catalog store initialization with PostgreSQL failed
- [TAJO-1819] - Cannot find existing tables when pgsql catalog starts up
- [TAJO-1820] - Fix wrong case sensitivity rules of non-reserved keywords
- [TAJO-1821] - Temporary data is not cleared after TestCatalog
- [TAJO-1822] - Missing return state when CatalogServer.getAllIndexes() is called
- [TAJO-1823] - Can't start TajoMaster
- [TAJO-1827] - JSON parsing error at storage-site.json while tajo master starts up
- [TAJO-1829] - Fix DelimitedTextFileAppender NPE in negative tests
- [TAJO-1830] - Fix race condition in HdfsServiceTracker
- [TAJO-1835] - TajoClient::executeQueryAndGetResult should throw Query(Failed|Killed)Exception
- [TAJO-1838] - Create external table fails using hbase if tablespaces don't include the location.
- [TAJO-1839] - IllegalStateException when querying external table using hbase if storage-site.json doesn't exist on configuration directory.
- [TAJO-1846] - Python temp directory path should be selected differently based on user platform
- [TAJO-1848] - ShutdownHook in TajoMaster can throw NPE if serviceInit() is failed.
- [TAJO-1851] - Can not release a different rack task
- [TAJO-1861] - TSQL should print a newline after printing an error message when connecting to another database
- [TAJO-1863] - CTAS with union clause does not work properly
- [TAJO-1869] - Incorrect result when sorting table with small files
- [TAJO-1871] - '-DskipTests' flag does not work
- [TAJO-1873] - Fix NPE in QueryExecutorServlet
- [TAJO-1884] - Add missing jetty-util dependency
- [TAJO-1889] - UndefinedColumnException when a query with table subquery is executed on self-describing tables
- [TAJO-1894] - Filter condition is ignored when a query involves multiple subqueries and aggregations
Improvement
- [TAJO-680] - Improve the IN operator to support sub queries
- [TAJO-751] - JDBC driver should support cancel() method.
- [TAJO-993] - Cleanup the result data in HDFS after query finished.
- [TAJO-1134] - start-tajo.sh should display WEB UI URL and TajoMaster RPC address
- [TAJO-1160] - Remove Hadoop dependency from tajo-client module
- [TAJO-1269] - Separate cli from tajo-client
- [TAJO-1311] - Enable Scattered Hash Shuffle for CTAS statement.
- [TAJO-1326] - Remove legacy memory task histories
- [TAJO-1328] - Fix deprecated property names in the catalog configuration document
- [TAJO-1335] - Bump up 0.10.0-SNAPSHOT to 0.11.0-SNAPSHOT in master branch
- [TAJO-1340] - Change the default output file format.
- [TAJO-1343] - Improve the memory usage of physical executors
- [TAJO-1350] - Refactor FilterPushDownRule::visitJoin() into well-defined, small methods
- [TAJO-1352] - Improve the join order algorithm to consider missed cases of associative join operators
- [TAJO-1369] - Some stack trace information is missed in error/fail logging
- [TAJO-1374] - Support multi-bytes delimiter for CSV file
- [TAJO-1381] - Support multi-bytes delimiter for Text file
- [TAJO-1383] - Improve broadcast table cache
- [TAJO-1391] - RpcConnectionPool should check reference counter of connection before close
- [TAJO-1394] - Support reconnect on tsql
- [TAJO-1395] - Remove deprecated sql files for Oracle and PostgreSQL
- [TAJO-1397] - Resource allocation should be fine grained.
- [TAJO-1400] - Add TajoStatement::setMaxRows method support
- [TAJO-1403] - Improve 'Simple Query' with only partition columns and constant values
- [TAJO-1408] - Make IntermediateEntryProto more compact
- [TAJO-1409] - Clients calling remote services returning BoolProto ignore false values
- [TAJO-1418] - Comment on TAJO_PULLSERVER_STANDALONE in tajo-env.sh is not consistent
- [TAJO-1422] - Investigate the case where fragments == null in SeqScanExec
- [TAJO-1436] - Add Bind method to EvalNode
- [TAJO-1442] - Improve Hive Compatibility
- [TAJO-1454] - Comparing two dates or two timestamps does not need normalization
- [TAJO-1460] - Apply TAJO-1407 to ExternalSortExec
- [TAJO-1495] - Clean up CatalogStore
- [TAJO-1499] - Check the bind status when EvalNode::eval() is called
- [TAJO-1501] - Too many log messages from HashShuffleAppenderManager.
- [TAJO-1507] - Resource leak when a worker does not respond to KILLED message
- [TAJO-1508] - ResourceTracker does not update workers' resource capacities after the first join
- [TAJO-1509] - Use dedicated thread to release resource allocated to container
- [TAJO-1523] - ClassSize should consider compressed oops
- [TAJO-1530] - Display warn message when the query kill button is clicked in WEB UI.
- [TAJO-1542] - Refactoring of HashJoinExecs
- [TAJO-1548] - Refactoring condition code for CHAR into CatalogUtil
- [TAJO-1551] - Reuse allocated resources in next execution block if possible
- [TAJO-1553] - Improve broadcast join planning
- [TAJO-1563] - Improve RPC error handling
- [TAJO-1570] - CatalogUtil newSimpleDataTypeArray should use newSimpleDataType
- [TAJO-1576] - Sometimes DefaultTajoCliOutputFormatter.parseErrorMessage() eliminates an important kind of information.
- [TAJO-1577] - Add test cases to verify join plans
- [TAJO-1584] - Remove QueryMaster client sharing in TajoMaster and TajoWorker
- [TAJO-1591] - Change StoreType represented as Enum to String type
- [TAJO-1595] - Pluggable Storage Handler
- [TAJO-1607] - Tajo Rest Cache-Id should be bigger than zero
- [TAJO-1623] - INSERT INTO with wrong target columns causes NPE.
- [TAJO-1624] - Add managed table or external description in Table management section
- [TAJO-1625] - Recap error propagation system
- [TAJO-1626] - JdbcConnection::setAutoCommit() should not throw an exception.
- [TAJO-1633] - Cleanup TajoMasterClientService
- [TAJO-1636] - query rest api uri should change from /databases/{database_name}/queries to /queries
- [TAJO-1638] - Remove offset parameter from rest api result/{cacheId}
- [TAJO-1645] - Bump up hbase to 1.1.1
- [TAJO-1646] - Add extlib directory for third-party libraries
- [TAJO-1649] - Change Rest API /databases/{database-name}/functions to /functions
- [TAJO-1651] - Too long fetcher default retries
- [TAJO-1659] - Simplify scan iteration in SeqScan
- [TAJO-1660] - Update copyright year in NOTICE
- [TAJO-1672] - Removing rest api to create table POST /databases/{database-name}/tables interface
- [TAJO-1677] - Remove unnecessary messages for the Travis CI build
- [TAJO-1695] - Shuffle fetcher executor should consider random writing
- [TAJO-1696] - Resource calculator should consider the requested disk resource at the first stage
- [TAJO-1699] - Tajo Java Client version 2
- [TAJO-1700] - Add better exception handling in TajoMasterClientService
- [TAJO-1701] - Remove forward or non-forward query concept in TajoClient
- [TAJO-1703] - Remove hardcoded value in NodeStatusUpdater
- [TAJO-1715] - Precompute the hash value of various kinds of ids
- [TAJO-1717] - Parquet Update
- [TAJO-1721] - Separate routine for CREATE TABLE from DDLExecutor
- [TAJO-1729] - No handling of default case in DDLExecutor
- [TAJO-1736] - Remove unnecessary getMountPath()
- [TAJO-1738] - Improve off-heap RowBlock
- [TAJO-1743] - Improve calculation of intermediate table statistics
- [TAJO-1745] - Add positive and negative test methods
- [TAJO-1746] - Improve resource usage at first request of DefaultTaskScheduler
- [TAJO-1751] - Reduce the client connection timeout
- [TAJO-1757] - Add examples for TajoClient v2
- [TAJO-1761] - Separate an integration unit test kit into an independent module.
- [TAJO-1766] - Improve the performance of cross join
- [TAJO-1775] - HCatalogStore needs to be deprecated
- [TAJO-1780] - QueryTestCaseBase should be instantiated for each test class
- [TAJO-1792] - tajo-cluster-tests is not available when it is used as an external maven module.
- [TAJO-1810] - Remove QueryMasterTask cache immediately, if it stored to persistent storage
- [TAJO-1814] - Add some positive/negative tests for CREATE TABLE IF NOT EXISTS
- [TAJO-1816] - Refactor SQL parser tests
- [TAJO-1817] - Improve SQL parser error message
- [TAJO-1818] - Separate sql parser into an independent maven module
- [TAJO-1825] - Remove zero length fragments when file length is zero
- [TAJO-1828] - tajo-daemon scripts should kill process after process can not stop gracefully
- [TAJO-1831] - Add a shutdown hook manager in order to set priorities
- [TAJO-1847] - Support Socket/Connection Timeout in Rpc
- [TAJO-1860] - Refactor Rpc clients to take Connection Parameters
- [TAJO-1867] - TajoMaster should handle the change of ${tajo.root}.
- [TAJO-1868] - Allow TablespaceManager::get to return unregistered tablespace
- [TAJO-1885] - Simple query with projection should be supported
- [TAJO-1890] - Clean up debug and test modes and unhandled exceptions
New Feature
- [TAJO-29] - ORCFile integration
- [TAJO-1135] - Implement queryable virtual table for cluster information
- [TAJO-1344] - Python UDF support
- [TAJO-1421] - Add 'ALTER TABLE SET PROPERTY' statement
- [TAJO-1430] - Improve SQLAnalyzer by session-based parsing-result caching
- [TAJO-1486] - Text file should support skipping header rows when creating external table
- [TAJO-1494] - Add SeekableScanner support to DelimitedTextFileScanner
- [TAJO-1537] - Implement a virtual table for sessions
- [TAJO-1562] - Python UDAF support
- [TAJO-1661] - Implement CORR function
- [TAJO-1730] - JDBC Tablespace support
- [TAJO-1812] - Timezone support in JSON file format
- [TAJO-1832] - Better support for self-describing data formats
Task
- [TAJO-1076] - Add start second worker documentation
- [TAJO-1273] - Merge DirectRawFile to master branch
- [TAJO-1300] - Merge the index branch into the master branch
- [TAJO-1314] - Documentation for Swift support
- [TAJO-1380] - Update JDBC documentation for new JDBC driver
- [TAJO-1424] - Investigate the problem of too many "Try to connect" messages during Travis CI build
- [TAJO-1439] - Some method name is written wrongly
- [TAJO-1450] - Encapsulate Datum in Tuple
- [TAJO-1482] - Cleanup the legacy cluster mode
- [TAJO-1559] - Fix data model description (tinyint, smallint)
- [TAJO-1567] - Update old license in some pom.xml files
- [TAJO-1568] - Apply UnpooledByteBufAllocator when tajo.test.enabled is enabled
- [TAJO-1575] - HBASE_HOME guidance is duplicated in tajo-env.sh
- [TAJO-1583] - Remove ServerCallable in RPC client
- [TAJO-1587] - Upgrade java version to 1.7 for Travis CI
- [TAJO-1590] - Moving to JDK 7
- [TAJO-1628] - Add a document for join operation
- [TAJO-1682] - Write ORC document
- [TAJO-1687] - sphinx-maven-plugin version should be 1.0.3
- [TAJO-1713] - Change the type of edge cache in JoinGraphContext from HashMap to LRUMap
- [TAJO-1744] - Porting bash shell scripts to Windows command shell scripts.
- [TAJO-1750] - Upgrade hadoop dependency to 2.7.1
- [TAJO-1755] - Add documentation for missing built-in functions
- [TAJO-1803] - Use in-memory derby as the default catalog for unit tests
- [TAJO-1805] - In the 'Execute Query' page of web UI, default database should be set as 'default'
- [TAJO-1809] - Change default value of several configurations
- [TAJO-1813] - Allow external catalog store for unit testing
- [TAJO-1833] - Refine LogicalPlanPreprocessor to add new rules easily
- [TAJO-1845] - Enforcers in the master plan should be printed in a fixed order
- [TAJO-1872] - Increase the minimum split size and add a classpath to hadoop tools
Test
- [TAJO-1870] - Enable tests of tajo-storage-pgsql module when arch type is 64-bit