Apache Tajo 0.8.0 Release Notes
Changes since Tajo 0.2.0-incubating
Sub-task
- [TAJO-60] - Implement Date Datum Type
- [TAJO-61] - Implement Time Datum Type
- [TAJO-62] - Implement Timestamp Datum type
- [TAJO-207] - Implement bit_length(string) function
- [TAJO-215] - Catalog should allow compatible types when finding functions
- [TAJO-218] - HiveQLAnalyzer has to support cast expression.
- [TAJO-284] - Add table partitioning entry to Catalog.
- [TAJO-285] - Add CREATE TABLE... BY PARTITION statement to parser
- [TAJO-289] - HCatalogStore supports SELECT statement
- [TAJO-297] - Rename JDBC variables in CatalogConstants to be more generic.
- [TAJO-300] - HCatalogStore supports DROP TABLE statement
- [TAJO-301] - HCatalogStore supports CREATE TABLE statement
- [TAJO-311] - Improve Hive dependency
- [TAJO-312] - Implement distributed execution part of outer join
- [TAJO-313] - Support deprecated variables in CatalogConstants.
- [TAJO-318] - Remove unnecessary Hive dependencies
- [TAJO-327] - Add testcase to verify TAJO-16
- [TAJO-329] - Implement physical operator to store in column-partitioned table.
- [TAJO-338] - Add Query Optimization Part for Column-Partitioned Tables
- [TAJO-346] - Implement hex function
- [TAJO-348] - Implement octet_length(text)
- [TAJO-349] - Implement md5(text)
- [TAJO-351] - Implement reverse(text)
- [TAJO-352] - Implement right/left(text, size) function
- [TAJO-355] - Implement repeat(text,int) function
- [TAJO-364] - Implement mod/div function
- [TAJO-365] - Implement degrees/radians function
- [TAJO-369] - Add CREATE EXTERNAL TABLE... BY PARTITION statement to parser
- [TAJO-392] - Implement cbrt function
- [TAJO-394] - Implement abs function
- [TAJO-395] - Implement exp function
- [TAJO-396] - Implement sqrt function
- [TAJO-397] - Implement sign function
- [TAJO-400] - Implement pow(float8, float8) function
- [TAJO-405] - Improve HCatalogStore to support partitioned table.
- [TAJO-409] - Add explored and explained annotations to Tajo function system
- [TAJO-432] - Add shuffle phase for column-partitioned table store
- [TAJO-436] - Implement ceiling(FLOAT8) function
- [TAJO-437] - Timestamp literal support
- [TAJO-438] - Date literal support
- [TAJO-439] - Time literal support
- [TAJO-460] - CTAS statement should support partitioned table
- [TAJO-475] - Table partition catalog recap
- [TAJO-482] - Implement listing functions and describing a specified function.
- [TAJO-495] - Implement sha1(text)
- [TAJO-498] - Implement digest(text, text) function
- [TAJO-500] - Add description annotation to functions
- [TAJO-513] - IP clearance document
- [TAJO-515] - Configurable Text (De)serializer Documentation
- [TAJO-517] - Publish Tajo jar to a public maven repository
- [TAJO-526] - HCatalogStore Documentation
- [TAJO-529] - Fix warnings in tajo-algebra
- [TAJO-530] - Fix warnings in tajo-catalog
- [TAJO-531] - Fix warnings in tajo-client
- [TAJO-532] - Fix warnings in tajo-common
- [TAJO-535] - Fix warnings in tajo-rpc
- [TAJO-536] - Fix warnings in tajo-core-storage
- [TAJO-545] - MySQLStore Documentation
- [TAJO-578] - Update configuration for tajo-site.xml
- [TAJO-615] - Implement ALTER TABLE RENAME TABLE
- [TAJO-659] - Add Tajo JDBC documentation
- [TAJO-669] - Add cluster setup documentation
- [TAJO-696] - Implement ALTER TABLE ADD COLUMN
- [TAJO-697] - Implement ALTER TABLE RENAME COLUMN
- [TAJO-736] - Add table management documentation
Bug
- [TAJO-160] - StorageManager throws InvalidInputException while running simple join query
- [TAJO-182] - Correct NULL value handling of primitive operators
- [TAJO-192] - SELECT statement with Limit clause should result in rows without a distributed query execution
- [TAJO-268] - Temporary files should be removed after a query is finished
- [TAJO-272] - boolean test does not work correctly
- [TAJO-273] - NotEval incurs NPE with boolean column
- [TAJO-277] - Infinite loop occurs when a table is empty
- [TAJO-281] - 'mvn package -Pdist' generates duplicate Tajo jar files
- [TAJO-290] - TajoDataType.Type.NULL should be NULL_TYPE
- [TAJO-292] - Too many intermediate partition files
- [TAJO-293] - querymasters directory not found in single node setup
- [TAJO-294] - Removing dead workers from the live worker list
- [TAJO-295] - ConcurrentModificationException in TaskScheduler
- [TAJO-296] - Late registration of Tajo workers
- [TAJO-320] - Visualize Tajo statemachine
- [TAJO-321] - Invalid split file of compressed text file
- [TAJO-332] - Invalid row count of CSVScanner
- [TAJO-334] - select count error?
- [TAJO-335] - Unknown logical node type error occurs when a query includes some expressions
- [TAJO-340] - Some joins with inline view causes 'Not all join conditions are pushed down to joins'
- [TAJO-344] - Tajo cannot recognize negative numeric expressions
- [TAJO-345] - MergeScanner should support projectable storages
- [TAJO-347] - Fix bug when calling a function with a case-insensitive function name.
- [TAJO-354] - Fix invalid types in UDFs (bit_length/char_length)
- [TAJO-357] - Fix misspelled file name TestMethFunction to TestMathFunction
- [TAJO-360] - If there is no matched function, the catalog throws an NPE.
- [TAJO-372] - When an exception other than a network error occurs, the operation should not be retried.
- [TAJO-375] - TajoClient can't get result data when run as a different OS user
- [TAJO-387] - Query hangs when errors occur in the Query or SubQuery class
- [TAJO-388] - limit clause does not work properly
- [TAJO-389] - The LazyTuple does not work when number format exception occurs in text deserializer
- [TAJO-390] - Queries on history are expired earlier than a given expiry time.
- [TAJO-393] - Unit tests must use test-data directory.
- [TAJO-403] - HiveQLAnalyzer should support standard functions in the GROUP BY clause.
- [TAJO-404] - Tajo does not recognize boolean literal
- [TAJO-406] - PullServer occasionally fails with "could not find xxx in any of the configured local directories"
- [TAJO-407] - PostgreSQL-style cast should have higher operator priority
- [TAJO-410] - A query with a combination of general and distinct aggregation functions fails
- [TAJO-415] - Some complex queries cause NPEs and infinite recursion.
- [TAJO-417] - TestSQLExpression.testCastFromTable causes unit test failure
- [TAJO-418] - sort operator after Inline views consisting of unions can cause an incorrect distributed plan
- [TAJO-422] - Support single row functions at GROUP BY clause.
- [TAJO-423] - Using aggregation query on local file system.
- [TAJO-426] - HCatalogStore created partitions automatically.
- [TAJO-427] - An empty table causes IndexOutOfBoundsException in a LEFT OUTER JOIN clause.
- [TAJO-428] - CASE WHEN IS NULL condition is a problem using LEFT OUTER JOIN
- [TAJO-431] - HCatalogStore can't write any data using INSERT OVERWRITE clause.
- [TAJO-442] - Cast operator with nested functions causes NPE
- [TAJO-443] - Order by query gives NullPointerException at org.apache.tajo.catalog.Schema.getColumnId(Schema.java:142)
- [TAJO-444] - Tajo fails to parse order by query with "is null" predicate in sort key
- [TAJO-445] - testCastWithNestedFunctions causes unit test failure
- [TAJO-448] - Timestamp should be based on unixtime
- [TAJO-450] - Incorrect inet4datum comparison
- [TAJO-451] - Update documentation and version constant for Tajo 0.8
- [TAJO-452] - Timestamp literal with fractional seconds results in java.lang.ArrayIndexOutOfBoundsException
- [TAJO-453] - PartitionedStoreExec can cause NPE due to column schema mismatch
- [TAJO-454] - LogicalPlanner passes an invalid argument to the DateTime constructor
- [TAJO-467] - Too many open FDs when the master fails.
- [TAJO-469] - CTAS with no column definition causes an NPE
- [TAJO-470] - Fetcher's finished time and file length are changed in WEB UI.
- [TAJO-479] - Rename obsolete name 'partition' to 'shuffle' and fix the broken taskdetail.jsp.
- [TAJO-485] - 'CREATE TABLE AS' does not work properly with partition
- [TAJO-488] - Data fetcher doesn't close small file in shuffle
- [TAJO-490] - Tajo can't use 'dfs.nameservices' in NameNode HA mode.
- [TAJO-492] - Cannot create a table named `time`
- [TAJO-493] - maven pom.xml should enforce protobuf 2.5
- [TAJO-496] - java.lang.NoSuchFieldError: IS_SECURITY_ENABLED when debugging tajo
- [TAJO-502] - Jenkins build is failing
- [TAJO-503] - HCatalogStore can't scan several hive databases.
- [TAJO-504] - When inserting into a column-partitioned table, if a queryunit attempt fails, an AlreadyExistsStorageException is thrown
- [TAJO-506] - RawFile cannot support DATE type
- [TAJO-507] - Column partitioned table is not dropped
- [TAJO-511] - Sometimes, a query progress becomes higher than 100%.
- [TAJO-518] - tajo-algebra and ProjectionPushDownRule code cleanup
- [TAJO-519] - HCatalogStore can't support OrderBy clause.
- [TAJO-522] - OutOfMemoryError: unable to create new native thread
- [TAJO-525] - Concurrent queries hang
- [TAJO-537] - After TAJO-522, still OutOfMemoryError: unable to create new native thread
- [TAJO-538] - Groupby queries with constant target lists don't work
- [TAJO-541] - Parsing Group by clause differently from Hive
- [TAJO-544] - Thread pool abusing
- [TAJO-549] - Worker fails to find some columns when it processes order by queries
- [TAJO-556] - java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
- [TAJO-557] - HCatalogStore can't scan partitioned tables.
- [TAJO-558] - HCatalogStore can't scan columns.
- [TAJO-559] - CTAS with partition causes OOM
- [TAJO-560] - CTAS PARTITION BY with UNION can cause invalid global plan
- [TAJO-561] - PhysicalPlanner::createBestSortPlan should consider input size of leaf tasks
- [TAJO-563] - INSERT OVERWRITE should not remove data before query success
- [TAJO-565] - FilterPushDown rewrite rule does not push filters on partitioned scans
- [TAJO-566] - BIN/TAJO_DUMP generates an incorrect DDL script.
- [TAJO-567] - Expression projection bugs
- [TAJO-568] - Union query with the same alias names causes NPE
- [TAJO-569] - Add max(TEXT) function
- [TAJO-570] - InvalidOperationException in outer join with constant values
- [TAJO-575] - Worker's env.jsp has a wrong URL that goes to the worker's index.jsp.
- [TAJO-576] - Add omitted explain feature
- [TAJO-577] - Support S3FileSystem split
- [TAJO-580] - Union query with partitioned tables causes NPE.
- [TAJO-581] - Inline view on column partitioned table causes NPE
- [TAJO-582] - Invalid split calculation
- [TAJO-583] - Broadcast join does not work on partitioned tables
- [TAJO-586] - containFunction shouldn't throw NoSuchFunctionException
- [TAJO-588] - In some cases, leaf tasks of DefaultTaskScheduler are not executed in a distributed manner
- [TAJO-590] - Rename HiveConverter to HiveQLAnalyzer
- [TAJO-593] - outer groupby and groupby in derived table causes only one shuffle output number
- [TAJO-594] - MySQL store doesn't work
- [TAJO-595] - The same expressions without different alias are not allowed.
- [TAJO-606] - Statemachine visualization fails
- [TAJO-607] - Statemachine visualization fails
- [TAJO-608] - Statemachine visualization fails
- [TAJO-609] - PlannerUtil::getRelationLineage ignores PartitionedTableScanNode
- [TAJO-610] - Refactor Column class
- [TAJO-619] - SELECT count(1) after joins on text keys causes wrong plans
- [TAJO-620] - A join query can cause IndexOutOfBoundsException if one of the tables is empty.
- [TAJO-628] - The second stage of distinct aggregation can be scheduled to only one node.
- [TAJO-630] - QueryMasterTask never finishes when an internal error occurs.
- [TAJO-635] - Improve tests of query semantic verification
- [TAJO-638] - QueryUnitAttempt causes Invalid event error: TA_UPDATE at TA_ASSIGNED
- [TAJO-640] - In an inner join, an empty table can cause an error in the ORDER BY clause.
- [TAJO-641] - NPE in HCatalogStore.addTable()
- [TAJO-645] - Task.Reporter can cause NPE during reporting.
- [TAJO-646] - TajoClient remains blocked after the main thread finishes.
- [TAJO-647] - Work imbalance in disk scheduling of DefaultScheduler
- [TAJO-650] - Repartitioner::scheduleHashShuffledFetches should adjust the number of tasks
- [TAJO-651] - HCatalogStore should support (de)serialization of RCFile
- [TAJO-652] - logical planner cannot handle alias on partition columns
- [TAJO-653] - RCFileAppender throws IOException
- [TAJO-655] - QueryMaster sends a "Select query" command to TajoWorker, but the TajoWorker does not execute it
- [TAJO-663] - CREATE TABLE USING RAW doesn't throw ERROR
- [TAJO-671] - RangePartitionAlgorithm.computeCardinality() should return a positive value
- [TAJO-672] - Wrong progress status when overwriting a partitioned table
- [TAJO-674] - ExplainLogicalPlan can cause NPE when a query includes derived tables
- [TAJO-679] - TimestampDatum, TimeDatum, DateDatum should be able to be compared with NullDatum
- [TAJO-682] - RangePartitionAlgorithm should be improved to handle empty texts
- [TAJO-687] - TajoMaster should pass tajoConf to create catalogServer
- [TAJO-689] - NoSuchElementException occurs during assigning the leaf tasks
- [TAJO-690] - Infinite loop occurs when rack-local tasks are being assigned
- [TAJO-692] - Missing Null handling for INET4 in RowStoreUtil
- [TAJO-693] - StatusUpdateTransition in QueryUnitAttempt handles TA_UPDATE incorrectly
- [TAJO-698] - Error occurs when FUNCTION and IN statement are used together.
- [TAJO-701] - Invalid bytes when creating BlobDatum with offset
- [TAJO-705] - CTAS always stores tables with CSV storage type into catalog
- [TAJO-706] - For a very quick query, the client can't get the query status.
- [TAJO-707] - Jenkins build failure in TestNetTypes
- [TAJO-712] - Fix some bugs after database is supported
- [TAJO-713] - Missing INET4 in UniformRangePartition
- [TAJO-716] - Using column names actually aliased in aggregation functions can cause planning error.
- [TAJO-718] - A group-by clause with the same columns but aliased causes planning error.
- [TAJO-719] - JUnit test failures
- [TAJO-729] - PreLogicalPlanVerifier verifies distinct aggregation functions incorrectly.
- [TAJO-738] - NPE occurs when QueryMaster's GlobalPlanner.build() fails.
- [TAJO-739] - A subquery with the same column alias causes a planning error.
- [TAJO-741] - GreedyHeuristicJoinOrderAlgorithm removes some join pairs.
- [TAJO-747] - BroadCastJoin omits some data.
- [TAJO-748] - Shuffle output numbers of join may be inconsistent.
- [TAJO-750] - Join order incorrectly affects the result data.
- [TAJO-754] - failure of INSERT INTO may remove the target table.
- [TAJO-759] - Fix recently added FindBugs errors.
- [TAJO-763] - Out of range problem in utc_usec_to()
- [TAJO-765] - Incorrect Configuration Classpaths
- [TAJO-777] - Partition column in function parameter causes NPE
- [TAJO-786] - TajoDataMetaDatabase::getSchemas creates invalid MetaDataTuple
- [TAJO-787] - FilterPushDownRule::visitSubQuery does not consider aliased columns.
Improvement
- [TAJO-9] - Change the default intermediate data file format for hash repartitioning
- [TAJO-16] - Enable Tajo catalog to access Hive metastore.
- [TAJO-36] - Improve ExternalSortExec with N-merge sort and final pass omission
- [TAJO-135] - Bump up hadoop to 2.2.0
- [TAJO-138] - Too many RPC connections in TajoWorker
- [TAJO-146] - Complex expressions in group-by clause should be supported
- [TAJO-225] - Separate TajoClient from tajo-core to an independent module
- [TAJO-261] - Rearrange default port numbers and config names.
- [TAJO-270] - Boolean datum compatible with Apache Hive
- [TAJO-274] - Maintaining connectivity to Tajo master regardless of the restart of the Tajo master
- [TAJO-275] - Separating QueryMaster and TaskRunner roles in worker
- [TAJO-279] - Improving the query_executor page of web UI
- [TAJO-286] - Refactor TableDesc, TableMeta, and Fragment
- [TAJO-287] - Improve Fragment to be more generic
- [TAJO-304] - DROP TABLE command should not remove data files by default
- [TAJO-305] - Implement killQuery feature
- [TAJO-307] - Implement chr(int) function
- [TAJO-308] - Implement length(string) function
- [TAJO-310] - Make the DataLocation class as a separate class and move it to the tajo-core-storage package.
- [TAJO-314] - Make TaskScheduler be pluggable
- [TAJO-316] - Improve GreedyHeuristicJoinOrderAlgorithm to deal with non-commutative joins
- [TAJO-317] - Improve TajoResourceManager to support more elaborate resource management
- [TAJO-325] - QueryState.NEW and QueryState.INIT should be combined into one state
- [TAJO-336] - Separate catalog stores into separate modules
- [TAJO-339] - Implement sin( x ) - returns the sine of x (x is in radians)
- [TAJO-356] - Improve TajoClient to directly get query results in the first request
- [TAJO-381] - Implement find_in_set function
- [TAJO-384] - to_bin()
- [TAJO-391] - Change the default type of real values from FLOAT4 to FLOAT8 when parsing the user queries
- [TAJO-399] - Simple cast expression in function parameter does not work properly
- [TAJO-402] - Implement from_unixtime() function
- [TAJO-419] - Add missing visitor methods of AlgebraVisitor and BaseAlgebraVisitor
- [TAJO-421] - Improve splitting of compressed files
- [TAJO-424] - Make serializer/deserializer configurable in CSVFile
- [TAJO-433] - Improve integration with Hive
- [TAJO-435] - Improve intermediate file
- [TAJO-455] - Throw PlanningException when creating a table with a partition type other than COLUMN
- [TAJO-456] - Separate tajo-jdbc and tajo-client from tajo-core-backend
- [TAJO-458] - Visit methods of LogicalPlanVisitor should take a query block as parameter
- [TAJO-464] - Rename 'partition', which actually means shuffle, to 'shuffle'.
- [TAJO-468] - Implement task detail info page in WEB UI
- [TAJO-471] - Extract ColumnPartitionUtils class for ColumnPartition rewrite
- [TAJO-476] - Add a test development kit for unit tests based on executions of queries
- [TAJO-477] - Rename killQuery of QMClientProtocol to closeQuery
- [TAJO-478] - Add request-patch-review.py that helps submitting patches to jira and reviewboard.
- [TAJO-497] - Rearrange reserved and non-reserved keywords
- [TAJO-499] - Shorten the length of classpath in shell command
- [TAJO-501] - Rewrite the projection part of logical planning
- [TAJO-516] - Add default database name 'default' to Tajo
- [TAJO-539] - Change some EvalNode::eval to directly return a Datum value
- [TAJO-543] - InsertNode and CreateTableNode should play their roles
- [TAJO-548] - Investigate frequent young gc
- [TAJO-553] - Add a method to the TajoClient to get finished query lists
- [TAJO-562] - ExternalSortExec should be aware of the available memory of the container
- [TAJO-564] - Show execution block's progress in querydetail.jsp
- [TAJO-573] - Allow the same column in a schema
- [TAJO-584] - Improve distributed merge sort
- [TAJO-589] - Add fine grained progress indicator for each task
- [TAJO-592] - HCatalogStore should support RCFile and the default Hive field delimiter.
- [TAJO-598] - Refactoring Tajo RPC
- [TAJO-601] - Improve distinct aggregation query processing
- [TAJO-614] - Explaining a logical node should use ExplainLogicalPlanVisitor.
- [TAJO-616] - SequenceFile support
- [TAJO-634] - ExecutionBlock must be sorted by start time in querydetail.jsp
- [TAJO-644] - Support quoted identifiers
- [TAJO-665] - Sort buffer size must be handled as a long value.
- [TAJO-670] - Change daemon's hostname to canonical hostname
- [TAJO-675] - maximum frame size of frameDecoder should be increased
- [TAJO-685] - Add prerequisite to the document of network functions and operators
- [TAJO-691] - HashJoin or HashAggregation is too slow if there are many unique keys
- [TAJO-714] - Enable setting Parquet tuning parameters
- [TAJO-717] - Improve file splitting for large number of splits
- [TAJO-725] - Broadcast JOIN should support multiple tables
- [TAJO-728] - Support expressions in IN predicate
- [TAJO-732] - Support executing Linux shell commands and HDFS commands.
- [TAJO-735] - Remove multiple SLF4J bindings message.
- [TAJO-737] - Change version message when daemon starts up.
- [TAJO-743] - Change the default resource allocation policy of leaf tasks
- [TAJO-745] - APIs in TajoClient and JDBC should be case sensitive.
- [TAJO-755] - ALTER TABLESPACE LOCATION support
- [TAJO-768] - Improve the log4j configuration
New Feature
- [TAJO-30] - Parquet Integration
- [TAJO-34] - Outer Join
- [TAJO-122] - Add EXPLAIN clause to show a logical plan
- [TAJO-176] - Implement Tajo JDBC Driver
- [TAJO-200] - RCFile compatible to apache hive
- [TAJO-206] - Implement SQL Standard String Functions
- [TAJO-217] - Implement to_timestamp function
- [TAJO-306] - Implement ascii(string) function
- [TAJO-333] - Add metric system to Tajo
- [TAJO-341] - Implement substr function
- [TAJO-342] - Implement strpos(string, substring) function
- [TAJO-343] - Implement locate function
- [TAJO-350] - Implement round, floor, ceil
- [TAJO-353] - Add Database support to Tajo
- [TAJO-358] - Implement initcap(string) function
- [TAJO-359] - Implement lpad function
- [TAJO-361] - Implement rpad function
- [TAJO-368] - Implement quote_ident function
- [TAJO-377] - Implement concat function
- [TAJO-378] - Implement concat_ws function.
- [TAJO-449] - Implement extract() function
- [TAJO-474] - Add query admin utility
- [TAJO-480] - Umbrella Jira for adding ALTER TABLE statement
- [TAJO-574] - Add a sort-based physical executor for column partition store
- [TAJO-711] - Add Avro storage support
Task
- [TAJO-23] - Remove the deprecated classes in tajo-rpc module and cleanup related things.
- [TAJO-132] - Add CDH profile to pom.xml
- [TAJO-166] - Automatic precommit test using Jenkins
- [TAJO-265] - Update installation guide and other documentation for 0.2 release
- [TAJO-267] - Implement equals() and deepEquals() functions at LogicalNode
- [TAJO-271] - Add MIT license to NOTICE.txt and LICENSE.txt for jquery and jsPlumb
- [TAJO-276] - Update LICENSE.txt and NOTICE.txt files
- [TAJO-278] - Add ASF License 2.0 header to *.jsp and web.xml files
- [TAJO-280] - Add a configuration to specify a location of worker logs
- [TAJO-288] - Correct NOTICE file and LICENSE.txt
- [TAJO-315] - Change the version of pom.xml to 0.8-incubating
- [TAJO-319] - Update homepage and bump up tajo version to 0.8
- [TAJO-322] - Documentation by version
- [TAJO-457] - Update committer list and contributor list
- [TAJO-508] - Apply findbugs-excludeFilterFile to TajoQA
- [TAJO-512] - (Umbrella) 0.8 Release Preparation
- [TAJO-520] - Move tajo-core-storage to tajo-storage
- [TAJO-621] - Add DOAP file for Tajo
- [TAJO-622] - Add TM mark and navigation links required for TLP project
- [TAJO-632] - Add IntelliJ IDEA project files to .gitignore
- [TAJO-642] - Change tajo documentation tool to sphinx
- [TAJO-657] - Missing table stat in RCFile
- [TAJO-681] - Embed sphinx rtd theme into tajo-docs
- [TAJO-694] - Bump up hadoop to 2.3.0
- [TAJO-700] - Update site, wikis, pom.xml and other resources to point to the new repository location
- [TAJO-752] - Escalate sub modules in tajo-core into the top-level modules
- [TAJO-753] - Clean up of maven dependencies
- [TAJO-788] - Update Tajo documentation and README, and BUILDING
Test