/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ Pig Change Log Trunk (unreleased changes) INCOMPATIBLE CHANGES IMPROVEMENTS PIG-693: Proposed improvements to pig's optimizer (sms) PIG-700: To automate the pig patch test process (gkesavan via sms) PIG-712: Added utility functions to create schemas for tuples and bags (zjffdu via gates). PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create empty bags (pradeepkth) PIG-743: To implement clover (gkesavan) BUG FIXES PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs "FileSystem closed" error on large input (pradeepkth) PIG-693: Parameter to UDF which is an alias returned in another UDF in nested foreach causes incorrect results (thejas via sms) PIG-725: javadoc: warning - Multiple sources of package comments found for package "org.apache.commons.logging" (gkesavan via sms) PIG-745: Add DataType.toString() to force basic types to chararray, useful for UDFs that want to handle all simple types as strings (ciemo via gates). PIG-514: COUNT returns no results as a result of two filter statements in FOREACH (pradeepkth) Release 0.2.0 - Unreleased INCOMPATIBLE CHANGES PIG-157: Add types and rework execution pipeline (gates) PIG-458: integration with Hadoop 18 (olgan) NEW FEATURES PIG-139: command line editing (daijy via olgan) PIG-554 Added fragment replicate map side join (shravanmn via pkamath and gates) PIG-535: added rmf command PIG-704 Added ALIASES command that shows all currently defined ALIASES. Changed semantics of DEFINE to define last used alias if no argument is given (ericg via gates). PIG-713 Added alias completion as part of tab completion in grunt (ericg via gates). IMPROVEMENTS PIG-270: proper line number for parse errors (daijy via olgan) PIG-367: convinience function for UDFs to name schema PIG-443: Illustrate for the Types branch (shubham via olgan) PIG-599: Added buffering to BufferedPositionedInputStream (gates) PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth via olgan) PIG-628: misc performance improvements (pradeepkth via olgan) PIG-589: error handling, phase 1-2 (sms via olgan) PIG-590: error handling, phase 3 (sms) PIG-591: error handling, phase 4 (sms) PIG-545: PERFORMANCE: Sampler for order bys does not produce a good distribution (pradeepkth) PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan) PIG-636: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner (pradeepkth) PIG-563: support for multiple combiner invocations (pradeepkth via olgan) PIG-465: performance improvement - removing keys from the value (pradeepkth via olgan) PIG-450: PERFORMANCE: Distinct should make use of combiner to remove duplicate values from keys. (gates) PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth via gates) BUG FIXES PIG-294: string comparator unit tests (sms via pi_song) PIG-258: cleaning up directories on failure (daijy via olgan) PIG-363: fix for describe to produce schema name PIG-368: making JobConf available to Load/Store UDFs PIG-311: cross is broken PIG-369: support for filter UDFs PIG-375: support for implicit split PIG-301: fix for order by descending PIG-378: fix for GENERATE + LIMIT PIG-362: don't push limit above generate with flatten PIG-381: bincond does not handle null data PIG-382: bincond throws typecast exception PIG-352: java.lang.ClassCastException when invalid field is accessed PIG-329: TestStoreOld, 2 unit tests were broken PIG-353: parsing of complex types PIG-392: error handling with multiple MRjobs PIG-397: code defaults to single reducer PIG-373: unconnected load causes problem, PIG-413: problem with float sum PIG-398: Expressions not allowed inside foreach (sms via olgan) PIG-418: divide by 0 problem PIG-402: order by with user comparator (shravanmn via olgan) PIG-415: problem with comparators (shravanmn via olgan) PIG-422: cross is broken (shravanmn via olgan) PIG-407: need to clone operators (pradeepkth via olgan) PIG-428: TypeCastInserter does not replace projects in inner plans correctly (pradeepkth vi olgan) PIG-421: error with complex nested plan (sms via olgan) PIG-429: Self join wth implicit split has the join output in wrong order (pradeepkth via olgan) PIG-434: short-circuit AND and OR (pradeepkth viia olgan) PIG-333: allowing no parethesis with single column alias with flatten (sms via olgan) PIG-426: Adding result of two UDFs gives a syntax error PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan) PIG-436: alias is lost when single column is flattened (pradeepkth via olgan) PIG-364: Limit return incorrect records when we use multiple reducer (daijy via olgan) PIG-439: disallow alias renaming (pradeepkth via olgan) PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth via olgan) PIG-442: Disambiguated alias after a foreach flatten is not accessible a couple of statements after the foreach (sms via olgan) PIG-424: nested foreach with flatten and agg gives an error (sms via olgan) PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD connection is fully established (olgan) PIG-430: Projections in nested filter and inside foreach do not work (sms via olgan) PIG-445: Null Pointer Exceptions in the mappers leading to lot of retries (shravanmn via olgan) PIG-444: job.jar is left behined (pradeepkth via olgan) PIG-447: improved error messages (pradeepkth via olgan) PIG-448: explain broken after load with types (pradeepkth via olgan) PIG-380: invalid schema for databag constant (sms via olgan) PIG-451: If an field is part of group followed by flatten, then referring to it causes a parse error (pradeepkth via olgan) PIG-455: "group" alias is lost after a flatten(group) (pradeepkth vi olgan) PIG-459: increased sleep time before checking for job progress PIG-462: LIMIT N should create one output file with N rows (shravanmn via olgan) PIG-376: set job name (olgan) PIG-463: POCast changes (pradeepkth via olgan) PIG-427: casting input to UDFs PIG-437: as in alias names causing problems (sms via olgan) PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan) PIG-470: TextLoader should produce bytearrays (sms via olgan) PIG-335: lineage (sms vi olgan) PIG-464: bag schema definition (pradeepkth via olgan) PIG-457: report 100% on successful jobs only (shravanmn via olgan) PIG-471: ignoring status errors from hadoop (pradeepkth via olgan) PIG-489: (*) processing (sms via olgan) PIG-475: missing heartbeats (shravanmn via olgan) PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan) PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan) PIG-501: Make branches/types work under cygwin (daijy via olgan) PIG-504: cleanup illustrate not to produce cn= (shubham via olgan) PIG-469: make sure that describe says "int" not "integer" (sms via olgan) PIG-495: projecting of bags only give 1 field (olgan) PIG-500: Load Func for POCast is not being set in some cases (sms via olgan) PIG-499: parser issue with as (sms via olgan) PIG-507: permission error not reported (pradeepkth via olgan) PIG-508: problem with double joins (pradeepkth via olgan) PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan) PIG-505: working with map elements (sms via olgan) PIG-517: load functiin with parameters does not work with cast (pradeepkth via olgan) PIG-525: make sure cast for udf parameters works (olgan) PIG-512: Expressions in foreach lead to errors (sms via olgan) PIG-528: use UDF return in schema computation (sms via olgan) PIG-527: allow PigStorage to write out complex output (sms via olgan) PIG-537: Failure in Hadoop map collect stage due to type mismatch in the keys used in cogroup (pradeepkth vi olgan) PIG-538: support for null constants (pradeepkth via olgan) PIG-385: more null handling (pradeepkth via olgan) PIG-546: FilterFunc calls empty constructor when it should be calling parameterized constructor (sms via olgan) PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via olgan) PIG-501: make unit tests run under windows (daijy via olgan) PIG-543: Restore local mode to truly run locally instead of use map reduce. (shubhamc via gates) PIG-556: Changed FindQuantiles to report progress. Fixed issue with null reporter being passed to EvalFuncs. (gates) PIG-6: Add load support from hbase (hustlmsp via gates). PIG-522: make negation work (pradeepkth via olgan) PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple error (pradeepkth via olgan) PIG-572 A PigServer.registerScript() method, which lets a client programmatically register a Pig Script. (shubhamc via gates) PIG-570: problems with handling bzip data (breed via olgan) PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan) PIG-623: Fix spelling errors in output messages (tomwhite via sms) PIG-622: Include pig executable in distribution (tomwhite via sms) PIG-615: Wrong number of jobs with limit (shravanmn via sms) PIG-635: POCast.java has incorrect formatting (sms) PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext() gives a null pointer exception (pradeepkth) PIG-632: Improved error message for binary operators (sms) PIG-636: Performance improvement: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner (pradeepkth) PIG-631: 4 Unit test failures on Windows (daijy) PIG-645: Streaming is broken with the latest trunk (pradeepkth) PIG-646: Distinct UDF should report progress (sms) PIG-647: memory sized passed on pig command line does not get propagated to JobConf (sms) PIG-648: BinStorage fails when it finds markers unexpectedly in the data (pradeepkth) PIG-649: RandomSampleLoader does not handle skipping correctly in getNext() (pradeepkth) PIG-560: UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms) PIG-642: Limit after FRJ causes problems (daijy) PIG-637: Limit broken after order by in the local mode (shubhamc via olgan) PIG-553: EvalFunc.finish() not getting called (shravanmn via sms) PIG-654: Optimize build.xml (daijy) PIG-574: allowing to run scripts from within grunt shell (hagleitn via olgan) PIG-665: Map key type not correctly set (for use when key is null) when map plan does not have localrearrange (pradeepkth) PIG-590: error handling on the backend (sms via olgan) PIG-590: error handling on the backend (sms) PIG-658: Data type long : When 'L' or 'l' is included with data (123L or 123l) load produces null value. Also the case with Float (thejas via sms) PIG-591: Error handling phase four (sms via pradeepkth) PIG-664: Semantics of * is not consistent (sms) PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms) PIG-655: Comparison of schemas of bincond operands is flawed (sms via pradeepkth) PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth via sms) PIG-577: outer join query looses name information (sms via pradeepkth) PIG-690: UNION doesn't work in the latest code (pradeepkth via sms) PIG-544: Utf8StorageConverter.java does not always produce NULLs when data is malformed(thejas via sms) PIG-532: Casting a field removes its alias.(thejas via sms) PIG-705: Pig should display a better error message when backend error messages cannot be parsed (sms) PIG-650: pig should look for and use the pig specific 'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the HOD case (sms) PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan) PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan) PIG-708: implement releaseaudit tart to use rats on pig (gkesavan via olgan) PIG-703: user documentation (chandec vi olgan) PIG-711: Implement checkstyle for pig (gkesavan via olgan) PIG-715: doc updates (chandec vi olgan) PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates) PIG-692: When running a job from a script, use the name of that script as the default name for the job (vzaliva via gates) PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan) PIG-720: further doc cleanup (gkesavan via olgan) Release 0.1.1 - 2008-12-04 INCOMPATIBLE CHANGES NEW FEATURES IMPROVEMENTS PIG-253: integration with hadoop-18 BUG FIXES PIG-342: Fix DistinctDataBag to recalculate size after it has spilled. (bdimcheff via gates) Release 0.1.0 - 2008-09-11 INCOMPATIBLE CHANGES PIG-123: requires escape of '\' in chars and string NEW FEATURES PIG-20 Added custom comparator functions for order by (phunt via gates) PIG-94: Streaming implementation (arunc via olgan) PIG-58: parameter substitution PIG-55: added custom splitter (groves via olgan) PIG-59: Add a new ILLUSTRATE command (shubhamc via gates). PIG-256: Added variable argument support for UDFs (pi_song) IMPROVEMENTS: PIG-8 added binary comparator (olgan) PIG-11 Add capability to search for jar file to register. (antmagna via olgan) PIG-7: Added use of combiner in some restricted cases. (gates) PIG-47: Added methods to DataMap to provide access to its content PIG-30: Rewrote DataBags to better handle decisions of when to spill to disk and to spill more intelligently. (gates) PIG-12: Added time stamps to log4j messages (phunt via gates). PIG-44: Added adaptive decision of the number of records to hold in memory before spilling (utkarsh) PIG-56: Made DataBag implement Iterable. (groves via gates) PIG-39: created more efficient version of read (spullara via olgan) PIG-32: ABstraction layer (olgan) PIG-83: Change everything except grunt and Main (PigServer on down) to use common logging abstraction instead of log4j. By default in grunt, log4j still used as logging layer. Also converted all System.out/err.println statements to use logging instead. (francisoud via gates) PIG-13: adding version to the system (joa23 via olgan) PIG-113: Make explain output more understandable (pi_song via gates) PIG-120: Support map reduce in local mode. To do this user needs to specify execution type as mapreduce and cluster name as local (joa23 via gates). PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates). PIG-111: Reworked configuration to be setable via properties. (joa23, pi_song, oae via gates). BUG FIXES PIG-24 Files that were incorrectly placed under test/reports have been removed. ant clean now cleans test/reports. (milindb via gates) PIG-25 com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@) PIG-23 Made pig work with java 1.5. (milindb via gates) PIG-17 integrated with Hadoop 0.15 (olgan@) PIG-33 Help was commented out - uncommented (olgan) PIG-31: second half of concurrent mode problem addressed (olgan) PIG-14: added heartbeat functionality (olgan) PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release PIG-29: fixed bag factory to be properly initialized (utkarsh) PIG-43: fixed problem where using the combiner prevented a pig alias from being evaluated more than once. (gates) PIG-45: Fixed pig.pl to not assume hodrc file is named the same as cluster name (gates). PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples instead of Tuples, causing Reducer to crash in some cases. PIG-41: Added patterns to svn:ignore PIG-51: Fixed combiner in the presence of flattening PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the comparator function instead of Class.forName. (gates) PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@) PIG-77: Added eclipse specific files to svn:ignore PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates) PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates) PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arun via olgan) PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default path. Also fix it to not die if pigclient.conf is missing. (craigm via gates). PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill files when they are done spilling (contributions by craigm, breed, and gates, committed by gates). PIG-95: Remove System.exit() statements from inside pig (joa23 via gates). PIG-65: convert tabs to spaces (groves via olgan) PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when more than one bag is involved (gates). PIG-92: Fix NullPointerException in PIgContext due to uninitialized conf reference. (francisoud via gates) PIG-80: In a number of places stack trace information was being lost by an exception being caught, and a different exception then thrown. All those locations have been changed so that the new exception now wraps the old. (francisoud via gates). PIG-84: Converted printStackTrace calls to calls to the logger. (francisoud via gates). PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates). PIG-99: Fix to make unit tests not run out of memory. (francisoud via gates). PIG-107: enabled several tests. (francisoud via olgan) PIG-46: abort processing on error for non-interactive mode (olston via olgan) PIG-109: improved exception handling (oae via olgan) PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can be run w/o access to a hadoop cluster. (xuzh via gates) PIG-68: improvements to build.xml (joa23 via olgan) PIG-110: Replaced code accidently merged out in PIG-32 fix that handled flattening the combiner case. (gates and oae) PIG-213: Remove non-static references to logger from data bags and tuples, as it causes significant overhead (vgeschel via gates). PIG-284: target for building source jar (oae via olgan) PIG-627: multiquery support M1 (hagleitn via olgan) PIG-627: multiquery support M2 (hagleitn via pradeepkth) PIG-627: multiquery support incremental patch (Richard Ding via pradeepkth) PIG-627: multiquery support incremental patch (hagleitn via pradeepkth) PIG-627: multiquery support incremental patch (Richard Ding via pradeepkth) PIG-627: multiquery support incremental patch to merge changes in trunk to multiquery branch (hagleitn via pradeepkth) PIG-627: multiquery support incremental patch (hagleitn via pradeepkth) PIG-627: multiquery support incremental patch (hagleitn via pradeepkth) PIG-627: multiquery support incremental patch (hagleitn via pradeepkth) PIG-627: multiquery support M3 (rding via gates) PIG-652: Adapt changes in store interface to multi-query changes (hagleitn via gates).