/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

Pig Change Log

Release 0.7.0 - 2010-05-03

INCOMPATIBLE CHANGES

PIG-1292: Interface Refinements (hashutosh)

PIG-1259: ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with > 1 subfields) (pradeepkth)

PIG-1265: Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface (pradeepkth)

PIG-1250: Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface (pradeepkth)

PIG-1234: Unable to create input slice for har:// files (pradeepkth)

PIG-1200: Using TableInputFormat in HBaseStorage (zjffdu via pradeepkth)

PIG-1148: Move splitable logic from pig latin to InputFormat (zjffdu via pradeepkth)

PIG-1141: Make streaming work with the new load-store interfaces (rding via pradeepkth)

PIG-1110: Handle compressed file formats -- Gz, BZip with the new proposal (rding via pradeepkth)

PIG-1088: change merge join and merge join indexer to work with new LoadFunc interface (thejas via pradeepkth)

PIG-879: Pig should provide a way for input location string in load statement to be passed as-is to the Loader (rding via pradeepkth)

PIG-966:
load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface (thejas via pradeepkth)

PIG-1094: Fix unit tests corresponding to source changes so far (pradeepkth)

PIG-1090: Update sources to reflect recent changes in load-store interfaces (pradeepkth)

PIG-1072: ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner (rding via pradeepkth)

IMPROVEMENTS

PIG-1384: Adding contrib javadoc to main Pig javadoc (daijy)

PIG-1320: final documentation updates for Pig 0.7.0 (chandec via olgan)

PIG-1330: Move pruned schema tracking logic from LoadFunc to core code (daijy)

PIG-1320: more documentation updates for Pig 0.7.0 (chandec via olgan)

PIG-1316: TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files (pradeepkth)

PIG-1317: LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() (pradeepkth)

PIG-1320: documentation updates for Pig 0.7.0 (chandec via olgan)

PIG-1325: Provide a way to exclude a testcase when running "ant test" (pradeepkth)

PIG-1312: Make Pig work with hadoop security (daijy)

PIG-1308: Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] (pradeepkth)

PIG-1285: Allow SingleTupleBag to be serialized (dvryaboy)

PIG-1117: Pig reading hive columnar rc tables (gerritjvv via dvryaboy)

PIG-1287: Use hadoop-0.20.2 with pig 0.7.0 release (pradeepkth)

PIG-1257: PigStorage per the new load-store redesign should support splitting of bzip files (pradeepkth)

PIG-1290: WeightedRangePartitioner should not check if input is empty if quantile file is empty (pradeepkth)

PIG-1262: Additional findbugs and javac warnings (daijy)

PIG-1248: [piggybank] some useful String functions (dvryaboy)

PIG-1251: Move
SortInfo calculation earlier in compilation (ashutoshc)

PIG-1233: NullPointerException in AVG (ankur via olgan)

PIG-1218: Use distributed cache to store samples (rding via pradeepkth)

PIG-1226: support for additional jar files (thejas via olgan)

PIG-1230: Streaming input in POJoinPackage should use nonspillable bag to collect tuples (ashutoshc)

PIG-1224: Collected group should change to use new (internal) bag (ashutoshc)

PIG-1046: join algorithm specification is within double quotes (ashutoshc)

PIG-1209: Port POJoinPackage to proactively spill (ashutoshc)

PIG-1190: Handling of quoted strings in pig-latin/grunt commands (ashutoshc)

PIG-1214: Pig 0.6 Docs fixes (chandec via olgan)

PIG-977: exit status does not account for JOB_STATUS.TERMINATED (ashutoshc)

PIG-1192: Pig 0.6 Docs fixes (chandec via olgan)

PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan)

PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)

PIG-1102: Collect number of spills per job (sriranjan via olgan)

PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs (dvryaboy via pradeepkth)

PIG-1162: Pig 0.6.0 - UDF doc (chandec via olgan)

PIG-1163: Pig/Zebra 0.6.0 release (chandec via olgan)

PIG-1156: Add aliases to ExecJobs and PhysicalOperators (dvryaboy via gates)

PIG-1161: add missing license headers (dvryaboy via olgan)

PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi via olgan)

PIG-760: Add a new PigStorageSchema load/store function that store schemas for text files (dvryaboy via gates)

PIG-1106: FR join should not spill (ankit.modi via olgan)

PIG-1147: Zebra Docs for Pig 0.6.0 (chandec via olgan)

PIG-1129: Pig UDF doc: fieldsToRead function (chandec via olgan)

PIG-978: MQ docs update (chandec via olgan)

PIG-990: Provide a way to pin LogicalOperator Options (dvryaboy via gates)

PIG-1103: refactoring of commit tests (olgan)

PIG-1101: Allow argument to limit to be long in addition to int (ashutoshc via gates)

PIG-872: use distributed cache for the
replicated data set in FR join (sriranjan via olgan)

PIG-1053: Consider moving to Hadoop for local mode (ankit.modi via olgan)

PIG-1085: Pass JobConf and UDF specific configuration information to UDFs (gates)

PIG-1173: pig cannot be built without an internet connection (jmhodges via daijy)

OPTIMIZATIONS

BUG FIXES

PIG-1394: POCombinerPackage hold too much memory for InternalCachedBag (daijy)

PIG-1303: Inconsistent instantiation of parametrized UDFs (jrussek and dvryaboy)

PIG-1348: PigStorage making unnecessary byte array copy when storing data (rding)

PIG-1374: PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in the next statement (daijy)

PIG-1372: Restore PigInputFormat.sJob for backward compatibility (pradeepkth)

PIG-1369: POProject does not handle null tuples and non existent fields in some cases (pradeepkth)

PIG-1364: Public javadoc on apache site still on 0.2, needs to be updated for each version release (gates)

PIG-1366: PigStorage's pushProjection implementation results in NPE under certain data conditions (pradeepkth)

PIG-1365: WrappedIOException is missing from Pig.jar (pradeepkth)

PIG-1362: Provide udf context signature in ensureAllKeysInSameSplit() method of loader (hashutosh)

PIG-1352: piggybank UPPER udf throws exception if argument is null (thejas)

PIG-1346: In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME (pradeepkth)

PIG-1336: Optimize POStore serialized into JobConf (daijy)

PIG-1335: UDFFinder should find LoadFunc used by POCast (daijy)

PIG-1307: when we spill the DefaultDataBag we are not setting the sized changed flag to be true.
(breed via daijy)

PIG-1298: Restore file traversal behavior to Pig loaders (rding)

PIG-1289: PIG Join fails while doing a filter on joined data (daijy)

PIG-1266: Show spill count on the pig console at the end of the job (sriranjan via rding)

PIG-1296: Skewed join fail due to negative partition index (daijy)

PIG-1293: pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set (aw via gates)

PIG-1272: Column pruner causes wrong results (daijy)

PIG-1275: empty bag in PigStorage read as null (daijy)

PIG-1252: Diamond splitter does not generate correct results when using Multi-query optimization (rding)

PIG-1260: Param Substitution results in parser error if there is no EOL after last line in script (rding)

PIG-1238: Dump does not respect the schema (rding)

PIG-1261: PigStorageSchema broke after changes to ResourceSchema (dvryaboy via daijy)

PIG-1053: Put pig.properties back into release distribution (gates).

PIG-1273: Skewed join throws error (rding)

PIG-1267: Problems with partition filter optimizer (rding)

PIG-1079: Modify merge join to use distributed cache to maintain the index (rding)

PIG-1241: Accumulator is turned on when a map is used with a non-accumulative UDF (yinghe via olgan)

PIG-1215: Make Hadoop jobId more prominent in the client log (ashutoshc)

PIG-1216: New load store design does not allow Pig to validate inputs and outputs up front (ashutoshc via pradeepkth)

PIG-1239: PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed (pradeepkth)

PIG-1169: Top-N queries produce incorrect results when a store statement is added between order by and limit statement (rding)

PIG-1131: Pig simple join does not work when it contains empty lines (ashutoshc)

PIG-834: incorrect plan when algebraic functions are nested (ashutoshc)

PIG-1217: Fix argToFuncMapping in Piggybank Top function (dvryaboy via gates)

PIG-1154: Local Mode fails when hadoop config directory is specified in classpath (ankit.modi via
gates)

PIG-1124: Unable to set Custom Job Name using the -Dmapred.job.name parameter (ashutoshc)

PIG-1213: Schema serialization is broken (pradeepkth)

PIG-1194: ERROR 2055: Received Error while processing the map plan (rding via ashutoshc)

PIG-1204: Pig hangs when joining two streaming relations in local mode (rding)

PIG-1191: POCast throws exception for certain sequences of LOAD, FILTER, FOREACH (pradeepkth via gates)

PIG-1171: Top-N queries produce incorrect results when followed by a cross statement (rding via olgan)

PIG-1159: merge join right side table does not support comma separated paths (rding via olgan)

PIG-1158: pig command line -M option doesn't support table union correctly (comma separated paths) (rding via olgan)

PIG-1143: Poisson Sample Loader should compute the number of samples required only once (sriranjan via olgan)

PIG-1157: Successive replicated joins do not generate Map Reduce plan and fails due to OOM (rding via olgan)

PIG-1075: Error in Cogroup when key fields types don't match (rding via olgan)

PIG-973: type resolution inconsistency (rding via olgan)

PIG-1135: skewed join partitioner returns negative partition index (yinghe via olgan)

PIG-1134: Skewed Join sampling job overwhelms the name node (sriranjan via olgan)

PIG-1105: COUNT_STAR accumulate interface implementation cases failure (sriranjan via olgan)

PIG-1118: expression with aggregate functions returning null, with accumulate interface (yinghe via olgan)

PIG-1068: COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple' (rding via gates)

PIG-1113: Diamond query optimization throws error in JOIN (rding via olgan)

PIG-1116: Remove redundant map-reduce job for merge join (pradeepkth)

PIG-1114: MultiQuery optimization throws error when merging 2 level spl (rding via olgan)

PIG-1108: Incorrect map output key type in MultiQuery optimiza (rding via olgan)

PIG-1022: optimizer pushes filter before the
foreach that generates column used by filter (daijy via gates)

PIG-1107: PigLineRecordReader bails out on an empty line for compressed data (ankit.modi via olgan)

PIG-598: Parameter substitution ($PARAMETER) should not be performed in comments (thejas via olgan)

PIG-1064: Behaviour of COGROUP with and without schema when using "*" operator (pradeepkth)

PIG-1086: Nested sort by * throw exception (rding via daijy)

PIG-1146: Inconsistent column pruning in LOUnion (daijy)

PIG-1176: Column Pruner issues in union of loader with and without schema (daijy)

PIG-1184: PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later (daijy)

PIG-1189: StoreFunc UDF should ship to the backend automatically without "register" (daijy)

PIG-1212: LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null (daijy)

PIG-1255: Tiny code cleanup for serialization code for PigSplit (daijy)

PIG-613: Casting elements inside a tuple does not take effect (daijy)

Release 0.6.0

INCOMPATIBLE CHANGES

PIG-922: Logical optimizer: push up project (daijy)

IMPROVEMENTS

PIG-1084: Pig 0.6.0 Documentation improvements (chandec via olgan)

PIG-1089: Pig 0.6.0 Documentation (chandec via olgan)

PIG-958: Splitting output data on key field (ankur via pradeepkth)

PIG-1058: FINDBUGS: remaining "Correctness Warnings" (olgan)

PIG-1036: Fragment-replicate left outer join (ankit.modi via pradeepkth)

PIG-920: optimizing diamond queries (rding via pradeepkth)

PIG-1040: FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be (olgan)

PIG-1059: FINDBUGS: remaining Bad practice + Multithreaded correctness Warning (olgan)

PIG-953: Enable merge join in pig to work with loaders and store functions which can internally index sorted data (pradeepkth)

PIG-1055: FINDBUGS: remaining "Dodgy Warnings" (olgan)

PIG-1052: FINDBUGS: remaining performance warnings (olgan)

PIG-1037: Converted sorted and distinct bags to use the new active spilling paradigm
(yinghe via gates)

PIG-1051: FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (olgan)

PIG-1050: FINDBUGS: DLS_DEAD_LOCAL_STORE: Dead store to local variable (olgan)

PIG-1045: Integration with Hadoop 20 New API (rding via pradeepkth)

PIG-1043: FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class (olgan)

PIG-1047: FINDBUGS: URF_UNREAD_FIELD: Unread field (olgan)

PIG-1032: FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor (olgan)

PIG-984: Add map side grouping for data that is already collected when it is read into the map (rding via gates)

PIG-1025: Add ability to set job priority from Pig Latin script (kevinweil via gates)

PIG-1028: FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number constructor; use static valueOf instead (olgan)

PIG-1012: FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class (olgan)

PIG-1013: FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array (olgan)

PIG-1011: FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define serialVersionUID (olgan)

PIG-1009: FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream (olgan)

PIG-1008: FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL (olgan)

PIG-1018: FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter (olgan)

PIG-1023: FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL (olgan)

PIG-1019: added findbugs exclusion file (olgan)

PIG-983: PERFORMANCE: multi-query optimization on multiple group bys following a join or cogroup (rding via pradeepkth)

PIG-975: Need a databag that does not register with SpillableMemoryManager and spill data pro-actively (yinghe via olgan)

PIG-891: Fixing dfs statement for Pig (zjffdu via daijy)

PIG-956: 10 minute commit tests (olgan)

PIG-948: [Usability] Relating pig script with MR jobs (ashutoshc via daijy)

PIG-960: Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage (ankit.modi via daijy)

PIG-1020: Include
an ant target to build pig.jar without hadoop libraries (daijy)

PIG-1033: javac warnings: deprecated hadoop APIs (daijy)

PIG-1041: javac warnings: cast, fallthrough, serial (daijy)

PIG-1042: javac warnings: unchecked (daijy)

PIG-1038: Optimize nested distinct/sort to use secondary key (daijy)

PIG-979: Accumulator Interface for UDFs (yinghe via daijy)

OPTIMIZATIONS

PIG-922: Logical optimizer: push up project (daijy)

BUG FIXES

PIG-1080: PigStorage may miss records when loading a file (rding via olgan)

PIG-1071: Support comma separated file/directory names in load statements (rding via pradeepkth)

PIG-970: Changes to make HBase loader work with HBase 0.20 (vbarat and zjffdu via gates)

PIG-1035: support for skewed outer join (sriranjan via pradeepkth)

PIG-1030: explain and dump not working with two UDFs inside inner plan of foreach (rding via pradeepkth)

PIG-1048: inner join using 'skewed' produces multiple rows for keys with single row in both input relations (sriranjan via gates)

PIG-1063: Pig does not call checkOutSpecs() on OutputFormat provided by StoreFunc in the multistore case (pradeepkth)

PIG-746: Works in --exectype local, fails on grid - ERROR 2113: SingleTupleBag should never be serialized (rding via pradeepkth)

PIG-1027: Number of bytes written are always zero in local mode (zjffdu via gates)

PIG-976: Multi-query optimization throws ClassCastException (rding via pradeepkth)

PIG-858: Order By followed by "replicated" join fails while compiling MR-plan from physical plan (ashutoshc via gates)

PIG-968: Fix findContainingJar to work properly when there is a + in the jar path (tlipcon via gates)

PIG-738: Regexp passed from pigscript fails in UDF (pradeepkth)

PIG-942: Maps are not implicitly casted (pradeepkth)

PIG-513: Removed unnecessary bounds check in DefaultTuple (ashutoshc via gates)

PIG-951: Set parallelism explicitly to 1 for indexing job in merge join (ashutoshc via gates)

PIG-592: schema inferred incorrectly (daijy)

PIG-989: Allow type merge between
numerical type and non-numerical type (daijy)

PIG-894: order-by fails when input is empty (daijy)

PIG-995: Limit Optimizer throw exception "ERROR 2156: Error while fixing projections" (daijy)

PIG-1000: InternalCachedBag.java generates javac warning and findbug warning (yinghe via daijy)

PIG-921: Strange use case for Join which produces different results in local and map reduce mode (daijy)

PIG-1024: Script contains nested limit fail due to "LOLimit does not support multiple outputs" (daijy)

PIG-644: Duplicate column names in foreach do not throw parser error (daijy)

PIG-927: null should be handled consistently in Join (daijy)

PIG-790: Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond) (daijy)

PIG-1001: Generate more meaningful error message when one input file does not exist (daijy)

PIG-1060: MultiQuery optimization throws error for multi-level splits (rding via daijy)

PIG-1128: column pruning causing failure when foreach has user-specified schema (daijy)

PIG-1127: Logical operator should contains individual copy of schema object (daijy)

PIG-1133: UDFContext should be made available to LoadFunc.bindTo (daijy)

PIG-1132: Column Pruner issues in dealing with unprunable loader (daijy)

PIG-1142: Got NullPointerException merge join with pruning (daijy)

PIG-1155: Need to make sure existing loaders work "as is" (daijy)

PIG-1144: set default_parallelism construct does not set the number of reducers correctly (daijy)

PIG-1165: Signature of loader does not set correctly for order by (daijy)

PIG-761: ERROR 2086 on simple JOIN (daijy)

PIG-1172: PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in Join (daijy)

PIG-1180: Piggybank should compile even if we only have "pig-withouthadoop.jar" but no "pig.jar" in the pig home directory (daijy)

PIG-1185: Data bags do not close spill files after using iterator to read tuples (yinghe via daijy)

PIG-1186: Pig do not take values in "pig-cluster-hadoop-site.xml" (daijy)

PIG-1193: Secondary sort issue on nested desc sort (daijy)

PIG-1195: POSort should take care of sort order (daijy)

PIG-1210: fieldsToRead send the same fields more than once in some cases (daijy)

PIG-1231: DefaultDataBagIterator.hasNext() should be idempotent in all cases (daijy)

Release 0.5.0

INCOMPATIBLE CHANGES

IMPROVEMENTS

PIG-1039: documentation update (chandec via olgan)

OPTIMIZATIONS

BUG FIXES

PIG-963: Join in local mode matches null keys (pradeepkth)

PIG-660: Integration with Hadoop 20 (sms via olgan)

Release 0.4.0 - 2009-09-26

INCOMPATIBLE CHANGES

PIG-892: Make COUNT and AVG deal with nulls accordingly with SQL standard (olgan)

PIG-734: Changed maps to only take strings as keys (gates)

IMPROVEMENTS

PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)

PIG-578: join ... outer, ... outer semantics are a no-ops, should produce corresponding null values (pradeepkth)

PIG-936: making dump and PigDump independent from Tuple.toString (daijy)

PIG-890: Create a sampler interface and improve the skewed join sampler (sriranjan via daijy)

PIG-922: Logical optimizer: push up project part 1 (daijy)

PIG-812: COUNT(*) does not work (breed)

PIG-923: Allow specifying log file location through pig.properties (dvryaboy via daijy)

PIG-926: Merge-Join phase 2 (ashutoshc via pradeepkth)

PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth)

PIG-893: Added string -> integer, long, float, and double casts (zjffdu via gates)

PIG-833: Added Zebra, new columnar storage mechanism for HDFS (rangadi plus many others via gates)

PIG-697: Proposed improvements to pig's optimizer, Phase5 (daijy)

PIG-895: Default parallel for Pig (daijy)

PIG-820: Change RandomSampleLoader to take a LoadFunc instead of extending BinStorage.
Added new Samplable interface for loaders to implement allowing them to be used by RandomSampleLoader (ashutoshc via gates)

PIG-832: Make import list configurable (daijy)

PIG-697: Proposed improvements to pig's optimizer (sms)

PIG-753: Allow UDFs with no parameters (zjffdu via gates)

PIG-765: jdiff for pig (gkesavan)

OPTIMIZATIONS

PIG-792: skew join implementation (sriranjan via olgan)

BUG FIXES

PIG-964: Handling null in skewed join (sriranjan via olgan)

PIG-962: Skewed join creates 3 map reduce jobs (sriranjan via olgan)

PIG-957: Tutorial is broken with 0.4 branch and trunk (pradeepkth)

PIG-955: Skewed join produces invalid results (yinghe via olgan)

PIG-954: Skewed join fails when pig.skewedjoin.reduce.memusage is not configured (yinghe via olgan)

PIG-882: log level not propagated to loggers - duplicate message (daijy)

PIG-943: Pig crash when it cannot get counter from hadoop (daijy)

PIG-935: Skewed join throws an exception when used with map keys (sriranjan via pradeepkth)

PIG-934: Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index (ashutoshc via pradeepkth)

PIG-925: Fix join in local mode (daijy)

PIG-913: Error in Pig script when grouping on chararray column (daijy)

PIG-907: Provide multiple version of HashFNV (Piggybank) (daijy)

PIG-905: TOKENIZE throws exception on null data (daijy)

PIG-901: InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext (pradeepkth)

PIG-882: log level not propagated to loggers (daijy)

PIG-880: Order by is broken with complex fields (sms)

PIG-773: Empty complex constants (empty bag, empty tuple and empty map) should be supported (ashutoshc via sms)

PIG-695: Pig should not fail when error logs cannot be created (sms)

PIG-878: Pig is returning too many blocks in the input split.
(arunc via gates)

PIG-888: Pig do not pass udf to the backend in some situation (daijy)

PIG-728: All backend error messages must be logged to preserve the original error messages (sms)

PIG-877: Push up filter does not account for added columns in foreach (sms)

PIG-883: udf import list does not send to the backend (daijy)

PIG-881: Pig should ship load udfs to the backend (daijy)

PIG-876: limit changes order of order-by to ascending (daijy)

PIG-851: Map type used as return type in UDFs not recognized at all times (zjffdu via sms)

PIG-861: POJoinPackage lose tuple in large dataset (daijy)

PIG-797: Limit with ORDER BY producing wrong results (daijy)

PIG-850: Dump produce wrong result while "store into" is ok (daijy)

PIG-852: pig -version or pig -help returns exit code of 1 (milindb via olgan)

PIG-849: Local engine loses records in splits (hagleitn via olgan)

PIG-939: Fix checkstyle ivy configuration (gkesavan)

Release 0.3.0 - Unreleased

INCOMPATIBLE CHANGES

IMPROVEMENTS

PIG-817: documentation update (chandec via olgan)

PIG-830: Add RegExLoader and apache log utils to piggybank (dvryaboy via gates)

PIG-831: Turned off reporting of records and bytes written for multi-store queries as the returned results are confusing and wrong.
(gates)

PIG-813: documentation updates (chandec via olgan)

PIG-825: PIG_HADOOP_VERSION should be set to 18 (dvryaboy via gates)

PIG-795: support for SAMPLE command (ericg via olgan)

PIG-619: Create one InputSplit even when the input file is zero length so that hadoop runs maps and creates output for the next job (gates)

PIG-697: Proposed improvements to pig's optimizer (sms)

PIG-700: To automate the pig patch test process (gkesavan via sms)

PIG-712: Added utility functions to create schemas for tuples and bags (zjffdu via gates)

PIG-652: Adapt changes in store interface to multi-query changes (hagleitn via gates)

PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create empty bags (pradeepkth)

PIG-741: Allow limit to be nested in a foreach.

PIG-627: multiquery support phase 3 (hagleitn and Richard Ding via olgan)

PIG-743: To implement clover (gkesavan)

PIG-701: Implement IVY for resolving pig dependencies (gkesavan)

PIG-626: Add access to hadoop counters (shubhamc via gates)

PIG-627: multiquery support phase 1 and phase 2 (hagleitn and Richard Ding via pradeepkth)

BUG FIXES

PIG-846: MultiQuery optimization in some cases has an issue when there is a split in the map plan (pradeepkth)

PIG-835: Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type) (pradeepkth)

PIG-839: incorrect return codes on failure when using -f or -e flags (hagleitn via sms)

PIG-796: support conversion from numeric types to chararray (Ashutosh Chauhan via pradeepkth)

PIG-564: problem with parameter substitution and special characters (olgan)

PIG-802: PERFORMANCE: not creating bags for ORDER BY (serakesh via olgan)

PIG-816: PigStorage() does not accept Unicode characters in its constructor (pradeepkth)

PIG-818: Explain doesn't handle PODemux properly (hagleitn via olgan)

PIG-819: run -param -param; is a valid grunt command (milindb via olgan)

PIG-656: Use of eval or any other keyword in
the package hierarchy of a UDF causes parse exception (milindb via sms)

PIG-814: Make Binstorage more robust when data contains record markers (pradeepkth)

PIG-811: Globs with "?" in the pattern are broken in local mode (hagleitn via olgan)

PIG-810: Fixed NPE in PigStats (gates)

PIG-804: problem with lineage with double map redirection (pradeepkth)

PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs "FileSystem closed" error on large input (pradeepkth)

PIG-693: Parameter to UDF which is an alias returned in another UDF in nested foreach causes incorrect results (thejas via sms)

PIG-725: javadoc: warning - Multiple sources of package comments found for package "org.apache.commons.logging" (gkesavan via sms)

PIG-745: Add DataType.toString() to force basic types to chararray, useful for UDFs that want to handle all simple types as strings (ciemo via gates)

PIG-514: COUNT returns no results as a result of two filter statements in FOREACH (pradeepkth)

PIG-789: Fix dump and illustrate to work with new multi-query feature (hagleitn via gates)

PIG-774: Pig does not handle Chinese characters (in both the parameter substitution using -param_file or embedded in the Pig script) correctly (daijy)

PIG-800: Fix distinct and order in local mode to not go into an infinite loop (gates)

PIG-806: to remove author tags in the pig source code (sms)

PIG-799: Unit tests on windows are failing after multiquery commit (daijy)

PIG-781: Error reporting for failed MR jobs (hagleitn via olgan)

Release 0.2.0

INCOMPATIBLE CHANGES

PIG-157: Add types and rework execution pipeline (gates)

PIG-458: integration with Hadoop 18 (olgan)

NEW FEATURES

PIG-139: command line editing (daijy via olgan)

PIG-554: Added fragment replicate map side join (shravanmn via pkamath and gates)

PIG-535: added rmf command

PIG-704: Added ALIASES command that shows all currently defined ALIASES.
Changed semantics of DEFINE to define last used alias if no argument is given (ericg via gates)

PIG-713: Added alias completion as part of tab completion in grunt (ericg via gates)

IMPROVEMENTS

PIG-270: proper line number for parse errors (daijy via olgan)

PIG-367: convenience function for UDFs to name schema

PIG-443: Illustrate for the Types branch (shubhamc via olgan)

PIG-599: Added buffering to BufferedPositionedInputStream (gates)

PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth via olgan)

PIG-628: misc performance improvements (pradeepkth via olgan)

PIG-589: error handling, phase 1-2 (sms via olgan)

PIG-590: error handling, phase 3 (sms)

PIG-591: error handling, phase 4 (sms)

PIG-545: PERFORMANCE: Sampler for order bys does not produce a good distribution (pradeepkth)

PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan)

PIG-636: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner (pradeepkth)

PIG-563: support for multiple combiner invocations (pradeepkth via olgan)

PIG-465: performance improvement - removing keys from the value (pradeepkth via olgan)

PIG-450: PERFORMANCE: Distinct should make use of combiner to remove duplicate values from keys.
(gates) PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth via gates) BUG FIXES PIG-294: string comparator unit tests (sms via pi_song) PIG-258: cleaning up directories on failure (daijy via olgan) PIG-363: fix for describe to produce schema name PIG-368: making JobConf available to Load/Store UDFs PIG-311: cross is broken PIG-369: support for filter UDFs PIG-375: support for implicit split PIG-301: fix for order by descending PIG-378: fix for GENERATE + LIMIT PIG-362: don't push limit above generate with flatten PIG-381: bincond does not handle null data PIG-382: bincond throws typecast exception PIG-352: java.lang.ClassCastException when invalid field is accessed PIG-329: TestStoreOld, 2 unit tests were broken PIG-353: parsing of complex types PIG-392: error handling with multiple MRjobs PIG-397: code defaults to single reducer PIG-373: unconnected load causes problem, PIG-413: problem with float sum PIG-398: Expressions not allowed inside foreach (sms via olgan) PIG-418: divide by 0 problem PIG-402: order by with user comparator (shravanmn via olgan) PIG-415: problem with comparators (shravanmn via olgan) PIG-422: cross is broken (shravanmn via olgan) PIG-407: need to clone operators (pradeepkth via olgan) PIG-428: TypeCastInserter does not replace projects in inner plans correctly (pradeepkth vi olgan) PIG-421: error with complex nested plan (sms via olgan) PIG-429: Self join wth implicit split has the join output in wrong order (pradeepkth via olgan) PIG-434: short-circuit AND and OR (pradeepkth viia olgan) PIG-333: allowing no parethesis with single column alias with flatten (sms via olgan) PIG-426: Adding result of two UDFs gives a syntax error PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan) PIG-436: alias is lost when single column is flattened (pradeepkth via olgan) PIG-364: Limit return incorrect records when we use multiple reducer (daijy via olgan) PIG-439: disallow alias renaming (pradeepkth via olgan) 
PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth via olgan) PIG-442: Disambiguated alias after a foreach flatten is not accessible a couple of statements after the foreach (sms via olgan) PIG-424: nested foreach with flatten and agg gives an error (sms via olgan) PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD connection is fully established (olgan) PIG-430: Projections in nested filter and inside foreach do not work (sms via olgan) PIG-445: Null Pointer Exceptions in the mappers leading to lot of retries (shravanmn via olgan) PIG-444: job.jar is left behined (pradeepkth via olgan) PIG-447: improved error messages (pradeepkth via olgan) PIG-448: explain broken after load with types (pradeepkth via olgan) PIG-380: invalid schema for databag constant (sms via olgan) PIG-451: If an field is part of group followed by flatten, then referring to it causes a parse error (pradeepkth via olgan) PIG-455: "group" alias is lost after a flatten(group) (pradeepkth vi olgan) PIG-459: increased sleep time before checking for job progress PIG-462: LIMIT N should create one output file with N rows (shravanmn via olgan) PIG-376: set job name (olgan) PIG-463: POCast changes (pradeepkth via olgan) PIG-427: casting input to UDFs PIG-437: as in alias names causing problems (sms via olgan) PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan) PIG-470: TextLoader should produce bytearrays (sms via olgan) PIG-335: lineage (sms vi olgan) PIG-464: bag schema definition (pradeepkth via olgan) PIG-457: report 100% on successful jobs only (shravanmn via olgan) PIG-471: ignoring status errors from hadoop (pradeepkth via olgan) PIG-489: (*) processing (sms via olgan) PIG-475: missing heartbeats (shravanmn via olgan) PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan) PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan) PIG-501: Make branches/types work under cygwin (daijy via olgan) 
PIG-504: cleanup illustrate not to produce cn= (shubhamc via olgan)
PIG-469: make sure that describe says "int" not "integer" (sms via olgan)
PIG-495: projecting of bags only gives 1 field (olgan)
PIG-500: Load Func for POCast is not being set in some cases (sms via olgan)
PIG-499: parser issue with as (sms via olgan)
PIG-507: permission error not reported (pradeepkth via olgan)
PIG-508: problem with double joins (pradeepkth via olgan)
PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan)
PIG-505: working with map elements (sms via olgan)
PIG-517: load function with parameters does not work with cast (pradeepkth via olgan)
PIG-525: make sure cast for udf parameters works (olgan)
PIG-512: Expressions in foreach lead to errors (sms via olgan)
PIG-528: use UDF return in schema computation (sms via olgan)
PIG-527: allow PigStorage to write out complex output (sms via olgan)
PIG-537: Failure in Hadoop map collect stage due to type mismatch in the keys used in cogroup (pradeepkth via olgan)
PIG-538: support for null constants (pradeepkth via olgan)
PIG-385: more null handling (pradeepkth via olgan)
PIG-546: FilterFunc calls empty constructor when it should be calling parameterized constructor (sms via olgan)
PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via olgan)
PIG-501: make unit tests run under windows (daijy via olgan)
PIG-543: Restore local mode to truly run locally instead of using map reduce. (shubhamc via gates)
PIG-556: Changed FindQuantiles to report progress. Fixed issue with null reporter being passed to EvalFuncs. (gates)
PIG-6: Add load support from hbase (hustlmsp via gates)
PIG-522: make negation work (pradeepkth via olgan)
PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple error (pradeepkth via olgan)
PIG-572: A PigServer.registerScript() method, which lets a client programmatically register a Pig Script.
(shubhamc via gates)
PIG-570: problems with handling bzip data (breed via olgan)
PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan)
PIG-623: Fix spelling errors in output messages (tomwhite via sms)
PIG-622: Include pig executable in distribution (tomwhite via sms)
PIG-615: Wrong number of jobs with limit (shravanmn via sms)
PIG-635: POCast.java has incorrect formatting (sms)
PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext() gives a null pointer exception (pradeepkth)
PIG-632: Improved error message for binary operators (sms)
PIG-636: Performance improvement: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner (pradeepkth)
PIG-631: 4 Unit test failures on Windows (daijy)
PIG-645: Streaming is broken with the latest trunk (pradeepkth)
PIG-646: Distinct UDF should report progress (sms)
PIG-647: memory size passed on pig command line does not get propagated to JobConf (sms)
PIG-648: BinStorage fails when it finds markers unexpectedly in the data (pradeepkth)
PIG-649: RandomSampleLoader does not handle skipping correctly in getNext() (pradeepkth)
PIG-560: UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms)
PIG-642: Limit after FRJ causes problems (daijy)
PIG-637: Limit broken after order by in the local mode (shubhamc via olgan)
PIG-553: EvalFunc.finish() not getting called (shravanmn via sms)
PIG-654: Optimize build.xml (daijy)
PIG-574: allowing to run scripts from within grunt shell (hagleitn via olgan)
PIG-665: Map key type not correctly set (for use when key is null) when map plan does not have localrearrange (pradeepkth)
PIG-590: error handling on the backend (sms via olgan)
PIG-590: error handling on the backend (sms)
PIG-658: Data type long : When 'L' or 'l' is included with data (123L or 123l) load produces null value.
Also the case with Float (thejas via sms)
PIG-591: Error handling phase four (sms via pradeepkth)
PIG-664: Semantics of * is not consistent (sms)
PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms)
PIG-655: Comparison of schemas of bincond operands is flawed (sms via pradeepkth)
PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth via sms)
PIG-577: outer join query loses name information (sms via pradeepkth)
PIG-690: UNION doesn't work in the latest code (pradeepkth via sms)
PIG-544: Utf8StorageConverter.java does not always produce NULLs when data is malformed (thejas via sms)
PIG-532: Casting a field removes its alias. (thejas via sms)
PIG-705: Pig should display a better error message when backend error messages cannot be parsed (sms)
PIG-650: pig should look for and use the pig specific 'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the HOD case (sms)
PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan)
PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan)
PIG-708: implement releaseaudit target to use rats on pig (gkesavan via olgan)
PIG-703: user documentation (chandec via olgan)
PIG-711: Implement checkstyle for pig (gkesavan via olgan)
PIG-715: doc updates (chandec via olgan)
PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates)
PIG-692: When running a job from a script, use the name of that script as the default name for the job (vzaliva via gates)
PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan)
PIG-720: further doc cleanup (gkesavan via olgan)

Release 0.1.1 - 2008-12-04

INCOMPATIBLE CHANGES

NEW FEATURES

IMPROVEMENTS

PIG-253: integration with hadoop-18

BUG FIXES

PIG-342: Fix DistinctDataBag to recalculate size after it has spilled.
(bdimcheff via gates)

Release 0.1.0 - 2008-09-11

INCOMPATIBLE CHANGES

PIG-123: requires escape of '\' in chars and string

NEW FEATURES

PIG-20: Added custom comparator functions for order by (phunt via gates)
PIG-94: Streaming implementation (arunc via olgan)
PIG-58: parameter substitution
PIG-55: added custom splitter (groves via olgan)
PIG-59: Add a new ILLUSTRATE command (shubhamc via gates)
PIG-256: Added variable argument support for UDFs (pi_song)

IMPROVEMENTS

PIG-8: added binary comparator (olgan)
PIG-11: Add capability to search for jar file to register. (antmagna via olgan)
PIG-7: Added use of combiner in some restricted cases. (gates)
PIG-47: Added methods to DataMap to provide access to its content
PIG-30: Rewrote DataBags to better handle decisions of when to spill to disk and to spill more intelligently. (gates)
PIG-12: Added time stamps to log4j messages (phunt via gates)
PIG-44: Added adaptive decision of the number of records to hold in memory before spilling (utkarsh)
PIG-56: Made DataBag implement Iterable. (groves via gates)
PIG-39: created more efficient version of read (spullara via olgan)
PIG-32: Abstraction layer (olgan)
PIG-83: Change everything except grunt and Main (PigServer on down) to use common logging abstraction instead of log4j. By default in grunt, log4j still used as logging layer. Also converted all System.out/err.println statements to use logging instead. (francisoud via gates)
PIG-13: adding version to the system (joa23 via olgan)
PIG-113: Make explain output more understandable (pi_song via gates)
PIG-120: Support map reduce in local mode. To do this user needs to specify execution type as mapreduce and cluster name as local (joa23 via gates)
PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates)
PIG-111: Reworked configuration to be settable via properties. (joa23, pi_song, oae via gates)

BUG FIXES

PIG-24: Files that were incorrectly placed under test/reports have been removed.
ant clean now cleans test/reports. (milindb via gates)
PIG-25: com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@)
PIG-23: Made pig work with java 1.5. (milindb via gates)
PIG-17: integrated with Hadoop 0.15 (olgan@)
PIG-33: Help was commented out - uncommented (olgan)
PIG-31: second half of concurrent mode problem addressed (olgan)
PIG-14: added heartbeat functionality (olgan)
PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release
PIG-29: fixed bag factory to be properly initialized (utkarsh)
PIG-43: fixed problem where using the combiner prevented a pig alias from being evaluated more than once. (gates)
PIG-45: Fixed pig.pl to not assume hodrc file is named the same as cluster name (gates)
PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples instead of Tuples, causing Reducer to crash in some cases.
PIG-41: Added patterns to svn:ignore
PIG-51: Fixed combiner in the presence of flattening
PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the comparator function instead of Class.forName. (gates)
PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@)
PIG-77: Added eclipse specific files to svn:ignore
PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates)
PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates)
PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arunc via olgan)
PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default path. Also fix it to not die if pigclient.conf is missing.
(craigm via gates)
PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill files when they are done spilling (contributions by craigm, breed, and gates, committed by gates)
PIG-95: Remove System.exit() statements from inside pig (joa23 via gates)
PIG-65: convert tabs to spaces (groves via olgan)
PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when more than one bag is involved (gates)
PIG-92: Fix NullPointerException in PigContext due to uninitialized conf reference. (francisoud via gates)
PIG-80: In a number of places stack trace information was being lost by an exception being caught, and a different exception then thrown. All those locations have been changed so that the new exception now wraps the old. (francisoud via gates)
PIG-84: Converted printStackTrace calls to calls to the logger. (francisoud via gates)
PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates)
PIG-99: Fix to make unit tests not run out of memory. (francisoud via gates)
PIG-107: enabled several tests. (francisoud via olgan)
PIG-46: abort processing on error for non-interactive mode (olston via olgan)
PIG-109: improved exception handling (oae via olgan)
PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can be run w/o access to a hadoop cluster. (xuzh via gates)
PIG-68: improvements to build.xml (joa23 via olgan)
PIG-110: Replaced code accidentally merged out in PIG-32 fix that handled flattening the combiner case. (gates and oae)
PIG-213: Remove non-static references to logger from data bags and tuples, as it causes significant overhead (vgeschel via gates)
PIG-284: target for building source jar (oae via olgan)