/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

Pig Change Log

Release 0.10.1

INCOMPATIBLE CHANGES

IMPROVEMENTS

PIG-2907: Publish pig jars for Hadoop2/23 to maven (rohini)
PIG-3019: Need a target in build.xml for source releases (gates)
PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
PIG-2852: Update documentation regarding parallel local mode execution (cheolsoo via jcoveney)
PIG-2712: Pig does not call OutputCommitter.abortJob() on the underlying OutputFormat (rohini via gates)
PIG-2727: PigStorage Source tagging does not need pig.splitCombination to be turned off (prkommireddi via dvryaboy)
PIG-2711: e2e harness: cache benchmark results between test runs (thw via daijy)
PIG-2680: TOBAG output schema reporting (andy schlaikjer via jcoveney)
PIG-2650: Convenience mock Loader and Storer to simplify unit testing of Pig scripts (julien)

BUG FIXES

PIG-3107: bin and autocomplete are missing in src release (daijy)
PIG-3106: Missing license header in several java file (daijy)
PIG-3099: Pig unit test fixes for TestGrunt(1), TestStore(2), TestEmptyInputDir(3) (vikram.dixit via daijy)
PIG-3035: With latest version of hadoop23 pig does not return the correct
exception stack trace from backend (rohini)
PIG-2953: "which" utility does not exist on Windows (daijy)
PIG-2960: Increase the timeout for unit test (daijy)
PIG-2958: Pig tests do not appear to have a logger attached (daijy)
PIG-2942: DevTests, TestLoad has a false failure on Windows (jgordon via daijy)
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2801: grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens (jgordon via daijy)
PIG-2800: pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (jgordon via azaroth)
PIG-2798: pig streaming tests assume interpreters are auto-resolved (jgordon via daijy)
PIG-2797: Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (jgordon via daijy)
PIG-2796: Local temporary paths are not always valid HDFS path names (jgordon via daijy)
PIG-2795: Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path (jgordon via daijy)
PIG-2940: HBaseStorage store fails in secure cluster (cheolsoo via daijy)
PIG-2821: HBaseStorage should work with secure hbase (rohini via daijy)
PIG-2890: Revert PIG-2578 (dvryaboy)
PIG-2859: Fix few e2e test failures (rohini via daijy)
PIG-2729: Macro expansion does not use pig.import.search.path - UnitTest borked (johannesch via daijy)
PIG-2791: Pig does not work with Namenode Federation (rohini via daijy)
PIG-2783: Fix Iterator_1 e2e test for Hadoop 23 (rohini via daijy)
PIG-2761: With hadoop23 importing modules inside python script does not work (rohini via daijy)
PIG-2759: Typo in document "Built In Functions" (daijy)
PIG-2745: Pig e2e test RubyUDFs fails in MR mode when running from tarball (cheolsoo via daijy)
PIG-2741: Python script throws a NameError: name 'Configuration' is not defined in case cache dir is not created (knoguchi via daijy)
PIG-2669: Pig release should include pig-default.properties after rebuild (daijy)
PIG-2739: PyList should map to Bag automatically in Jython (daijy)
PIG-2730: TFileStorage getStatistics incorrectly throws an exception instead of returning null (traviscrawford via daijy)
PIG-2717: Tuple field mangled during flattening (daijy)
PIG-2721: Wrong output generated while loading bags as input (knoguchi via daijy)
PIG-2912: Pig should clone JobConf while creating JobContextImpl and TaskAttemptContextImpl in Hadoop23 (rohini via daijy)
PIG-2775: Register jar does not go to classpath in some cases (daijy)

Release 0.10.0

INCOMPATIBLE CHANGES

IMPROVEMENTS

PIG-2685: Fix error in EvalFunc ctor when implementing Algebraic UDF whose return type is parameterized (andy schlaikjer via jcoveney)
PIG-2541: Automatic record provenance (source tagging) for PigStorage (prkommireddi via daijy)
PIG-2601: Additional document for 0.10 (daijy)
PIG-2317: Ruby/Jruby UDFs (jcoveney via daijy)
PIG-1270: Push limit into loader (daijy)
PIG-2604: Pig should print its build info at runtime (traviscrawford via dvryaboy)
PIG-2589: Additional e2e test for 0.10 new features (daijy)
PIG-2182: Add more append support to DataByteArray (gsingers via daijy)
PIG-438: Handle realiasing of existing Alias (A=B;) (daijy)
PIG-2548: Support for providing parameters to python script (daijy)
PIG-2518: Add ability to clean ivy cache in build.xml (daijy)
PIG-2533: Pig MR job exceptions masked on frontend (traviscrawford via dvryaboy)
PIG-2515: [piggybank] Make CustomFormatToISO return null on Exception in parsing dates (rjurney via dvryaboy)
PIG-2503: Make @MonitoredUDF inherited (dvryaboy)
PIG-2453: Fetching schema can be very slow for multi-thousand file LOADs (dvryaboy)
PIG-2311: STRSPLIT needs to allow bytearray arguments (xuting via olgan)
PIG-2300: Pig Docs - release 0.10.0 (and 0.9.1) (chandec via daijy)
PIG-2230: Improved error message for invalid parameter format (xuitingz via olgan)
PIG-2332:
JsonLoader/JsonStorage (daijy)
PIG-2328: Add builtin UDFs for building and using bloom filters (gates)
PIG-2334: Set default number of reducers for S3N filesystem (ddaniels888 via daijy)
PIG-1387: Syntactical Sugar for PIG-1385 (azaroth)
PIG-2305: Pig should log the split locations in task logs (vivekp via thejas)
PIG-2293: Pig should support a more efficient merge join against data sources that natively support point lookups or where the join is against large, sparse tables (aklish via daijy)
PIG-2287: add test cases for limit and sample that use expressions with constants only (no scalar variables) (thejas via gates)
PIG-2092: Missing sh command from Grunt shell (olgan)
PIG-2163: Improve nested cross to stream one relation (zjshen via daijy)
PIG-2249: Enable pig e2e testing on EC2 (gates)
PIG-2256: Upgrade Avro dependency to 1.5.3 (tucu00 via dvryaboy)
PIG-604: Kill the Pig job should kill all associated Hadoop Jobs (daijy)
PIG-2096: End to end tests for new Macro feature (gates)
PIG-2242: Allow the delimiter to be specified when calling TOKENIZE (markroddy via hashutosh)
PIG-2240: Allow any compression codec to be specified in AvroStorage (tomwhite via dvryaboy)
PIG-2229: Pig end-to-end tests should test local mode as well as mr mode (gates)
PIG-2235: Several files in e2e tests aren't being run (gates)
PIG-2196: Test harness should be independent of Pig (hashutosh) -- Missed few changes in last commit.
PIG-1429: Add Boolean Data Type to Pig (zjshen via daijy)
PIG-2218: Pig end-to-end tests should be accessible from top level build.xml (gates)
PIG-2176: add logical plan assumption checker (thejas)
PIG-1631: Support to 2 level nested foreach (aniket486 via daijy)
PIG-2191: Reduce amount of log spam generated by UDFs (dvryaboy)
PIG-2200: Piggybank cannot be built from the Git mirror (dvryaboy)
PIG-2168: CubeDimensions UDF (dvryaboy)
PIG-2189: e2e test harness needs to use Pig as a source of truth (gates via daijy)
PIG-1904: Default split destination (azaroth via thejas)
PIG-2143: Make PigStorage optionally store schema; improve docs. (dvryaboy)
PIG-1973: UDFContext.getUDFContext usage of ThreadLocal pattern is not typical (woody via thejas)
PIG-2053: PigInputFormat uses class.isAssignableFrom() where instanceof is more appropriate (woody via thejas)
PIG-2161: TOTUPLE should use no-copy tuple creation (dvryaboy)
PIG-1946: HBaseStorage constructor syntax is error prone (billgraham via dvryaboy)
PIG-2001: DefaultTuple(List) constructor is inefficient, causes List.size() System.arraycopy() calls (though they are 0 byte copies), DefaultTuple(int) constructor is a bit misleading wrt time complexity (woody via thejas)
PIG-1916: Nested cross (zjshen via daijy)
PIG-2121: e2e test harness should use ant instead of make (gates)
PIG-2142: Allow registering multiple jars from DFS via single statement (rangadi via dvryaboy)
PIG-1926: Sample/Limit should take scalar (azaroth via thejas)
PIG-1950: e2e test harness needs to be able to compare to previous version of Pig (gates)
PIG-536: the shell script 'pig' does not work if PIG_HOME has the word 'hadoop' in its directory (miguno via olgan)
PIG-2108: e2e test harness needs to be able to mark certain tests as ignored (gates)
PIG-1825: ability to turn off the write ahead log for pig's HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via
olgan)
PIG-1824: Support import modules in Jython UDF (woody via rding)
PIG-1994: e2e test harness deployment implementation for existing cluster (gates)
PIG-2036: [piggybank] Set header delimiter in PigStorageSchema (mmoeller via dvryaboy)
PIG-1949: e2e test harness should use bin/pig rather than calling java directly (gates)
PIG-2026: e2e tests in eclipse classpath (azaroth via hashutosh)
PIG-2024: Incorrect jar paths in .classpath template for eclipse (azaroth via hashutosh)

OPTIMIZATIONS

PIG-2011: Speed up TestTypedMap.java (dvryaboy)
PIG-2228: support partial aggregation in map task (thejas)

BUG FIXES

PIG-2195: AvroStorage fails to STORE when LOADing via PigStorage (billgraham via gates)
PIG-2202: AvroStorage doesn't work with Avro 1.5.1 (billgraham via gates)
PIG-2578: Multiple Store-commands mess up mapred.output.dir. (daijy)
PIG-2623: Support S3 paths for registering UDFs (nshkrob via daijy)
PIG-2618: e2e local fails to build
PIG-2612: e2e harness: should not require PH_OLD_CLUSTER_CONF (thw via daijy)
PIG-2505: AvroStorage won't read any file not ending in .avro (russell.jurney via daijy)
PIG-2585: Enable ignored e2e test cases (daijy)
PIG-2563: IndexOutOfBoundsException: while projecting fields from a bag (daijy)
PIG-2411: AvroStorage UDF in PiggyBank fails to STORE a bag of single-field tuples as Avro arrays (russell.jurney via daijy)
PIG-2565: Support IMPORT for macros stored in S3 Buckets (daijy)
PIG-2570: LimitOptimizer fails with dynamic LIMIT argument (daijy)
PIG-2543: PigStats.isSuccessful returns false if embedded pig script has sh commands (daijy)
PIG-2509: Util.getSchemaFromString fails with java.lang.NullPointerException when a tuple in a bag has no name (as when used in MongoStorage UDF) (jcoveney via daijy)
PIG-2559: Embedded pig in python; invoking sys.exit(0) causes script failure (vivekp via daijy)
PIG-2489: Input Path Globbing{} not
working with PigStorageSchema or PigStorage('\t', '-schema') (daijy)
PIG-2484: Fix several e2e test failures/aborts for 23 (daijy)
PIG-2400: Document hash based aggregation support (chandec via daijy)
PIG-2444: Remove the Zebra *.xml documentation files from the TRUNK and Branch-10 (chandec via daijy)
PIG-2430: An EvalFunc which overrides getArgToFuncMapping with FuncSpec with constructor arguments is not properly instantiated with said arguments (jcoveney via thejas)
PIG-2457: JsonLoaderStorage tests is broken for e2e (daijy)
PIG-2426: ProgressableReporter.progress(String msg) is an empty function (vivekp via daijy)
PIG-2363: _logs for streaming commands bug in new parser (vivekp via daijy)
PIG-2425: Aggregate Warning does not work as expected on Embedding Pig in Java 0.9.1 (prkommireddi via thejas)
PIG-2331: BinStorage in LOAD statement failing when input has curly braces (xutingz via thejas)
PIG-2391: Bzip_2 test is broken (xutingz via daijy)
PIG-2358: JobStats.getHadoopCounters() is never set and always returns null (xutingz via daijy)
PIG-2384: Generic Invokers should use PigContext to resolve classes (dvryaboy)
PIG-2184: Not able to provide positional reference to macro invocations (xutingz via daijy)
PIG-2209: JsonMetadata fails to find schema for glob paths (daijy)
PIG-2352: e2e test harness' use of environment variables causes unintended effects between tests (gates)
PIG-2165: Need a way to deal with params and param_file in embedded pig in python (daijy)
PIG-2313: NPE in ILLUSTRATE trying to get StatusReporter in STORE (daijy)
PIG-2335: bin/pig does not work with bash 3.0 (azaroth)
PIG-2275: NullPointerException from ILLUSTRATE (daijy)
PIG-2290: TOBAG wraps tuple parameters in another tuple (ryan.hoegg via thejas)
PIG-2288: Pig 0.9 error message not useful as compared to 0.8 in case of group by (vivekp via thejas)
PIG-2309: Keyword 'NOT' is wrongly treated as a UDF in split statement (vivekp via thejas)
PIG-2273: Pig.compileFromFile in embedded
python fails when pig script starts with a comment (ddaniels888 via gates)
PIG-2278: Wrong version numbers for libraries in eclipse template classpath (azaroth)
PIG-2115: Fix Pig HBaseStorage configuration and setup issues (gbowyer@fastmail.co.uk via dvryaboy)
PIG-2232: "declare" document contains a typo (daijy)
PIG-2185: NullPointerException while Accessing Empty Bag in FOREACH { FILTER } (daijy)
PIG-2227: Wrong jars copied into lib directory in e2e tests when invoked from top level (gates)
PIG-2219: Pig tests fail if ${user.home}/pigtest/conf does not already exist (cwsteinbach via gates)
PIG-2215: Newlines in function arguments still cause exceptions to be thrown (awarring via gates)
PIG-2214: InternalSortedBag two-arg constructor doesn't pass bagCount (sallen via gates)
PIG-2174: HBaseStorage column filters miss some fields (billgraham via dvryaboy)
PIG-2090: re-enable TestGrunt test cases (thejas)
PIG-2181: Improvement : for error message when describe misses alias (vivekp via daijy)
PIG-2124: Script never ending when joining from the same source (daijy)
PIG-2170: NPE thrown during illustrate (thejas)
PIG-2186: PigStorage new warnings about missing schema file can be confusing (thejas)
PIG-2179: tests in TestLoad are failing (thejas)
PIG-2146: POStore.getSchema() returns null because of which PigOutputCommitter is not storing schema while cleanup (thejas)
PIG-2027: NPE if Pig doesn't have permission for log file (daijy)
PIG-2171: TestScriptLanguage is broken on trunk (daijy and thejas)
PIG-2162: bin/pig should not modify user args (rangadi via thejas)
PIG-2060: Fix errors in pig grammars reported by ANTLRWorks (azaroth via thejas)
PIG-2156: Limit/Sample with variable does not work if the expression starts with an integer/double (azaroth via thejas)
PIG-2130: Piggybank:MultiStorage is not compressing output files (vivekp via daijy)
PIG-2147: Support nested tags for XMLLoader (vivekp via daijy)
PIG-2110: NullPointerException in
piggybank.evaluation.util.apachelogparser.SearchTermExtractor (dale_jin via daijy)
PIG-2144: ClassCastException when using IsEmpty(DIFF()) (thejas)
PIG-2136: Implementation of Sample should use LessThanExpression instead of LessThanEqualExpression (azaroth via thejas)
PIG-2129: NOTICE file needs updates (gates)
PIG-2131: Add back test for PIG-1769 (qwertymaniac via gates)
PIG-2112: ResourceSchema.toString does not properly handle maps in the schema (gates)
PIG-1702: Streaming debug output outputs null input-split information (awarring via daijy)
PIG-2109: Ant build continues even if the parser classes fail to be generated. (zjshen via daijy)
PIG-2071: casting numeric type to chararray during schema merge for union is inconsistent with other schema merge cases (thejas)
PIG-2044: Pattern match bug in org.apache.pig.newplan.optimizer.Rule (knoguchi via daijy)
PIG-2048: Add zookeeper to pig jar (gbowyer via gates)
PIG-2025: org.apache.pig.test.udf.evalfunc.TOMAP is missing package declaration (azaroth via gates)
PIG-2019: smoketest-jar target has to depend on pigunit-jar to guarantee inclusion of test classes (cos via gates)

Release 0.9.3

IMPROVEMENT

PIG-2766: Pig-HCat Usability (vikram.dixit via daijy)
PIG-2619: HBaseStorage constructs a Scan with cacheBlocks = false (andy lindeman via jcoveney)
PIG-2590: running ant tar and rpm targets on same copy of pig source results in problems (thejas)

BUG FIXES

PIG-2944: ivysettings.xml does not let you override .m2/repository (raluri via daijy)
PIG-2666: LoadFunc.setLocation() is not called when pig script only has Order By (daijy)
PIG-2671: e2e harness: Reference local test path via :LOCALTESTPATH: (thw via daijy)
PIG-2530: Reusing alias name in nested foreach causes incorrect results (daijy)
PIG-2540: [piggybank] AvroStorage can't read schema on amazon s3 in elastic mapreduce (rjurney via jcoveney)
PIG-2588: e2e harness: use pig command for cluster deploy (thw via daijy)
PIG-2642: StoreMetadata.storeSchema can't
access files in the output directory (Hadoop 0.23) (thw via daijy)
PIG-2621: Documentation inaccurate regarding Pig Properties in trunk (prkommireddi via daijy)
PIG-2550: Custom tuple results in "Unexpected datatype 110 while reading tuplefrom binary file" while spilling (daijy)
PIG-2442: Multiple Stores in pig streaming causes infinite waiting (daijy)
PIG-2609: e2e harness: make hdfs base path configurable (outside default.conf) (thw via daijy)
PIG-2576: Change in behavior for UDFContext.getUDFContext().getJobConf() in front-end (thw via daijy)
PIG-2532: Registered classes fail deserialization in frontend (traviscrawford via julien)
PIG-2572: e2e harness deploy fails when using pig that does not bundle hadoop (thw via daijy)
PIG-2568: PigOutputCommitter hide exception in commitJob (daijy)
PIG-2564: Build fails - Hadoop 0.23.1-SNAPSHOT no longer available (thw via daijy)
PIG-2535: Bug in new logical plan results in no output for join (daijy)
PIG-2534: Pig generating infinite map outputs (daijy)
PIG-2508: PIG can unpredictably ignore deprecated Hadoop config options (daijy)
PIG-2493: UNION causes casting issues (vivekp via daijy)
PIG-2497: Order of execution of fs, store and sh commands in Pig is not maintained (daijy)

Release 0.9.2

IMPROVEMENTS

PIG-2468: Speed up TestBuiltin (dvryaboy)
PIG-2467: Speed up TestCommit (dvryaboy)
PIG-2460: Use guava 11 instead of r06 (dvryaboy)
PIG-2125: Make Pig work with hadoop .NEXT (daijy)
PIG-2471: Pig Requirements Hadoop (chandec via daijy)
PIG-2431: Upgrade bundled hadoop version to 1.0.0 (daijy)
PIG-2447: piggybank: get hive dependency from maven (thw via azaroth)
PIG-2347: Fix Pig Unit tests for hadoop 23 (daijy)
PIG-2128: Generating the jar file takes a lot of time and is unnecessary when running Pig local mode (julien)

BUG FIXES

PIG-2055: inconsistent behavior in parser generated during build (thejas)
PIG-2119: DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan (daijy)
PIG-2120:
UDFContext.getClientSystemProps() does not respect pig.properties (dvryaboy)
PIG-2172: Fix test failure for ant 1.8.x (daijy)
PIG-2379: Bug in Schema.getPigSchema(ResourceSchema rSchema) improperly adds two level access (jcoveney via dvryaboy)
PIG-2427: getSchemaFromString throws away the name of the tuple that is in a bag (jcoveney via dvryaboy)
PIG-2428: In pig9, can't have limit(order by) without getting a null error (jcoveney via daijy)
PIG-2477: TestBuiltin testLFText/testSFPig failing against 23 due to invalid test setup -- InvalidInputException (phunt via daijy)
PIG-2462: getWrappedSplit is incorrectly returning the first split instead of the current split. (arov via daijy)
PIG-2472: piggybank unit tests write directly to /tmp (thw via daijy)
PIG-2413: e2e test should support testing against two cluster (daijy)
PIG-2342: Pig tutorial documentation needs to update about building tutorial (daijy)
PIG-2458: Can't have spaces in parameter substitution (jcoveney via daijy)
PIG-2410: Piggybank does not compile in 23 (daijy)
PIG-2418: rpm release package does not take PIG_CLASSPATH (daijy)
PIG-2291: PigStats.isSuccessful returns false if embedded pig script has dump (xutingz via daijy)
PIG-2415: A fix for 0.23 local mode: put "yarn-default.xml" into the configuration (daijy)
PIG-2402: inIllustrator condition in PigMapReduce is wrong for hadoop 23 (daijy)
PIG-2370: SkewedPartitioner results in Kerberos error (daijy)
PIG-2374: streaming regression with dotNext (daijy)
PIG-2387: BinStorageRecordReader causes negative progress (xutingz via daijy)
PIG-2354: Several fixes for bin/pig (daijy)
PIG-2385: Store statements not getting processed (daijy)
PIG-2320: Error: "projection with nothing to reference" (daijy)
PIG-2346: TypeCastInsert should not insert Foreach if there is no as statement (daijy)
PIG-2339: HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script (daijy)
PIG-2316: Incorrect
results for FILTER *** BY ( *** OR ***) with FilterLogicExpressionSimplifier optimizer turned on (knoguchi via thejas)
PIG-2271: PIG regression in BinStorage/PigStorage in 0.9.1 (thejas)

Release 0.9.1

IMPROVEMENTS

PIG-2284: Add pig-setup-conf.sh script (eyang via daijy)
PIG-2272: e2e test harness should be able to set HADOOP_HOME (gates via daijy)
PIG-2239: Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself (daijy)
PIG-2213: Pig 0.9.1 Documentation (chandec via daijy)
PIG-2221: Couldn't find documentation for ColumnMapKeyPrune optimization rule (chandec via daijy)

BUG FIXES

PIG-2307: Jetty version should be updated in .eclipse.templates/.classpath, pig-template.xml and pig.pom as well (zjshen via daijy)
PIG-2310: bin/pig fail when both pig-0.9.1.jar and pig.jar are in PIG_HOME (daijy)
PIG-1857: Create a package integration project (eyang via daijy)
PIG-2013: Penny gets a null pointer when no properties are set (breed via daijy)
PIG-2102: MonitoredUDF does not work (dvryaboy)
PIG-2152: Null pointer exception while reporting progress (thejas)
PIG-2183: Pig not working with Hadoop 0.20.203.0 (daijy)
PIG-2193: Using HBaseStorage to scan 2 tables in the same Map job produces bad data (rangadi via dvryaboy)
PIG-2199: Penny throws Exception when netty classes are missing (ddaniels888 via daijy)
PIG-2223: error accessing column in output schema of udf having project-star input (thejas)
PIG-2208: Restrict number of PIG generated Hadoop counters (rding via daijy)
PIG-2299: jetty 6.1.14 startup issue causes unit tests to fail in CI (thw via daijy)
PIG-2301: Some more bin/pig, build.xml cleanup for 0.9.1 (daijy)
PIG-2237: LIMIT generates wrong number of records if pig determines no of reducers as more than 1 (daijy)
PIG-2261: Restore support for parenthesis in Pig 0.9 (rding via daijy)
PIG-2238: Pig 0.9 error message not useful as compared to 0.8 (daijy)
PIG-2286: Using COR function in Piggybank results in ERROR 2018:
Internal error. Unable to introduce the combiner for optimization (daijy)
PIG-2270: Put jython.jar in classpath (daijy)
PIG-2274: remove pig deb package dependency on sun-java6-jre (gkesavan via daijy)
PIG-2264: Change conf/log4j.properties to conf/log4j.properties.template (daijy)
PIG-2231: Limit produce wrong number of records after foreach flatten (daijy)

Release 0.9.0 - Unreleased

INCOMPATIBLE CHANGES

PIG-1622: DEFINE streaming options are ill defined and not properly documented (xuefu)
PIG-1680: HBaseStorage should work with HBase 0.90 (gstathis, billgraham, dvryaboy, tlipcon via dvryaboy)
PIG-1745: Disable converting bytes loading from BinStorage (daijy)
PIG-1188: Padding nulls to the input tuple according to input schema (daijy)
PIG-1876: Typed map for Pig (daijy)

IMPROVEMENTS

PIG-1938: support project-range as udf argument (thejas)
PIG-2059: PIG doesn't validate incomplete query in batch mode even if -c option is given (xuefu)
PIG-2062: Script silently ended (xuefu)
PIG-2039: IndexOutOfBoundsException for a case (xuefu)
PIG-2038: Pig fails to parse empty tuple/map/bag constant (xuefu)
PIG-1775: Removal of old logical plan (xuefu)
PIG-1998: Allow macro to return void (rding)
PIG-2003: Using keyword as alias doesn't either emit an error or produce a logical plan (xuefu)
PIG-1981: LoadPushDown.pushProjection should pass alias in addition to position (daijy)
PIG-2006: Regression: NPE when Pig processes an empty script file, fix test case (xuefu)
PIG-2006: Regression: NPE when Pig processes an empty script file (xuefu)
PIG-2007: Parsing error when map key referred directly from udf in nested foreach (xuefu)
PIG-2000: Pig gives incorrect error message dealing with scalar projection (xuefu)
PIG-2002: Regression: Pig gives error "Projection with nothing to reference!"
for a valid query (xuefu)
PIG-1921: Improve error messages in new parser (xuefu)
PIG-1996: Pig new parser fails to recognize PARALLEL keywords in a case (xuefu)
PIG-1612: error reporting: PigException needs to have a way to indicate that its message is appropriate for user (laukik via thejas)
PIG-1782: Add ability to load data by column family in HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1954: Design deployment interface for e2e test harness (gates)
PIG-1881: Need a special interface for Penny (Inspector Gadget) (laukik via gates)
PIG-1947: Incorrect line number is reported during parsing (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1961: Pig prints "null" as file name in case of grammar error (xuefu)
PIG-1956: Pig parser shouldn't log error code 0 (xuefu)
PIG-1957: Pig parser gives misleading error message when the next foreach block has syntactic errors (xuefu)
PIG-1958: Regression: Pig doesn't log type cast warning messages (xuefu)
PIG-1899: Add end to end test harness for Pig (gates)
PIG-1932: GFCross should allow the user to set the DEFAULT_PARALLELISM value (gates)
PIG-1913: Use a file for excluding tests (tomwhite via gates)
PIG-1693: support project-range expression.
(was: There needs to be a way in foreach to indicate "and all the rest of the fields") (thejas)
PIG-1772: Pig 090 Documentation (chandec via daijy)
PIG-1830: Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable (dvryaboy)
PIG-1566: Support globbing for registering jars in pig script (nrai via daijy)
PIG-1886: Add zookeeper jar to list of jars shipped when HBaseStorage used (dvryaboy)
PIG-1874: Make PigServer work in a multithreading environment (rding)
PIG-1889: bin/pig should pick up HBase configuration from HBASE_CONF_DIR
PIG-1794: Javascript support for Pig embedding and UDFs in scripting languages (julien)
PIG-1853: Using ANTLR jars from maven repository (rding)
PIG-1728: more doc updates (chandec via olgan)
PIG-1793: Add macro expansion to Pig Latin (rding)
PIG-847: Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag (daijy)
PIG-1748: Add load/store function AvroStorage for avro data (guolin2001, jghoman via daijy)
PIG-1769: Consistency for HBaseStorage (dvryaboy)
PIG-1786: Move describe/nested describe to new logical plan (daijy)
PIG-1809: addition of TOMAP function (olgan)
PIG-1749: Update Pig parser so that function arguments can contain newline characters (jghoman via daijy)
PIG-1806: Modify embedded Pig API for usability (rding)
PIG-1799: Provide deployable maven artifacts for pigunit and pig smoke tests (cos via gates)
PIG-1728: turing complete docs (chandec via olgan)
PIG-1675: allow PigServer to register pig script from InputStream (zjffdu via dvryaboy)
PIG-1479: Embed Pig in scripting languages (rding)
PIG-946: Combiner optimizer does not optimize when limit follow group, foreach (thejas)
PIG-1277: Pig should give error message when cogroup on tuple keys of different inner type (daijy)
PIG-1755: Clean up duplicated code in PhysicalOperators (dvryaboy)
PIG-750: Use combiner when algebraic UDFs are used in expressions (thejas)
PIG-490: Combiner not used
when group elements referred to in tuple notation instead of flatten. (thejas)
PIG-1768: 09 docs: illustrate (changec via olgan)
PIG-1768: docs reorg (changec via olgan)
PIG-1712: ILLUSTRATE rework (yanz)
PIG-1758: Deep cast of complex type (daijy)
PIG-1728: doc updates (chandec via olgan)
PIG-1752: Enable UDFs to indicate files to load into the Distributed Cache (gates)
PIG-1747: pattern match classes for matching patterns in physical plan (thejas)
PIG-1707: Allow pig build to pull from alternate maven repo to enable building against newer hadoop versions (pradeepkth)
PIG-1618: Switch to new parser generator technology (xuefuz via thejas)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1508: Make 'docs' target (forrest) work with Java 1.6 (cwsteinbach via gates)
PIG-1608: pig should always include pig-default.properties and pig.properties in the pig.jar (nrai via daijy)

OPTIMIZATIONS

PIG-1696: Performance: Use System.arraycopy() instead of manually copying the bytes while reading the data (hashutosh)

BUG FIXES

PIG-1890: Fix piggybank unit test TestAvroStorage (kengoodhope via daijy)
PIG-2008: Cache outputFormat in HBaseStorage (thedatachef via gates)
PIG-2137: SAMPLE should not be pushed above DISTINCT (dvryaboy and thejas)
PIG-2139: LogicalExpressionSimplifier optimizer rule should check if udf is deterministic while checking if they are equal (thejas)
PIG-2140: Usage printed from Main.java gives wrong option for disabling LogicalExpressionSimplifier (thejas)
PIG-2159: New logical plan uses incorrect class for SUM causing for ClassCastException (daijy)
PIG-2106: Fix Zebra unit test TestBasicUnion.testNeg3, TestBasicUnion.testNeg4 (daijy)
PIG-2083: bincond ERROR 1025: Invalid field projection when null is used (thejas)
PIG-2089: Javadoc for ResourceFieldSchema.getSchema() is wrong (daijy)
PIG-2084: pig is running validation for a statement at a time batch mode, instead of running it for whole script (thejas)
PIG-2088: Return alias validation
failed when there is single line comment in the macro (rding)
PIG-2081: Dryrun gives wrong line numbers in error message for scripts containing macro (rding)
PIG-2078: POProject.getNext(DataBag) does not handle null (daijy)
PIG-2029: Inconsistency in Pig Stats reports (rding)
PIG-2070: "Unknown" appears in error message for an error case (thejas)
PIG-2069: LoadFunc jar does not ship to backend in MultiQuery case (rding)
PIG-2076: update documentation, help command with correct default value of pig.cachedbag.memusage (thejas)
PIG-2072: NPE when udf has project-star argument and input schema is null (thejas)
PIG-2075: Bring back TestNewPlanPushUpFilter (daijy)
PIG-1827: When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason (rding)
PIG-2056: Jython error messages should show script name (rding)
PIG-2014: SAMPLE shouldn't be pushed up (dvryaboy)
PIG-2058: Macro missing returns clause doesn't give a good error message (rding)
PIG-2035: Macro expansion doesn't handle multiple expansions of same macro inside another macro (rding)
PIG-2030: Merged join/cogroup does not automatically ship loader (daijy)
PIG-2052: Ship guava.jar to backend (daijy)
PIG-2012: Comments at the beginning of the file throws off line numbers in errors (rding)
PIG-2043: Ship antlr-runtime.jar to backend (daijy)
PIG-2049: Pig should display TokenMgrError message consistently across all parsers (rding)
PIG-2041: Minicluster should make each run independent (daijy)
PIG-2040: Move classloader from QueryParserDriver to PigContext (daijy)
PIG-1999: Macro alias masker should consider schema context (rding)
PIG-1821: UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes) (thejas)
PIG-2028: Speed up multiquery unit tests (rding)
PIG-1990: support casting of complex types with empty inner schema to complex type with non-empty inner schema (thejas)
PIG-2016: -dot option does not work with explain and new logical plan
(daijy) PIG-2018: NPE for co-group with group-by column having complex schema and different load functions for each input (thejas) PIG-2015: Explain writes out logical plan twice (alangates) PIG-2017: consumeMap() fails with EmptyStackException (thedatachef via daijy) PIG-1989: complex type casting should return null on casting failure (daijy) PIG-1826: Unexpected data type -1 found in stream error (daijy) PIG-2004: Incorrect input types passed on to eval function (thejas) PIG-1814: mapred.output.compress in SET statement does not work (daijy) PIG-1976: One more TwoLevelAccess to remove (daijy) PIG-1865: BinStorage/PigStorageSchema cannot load data from a different namenode (daijy) PIG-1910: incorrect schema shown when project-star is used with other projections (daijy) PIG-2005: Discrepancy in the way dry run handles semicolon in macro definition (rding) PIG-1281: Detect org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple type of errors at Compile Type during creation of logical plan (thejas) PIG-1939: order-by statement should support project-range to-end in any position among the sort columns if input schema is known (thejas) PIG-1978: Secondary sort fail when dereferencing two fields inside foreach (daijy) PIG-1962: Wrong alias assigned to store operator (daijy) PIG-1975: Need to provide backward compatibility for legacy LoadCaster (without bytesToMap(bytes, fieldSchema)) (daijy) PIG-1987: -dryrun does not work with set (rding) PIG-1871: Don't throw exception if partition filters cannot be pushed up. 
(rding) PIG-1870: HBaseStorage doesn't project correctly (dvryaboy) PIG-1788: relation-as-scalar error messages should indicate the field being used as scalar (laukik via thejas) PIG-1697: NullPointerException if log4j.properties is Used (laukik via daijy) PIG-1929: Type checker failed to catch invalid type comparison (thejas) PIG-1928: Type Checking, incorrect error message (thejas) PIG-1979: New logical plan failing with ERROR 2229: Couldn't find matching uid -1 (daijy) PIG-1897: multiple star projection in a statement does not produce the right plan (thejas) PIG-1917: NativeMapReduce does not Allow Configuration Parameters containing Spaces (thejas) PIG-1974: Lineage need to set for every cast (thejas) PIG-1988: Importing an empty macro file causing NPE (rding) PIG-1977: "Stream closed" error while reading Pig temp files (results of intermediate jobs) (rding) PIG-1963: in nested foreach, accumulative udf taking input from order-by does not get results in order (thejas) PIG-1911: Infinite loop with accumulator function in nested foreach (thejas) PIG-1923: Jython UDFs fail to convert Maps of Integer values back to Pig types (julien) PIG-1944: register javascript UDFs does not work (julien) PIG-1955: PhysicalOperator has a member variable (non-static) Log object that is non-transient, this causes serialization errors (woody via rding) PIG-1964: PigStorageSchema fails if a column value is null (thejas) PIG-1866: Dereference a bag within a tuple does not work (daijy) PIG-1984: Wrong stats shown when there are multiple stores but same file names (rding) PIG-1893: Pig report input size -1 for empty input file (rding) PIG-1868: New logical plan fails when I have complex data types from udf (daijy) PIG-1927: Dereference partial name failed (daijy) PIG-1934: Fix zebra test TestCheckin1, TestCheckin4 (daijy) PIG-1931: Integrate Macro Expansion with New Parser (rding) PIG-1933: Hints such as 'collected' and 'skewed' for "group by" or "join by" should not be treated as tokens. 
(xuefuz via thejas) PIG-1925: Parser error message doesn't show location of the error or show it as Line 0:0 (xuefuz via gates) PIG-671: typechecker does not throw an error when multiple arguments are passed to COUNT (deepujain via gates) PIG-1152: bincond operator throws parser error (xuefuz via thejas) PIG-1885: SUBSTRING fails when input length less than start (deepujain via gates) PIG-719: store into 'filename'; should be valid syntax, but does not work (xuefuz via thejas) PIG-1770: matches clause problem with chars that have special meaning in dk.brics - #, @ .. (thejas) PIG-1862: Pig returns exit code 0 for the failed Pig script due to non-existing input directory (rding) PIG-1888: Fix TestLogicalPlanGenerator not use hardcoded path (daijy) PIG-1837: Error while using IsEmpty function (rding) PIG-1884: Change ReadToEndLoader.setLocation not throw UnsupportedOperationException (thejas) PIG-1887: Fix pig-withouthadoop.jar to contains proper jars (daijy) PIG-1779: Wrong stats shown when there are multiple loads but same file names (rding) PIG-1861: The pig script stored in the Hadoop History logs is stored as a concatenated string without whitespace this causes problems when attempting to extract and execute the script (rding) PIG-1829: "0" value seen in PigStat's map/reduce runtime, even when the job is successful (rding) PIG-1856: Custom jar is not packaged with the new job created by LimitAdjuster (rding) PIG-1872: Fix bug in AvroStorage (guolin2001, jghoman via daijy) PIG-1536: use same logic for merging inner schemas in "default union" and "union onschema" (daijy) PIG-1304: Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input (laukik via rding) PIG-1852: Packaging antlr jar with pig.jar (rding via daijy) PIG-1717: pig needs to call setPartitionFilter if schema is null but getPartitionKeys is not (gerritjvv via gates) PIG-313: Error handling aggregate of a computation (daijy) PIG-496: project of bags from complex data causes 
failures (daijy) PIG-730: problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema (daijy) PIG-767: Schema reported from DESCRIBE and actual schema of inner bags are different (daijy) PIG-1801: Need better error message for Jython errors (rding) PIG-1742: org.apache.pig.newplan.optimizer.Rule.java does not work with plan patterns where leaves/sinks are not siblings (thejas) Release 0.8.0 - Unreleased INCOMPATIBLE CHANGES PIG-1518: multi file input format for loaders (yanz via rding) PIG-1249: Safe-guards against misconfigured Pig scripts without PARALLEL keyword (zjffdu via olgan) IMPROVEMENTS PIG-1561: XMLLoader in Piggybank does not support bz2 or gzip compressed XML files (vivekp via daijy) PIG-1677: modify the repository path of pig artifacts to org/apache/pig instead of org/apache/hadoop/pig (nrai via olgan) PIG-1600: Docs update (romainr via olgan) PIG-1632: The core jar in the tarball contains the kitchen sink (eli via olgan) PIG-1617: 'group all' should always use one reducer (thejas) PIG-1589: add test cases for mapreduce operator which use distributed cache (thejas) PIG-1548: Optimize scalar to consolidate the part file (rding) PIG-1600: Docs update (chandec via olgan) PIG-1585: Add new properties to help and documentation (olgan) PIG-1399: Filter expression optimizations (yanz via gates) PIG-1531: Pig gobbles up error messages (nrai via hashutosh) PIG-1458: aggregate files for replicated join (rding) PIG-1205: Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc (zjffdu and dvryaboy) PIG-1568: Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly (xuefuz via daijy) PIG-1574: Optimization rule PushUpFilter causes filter to be pushed up out joins (xuefuz via daijy) PIG-1515: Migrate logical optimization rule: PushDownForeachFlatten (xuefuz via daijy) PIG-1321: Logical Optimizer: Merge cascading foreach (xuefuz via daijy) PIG-1483: [piggybank] Add 
HadoopJobHistoryLoader to the piggybank (rding) PIG-1555: [piggybank] add CSV Loader (dvryaboy) PIG-1501: need to investigate the impact of compression on pig performance (yanz via thejas) PIG-1497: Mandatory rule PartitionFilterOptimizer (xuefuz via daijy) PIG-1514: Migrate logical optimization rule: OpLimitOptimizer (xuefuz via daijy) PIG-1551: Improve dynamic invokers to deal with no-arg methods and array parameters (dvryaboy) PIG-1311: Document audience and stability for remaining interfaces (gates) PIG-506: Does pig need a NATIVE keyword? (aniket486 via thejas) PIG-1510: Add `deepCopy` for LogicalExpressions (swati.j via daijy) PIG-1447: Tune memory usage of InternalCachedBag (thejas) PIG-1505: support jars and scripts in dfs (anhi via rding) PIG-1334: Make pig artifacts available through maven (niraj via rding) PIG-1466: Improve log messages for memory usage (thejas) PIG-1404: added PigUnit, a framework for building unit tests of Pig Latin scripts (romainr via gates) PIG-1452: to remove hadoop20.jar from lib and use hadoop from the apache maven repo. 
(rding) PIG-1295: Binary comparator for secondary sort (azaroth via daijy) PIG-1448: Detach tuple from inner plans of physical operator (thejas) PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi via olgan) PIG-103: Shared Job /tmp location should be configurable (niraj via rding) PIG-1496: Mandatory rule ImplicitSplitInserter (yanz via daijy) PIG-346: grant help command cleanup (olgan) PIG-1199: help includes obsolete options (olgan) PIG-1434: Allow casting relations to scalars (aniket486 via rding) PIG-1461: support union operation that merges based on column names (thejas) PIG-1517: Pig needs to support keywords in the package name (aniket486 via olgan) PIG-928: UDFs in scripting languages (aniket486 via daijy) PIG-1509: Add .gitignore file (cwsteinbach via gates) PIG-1478: Add progress notification listener to PigRunner API (rding) PIG-1472: Optimize serialization/deserialization between Map and Reduce and between MR jobs (thejas) PIG-1389: Implement Pig counter to track number of rows for each input files (rding) PIG-1454: Consider clean up backend code (rding) PIG-1333: API interface to Pig (rding) PIG-1405: Need to move many standard functions from piggybank into Pig (aniket486 via daijy) PIG-1427: Monitor and kill runaway UDFs (dvryaboy) PIG-1428: Make a StatusReporter singleton available for incrementing counters (dvryaboy) PIG-972: Make describe work with nested foreach (aniket486 via daijy) PIG-1438: [Performance] MultiQueryOptimizer should also merge DISTINCT jobs (rding) PIG-1441: new test targets (olgan) PIG-282: Custom Partitioner (aniket486 via daijy) PIG-283: Allow to set arbitrary jobconf key-value pairs inside pig program (hashutosh) PIG-1373: We need to add jdiff output to docs on the website (daijy) PIG-1422: Duplicate code in LOPrinter.java (zjffdu) PIG-1420: Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple (rjurney via dvryaboy) PIG-1408: Annotate explain plans with aliases 
(rding) PIG-1410: Make PigServer can handle files with parameters (zjffdu) PIG-1406: Allow to run shell commands from grunt (zjffdu) PIG-1398: Marking Pig interfaces for org.apache.pig.data package (gates) PIG-1396: eclipse-files target in build.xml fails to generate necessary classes in src-gen PIG-1390: Provide a target to generate eclipse-related classpath and files (chaitk via thejas) PIG-1384: Adding contrib javadoc to main Pig javadoc (daijy) PIG-1320: final documentation updates for Pig 0.7.0 (chandec via olgan) PIG-1363: Unnecessary loadFunc instantiations (hashutosh) PIG-1370: Marking Pig interface for org.apache.pig package (gates) PIG-1354: UDFs for dynamic invocation of simple Java methods (dvryaboy) PIG-1316: TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files (pradeepkth) PIG-1317: LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() (pradeepkth) PIG-1413: Remove svn:externals reference for test-patch.sh and create a local copy of test-patch.sh (gkesavan) PIG-1302: Include zebra's "pigtest" ant target as a part of pig's ant test target. 
(gkesavan) PIG-1582: To upgrade commons-logging OPTIMIZATIONS PIG-1353: Map-side joins (ashutoshc) PIG-1309: Map-side Cogroup (ashutoshc) BUG FIXES PIG-2067: FilterLogicExpressionSimplifier removed some branches in some cases (daijy) PIG-2033: Pig returns success for the failed Pig script (rding) PIG-1993: PigStorageSchema throw NPE with ColumnPruning (daijy) PIG-1935: New logical plan: Should not push up filter in front of Bincond (daijy) PIG-1912: non-deterministic output when a file is loaded multiple times (daijy) PIG-1892: Bug in new logical plan : No output generated even though there are valid records (daijy) PIG-1808: Error message in 0.8 not much helpful as compared to 0.7 (daijy) PIG-1850: Order by is failing with ClassCastException if schema is undefined for new logical plan in 0.8 (daijy) PIG-1831: Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf (daijy) PIG-1841: TupleSize implemented incorrectly (laukik via daijy) PIG-1843: NPE in schema generation (daijy) PIG-1820: New logical plan: FilterLogicExpressionSimplifier fail to deal with UDF (daijy) PIG-1854: Pig returns exit code 0 for the failed Pig script (rding) PIG-1812: Problem with DID_NOT_FIND_LOAD_ONLY_MAP_PLAN (daijy) PIG-1813: Pig 0.8 throws ERROR 1075 while trying to refer a map in the result of eval udf. Works with 0.7 (daijy) PIG-1776: changing statement corresponding to alias after explain , then doing dump gives incorrect result (thejas) PIG-1800: Missing Signature for maven staging release (rding) PIG-1815: pig task retains used instances of PhysicalPlan (thejas) PIG-1785: New logical plan: uid conflict in flattened fields (daijy) PIG-1787: Error in logical plan generated (daijy) PIG-1791: System property mapred.output.compress, but pig-cluster-hadoop-site.xml doesn't (daijy) PIG-1771: New logical plan: Merge schema fail if LoadFunc.getSchema return different schema with "Load...AS" (daijy) PIG-1766: New logical plan: ImplicitSplitInserter should before 
DuplicateForEachColumnRewrite (daijy) PIG-1762: Logical simplification fails on map key referenced values (yanz) PIG-1761: New logical plan: Exception when bag dereference in the middle of expression (daijy) PIG-1757: After split combination, the number of maps may vary slightly (yanz) PIG-1760: Need to report progress in all databags (rding) PIG-1709: Skewed join use fewer reducer for extreme large key (daijy) PIG-1751: New logical plan: PushDownForEachFlatten fail in UDF with unknown output schema (daijy) PIG-1741: Lineage fail when flatten a bag (daijy) PIG-1739: zero status is returned when pig script fails (yanz) PIG-1738: New logical plan: Optimized UserFuncExpression.getFieldSchema (daijy) PIG-1732: New logical plan: logical plan get confused if we generate the same field twice in ForEach (daijy) PIG-1737: New logical plan: Improve error messages when merge schema fail (daijy) PIG-1725: New logical plan: uidOnlySchema bug in LOGenerate (daijy) PIG-1729: New logical plan: Dereference does not add into plan after deepCopy (daijy) PIG-1721: New logical plan: script fail when reuse foreach inner alias (daijy) PIG-1716: New logical plan: LogToPhyTranslationVisitor should translate the structure for regex optimization (daijy) PIG-1740: Fix SVN location in setup doc (chandec via olgan) PIG-1719: New logical plan: FieldSchema generation for BinCond is wrong (daijy) PIG-1720: java.lang.NegativeArraySizeException during Quicksort (thejas) PIG-1727: Hadoop default config override pig.properties (rding) PIG-1731: Stack Overflows where there are composite logical expressions on UDFs using the new logical plan (yanz) PIG-1723: Need to limit the length of Pig counter names (rding) PIG-1714: Option mapred.output.compress doesn't work in Pig 0.8 but worked in 0.7 (xuefuz via rding) PIG-1715: pig-withouthadoop.jar missing automaton.jar (thejas) PIG-1706: New logical plan: PushDownFlattenForEach fail if flattened field has user defined schema (daijy) PIG-1705: New logical 
plan: self-join fail for some queries (daijy) PIG-1704: Output Compression is not at work if the output path is absolute and there is a trailing / after the compression suffix (yanz) PIG-1695: MergeForEach does not carry user defined schema if any one of the merged ForEach has user defined schema (daijy) PIG-1684: Inconsistent usage of store func. (thejas) PIG-1694: union-onschema projects null schema at parsing stage for some queries (thejas) PIG-1685: Pig is unable to handle counters for glob paths? (daijy) PIG-1683: New logical plan: Nested foreach plan fail if one inner alias is referred more than once (daijy) PIG-1542: log level not propagated to MR task loggers (nrai via daijy) PIG-1673: query with consecutive union-onschema statement errors out (thejas) PIG-1653: Scripting UDF fails if the path to script is an absolute path (daijy) PIG-1669: PushUpFilter fail when filter condition contains scalar (daijy) PIG-1672: order of relations in replicated join gets switched in a query where first relation has two mergeable foreach statements (thejas) PIG-1666: union onschema fails when the input relation has cast from bytearray to another type (thejas) PIG-1655: code duplicated for udfs that were moved from piggybank to builtin (nrai via daijy) PIG-1670: pig throws ExecException instead of FrontEnd exception when the plan validation fails (nrai via daijy) PIG-1668: Order by failed with RuntimeException (rding) PIG-1659: sortinfo is not set for store if there is a filter after ORDER BY (daijy) PIG-1664: leading '_' in directory/file names should be ignored; the "pigtest" build target should include all pig-related zebra tests. 
(yanz) PIG-1662: Need better error message for MalFormedProbVecException (rding) PIG-1656: TOBAG udfs ignores columns with null value; it does not use input type to determine output schema (thejas) PIG-1658: ORDER BY does not work properly on integer/short keys that are -1 (yanz) PIG-1638: sh output gets mixed up with the grunt prompt (nrai via daijy) PIG-1607: pig should have separate javadoc.jar in the maven repository (nrai via thejas) PIG-1651: PIG class loading mishandled (rding) PIG-1650: pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc (nrai via thejas) PIG-1649: FRJoin fails to compute number of input files for replicated input (thejas) PIG-1637: Combiner not used because optimizer inserts a foreach between group and algebraic function (daijy) PIG-1648: Split combination may return too many block locations to map/reduce framework (yanz) PIG-1641: Incorrect counters in local mode (rding) PIG-1647: Logical simplifier throws a NPE (yanz) PIG-1642: Order by doesn't use estimation to determine the parallelism (rding) PIG-1644: New logical plan: Plan.connect with position is misused in some places (daijy) PIG-1643: join fails for a query with input having 'load using pigstorage without schema' + 'foreach' (daijy) PIG-1645: Using both small split combination and temporary file compression on a query of ORDER BY may cause crash (yanz) PIG-1635: Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed (yanz) PIG-1639: New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF (xuefuz via daijy) PIG-1643: join fails for a query with input having 'load using pigstorage without schema' + 'foreach' (thejas) PIG-1628: log this message at debug level : 'Pig Internal storage in use' (thejas) PIG-1636: Scalar fail if the scalar variable is generated by limit (daijy) PIG-1605: Adding soft link to plan to solve 
input file dependency (daijy) PIG-1598: Pig gobbles up error messages - Part 2 (nrai via daijy) PIG-1616: 'union onschema' does not use create output with correct schema when udfs are involved (thejas) PIG-1610: 'union onschema' does handle some cases involving 'namespaced' column names in schema (thejas) PIG-1609: 'union onschema' should give a more useful error message when schema of one of the relations has null column name (thejas) PIG-1562: Fix the version for the dependent packages for the maven (nrai via rding) PIG-1604: 'relation as scalar' does not work with complex types (thejas) PIG-1601: Make scalar work for secure hadoop (daijy) PIG-1602: The .classpath of eclipse template still use hbase-0.20.0 (zjffdu) PIG-1596: NPE's thrown when attempting to load hbase columns containing null values (zjffdu) PIG-1597: Development snapshot jar no longer picked up by bin/pig (dvryaboy) PIG-1599: pig gives generic message for few cases (nrai via rding) PIG-1595: casting relation to scalar- problem with handling of data from non PigStorage loaders (thejas) PIG-1591: pig does not create a log file, if the MR job succeeds but front end fails (nrai via daijy) PIG-1543: IsEmpty returns the wrong value after using LIMIT (daijy) PIG-1550: better error handling in casting relations to scalars (thejas) PIG-1572: change default datatype when relations are used as scalar to bytearray (thejas) PIG-1583: piggybank unit test TestLookupInFiles is broken (daijy) PIG-1563: some of string functions don't work on bytearrays (olgan) PIG-1569: java properties not honored in case of properties such as stop.on.failure (rding) PIG-1570: native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs (thejas) PIG-1343: pig_log file missing even though Main tells it is creating one and an M/R job fails (nrai via rding) PIG-1482: Pig gets confused when more than one loader is involved (xuefuz via thejas) PIG-1579: Intermittent unit test failure for 
TestScriptUDF.testPythonScriptUDFNullInputOutput (daijy) PIG-1557: couple of issue mapping aliases to jobs (rding) PIG-1552: Nested describe failed when the alias is not referred in the first foreach inner plan (aniket486 via daijy) PIG-1486: update ant eclipse-files target to include new jar and remove contrib dirs from build path (thejas) PIG-1524: 'Proactive spill count' is misleading (thejas) PIG-1546: Incorrect assert statements in operator evaluation (ajaykidave via pradeepkth) PIG-1392: Parser fails to recognize valid field (niraj via rding) PIG-1541: FR Join shouldn't match null values (rding) PIG-1525: Incorrect data generated by diff of SUM (rding) PIG-1288: EvalFunc returnType is wrong for generic subclasses (daijy) PIG-1534: Code discovering UDFs in the script has a bug in a order by case (pradeepkth) PIG-1533: Compression codec should be a per-store property (rding) PIG-1527: No need to deserialize UDFContext on the client side (rding) PIG-1516: finalize in bag implementations causes pig to run out of memory in reduce (thejas) PIG-1521: explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used (thejas) PIG-1513: Pig doesn't handle empty input directory (rding) PIG-1500: guava.jar should be removed from the lib folder (niraj via rding) PIG-1034: Pig does not support ORDER ... 
BY group alias (zjffdu) PIG-1445: Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented (daijy) PIG-348: -j command line option doesn't work (rding) PIG-1487: Replace "bz" with ".bz" in all the LoadFunc PIG-1489: Pig MapReduceLauncher does not use jars in register statement (rding) PIG-1435: make sure dependent jobs fail when a job in multiquery fails (niraj via rding) PIG-1492: DefaultTuple and DefaultMemory underestimate their memory footprint (thejas) PIG-1409: Fix up javadocs for org.apache.pig.builtin (gates) PIG-1490: Make Pig storers work with remote HDFS in secure mode (rding) PIG-1469: DefaultDataBag assumes ArrayList as default List type (azaroth via dvryaboy) PIG-1467: order by fail when set "fs.file.impl.disable.cache" to true (daijy) PIG-1463: Replace "bz" with ".bz" in setStoreLocation in PigStorage (zjffdu) PIG-1221: Filter equality does not work for tuples (zjffdu) PIG-1456: TestMultiQuery takes a long time to run (rding) PIG-1457: Pig will run complete zebra test even we give -Dtestcase=xxx (daijy) PIG-1450: TestAlgebraicEvalLocal failures due to OOM (daijy) PIG-1433: pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true (pradeepkth) PIG-1347: Clear up output directory for a failed job (daijy) PIG-1419: Remove "user.name" from JobConf (daijy) PIG-1359: bin/pig script does not pick up correct jar libraries (zjffdu) PIG-566: Dump and store outputs do not match for PigStorage (azaroth via daijy) PIG-1414: Problem with parameter substitution (rding) PIG-1407: Logging starts before being configured (azaroth via daijy) PIG-1391: pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted (thejas) PIG-1211: Pig script runs half way after which it reports syntax error (pradeepkth) PIG-1401: "explain -script