Pig 0.7.0 API

Pig is a platform for data flow programming on large data sets in a parallel environment.


pig
org.apache.pig Public interfaces and classes for Pig.
org.apache.pig.backend  
org.apache.pig.backend.datastorage  
org.apache.pig.backend.executionengine  
org.apache.pig.backend.executionengine.util  
org.apache.pig.backend.hadoop  
org.apache.pig.backend.hadoop.datastorage  
org.apache.pig.backend.hadoop.executionengine  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans  
org.apache.pig.backend.hadoop.executionengine.physicalLayer Implementation of physical operators that use Hadoop as the execution engine and data storage.
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.util  
org.apache.pig.backend.hadoop.executionengine.util  
org.apache.pig.backend.hadoop.hbase  
org.apache.pig.backend.hadoop.streaming  
org.apache.pig.builtin  
org.apache.pig.data Data types for Pig.
org.apache.pig.experimental.logical  
org.apache.pig.experimental.logical.expression  
org.apache.pig.experimental.logical.optimizer  
org.apache.pig.experimental.logical.relational  
org.apache.pig.experimental.logical.rules  
org.apache.pig.experimental.plan  
org.apache.pig.experimental.plan.optimizer  
org.apache.pig.impl  
org.apache.pig.impl.builtin  
org.apache.pig.impl.io  
org.apache.pig.impl.logicalLayer The logical operators that represent a Pig script and tools for manipulating those operators.
org.apache.pig.impl.logicalLayer.optimizer  
org.apache.pig.impl.logicalLayer.schema  
org.apache.pig.impl.logicalLayer.validators  
org.apache.pig.impl.plan  
org.apache.pig.impl.plan.optimizer  
org.apache.pig.impl.streaming  
org.apache.pig.impl.util  
org.apache.pig.pen  
org.apache.pig.pen.physicalOperators  
org.apache.pig.pen.util  
org.apache.pig.tools.cmdline  
org.apache.pig.tools.grunt  
org.apache.pig.tools.parameters  
org.apache.pig.tools.pigstats  
org.apache.pig.tools.streams  
org.apache.pig.tools.timer  

 

contrib: Piggybank
org.apache.pig.piggybank.evaluation  
org.apache.pig.piggybank.evaluation.datetime  
org.apache.pig.piggybank.evaluation.datetime.convert  
org.apache.pig.piggybank.evaluation.datetime.diff  
org.apache.pig.piggybank.evaluation.datetime.truncate  
org.apache.pig.piggybank.evaluation.decode  
org.apache.pig.piggybank.evaluation.math  
org.apache.pig.piggybank.evaluation.stats  
org.apache.pig.piggybank.evaluation.string  
org.apache.pig.piggybank.evaluation.util  
org.apache.pig.piggybank.evaluation.util.apachelogparser  
org.apache.pig.piggybank.storage  
org.apache.pig.piggybank.storage.apachelog  
org.apache.pig.piggybank.storage.hiverc  

 

contrib: Zebra
org.apache.hadoop.zebra Hadoop Table - tabular data storage for Hadoop MapReduce and Pig.
org.apache.hadoop.zebra.io Physical I/O management of Hadoop Zebra Tables.
org.apache.hadoop.zebra.mapred Providing InputFormat and OutputFormat adaptor classes for Hadoop Zebra Table.
org.apache.hadoop.zebra.mapreduce Providing InputFormat and OutputFormat adaptor classes for Hadoop Zebra Table.
org.apache.hadoop.zebra.pig Implementation of the Pig Storer/Loader interfaces.
org.apache.hadoop.zebra.pig.comparator Utilities that allow the Pig Storer to generate keys for sorted Zebra tables.
org.apache.hadoop.zebra.schema Zebra Schema
org.apache.hadoop.zebra.tfile  
org.apache.hadoop.zebra.types Data types being shared between the io and mapred packages.

 

Pig is a platform for data flow programming on large data sets in a parallel environment. It consists of a language for specifying these programs, Pig Latin; a compiler for this language; and an execution engine that executes the programs.

Pig currently runs on the Hadoop platform, reading data from and writing data to HDFS, and performing its processing via one or more MapReduce jobs.
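As an illustrative sketch (the file paths, field names, and schema below are hypothetical), a small Pig Latin script of this kind reads from HDFS, triggers a MapReduce job for the grouping step, and writes its result back to HDFS:

```
-- load hypothetical log data from HDFS (PigStorage with tab-delimited fields by default)
A = LOAD 'logs/access.log' AS (user:chararray, bytes:long);

-- grouping requires a shuffle, so this compiles into a map-reduce boundary
B = GROUP A BY user;

-- aggregate each group with the built-in SUM
C = FOREACH B GENERATE group, SUM(A.bytes);

-- write the per-user totals back to HDFS
STORE C INTO 'output/bytes_per_user';
```

Nothing executes until the STORE (or a DUMP) is reached; the statements before it only build up the plan that the compiler then turns into MapReduce jobs.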

Design

This section gives a very high-level overview of the design of the Pig system. Throughout the documentation you can find the design of a particular package or class by looking for the Design heading in its documentation.

Overview

Pig's design is guided by our Pig philosophy and by our experience with similar data processing systems.

Pig shares many similarities with a traditional RDBMS design. It has a parser, type checker, optimizer, and operators that perform the data processing. However, there are some significant differences: Pig has no data catalog, supports no transactions, does not directly manage data storage, and does not implement its own execution framework.

High Level Architecture

Pig is split between the front and back ends of the engine. The front end handles parsing, type checking, and initial optimization of a Pig Latin script. The result is a LogicalPlan that defines how the script will be executed.

Once a LogicalPlan has been generated, the backend of Pig handles executing the script. Pig supports multiple backend implementations so that it can run on different systems. Currently Pig ships with two backends, MapReduce and local; for a given run, Pig selects the backend to use via configuration.
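For example (a usage sketch; the script name is hypothetical), the backend can be chosen from the command line with the `-x` exec-type flag:

```
pig -x local myscript.pig       # run with the local backend, no Hadoop cluster needed
pig -x mapreduce myscript.pig   # run on a Hadoop cluster via MapReduce (the default)
```

The local backend is convenient for developing and testing scripts on small data before running the same script unchanged against a cluster.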



Copyright © ${year} The Apache Software Foundation