Pig 0.7.0 API

Pig is a platform for data flow programming on large data sets in a parallel environment.


pig
org.apache.pig Public interfaces and classes for Pig.
org.apache.pig.backend  
org.apache.pig.backend.datastorage  
org.apache.pig.backend.executionengine  
org.apache.pig.backend.executionengine.util  
org.apache.pig.backend.hadoop  
org.apache.pig.backend.hadoop.datastorage  
org.apache.pig.backend.hadoop.executionengine  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans  
org.apache.pig.backend.hadoop.executionengine.physicalLayer Implementation of physical operators that use Hadoop as the execution engine and data storage.
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.util  
org.apache.pig.backend.hadoop.executionengine.util  
org.apache.pig.backend.hadoop.hbase  
org.apache.pig.backend.hadoop.streaming  
org.apache.pig.builtin  
org.apache.pig.data Data types for Pig.
org.apache.pig.experimental.logical  
org.apache.pig.experimental.logical.expression  
org.apache.pig.experimental.logical.optimizer  
org.apache.pig.experimental.logical.relational  
org.apache.pig.experimental.logical.rules  
org.apache.pig.experimental.plan  
org.apache.pig.experimental.plan.optimizer  
org.apache.pig.impl  
org.apache.pig.impl.builtin  
org.apache.pig.impl.io  
org.apache.pig.impl.logicalLayer The logical operators that represent a Pig script and tools for manipulating those operators.
org.apache.pig.impl.logicalLayer.optimizer  
org.apache.pig.impl.logicalLayer.schema  
org.apache.pig.impl.logicalLayer.validators  
org.apache.pig.impl.plan  
org.apache.pig.impl.plan.optimizer  
org.apache.pig.impl.streaming  
org.apache.pig.impl.util  
org.apache.pig.pen  
org.apache.pig.pen.physicalOperators  
org.apache.pig.pen.util  
org.apache.pig.tools.cmdline  
org.apache.pig.tools.grunt  
org.apache.pig.tools.parameters  
org.apache.pig.tools.pigstats  
org.apache.pig.tools.streams  
org.apache.pig.tools.timer  

 

contrib: Piggybank
org.apache.pig.piggybank.evaluation  
org.apache.pig.piggybank.evaluation.datetime  
org.apache.pig.piggybank.evaluation.datetime.convert  
org.apache.pig.piggybank.evaluation.datetime.diff  
org.apache.pig.piggybank.evaluation.datetime.truncate  
org.apache.pig.piggybank.evaluation.decode  
org.apache.pig.piggybank.evaluation.math  
org.apache.pig.piggybank.evaluation.stats  
org.apache.pig.piggybank.evaluation.string  
org.apache.pig.piggybank.evaluation.util  
org.apache.pig.piggybank.evaluation.util.apachelogparser  
org.apache.pig.piggybank.storage  
org.apache.pig.piggybank.storage.apachelog  
org.apache.pig.piggybank.storage.hiverc  

 

contrib: Zebra
org.apache.hadoop.zebra Hadoop Table - tabular data storage for Hadoop MapReduce and Pig.
org.apache.hadoop.zebra.io Physical I/O management of Hadoop Zebra Tables.
org.apache.hadoop.zebra.mapred Providing InputFormat and OutputFormat adaptor classes for Hadoop Zebra Table.
org.apache.hadoop.zebra.mapreduce Providing InputFormat and OutputFormat adaptor classes for Hadoop Zebra Table.
org.apache.hadoop.zebra.pig Implementation of the Pig Storer/Loader interfaces.
org.apache.hadoop.zebra.pig.comparator Utilities that allow the Pig Storer to generate keys for sorted Zebra tables.
org.apache.hadoop.zebra.schema Zebra Schema
org.apache.hadoop.zebra.tfile  
org.apache.hadoop.zebra.types Data types being shared between the io and mapred packages.

 

Pig is a platform for data flow programming on large data sets in a parallel environment. It consists of a language for specifying these programs, Pig Latin; a compiler for this language; and an execution engine that executes the programs.

Pig currently runs on the Hadoop platform, reading data from and writing data to HDFS, and performing its processing via one or more MapReduce jobs.
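As an illustrative sketch (the file paths, field names, and schema below are hypothetical), a small Pig Latin script of this kind reads from HDFS, triggers a MapReduce job for the grouping step, and writes its result back to HDFS:

```
-- load hypothetical log data from HDFS (PigStorage with tab-delimited fields by default)
A = LOAD 'logs/access.log' AS (user:chararray, bytes:long);

-- grouping requires a shuffle, so this compiles into a map-reduce boundary
B = GROUP A BY user;

-- aggregate each group with the built-in SUM
C = FOREACH B GENERATE group, SUM(A.bytes);

-- write the per-user totals back to HDFS
STORE C INTO 'output/bytes_per_user';
```

Nothing executes until the STORE (or a DUMP) is reached; the statements before it only build up the plan that the compiler then turns into MapReduce jobs.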

Design

This section gives a very high-level overview of the design of the Pig system. Throughout the documentation you can find the design of a particular package or class by looking for the Design heading in its documentation.

Overview

Pig's design is guided by our Pig philosophy and by our experience with similar data processing systems.

Pig shares many similarities with a traditional RDBMS design. It has a parser, type checker, optimizer, and operators that perform the data processing. However, there are some significant differences: Pig has no data catalog, supports no transactions, does not directly manage data storage, and does not implement its own execution framework.

High Level Architecture

Pig is split between the front and back ends of the engine. The front end handles parsing, type checking, and initial optimization of a Pig Latin script. The result is a LogicalPlan that defines how the script will be executed.

Once a LogicalPlan has been generated, the backend of Pig handles executing the script. Pig supports multiple backend implementations so that it can run on different systems. Currently Pig ships with two backends, MapReduce and local; for a given run, Pig selects the backend to use via configuration.
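For example (a usage sketch; the script name is hypothetical), the backend can be chosen from the command line with the `-x` exec-type flag:

```
pig -x local myscript.pig       # run with the local backend, no Hadoop cluster needed
pig -x mapreduce myscript.pig   # run on a Hadoop cluster via MapReduce (the default)
```

The local backend is convenient for developing and testing scripts on small data before running the same script unchanged against a cluster.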



Copyright © ${year} The Apache Software Foundation