Welcome to Hive!

Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. Hive provides a mechanism to put structure on this data, along with a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL query the data. The language also allows traditional map/reduce programmers to plug in custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.
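As a rough illustration of plugging in a custom mapper, here is a minimal sketch of a streaming script that a Hive QL TRANSFORM clause could call. Hive passes rows to such a script as tab-separated lines on standard input and reads tab-separated rows back from standard output. The script name word_mapper.py, the table docs, and the columns doc_id and body are all hypothetical and chosen only for this example.

```python
import sys

# Hypothetical Hive QL that would invoke this script (table and column
# names are assumptions made for illustration):
#
#   ADD FILE word_mapper.py;
#   SELECT TRANSFORM (doc_id, body)
#          USING 'python word_mapper.py'
#          AS (word, cnt)
#   FROM docs;


def map_rows(lines):
    """Yield one tab-separated (word, 1) row per word in each input row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Skip rows that do not carry both expected columns.
            continue
        text = fields[1]
        for word in text.lower().split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # Hive streams input rows on stdin and collects output rows from stdout.
    for row in map_rows(sys.stdin):
        print(row)
```

The emitted rows could then be aggregated with an ordinary GROUP BY in Hive QL, or handed to a custom reducer script in the same way.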

Getting Started

Check out the Getting Started Guide on the Hive wiki.

Getting Involved

Hive is an open source volunteer project under the Apache Software Foundation. It is a subproject of Hadoop. We encourage you to learn about the project and contribute your expertise. Here are some starter links:

  1. Give us feedback: What can we do better?
  2. Join the mailing list: Meet the community.
  3. Become a Hive fan on Facebook.