Package | Description |
---|---|
org.apache.nutch.indexer | Maintain Lucene full-text indexes. |
org.apache.nutch.metadata | A multi-valued metadata container and a set of constant fields for Nutch metadata. |
org.apache.nutch.net.protocols | |
org.apache.nutch.parse | |
org.apache.nutch.protocol | |
org.apache.nutch.protocol.http | Protocol plugin that supports retrieving documents via the HTTP protocol. |
org.apache.nutch.protocol.httpclient | Protocol plugin that supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest, and NTLM authentication schemes for web servers as well as proxy servers. |
org.apache.nutch.scoring.webgraph | |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
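To illustrate what "multi-valued metadata container" means in the table above, here is a minimal sketch of the idea. It is a hypothetical stand-in class (`MultiValuedMetadataSketch`), not the Nutch `Metadata` implementation; the method names and the header names used in `main` are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; NOT the org.apache.nutch.metadata.Metadata class.
public class MultiValuedMetadataSketch {
    // Each property name maps to an ordered list of values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Append a value, keeping any values already stored under the name.
    public void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Replace all values stored under the name with a single value.
    public void set(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(name, single);
    }

    // Return the first value for the name, or null if none is stored.
    public String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Return a copy of all values for the name (empty if none).
    public List<String> getValues(String name) {
        return new ArrayList<>(values.getOrDefault(name, new ArrayList<>()));
    }

    // True when more than one value is stored under the name.
    public boolean isMultiValued(String name) {
        List<String> list = values.get(name);
        return list != null && list.size() > 1;
    }

    public static void main(String[] args) {
        MultiValuedMetadataSketch meta = new MultiValuedMetadataSketch();
        meta.add("Set-Cookie", "a=1");
        meta.add("Set-Cookie", "b=2");
        meta.set("Content-Type", "text/html");
        System.out.println(meta.getValues("Set-Cookie"));
        System.out.println(meta.get("Content-Type"));
    }
}
```

The sketch separates `add` (append) from `set` (replace) because a container that allows several values per name needs both operations to be unambiguous.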
Modifier and Type | Method and Description |
---|---|
Metadata | NutchDocument.getDocumentMeta() |
Modifier and Type | Class and Description |
---|---|
class | SpellCheckedMetadata: A decorator to Metadata that adds spellchecking capabilities to property names. |
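The decorator idea described for SpellCheckedMetadata can be sketched roughly as follows. This is a simplified, hypothetical illustration (`SpellCheckedNamesSketch`), not the Nutch class: it only normalizes property names by case and punctuation against an assumed list of known names, and the actual matching strategy used by Nutch may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NOT the org.apache.nutch.metadata.SpellCheckedMetadata class.
public class SpellCheckedNamesSketch {
    // Normalized form -> canonical property name (assumed list, for illustration).
    private static final Map<String, String> CANONICAL = new HashMap<>();

    static {
        for (String name : new String[] {"Content-Type", "Content-Length", "Last-Modified"}) {
            CANONICAL.put(normalize(name), name);
        }
    }

    // Keep letters only and lower-case them, so "content_type" matches "Content-Type".
    static String normalize(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Return the canonical spelling if the name is recognized, else the name unchanged.
    public static String canonicalName(String name) {
        String known = CANONICAL.get(normalize(name));
        return known != null ? known : name;
    }

    // The wrapped store; every access goes through canonicalName first.
    private final Map<String, String> delegate = new HashMap<>();

    public void set(String name, String value) {
        delegate.put(canonicalName(name), value);
    }

    public String get(String name) {
        return delegate.get(canonicalName(name));
    }

    public static void main(String[] args) {
        SpellCheckedNamesSketch meta = new SpellCheckedNamesSketch();
        meta.set("content_type", "text/html");
        System.out.println(meta.get("Content-Type"));
    }
}
```

Correcting the name on both write and read means callers can use inconsistent spellings of a known property and still reach the same stored value.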
Modifier and Type | Method and Description |
---|---|
Metadata | MetaWrapper.getMetadata(): Get all metadata. |
Constructor and Description |
---|
MetaWrapper(Metadata metadata, org.apache.hadoop.io.Writable instance, org.apache.hadoop.conf.Configuration conf) |
Modifier and Type | Method and Description |
---|---|
Metadata | Response.getHeaders(): Returns all the headers. |
Modifier and Type | Method and Description |
---|---|
Metadata | ParseData.getContentMeta(): The original Metadata retrieved from content. |
Metadata | HTMLMetaTags.getGeneralTags(): Returns all collected values of the general meta tags. |
Metadata | ParseData.getParseMeta(): Other content properties. |
Modifier and Type | Method and Description |
---|---|
void | ParseData.setParseMeta(Metadata parseMeta) |
Constructor and Description |
---|
ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta) |
ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta) |
Modifier and Type | Method and Description |
---|---|
Metadata | Content.getMetadata(): Other protocol-specific data. |
Modifier and Type | Method and Description |
---|---|
void | Content.setMetadata(Metadata metadata): Other protocol-specific data. |
Constructor and Description |
---|
Content(String url, String base, byte[] content, String contentType, Metadata metadata, org.apache.hadoop.conf.Configuration conf) |
Modifier and Type | Method and Description |
---|---|
Metadata | HttpResponse.getHeaders() |
Modifier and Type | Method and Description |
---|---|
Metadata | HttpResponse.getHeaders() |
Modifier and Type | Method and Description |
---|---|
HttpAuthentication | HttpAuthenticationFactory.findAuthentication(Metadata header) |
Modifier and Type | Method and Description |
---|---|
Metadata | Node.getMetadata() |
Modifier and Type | Method and Description |
---|---|
void | Node.setMetadata(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
static void | CCParseFilter.Walker.walk(Node doc, URL base, Metadata metadata, org.apache.hadoop.conf.Configuration conf): Scan the document, adding attributes to metadata. |
Copyright © 2014 The Apache Software Foundation