- abort(String, String) - Method in class org.apache.nutch.service.impl.JobManagerImpl
-
- abort(String, String) - Method in interface org.apache.nutch.service.JobManager
-
- abort(String, String) - Method in class org.apache.nutch.service.resources.JobResource
-
- AbstractBasePage<T> - Class in org.apache.nutch.webui.pages
-
- AbstractBasePage() - Constructor for class org.apache.nutch.webui.pages.AbstractBasePage
-
- AbstractCommonCrawlFormat - Class in org.apache.nutch.tools
-
- AbstractCommonCrawlFormat(String, Content, Metadata, Configuration, CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- AbstractFetchSchedule - Class in org.apache.nutch.crawl
-
This class provides common methods for implementations of FetchSchedule.
- AbstractFetchSchedule() - Constructor for class org.apache.nutch.crawl.AbstractFetchSchedule
-
- AbstractFetchSchedule(Configuration) - Constructor for class org.apache.nutch.crawl.AbstractFetchSchedule
-
- AbstractResource - Class in org.apache.nutch.service.resources
-
- AbstractResource() - Constructor for class org.apache.nutch.service.resources.AbstractResource
-
- AbstractScoringFilter - Class in org.apache.nutch.scoring
-
- AbstractScoringFilter() - Constructor for class org.apache.nutch.scoring.AbstractScoringFilter
-
- accept - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The "Accept" request header value.
- accept() - Method in class org.apache.nutch.urlfilter.api.RegexRule
-
Returns whether this rule is used for filtering in or out.
- acceptLanguage - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The "Accept-Language" request header value.
- ACCESS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Access denied - authorization required, but missing/incorrect.
- action - Variable in class org.apache.nutch.indexer.NutchIndexAction
-
- AdaptiveFetchSchedule - Class in org.apache.nutch.crawl
-
This class implements an adaptive re-fetch algorithm.
- AdaptiveFetchSchedule() - Constructor for class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- add(Inlink) - Method in class org.apache.nutch.crawl.Inlinks
-
- add(Inlinks) - Method in class org.apache.nutch.crawl.Inlinks
-
- add(String, Object) - Method in class org.apache.nutch.indexer.NutchDocument
-
- add(Object) - Method in class org.apache.nutch.indexer.NutchField
-
- ADD - Static variable in class org.apache.nutch.indexer.NutchIndexAction
-
- add(String, String) - Method in class org.apache.nutch.metadata.Metadata
-
Add a metadata name/value mapping.
- add(String, String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- addAll(Metadata) - Method in class org.apache.nutch.metadata.Metadata
-
Add all name/value mappings (merge two metadata mappings).
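The add and addAll entries above describe a name/value store whose mappings can be merged. The following is a minimal, self-contained sketch of that idea, assuming a name may carry several values and that merging appends values rather than overwriting them. The class MultiValuedMetadataSketch and its getValues helper are hypothetical names for illustration and are not the org.apache.nutch.metadata.Metadata implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; hypothetical class, not Nutch's Metadata.
class MultiValuedMetadataSketch {
    // Each name maps to the list of values added under it (assumption: multi-valued).
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Adds one name/value mapping, keeping any values already stored for the name.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Merges every mapping of another instance into this one by appending.
    void addAll(MultiValuedMetadataSketch other) {
        for (Map.Entry<String, List<String>> entry : other.values.entrySet()) {
            for (String value : entry.getValue()) {
                add(entry.getKey(), value);
            }
        }
    }

    // Returns a copy of the values stored for a name (empty if none).
    List<String> getValues(String name) {
        return new ArrayList<>(values.getOrDefault(name, new ArrayList<>()));
    }
}
```

In this sketch, adding "Content-Type" once to each of two instances and then merging them leaves the target with both values under that name.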
- addAttribute(String, String) - Method in class org.apache.nutch.plugin.Extension
-
Adds an attribute; only used until model creation at plugin system
start-up.
- addClue(String, String, int) - Method in class org.apache.nutch.util.EncodingDetector
-
- addClue(String, String) - Method in class org.apache.nutch.util.EncodingDetector
-
- addDependency(String) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Adds a dependency
- addEventData(String, Object) - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Add new data to the eventData object.
- addExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Adds an exported library with a relative path to the plugin directory.
- addExtension(Extension) - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Installs a corresponding extension to this extension point.
- addExtension(Extension) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Adds an extension.
- addExtensionPoint(ExtensionPoint) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Adds an extension point.
- addFetchItem(FetchItem) - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- addFetchItem(Text, CrawlDatum) - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- addFetchItem(FetchItem) - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- addInProgressFetchItem(FetchItem) - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- addInstancesMenuMenu() - Method in class org.apache.nutch.webui.pages.AbstractBasePage
-
- addMeta(String, String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Add metadata.
- addNotExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Adds a non-exported library with a relative path to the plugin directory.
- addOutlinksToEventData(Collection<Outlink>) - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Given a collection of outlinks, this method adds it to
the outlink metadata.
- addPatternBackward(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Adds any necessary nodes to the trie so that the given String
can be decoded in reverse and the first character is represented
by a terminal node.
- addPatternForward(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Adds any necessary nodes to the trie so that the given String
can be decoded and the last character is represented by a terminal node.
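The two entries above describe inserting a pattern into a trie either forward (the last character ends on a terminal node) or in reverse (the first character ends on a terminal node). The following is a minimal sketch of that technique under stated assumptions: the class TrieSketch, its Node type, and the matchesPrefix helper are hypothetical names for illustration and are not the TrieStringMatcher implementation. In practice a given trie would normally hold patterns in one direction only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; hypothetical class, not Nutch's TrieStringMatcher.
class TrieSketch {
    // Hypothetical node type: one child per character plus a terminal flag.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    // Walks the pattern left to right, creating missing nodes,
    // and marks the node for the last character as terminal.
    void addPatternForward(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.children.computeIfAbsent(pattern.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Walks the pattern right to left, so the node for the first
    // character ends up terminal.
    void addPatternBackward(String pattern) {
        Node node = root;
        for (int i = pattern.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(pattern.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Returns true if some forward-inserted pattern is a prefix of the input.
    boolean matchesPrefix(String input) {
        Node node = root;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false;
            }
            if (node.terminal) {
                return true;
            }
        }
        return false;
    }
}
```

A suffix check would mirror matchesPrefix by reading the input from its last character over patterns inserted with addPatternBackward.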
- addRobotsContent(List<Content>, URL, Response) - Method in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
Append Content of robots.txt to robotsTxtContent
- addTiming(String, String, long) - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
-
- addUrlFeatures(NutchDocument, String) - Method in class org.creativecommons.nutch.CCIndexingFilter
-
Add the features represented by a license URL.
- addUserMenu() - Method in class org.apache.nutch.webui.pages.AbstractBasePage
-
- AdminResource - Class in org.apache.nutch.service.resources
-
- AdminResource() - Constructor for class org.apache.nutch.service.resources.AdminResource
-
- afterExecute(Runnable, Throwable) - Method in class org.apache.nutch.service.impl.NutchServerPoolExecutor
-
- agentNames - Variable in class org.apache.nutch.protocol.RobotRulesParser
-
- allowForbidden - Variable in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
- analyze(Path) - Method in class org.apache.nutch.scoring.webgraph.LinkRank
-
Runs the complete link analysis job.
- AnchorIndexingFilter - Class in org.apache.nutch.indexer.anchor
-
Indexing filter that offers an option to either index all inbound anchor text
for a document or deduplicate anchors.
- AnchorIndexingFilter() - Constructor for class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
- append(Node) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Append a node to the current container.
- ArcInputFormat - Class in org.apache.nutch.tools.arc
-
An input format that reads arc files.
- ArcInputFormat() - Constructor for class org.apache.nutch.tools.arc.ArcInputFormat
-
- ArcRecordReader - Class in org.apache.nutch.tools.arc
-
The ArcRecordReader class provides a record reader which reads
records from arc files.
- ArcRecordReader(Configuration, FileSplit) - Constructor for class org.apache.nutch.tools.arc.ArcRecordReader
-
Constructor that sets the configuration and file split.
- ArcSegmentCreator - Class in org.apache.nutch.tools.arc
-
The ArcSegmentCreator is a replacement for fetcher that will
take arc files as input and produce a nutch segment as output.
- ArcSegmentCreator() - Constructor for class org.apache.nutch.tools.arc.ArcSegmentCreator
-
- ArcSegmentCreator(Configuration) - Constructor for class org.apache.nutch.tools.arc.ArcSegmentCreator
-
Constructor that sets the job configuration.
- ARG_CRAWLDB - Static variable in interface org.apache.nutch.metadata.Nutch
-
Argument key to specify the location of crawldb for the REST endpoints
- ARG_LINKDB - Static variable in interface org.apache.nutch.metadata.Nutch
-
Argument key to specify the location of linkdb for the REST endpoints
- ARG_SEEDDIR - Static variable in interface org.apache.nutch.metadata.Nutch
-
Argument key to specify location of the seed url dir for the REST endpoints
- ARG_SEEDNAME - Static variable in interface org.apache.nutch.metadata.Nutch
-
Argument key to specify name of a seed list for the REST endpoints
- ARG_SEGMENT - Static variable in interface org.apache.nutch.metadata.Nutch
-
Argument key to specify the location of individual segment for the REST endpoints
- ARG_SEGMENTDIR - Static variable in interface org.apache.nutch.metadata.Nutch
-
Argument key to specify the location of a directory of segments for the REST endpoints.
- args - Variable in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- attrName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
-
- autoDetectClues(Content, boolean) - Method in class org.apache.nutch.util.EncodingDetector
-
- AutomatonURLFilter - Class in org.apache.nutch.urlfilter.automaton
-
RegexURLFilterBase implementation based on the
dk.brics.automaton Finite-State Automata for Java™.
- AutomatonURLFilter() - Constructor for class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
- AutomatonURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
- autoResolveContentType(String, String, byte[]) - Method in class org.apache.nutch.util.MimeUtil
-
A facade interface for trying all the possible mime type resolution
strategies available within Tika.
- CACHE - Static variable in class org.apache.nutch.protocol.RobotRulesParser
-
- CACHING_FORBIDDEN_ALL - Static variable in interface org.apache.nutch.metadata.Nutch
-
Don't show either original forbidden content or summaries.
- CACHING_FORBIDDEN_CONTENT - Static variable in interface org.apache.nutch.metadata.Nutch
-
Don't show original forbidden content, but show summaries.
- CACHING_FORBIDDEN_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
Sites may request that search engines don't provide access to cached
documents.
- CACHING_FORBIDDEN_NONE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Show both original forbidden content and summaries (default).
- calculate(Content, Parse) - Method in class org.apache.nutch.crawl.MD5Signature
-
- calculate(Content, Parse) - Method in class org.apache.nutch.crawl.Signature
-
- calculate(Content, Parse) - Method in class org.apache.nutch.crawl.TextMD5Signature
-
- calculate(Content, Parse) - Method in class org.apache.nutch.crawl.TextProfileSignature
-
- calculateLastFetchTime(CrawlDatum) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method returns the last fetch time of the CrawlDatum.
- calculateLastFetchTime(CrawlDatum) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
Calculates last fetch time of the given CrawlDatum.
- call() - Method in class org.apache.nutch.webui.client.impl.RemoteCommandExecutor.JobStateChecker
-
- canStop(boolean) - Method in class org.apache.nutch.service.NutchServer
-
- CCIndexingFilter - Class in org.creativecommons.nutch
-
Adds basic searchable fields to a document.
- CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
-
- CCParseFilter - Class in org.creativecommons.nutch
-
Adds metadata identifying the Creative Commons license used, if any.
- CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
-
- CCParseFilter.Walker - Class in org.creativecommons.nutch
-
Walks DOM tree, looking for RDF in comments and licenses in anchors.
- cdata(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of cdata.
- CHAR_ENCODING_FOR_CONVERSION - Static variable in interface org.apache.nutch.metadata.Nutch
-
- characters(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of character data.
- charactersRaw(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
If available, when the disable-output-escaping attribute is used, output
raw text without escaping.
- CHARSET_UTF8 - Static variable in class org.apache.nutch.parse.feed.FeedParser
-
- checkAndReplace(String, String) - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
Return a replacement value for a field.
- checkClientTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- checkExceptionThreshold(String) - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
Increment the exception counter of a queue in case of an exception e.g.
- checkFailed - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- checkKnown - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- checkNew - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- checkOutputSpecs(FileSystem, JobConf) - Method in class org.apache.nutch.fetcher.FetcherOutputFormat
-
- checkOutputSpecs(FileSystem, JobConf) - Method in class org.apache.nutch.parse.ParseOutputFormat
-
- checkSegmentDir(Path, FileSystem) - Static method in class org.apache.nutch.segment.SegmentChecker
-
Check the segment to see if it is valid based on the sub directories.
- checkServerTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- checkTimelimit() - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- childLen - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
-
- children - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- childrenList - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- chooseRepr(String, String, boolean) - Static method in class org.apache.nutch.util.URLUtil
-
Given two urls, a src and a destination of a redirect, it returns the
representative url.
- CircularDependencyException - Exception in org.apache.nutch.plugin
-
CircularDependencyException will be thrown if a circular
dependency is detected.
- CircularDependencyException(Throwable) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
-
- CircularDependencyException(String) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
-
- Classify - Class in org.apache.nutch.parsefilter.naivebayes
-
- Classify() - Constructor for class org.apache.nutch.parsefilter.naivebayes.Classify
-
- classify(String) - Static method in class org.apache.nutch.parsefilter.naivebayes.Classify
-
- classify(String) - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- cleanField(String) - Static method in class org.apache.nutch.util.StringUtil
-
Simple character substitution which cleans all � chars from a given String.
- CleaningJob - Class in org.apache.nutch.indexer
-
The class scans CrawlDB looking for entries with status DB_GONE (404) or
DB_DUPLICATE and sends delete requests to indexers for those documents.
- CleaningJob() - Constructor for class org.apache.nutch.indexer.CleaningJob
-
- CleaningJob.DBFilter - Class in org.apache.nutch.indexer
-
- CleaningJob.DeleterReducer - Class in org.apache.nutch.indexer
-
- cleanMimeType(String) - Static method in class org.apache.nutch.util.MimeUtil
-
Cleans a MimeType name by parsing out the actual MimeType
from a string of the form:
- cleanUpDriver(WebDriver) - Static method in class org.apache.nutch.protocol.htmlunit.HtmlUnitWebDriver
-
- cleanUpDriver(WebDriver) - Static method in class org.apache.nutch.protocol.selenium.HttpWebClient
-
- clear() - Method in class org.apache.nutch.crawl.Inlinks
-
- clear() - Method in class org.apache.nutch.metadata.Metadata
-
Remove all mappings from metadata.
- clearClues() - Method in class org.apache.nutch.util.EncodingDetector
-
Clears all clues.
- Client - Class in org.apache.nutch.protocol.ftp
-
Client.java encapsulates the functionality Nutch needs to get a directory
listing and retrieve files from an FTP server.
- Client() - Constructor for class org.apache.nutch.protocol.ftp.Client
-
Public default constructor
- clone() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- clone() - Method in class org.apache.nutch.hostdb.HostDatum
-
- clone() - Method in class org.apache.nutch.indexer.NutchField
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbFilter
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- close(Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbDumpMapper
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
-
- close() - Method in class org.apache.nutch.crawl.CrawlDbReducer
-
- close() - Method in class org.apache.nutch.crawl.DeduplicationJob.DBFilter
-
- close() - Method in class org.apache.nutch.crawl.DeduplicationJob.DedupReducer
-
- close() - Method in class org.apache.nutch.crawl.DeduplicationJob.StatusUpdateReducer
-
- close() - Method in class org.apache.nutch.crawl.Generator.Selector
-
- close() - Method in class org.apache.nutch.crawl.LinkDb
-
- close() - Method in class org.apache.nutch.crawl.LinkDbFilter
-
- close() - Method in class org.apache.nutch.crawl.LinkDbMerger
-
- close() - Method in class org.apache.nutch.crawl.LinkDbReader
-
- close() - Method in class org.apache.nutch.crawl.LinkDbReader.LinkDBDumpMapper
-
- close() - Method in class org.apache.nutch.crawl.URLPartitioner
-
- close() - Method in class org.apache.nutch.fetcher.Fetcher
-
- close() - Method in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- close() - Method in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
Shut down all running threads and wait for completion.
- close() - Method in class org.apache.nutch.indexer.CleaningJob.DBFilter
-
- close() - Method in class org.apache.nutch.indexer.CleaningJob.DeleterReducer
-
- close() - Method in class org.apache.nutch.indexer.IndexerMapReduce
-
- close() - Method in interface org.apache.nutch.indexer.IndexWriter
-
- close() - Method in class org.apache.nutch.indexer.IndexWriters
-
- close() - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- close() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- close() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- close() - Method in class org.apache.nutch.parse.ParseSegment
-
- close() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
-
- close() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
-
- close() - Method in class org.apache.nutch.scoring.webgraph.LinkRank
-
- close() - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Dumper
-
- close() - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
-
- close() - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
- close() - Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
-
- close() - Method in class org.apache.nutch.segment.SegmentMerger
-
- close() - Method in class org.apache.nutch.segment.SegmentReader
-
- close() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- close() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Closes the record reader resources.
- close() - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
- close() - Method in interface org.apache.nutch.tools.CommonCrawlFormat
-
Optional method that could be implemented if the actual format needs some
close procedure.
- close() - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- close() - Method in class org.apache.nutch.tools.warc.WARCExporter.WARCReducer
-
- closeArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- closeArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- closeArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatJettinson
-
- closeArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatSimple
-
- closeArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- closeObject(String) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- closeObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- closeObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatJettinson
-
- closeObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatSimple
-
- closeObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- closeReaders(SequenceFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils
-
Closes a group of SequenceFile readers.
- closeReaders(MapFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils
-
Closes a group of MapFile readers.
- CLUSTER - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- COLLECTION - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- CollectionManager - Class in org.apache.nutch.collection
-
- CollectionManager(Configuration) - Constructor for class org.apache.nutch.collection.CollectionManager
-
- CollectionManager() - Constructor for class org.apache.nutch.collection.CollectionManager
-
Used for testing
- ColorEnumLabel<E extends Enum<E>> - Class in org.apache.nutch.webui.pages.components
-
Label which renders connection status as bootstrap label
- ColorEnumLabelBuilder<E extends Enum<E>> - Class in org.apache.nutch.webui.pages.components
-
- ColorEnumLabelBuilder(String) - Constructor for class org.apache.nutch.webui.pages.components.ColorEnumLabelBuilder
-
- commandExecuted(Crawl, RemoteCommand, int) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- commandExecuted(Crawl, RemoteCommand, int) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- CommandRunner - Class in org.apache.nutch.util
-
- CommandRunner() - Constructor for class org.apache.nutch.util.CommandRunner
-
- comment(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report an XML comment anywhere in the document.
- commit() - Method in interface org.apache.nutch.indexer.IndexWriter
-
- commit() - Method in class org.apache.nutch.indexer.IndexWriters
-
- commit() - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- commit() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- commit() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- COMMIT_INDEX - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
Deprecated.
- COMMIT_SIZE - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- CommonCrawlConfig - Class in org.apache.nutch.tools
-
- CommonCrawlConfig() - Constructor for class org.apache.nutch.tools.CommonCrawlConfig
-
Default constructor
- CommonCrawlConfig(InputStream) - Constructor for class org.apache.nutch.tools.CommonCrawlConfig
-
- CommonCrawlDataDumper - Class in org.apache.nutch.tools
-
The Common Crawl Data Dumper tool enables one to reverse generate the raw
content from Nutch segment data directories into a common crawling data
format, consumed by many applications.
- CommonCrawlDataDumper(CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.CommonCrawlDataDumper
-
Constructor
- CommonCrawlDataDumper() - Constructor for class org.apache.nutch.tools.CommonCrawlDataDumper
-
- CommonCrawlFormat - Interface in org.apache.nutch.tools
-
Interface for all CommonCrawl formatters.
- CommonCrawlFormatFactory - Class in org.apache.nutch.tools
-
- CommonCrawlFormatFactory() - Constructor for class org.apache.nutch.tools.CommonCrawlFormatFactory
-
- CommonCrawlFormatJackson - Class in org.apache.nutch.tools
-
This class provides methods to map crawled data to JSON using Jackson Streaming APIs.
- CommonCrawlFormatJackson(Configuration, CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- CommonCrawlFormatJackson(String, Content, Metadata, Configuration, CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- CommonCrawlFormatJettinson - Class in org.apache.nutch.tools
-
This class provides methods to map crawled data to JSON using Jettison APIs.
- CommonCrawlFormatJettinson(String, Content, Metadata, Configuration, CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.CommonCrawlFormatJettinson
-
- CommonCrawlFormatSimple - Class in org.apache.nutch.tools
-
This class provides methods to map crawled data to JSON using a
StringBuilder object.
- CommonCrawlFormatSimple(String, Content, Metadata, Configuration, CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.CommonCrawlFormatSimple
-
- CommonCrawlFormatWARC - Class in org.apache.nutch.tools
-
- CommonCrawlFormatWARC(Configuration, CommonCrawlConfig) - Constructor for class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- CommonCrawlFormatWARC(String, Content, Metadata, Configuration, CommonCrawlConfig, ParseData) - Constructor for class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- Comparator() - Constructor for class org.apache.nutch.crawl.CrawlDatum.Comparator
-
- compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.CrawlDatum.Comparator
-
- compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.Generator.DecreasingFloatComparator
-
Compares two FloatWritables in decreasing order.
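The entry above describes a comparison that orders float values from highest to lowest. The following is a minimal sketch of that ordering using plain Java Float values rather than Hadoop FloatWritable objects or raw bytes. The class DecreasingFloatSketch and its sortDecreasing helper are hypothetical names for illustration and are not the Generator.DecreasingFloatComparator implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; hypothetical class, not Nutch's comparator.
class DecreasingFloatSketch {
    // Swapping the arguments of the natural comparison yields decreasing order.
    static final Comparator<Float> DECREASING = (a, b) -> Float.compare(b, a);

    // Returns a new list with the scores sorted from highest to lowest.
    static List<Float> sortDecreasing(List<Float> scores) {
        List<Float> copy = new ArrayList<>(scores);
        copy.sort(DECREASING);
        return copy;
    }
}
```

With such an ordering, the highest-scoring entries come first, which is convenient when only the top entries are to be selected.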
- compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.crawl.Generator.HashComparator
-
- compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.Generator.HashComparator
-
- compare(Object, Object) - Method in class org.apache.nutch.crawl.SignatureComparator
-
- compareTo(CrawlDatum) - Method in class org.apache.nutch.crawl.CrawlDatum
-
Sort by decreasing score.
- compareTo(TrieStringMatcher.TrieNode) - Method in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- computeCosineSimilarity(DocVector) - Static method in class org.apache.nutch.scoring.similarity.cosine.Model
-
- conf - Variable in class org.apache.nutch.crawl.Signature
-
- conf - Variable in class org.apache.nutch.plugin.Plugin
-
- conf - Variable in class org.apache.nutch.protocol.RobotRulesParser
-
- conf - Static variable in interface org.apache.nutch.service.NutchReader
-
- conf - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- conf - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- conf - Static variable in class org.apache.nutch.util.ProtocolStatusStatistics
-
- configManager - Variable in class org.apache.nutch.service.resources.AbstractResource
-
- ConfigResource - Class in org.apache.nutch.service.resources
-
- ConfigResource() - Constructor for class org.apache.nutch.service.resources.ConfigResource
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbFilter
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbDumpMapper
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReducer
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.DeduplicationJob.DBFilter
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.DeduplicationJob.DedupReducer
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.DeduplicationJob.StatusUpdateReducer
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.Generator.CrawlDbUpdater
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.Generator.Selector
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDb
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDbFilter
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDbMerger
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDbReader.LinkDBDumpMapper
-
- configure(JobConf) - Method in class org.apache.nutch.crawl.URLPartitioner
-
- configure(JobConf) - Method in class org.apache.nutch.fetcher.Fetcher
-
- configure(JobConf) - Method in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- configure(JobConf) - Method in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
Configures the thread pool and prestarts all resolver threads.
- configure(JobConf) - Method in class org.apache.nutch.indexer.CleaningJob.DBFilter
-
- configure(JobConf) - Method in class org.apache.nutch.indexer.CleaningJob.DeleterReducer
-
- configure(JobConf) - Method in class org.apache.nutch.indexer.IndexerMapReduce
-
- configure(JobConf) - Method in class org.apache.nutch.parse.ParseSegment
-
- configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
-
- configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
-
- configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Dumper
-
- configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
-
Configures the job, sets the flag for type of content and the topN number
if any.
- configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
- configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
-
Configures the OutlinkDb job.
- configure(JobConf) - Method in class org.apache.nutch.segment.SegmentMerger
-
- configure(JobConf) - Method in class org.apache.nutch.segment.SegmentReader
-
- configure(JobConf) - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
Configures the job.
- configure(JobConf) - Method in class org.apache.nutch.tools.FreeGenerator.FG
-
- configure(JobConf) - Method in class org.apache.nutch.tools.warc.WARCExporter.WARCReducer
-
- ConfManager - Interface in org.apache.nutch.service
-
- ConfManagerImpl - Class in org.apache.nutch.service.impl
-
- ConfManagerImpl() - Constructor for class org.apache.nutch.service.impl.ConfManagerImpl
-
- CONFORMS_TO - Static variable in class org.apache.nutch.tools.WARCUtils
-
- connectionFailures - Variable in class org.apache.nutch.hostdb.HostDatum
-
- ConnectionStatus - Enum in org.apache.nutch.webui.client.model
-
- containsWord(String, ArrayList<String>) - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- Content - Class in org.apache.nutch.protocol
-
- Content() - Constructor for class org.apache.nutch.protocol.Content
-
- Content(String, String, byte[], String, Metadata, Configuration) - Constructor for class org.apache.nutch.protocol.Content
-
- content - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- CONTENT_DISPOSITION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_ENCODING - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_LANGUAGE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_LENGTH - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_MD5 - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_REDIR - Static variable in class org.apache.nutch.fetcher.Fetcher
-
- CONTENT_TYPE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- ContentAsTextInputFormat - Class in org.apache.nutch.segment
-
An input format that takes Nutch Content objects and converts them to text
while converting newline endings to spaces.
- ContentAsTextInputFormat() - Constructor for class org.apache.nutch.segment.ContentAsTextInputFormat
-
- CONTRIBUTOR - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An entity responsible for making contributions to the content of the
resource.
- COOKIE - Static variable in class org.apache.nutch.protocol.http.api.HttpBase
-
- CosineSimilarity - Class in org.apache.nutch.scoring.similarity.cosine
-
- CosineSimilarity() - Constructor for class org.apache.nutch.scoring.similarity.cosine.CosineSimilarity
-
- count(String) - Method in class org.apache.nutch.service.impl.LinkReader
-
- count(String) - Method in class org.apache.nutch.service.impl.NodeReader
-
- count(String) - Method in class org.apache.nutch.service.impl.SequenceReader
-
- count(String) - Method in interface org.apache.nutch.service.NutchReader
-
- COVERAGE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
The extent or scope of the content of the resource.
- CpmIteratorAdapter<T> - Class in org.apache.nutch.webui.pages.components
-
This is an iterator adapter that wraps iterable items with
CompoundPropertyModel.
- CpmIteratorAdapter(Iterable<T>) - Constructor for class org.apache.nutch.webui.pages.components.CpmIteratorAdapter
-
- Crawl - Class in org.apache.nutch.webui.client.model
-
- Crawl() - Constructor for class org.apache.nutch.webui.client.model.Crawl
-
- Crawl.CrawlStatus - Enum in org.apache.nutch.webui.client.model
-
- CRAWL_ID_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
Used by Nutch REST service
- CrawlCompletionStats - Class in org.apache.nutch.util
-
Extracts some simple crawl completion stats from the crawldb.
Stats will be sorted by host/domain and will be of the form:
1 www.spitzer.caltech.edu FETCHED
50 www.spitzer.caltech.edu UNFETCHED
- CrawlCompletionStats() - Constructor for class org.apache.nutch.util.CrawlCompletionStats
-
- CrawlCompletionStats.CrawlCompletionStatsCombiner - Class in org.apache.nutch.util
-
- CrawlCompletionStatsCombiner() - Constructor for class org.apache.nutch.util.CrawlCompletionStats.CrawlCompletionStatsCombiner
-
- CrawlDatum - Class in org.apache.nutch.crawl
-
- CrawlDatum() - Constructor for class org.apache.nutch.crawl.CrawlDatum
-
- CrawlDatum(int, int) - Constructor for class org.apache.nutch.crawl.CrawlDatum
-
- CrawlDatum(int, int, float) - Constructor for class org.apache.nutch.crawl.CrawlDatum
-
- crawlDatum - Variable in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- CrawlDatum.Comparator - Class in org.apache.nutch.crawl
-
A Comparator optimized for CrawlDatum.
- CrawlDatumCsvOutputFormat() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat
-
- CrawlDb - Class in org.apache.nutch.crawl
-
This class takes the output of the fetcher and updates the crawldb
accordingly.
- CrawlDb() - Constructor for class org.apache.nutch.crawl.CrawlDb
-
- CrawlDb(Configuration) - Constructor for class org.apache.nutch.crawl.CrawlDb
-
- CRAWLDB_ADDITIONS_ALLOWED - Static variable in class org.apache.nutch.crawl.CrawlDb
-
- CRAWLDB_PURGE_404 - Static variable in class org.apache.nutch.crawl.CrawlDb
-
- CrawlDbDumpMapper() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbDumpMapper
-
- CrawlDbFilter - Class in org.apache.nutch.crawl
-
This class provides a way to separate the URL normalization and filtering
steps from the rest of CrawlDb manipulation code.
- CrawlDbFilter() - Constructor for class org.apache.nutch.crawl.CrawlDbFilter
-
- CrawlDbMerger - Class in org.apache.nutch.crawl
-
This tool merges several CrawlDb-s into one, optionally filtering URLs
through the current URLFilters, to skip prohibited pages.
- CrawlDbMerger() - Constructor for class org.apache.nutch.crawl.CrawlDbMerger
-
- CrawlDbMerger(Configuration) - Constructor for class org.apache.nutch.crawl.CrawlDbMerger
-
- CrawlDbMerger.Merger - Class in org.apache.nutch.crawl
-
- CrawlDbReader - Class in org.apache.nutch.crawl
-
Read utility for the CrawlDB.
- CrawlDbReader() - Constructor for class org.apache.nutch.crawl.CrawlDbReader
-
- CrawlDbReader.CrawlDatumCsvOutputFormat - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDbDumpMapper - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDbStatCombiner - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDbStatMapper - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDbStatReducer - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDbTopNMapper - Class in org.apache.nutch.crawl
-
- CrawlDbReader.CrawlDbTopNReducer - Class in org.apache.nutch.crawl
-
- CrawlDbReducer - Class in org.apache.nutch.crawl
-
Merge new page entries with existing entries.
- CrawlDbReducer() - Constructor for class org.apache.nutch.crawl.CrawlDbReducer
-
- CrawlDbStatCombiner() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
-
- CrawlDbStatMapper() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
-
- CrawlDbStatReducer() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
-
- CrawlDbTopNMapper() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
-
- CrawlDbTopNReducer() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
-
- CrawlDbUpdater() - Constructor for class org.apache.nutch.crawl.Generator.CrawlDbUpdater
-
- CrawlingCycle - Class in org.apache.nutch.webui.client.impl
-
This class implements the crawl cycle as in the crawl script.
- CrawlingCycle(CrawlingCycleListener, RemoteCommandExecutor, Crawl, List<RemoteCommand>) - Constructor for class org.apache.nutch.webui.client.impl.CrawlingCycle
-
- CrawlingCycleListener - Interface in org.apache.nutch.webui.client.impl
-
- crawlingFinished(Crawl) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- crawlingFinished(Crawl) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- crawlingStarted(Crawl) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- crawlingStarted(Crawl) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- CrawlPanel - Class in org.apache.nutch.webui.pages.crawls
-
- CrawlPanel(String) - Constructor for class org.apache.nutch.webui.pages.crawls.CrawlPanel
-
- CrawlService - Interface in org.apache.nutch.webui.service
-
- CrawlServiceImpl - Class in org.apache.nutch.webui.service.impl
-
- CrawlServiceImpl() - Constructor for class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- CrawlsPage - Class in org.apache.nutch.webui.pages.crawls
-
This page is for crawls management
- CrawlsPage() - Constructor for class org.apache.nutch.webui.pages.crawls.CrawlsPage
-
- create(Text, CrawlDatum, String) - Static method in class org.apache.nutch.fetcher.FetchItem
-
Create an item.
- create(Text, CrawlDatum, String, int) - Static method in class org.apache.nutch.fetcher.FetchItem
-
- create(NutchConfig) - Method in interface org.apache.nutch.service.ConfManager
-
- create(NutchConfig) - Method in class org.apache.nutch.service.impl.ConfManagerImpl
-
Creates a new configuration based on the values provided.
- create(JobConfig) - Method in class org.apache.nutch.service.impl.JobManagerImpl
-
- create(JobConfig) - Method in interface org.apache.nutch.service.JobManager
-
Creates the specified job.
- create(JobConfig) - Method in class org.apache.nutch.service.resources.JobResource
-
Create a new job
- create() - Static method in class org.apache.nutch.util.NutchConfiguration
-
- create(boolean, Properties) - Static method in class org.apache.nutch.util.NutchConfiguration
-
- createClient() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- createCommands(Crawl) - Method in class org.apache.nutch.webui.client.impl.RemoteCommandsBatchFactory
-
- createComponents(String) - Method in class org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil
-
- createConfig(NutchConfig) - Method in class org.apache.nutch.service.resources.ConfigResource
-
Create new configuration.
- createCrawlDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createDao(Class<T>) - Method in class org.apache.nutch.webui.config.CustomDaoFactory
-
- createDocFromCityDb(String, NutchDocument, DatabaseReader) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocFromCityService(String, NutchDocument, WebServiceClient) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocFromConnectionDb(String, NutchDocument, DatabaseReader) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocFromCountryService(String, NutchDocument, WebServiceClient) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocFromDomainDb(String, NutchDocument, DatabaseReader) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocFromInsightsService(String, NutchDocument, WebServiceClient) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocFromIspDb(String, NutchDocument, DatabaseReader) - Static method in class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
- createDocVector(String, int, int) - Static method in class org.apache.nutch.scoring.similarity.cosine.Model
-
Used to create a DocVector from given String text.
- createFileName(String, String, String) - Static method in class org.apache.nutch.util.DumpFileUtil
-
- createFileNameFromUrl(String, String, String, String, String, boolean) - Static method in class org.apache.nutch.util.DumpFileUtil
-
- createJob(Configuration, Path) - Static method in class org.apache.nutch.crawl.CrawlDb
-
- createKey() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Creates a new instance of the Text object for the key.
- createLockFile(FileSystem, Path, boolean) - Static method in class org.apache.nutch.util.LockUtil
-
Create a lock file.
- createMergeJob(Configuration, Path, boolean, boolean) - Static method in class org.apache.nutch.crawl.CrawlDbMerger
-
- createMergeJob(Configuration, Path, boolean, boolean) - Static method in class org.apache.nutch.crawl.LinkDbMerger
-
- createModel(Configuration) - Static method in class org.apache.nutch.scoring.similarity.cosine.Model
-
- createNutchDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createParseResult(String, Parse) - Static method in class org.apache.nutch.parse.ParseResult
-
Convenience method for obtaining ParseResult from a single Parse output.
- createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- createRule(boolean, String, String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
- createRule(boolean, String, String) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
- createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
- createRule(boolean, String, String) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
- createSeed(SeedList) - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- createSeed(SeedList) - Method in interface org.apache.nutch.webui.client.NutchClient
-
Create seed list and return seed directory location
- createSeedFile(SeedList) - Method in class org.apache.nutch.service.resources.SeedResource
-
Method creates seed list file and returns temporary directory path
- createSeedListDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createSeedUrlDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createSegments(Path, Path) - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
Creates the arc files to segments job.
- createSocket(String, int, InetAddress, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
Attempts to get a new socket connection to the given host within the given
time limit.
- createSocket(String, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- createSocket(Socket, String, int, boolean) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- createSubCollection(String, String) - Method in class org.apache.nutch.collection.CollectionManager
-
Create a new subcollection.
- createTableCreator() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createToolByClassName(String, Configuration) - Method in class org.apache.nutch.service.impl.JobFactory
-
- createToolByType(JobManager.JobType, Configuration) - Method in class org.apache.nutch.service.impl.JobFactory
-
- createTwoLevelsDirectory(String, String, boolean) - Static method in class org.apache.nutch.util.DumpFileUtil
-
- createTwoLevelsDirectory(String, String) - Static method in class org.apache.nutch.util.DumpFileUtil
-
- createValue() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Creates a new instance of the BytesWritable object for the value.
- createWebGraph(Path, Path[], boolean, boolean) - Method in class org.apache.nutch.scoring.webgraph.WebGraph
-
Creates the three different WebGraph databases, Outlinks, Inlinks, and
Node.
- CreativeCommons - Interface in org.apache.nutch.metadata
-
A collection of Creative Commons property names.
- CREATOR - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An entity primarily responsible for making the content of the resource.
- CURRENT_NAME - Static variable in class org.apache.nutch.crawl.CrawlDb
-
- CURRENT_NAME - Static variable in class org.apache.nutch.crawl.LinkDb
-
- currentInstance - Variable in class org.apache.nutch.webui.pages.AbstractBasePage
-
- currentJob - Variable in class org.apache.nutch.util.NutchTool
-
- currentJobNum - Variable in class org.apache.nutch.util.NutchTool
-
- CustomDaoFactory - Class in org.apache.nutch.webui.config
-
- CustomDaoFactory(ConnectionSource) - Constructor for class org.apache.nutch.webui.config.CustomDaoFactory
-
- CustomTableCreator - Class in org.apache.nutch.webui.config
-
- CustomTableCreator(ConnectionSource, List<Dao<?, ?>>) - Constructor for class org.apache.nutch.webui.config.CustomTableCreator
-
- DashboardPage - Class in org.apache.nutch.webui.pages
-
- DashboardPage() - Constructor for class org.apache.nutch.webui.pages.DashboardPage
-
- DATE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A date associated with an event in the life cycle of the resource.
- dateFormatStr - Static variable in class org.apache.nutch.indexer.feed.FeedIndexingFilter
-
- datePattern - Static variable in class org.apache.nutch.util.JexlUtil
-
- datum - Variable in class org.apache.nutch.crawl.Generator.SelectorEntry
-
- datum - Variable in class org.apache.nutch.hostdb.ResolverThread
-
- DB_IGNORE_EXTERNAL_EXEMPTIONS_FILE - Static variable in class org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter
-
- DBFilter() - Constructor for class org.apache.nutch.crawl.DeduplicationJob.DBFilter
-
- DBFilter() - Constructor for class org.apache.nutch.indexer.CleaningJob.DBFilter
-
- DbQuery - Class in org.apache.nutch.service.model.request
-
- DbQuery() - Constructor for class org.apache.nutch.service.model.request.DbQuery
-
- DbResource - Class in org.apache.nutch.service.resources
-
- DbResource() - Constructor for class org.apache.nutch.service.resources.DbResource
-
- DEC_RATE - Variable in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- DecreasingFloatComparator() - Constructor for class org.apache.nutch.crawl.Generator.DecreasingFloatComparator
-
- DeduplicationJob - Class in org.apache.nutch.crawl
-
Generic deduplicator which groups fetched URLs with the same digest and marks
all of them as duplicate except the one with the highest score (based on the
score in the crawldb, which is not necessarily the same as the score
indexed).
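The grouping rule described above can be sketched in plain Java. This is a minimal illustration, not the Nutch implementation: the Entry class and the markDuplicates method are hypothetical names, and the tie-breaking rule (the first entry seen wins on equal scores) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    /** Hypothetical stand-in for a crawldb entry; not a Nutch class. */
    static final class Entry {
        final String url;
        final String digest;
        final float score;
        boolean duplicate;

        Entry(String url, String digest, float score) {
            this.url = url;
            this.digest = digest;
            this.score = score;
        }
    }

    /** Marks every entry as duplicate except the highest-scoring one per digest. */
    static void markDuplicates(List<Entry> entries) {
        Map<String, Entry> bestByDigest = new HashMap<>();
        for (Entry e : entries) {
            Entry best = bestByDigest.get(e.digest);
            if (best == null) {
                bestByDigest.put(e.digest, e);      // first entry with this digest
            } else if (e.score > best.score) {
                best.duplicate = true;              // previous winner is demoted
                bestByDigest.put(e.digest, e);
            } else {
                e.duplicate = true;                 // assumption: ties keep the first seen
            }
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("http://example.org/a", "d1", 1.0f));
        entries.add(new Entry("http://example.org/b", "d1", 2.0f));
        entries.add(new Entry("http://example.org/c", "d2", 0.5f));
        markDuplicates(entries);
        for (Entry e : entries) {
            System.out.println(e.url + " duplicate=" + e.duplicate);
        }
    }
}
```

The sketch keeps a single pass over the entries, which mirrors how a reducer would see all entries sharing one digest.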
- DeduplicationJob() - Constructor for class org.apache.nutch.crawl.DeduplicationJob
-
- DeduplicationJob.DBFilter - Class in org.apache.nutch.crawl
-
- DeduplicationJob.DedupReducer - Class in org.apache.nutch.crawl
-
- DeduplicationJob.StatusUpdateReducer - Class in org.apache.nutch.crawl
-
Combine multiple new entries for a url.
- DedupReducer() - Constructor for class org.apache.nutch.crawl.DeduplicationJob.DedupReducer
-
- DEFAULT - Static variable in class org.apache.nutch.service.resources.ConfigResource
-
- DEFAULT_BOOST - Static variable in class org.apache.nutch.util.domain.DomainSuffix
-
- DEFAULT_FILE_NAME - Static variable in class org.apache.nutch.collection.CollectionManager
-
- DEFAULT_ID - Static variable in class org.apache.nutch.fetcher.FetchItemQueues
-
- DEFAULT_MAX_DEPTH - Static variable in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- DEFAULT_PLUGIN - Static variable in class org.apache.nutch.parse.ParserFactory
-
Wildcard for default plugins.
- DEFAULT_STATUS - Static variable in class org.apache.nutch.util.domain.DomainSuffix
-
- DefaultFetchSchedule - Class in org.apache.nutch.crawl
-
This class implements the default re-fetch schedule.
- DefaultFetchSchedule() - Constructor for class org.apache.nutch.crawl.DefaultFetchSchedule
-
- defaultInterval - Variable in class org.apache.nutch.crawl.AbstractFetchSchedule
-
- deflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns a deflated copy of the input array.
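As a rough illustration of what producing a deflated copy of a byte array involves, the following sketch uses only java.util.zip from the JDK. It is not the DeflateUtils source; the class name DeflateSketch and the inflate helper are assumptions added here to show a round trip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateSketch {
    /** Returns a deflated (zlib-compressed) copy of the input array. */
    static byte[] deflate(byte[] in) {
        Deflater deflater = new Deflater();
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Hypothetical helper: reverses deflate() so the round trip can be checked. */
    static byte[] inflate(byte[] in) {
        Inflater inflater = new Inflater();
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("input is not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "nutch nutch nutch nutch".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(original);
        byte[] restored = inflate(packed);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```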
- DeflateUtils - Class in org.apache.nutch.util
-
A collection of utility methods for working on deflated data.
- DeflateUtils() - Constructor for class org.apache.nutch.util.DeflateUtils
-
- delete(String, boolean) - Method in class org.apache.nutch.indexer.CleaningJob
-
- delete(String) - Method in interface org.apache.nutch.indexer.IndexWriter
-
- delete(String) - Method in class org.apache.nutch.indexer.IndexWriters
-
- DELETE - Static variable in class org.apache.nutch.indexer.NutchIndexAction
-
- delete(String) - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- delete(String) - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- delete(String) - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- delete(String) - Method in interface org.apache.nutch.service.ConfManager
-
- delete(String) - Method in class org.apache.nutch.service.impl.ConfManagerImpl
-
- delete(Long) - Method in class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- delete(Long) - Method in interface org.apache.nutch.webui.service.SeedListService
-
- deleteByQuery(String) - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- deleteConfig(String) - Method in class org.apache.nutch.service.resources.ConfigResource
-
Removes the configuration from the list of known configurations.
- deleteCrawl(Long) - Method in interface org.apache.nutch.webui.service.CrawlService
-
- deleteCrawl(Long) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- DeleterReducer() - Constructor for class org.apache.nutch.indexer.CleaningJob.DeleterReducer
-
- deleteSeedList(String) - Method in class org.apache.nutch.service.impl.SeedManagerImpl
-
- deleteSeedList(String) - Method in interface org.apache.nutch.service.SeedManager
-
- deleteSubCollection(String) - Method in class org.apache.nutch.collection.CollectionManager
-
Delete named subcollection
- DEPTH_KEY - Static variable in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- DEPTH_KEY_W - Static variable in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- DepthScoringFilter - Class in org.apache.nutch.scoring.depth
-
This scoring filter limits the number of hops from the initial seed urls.
- DepthScoringFilter() - Constructor for class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- describe() - Method in interface org.apache.nutch.indexer.IndexWriter
-
Returns a String describing the IndexWriter instance and the specific
parameters it can take
- describe() - Method in class org.apache.nutch.indexer.IndexWriters
-
- describe() - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- describe() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- describe() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- DESCRIPTION - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An account of the content of the resource.
- DICTFILE_MODELFILTER - Static variable in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- DIGEST_FIELD - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- DIR_NAME - Static variable in class org.apache.nutch.parse.ParseData
-
- DIR_NAME - Static variable in class org.apache.nutch.parse.ParseText
-
- DIR_NAME - Static variable in class org.apache.nutch.protocol.Content
-
- disconnect() - Method in class org.apache.nutch.protocol.ftp.Client
-
Closes the connection to the FTP server and restores connection parameters
to the default values.
- displayFileTypes(Map<String, Integer>, Map<String, Integer>) - Static method in class org.apache.nutch.util.DumpFileUtil
-
- distributeScoreToOutlink(Text, Text, ParseData, CrawlDatum, CrawlDatum, int, int) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Get a float value from Fetcher.SCORE_KEY, divide it by the number of
outlinks and apply.
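The arithmetic in this description can be sketched as follows. This is an illustrative reading only: the class OpicSketch and the method distribute are hypothetical, and interpreting "apply" as adding the equal share to each outlink's existing score is an assumption; any additional weighting the real filter performs is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OpicSketch {
    /**
     * Splits pageScore evenly across the outlinks and adds each share to the
     * outlink's current score. Returns a new map; the input is not modified.
     */
    static Map<String, Float> distribute(float pageScore, Map<String, Float> outlinks) {
        Map<String, Float> updated = new LinkedHashMap<>(outlinks);
        if (updated.isEmpty()) {
            return updated;                          // nothing to distribute to
        }
        float share = pageScore / updated.size();
        for (Map.Entry<String, Float> e : updated.entrySet()) {
            e.setValue(e.getValue() + share);
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Float> outlinks = new LinkedHashMap<>();
        outlinks.put("http://example.org/a", 0.0f);
        outlinks.put("http://example.org/b", 0.5f);
        System.out.println(distribute(1.0f, outlinks));
    }
}
```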
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
Distribute score value from the current page to all its outlinked pages.
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.ScoringFilters
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.similarity.cosine.CosineSimilarity
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in interface org.apache.nutch.scoring.similarity.SimilarityModel
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.similarity.SimilarityScoringFilter
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
This takes the metatags that you have listed in your "urlmeta.tags"
property and looks for them inside the parseData object.
- DmozParser - Class in org.apache.nutch.tools
-
Utility that converts DMOZ RDF into a flat file of URLs to be injected.
- DmozParser() - Constructor for class org.apache.nutch.tools.DmozParser
-
- dnsFailures - Variable in class org.apache.nutch.hostdb.HostDatum
-
- doc - Variable in class org.apache.nutch.indexer.NutchIndexAction
-
- docToMetadata(NutchDocument) - Static method in class org.apache.nutch.tools.WARCUtils
-
- DocVector - Class in org.apache.nutch.scoring.similarity.cosine
-
- DocVector() - Constructor for class org.apache.nutch.scoring.similarity.cosine.DocVector
-
- docVectors - Static variable in class org.apache.nutch.scoring.similarity.cosine.Model
-
- DomainBlacklistURLFilter - Class in org.apache.nutch.urlfilter.domainblacklist
-
Filters URLs based on a file containing domain suffixes, domain names, and
hostnames.
- DomainBlacklistURLFilter() - Constructor for class org.apache.nutch.urlfilter.domainblacklist.DomainBlacklistURLFilter
-
Default constructor.
- DomainBlacklistURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.domainblacklist.DomainBlacklistURLFilter
-
Constructor that specifies the domain file to use.
- DomainStatistics - Class in org.apache.nutch.util.domain
-
Extracts some very basic statistics about domains from the crawldb
- DomainStatistics() - Constructor for class org.apache.nutch.util.domain.DomainStatistics
-
- DomainStatistics.DomainStatisticsCombiner - Class in org.apache.nutch.util.domain
-
- DomainStatistics.MyCounter - Enum in org.apache.nutch.util.domain
-
- DomainStatisticsCombiner() - Constructor for class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsCombiner
-
- DomainSuffix - Class in org.apache.nutch.util.domain
-
This class represents the last part of the host name, which is operated by
authorities, not individuals.
- DomainSuffix(String, DomainSuffix.Status, float) - Constructor for class org.apache.nutch.util.domain.DomainSuffix
-
- DomainSuffix(String) - Constructor for class org.apache.nutch.util.domain.DomainSuffix
-
- DomainSuffix.Status - Enum in org.apache.nutch.util.domain
-
Enumeration of the status of the tld.
- DomainSuffixes - Class in org.apache.nutch.util.domain
-
Storage class for DomainSuffix objects.
Note: this class is a singleton.
- DomainURLFilter - Class in org.apache.nutch.urlfilter.domain
-
Filters URLs based on a file containing domain suffixes, domain names, and
hostnames.
- DomainURLFilter() - Constructor for class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
Default constructor.
- DomainURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
Constructor that specifies the domain file to use.
- DOMBuilder - Class in org.apache.nutch.parse.html
-
This class takes SAX events (in addition to some extra events that SAX
doesn't handle yet) and adds the result to a document or document fragment.
- DOMBuilder(Document, Node) - Constructor for class org.apache.nutch.parse.html.DOMBuilder
-
DOMBuilder instance constructor...
- DOMBuilder(Document, DocumentFragment) - Constructor for class org.apache.nutch.parse.html.DOMBuilder
-
DOMBuilder instance constructor...
- DOMBuilder(Document) - Constructor for class org.apache.nutch.parse.html.DOMBuilder
-
DOMBuilder instance constructor...
- DOMContentUtils - Class in org.apache.nutch.parse.html
-
A collection of methods for extracting content from DOM trees.
- DOMContentUtils(Configuration) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils
-
- DOMContentUtils - Class in org.apache.nutch.parse.tika
-
A collection of methods for extracting content from DOM trees.
- DOMContentUtils(Configuration) - Constructor for class org.apache.nutch.parse.tika.DOMContentUtils
-
- DOMContentUtils.LinkParams - Class in org.apache.nutch.parse.html
-
- DomUtil - Class in org.apache.nutch.util
-
- DomUtil() - Constructor for class org.apache.nutch.util.DomUtil
-
- dotProduct(DocVector) - Method in class org.apache.nutch.scoring.similarity.cosine.DocVector
-
- DublinCore - Interface in org.apache.nutch.metadata
-
A collection of Dublin Core metadata names.
- DummyIndexWriter - Class in org.apache.nutch.indexwriter.dummy
-
DummyIndexWriter.
- DummyIndexWriter() - Constructor for class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- DummySSLProtocolSocketFactory - Class in org.apache.nutch.protocol.httpclient
-
- DummySSLProtocolSocketFactory() - Constructor for class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
Constructor for DummySSLProtocolSocketFactory.
- DummyX509TrustManager - Class in org.apache.nutch.protocol.httpclient
-
- DummyX509TrustManager(KeyStore) - Constructor for class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
Constructor for DummyX509TrustManager.
- dump() - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- dump() - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- dump(Path, Path) - Method in class org.apache.nutch.segment.SegmentReader
-
- dump(File, File, File, boolean, String[], boolean, String, boolean) - Method in class org.apache.nutch.tools.CommonCrawlDataDumper
-
Dumps the reverse engineered CBOR content from the provided segment
directories if a parent directory contains more than one segment,
otherwise a single segment can be passed as an argument.
- dump(File, File, String[], boolean, boolean, boolean) - Method in class org.apache.nutch.tools.FileDumper
-
Dumps the reverse engineered raw content from the provided segment
directories if a parent directory contains more than one segment, otherwise
a single segment can be passed as an argument.
- DUMP_DIR - Static variable in class org.apache.nutch.scoring.webgraph.LinkDumper
-
- Dumper() - Constructor for class org.apache.nutch.scoring.webgraph.NodeDumper.Dumper
-
- DumpFileUtil - Class in org.apache.nutch.util
-
- DumpFileUtil() - Constructor for class org.apache.nutch.util.DumpFileUtil
-
- dumpLinks(Path) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper
-
Runs the inverter and merger jobs of the LinkDumper tool to create the url
to inlink node database.
- dumpNodes(Path, NodeDumper.DumpType, long, Path, boolean, NodeDumper.NameType, NodeDumper.AggrType, boolean) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper
-
Runs the process to dump the top urls out to a text file.
- dumpText - Variable in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- dumpUrl(Path, String) - Method in class org.apache.nutch.scoring.webgraph.NodeReader
-
Prints the content of the Node represented by the url to system out.
- FAILED - Static variable in class org.apache.nutch.parse.ParseStatus
-
General failure.
- FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Content was not retrieved.
- FAILED_EXCEPTION - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing failed.
- FAILED_INVALID_FORMAT - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing failed.
- FAILED_MISSING_CONTENT - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing failed.
- FAILED_MISSING_PARTS - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing failed.
- FAILED_TRUNCATED - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing failed.
- failures - Variable in class org.apache.nutch.hostdb.HostDatum
-
- Feed - Interface in org.apache.nutch.metadata
-
A collection of Feed property names extracted by the ROME library.
- FEED - Static variable in interface org.apache.nutch.metadata.Feed
-
- FEED_AUTHOR - Static variable in interface org.apache.nutch.metadata.Feed
-
- FEED_PUBLISHED - Static variable in interface org.apache.nutch.metadata.Feed
-
- FEED_TAGS - Static variable in interface org.apache.nutch.metadata.Feed
-
- FEED_UPDATED - Static variable in interface org.apache.nutch.metadata.Feed
-
- FeedIndexingFilter - Class in org.apache.nutch.indexer.feed
-
- FeedIndexingFilter() - Constructor for class org.apache.nutch.indexer.feed.FeedIndexingFilter
-
- FeedParser - Class in org.apache.nutch.parse.feed
-
- FeedParser() - Constructor for class org.apache.nutch.parse.feed.FeedParser
-
- fetch(Path, int) - Method in class org.apache.nutch.fetcher.Fetcher
-
- fetch(String, StringBuilder) - Method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- FETCH_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- FETCH_EVENT_CONTENTLANG - Static variable in interface org.apache.nutch.metadata.Nutch
-
Content-language key in the Pub/Sub event metadata for the content-language of the parsed page
- FETCH_EVENT_CONTENTTYPE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Content-type key in the Pub/Sub event metadata for the content-type of the parsed page
- FETCH_EVENT_FETCHTIME - Static variable in interface org.apache.nutch.metadata.Nutch
-
Fetch time key in the Pub/Sub event metadata for the fetch time of the parsed page
- FETCH_EVENT_SCORE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Score key in the Pub/Sub event metadata for the score of the parsed page
- FETCH_EVENT_TITLE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Title key in the Pub/Sub event metadata for the title of the parsed page
- FETCH_STATUS_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- FETCH_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- fetchDb(int, int) - Method in class org.apache.nutch.service.resources.DbResource
-
- fetched - Variable in class org.apache.nutch.hostdb.HostDatum
-
- fetched - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- Fetcher - Class in org.apache.nutch.fetcher
-
A queue-based fetcher.
- Fetcher() - Constructor for class org.apache.nutch.fetcher.Fetcher
-
- Fetcher(Configuration) - Constructor for class org.apache.nutch.fetcher.Fetcher
-
- Fetcher.InputFormat - Class in org.apache.nutch.fetcher
-
- FetcherOutputFormat - Class in org.apache.nutch.fetcher
-
Splits FetcherOutput entries into multiple map files.
- FetcherOutputFormat() - Constructor for class org.apache.nutch.fetcher.FetcherOutputFormat
-
- fetchErrors - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- FetcherThread - Class in org.apache.nutch.fetcher
-
This class picks items from queues and fetches the pages.
- FetcherThread(Configuration, AtomicInteger, FetchItemQueues, QueueFeeder, AtomicInteger, AtomicLong, Reporter, AtomicInteger, String, boolean, OutputCollector<Text, NutchWritable>, boolean, AtomicInteger, AtomicLong) - Constructor for class org.apache.nutch.fetcher.FetcherThread
-
- FetcherThreadEvent - Class in org.apache.nutch.fetcher
-
This class is used to capture the various events occurring
at fetch time.
- FetcherThreadEvent(FetcherThreadEvent.PublishEventType, String) - Constructor for class org.apache.nutch.fetcher.FetcherThreadEvent
-
Constructor to create an event to be published
- FetcherThreadEvent.PublishEventType - Enum in org.apache.nutch.fetcher
-
Type of event to specify start, end or reporting of a fetch item.
- FetcherThreadPublisher - Class in org.apache.nutch.fetcher
-
This class handles the publishing of the events to the queue implementation.
- FetcherThreadPublisher(Configuration) - Constructor for class org.apache.nutch.fetcher.FetcherThreadPublisher
-
Configure all registered publishers
- FetchItem - Class in org.apache.nutch.fetcher
-
This class describes the item to be fetched.
- FetchItem(Text, URL, CrawlDatum, String) - Constructor for class org.apache.nutch.fetcher.FetchItem
-
- FetchItem(Text, URL, CrawlDatum, String, int) - Constructor for class org.apache.nutch.fetcher.FetchItem
-
- FetchItemQueue - Class in org.apache.nutch.fetcher
-
This class handles FetchItems which come from the same host ID (be it a
proto/hostname or proto/IP pair).
- FetchItemQueue(Configuration, int, long, long) - Constructor for class org.apache.nutch.fetcher.FetchItemQueue
-
- FetchItemQueues - Class in org.apache.nutch.fetcher
-
Convenience class - a collection of queues that keeps track of the total
number of items, and provides items eligible for fetching from any queue.
- FetchItemQueues(Configuration) - Constructor for class org.apache.nutch.fetcher.FetchItemQueues
-
- FetchNode - Class in org.apache.nutch.fetcher
-
- FetchNode() - Constructor for class org.apache.nutch.fetcher.FetchNode
-
- FetchNodeDb - Class in org.apache.nutch.fetcher
-
- FetchNodeDb() - Constructor for class org.apache.nutch.fetcher.FetchNodeDb
-
- FetchNodeDbInfo - Class in org.apache.nutch.service.model.response
-
- FetchNodeDbInfo() - Constructor for class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- FetchSchedule - Interface in org.apache.nutch.crawl
-
This interface defines the contract for implementations that manipulate fetch
times and re-fetch intervals.
- FetchScheduleFactory - Class in org.apache.nutch.crawl
-
- FG() - Constructor for class org.apache.nutch.tools.FreeGenerator.FG
-
- FIELD - Static variable in class org.creativecommons.nutch.CCIndexingFilter
-
The name of the document field we use.
- fieldName - Static variable in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
Doc field name
- FieldReplacer - Class in org.apache.nutch.indexer.replace
-
POJO to store a field name, its match pattern and its replacement string.
- FieldReplacer(String, String, String, String, Integer) - Constructor for class org.apache.nutch.indexer.replace.FieldReplacer
-
Create a FieldReplacer for a field.
- FieldReplacer(String, String, String, Integer) - Constructor for class org.apache.nutch.indexer.replace.FieldReplacer
-
Field replacer with the input and output field the same.
- File - Class in org.apache.nutch.protocol.file
-
This class is a protocol plugin used for file: scheme.
- File() - Constructor for class org.apache.nutch.protocol.file.File
-
- FileDumper - Class in org.apache.nutch.tools
-
The file dumper tool enables one to reverse generate the raw content from
Nutch segment data directories.
- FileDumper() - Constructor for class org.apache.nutch.tools.FileDumper
-
- FileError - Exception in org.apache.nutch.protocol.file
-
Thrown for File error codes.
- FileError(int) - Constructor for exception org.apache.nutch.protocol.file.FileError
-
- FileException - Exception in org.apache.nutch.protocol.file
-
- FileException() - Constructor for exception org.apache.nutch.protocol.file.FileException
-
- FileException(String) - Constructor for exception org.apache.nutch.protocol.file.FileException
-
- FileException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.file.FileException
-
- FileException(Throwable) - Constructor for exception org.apache.nutch.protocol.file.FileException
-
- fileLen - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- FileResponse - Class in org.apache.nutch.protocol.file
-
FileResponse.java mimics file replies as http response.
- FileResponse(URL, CrawlDatum, File, Configuration) - Constructor for class org.apache.nutch.protocol.file.FileResponse
-
Default public constructor
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
Scan the HTML document looking at possible indications of content language.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
- filter(String) - Method in class org.apache.nutch.collection.Subcollection
-
Simple "indexOf" filter for matching patterns.
- filter - Variable in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
The AnchorIndexingFilter filter object which supports boolean
configuration settings for the deduplication of anchors.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
-
The BasicIndexingFilter filter object which supports a few
configuration settings for adding basic searchable fields.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.feed.FeedIndexingFilter
-
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED,
FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the
Nutch index.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.geoip.GeoIPIndexingFilter
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in interface org.apache.nutch.indexer.IndexingFilter
-
Adds fields or otherwise modifies the document that will be indexed for a
parse.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.IndexingFilters
-
Run all defined filters.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.replace.ReplaceIndexer
-
Adds fields or otherwise modifies the document that will be indexed for a
parse.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.staticfield.StaticFieldIndexer
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter
-
This takes the metatags that you have listed in your "urlmeta.tags"
property and looks for them inside the CrawlDatum object.
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.microformats.reltag.RelTagParser
-
Scan the HTML document looking at possible rel-tags
- filter(String, String) - Method in interface org.apache.nutch.net.URLExemptionFilter
-
Checks if toUrl is exempted when the ignore external is enabled
- filter(String) - Method in interface org.apache.nutch.net.URLFilter
-
- filter(String) - Method in class org.apache.nutch.net.URLFilters
-
Run all defined filters.
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.headings.HeadingsParseFilter
-
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in interface org.apache.nutch.parse.HtmlParseFilter
-
Adds metadata or otherwise modifies a parse of HTML content, given the DOM
tree of a page.
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.HtmlParseFilters
-
Run all defined filters.
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.js.JSParseFilter
-
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.metatags.MetaTagsParser
-
- filter() - Method in class org.apache.nutch.parse.ParseResult
-
Remove all results where status is not successful (as determined by
ParseStatus#isSuccess()).
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parsefilter.regex.RegexParseFilter
-
- filter(Text, CrawlDatum, CrawlDatum, CrawlDatum, Content, ParseData, ParseText, Collection<CrawlDatum>) - Method in interface org.apache.nutch.segment.SegmentMergeFilter
-
The filtering method which gets all information being merged for a given
key (URL).
- filter(Text, CrawlDatum, CrawlDatum, CrawlDatum, Content, ParseData, ParseText, Collection<CrawlDatum>) - Method in class org.apache.nutch.segment.SegmentMergeFilters
-
Iterates over all SegmentMergeFilter extensions and if any of them
returns false, it will return false as well.
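The "any filter returning false rejects the record" behaviour described for SegmentMergeFilters can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: the class name `AllMustAcceptSketch`, the use of `java.util.function.Predicate` in place of SegmentMergeFilter, and the String record type are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a filter chain; not part of Nutch.
final class AllMustAcceptSketch {

    // Returns false as soon as one filter rejects the record, true otherwise.
    static <T> boolean filter(List<Predicate<T>> filters, T record) {
        for (Predicate<T> f : filters) {
            if (!f.test(record)) {
                return false; // a single rejection vetoes the record
            }
        }
        return true; // every filter accepted (also true for an empty chain)
    }

    public static void main(String[] args) {
        // Two illustrative filters over URL keys (assumed rules, not Nutch's).
        Predicate<String> isHttp = url -> url.startsWith("http");
        Predicate<String> notExe = url -> !url.endsWith(".exe");
        List<Predicate<String>> chain = Arrays.asList(isHttp, notExe);

        System.out.println(filter(chain, "http://example.org/index.html"));
        System.out.println(filter(chain, "http://example.org/setup.exe"));
    }
}
```

In this sketch an empty chain accepts everything, which is a design choice of the example rather than a documented property of SegmentMergeFilters.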
- filter(String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- filter(String) - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
- filter(String) - Method in class org.apache.nutch.urlfilter.domainblacklist.DomainBlacklistURLFilter
-
- filter(String, String) - Method in class org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter
-
- filter(String) - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- filter(String) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- filter(String) - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
-
- filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.creativecommons.nutch.CCIndexingFilter
-
- filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.creativecommons.nutch.CCParseFilter
-
Adds metadata or otherwise modifies a parse of an HTML document, given the
DOM tree of a page.
- filterNormalize(String) - Method in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
Filters and/or normalizes the input URL
- filterNormalize(String, String, String, boolean, boolean, String, URLFilters, URLExemptionFilters, URLNormalizers) - Static method in class org.apache.nutch.parse.ParseOutputFormat
-
- filterNormalize(String, String, String, boolean, boolean, String, URLFilters, URLExemptionFilters, URLNormalizers, String) - Static method in class org.apache.nutch.parse.ParseOutputFormat
-
- filterParse(String) - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- filters - Variable in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- filterUrl(String) - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- finalize() - Method in class org.apache.nutch.plugin.Plugin
-
- finalize() - Method in class org.apache.nutch.plugin.PluginRepository
-
- finalize() - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- findAll() - Method in class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- findAll() - Method in interface org.apache.nutch.webui.service.SeedListService
-
- findAuthentication(Metadata) - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
-
- findWorker(String) - Method in class org.apache.nutch.service.impl.NutchServerPoolExecutor
-
Find the Job Worker Thread
- finishFetchItem(FetchItem, boolean) - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- finishFetchItem(FetchItem) - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- finishFetchItem(FetchItem, boolean) - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- FIXED_INTERVAL_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
Used by AdaptiveFetchSchedule to maintain custom fetch interval
- flattenHashMap(HashMap<String, Integer>) - Static method in class org.apache.nutch.parsefilter.naivebayes.Train
-
- followRedirects - Variable in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- FORBID_ALL_RULES - Static variable in class org.apache.nutch.protocol.RobotRulesParser
-
A BaseRobotRules object appropriate for use when the robots.txt
file is not fetched due to a 403/Forbidden
response; all requests are disallowed.
- force - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- forceRefetch(Text, CrawlDatum, boolean) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method resets fetchTime, fetchInterval, modifiedTime,
retriesSinceFetch and page signature, so that it forces refetching.
- forceRefetch(Text, CrawlDatum, boolean) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method resets fetchTime, fetchInterval, modifiedTime and page
signature, so that it forces refetching.
- FORMAT - Static variable in interface org.apache.nutch.metadata.DublinCore
-
Typically, Format may include the media-type or dimensions of the resource.
- format - Static variable in class org.apache.nutch.net.protocols.HttpDateFormat
-
- FORMAT - Static variable in class org.apache.nutch.tools.WARCUtils
-
- forName(String) - Method in class org.apache.nutch.util.MimeUtil
-
A facade interface to Tika's underlying MimeTypes.forName(String)
method.
- FreeGenerator - Class in org.apache.nutch.tools
-
This tool generates fetchlists (segments to be fetched) from plain text files
containing one URL per line.
- FreeGenerator() - Constructor for class org.apache.nutch.tools.FreeGenerator
-
- FreeGenerator.FG - Class in org.apache.nutch.tools
-
- fromHexString(String) - Static method in class org.apache.nutch.util.StringUtil
-
Convert a String containing consecutive (no inside whitespace) hexadecimal
digits into a corresponding byte array.
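As a rough illustration of the conversion described for fromHexString, the sketch below turns a string of consecutive hexadecimal digits into a byte array using only the JDK. It is not the Nutch implementation: the class name `HexSketch` is hypothetical, and throwing IllegalArgumentException on odd-length or non-hex input is an assumption of this example (the real method may handle such input differently).

```java
import java.util.Arrays;

// Hypothetical helper; mirrors the described behaviour, not Nutch's source.
final class HexSketch {

    // Converts consecutive hex digits (no inner whitespace) to bytes.
    static byte[] fromHexString(String text) {
        String hex = text.trim();
        if (hex.length() % 2 != 0) {
            // Assumption: reject odd-length input instead of guessing padding.
            throw new IllegalArgumentException("odd number of hex digits: " + hex.length());
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for characters that are not hex digits.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character in byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo); // high nibble first
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [10, -1, 16] (0xff is -1 as a signed Java byte)
        System.out.println(Arrays.toString(fromHexString("0aff10")));
    }
}
```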
- FSUtils - Class in org.apache.nutch.util
-
Utility methods for common filesystem operations.
- FSUtils() - Constructor for class org.apache.nutch.util.FSUtils
-
- Ftp - Class in org.apache.nutch.protocol.ftp
-
This class is a protocol plugin used for ftp: scheme.
- Ftp() - Constructor for class org.apache.nutch.protocol.ftp.Ftp
-
- FtpError - Exception in org.apache.nutch.protocol.ftp
-
Thrown for Ftp error codes.
- FtpError(int) - Constructor for exception org.apache.nutch.protocol.ftp.FtpError
-
- FtpException - Exception in org.apache.nutch.protocol.ftp
-
Superclass for important exceptions thrown during FTP talk, that must be
handled with care.
- FtpException() - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
-
- FtpException(String) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
-
- FtpException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
-
- FtpException(Throwable) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
-
- FtpExceptionBadSystResponse - Exception in org.apache.nutch.protocol.ftp
-
Exception indicating bad reply of SYST command.
- FtpExceptionCanNotHaveDataConnection - Exception in org.apache.nutch.protocol.ftp
-
Exception indicating failure of opening data connection.
- FtpExceptionControlClosedByForcedDataClose - Exception in org.apache.nutch.protocol.ftp
-
Exception indicating control channel is closed by server end, due to forced
closure of data channel at client (our) end.
- FtpExceptionUnknownForcedDataClose - Exception in org.apache.nutch.protocol.ftp
-
Exception indicating unrecognizable reply from server after forced closure of
data channel by client (our) side.
- FtpResponse - Class in org.apache.nutch.protocol.ftp
-
FtpResponse.java mimics ftp replies as http response.
- FtpResponse(URL, CrawlDatum, Ftp, Configuration) - Constructor for class org.apache.nutch.protocol.ftp.FtpResponse
-
- FtpRobotRulesParser - Class in org.apache.nutch.protocol.ftp
-
This class is used for parsing robots for urls belonging to FTP protocol.
- FtpRobotRulesParser(Configuration) - Constructor for class org.apache.nutch.protocol.ftp.FtpRobotRulesParser
-
- generate(Path, Path, int, long, long) - Method in class org.apache.nutch.crawl.Generator
-
- generate(Path, Path, int, long, long, boolean, boolean) - Method in class org.apache.nutch.crawl.Generator
-
Old signature kept for compatibility: does not specify whether or not to
normalise, and sets the number of segments to 1.
- generate(Path, Path, int, long, long, boolean, boolean, boolean, int, String) - Method in class org.apache.nutch.crawl.Generator
-
Generate fetchlists in one or more segments.
- GENERATE_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- GENERATE_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- GENERATE_UPDATE_CRAWLDB - Static variable in class org.apache.nutch.crawl.Generator
-
- generated - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- generateFileNameForKeyValue(FloatWritable, Generator.SelectorEntry, String) - Method in class org.apache.nutch.crawl.Generator.GeneratorOutputFormat
-
- generateJson() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- generateJson() - Method in class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- generateJson() - Method in class org.apache.nutch.tools.CommonCrawlFormatJettinson
-
- generateJson() - Method in class org.apache.nutch.tools.CommonCrawlFormatSimple
-
- generateJson() - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- generateSegmentName() - Static method in class org.apache.nutch.crawl.Generator
-
- generateSegmentName() - Static method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
Generates a random name for the segments.
- generateWARC(String, List<Path>) - Method in class org.apache.nutch.tools.warc.WARCExporter
-
- Generator - Class in org.apache.nutch.crawl
-
Generates a subset of a crawl db to fetch.
- Generator() - Constructor for class org.apache.nutch.crawl.Generator
-
- Generator(Configuration) - Constructor for class org.apache.nutch.crawl.Generator
-
- generator - Static variable in class org.apache.nutch.tools.WARCUtils
-
- Generator.CrawlDbUpdater - Class in org.apache.nutch.crawl
-
Update the CrawlDB so that the next generate won't include the same URLs.
- Generator.DecreasingFloatComparator - Class in org.apache.nutch.crawl
-
- Generator.GeneratorOutputFormat - Class in org.apache.nutch.crawl
-
- Generator.HashComparator - Class in org.apache.nutch.crawl
-
Sort fetch lists by hash of URL.
- Generator.PartitionReducer - Class in org.apache.nutch.crawl
-
- Generator.Selector - Class in org.apache.nutch.crawl
-
Selects entries due for fetch.
- Generator.SelectorEntry - Class in org.apache.nutch.crawl
-
- Generator.SelectorInverseMapper - Class in org.apache.nutch.crawl
-
- GENERATOR_COUNT_MODE - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_COUNT_VALUE_DOMAIN - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_COUNT_VALUE_HOST - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_CUR_TIME - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_DELAY - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_EXPR - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_FILTER - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_MAX_COUNT - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_MAX_NUM_SEGMENTS - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_MIN_INTERVAL - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_MIN_SCORE - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_NORMALISE - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_RESTRICT_STATUS - Static variable in class org.apache.nutch.crawl.Generator
-
- GENERATOR_TOP_N - Static variable in class org.apache.nutch.crawl.Generator
-
- GeneratorOutputFormat() - Constructor for class org.apache.nutch.crawl.Generator.GeneratorOutputFormat
-
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- generatorSortValue(Text, CrawlDatum, float) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
This method prepares a sort value for the purpose of sorting and selecting
top N scoring pages during fetchlist generation.
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.ScoringFilters
-
Calculate a sort value for Generate.
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Boilerplate
- GenericWritableConfigurable - Class in org.apache.nutch.util
-
A generic Writable wrapper that can inject Configuration to Configurables
- GenericWritableConfigurable() - Constructor for class org.apache.nutch.util.GenericWritableConfigurable
-
- GeoIPDocumentCreator - Class in org.apache.nutch.indexer.geoip
-
- GeoIPDocumentCreator() - Constructor for class org.apache.nutch.indexer.geoip.GeoIPDocumentCreator
-
Default constructor.
- GeoIPIndexingFilter - Class in org.apache.nutch.indexer.geoip
-
This plugin implements an indexing filter which takes advantage of the
GeoIP2-java API.
- GeoIPIndexingFilter() - Constructor for class org.apache.nutch.indexer.geoip.GeoIPIndexingFilter
-
Default constructor for this plugin
- get(String, String, JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- get(String) - Method in class org.apache.nutch.metadata.Metadata
-
Get the value associated to a metadata name.
- get(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- get(String) - Method in class org.apache.nutch.parse.ParseResult
-
Retrieve a single parse output.
- get(Text) - Method in class org.apache.nutch.parse.ParseResult
-
Retrieve a single parse output.
- get(Configuration) - Static method in class org.apache.nutch.plugin.PluginRepository
-
- get(FileSplit) - Static method in class org.apache.nutch.segment.SegmentPart
-
Create SegmentPart from a FileSplit.
- get(String) - Static method in class org.apache.nutch.segment.SegmentPart
-
Create SegmentPart from a full path of a location inside any segment part.
- get(Path, Text, Writer, Map<String, List<Writable>>) - Method in class org.apache.nutch.segment.SegmentReader
-
- get(String) - Method in interface org.apache.nutch.service.ConfManager
-
- get(String) - Method in class org.apache.nutch.service.impl.ConfManagerImpl
-
Returns the configuration associated with the given confId
- get(String, String) - Method in class org.apache.nutch.service.impl.JobManagerImpl
-
- get(String, String) - Method in interface org.apache.nutch.service.JobManager
-
- get(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes
-
Return the DomainSuffix object for the extension; if the extension is a
top level domain, the returned object will be an instance of TopLevelDomain
- get(Configuration) - Static method in class org.apache.nutch.util.ObjectCache
-
- getAccept() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getAcceptedIssuers() - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- getAcceptLanguage() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
Value of "Accept-Language" request header sent by Nutch.
- getAdditionalPostHeaders() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- getAgentString(String, String, String, String, String) - Static method in class org.apache.nutch.tools.WARCUtils
-
- getAll() - Method in class org.apache.nutch.collection.CollectionManager
-
Returns all collections
- getAllJobs() - Method in class org.apache.nutch.service.impl.NutchServerPoolExecutor
-
Gives all jobs (currently running and completed)
- getAnchor() - Method in class org.apache.nutch.crawl.Inlink
-
- getAnchor() - Method in class org.apache.nutch.parse.Outlink
-
- getAnchor() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- getAnchors() - Method in class org.apache.nutch.crawl.Inlinks
-
Return the set of anchor texts.
- getAnchors(Text) - Method in class org.apache.nutch.crawl.LinkDbReader
-
- getArgs() - Method in class org.apache.nutch.parse.ParseStatus
-
- getArgs() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getArgs() - Method in class org.apache.nutch.service.model.request.DbQuery
-
- getArgs() - Method in class org.apache.nutch.service.model.request.JobConfig
-
- getArgs() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getArgs() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getArgs() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getAsMap(String) - Method in interface org.apache.nutch.service.ConfManager
-
- getAsMap(String) - Method in class org.apache.nutch.service.impl.ConfManagerImpl
-
- getAsyncExecutor() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- getAttribute(String) - Method in class org.apache.nutch.plugin.Extension
-
Returns an attribute value that is set up in the manifest file and is
defined by the extension point XML schema.
- getAuthentication(String, Configuration) - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
This method is responsible for providing Basic authentication information.
- getBase(Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
If Node contains a BASE tag then its HREF is returned.
- getBaseHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getBaseUrl() - Method in class org.apache.nutch.protocol.Content
-
The base url for relative links contained in the content.
- getBasicPattern() - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Provides a pattern which can be used by an outside resource to determine if
this class can provide credentials based on simple header information.
- getBlackListString() - Method in class org.apache.nutch.collection.Subcollection
-
Returns blacklist String
- getBoost() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- getBufferSize() - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- getBuilder(String) - Static method in class org.apache.nutch.webui.pages.components.ColorEnumLabel
-
- getCachedClass(PluginDescriptor, String) - Method in class org.apache.nutch.plugin.PluginRepository
-
- getCacheKey(URL) - Static method in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
Compose unique key to store and access robot rules in cache for given URL
- getChildren() - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- getClassLoader() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns a cached classloader for a plugin.
- getClazz() - Method in class org.apache.nutch.plugin.Extension
-
Returns the full class name of the extension point implementation
- getClient(NutchInstance) - Method in class org.apache.nutch.webui.client.NutchClientFactory
-
- getCloudSolrClient(String) - Static method in class org.apache.nutch.indexwriter.solr.SolrUtils
-
- getCode() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the response code.
- getCode(int) - Method in exception org.apache.nutch.protocol.file.FileError
-
- getCode() - Method in class org.apache.nutch.protocol.file.FileResponse
-
Returns the response code.
- getCode(int) - Method in exception org.apache.nutch.protocol.ftp.FtpError
-
- getCode() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
Returns the response code.
- getCode() - Method in class org.apache.nutch.protocol.htmlunit.HttpResponse
-
- getCode() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getCode() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getCode() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getCode() - Method in class org.apache.nutch.protocol.selenium.HttpResponse
-
- getCollectionManager(Configuration) - Static method in class org.apache.nutch.collection.CollectionManager
-
- getCommand() - Method in class org.apache.nutch.util.CommandRunner
-
- getCommonCrawlFormat(String, String, Content, Metadata, Configuration, CommonCrawlConfig) - Static method in class org.apache.nutch.tools.CommonCrawlFormatFactory
-
Deprecated.
- getCommonCrawlFormat(String, Configuration, CommonCrawlConfig) - Static method in class org.apache.nutch.tools.CommonCrawlFormatFactory
-
- getConf() - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
- getConf() - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
- getConf() - Method in class org.apache.nutch.crawl.Signature
-
- getConf() - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.CleaningJob
-
- getConf() - Method in class org.apache.nutch.indexer.feed.FeedIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.geoip.GeoIPIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- getConf() - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.replace.ReplaceIndexer
-
- getConf() - Method in class org.apache.nutch.indexer.staticfield.StaticFieldIndexer
-
- getConf() - Method in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter
-
Boilerplate
- getConf() - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- getConf() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- getConf() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
- getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagParser
-
- getConf() - Method in class org.apache.nutch.net.urlnormalizer.host.HostURLNormalizer
-
- getConf() - Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
-
- getConf() - Method in class org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
-
- getConf() - Method in class org.apache.nutch.net.urlnormalizer.querystring.QuerystringURLNormalizer
-
- getConf() - Method in class org.apache.nutch.net.urlnormalizer.slash.SlashURLNormalizer
-
- getConf() - Method in class org.apache.nutch.parse.ext.ExtParser
-
- getConf() - Method in class org.apache.nutch.parse.feed.FeedParser
-
- getConf() - Method in class org.apache.nutch.parse.headings.HeadingsParseFilter
-
- getConf() - Method in class org.apache.nutch.parse.html.HtmlParser
-
- getConf() - Method in class org.apache.nutch.parse.js.JSParseFilter
-
- getConf() - Method in class org.apache.nutch.parse.metatags.MetaTagsParser
-
- getConf() - Method in class org.apache.nutch.parse.ParserChecker
-
- getConf() - Method in class org.apache.nutch.parse.swf.SWFParser
-
- getConf() - Method in class org.apache.nutch.parse.tika.TikaParser
-
- getConf() - Method in class org.apache.nutch.parse.zip.ZipParser
-
- getConf() - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- getConf() - Method in class org.apache.nutch.parsefilter.regex.RegexParseFilter
-
- getConf() - Method in class org.apache.nutch.protocol.file.File
-
- getConf() - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- getConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
-
- getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
- getConf() - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- getConf() - Method in class org.apache.nutch.publisher.NutchPublishers
-
- getConf() - Method in class org.apache.nutch.publisher.rabbitmq.RabbitMQPublisherImpl
-
- getConf() - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.similarity.SimilarityScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Boilerplate
- getConf() - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- getConf() - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.domainblacklist.DomainBlacklistURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
-
- getConf() - Method in class org.apache.nutch.util.GenericWritableConfigurable
-
- getConf() - Method in class org.creativecommons.nutch.CCIndexingFilter
-
- getConf() - Method in class org.creativecommons.nutch.CCParseFilter
-
- getConfId() - Method in class org.apache.nutch.service.model.request.DbQuery
-
- getConfId() - Method in class org.apache.nutch.service.model.request.JobConfig
-
- getConfId() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getConfId() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getConfId() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getConfig(String) - Method in class org.apache.nutch.service.resources.ConfigResource
-
Get configuration properties
- getConfigId() - Method in class org.apache.nutch.service.model.request.NutchConfig
-
- getConfigs() - Method in class org.apache.nutch.service.resources.ConfigResource
-
Returns a list of all configurations created.
- getConfiguration() - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- getConfiguration() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getConfManager() - Method in class org.apache.nutch.service.NutchServer
-
- getConnectionFailures() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getConnectionSource() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- getConnectionStatus() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getConnectionStatus() - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getConnectionStatus() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getConnectionStatus(Long) - Method in class org.apache.nutch.webui.service.impl.NutchServiceImpl
-
- getConnectionStatus(Long) - Method in interface org.apache.nutch.webui.service.NutchService
-
- getContent() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the full content of the response.
- getContent() - Method in class org.apache.nutch.protocol.Content
-
The binary content retrieved.
- getContent() - Method in class org.apache.nutch.protocol.file.FileResponse
-
- getContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.htmlunit.HttpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- getContent() - Method in class org.apache.nutch.protocol.selenium.HttpResponse
-
- getContentMeta() - Method in class org.apache.nutch.parse.ParseData
-
The original Metadata retrieved from content
- getContentType() - Method in exception org.apache.nutch.parse.ParserNotFound
-
- getContentType() - Method in class org.apache.nutch.protocol.Content
-
The media type of the retrieved content.
- getCookiePolicy() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- getCookies() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthentication
-
- getCopyMap() - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getCountryName() - Method in class org.apache.nutch.util.domain.TopLevelDomain
-
Returns the country name if TLD is Country Code TLD
- getCrawlId() - Method in class org.apache.nutch.service.model.request.DbQuery
-
- getCrawlId() - Method in class org.apache.nutch.service.model.request.JobConfig
-
- getCrawlId() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getCrawlId() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getCrawlId() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getCrawlId() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getCrawlName() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getCrawls() - Method in interface org.apache.nutch.webui.service.CrawlService
-
- getCrawls() - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- getCreatedDaos() - Method in class org.apache.nutch.webui.config.CustomDaoFactory
-
- getCredentials() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication
-
Gets the credentials generated by the HttpAuthentication object.
- getCredentials() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Gets the Basic credentials generated by this HttpBasicAuthentication object
- getCurrentInstance() - Method in class org.apache.nutch.webui.pages.AbstractBasePage
-
- getCurrentNode() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Get the node currently being processed.
- getCurrentNode() - Method in class org.apache.nutch.util.NodeWalker
-
Return the current node.
- getDaoFactory() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- getData() - Method in interface org.apache.nutch.parse.Parse
-
Other data extracted from the page.
- getData() - Method in class org.apache.nutch.parse.ParseImpl
-
- getDatum() - Method in class org.apache.nutch.fetcher.FetchItem
-
- getDependencies() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of plugin ids.
- getDescriptor() - Method in class org.apache.nutch.plugin.Extension
-
Returns the plugin descriptor.
- getDescriptor() - Method in class org.apache.nutch.plugin.Plugin
-
Returns the plugin descriptor
- getDnsFailures() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getDocumentMeta() - Method in class org.apache.nutch.indexer.NutchDocument
-
- getDom(InputStream) - Static method in class org.apache.nutch.util.DomUtil
-
Returns the parsed DOM tree, or null if an error occurs
- getDomain() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- getDomainName(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the domain name of the url.
- getDomainName(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the domain name of the url.
- getDomainSuffix(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the DomainSuffix corresponding to the last public part of the hostname
- getDomainSuffix(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the DomainSuffix corresponding to the last public part of the hostname
- getDriverForPage(String, Configuration) - Static method in class org.apache.nutch.protocol.htmlunit.HtmlUnitWebDriver
-
- getDriverForPage(String, Configuration) - Static method in class org.apache.nutch.protocol.selenium.HttpWebClient
-
- getElement(DocumentFragment, String) - Method in class org.apache.nutch.parse.headings.HeadingsParseFilter
-
Finds the specified element and returns its value
- getEmptyParse(Configuration) - Method in class org.apache.nutch.parse.ParseStatus
-
A convenience method.
- getEmptyParseResult(String, Configuration) - Method in class org.apache.nutch.parse.ParseStatus
-
A convenience method.
- getEventData() - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Get event data
- getEventType() - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Get type of this event object
- getExemptions() - Method in class org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter
-
- getExitValue() - Method in class org.apache.nutch.util.CommandRunner
-
- getExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of exported libraries as URLs
- getExtensionInstance() - Method in class org.apache.nutch.plugin.Extension
-
Returns an instance of the extension implementation.
- getExtensionPoint(String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns an extension point identified by an extension point id.
- getExtensions(String) - Method in class org.apache.nutch.parse.ParserFactory
-
Finds the best-suited parse plugin for a given contentType.
- getExtensions() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns an array of extensions that listen to this extension point
- getExtensions() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of extensions.
- getExtenstionPoints() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of extension points.
- getFetched() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getFetchInterval() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getFetchItem() - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- getFetchItem() - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- getFetchItemQueue(String) - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- getFetchNodeDb() - Method in class org.apache.nutch.fetcher.FetchNodeDb
-
- getFetchNodeDb() - Method in class org.apache.nutch.service.NutchServer
-
- getFetchSchedule(Configuration) - Static method in class org.apache.nutch.crawl.FetchScheduleFactory
-
Return the FetchSchedule implementation.
- getFetchTime() - Method in class org.apache.nutch.crawl.CrawlDatum
-
Returns either the time of the last fetch, or the next fetch time,
depending on whether Fetcher or CrawlDbReducer set the time.
- getFetchTime() - Method in class org.apache.nutch.fetcher.FetchNode
-
- getField(String) - Method in class org.apache.nutch.indexer.NutchDocument
-
- getFieldName() - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
- getFieldNames() - Method in class org.apache.nutch.indexer.NutchDocument
-
- getFieldValue(String) - Method in class org.apache.nutch.indexer.NutchDocument
-
- getFromUrl() - Method in class org.apache.nutch.crawl.Inlink
-
- getGeneralTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Returns all collected values of the general meta tags.
- getGone() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getHeader(String) - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the value of a named header.
- getHeader(String) - Method in class org.apache.nutch.protocol.file.FileResponse
-
Returns the value of a named header.
- getHeader(String) - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
Returns the value of a named header.
- getHeader(String) - Method in class org.apache.nutch.protocol.htmlunit.HttpResponse
-
- getHeader(String) - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getHeader(String) - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getHeader(String) - Method in class org.apache.nutch.protocol.selenium.HttpResponse
-
- getHeaders() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns all the headers.
- getHeaders() - Method in class org.apache.nutch.protocol.htmlunit.HttpResponse
-
- getHeaders() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getHeaders() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getHeaders() - Method in class org.apache.nutch.protocol.selenium.HttpResponse
-
- getHomePage() - Method in class org.apache.nutch.webui.NutchUiApplication
-
- getHomepageUrl() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getHost(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the lowercased hostname for the url or null if the url is not well
formed.
- getHost() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getHostname(Configuration) - Static method in class org.apache.nutch.tools.WARCUtils
-
- getHostSegments(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Partitions of the hostname of the url by "."
- getHostSegments(String) - Static method in class org.apache.nutch.util.URLUtil
-
Partitions of the hostname of the url by "."
- getHTMLContent(WebDriver, Configuration) - Static method in class org.apache.nutch.protocol.htmlunit.HtmlUnitWebDriver
-
- getHTMLContent(WebDriver, Configuration) - Static method in class org.apache.nutch.protocol.selenium.HttpWebClient
-
- getHtmlPage(String, Configuration) - Static method in class org.apache.nutch.protocol.htmlunit.HtmlUnitWebDriver
-
Function for obtaining the HTML BODY using the selected org.openqa.selenium.WebDriver.
- getHtmlPage(String, Configuration) - Static method in class org.apache.nutch.protocol.selenium.HttpWebClient
-
Function for obtaining the HTML BODY using the selected org.openqa.selenium.WebDriver.
- getHtmlPage(String) - Static method in class org.apache.nutch.protocol.selenium.HttpWebClient
-
- getHttpEquivTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Returns all collected values of the "http-equiv" meta tags.
- getHttpSolrClient(String) - Static method in class org.apache.nutch.indexwriter.solr.SolrUtils
-
- getId() - Method in class org.apache.nutch.collection.Subcollection
-
- getId() - Method in class org.apache.nutch.plugin.Extension
-
Return the unique id of the extension.
- getId() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns the unique id of the extension point.
- getId() - Method in class org.apache.nutch.service.model.request.SeedList
-
- getId() - Method in class org.apache.nutch.service.model.request.SeedUrl
-
- getId() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getId() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getId() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getId() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getId() - Method in class org.apache.nutch.webui.model.SeedList
-
- getId() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- getImported() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getInfo() - Method in class org.apache.nutch.service.impl.JobWorker
-
- getInfo(String) - Method in class org.apache.nutch.service.impl.NutchServerPoolExecutor
-
- getInfo(String, String) - Method in class org.apache.nutch.service.resources.JobResource
-
Get job info
- getInlinks(Text) - Method in class org.apache.nutch.crawl.LinkDbReader
-
- getInLinks() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getInLinks() - Method in interface org.apache.nutch.tools.CommonCrawlFormat
-
Gets the set of inlinks
- getInlinkScore() - Method in class org.apache.nutch.scoring.webgraph.Node
-
- getInProgressSize() - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- getInstance() - Static method in class org.apache.nutch.fetcher.FetchNodeDb
-
- getInstance(Configuration) - Static method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getInstance() - Static method in class org.apache.nutch.service.NutchServer
-
- getInstance() - Static method in class org.apache.nutch.util.domain.DomainSuffixes
-
Singleton instance, lazy instantiation
- getInstance(Long) - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- getInstance(Long) - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- getInstances() - Method in class org.apache.nutch.webui.config.NutchGuiConfiguration
-
- getInstances() - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- getInstances() - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- getIPAddress(Configuration) - Static method in class org.apache.nutch.tools.WARCUtils
-
- getJobClassName() - Method in class org.apache.nutch.service.model.request.JobConfig
-
- getJobClassName() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getJobConfig() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- getJobHistory() - Method in class org.apache.nutch.service.impl.NutchServerPoolExecutor
-
Gives the Job history
- getJobInfo(String) - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getJobInfo() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- getJobInfo(String) - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getJobManager() - Method in class org.apache.nutch.service.NutchServer
-
- getJobRunning() - Method in class org.apache.nutch.service.impl.NutchServerPoolExecutor
-
Gives the list of currently running jobs
- getJobs() - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- getJobs(String) - Method in class org.apache.nutch.service.resources.JobResource
-
Get job history
- getJobs() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getJsonArray() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getJsonData(String, Content, Metadata) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getJsonData(String, Content, Metadata, ParseData) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getJsonData() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getJsonData() - Method in interface org.apache.nutch.tools.CommonCrawlFormat
-
- getJsonData(String, Content, Metadata) - Method in interface org.apache.nutch.tools.CommonCrawlFormat
-
Returns a string representation of the JSON structure of the URL content
- getJsonData(String, Content, Metadata, ParseData) - Method in interface org.apache.nutch.tools.CommonCrawlFormat
-
Returns a string representation of the JSON structure of the URL content,
taking into account the parsed metadata about the URL
- getJsonData(String, Content, Metadata, ParseData) - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- getJsonData() - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- getKey() - Method in class org.apache.nutch.collection.Subcollection
-
- getKey() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getKeyMap() - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getKeyPrefix() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getL2Norm() - Method in class org.apache.nutch.scoring.similarity.cosine.DocVector
-
- getLastCheck() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getLastModified() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getLinks() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
-
- getLinkType() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- getLoginFormId() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- getLoginPostData() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- getLoginUrl() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- getMajorCode() - Method in class org.apache.nutch.parse.ParseStatus
-
- getMaxContent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getMessage() - Method in class org.apache.nutch.parse.ParseStatus
-
A convenience method.
- getMessage() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getMeta(String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get metadata.
- getMeta(String) - Method in class org.apache.nutch.parse.ParseData
-
Get a metadata single value.
- getMetaData() - Method in class org.apache.nutch.crawl.CrawlDatum
-
Returns a MapWritable if it was set or read in readFields(DataInput);
returns an empty map if the CrawlDatum was freshly created (lazily
instantiated).
- getMetaData() - Method in class org.apache.nutch.hostdb.HostDatum
-
Returns a MapWritable if it was set or read in readFields(DataInput);
returns an empty map if the HostDatum was freshly created (lazily instantiated).
- getMetadata() - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get all metadata.
- getMetadata() - Method in class org.apache.nutch.parse.Outlink
-
- getMetadata() - Method in class org.apache.nutch.protocol.Content
-
Other protocol-specific data.
- getMetadata() - Method in class org.apache.nutch.scoring.webgraph.Node
-
- getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.html.HTMLMetaProcessor
-
Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
- getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.tika.HTMLMetaProcessor
-
Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
- getMetaValues(String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get multiple metadata.
- getMethod() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getMimeType(String) - Method in class org.apache.nutch.util.MimeUtil
-
Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
- getMimeType(File) - Method in class org.apache.nutch.util.MimeUtil
-
Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.
- getMinorCode() - Method in class org.apache.nutch.parse.ParseStatus
-
- getModifiedTime() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getMsg() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getMsg() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getName() - Method in class org.apache.nutch.collection.Subcollection
-
- getName() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns the name of the extension point.
- getName() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the name of the plugin.
- getName() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getName() - Method in class org.apache.nutch.service.model.request.SeedList
-
- getName() - Method in class org.apache.nutch.webui.model.NutchConfig
-
- getName() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getName() - Method in class org.apache.nutch.webui.model.SeedList
-
- getNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNode() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- getNodeValue(Node) - Static method in class org.apache.nutch.parse.headings.HeadingsParseFilter
-
Returns the text value of the specified Node and child nodes
- getNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNormalizedName(String) - Static method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
Get the normalized name of metadata attribute name.
- getNotExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of libraries as URLs that are not exported by the plugin.
- getNotModified() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getNumberOfRounds() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getNumInlinks() - Method in class org.apache.nutch.scoring.webgraph.Node
-
- getNumOfOutlinks() - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- getNumOutlinks() - Method in class org.apache.nutch.scoring.webgraph.Node
-
- getNutchConfig(String) - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getNutchConfig(String) - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getNutchConfig(Long) - Method in class org.apache.nutch.webui.service.impl.NutchServiceImpl
-
- getNutchConfig(Long) - Method in interface org.apache.nutch.webui.service.NutchService
-
- getNutchInstance() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getNutchInstance() - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getNutchStatus() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getNutchStatus() - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getNutchStatus(Long) - Method in class org.apache.nutch.webui.service.impl.NutchServiceImpl
-
- getNutchStatus(Long) - Method in interface org.apache.nutch.webui.service.NutchService
-
- getObject(String) - Method in class org.apache.nutch.util.ObjectCache
-
- getOrderedPlugins(Class<?>, String, String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Get ordered list of plugins.
- getOutlinks() - Method in class org.apache.nutch.fetcher.FetchNode
-
- getOutlinks(URL, ArrayList<Outlink>, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
This method finds all anchors below the supplied DOM node, creates
appropriate Outlink records for each (relative to the supplied base URL),
and adds them to the outlinks ArrayList.
- getOutlinks(String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor
-
Extracts Outlink from given plain text.
- getOutlinks(String, String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor
-
Extracts Outlink from given plain text and adds anchor to the extracted Outlinks
- getOutlinks() - Method in class org.apache.nutch.parse.ParseData
-
The outlinks of the page.
- getOutlinks(URL, ArrayList<Outlink>, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
This method finds all anchors below the supplied DOM node, creates
appropriate Outlink records for each (relative to the supplied base URL),
and adds them to the outlinks ArrayList.
- getOutlinks(URL, ArrayList<Outlink>, List<Link>) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
- getOutlinkScore() - Method in class org.apache.nutch.scoring.webgraph.Node
-
- getOutputDir() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getPage(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the page for the url.
- getParams() - Method in class org.apache.nutch.service.model.request.NutchConfig
-
- getParse(Content) - Method in class org.apache.nutch.parse.ext.ExtParser
-
- getParse(Content) - Method in class org.apache.nutch.parse.feed.FeedParser
-
Parses the given feed, then extracts and parses all linked items within
the feed, using the underlying ROME feed parsing library.
- getParse(Content) - Method in class org.apache.nutch.parse.html.HtmlParser
-
- getParse(Content) - Method in class org.apache.nutch.parse.js.JSParseFilter
-
- getParse(Content) - Method in interface org.apache.nutch.parse.Parser
-
This method parses the given content and returns a map of <key,
parse> pairs.
- getParse(Content) - Method in class org.apache.nutch.parse.swf.SWFParser
-
- getParse(Content) - Method in class org.apache.nutch.parse.tika.TikaParser
-
- getParse(Content) - Method in class org.apache.nutch.parse.zip.ZipParser
-
- getParseMeta() - Method in class org.apache.nutch.parse.ParseData
-
Other content properties.
- getParserById(String) - Method in class org.apache.nutch.parse.ParserFactory
-
Function returns a Parser instance with the specified extId, representing
its extension ID.
- getParsers(String, String) - Method in class org.apache.nutch.parse.ParserFactory
-
Function returns an array of Parsers for a given content type.
- getPartition(FloatWritable, Writable, int) - Method in class org.apache.nutch.crawl.Generator.Selector
-
Partition by host / domain or IP.
- getPartition(Text, Writable, int) - Method in class org.apache.nutch.crawl.URLPartitioner
-
Hash by domain name.
- getPassAllFilter() - Static method in class org.apache.nutch.util.HadoopFSUtil
-
Returns PathFilter that passes all paths through.
- getPassDirectoriesFilter(FileSystem) - Static method in class org.apache.nutch.util.HadoopFSUtil
-
Returns PathFilter that passes directories through.
- getPassword() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getPath() - Method in class org.apache.nutch.service.model.request.ReaderConfig
-
- getPaths(FileStatus[]) - Static method in class org.apache.nutch.util.HadoopFSUtil
-
Turns an array of FileStatus into an array of Paths.
- getPattern() - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
- getPluginClass() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the fully qualified name of the class which implements the abstract
Plugin class.
- getPluginDescriptor(String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns the descriptor of one plugin identified by a plugin id.
- getPluginDescriptors() - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns all registered plugin descriptors.
- getPluginFolder(String) - Method in class org.apache.nutch.plugin.PluginManifestParser
-
Return the named plugin folder.
- getPluginId() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the unique identifier of the plug-in or null.
- getPluginInstance(PluginDescriptor) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns an instance of a plugin.
- getPluginPath() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the directory path of the plugin.
- getPort() - Method in class org.apache.nutch.service.NutchServer
-
- getPort() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getPos() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Returns the current position in the file.
- getProgress() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Returns the percentage of progress in processing the file.
- getProgress() - Method in class org.apache.nutch.util.NutchTool
-
Returns relative progress of the tool, a float in range [0,1].
- getProgress() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getProperty(String, String) - Method in class org.apache.nutch.service.resources.ConfigResource
-
Get property
- getProtocol(String) - Method in class org.apache.nutch.protocol.ProtocolFactory
-
Returns the appropriate Protocol implementation for a url.
- getProtocol(String) - Static method in class org.apache.nutch.util.URLUtil
-
- getProtocol(URL) - Static method in class org.apache.nutch.util.URLUtil
-
- getProtocolOutput(String, CrawlDatum) - Method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- getProtocolOutput(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.file.File
-
- getProtocolOutput(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- getProtocolOutput(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getProtocolOutput(Text, CrawlDatum) - Method in interface org.apache.nutch.protocol.Protocol
-
Returns the Content for a fetchlist entry.
- getProviderName() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
- getProxyHost() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getProxyPort() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getQueueCount() - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- getQueueID() - Method in class org.apache.nutch.fetcher.FetchItem
-
- getQueueSize() - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- getRealm() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication
-
Gets the realm used by the HttpAuthentication object during creation.
- getRealm() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Gets the realm attribute of the HttpBasicAuthentication object.
- getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.segment.ContentAsTextInputFormat
-
- getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.segment.SegmentMerger.ObjectInputFormat
-
- getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.tools.arc.ArcInputFormat
-
Returns the RecordReader for reading the arc file.
- getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat
-
- getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.fetcher.FetcherOutputFormat
-
- getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.indexer.IndexerOutputFormat
-
- getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.parse.ParseOutputFormat
-
- getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.segment.SegmentMerger.SegmentOutputFormat
-
- getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.segment.SegmentReader.TextOutputFormat
-
- getRedirPerm() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getRedirTemp() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getRefresh() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRefreshHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRefreshTime() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRemovedFormFields() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- getReplacement() - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
- getReprUrl() - Method in class org.apache.nutch.fetcher.FetcherThread
-
- getRequestAccept() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestAcceptEncoding() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestAcceptLanguage() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestContactEmail() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestContactName() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestHostAddress() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestHostName() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestRobots() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestSoftware() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getRequestUserAgent() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResourceString(String, Locale) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an I18N'd resource string.
- getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.htmlunit.Http
-
- getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.http.Http
-
- getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.httpclient.Http
-
Fetches the url with a configured HTTP client and gets the response.
- getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.selenium.Http
-
- getResponseAddress() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseContent() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseContentEncoding() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseContentType() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseDate() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseHostName() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseServer() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResponseStatus() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getResult() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getResult() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getRetriesSinceFetch() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getReversedHost(String) - Static method in class org.apache.nutch.util.TableUtil
-
Given a reversed url, returns the reversed host, e.g.
"com.foo.bar:http:8983/to/index.html?a=b" -> "com.foo.bar"
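The mapping in the example above can be sketched in a few lines of plain Java. This is an illustrative sketch only, not the code in TableUtil: the class name ReversedHostSketch and the method name reversedHost are hypothetical, and the sketch assumes the reversed host is everything before the first ':' of the reversed url.

```java
final class ReversedHostSketch {

  /**
   * Hypothetical helper: returns the part of a reversed url that precedes
   * the first ':' (assumed here to be the reversed host). If no ':' is
   * present, the input is returned unchanged.
   */
  static String reversedHost(String reversedUrl) {
    int idx = reversedUrl.indexOf(':');
    return idx < 0 ? reversedUrl : reversedUrl.substring(0, idx);
  }

  public static void main(String[] args) {
    // Uses the example string from the index entry.
    System.out.println(reversedHost("com.foo.bar:http:8983/to/index.html?a=b"));
  }
}
```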
- getReverseKey() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getReverseKeyValue() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getRobotRules(Text, CrawlDatum, List<Content>) - Method in class org.apache.nutch.protocol.file.File
-
No robots parsing is done for file protocol.
- getRobotRules(Text, CrawlDatum, List<Content>) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Get the robots rules for a given url
- getRobotRules(Text, CrawlDatum, List<Content>) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getRobotRules(Text, CrawlDatum, List<Content>) - Method in interface org.apache.nutch.protocol.Protocol
-
Retrieve robot rules applicable for this URL.
- getRobotRulesSet(Protocol, URL, List<Content>) - Method in class org.apache.nutch.protocol.ftp.FtpRobotRulesParser
-
For hosts whose robots rules have not yet been cached, sends an FTP request
to the host corresponding to the URL passed, gets the robots file, parses the
rules, and caches the rules object to avoid re-work in future.
- getRobotRulesSet(Protocol, URL, List<Content>) - Method in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
Get the rules from robots.txt which apply to the given url.
- getRobotRulesSet(Protocol, Text, List<Content>) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
Fetch robots.txt (or its protocol-specific equivalent) which applies to
the given URL, parse it and return the set of robot rules applicable for
the configured agent name(s).
- getRobotRulesSet(Protocol, URL, List<Content>) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
Fetch robots.txt (or its protocol-specific equivalent) which applies to
the given URL, parse it and return the set of robot rules applicable for
the configured agent name(s).
- getRootNode() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Get the root node of the DOM being created.
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Returns the name of the file of rules to use for a particular
implementation.
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
Rules specified as a config property will override rules specified as a
config file.
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter
-
Gets reader for regex rules
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
Rules specified as a config property will override rules specified as a
config file.
- getRunningJobs() - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- getRunningJobs() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getRuns() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
-
- getSchema() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns a path to the XML schema of an extension point.
- getScopedRules() - Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
- getScore() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getScore() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getScore() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- getSeedDirectory() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getSeedFilePath() - Method in class org.apache.nutch.service.model.request.SeedList
-
- getSeedList(String) - Method in class org.apache.nutch.service.impl.SeedManagerImpl
-
- getSeedList() - Method in class org.apache.nutch.service.model.request.SeedUrl
-
- getSeedList(String) - Method in interface org.apache.nutch.service.SeedManager
-
- getSeedList() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getSeedList() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- getSeedList(Long) - Method in class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- getSeedList(Long) - Method in interface org.apache.nutch.webui.service.SeedListService
-
- getSeedLists() - Method in class org.apache.nutch.service.resources.SeedResource
-
Gets the list of seedFiles already created
- getSeedManager() - Method in class org.apache.nutch.service.NutchServer
-
- getSeeds() - Method in class org.apache.nutch.service.impl.SeedManagerImpl
-
- getSeeds() - Method in interface org.apache.nutch.service.SeedManager
-
- getSeedUrls() - Method in class org.apache.nutch.service.model.request.SeedList
-
- getSeedUrls() - Method in class org.apache.nutch.webui.model.SeedList
-
- getSeedUrlsCount() - Method in class org.apache.nutch.service.model.request.SeedList
-
- getSeedUrlsCount() - Method in class org.apache.nutch.webui.model.SeedList
-
- getServerStatus() - Method in class org.apache.nutch.service.resources.AdminResource
-
To get the status of the Nutch Server
- getSignature() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getSignature(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory
-
Return the default Signature implementation.
- getSimpleDateFormat() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getSolrClients(JobConf) - Static method in class org.apache.nutch.indexwriter.solr.SolrUtils
-
- getSplits(JobConf, int) - Method in class org.apache.nutch.fetcher.Fetcher.InputFormat
-
Don't split inputs, to keep things polite.
- getStages() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
-
- getStartDate() - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- getStartDate() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getStarted() - Method in class org.apache.nutch.service.NutchServer
-
- getState() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getState() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getStats(Path, SegmentReader.SegmentReaderStats) - Method in class org.apache.nutch.segment.SegmentReader
-
- getStatus() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getStatus() - Method in class org.apache.nutch.fetcher.FetchNode
-
- getStatus() - Method in class org.apache.nutch.parse.ParseData
-
The status of parsing the page.
- getStatus() - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- getStatus() - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- getStatus() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- getStatus() - Method in class org.apache.nutch.util.NutchTool
-
Returns current status of the running tool.
- getStatus() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getStatusName(byte) - Static method in class org.apache.nutch.crawl.CrawlDatum
-
- getSubColection(String) - Method in class org.apache.nutch.collection.CollectionManager
-
Returns named subcollection
- getSubCollections(String) - Method in class org.apache.nutch.collection.CollectionManager
-
Return names of collections url is part of
- getSystemName() - Method in class org.apache.nutch.protocol.ftp.Client
-
Fetches the system type name from the server and returns the string.
- getTargetPoint() - Method in class org.apache.nutch.plugin.Extension
-
Returns the Id of the extension point, that is implemented by this
extension.
- getText(StringBuffer, Node, boolean) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
This method takes a StringBuffer and a DOM Node, and will append all the
content text found beneath the DOM node to the StringBuffer.
- getText(StringBuffer, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
- getText() - Method in interface org.apache.nutch.parse.Parse
-
The textual content of the page.
- getText() - Method in class org.apache.nutch.parse.ParseImpl
-
- getText() - Method in class org.apache.nutch.parse.ParseText
-
- getText(StringBuffer, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
- getThrownError() - Method in class org.apache.nutch.util.CommandRunner
-
- getTimeout() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getTimeout() - Method in class org.apache.nutch.util.CommandRunner
-
- getTimeout() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- getTimestamp() - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Get timestamp of current event.
- getTimestamp() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- getTimestamp() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getTitle() - Method in class org.apache.nutch.fetcher.FetchNode
-
- getTitle(StringBuffer, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
This method takes a StringBuffer and a DOM Node, and will append the content
text found beneath the first title node to the StringBuffer.
- getTitle() - Method in class org.apache.nutch.parse.ParseData
-
The title of the page.
- getTitle(StringBuffer, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
This method takes a StringBuffer and a DOM Node, and will append the content
text found beneath the first title node to the StringBuffer.
- getTlsPreferredCipherSuites() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getTlsPreferredProtocols() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getToFieldName() - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
- getTokenStream() - Method in class org.apache.nutch.scoring.similarity.util.LuceneTokenizer
-
Returns the tokenStream created by the Tokenizer
- getTopLevelDomainName(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the top level domain name of the url.
- getTopLevelDomainName(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the top level domain name of the url.
- getTotalSize() - Method in class org.apache.nutch.fetcher.FetchItemQueues
-
- getToUrl() - Method in class org.apache.nutch.parse.Outlink
-
- getType() - Method in class org.apache.nutch.service.model.request.DbQuery
-
- getType() - Method in class org.apache.nutch.service.model.request.JobConfig
-
- getType() - Method in class org.apache.nutch.service.model.response.JobInfo
-
- getType() - Method in class org.apache.nutch.util.domain.TopLevelDomain
-
- getType() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getType() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getTypes() - Method in class org.apache.nutch.crawl.NutchWritable
-
- getUnfetched() - Method in class org.apache.nutch.hostdb.HostDatum
-
- getUniqueKey() - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getUrl() - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Get URL of this event
- getUrl() - Method in class org.apache.nutch.fetcher.FetchItem
-
- getUrl() - Method in class org.apache.nutch.fetcher.FetchNode
-
- getUrl() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the URL used to retrieve this response.
- getUrl() - Method in exception org.apache.nutch.parse.ParserNotFound
-
- getUrl() - Method in class org.apache.nutch.protocol.Content
-
The url fetched.
- getUrl() - Method in class org.apache.nutch.protocol.htmlunit.HttpResponse
-
- getUrl() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getUrl() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getUrl() - Method in exception org.apache.nutch.protocol.ProtocolNotFound
-
- getUrl() - Method in class org.apache.nutch.protocol.selenium.HttpResponse
-
- getUrl() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- getUrl() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- getUrl() - Method in class org.apache.nutch.service.model.request.SeedUrl
-
- getUrl() - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- getUrl() - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- getUrl() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getUrl() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- getURL2() - Method in class org.apache.nutch.fetcher.FetchItem
-
- getUrlMD5(String) - Static method in class org.apache.nutch.util.DumpFileUtil
-
- getUseHttp11() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getUserAgent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getUsername() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getUUID(Configuration) - Static method in class org.apache.nutch.util.NutchConfiguration
-
Retrieve a Nutch UUID of this configuration object, or null if the
configuration was created elsewhere.
- getValue() - Method in class org.apache.nutch.webui.model.NutchConfig
-
- getValues() - Method in class org.apache.nutch.indexer.NutchField
-
- getValues(String) - Method in class org.apache.nutch.metadata.Metadata
-
Get the values associated to a metadata name.
- getValues(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- getVersion() - Method in class org.apache.nutch.parse.ParseData
-
- getVersion() - Method in class org.apache.nutch.parse.ParseStatus
-
- getVersion() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
- getWaitForExit() - Method in class org.apache.nutch.util.CommandRunner
-
- getWARCInfoContent(Configuration) - Static method in class org.apache.nutch.tools.WARCUtils
-
- getWarcSize() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- getWeight() - Method in class org.apache.nutch.indexer.NutchDocument
-
- getWeight() - Method in class org.apache.nutch.indexer.NutchField
-
- getWhiteList() - Method in class org.apache.nutch.collection.Subcollection
-
Returns whitelist
- getWhiteListString() - Method in class org.apache.nutch.collection.Subcollection
-
Returns whitelist String
- getWriter() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Return null since there is no Writer for this class.
- gone - Variable in class org.apache.nutch.hostdb.HostDatum
-
- GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Resource is gone.
- guessEncoding(Content, String) - Method in class org.apache.nutch.util.EncodingDetector
-
Guess the encoding with the previously specified list of clues.
- GZIPUtils - Class in org.apache.nutch.util
-
A collection of utility methods for working on GZIPed data.
- GZIPUtils() - Constructor for class org.apache.nutch.util.GZIPUtils
-
- ID_FIELD - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- IDENTIFIER - Static variable in interface org.apache.nutch.metadata.DublinCore
-
Recommended best practice is to identify the resource by means of a string
or number conforming to a formal identification system.
- ignorableWhitespace(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of ignorable whitespace in element content.
- IGNORE_EXTERNAL_LINKS - Static variable in class org.apache.nutch.crawl.LinkDb
-
- IGNORE_INTERNAL_LINKS - Static variable in class org.apache.nutch.crawl.LinkDb
-
- in - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- INC_RATE - Variable in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- incConnectionFailures() - Method in class org.apache.nutch.hostdb.HostDatum
-
- incDnsFailures() - Method in class org.apache.nutch.hostdb.HostDatum
-
- incrementExceptionCounter() - Method in class org.apache.nutch.fetcher.FetchItemQueue
-
- index(Path, Path, List<Path>, boolean) - Method in class org.apache.nutch.indexer.IndexingJob
-
- index(Path, Path, List<Path>, boolean, boolean) - Method in class org.apache.nutch.indexer.IndexingJob
-
- index(Path, Path, List<Path>, boolean, boolean, String) - Method in class org.apache.nutch.indexer.IndexingJob
-
- index(Path, Path, List<Path>, boolean, boolean, String, boolean, boolean) - Method in class org.apache.nutch.indexer.IndexingJob
-
- index(Path, Path, List<Path>, boolean, boolean, String, boolean, boolean, boolean) - Method in class org.apache.nutch.indexer.IndexingJob
-
- index(Path, Path, List<Path>, boolean, boolean, String, boolean, boolean, boolean, boolean) - Method in class org.apache.nutch.indexer.IndexingJob
-
- INDEX - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- INDEXER_BINARY_AS_BASE64 - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
-
- INDEXER_DELETE - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
-
- INDEXER_DELETE_ROBOTS_NOINDEX - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
-
- INDEXER_DELETE_SKIPPED - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
-
- INDEXER_PARAMS - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
-
- INDEXER_SKIP_NOTMODIFIED - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
-
- IndexerMapReduce - Class in org.apache.nutch.indexer
-
- IndexerMapReduce() - Constructor for class org.apache.nutch.indexer.IndexerMapReduce
-
- IndexerOutputFormat - Class in org.apache.nutch.indexer
-
- IndexerOutputFormat() - Constructor for class org.apache.nutch.indexer.IndexerOutputFormat
-
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Dampen the boost value by scorePower.
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
This method calculates a Lucene document boost.
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.ScoringFilters
-
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Boilerplate
- IndexingException - Exception in org.apache.nutch.indexer
-
- IndexingException() - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingException(String) - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingException(String, Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingException(Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingFilter - Interface in org.apache.nutch.indexer
-
Extension point for indexing.
- INDEXINGFILTER_ORDER - Static variable in class org.apache.nutch.indexer.IndexingFilters
-
- IndexingFilters - Class in org.apache.nutch.indexer
-
- IndexingFilters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexingFilters
-
- IndexingFiltersChecker - Class in org.apache.nutch.indexer
-
Reads and parses a URL and runs the indexers on it.
- IndexingFiltersChecker() - Constructor for class org.apache.nutch.indexer.IndexingFiltersChecker
-
- IndexingJob - Class in org.apache.nutch.indexer
-
Generic indexer which relies on the plugins implementing IndexWriter
- IndexingJob() - Constructor for class org.apache.nutch.indexer.IndexingJob
-
- IndexingJob(Configuration) - Constructor for class org.apache.nutch.indexer.IndexingJob
-
- IndexWriter - Interface in org.apache.nutch.indexer
-
- IndexWriters - Class in org.apache.nutch.indexer
-
- IndexWriters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexWriters
-
- inflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns an inflated copy of the input array.
- inflateBestEffort(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns an inflated copy of the input array.
- inflateBestEffort(byte[], int) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns an inflated copy of the input array, truncated to sizeLimit bytes,
if necessary.
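The idea of a size-limited, best-effort inflate can be sketched with the JDK's java.util.zip classes. This is a minimal sketch under stated assumptions, not the DeflateUtils implementation: the class InflateSketch and its two methods are hypothetical names, the input is assumed to be in zlib-wrapped format, and "best effort" is taken to mean that whatever was decoded before an error or the limit is returned.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class InflateSketch {

  /** Hypothetical helper: compresses the input so the sketch has data to inflate. */
  static byte[] deflate(byte[] in) {
    Deflater deflater = new Deflater();
    deflater.setInput(in);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Hypothetical helper: inflates at most sizeLimit bytes, keeping partial output on error. */
  static byte[] inflateBestEffort(byte[] in, int sizeLimit) {
    Inflater inflater = new Inflater();
    inflater.setInput(in);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    try {
      while (!inflater.finished() && out.size() < sizeLimit) {
        // Never ask for more bytes than the remaining budget.
        int want = Math.min(buf.length, sizeLimit - out.size());
        int n = inflater.inflate(buf, 0, want);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated input: stop with what we have
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      // Best effort: fall through and return the bytes decoded so far.
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

Capping each inflate call at the remaining budget keeps the output from ever exceeding sizeLimit, so no trimming step is needed afterwards.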
- init() - Method in class org.apache.nutch.collection.CollectionManager
-
- init(Path) - Method in class org.apache.nutch.crawl.LinkDbReader
-
- init() - Method in class org.apache.nutch.webui.NutchUiApplication
-
- initialize(Element) - Method in class org.apache.nutch.collection.Subcollection
-
Initialize Subcollection from dom element
- initializeSchedule(Text, CrawlDatum) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
Initialize fetch schedule related data.
- initializeSchedule(Text, CrawlDatum) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
Initialize fetch schedule related data.
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Set to 0.0f (unknown value) - inlink contributions will bring it to a
correct level.
- initialScore(Text, CrawlDatum) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
Set an initial score for newly discovered pages.
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.ScoringFilters
-
Calculate a new initial score, used when adding newly discovered pages.
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Boilerplate
- initMRJob(Path, Path, Collection<Path>, JobConf, boolean) - Static method in class org.apache.nutch.indexer.IndexerMapReduce
-
- initPage(IModel<SeedList>) - Method in class org.apache.nutch.webui.pages.seed.SeedPage
-
- inject(Path, Path) - Method in class org.apache.nutch.crawl.Injector
-
- inject(Path, Path, boolean, boolean) - Method in class org.apache.nutch.crawl.Injector
-
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- injectedScore(Text, CrawlDatum) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
Set an initial score for newly injected pages.
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.ScoringFilters
-
Calculate a new initial score, used when injecting new pages.
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Boilerplate
- InjectMapper() - Constructor for class org.apache.nutch.crawl.Injector.InjectMapper
-
- Injector - Class in org.apache.nutch.crawl
-
Injector takes a flat file of URLs and merges ("injects") these URLs into the
CrawlDb.
- Injector() - Constructor for class org.apache.nutch.crawl.Injector
-
- Injector(Configuration) - Constructor for class org.apache.nutch.crawl.Injector
-
- Injector.InjectMapper - Class in org.apache.nutch.crawl
-
- Injector.InjectReducer - Class in org.apache.nutch.crawl
-
Combine multiple new entries for a url.
- InjectReducer() - Constructor for class org.apache.nutch.crawl.Injector.InjectReducer
-
- Inlink - Class in org.apache.nutch.crawl
-
- Inlink() - Constructor for class org.apache.nutch.crawl.Inlink
-
- Inlink(String, String) - Constructor for class org.apache.nutch.crawl.Inlink
-
- INLINK - Static variable in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- INLINK_DIR - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
-
- Inlinks - Class in org.apache.nutch.crawl
-
- Inlinks() - Constructor for class org.apache.nutch.crawl.Inlinks
-
- inLinks - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- InputCompatMapper() - Constructor for class org.apache.nutch.segment.SegmentReader.InputCompatMapper
-
- InputFormat() - Constructor for class org.apache.nutch.fetcher.Fetcher.InputFormat
-
- install(JobConf, Path) - Static method in class org.apache.nutch.crawl.CrawlDb
-
- install(Job, Path) - Static method in class org.apache.nutch.crawl.CrawlDb
-
- install(JobConf, Path) - Static method in class org.apache.nutch.crawl.LinkDb
-
- instance(JobInfo.JobType) - Static method in class org.apache.nutch.webui.client.impl.RemoteCommandBuilder
-
- instance() - Static method in class org.apache.nutch.webui.pages.assets.NutchUiCssReference
-
- InstancePanel - Class in org.apache.nutch.webui.pages.instances
-
- InstancePanel(String) - Constructor for class org.apache.nutch.webui.pages.instances.InstancePanel
-
- InstancesPage - Class in org.apache.nutch.webui.pages.instances
-
- InstancesPage() - Constructor for class org.apache.nutch.webui.pages.instances.InstancesPage
-
- invert(Path, Path, boolean, boolean, boolean) - Method in class org.apache.nutch.crawl.LinkDb
-
- invert(Path, Path[], boolean, boolean, boolean) - Method in class org.apache.nutch.crawl.LinkDb
-
- Inverter() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
-
- IP - Static variable in class org.apache.nutch.tools.WARCUtils
-
- isCanonical() - Method in interface org.apache.nutch.parse.Parse
-
Indicates if the parse is coming from a url or a sub-url
- isCanonical() - Method in class org.apache.nutch.parse.ParseImpl
-
- isClientTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- isCompressed() - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- isCookieEnabled() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- isDomainSuffix(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes
-
Returns whether the extension is a registered domain entry.
- isEligibleForCheck(HostDatum) - Method in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
Determines whether a record is eligible for recheck.
- isEmpty() - Method in class org.apache.nutch.hostdb.HostDatum
-
- isEmpty() - Method in class org.apache.nutch.parse.ParseResult
-
Checks whether the result is empty.
- isEmpty(String) - Static method in class org.apache.nutch.util.StringUtil
-
Checks if a string is empty (i.e., is null or empty).
- isExempted(String, String) - Method in class org.apache.nutch.net.URLExemptionFilters
-
Run all defined filters.
- isForce() - Method in class org.apache.nutch.service.model.request.NutchConfig
-
- isHalted() - Method in class org.apache.nutch.fetcher.FetcherThread
-
- isIfModifiedSinceEnabled() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- isIgnoreCase() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- isIndexable(Path, FileSystem) - Static method in class org.apache.nutch.segment.SegmentChecker
-
Check if the segment is indexable.
- isLoginRedirect() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- isMagic(byte[]) - Static method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Returns true if the byte array passed matches the gzip header magic number.
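A minimal sketch of a gzip magic-number check, for illustration only. It is not the ArcRecordReader implementation: the class name GzipMagicSketch and the method isGzipMagic are hypothetical, and accepting arrays longer than the two-byte magic number is an assumption. The magic bytes 0x1f 0x8b come from the gzip format (RFC 1952).

```java
class GzipMagicSketch {
    // Every gzip stream starts with these two bytes (RFC 1952).
    private static final byte[] GZIP_MAGIC = {(byte) 0x1f, (byte) 0x8b};

    // Hypothetical helper: true if the array begins with the gzip magic number.
    public static boolean isGzipMagic(byte[] input) {
        if (input == null || input.length < GZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < GZIP_MAGIC.length; i++) {
            if (input[i] != GZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A gzip header prefix, followed by the "deflate" method byte 0x08.
        System.out.println(isGzipMagic(new byte[] {0x1f, (byte) 0x8b, 0x08}));
        // The first bytes of a zip archive ("PK"), which is not gzip.
        System.out.println(isGzipMagic(new byte[] {0x50, 0x4b}));
    }
}
```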
- isModeAccept() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- isModelCreated - Static variable in class org.apache.nutch.scoring.similarity.cosine.Model
-
- isMultiValued(String) - Method in class org.apache.nutch.metadata.Metadata
-
Returns true if named value is multivalued.
- isParsed(Path, FileSystem) - Static method in class org.apache.nutch.segment.SegmentChecker
-
Check the segment to see if it has been parsed before.
- isParsing(Configuration) - Static method in class org.apache.nutch.fetcher.Fetcher
-
- isPermanentFailure() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- isRedirect() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthentication
-
- isRedirect() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- isRemoteVerificationEnabled() - Method in class org.apache.nutch.protocol.ftp.Client
-
Return whether or not verification of the remote host participating in data
connections is enabled.
- isRunning() - Method in class org.apache.nutch.service.NutchServer
-
- isSameDomainName(URL, URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns whether the given urls have the same domain name.
- isSameDomainName(String, String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns whether the given urls have the same domain name.
- isServerTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- isStoringContent(Configuration) - Static method in class org.apache.nutch.fetcher.Fetcher
-
- isSuccess() - Method in class org.apache.nutch.parse.ParseResult
-
A convenience method which returns true only if all parses are successful.
- isSuccess() - Method in class org.apache.nutch.parse.ParseStatus
-
A convenience method.
- isSuccess() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- isTransientFailure() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- isTruncated(Content) - Static method in class org.apache.nutch.parse.ParseSegment
-
Checks if the page's content is truncated.
- isValid() - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
Does this FieldReplacer have a valid fieldname and pattern?
- isWhiteListed(URL) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
Check whether a URL belongs to a whitelisted host.
- isWhiteSpace(char) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Returns whether the specified ch conforms to the XML 1.0
definition of whitespace.
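The XML 1.0 whitespace production covers exactly four characters: space (0x20), tab (0x09), carriage return (0x0D) and line feed (0x0A). A minimal sketch of such a check follows. The class name XmlWhitespaceSketch is hypothetical, this is not the XMLCharacterRecognizer source, and treating an empty string as whitespace is an assumption.

```java
class XmlWhitespaceSketch {
    // True if ch is one of the four XML 1.0 whitespace characters.
    public static boolean isWhiteSpace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0D || ch == 0x0A;
    }

    // True if every character of s is XML whitespace (assumption: "" counts as whitespace).
    public static boolean isWhiteSpace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isWhiteSpace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace('\t'));
        System.out.println(isWhiteSpace(" a "));
    }
}
```

Note that a non-breaking space (U+00A0) is not whitespace under this definition, unlike in some other whitespace tests.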
- isWhiteSpace(char[], int, int) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Tell if the string is whitespace.
- isWhiteSpace(StringBuffer) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Tell if the string is whitespace.
- isWhiteSpace(String) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Tell if the string is whitespace.
- iterator() - Method in class org.apache.nutch.crawl.Inlinks
-
- iterator() - Method in class org.apache.nutch.indexer.NutchDocument
-
Iterate over all fields.
- iterator() - Method in class org.apache.nutch.parse.ParseResult
-
Iterate over all entries in the <url, Parse> map.
- LANGUAGE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A language of the intellectual content of the resource.
- LanguageIndexingFilter - Class in org.apache.nutch.analysis.lang
-
- LanguageIndexingFilter() - Constructor for class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
Constructs a new Language Indexing Filter.
- LAST_MODIFIED - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- lastCheck - Variable in class org.apache.nutch.hostdb.HostDatum
-
- leftPad(String, int) - Static method in class org.apache.nutch.util.StringUtil
-
Returns a copy of s padded with leading spaces so that its length is length.
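The padding behaviour described for leftPad can be sketched as below. This is an illustration, not the StringUtil source: the class name LeftPadSketch is hypothetical, and returning the input unchanged when it is already at least length characters long is an assumption.

```java
class LeftPadSketch {
    // Prepends spaces until the result is `length` characters long.
    // Assumption: strings that are already long enough are returned unchanged.
    public static String leftPad(String s, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = length - s.length(); i > 0; i--) {
            sb.append(' ');
        }
        sb.append(s);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Brackets make the leading spaces visible in the output.
        System.out.println("[" + leftPad("42", 5) + "]");
    }
}
```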
- LICENSE_LOCATION - Static variable in interface org.apache.nutch.metadata.CreativeCommons
-
- LICENSE_URL - Static variable in interface org.apache.nutch.metadata.CreativeCommons
-
- LineRecordWriter(DataOutputStream) - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter
-
- LinkAnalysisScoringFilter - Class in org.apache.nutch.scoring.link
-
- LinkAnalysisScoringFilter() - Constructor for class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- LinkDatum - Class in org.apache.nutch.scoring.webgraph
-
A class for holding link information including the url, anchor text, a score,
the timestamp of the link and a link type.
- LinkDatum() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum
-
Default constructor, no url, timestamp, score, or link type.
- LinkDatum(String) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum
-
Creates a LinkDatum with a given url.
- LinkDatum(String, String) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum
-
Creates a LinkDatum with a url and an anchor text.
- LinkDatum(String, String, long) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum
-
- LinkDb - Class in org.apache.nutch.crawl
-
Maintains an inverted link map, listing incoming links for each url.
- LinkDb() - Constructor for class org.apache.nutch.crawl.LinkDb
-
- LinkDb(Configuration) - Constructor for class org.apache.nutch.crawl.LinkDb
-
- LinkDBDumpMapper() - Constructor for class org.apache.nutch.crawl.LinkDbReader.LinkDBDumpMapper
-
- LinkDbFilter - Class in org.apache.nutch.crawl
-
This class provides a way to separate the URL normalization and filtering
steps from the rest of LinkDb manipulation code.
- LinkDbFilter() - Constructor for class org.apache.nutch.crawl.LinkDbFilter
-
- LinkDbMerger - Class in org.apache.nutch.crawl
-
This tool merges several LinkDb-s into one, optionally filtering URLs through
the current URLFilters, to skip prohibited URLs and links.
- LinkDbMerger() - Constructor for class org.apache.nutch.crawl.LinkDbMerger
-
- LinkDbMerger(Configuration) - Constructor for class org.apache.nutch.crawl.LinkDbMerger
-
- LinkDbReader - Class in org.apache.nutch.crawl
-
.
- LinkDbReader() - Constructor for class org.apache.nutch.crawl.LinkDbReader
-
- LinkDbReader(Configuration, Path) - Constructor for class org.apache.nutch.crawl.LinkDbReader
-
- LinkDbReader.LinkDBDumpMapper - Class in org.apache.nutch.crawl
-
- LinkDumper - Class in org.apache.nutch.scoring.webgraph
-
The LinkDumper tool creates a database of node to inlink information that can
be read using the nested Reader class.
- LinkDumper() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper
-
- LinkDumper.Inverter - Class in org.apache.nutch.scoring.webgraph
-
Inverts outlinks from the WebGraph to inlinks and attaches node
information.
- LinkDumper.LinkNode - Class in org.apache.nutch.scoring.webgraph
-
Bean class which holds url to node information.
- LinkDumper.LinkNodes - Class in org.apache.nutch.scoring.webgraph
-
Writable class which holds an array of LinkNode objects.
- LinkDumper.Merger - Class in org.apache.nutch.scoring.webgraph
-
Merges LinkNode objects into a single array value per url.
- LinkDumper.Reader - Class in org.apache.nutch.scoring.webgraph
-
Reader class which will print out the url and all of its inlinks to system
out.
- LinkNode() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- LinkNode(String, Node) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- LinkNodes() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
-
- LinkNodes(LinkDumper.LinkNode[]) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
-
- LinkParams(String, String, int) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
-
- LinkRank - Class in org.apache.nutch.scoring.webgraph
-
- LinkRank() - Constructor for class org.apache.nutch.scoring.webgraph.LinkRank
-
Default constructor.
- LinkRank(Configuration) - Constructor for class org.apache.nutch.scoring.webgraph.LinkRank
-
Configurable constructor.
- linkRead() - Method in class org.apache.nutch.service.resources.ReaderResouce
-
Get Link Reader response schema
- linkRead(ReaderConfig, int, int, int, boolean) - Method in class org.apache.nutch.service.resources.ReaderResouce
-
Read link object
- LinkReader - Class in org.apache.nutch.service.impl
-
- LinkReader() - Constructor for class org.apache.nutch.service.impl.LinkReader
-
- LINKS_INLINKS_HOST - Static variable in class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- LINKS_ONLY_HOSTS - Static variable in class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- LINKS_OUTLINKS_HOST - Static variable in class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- LinksIndexingFilter - Class in org.apache.nutch.indexer.links
-
An IndexingFilter that adds outlinks and inlinks field(s) to the document.
- LinksIndexingFilter() - Constructor for class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- list(List<Path>, Writer) - Method in class org.apache.nutch.segment.SegmentReader
-
- list() - Method in interface org.apache.nutch.service.ConfManager
-
- list() - Method in class org.apache.nutch.service.impl.ConfManagerImpl
-
- list(String, JobInfo.State) - Method in class org.apache.nutch.service.impl.JobManagerImpl
-
- list(String, JobInfo.State) - Method in interface org.apache.nutch.service.JobManager
-
- listen() - Method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- LOCK_NAME - Static variable in class org.apache.nutch.crawl.CrawlDb
-
- LOCK_NAME - Static variable in class org.apache.nutch.crawl.LinkDb
-
- LOCK_NAME - Static variable in class org.apache.nutch.hostdb.UpdateHostDb
-
- LOCK_NAME - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
-
- LockUtil - Class in org.apache.nutch.util
-
Utility methods for handling application-level locking.
- LockUtil() - Constructor for class org.apache.nutch.util.LockUtil
-
- LOG - Static variable in class org.apache.nutch.crawl.Generator
-
- LOG - Static variable in class org.apache.nutch.plugin.PluginManifestParser
-
- LOG - Static variable in class org.apache.nutch.plugin.PluginRepository
-
- LOG - Static variable in class org.apache.nutch.protocol.file.File
-
- LOG - Static variable in class org.apache.nutch.protocol.ftp.Ftp
-
- LOG - Static variable in class org.apache.nutch.protocol.htmlunit.Http
-
- LOG - Static variable in class org.apache.nutch.protocol.http.Http
-
- LOG - Static variable in class org.apache.nutch.protocol.httpclient.Http
-
- LOG - Static variable in class org.apache.nutch.protocol.selenium.Http
-
- LOG - Static variable in interface org.apache.nutch.service.NutchReader
-
- LOG - Static variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- logConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- login(String, String) - Method in class org.apache.nutch.protocol.ftp.Client
-
Login to the FTP server using the provided username and password.
- login() - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthentication
-
- logout() - Method in class org.apache.nutch.protocol.ftp.Client
-
Logout of the FTP server by sending the QUIT command.
- LogOutPage - Class in org.apache.nutch.webui.pages
-
- LogOutPage() - Constructor for class org.apache.nutch.webui.pages.LogOutPage
-
- longestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
-
Returns the longest prefix of input that is matched,
or null if no match exists.
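The longest-prefix contract (return the longest registered prefix that the input starts with, or null if none does) can be sketched with a plain linear scan. This is only an illustration of the contract: PrefixStringMatcher is trie-based, whereas the hypothetical LongestPrefixSketch class below takes the prefix list as a parameter and checks each entry in turn.

```java
import java.util.Arrays;
import java.util.List;

class LongestPrefixSketch {
    // Returns the longest entry of `prefixes` that `input` starts with, or null.
    public static String longestMatch(List<String> prefixes, String input) {
        String best = null;
        for (String p : prefixes) {
            if (input.startsWith(p) && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("ab", "abc", "x");
        System.out.println(longestMatch(prefixes, "abcdef"));
        System.out.println(longestMatch(prefixes, "zzz"));
    }
}
```

A trie gives the same answer in time proportional to the input length rather than the number of prefixes, which matters when the prefix set is large.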
- longestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
-
Returns the longest suffix of input that is matched,
or null if no match exists.
- longestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Returns the longest substring of input that is
matched by a pattern in the trie, or null if no match
exists.
- LuceneAnalyzerUtil - Class in org.apache.nutch.scoring.similarity.util
-
Creates a custom analyzer based on user provided inputs
- LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType, boolean) - Constructor for class org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil
-
Creates an analyzer instance based on the Lucene default stopword set if useStopFilter is set to true.
- LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType, List<String>, boolean) - Constructor for class org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil
-
Creates an analyzer instance based on user provided stop words.
- LuceneAnalyzerUtil.StemFilterType - Enum in org.apache.nutch.scoring.similarity.util
-
- LuceneTokenizer - Class in org.apache.nutch.scoring.similarity.util
-
- LuceneTokenizer(String, LuceneTokenizer.TokenizerType, boolean, LuceneAnalyzerUtil.StemFilterType) - Constructor for class org.apache.nutch.scoring.similarity.util.LuceneTokenizer
-
Creates a tokenizer based on param values
- LuceneTokenizer(String, LuceneTokenizer.TokenizerType, List<String>, boolean, LuceneAnalyzerUtil.StemFilterType) - Constructor for class org.apache.nutch.scoring.similarity.util.LuceneTokenizer
-
Creates a tokenizer based on param values
- LuceneTokenizer(String, LuceneTokenizer.TokenizerType, LuceneAnalyzerUtil.StemFilterType, int, int) - Constructor for class org.apache.nutch.scoring.similarity.util.LuceneTokenizer
-
Creates a tokenizer for the ngram model based on param values
- LuceneTokenizer.TokenizerType - Enum in org.apache.nutch.scoring.similarity.util
-
- m_currentNode - Variable in class org.apache.nutch.parse.html.DOMBuilder
-
Current node
- m_doc - Variable in class org.apache.nutch.parse.html.DOMBuilder
-
Root document
- m_docFrag - Variable in class org.apache.nutch.parse.html.DOMBuilder
-
First node of document fragment or null if not a DocumentFragment
- m_elemStack - Variable in class org.apache.nutch.parse.html.DOMBuilder
-
Vector of element nodes
- m_inCData - Variable in class org.apache.nutch.parse.html.DOMBuilder
-
Flag indicating that we are processing a CData section
- main(String[]) - Static method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- main(String[]) - Static method in class org.apache.nutch.crawl.CrawlDb
-
- main(String[]) - Static method in class org.apache.nutch.crawl.CrawlDbMerger
-
- main(String[]) - Static method in class org.apache.nutch.crawl.CrawlDbReader
-
- main(String[]) - Static method in class org.apache.nutch.crawl.DeduplicationJob
-
- main(String[]) - Static method in class org.apache.nutch.crawl.Generator
-
Generate a fetchlist from the crawldb.
- main(String[]) - Static method in class org.apache.nutch.crawl.Injector
-
- main(String[]) - Static method in class org.apache.nutch.crawl.LinkDb
-
- main(String[]) - Static method in class org.apache.nutch.crawl.LinkDbMerger
-
- main(String[]) - Static method in class org.apache.nutch.crawl.LinkDbReader
-
- main(String[]) - Static method in class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- main(String[]) - Static method in class org.apache.nutch.crawl.TextProfileSignature
-
- main(String[]) - Static method in class org.apache.nutch.fetcher.Fetcher
-
Run the fetcher.
- main(String[]) - Static method in class org.apache.nutch.hostdb.ReadHostDb
-
- main(String[]) - Static method in class org.apache.nutch.hostdb.UpdateHostDb
-
- main(String[]) - Static method in class org.apache.nutch.indexer.CleaningJob
-
- main(String[]) - Static method in class org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
-
Main method for invoking this tool
- main(String[]) - Static method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- main(String[]) - Static method in class org.apache.nutch.indexer.IndexingJob
-
- main(String[]) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
-
- main(String[]) - Static method in class org.apache.nutch.net.URLFilterChecker
-
- main(String[]) - Static method in class org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
-
- main(String[]) - Static method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
Spits out patterns and substitutions that are in the configuration file.
- main(String[]) - Static method in class org.apache.nutch.net.URLNormalizerChecker
-
- main(String[]) - Static method in class org.apache.nutch.parse.feed.FeedParser
-
Runs a command line version of this Parser.
- main(String[]) - Static method in class org.apache.nutch.parse.html.HtmlParser
-
- main(String[]) - Static method in class org.apache.nutch.parse.js.JSParseFilter
-
- main(String[]) - Static method in class org.apache.nutch.parse.ParseData
-
- main(String[]) - Static method in class org.apache.nutch.parse.ParserChecker
-
- main(String[]) - Static method in class org.apache.nutch.parse.ParseSegment
-
- main(String[]) - Static method in class org.apache.nutch.parse.ParseText
-
- main(String[]) - Static method in class org.apache.nutch.parse.swf.SWFParser
-
Arguments are: 0.
- main(String[]) - Static method in class org.apache.nutch.parse.zip.ZipParser
-
- main(String[]) - Static method in class org.apache.nutch.plugin.PluginRepository
-
Loads all necessary dependencies for a selected plugin, and then runs one
of the classes' main() method.
- main(String[]) - Static method in class org.apache.nutch.protocol.Content
-
- main(String[]) - Static method in class org.apache.nutch.protocol.file.File
-
Quick way for running this class.
- main(String[]) - Static method in class org.apache.nutch.protocol.ftp.Ftp
-
For debugging.
- main(String[]) - Static method in class org.apache.nutch.protocol.htmlunit.Http
-
- main(HttpBase, String[]) - Static method in class org.apache.nutch.protocol.http.api.HttpBase
-
- main(String[]) - Static method in class org.apache.nutch.protocol.http.Http
-
- main(String[]) - Static method in class org.apache.nutch.protocol.httpclient.Http
-
Main method.
- main(String[]) - Static method in class org.apache.nutch.protocol.RobotRulesParser
-
- main(String[]) - Static method in class org.apache.nutch.protocol.selenium.Http
-
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.LinkDumper
-
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.LinkDumper.Reader
-
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.LinkRank
-
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.NodeDumper
-
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.NodeReader
-
Runs the NodeReader tool.
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
- main(String[]) - Static method in class org.apache.nutch.scoring.webgraph.WebGraph
-
- main(String[]) - Static method in class org.apache.nutch.segment.SegmentMerger
-
- main(String[]) - Static method in class org.apache.nutch.segment.SegmentReader
-
- main(String[]) - Static method in class org.apache.nutch.service.NutchServer
-
- main(String[]) - Static method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
- main(String[]) - Static method in class org.apache.nutch.tools.Benchmark
-
- main(String[]) - Static method in class org.apache.nutch.tools.CommonCrawlDataDumper
-
Main method for invoking this tool
- main(String[]) - Static method in class org.apache.nutch.tools.DmozParser
-
Command-line access.
- main(String[]) - Static method in class org.apache.nutch.tools.FileDumper
-
Main method for invoking this tool
- main(String[]) - Static method in class org.apache.nutch.tools.FreeGenerator
-
- main(String[]) - Static method in class org.apache.nutch.tools.ResolveUrls
-
Runs the resolve urls tool.
- main(String[]) - Static method in class org.apache.nutch.tools.warc.WARCExporter
-
- main(RegexURLFilterBase, String[]) - Static method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Filter the standard input using a RegexURLFilterBase.
- main(String[]) - Static method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
- main(String[]) - Static method in class org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter
-
- main(String[]) - Static method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- main(String[]) - Static method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
- main(String[]) - Static method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- main(String[]) - Static method in class org.apache.nutch.util.CommandRunner
-
- main(String[]) - Static method in class org.apache.nutch.util.CrawlCompletionStats
-
- main(String[]) - Static method in class org.apache.nutch.util.domain.DomainStatistics
-
- main(String[]) - Static method in class org.apache.nutch.util.EncodingDetector
-
- main(String[]) - Static method in class org.apache.nutch.util.PrefixStringMatcher
-
- main(String[]) - Static method in class org.apache.nutch.util.ProtocolStatusStatistics
-
- main(String[]) - Static method in class org.apache.nutch.util.StringUtil
-
- main(String[]) - Static method in class org.apache.nutch.util.SuffixStringMatcher
-
- main(String[]) - Static method in class org.apache.nutch.util.URLUtil
-
For testing
- main(String[]) - Static method in class org.apache.nutch.webui.NutchUiServer
-
- majorCodes - Static variable in class org.apache.nutch.parse.ParseStatus
-
- makeClient(Configuration) - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
Generates a TransportClient or NodeClient
- makeIOException(SolrServerException) - Static method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- map(Text, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbFilter
-
- map(Text, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbDumpMapper
-
- map(Text, CrawlDatum, OutputCollector<Text, LongWritable>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
-
- map(Text, CrawlDatum, OutputCollector<FloatWritable, Text>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
-
- map(Text, CrawlDatum, OutputCollector<BytesWritable, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.DeduplicationJob.DBFilter
-
- map(Text, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.Generator.CrawlDbUpdater
-
- map(Text, CrawlDatum, OutputCollector<FloatWritable, Generator.SelectorEntry>, Reporter) - Method in class org.apache.nutch.crawl.Generator.Selector
-
Select & invert subset due for fetch.
- map(FloatWritable, Generator.SelectorEntry, OutputCollector<Text, Generator.SelectorEntry>, Reporter) - Method in class org.apache.nutch.crawl.Generator.SelectorInverseMapper
-
- map(Text, Writable, Mapper<Text, Writable, Text, CrawlDatum>.Context) - Method in class org.apache.nutch.crawl.Injector.InjectMapper
-
- map(Text, ParseData, OutputCollector<Text, Inlinks>, Reporter) - Method in class org.apache.nutch.crawl.LinkDb
-
- map(Text, Inlinks, OutputCollector<Text, Inlinks>, Reporter) - Method in class org.apache.nutch.crawl.LinkDbFilter
-
- map(Text, Inlinks, OutputCollector<Text, Inlinks>, Reporter) - Method in class org.apache.nutch.crawl.LinkDbReader.LinkDBDumpMapper
-
- map(Text, Writable, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
Mapper ingesting records from the HostDB, CrawlDB and plaintext host
scores file.
- map(Text, CrawlDatum, OutputCollector<ByteWritable, Text>, Reporter) - Method in class org.apache.nutch.indexer.CleaningJob.DBFilter
-
- map(Text, Writable, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.indexer.IndexerMapReduce
-
- map(WritableComparable<?>, Content, OutputCollector<Text, ParseImpl>, Reporter) - Method in class org.apache.nutch.parse.ParseSegment
-
- map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
-
Wraps all values in ObjectWritables.
- map(Text, Node, OutputCollector<Text, FloatWritable>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Dumper
-
Outputs the host or domain as key for this record and numInlinks,
numOutlinks or score as the value.
- map(Text, Node, OutputCollector<FloatWritable, Text>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
-
Outputs the url with the appropriate number of inlinks, outlinks, or score.
- map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
Changes input into ObjectWritables.
- map(Text, Writable, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
-
Passes through existing LinkDatum objects from an existing OutlinkDb and
maps out new LinkDatum objects from new crawls' ParseData.
- map(Text, MetaWrapper, OutputCollector<Text, MetaWrapper>, Reporter) - Method in class org.apache.nutch.segment.SegmentMerger
-
- map(WritableComparable<?>, Writable, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.segment.SegmentReader.InputCompatMapper
-
- map(Text, BytesWritable, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
Runs the Map job to translate an arc record into output for Nutch segments.
- map(WritableComparable<?>, Text, OutputCollector<Text, Generator.SelectorEntry>, Reporter) - Method in class org.apache.nutch.tools.FreeGenerator.FG
-
- map(Text, Writable, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.tools.warc.WARCExporter.WARCReducer
-
- mapCopyKey(String) - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- mapKey(String) - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- MAPPING_FILE - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- match(String) - Method in class org.apache.nutch.urlfilter.api.RegexRule
-
Checks if a url matches this rule.
- matchChar(TrieStringMatcher.TrieNode, String, int) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Returns the next TrieStringMatcher.TrieNode visited, given that you are at
node, and the next character in the input is the idx'th character of s.
- matches(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
-
Returns true if the given String is matched by a prefix in the trie
- matches(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
-
Returns true if the given String is matched by a suffix in the trie
- matches(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Returns true if the given String is matched by a pattern in the trie
- MAX_BULK_DOCS - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- MAX_BULK_LENGTH - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- MAX_DEPTH_KEY - Static variable in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- MAX_DEPTH_KEY_W - Static variable in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- MAX_WARC_FILE_SIZE - Static variable in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- maxContent - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The length limit for downloaded content, in bytes.
- maxCrawlDelay - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
Skip page if Crawl-Delay longer than this value.
- maxInterval - Variable in class org.apache.nutch.crawl.AbstractFetchSchedule
-
- MD5Signature - Class in org.apache.nutch.crawl
-
Default implementation of a page signature.
- MD5Signature() - Constructor for class org.apache.nutch.crawl.MD5Signature
-
- merge(Path, Path[], boolean, boolean) - Method in class org.apache.nutch.crawl.CrawlDbMerger
-
- merge(Path, Path[], boolean, boolean) - Method in class org.apache.nutch.crawl.LinkDbMerger
-
- merge(Path, Path[], boolean, boolean, long) - Method in class org.apache.nutch.segment.SegmentMerger
-
- Merger() - Constructor for class org.apache.nutch.crawl.CrawlDbMerger.Merger
-
- Merger() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
-
- metaData - Variable in class org.apache.nutch.hostdb.HostDatum
-
- metadata - Variable in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- Metadata - Class in org.apache.nutch.metadata
-
A multi-valued metadata container.
- Metadata() - Constructor for class org.apache.nutch.metadata.Metadata
-
Constructs a new, empty metadata.
- metadata - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- MetadataIndexer - Class in org.apache.nutch.indexer.metadata
-
Indexer which can be configured to extract metadata from the crawldb, parse
metadata or content metadata.
- MetadataIndexer() - Constructor for class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- metadataSource - Static variable in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
Metadata source field name
- MetaTagsParser - Class in org.apache.nutch.parse.metatags
-
Parse HTML meta tags (keywords, description) and store them in the parse
metadata so that they can be indexed with the index-metadata plugin with the
prefix 'metatag.'.
- MetaTagsParser() - Constructor for class org.apache.nutch.parse.metatags.MetaTagsParser
-
- MetaWrapper - Class in org.apache.nutch.metadata
-
This is a simple decorator that adds metadata to any Writable-s that can be
serialized by NutchWritable.
- MetaWrapper() - Constructor for class org.apache.nutch.metadata.MetaWrapper
-
- MetaWrapper(Writable, Configuration) - Constructor for class org.apache.nutch.metadata.MetaWrapper
-
- MetaWrapper(Metadata, Writable, Configuration) - Constructor for class org.apache.nutch.metadata.MetaWrapper
-
- MimeAdaptiveFetchSchedule - Class in org.apache.nutch.crawl
-
Extension of AdaptiveFetchSchedule that allows for more flexible
configuration of DEC and INC factors for various MIME-types.
- MimeAdaptiveFetchSchedule() - Constructor for class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- MIMEFILTER_REGEX_FILE - Static variable in class org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
-
- MimeTypeIndexingFilter - Class in org.apache.nutch.indexer.filter
-
An IndexingFilter that allows filtering of documents based on the MIME Type
detected by Tika
- MimeTypeIndexingFilter() - Constructor for class org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
-
- MimeUtil - Class in org.apache.nutch.util
-
- MimeUtil(Configuration) - Constructor for class org.apache.nutch.util.MimeUtil
-
- MIN_CONFIDENCE_KEY - Static variable in class org.apache.nutch.util.EncodingDetector
-
- MissingDependencyException - Exception in org.apache.nutch.plugin
-
MissingDependencyException will be thrown if a plugin dependency cannot be
found.
- MissingDependencyException(Throwable) - Constructor for exception org.apache.nutch.plugin.MissingDependencyException
-
- MissingDependencyException(String) - Constructor for exception org.apache.nutch.plugin.MissingDependencyException
-
- Model - Class in org.apache.nutch.scoring.similarity.cosine
-
This class creates a model used to store Document vector representation of the corpus.
- Model() - Constructor for class org.apache.nutch.scoring.similarity.cosine.Model
-
- model(T) - Method in class org.apache.nutch.webui.pages.components.CpmIteratorAdapter
-
- MODIFIED - Static variable in interface org.apache.nutch.metadata.DublinCore
-
Date on which the resource was changed.
- modifyWebClient(WebClient) - Method in class org.apache.nutch.protocol.htmlunit.HtmlUnitWebDriver
-
- MoreIndexingFilter - Class in org.apache.nutch.indexer.more
-
Add (or reset) a few metaData properties as respective fields (if they are
available), so that they can be accurately used within the search index.
- MoreIndexingFilter() - Constructor for class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- MOVED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Resource has moved permanently.
- ObjectCache - Class in org.apache.nutch.util
-
- ObjectInputFormat() - Constructor for class org.apache.nutch.segment.SegmentMerger.ObjectInputFormat
-
- OLD_OUTLINK_DIR - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
-
- onCrawlError(Crawl, String) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- onCrawlError(Crawl, String) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- onInitialize() - Method in class org.apache.nutch.webui.pages.components.ColorEnumLabel
-
- open(JobConf, String) - Method in interface org.apache.nutch.indexer.IndexWriter
-
- open(JobConf, String) - Method in class org.apache.nutch.indexer.IndexWriters
-
- open(JobConf, String) - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- open(JobConf, String) - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- open(JobConf, String) - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- OPERATOR - Static variable in class org.apache.nutch.tools.WARCUtils
-
- OPICScoringFilter - Class in org.apache.nutch.scoring.opic
-
This plugin implements a variant of an Online Page Importance Computation
(OPIC) score, described in this paper:
Abiteboul, Serge and Preda, Mihai and Cobena, Gregory (2003), Adaptive
On-Line Page Importance Computation.
- OPICScoringFilter() - Constructor for class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- org.apache.nutch.analysis.lang - package org.apache.nutch.analysis.lang
-
Text document language identifier.
- org.apache.nutch.collection - package org.apache.nutch.collection
-
Subcollection is a subset of an index.
- org.apache.nutch.crawl - package org.apache.nutch.crawl
-
Crawl control code and tools to run the crawler.
- org.apache.nutch.fetcher - package org.apache.nutch.fetcher
-
The Nutch robot.
- org.apache.nutch.hostdb - package org.apache.nutch.hostdb
-
- org.apache.nutch.indexer - package org.apache.nutch.indexer
-
Index content, configure and run indexing and cleaning jobs to
add, update, and delete documents from an index.
- org.apache.nutch.indexer.anchor - package org.apache.nutch.indexer.anchor
-
An indexing plugin for inbound anchor text.
- org.apache.nutch.indexer.basic - package org.apache.nutch.indexer.basic
-
A basic indexing plugin, adds basic fields: url, host, title, content, etc.
- org.apache.nutch.indexer.feed - package org.apache.nutch.indexer.feed
-
Indexing filter to index meta data from RSS feeds.
- org.apache.nutch.indexer.filter - package org.apache.nutch.indexer.filter
-
- org.apache.nutch.indexer.geoip - package org.apache.nutch.indexer.geoip
-
This plugin implements an indexing filter which takes advantage of the
GeoIP2-java API.
- org.apache.nutch.indexer.links - package org.apache.nutch.indexer.links
-
- org.apache.nutch.indexer.metadata - package org.apache.nutch.indexer.metadata
-
Indexing filter to add document metadata to the index.
- org.apache.nutch.indexer.more - package org.apache.nutch.indexer.more
-
A more indexing plugin, adds "more" index fields:
last modified date, MIME type, content length.
- org.apache.nutch.indexer.replace - package org.apache.nutch.indexer.replace
-
Indexing filter to allow pattern replacements on metadata.
- org.apache.nutch.indexer.staticfield - package org.apache.nutch.indexer.staticfield
-
A simple plugin called at indexing that adds fields with static data.
- org.apache.nutch.indexer.subcollection - package org.apache.nutch.indexer.subcollection
-
Indexing filter to assign documents to subcollections.
- org.apache.nutch.indexer.tld - package org.apache.nutch.indexer.tld
-
Top Level Domain Indexing plugin.
- org.apache.nutch.indexer.urlmeta - package org.apache.nutch.indexer.urlmeta
-
URL Meta Tag Indexing Plugin
- org.apache.nutch.indexwriter.dummy - package org.apache.nutch.indexwriter.dummy
-
Index writer plugin for debugging, writes pairs of <action, url> to a
text file, action is one of "add", "update", or "delete".
- org.apache.nutch.indexwriter.elastic - package org.apache.nutch.indexwriter.elastic
-
- org.apache.nutch.indexwriter.solr - package org.apache.nutch.indexwriter.solr
-
- org.apache.nutch.metadata - package org.apache.nutch.metadata
-
A Multi-valued Metadata container, and set
of constant fields for Nutch Metadata.
- org.apache.nutch.microformats.reltag - package org.apache.nutch.microformats.reltag
-
A microformats Rel-Tag Parser/Indexer/Querier plugin.
- org.apache.nutch.net - package org.apache.nutch.net
-
- org.apache.nutch.net.protocols - package org.apache.nutch.net.protocols
-
- org.apache.nutch.net.urlnormalizer.basic - package org.apache.nutch.net.urlnormalizer.basic
-
URL normalizer performing basic normalizations: remove default ports
and dot segments in path.
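
As a rough illustration of the two normalizations named above (default ports and dot segments), the following sketch uses only java.net.URI. The class name BasicNormalizeSketch and its method are hypothetical and are not the plugin's implementation; only the http and https default ports are handled here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch; not the Nutch urlnormalizer-basic plugin. */
final class BasicNormalizeSketch {
  static String normalize(String url) {
    // URI.normalize() removes "." and ".." segments from the path.
    URI uri = URI.create(url).normalize();
    String scheme = uri.getScheme();
    int port = uri.getPort();
    boolean isDefaultPort = ("http".equalsIgnoreCase(scheme) && port == 80)
        || ("https".equalsIgnoreCase(scheme) && port == 443);
    if (!isDefaultPort) {
      return uri.toString();
    }
    try {
      // Rebuild the URI with port -1, which means "no explicit port".
      return new URI(scheme, uri.getUserInfo(), uri.getHost(), -1,
          uri.getPath(), uri.getQuery(), uri.getFragment()).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

Calling BasicNormalizeSketch.normalize on a URL with an explicit port 80 and a path such as /a/./b/../c would drop the port and collapse the path to /a/c.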
- org.apache.nutch.net.urlnormalizer.host - package org.apache.nutch.net.urlnormalizer.host
-
URL normalizer renaming hosts to a canonical form listed in the
configuration file.
- org.apache.nutch.net.urlnormalizer.pass - package org.apache.nutch.net.urlnormalizer.pass
-
URL normalizer dummy which does not change URLs.
- org.apache.nutch.net.urlnormalizer.protocol - package org.apache.nutch.net.urlnormalizer.protocol
-
- org.apache.nutch.net.urlnormalizer.querystring - package org.apache.nutch.net.urlnormalizer.querystring
-
URL normalizer which sorts the elements in the query part to avoid duplicates
by permutations.
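
The idea of sorting query elements can be sketched as follows. This is a minimal, self-contained illustration under simplifying assumptions (no fragment handling, elements separated by "&"); QuerySortSketch is a hypothetical name, not a class of the plugin.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the Nutch urlnormalizer-querystring plugin. */
final class QuerySortSketch {
  static String sortQuery(String url) {
    int q = url.indexOf('?');
    if (q < 0 || q == url.length() - 1) {
      return url; // no query part, nothing to sort
    }
    String base = url.substring(0, q);
    String[] elements = url.substring(q + 1).split("&");
    // Sorting makes ?b=2&a=1 and ?a=1&b=2 map to the same URL.
    Arrays.sort(elements);
    return base + "?" + String.join("&", elements);
  }
}
```

With this sketch, two URLs that differ only in the order of their query elements normalize to the same string, so they are no longer treated as distinct pages.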
- org.apache.nutch.net.urlnormalizer.regex - package org.apache.nutch.net.urlnormalizer.regex
-
URL normalizer with configurable rules based on regular expressions (Pattern).
- org.apache.nutch.net.urlnormalizer.slash - package org.apache.nutch.net.urlnormalizer.slash
-
- org.apache.nutch.parse - package org.apache.nutch.parse
-
The Parse interface and related classes.
- org.apache.nutch.parse.ext - package org.apache.nutch.parse.ext
-
Parse wrapper to run external command to do the parsing.
- org.apache.nutch.parse.feed - package org.apache.nutch.parse.feed
-
Parse RSS feeds.
- org.apache.nutch.parse.headings - package org.apache.nutch.parse.headings
-
Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
- org.apache.nutch.parse.html - package org.apache.nutch.parse.html
-
An HTML document parsing plugin.
- org.apache.nutch.parse.js - package org.apache.nutch.parse.js
-
Parser and parse filter plugin to extract all (possible) links
from JavaScript files and embedded JavaScript code snippets.
- org.apache.nutch.parse.metatags - package org.apache.nutch.parse.metatags
-
Parse filter to extract meta tags: keywords, description, etc.
- org.apache.nutch.parse.swf - package org.apache.nutch.parse.swf
-
Parse Flash SWF files.
- org.apache.nutch.parse.tika - package org.apache.nutch.parse.tika
-
Parse various document formats with help of
Apache Tika.
- org.apache.nutch.parse.zip - package org.apache.nutch.parse.zip
-
Parse ZIP files: embedded files are recursively passed to appropriate parsers.
- org.apache.nutch.parsefilter.naivebayes - package org.apache.nutch.parsefilter.naivebayes
-
HTML parse filter that classifies the outlinks from the parse result as
relevant or irrelevant based on the relevancy of the parse text (using a
training file of positive and negative example texts; see the description of
parsefilter.naivebayes.trainfile). If a link is found irrelevant, it gets a
second chance if it contains any of the words from the list given in
parsefilter.naivebayes.wordlist.
- org.apache.nutch.parsefilter.regex - package org.apache.nutch.parsefilter.regex
-
RegexParseFilter.
- org.apache.nutch.plugin - package org.apache.nutch.plugin
-
- org.apache.nutch.protocol - package org.apache.nutch.protocol
-
- org.apache.nutch.protocol.file - package org.apache.nutch.protocol.file
-
Protocol plugin which supports retrieving local file resources.
- org.apache.nutch.protocol.ftp - package org.apache.nutch.protocol.ftp
-
Protocol plugin which supports retrieving documents via the ftp protocol.
- org.apache.nutch.protocol.htmlunit - package org.apache.nutch.protocol.htmlunit
-
Protocol plugin which supports retrieving documents via the http protocol.
- org.apache.nutch.protocol.http - package org.apache.nutch.protocol.http
-
Protocol plugin which supports retrieving documents via the http protocol.
- org.apache.nutch.protocol.http.api - package org.apache.nutch.protocol.http.api
-
- org.apache.nutch.protocol.httpclient - package org.apache.nutch.protocol.httpclient
-
Protocol plugin which supports retrieving documents via the HTTP and
HTTPS protocols, optionally with Basic, Digest and NTLM authentication
schemes for web server as well as proxy server.
- org.apache.nutch.protocol.selenium - package org.apache.nutch.protocol.selenium
-
Protocol plugin which supports retrieving documents via selenium.
- org.apache.nutch.publisher - package org.apache.nutch.publisher
-
- org.apache.nutch.publisher.rabbitmq - package org.apache.nutch.publisher.rabbitmq
-
Publisher package to implement queues
- org.apache.nutch.scoring - package org.apache.nutch.scoring
-
- org.apache.nutch.scoring.depth - package org.apache.nutch.scoring.depth
-
Scoring filter to stop crawling at a configurable depth
(number of "hops" from seed URLs).
- org.apache.nutch.scoring.link - package org.apache.nutch.scoring.link
-
Scoring filter used in conjunction with WebGraph.
- org.apache.nutch.scoring.opic - package org.apache.nutch.scoring.opic
-
Scoring filter implementing a variant of the Online Page Importance Computation
(OPIC) algorithm.
- org.apache.nutch.scoring.similarity - package org.apache.nutch.scoring.similarity
-
- org.apache.nutch.scoring.similarity.cosine - package org.apache.nutch.scoring.similarity.cosine
-
Implements the cosine similarity metric for scoring relevant documents
- org.apache.nutch.scoring.similarity.util - package org.apache.nutch.scoring.similarity.util
-
Utility package for Lucene functions
- org.apache.nutch.scoring.tld - package org.apache.nutch.scoring.tld
-
Top Level Domain Scoring plugin.
- org.apache.nutch.scoring.urlmeta - package org.apache.nutch.scoring.urlmeta
-
URL Meta Tag Scoring Plugin
- org.apache.nutch.scoring.webgraph - package org.apache.nutch.scoring.webgraph
-
- org.apache.nutch.segment - package org.apache.nutch.segment
-
A segment stores all data from one generate/fetch/update cycle:
fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
- org.apache.nutch.service - package org.apache.nutch.service
-
- org.apache.nutch.service.impl - package org.apache.nutch.service.impl
-
- org.apache.nutch.service.model.request - package org.apache.nutch.service.model.request
-
- org.apache.nutch.service.model.response - package org.apache.nutch.service.model.response
-
- org.apache.nutch.service.resources - package org.apache.nutch.service.resources
-
- org.apache.nutch.tools - package org.apache.nutch.tools
-
Miscellaneous tools.
- org.apache.nutch.tools.arc - package org.apache.nutch.tools.arc
-
- org.apache.nutch.tools.warc - package org.apache.nutch.tools.warc
-
Tools to import / export between Nutch segments and
WARC archives.
- org.apache.nutch.urlfilter.api - package org.apache.nutch.urlfilter.api
-
Generic URL filter library, abstracting away from regular expression
implementations.
- org.apache.nutch.urlfilter.automaton - package org.apache.nutch.urlfilter.automaton
-
- org.apache.nutch.urlfilter.domain - package org.apache.nutch.urlfilter.domain
-
URL filter plugin to include only URLs which match an element in a given list of
domain suffixes, domain names, and/or host names.
- org.apache.nutch.urlfilter.domainblacklist - package org.apache.nutch.urlfilter.domainblacklist
-
URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names.
- org.apache.nutch.urlfilter.ignoreexempt - package org.apache.nutch.urlfilter.ignoreexempt
-
URL filter plugin which identifies exemptions to external urls when
external urls are set to ignore.
- org.apache.nutch.urlfilter.prefix - package org.apache.nutch.urlfilter.prefix
-
URL filter plugin to include only URLs which match one of a given list of URL prefixes.
- org.apache.nutch.urlfilter.regex - package org.apache.nutch.urlfilter.regex
-
URL filter plugin to include and/or exclude URLs matching Java regular expressions.
- org.apache.nutch.urlfilter.suffix - package org.apache.nutch.urlfilter.suffix
-
URL filter plugin to either exclude or include only URLs which match
one of the given (path) suffixes.
- org.apache.nutch.urlfilter.validator - package org.apache.nutch.urlfilter.validator
-
URL filter plugin that validates given urls.
- org.apache.nutch.util - package org.apache.nutch.util
-
Miscellaneous utility classes.
- org.apache.nutch.util.domain - package org.apache.nutch.util.domain
-
Classes for domain name analysis.
- org.apache.nutch.webui - package org.apache.nutch.webui
-
- org.apache.nutch.webui.client - package org.apache.nutch.webui.client
-
- org.apache.nutch.webui.client.impl - package org.apache.nutch.webui.client.impl
-
- org.apache.nutch.webui.client.model - package org.apache.nutch.webui.client.model
-
- org.apache.nutch.webui.config - package org.apache.nutch.webui.config
-
- org.apache.nutch.webui.model - package org.apache.nutch.webui.model
-
- org.apache.nutch.webui.pages - package org.apache.nutch.webui.pages
-
- org.apache.nutch.webui.pages.assets - package org.apache.nutch.webui.pages.assets
-
- org.apache.nutch.webui.pages.components - package org.apache.nutch.webui.pages.components
-
- org.apache.nutch.webui.pages.crawls - package org.apache.nutch.webui.pages.crawls
-
- org.apache.nutch.webui.pages.instances - package org.apache.nutch.webui.pages.instances
-
- org.apache.nutch.webui.pages.menu - package org.apache.nutch.webui.pages.menu
-
- org.apache.nutch.webui.pages.seed - package org.apache.nutch.webui.pages.seed
-
- org.apache.nutch.webui.pages.settings - package org.apache.nutch.webui.pages.settings
-
- org.apache.nutch.webui.service - package org.apache.nutch.webui.service
-
- org.apache.nutch.webui.service.impl - package org.apache.nutch.webui.service.impl
-
- org.creativecommons.nutch - package org.creativecommons.nutch
-
Sample plugins that parse and index Creative Commons metadata.
- ORIGINAL_CHAR_ENCODING - Static variable in interface org.apache.nutch.metadata.Nutch
-
- Outlink - Class in org.apache.nutch.parse
-
- Outlink() - Constructor for class org.apache.nutch.parse.Outlink
-
- Outlink(String, String) - Constructor for class org.apache.nutch.parse.Outlink
-
- OUTLINK - Static variable in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- OUTLINK_DIR - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
-
- OutlinkDb() - Constructor for class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
-
Default constructor.
- OutlinkDb(Configuration) - Constructor for class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
-
Configurable constructor.
- OutlinkExtractor - Class in org.apache.nutch.parse
-
Extractor to extract Outlinks / URLs from plain text using Regular
Expressions.
- OutlinkExtractor() - Constructor for class org.apache.nutch.parse.OutlinkExtractor
-
- output - Variable in class org.apache.nutch.hostdb.ResolverThread
-
- PARAMS - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
Deprecated.
- parse(InputStream) - Method in class org.apache.nutch.collection.CollectionManager
-
- Parse - Interface in org.apache.nutch.parse
-
The result of parsing a page's raw content.
- parse(Path) - Method in class org.apache.nutch.parse.ParseSegment
-
- parse(Content) - Method in class org.apache.nutch.parse.ParseUtil
-
Performs a parse by iterating through a List of preferred Parsers until a
successful parse is performed and a Parse object is returned.
- parse(String) - Static method in class org.apache.nutch.segment.SegmentPart
-
Create SegmentPart from a String in format "segmentName/partName".
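
To make the "segmentName/partName" format concrete, here is a minimal sketch of such a parse. The class SegmentPartSketch and its error handling are assumptions for illustration; the actual SegmentPart class may split and validate differently.

```java
/** Hypothetical sketch; not the Nutch SegmentPart class. */
final class SegmentPartSketch {
  final String segmentName;
  final String partName;

  SegmentPartSketch(String segmentName, String partName) {
    this.segmentName = segmentName;
    this.partName = partName;
  }

  /** Splits "segmentName/partName" at the first slash (an assumption). */
  static SegmentPartSketch parse(String s) {
    int idx = s.indexOf('/');
    if (idx <= 0 || idx == s.length() - 1) {
      throw new IllegalArgumentException("Invalid segment part: '" + s + "'");
    }
    return new SegmentPartSketch(s.substring(0, idx), s.substring(idx + 1));
  }
}
```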
- PARSE_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- parseByExtensionId(String, Content) - Method in class org.apache.nutch.parse.ParseUtil
-
Method parses a Content object using the Parser specified by the parameter
extId, i.e., the Parser's extension ID.
- parseCharacterEncoding(String) - Static method in class org.apache.nutch.util.EncodingDetector
-
Parse the character encoding from the specified content type header.
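
Extracting the charset parameter from a Content-Type header value can be sketched with a regular expression. CharsetHeaderSketch is a hypothetical name, and the pattern below is an assumption that covers only the common quoted and unquoted forms; it does not reproduce the logic of EncodingDetector.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the Nutch EncodingDetector implementation. */
final class CharsetHeaderSketch {
  // Matches charset=value or charset="value", ignoring case.
  private static final Pattern CHARSET = Pattern.compile(
      "charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);

  /** Returns the charset named in the header value, or null if absent. */
  static String parseCharset(String contentType) {
    if (contentType == null) {
      return null;
    }
    Matcher m = CHARSET.matcher(contentType);
    return m.find() ? m.group(1) : null;
  }
}
```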
- parsed - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- ParseData - Class in org.apache.nutch.parse
-
Data extracted from a page's content.
- ParseData() - Constructor for class org.apache.nutch.parse.ParseData
-
- ParseData(ParseStatus, String, Outlink[], Metadata) - Constructor for class org.apache.nutch.parse.ParseData
-
- ParseData(ParseStatus, String, Outlink[], Metadata, Metadata) - Constructor for class org.apache.nutch.parse.ParseData
-
- parseDmozFile(File, int, boolean, int, Pattern) - Method in class org.apache.nutch.tools.DmozParser
-
Iterate through all the items in this structured DMOZ file.
- parseErrors - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- ParseException - Exception in org.apache.nutch.parse
-
- ParseException() - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseException(String) - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseException(String, Throwable) - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseException(Throwable) - Constructor for exception org.apache.nutch.parse.ParseException
-
- parseExpression(String) - Static method in class org.apache.nutch.util.JexlUtil
-
Parses the given expression to a Jexl expression.
- ParseImpl - Class in org.apache.nutch.parse
-
The result of parsing a page's raw content.
- ParseImpl() - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(Parse) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(String, ParseData) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(ParseText, ParseData) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(ParseText, ParseData, boolean) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- parseList(List<String>, String) - Method in class org.apache.nutch.collection.Subcollection
-
Create a list of patterns from a chunk of text; patterns are separated by
newlines
- ParseOutputFormat - Class in org.apache.nutch.parse
-
- ParseOutputFormat() - Constructor for class org.apache.nutch.parse.ParseOutputFormat
-
- parsePluginFolder(String[]) - Method in class org.apache.nutch.plugin.PluginManifestParser
-
Returns a list of all found plugin descriptors.
- Parser - Interface in org.apache.nutch.parse
-
A parser for content generated by a Protocol implementation.
- ParserChecker - Class in org.apache.nutch.parse
-
Parser checker, useful for testing parsers.
- ParserChecker() - Constructor for class org.apache.nutch.parse.ParserChecker
-
- ParseResult - Class in org.apache.nutch.parse
-
A utility class that stores result of a parse.
- ParseResult(String) - Constructor for class org.apache.nutch.parse.ParseResult
-
Create a container for parse results.
- ParserFactory - Class in org.apache.nutch.parse
-
Creates and caches Parser plugins.
- ParserFactory(Configuration) - Constructor for class org.apache.nutch.parse.ParserFactory
-
- ParserNotFound - Exception in org.apache.nutch.parse
-
- ParserNotFound(String) - Constructor for exception org.apache.nutch.parse.ParserNotFound
-
- ParserNotFound(String, String) - Constructor for exception org.apache.nutch.parse.ParserNotFound
-
- ParserNotFound(String, String, String) - Constructor for exception org.apache.nutch.parse.ParserNotFound
-
- parseRules(String, byte[], String, String) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
Parses the robots content using the SimpleRobotRulesParser from crawler
commons
- ParseSegment - Class in org.apache.nutch.parse
-
- ParseSegment() - Constructor for class org.apache.nutch.parse.ParseSegment
-
- ParseSegment(Configuration) - Constructor for class org.apache.nutch.parse.ParseSegment
-
- ParseStatus - Class in org.apache.nutch.parse
-
- ParseStatus() - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, int, String[]) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, String[]) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, int) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, int, String) - Constructor for class org.apache.nutch.parse.ParseStatus
-
Simplified constructor for passing just a text message.
- ParseStatus(int, String) - Constructor for class org.apache.nutch.parse.ParseStatus
-
Simplified constructor for passing just a text message.
- ParseStatus(Throwable) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseText - Class in org.apache.nutch.parse
-
- ParseText() - Constructor for class org.apache.nutch.parse.ParseText
-
- ParseText(String) - Constructor for class org.apache.nutch.parse.ParseText
-
- ParseUtil - Class in org.apache.nutch.parse
-
A utility class containing methods that perform common parsing tasks, such
as iterating through a preferred list of Parsers to obtain Parse objects.
- ParseUtil(Configuration) - Constructor for class org.apache.nutch.parse.ParseUtil
-
- PARTITION_MODE_DOMAIN - Static variable in class org.apache.nutch.crawl.URLPartitioner
-
- PARTITION_MODE_HOST - Static variable in class org.apache.nutch.crawl.URLPartitioner
-
- PARTITION_MODE_IP - Static variable in class org.apache.nutch.crawl.URLPartitioner
-
- PARTITION_MODE_KEY - Static variable in class org.apache.nutch.crawl.URLPartitioner
-
- PartitionReducer() - Constructor for class org.apache.nutch.crawl.Generator.PartitionReducer
-
- partName - Variable in class org.apache.nutch.segment.SegmentPart
-
Name of the segment part (ie.
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData.
- passScoreAfterParsing(Text, Content, Parse) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
Currently a part of score distribution is performed using only data coming
from the parsing process.
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.ScoringFilters
-
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.similarity.SimilarityScoringFilter
-
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- passScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Takes the metadata, which was lumped inside the content, and replicates it
within your parse data.
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Store a float value of CrawlDatum.getScore() under Fetcher.SCORE_KEY.
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
This method takes all relevant score information from the current datum
(coming from a generated fetchlist) and stores it into Content metadata.
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.ScoringFilters
-
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- passScoreBeforeParsing(Text, CrawlDatum, Content) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Takes the metadata, specified in your "urlmeta.tags" property, from the
datum object and injects it into the content.
- PassURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.pass
-
This URLNormalizer doesn't change urls.
- PassURLNormalizer() - Constructor for class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
-
- PASSWORD - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- percentiles - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- PERM_REFRESH_TIME - Static variable in class org.apache.nutch.fetcher.Fetcher
-
- Pluggable - Interface in org.apache.nutch.plugin
-
Defines the capability of a class to be plugged into Nutch.
- Plugin - Class in org.apache.nutch.plugin
-
A nutch-plugin is a container for a set of custom logic that provides
extensions to the nutch core functionality or another plugin that provides an
API for extending.
- Plugin(PluginDescriptor, Configuration) - Constructor for class org.apache.nutch.plugin.Plugin
-
Constructor
- PluginClassLoader - Class in org.apache.nutch.plugin
-
The PluginClassLoader contains only classes of the runtime libraries set up
in the plugin manifest file and exported libraries of required plugins.
- PluginClassLoader(URL[], ClassLoader) - Constructor for class org.apache.nutch.plugin.PluginClassLoader
-
Constructor
- PluginDescriptor - Class in org.apache.nutch.plugin
-
The PluginDescriptor provides access to all meta information of a
nutch-plugin, as well as to the internationalizable resources and the plugin's own
classloader.
- PluginDescriptor(String, String, String, String, String, String, Configuration) - Constructor for class org.apache.nutch.plugin.PluginDescriptor
-
Constructor
- PluginManifestParser - Class in org.apache.nutch.plugin
-
The PluginManifestParser parses the manifest files in
all plugin directories.
- PluginManifestParser(Configuration, PluginRepository) - Constructor for class org.apache.nutch.plugin.PluginManifestParser
-
- PluginRepository - Class in org.apache.nutch.plugin
-
The plugin repository is a registry of all plugins.
- PluginRepository(Configuration) - Constructor for class org.apache.nutch.plugin.PluginRepository
-
- PluginRuntimeException - Exception in org.apache.nutch.plugin
-
PluginRuntimeException is thrown when an exception occurs in
plugin management.
- PluginRuntimeException(Throwable) - Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
-
- PluginRuntimeException(String) - Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
-
- PORT - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- pos - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- PrefixStringMatcher - Class in org.apache.nutch.util
-
A class for efficiently matching Strings against a set of
prefixes.
- PrefixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.PrefixStringMatcher
-
Creates a new PrefixStringMatcher which will match
Strings with any prefix in the supplied array.
- PrefixStringMatcher(Collection<String>) - Constructor for class org.apache.nutch.util.PrefixStringMatcher
-
Creates a new PrefixStringMatcher which will match
Strings with any prefix in the supplied Collection.
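To illustrate the prefix matching described for PrefixStringMatcher, here is a minimal self-contained sketch built on a character trie. The class PrefixMatcherSketch and its matches method are hypothetical names chosen for this example. It is not the Nutch source, and the real class may differ in API and behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Nutch PrefixStringMatcher implementation.
final class PrefixMatcherSketch {
    // One trie node: children keyed by character, plus an end-of-prefix flag.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    // Builds the trie from the supplied prefixes.
    PrefixMatcherSketch(List<String> prefixes) {
        for (String prefix : prefixes) {
            Node node = root;
            for (int i = 0; i < prefix.length(); i++) {
                node = node.children.computeIfAbsent(prefix.charAt(i), c -> new Node());
            }
            node.terminal = true;
        }
    }

    // Returns true if any stored prefix is a prefix of the input.
    boolean matches(String input) {
        Node node = root;
        for (int i = 0; i < input.length(); i++) {
            if (node.terminal) {
                return true;
            }
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }

    public static void main(String[] args) {
        PrefixMatcherSketch matcher =
            new PrefixMatcherSketch(List.of("http://", "ftp://"));
        System.out.println(matcher.matches("http://example.org/")); // true
        System.out.println(matcher.matches("mailto:a@b"));          // false
    }
}
```

Each lookup walks at most as many nodes as the input has characters, so the cost does not grow with the number of stored prefixes.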
- PrefixURLFilter - Class in org.apache.nutch.urlfilter.prefix
-
Filters URLs based on a file of URL prefixes.
- PrefixURLFilter() - Constructor for class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- PrefixURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- PrintCommandListener - Class in org.apache.nutch.protocol.ftp
-
This is a support class for logging all ftp command/reply traffic.
- PrintCommandListener(Logger) - Constructor for class org.apache.nutch.protocol.ftp.PrintCommandListener
-
- processDeflateEncoded(byte[], URL) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- processDumpJob(String, String, JobConf, String, String, String, Integer, String) - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- processDumpJob(String, String, String) - Method in class org.apache.nutch.crawl.LinkDbReader
-
- processGzipEncoded(byte[], URL) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- processingInstruction(String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of a processing instruction.
- processStatJob(String, Configuration, boolean) - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- processTopNJob(String, long, float, String, JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- PROTO_NOT_FOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
This protocol was not found.
- PROTO_STATUS_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- Protocol - Interface in org.apache.nutch.protocol
-
A retriever of url content.
- PROTOCOL_REDIR - Static variable in class org.apache.nutch.fetcher.Fetcher
-
- PROTOCOL_STATUS_CODE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- protocolCommandSent(ProtocolCommandEvent) - Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
-
- ProtocolException - Exception in org.apache.nutch.net.protocols
-
- ProtocolException() - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException(String) - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException(String, Throwable) - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException(Throwable) - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException - Exception in org.apache.nutch.protocol
-
- ProtocolException() - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolException(String) - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolException(Throwable) - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolFactory - Class in org.apache.nutch.protocol
-
- ProtocolFactory(Configuration) - Constructor for class org.apache.nutch.protocol.ProtocolFactory
-
- ProtocolNotFound - Exception in org.apache.nutch.protocol
-
- ProtocolNotFound(String) - Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
-
- ProtocolNotFound(String, String) - Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
-
- ProtocolOutput - Class in org.apache.nutch.protocol
-
Simple aggregate to pass from protocol plugins both content and protocol
status.
- ProtocolOutput(Content, ProtocolStatus) - Constructor for class org.apache.nutch.protocol.ProtocolOutput
-
- ProtocolOutput(Content) - Constructor for class org.apache.nutch.protocol.ProtocolOutput
-
- protocolReplyReceived(ProtocolCommandEvent) - Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
-
- ProtocolStatus - Class in org.apache.nutch.protocol
-
- ProtocolStatus() - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, String[]) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, String[], long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, Object) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, Object, long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(Throwable) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatusStatistics - Class in org.apache.nutch.util
-
Extracts protocol status code information from the crawl database.
- ProtocolStatusStatistics() - Constructor for class org.apache.nutch.util.ProtocolStatusStatistics
-
- ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner - Class in org.apache.nutch.util
-
- ProtocolStatusStatisticsCombiner() - Constructor for class org.apache.nutch.util.ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner
-
- ProtocolURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.protocol
-
- ProtocolURLNormalizer() - Constructor for class org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
-
- ProtocolURLNormalizer(String) - Constructor for class org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
-
- proxyException - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The proxy exception list.
- proxyHost - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The proxy hostname.
- proxyPort - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The proxy port.
- publish(FetcherThreadEvent, Configuration) - Method in class org.apache.nutch.fetcher.FetcherThreadPublisher
-
Publish event to all registered publishers
- publish(Object, Configuration) - Method in interface org.apache.nutch.publisher.NutchPublisher
-
This method publishes the event.
- publish(Object, Configuration) - Method in class org.apache.nutch.publisher.NutchPublishers
-
- publish(Object, Configuration) - Method in class org.apache.nutch.publisher.rabbitmq.RabbitMQPublisherImpl
-
- PUBLISHER - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An entity responsible for making the resource available.
- purgeFailedHostsThreshold - Variable in class org.apache.nutch.hostdb.ResolverThread
-
- purgeFailedHostsThreshold - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- push() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- put(String, FetchNode) - Method in class org.apache.nutch.fetcher.FetchNodeDb
-
- put(Text, ParseText, ParseData) - Method in class org.apache.nutch.parse.ParseResult
-
Store a result of parsing.
- put(String, ParseText, ParseData) - Method in class org.apache.nutch.parse.ParseResult
-
Store a result of parsing.
- putAllMetaData(CrawlDatum) - Method in class org.apache.nutch.crawl.CrawlDatum
-
Add all metadata from other CrawlDatum to this CrawlDatum.
- putAllMetaData(HostDatum) - Method in class org.apache.nutch.hostdb.HostDatum
-
Add all metadata from other HostDatum to this HostDatum.
- RabbitMQPublisherImpl - Class in org.apache.nutch.publisher.rabbitmq
-
- RabbitMQPublisherImpl() - Constructor for class org.apache.nutch.publisher.rabbitmq.RabbitMQPublisherImpl
-
- read(DataInput) - Static method in class org.apache.nutch.crawl.CrawlDatum
-
- read(DataInput) - Static method in class org.apache.nutch.crawl.Inlink
-
- read(DataInput) - Static method in class org.apache.nutch.parse.Outlink
-
- read(DataInput) - Static method in class org.apache.nutch.parse.ParseData
-
- read(DataInput) - Static method in class org.apache.nutch.parse.ParseImpl
-
- read(DataInput) - Static method in class org.apache.nutch.parse.ParseStatus
-
- read(DataInput) - Static method in class org.apache.nutch.parse.ParseText
-
- read(DataInput) - Static method in class org.apache.nutch.protocol.Content
-
- read(DataInput) - Static method in class org.apache.nutch.protocol.ProtocolStatus
-
- read(String) - Method in class org.apache.nutch.service.impl.LinkReader
-
- read(String) - Method in class org.apache.nutch.service.impl.NodeReader
-
- read(String) - Method in class org.apache.nutch.service.impl.SequenceReader
-
- read(String) - Method in interface org.apache.nutch.service.NutchReader
-
- readConfiguration(Reader) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- readdb(DbQuery) - Method in class org.apache.nutch.service.resources.DbResource
-
- Reader() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.Reader
-
- ReaderConfig - Class in org.apache.nutch.service.model.request
-
- ReaderConfig() - Constructor for class org.apache.nutch.service.model.request.ReaderConfig
-
- ReaderResouce - Class in org.apache.nutch.service.resources
-
The Reader endpoint enables a user to read sequence files,
nodes and links from the Nutch webgraph.
- ReaderResouce() - Constructor for class org.apache.nutch.service.resources.ReaderResouce
-
- readFields(DataInput) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- readFields(DataInput) - Method in class org.apache.nutch.crawl.Generator.SelectorEntry
-
- readFields(DataInput) - Method in class org.apache.nutch.crawl.Inlink
-
- readFields(DataInput) - Method in class org.apache.nutch.crawl.Inlinks
-
- readFields(DataInput) - Method in class org.apache.nutch.hostdb.HostDatum
-
- readFields(DataInput) - Method in class org.apache.nutch.indexer.NutchDocument
-
- readFields(DataInput) - Method in class org.apache.nutch.indexer.NutchField
-
- readFields(DataInput) - Method in class org.apache.nutch.indexer.NutchIndexAction
-
- readFields(DataInput) - Method in class org.apache.nutch.metadata.Metadata
-
- readFields(DataInput) - Method in class org.apache.nutch.metadata.MetaWrapper
-
- readFields(DataInput) - Method in class org.apache.nutch.parse.Outlink
-
- readFields(DataInput) - Method in class org.apache.nutch.parse.ParseData
-
- readFields(DataInput) - Method in class org.apache.nutch.parse.ParseImpl
-
- readFields(DataInput) - Method in class org.apache.nutch.parse.ParseStatus
-
- readFields(DataInput) - Method in class org.apache.nutch.parse.ParseText
-
- readFields(DataInput) - Method in class org.apache.nutch.protocol.Content
-
- readFields(DataInput) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- readFields(DataInput) - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- readFields(DataInput) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- readFields(DataInput) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
-
- readFields(DataInput) - Method in class org.apache.nutch.scoring.webgraph.Node
-
- readFields(DataInput) - Method in class org.apache.nutch.util.GenericWritableConfigurable
-
- ReadHostDb - Class in org.apache.nutch.hostdb
-
- ReadHostDb() - Constructor for class org.apache.nutch.hostdb.ReadHostDb
-
- readingCrawlDb - Variable in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- readUrl(String, String, JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- recheckInterval - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- REDIR_EXCEEDED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Too many redirects.
- redirPerm - Variable in class org.apache.nutch.hostdb.HostDatum
-
- redirTemp - Variable in class org.apache.nutch.hostdb.HostDatum
-
- reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
-
- reduce(Text, Iterator<LongWritable>, OutputCollector<Text, LongWritable>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
-
- reduce(Text, Iterator<LongWritable>, OutputCollector<Text, LongWritable>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
-
- reduce(FloatWritable, Iterator<Text>, OutputCollector<FloatWritable, Text>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
-
- reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReducer
-
- reduce(BytesWritable, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.DeduplicationJob.DedupReducer
-
- reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.DeduplicationJob.StatusUpdateReducer
-
- reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.Generator.CrawlDbUpdater
-
- reduce(Text, Iterator<Generator.SelectorEntry>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.crawl.Generator.PartitionReducer
-
- reduce(FloatWritable, Iterator<Generator.SelectorEntry>, OutputCollector<FloatWritable, Generator.SelectorEntry>, Reporter) - Method in class org.apache.nutch.crawl.Generator.Selector
-
Collect until limit is reached.
- reduce(Text, Iterable<CrawlDatum>, Reducer<Text, CrawlDatum, Text, CrawlDatum>.Context) - Method in class org.apache.nutch.crawl.Injector.InjectReducer
-
Merges the input records according to the rules listed in the method's documentation.
- reduce(Text, Iterator<Inlinks>, OutputCollector<Text, Inlinks>, Reporter) - Method in class org.apache.nutch.crawl.LinkDbMerger
-
- reduce(Text, Iterator<NutchWritable>, OutputCollector<Text, HostDatum>, Reporter) - Method in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- reduce(ByteWritable, Iterator<Text>, OutputCollector<Text, ByteWritable>, Reporter) - Method in class org.apache.nutch.indexer.CleaningJob.DeleterReducer
-
- reduce(Text, Iterator<NutchWritable>, OutputCollector<Text, NutchIndexAction>, Reporter) - Method in class org.apache.nutch.indexer.IndexerMapReduce
-
- reduce(Text, Iterator<Writable>, OutputCollector<Text, Writable>, Reporter) - Method in class org.apache.nutch.parse.ParseSegment
-
- reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, LinkDumper.LinkNode>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
-
Inverts outlinks to inlinks while attaching node information to the
outlink.
- reduce(Text, Iterator<LinkDumper.LinkNode>, OutputCollector<Text, LinkDumper.LinkNodes>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
-
Aggregate all LinkNode objects for a given url.
- reduce(Text, Iterator<FloatWritable>, OutputCollector<Text, FloatWritable>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Dumper
-
Outputs either the sum or the top value for this record.
- reduce(FloatWritable, Iterator<Text>, OutputCollector<Text, FloatWritable>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
-
Flips and collects the url and numeric sort value.
- reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
Creates new CrawlDatum objects with the updated score from the NodeDb or
with a cleared score.
- reduce(Text, Iterator<NutchWritable>, OutputCollector<Text, LinkDatum>, Reporter) - Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
-
- reduce(Text, Iterator<MetaWrapper>, OutputCollector<Text, MetaWrapper>, Reporter) - Method in class org.apache.nutch.segment.SegmentMerger
-
NOTE: in selecting the latest version we rely exclusively on the segment
name (not all segment data contain time information).
- reduce(Text, Iterator<NutchWritable>, OutputCollector<Text, Text>, Reporter) - Method in class org.apache.nutch.segment.SegmentReader
-
- reduce(Text, Iterator<Generator.SelectorEntry>, OutputCollector<Text, CrawlDatum>, Reporter) - Method in class org.apache.nutch.tools.FreeGenerator.FG
-
- reduce(Text, Iterator<NutchWritable>, OutputCollector<NullWritable, WARCWritable>, Reporter) - Method in class org.apache.nutch.tools.warc.WARCExporter.WARCReducer
-
- reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.util.CrawlCompletionStats.CrawlCompletionStatsCombiner
-
- reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsCombiner
-
- reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.util.ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner
-
- regex() - Method in class org.apache.nutch.urlfilter.api.RegexRule
-
Return this rule's regex.
- regexEscape(String) - Method in class org.apache.nutch.indexer.staticfield.StaticFieldIndexer
-
Escapes any character that needs escaping so it can be used in a regexp.
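The idea of escaping a literal so it can be embedded in a regular expression can be tried out with the JDK alone. The sketch below uses java.util.regex.Pattern.quote as a stand-in. The class RegexEscapeSketch and the method escapeLiteral are hypothetical names, and the output format of Nutch's own regexEscape is not shown in this index and may differ.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; uses the JDK's Pattern.quote, not Nutch's regexEscape.
final class RegexEscapeSketch {
    // Returns a pattern string that matches the given text literally.
    static String escapeLiteral(String literal) {
        return Pattern.quote(literal);
    }

    public static void main(String[] args) {
        String escaped = escapeLiteral("a.b");
        // The dot is treated as a literal dot, not as "any character".
        System.out.println(Pattern.matches(escaped, "a.b")); // true
        System.out.println(Pattern.matches(escaped, "axb")); // false
    }
}
```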
- regexNormalize(String, String) - Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
This function does the replacements by iterating through all the regex
patterns.
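The approach described for regexNormalize, applying each regex pattern in turn, can be sketched as follows. RegexNormalizeSketch, Rule, and normalize are hypothetical names, and the two rules are invented for illustration. The actual rule format and defaults used by RegexURLNormalizer are not shown in this index.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Nutch RegexURLNormalizer implementation.
final class RegexNormalizeSketch {
    // A compiled pattern paired with its replacement text.
    static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    // Applies every rule in list order; each rule sees the previous rule's output.
    static String normalize(String url, List<Rule> rules) {
        String result = url;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule(";jsessionid=[0-9A-Za-z]+", ""), // strip a session ID (example rule)
            new Rule("&amp;", "&"));                  // unescape an entity (example rule)
        System.out.println(
            normalize("http://example.org/a;jsessionid=ABC123?x=1&amp;y=2", rules));
        // http://example.org/a?x=1&y=2
    }
}
```

Because the rules run sequentially, their order matters: a later rule operates on the text already rewritten by earlier ones.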
- RegexParseFilter - Class in org.apache.nutch.parsefilter.regex
-
RegexParseFilter.
- RegexParseFilter() - Constructor for class org.apache.nutch.parsefilter.regex.RegexParseFilter
-
- RegexParseFilter(String) - Constructor for class org.apache.nutch.parsefilter.regex.RegexParseFilter
-
- RegexRule - Class in org.apache.nutch.urlfilter.api
-
A generic regular expression rule.
- RegexRule(boolean, String) - Constructor for class org.apache.nutch.urlfilter.api.RegexRule
-
Constructs a new regular expression rule.
- RegexRule(boolean, String, String) - Constructor for class org.apache.nutch.urlfilter.api.RegexRule
-
Constructs a new regular expression rule.
- RegexURLFilter - Class in org.apache.nutch.urlfilter.regex
-
- RegexURLFilter() - Constructor for class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
- RegexURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
- RegexURLFilterBase - Class in org.apache.nutch.urlfilter.api
-
- RegexURLFilterBase() - Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Constructs a new empty RegexURLFilterBase
- RegexURLFilterBase(File) - Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Constructs a new RegexURLFilter and initializes it with a file of rules.
- RegexURLFilterBase(String) - Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Constructs a new RegexURLFilter and initializes it with a list of rules.
- RegexURLFilterBase(Reader) - Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Constructs a new RegexURLFilter and initializes it with a Reader of rules.
- RegexURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.regex
-
Allows users to do regex substitutions on all/any URLs that are encountered,
which is useful for stripping session IDs from URLs.
- RegexURLNormalizer() - Constructor for class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
The default constructor, which is called from UrlNormalizerFactory
(normalizerClass.newInstance()) in the method getNormalizer().
- RegexURLNormalizer(Configuration) - Constructor for class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
- RegexURLNormalizer(Configuration, String) - Constructor for class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
Constructor which can be passed the file name, so it doesn't look in the
configuration files for it.
- REL_TAG - Static variable in class org.apache.nutch.microformats.reltag.RelTagParser
-
- RELATION - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A reference to a related resource.
- RelTagIndexingFilter - Class in org.apache.nutch.microformats.reltag
-
- RelTagIndexingFilter() - Constructor for class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
- RelTagParser - Class in org.apache.nutch.microformats.reltag
-
Adds microformat rel-tags of document if found.
- RelTagParser() - Constructor for class org.apache.nutch.microformats.reltag.RelTagParser
-
- RemoteCommand - Class in org.apache.nutch.webui.client.impl
-
- RemoteCommand(JobConfig) - Constructor for class org.apache.nutch.webui.client.impl.RemoteCommand
-
- RemoteCommandBuilder - Class in org.apache.nutch.webui.client.impl
-
- RemoteCommandExecutor - Class in org.apache.nutch.webui.client.impl
-
This class executes a remote job and waits for the success/failure result.
- RemoteCommandExecutor(NutchClient) - Constructor for class org.apache.nutch.webui.client.impl.RemoteCommandExecutor
-
- RemoteCommandExecutor.JobStateChecker - Class in org.apache.nutch.webui.client.impl
-
- RemoteCommandsBatchFactory - Class in org.apache.nutch.webui.client.impl
-
- RemoteCommandsBatchFactory() - Constructor for class org.apache.nutch.webui.client.impl.RemoteCommandsBatchFactory
-
- remove(String) - Method in class org.apache.nutch.metadata.Metadata
-
Remove a metadata and all its associated values.
- remove(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- removeField(String) - Method in class org.apache.nutch.indexer.NutchDocument
-
- removeInstance(Long) - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- removeInstance(Long) - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- removeLockFile(FileSystem, Path) - Static method in class org.apache.nutch.util.LockUtil
-
Remove lock file.
- replace(String) - Method in class org.apache.nutch.indexer.replace.FieldReplacer
-
Return the replacement value for a field value.
- replace(FileSystem, Path, Path, boolean) - Static method in class org.apache.nutch.util.FSUtils
-
Replaces the current path with the new path and if set removes the old
path.
- replacefirstoccuranceof(String, String) - Static method in class org.apache.nutch.parsefilter.naivebayes.Train
-
- replaceHost(String, String, String) - Method in class org.apache.nutch.net.urlnormalizer.host.HostURLNormalizer
-
- ReplaceIndexer - Class in org.apache.nutch.indexer.replace
-
Do pattern replacements on selected field contents prior to indexing.
- ReplaceIndexer() - Constructor for class org.apache.nutch.indexer.replace.ReplaceIndexer
-
- reporter - Variable in class org.apache.nutch.hostdb.ResolverThread
-
- REPR_URL_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- reprUrl - Variable in class org.apache.nutch.hostdb.UpdateHostDbMapper
-
- reset() - Method in class org.apache.nutch.indexer.NutchField
-
- reset() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets all boolean values to false.
- resetFailures() - Method in class org.apache.nutch.hostdb.HostDatum
-
- resetStatistics() - Method in class org.apache.nutch.hostdb.HostDatum
-
- resolveEncodingAlias(String) - Static method in class org.apache.nutch.util.EncodingDetector
-
- ResolverThread - Class in org.apache.nutch.hostdb
-
Simple runnable that performs DNS lookup for a single host.
- ResolverThread(String, HostDatum, OutputCollector<Text, HostDatum>, Reporter, int) - Constructor for class org.apache.nutch.hostdb.ResolverThread
-
Constructor.
- resolverThread - Variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- resolveURL(URL, String) - Static method in class org.apache.nutch.util.URLUtil
-
Resolve relative URLs and fix a java.net.URL error in handling of URLs
with pure query targets.
- ResolveUrls - Class in org.apache.nutch.tools
-
A simple tool that will spin up multiple threads to resolve urls to ip
addresses.
- ResolveUrls(String) - Constructor for class org.apache.nutch.tools.ResolveUrls
-
Create a new ResolveUrls with a file from the local file system.
- ResolveUrls(String, int) - Constructor for class org.apache.nutch.tools.ResolveUrls
-
Create a new ResolveUrls with a urls file and a number of threads for the
Thread pool.
- resolveUrls() - Method in class org.apache.nutch.tools.ResolveUrls
-
Creates a thread pool for resolving urls.
- Response - Interface in org.apache.nutch.net.protocols
-
A response interface.
- RESPONSE_TIME - Static variable in class org.apache.nutch.protocol.http.api.HttpBase
-
- responseTime - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
Record response time in CrawlDatum's meta data, see property
http.store.responsetime.
- results - Variable in class org.apache.nutch.util.NutchTool
-
- retrieveFile(String, OutputStream, int) - Method in class org.apache.nutch.protocol.ftp.Client
-
Retrieves the file for the given path.
- retrieveList(String, List<FTPFile>, int, FTPFileEntryParser) - Method in class org.apache.nutch.protocol.ftp.Client
-
Retrieves the list reply for the given path.
- retrieveNgrams(Configuration) - Static method in class org.apache.nutch.scoring.similarity.cosine.Model
-
Retrieves mingram and maxgram from configuration
- RETRY - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Temporary failure.
- reverseHost(String) - Static method in class org.apache.nutch.util.TableUtil
-
- reverseKey - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- reverseKeyValue - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- reverseUrl(String) - Static method in class org.apache.nutch.tools.CommonCrawlDataDumper
-
- reverseUrl(String) - Static method in class org.apache.nutch.util.TableUtil
-
Reverses a url's domain.
- reverseUrl(URL) - Static method in class org.apache.nutch.util.TableUtil
-
Reverses a url's domain.
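Reversing a domain means reordering its dot-separated labels so that the top-level domain comes first. The sketch below shows only that label reversal for a bare host name. ReverseHostSketch and reverseHostLabels are hypothetical names, and this index does not specify how TableUtil lays out the protocol, port, and path in the full reversed URL, so none of that is modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the Nutch TableUtil implementation.
final class ReverseHostSketch {
    // Reverses the order of the dot-separated labels of a host name.
    static String reverseHostLabels(String host) {
        List<String> labels = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        System.out.println(reverseHostLabels("www.example.org")); // org.example.www
    }
}
```

With the labels reversed, hosts from the same domain sort next to each other, which is a common reason for storing keys in this form.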
- rightPad(String, int) - Static method in class org.apache.nutch.util.StringUtil
-
Returns a copy of s padded with trailing spaces so that its
length is length.
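The padding behavior described for rightPad can be sketched in a few lines. RightPadSketch is a hypothetical class, not the Nutch StringUtil source. Returning the input unchanged when it is already at least the requested length is an assumption of this sketch.

```java
// Hypothetical sketch; not the Nutch StringUtil implementation.
final class RightPadSketch {
    // Appends spaces to s until it is `length` characters long.
    // Assumption: strings already at or beyond `length` are returned as-is.
    static String rightPad(String s, int length) {
        StringBuilder padded = new StringBuilder(s);
        for (int i = s.length(); i < length; i++) {
            padded.append(' ');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + rightPad("ab", 5) + "]"); // [ab   ]
    }
}
```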
- RIGHTS - Static variable in interface org.apache.nutch.metadata.DublinCore
-
Information about rights held in and over the resource.
- RobotRulesParser - Class in org.apache.nutch.protocol
-
This class uses crawler-commons for handling the parsing of
robots.txt files.
- RobotRulesParser() - Constructor for class org.apache.nutch.protocol.RobotRulesParser
-
- RobotRulesParser(Configuration) - Constructor for class org.apache.nutch.protocol.RobotRulesParser
-
- ROBOTS - Static variable in class org.apache.nutch.tools.WARCUtils
-
- ROBOTS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Access denied by robots.txt rules.
- root - Variable in class org.apache.nutch.util.TrieStringMatcher
-
- run(String[]) - Method in class org.apache.nutch.crawl.CrawlDb
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.crawl.CrawlDb
-
- run(String[]) - Method in class org.apache.nutch.crawl.CrawlDbMerger
-
- run(String[]) - Method in class org.apache.nutch.crawl.CrawlDbReader
-
- run(String[]) - Method in class org.apache.nutch.crawl.DeduplicationJob
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.crawl.DeduplicationJob
-
- run(String[]) - Method in class org.apache.nutch.crawl.Generator
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.crawl.Generator
-
- run(String[]) - Method in class org.apache.nutch.crawl.Injector
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.crawl.Injector
-
Used by the Nutch REST service
- run(String[]) - Method in class org.apache.nutch.crawl.LinkDb
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.crawl.LinkDb
-
- run(String[]) - Method in class org.apache.nutch.crawl.LinkDbMerger
-
- run(String[]) - Method in class org.apache.nutch.crawl.LinkDbReader
-
- run(RecordReader<Text, CrawlDatum>, OutputCollector<Text, NutchWritable>, Reporter) - Method in class org.apache.nutch.fetcher.Fetcher
-
- run(String[]) - Method in class org.apache.nutch.fetcher.Fetcher
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.fetcher.Fetcher
-
- run() - Method in class org.apache.nutch.fetcher.FetcherThread
-
- run() - Method in class org.apache.nutch.fetcher.QueueFeeder
-
- run(String[]) - Method in class org.apache.nutch.hostdb.ReadHostDb
-
- run() - Method in class org.apache.nutch.hostdb.ResolverThread
-
- run(String[]) - Method in class org.apache.nutch.hostdb.UpdateHostDb
-
- run(String[]) - Method in class org.apache.nutch.indexer.CleaningJob
-
- run(String[]) - Method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- run(String[]) - Method in class org.apache.nutch.indexer.IndexingJob
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.indexer.IndexingJob
-
- run(String[]) - Method in class org.apache.nutch.parse.ParserChecker
-
- run(String[]) - Method in class org.apache.nutch.parse.ParseSegment
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.parse.ParseSegment
-
- run(String[]) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- run(String[]) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper
-
Runs the LinkDumper tool.
- run(String[]) - Method in class org.apache.nutch.scoring.webgraph.LinkRank
-
Runs the LinkRank tool.
- run(String[]) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper
-
Runs the node dumper tool.
- run(String[]) - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
Runs the ScoreUpdater tool.
- run(String[]) - Method in class org.apache.nutch.scoring.webgraph.WebGraph
-
Parses command line arguments and runs the WebGraph jobs.
- run(String[]) - Method in class org.apache.nutch.segment.SegmentMerger
-
- run(String[]) - Method in class org.apache.nutch.segment.SegmentReader
-
- run() - Method in class org.apache.nutch.service.impl.JobWorker
-
- run(String[]) - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
- run(String[]) - Method in class org.apache.nutch.tools.Benchmark
-
- run(String[]) - Method in class org.apache.nutch.tools.CommonCrawlDataDumper
-
- run(String[]) - Method in class org.apache.nutch.tools.FreeGenerator
-
- run(String[]) - Method in class org.apache.nutch.tools.warc.WARCExporter
-
- run(String[]) - Method in class org.apache.nutch.util.CrawlCompletionStats
-
- run(String[]) - Method in class org.apache.nutch.util.domain.DomainStatistics
-
- run(Map<String, Object>, String) - Method in class org.apache.nutch.util.NutchTool
-
Runs the tool, using a map of arguments.
- run(String[]) - Method in class org.apache.nutch.util.ProtocolStatusStatistics
-
- save() - Method in class org.apache.nutch.collection.CollectionManager
-
Save collections into file
- save(SeedList) - Method in class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- save(SeedList) - Method in interface org.apache.nutch.webui.service.SeedListService
-
- saveCrawl(Crawl) - Method in interface org.apache.nutch.webui.service.CrawlService
-
- saveCrawl(Crawl) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- saveDom(OutputStream, Element) - Static method in class org.apache.nutch.util.DomUtil
-
Saves the DOM element to the output stream.
- saveInstance(NutchInstance) - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- saveInstance(NutchInstance) - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- SCHEDULE_DEC_RATE - Static variable in class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- SCHEDULE_INC_RATE - Static variable in class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- SCHEDULE_MIME_FILE - Static variable in class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- SchedulingPage - Class in org.apache.nutch.webui.pages
-
- SchedulingPage() - Constructor for class org.apache.nutch.webui.pages.SchedulingPage
-
- SCOPE_CRAWLDB - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when updating the CrawlDb with new URLs.
- SCOPE_DEFAULT - Static variable in class org.apache.nutch.net.URLNormalizers
-
Default scope.
- SCOPE_FETCHER - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used by Fetcher when processing redirect URLs.
- SCOPE_GENERATE_HOST_COUNT - Static variable in class org.apache.nutch.net.URLNormalizers
-
- SCOPE_INDEXER - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when indexing URLs.
- SCOPE_INJECT - Static variable in class org.apache.nutch.net.URLNormalizers
-
- SCOPE_LINKDB - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when updating the LinkDb with new URLs.
- SCOPE_OUTLINK - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when constructing new Outlink instances.
- SCOPE_PARTITION - Static variable in class org.apache.nutch.net.URLNormalizers
-
- score - Variable in class org.apache.nutch.hostdb.HostDatum
-
- SCORE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- ScoreUpdater - Class in org.apache.nutch.scoring.webgraph
-
Updates the score from the WebGraph node database into the crawl database.
- ScoreUpdater() - Constructor for class org.apache.nutch.scoring.webgraph.ScoreUpdater
-
- ScoringFilter - Interface in org.apache.nutch.scoring
-
A contract defining behavior of scoring plugins.
- ScoringFilterException - Exception in org.apache.nutch.scoring
-
Specialized exception for errors during scoring.
- ScoringFilterException() - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilterException(String) - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilterException(String, Throwable) - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilterException(Throwable) - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilters - Class in org.apache.nutch.scoring
-
- ScoringFilters(Configuration) - Constructor for class org.apache.nutch.scoring.ScoringFilters
-
- SearchPage - Class in org.apache.nutch.webui.pages
-
- SearchPage() - Constructor for class org.apache.nutch.webui.pages.SearchPage
-
- SECONDS_PER_DAY - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
- secondsToDaysHMS(long) - Static method in class org.apache.nutch.util.TimingUtil
-
Show time in seconds as days, hours, minutes and seconds (d days, hh:mm:ss)
- secondsToHMS(long) - Static method in class org.apache.nutch.util.TimingUtil
-
Show time in seconds as hours, minutes and seconds (hh:mm:ss)
- SeedList - Class in org.apache.nutch.service.model.request
-
- SeedList() - Constructor for class org.apache.nutch.service.model.request.SeedList
-
- SeedList - Class in org.apache.nutch.webui.model
-
- SeedList() - Constructor for class org.apache.nutch.webui.model.SeedList
-
- SeedListService - Interface in org.apache.nutch.webui.service
-
- SeedListServiceImpl - Class in org.apache.nutch.webui.service.impl
-
- SeedListServiceImpl() - Constructor for class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- SeedListsPage - Class in org.apache.nutch.webui.pages.seed
-
This page is for seed list management.
- SeedListsPage() - Constructor for class org.apache.nutch.webui.pages.seed.SeedListsPage
-
- SeedManager - Interface in org.apache.nutch.service
-
- SeedManagerImpl - Class in org.apache.nutch.service.impl
-
- SeedManagerImpl() - Constructor for class org.apache.nutch.service.impl.SeedManagerImpl
-
- SeedPage - Class in org.apache.nutch.webui.pages.seed
-
This page is for seed URL management.
- SeedPage() - Constructor for class org.apache.nutch.webui.pages.seed.SeedPage
-
- SeedPage(PageParameters) - Constructor for class org.apache.nutch.webui.pages.seed.SeedPage
-
- SeedResource - Class in org.apache.nutch.service.resources
-
- SeedResource() - Constructor for class org.apache.nutch.service.resources.SeedResource
-
- SeedUrl - Class in org.apache.nutch.service.model.request
-
- SeedUrl() - Constructor for class org.apache.nutch.service.model.request.SeedUrl
-
- SeedUrl(String) - Constructor for class org.apache.nutch.service.model.request.SeedUrl
-
- SeedUrl - Class in org.apache.nutch.webui.model
-
- SeedUrl() - Constructor for class org.apache.nutch.webui.model.SeedUrl
-
- SEGMENT_NAME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- SegmentChecker - Class in org.apache.nutch.segment
-
Checks whether a segment is valid, or has a certain status (generated,
fetched, parsed), or can be used safely for a certain processing step
(e.g., indexing).
- SegmentChecker() - Constructor for class org.apache.nutch.segment.SegmentChecker
-
- SegmentMergeFilter - Interface in org.apache.nutch.segment
-
Interface used to filter segments during segment merge.
- SegmentMergeFilters - Class in org.apache.nutch.segment
-
This class wraps all SegmentMergeFilter extensions in a single object
so it is easier to operate on them.
- SegmentMergeFilters(Configuration) - Constructor for class org.apache.nutch.segment.SegmentMergeFilters
-
- SegmentMerger - Class in org.apache.nutch.segment
-
This tool takes several segments and merges their data together.
- SegmentMerger() - Constructor for class org.apache.nutch.segment.SegmentMerger
-
- SegmentMerger(Configuration) - Constructor for class org.apache.nutch.segment.SegmentMerger
-
- SegmentMerger.ObjectInputFormat - Class in org.apache.nutch.segment
-
Wraps inputs in a MetaWrapper, to permit merging different types
in reduce and use additional metadata.
- SegmentMerger.SegmentOutputFormat - Class in org.apache.nutch.segment
-
- segmentName - Variable in class org.apache.nutch.segment.SegmentPart
-
Name of the segment (just the last path component).
- SegmentOutputFormat() - Constructor for class org.apache.nutch.segment.SegmentMerger.SegmentOutputFormat
-
- SegmentPart - Class in org.apache.nutch.segment
-
Utility class for handling information about segment parts.
- SegmentPart() - Constructor for class org.apache.nutch.segment.SegmentPart
-
- SegmentPart(String, String) - Constructor for class org.apache.nutch.segment.SegmentPart
-
- SegmentReader - Class in org.apache.nutch.segment
-
Dump the content of a segment.
- SegmentReader() - Constructor for class org.apache.nutch.segment.SegmentReader
-
- SegmentReader(Configuration, boolean, boolean, boolean, boolean, boolean, boolean) - Constructor for class org.apache.nutch.segment.SegmentReader
-
- SegmentReader.InputCompatMapper - Class in org.apache.nutch.segment
-
- SegmentReader.SegmentReaderStats - Class in org.apache.nutch.segment
-
- SegmentReader.TextOutputFormat - Class in org.apache.nutch.segment
-
Implements a text output format
- SegmentReaderStats() - Constructor for class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- segnum - Variable in class org.apache.nutch.crawl.Generator.SelectorEntry
-
- Selector() - Constructor for class org.apache.nutch.crawl.Generator.Selector
-
- SelectorEntry() - Constructor for class org.apache.nutch.crawl.Generator.SelectorEntry
-
- SelectorInverseMapper() - Constructor for class org.apache.nutch.crawl.Generator.SelectorInverseMapper
-
- sendNoOp() - Method in class org.apache.nutch.protocol.ftp.Client
-
Sends a NOOP command to the FTP server.
- seqRead(ReaderConfig, int, int, int, boolean) - Method in class org.apache.nutch.service.resources.ReaderResouce
-
Read a sequence file
- SequenceReader - Class in org.apache.nutch.service.impl
-
Enables reading a sequence file; its methods provide different
ways to read the file.
- SequenceReader() - Constructor for class org.apache.nutch.service.impl.SequenceReader
-
- server - Variable in class org.apache.nutch.service.resources.AbstractResource
-
- SERVER_URL - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- set(CrawlDatum) - Method in class org.apache.nutch.crawl.CrawlDatum
-
Copy the contents of another instance into this instance.
- set(String, String) - Method in class org.apache.nutch.metadata.Metadata
-
Set metadata name/value.
- set(String, String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- setAdditionalPostHeaders(Map<String, String>) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setAll(Properties) - Method in class org.apache.nutch.metadata.Metadata
-
Copy all key-value pairs from properties.
- setAnchor(String) - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- setApplicationContext(ApplicationContext) - Method in class org.apache.nutch.webui.NutchUiApplication
-
- setArgs(String[]) - Method in class org.apache.nutch.parse.ParseStatus
-
- setArgs(String[]) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setArgs(Map<String, String>) - Method in class org.apache.nutch.service.model.request.DbQuery
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.service.model.request.JobConfig
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setArgument(String, String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setBaseHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the baseHref.
- setBlackList(String) - Method in class org.apache.nutch.collection.Subcollection
-
Set contents of blacklist from String
- setChildNodes(Outlink[]) - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- setChildren(List<FetchNodeDbInfo.ChildNode>) - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- setClazz(String) - Method in class org.apache.nutch.plugin.Extension
-
Sets the class that implements the concrete extension and is only used until
model creation at system start up.
- setCode(int) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setCommand(String) - Method in class org.apache.nutch.util.CommandRunner
-
- setCompressed(boolean) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setConf(Configuration) - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
- setConf(Configuration) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.Signature
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.CleaningJob
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.feed.FeedIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.geoip.GeoIPIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.links.LinksIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.replace.ReplaceIndexer
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.staticfield.StaticFieldIndexer
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter
-
Handles conf assignment and pulls the value assignment from the
"urlmeta.tags" property.
- setConf(Configuration) - Method in class org.apache.nutch.indexwriter.dummy.DummyIndexWriter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- setConf(Configuration) - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.microformats.reltag.RelTagParser
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.host.HostURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.querystring.QuerystringURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.slash.SlashURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ext.ExtParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.feed.FeedParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.headings.HeadingsParseFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.html.HtmlParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.js.JSParseFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.metatags.MetaTagsParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ParserChecker
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.swf.SWFParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.tika.TikaParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.zip.ZipParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.parsefilter.regex.RegexParseFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.file.File
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.htmlunit.Http
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.http.Http
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.httpclient.Http
-
Reads the configuration from the Nutch configuration files and sets the
configuration.
- setConf(Configuration) - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.selenium.Http
-
- setConf(Configuration) - Method in class org.apache.nutch.publisher.NutchPublishers
-
- setConf(Configuration) - Method in class org.apache.nutch.publisher.rabbitmq.RabbitMQPublisherImpl
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.AbstractScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.depth.DepthScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.similarity.cosine.CosineSimilarity
-
- setConf(Configuration) - Method in interface org.apache.nutch.scoring.similarity.SimilarityModel
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.similarity.SimilarityScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
-
Handles conf assignment and pulls the value assignment from the
"urlmeta.tags" property.
- setConf(Configuration) - Method in class org.apache.nutch.segment.SegmentMerger
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
Sets the configuration.
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.domainblacklist.DomainBlacklistURLFilter
-
Sets the configuration.
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
-
- setConf(Configuration) - Method in class org.apache.nutch.util.GenericWritableConfigurable
-
- setConf(Configuration) - Method in class org.creativecommons.nutch.CCIndexingFilter
-
- setConf(Configuration) - Method in class org.creativecommons.nutch.CCParseFilter
-
- setConfId(String) - Method in class org.apache.nutch.service.model.request.DbQuery
-
- setConfId(String) - Method in class org.apache.nutch.service.model.request.JobConfig
-
- setConfId(String) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setConfId(String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setConfId(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setConfig(Configuration) - Method in interface org.apache.nutch.publisher.NutchPublisher
-
Use implementation-specific configurations.
- setConfig(Configuration) - Method in class org.apache.nutch.publisher.NutchPublishers
-
- setConfig(Configuration) - Method in class org.apache.nutch.publisher.rabbitmq.RabbitMQPublisherImpl
-
- setConfigId(String) - Method in class org.apache.nutch.service.model.request.NutchConfig
-
- setConfiguration(Set<String>) - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- setConfiguration(Set<String>) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setConnectionFailures(Integer) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setConnectionStatus(ConnectionStatus) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setContent(byte[]) - Method in class org.apache.nutch.protocol.Content
-
- setContent(Content) - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- setContentType(String) - Method in class org.apache.nutch.protocol.Content
-
- setCookiePolicy(String) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setCookies(String) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthentication
-
- setCrawlId(String) - Method in class org.apache.nutch.service.model.request.DbQuery
-
- setCrawlId(String) - Method in class org.apache.nutch.service.model.request.JobConfig
-
- setCrawlId(String) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setCrawlId(String) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setCrawlId(String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setCrawlId(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setCrawlName(String) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setDataTimeout(int) - Method in class org.apache.nutch.protocol.ftp.Client
-
Sets the timeout in milliseconds to use for data connection.
- setDescriptor(PluginDescriptor) - Method in class org.apache.nutch.plugin.Extension
-
Sets the plugin descriptor and is only used until model creation at system
start up.
- setDnsFailures(Integer) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setDocumentLocator(Locator) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive an object for locating the origin of SAX document events.
- setEventData(Map<String, Object>) - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Set metadata to this event
- setEventType(FetcherThreadEvent.PublishEventType) - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Set event type of this object
- setFetched(int) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setFetchInterval(int) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setFetchInterval(float) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
Sets the fetchInterval and fetchTime on a successfully fetched page.
- setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - Method in class org.apache.nutch.crawl.DefaultFetchSchedule
-
- setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
Sets the fetchInterval and fetchTime on a successfully fetched page.
- setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - Method in class org.apache.nutch.crawl.MimeAdaptiveFetchSchedule
-
- setFetchTime(long) - Method in class org.apache.nutch.crawl.CrawlDatum
-
Sets either the time of the last fetch or the next fetch time, depending on
whether Fetcher or CrawlDbReducer set the time.
- setFetchTime(long) - Method in class org.apache.nutch.fetcher.FetchNode
-
- setFileType(int) - Method in class org.apache.nutch.protocol.ftp.Client
-
Sets the file type to be transferred.
- setFilterFromPath(boolean) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setFollowTalk(boolean) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set followTalk
- setForce(boolean) - Method in class org.apache.nutch.service.model.request.NutchConfig
-
- setGone(int) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setHalted(boolean) - Method in class org.apache.nutch.fetcher.FetcherThread
-
- setHomepageUrl(String) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setHost(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setId(String) - Method in class org.apache.nutch.plugin.Extension
-
Sets the unique extension Id and is only used until model creation at
system start up.
- setId(Long) - Method in class org.apache.nutch.service.model.request.SeedList
-
- setId(Long) - Method in class org.apache.nutch.service.model.request.SeedUrl
-
- setId(String) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setId(Long) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setId(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setId(Long) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setId(Long) - Method in class org.apache.nutch.webui.model.SeedList
-
- setId(Long) - Method in class org.apache.nutch.webui.model.SeedUrl
-
- setIDAttribute(String, Element) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Set an ID string to node association in the ID table.
- setIgnoreCase(boolean) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setInfo(JobInfo) - Method in class org.apache.nutch.service.impl.JobWorker
-
- setInLinks(List<String>) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- setInLinks(List<String>) - Method in interface org.apache.nutch.tools.CommonCrawlFormat
-
Sets inlinks of this document.
- setInlinkScore(float) - Method in class org.apache.nutch.scoring.webgraph.Node
-
- setInputStream(InputStream) - Method in class org.apache.nutch.util.CommandRunner
-
- setInstances(List<NutchInstance>) - Method in class org.apache.nutch.webui.config.NutchGuiConfiguration
-
- setJobClassName(String) - Method in class org.apache.nutch.service.model.request.JobConfig
-
- setJobClassName(String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setJobConfig(JobConfig) - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- setJobInfo(JobInfo) - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- setJobs(Collection<JobInfo>) - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- setJobs(Collection<JobInfo>) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setJsonArray(boolean) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setKeepConnection(boolean) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set keepConnection
- setKeyPrefix(String) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setLastCheck() - Method in class org.apache.nutch.hostdb.HostDatum
-
- setLastCheck(Date) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setLastModified(long) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setLinks(LinkDumper.LinkNode[]) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
-
- setLinkType(byte) - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- setLoginFormId(String) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setLoginPostData(Map<String, String>) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setLoginRedirect(boolean) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setLoginUrl(String) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setMajorCode(byte) - Method in class org.apache.nutch.parse.ParseStatus
-
- setMaxContentLength(int) - Method in class org.apache.nutch.protocol.file.File
-
Set the length at which content is truncated.
- setMaxContentLength(int) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set the point at which content is truncated.
- setMessage(String) - Method in class org.apache.nutch.parse.ParseStatus
-
- setMessage(String) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setMeta(String, String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Set metadata.
- setMetaData(MapWritable) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setMetaData(MapWritable) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setMetadata(MapWritable) - Method in class org.apache.nutch.parse.Outlink
-
- setMetadata(Metadata) - Method in class org.apache.nutch.protocol.Content
-
Other protocol-specific data.
- setMetadata(Metadata) - Method in class org.apache.nutch.scoring.webgraph.Node
-
- setMinorCode(short) - Method in class org.apache.nutch.parse.ParseStatus
-
- setModeAccept(boolean) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setModel(IModel<Crawl>) - Method in class org.apache.nutch.webui.pages.crawls.CrawlPanel
-
- setModel(IModel<NutchInstance>) - Method in class org.apache.nutch.webui.pages.instances.InstancePanel
-
- setModifiedTime(long) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setMsg(String) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setMsg(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setName(String) - Method in class org.apache.nutch.service.model.request.SeedList
-
- setName(String) - Method in class org.apache.nutch.webui.model.NutchConfig
-
- setName(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setName(String) - Method in class org.apache.nutch.webui.model.SeedList
-
- setNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noCache to true.
- setNode(Node) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- setNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noFollow to true.
- setNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noIndex to true.
- setNotModified(int) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setNumberOfRounds(Integer) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setNumInlinks(int) - Method in class org.apache.nutch.scoring.webgraph.Node
-
- setNumOfOutlinks(int) - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- setNumOutlinks(int) - Method in class org.apache.nutch.scoring.webgraph.Node
-
- setObject(String, Object) - Method in class org.apache.nutch.util.ObjectCache
-
- setOutlinks(Outlink[]) - Method in class org.apache.nutch.fetcher.FetchNode
-
- setOutlinks(Outlink[]) - Method in class org.apache.nutch.parse.ParseData
-
- setOutputDir(String) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setPageGoneSchedule(Text, CrawlDatum, long, long, long) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method specifies how to schedule refetching of pages marked as GONE.
- setPageGoneSchedule(Text, CrawlDatum, long, long, long) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method specifies how to schedule refetching of pages marked as GONE.
- setPageRetrySchedule(Text, CrawlDatum, long, long, long) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method adjusts the fetch schedule if fetching needs to be re-tried due
to transient errors.
- setPageRetrySchedule(Text, CrawlDatum, long, long, long) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method adjusts the fetch schedule if fetching needs to be re-tried due
to transient errors.
- setParams(Map<String, String>) - Method in class org.apache.nutch.service.model.request.NutchConfig
-
- setParseMeta(Metadata) - Method in class org.apache.nutch.parse.ParseData
-
- setPassword(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setPath(String) - Method in class org.apache.nutch.service.model.request.ReaderConfig
-
- setPort(int) - Static method in class org.apache.nutch.service.NutchServer
-
- setPort(Integer) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setProgress(int) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setProperty(String, String, String) - Method in interface org.apache.nutch.service.ConfManager
-
- setProperty(String, String, String) - Method in class org.apache.nutch.service.impl.ConfManagerImpl
-
Sets the given property in the configuration associated with the confId
- setRedirect(boolean) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthentication
-
- setRedirPerm(int) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setRedirTemp(int) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setRefresh(boolean) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets refresh to the supplied value.
- setRefreshHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the refreshHref.
- setRefreshTime(int) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the refreshTime.
- setRemoteVerificationEnabled(boolean) - Method in class org.apache.nutch.protocol.ftp.Client
-
Enable or disable verification that the remote host taking part in a data
connection is the same as the host to which the control connection is
attached.
- setRemovedFormFields(Set<String>) - Method in class org.apache.nutch.protocol.httpclient.HttpFormAuthConfigurer
-
- setRequestDelay(Duration) - Method in class org.apache.nutch.webui.client.impl.RemoteCommandExecutor
-
- setResult(Map<String, Object>) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setResult(Map<String, Object>) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setRetriesSinceFetch(int) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setReverseKey(boolean) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setReverseKeyValue(String) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setRunningJobs(Collection<JobInfo>) - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- setRunningJobs(Collection<JobInfo>) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setScore(float) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setScore(float) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setScore(float) - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- setSeedDirectory(String) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setSeedFilePath(String) - Method in class org.apache.nutch.service.model.request.SeedList
-
- setSeedList(String, SeedList) - Method in class org.apache.nutch.service.impl.SeedManagerImpl
-
- setSeedList(SeedList) - Method in class org.apache.nutch.service.model.request.SeedUrl
-
- setSeedList(String, SeedList) - Method in interface org.apache.nutch.service.SeedManager
-
- setSeedList(SeedList) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setSeedList(SeedList) - Method in class org.apache.nutch.webui.model.SeedUrl
-
- setSeedUrls(Collection<SeedUrl>) - Method in class org.apache.nutch.service.model.request.SeedList
-
- setSeedUrls(Collection<SeedUrl>) - Method in class org.apache.nutch.webui.model.SeedList
-
- setSignature(byte[]) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setSimpleDateFormat(boolean) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setStartDate(Date) - Method in class org.apache.nutch.service.model.response.NutchServerInfo
-
- setStartDate(Date) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setState(JobInfo.State) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setState(JobInfo.State) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setStatus(int) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setStatus(int) - Method in class org.apache.nutch.fetcher.FetchNode
-
- setStatus(ProtocolStatus) - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- setStatus(int) - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- setStatus(Crawl.CrawlStatus) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setStdErrorStream(OutputStream) - Method in class org.apache.nutch.util.CommandRunner
-
- setStdOutputStream(OutputStream) - Method in class org.apache.nutch.util.CommandRunner
-
- setTermFreqVector(HashMap<String, Integer>) - Method in class org.apache.nutch.scoring.similarity.cosine.DocVector
-
- setTimeLimit(long) - Method in class org.apache.nutch.fetcher.QueueFeeder
-
- setTimeout(int) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set the timeout.
- setTimeout(int) - Method in class org.apache.nutch.util.CommandRunner
-
- setTimeout(Duration) - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- setTimestamp(Long) - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Set timestamp for this event
- setTimestamp(long) - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- SettingsPage - Class in org.apache.nutch.webui.pages.settings
-
- SettingsPage() - Constructor for class org.apache.nutch.webui.pages.settings.SettingsPage
-
- setTitle(String) - Method in class org.apache.nutch.fetcher.FetchNode
-
- setType(String) - Method in class org.apache.nutch.service.model.request.DbQuery
-
- setType(JobManager.JobType) - Method in class org.apache.nutch.service.model.request.JobConfig
-
- setType(JobManager.JobType) - Method in class org.apache.nutch.service.model.response.JobInfo
-
- setType(JobInfo.JobType) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setType(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setUnfetched(int) - Method in class org.apache.nutch.hostdb.HostDatum
-
- setup(Mapper<Text, Writable, Text, CrawlDatum>.Context) - Method in class org.apache.nutch.crawl.Injector.InjectMapper
-
- setup(Reducer<Text, CrawlDatum, Text, CrawlDatum>.Context) - Method in class org.apache.nutch.crawl.Injector.InjectReducer
-
- setUrl(String) - Method in class org.apache.nutch.fetcher.FetcherThreadEvent
-
Set URL of this event (fetched page)
- setUrl(Text) - Method in class org.apache.nutch.fetcher.FetchNode
-
- setUrl(String) - Method in class org.apache.nutch.parse.Outlink
-
- setUrl(String) - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
-
- setUrl(String) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
-
- setUrl(String) - Method in class org.apache.nutch.service.model.request.SeedUrl
-
- setUrl(String) - Method in class org.apache.nutch.service.model.response.FetchNodeDbInfo
-
- setUrl(String) - Method in class org.apache.nutch.webui.model.SeedUrl
-
- setURLScoreAfterParsing(Text, Content, Parse) - Method in class org.apache.nutch.scoring.similarity.cosine.CosineSimilarity
-
- setURLScoreAfterParsing(Text, Content, Parse) - Method in interface org.apache.nutch.scoring.similarity.SimilarityModel
-
- setUsername(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setValue(String) - Method in class org.apache.nutch.webui.model.NutchConfig
-
- setVectorEntry(int, long) - Method in class org.apache.nutch.scoring.similarity.cosine.DocVector
-
- setWaitForExit(boolean) - Method in class org.apache.nutch.util.CommandRunner
-
- setWarcSize(long) - Method in class org.apache.nutch.tools.CommonCrawlConfig
-
- setWeight(float) - Method in class org.apache.nutch.indexer.NutchDocument
-
- setWeight(float) - Method in class org.apache.nutch.indexer.NutchField
-
- setWhiteList(ArrayList<String>) - Method in class org.apache.nutch.collection.Subcollection
-
- setWhiteList(String) - Method in class org.apache.nutch.collection.Subcollection
-
Set contents of whitelist from String
- shortestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
-
Returns the shortest prefix of input that is matched,
or null if no match exists.
- shortestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
-
Returns the shortest suffix of input that is matched,
or null if no match exists.
- shortestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Returns the shortest substring of input that is
matched by a pattern in the trie, or null if no match
exists.
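The three shortestMatch entries above share one contract: return the shortest stored pattern that matches the input, or null if nothing matches. The following is a minimal sketch of that contract for the prefix case only. The class ShortestPrefixSketch is a hypothetical stand-in written for illustration; it uses a plain linear scan rather than a trie and is not the Nutch implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not Nutch's PrefixStringMatcher.
final class ShortestPrefixSketch {
    private final List<String> prefixes;

    ShortestPrefixSketch(List<String> prefixes) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.prefixes = new ArrayList<>(prefixes);
    }

    /** Returns the shortest stored prefix that input starts with, or null if none matches. */
    String shortestMatch(String input) {
        String best = null;
        for (String p : prefixes) {
            // Keep the candidate only if it matches and is shorter than the current best.
            if (input.startsWith(p) && (best == null || p.length() < best.length())) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ShortestPrefixSketch m = new ShortestPrefixSketch(
                Arrays.asList("http://", "http://www.", "ftp://"));
        System.out.println(m.shortestMatch("http://www.example.org/"));
        System.out.println(m.shortestMatch("mailto:someone@example.org"));
    }
}
```

A trie-based matcher would presumably reach the same answer by stopping at the first terminal node on the walk from the root, which avoids scanning every stored pattern.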
- shouldCheck(HostDatum) - Method in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
Determines whether a record should be checked.
- shouldFetch(Text, CrawlDatum, long) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method indicates whether the page is suitable for selection
in the current fetchlist.
- shouldFetch(Text, CrawlDatum, long) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method indicates whether the page is suitable for selection
in the current fetchlist.
- shutDown() - Method in class org.apache.nutch.plugin.Plugin
-
Shutdown the plugin.
- Signature - Class in org.apache.nutch.crawl
-
- Signature() - Constructor for class org.apache.nutch.crawl.Signature
-
- SIGNATURE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- SignatureComparator - Class in org.apache.nutch.crawl
-
- SignatureComparator() - Constructor for class org.apache.nutch.crawl.SignatureComparator
-
- SignatureFactory - Class in org.apache.nutch.crawl
-
Factory class that instantiates a Signature implementation according to the
current Configuration.
- SimilarityModel - Interface in org.apache.nutch.scoring.similarity
-
- SimilarityScoringFilter - Class in org.apache.nutch.scoring.similarity
-
- SimilarityScoringFilter() - Constructor for class org.apache.nutch.scoring.similarity.SimilarityScoringFilter
-
- simpleDateFormat - Variable in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- size() - Method in class org.apache.nutch.crawl.Inlinks
-
- size() - Method in class org.apache.nutch.metadata.Metadata
-
Returns the number of metadata names in this metadata.
- size() - Method in class org.apache.nutch.parse.ParseResult
-
Return the number of parse outputs (both successful and failed)
- skip(DataInput) - Static method in class org.apache.nutch.crawl.Inlink
-
Skips over one Inlink in the input.
- skip(DataInput) - Static method in class org.apache.nutch.parse.Outlink
-
Skips over one Outlink in the input.
- SKIP_TRUNCATED - Static variable in class org.apache.nutch.parse.ParseSegment
-
- skipChildren() - Method in class org.apache.nutch.util.NodeWalker
-
Skips over and removes from the node stack the children of the last node.
- skippedEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of a skipped entity.
- SlashURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.slash
-
- SlashURLNormalizer() - Constructor for class org.apache.nutch.net.urlnormalizer.slash.SlashURLNormalizer
-
- SlashURLNormalizer(String) - Constructor for class org.apache.nutch.net.urlnormalizer.slash.SlashURLNormalizer
-
- slice(String, int, int) - Method in class org.apache.nutch.service.impl.LinkReader
-
- slice(String, int, int) - Method in class org.apache.nutch.service.impl.NodeReader
-
- slice(String, int, int) - Method in class org.apache.nutch.service.impl.SequenceReader
-
- slice(String, int, int) - Method in interface org.apache.nutch.service.NutchReader
-
- SOFTWARE - Static variable in class org.apache.nutch.tools.WARCUtils
-
- SOLR_PREFIX - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- SolrConstants - Interface in org.apache.nutch.indexwriter.solr
-
- SolrIndexWriter - Class in org.apache.nutch.indexwriter.solr
-
- SolrIndexWriter() - Constructor for class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- SolrMappingReader - Class in org.apache.nutch.indexwriter.solr
-
- SolrMappingReader(Configuration) - Constructor for class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- SolrUtils - Class in org.apache.nutch.indexwriter.solr
-
- SolrUtils() - Constructor for class org.apache.nutch.indexwriter.solr.SolrUtils
-
- Sorter() - Constructor for class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
-
- SOURCE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A reference to a resource from which the present resource is derived.
- SpellCheckedMetadata - Class in org.apache.nutch.metadata
-
A decorator to Metadata that adds spellchecking capabilities to property
names.
- SpellCheckedMetadata() - Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
-
- splitEnd - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- splitLen - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- splitStart - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- SpringConfiguration - Class in org.apache.nutch.webui.config
-
- SpringConfiguration() - Constructor for class org.apache.nutch.webui.config.SpringConfiguration
-
- start(String) - Static method in class org.apache.nutch.parsefilter.naivebayes.Train
-
- start - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- startArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- startArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- startArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatJettinson
-
- startArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatSimple
-
- startArray(String, boolean, boolean) - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- startCDATA() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report the start of a CDATA section.
- startCrawl(Long, NutchInstance) - Method in interface org.apache.nutch.webui.service.CrawlService
-
- startCrawl(Long, NutchInstance) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- startDocument() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of the beginning of a document.
- startDTD(String, String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report the start of DTD declarations, if any.
- startElement(String, String, String, Attributes) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of the beginning of an element.
- startEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report the beginning of an entity.
- startObject(String) - Method in class org.apache.nutch.tools.AbstractCommonCrawlFormat
-
- startObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatJackson
-
- startObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatJettinson
-
- startObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatSimple
-
- startObject(String) - Method in class org.apache.nutch.tools.CommonCrawlFormatWARC
-
- startPrefixMapping(String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Begin the scope of a prefix-URI Namespace mapping.
- startServer() - Static method in class org.apache.nutch.service.NutchServer
-
- startUp() - Method in class org.apache.nutch.plugin.Plugin
-
Invoked when the plugin starts up.
- STAT_PROGRESS - Static variable in interface org.apache.nutch.metadata.Nutch
-
For progress of job.
- StaticFieldIndexer - Class in org.apache.nutch.indexer.staticfield
-
A simple plugin called at indexing that adds fields with static data.
- StaticFieldIndexer() - Constructor for class org.apache.nutch.indexer.staticfield.StaticFieldIndexer
-
- StatisticsPage - Class in org.apache.nutch.webui.pages
-
- StatisticsPage() - Constructor for class org.apache.nutch.webui.pages.StatisticsPage
-
- statNames - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- status - Variable in class org.apache.nutch.util.NutchTool
-
- STATUS_BLOCKED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_DB_DUPLICATE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- STATUS_DB_FETCHED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was successfully fetched.
- STATUS_DB_GONE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page no longer exists.
- STATUS_DB_MAX - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Maximum value of DB-related status.
- STATUS_DB_NOTMODIFIED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was successfully fetched and found not modified.
- STATUS_DB_REDIR_PERM - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page permanently redirects to other page.
- STATUS_DB_REDIR_TEMP - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page temporarily redirects to other page.
- STATUS_DB_UNFETCHED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was not fetched yet.
- STATUS_FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_FAILURE - Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_FETCH_GONE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching unsuccessful - page is gone.
- STATUS_FETCH_MAX - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Maximum value of fetch-related status.
- STATUS_FETCH_NOTMODIFIED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching successful - page is not modified.
- STATUS_FETCH_REDIR_PERM - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching permanently redirected to other page.
- STATUS_FETCH_REDIR_TEMP - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching temporarily redirected to other page.
- STATUS_FETCH_RETRY - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching unsuccessful, needs to be retried (transient errors).
- STATUS_FETCH_SUCCESS - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching was successful.
- STATUS_GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_INJECTED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was newly injected.
- STATUS_LINKED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page discovered through a link.
- STATUS_MODIFIED - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
Page is known to have been modified since our last visit.
- STATUS_NOTFETCHING - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTFOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTMODIFIED - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
Page is known to remain unmodified since our last visit.
- STATUS_NOTMODIFIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTPARSED - Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_PARSE_META - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page got metadata from a parser
- STATUS_REDIR_EXCEEDED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_RETRY - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_ROBOTS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_SIGNATURE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page signature.
- STATUS_SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_UNKNOWN - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
It is unknown whether page was changed since our last visit.
- STATUS_WOULDBLOCK - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- StatusUpdateReducer() - Constructor for class org.apache.nutch.crawl.DeduplicationJob.StatusUpdateReducer
-
- stop(String, String) - Method in class org.apache.nutch.service.impl.JobManagerImpl
-
- stop(String, String) - Method in interface org.apache.nutch.service.JobManager
-
- stop() - Method in class org.apache.nutch.service.NutchServer
-
- stop(String, String) - Method in class org.apache.nutch.service.resources.JobResource
-
Stop Job
- stopJob() - Method in class org.apache.nutch.service.impl.JobWorker
-
Stops the executing job.
- stopJob() - Method in class org.apache.nutch.util.NutchTool
-
Stop the job with the possibility to resume.
- stopServer(boolean) - Method in class org.apache.nutch.service.resources.AdminResource
-
Stop the Nutch server
- stringFields - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- stringFieldWritables - Static variable in class org.apache.nutch.hostdb.UpdateHostDbReducer
-
- StringUtil - Class in org.apache.nutch.util
-
A collection of String processing utility methods.
- StringUtil() - Constructor for class org.apache.nutch.util.StringUtil
-
- stripNonCharCodepoints(String) - Static method in class org.apache.nutch.indexwriter.solr.SolrUtils
-
- Subcollection - Class in org.apache.nutch.collection
-
A Subcollection represents a subset of the index. URL patterns define
which pages (URLs) are part of the Subcollection.
- Subcollection(String, String, Configuration) - Constructor for class org.apache.nutch.collection.Subcollection
-
public Constructor
- Subcollection(String, String, String, Configuration) - Constructor for class org.apache.nutch.collection.Subcollection
-
public Constructor
- Subcollection(Configuration) - Constructor for class org.apache.nutch.collection.Subcollection
-
- SubcollectionIndexingFilter - Class in org.apache.nutch.indexer.subcollection
-
- SubcollectionIndexingFilter() - Constructor for class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- SubcollectionIndexingFilter(Configuration) - Constructor for class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- SUBJECT - Static variable in interface org.apache.nutch.metadata.DublinCore
-
The topic of the content of the resource.
- SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing succeeded.
- SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Content was retrieved without errors.
- SUCCESS_REDIRECT - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsed content contains a directive to redirect to another URL.
- SuffixStringMatcher - Class in org.apache.nutch.util
-
A class for efficiently matching Strings against a set of suffixes.
- SuffixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
-
Creates a new SuffixStringMatcher which will match Strings with any suffix
in the supplied array.
- SuffixStringMatcher(Collection<String>) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
-
Creates a new SuffixStringMatcher which will match Strings with any suffix
in the supplied Collection.
- SuffixURLFilter - Class in org.apache.nutch.urlfilter.suffix
-
Filters URLs based on a file of URL suffixes.
- SuffixURLFilter() - Constructor for class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- SuffixURLFilter(Reader) - Constructor for class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- SWFParser - Class in org.apache.nutch.parse.swf
-
Parser for Flash SWF files.
- SWFParser() - Constructor for class org.apache.nutch.parse.swf.SWFParser
-
- VAL_RESULT - Static variable in interface org.apache.nutch.metadata.Nutch
-
Name of the key used in the Result Map sent back by the REST endpoint
- valueOf(String) - Static method in enum org.apache.nutch.fetcher.FetcherThreadEvent.PublishEventType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.protocol.htmlunit.HttpResponse.Scheme
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.protocol.http.HttpResponse.Scheme
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil.StemFilterType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.scoring.similarity.util.LuceneTokenizer.TokenizerType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.service.JobManager.JobType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.service.model.response.JobInfo.State
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.ConnectionStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.Crawl.CrawlStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.JobInfo.JobType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.JobInfo.State
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.apache.nutch.fetcher.FetcherThreadEvent.PublishEventType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.protocol.htmlunit.HttpResponse.Scheme
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.protocol.http.HttpResponse.Scheme
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil.StemFilterType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.scoring.similarity.util.LuceneTokenizer.TokenizerType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.service.JobManager.JobType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.service.model.response.JobInfo.State
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.ConnectionStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.Crawl.CrawlStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.JobInfo.JobType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.JobInfo.State
-
Returns an array containing the constants of this enum type, in
the order they are declared.
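The valueOf(String) and values() entries above are the methods the Java compiler generates for every enum type. The sketch below shows their standard behavior on a hypothetical enum, Phase, which is invented for illustration and is not one of the Nutch enums listed. valueOf requires an exact, case-sensitive name and throws IllegalArgumentException otherwise, so the helper parse (also an assumption of this sketch) wraps it with a fallback.

```java
import java.util.Arrays;

final class EnumLookupSketch {
    // Hypothetical enum for illustration; not one of the Nutch enums in this index.
    enum Phase { INJECT, FETCH, PARSE }

    /** Looks up a constant by exact name; returns fallback when the name is unknown. */
    static Phase parse(String name, Phase fallback) {
        try {
            return Phase.valueOf(name); // case-sensitive, no trimming
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // values() returns a fresh array in declaration order.
        System.out.println(Arrays.toString(Phase.values()));
        System.out.println(parse("FETCH", Phase.INJECT));
        System.out.println(parse("fetch", Phase.INJECT)); // wrong case, so the fallback is used
    }
}
```

Note that valueOf(null) throws NullPointerException rather than IllegalArgumentException, so callers handling untrusted input may want to check for null first.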
- VERSION - Static variable in class org.apache.nutch.indexer.NutchDocument
-
- VerticalMenu - Class in org.apache.nutch.webui.pages.menu
-
- VerticalMenu(String) - Constructor for class org.apache.nutch.webui.pages.menu.VerticalMenu
-