A
acls
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
activities
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
Activities interface
activities
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Activities interface
activities
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
Activities interface
activities
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
Activities interface
activities
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
Activities interface
activities
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
Activities interface
ACTIVITY_FETCH
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
ACTIVITY_ROBOTSPARSE
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
add(RSSConnector.MappingRule)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRules
addAgent(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
Add a user-agent.
addAllow(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
Add an allow.
addData(IVersionActivity, String, String, InputStream)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
Add binary data entry into the cache.
addDisallow(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
Add a disallow.
addHeader(String, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
addRule(RSSConnector.CanonicalizationPolicy)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicies
addSeedDocuments(ISeedingActivity, DocumentSpecification, long, long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Queue "seed" documents.
advance()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
Go on to next token.
allows
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
available()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Get available.
B
badFeedRescanInterval
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
bandwidthParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Max kilobytes per second per server
basicRead(byte[], int, int, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Basic read, which uses the server object to throttle activity.
beginFetch(String)
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Begin the fetch process.
beginFetch(long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Note the start of a fetch operation.
beginFetch(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Begin the fetch process.
beginRead(int, double)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Note the start of an individual byte read of a specified size.
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Handle the tag beginning to set the correct second-level parsing context
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
beginTag(String, String, String, Attributes)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
C
cache
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
This is the cache hash, which is keyed by protocol/host/port and has a Host object as its value.
cache
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
cacheData
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
canBeFlushed(long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Check if the current record can be flushed.
canonicalizationPolicies
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
canRemoveAspSession()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
canRemoveBvSession()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
canRemoveJavaSession()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
canRemovePhpSession()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
canReorder()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
categoryField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
categoryField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
check()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Check status of connection.
checkIfValidFeed()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Check if feed was valid
checkingRobots
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
This will be set to nonzero if the robots structure is currently in use
checkMatch(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
checkMatch(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRule
CHROMED_SKIP
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Chromed suppression mode - skip all chromed content
CHROMED_USE
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Chromed suppression mode - use chromed content
chromedContentMode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
client
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
close()
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Close the connection.
close()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Close the connection.
close()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Close.
connect(ConfigParams)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Connect.
connectionManager
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The connection pool (max size 1)
connectionTimeoutMilliseconds
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Connection timeout in milliseconds
contentsFile
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
contentsFile
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
contentsFile
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
contentType
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache.DocumentData
The content-type header value
createConnection(String, double, int, long, int, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Establish a connection to a specified URL.
D
data
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache.DocumentData
The cache file for the data
DataCache
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class is a cache of a specific URL's data.
DataCache()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
Constructor.
DataCache.DocumentData
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class represents everything we need to know about a document that's getting passed from the getDocumentVersions() phase to the processDocuments() phase.
DataCache.DocumentData(File, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
DataCache.DocumentData
Constructor.
dataFileFolder
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
dataRecorder
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
dataSession
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Hack added to record all access data from current crawler
dataSession
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
DECHROMED_CONTENT
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Dechromed content mode - content field
DECHROMED_DESCRIPTION
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Dechromed content mode - description field
DECHROMED_NONE
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Dechromed content mode - none
dechromedContentMode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
dechromedContentMode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
dechromedContentMode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
dechromedContentMode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
DEFAULT_BUNDLE_NAME
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
DEFAULT_PATH_NAME
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
defaultRescanInterval
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
deleteData(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
Delete specified item of data.
descriptionField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
descriptionField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
descriptionField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
disallows
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
discard()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Discard this server.
disconnect()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Close the connection.
doCanonicalization(RSSConnector.CanonicalizationPolicy, WebURL)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Code to canonicalize a URL.
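The CanonicalizationPolicy entries in this index (canReorder(), canRemoveJavaSession(), and similar) suggest that canonicalization is driven by per-policy switches. The following is a minimal sketch of that idea only, not the connector's code: the class CanonicalizeSketch, its boolean parameters, and the exact rewriting rules (sorting query arguments, dropping a trailing ";jsessionid=" path parameter) are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration; not the RSSConnector.doCanonicalization implementation.
final class CanonicalizeSketch {
    /**
     * Canonicalize a URL string under two assumed policy switches.
     * reorder: sort the query arguments so equivalent URLs compare equal.
     * removeJavaSession: drop a ";jsessionid=..." path parameter and anything after it in the path.
     */
    static String canonicalize(String url, boolean reorder, boolean removeJavaSession) {
        String base = url;
        String query = null;
        int q = url.indexOf('?');
        if (q >= 0) {
            base = url.substring(0, q);
            query = url.substring(q + 1);
        }
        if (removeJavaSession) {
            int s = base.indexOf(";jsessionid=");
            if (s >= 0) {
                base = base.substring(0, s);
            }
        }
        if (query != null && reorder) {
            String[] args = query.split("&");
            Arrays.sort(args); // plain lexicographic order, an assumed rule
            query = String.join("&", args);
        }
        return query == null ? base : base + "?" + query;
    }
}
```

With both switches off, the sketch returns the URL unchanged, which mirrors the idea that a policy only rewrites what it explicitly permits.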
documentIdentifier
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
The document identifier
documentIdentifier
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
The document identifier
documentIdentifier
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
The document identifier
documentIdentifier
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
The document identifier
documentIdentifier
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
The document identifier
documentIdentifier
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
The document identifier
documentName
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
documentNumber
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
doesPathMatch(String, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Check if path matches specification
doesPathMatch(String, int, String, int)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Recursive method for matching specification to path.
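The two doesPathMatch entries above pair a public check with a recursive helper that carries two index arguments. A recursive matcher of that shape could look like the sketch below. It is written under stated assumptions rather than taken from the Robots class: the argument order (path first, specification second), the "*" wildcard, and the prefix-match rule are all assumed, and RobotsPathSketch is a hypothetical name.

```java
// Hypothetical illustration; not the Robots.doesPathMatch implementation.
final class RobotsPathSketch {
    /** Returns true if the path matches the specification, treated as a prefix pattern. */
    static boolean doesPathMatch(String path, String spec) {
        return doesPathMatch(path, 0, spec, 0);
    }

    /** Recursive helper: match path from pathIndex against spec from specIndex. */
    static boolean doesPathMatch(String path, int pathIndex, String spec, int specIndex) {
        // Specification fully consumed: the rest of the path is accepted (assumed prefix rule).
        if (specIndex == spec.length()) {
            return true;
        }
        char c = spec.charAt(specIndex);
        if (c == '*') {
            // Assumed wildcard: let it absorb zero or more path characters.
            for (int i = pathIndex; i <= path.length(); i++) {
                if (doesPathMatch(path, i, spec, specIndex + 1)) {
                    return true;
                }
            }
            return false;
        }
        // A literal character must be present in the path and must match exactly.
        if (pathIndex == path.length() || path.charAt(pathIndex) != c) {
            return false;
        }
        return doesPathMatch(path, pathIndex + 1, spec, specIndex + 1);
    }
}
```

The index arguments let the helper recurse without allocating substrings, which is one plausible reason for the four-argument overload.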
doneFetch(IVersionActivity)
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Done with the fetch.
doneFetch(IVersionActivity)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Done with the fetch.
dr
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
E
emailParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Email parameter
endFetch()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Note the end of a fetch operation.
endHeader()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
endRead(int, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Note the end of an individual read from the server.
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
Convert the individual sub-fields of the item context into their final forms
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Handle the tag ending
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
Convert the individual sub-fields of the item context into their final forms
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
Convert the individual sub-fields of the item context into their final forms
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
endTag()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
Convert the individual sub-fields of the item context into their final forms
estimateInProgress
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Flag indicating whether rate estimation is in progress yet
estimateValid
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Flag indicating whether a rate estimate is needed
evalExpression
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRule
exception
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
executeFetch(String, int, String, String, String, String, int, String, String, String, String, String)
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Execute the fetch and get the return code.
executeFetch(String, int, String, String, String, String, int, String, String, String, String, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Execute the fetch and get the return code.
executeMethod
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
F
feedTimeoutValue
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
FETCH_BAD_URI
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
FETCH_CIRCULAR_REDIRECT
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
FETCH_IO_ERROR
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
FETCH_NOT_TRIED
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
FETCH_SEQUENCE_ERROR
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
FETCH_UNKNOWN_ERROR
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
fetchCounter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The current bytes in the current fetch
fetcher
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Fetcher to use to get the data from wherever
fetcher
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The throttled fetcher used by this instance
fetcherMap
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Storage for fetcher objects
fetchMethod
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The method object
fetchType
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The kind of fetch we are doing
filter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
Filter
filter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Filter
filter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
Filter
filter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
Filter
filter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
Filter
filter
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
Filter
findMatch(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicies
firstChunkLock
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
This object is used to gate access while the first chunk is being read
from
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The email address for this connector instance
G
getAcls()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the acls
getActivitiesList()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Return the list of activities that this connector supports.
getAttributeJavascriptString(Locale, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getAttributeJavascriptString(Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getAttributeJavascriptString(String, Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getAttributeString(Locale, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getAttributeString(Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getAttributeString(String, Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getBadFeedRescanTime(long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the next time a "bad feed" should be rescanned
getBinNames(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Get the bin name string for a document identifier.
getBodyJavascriptString(Locale, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getBodyJavascriptString(Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getBodyJavascriptString(String, Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getBodyString(Locale, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getBodyString(Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getBodyString(String, Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getCanonicalizationPolicies()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get canonicalization policies
getChromedContentMode()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the chromed content mode
getConnectorModel()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Tell the world what model this connector uses for getDocumentIdentifiers().
getContentType()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache.DocumentData
Get the contentType
getContentType(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
Get the content type.
getData()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache.DocumentData
Get the data
getData(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
Fetch binary data entry from the cache.
getDataLength(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
Fetch binary data length.
getDechromedContentMode()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the dechromed content mode
getDefaultRescanTime(long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the next time (by default) a feed should be scanned
getDocumentVersions(String[], String[], IVersionActivity, DocumentSpecification, int, boolean)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Get document versions given an array of document identifiers.
getException()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
getFeedTimeoutValue()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the feed timeout value
getFetcher()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Given the current parameters, find the correct throttled fetcher object (or create one if not there).
getGroupNumber()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
getGroupStyle()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
getHost()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
getMaxDocumentRequest()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Get the maximum number of documents to amalgamate together into one batch, for this connector.
getMetadata()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the specified metadata
getMinimumRescanTime(long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Get the minimum next time a feed should be scanned
getName()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.NameValue
getPath()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
getPort()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
getRawQuery()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
getResponse()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
getResponseBodyStream()
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Get the response input stream.
getResponseBodyStream()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Get the response input stream.
getResponseCode()
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Get the HTTP response code.
getResponseCode()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Get the HTTP response code.
getResponseHeader(String)
- Method in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Get a specified response header, if it exists.
getResponseHeader(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Get a specified response header, if it exists.
getRobots(ThrottledFetcher)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Given the current parameters, find the correct robots object (or create one if none found).
getScheme()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
GetSeedList
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class is used to set the seed list for a specified RSS job.
getSeeds()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Iterate over all canonicalized seeds
getServerName()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Get the FQDN of the server
getSession()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Establish a session
getSession(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
getString(Locale, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getString(Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getString(String, Locale, String, Object[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
getTextValue()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
getType()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
getValue()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.NameValue
globalHandleCount
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
This counter keeps track of the total outstanding handles across everything, because we do try to control that
globalHandleCounterLock
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
This is the lock object for that global handle counter
groupNumber
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
groupStyle
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
GROUPSTYLE_LOWER
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
GROUPSTYLE_MIXED
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
GROUPSTYLE_NONE
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
GROUPSTYLE_UPPER
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
guidField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
H
handleRSSFeedSAX(String, IProcessActivity, RSSConnector.Filter)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Handle an RSS feed document, using SAX to limit the memory impact
headerNames
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
headerValues
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
hostConfiguration
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
hostName
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Host name
I
initialized
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
initializeParameters()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
inputStream
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
The stream we are wrapping.
invalidTime
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Timestamp.
isAgentMatch(String, boolean)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
See if user-agent matches.
isAllowed(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
See if path is allowed.
isContentInteresting(IFingerprintActivity, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Code to check if data is interesting, based on response code and content type.
isDataIngestable(IFingerprintActivity, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Code to check if an already-fetched document should be ingested.
isDisallowed(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
See if path is disallowed.
isFetchAllowed(long, String, String, String, double, int, long, String, int, String, String, String, IVersionActivity, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Check a given path string against this host's robots file.
isFetchAllowed(String, int, String, String, String, String, double, int, long, String, int, String, String, String, IVersionActivity, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Decide whether a specific robot can crawl a specific URL.
isInitialized
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Flag indicating whether session data is initialized
isLegalURL(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Check for legality of a URL.
isMatch(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRules
isSeed(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Check if document is a seed
isValid
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
This flag describes whether or not the host record is valid yet.
IThrottledConnection
- Interface in
org.apache.manifoldcf.crawler.connectors.rss
This interface represents an established connection to a URL.
L
linkField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
linkField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
linkField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
linkField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
logFetchCount(int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Log the fetch of a number of bytes.
M
main(String[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
GetSeedList
main(String[])
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
SetSeedList
makeDocumentIdentifier(RSSConnector.CanonicalizationPolicies, String, String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Convert an absolute or relative URL to a document identifier.
makeReadable(String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Convert a string from the robots file into a readable form that does NOT contain NUL characters (since PostgreSQL does not accept those).
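As a rough illustration of the idea behind makeReadable(String), the sketch below removes NUL characters from a string before it would be stored. It is not the connector's actual implementation: the class name ReadableSketch, the method name stripNul, and the choice to drop NULs outright (rather than escape them) are all assumptions made for this example.

```java
// Hypothetical helper, not part of the ManifoldCF API.
final class ReadableSketch {
    private ReadableSketch() {}

    /** Returns a copy of the input with every NUL (U+0000) character removed. */
    static String stripNul(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Assumption: NULs are simply dropped; the real method may escape them instead.
            if (c != 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A caller would pass each line read from robots.txt through such a helper before logging or persisting it.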
makeValid(long, String, String, double, int, long, String, int, String, String, String, String, IVersionActivity, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Initialize the record.
map(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRule
map(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRules
mapDocumentURL(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Scan patterns and return the one that matches first.
mappings
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
mappings
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRules
mark(int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Mark.
markSupported()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Check if mark is supported.
matchPattern
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
matchPattern
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRule
maxFetchesParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Max fetches per minute per server
maxOpenConnectionsPerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The maximum open connections
maxOpenConnectionsPerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The maximum open connections per server
maxOpenParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Max simultaneous open connections per server
Messages
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Messages()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
Messages
Constructor - do not instantiate
metadata
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
milTzMap
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Timezone mapping from RFC822 timezones to ones understood by Java
minimumMillisecondsPerBytePerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The minimum milliseconds between bytes
minimumMillisecondsPerBytePerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The connection bandwidth we want
minimumMillisecondsPerBytePerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Stream throttling parameters
minimumMillisecondsPerFetchPerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The minimum milliseconds between fetches
minimumMillisecondsPerFetchPerServer
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The minimum time between fetches
minimumRescanInterval
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
monthMap
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
myUrl
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The current URL being fetched
N
name
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.NameValue
nextFetchTime
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
This is the time of the next allowed fetch (in ms since epoch)
nextToken()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
noteConnectionEstablished()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Note that a connection has been established.
noteConnectionEstablished()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Note that there is a repository connection that is using this object.
noteConnectionReleased()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Note that a connection has been released, and free resources if no reason to retain them.
noteConnectionReleased()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Connection pool no longer needed.
O
org.apache.manifoldcf.crawler.connectors.rss
- package org.apache.manifoldcf.crawler.connectors.rss
outerTagCount
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Keep track of the number of valid feed signals we saw
outputConfigurationBody(IThreadContext, IHTTPOutput, Locale, ConfigParams, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Output the configuration body section.
outputConfigurationHeader(IThreadContext, IHTTPOutput, Locale, ConfigParams, List<String>)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Output the configuration header section.
outputResource(IHTTPOutput, Locale, String, Map<String, String>, boolean)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
outputResourceWithVelocity(IHTTPOutput, Locale, String, Map<String, String>, boolean)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
outputResourceWithVelocity(IHTTPOutput, Locale, String, Map<String, Object>)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
Messages
outputSpecificationBody(IHTTPOutput, Locale, DocumentSpecification, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Output the specification body section.
outputSpecificationHeader(IHTTPOutput, Locale, DocumentSpecification, List<String>)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Output the specification header section.
outstandingConnections
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Outstanding connection counter
P
parseChinaDate(String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Parse a China Daily News date
parseRobotsTxt(BufferedReader, String, IVersionActivity)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Parse the robots.txt file using a reader.
parseRSSDate(String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Parse an RSS date
parseZuluDate(String)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Parse an RDF date
peek()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
Get current token.
poll()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Clean idle stuff out of cache
poll()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
This method is periodically called for all connectors that are connected but not in active use.
poll()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Poll.
port
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Port
pos
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
process()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
Process this data
process(String, IProcessActivity, RSSConnector.Filter)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
Process the data accumulated for this item
process()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
Process this data
process(String, IProcessActivity, RSSConnector.Filter)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
Process the data accumulated for this item
process()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
Process this data, return true if rescan time was set
process()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
Process this data
process(String, IProcessActivity, RSSConnector.Filter)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
Process the data accumulated for this item
process()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
Process this data
process(String, IProcessActivity, RSSConnector.Filter)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
Process the data accumulated for this item
processConfigurationPost(IThreadContext, IPostParameters, Locale, ConfigParams)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Process a configuration post.
processDocuments(String[], String[], IProcessActivity, DocumentSpecification, boolean[], int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Process a set of documents.
processSpecificationPost(IPostParameters, Locale, DocumentSpecification)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Process a specification post.
protocol
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Protocol
proxyAuthDomain
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy auth domain
proxyAuthDomainParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy auth domain
proxyAuthPassword
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy auth password
proxyAuthPasswordParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy auth password
proxyAuthUsername
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy auth username
proxyAuthUsernameParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy auth username
proxyHost
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The proxy host
proxyHostParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy host name
proxyPort
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The proxy port
proxyPortParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Proxy port
pubDateField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
pubDateField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
pubDateField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
pubDateField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
R
rateEstimate
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
The inverse rate estimate of the first fetch, in ms/byte
rawQueryPart
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
read()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Read a byte.
read(byte[])
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Read lots of bytes.
read(byte[], int, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Read lots of specific bytes.
READ_CHUNK_LENGTH
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
The read chunk length
readFile(File)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
readingRobots
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
This will be set to "true" if the robots.txt for this host is in the process of being read.
recordEverything
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
This flag determines whether we record everything to the disk, as a means of doing a web snapshot
records
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
This is the list of robots records for the host, or null if no robots.txt found.
refCount
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Reference count
refCount
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Reference count for how many connections to this pool there are
refCount
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Reference count for bandwidth variables
registerConnection(int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Register an outstanding connection (and wait until it can be obtained before proceeding)
registerGlobalHandle(int)
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Note that we're about to need a handle (and make sure we have enough)
releaseConnection()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Release an outstanding connection back into the pool
releaseDocumentVersions(String[], String[])
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Free a set of documents.
releaseGlobalHandle()
- Static method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Note that we're done with a handle (so we can free it)
removeAspSession
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
removeBVSession
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
removeJavaSession
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
removePhpSession
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
reorder
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
rescanTimeSet
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Flag indicating that the rescan time was set for this feed
rescanTimeSet
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
Rescan time set flag
reset()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Reset.
resolve(String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
responseCode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
resultLogFile
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
ROBOT_CONNECTION_TYPE
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Robots connection type value
ROBOT_FILE_NAME
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Robot file name value
ROBOT_TIMEOUT_MILLISECONDS
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Robots fetch timeout value
Robots
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class is a cache of robots data.
Robots(ThrottledFetcher)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
Robots
Constructor.
robots
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The robots object used by this instance
Robots.Host
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class maintains status for a given host.
Robots.Host(String, int, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Host
Constructor.
Robots.Record
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class represents a record in a robots.txt file.
Robots.Record()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
Constructor.
ROBOTS_ALL
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
ROBOTS_DATA
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
ROBOTS_NONE
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
robotsMap
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Storage for robots objects
robotsUsage
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Robots usage flag
robotsUsageParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Robots usage parameter
RSSConnector
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This is the RSS implementation of the IRepositoryConnector interface.
RSSConnector()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
Constructor.
RSSConnector.CanonicalizationPolicies
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Class representing a list of canonicalization rules
RSSConnector.CanonicalizationPolicies()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicies
RSSConnector.CanonicalizationPolicy
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Class representing a URL regular expression match, for the purposes of determining canonicalization policy
RSSConnector.CanonicalizationPolicy(Pattern, boolean, boolean, boolean, boolean, boolean)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicy
RSSConnector.EvaluatorToken
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Evaluator token.
RSSConnector.EvaluatorToken()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
RSSConnector.EvaluatorToken(int, int)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
RSSConnector.EvaluatorToken(String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
RSSConnector.EvaluatorTokenStream
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Token stream.
RSSConnector.EvaluatorTokenStream(String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
Constructor.
RSSConnector.FeedContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.FeedContextClass(XMLStream, String, String, String, Attributes, String, IProcessActivity, RSSConnector.Filter)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
RSSConnector.FeedItemContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.FeedItemContextClass(XMLStream, String, String, String, Attributes, int)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
RSSConnector.Filter
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Class that handles parsing and interpretation of the document specification.
RSSConnector.Filter(DocumentSpecification, boolean)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
Constructor.
RSSConnector.MappingRule
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Class representing a mapping rule
RSSConnector.MappingRule(Pattern, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRule
RSSConnector.MappingRules
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Class that represents all mappings
RSSConnector.MappingRules()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.MappingRules
RSSConnector.NameValue
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Name/value class
RSSConnector.NameValue(String, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.NameValue
RSSConnector.OuterContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class handles the outermost XML context for the feed document.
RSSConnector.OuterContextClass(XMLStream, String, IProcessActivity, RSSConnector.Filter)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
RSSConnector.RDFContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.RDFContextClass(XMLStream, String, String, String, Attributes, String, IProcessActivity, RSSConnector.Filter)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
RSSConnector.RDFItemContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.RDFItemContextClass(XMLStream, String, String, String, Attributes, int)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
RSSConnector.RSSChannelContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.RSSChannelContextClass(XMLStream, String, String, String, Attributes, String, IProcessActivity, RSSConnector.Filter)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
RSSConnector.RSSContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.RSSContextClass(XMLStream, String, String, String, Attributes, String, IProcessActivity, RSSConnector.Filter)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSContextClass
RSSConnector.RSSItemContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.RSSItemContextClass(XMLStream, String, String, String, Attributes, int)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
RSSConnector.UrlsetContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.UrlsetContextClass(XMLStream, String, String, String, Attributes, String, IProcessActivity, RSSConnector.Filter)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
RSSConnector.UrlsetItemContextClass
- Class in
org.apache.manifoldcf.crawler.connectors.rss
RSSConnector.UrlsetItemContextClass(XMLStream, String, String, String, Attributes)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
rules
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.CanonicalizationPolicies
run()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
rval
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
S
seeds
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.Filter
seriesStartTime
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
The start time of this series
server
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The server object we use to track connections and fetches.
server
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
The server object we use to track throttling
serverMap
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
This hash maps the server string (without port) to a server object, where we can track the statistics and make sure we throttle appropriately
serverName
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
The fqdn of the server
setDefaultRescanTimeIfNeeded()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.OuterContextClass
Check if the rescan flag was set or not, and if not, make sure it gets set properly
setResponseCode(int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
SetSeedList
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class is used to set the seed list for a specified RSS job.
skip(long)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Skip
startFetchTime
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The start-fetch time
startTime
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
STATUS_NOCHANGE
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Status code for fetch: No change.
STATUS_OK
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Status code for fetch: OK
STATUS_PAGEERROR
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Status code for fetch: Static error; retries won't help, individual page access failed
STATUS_SITEERROR
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
Status code for fetch: Static error; retries won't help, overall access to site in question
statusCode
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The status code fetched, if any
T
tagCleanup()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
tagCleanup()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
tagCleanup()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
tagCleanup()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetItemContextClass
text
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
textValue
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
theURL
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
throttledConnection
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
The throttled connection we belong to
ThrottledFetcher
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class uses HttpClient to fetch content from web servers.
ThrottledFetcher()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher
Constructor.
ThrottledFetcher.DataRecorder
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class takes care of recording data and results for posterity
ThrottledFetcher.DataRecorder()
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
ThrottledFetcher.DataSession
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Helper class for ThrottledFetcher.DataRecorder
ThrottledFetcher.DataSession(ThrottledFetcher.DataRecorder, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
ThrottledFetcher.Server
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class represents the throttling state kept for a single server.
ThrottledFetcher.Server(String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Constructor
ThrottledFetcher.ThrottledConnection
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class represents an established connection to a URL.
ThrottledFetcher.ThrottledConnection(ThrottledFetcher.Server, double, int, long, int, int)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
Constructor.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
- Class in
org.apache.manifoldcf.crawler.connectors.rss
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread(HttpClient, HostConfiguration, HttpMethodBase)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
ThrottledFetcher.ThrottledInputstream
- Class in
org.apache.manifoldcf.crawler.connectors.rss
This class throttles an input stream based on the specified byte rate parameters.
ThrottledFetcher.ThrottledInputstream(ThrottledFetcher.ThrottledConnection, ThrottledFetcher.Server, InputStream, double, ThrottledFetcher.DataSession)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledInputstream
Constructor.
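To make the notion of throttling a stream by byte rate concrete, here is a minimal sketch of a wrapper that enforces a minimum number of milliseconds per byte. It is an illustration only, not the code of ThrottledFetcher.ThrottledInputstream: the class name PacedInputStream and its two-argument constructor are hypothetical, and the real class also coordinates with a per-server object, which this sketch omits.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

// Hypothetical class: paces reads so that the average rate never exceeds
// one byte per minMillisPerByte milliseconds since the stream was opened.
final class PacedInputStream extends FilterInputStream {
    private final double minMillisPerByte;
    private final long startNanos = System.nanoTime();
    private long bytesRead = 0;

    PacedInputStream(InputStream in, double minMillisPerByte) {
        super(in);
        this.minMillisPerByte = minMillisPerByte;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            pace(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so it is paced as well.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            pace(n);
        }
        return n;
    }

    // Sleep until enough time has passed for the bytes read so far.
    private void pace(int n) throws IOException {
        bytesRead += n;
        long earliestMillis = (long) (bytesRead * minMillisPerByte);
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
        long waitMillis = earliestMillis - elapsedMillis;
        if (waitMillis > 0) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }
}
```

Pacing against the total elapsed time, rather than sleeping a fixed amount per read, lets a slow network read count toward the budget so the wrapper does not add delay on top of an already slow transfer.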
throttleGroupName
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The throttle group name
throttleGroupParameter
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The throttle group name
throwable
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.ThrottledConnection
The error trace, if any
titleField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedItemContextClass
titleField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFItemContextClass
titleField
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSItemContextClass
toASCIIString()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
token
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorTokenStream
toString()
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
totalBytesRead
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.Server
Total actual bytes read in this series; this includes fetches in progress
ttlValue
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.FeedContextClass
ttl value
ttlValue
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RDFContextClass
TTL value
ttlValue
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.RSSChannelContextClass
TTL value is set on a per-channel basis
ttlValue
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.UrlsetContextClass
TTL value
type
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
TYPE_COMMA
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
TYPE_GROUP
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
TYPE_TEXT
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.EvaluatorToken
U
understoodProtocols
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
url
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
userAgent
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
The user-agent for this connector instance
userAgents
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots.Record
V
value
- Variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector.NameValue
viewConfiguration(IThreadContext, IHTTPOutput, Locale, ConfigParams)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
View configuration.
viewSpecification(IHTTPOutput, Locale, DocumentSpecification)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
View specification.
W
WebURL
- Class in
org.apache.manifoldcf.crawler.connectors.rss
Replacement class for java.net.URI, which is broken in many ways.
WebURL(String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
WebURL(String, String, int, String, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
WebURL(URI)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
WebURL(URI, String)
- Constructor for class org.apache.manifoldcf.crawler.connectors.rss.
WebURL
write(byte[], int, int)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataSession
writeFile(File, String)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
writeResponseRecord(String, int, ArrayList, ArrayList)
- Method in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher.DataRecorder
Atomically write a result log record, returning the data file name to use.
_
_rcsid
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
DataCache
_rcsid
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
GetSeedList
_rcsid
- Static variable in interface org.apache.manifoldcf.crawler.connectors.rss.
IThrottledConnection
_rcsid
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
Robots
_rcsid
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
RSSConnector
_rcsid
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
SetSeedList
_rcsid
- Static variable in class org.apache.manifoldcf.crawler.connectors.rss.
ThrottledFetcher