public class OutlinkExtractor extends Object
Extracts Outlinks / URLs from plain text using regular expressions.

Constructor and Description |
---|
OutlinkExtractor() |
Modifier and Type | Method and Description |
---|---|
static Outlink[] | getOutlinks(String plainText, org.apache.hadoop.conf.Configuration conf) Extracts Outlinks from the given plain text. |
static Outlink[] | getOutlinks(String plainText, String anchor, org.apache.hadoop.conf.Configuration conf) Extracts Outlinks from the given plain text and adds the anchor to the extracted Outlinks. |
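
The two overloads above differ only in whether an anchor is attached to each extracted link. A minimal, self-contained sketch of that idea follows. It does not use Nutch or Hadoop: `SimpleOutlink`, `SimpleOutlinkExtractor`, and the URL pattern are hypothetical stand-ins chosen for illustration, the `conf` parameter is omitted, and the real class may use a different and more elaborate regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for Nutch's Outlink: a target URL plus anchor text. */
final class SimpleOutlink {
    final String toUrl;
    final String anchor;

    SimpleOutlink(String toUrl, String anchor) {
        this.toUrl = toUrl;
        this.anchor = anchor;
    }
}

/** Illustrative regex-based extractor; not the Nutch implementation. */
final class SimpleOutlinkExtractor {
    // Simplified assumption: a known scheme followed by a run of characters
    // that are not whitespace, quotes, or angle brackets.
    private static final Pattern URL_PATTERN =
        Pattern.compile("(?i)\\b(?:https?|ftp)://[^\\s<>\"']+");

    /** Extracts links from plain text, using an empty anchor. */
    static SimpleOutlink[] getOutlinks(String plainText) {
        return getOutlinks(plainText, "");
    }

    /** Extracts links from plain text and attaches the same anchor to each. */
    static SimpleOutlink[] getOutlinks(String plainText, String anchor) {
        if (plainText == null) {
            return new SimpleOutlink[0];
        }
        List<SimpleOutlink> found = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            found.add(new SimpleOutlink(matcher.group(), anchor));
        }
        return found.toArray(new SimpleOutlink[0]);
    }
}
```

Calling `SimpleOutlinkExtractor.getOutlinks(text, "home")` in this sketch returns one `SimpleOutlink` per matched URL, each carrying the anchor `"home"`. Input with no matches, or `null` input, yields an empty array.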

public static Outlink[] getOutlinks(String plainText, org.apache.hadoop.conf.Configuration conf)
Extracts Outlinks from the given plain text.
Applying this method to input that is not plain text can result in extremely
long runtimes in pathological cases (PostScript is a known example).
Parameters:
plainText - the plain text from which URLs should be extracted.
Returns:
the Outlinks found in plainText

public static Outlink[] getOutlinks(String plainText, String anchor, org.apache.hadoop.conf.Configuration conf)
Extracts Outlinks from the given plain text and adds the anchor to the extracted Outlinks.
Parameters:
plainText - the plain text from which URLs should be extracted.
anchor - the anchor of the URL
Returns:
the Outlinks found in plainText

Copyright © 2014 The Apache Software Foundation