public final class MimeTypeUtils extends Object
This is a facade class that insulates CAS Metadata from its underlying MIME type library, Apache Tika. Any MIME handling code should be placed in this utility class and hidden from the CAS Metadata classes that rely on it.
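The facade idea can be shown with a small, self-contained sketch. Everything in it is hypothetical: MimeFacadeSketch, MimeBackend and the toy extension table are illustrative names, not OODT or Tika APIs. Callers depend only on the facade's method, so replacing the detection library means supplying a different backend rather than editing every caller.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical facade: callers see only getMimeType, never the backend. */
final class MimeFacadeSketch {

    /** Stand-in for the third-party detection library (assumed interface). */
    interface MimeBackend {
        String detectByName(String fileName);
    }

    /** Toy backend that maps a lower-cased file extension to a MIME type. */
    static final class ExtensionBackend implements MimeBackend {
        private static final Map<String, String> BY_EXTENSION = Map.of(
                "pdf", "application/pdf",
                "html", "text/html",
                "txt", "text/plain");

        @Override
        public String detectByName(String fileName) {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0) {
                return null; // no extension, nothing to look up
            }
            return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
        }
    }

    private final MimeBackend backend;

    MimeFacadeSketch() {
        this(new ExtensionBackend());
    }

    /** Swapping libraries means passing a different backend here, not editing callers. */
    MimeFacadeSketch(MimeBackend backend) {
        this.backend = backend;
    }

    /** The only method callers depend on; returns null when nothing matches. */
    String getMimeType(String fileName) {
        return fileName == null ? null : backend.detectByName(fileName);
    }
}
```

The point of the design is that the extension table (or, in the real class, Tika) could change without any caller noticing, because callers never import the backend type.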
Modifier and Type | Field and Description |
---|---|
static String | MIME_FILE_RES_PATH |
Constructor and Description |
---|
MimeTypeUtils() |
MimeTypeUtils(InputStream mimeIs, boolean magic) |
MimeTypeUtils(String filePath) |
MimeTypeUtils(String filePath, boolean magic) |
Modifier and Type | Method and Description |
---|---|
String | autoResolveContentType(String url, byte[] data) Same as autoResolveContentType(String, String, byte[]), but this method passes null as the initial type. |
String | autoResolveContentType(String typeName, String url, byte[] data) A facade interface that tries all the possible MIME type resolution strategies available within Tika. |
static String | cleanMimeType(String origType) Cleans a MimeType name by extracting the actual MimeType from a string of the form `<primary type>/<sub type> ; <optional params>`. |
String | getDescriptionForMimeType(String mimeType) |
String | getMimeType(File f) Facade interface to Tika's underlying MimeTypes.getMimeType(File) method. |
String | getMimeType(String name) A facade interface to Tika's underlying MimeTypes.forName(String) method. |
String | getMimeType(URL url) Facade interface to Tika's underlying MimeTypes.getMimeType(String) method. |
String | getMimeTypeByMagic(byte[] data) Utility method that acts as a facade to MimeTypes.getMimeType(byte[]). |
String | getSuperTypeForMimeType(String mimeType) |
boolean | isMimeMagic() |
static byte[] | readMagicHeader(InputStream stream) |
static byte[] | readMagicHeader(InputStream stream, int headerByteSize) |
void | setMimeMagic(boolean mimeMagic) |
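The resolution order documented for autoResolveContentType(String, String, byte[]) (clean the declared type, look it up in the registry, fall back to the URL, then optionally to magic bytes) can be modelled without Tika. This is a toy sketch under stated assumptions: the class ResolveSketch, its tiny registry, extension map and magic table, and the application/octet-stream fallback are hypothetical stand-ins for Tika's MimeTypes data, and the real implementation may order or combine the checks differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class ResolveSketch {
    // Assumed fallback when nothing matches (hypothetical, not taken from this page).
    static final String DEFAULT_TYPE = "application/octet-stream";

    // Toy stand-ins for the Tika registry, glob patterns and magic table.
    private static final Set<String> REGISTRY =
            Set.of("text/html", "text/plain", "application/pdf", "image/png");
    private static final Map<String, String> BY_EXTENSION =
            Map.of("html", "text/html", "txt", "text/plain", "pdf", "application/pdf");
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G'};

    private final boolean mimeMagic;

    ResolveSketch(boolean mimeMagic) {
        this.mimeMagic = mimeMagic;
    }

    /** Keeps "<primary type>/<sub type>" and drops any "; params" suffix. */
    static String cleanMimeType(String origType) {
        if (origType == null) {
            return null;
        }
        int semi = origType.indexOf(';');
        return (semi >= 0 ? origType.substring(0, semi) : origType).trim();
    }

    /** Extension-based guess, standing in for glob matching on the URL. */
    static String byUrl(String url) {
        if (url == null) {
            return null;
        }
        String path = url.split("[?#]", 2)[0]; // ignore query string and fragment
        int dot = path.lastIndexOf('.');
        if (dot <= path.lastIndexOf('/')) {
            return null; // no extension in the last path segment
        }
        return BY_EXTENSION.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Prefix-based guess, standing in for magic detection; null if unknown. */
    static String byMagic(byte[] data) {
        if (data == null) {
            return null;
        }
        if (startsWith(data, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(data, PNG_MAGIC)) {
            return "image/png";
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    String autoResolveContentType(String typeName, String url, byte[] data) {
        String cleaned = cleanMimeType(typeName);
        if (cleaned != null && REGISTRY.contains(cleaned)) {
            return cleaned; // 1. declared type is known to the registry
        }
        String fromUrl = byUrl(url);
        if (fromUrl != null) {
            return fromUrl; // 2. URL-based guess
        }
        if (mimeMagic) {
            String fromMagic = byMagic(data);
            if (fromMagic != null) {
                return fromMagic; // 3. magic bytes, only when enabled
            }
        }
        return DEFAULT_TYPE;
    }

    /** Same as the three-argument form with a null initial type. */
    String autoResolveContentType(String url, byte[] data) {
        return autoResolveContentType(null, url, data);
    }
}
```

For example, new ResolveSketch(true).autoResolveContentType("bogus/type", "http://example.com/doc.pdf?v=2", null) fails the registry check and resolves to application/pdf from the URL extension; magic bytes are consulted only when both earlier steps come up empty and the magic flag is on.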
public static final String MIME_FILE_RES_PATH
public MimeTypeUtils()
public MimeTypeUtils(String filePath) throws FileNotFoundException
Throws: FileNotFoundException
public MimeTypeUtils(String filePath, boolean magic) throws FileNotFoundException
Throws: FileNotFoundException
public MimeTypeUtils(InputStream mimeIs, boolean magic)
public static String cleanMimeType(String origType)
Cleans a MimeType name by extracting the actual MimeType from a string of the form:
<primary type>/<sub type> ; <optional params>
Parameters:
origType - The original mime type string to be cleaned.
public String autoResolveContentType(String url, byte[] data)
Same as autoResolveContentType(String, String, byte[]), but this method passes null as the initial type.
Parameters:
url - The String URL to use to check glob patterns.
data - The byte data to potentially use in magic detection.
Returns: the resolved MimeType name.
public String autoResolveContentType(String typeName, String url, byte[] data)
A facade interface that tries all the possible mime type resolution strategies available within Tika. First, the mime type provided in typeName is cleaned with cleanMimeType(String). Then the cleaned mime type is looked up by name in the underlying Tika MimeTypes registry. If the MimeType is found, that mime type is used; otherwise URL resolution is used to try to determine the mime type. If that is unsuccessful, and if mime.type.magic is enabled in NutchConfiguration, mime type magic resolution is used to try to obtain a better-than-the-default approximation of the MimeType.
Parameters:
typeName - The original mime type, returned from a ProtocolOutput.
url - The given URL that Nutch was trying to crawl.
data - The byte data returned from the crawl, if any.
Returns: the resolved MimeType name.
public String getMimeType(URL url)
Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
Parameters:
url - The document URL to sense the MimeType for.
Returns: the MimeType identified from the string form of the given document URL.
public String getMimeType(String name)
A facade interface to Tika's underlying MimeTypes.forName(String) method.
Parameters:
name - The name of a valid MimeType in the Tika mime registry.
Returns: the MimeType, if it exists, or null otherwise.
public String getMimeType(File f)
Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.
public String getMimeTypeByMagic(byte[] data)
Utility method that acts as a facade to MimeTypes.getMimeType(byte[]).
Parameters:
data - The byte data to get the MimeType for.
Returns: the MimeType, or null if a suitable MimeType is not found.
public boolean isMimeMagic()
public void setMimeMagic(boolean mimeMagic)
Parameters:
mimeMagic - the mimeMagic to set
public static byte[] readMagicHeader(InputStream stream) throws IOException
Throws: IOException
public static byte[] readMagicHeader(InputStream stream, int headerByteSize) throws IOException
Throws: IOException
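The two readMagicHeader overloads return the leading bytes of a stream for use in magic detection. A minimal sketch of that behaviour, assuming the one-argument form delegates to the two-argument form with a fixed default size (the 1024 used here is an assumption; this page does not state the real value) and that a stream shorter than the requested size yields a shorter array. MagicHeaderSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class MagicHeaderSketch {
    // Hypothetical default header size; not taken from this page.
    static final int DEFAULT_HEADER_SIZE = 1024;

    static byte[] readMagicHeader(InputStream stream) throws IOException {
        return readMagicHeader(stream, DEFAULT_HEADER_SIZE);
    }

    static byte[] readMagicHeader(InputStream stream, int headerByteSize) throws IOException {
        if (stream == null) {
            throw new IllegalArgumentException("stream is null");
        }
        if (headerByteSize < 0) {
            throw new IllegalArgumentException("headerByteSize must be >= 0");
        }
        byte[] buf = new byte[headerByteSize];
        int total = 0;
        // A single read() may return fewer bytes than requested, so loop until full or EOF.
        while (total < headerByteSize) {
            int n = stream.read(buf, total, headerByteSize - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        // Trim the buffer if the stream ended before headerByteSize bytes arrived.
        return total == headerByteSize ? buf : Arrays.copyOf(buf, total);
    }
}
```

Looping on read(byte[], int, int) matters because InputStream only guarantees that a call reads at least one byte (or signals end of stream), not that it fills the requested range; network streams in particular often deliver data in smaller chunks.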
Copyright © 1999-2014 Apache OODT. All Rights Reserved.