NAME

Mail::SpamAssassin::Message - decode, render, and hold an RFC-2822 message


DESCRIPTION

This module encapsulates an email message and allows access to the various MIME message parts and message metadata.

The message structure, after initiating a parse() cycle, looks like this:


  Message object, also top-level node in Message::Node tree
     |
     +---> Message::Node for other parts in MIME structure
     |       |---> [ more Message::Node parts ... ]
     |       [ others ... ]
     |
     +---> Message::Metadata object to hold metadata


PUBLIC METHODS

new()
Creates a Mail::SpamAssassin::Message object. Takes a hash reference as a parameter. The recognized key/value pairs are as follows:

message is one of: undef (in which case the message is read from STDIN), a scalar containing the entire message, an array reference holding the message with one line per array element, or a file glob from which the entire message is read.

Note: The message is expected to generally be in RFC 2822 format, optionally including an mbox message separator line (the "From " line) as the first line.

parse_now specifies whether to create the MIME tree at object-creation time or later, as needed.

The parse_now option defaults to false (0). This lets SpamAssassin avoid generating the tree of Mail::SpamAssassin::Message::Node objects and their related data when the tree is not going to be used. This is handy, for instance, when running spamassassin -d, which only needs the pristine header and body, both of which are always handled when the object is created.

subparse specifies how many MIME recursion levels should be parsed. Defaults to 20.
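
The four accepted forms of the message parameter can be illustrated with a minimal sketch. The helper name message_to_lines() is hypothetical and is not part of Mail::SpamAssassin; it only shows one way such input could be normalized to a list of lines, assuming a "file glob" arrives as a glob reference or lexical file handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): turn any of the four
# accepted "message" forms into a list of lines.
sub message_to_lines {
    my ($message) = @_;

    # undef means "read the message from STDIN"
    $message = \*STDIN unless defined $message;

    my $type = ref $message;
    if ($type eq 'ARRAY') {
        return @{$message};             # already one line per element
    }
    elsif ($type eq 'GLOB') {
        return <$message>;              # read every line from the handle
    }
    elsif ($type eq '') {
        return split /^/m, $message;    # plain scalar: split, keeping newlines
    }
    die "unsupported message type: $type\n";
}

my @lines = message_to_lines("Subject: hi\n\nbody\n");
print scalar(@lines), " lines\n";
```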

_do_parse()
Non-public method that initiates a MIME part parse (generating a tree) of the current message. Typically called by find_parts() as necessary.

find_parts()
Used to search the tree for specific MIME parts. See Mail::SpamAssassin::Message::Node for more details.
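
The interplay between parse_now, _do_parse(), and find_parts() can be sketched as lazy initialization. The class Sketch::LazyMessage below is hypothetical, and its "parser" is a stand-in that treats the whole message as a single text/plain part; it only illustrates the deferral, not the real MIME parsing.

```perl
use strict;
use warnings;

package Sketch::LazyMessage;    # hypothetical class, not the real module

sub new {
    my ($class, $opts) = @_;
    my $self = bless {
        raw        => $opts->{message},
        parsed     => 0,
        parse_runs => 0,
    }, $class;
    $self->_do_parse if $opts->{parse_now};    # eager only on request
    return $self;
}

sub _do_parse {
    my ($self) = @_;
    return if $self->{parsed};                 # build the tree at most once
    # Stand-in for real MIME parsing: one text/plain part holding everything
    $self->{parts} = [ { type => 'text/plain', text => $self->{raw} } ];
    $self->{parsed} = 1;
    $self->{parse_runs}++;
    return;
}

sub find_parts {
    my ($self, $type_re) = @_;
    $self->_do_parse;                          # parse lazily, on first use
    return grep { $_->{type} =~ $type_re } @{ $self->{parts} };
}

package main;

my $msg  = Sketch::LazyMessage->new({ message => "Subject: hi\n\nbody\n" });
my @text = $msg->find_parts(qr{^text/});
```

With parse_now left false, nothing is parsed until the first find_parts() call; repeated calls reuse the existing parts.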

get_pristine_header()
Returns the pristine headers of the message. If a header name is given as a parameter (matched case-insensitively), only that header is returned; otherwise all headers are returned as a scalar, including the blank line at the end of the headers.

In list context, an array is returned with each occurrence of the named header in a separate element. In scalar context, the last occurrence is returned.

For example, if 'Subject' is specified as the header and the message contains two Subject headers, the last (bottom) one in the message is returned in scalar context, and both are returned in list context.

Note: the returned header will include the ending newline and any embedded whitespace folding.
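
The context-dependent behaviour described above can be sketched with wantarray. The function pristine_header() is a hypothetical stand-in, not the module's implementation, and it assumes that only the value after the colon is returned (with its folding and ending newline intact).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lookup described above.
sub pristine_header {
    my ($headers, $name) = @_;
    return $headers unless defined $name;    # no name: whole header block

    # Match "Name: value" case-insensitively, plus folded continuation lines
    my @found = $headers =~ /^\Q$name\E:[ \t]*(.*\n(?:[ \t]+.*\n)*)/mig;

    # List context: every occurrence; scalar context: the last one
    return wantarray ? @found : $found[-1];
}

my $headers = "Subject: first\n"
            . "To: a\@example.com\n"
            . "Subject: second,\n\tfolded\n"
            . "\n";

my @all  = pristine_header($headers, 'subject');    # both Subject headers
my $last = pristine_header($headers, 'subject');    # only the bottom one
```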

get_mbox_separator()
Returns the mbox separator found in the message, or undef if there wasn't one.

get_body()
Returns an array of the pristine message body, one line per array element.

get_pristine()
Returns a scalar of the entire pristine message.

get_pristine_body()
Returns a scalar of the pristine message body.
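
How the pristine pieces relate to one another can be sketched as follows. The helper split_pristine() is hypothetical; it assumes LF line endings and keeps the mbox separator apart from the header, which may differ from what the module does internally.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message into its pristine pieces.
sub split_pristine {
    my ($raw) = @_;
    my %msg;

    # Optional mbox separator: a leading line that starts with "From "
    $msg{mbox_sep} = $raw =~ s/\A(From [^\n]*\n)// ? $1 : undef;

    # Header: leading non-empty lines plus the blank line that ends them;
    # body: everything after that.
    @msg{qw(header body)} = $raw =~ /\A((?:[^\n]+\n)*\n?)(.*)\z/s;

    # Line-oriented view of the body, one line per element
    $msg{body_lines} = [ split /^/m, $msg{body} ];
    return \%msg;
}

my $parts = split_pristine(
    "From sender\@example.com Mon Jan  1 00:00:00 2024\n"
  . "Subject: hi\nTo: you\n\nline one\nline two\n"
);
```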

extract_message_metadata($main)
$str = get_metadata($hdr)
put_metadata($hdr, $text)
delete_metadata($hdr)
$str = get_all_metadata()
finish_metadata()
Destroys the metadata for this message. Once a message has been scanned fully, the metadata is no longer required. Destroying this will free up some memory.

finish()
Clean up an object so that it can be destroyed.

receive_date()
Returns a time_t value with the received date of the current message, or the current time if the received time could not be determined.


PARSING METHODS, NON-PUBLIC

These methods take an RFC 2822-esque formatted message and create a tree with all of the MIME body parts included. Those parts will be decoded as necessary, and text/html parts will be rendered into a standard text format, suitable for use in SpamAssassin.

parse_body()
parse_body() hands the body part it was given to the appropriate part parser: _parse_multipart() for multipart/* parts, or _parse_normal() for everything else. Multipart sections become the root of sub-trees, while everything else becomes a leaf in the tree.

For multipart messages, the first call to parse_body() doesn't create a new sub-tree and just uses the parent node to contain children. All other calls to parse_body() will cause a new sub-tree root to be created, and children will exist underneath that root. (This is just so the tree doesn't have a root node that points at the actual root node.)

_parse_multipart()
Generate a root node, and for each child part call parse_body() to generate the tree.

_parse_normal()
Generate a leaf node and add it to the parent.
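
The recursion between these three methods can be sketched in a simplified, self-contained form. Everything below is an illustration under assumptions: the node layout, the helper part_from_text(), the explicit $is_top flag, and the naive boundary splitting (LF line endings, no decoding or rendering) are hypothetical and are not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical node layout: { type => ..., children => [...], text => ... }
sub new_node { my ($type) = @_; return { type => $type, children => [] }; }

# Split one MIME part into a type, an optional boundary, and a body.
sub part_from_text {
    my ($text) = @_;
    my ($head, $body) = $text =~ /\A((?:[^\n]+\n)*)\n(.*)\z/s;
    ($head, $body) = ($text, '') unless defined $head;
    my ($type)     = $head =~ /^Content-Type:\s*([^;\s]+)/mi;
    my ($boundary) = $head =~ /boundary="?([^";\s]+)"?/i;
    return { type => $type || 'text/plain', boundary => $boundary, body => $body };
}

# Dispatch on content type: multipart/* becomes a sub-tree, the rest a leaf.
sub parse_body {
    my ($parent, $part, $is_top) = @_;
    if ($part->{type} =~ m{^multipart/}i) {
        my $root = $parent;              # first call reuses the parent node
        unless ($is_top) {               # later calls get their own root
            $root = new_node($part->{type});
            push @{ $parent->{children} }, $root;
        }
        parse_multipart($root, $part);
    }
    else {
        parse_normal($parent, $part);
    }
    return;
}

sub parse_multipart {
    my ($root, $part) = @_;
    return unless defined $part->{boundary};
    my $delim = quotemeta $part->{boundary};
    my $body  = $part->{body};
    $body =~ s/^--$delim--.*\z//ms;             # drop closing delimiter, epilogue
    my @chunks = split /^--$delim\n/m, $body;   # one chunk per child part
    shift @chunks;                              # discard the preamble
    parse_body($root, part_from_text($_), 0) for @chunks;
    return;
}

sub parse_normal {
    my ($parent, $part) = @_;
    my $leaf = new_node($part->{type});
    $leaf->{text} = $part->{body};
    push @{ $parent->{children} }, $leaf;
    return;
}

my $raw = join '',
    "preamble\n",
    "--outer\n",
    "Content-Type: text/plain\n\nhello\n",
    "--outer\n",
    "Content-Type: multipart/alternative; boundary=\"inner\"\n\n",
    "--inner\n",
    "Content-Type: text/plain\n\nplain\n",
    "--inner\n",
    "Content-Type: text/html\n\n<p>html</p>\n",
    "--inner--\n",
    "--outer--\n";

my $top = new_node('multipart/mixed');
parse_body($top, { type => 'multipart/mixed', boundary => 'outer', body => $raw }, 1);
```

In this sketch the top-level node ends up with two children: a text/plain leaf and a multipart/alternative sub-tree root that itself holds two leaves.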