Mail::SpamAssassin::Pyzor::Digest::Pieces - Pyzor backend logic module
This module houses backend logic for Mail::SpamAssassin::Pyzor::Digest.
It reimplements logic found in pyzor's digest.py module (https://github.com/SpamExperts/pyzor/blob/master/pyzor/digest.py).
This imitates the corresponding object method in digest.py. It returns a reference to an array of strings. Each string can be either a byte string or a character string (e.g., UTF-8 decoded).
NB: RFC 2822 stipulates that message bodies should use CRLF line breaks, not plain LF (nor plain CR). We will thus convert any plain CRs in a quoted-printable message body into CRLF. Python, though, doesn't do this, so the output of our implementation of digest_payloads() diverges from that of the Python original. This ultimately makes no difference since the line-ending whitespace gets trimmed regardless, but it must be factored in when comparing the output of our implementation with the Python output.
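The CR handling described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the sample body, the variable names, and the trimming step are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical decoded quoted-printable body with mixed line endings.
my $body = "one \rtwo\r\nthree\r";

# Turn every bare CR (a CR not followed by LF) into CRLF.
( my $converted = $body ) =~ s/\r(?!\n)/\r\n/g;

# Split on CRLF and trim trailing whitespace from each line, which is
# why the CR-vs-CRLF difference does not survive into later stages.
my @lines = map { ( my $line = $_ ) =~ s/\s+\z//; $line }
            split /\r\n/, $converted;
```

Because trailing whitespace is stripped from each line afterwards, the extra LF characters only matter when comparing intermediate output against Python's.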
This imitates the corresponding object method in digest.py. It modifies $STRING in-place.
As with the original implementation, if $STRING contains (decoded) Unicode characters, those characters will be parsed accordingly. So:
$str = "123\xc2\xa0"; # [ c2 a0 ] == \u00a0, non-breaking space
normalize($str);
The above will leave $str alone, but this:
utf8::decode($str);
normalize($str);
... will remove the trailing non-breaking space (the last two bytes of the original string) from $str.
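The byte-versus-character distinction can be reproduced without this module. The sketch below uses a hypothetical helper, strip_trailing_ws(), which is NOT normalize() itself; it only strips trailing whitespace in place, to show how Perl's \s treats U+00A0 under default regex semantics (no "use utf8", no unicode_strings feature).

```perl
use strict;
use warnings;

# Hypothetical stand-in for one whitespace-handling step. It modifies its
# argument in place through the @_ alias, as normalize() is documented to do.
sub strip_trailing_ws { $_[0] =~ s/\s+\z//; return }

# Byte string: c2 a0 are two separate non-whitespace bytes to \s.
my $bytes = "123\xc2\xa0";
strip_trailing_ws($bytes);    # left unchanged

# Character string: after decoding, U+00A0 is a single whitespace character.
my $chars = "123\xc2\xa0";
utf8::decode($chars);
strip_trailing_ws($chars);    # non-breaking space removed
```

Callers should therefore decide up front whether they pass octets or decoded text, since the two yield different normalized output.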
This imitates the corresponding object method in digest.py. It returns a boolean.
This assembles a string buffer out of @LINES. The string is the buffer of octets that will be hashed to produce the message digest.
Each member of @LINES is expected to be an octet string, not a character string.
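To illustrate the octet-string requirement, here is a minimal sketch of preparing lines before they are assembled and hashed. The sample lines, plain concatenation via join, and the use of SHA-1 from Digest::SHA are assumptions for the example; the module's own line selection and assembly logic is not shown here.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Hypothetical input: character strings that contain non-ASCII characters.
my @lines = ( "caf\x{e9}", "smile \x{263a}" );

# Encode a copy of each character string to UTF-8 octets. Apply this only
# to character strings; encoding octets a second time would corrupt them.
my @octets = map { my $copy = $_; utf8::encode($copy); $copy } @lines;

# Assumed assembly step: plain concatenation into one buffer of octets.
my $buffer = join q<>, @octets;

# Assumed hash function for the sketch: SHA-1, as a hex string.
my $hex = Digest::SHA::sha1_hex($buffer);
```

Encoding first keeps the hashed buffer independent of Perl's internal string representation.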
Imitates str.splitlines(). (cf. pydoc str)
Returns a plain list in list context. In scalar context, returns the number of items that would be returned in list context.
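The context-sensitive return can be sketched with wantarray. The function below, splitlines_sketch(), is a hypothetical and simplified stand-in: it splits only on CRLF, LF, and CR, whereas Python's str.splitlines() recognizes additional line boundaries.

```perl
use strict;
use warnings;

# Simplified imitation of str.splitlines(): no trailing empty item is
# produced when the text ends with a line break.
sub splitlines_sketch {
    my ($text) = @_;

    # Limit -1 keeps trailing empty fields so interior blank lines survive.
    my @lines = split /\r\n|\n|\r/, $text, -1;

    # Drop the single empty field that follows a final line break.
    pop @lines if @lines && $lines[-1] eq q<>;

    return wantarray ? @lines : scalar @lines;
}

my @items = splitlines_sketch("a\r\nb\rc\n");    # list context
my $count = splitlines_sketch("a\r\nb\rc\n");    # scalar context
```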