Log Message: |
Make mailer.py work properly with Python 3, and drop Python 2 support.
Most of the changes deal with the handling of binary data vs Python strings.
I've made sure that mailer.py will work in a UTF-8 environment. In general,
UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
Environments using other encodings may not work as expected, but those will
be problematic for hook scripts in general. SVN repositories store internal
data such as paths in UTF-8. Our Python 3 bindings do not deal with encoding
or decoding of such data, and thus need to work with raw UTF-8 strings, not
Python strings.
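
The decode/encode boundary described above can be sketched as follows. This
is only an illustration: raw_path is a hypothetical stand-in for a value
returned by the bindings, and the regular expression is made up.

```python
import re

# Hypothetical stand-in for a path returned by the bindings: raw UTF-8 bytes.
raw_path = "trunk/docs/\u00fcbersicht.txt".encode("utf-8")

# Decode once at the boundary so the rest of the script works with str.
path = raw_path.decode("utf-8")

# str operations such as regular-expression matching now behave as expected.
matched = re.match(r"trunk/docs/", path) is not None

# Encode again before handing the path back to the bindings.
round_tripped = path.encode("utf-8")
```
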
The encoding of file and property contents is not guaranteed to be UTF-8.
This was already a problem before this change. This hook script sends email
with a content type header specifying the UTF-8 encoding. Diffs which contain
non-UTF-8 text will most likely not render properly when viewed in an email
reader. At least this problem is now obvious in mailer.py's implementation,
since all unidiff text is now written out directly as binary data.
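
A minimal sketch of the two output paths, assuming write_binary is simply the
write method of a binary buffer. SketchOutput is a hypothetical class for
illustration and is not the code in mailer.py.

```python
from io import BytesIO


class SketchOutput:
    """Hypothetical output class; not the actual mailer.py implementation."""

    def __init__(self):
        self.buffer = BytesIO()
        # Binary sink: here, the bound write method of a BytesIO buffer.
        self.write_binary = self.buffer.write

    def write(self, text):
        # Text produced by the script itself is encoded as UTF-8.
        return self.write_binary(text.encode("utf-8"))


out = SketchOutput()
out.write("Log:\ncaf\u00e9\n")

# Diff content in some other encoding (Latin-1 here) passes through unchanged.
diff_line = "+caf\u00e9\n".encode("latin-1")
out.write_binary(diff_line)

data = out.buffer.getvalue()
```

The resulting buffer mixes UTF-8 text with bytes that are not valid UTF-8,
which is why a text-only buffer would not be suitable here.
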
As an additional fix, iterate file groups in sorted order. This results in
stable output and makes test cases in our tests/ subdirectory reproducible.
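
As a rough illustration of the sorted iteration, assuming the groups are kept
in a mapping keyed by (group name, parameters). The group names and paths
below are hypothetical.

```python
# Hypothetical mapping from (group name, parameters) to the paths it matched.
groups = {
    ("file", ()): ["file1", "file2"],
    ("another", ()): ["dir/file3"],
    ("bugtracker", ()): ["dir2/file5"],
}

# Iterating over sorted keys removes any dependence on container order.
ordered = [name for (name, _params) in sorted(groups)]
```
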
Tested with Python 3.7.5, which is the version I currently use in my SVN
development setup. Tests with newer versions are welcome.
* tools/hook-scripts/mailer/mailer.py:
Drop Python 2-specific imports. Adjust imports as per 2to3.
(main): Decode arguments from UTF-8 to string.
(OutputBase:write): Encode string to UTF-8 and pass to write_binary().
OutputBase implementations now need to provide a self.write_binary
member which implements a write() method for binary data.
(MailedOutput): The email.Header module is gone; use email.header instead.
Likewise, replace use of email.Utils with email.utils.
(SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
We cannot use StringIO since diffs may contain data in arbitrary encodings.
(StandardOutput): Provide self.write_binary in terms of stdout.buffer.
(PipeOutput): Provide self.write_binary in terms of pipe.stdin.
(Commit): Decode log message and paths from UTF-8 to string, and iterate
path groups from mailer.conf in sorted order.
(Lock): Decode directory entries from UTF-8 to string. Encode paths back
to UTF-8 when we ask libsvn_fs for a lock on a path.
Iterate path groups from mailer.conf in sorted order.
(DiffGenerator): Decode repository paths from UTF-8 to string.
(TextCommitRenderer): Decode author, log message, and path from UTF-8 to
string. Write diff data via write_binary, bypassing the re-encoding step.
(Config): Decode paths from UTF-8 to string before matching them against
regular expressions. Also decode the repository directory path from UTF-8.
* tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
File groups are now provided in stable sorted order. This should fix
spurious test failures in the future.
* tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
integers and pass binary data instead of strings into libsvn_fs.
|