NAME

Mail::SpamAssassin::Plugin::AutoLearnThreshold - threshold-based discriminator for Bayes auto-learning


SYNOPSIS

  loadplugin     Mail::SpamAssassin::Plugin::AutoLearnThreshold


DESCRIPTION

This plugin implements the threshold-based auto-learning discriminator for SpamAssassin's Bayes subsystem. Auto-learning is a mechanism whereby high-scoring messages (as spam) or low-scoring messages (as non-spam) are fed into SpamAssassin's learning systems during scanning, without user intervention.

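Note that certain tests are ignored when determining whether a message should be trained upon:

- rules with tflags set to 'learn' (the Bayesian rules)
- rules with tflags set to 'userconf' (user configuration)
- rules with tflags set to 'noautolearn'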

Also note that auto-learning uses scores from either scoreset 0 or 1 (the scoresets that do not include Bayes), depending on which scoreset is in use during the message check. The auto-learn score is therefore likely to differ from the score reported by the message check.


USER OPTIONS

The following configuration settings are used to control auto-learning:

bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)

The score below which a message must fall to be fed into SpamAssassin's learning systems automatically as a non-spam message.

bayes_auto_learn_threshold_spam n.nn (default: 12.0)

The score above which a message must rise to be fed into SpamAssassin's learning systems automatically as a spam message.

Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.
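
As a minimal sketch, the two thresholds might be set in a site configuration file such as local.cf as shown here. The values are illustrative assumptions, not recommendations; they only demonstrate the syntax and keep the spam threshold above the minimum working value of 6.

  # Illustrative values only; tune for your own mail flow.
  # Learn as non-spam only when the auto-learn score is below -0.5.
  bayes_auto_learn_threshold_nonspam  -0.5

  # Learn as spam only when the auto-learn score is above 10.0.
  bayes_auto_learn_threshold_spam     10.0

Moving the thresholds closer together makes auto-learning fire on more messages, at the cost of a higher chance of training on a misclassified message.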

bayes_auto_learn_on_error (0 | 1) (default: 0)

With bayes_auto_learn_on_error off, auto-learning is performed even if the Bayes classifier already agrees with the new classification (i.e. it yielded BAYES_00 for a message we are now trying to teach it as ham, or BAYES_99 for one we are teaching it as spam). This is the traditional behaviour; it was chosen as the default to retain backwards compatibility.

With bayes_auto_learn_on_error turned on, auto-learning is performed only when the Bayes classifier had a different opinion from what the auto-learner is now trying to teach it (i.e. it made an error in judgement). This strategy may or may not produce better future classifications, but it usually works very well, while also preventing unnecessary overlearning and slowing database growth.
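
A minimal sketch of enabling learn-on-error mode, assuming the plugin is already loaded as in the SYNOPSIS and the line is placed in a site configuration file such as local.cf:

  # Train only on messages where the Bayes classifier disagreed
  # with the auto-learn decision.
  bayes_auto_learn_on_error  1

Leaving the option unset, or setting it to 0, keeps the traditional behaviour described above.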