spamassassin and bayes db


Jenny Hopkins

25 Feb 2021

8:47 a.m.

Hello,

I might need to join the spamassassin mailing list for this Q, but just in case anyone here can help first:

I've got a mail set-up where exim4 hands mail to spamassassin before delivering to mailboxes local on the server. Users put any missed spam into missed-spam folders, and misfiled ham into missed-ham folders, and a cron job runs regularly to allow sa-learn to run learning from these folders.

The problem is - it says it is learning, but it isn't. It is letting through handfuls of the same spam over and over. It's as if SA is running without paying any attention to the bayes-db, which would be weird as that is what I thought was a core integral part of spamassassin. Am I missing something in the basic setup?

An example header of a missed spam shows something like:

X-Spam_report: Spam detection software, running on the system "example.co.uk", has NOT identified this incoming email as spam.

-then a few lines further down:

3.5 BAYES_99  BODY: Bayes spam probability is 99 to 100% [score: 1.0000]
0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100% [score: 1.0000]

Any ideas? It's driving me nuts.

Thanks, Jenny


steve-ALUG＠hst.me.uk

25 Feb

5:57 p.m.

Hi!

Sometimes incoming mail has fake Spamassassin headers to try and fool you that it's not spam. I don't suppose that's the case in this case.

in /etc/spamassassin

has v320.pre got

loadplugin Mail::SpamAssassin::Plugin::Bayes

has v310.pre got

# AutoLearnThreshold - threshold-based discriminator for Bayes auto-learning
#
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold

has local.cf got

use_bayes 1
bayes_auto_learn 1

if so then I guess it should work.

I seem to remember there's some sort of "gotcha" about global or per-user filtering. As the main user of email on this machine, I'm really the only one reporting spam & ham. Spamassassin runs as a daemon as root.

I report spam using something similar to

cd /home/USERACCOUNT/mail
sudo sa-learn --mbox --spam SPAM_TRAINING_FOLDER

(replace mbox with other parameter if not using mbox format)

Because I've sudo-ed it, it goes to the root's training database.
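
One way to see which database a given sa-learn run actually updates is to compare the learned-message counters before and after training. The following is a minimal sketch, not a definitive recipe: `bayes_counts` is a hypothetical helper name, and it assumes the `sa-learn --dump magic` output has its usual column layout with the count in the third field.

```shell
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin and
# prints how many spam and ham messages that database has learned.
# Assumption: the count is the third whitespace-separated field and the
# line ends in "nspam" or "nham".
bayes_counts() {
    awk '$NF == "nspam" || $NF == "nham" { print $NF "=" $3 }'
}

# Usage (not run here): compare root's database with another user's.
#   sudo sa-learn --dump magic | bayes_counts
#   sudo -H -u USERACCOUNT sa-learn --dump magic | bayes_counts
```

If the counters that change after training belong to a different user than the one the daemon runs as, the training is probably going into a database the scanner never reads.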

I hope that's of help. If not, good luck!

Steve

Jenny Hopkins

6:42 p.m.

On Thu, 25 Feb 2021 at 17:58, steve-ALUG@hst.me.uk wrote:

...

Hi!

Sometimes incoming mail has fake Spamassassin headers to try and fool you that it's not spam. I don't suppose that's the case in this case.

in /etc/spamassassin

has v320.pre got

loadplugin Mail::SpamAssassin::Plugin::Bayes

has v310.pre got

# AutoLearnThreshold - threshold-based discriminator for Bayes auto-learning
#
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold

has local.cf got

use_bayes 1
bayes_auto_learn 1

if so then I guess it should work.

I got very excited when I saw a reply, but unfortunately the answer to all of the above is yes.

...

I seem to remember there's some sort of "gotcha" about global or per-user filtering. As the main user of email on this machine, I'm really the only one reporting spam & ham. Spamassassin runs as a daemon as root.

I report spam using something similar to

cd /home/USERACCOUNT/mail
sudo sa-learn --mbox --spam SPAM_TRAINING_FOLDER

(replace mbox with other parameter if not using mbox format)

Because I've sudo-ed it, it goes to the root's training database.

Hmm - I have a feeling my sa-learn is running as the user of the mailbox. I'll check that out in the morning.

...

I hope that's of help. If not, good luck!

I'll let you know the outcome of changing sa-learn to running as sudo.

Many thanks, Steve!

Jenny

steve-ALUG＠hst.me.uk

8:26 p.m.

A quick Google: if you want to run it as a per-user system, this post suggests how: https://www.nesono.com/node/391

Whereas this one starts from "Bayes is not working": https://stackoverflow.com/questions/42707466/spamassassin-bayes-not-working

It suggests checking the logs for errors and, in the case discussed, points to spamassassin components running with insufficient permissions. This is plausible. Log file checking is always a good place to start!

Good luck.

Steve

Ben Whyall

10:56 p.m.

Can you post a set of message headers here, anonymised? All I'm really interested in is the headers related to spamassassin etc.

It looks like you might not be getting a high enough score for the message to be treated as spam. Do you get a total score ?
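
The point about the total score can be checked by adding up the per-rule scores in the report. The sketch below is only an illustration: `sum_rule_scores` is a hypothetical helper, and it assumes each report line starts with a numeric score followed by an upper-case rule name, as in the two BAYES lines quoted earlier in the thread.

```shell
# Hypothetical helper: sums the per-rule scores from report lines of the
# form "<score> <RULE_NAME> <description>" read on stdin.
# Lines that do not start with a number and a rule name are ignored.
sum_rule_scores() {
    awk '$1 ~ /^-?[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[A-Z][A-Z0-9_]*$/ { total += $1 }
         END { printf "%.1f\n", total }'
}

# Usage (not run here), with the report saved to a file:
#   sum_rule_scores < report.txt
```

For the two lines quoted in the original post, 3.5 + 0.2 gives 3.7, so the Bayes rules on their own stay below SpamAssassin's default required_score of 5.0 (the thread does not say which threshold this server uses). A message can therefore hit BAYES_99 and still be reported as not spam unless other rules add to the total.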


Jenny Hopkins

26 Feb

10:01 a.m.

Hello,

Thanks so much for the responses. I've pasted up everything I know here, including headers: https://pastebin.com/nvZjqEzL

It looks as though I already changed sa-learn to run as root.

The last entry - with three spams placed in missed-spam then running the script, it reports 0 learning then deletes them (so it's looking in the right folder). The bayes_journal - I tried finding out about that error message and could only come up with that it was created on the fly.
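
A cron script can make the "0 learning" case visible by pulling the two numbers out of sa-learn's summary line. This is a sketch under assumptions: `learn_summary` is a hypothetical helper, and the summary line is assumed to have the form shown in the comment.

```shell
# Hypothetical helper: reads sa-learn's summary line on stdin, e.g.
#   Learned tokens from 0 message(s) (3 message(s) examined)
# and prints "learned=N examined=M". Other lines are ignored.
learn_summary() {
    sed -n 's/^Learned tokens from \([0-9][0-9]*\) message(s) (\([0-9][0-9]*\) message(s) examined).*$/learned=\1 examined=\2/p'
}

# Usage (not run here):
#   sa-learn --spam SPAM_TRAINING_FOLDER | learn_summary
```

A result such as learned=0 examined=3 suggests sa-learn did read the messages but skipped them, which commonly happens when the database it opened already records them as learned. It does not by itself show that this is the same database the scanner uses.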

Thanks, Jenny

steve-ALUG＠hst.me.uk

8:04 p.m.

On 26/02/2021 10:01, Jenny Hopkins wrote:

...

Hello, Thanks so much for the responses. I've pasted up everything I know here, including headers: https://pastebin.com/nvZjqEzL

It looks as though I already changed sa-learn to run as root.

Interesting. Your database is in a different place to mine. Mine's in /root/.spamassassin.

Also, I'm not using sa-exim but tweaks inside the exim config file. The sa-exim website says it hasn't been maintained since 2006.

See here: http://marc.merlins.org/linux/exim/sa.html That page also describes other ways of integrating spamassassin directly into exim.

Looking at this https://github.com/docker-mailserver/docker-mailserver/issues/365

I surmise that your spamassassin is running as user spamassassin. I think the alternative is running it as root.

Look at

sudo ps -Af | grep spamassassin

what user is it running as?

If it's NOT running as root, then sudo sa-learn will probably not work, because it will update root's database, rather than the spamassassin user's one in /var/lib/spamassassin.

If you're going to continue with the non-root user (I'm guessing it's "spamassassin"), then you'll have to adjust your learn script, perhaps to something like

sudo -H -u spamassassin sa-learn .....

or use sa-learn --dbpath SOMEPATH
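
The check for the daemon's user can be scripted so that the learn job runs as the same account. A minimal sketch, with assumptions stated: `spamd_users` is a hypothetical helper, the daemon is assumed to show up with "spamd" somewhere in its command line, and long user names may appear truncated or as a numeric UID in ps output.

```shell
# Hypothetical helper: reads `ps -eo user=,args=` output on stdin and prints
# each distinct user owning a process whose command line mentions spamd.
# The [s] bracket stops this awk process from matching its own command line.
spamd_users() {
    awk '/[s]pamd/ { print $1 }' | sort -u
}

# Usage (not run here):
#   ps -eo user=,args= | spamd_users
#   sudo -H -u THAT_USER sa-learn --spam SPAM_TRAINING_FOLDER
```

Running sa-learn with `sudo -H -u` sets HOME to the target user's home directory, so the default database location resolves for that user rather than for root.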

Hmmm....

puzzling

Steve

Ben Whyall

27 Feb

10:23 p.m.

So it looks like there is a problem writing to the bayes_journal, I am guessing because of the user that spamd is running as.

However, it's definitely running Bayes on that message, as it's given it a score of .492334.

It's also only just falling short of the threshold to be detected as spam, scoring 4.6 instead of 4.9.

Looking at your config, you haven't changed any of the rule scores.

It may help you to adjust the X-Spam-Status header to include the scores alongside the rule names too.

Ben

Jenny Hopkins

8 Mar

11:50 a.m.

Hello All,

So I sat down and tried making spamassassin write to individual databases in /home/$user/spamassassin. I followed this guide that Steve suggested: https://www.nesono.com/node/391

I adapted it to fit the Debian structure, i.e. none of the vhome thing. However, it all fell over when I tried to amend the entry in /etc/default/spamassassin as suggested:

OPTIONS="--create-prefs --max-children 5 --helper-home-dir --virtual-config-dir=/vhome/users/%u/spamassassin -x -u vmail"

Again, I tried to adapt this for Debian, but %u wasn't picked up, so sa-learn couldn't locate the dbs. It also fell over when I tried to tell spamassassin that the bayes_path was in each home directory.

Any ideas before I completely give up?

Many thanks for your continued patience. Jenny
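
On the %u problem: the %u placeholder in --virtual-config-dir is expanded by spamd for the user name it is given per message, and sa-learn does not read spamd's OPTIONS, so a learn script would need each user's database path spelled out. The sketch below only prints the commands it would run. `per_user_learn_commands` is a hypothetical helper, and the directory layout and the "bayes" file prefix are assumptions, not something the thread confirms.

```shell
# Hypothetical dry-run helper: for every directory under the given base that
# contains a missed-spam folder, print the sa-learn command that would train
# that user's own database. Nothing is executed.
# Assumed layout (not confirmed in the thread):
#   <base>/<user>/missed-spam          folder of missed spam
#   <base>/<user>/spamassassin/bayes   bayes_path-style prefix for --dbpath
per_user_learn_commands() {
    base=$1
    for dir in "$base"/*/; do
        # Skip users without a missed-spam folder (and an unmatched glob).
        [ -d "${dir}missed-spam" ] || continue
        user=$(basename "$dir")
        printf 'sudo -H -u %s sa-learn --dbpath %s --spam %s\n' \
            "$user" "${dir}spamassassin/bayes" "${dir}missed-spam"
    done
}

# Usage (not run here): review the output before executing any of it.
#   per_user_learn_commands /home
```

Printing the commands first makes it easy to confirm that each path points where spamd expects that user's database before any training is done.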



main@lists.alug.org.uk
