#10417 spam on various mailing lists
Closed: Fixed with Explanation 2 years ago by smooge. Opened 2 years ago by zsun.


Metadata Update from @smooge:
- Issue assigned to smooge

2 years ago

Metadata Update from @smooge:
- Issue priority set to: None (was: Needs Review)
- Issue tagged with: high-gain, low-trouble, security

2 years ago

Other lists hit:

fedocal
ibus-sayura-users
flockinfo
eng-service
irc-support-sig
flock-planning
flock-attendees
env-and-stacks
badges
logistics
classroom
cwg
fonts

Thanks. I have deleted the singletons. The other lists have thousands of spam messages on them, which will need a script to work through automatically.

OK, we have mostly cleaned this up.

  • Users from that domain (37) were all disabled.
  • That domain is blocked from creating more accounts.
  • We deleted a lot of the spam from the archives.

Sadly, there's still a bunch on these 4 lists:

1360 tinykdump
1338 ibus-sayura-users
1128 fedocal
688 matahari

Clicking delete 4,000 times doesn't scale very well. ;( We need to poke the DB or create a script to mass-delete these.
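As a starting point for such a script, here is a minimal sketch of the selection step only: picking out messages whose From: address is in the blocked domain. It uses only the Python standard library and does not touch mailman3 or its database; the domain name `spam.example` and the function names are placeholders, since the ticket does not name the domain. The actual deletion would still have to go through whatever interface the archiver provides.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical placeholder: the ticket does not name the spammers' domain.
BLOCKED_DOMAIN = "spam.example"


def sender_domain(raw_message):
    """Return the lower-cased domain of the From: address, or '' if there is none."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def select_for_deletion(raw_messages, blocked_domain=BLOCKED_DOMAIN):
    """Return the Message-ID of every message sent from the blocked domain."""
    ids = []
    for raw in raw_messages:
        if sender_domain(raw) == blocked_domain:
            ids.append(message_from_string(raw).get("Message-ID", ""))
    return ids
```

The resulting list of Message-IDs could then be reviewed by hand before anything is deleted.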

We need to ask @misc whether they have a script to mass-delete archives on their mailman3, as doing it via clicks is really slow. It may have to wait until we upgrade, as I think I see some options in 3.3, which we don't have yet, that would fix it.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee

2 years ago

Are these lists set to allow posts from non-subscribers, or are they subscribing? If we do have any lists that are open-posting, I think we should change that, and ideally remove the option to change it back.

On Sun, 12 Dec 2021 at 13:07, Matthew Miller pagure@pagure.io wrote:

> Are these lists set to allow posts from non-subscribers, or are they subscribing? If we do have any lists that are open-posting, I think we should change that, and ideally remove the option to change it back.

The accounts were all created as valid users in the Fedora project system, and they then logged into the mailman3 web interface and posted the messages. The main lists they filled are ones which should probably be closed completely, as the only traffic on them is spam. [The fedocal and flock-planning lists had 7 years of held spam from non-members and 2000 emails from the spammers after Kevin placed the lists under emergency moderation.]

To reply, visit the link below or just reply to this email
https://pagure.io/fedora-infrastructure/issue/10417

--
Stephen J Smoogen.
Let us be kind to one another, for most of us are fighting a hard
battle. -- Ian MacClaren

Running the Chinese text below the links through google translate was fascinating. It appears to be snippets of Chinese poetry randomly concatenated.

It would be ideal to use some type of open source machine learning algorithm for this.

Here is what I have found so far:
https://awesomeopensource.com/projects/machine-learning/spam-filtering

spammy seems to be the best/fastest, but it hasn't been ported to Python 3 yet:
https://release-monitoring.org/project/241648/

I use pydspam, which uses the milter API for sendmail or postfix and has been ported to Python 3. The downside is that it depends on dspam, which has been dropped from Fedora, probably because upstream seems to be dead.

I like the design of dspam because it is simple and elegant: a) tokenize the input with special attention to email headers (e.g. header names are tokens); b) keep a database with spam/ham stats by token; c) do a simple Bayes calculation based on the tokens of the new message and the stats from the database.
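To make those three steps concrete, here is a rough sketch in plain Python. It is not dspam's or pydspam's actual code or API: the `Header*word` token format, the `TokenStats` class, the in-memory counters standing in for the database, and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from email import message_from_string


def tokenize(raw):
    """(a) Header names are tokens; header words are scoped by header name."""
    msg = message_from_string(raw)
    tokens = []
    for name, value in msg.items():
        tokens.append(name)
        tokens.extend(f"{name}*{word}" for word in str(value).split())
    body = msg.get_payload()
    if isinstance(body, str):  # ignore multipart bodies in this sketch
        tokens.extend(body.split())
    return tokens


class TokenStats:
    """(b) In-memory stand-in for the per-token spam/ham database."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, raw, label):
        # Count each token once per message.
        self.counts[label].update(set(tokenize(raw)))
        self.messages[label] += 1

    def spam_probability(self, raw):
        """(c) Naive Bayes over the tokens present, with add-one smoothing."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in set(tokenize(raw)):
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

After calling `train()` on a few messages labelled "spam" or "ham", `spam_probability()` returns a value between 0 and 1 for a new message. A real deployment would need persistent storage and far more training data than a toy example.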

pydspam wraps libdspam in a python API.

spammy seems to be the same idea (naive Bayes) as dspam. If it needs porting to py3, that might be something I could do.

https://github.com/tasdikrahman/spammy/issues/9

Reviewing the spammy system, it does not seem to have any special treatment for email headers. A big part of the effectiveness of dspam was that a word, e.g. "FREE", in the Subject header was a different token than the same word in another header or the message body.
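A small illustration of the difference, using only the standard library. This is neither spammy's nor dspam's real tokenizer; the `Header*word` token format and both function names are made up for the example.

```python
from email import message_from_string


def _body_words(msg):
    """Split a plain-text body into words; multipart bodies are skipped here."""
    body = msg.get_payload()
    return body.split() if isinstance(body, str) else []


def flat_tokens(raw):
    """A word is the same token wherever it appears."""
    msg = message_from_string(raw)
    words = [w for _, value in msg.items() for w in str(value).split()]
    return words + _body_words(msg)


def scoped_tokens(raw):
    """A word in a header becomes a distinct token tagged with the header name."""
    msg = message_from_string(raw)
    tokens = [f"{name}*{w}" for name, value in msg.items() for w in str(value).split()]
    return tokens + _body_words(msg)
```

With scoped tokens, "FREE" in the Subject header and "FREE" in the body accumulate separate spam/ham statistics, whereas the flat tokenizer merges them into one count.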

I suppose that could be added later.

I am going to close this ticket, as the major issue is 'fixed' and the longer-term ones need to be filed as a separate initiative.

Metadata Update from @smooge:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

2 years ago
