Protection against spam #299

sirex · 2020-12-29T11:31:27Z

I just finished cleaning up spam users. It looks like Spirit does not have any protection against spam: I had to clean up about 6000 spam users with random usernames and random emails.

I will try to hack something together to add protection against spam, but Spirit should definitely have this built in.

Also, since Spirit does not have good moderation tools, I had to delete the spam users directly from the Python shell. Now I have incorrect comment counts, and I get errors when Spirit tries to jump to a page that no longer exists. It would be nice to have a script that updates all of that.

sirex · 2020-12-29T11:38:54Z

For now I used the following script, which generates Python code that deletes users along with all their content:

from django.contrib.auth.models import User
from textwrap import wrap


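# One entry per recent user: the pk, with username and email as a trailing comment.
lines = []
for user in User.objects.order_by('-date_joined')[:100]:
    lines += ['', f"{user.pk:>6},  # {user.username:<20} {user.email:<40}"]
    # Include the user's latest comment (if any) to make spam easier to spot.
    last_comment = getattr(user.st_comments.order_by('-date').first(), 'comment', None)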
    if last_comment:
        lines += [''] + wrap(
            last_comment,
            initial_indent=(' ' * 8) + ' # ',
            subsequent_indent=(' ' * 8) + ' # ',
            max_lines=8,
            width=72,
        )

lines = '\n'.join(lines)
print(f"\nUser.objects.filter(pk__in=[\n{lines}\n]).delete()\n\n")

I then review the generated code, remove all non-spam users from the list, and run it.

nitely · 2020-12-29T13:21:48Z

Deleting users is probably never a good idea. They can just register again with the same email. You should deactivate their accounts instead. Hard-deleting topics and comments will break notifications, bookmarks, and possibly other things.

There is a very simple registration protection that may help against bots, but humans will bypass any protection anyway. A few things may help, like not allowing new users to post links, or having a queue of messages that trusted users/mods can review and approve (à la Stack Overflow); but things like captchas are annoying and useless against humans.
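
A minimal sketch of the "no links for new users" idea mentioned above, in plain Python. Nothing here is Spirit's API: the `Author` class, the `may_post` function, and the `min_approved` threshold are all hypothetical names chosen for illustration, and the link pattern is deliberately simple.

```python
import re
from dataclasses import dataclass

# Matches http(s) URLs and bare "www." links; deliberately simple.
LINK_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


@dataclass
class Author:
    """Hypothetical stand-in for a forum user; not a Spirit model."""
    approved_comments: int
    is_moderator: bool = False


def may_post(author: Author, text: str, min_approved: int = 3) -> bool:
    """Return False when an untrusted author tries to post a link."""
    if author.is_moderator or author.approved_comments >= min_approved:
        return True
    return LINK_RE.search(text) is None
```

A rejected post could be sent to a review queue instead of being dropped, which would combine both suggestions.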

There should be a way to soft delete all topics/comments by user.

I wonder, what kind of spam did you get?

sirex · 2020-12-29T14:35:00Z

Added reCAPTCHA, will see if it helps: sirex/ubuntu.lt@bb00c03...959453e

The spam is generated not by humans but by bots; somehow they get through the email verification easily. And email addresses are not the issue: they use a random email every time.

Here are a few examples:

username               email
MartinHoike            [email protected]
Ererticeque            [email protected]
SpalSauro              [email protected]
NopdiepZicietS         [email protected]
Unilypespoiziz         [email protected]
prathyantash           [email protected]
bragreetweda           [email protected]
absoluteweddingstudio  [email protected]
GonnerturneCep         [email protected]
JasonMix               [email protected]

Quite quickly I reached the point where most users were spam users: there were about 6000 real users and more than 6000 spam users.

As for messages, some topics have 3 posts by real users and 1000 posts by spam users. If I marked those posts as deleted, I would see endless pages of deleted posts.

So this is a case where most of the content is generated by spam users, and there is no point in keeping that content in the database.

I hope reCAPTCHA will help. The next thing is to somehow fix the incorrect comment counts and the redirects to nonexistent pages.

Forum in question is https://ubuntu.lt/.

The registration form with reCAPTCHA looks like this: https://ubuntu.lt/user/register/

sirex · 2020-12-29T14:42:23Z

I would be really surprised if all this spam were generated by real humans. My guess is that spam bots have just become very sophisticated. Without serious anti-spam protection, they can ruin a community forum in days, and the ubuntu.lt community forum has existed for more than 15 years.

nitely · 2020-12-29T15:04:49Z

> The registration form with reCAPTCHA looks like this: https://ubuntu.lt/user/register/

Am I supposed to see the captcha right away? I see there is a "captcha" label, but that's it... there is no captcha.

> I would be really surprised if all this spam were generated by real humans.

If they are bots then the captcha should help. Let me know how it goes; it may be worth adding as an optional feature.

> As for messages, some topics have 3 posts by real users and 1000 posts by spam users. If I marked those posts as deleted, I would see endless pages of deleted posts.

That's a good point.

sirex · 2020-12-29T15:42:10Z

> Am I supposed to see the captcha right away? I see there is a "captcha" label, but that's it... there is no captcha.

There are several reCAPTCHA versions. I'm using the latest, v3, which somehow detects bots automagically without showing an image or anything like that. Older versions show some images and ask you to say what is in them.
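
For context on how v3 works without a visible challenge: the page sends a token to the server, the server checks it against reCAPTCHA's siteverify endpoint, and the JSON answer carries a `score` between 0.0 and 1.0 plus the `action` name the page reported. Below is a minimal sketch of the server-side decision, assuming the response has already been fetched and decoded into a dict. The function name and the `"register"` action name are hypothetical, and the 0.5 threshold is only a common starting point, not a value taken from the ubuntu.lt setup.

```python
def passes_recaptcha_v3(result: dict, expected_action: str,
                        min_score: float = 0.5) -> bool:
    """Decide whether a decoded siteverify response looks like a human.

    `result` is the parsed JSON returned for the submitted token.
    """
    # The token itself must be valid.
    if not result.get("success"):
        return False
    # The action should match the form the token was issued for.
    if result.get("action") != expected_action:
        return False
    # Higher scores mean the request is more likely to come from a human.
    return float(result.get("score", 0.0)) >= min_score
```

A registration view would reject the form, or fall back to a manual review step, when this returns `False`.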

sirex · 2020-12-29T20:38:32Z

I think I managed to fix the comment count and last active date for topics with this query:

from spirit.topic.models import Topic
from spirit.comment.models import Comment
from django.db.models import Case, When, Value, Exists, OuterRef, Subquery, Count, Max


Topic.objects.update(
    comment_count=Case(
        When(
            Exists(
                Comment.objects.
                filter(
                    topic_id=OuterRef('id'),
                    is_removed=False,
                    action=Comment.COMMENT,
                )
            ),
            then=Subquery(
                Comment.objects.
                filter(
                    topic_id=OuterRef('id'),
                    is_removed=False,
                    action=Comment.COMMENT,
                ).
                values('topic_id').
                order_by('topic_id').
                annotate(comment_count=Count('*')).
                values('comment_count')[:1]
            ),
        ),
        default=Value(0),
    ),
    last_active=Subquery(
        Comment.objects.
        filter(topic_id=OuterRef('id')).
        values('topic_id').
        order_by('topic_id').
        annotate(last_active=Max('date')).
        values('last_active')[:1]
    ),
)

sirex · 2020-12-29T21:26:14Z

The last fix is for bookmarks pointing to a comment number that no longer exists. The fix is not perfect: it only ensures that the comment number is not greater than the total number of comments in a topic. It does not guarantee that the bookmark points to the correct last-seen comment, but it does ensure that the user does not end up on a 404 page when the comment number points to a nonexistent comment.

The query used was:

from spirit.comment.bookmark.models import CommentBookmark
from django.db.models import Case, When, OuterRef, Subquery, Count, F, PositiveIntegerField


CommentBookmark.objects.update(
    comment_number=Subquery(
        CommentBookmark.objects.
        filter(id=OuterRef('id')).
        values('user_id', 'topic_id', 'comment_number').
        order_by('user_id', 'topic_id', 'comment_number').
        annotate(
            total_comments=Count('topic__comment'),
        ).
        annotate(
            comment_number_=Case(
                When(
                    comment_number__gt=F('total_comments'),
                    then=F('total_comments'),
                ),
                default=F('comment_number'),
                output_field=PositiveIntegerField(),
            ),
        ).
        values('comment_number_')[:1],
    ),
)
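
The rule this query applies to each bookmark can be restated in plain Python, which may help when checking a few rows by hand. This is only an illustration of the clamping logic; `clamp_bookmark` is a hypothetical helper, not part of Spirit.

```python
def clamp_bookmark(comment_number: int, total_comments: int) -> int:
    """Cap a bookmark at the topic's current number of comments."""
    return min(comment_number, total_comments)


# (bookmarked comment number, comments left in the topic) for a few sample rows
samples = [(950, 4), (2, 4), (4, 4)]
clamped = [clamp_bookmark(number, total) for number, total in samples]
```

Bookmarks that already point inside the topic are left unchanged; only those past the end are pulled back to the last comment.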

sirex · 2020-12-29T21:44:17Z

So, in summary, to improve protection against spam Spirit needs the following things:

Email verification no longer protects against spam bots, so Spirit should also provide other anti-spam tools, for example new user confirmation, a question challenge for new users, or integration with external anti-spam services like reCAPTCHA.
There should be a separate moderation page for new users, where moderators can see all new users and either approve the registration or mark the user as spam. If a user is marked as spam, then all of that user's comments are also marked as spam. Currently /st/admin/user/ does not have anything like this; it does not even have a link to all of a user's comments. The only option is to go through the user's comments manually and remove them one by one, which in my case would take forever.
If a comment is marked as spam, it should not show up in the topic; it should disappear completely, with the bookmark number, topic comment count, and last active date updated to match.

If these features were available, spam bots would not be able to attack the forum at such a massive scale as happened with the ubuntu.lt community forum.
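
To make the second and third points concrete, here is a rough in-memory sketch of what "mark user as spam" could do: flag every comment by that user and recompute the affected topics' counters from what remains visible. `SketchComment`, `SketchTopic`, and `mark_user_as_spam` are hypothetical stand-ins for illustration, not Spirit models or functions, and bookmarks are left out to keep it short.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SketchComment:
    author: str
    date: datetime
    is_spam: bool = False


@dataclass
class SketchTopic:
    comments: List[SketchComment] = field(default_factory=list)
    comment_count: int = 0
    last_active: Optional[datetime] = None

    def refresh(self) -> None:
        """Recompute counters from the comments that are still visible."""
        visible = [c for c in self.comments if not c.is_spam]
        self.comment_count = len(visible)
        self.last_active = max((c.date for c in visible), default=None)


def mark_user_as_spam(username: str, topics: List[SketchTopic]) -> int:
    """Flag every comment by `username`; return how many were flagged."""
    flagged = 0
    for topic in topics:
        hits = [c for c in topic.comments
                if c.author == username and not c.is_spam]
        for comment in hits:
            comment.is_spam = True
        if hits:
            # Only topics that actually changed need their counters rebuilt.
            topic.refresh()
            flagged += len(hits)
    return flagged
```

Because the counters are derived from the visible comments rather than decremented one by one, running the operation twice is harmless.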

sirex · 2021-02-19T12:17:35Z

Now more than a month has passed, and during that time I found 4 new spam users; it looks like at least two of them were created manually. So it looks like reCAPTCHA did the job.
