Sat, 03 May 2008
reCAPTCHA
CAPTCHA is a name for programs designed to test whether they are being used by another computer (a "bot") or by a human. They do this by asking the user to do a task that presumably can't be done by a computer; for example, reading obscured words.
reCAPTCHA is a well-known CAPTCHA service that takes images from the Internet Archive's book scanning project. Some words are hard
But as for spam in MediaWiki, it seems that simply using the blacklists mentioned earlier is not enough; the Reed Free Culture wiki (for example) has been spammed beyond recognition with link spam. So I am deploying reCAPTCHA to show a CAPTCHA to users when they register, and to anonymous users who try to add links.
P.S. Attentive people may notice that I have a personal link to the Internet Archive's book scanning project. That has nothing to do with my liking reCAPTCHA. (-:
Tue, 22 Apr 2008
MediaWiki antispam: SpamBlacklist
I end up maintaining a bunch of MediaWiki wikis. So far, here is what I do to keep them low in spam, high in ham.
Note that I have a bias toward accepting anonymous edits.
Use SpamBlacklist
Wikimedia maintains a list of bad domains that spammers link to. The famous chongqed.org maintains a similar list. The SpamBlacklist extension prevents saves that contain URLs matching patterns listed in a blacklist. Blocking this way is important even if anonymous edits are disallowed, because many bots seem to register for accounts. It is also important even if CAPTCHAs are enabled, because there seem to be spammers who sit at their computers and spam by hand, or who solve the CAPTCHAs themselves and then let their bots run (not that I've ever done that....).
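To make the idea concrete, here is a rough Python sketch of pattern-based blocking. It is not the extension's actual code (SpamBlacklist is written in PHP); the function names and the example domains are made up for illustration, and I am assuming the blacklist is a text file with one regex fragment per line and "#" starting a comment.

```python
import re

# Loose URL matcher, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s<>\"\]]+", re.IGNORECASE)


def load_patterns(lines):
    """Strip comments and blank lines, returning the regex fragments."""
    patterns = []
    for line in lines:
        fragment = line.split("#", 1)[0].strip()
        if fragment:
            patterns.append(fragment)
    return patterns


def find_blocked_urls(text, patterns):
    """Return the URLs in `text` whose host part matches a blacklist fragment."""
    if not patterns:
        return []
    # Join all fragments into one alternation applied to the host part.
    combined = re.compile(
        r"https?://[a-z0-9.\-]*(?:" + "|".join(patterns) + ")",
        re.IGNORECASE,
    )
    return [url for url in URL_RE.findall(text) if combined.search(url)]
```

A save would be refused whenever find_blocked_urls() returns anything for the submitted text.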
To use it, just:
- Check it out of their svn
- Configure a cron job to get the Chongqed and MW blacklists locally, and configure $wgSpamBlacklistFiles as appropriate.
- Don't forget to read the official docs.
- Caution: The Chongqed list blocks lots of .edu domains. I "grep -v" them out.
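The "grep -v" step could look something like this in Python. This is only a sketch: the function name is hypothetical, and I am assuming each blacklist line is a regex fragment in which a .edu domain shows up as ".edu" or "\.edu" at the end of a domain label.

```python
import re

# Matches ".edu" (escaped or not) when it is not followed by more
# label characters, so "education-spam" style entries are left alone.
EDU_RE = re.compile(r"\.edu(?![a-z0-9\-])", re.IGNORECASE)


def drop_edu_entries(lines):
    """Return the blacklist lines that do not mention a .edu domain."""
    return [line for line in lines if not EDU_RE.search(line)]
```

The cron job would run the downloaded Chongqed list through this filter before writing the local file that $wgSpamBlacklistFiles points at.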