GSoC/GCI Archive
Google Code-in 2014 Wikimedia Foundation

Enhance the BounceHandler extension to differentiate between permanent and temporary bounces effectively

completed by: m4tx

mentors: Kunal, Tony Thomas

What you need to do in a single line: Write a PHP regular expression that extracts the SMTP code from the bounce email, and call the bounce processing scripts only if the bounce is a hard (permanent) bounce.

 

The BounceHandler extension (see https://www.mediawiki.org/wiki/Extension:BounceHandler) is used in MediaWiki to handle email bounces. It generates a VERP (https://www.mediawiki.org/wiki/VERP) 'Return-Path' address header for every email sent from the wiki and processes each incoming bounce email to take action on the failing recipient. The mail server HTTP POSTs the bounce email to the extension API, which strips the headers from the bounce and stores the bounce information in a table. If the number of bounces for a user exceeds a defined limit (say 3 in a week), the user's email address is marked as unconfirmed.
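The threshold rule described above can be illustrated with a minimal sketch. This is not the extension's code (BounceHandler is written in PHP and reads bounces from its own table); the function name should_unconfirm, its parameters, and the defaults of 3 bounces in 7 days are assumptions taken only from the "say 3 in a week" example.

```python
from datetime import datetime, timedelta


def should_unconfirm(bounce_times, now, limit=3, window=timedelta(days=7)):
    """Return True if more than `limit` bounces fall inside the window ending at `now`.

    `bounce_times` is a list of datetime objects, one per recorded bounce.
    All names and defaults here are hypothetical, for illustration only.
    """
    # Keep only bounces recorded during the last `window` before `now`.
    recent = [t for t in bounce_times if now - window <= t <= now]
    # "Exceeds a defined limit" is read here as strictly greater than the limit.
    return len(recent) > limit
```

With these defaults, three bounces in the past week leave the address confirmed, and a fourth within the same week would trigger the unconfirm action. Bounces older than the window are ignored.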

A bounce is an incoming delivery failure notification mail, and there are many points where the delivery can fail, for example:

  • DNS lookup failure (Permanent failure)
  • Network failure (Temporary failure)
  • Remote server could be overloaded (Temporary failure)
  • Remote server might have blacklisted wikimedia.org or wiki@wikimedia.org (Temporary failure)
  • Remote server could say example@gmail.com is a bad address (Permanent failure)
  • Remote server could say example@gmail.com is over quota (Temporary failure)

and many more. In each case, the mail server currently handling the transaction may originate a bounce message. Action needs to be taken only against permanent bounces, as those are recipient-specific. Acting on a number of temporary bounces (caused, for example, by a network error) could get a lot of users unsubscribed by mistake.

Currently, the only check is that a header X-Failed-Recipients: <failed_recipient@domain> exists (see https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/includes/ProcessBounceWithRegex.php#L38), which is taken to confirm a permanent bounce. This must be enhanced to read the failure SMTP codes (http://www.serversmtp.com/en/smtp-error) from every bounce email (every bounce has one) and use them to judge whether the bounce is temporary or permanent. Various advanced bounce handlers do this, and MediaWiki should too.
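As a rough illustration of the idea, the sketch below pulls a status code out of a bounce body and classifies it. It is written in Python for brevity, not in the extension's PHP, and it is not the solution that was merged. It assumes the bounce carries RFC 3464 style fields ("Diagnostic-Code: smtp; 550 ..." or "Status: 5.1.1"); real bounces vary, so a production pattern would need more fallbacks. The function name classify_bounce and both pattern names are hypothetical. In SMTP, a reply code starting with 5 signals a permanent failure and one starting with 4 a transient failure.

```python
import re

# Assumption: the bounce body contains RFC 3464 style per-recipient fields.
# Captures the three-digit SMTP reply code, e.g. "550".
DIAGNOSTIC_RE = re.compile(
    r"^Diagnostic-Code:[ \t]*smtp;[ \t]*([245]\d{2})\b",
    re.IGNORECASE | re.MULTILINE,
)
# Fallback: captures the class digit of an enhanced status code, e.g. "5" in "5.1.1".
STATUS_RE = re.compile(
    r"^Status:[ \t]*([245])\.\d{1,3}\.\d{1,3}\b",
    re.IGNORECASE | re.MULTILINE,
)


def classify_bounce(bounce_text):
    """Return 'permanent', 'temporary', or 'unknown' for a bounce body."""
    match = DIAGNOSTIC_RE.search(bounce_text) or STATUS_RE.search(bounce_text)
    if match is None:
        return "unknown"
    # The first digit of either capture is the failure class.
    first_digit = match.group(1)[0]
    if first_digit == "5":
        return "permanent"
    if first_digit == "4":
        return "temporary"
    return "unknown"
```

Only a 'permanent' result would then be handed on to the bounce processing step. Treating 'unknown' as "do nothing" is a design choice in this sketch, made so that an unparseable bounce never counts against a user. Both patterns use only syntax that PCRE also supports, so they should carry over to PHP's preg_match with the i and m modifiers.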

Skills / requirements: Basic to intermediate knowledge of PHP, so that you can read through the BounceHandler extension code at https://github.com/wikimedia/mediawiki-extensions-BounceHandler and get an idea of how the bounce email is processed by the 'bouncehandler' API. Reading the entire extension code is not required if you can get to the regular expression directly and test it on a sample bounce email. You should also have a basic idea of email, email bounces, SMTP, and mail servers.

Students are required to read Wikimedia's general instructions at https://www.mediawiki.org/wiki/Google_Code-in_2014#Instructions_for_GCI_students first.