TwoLogs
IT services and product development

Spam relay via web forms

Introduction

A while ago, a new form of spamming saw the light: your on-line web forms are misused to send spam in your name, using a technique called 'e-mail header injection'.  When your web form doesn't send out e-mails (e.g. if the data from your web form is entered in a database), this is just a (big) nuisance.  However, when your form does send out e-mails, your server might be sending out spam already, and as a result it could be blacklisted as a source of spam e-mails.

The following is an explanation of how this form of spamming works, giving you some behind-the-scenes insight on what is going on.  You can use this information to secure your own web form(s), and then test your web form using our form test page to check if it is secure.

More information on this issue can be read on e.g. Anders Brownworth's site and on SpamLinks; more information can also be found e.g. via Google.

A note beforehand: the information on this page is based on live observations and script programming insights, so it could be incorrect in some places.  I'm more of an ASP web developer, but I think I know enough PHP to give some PHP examples on this page as well.  If I'm wrong, please correct me by sending me an e-mail!  You can do so via the on-line contact form.

Part 1: What spam robots do

The spammers first need to know which web forms are vulnerable to this form of spamming.  They employ applications that automatically scan the Internet for web forms.  Once these applications find a web form, they test whether it is vulnerable.  These automatic applications will from now on be called 'spam robots' or 'spam bots'.

A visitor using a normal browser will see your on-line web form, fill in the form fields and then click the submit button.  The browser will then send this information to the script that runs behind the form on your server.  The spam robots do something similar: they scan your web form, determine which fields it holds, make up the information to put in the form fields and then send one or more test rounds of made-up information to your form's server script.  In this test phase the spam robot uses an e-mail address of the spammer himself, so that when the spammer finds an e-mail in his inbox, he knows your form is vulnerable.  Once a vulnerable form is discovered, it is attacked using the same technique, and the actual spamming starts.

The made-up information that the spam robots send to your web form script is specially formatted.  The special formatting is meant to bypass your script's security, so that your server script sends out spam in your name.  The name of this spamming technique ('e-mail header injection') describes what is actually going on: the specially formatted data can add e-mail headers to the e-mail your script sends out.  The current wave of spam robots tries to add a 'Bcc' header field (Blind carbon copy) to your outgoing e-mails.  This way your script not only sends the e-mail to the person it was intended for, but also to the other recipients the spammer has added.

Part 2: How web forms work

A web form contains fields, whose values can be specified by your visitors.  There are several types of fields that you can use: normal one-line text fields, multi-line text fields, checkboxes, etc.

When the user has filled in the fields and presses the 'Submit' (or 'Send') button, the browser gathers all values in the fields and sends them to the script on your server.  To do this, the browser has to know some additional information; it needs to know where to send the data to (the internet address of your script), and how to send the data.  The HTML source code for the form thus contains this information; a typical form will look somewhat like this:

<form action=... method=...>

What is specified after 'action=' is the address of your script.  This address may contain a path to your script, but it at least contains your script's filename.

What is specified after 'method=' is the way the browser needs to send the data.  Two options are commonly used: 'GET' and 'POST'.  'GET' sends the data to the server via parameters in the URL (e.g. http://www.example.com/scripts/formmail.php?name=john&email=john.doe@example.com).  When you use 'GET', you typically retrieve the values in your script like this:

ASP: Request.QueryString("formfieldname")
PHP: $_GET['formfieldname']
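
To make the 'GET' mechanism a bit more concrete, here is a small sketch of how the example URL above breaks down into field names and values.  It is written in Python purely for illustration (not ASP or PHP), and it only parses the URL; it doesn't contact any server.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from the text above.
url = "http://www.example.com/scripts/formmail.php?name=john&email=john.doe@example.com"

# Everything after the '?' is the query string holding the form data.
query = urlsplit(url).query

# parse_qs maps each field name to a list of values
# (a field name may occur more than once in a query string).
fields = parse_qs(query)

name = fields["name"][0]
email = fields["email"][0]
print(name, email)
```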

The 'POST' method however is used more commonly, and this method seems to be the target of these spam attempts.  When the method is 'POST', the browser gathers all information on your form, packs it into one chunk of data (the body of the request), and sends this chunk to your script.  In your script, you access this data in the following way:

ASP: Request.Form("formfieldname")
PHP: $_POST['formfieldname']

Note that other programs can do the same thing a standard browser can do; the way form data must be submitted to server scripts is standardized, so you can write a program that does just that: send form data to web scripts.  If that program follows the standards, your script will not even know the difference between the custom made program and a regular web browser.
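
As a rough sketch of how little such a program needs, the Python snippet below (Python is just my choice for illustration) builds the same kind of data chunk a browser would send with 'POST'.  The field names and values are made up, and the request is only constructed, never sent.  Note that the program is free to put a line separator inside a value, which a one-line text field in a browser would not allow.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Made-up form data; the field names are assumptions for this example.
form_fields = {
    "name": "John Doe",
    "email": "john@example.tld",
    # A value containing a line separator (\r\n) followed by extra text.
    "subject": "Hello\r\nBcc: spammer@example.tld",
}

# Encode the fields the way a browser does for a standard POST.
post_body = urlencode(form_fields)
print(post_body)

# Build (but do not send) the request a custom program could submit.
request = Request(
    "http://www.example.com/scripts/formmail.php",
    data=post_body.encode("ascii"),
    method="POST",
)

# What the server-side script would see after decoding the body.
received = parse_qs(post_body)
```

The line separator travels along as '%0D%0A' and is decoded again on the server, so the script receives the multi-line value exactly as the program composed it.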

This seems to be what the spam bots do: they browse the web for pages with forms on them.  When they find a form, they inspect the form's settings ('method', 'action' and what fields the form contains), and then they send faked form data to your script, hoping your script is vulnerable to spam attempts.  This means that changing the filename of your script may work for the moment, but if the spam bot gets smarter (maybe it is already smart enough?), it will automatically pick up your new filename when it visits your web page again, because your script's filename has to be specified in your web form.  Changing the filename of your script is thus a short-term solution at best.

Any JavaScript you add to the form to validate it before the form is submitted to your script runs in your visitor's browser.  But when you're not dealing with a regular visitor but with a spam bot instead, the JavaScript will not run (the spam bot only sends data; it doesn't actually visit and process the form page).  Thus, validating your form for correct input via JavaScript that runs in the browser will probably not suffice for dealing with spam bots.
Another thing you could do is use hidden form fields.  You use JavaScript (that only runs in your visitor's browser) to set the value of such a field to a certain 'password', e.g. when the user presses the 'Submit' or 'Send' button.  The server script can then check the value of this field; if it doesn't contain the password, the form could have been sent by a spam bot, but your visitor could also have JavaScript disabled in his browser.

Part 3: How mail scripts work

Your mail script gets called when the user hits the 'Submit' button on your web form, or when the spam bot feels like spamming your script for the thirtieth time today.  In your mail script, you are using some email sending object or function (which one depends on the scripting language and server platform you use).  In ASP e.g. you could use the 'CDONTS' object or the 'JMail.SMTPMail' object to send emails; in PHP you could be using the 'mail' function.

These email sending objects/functions need to know where to send the email to, what subject to give the email, what the contents of the email body need to be, etc.  Your script supplies this information to the email sending object/function.

Part 4: How email works

An email basically consists of two parts: the headers and the message body.

The message body contains the actual content of the email, and will be displayed by the email program of the receiver.  It can be either just a chunk of plain text, or it can be split into so-called mime parts (e.g. a plain textual part, an HTML formatted textual part, an attachment part, etc.).

The headers control how the email is handled by the email servers; they contain information on where to send the email to, the email's subject, who sent the email and when, etc.  Some of this information will also be displayed in your email program.

No matter what your script puts in the message body, it will not influence where the mail will end up; what your script puts in the body will be displayed to the email's recipient, but that's about it.  What your script puts in the email headers though does influence where the email will end up.  The headers are thus of primary concern to the spammers, since they need to control where your script sends the email to.  The message body is also of concern, since the spammers need to get a message across to their audience, but if the headers can't be manipulated, you have stopped the spammer dead in his tracks.

Part 5: How the email header injection works

Behind the scenes, the email headers are just one big chunk of text.  Each line in the headers contains a so-called header field and its value.  Header fields can e.g. be 'From:', 'Subject:', 'To:', and also 'Bcc:'.  These header fields can be set via the email sending object/function you use.  The message body is separated from the headers by a blank line.  To give you an idea, an (incomplete) example email could be:

From: "John Doe" <john@example.tld>
To: <me@example.tld>
Subject: What's going on?
Date: Mon, 12 Sep 2005 17:29:48 +0200

Hi,

Do you know what's going on?

Greets, John
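
To show that the blank line really is what separates the headers from the body, here is a small sketch that feeds the example email above to Python's standard email parser.  Python is only used for illustration here; your own mail component will have its own way of doing this.

```python
from email import message_from_string

# The (incomplete) example email from above, as one chunk of text.
raw_email = (
    'From: "John Doe" <john@example.tld>\n'
    "To: <me@example.tld>\n"
    "Subject: What's going on?\n"
    "Date: Mon, 12 Sep 2005 17:29:48 +0200\n"
    "\n"  # the blank line that ends the headers
    "Hi,\n"
    "\n"
    "Do you know what's going on?\n"
    "\n"
    "Greets, John\n"
)

message = message_from_string(raw_email)

# Every line before the first blank line became a header field.
header_names = message.keys()
subject = message["Subject"]

# Everything after the first blank line is the message body.
body = message.get_payload()

print(header_names)
print(body)
```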

When you specify a value for an email header to your email sending object/function, it will be used in the headers; the email object/function will add the correct email header line (field and value).  However, what will happen if the data you assign to the email header field contains more than just one line of text?  The email sending object/function might just insert what you specify in the headers, without checking it for validity first.  Suppose we were not to assign 'What's going on?' to the subject field, but 'What's going on?\r\nBcc: spammer@example.tld' (\r\n stands for a line separator).  The example email header above would then become:

From: "John Doe" <john@example.tld>
To: <me@example.tld>
Subject: What's going on?
Bcc: spammer@example.tld
Date: Mon, 12 Sep 2005 17:29:48 +0200

Hi,
...

Hey!  We just formed an email that will not only be sent to us, but that will be Bcc'ed to the spammer as well!
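
The effect can be reproduced with a few lines of code.  The sketch below is a deliberately naive header builder of my own making (the function name is hypothetical, and Python is only used for illustration); it mimics an email sending object/function that pastes its input into the headers without checking it.

```python
def build_headers_naively(sender, recipient, subject):
    """Glue the header lines together without validating the values."""
    lines = [
        f"From: {sender}",
        f"To: {recipient}",
        f"Subject: {subject}",
    ]
    return "\r\n".join(lines)


# The injected subject from the text: a normal subject, a line
# separator, and then an extra header field.
malicious_subject = "What's going on?\r\nBcc: spammer@example.tld"

headers = build_headers_naively(
    '"John Doe" <john@example.tld>',
    "<me@example.tld>",
    malicious_subject,
)

# Splitting on the line separator shows what a mail server would see.
header_lines = headers.split("\r\n")
for line in header_lines:
    print(line)
```

Three values went in, but four header lines come out; the fourth is the injected 'Bcc:' field.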

When you use the values of your form fields directly in your email sending object/function, you might be vulnerable to this type of spamming attack.  Note that regular users can't enter more than one line of text in the web form, since the input controls you use on your form are probably one-line controls.  However, since the spammer doesn't rely on your web form but sends the form data himself, he can send you whatever he wants: a single line of text, or multiple lines of text to try to inject an email header field.

Part 6: How to protect your script

If your email sending object/function doesn't check its input itself, it is vital that you do the validity checking in your own code before you assign anything to the email header fields.  Most scripts just use e.g. the following style of code, though:

ASP: oMail.From = Request.Form("email")
PHP: mail('me@example.tld', $_POST['subject'], ...);

If the form field value contains more than one line of text, no-one is going to notice, and your script might be vulnerable to spamming attempts (depending on whether the email sending object/function does some validity checking of its own).  There are several things you can do from here.

One thing you could do is make sure that everything you feed into your email sending object/function that is destined for the email headers is never more than one line of text.  The way to do this is to replace all line separators by e.g. a space.  Make sure that you not only replace all occurrences of '\n' (character code 10) by a space, but also all the '\r' characters (character code 13).  This will have the following effect on the above example spam attempt:

From: "John Doe" <john@example.tld>
To: <me@example.tld>
Subject: What's going on?  Bcc: spammer@example.tld
Date: Mon, 12 Sep 2005 17:29:48 +0200

Hi,
...

Note that the 'Bcc: ...' is not on its own line anymore, so the email server will not recognize it as a header field; the mail won't be sent to the spammer.  The subject will get a bit messy though: 'What's going on?  Bcc: spammer@example.tld' :)  If the spam bot targets another email header field, the results may be different; e.g. the 'Bcc: ...' text may get tagged on to the 'From: ...' line.  Such a message might still be delivered to you (depending on how forgiving the email server is), but more likely it will raise an error at the email server, effectively cancelling the message.
So, replacing line separators should be enough to stop this kind of spam attempt; you yourself still get to see (some of) the (test) messages though.  Note that replacing line separators in the email message body isn't necessary; only the information that will end up in the email headers needs to be protected.
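
A minimal sketch of this replacement, written in Python for illustration (the function name is my own; in ASP you would use Replace, in PHP str_replace):

```python
def flatten_header_value(value):
    """Replace every '\\r' (code 13) and '\\n' (code 10) by a space."""
    return value.replace("\r", " ").replace("\n", " ")


# The example spam attempt from above.
attempt = "What's going on?\r\nBcc: spammer@example.tld"

subject = flatten_header_value(attempt)
print("Subject: " + subject)
```

Both characters are replaced separately, so a '\r\n' pair turns into two spaces and the whole value stays on the 'Subject:' line.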

A more sophisticated thing you could do is check the form's data, see whether anything fishy is going on, and cancel sending the email if something is wrong with the data.  If your form only contains single-line text fields for the data that ends up in the email headers, a simple thing to do is to check for line separators.  A regular visitor cannot enter more than one line of text in his browser for such forms, so if you encounter a line separator, you know it was sent by a spam bot.
You could of course also check for other characteristics of a spam attempt.  What you can do is e.g. check the form values for the phrase 'Bcc:'.  This will of course stop the spam attempt, but note that your visitor can also enter that text in a valid way, e.g. via a subject:

Tried to Bcc: you, but it didn't work

Checking for 'Bcc:' thus does have its risks.  The same goes for detecting the MIME headers the spam bot wants to insert (although a typical MIME definition is complex enough not to be typed by regular users).  Another thing to note is that the spam bots will eventually get smarter and might start using e.g. the 'Cc:' or the 'To:' fields for the same purpose, so scanning for 'Bcc:' might only be a short-term solution as well.
Scanning your form input for specific senders is also a short-term solution at best; it appears the spammers use different email addresses from time to time, so it is impractical to keep your email blacklist up-to-date.  Blocking e-mails from certain IP addresses is also not an option, because the spam bots seem to be running from compromised hosts whose owners are unaware of it.
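
Both weaknesses of scanning for 'Bcc:' are easy to demonstrate.  The check below is a hypothetical one of my own (in Python, for illustration only); it flags the harmless subject shown earlier, yet misses an attempt that uses 'Cc:' instead.

```python
def looks_like_bcc_injection(value):
    """Naive check: does the text contain 'Bcc:' in any casing?"""
    return "bcc:" in value.lower()


# A valid subject typed by a regular visitor.
legit_subject = "Tried to Bcc: you, but it didn't work"

# An injection attempt that uses another header field.
cc_attempt = "Hello\r\nCc: spammer@example.tld"

flagged_legit = looks_like_bcc_injection(legit_subject)
flagged_attack = looks_like_bcc_injection(cc_attempt)
print(flagged_legit, flagged_attack)
```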

So, overall, checking for line separators (and possibly cancelling the email if you find them) seems the safest way to go: spam attempts get caught, and neither you nor your regular visitors are hindered in using your web form.
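
A sketch of such a check, again in Python for illustration only.  The field names in HEADER_FIELDS are assumptions; use whichever of your form fields end up in the email headers.

```python
# Form fields whose values end up in the email headers (assumed names).
HEADER_FIELDS = ("name", "email", "subject")


def contains_line_separator(value):
    """True if the text holds a '\\r' (code 13) or '\\n' (code 10)."""
    return "\r" in value or "\n" in value


def suspicious_fields(form):
    """Return the header-bound fields that hold a line separator."""
    return [
        field
        for field in HEADER_FIELDS
        if contains_line_separator(form.get(field, ""))
    ]


regular_form = {
    "name": "John Doe",
    "email": "john@example.tld",
    "subject": "Tried to Bcc: you, but it didn't work",
    "message": "Hi,\r\n\r\nDo you know what's going on?",
}
spam_form = dict(regular_form, subject="Hi\r\nBcc: spammer@example.tld")

print(suspicious_fields(regular_form))
print(suspicious_fields(spam_form))
```

Your script would cancel the email whenever the returned list is not empty.  The multi-line 'message' field is left alone on purpose, since it only ends up in the message body.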

On our web server, we run IIS 6.0 and use ASP with the CDO email object.  We checked what would happen if a spam bot sent invalid (multi-line) data, and it appeared that the CDO object already strips line separators from its input and replaces them with spaces.  We still patched our script, though :).  For now, we just replace line separators with spaces; our website has already been scanned rigorously by spam robots, but we like to know what techniques the spammers use when they try a new scan attempt.
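
Other platforms have mail components that guard their own input too, although they may react differently.  As a comparison only (this is Python's standard library, not CDO), the EmailMessage class refuses a header value that contains a line separator instead of silently replacing it.  The helper function name below is my own.

```python
from email.message import EmailMessage


def subject_is_accepted(subject):
    """Try to set the Subject header; report whether it was accepted."""
    message = EmailMessage()
    try:
        message["Subject"] = subject
    except ValueError:
        # Raised when the value contains '\r' or '\n'.
        return False
    return True


print(subject_is_accepted("What's going on?"))
print(subject_is_accepted("What's going on?\r\nBcc: spammer@example.tld"))
```

It is still worth testing your own component this way rather than assuming it checks anything.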