General ASSP Questions
ASSP overview questions and answers are here.
2003-Nov-14 1:48pm jhanna
Security Considerations
As a proxy, ASSP passes through most of your host mail transport's security features and vulnerabilities. It is also a running service that accepts connections from the public Internet. Perl in general has a good track record of few vulnerabilities. Because ASSP is a proxy, its only input/output is socket based, which limits its exposure. ASSP never opens files with user-supplied names and never shells out to the operating system.

In a *nix environment you will want to use ASSP's ability to run as a non-root user. You may also consider running it in a chroot jail. To do this, set the ChangeRoot variable in the configuration to your ASSP directory, and copy (or link) the /etc/protocols file to etc/protocols inside the ASSP directory.

The collections of spam and non-spam email may represent a security risk, and access to them should be restricted to mail administrators. The non-spam collection will certainly contain sensitive correspondence, and steps should be taken to protect it from anyone who doesn't require access.

Your administration password is transmitted with basic authentication (i.e. no encryption). If you plan to use the web interface from a host where you think sniffing is possible, I'd recommend installing stunnel (www.stunnel.org) to create an encrypted tunnel for your web-admin sessions. The password is stored in plain text in the assp.cfg file, so make sure the file permissions prevent unauthorized users from reading it. You can also add IP addresses to the Allow Admin Connections From configuration entry to restrict access to the admin interface, although IP addresses are quite easy to spoof. 2003-Sep-04 12:36pm jhanna
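On a *nix host you can check that assp.cfg is not readable by other users by looking at its permission bits. The helper below is a minimal Python sketch and is not part of ASSP. It assumes the file is named assp.cfg and that your policy is owner-only read/write (mode 0600), with ASSP running as the file's owner.

```python
import os
import stat


def config_is_private(path="assp.cfg"):
    """Return True if neither group nor others can read the file (POSIX bits)."""
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IRGRP | stat.S_IROTH))


def make_private(path="assp.cfg"):
    """Restrict the file to read/write for its owner only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

Run make_private() as the user that owns the file. On Windows, os.chmod only toggles the read-only flag, so use filesystem ACLs there instead.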
Theory of Operation
ASSP uses three complementary strategies to allow good mail and block unsolicited email: a whitelist, spambuckets, and a Bayesian filter.

Every message that passes through your SMTP server has a from address and one or more to addresses. Your SMTP server also knows whether the message is being sent from your local network (so relaying should be allowed) or is coming from outside (so it must be delivered to a local address). Your local users don't send unsolicited email (right?). The people they correspond with would only send them solicited email, and the people those correspondents email are also unlikely to send UCE. By monitoring these addresses, ASSP builds a web of trust. Local users are trusted, the addresses in the TO and CC fields of their mail are trusted, and so are the addresses in the TO and CC fields of mail from those trusted correspondents. Any email from these people is considered not-spam without further checking. (Note this is not a good strategy for virus containment, but it is a good strategy for UCE.)

Users of the local mail domains are not added to the whitelist; they are identified by being part of the local network. Many spammers forge a from address with the same domain as the to address, so it is important to avoid adding local addresses to the whitelist. After only a few days of operation you should see your whitelist grow to more than 1000 addresses. The whitelist helps not only in identifying non-spam, but also in building your database of non-spam emails. The whitelist is automatically saved every $UpdateWhitelist seconds (1 hour by default).

Spambuckets are addresses that receive only spam. They can be embedded in your web site or posted on Usenet. They also come about naturally when employees leave your organization: after their mail has bounced for a reasonable period of time, all mail received for these addresses can be considered unsolicited. Any email whose sender is not whitelisted and which is addressed to a spambucket is classified as spam.
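The whitelist and spambucket logic above can be sketched in a few lines of Python. This is a hypothetical model, not ASSP's code. "From the local network" is passed in as a boolean, whereas ASSP decides it from the connection. Local addresses are recognized by domain, and the names Whitelist and first_pass are invented for illustration. Mail that is local or from a whitelisted sender teaches the whitelist its non-local TO/CC recipients, which is how the web of trust grows.

```python
from dataclasses import dataclass, field


@dataclass
class Whitelist:
    local_domains: set                          # hypothetical, e.g. {"example.com"}
    addresses: set = field(default_factory=set)

    def is_local_address(self, address):
        return address.lower().rsplit("@", 1)[-1] in self.local_domains

    def learn(self, recipients):
        """Trust the TO/CC recipients of trusted mail, but never local addresses."""
        for addr in recipients:
            if not self.is_local_address(addr):
                self.addresses.add(addr.lower())


def first_pass(from_local_network, sender, recipients, wl, spambuckets):
    """Return 'local', 'whitelisted', 'spam', or 'bayesian' (still undecided)."""
    if from_local_network:
        wl.learn(recipients)
        return "local"
    if sender.lower() in wl.addresses:
        wl.learn(recipients)                    # trust extends to their correspondents
        return "whitelisted"
    if any(r.lower() in spambuckets for r in recipients):  # buckets stored lowercase
        return "spam"
    return "bayesian"
```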
Spambuckets are helpful both in identifying spam and in building and maintaining your spam database. Finally, if an email is not from your local network, not from a whitelisted sender, and not addressed to a spambucket, it is compared to the statistical profile generated by the Bayesian filter. The Bayesian filter works by looking for words and phrases (up to three words long) that occur significantly more often in either your non-spam collection or your spam collection. For most organizations, spam identifiers include things like "get rich quick", while non-spam identifiers are things like your organization's full name or address, or the names of people who work there. They also include considerably more subtle references, such as HTML tags that spammers prefer or jargon specific to your line of business. To classify a new email, all the words and phrases in the first 10000 bytes of the email (including the header) are checked against the statistical model. The 50 top-ranking words and phrases are combined according to Bayes' theorem to predict whether the mail more closely resembles the spam or the non-spam in your collections.

I have made the working assumption that only the first 10000 bytes of an email are significant for identifying spam. Spammers may change their profile, but historically spam has been relatively small, and keeping many large files in your collection wastes disk space and processing time. After an email is classified as local or whitelisted, as Bayesian spam, or as spam to a spambucket, its first 10000 bytes are saved in the appropriate collection directory. The file is given a random number between 0 and MaxFiles (12000 by default) and written under that file name. In this way older files are gradually (randomly) replaced with newer ones, which keeps the collections both diverse and up-to-date. Files in the errors folders (correctedspam and correctednotspam) are never overwritten.
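The random-slot scheme for the collection folders can be sketched as follows. This is an illustration under stated assumptions, not ASSP's Perl. It assumes the file name is just the bare number, and it reads "between 0 and MaxFiles" as 0 up to MaxFiles - 1, so a folder never holds more than MaxFiles numbered files. The rng parameter exists only so the behaviour can be reproduced in tests.

```python
import os
import random

MAX_BYTES = 10000    # only the first 10000 bytes are kept
MAX_FILES = 12000    # default MaxFiles quoted above


def save_to_collection(folder, raw_message, max_files=MAX_FILES, rng=random):
    """Write the first MAX_BYTES of raw_message to a randomly numbered file.

    Reusing a number overwrites whichever older message held that slot, so
    old mail is replaced gradually and at random.
    """
    os.makedirs(folder, exist_ok=True)
    slot = rng.randrange(max_files)          # 0 .. max_files - 1 (assumption)
    path = os.path.join(folder, str(slot))
    with open(path, "wb") as fh:
        fh.write(raw_message[:MAX_BYTES])
    return path
```

Because each slot is chosen uniformly, a given old file survives each new save with probability 1 - 1/MaxFiles. The collection therefore ages smoothly instead of being rotated in large batches.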
What follows is a sample statistical analysis of mail we received. As of Thu Mar 27 10:48:54 2003 the mail logfile shows:
78843 messages, 47637 were spam (60.4%) in 73 days, for 1080.0 messages per day or 652.6 spams per day
8303 additions to / verifications of the whitelist (113.7 per day)
28273 were judged spam by the bayesian filter (59.4% of spam)
18862 were to spam addresses (39.6% of spam)
502 were rejected for executable attachments (1% of spam)
12608 were sent from local clients (40.4% of nonspam)
7838 were from whitelisted addresses (25.1% of nonspam)
10760 were ok after a bayesian check (34.5% of nonspam)
14467 addresses are on the whitelist
15108 hits on the blacklist
  14890 resulted in spam (52.7% of Bayesian spam, 98.6% of blacklist hits)
  218 resulted in non-spam (1.443% of blacklist hits)
2003-Sep-04 12:37pm jhanna
ASSP uses a content filter - won’t spammers disguise their content?
ASSP uses a sophisticated parsing filter to work around most of the tricks spammers use to disguise their content. As content-based filters like ASSP become more common, spammers may find better ways to disguise their messages. I personally do not believe spammers will win that battle, but it's hard to say for sure.
2003-Sep-04 12:42pm jhanna
Everyone we email is added to the whitelist, won’t spammers just use addresses from whitelist to spam us?
It is possible, but more difficult than it sounds. Addresses from your local site aren’t added to the whitelist, so a spammer will have to find someone your site emails. That list will be different for every site using ASSP. A better strategy would be for the spammer to trick you into emailing him/her. But that too will only work for one site at a time. Ultimately it is possible for the spammer to use this strategy to spam your site, but she/he will have to do the same thing individually for every site running ASSP. If this becomes a problem we will develop an appropriate defense. 2003-Sep-04 12:42pm jhanna
Will ASSP block messages I want to receive?
ASSP has been designed with great care to prevent this from happening. The whitelist is the single most powerful tool to prevent this – anyone you email will never have a message blocked. The spam filter keeps track of mail we send and spam we receive -- if an incoming message is not from someone we've emailed and it's more like the mail we send than the spam we receive then it gets through. Otherwise it's blocked and the sender gets the message, "Mail appears to be unsolicited -- report errors to postmaster@ourhost.com." The type of email that most often falls in this category is confirmation emails from web sites. Often these mails are only as personal as your email address and contain a lot of advertising - they look a lot more like spam than they look like the mail you send. If someone has a good idea how to recognize this type of email please let me know. Now that ASSP supports the "Expression to recognize non-spam" you can use that to help recognize these confirmation emails. Often they'll include your address, phone number, or other personal information that spam never includes. You can build a "regular expression" to recognize some of these. 2003-Sep-04 12:43pm jhanna
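As a starting point for an "Expression to recognize non-spam" (WhiteRE), list details that confirmation mail carries and spam does not, joined with |. All values below are hypothetical placeholders. The sketch checks the pattern with Python's re module. ASSP itself uses Perl regular expressions, but simple alternations like this one, including the leading (?i) case-insensitive flag, should read the same in both.

```python
import re

# Hypothetical values: substitute your own organization name, address, phone.
WHITE_RE = re.compile(r"(?i)Example Widgets Inc|123 Main Street|555[-. ]?0100")


def matches_white_re(message_text):
    """True if the message mentions one of the site-specific personal details."""
    return WHITE_RE.search(message_text) is not None
```

Keep the alternatives specific. A short, common word here would let spam that happens to contain it bypass the Bayesian check.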
One man’s spam is another man’s ham - how does ASSP decide what to block?
See the answer to the previous question. But this raises one theoretical limit for ASSP: it is designed to work for an entire site. This assumes that the users at your site broadly agree on what is spam. For most small companies, the difference between what they send and the spam they receive is clear enough that there isn't a conflict here. However, in a large and diverse company this assumption begins to break down, and ASSP is probably not the best solution. 2003-Sep-04 12:44pm jhanna
Will ASSP work with non-English languages?
At this point ASSP looks for words built from A-Z and the characters \240-\377 (octal, i.e. byte values 160-255), separated by spaces. (It's a little more complicated than that, but that's basically it.) If your language is written that way, ASSP will work fine. Spanish, French, German, Polish, etc. primarily use the Latin alphabet and should work fine. Korean, Japanese, and Chinese don't work well. Future plans may include improvements to make them more functional. As of ASSP 0.3.4 we have active users working in Spanish, French, and German without problems. 2003-Sep-04 12:44pm jhanna
I want to mess with the mail collections. What format are they in?
One message per file. Only the first 10k bytes are significant. Keep attachments attached; ASSP parses them up to the first 10k. Separate collections are kept in separate folders. Whitespace and headers (except the subject) are largely ignored. Edit, delete, or add files and rebuild the database - that's about all there is to it. Files that have numbers as filenames will randomly be overwritten over time, which keeps the collection up-to-date and limited in size. As of version 0.3.4 ASSP also tracks the helo phrases passed in the SMTP conversation -- see the format of the ASSP received header line for how this should be formatted. 2003-Sep-04 12:45pm jhanna
I’ve heard content filtering is CPU intensive. Is ASSP a CPU hog?
ASSP's CPU and memory load are quite moderate. Excluding rebuilding the databases, ASSP uses fewer CPU cycles per message than our mail transport does and significantly fewer per message than our virus filter software. 2003-Sep-04 12:46pm jhanna
I want to add per-user settings. How hard is that?
Beyond the Spam Lovers and Redlist, per-user settings are beyond the scope of ASSP’s design goals. They’re generally pretty hard to implement in the SMTP Proxy environment. 2003-Sep-04 12:50pm jhanna
Is it required to take down (stop) assp to do rebuildspamdb & dnsbl?
No. In all versions, the rebuildspamdb and dnsbl scripts can run without stopping ASSP. In versions prior to 0.2.0, ASSP had to be stopped to use the list.pl script or to reload the config.pl script. From 0.2.0 on, a kill -HUP will reload assp.cfg. 2003-Sep-04 12:57pm jhanna
How does ASSP compare to SpamAssassin?
1. Is SpamAssassin integrated into ASSP?
No.
2. If not, why?
I used SpamAssassin (www.spamassassin.org) for some time before developing ASSP. I found SA difficult to install. It also had to be upgraded regularly. Finally, ASSP's Bayesian filter was more effective at stopping spam than SA. I understand that SA has since developed a Bayesian component as well, but I'm not completely up-to-date on their development.
3. What are the pros of SpamAssassin compared to ASSP?
SA has a large investment in hand-made regular expressions and header analysis to recognize spam.
4. What are the cons of SpamAssassin compared to ASSP?
Those same hand-crafted expressions are brittle as spammers adjust their strategies. ASSP relies on the flexibility (and customization) of your own site's Bayesian database. Furthermore, ASSP is a complete spam blocking solution, not just a filter that must be integrated with your mail transport.
I credit SA with some of the impetus for getting ASSP going -- it is a great tool with a lot of features. In fact SA's smtp proxy was part of the inspiration for ASSP. And I would cheer them on -- every effective anti-spam tool reduces spammers' success and makes spam less profitable. However, my goal was a system that was easy to install and worked unmodified with nearly every MTA on any OS, and I believe ASSP is achieving those goals. Yes, a competent Linux system administrator can probably achieve similar results with SA, but ASSP broadens that opportunity 100-fold. I trust you will find the best tool for your situation. 2003-Sep-04 1:03pm jhanna
What is the difference between the redlist, no-processing, and spamlover lists?
Here's a matrix to help identify the differences (filtered or unfiltered mail, and whether or not the mail contributes to the whitelist):

                   contributes to whitelist   doesn't contribute to whitelist
filtered mail      normal                     redlist
unfiltered mail    spamlover                  no processing

Redlisted mail still contributes to the spam/nonspam collections. No-processing mail also doesn't contribute to the spam/nonspam collections.
2003-Sep-04 1:05pm jhanna
What is "cache reset" in the log file?
You can probably ignore it.
If one of your caches is resetting more often than every 7 minutes, find the line that says
if($this->{cnt}++ >5000
and change the 5000 to 20000. This will make ASSP use more RAM but
give you better performance.
Note that after one of the databases (whitelist, redlist, spamdb, or dnsbl) has been updated,
you'll get a "cache reset" after an average of 255 hits on that database, because ASSP notices
that the file modification timestamp changed. However, new data can be read from the file from the
moment it's updated -- it's only data already in the cache that won't be re-read until the reset.
As of version 1.0.0 the cache size is in the configuration options.
2003-Sep-04 1:06pm jhanna
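The Python model below makes the two reset triggers concrete: a lookup cache in front of a database file. It is a hypothetical illustration, not ASSP's Perl code. max_lookups stands in for the 5000 (or 20000) in the line quoted above. The file timestamp is compared every check_every lookups on a fixed schedule. That schedule is an assumption: the answer only says a reset follows after an average of 255 hits, so ASSP's actual scheduling may differ.

```python
import os


class FileBackedCache:
    """Illustrative lookup cache over a database file (not ASSP's code)."""

    def __init__(self, path, loader, max_lookups=5000, check_every=255):
        self.path = path
        self.loader = loader              # function(path, key) -> value
        self.max_lookups = max_lookups    # plays the role of the 5000 limit
        self.check_every = check_every    # how often the timestamp is compared
        self.mtime = os.path.getmtime(path)
        self.cache = {}
        self.cnt = 0
        self.resets = 0

    def _reset(self):
        # This is the point at which ASSP would log "cache reset".
        self.cache.clear()
        self.cnt = 0
        self.resets += 1

    def get(self, key):
        self.cnt += 1
        if self.cnt > self.max_lookups:
            self._reset()
        elif self.cnt % self.check_every == 0:
            mtime = os.path.getmtime(self.path)
            if mtime != self.mtime:       # file was rewritten since last check
                self.mtime = mtime
                self._reset()
        if key not in self.cache:
            self.cache[key] = self.loader(self.path, key)
        return self.cache[key]
```

Raising max_lookups makes count-based resets rarer at the cost of a larger dictionary, which is the RAM-for-speed trade described above.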
What is "helo rndhelo" on the analysis page?
When a mail client connects to a mail server to send mail, it must send an SMTP command, "HELO" (or the variant EHLO), followed by what it calls itself. Almost every server uses its host name in this greeting, for example m11.lax.untd.com. However, spammers often greet with a random string of letters, such as slk845gjlkas. ASSP tries to recognize these greetings because they're an excellent indicator of spaminess. Unfortunately, a bug in versions prior to 0.3.5 meant that all messages without a header were interpreted as random helo greetings (rndhelo). 2003-Sep-04 1:07pm jhanna
I've seen discussion of configuration settings that aren't on my config page. What do I do?
First, check the "Show Advanced Configuration Options" checkbox and submit the form. This will show all available configuration options. Second, the wording may have changed, or an abbreviation may have been used -- look for another setting with a similar purpose. For example, WhiteRE is actually "Expression to identify Non-Spam." 2003-Sep-04 1:11pm jhanna
How really does ASSP detect spam?
When you install ASSP, a colony of super-intelligent thermophilus bacteria takes up residence on your CPU and begins reading all your email. They communicate directly with the CPU using radio waves and interface with the ASSP software, choosing between spam and nonspam mail. If you choose to read further this myth will be sadly dispelled, and I take no responsibility for the consequences. However, you can always refer your clients to this page to prove to them that their email is actually being filtered by super-intelligent bacteria.

The rebuildspamdb program is where I will start. It reads the files in your errors/spam, errors/notspam, spam and notspam directories. As it reads the files in the errors directories, it also builds a hash of each mail body so that it can identify duplicate messages that were misfiled. This hash is used to delete messages from the notspam collection that are also in the errors/spam collection, and from the spam collection that are also in the errors/notspam collection. Think of it like scrubbing bubbles - they do the work so you don't have toooo!

As rebuildspamdb reads the files, it does two things. First, it runs a filter (the subroutine "clean") that prepares the message for statistical analysis. Second, it walks through the file tallying word pairs in the spam or not-spam category according to the collection. Files in the errors/spam collection count double; files in the errors/notspam collection count x4.

The "clean" subroutine does a number of important operations. Its primary function is to undo the things spammers do to trick filters. It cleans up base64 encoding. It cleans up many HTML obfuscation techniques. Look at the code of "sub clean" for more details - it's all commented. It also does two other things (and may do more in the future) to help the Bayesian analysis. First, it inserts a keyword after each word of the subject, which lets the Bayesian filter recognize words in the subject separately from words in the body.
For example, the word "free" in the subject will have a different Bayesian rating than the word "free" in the body of the message. Second, it does a couple of tricks to isolate the "HELO" greeting that was sent when the message was delivered. This has also proven to be a useful Bayesian factor in identifying spam. Paul Graham's "A Plan for Spam" recommends complete header analysis within the Bayesian filter. Because ASSP initially used three-keyword identifiers, and now (as of 0.3.4) two-keyword identifiers, I found this useless. However, header analysis will be a fruitful area of development for improving ASSP's spam / ham recognition rate in the future. That work, and any other pre-processing features introduced later, will take place in the "clean" subroutine.

Once each mail message is pre-processed (cleaned), each word pair is tallied. Words are defined as [-\$A-Za-z0-9\'\.!\240-\377]+. Words shorter than 2 or longer than 19 characters are ignored, and the rest are further cleaned this way: s/[,.']+$//; s/!!!+/!!/g; s/--+/-/g; [Sorry for the technical stuff for those allergic to it.] In the end you have a big database of word pairs and their counts: "in the": spam=23210, total=46411; "order now": spam=20001, total=20121. The rebuildspamdb program then steps through this database. It discards identifiers with a total of less than 5: if a word pair occurred 4 or fewer times in all the collections combined (counting errors/spam x2 and errors/notspam x4), the pair can be ignored. For the remaining pairs, it calculates the spaminess ratio this way:
If the spam count = 0 or the spam count = the total count, square both counts. (This amplifies factors that appear only in the spam or only in the not-spam collection.)
Spaminess = (spam count + 1) / (total count + 2). (This should look familiar to anyone with a basic understanding of Bayesian filters. It also somewhat de-emphasizes rare identifiers and emphasizes common ones.)
Throw out the identifier if its spaminess is between 0.41 and 0.59. Such an identifier appears almost equally in spam and non-spam, so there's no point in keeping it. Force the result between 0.000001 and 0.999999, because Bayesian classifiers croak if the value is too close to 0 or 1. All of these results are sorted (by identifier) and stored in the spamdb for use by ASSP. Rebuildspamdb also randomly (1 time in 20) prunes outdated entries in the whitelist and goodhosts databases.

Now you know how the spamdb is built, so let's see how it is used. Suppose a mailer on the Internet connects to ASSP. ASSP makes a connection to your "SMTP Destination" and begins relaying their conversation. It notes the IP address of the connecting server. It notes their HELO string. It notes their MAIL FROM (envelope sender). It notes their RCPT TOs. It notes their DATA directive. (This is all in sub "getline".) Relay attempts are blocked. The presence of spam bucket addresses is noted. Mail to the email interface is detected. Mail to no-processing or "spam lover" addresses is noted. Assuming none of that applies, the message is passed on to "getheader."

Getheader looks for the mail header. When the header is complete, getheader calls "onwhitelist". This determines whether the message should be treated as whitelisted/local (it's the same really) and, if so, updates the whitelist. If not, processing goes on to "getbody." Getbody reads the rest of the message (or the first 10000 bytes including the header, whichever comes first). It checks for attached executables (if that's enabled) and calls "isspam", which is probably why you're reading this document.

The isspam subroutine first checks WhiteRE and BlackRE, the expressions to identify non-spam and spam, respectively. Then it calls "clean" to clean up any spammer obfuscation and checks both expressions again against the "cleaned" version. Then it checks for a DNSBL hit, which adds 0.97 twice to the list of Bayesian factors for this message.
Then it checks for a goodhost miss, which adds your site's goodhost factor twice, provided it is > 0.65. Then it walks through the message's word pairs, just as rebuildspamdb did, completing the list of Bayesian factors. Unlike in rebuildspamdb, an identifier hit counts at most twice. If the identifier "free money" rates 0.955 and occurs three or more times in the mail message, only the first two occurrences count. The list of factors is sorted, and the thirty factors closest to 0 or 1 (i.e. the 30 furthest from 0.5, or neutral) are combined as Bayes taught into a single probability. If this probability is greater than 0.6, the message is spam. (Mail very rarely scores between 0.2 and 0.8; it's almost always > 0.9 or < 0.1.)

Spam is logged in the spam directory, and local and whitelisted mail is logged in the notspam directory. Headers are updated as configured. If you're not in test mode and the message is spam, the connection to your "SMTP Destination" is dropped. When the client stops spewing the mail body, it gets the "spam error" message and its connection is dropped. (In test mode the connection is completed and ASSP sends updated headers.) 2003-Sep-04 1:13pm jhanna
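The Python sketch below walks through the rebuild-and-score pipeline described in this answer. It is an illustrative re-implementation, not ASSP's Perl code. It takes the following from the text: the word definition, the cleanup substitutions, the thresholds (total < 5, 0.41-0.59, 0.000001/0.999999, DNSBL 0.97 twice, at most two hits per identifier, 30 factors, 0.6) and the errors weights. It also makes several assumptions: the length test runs after cleanup, every occurrence of a pair in a file is tallied, and "combined as Bayes taught" is the usual naive-Bayes product rule. It omits the subject keywords and HELO handling done by "clean", as well as the goodhost factor.

```python
import re
from collections import Counter

# Word definition quoted above; \240-\377 are octal escapes for chars 160-255.
TOKEN_RE = re.compile(r"[-$A-Za-z0-9'.!\240-\377]+")


def tokenize(text):
    """Split (already cleaned) text into words and apply the cleanup rules."""
    words = []
    for w in TOKEN_RE.findall(text):
        w = re.sub(r"[,.']+$", "", w)   # s/[,.']+$//
        w = re.sub(r"!!!+", "!!", w)    # s/!!!+/!!/g
        w = re.sub(r"--+", "-", w)      # s/--+/-/g
        if 2 <= len(w) <= 19:           # assumption: length test after cleanup
            words.append(w)
    return words


def word_pairs(text):
    """Two-word identifiers (ASSP 0.3.4 and later)."""
    words = tokenize(text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def spaminess(spam, total):
    """Rating stored in the spamdb for one identifier, or None if discarded."""
    if total < 5:
        return None
    if spam == 0 or spam == total:        # one-sided identifier: square both
        spam, total = spam * spam, total * total
    s = (spam + 1) / (total + 2)
    if 0.41 < s < 0.59:                   # too neutral to be useful
        return None
    return min(max(s, 0.000001), 0.999999)


def build_spamdb(samples):
    """samples: (text, is_spam, weight); weight 2 for errors/spam, 4 for errors/notspam."""
    spam, total = Counter(), Counter()
    for text, is_spam_msg, weight in samples:
        for pair in word_pairs(text):
            total[pair] += weight
            if is_spam_msg:
                spam[pair] += weight
    db = {}
    for pair, t in total.items():
        rating = spaminess(spam[pair], t)
        if rating is not None:
            db[pair] = rating
    return db


def combine(factors, keep=30):
    """Naive-Bayes combination of the `keep` factors furthest from 0.5."""
    strongest = sorted(factors, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    p_spam = p_ham = 1.0
    for p in strongest:
        p_spam *= p
        p_ham *= 1.0 - p
    return p_spam / (p_spam + p_ham)


def score_message(text, spamdb, dnsbl_hit=False):
    factors = [0.97, 0.97] if dnsbl_hit else []   # a DNSBL hit counts twice
    hits = Counter()
    for pair in word_pairs(text):
        rating = spamdb.get(pair)
        if rating is not None and hits[pair] < 2:  # each identifier at most twice
            hits[pair] += 1
            factors.append(rating)
    return combine(factors)


def is_spam(text, spamdb, dnsbl_hit=False):
    return score_message(text, spamdb, dnsbl_hit) > 0.6
```

With the counts quoted above, "in the" (23211/46413, about 0.5001) is discarded as neutral, while "order now" is kept at 20002/20123, about 0.994. A message with no known identifiers scores exactly 0.5 and passes.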
What is goodhosts and what does it do?
Note: As of version 1.0.5 it is recommended that you use the greylist feature and deactivate both goodhosts and the dnsbl. I noticed that a number of spams were slipping through the filter, all with the same qualities: they were short, many words were deliberately misspelled, and they linked to some website. I started researching (a) why they got through, and (b) how to block them. It turned out that, because of the shortness and misspellings, many passed through without any hits in the Bayesian database, good or bad. One solution would be to assume that all mail is just a little spammy and force the content to justify itself before being allowed to pass. This might have the side effect of raising the false positive ratio, although I didn't research it to be sure. But further research revealed something more useful. Because ASSP keeps a whitelist, it is a trivial addition to track which hosts send whitelisted mail. A site of any size will quickly get AOL, Hotmail, and a few others on that list, and its organizational partners will follow quickly. This is the goodhost database, and it represents a sort of social network for your email: you're likely to email them, and they're likely to email you. Doing the math for our site, I found that less than 1% of mail from these goodhosts is spam, and 89% of spam came from a not-goodhost. Each site's ratio will be different, but I expect that the goodhost marker is a healthy sign that an email is not spam. So the goodhost database is sort of like an inverse DNS blacklist that you don't have to download. Hosts absent from the goodhost list will get your site's non-goodhost spam ratio added to the Bayesian determination, once that ratio is higher than 65%.
Other benefits of the goodhost strategy:
1) requires no download (unlike the DNSBL)
2) totally self-maintaining & tuning
3) totally customized to your own site's traffic patterns
4) unspoofable by spammers
5) this is exactly the sort of push that these short & misspelled mails needed to correctly fall into the spam pit.
This is a good reason to tell your friends about ASSP -- it's only the best anti-spam tool in existence... And it's free. 2003-Oct-22 1:34pm jhanna
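A goodhost check can be sketched in Python as follows. This is a hypothetical illustration, not ASSP's code. It assumes hosts are keyed by IP address, and it assumes the "non-goodhost spam ratio" means the share of mail from non-goodhosts that turned out to be spam. It follows the isspam walkthrough in the previous answer: the factor is added twice, and only when the ratio exceeds 0.65.

```python
def record_whitelisted(host, goodhosts):
    """Any host that delivers whitelisted mail becomes a goodhost."""
    goodhosts.add(host)


def non_goodhost_ratio(spam_from_others, mail_from_others):
    """Assumed definition: share of mail from non-goodhosts that was spam."""
    if mail_from_others == 0:
        return 0.0
    return spam_from_others / mail_from_others


def goodhost_factors(host, goodhosts, ratio, threshold=0.65):
    """Bayesian factors contributed by a goodhost miss (added twice, or not at all)."""
    if host in goodhosts or ratio <= threshold:
        return []
    return [ratio, ratio]
```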
What is the http://[\w\.]+@ default expression to identify spam?
That's a quite smart expression to identify spam. It catches all mails that contain URLs of the form http://fakedurl@normalurl.com. This form is most often used to trick the reader's eye: http://www.mcafee.com@spamsite.com/securitypatch.exe "looks" as if it would connect to the trustworthy "www.mcafee.com" site. In reality it connects to "spamsite.com" with a "username" of "www.mcafee.com". If the website does not need authentication (and they never do), the username part is simply discarded. By using this expression you will quickly sort out a bunch of spams, which in turn automatically provide you with suitable spamwords. I found no need to manually add more expressions. (Robert Orso: 2003-11-17) 2003-Nov-17 11:00am jhanna
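Running the expression above in Python's re module shows what it catches. The wrapper function name is invented for illustration. For a pattern this simple, Python's \w and character-class semantics match Perl's.

```python
import re

# The default "expression to identify spam" discussed above.
USERINFO_TRICK = re.compile(r"http://[\w\.]+@")


def has_userinfo_trick(text):
    """True if the text has an http:// URL with a user-name part before '@'."""
    return USERINFO_TRICK.search(text) is not None
```

As written, the class [\w\.] has no hyphen, so a disguise such as http://my-bank.com@spamsite.com does not match this exact pattern, and neither does an https:// link.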
Why does ASSP only show one recipient per message in the maillog.txt file?
Messages can have from one to hundreds of recipients. We decided to only show the first one in the maillog for simplicity. 2003-Nov-17 3:26pm jhanna
Virus blocked -- what was blocked and why?
The short reason for "why" is that ASSP found an executable attachment. The log file gives you the time and sender (the sender is often faked, but its IP address will be right). If you use the "other" folder ("External mail that wasn't spam (mostly)"), you can find a copy of what was blocked there, though only the first 10k. That might be enough to recognize what was sent, either by inspecting the file or by running a virus scanner on it. (You can identify the file by its creation date/time, which will match the time in the log entry.) Files don't stay there forever, though. 2003-Nov-19 9:16am jhanna
Can I delete files from the spam / notspam / other collections?
You can delete files from the other directory at any time and as you see fit. The spam and notspam files are used by rebuildspamdb.pl to create your spamdb. Do not delete these files unless you become aware that your spam collection is hopelessly corrupted and want to start from scratch, categorizing spam and notspam by hand. 2004-Jan-26 9:44am jhanna
You can limit the number of files in the spam / notspam directories with the "Max Files" option under "Other Settings" in the web configuration (default is 18009). For this to work you must have "Use Subject as Maillog Names" unchecked. You can also run the perl script "move2num.pl" to convert existing subject-named files to numbered ones.
What order does ASSP process mail to check if it is spam?
Testing goes like this:
1) Local or whitelisted?
2) Blacklisted domain?
3) Spam HELO?
4) Addressed to a spam bucket?
5) Mail bomb?
6) Blocked attachment?
7) Matches expression to identify non-spam?
8) Matches expression to identify spam?
9) Bayesian evaluation
If the message is identified as spam at any step along the way, it goes to the spam directory. If the message is local or whitelisted, it goes to the notspam directory. In all other cases it goes to the other directory. 2004-Jun-21 8:15am jhanna
-------------------------------
There is some updated information on the ASSP Processing Order page. 2006-Dec-19 8:48am gedwest
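The nine-step order can be expressed as a short-circuiting function. This is a sketch, not ASSP's code. The message is modelled as a dict of already-computed test results, and its keys are hypothetical names rather than ASSP settings. The 0.6 Bayesian threshold comes from "How really does ASSP detect spam?". A match on the non-spam expression ends evaluation and sends the message to "other", which is how this sketch reads the routing rule above.

```python
SPAM_CHECKS = [                      # steps 2-6, in order
    ("blacklisted_domain", "blacklisted domain"),
    ("spam_helo", "spam HELO"),
    ("to_spam_bucket", "spam bucket"),
    ("mail_bomb", "mail bomb"),
    ("blocked_attachment", "blocked attachment"),
]


def route_message(msg):
    """Return (folder, reason) for a dict of precomputed test results.

    The dict keys are hypothetical names for illustration, not ASSP settings.
    """
    if msg.get("local_or_whitelisted"):                 # step 1
        return "notspam", "local or whitelisted"
    for key, reason in SPAM_CHECKS:                     # steps 2-6
        if msg.get(key):
            return "spam", reason
    if msg.get("matches_nonspam_re"):                   # step 7
        return "other", "expression to identify non-spam"
    if msg.get("matches_spam_re"):                      # step 8
        return "spam", "expression to identify spam"
    if msg.get("bayesian_score", 0.0) > 0.6:            # step 9
        return "spam", "bayesian"
    return "other", "passed bayesian check"
```

Because the checks short-circuit, mail from a whitelisted sender never reaches the later rules.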
What is the helo blacklist?
[From a mailing-list post by John Hanna dated 2004-06-11]
Hi. There have been a few questions about the new helo blacklist feature. I'll try to answer the questions I've seen so far and provide a little background. I'll probably add most of this to a page in the FAQ somewhere too.

A) Background
Catching spam relies on taking what information is available about the message at hand and weighing it against valid mail and other samples of spam. For an automated process, that leaves one with basically this information:
1) the connecting host
2) the smtp dialog (helo, mail from, rcpt to, data, and sometimes a bit more)
3) the mail header (received's, from, subject, date, to, cc, and various other headers)
4) the mail body (content, attachments, word/phrase/character analysis, urls, etc.)
DNS-blacklists make a big deal about the connecting host. ASSP's greylist does a pretty good job handling that issue. Ultimately, I'm not convinced that the connecting host really gives you any useful information -- past performance is no guaranteed indicator of future behavior. I've posted my thoughts on SPF elsewhere and won't restate them here.
Most of the smtp dialog is of little use. From the first version, ASSP has recognized spam-only addresses as a means to identify spam, and that's a useful indicator. Mail-from is often forged for spam, so it's of little predictive value. However, I found that the HELO / EHLO directive is (currently) an excellent indicator of one type of spam. (More probably, a couple of spam programs leave a clue in the HELO. This may change at some point, but for now it's to our benefit.) More on this in the next section.
The problem with the mail header is that it can be entirely forged. Some of these forgeries can be detected, but by and large this is a fleeting and time-wasting practice (in my opinion).
The mail body (and header subject) is where most of the Bayesian filtering pays off.
There are many methods of pre-processing (converting base64, pre-parsing html, etc.) and tokenizing (breaking up the body into bite-sized chunks), and these different strategies give the Bayesian categorizer different levels of success.

B) What's going on in the HELO?
Connecting SMTP clients are expected to greet the server with a HELO (or EHLO) message containing the name of their host. This greeting is largely ignored because it is somewhat arbitrary: NAT and various proxies mean there is no practical way to verify it. However, many, many spams come with the IP address of the receiving server in the HELO greeting. It's possible that some mail servers look at the helo to decide whether to allow relaying; I don't know. But for whatever reason, about 20% of our spam comes with the helo of our server, and no non-spam ever does. Because the HELO is arbitrarily chosen by spammers, nothing one could say in a HELO would ever help a message be recognized as not-spam, so there's no point in setting up a greylist-type structure for HELOs. However, I have quite a few HELOs at our site that only ever send spam. And thus the helo blacklist was born.

C) How does it work in ASSP?
Rebuildspamdb goes through your mail archive and looks for the HELOs (they're stored in the received header) of spam and notspam. If it finds 5 or more spam and no notspam, the HELO goes on the blacklist. If it finds more than about 50 spam per notspam, it also goes on the blacklist. (The equation is spams / (spams + hams + 0.1) > 0.98.) When ASSP processes mail that qualifies for a spam check (i.e. it's not local or whitelisted) and doesn't have a blacklisted domain, the mail is checked against the helo blacklist. If it matches, the mail is classified as spam.

D) How can I disable it?
The helo blacklist should work excellently in almost every situation automatically. However, if there's one thing I've found about ASSP, it's that there are exceptions to everything.
This is what you can do to make sure the helo blacklist works for you:
1) If it misclassifies ham, be sure to put copies (which include the full mail header) in the errors/notspam folder and run rebuildspamdb.pl. This will change the ratio for the HELO string and probably remove the HELO from the blacklist.
2) Add the sender to the whitelist -- whitelisted mail isn't checked against the helo blacklist.
3) There are a couple of ways to disable the helo blacklist:
a) Comment out assp.pl line 559 with a # so it becomes
# $HeloBlackObject=tie %HeloBlack,orderedtie,"$base/$spamdb.helo" if $spamdb;
and restart assp
b) simply remove the spamdb.helo file after rebuildspamdb.pl runs
4) I meant to add a "Don't use the spamdb.helo" config option and I forgot -- I'll get it in the next release. Sorry.
--N.B. - the "Don't use the Helo Blacklist" option was added in version 1.0.11--
2005-Jul-04 2:42pm cpaine
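The blacklist rule from section C can be written out directly. The sketch below is an illustration, not rebuildspamdb's code. It assumes the HELO strings have already been pulled out of the archived received headers, since that extraction and its header format are not shown here. It also assumes HELOs are counted exactly as seen, with no case folding.

```python
def tally_helos(samples):
    """samples: iterable of (helo, is_spam) pairs taken from archived mail."""
    counts = {}
    for helo, is_spam in samples:
        spams, hams = counts.get(helo, (0, 0))
        counts[helo] = (spams + 1, hams) if is_spam else (spams, hams + 1)
    return counts


def helo_blacklist(counts):
    """HELOs whose spam share, damped by the 0.1 term, exceeds 0.98."""
    return {helo for helo, (spams, hams) in counts.items()
            if spams / (spams + hams + 0.1) > 0.98}
```

Working the inequality through: 5 spams and no hams give 5/5.1, about 0.9804, just over the line, while 4 give about 0.9756. A single ham raises the bar to 54 spams (54/55.1, about 0.9800), which is where the "about 50 spam per notspam" rule of thumb comes from.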


