Spam Spelling



Jeff Root
2008-Jul-26, 06:17 PM
I almost never read spam e-mails that get past my e-mail
provider's very simple barriers, but subject lines of spam
in recent months have been rife with terrible spelling and
grammar, such as these from the last three days:

Drinking red winee stops deeafness
8 thmgs every woman makes in her intimate life.
Dollar is replacing by new currency
Ukraine's 'oldestt maan' turns 116
Malaysiia awards medals to piirate-DVD nabbing dogs
More pleasure wit less efforts.

Is this caused by poor translation from other languages,
a spammer with a bad keyboard or sloppy typing, or is it
intentional? If intentional, why?

-- Jeff, in Minneapolis

Kaptain K
2008-Jul-26, 06:21 PM
If intentional, why?
To get your attention! It succeeded, didn't it?

Neverfly
2008-Jul-26, 06:22 PM
I think some of it is intentional. Partly to get past filters, partly to get past brains.
Humans make typographical errors; bots don't (right?).
It makes the spam seem more like a person typed it.

tdvance
2008-Jul-26, 06:26 PM
Yes, it's intentional: it forces spam filter writers to work a lot harder, since they have to handle misspelled words as well as correctly spelled ones. In addition, lots of spam will use, for example, HTML e-mail to mix invisible characters in with words, or even symbols that look like (or almost like) the proper letters, making things even harder on spam filters.
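
To make the obfuscation tricks above concrete, here is a minimal sketch of how a filter might undo some of them before matching words. The `normalize` function and the tiny `LOOKALIKES` table are assumptions made up for illustration, not how any particular spam filter works.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table; a real one would be much larger.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "@": "a", "$": "s"})


def normalize(text: str) -> str:
    """Undo a few common obfuscations before word matching."""
    # NFKC folds many compatibility characters (e.g. fullwidth letters) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (Unicode category Cf), such as zero-width spaces.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Lowercase, then map lookalike symbols back to letters.
    return text.lower().translate(LOOKALIKES)


# A word split up with zero-width spaces, and one written with digits for letters.
print(normalize("V\u200bi\u200bagra"))
print(normalize("fr3e m0ney"))
```

The catch is that the lookalike mapping also mangles legitimate text: a real number such as "116" would have its digit 1 turned into a letter, so a filter would have to apply this step with care.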

I recall (was it mentioned on BAUT? I saw it somewhere) a proposed "stupid" filter that rates an e-mail, posting, etc. on things like how often internet abbreviations (or, scoring even worse, text-message abbreviations) are used and how many words are misspelled, and then filters out those that score too high on "stupidity." A variation of this could eliminate e-mail apparently designed to beat spam detectors (whether purposely or as a result of, well, stupidity).

tdvance
2008-Jul-26, 06:28 PM
To give you an idea of how hard the spam filter problem is, consider what should be an easier one: automatically detecting swear words on public boards. All the filters I've seen make one or both of these kinds of mistake:

let a misspelled or space-separated or otherwise altered swearword through.

filter out non-swearwords (like the famous example of an early web filter that eliminated most medical sites because they used terms like "breast").
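
Both kinds of mistake are easy to reproduce in a few lines. The sketch below is only an illustration: the one-word `BANNED` list uses a mild stand-in word, and both function names are made up for this example.

```python
import re

# Mild stand-in for a real blocklist (an assumption for illustration).
BANNED = {"darn"}


def substring_filter(text: str) -> bool:
    """Flag text if a banned word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BANNED)


def whole_word_filter(text: str) -> bool:
    """Flag text only if a banned word appears as a complete word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BANNED for word in words)


# False positive: an innocent word contains the banned letters.
print(substring_filter("a darning needle"), whole_word_filter("a darning needle"))
# False negative: the space-separated version slips past both filters.
print(substring_filter("d a r n it"), whole_word_filter("d a r n it"))
```

Matching whole words removes the first kind of mistake but does nothing about the second, and stripping spaces to catch the second brings the first one back.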

Kaptain K
2008-Jul-26, 06:45 PM
I got caught by surprise the other day. Not at the fact that it was spam, but at the method! I got an e-mail from me. My name and e-address. I knew it was spam, but opened it anyway. Fortunately, I had my anti-virus up and running! As soon as the alarms went off, I deleted the spam and for good measure deleted it from my deleted objects file and re-ran the AV program!

Stuart van Onselen
2008-Jul-26, 07:57 PM
Those primitive and over-strict swear-word filters are really irritating to guys named Richard who live in Sussex. :)

Nowhere Man
2008-Jul-26, 08:42 PM
Or in S****horpe. (http://en.wikipedia.org/wiki/S****horpe)

Edit to add: Look! It's still happening!

Fred

Neverfly
2008-Jul-26, 08:43 PM
Or in S****horpe. (http://en.wikipedia.org/wiki/S****horpe)


I'm sorry... In where?

Nowhere Man
2008-Jul-26, 08:47 PM
One word, two syllables. First syllable, sounds like "Scun." Second syllable, sounds like "thorpe." According to Wikipedia, it is "a town within North Lincolnshire, England. It is the administrative centre of the North Lincolnshire unitary authority, and has an estimated total resident population of 72,514."

:wall:

Fred

LaurelHS
2008-Jul-26, 08:55 PM
I got caught by surprise the other day. Not at the fact that it was spam, but at the method! I got an e-mail from me. My name and e-address. I knew it was spam, but opened it anyway. Fortunately, I had my anti-virus up and running! As soon as the alarms went off, I deleted the spam and for good measure deleted it from my deleted objects file and re-ran the AV program!

I got a spam e-mail from my own address recently too; it was really strange. :confused:

Neverfly
2008-Jul-26, 08:55 PM
One word, two syllables. First syllable, sounds like "Scun." Second syllable, sounds like "thorpe." According to Wikipedia, it is "a town within North Lincolnshire, England. It is the administrative centre of the North Lincolnshire unitary authority, and has an estimated total resident population of 72,514."

:wall:

Fred

I'm sorry Fred.

But according to BAUT, it doesn't exist.
And if BAUT says it doesn't exist, then it doesn't exist.

Nowhere Man
2008-Jul-26, 10:08 PM
:hand: to you then.[1] We can't talk about ****ake mushrooms here, either.

Fred

[1] :D

AndreasJ
2008-Jul-26, 10:28 PM
filter out non-swearwords (like the famous example of an early web filter that eliminated most medical sites because they used terms like "breast").
Perhaps apocryphal, but there's this story about an early electronic nanny app that blocked just about every university website in the world: it didn't like the letter sequence "stud".

Chuck
2008-Jul-26, 10:58 PM
Overly aggressive filters actually add sexual content to posts by pointing out possible sexual interpretations. I saw one example on a high school sports website in a thread about building bleachers. The word "screw" was replaced by "*****". It was obvious what the word was, so the filter added the sexual content by reminding everyone of the alternate meaning of "screw". Without the asterisks most readers would not have thought about it at that time.

drainbread
2008-Jul-27, 02:24 AM
http://getpopfile.org/

I love POPFile. It's easy to set up and "learns" quickly. I have my e-mail client set to send mail marked as [SPAM] to a specific folder instead of my inbox.

I get maybe 3 spam e-mails per month that make it to my inbox out of an insane amount of junk (at least 300 per month).

Veeger
2008-Jul-27, 05:07 AM
How to get rid of junk e-mail, that's the question. I was clean until last summer, when my sister sent me an e-card on my birthday. In the following days I got a flood of spam e-cards, then the spam turned raunchy. I get about 20 per day, but fortunately none make it to my inbox.

HenrikOlsen
2008-Jul-27, 08:33 AM
:hand: to you then.[1] We can't talk about ****ake mushrooms here, either.

Fred

[1] :D
We can talk about shiitake mushrooms (http://en.wikipedia.org/wiki/Shiitake), we just have to spell them correctly.

TrAI
2008-Jul-27, 12:10 PM
:hand: to you then.[1] We can't talk about ****ake mushrooms here, either.

Fred

[1] :D

We can talk about shiitake mushrooms (http://en.wikipedia.org/wiki/Shiitake), we just have to spell them correctly.

I assume that would trigger a filter that took misspelling into account; perhaps that was Nowhere Man's meaning...

We could write "shii mushroom"; that would be less redundant anyway... "Enokitake" is often called "enoki mushroom" in English, so why so many people call shiitake "shiitake mushroom" I do not know...

Nowhere Man
2008-Jul-27, 12:13 PM
I sit corrected, then. The fellow who was attempting to register the domain described in the Scun-thorpe [sic] Problem article misspelled it, and I ran with it.

Heh. The article says that shiitake means "shii mushroom." So saying "shiitake mushroom" is like saying "ATM machine."

Fred

TrAI
2008-Jul-27, 12:51 PM
I almost never read spam e-mails that get past my e-mail
provider's very simple barriers, but subject lines of spam
in recent months have been rife with terrible spelling and
grammar, such as these from the last three days:

Drinking red winee stops deeafness
8 thmgs every woman makes in her intimate life.
Dollar is replacing by new currency
Ukraine's 'oldestt maan' turns 116
Malaysiia awards medals to piirate-DVD nabbing dogs
More pleasure wit less efforts.

Is this caused by poor translation from other languages,
a spammer with a bad keyboard or sloppy typing, or is it
intentional? If intentional, why?

-- Jeff, in Minneapolis

Well, I believe some spam filters try to recognize common types of spam mail by the words used, but they will have trouble with misspelled words, since these would not be in their dictionaries of suspicious words.

I guess that it will soon be common for filters to flag mails with a lot of misspelled or unrecognized words...
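
A filter along those lines could be sketched roughly as follows. This is only an illustration: `KNOWN_WORDS` is a toy word list standing in for a real dictionary, and `unknown_ratio` and the 0.3 threshold are made-up assumptions, not taken from any actual filter.

```python
import re

# Toy stand-in for a real dictionary (an assumption for illustration).
KNOWN_WORDS = {
    "drinking", "red", "wine", "stops", "deafness",
    "more", "pleasure", "with", "less", "effort",
}

# Hypothetical cut-off; a real filter would have to tune this.
THRESHOLD = 0.3


def unknown_ratio(subject: str, known=KNOWN_WORDS) -> float:
    """Return the fraction of words in the subject line that are not in the dictionary."""
    words = re.findall(r"[a-z']+", subject.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in known)
    return unknown / len(words)


subject = "Drinking red winee stops deeafness"
ratio = unknown_ratio(subject)
print(ratio, "flag" if ratio > THRESHOLD else "pass")
```

With so small a word list, perfectly good mail full of names or technical terms would be flagged too, which is the usual weakness of this approach.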

HenrikOlsen
2008-Jul-27, 02:14 PM
On the other hand, as the filters learn the misspelled words they get more effective, since the misspelled words are a very strong indicator of spam.
Unfortunately there are many more ways to misspell a word than there are to spell it correctly, so it takes a while for all the misspellings to get into the spam filters' databases.
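
To get a feel for how lopsided that is, the sketch below generates every string that is one typing slip away from a word (one letter dropped, swapped with its neighbour, replaced, or added). The function name `single_edit_variants` is made up for this example, and it only covers lowercase ASCII letters.

```python
import string


def single_edit_variants(word: str) -> set:
    """Return all strings exactly one delete, swap, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    # The replace step regenerates the original word, so remove it.
    return (deletes | swaps | replaces | inserts) - {word}


variants = single_edit_variants("wine")
print(len(variants), "winee" in variants)
```

That is for a single slip in a four-letter word; allowing two slips, or longer words, makes the number grow much faster.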

agingjb
2008-Aug-05, 09:17 PM
I do wonder, if I look at the titles of threads on this site, to what extent we can attribute some of the more unfortunate variations in spelling to "spam".