Can you work out this message?



Tog
2014-Feb-25, 09:05 AM
I guess this could go in Fun and games, but it's actually more of a test than a game. I have a number of hidden messages in a story I'm writing. Re-writing. Again. In the newest version, I'd like for something to be encrypted in a way that computers would have a hard time breaking but people wouldn't, as long as they knew the key. With that in mind, I was hoping some of the more cryptographically inclined might try to work out the phrase and the key from the samples below.

Some clues: It's an English quote that nearly everyone on this site should be familiar with in some way.
The two versions of it listed below are the same phrase, in the same code, using the same key.

X 3R58 0T H 226 1FV T2 8 CJU NY 2X 9XB8 87RR DPR A D DZ1B UPFR 982 UZS IV OW LQ PGC LC 8 ZN8 OG Q7 HB N J7CD R QE BTZO 3G40 OPB RT3 WN B7 EU VSM 0J Z3 F6B

4 GPC9 OH E CDQ IMN E5 2 POR MA 6T 3VGP SYCB 8RP X G RZ7G 36WQ PCD PIU TZ SW A8 CBR TB 6 WA5 BD RL VC I DLBG R CM 0EV3 5RB2 GRS CX8 VY 8T WP KCH 8R N2 7DP

I'm mainly concerned that the key is not hidden as well as I'd like, but I also have no idea if a computer would be able to crack it. That will probably have to wait until someone gets the key.

Any takers?

grapes
2014-Feb-25, 10:03 AM
If they're the same phrase, in the same code, using the same key, you'd think the result would be the same.

They have the same structure (letter count grouping) but no "word" longer than four "letters" (some numerals), all caps.

Heid the Ba'
2014-Feb-25, 12:04 PM
Looking only at the first sample, the only letters and numbers not to appear are 5 and K; no letter or number appears more than seven times. If the original was in English and 1-0 and A-Z represent the same when encoded, then either thirty-four different characters appear in the original phrase or the key is progressive. I would suggest the latter.

To answer the questions: Yes a computer could crack it by brute force methods, and no I can't work it out in the time available.

Tog
2014-Feb-25, 01:08 PM
It's not progressive.
The number of times a given character appears or does not appear does not affect the message or the key.
The original message has 16 different characters, all letters.
The key comes in two parts. The first requires no special knowledge or tool, the second might, depending on the person. There may be several people here who could read the message as written with a bit of practice.

Of the two lines I did in the first post, the top was generated in a spreadsheet, the second was done by converting the message by hand as I wrote it.

jokergirl
2014-Feb-25, 01:38 PM
I'm still not sure what you mean with "in the same code, using the same key". If it is the same code and the same key, the spreadsheet generated version should have the same result as the hand generated version.

Do you mean that the character mapping key is the same, but the seed (the random place to start from) is different?

Tog
2014-Feb-25, 01:51 PM
No. I mean that you could swap any character in the first message with a character from the same position in the lower one (say, the 14th character in each) and the message would be unaffected. Any block of characters is interchangeable as long as the position from the start remains the same.

Strange
2014-Feb-25, 02:44 PM
So you have a one to many mapping from the plain text to the coded message? In other words, each of the source characters can be represented by 2 (or more?) code characters? But, presumably, it must be to-one in the reverse direction (otherwise it would be ambiguous).

jokergirl
2014-Feb-25, 02:45 PM
Ah, that makes more sense. So one ciphered symbol can be represented from a pool of symbols. Then we have to figure out which groups there are (which of the symbols actually mean the same cipher symbol?) before we begin with the cipher itself.

Moose
2014-Feb-25, 02:51 PM
Heh, all you have to do is use your code to say something subversive. If the black helicopters and guys in sunglasses show up, your code isn't strong enough. :p

Tog
2014-Feb-25, 04:47 PM
So you have a one to many mapping from the plain text to the coded message? In other words, each of the source characters can be represented by 2 (or more?) code characters?
Correct, but it's not a progressive one.

But, presumably, it must be to-one in the reverse direction (otherwise it would be ambiguous).
I'm tempted to say yes here, but I'm not sure exactly what you mean. The cypher C always means the same thing even if there are several other things that mean the exact same thing.


Ah, that makes more sense. So one ciphered symbol can be represented from a pool of symbols. Then we have to figure out which groups there are (which of the symbols actually mean the same cipher symbol?) before we begin with the cipher itself.

And this is why I'm not sure if a computer would be able to break it. What the cipher symbols represent is one layer of encryption, while the way they are used is the second. The grouping rule used for the cipher symbol pools is logical to people, but I think it would be subjective to a computer.

Tog
2014-Feb-25, 04:48 PM
Heh, all you have to do is use your code to say something subversive. If the black helicopters and guys in sunglasses show up, your code isn't strong enough. :p
Heh, I'm not sure I need it tested that completely.

Strange
2014-Feb-25, 04:52 PM
So the groups of symbols do not represent words, but are part of the encoding?

Tog
2014-Feb-25, 05:05 PM
So the groups of symbols do not represent words, but are part of the encoding?
Correct.

Tog
2014-Feb-26, 08:40 AM
Okay, it looks like this has played out.

The encryption type was Morse Code.
Dots are represented by letters and numbers that have one or more curved lines, while dashes are those made with straight lines only.

Dots = BCDGJOPQRSU2356890
Dash = AEFHIKLMNTVWXYZ147

This is why I don't know if a computer could do it with any relative ease, because the key itself is subjective and based on a criterion that doesn't follow any real progression. On the other hand, I know I'm not the first person to think of this because I got the idea from a puzzle I saw in the 5th grade.

I'm not thinking something secure enough to keep the black helicopters at bay, only to not fall to a brute force attack in any reasonable amount of time.

How secure would this actually be?

jokergirl
2014-Feb-26, 09:33 AM
It's pretty secure if you don't know it, but it does rely on security through obscurity - if the codebreaker has reason to suspect Morse code being used as the alphabet, it will be easy to break. It's not so unlikely for someone to make the jump from "groups of 1-4 -> Morse code" (though I didn't, but a more military minded person might).
A way to make it more secure would be to introduce a third group of symbols which stand for spaces, making the grouping less obvious.

If I have a minute (I did not have one yesterday) I might try to decode it (by hand, because where's the fun in brute forcing?)
ETA: You made an error in the handmade translation (which I know because I read your explanation, but that explanation I am ignoring when codebreaking otherwise). Going to try from just the machine made, because it is more likely IRL to only have one version of the same sentence anyway.

;)

Strange
2014-Feb-26, 10:06 AM
It's pretty secure if you don't know it, but it does rely on security through obscurity - if the codebreaker has reason to suspect Morse code being used as the alphabet, it will be easy to break. It's not so unlikely for someone to make the jump from "groups of 1-4 -> Morse code" (though I didn't, but a more military minded person might).

So that might depend on the story and who the message is being kept secret from. If it is just "random people" then that obscurity might work. After all, how many people know Morse code (or even know of it), and how many know that it consists of groups of 1 to 4 symbols (I didn't, even though it seems obvious in hindsight)? And if the story is set in the future, then there may be even fewer who are familiar with it.

But why use a code that could be cracked by an intelligent and well-informed person? Is it actually important to the story that it can be decoded, e.g. because it is not possible to pass the key to the recipients? But there you have one of the key problems of cryptography: if the code can be worked out by the intended recipients then it can be worked out by the black hats as well.

Have you considered a one-time pad based on a book both sender and recipient have (and have previously agreed how to use)? Then even a simple substitution cypher is pretty much uncrackable by brute force methods.

But if you want this as a puzzle for the reader (as well as the recipients) then it is pretty cool!

jokergirl
2014-Feb-26, 10:18 AM
Have you considered a one-time pad based on a book both sender and recipient have (and have previously agreed how to use)? Then even a simple substitution cypher is pretty much uncrackable by brute force methods.

A simple substitution cipher is crackable. You are thinking of a one-time pad, and that is only uncrackable if truly random; a book is not random enough. (Probably random enough for short messages and black hats who are not the CIA or super mathematicians, though.)
Cryptonomicon had a pretty clever key and cipher based on a card deck both parties owned and had shuffled into the same configuration.

;)

Strange
2014-Feb-26, 10:19 AM
A simple substitution cipher is crackable. You are thinking of a one-time pad

Yes, sorry if I wasn't clear. I did mean a substitution based on a one-time pad (there are other ways you could use a one-time pad).


a book is not random enough

Good point. Especially as Tog is concerned about brute force techniques.


based on a card deck both parties owned and had shuffled into the same configuration

That's fine until someone drops the pack. :)

Tog
2014-Feb-26, 11:14 AM
But why use a code that could be cracked by an intelligent and well-informed person?

The idea is that those people are few and far between. It does rely on obscurity, but it also does not require anything to be written down. There is no key to pass on once both parties know it, and it looks harder to crack than it really is. The puzzle I saw when I was 10 was like this:
A . . . EF . HI . KLMN . .__
. BCD . . G . . J . . . . OP
Finish the sequence.

Taking it in that context makes it simple enough. From the responses to my first post, it looked like the first stop on the right path was in determining that all blocks of letters were from 1 to 4 symbols long. From there it might have occurred to someone that it was Morse Code, especially since there were a few blocks with multiple two letter "words."

Hitting on the difference between the dots and dashes seemed like it would be the hard part, and the reason I wasn't sure if a computer could make that distinction. Part of the reason for the Galaxy Zoo project was that computers had a hard time discerning which way a spiral galaxy was turning. I thought this might be in the same line.

The actual story is contemporary and has been through two versions that included heroin dealing and one that included an avenging father who just snaps. People didn't seem to like how much he snapped, so I'm looking for a different path where the guy who finds the first code is actually the target of the plot, as opposed to bungling into it like in the other versions. Someone searching for the key to a code seemed like a good reason to have people come after him, but it couldn't be anything like industrial espionage for a code of this nature. More like personal copies of adoption records or something.

Anyway, I appreciate the replies to this. Thanks, all.

Moose
2014-Feb-26, 12:32 PM
Hitting on the difference between the dots and dashes seemed like it would be the hard part, and the reason I wasn't sure if a computer could make that distinction. Part of the reason for the Galaxy Zoo project was that computers had a hard time discerning which way a spiral galaxy was turning. I thought this might be in the same line.

Once you know the method, it would be relatively simple to program, I'm afraid. It's just two successive many-to-one substitutions on the decrypt side. To encrypt, it's a one-to-one, followed by a one-to-any-of-a-limited-set. If I was teaching a unit on regular expressions, for example, an implementation of your algorithm (both decryption and encryption) would make a fun project.

[Edit: Wouldn't be a bad relational database (SQL) project, too. Less boring than the standard 'small company nobody cares about' examples.]

The goodish news is that it sounds better than anything Dan Brown ever came up with (though nowhere near as strong as WWII Enigma devices). It'll almost certainly defeat any casual layperson attempt to decrypt it long enough to matter. The bad news is that I wouldn't want to count on security through obscurity against a serious attempt to decrypt it, particularly with institutional resources.

The thing with encryption is that you don't hide the algorithm. (It's too vulnerable to intercept.) You make the algorithm as open as possible (which also tends to expose any back doors built in). This means the algorithm is somewhat trustable. Modern security is in the randomness and length of the key.

It's less interesting, perhaps, but have you considered having your guy use PGP? It already does nearly all of what you need it to. You'd have to focus on having him find the private key, rather than try to attack the algorithm, though.

jokergirl
2014-Feb-26, 12:42 PM
I think it's pretty clever and a fun little puzzle, and sounds very much like what a layperson would come up with. Without any good background ideas on the person and on the content of the message, it would be tricky enough to break, next to impossible if the messages were short.
It's not something I would use with a background of repeated or regular transmissions, or in a war context, but for encrypting personal notes with only one person who actually knows the method (so no traitors!) it's quite good and believable.

;)

Strange
2014-Feb-26, 01:02 PM
Hitting on the difference between the dots and dashes seemed like it would be the hard part, and the reason I wasn't sure if a computer could make that distinction.

I think you are right about that (today, which is partly why I asked when it was set). But remember that before someone tries using a brute force method, they will at least look at it. And think, "groups of 1 to 4 ..."


Once you know the method, it would be relatively simple to program, I'm afraid.

I think Tog was thinking of the idea of spotting the two different types of characters used as dots and dashes. I don't think a computer would be able to work that out (unless there is already a categorization of characters in that way). I don't think there is any way of working out which characters are dots and dashes without that information (or a plaintext sample).


The thing with encryption is that you don't hide the algorithm.

That is certainly true of professionals. But probably not most amateurs, which may be more relevant to the story (and, sadly, includes a number of commercial organizations).

jokergirl
2014-Feb-26, 01:07 PM
I think you are right about that (today, which is partly why I asked when it was set). But remember that before someone tries using a brute force method, they will at least look at it. And think, "groups of 1 to 4 ..."



I think Tog was thinking of the idea of spotting the two different types of characters used as dots and dashes. I don't think a computer would be able to work that out (unless there is already a categorization of characters in that way). I don't think there is any way of working out which characters are dots and dashes without that information (or a plaintext sample).

Sure there is. Brute force. You don't need the dots and dashes, you have a list of characters:
1=TE
2=AIMN
3=DGKORSUW
4=BCFHJLPQVXYZ
(leaving out the numbers for now). You don't need to bruteforce the dots and dashes, you just bruteforce this. Remember that there are combinations of characters statistically very common in the English language (and I assume that the language of the message is out of question). I would for example start by looking for 141-combinations - very, very likely to be THE. That would give me about 6 character->./- translations to start from.
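
A rough Python sketch of that starting point, using only the group lengths and the four letter lists above. The names `BY_LENGTH`, `candidates`, and `find_runs` are made up for illustration, and a hit on a 1-4-1 run is only a guess to test, not a proof that the run spells THE.

```python
# Letters grouped by how many dots and dashes their Morse pattern has,
# exactly as listed in the post above.
BY_LENGTH = {1: "TE", 2: "AIMN", 3: "DGKORSUW", 4: "BCFHJLPQVXYZ"}


def candidates(ciphertext):
    """For each group, the plaintext letters its length allows."""
    return [BY_LENGTH.get(len(group), "") for group in ciphertext.split()]


def find_runs(ciphertext, lengths=(1, 4, 1)):
    """Indexes where consecutive groups have the given lengths.

    The default (1, 4, 1) matches the shape of T-H-E; every hit still
    has to be checked against the rest of the message.
    """
    sizes = [len(group) for group in ciphertext.split()]
    n = len(lengths)
    return [
        i for i in range(len(sizes) - n + 1)
        if tuple(sizes[i:i + n]) == tuple(lengths)
    ]


# The first five groups of the first sample narrow down to:
print(candidates("X 3R58 0T H 226"))
# ['TE', 'BCFHJLPQVXYZ', 'AIMN', 'TE', 'DGKORSUW']
```

If a run really is THE, its six cipher characters are pinned down at once: one dash for T, four dots for H, one dot for E.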

;)

Tog
2014-Feb-26, 01:14 PM
It's less interesting, perhaps, but have you considered having your guy use PGP? It already does nearly all of what you need it to. You'd have to focus on having him find the private key, rather than try to attack the algorithm, though.
For work, and things that matter, he'll use things like that, and other robust methods. For the heart of the story, I want him to use methods that don't require anything more than a scratch sheet of paper and a pen to work either way. I'm still struggling with the MacGuffin. I thought I had a good line going, but when I posted a summary of it, it was unanimously panned as being too far for someone to break under the conditions. I still disagree with that and have examples, but there are times the mob rules and expectations override reality.

I want to give anyone who accidentally reads the thing the idea that there could be messages all around them that no one sees because they don't look like anything.
I've got a phone number hidden in a string of lights on a Christmas card. The number of bulbs of any color between the yellow ones represents the digits.
I've got the sugar packets on a restaurant table giving a telephone number in base four.
I've got a dresser with some knobs turned upside down to represent the pips of Braille.
I've got a role playing game book with randomly inserted italics that I've been pretty sure meant something for close to 20 years now. I'd like to pass that feeling of having missed something on to others every time they see a set of burnt out light bulbs in a freeway information sign.
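
As a small illustration of the string-of-lights idea, here is a Python sketch of reading the number back. It assumes each bulb is written as a one-letter color code, that `Y` is a yellow bulb, that the string starts and ends on a yellow bulb, and that no gap holds more than nine bulbs. Those conventions and the name `read_lights` are hypothetical; the post only says that the count of bulbs between the yellow ones gives each digit.

```python
def read_lights(bulbs: str) -> str:
    """Turn a string of bulb colors into digits.

    Each digit is the number of non-yellow bulbs between two
    consecutive yellow ('Y') bulbs.
    """
    if not (bulbs.startswith("Y") and bulbs.endswith("Y")) or len(bulbs) < 2:
        raise ValueError("expected a string that starts and ends with 'Y'")
    gaps = bulbs.split("Y")[1:-1]  # drop the empty pieces outside the end bulbs
    if any(len(gap) > 9 for gap in gaps):
        raise ValueError("a gap of more than nine bulbs is not a single digit")
    return "".join(str(len(gap)) for gap in gaps)


print(read_lights("YRGBYRYY"))  # prints 310
```

Two yellow bulbs side by side read as a zero under these assumptions.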

In a later story, my PI spots a code in the form of broken links to YouTube videos posted on a Twitter account. (It really does make sense in the story) This story is where he learns to look for things like that.

Strange
2014-Feb-26, 01:19 PM
Remember that there are combinations of characters statistically very common in the English language

You are right, of course. That was the first thing I thought of when I saw the code. And then completely forgot when I started thinking about Morse code. Duh!

Strange
2014-Feb-26, 01:20 PM
This story is where he learns to look for things like that.

And then we find he is insane? :)

Tog
2014-Feb-26, 01:25 PM
Sure there is. Brute force. You don't need the dots and dashes, you have a list of characters:
1=TE
2=AIMN
3=DGKORSUW
4=BCFHJLPQVXYZ
(leaving out the numbers for now). You don't need to bruteforce the dots and dashes, you just bruteforce this. Remember that there are combinations of characters statistically very common in the English language (and I assume that the language of the message is out of question). I would for example start by looking for 141-combinations - very, very likely to be THE. That would give me about 6 character->./- translations to start from.

;)
I hadn't thought about trying to break it that way. Thanks.

jokergirl
2014-Feb-26, 01:31 PM
I sent you a PM to show how I worked backwards from what you gave me. :) It's pretty simple really, but it does include a bit of guesswork and background knowledge ("I know this is a phrase so...").

And now I really should get back to actual coding work... <.<;

Moose
2014-Feb-26, 02:30 PM
I think Tog was thinking of the idea of spotting the two different types of characters used as dots and dashes. I don't think a computer would be able to work that out (unless there is already a categorization of characters in that way). I don't think there is any way of working out which characters are dots and dashes without that information (or a plaintext sample).

Dots and dashes (and even words sometimes) are simply a different kind of alphabet, and they're _all_ just binary strings to a computer. All this really is is an application of super-encryption (encrypting something twice, often using different methods.) It makes brute force computationally expensive, (and maybe out of reach of lowest-common-denominator mass-market hardware... maybe...) but still not actually hard.

Remember that some of the WWII German service branches used super-encryption. Enigma was still technically harder to crack than this would be because it included a layer of entropy (the ring positionings).


That is certainly true of professionals. But probably not most amateurs, which may be more relevant to the story (and, sadly, includes a number of commercial organizations).

We need to distinguish between amateurs and layfolk. The only real advantage the best cryptography professionals (NSA) have over the gifted amateurs is access to supercomputing (and apparently the ability to have backdoors installed in commercial products on the sly.) NSA can't even claim sole lock on true entropy anymore.

DonM435
2014-Feb-26, 07:15 PM
So, with all those different ways to encode "dot," do you choose one randomly each time? Same for dash.

A translator program would be extremely simple. You'd store the "dot" letters in one array, the "dash" letters in another, and the Morse Code and their letter equivalents in a third.
Globally replace all the "dot" letters with dots, and the "dash" letters with dashes. Pass what results through a Morse Code lookup and you're done.
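
A minimal Python sketch of that translator, using the dot and dash lists Tog posted. The names `DOTS`, `DASHES`, `MORSE`, and `decode` are illustrative choices, only the letters A-Z are in the lookup, and anything that does not form a known pattern comes out as `?`.

```python
# Character pools as posted in the thread: glyphs with curves are dots,
# glyphs made of straight lines only are dashes.
DOTS = "BCDGJOPQRSU2356890"
DASHES = "AEFHIKLMNTVWXYZ147"

# International Morse code, letters only (pattern -> letter).
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}

# First substitution: every pool character becomes "." or "-".
TO_SIGNS = str.maketrans(DOTS + DASHES, "." * len(DOTS) + "-" * len(DASHES))


def decode(ciphertext: str) -> str:
    """Decode space-separated groups; unknown patterns become '?'."""
    patterns = ciphertext.upper().translate(TO_SIGNS).split()
    # Second substitution: each dot/dash pattern becomes a letter.
    return "".join(MORSE.get(pattern, "?") for pattern in patterns)


# Opening groups of the first sample in the thread.
print(decode("X 3R58 0T H 226 1FV T2 8"))  # prints THATSONE
```

Word breaks are not part of the scheme, so the output runs together and the reader has to split it back into words.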

Of course you'd have to know what the trick was in the first place, and I don't think that a program could figure out that point. So, it looks pretty effective to me.

Tog
2014-Feb-27, 10:38 AM
We need to distinguish between amateurs and layfolk. The only real advantage the best cryptography professionals (NSA) have over the gifted amateurs is access to supercomputing (and apparently the ability to have backdoors installed in commercial products on the sly.) NSA can't even claim sole lock on true entropy anymore.
When Zodiac sent his first round of letters to the newspapers, the codes were sent to the FBI and Navy cryptologists who couldn't crack it. It took a woman with no experience beyond an interest in puzzles a couple of hours to break it.


So, with all those different ways to encode "dot," do you choose one randomly each time? Same for dash.
I have it in a spreadsheet. The message is written vertically, one letter at a time. Beside it is the Morse string, and beside that four columns with a 0, 1, or 2 to represent null, dot, and dash. The letter E is 0001.

The next column uses VLOOKUP to select the letter from an array based on a random number from 1 to 18. If the value in the previous column is 1 it picks from the column of dot characters; if it's 2 it picks from the column of dash characters.

Then it consolidates the text string into a single column of text per letter, and a cell at the top consolidates all of those strings into one long text string for pasting. After pasting, I needed to go back and add the spaces by hand. The result is that every time I open the sheet, a completely different text string is generated.
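
For comparison, the same process can be sketched in a few lines of Python. This is not the spreadsheet itself, only an approximation of it: `rng.choice` over an 18-character pool stands in for the VLOOKUP on a random number from 1 to 18, the spaces between groups are added automatically, and the names `encode`, `DOTS`, `DASHES`, and `MORSE` are assumptions made for the example.

```python
import random

DOTS = "BCDGJOPQRSU2356890"    # any of these stands in for a dot
DASHES = "AEFHIKLMNTVWXYZ147"  # any of these stands in for a dash

# International Morse code, letters only (letter -> pattern).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def encode(message, rng=random):
    """Return one random encoding of message, one group per letter."""
    groups = []
    for letter in message.upper():
        pattern = MORSE.get(letter)
        if pattern is None:
            continue  # spaces, digits and punctuation are not encoded
        groups.append("".join(
            rng.choice(DOTS if sign == "." else DASHES) for sign in pattern
        ))
    return " ".join(groups)


# A different string comes out on every run, as with the spreadsheet.
print(encode("That's one small step"))
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which helps when checking the output by hand.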

It's sloppy, but it was just a quick thing. I'm sure if I spend some time on it I can get it cleaned up well enough to cover the spacing issues to better hide the letter blocks, and allow for a faster conversion of the raw message to the coded one. I'm sure there are much better ways to go about it, but the only programming language I ever learned was Apple Basic.

profloater
2014-Feb-27, 12:59 PM
I like your idea as a puzzle and also the real life problem of making any code that a computer could not break. One-time pads are good, but that does require both parties to have the pads, which does not fit all situations. Using a cultural reference is attractive, but cloud computing might break that easily. The use of straights and curves was clever for Morse, and there are other tricks like that, especially if you chose a special font. Semaphore and naval flags are other alphabets that might lend themselves to cultural coding rather than straight substitution. Of course the idea today might seem criminal, but after the robot wars??