Thread: Machine code?

  1. #1
    Join Date
    Mar 2013
    Location
    Pretty much in the middle, and a little way up.
    Posts
    1,220

    Machine code?

    Hello folks

    I'm working on a story, and a little part of it involves machine code.
    I'm not exactly sure what I'm thinking of is called; hence this question.
    (I'm a total tyro at anything computer-related. As far as I'm concerned there are magic computer fairies in there somewhere.)

    This is what I'm asking: Take a file - say the model file of a 3D object in a game. Precisely that in fact - that's what I remember opening. Open it in text, and the file is filled with numbers and odd-looking characters that don't appear to have any connection to language. I've always thought that this was 'machine language'. However, looking for examples online, I see that machine language is mainly binary code.

    Does anyone know what I'm talking about? The strange non-English characters. (They look a bit Cyrillic, but since I know that language even less that's just an ignorant visual observation.)
    I'd like to learn a bit of what those characters are, what they represent and a tiny bit of what they do.

    Thanks friends - cheers!
    "The difference between theory and practice is that in theory, there's no difference."

    "Aikido: the art of hitting people with planets."

  2. #2
    Join Date
    Jul 2005
    Posts
    18,104
    The odd characters are more of an epiphenomenon - they're what happens when you open a non-text file in a text editor. Data are stored in many different formats, according to the program that writes/reads the data, and very few of these formats are compatible with text editors, which expect the data to be grouped so as to numerically represent letters and other characters.
    What you were looking at in your text editor was all the numerical data, understandable by a 3D graphics program, being rendered as if it were a piece of text.
    I'd be reluctant to call that "machine code" - machine code (at least, back in the day when I wrote machine code) refers to the low-level programming language used by computers themselves. When you write a program in a higher level language, it's interpreted/compiled into machine code so that it can run on the hardware of your machine. Back in the day, we used to poke around writing our own little machine code routines directly, to speed up the code execution, but that's not much of an issue these days, it seems.
    What you're seeing is just a misinterpreted data file.
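    As a rough illustration of the point above, here is a minimal Python sketch. The "model file" is an assumption made up for the example (a single vertex stored as three little-endian 32-bit floats); real model formats are far more elaborate. It shows the same twelve bytes read once as numbers and once as text.

```python
import struct

# Hypothetical model data: one vertex (x, y, z) stored as three
# little-endian 32-bit floats. Real model formats differ; this layout
# is an assumption for illustration only.
vertex = (1.0, 2.5, -0.75)
raw = struct.pack("<3f", *vertex)

# What the 3D program does: read the bytes back as numbers.
as_numbers = struct.unpack("<3f", raw)

# What a text editor does: treat every byte as a character code.
# Code page 437 is used here because it assigns a character to all 256 values.
as_text = raw.decode("cp437")

print(as_numbers)
print(repr(as_text))
```

    Nothing in the bytes themselves says which reading is the right one; that knowledge lives in the program that opens the file.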

    Grant Hutchison

  3. #3
    Join Date
    Dec 2018
    Posts
    103
    Quote Originally Posted by NorthernDevo View Post
    Hello folks

    I'm working on a story, and a little part of it involves machine code.
    I'm not exactly sure what I'm thinking of is called; hence this question.
    (I'm a total tyro at anything computer-related. As far as I'm concerned there are magic computer fairies in there somewhere.)

    This is what I'm asking: Take a file - say the model file of a 3D object in a game. Open it in text, and the file is filled with numbers and odd-looking characters that don't appear to have any connection to language. I've always thought that this was 'machine language'. However, looking for examples online, I see that machine language is mainly binary code.

    Does anyone know what I'm talking about? The strange non-English characters. (They look a bit Cyrillic, but since I know that language even less that's just an ignorant visual observation.)

    Thanks friends - cheers!
    It's hard to say without some more information. It may depend on the particular software you're using.

    In the old days, a text editor on a computer in the US would typically interpret a file as containing ASCII characters, possibly using only seven bits, with the extra bit in a byte used for error checking. See here:

    https://en.wikipedia.org/wiki/ASCII#7-bit_codes

    There are 128 possible 7-bit codes, and most of them are printable, but the first 32 (plus code 127, DEL) are control characters, often called "non-printable" characters, which mostly provide various instructions (described in the article) to display terminals or printers. If you open a document containing these codes using software not designed to interpret them correctly, the result may be nonsense displayed on the screen.
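    The split between printable and non-printable codes can be checked with a few lines of Python. This sketch relies on Python's own definition of "printable" (`str.isprintable`), which counts the space as printable and also counts code 127 (DEL) as non-printable, so it reports 33 control codes rather than 32.

```python
# Classify the 128 seven-bit ASCII codes using Python's own notion of
# "printable" (str.isprintable), which treats the space as printable.
control = [code for code in range(128) if not chr(code).isprintable()]
printable = [code for code in range(128) if chr(code).isprintable()]

print(len(control), len(printable))
print(control[:32] == list(range(32)))  # the first 32 codes are all control codes
```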

    As technology progressed, the eighth bit was increasingly used to carry data instead of for error checking, meaning an additional 128 characters became available. Here is what IBM did with these extra 128 characters:

    https://en.wikipedia.org/wiki/Code_page_437

    They not only created their own (non-standard) definitions of what these 128 characters meant, they also redefined the first 32, since you don't need the original definitions of these characters for a lot of applications.

    These are country/language specific. For example, here is a Russian "code page".

    https://en.wikipedia.org/wiki/Code_page_866

    For the first 128 characters, it is the same as the IBM definition. But for the next 128 characters, it interprets them mostly as Russian letters instead. So a computer in Russia that opens up a document that was meant to be interpreted using IBM's definition, will misinterpret the 128 extra characters as Russian letters instead.

    Since the Russian code page includes all the English letters as well, you can have a document with both English and Russian in it. But you can't have one with Russian and the graphics symbols that IBM defined for the extra 128 characters.

    A computer in Israel might use this:

    https://en.wikipedia.org/wiki/Code_page_862

    The first 128 characters are the same as in IBM's definition, but the next 128 are Hebrew. So you can create a document with both English and Hebrew, and maybe your text editor is even smart enough to write the English characters left-to-right, and the Hebrew characters right-to-left. But if you want a document with both Hebrew and Russian, you're stuffed - it's either one or the other, you can't have both. The workaround for this was to embed a command in the text document that your editor recognised as an instruction to switch code pages. But, if your editor weren't smart enough, it might simply try to display the command on the screen as text, and then interpret the remaining text in the original language. Result - gibberish on the screen.

    So if you open a data file (like executable code, a "model file" as you described, whatever), that was never meant to be read in text format, the text editor will try to interpret the data in the file as text characters, using whatever the default code page is. The result using code page 437 (the IBM definition) would most likely be a mix of letters, numbers, punctuation, and strange looking graphics symbols. If you opened the exact same document on a computer in Russia, in a text editor that expects code page 866, instead of a lot of the graphics characters, you'd see Russian letters instead. On a computer in Israel, and one in Iraq, and one in Sweden, it might look different still, if each of these uses a code page based on the local language.
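    A small Python sketch of that effect, using the codec names Python ships for these three code pages (`cp437`, `cp866`, `cp862`). The five bytes are an arbitrary example: "Hi" in plain ASCII followed by three values from the upper 128.

```python
# Five bytes: "Hi" in plain ASCII, followed by three bytes from the upper 128.
data = bytes([0x48, 0x69, 0x80, 0x81, 0x82])

# Decode the identical bytes under three different code pages.
views = {page: data.decode(page) for page in ("cp437", "cp866", "cp862")}

for page, text in views.items():
    print(page, text)
```

    The "Hi" survives everywhere because the lower 128 codes agree; only the last three characters change, to accented Latin, Cyrillic, or Hebrew letters depending on the page.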

    I don't know that there is really a name for this - it's not really any kind of code, it's a computer program (the text editor) taking data in one format, and completely misinterpreting it, as if it were written in another format. It could be called "gibberish", I suppose.

    In recent years, the effort has been to get rid of this code page madness, and have a single system for representing all of the characters in all of the languages in the world. This is Unicode, although there are actually several ways of encoding Unicode as bytes. Since there are an absolutely huge number of symbols that have to be represented (not just because there are a lot of languages, but also because East Asian languages have a lot of symbols), 256 is nowhere near enough; that's not even enough to represent the symbols in all the European languages. So the most common encoding of Unicode, UTF-8, uses eight bits to represent some symbols (because of the history, English is rather privileged here), sixteen bits to represent others, 24 for others, and 32 for the rest.
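    The variable width is easy to see in Python. The four sample characters below are arbitrary picks (a Latin letter, a Cyrillic letter, a Chinese character, and an emoji), written as escapes so the example does not depend on how this page itself is encoded.

```python
# One character each from English, Russian, Chinese, and the emoji block.
samples = ["A", "\u042f", "\u4e2d", "\U0001F600"]

# Number of bytes UTF-8 needs for each character.
byte_counts = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, byte_counts):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```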

    So if your text editor expects a document to be in Unicode, and the file you open contains data that was never meant to be interpreted as text, who knows what sort of mishmash you're going to get. You could have English, Russian, Greek, ancient Sanskrit, Chinese, graphics, etc., all mixed together. But again, open the exact same document with a program expecting code page 437 (the IBM definition), it will look different, and with a program expecting code page 866 (Russian), it will look different still, etc. If Unicode UTF-8 takes over the world (it mostly has), then at least the complete nonsense on the screen will look the same everywhere in the world.
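    A hedged sketch of what that looks like in practice: the three bytes below are an arbitrary example, decoded once as UTF-8 and once as code page 437. One detail worth knowing is that not every byte sequence is valid UTF-8, so a tolerant decoder substitutes the replacement character U+FFFD for the parts it cannot read.

```python
blob = bytes([0xC3, 0xA9, 0x80])

# A UTF-8 decoder reads 0xC3 0xA9 as one two-byte character, and marks the
# stray 0x80 (not valid on its own) with U+FFFD, the replacement character.
as_utf8 = blob.decode("utf-8", errors="replace")

# A code page 437 decoder reads the same three bytes as three characters.
as_cp437 = blob.decode("cp437")

print(repr(as_utf8), repr(as_cp437))
```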

    I don't know that there is a name for what you see on the screen. If you open a program file (so a file containing computer instructions to be executed, rather than text) in a text editor, I suppose you could say you are looking at "machine code", because that is what the file contains; however, your text editor is mistakenly attempting to display it as text, resulting in total nonsense.
    Last edited by 21st Century Schizoid Man; 2019-Nov-14 at 06:25 PM. Reason: Mistaken "127" changed to "128"

  4. #4
    Join Date
    Mar 2013
    Location
    Pretty much in the middle, and a little way up.
    Posts
    1,220
    Quote Originally Posted by grant hutchison View Post
    The odd characters are more of an epiphenomenon - they're what happens when you open a non-text file in a text editor. Data are stored in many different formats, according to the program that writes/reads the data, and very few of these formats are compatible with text editors, which expect the data to be grouped so as to numerically represent letters and other characters.
    What you were looking at in your text editor was all the numerical data, understandable by a 3D graphics program, being rendered as if it were a piece of text.
    I'd be reluctant to call that "machine code" - machine code (at least, back in the day when I wrote machine code) refers to the low-level programming language used by computers themselves. When you write a program in a higher level language, it's interpreted/compiled into machine code so that it can run on the hardware of your machine. Back in the day, we used to poke around writing our own little machine code routines directly, to speed up the code execution, but that's not much of an issue these days, it seems.
    What you're seeing is just a misinterpreted data file.

    Grant Hutchison
    Excellent, thank you


    Quote Originally Posted by 21st Century Schizoid Man View Post
    It's hard to say without some more information. It may depend on the particular software you're using.

    In the old days, a text editor on a computer in the US would typically interpret a file as containing ASCII characters, possibly using only seven bits, with the extra bit in a byte used for error checking. See here...
    Excellent, thank you! That's what I was looking for.

    Now to integrate it into the story - and hehe it's a weird one. Having WAY too much fun with it though.

    Cheers!

  5. #5
    Join Date
    Aug 2006
    Posts
    3,511
    Quote Originally Posted by NorthernDevo View Post
    Does anyone know what I'm talking about? The strange non-English characters. (They look a bit Cyrillic, but since I know that language even less that's just an ignorant visual observation.)
    I'd like to learn a bit of what those characters are, what they represent and a tiny bit of what they do.
    Others have eloquently explained it, so this is just a "me too".

    A standard on almost all computers is to store all code and data in 8-bit bytes (or in 16- or 32-bit words). An 8-bit byte can hold any of 256 values, from 0000 0000 to 1111 1111.
    That's machine code.

    Everything else is an interpretation.
    Run it in a 3D program and it will interpret the bytes as values of some sort.
    Run it in a text editor and it will interpret the bytes as text characters.
    Run it in PhotoShop and it might interpret them as colour values.

    So, the byte 1010 1010 (170 in decimal) might be an OCTAL coordinate 252, or extended-ASCII character 170, or HEX colour value AA.

    Plain ASCII stops at 127, but in the IBM code page 437 ("extended ASCII"), character 170 is defined as the LOGICAL NEGATION character (¬):
    https://theasciicode.com.ar/ascii-co...i-code-170.gif
    So, if you open the file in a text editor, it will display that character.
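    The different readings of that one byte can be reproduced in Python. The choice of code page 437 for the character lookup is an assumption about which table is meant.

```python
value = 0b10101010  # the byte from the example above

print(value)        # decimal
print(oct(value))   # octal
print(hex(value))   # hexadecimal

# Character assigned to this value by IBM code page 437 (an assumption
# about which "extended ASCII" table is meant; plain ASCII stops at 127).
glyph = bytes([value]).decode("cp437")
print(glyph)
```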

  6. #6
    Join Date
    Mar 2013
    Location
    Pretty much in the middle, and a little way up.
    Posts
    1,220
    Quote Originally Posted by DaveC426913 View Post
    Others have eloquently explained it, so this is just a "me too"...
    ASCII character 170 is defined as the LOGICAL NEGATION character:
    https://theasciicode.com.ar/ascii-co...i-code-170.gif
    So, if you open the file in a text editor, it will display that character.
    Great, thanks so much.
    This throws a bit of a monkey wrench into my plans, but it's an easy enough fix, I think.
    In short, a computer game designer is attempting to explain to a video game character - a Sorceress - that he is her Creator.
    This is somewhat difficult since she is a tremendously powerful warrior-mage who has ridden dragons into battle, and he's a skinny nerd with the muscle tone of canned ham.
    ...It's a fun story.

  7. #7
    Join Date
    Jul 2005
    Posts
    18,104
    Incidentally:
    Back in the days of steam computing with a Command Line Interface, we used to enjoy a not-unrelated phenomenon that was actually called "going Cyrillic". In those dear dead days of yore, a computer screen wasn't a sort of pixel palette painted by a Graphical User Interface, but was fed by a set of memory locations that could each contain a single byte, corresponding to a single character (from a selectable 256-character code page, as described by 21st Century Schizoid Man above) that would be placed on the screen.
    So you could make an "A" appear on the screen by writing the number 65 into the corresponding RAM address, for instance.

    If your code went a little awry (and reader, it sometimes did), you could end up with it endlessly spewing its numerical output into the screen memory area. The characters from the "upper 128" of the code page, rarely used in writing, would appear with equal frequency to the familiar alphanumeric characters and punctuation. All the graphics shapes and accented letters from the upper 128 bytes of the common default code page, 437, did an excellent job of simulating a foreign alphabet. Hence, "going Cyrillic".
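    A toy simulation of that kind of screen memory, in Python. The 80 by 25 layout and one byte per cell are simplifying assumptions (real hardware details varied by machine), and Python's `cp437` codec stands in for the character generator.

```python
import random

COLUMNS, ROWS = 80, 25

# Simulated text-mode screen memory: one byte per character cell.
# (A simplification; real hardware details varied by machine.)
screen = bytearray(b" " * (COLUMNS * ROWS))

def render(memory):
    """Turn screen memory into rows of text using code page 437."""
    text = memory.decode("cp437")
    return [text[i:i + COLUMNS] for i in range(0, len(text), COLUMNS)]

# Writing 65 into the first cell puts an "A" in the top-left corner.
screen[0] = 65
top_row = render(screen)[0]

# A runaway routine spewing numbers into screen memory. Values below 32
# and the value 127 are left out because Python maps them to control
# codes rather than to the glyphs the old hardware would have drawn.
visible = [v for v in range(32, 256) if v != 127]
rng = random.Random(0)
garbage = bytearray(rng.choice(visible) for _ in range(COLUMNS * ROWS))

print(top_row.rstrip())
print(render(garbage)[0])
```

    About half of the random cells land in the upper 128, which is why the result reads like a foreign alphabet rather than like scrambled English.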

    When the first Macs came along, with their GUIs, we got a whole new kind of program malfunction - placing unexpected numbers in that kind of screen memory area resulted in the display of what looked like a random bitmap, which was called a "snow crash". (Hence the title of Neal Stephenson's 1992 novel.)

    Grant Hutchison

  8. #8
    Join Date
    Aug 2006
    Posts
    3,511
    Quote Originally Posted by NorthernDevo View Post
    In short, a computer game designer is attempting to explain to a video game character - a Sorceress - that he is her Creator.
    This is somewhat difficult since she is a tremendously powerful warrior-mage who has ridden dragons into battle, and he's a skinny nerd with the muscle tone of canned ham.
    ...It's a fun story.
    Nerd: "See these things here? They're bits. They're part of you."
    Sorceress: "So what? It's just a tiny scratch mark on your window-box."
    Nerd: changes random bit from 0 to 1
    Sorceress: is suddenly a cube of purple snow, only six inches tall but spanning from horizon to horizon
    Nerd: changes bit back to zero
    Sorceress: "Do NOT do that again!"

  9. #9
    Join Date
    Aug 2006
    Posts
    3,511
    I read a story once where a programmer was drawn into a fantasy world and started to learn its magic. And then made up his own incantations...

    "For i equals one to ten!"
    "Fireball!"
    "Next"
    "Run!"


    "S equals monster size!"
    "While S greater than zero!"
    "Set S to S times zero point nine!"
    "Loop!"
    "Run!"

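    For anyone wondering why the second incantation is the nasty one, here is a sketch of it in Python. It uses exact fractions, and both the starting size of 1 and the cap of 1000 passes are arbitrary assumptions; the cap is there only so the example stops.

```python
from fractions import Fraction

# Exact arithmetic stand-in for the spell: shrink S by 10% per pass.
# The starting size (1) and the cap on passes are arbitrary choices;
# the cap exists only so this sketch terminates.
size = Fraction(1)
passes = 0
while size > 0 and passes < 1000:
    size *= Fraction(9, 10)
    passes += 1

print(passes, float(size))
```

    Each pass leaves 90% of whatever was there, so the size gets absurdly small but the test "S greater than zero" never fails by itself.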
  10. #10
    Join Date
    Apr 2010
    Posts
    432

  11. #11
    Join Date
    Mar 2013
    Location
    Pretty much in the middle, and a little way up.
    Posts
    1,220
    Quote Originally Posted by DaveC426913 View Post
    I read a story once where a programmer was drawn into a fantasy world and started to learn its magic. And then made up his own incantations...

    "For i equals one to ten!"
    "Fireball!"
    "Next"
    "Run!"


    "S equals monster size!"
    "While S greater than zero!"
    "Set S to S times zero point nine!"
    "Loop!"
    "Run!"
    Ouch...
    That second one is evil!

  12. #12
    Join Date
    Jul 2005
    Posts
    18,104
    Quote Originally Posted by NorthernDevo View Post
    Ouch...
    That second one is evil!
    Also, in essence, the plot of Larry Niven's short story "Convergent Series".

    Grant Hutchison

  13. #13
    Join Date
    Jun 2003
    Location
    Central Florida.
    Posts
    5,896
    I remember that Crichton's Jurassic Park had some actual programming statements (higher-level, not machine language) presented and explained. Something like "Look here: it stops checking once it's found 173 dinosaurs ... so it'll never find a new one! That number shouldn't have been hard-coded!"
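    A hypothetical Python reconstruction of that kind of bug. None of the names or numbers come from the novel; the expected total is shrunk to 3 to keep the example short.

```python
# Hypothetical reconstruction of the bug described above; the names and
# numbers are illustrative and not taken from the novel.
EXPECTED_ANIMALS = 3

def count_animals(sightings, expected=EXPECTED_ANIMALS):
    """Count sightings, but stop as soon as the expected total is reached."""
    found = 0
    for _ in sightings:
        found += 1
        if found == expected:
            break  # the flaw: nothing past the expected total is ever counted
    return found

def count_all(sightings):
    """Count every sighting, with no built-in ceiling."""
    return sum(1 for _ in sightings)

park = ["rex", "raptor", "raptor", "compy", "compy"]
print(count_animals(park), count_all(park))
```

    The capped version reports a reassuring "all present" even though two extra animals are wandering about, while it would still notice if one went missing.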

  14. #14
    Join Date
    Mar 2013
    Location
    Pretty much in the middle, and a little way up.
    Posts
    1,220
    Quote Originally Posted by DonM435 View Post
    I remember that Crichton's Jurassic Park had some actual programming statements (higher-level, not machine language) presented and explained. Something like "Look here: it stops checking once it's found 173 dinosaurs ... so it'll never find a new one! That number shouldn't have been hard-coded!"
    One of the things I love about Crichton's work is that he introduces solid and realistic background into his stories. That moment - one of the best in Jurassic Park - shows a computer programmer taking center stage and doing his best. I'm not sure if it was actually realistic, but as a reader I enjoyed it greatly and it lent a tremendous realism to the story - a lesson I try to remember in my own writing. Hence this question, lol.

  15. #15
    Join Date
    Aug 2006
    Posts
    3,511
    Quote Originally Posted by NorthernDevo View Post
    ... a computer programmer taking center stage and doing his best. I'm not sure if it was actually realistic...
    Heehee. Funny/ironic:

    I was going to create a funny infographic about Nedry's "two MILLION lines of code!" thinking that's child's play really, compared to some real workhorse programs.

    I'd thought I'd better get my facts straight, so I went in search of some average figures for the number of lines of code in random samples of software - so I can juxtapose them against poor Nedry's burden.

    What do I find?

    lines-of-code-1.png

    Oh well. So I'm not the first one to think of this.


    Here it is in context:

    lines-of-code-2.png

    Here it is in detail:
    https://www.businessinsider.com/how-...oftware-2017-2

  16. #16
    Join Date
    Jun 2003
    Location
    Central Florida.
    Posts
    5,896
    I hope that they have a consistent definition for what constitutes a "line of code."

    You can put multiple statements on one physical line; or else extend one statement over multiple lines. You can accomplish the same task via many statements, or a few -- sometimes the apparently longer version runs faster. You might be counting non-executable comment lines, or not. Or, counting calls to external subroutines, or not. I think the Jurassic Park source code was in "C", and so would be subject to any of these distractions.
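    To make the ambiguity concrete, here is a small Python sketch that counts the "lines" in one tiny C snippet four different ways. The snippet and the counting rules are assumptions chosen for illustration; real line counters use more careful rules.

```python
# Several ways to count "lines of code" in the same C snippet. The snippet
# and the counting rules are assumptions chosen for illustration.
SOURCE = """\
/* add two numbers */
int add(int a, int b)
{
    int sum = a + b; return sum;

}
// end
"""

lines = SOURCE.splitlines()

physical = len(lines)
non_blank = sum(1 for line in lines if line.strip())
# Crude rule: skip blank lines and lines that are only a comment.
code_only = sum(
    1 for line in lines
    if line.strip() and not line.strip().startswith(("/*", "//"))
)
# Crude statement count: one per semicolon.
statements = SOURCE.count(";")

print(physical, non_blank, code_only, statements)
```

    The same seven-line file comes out as 7, 6, 4, or 2 depending on the definition, which is the spread a manager asking for "the line count" rarely has in mind.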

    I used to wince when some manager asked if I could get a line count for some program, or estimate lines of code needed. Depending upon who it was, I might just ask him how big (or small) a number he wanted it to be. General consensus was: bigger is better. This discourages writing tightly and efficiently, but what of it?

    I'm thinking that machine language code would give you a more meaningful count, as even verbose code can compile to simpler instructions if optimized.

  17. #17
    Join Date
    Jun 2007
    Posts
    5,774
    Quote Originally Posted by DonM435 View Post
    I hope that they have a consistent definition for what constitutes a "line of code."

    You can put multiple statements on one physical line; or else extend one statement over multiple lines. You can accomplish the same task via many statements, or a few -- sometimes the apparently longer version runs faster. You might be counting non-executable comment lines, or not. Or, counting calls to external subroutines, or not. I think the Jurassic Park source code was in "C", and so would be subject to any of these distractions.

    I used to wince when some manager asked if I could get a line count for some program, or estimate lines of code needed. Depending upon who it was, I might just ask him how big (or small) a number he wanted it to be. General consensus was: bigger is better. This discourages writing tightly and efficiently, but what of it?

    I'm thinking that machine language code would give you a more meaningful count, as even verbose code can compile to simpler instructions if optimized.
    Heavy use of copy/paste coding can defeat that metric...while greatly multiplying locations for bugs and resulting in horribly unmaintainable code. Given two possible implementations, the one that's better thought out and which better factors the problem into its component pieces will usually be the smaller one. Cleaning up sloppy code can turn thousands of lines of scrunched up code cramming multiple operations on a line into a few hundred lines of nicely formatted code.

    And of course, different languages (and libraries) wildly differ in the number of lines of code needed to accomplish different tasks.

  18. #18
    Join Date
    Jul 2018
    Posts
    133
    Are you a fan of Westworld? They have an awesome scene in that where one of the programmers is trying to convince one of the androids, who thinks she is a real person, that her mind is a computer program. He remotely links a data pad directly to the software that is running her mind, shows her some sort of a "conversation tree" display, and as she is talking he hands her the pad and it starts showing the words that she is currently speaking, only they appear on the screen a half second before she says them. This causes her to continually stop and restart the conversation, but the display is always accurate and always shows the word she is about to say just before she says it. The stress of seeing her words show up on screen before she was even aware she thought them actually caused her to suffer some sort of crash and reboot. It was a really cool moment of realization for her.

    WARNING: If you aren't familiar with Westworld, it's an HBO show that has a lot of nudity and cursing in it on a similar level as Game of Thrones, and I'm pretty sure this scene has both. I just wanted to warn you in case you went to look it up online.

  19. #19
    Join Date
    Mar 2013
    Location
    Pretty much in the middle, and a little way up.
    Posts
    1,220
    Quote Originally Posted by Dave241 View Post
    Are you a fan of Westworld? They have an awesome scene in that where one of the programmers is trying to convince one of the androids, who thinks she is a real person, that her mind is a computer program. He remotely links a data pad directly to the software that is running her mind, shows her some sort of a "conversation tree" display, and as she is talking he hands her the pad and it starts showing the words that she is currently speaking, only they appear on the screen a half second before she says them. This causes her to continually stop and restart the conversation, but the display is always accurate and always shows the word she is about to say just before she says it. The stress of seeing her words show up on screen before she was even aware she thought them actually caused her to suffer some sort of crash and reboot. It was a really cool moment of realization for her.

    WARNING: If you aren't familiar with Westworld, it's an HBO show that has a lot of nudity and cursing in it on a similar level as Game of Thrones, and I'm pretty sure this scene has both. I just wanted to warn you in case you went to look it up online.
    You do realize that the warning itself made me want to go look it up?
    LOL J/K. I love the original Westworld (Yo, Yul! hehe) but have never seen the modern version. That scenario you describe sounds like a brilliant piece of writing and acting though; I'll take a look.

  20. #20
    Join Date
    May 2005
    Posts
    8,188
    In old machine code, nothing differentiated between the binary that represented code and the binary that represented data--and if you weren't careful you could "execute your data." That is as fun as it sounds.

  21. #21
    Join Date
    Dec 2018
    Posts
    103
    Quote Originally Posted by grapes View Post
    and if you weren't careful you could "execute your data." That is as fun as it sounds.
    That's the fundamental principle behind stack smashing. Exploit a stack buffer overflow by writing more data than a buffer can hold, so that it overwrites the saved return address; the program then jumps into your data and runs it as code, doing something nefarious.
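    A harmless toy model of the idea in Python. Nothing here is executed as machine code; the frame layout (an 8-byte buffer followed directly by a 2-byte return address) and the addresses are invented for illustration, and real stack frames depend on the CPU, the compiler, and modern protections.

```python
# Toy model of one stack frame: an 8-byte buffer followed directly by a
# 2-byte saved return address. The layout and addresses are invented for
# illustration; real frames depend on the CPU, compiler, and protections.
BUFFER_SIZE = 8
frame = bytearray(BUFFER_SIZE) + (0x1234).to_bytes(2, "little")

def unchecked_copy(frame, data):
    """Copy data into the frame from offset 0 without checking its length."""
    for offset, byte in enumerate(data):
        frame[offset] = byte

def return_address(frame):
    """Read the 2-byte saved return address that follows the buffer."""
    return int.from_bytes(frame[BUFFER_SIZE:BUFFER_SIZE + 2], "little")

before = return_address(frame)

# Input that is two bytes too long: the extra bytes land on the return address.
payload = b"A" * BUFFER_SIZE + (0xBEEF).to_bytes(2, "little")
unchecked_copy(frame, payload)
after = return_address(frame)

print(hex(before), hex(after))
```

    The copy routine never asks how big the buffer is, so the last two bytes of the input quietly replace the address the program would have returned to.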
    No, your face is a logical fallacy!
