PDA

View Full Version : Communication Efficiency



GOURDHEAD
2004-Jan-23, 03:14 PM
If the U.S. tax code were converted to ASCII and this bit pattern represented with a numeral from a very large number base (units position from 0 to (2^n) -1 where n is allowed to be sufficiently large to contain the bytes needed to present the message), could, with appropriate a priori agreements about the number base, this tax code be transmitted from here to Alpha Centauri (or vice versa) in the form:

(ID1)X1X2X3...(ID2)Y1Y2Y3... Where the IDs identify subtrahends (hexidecimal E) and subtractors (hexidecimal F),

and the Xs and Ys are limited in range to the hexidecimal values 0 thru D.

If so, huge amounts of data could be transmitted via polarization modulation across space with very few character sets sending many repetitions of the same surrogate message. Eight (or some few more) characters repeated 100 times increase the likelihood of accurate communication. Are others already doing this?

This looks to good to be true, so I have probably overlooked a major flaw. Has someone thought of an even more efficient method? I am anxious to be enlightened.

GOURDHEAD
2005-Nov-08, 07:09 PM
Quite a few new members have joined since I posted this question. Can anyone point out the flaws, if any? Would a duo-hexidecimal (32) base work better for constructing the pseudo-message?

Halcyon Dayz
2005-Nov-08, 10:11 PM
I don't quite get what you are proposing.
Please, could you elaborate? :neutral:

Van Rijn
2005-Nov-08, 10:34 PM
Or perhaps simplify it?

I'm guessing you are trying to get something for nothing - without simply removing redundancies, you are trying to do a type of lossless data compression. But, if you have (for example) 256 possible patterns, you need to send enough data to unambiguously represent those 256 patterns. You may be able to do some sophisticated signal modulation, but ultimately, bits are bits. The number base is irrelevant.

GOURDHEAD
2005-Nov-09, 04:02 AM
I don't quite get what you are proposing. Please, could you elaborate? I'll try. The bits it would take to express all the information in the Encyclopedia Britanica in ASCII code can also be used to express a large number, N, in binary code. N can also be expressed as the difference between (or sum of ) an infinite set of pairs of numbers using an infinite set of number bases. If we have a colony in the Alpha Centauri system with which we wish to communicate, we will want to avoid message corruption from noise as well as transmit it quickly. We will further wish to enhance the fidelity of the information transmittal by repeating it several times using polarization modulation to represent the information content.

It would take a lot of digits to express N in binary (ASCII), decimal, or hexadecimal bases and we wish to avoid sending all those bits required to represent such numbers. So we by apriori agreement, with teams on each end of the transmission/reception, select pairs of numbers, expressed in a common base, from the infinite set availale to us such that either their sum or difference is equal to N. The critical constraint on the selection of the number base is to use only the low order digits to express each member of the pair. If the base is 2^99 (actual power of 2 to be determined by communication theory experts), then n1*(2^99)^0 becomes the rightmost digit, n2*(2^99)^1 becomes the second digit from the right, etc. The n's are restricted to values 1 through 15 and as few digit positions as can provide the required flexibility. A transmission of the bit pattern that equals 5432, 3344 would be interpreted by the receiving station to add those numbers expressed in the number base, express the answer in binary and read the ASCII code constituting the data.

Van Rijn
2005-Nov-09, 07:08 AM
It would take a lot of digits to express N in binary (ASCII), decimal, or hexadecimal bases and we wish to avoid sending all those bits required to represent such numbers.

As I thought, you are trying to get something for nothing. You can remove redundancy, you can use efficient signalling methods, but if you want to send something where there are 256 possible states, you have to send the equivalent of 8 bits. And so on for whatever number of bits you wish to send. No exceptions. If there were, you could have infinite data compression, and reduce the Library of Congress to one bit.

Halcyon Dayz
2005-Nov-09, 11:31 AM
The classical way of compressing text is by finding strings of
characters that repeat themselves, might be only a few characters
or complete sentences, and replace these with a numerical code,
and then add a table of these codes.
If you only use the standard letters you can suffice with using
a reduced ASCII-set, which requires only 7 bits.
Together with some other tricks, text can be reduced considerably,
often to less than 20%.

For accuracy you could want add a control-bit to every byte,
or send the message more than once.

In general though, text takes relatively little space.
You could fit thousands of volumes of plain text on a single CD-ROM.
It is audio-visual material that takes a lot of space.

GOURDHEAD
2005-Nov-09, 02:05 PM
As I thought, you are trying to get something for nothing.I prefer to think of it as getting a lot out of a relatively few bits, but being the slow learner that I am, haven't seen where the flaw in my scheme is. Surely someone in the business has thought of this and knows why it won't work.

Assuming it can be made to work for text, with appropriate modification it will also work for image transmission where the greatest gain in efficiency will be realized. Also, fiddling with this and similar schemes may alert us how to search for information from the Advanced Ones allowing us to convert what appears to be an unusual burst of noise into an intelligent data transmission while invoking polarization demodulation.

Be kind, at least condescendingly tutorial, show me the flaw.

01101001
2005-Nov-14, 05:14 AM
I believe your scheme is working at cross purposes with itself. You have some sort of compression scheme for efficiency, and then repeat the compressed message some unspecified number of times for reliability, thereby blowing away any advantage gained in compression. Those two problems should be solved separately, and then integrated to send one entire reliable message in the bandwidth available.

The base message should be compressed (http://en.wikipedia.org/wiki/Lossless_data_compression). Compressed, it should have high entropy (http://en.wikipedia.org/wiki/Information_entropy), and appear to have the characteristics of random noise for high efficiency. I strongly suspect a state-of-the-art compression scheme like LZW (http://en.wikipedia.org/wiki/LZW) will defeat yours -- though I admit I haven't had the interest to analyze yours. If you can show yours is better for typical data, expect wealth to come your way.

Once you have a compressed message, with natural redundancy squeezed out of it, then for transmission you should add redundancy to it with an error-detection-and-correction code (http://en.wikipedia.org/wiki/Error-correcting_code) so it can be reconstructed should some of its bits be damaged in transmission. Analyzing the types of damage you expect, the type and distriubution of errors, and deciding what, if any, error rate is acceptable, detected and undetected errors, will help you chose a much better method than just sending the message multiple times. I suppose your solution will be some kind of Reed-Muller code (http://en.wikipedia.org/wiki/Reed-Muller_code), like NASA uses deep-space.

montebianco
2005-Nov-14, 05:28 AM
If there were, you could have infinite data compression, and reduce the Library of Congress to one bit.

Well, you can do that, as long as you know in advance that the Library of Congress is one of the two messages that will be sent :)

I'd better duck now...

Van Rijn
2005-Nov-14, 11:16 AM
Well, you can do that, as long as you know in advance that the Library of Congress is one of the two messages that will be sent :)

I'd better duck now...

I know you were joking but: Sure, you can activate a message, so you can send a a complex plan (by whatever means) and send a one bit go/no go signal. But, the best you can ever do when transmitting or storing data is to remove all redundancy. If I'm understanding his intent, I think GOURDHEAD was attempting to find an encoding method (not compression method) that could somehow consistently fit a large amount of data into a smaller amount with no loss. And that just isn't possible.

montebianco
2005-Nov-14, 05:15 PM
I know you were joking but: Sure, you can activate a message, so you can send a a complex plan (by whatever means) and send a one bit go/no go signal. But, the best you can ever do when transmitting or storing data is to remove all redundancy. If I'm understanding his intent, I think GOURDHEAD was attempting to find an encoding method (not compression method) that could somehow consistently fit a large amount of data into a smaller amount with no loss. And that just isn't possible.

That's for sure. If there are 2^X possible messages that can be sent, then X bits are needed. If some messages are more likely than others, those could be encoded with fewer than X bits, and the others encoded with X+1 bits. But no way to get them all below X bits...

GOURDHEAD
2005-Nov-14, 09:50 PM
You critics may have a worthwhile point but remember you're dealing with a slow learner and consequently you are shirking your duty to show me why it doesn't work. I have laid out a simple procedure for accomplishing the sending of a message the size of the content of the Encyclopedia Britannica using only 100 or 200 bits, maye a 1000 on the outside, where the content is never known apriori. These huge messages will be sent once a week or so always with content not known apriori by the colonists at AC-A and they will be receiving similar messages from Earth to advise them of the latest news, cures for diseases, answers to previously asked questions, technology development, etc. If these messages were to be sent in their uncoded form, hundreds of millions, maybe billions of bits, would have to be sent, and , even if we use polarization modulation, the noisy universe, working against such an enormous target over 4.3 years, will make sure it will not be received intact.

My coding system will allow you to send 100 one thousand-bit identical messages (a much smaller target), and although none come through intact, I hope comparison of the content of the entire set will allow a correct version to be retrieved. Bit compression and error detection and correction code should be employed, to the extent it is helpful, to the 100 thousand-bit messages. Standard data pages (either text or pictures) could be inserted every 50 or 100 pages as indicated by test results and experience to enhance message veracity (ensuring confidence that is has been received correctly). The format of the message should include a portion that identifies the power of 2 that identifies the number base so maximum flexibility can be applied to both code and decode the messages.

1. Any number can be expressed as a numeral which is the sum or difference of two or more other numbers in a common data base.

2. Any numeral so defined can be expressed in binary code.

3. Binary code, using ASCII protocol, can be coverted into a string of letters and punctuation symbols.

4. Letters and punctuatiuon symbols comprise messages.

What have I not seen? Is there a Nobel prize for communication theory...well, at least trickery?

NEOWatcher
2005-Nov-14, 10:07 PM
You critics may have a worthwhile point but remember you're dealing with a slow learner and consequently you are shirking your duty to show me why it doesn't work.
You haven't shown us why you think it does work. You gave us some methods and definitions, but not how it applies to the end product.
How about a short example, like a short sentence?
Show us the following: the original sentence, the information being sent, and how it is decoded.

1. Any number can be expressed as a numeral which is the sum or difference of two or more other numbers in a common data base.

How does that make it shorter. It takes 8 bits to describe 100. It takes at least 8 bits to describe any number that can be used as a difference to get to 100. The difference will involve one number that is greater than, and one number that is less than.


2. Any numeral so defined can be expressed in binary code.

So? Computers represent just about anything in binary.


3. Binary code, using ASCII protocol, can be coverted into a string of letters and punctuation symbols.

Why is ASCII so important? I can invent my own protocol that would be slightly more effecient than ascii since it would cover only the needed characters and not some of rarely used ones. For instance, how about a 6 bit code that allows 64 characters (26 letters, 10 numbers and some assorted symbols)


4. Letters and punctuatiuon symbols comprise messages.

And spaces, and numbers, and pictures, etc.


What have I not seen? Is there a Nobel prize for communication theory...well, at least trickery?

I'm sure it would be covered in some other Nobel prize (most likely math).

SirBlack
2005-Nov-15, 12:52 AM
I've finally taken the time to read through Gourdhead's explanation, and I don't quite understand how this method is supposed to work. I am also suspicious (as are others) that it can achieve the claimed data compression. Perhaps we could work through a practical example so as to help with both issues.


Say we want to transmit this text: "Message"

That would be 7 characters, each represented by 8 bits. Specifically, the text would be represented as: 01001101 01100101 01110011 01110011 01100001 01100111 01100101

So then all these bits are appended to each other to created the number N which in decimal form would be 21,785,119,738,128,229.

And now this is supposed to be represented somehow as the sum of a pair or pairs of numbers... I think. Here is where I don't understand how the method is supposed to work.

Gourdhead, could you work through the rest get the resulting data which would actually be transmitted? Then we could also see how many bits would be required compared to the original 56 bits making up the message.

Van Rijn
2005-Nov-15, 01:53 AM
Gourdhead, could you work through the rest get the resulting data which would actually be transmitted? Then we could also see how many bits would be required compared to the original 56 bits making up the message.

This is the crux of the issue.



1. Any number can be expressed as a numeral which is the sum or difference of two or more other numbers in a common data base.

2. Any numeral so defined can be expressed in binary code.


How many bits does it require to exactly represent that number? I think you will find there is no savings on average if starting with an arbitrarily chosen number.

"Infinite compression" (lossless compression beyond the removal of all redundancy) is essentially the information equivalent of the perpetual motion machine. Take a look at the link 01101001 supplied earlier:

http://en.wikipedia.org/wiki/Information_entropy

It is physically impossible, but also should be clear if you think about small values. How would you consistently represent 256 values in 7 bits?

But, if you do find a way to do it, you can make a lot of money, so if you think you have a shot, go for it. There actually was a company in the early '90s that said they could do this and they got a lot of money from hopefuls, despite many negative comments. They kept delaying their product, and when they did demonstrate something when using arbitrary input data ... it produced garbage for output.

Ah, well.

GOURDHEAD
2005-Nov-15, 05:01 PM
So then all these bits are appended to each other to created the number N which in decimal form would be 21,785,119,738,128,229.
Very little efficiency can be gained when sending short messages and I need software (e.g., BASIC) capable of computing a specific answer from a selected pair of numbers. As someone has suggested, a modified ASCII code may increase the efficiency.

The following, although not providing an exact answer, will, I hope, demonstrate the procedure:

The decimal number we wish to send is 21,785,119,738,128,229 ("message" in ASCII) requiring 56 bits if not coded as I suggest.

We could send X59-XY where X is the coefficient of (2^59)^0, 59 designates the power of 2 forming the number base (576,460,752,303,423,488), and Y is the power of 2 designating the number base of the subtrahend. 59 was somewhat arbitrarily chosen to make sure I had a value greater than 21,785,119,738,128,229, and I need computational software to get more specific.

We could send xxxx0101100100101101xxxxyyyyyyyy using a total of 48 bits when coded as I suggest. Here the x's refer to the coefficient of the power of the number base which power increases, starting at zero in the right most position, from right to left as we normally express decimal numbers. The y's indicate the number base of the subtrahend. Note that 8 bits are used to indicate subtraction (or addition or multiplication) which we could eliminate by apriori agreement and have a savings based on 32 versus 56 bits in this case (a savings of 43%). The very large messages will have savings of 99.8% or more.

montebianco
2005-Nov-15, 05:36 PM
Very little efficiency can be gained when sending short messages and I need software (e.g., BASIC) capable of computing a specific answer from a selected pair of numbers. As someone has suggested, a modified ASCII code may increase the efficiency.

The following, although not providing an exact answer, will, I hope, demonstrate the procedure:

The decimal number we wish to send is 21,785,119,738,128,229 requiring 56 bits if not coded as I suggest.

We could send X59-XY where X is the coefficient of (2^59)^0, 59 designates the power of 2 forming the number base (576,460,752,303,423,488), and Y is the power of 2 designating the number base of the subtrahend. 59 was somewhat arbitrarily chosen to make sure I had a value greater than 21,785,119,738,128,229, and I need computational software to get more specific.

We could send xxxx0101100100101101xxxxyyyyyyyy using a total of 48 bits when coded as I suggest. Here the x's refer to the coefficient of the power of the number base which power increases, starting at zero in the right most position, from right to left as we normally express decimal numbers. The y's indicate the number base of the subtrahend. Note that 8 bits are used to indicate subtraction (or addition or multiplication) which we could eliminate by apriori agreement and have a savings based on 32 versus 56 bits in this case.

I think the trick here is that you can certainly encode some 56-bit numbers with fewer than 56 bits, you just can't encode all of them that way with the same encoding scheme, if each 56-bit number is to be represented uniquely. If all 56 bit numbers are possible, it must be the case that at, if some of them are encoded with fewer than 56 bits, others must be encoded with more than 56 bits. If we choose an arbitrary 56-bit number, if your scheme has a distinct representation for every number, than there is just no way to do it with fewer than 56 bits. You have translated a 56 bit number to 48 bits; if we choose an arbitrary 56 bit number, can you always do this? If so, you must be assigning the same encoding to different numbers.

Now, if some numbers are possible messages that could be transmitted while others are not, it may well be possible to design an encoding scheme which uses fewer bits than the straightforward scheme, and if all numbers are possible but some are more likely than others, then it is possible to design an encoding scheme which uses fewer bits than the straightforward scheme on average. What is the nature of the information to be transmitted? If there is structure/redundancy, it can be done. If any 56 bit number is a possible message, then there must be at least 2^56 distinct encodings, and there must be at least 56 bits in the encoding scheme.

GOURDHEAD
2005-Nov-15, 05:59 PM
If so, you must be assigning the same encoding to different numbers.
Yes the same encoding process but with sufficient flexibility to distinguish between "message", "massage", "masters", "misters", "process", etc. Note that the decimal number representing the contents of the encyclopedia Britannica expressed in ASCII would be very large, 2 to the 200 or 300th power perhaps (wildly guessing) and maybe requiring multiplication of numbers in the transmitted message while not exceeding several hundred bits to transmit. When put into practice, we will no doubt develop/discover shortcuts primarily through protocols which precisely define apriori agreements. Note that we are not transmitting mesages per se; we are transmitting instructions of how to reconstruct messages constructed by a sender a few light years away.

montebianco
2005-Nov-15, 06:14 PM
I'm not really sure I'm understanding the last post. The distinction between sending messages and sending instructions on how to construct messages seems artificial to me; if there are 256 messages that could be constructed at the end of the process, then 256 distinct sets of instructions must be sent.

Now, you also mention encyclopedia Britannica; if the intention is that the messages encoded are text, then there is a lot of structure and redundancy to text; some messages (in fact, if we were to do some calculations, almost all) simply make no sense interpreted as text. So with this assumption about the nature of the information being transmitted (the 2^200 or 2^300 numbers are severe overestimates - these are enormous numbers), it would certainly be possible to encode things down to many fewer bits than straight character-by-character encoding. Let us suppose the EB takes 100 million bits in a straightforward character-by-character encoding (I don't know if that is accurate). Do you wish to be able to send any 100 million bit message at all? Or only those which make sense when interpreted as text? If the latter, probably large gains are possible. If the former, there's just not much that can be done...

GOURDHEAD
2005-Nov-15, 07:29 PM
You have translated a 56 bit number to 48 bits; if we choose an arbitrary 56 bit number, can you always do this? If so, you must be assigning the same encoding to different numbers.....Do you wish to be able to send any 100 million bit message at all? Or only those which make sense when interpreted as text? If the latter, probably large gains are possible. If the former, there's just not much that can be done...We merely specify which two numbers, expressed as a small multiple of the specified power of 2, that, when subjected to the specified operation, will yeild the 56-bit ASCII message.

Trying to remember that I am speaking out of ignorance and very much need guidance from an expert communications theorist, I believe we can send any message (actually instructions to reconstruct) at all, including graphics, provided we design the protocols cleverly and optimize the apriori agreements.

Another approach, which is more restrictive, is to always use the same very large number base for each operand and the same operator (add, subtract, multiply, or divide) thus reducing the variables to the coefficients of the exponents of the number bases. A standard easily recognized filler would be appended at the end of the message as part of its construction to get the process to produce the number interpreted in ASCII to render the message accurately. The filler would consist of something the size of I Corinthians, 13 repeated until the answer is correct (suffixed with bits as required) when the operands are combined by the specified operators

01101001
2005-Nov-15, 09:07 PM
We merely specify which two numbers, expressed as a small multiple of the specified power of 2, that, when subjected to the specified operation, will yeild the 56-bit ASCII message.

You assume two such small numbers exist. Why?

For instance the sum or difference of any two multiples of some specified (natural number) power of 2 must be even. What if the number you are trying to represent is odd?

SirBlack
2005-Nov-16, 02:33 AM
Maybe it's just me, but I still don't understand clearly how you're representing N using X, Y, and some power of 2.


X is the coefficient of (2^59)^0, 59 designates the power of 2 forming the number base (576,460,752,303,423,488)

First thing, (2^59)^0 is equal to 1 as is anything raised to the power of 0. This can't possibly what you intended. So I'm taking a guess that the first term is simply X*(2^59)


and Y is the power of 2 designating the number base of the subtrahend.

Looks like you mean the second term is 2^Y

So by my interpretation you mean N = X*(2^59) - 2^Y


We could send X59-XY

But then I look at this and it implies X is used in the second term somehow. So perhaps you meant something more like N = X*(2^59) - X*(2^Y)

But further on you say


We merely specify which two numbers, expressed as a small multiple of the specified power of 2, that, when subjected to the specified operation, will yeild the 56-bit ASCII message.

Which seems to say that X and Y are each used a multiples of some power of 2. So maybe the equation should be N = X*(2^59) - Y*(2^59)



Now everything I've read seems to be some variation on using two terms each with some coefficient and some power of 2. So I'm going to look at an abstract version of this that covers all of the above euqations.

N = A*(2^B) - C*(2^D)

There's a major problem with this, N cannot be odd. This problem is still present if you change the subraction into an addition or multiplication. So that would leave division: N = (A*(2^B)) / (C*(2^D)). Though since B and D are arbitrary, this could be simplifed to N = (A/C) * (2^E) where E = B-D. And though I can't state explicitly why, I have a feeling this equation won't be able to do what you want it to do.

At this point, we need to make sure the math you're trying to use to represent N is valid before we even worry about how many bits would be in the final transmission.

01101001
2005-Nov-16, 06:37 AM
Maybe it's just me, but I still don't understand clearly how you're representing N using X, Y, and some power of 2.

It's not just you. I've made a couple of stabs at understanding Gourdhead's method but I can't understand what he's saying. His nomenclature and notation are very confusing.

Gourdhead, any chance you could restate the method? An example would really really help. Maybe that "Message" challenge was too complicated, with numbers too large. Suppose the message you wanted to send was ASCII "i". In binary that is 01101001. (Hey!) For your convenience, 01101001 in hex is 69 and in decimal is 105. That shouldn't swamp you computationally.

Please show exactly what bits you would send with your compression method, and how the fields of bits you send are to be interpreted by the receiver. (It's OK if your method produces a string longer than 8 bits. Most compression methods are anti-efficient with short data.)

I think we'd all like to understand what you are proposing. Thanks.

GOURDHEAD
2005-Nov-16, 04:16 PM
For instance the sum or difference of any two multiples of some specified (natural number) power of 2 must be even. What if the number you are trying to represent is odd?

......First thing, (2^59)^0 is equal to 1 as is anything raised to the power of 0. This can't possibly what you intended. So I'm taking a guess that the first term is simply X*(2^59)
I apologize for my lack of skill to present this idea clearly; the frightening thought is I may be overlooking a fault that is so obvious to you that you are reluctant to mention it to me.

I did indeed mean (2^59)^0 which specifies the units position. I get the odd numbers from "the units position" similar to what is done in decimal notation. What you may have overlooked is that the coefficient of the units position [(2^59)^0], as is the coefficient for each numeral position, in my scheme is "overrestricted" by having its coefficients limited to the hexidecimal numbers 0 through D (decimal 13) where "E" and "F" may be needed as delimiters; if not, the position multipliers can be expanded to include "E" and "F". Note that the "units" position in hexidecimal includes decimal numerals 0 through 15. If practice so dictates I could expand the units position restriction to 31 or 63, but, for now, I prefer the more severe constraint to minimize the number of bits used for transmittal. Remember there is an infinite set of operands which, when combined as directed by the four major arithmetic operators, will result in the desired answer, so the arbitrary restriction placed on the position multipliers will be tolerable. Also, the filler can assist in causing the number to be transmitted to be of the form and size (fine tuned as) we wish. Obviously we wish to limit it to being used only once and that for content accuracy verification.


Maybe it's just me, but I still don't understand clearly how you're representing N using X, Y, and some power of 2.The x's and y's indicate hexidecimal bit positions required to specify the position multiplier. That's why each was repeated to indicate four positions. It might have been more clear had I used x1x2x3x4 and x5x6x7x8. If it is required to expand the position multipliers to 31 or 63, the number of bit positions will increase accordingly.

In summary let's assume the colonists (or natives) at Alpha Centauri A wish to send a message to Earth consisting of 10^18 words. That is probably larger than the content of the Encyclopedia Britannica. Such a message coded in ASCII (for example) can be expressed as a numeral of an extremely large number base. This number base can be expressed as an even larger power (than 18) of 2 which, in turn, can be expressed as sums or differences of numbers expressed in the same or other powers of 2 number bases. Since our position multipliers are overrestricted and we wish to use only the leastmost 4 or 5 positions with multipliers limited to 0 through 13 (or 15,31,or 63), we will probably elect to add several (more than 2) numbers to arrive at the inordinately large (decimal notation) number that comprises the message. Assuming that the average word size is 7 characters as was the case in "message" which was expressed in decimal notation as a number greater than 21 million. This suggests using a number base on the order of 10^(18+9) or 10^27 which can be transmitted in less than 1000 bits using my coding system. Hey! It's just arithmetic; we can each add.

Metricyard
2005-Nov-16, 05:40 PM
This suggests using a number base on the order of 10^(18+9) or 10^27 which can be transmitted in less than 1000 bits using my coding system. Hey! It's just arithmetic; we can each add.

Sorry, you're not going to get 10^27(of what ever unit you're using) to fit in 1000 bits.

I am totally clueless on what you're trying to describe here. It's been awhile since I've played with data transmissions or programming, but if I was to take a wild quess on what you're trying to do, it looks like you're trying to create a multidimensional array of some sort.

Maybe an example like 01101001 suggests?


Gourdhead, any chance you could restate the method? An example would really really help. Maybe that "Message" challenge was too complicated, with numbers too large. Suppose the message you wanted to send was ASCII "i". In binary that is 01101001. (Hey!) For your convenience, 01101001 in hex is 69 and in decimal is 105. That shouldn't swamp you computationally.

Please show exactly what bits you would send with your compression method, and how the fields of bits you send are to be interpreted by the receiver. (It's OK if your method produces a string longer than 8 bits. Most compression methods are anti-efficient with short data.)

I think we'd all like to understand what you are proposing. Thanks.

Van Rijn
2005-Nov-16, 10:42 PM
Right. Gourdhead, until you provide a real world example with the steps laid out, this is just so much word salad. It makes no sense to me.

I think that you will find that, if you start with a message with all conventional data redundancy removed, whatever value you end up with at the end of the process - assuming it can be used to accurately reconstruct the original message - will end up requiring at least as many bits to correctly represent it as the original message.

SirBlack
2005-Nov-17, 01:24 AM
Ah ha, I finally think I understand, though it took a while even with that last post...

It sorta comes from thinking about representing N as the addition or subtraction of different numbers in different bases. For instance 203310 = 102048 - F2048. Of course there's there's a vast number of ways to do this if any number of digits in any base is allowed. But since the aim is to use a small amount of bits in the final message, both the digits and bases allowed are restricted. Only digits 1 through F are allowed, and only bases that are a power of 2.

But this seems rather cumbersome to talk about in terms of digits and bases, so here is where it changes to each term being a power of 2 multipled by some coefficient. The previous example now becomes 2033 = 1*(2^11) - 15*(2^0).

Now I'll try to put this into a general form: N = X1*(2^P1) ? X2*(2^P2) ? X3*(2^P3) ? ...
where each X could be a value from 1 to 15, each P could be a value from 0 to 255, and each ? would be replaced by one of the fundamental mathematical operations such as addition or subtraction (or maybe multiplication). The ranges for P or X could be enlarged as necessary to deal with very large values of N, but there still has to be some specific restrictions, especially for X. For the sake of dicussion, we should probably just stick with the restrictions mentioned above.

One thing I'm not sure about is what if any restrictions apply to the number of terms allowed in this equation. And I think this may become an important issue. Certainly any possible N can be represented by the equation if any number of terms is allowed. But the final goal is to represent N in a form that requires fewer bits to transmit than the straightforward binary form of N, which means we have a limit of just how many terms can be used.

What needs to be examined first is just how many terms are necessary to represent any arbitrary N. Now for values of N near a power of 2 (or multiple of a power of 2), two terms is sufficient. For instance 65521 = 1*(2^16) - 15*(2^0). However, an N further away from a multiple of a power of 2 isn't so easy. I don't think there's a way to represent 65519 with just two terms (keeping in mind the restrictions on X). Finding a three term representation wouldn't be hard, however I expect I could find an N less than the previous numbers which would require more than three terms. And perhaps so on for more terms.

As we deal with larger and larger values of N, particular ranges will require more and more terms to represent properly. I'd love to have some sort of algorithm that could give the minimum number of terms necessary for each particular N, but unforutnately I don't. However I still suspect there may be values of N where the representation of N given by the equation requires more bits to send than the original N. The question now is whether this would happen for relatively few values of N or for a large enough amount that this method becomes useless in too many cases to be practical.

GOURDHEAD
2005-Nov-17, 03:45 AM
Ah ha, I finally think I understand, though it took a while even with that last post...Bravo!! Protocol or header or a combination of both can eliminate assigning bits to specify arithmetic operators and for transmission purposes a 5-bit code may suffice. The composers of the messages to be transmitted can convert the completed message to ASCII for which that set of bits is the equivalent of a very, very large number which can be expressed as you describe. I am more and more convinced that an algorithm can be developed for a standard number base on the order of 2^300 and standard filler data can be added to fine tune the transmitted number such that it lends itself to 5 or less positions and multipliers from 0 through 15. Once the composers determine how many times the filler is to be used, it can be inserted every 50 pages or so, as opposed to all at the end, as a means to verify mesage fidelity and of error correction in case the universe was particularly unfriendly on that 4.3 year transmission.

No one has commented on the polarization modulation to which we could add pulse coded modulation. Has someone like SETI looked into how mutually familiar civilizations would communicate across such distances with high fidelity? Am I correct in assuming the universe is less noisy with respect to pulse coded polarization modulated messages? If there are several planets in the Alpha Centauri system, the explorers will have much data to communicate back to Earth and Earth will have to send lotsa data to keep the colonists current with both technology development and personal data from friends and kin.

01101001
2005-Nov-17, 06:50 AM
Bravo!! Protocol or header or a combination of both can eliminate assigning bits to specify arithmetic operators and for transmission purposes a 5-bit code may suffice. The composers of the messages to be transmitted can convert the completed message to ASCII for which that set of bits is the equivalent of a very, very large number which can be expressed as you describe. I am more and more convinced that an algorithm can be developed for a standard number base on the order of 2^300 and standard filler data can be added to fine tune the transmitted number such that it lends itself to 5 or less positions and multipliers from 0 through 15. Once the composers determine how many times the filler is to be used, it can be inserted every 50 pages or so, as opposed to all at the end, as a means to verify mesage fidelity and of error correction in case the universe was particularly unfriendly on that 4.3 year transmission.

I still don't understand. I appreciate and understand SirBlack's notation. I don't understand how Gourdhead uses it, given the comments above.

? X1*(2^P1) ? X2*(2^P2) ? X3*(2^P3) ? ...

(I added a leading ?, beacuse the first term can be added or subtracted, too.)

Where does the base, like 2^300 above fit in? Is it actually:

? X1*((2^300)^P1) ? X2*((2^300)^P2) ? ...

with Xi a 4-bit quantity, a number in the range 0-15, and the '?' either addtion or subtraction. (Of course if Xi is 0, you don't bother to send it and its associated Pi, thereby supposedly gaining efficiency, right?)

If that is the case, then there are many, many numbers that cannot be generated this way. For instance (and not limited to) any number that when divided by 2^300 leaves a remainder greater than 15 and less than (2^300)-15, cannot be generated from this method. Tell me how would you represent the number 16 with this method. (Second thought: do the Pi values have to be different? If not, that's a way out -- but it costs efficiency big time. If any Pi values can be identical, then imagine how many bits it would take to represent, for insatnce, 2^150 with a sum of small values.)

Hey, if I'm not understanding the method, it's probably because you haven't supplied a clear definition of it. If I misstated it above, then please, Gourdhead, define it. I am growing tired of trying to analyze what I think you are claiming.

SirBlack
2005-Nov-17, 08:07 AM
I'm also not sure what "a standard number base on the order of 2^300" is supposed to mean. If that's supposed to mean that every Pi = 300 then the equation is going for fail for most values of N. On the other hand, if Gourdhead means 300 is simply supposed to be an offset added to each Pi such that the equation would be X1*2^(P1+300) ? X2*2^(P2+300) ? ... then this is a trivial matter compared to other issues with this method.



I've been playing around with this method for a while now, and unfortunately I'm coming to the conclusion that there will be many values N in which this method will fail to yield a smaller number of bits in the final transmission. I don't quite have everything together to put forward a full explanation. But basicly I'm seeing that each Xi*2^Pi term can only "code" for 4 bits of N, unless parts of N have certain ideal properties. This is a serious problem since each term requires more than 4 bits in the final transmission. For example, I could find many 100 bit N which would require 25 terms to represent, which would make the final transmission much more than 100 bits. Furthermore, it seems the more randomized the bits are in N, the more likely this method will require too many terms to be worthwhile. And the average message or data which N represents we can easily expect to be a rather random serious of bits.

If necessary, I can later explain how I've reached this conclusion.

GOURDHEAD
2005-Nov-17, 03:42 PM
I'm also not sure what "a standard number base on the order of 2^300" is supposed to mean. If that's supposed to mean that every Pi = 300 then the equation is going for fail for most values of N.Every Pi would be identical and near a value like 300 to ensure we can handle numbers sufficiently large for the enclyclopedic data transmittals from Alpha Centauri. If this scheme is not fundamentally flawed, when we begin its implementation, algorithms will be developed to optimally select the Pi and the ? as well as to evaluate whether a standard number base versus a mixture of number bases is optimal. If the composers of the messages come up with an N for which the process fails, they will add standard filler until a doable N is reached or partition the message in some way that makes it doable, but at no instance violating whatever protocols have been imposed. Of course burnt fingers will motivate us to continually evolve the algorithms and protocols to achieve transmittal of the largest amount of data with the least number of bits.

If we divide 2^300 by 2^4 (the number of bits required to specify in ASCII the characters in a seven character word plus a delimiting space) we get 2^296 such units which, I guess, may be more than is contained in the Encyclopedia Britannica. Whether it is or not, it will be a very large message capable of accommodating the "fine tuning" filler.


If necessary, I can later explain how I've reached this conclusion. It's not necessary, but would be delightfully received.

montebianco
2005-Nov-17, 04:55 PM
I've been playing around with this method for a while now, and unfortunately I'm coming to the conclusion that there will be many values N in which this method will fail to yield a smaller number of bits in the final transmission. I don't quite have everything together to put forward a full explanation. But basicly I'm seeing that each Xi*2^Pi term can only "code" for 4 bits of N, unless parts of N have certain ideal properties. This is a serious problem since each term requires more than 4 bits in the final transmission. For example, I could find many 100 bit N which would require 25 terms to represent, which would make the final transmission much more than 100 bits. Furthermore, it seems the more randomized the bits are in N, the more likely this method will require too many terms to be worthwhile. And the average message or data which N represents we can easily expect to be a rather random serious of bits.

If necessary, I can later explain how I've reached this conclusion.

I haven't analyzed the particular method, but if no loss is permitted (i.e., the procedure must accept any arbitrarily specified message at the one end, and generate an unaltered copy of that message at the other end) it can be proven that there is no encoding scheme in which the transmission is sometimes shorter and never longer than the straightforward encoding. Now, if some messages are more common than others, a method can be optimized around those, so it would only be inefficient for messages which are not likely to be sent. Also, if some loss is allowed (e.g., if some resolution can be lost in graphics), that's a different story. But if there can be no loss and any message is possible, then whatever the encoding scheme, to result in the generation of 2^100,000,000 distinct messages at the receiving end, it must be possible to send at least 2^100,000,000 distinguishable transmissions, and that requires at least 100,000,000 bits. This is why I wanted to know the nature of the messages to be sent earlier in the thread; I wanted to know if some assumptions about the content of the message could be made, or if some loss of information was acceptable. If the answer to both questions is "no," then there is nothing that can be done here, regardless of the encoding scheme...

01101001
2005-Nov-17, 05:17 PM
Every Pi would be identical and near a value like 300 to ensure we can handle numbers sufficiently large for the enclyclopedic data transmittals from Alpha Centauri. If this scheme is not fundamentally flawed, when we begin its implementation, algorithms will be developed to optimally select the Pi and the ? as well as to evaluate whether a standard number base versus a mixture of number bases is optimal. If the composers of the messages come up with an N for which the process fails, they will add standard filler until a doable N is reached or partition the message in some way that makes it doable, but at no instance violating whatever protocols have been imposed. Of course burnt fingers will motivate us to continually evolve the algorithms and protocols to achieve transmittal of the largest amount of data with the least number of bits.

If we divide 2^300 by 2^4 (the number of bits required to specify in ASCII the characters in a seven character word plus a delimiting space) we get 2^296 such units which, I guess, may be more than is contained in the Encyclopedia Britannica. Whether it is or not, it will be a very large message capable of accommodating the "fine tuning" filler.

It's not necessary, but would be delightfully received.

So, are you saying every number N can be represented efficiently as:

N = +/- X1(2P1) +/- X2(2P2) +/- X3(2P3) +/- X4(2P4) +/- ...

where all Xi values are less than 16 and all Pi values are identical values, near 300, or nearly identical values near 300?

If so, it is fundamentally flawed.

If q is the smallest exponent, then this form cannot represent any numbers that when divided by 2q leave a nonzero remainder.

And, if r is the largest exponent, this form cannot efficiently represent numbers that are larger than most multiples of 2r, requiring huge numbers of terms.

Try representing 105 decimal that way. Impossible.

Try representing 2400 that way. Unwieldy.

GOURDHEAD
2005-Nov-17, 08:08 PM
Try representing 105 decimal that way. Impossible. Try representing 2^400 that way. Unwieldy. 105 decimal is much too small a message for using 2^300 as a number base to accomplish efficiently, although I see no reason why it couldn't be done using a carefully chosen filler. For short messages observed to be so by the recipients from header data, the protocol would allow the use of a more convenient number base.

A number the size of 2^400 would require a larger base number such as itself, although by using the exponentiation operator we could send (2^200)^2. Note that this is a shorthand way of specifying the number that could also be represented as a decimal number of many more digits and that we can write it this way much faster than if we wrote it decimal digit by decimal digit. What I think I'm doing is extrapolating (very loosely) this sort of thing to develop the shorthand notation for transmitting the instructions for reconstructing carefully formulated messages across 4.3 light years.


it can be proven that there is no encoding scheme in which the transmission is sometimes shorter and never longer than the straightforward encoding.Remember the message is not being transmitted, only the instructions of how to reconstruct it as facilitated by protocols.

I want to thank each of you who have shown interest in this scheme after it lay dormant for so many months especially you cynics and critics; your politeness has been remarkable. I sense some of you are begining to believe that the scheme may have merit; however, I must caution you about reporting me to the Swedish Royal Academy of Sciences before some algorithms are perfected and the protocols are established. This scheme is an important adjunct to my interstellar transportation system, http://home.comcast.net/~mbmcneill7/ which will put the colonists there with whom we (actually our descendants) will communicate.

01101001
2005-Nov-17, 08:49 PM
105 decimal is much too small a message for using 2^300 as a number base to accomplish efficiently, although I see no reason why it couldn't be done using a carefully chosen filler. For short messages observed to be so by the recipients from header data, the protocol would allow the use of a more convenient number base.

OK. Then ponder the number 2^300+105. Your method of adding and subtracting small multiples of large powers of 2 cannot represent it. Face it: the only thing you get by adding and subtracting small multiples of large powers of 2 is more small multiples of large powers of 2.

It does not work.


A number the size of 2^400 would require a larger base number such as itself, although by using the exponentiation operator we could send (2^200)^2. Note that this is a shorthand way of specifying the number that could also be represented as a decimal number of many more digits and that we can write it this way much faster than if we wrote it decimal digit by decimal digit. What I think I'm doing is extrapolating (very loosely) this sort of thing to develop the shorthand notation for transmitting the instructions for reconstructing carefully formulated messages across 4.3 light years.

But if you pick an even larger power of 2 as your base, then there are even more numbers that when didivded by that base leave a nonzero remainder.

IT DOES NOT WORK.


Remember the message is not being transmitted, only the instructions of how to reconstruct it as facilitated by protocols.

So what? I defy you to describe a mathematical representation using small coefficients of close, large powers of 2 that will efficently represent any message confined to the range 2^400 and 2^401. Go ahead. Then test it on this number: 2^400 + 2^350 + 2^300 + 2^250 + 2^100 + 2^50 + 1

Take the challenge. Demonstrate your representation method works rather than just claiming it works.


I want to thank each of you who have shown interest in this scheme after it lay dormant for so many months especially you cynics and critics; your politeness has been remarkable. I sense some of you are begining to believe that the scheme may have merit; however, I must caution you about reporting me to the Swedish Royal Academy of Sciences before some algorithms are perfected and the protocols are established. This scheme is an important adjunct to my interstellar transportation system, http://home.comcast.net/~mbmcneill7/ which will put the colonists there with whom we (actually our descendants) will communicate.

It DOES NOT WORK!

montebianco
2005-Nov-17, 09:12 PM
Remember the message is not being transmitted, only the instructions of how to reconstruct it as facilitated by protocols.

But this is not a meaningful distinction. Each distinct message must have a distinct set of instructions. Whatever you call it, if two messages are different, then the instructions to reconstruct them must also be different. Let us suppose that you wish to send these instructions using 100,000 bits. Is it clear that, whatever the encoding scheme is, whether we call the 100,000 bits the "message" or the "instructions" to build the message, there can only be 2^100,000 distinct messages reconstructed at the end of the process? If you have more than 2^100,000 possible messages to send, then either some of them cannot be sent at all, or some of them must encode to the same set of 100,000 bit instructions, in which case there is no way for the receiving end to know which message corresponding to these instructions is the right one.

01101001
2005-Nov-17, 09:29 PM
But this is not a meaningful distinction. Each distinct message must have a distinct set of instructions. Whatever you call it, if two messages are different, then the instructions to reconstruct them must also be different. Let us suppose that you wish to send these instructions using 100,000 bits. Is it clear that, whatever the encoding scheme is, whether we call the 100,000 bits the "message" or the "instructions" to build the message, there can only be 2^100,000 distinct messages reconstructed at the end of the process? If you have more than 2^100,000 possible messages to send, then either some of them cannot be sent at all, or some of them must encode to the same set of 100,000 bit instructions, in which case there is no way for the receiving end to know which message corresponding to these instructions is the right one.

Yes. It is affectionately known as the Pigeonhole Principle (http://en.wikipedia.org/wiki/Pigeon_hole_principle).

The amusing thing about Gourdhead's method if it worked is that you could compress any message of say 10,000 bits into a message of, say, 1000 bits, and then apply the method again, to the first-result 1000 bits, to compress that to, say, 100 bits, and then apply the method again, to get 10 bits, and then apply it again to compress it to 1 bit. But, don't apply it to that 1 bit message or you may cause the Universe to collapse!

Edit: Spelling

montebianco
2005-Nov-17, 09:43 PM
Yes. It is affectionately known as the Pingeonhole Principle (http://en.wikipedia.org/wiki/Pigeon_hole_principle).

Hadn't heard it put in those terms, but that is indeed a nice way to put it :)


The amusing thing about Gourdhead's method if it worked is that you could compress any message of say 10,000 bits into a message of, say, 1000 bits, and then apply the method again, to the first-result 1000 bits, to compress that to, say, 100 bits, and then apply the method again, to get 10 bits, and then apply it again to compress it to 1 bit. But, don't apply it to that 1 bit message or you may cause the Universe to collapse!

And that would be bad...

SirBlack
2005-Nov-17, 11:09 PM
Ok, here is something of an explanation on how I'm seeing this method is going to fail at data compression more often than not.


Instead of looking at N in decimal, I went back to it's binary representation. I then started looking at what each Xi*2^Pi term could represent in relation to N.

The first case is just a single Xi*2^Pi term. Fix Pi at some arbitrary value, and vary Xi from 1 to 15 and see what N looks like in each case. To make it simple I'll use Pi = 4 for this example.
Xi = 1 gives us 1*2^4 which makes N = 100002
Xi = 2 gives us 2*2^4 which makes N = 1000002
Xi = 3 gives us 3*2^4 which makes N = 1100002
Xi = 4 gives us 4*2^4 which makes N = 10000002
...
Xi = 14 gives us 14*2^4 which makes N = 111000002
Xi = 15 gives us 15*2^4 which makes N = 111100002
Notice that as Xi varies, at most only 4 bits of N are affected. This behavior is the same for any fixed Pi.

However it isn't always necessary to have one Xi*2^Pi term for every 4 bits of N. There are two ways this can happen. First is where the binary form of N contains a series of zeroes. For instance if N = 1010000000112 then only two terms are needed, one to represent the 101 on the left and one of the 11 on the right, so 5*2^9 + 3*2^0 can represent this 12 bit N.

The second situation is where the binary form of N contains a series of ones. For example, N = 10011111111100112. In this case, subtraction can be used to represent all of the middle ones. Start with 5*2^13 to get 10100000000000002, then subtract 1*2^4 to get 10011111111100002, then add 3*2^0 to finaly get 10011111111100112. Thus the 3 term equation 5*2^13 - 1*2^4 + 3*2^0 represents a 16 bit N.

So we see that if N contains large strings of zeroes and/or large strings of ones, this method can compress N into a relatively small number of terms. 1011000000000000000001111111111111111110012 could be represented with only 3 or 4 terms. However a more random N such as 1011101101011001101101101001112 does not benefit from any special cases are will require one Xi*2^Pi term for every 4 bits.

Now consider that each Xi*2^Pi term requires at least 4 bits (and in most cases significantly more bits) in the final transmission, as well as the fact that each arithmatic operator linking the terms will require a bit or two. This means the method fails to achieve any data compression for any N that requires one term for every 4 bits. In fact if each term requires 12 bits in the final message, as the discussion was originally based on, then a ratio of one term per 12 bits of N still fails.

Out of all this we can draw the general rule: the more the bits of N are randomized, the worse the ratio of terms to bits of N this method will yeild, and thus the more likely this method will fail to be useful.

The final question is then, is the average N more or less likely to have randomized bits. Consider the example in my first post to this thread in which N would represent "Message". If using the standard 8 bits per character then N = 01001101011001010111001101110011011000010110011101 1001012 or even using 7 bits per character (which is a possibility) then N = 10011011100101111001111100111100001110011111001012 . Either way N is rather randomized, and I would suggest that any larger chunks of normal text which one might consider would have similar randomness. In other words, this method is going to fail for any common data we might try to use it on. It certainly could not compress an encyclopaedia to any significant extent.

01101001
2005-Nov-18, 01:37 AM
However it isn't always necessary to have one Xi*2^Pi term for every 4 bits of N. There are two ways this can happen. First is where the binary form of N contains a series of zeroes. For instance if N = 1010000000112 then only two terms are needed, one to represent the 101 on the left and one of the 11 on the right, so 5*2^9 + 3*2^0 can represent this 12 bit N.
Sure, but this is not what Gourdhead proposes.


Every Pi would be identical and near a value like 300 to ensure we can handle numbers sufficiently large for the enclyclopedic data transmittals from Alpha Centauri.
Gourdhead's method does not permit using terms with widely different exponents. The Pi exponent(s) would be "identical and near a value like 300".

(And, any modern compression algorithm would compress long strings of 0 and long strings of 1, and additionally other binary strings with mixed 0s and 1s, if they appear sufficiently frequently in the message. What those compression algorithms produce as a result is something without such frequently appearing strings of bits, of high entropy, and why applying a compression algorithm to the output of a compression algorithm rarely results in further savings. Once the redundancy is squeezed out, there is no more to be found.)

SirBlack
2005-Nov-18, 03:09 AM
Sure, but this is not what Gourdhead proposes.


Gourdhead's method does not permit using terms with widely different exponents. The Pi exponent(s) would be "identical and near a value like 300".

I had meant to say something about this at one point, but I think my criticism of the method remains valid regardless. If it's going to fail when a wide range of Pi values is allowed, then it's going to have the same problems (if not more) if Pi is restricted to a smaller range.

And I do agree that restircting Pi to a small range leads to significant problems as I see you've tried to explain in previous posts.

01101001
2005-Nov-18, 03:48 AM
I had meant to say something about this at one point, but I think my criticism of the method remains valid regardless. If it's going to fail when a wide range of Pi values is allowed, then it's going to have the same problems (if not more) if Pi is restricted to a smaller range.
Hey, I didn't mean any criticism of your excellent effort to show that there are simple mathematical ways to compress long strings of 0s or 1s. You're spot on. But, I just envisioned someone looking at your work and wrongly thinking, "Yes, this shows that the Gourdhead representation works!"

===

While I'm posting, there's a couple of Gourdhead statements I'd like to contrast.


Every Pi would be identical and near a value like 300 to ensure we can handle numbers sufficiently large for the enclyclopedic data transmittals from Alpha Centauri.

A number the size of 2^400 would require a larger base number such as itself[...]
Is it not curious that an exponent like 300 is sufficient to use for compressing encyclopedic data transmission, yet data like the number 2^400 requires a larger base? The number 2^400 is just 401 bits long. If interpreted as ASCII, it would be 7 bits short of just 51 ASCII characters -- not very interesting ASCII characters, but ASCII characters. That is a very short message. It is by no means a large amount of data to send. Yet a value near 2^300 would suffice to transmit encyclopedic data.

I shake my head.

GOURDHEAD
2005-Nov-18, 04:43 PM
Gourdhead's method does not permit using terms with widely different exponents. The Pi exponent(s) would be "identical and near a value like 300". It was a desirement that we could settle on a "standard" number base near 2^300 or whatever proves to be optimal, not a hard requirement, and was based on my faith in our ability to construct algorithms to determine when the standard number base is optimal and when it is not. It was not my intent to exclude multiple Pi, just to do so when it was to our advantage. Obviously, if we have to use a large quantity of Pi, there will be litle or no advantage. What about my assertion that we could taylor the message by adding filler to produce bit patterns that lend themelves to the use of the "standard" number base?

You guys are tough and serious critics and I appreciate the effort you have put into this; however I remain unconvinced that it won't work...more accurately can't be made to work. I hope I'm running on more than dogged optimism, for now I'm in the mindset of Edison when he was trying to get the electric light bulb to work which is sometimes indistinguishable from a recent popular definition of craziness.

Surely smarter and better trained minds than mine have thought of this and have discarded it for some of the reasons you have offered, and this thought is intimidating. Let's take the path that says this can not be made to work. How shall we communicate across 4.3, or more, light years in spite of the noise the universe will adversely apply to our attempts? Could it be that many civilizations are attempting to communicate with us and each other and their attempts are being frustrated by the noise level in this section of the MW, including corruption of the carrier frequency ecountering replicas of itself that are exactly out of phase over various intervals of time? Do any of you have some scheme that can overcome the technical problems associated with noise suppression and beam collimation and produce a reliable way to communicate between the stellar systems?

My assignment is clear; I have much work to do.

In summary:
1. Any message the size of the content of the EB and coded in something like ASCII is equivalent to the binary expression of a very large decimal numeral.

2. Instructions for reproducing this numeral can be transmitted using a 6-bit protocol with far fewer bit states than the original message.

3. These instructions can be decoded by their recipients without ambiguity.

4. Algorithms can be designed to optimize the process.

montebianco
2005-Nov-18, 05:41 PM
My assignment is clear; I have much work to do.

In summary:
1. Any message the size of the content of the EB and coded in something like ASCII is equivalent to the binary expression of a very large decimal numeral.

2. Instructions for reproducing this numeral can be transmitted using a 6-bit protocol with far fewer bit states than the original message.

3. These instructions can be decoded by their recipients without ambiguity.

4. Algorithms can be designed to optimize the process.

Before doing that work, I would strongly encourage you to contemplate the answers to the following questions - they may save you a lot of trouble. These will apply no matter what particular encoding scheme you use.

a) How many possible distinct messages can be represented with the number of bits you have in mind (i.e., with content comparable to EB)? Let us call this answer M.

b) How many distinct messages (or, since you previously objected to that terminology, how many distinct sets of instructions for building a message) can be transmitted with the number of bits you have in mind for the actual transmission? Let us call this answer N.

c) Is N smaller than M? Per your point 2., the number of bits to be transmitted is smaller than the bit content of the original message. So the answer to this question ought to be "yes" - correct?

d) How can N sets of instructions for building a message result in the construction of M distinct messages when N<M, if each set of instructions (per point 3) is unambiguous?

I think if you will work through these questions, you will see what the problem is here - it's not that you haven't found the right encoding method, it's that no encoding method can do what you want it to do here. Now, all of this assumes you don't know anything about the nature of the message. If you can make some assumptions about the content of the message, then it may well be possible to reduce the number of bits needed for the transmission; many real-world compression schemes work this way. Other real-world compression schemes have loss of information, i.e., the bit pattern reconstructed at the far end is not the same as the original. This is perfectly OK in some cases; MP3 compression of music does not faithfully reproduce the original, but it is close enough that the human ear has trouble telling the difference. But if you must reproduce the message at the receiving end exactly, and you do not know anything a priori about the content of the message, you're just stuck. There is simply no way to do it. This is provable; answer the four questions above, and you should see what the problem is.

GOURDHEAD
2005-Nov-18, 05:51 PM
There is simply no way to do it. This is provable; answer the four questions above, and you should see what the problem is. Thanks for the constructive criticism, I'll work on it.

montebianco
2005-Nov-18, 06:19 PM
Thanks for the constructive criticism, I'll work on it.

I hope it helps you out. Once again, this is assuming you must reproduce the message exactly and don't know anything about the message in advance. If you are allowed to make some assumptions about the nature of the data, then that changes the game completely; it may be possible to achieve savings then, and perhaps large savings. If you are allowed to modify the message a little bit, that also changes the game; many audio and video compression schemes are based on the fact that some pictures and/or sounds, while different, are so similar that the human eye/ear has trouble telling them apart. So it is OK to modify the message then, as long as you don't modify it too much. These schemes also tend to make assumptions about the nature of the sound/picture; if you feed them random static, many of them will perform poorly.

SirBlack
2005-Nov-18, 08:47 PM
It was a desirement that we could settle on a "standard" number base near 2^300 or whatever proves to be optimal, not a hard requirement, and was based on my faith in our ability to construct algorithms to determine when the standard number base is optimal and when it is not.

The problem is that it's simply not a practical idea to have a restiricted Pi. Previously, I worked through what happens when varying Xi while keeping Pi fixed and showed that this only "controls" up for 4 bits. Now let me show what happens what Pi is varied. For the example I'll using 15*2^Pi
Pi = 8 gives 15*2^8 which makes N = 1111000000002
Pi = 7 gives 15*2^7 which makes N = 0111100000002
Pi = 6 gives 15*2^6 which makes N = 0011110000002
Pi = 5 gives 15*2^5 which makes N = 0001111000002
and so on
Changing Pi by 1 only shifts the "controlled" bits by 1 place. So using only Pi values restricted to a small range can only represent the bits of N for a small range of the total. This could be done only if you're dealing with a rather simple N such as 1101011100000000000000000000000000002 in which the only Pi needed are 28 and 32. Whereas the typical N with more randomized bits such as 1101011100110101100111011011101100012 will require various Pi = 0, 4, 8, 12, 16, 20, 24, 28, 32.


What about my assertion that we could taylor the message by adding filler to produce bit patterns that lend themelves to the use of the "standard" number base?

I can't imagine any type of filler that would help in any way. Inserting anything into the middle of N would only make the required Pi every more widely spaced than before. Nor would it reduce how many Xi*2^Pi terms are needed. For instance 1011011001112 needs 3 terms with Pi = 0, 4, 8. With filler, 101101100000000001112 still needs 3 terms but with Pi = 0, 8, 12.


How shall we communicate across 4.3, or more, light years in spite of the noise the universe will adversely apply to our attempts? Could it be that many civilizations are attempting to communicate with us and each other and their attempts are being frustrated by the noise level in this section of the MW, including corruption of the carrier frequency ecountering replicas of itself that are exactly out of phase over various intervals of time? Do any of you have some scheme that can overcome the technical problems associated with noise suppression and beam collimation and produce a reliable way to communicate between the stellar systems?

I hadn't bothered to comment on this yet, but while I'm here... Data compression and data transmission are really two separate subjects, and generally require different solutions. In fact, some of the methods I've read about to keep data from being seriously corrupted during transmission involve transmitting additional bits. For instance a parity bit which indicates that the preceeding 8 bits were either even or odd. Thus if a bit of two get changed due to noise, there's a decent probability of conflict with the parity bit and so the receiver has some idea what has been corrupted. And other techniques offer better possibilities for reconstructing the data even if a few bits are changed due to noise.

But still, the point is that it's two different subjects. Whether your data compression method (if it somehow is viable) or another one is used, other methods would need to be employed for reliable data transmission.

GOURDHEAD
2005-Nov-19, 01:32 AM
...if you feed them random static, many of them will perform poorly.
Tis not I but the universe that will feed them truly random static for 4.3 years.

GOURDHEAD
2005-Nov-21, 04:57 PM
By converting “Sorry, you're not going to get 10^27(of what ever unit you're using) to fit in 1000 bits” to
“Sorry, you are not going to get ten e one thousand of anything to fit in one thousand bits”, and attempting to code it in ASCII, I get “083 111 114 114 121 096 032 121 111 117 032 097 114 101 032 110 111 116 032 103 111 105 110 103 032 116 111 032 103 101 116 032 116 101 110 032 101 032 111 110 101 032 116 104 111 117 115 097 110 100 032 111 110 032 097 110 121 116 104 105 110 103 032 116 111 032 102 105 116 032 105 110 032 111 110 101 032 116 104 111 117 115 097 110 100 032 098 105 116 115 046”. Expressed in ASCII (coding mistakes notwithstanding) the resulting decimal numeral is of the order 10^272 which is 2^~906. Although this is an unambiguous message, currently there may not be any computers with register architectures that allow the handling of numerals of this size at the individual digit level much less the capability of handling this message which is quite small compared to the EB. Although I believe a combination of specially designed computers and software could be produced to do this, it may not be worth the effort. Using a 2^302 base we would have X4 (302*3) X3 (302*2) X2 (302*1) X1 (302*0) where the 302’s are known to be powers of 2 by protocol and the Xi themselves may require expression as products, quotients, or exponentiations, which further increases the use of bits to specify the operators.

I am much less enthusiastic!!

01101001
2005-Nov-24, 05:14 AM
Rather than sending multiple copies of a message to ensure reliability...

Science News Online: Pushing the Limit -- Digital communications experts are zeroing in on the perfect code (http://www.sciencenews.org/articles/20051105/bob8.asp)


Over the past half-century, mathematicians and computer scientists have come up with clever approaches to the noise problem. They've devised codes that intentionally incorporate redundancy into a message, so that even if noise corrupts some portions, the recipient can usually figure it out. Called error-correcting codes, these underlie a host of systems for digital communication and data storage, including cell phones, the Internet, and compact disks. Space missions use this technique to send messages from transmitters that are often no more powerful than a dim light bulb.
[...]
To some coding theorists, turbo codes (http://en.wikipedia.org/wiki/Turbo_code) and LDPC codes (http://en.wikipedia.org/wiki/LDPC) represent the end of coding theory. In the lab, they've reached within 5 percent of the maximum efficiency, and commercial systems are within 10 percent, says McEliece.

"We're close enough to the Shannon limit (http://en.wikipedia.org/wiki/Shannon_capacity) that from now on, the improvements will only be incremental," says Thomas Richardson, a coding theorist at Flarion Technologies in Bedminster, N.J. "This was probably the last big leap in coding."

01101001
2005-Nov-24, 05:32 AM
Although this is an unambiguous message, currently there may not be any computers with register architectures that allow the handling of numerals of this size at the individual digit level much less the capability of handling this message which is quite small compared to the EB. Although I believe a combination of specially designed computers and software could be produced to do this, it may not be worth the effort.
Algorithms for arbitray precision (http://en.wikipedia.org/wiki/Arbitrary_precision) arithmetic have existed for decades.

GOURDHEAD
2005-Nov-25, 02:51 PM
Thanks for the links.

GOURDHEAD
2005-Dec-11, 01:24 AM
.....where the 302’s are known to be powers of 2 by protocol and the Xi themselves may require expression as products, quotients, or exponentiations, which further increases the use of bits to specify the operators. I am much less enthusiastic!! However, the above number may lend itself to arithmetic operations that permit several such short sentences plus some easily identifiable filler and maybe a divisor, also easily identifiable, that, when used in a rigid protocol, still produce a bit reduction. A four, or at most a five bit, byte may suffice since we will be transmitting the ten digits of the decimal system and some few arithmetic and punctuation codes as opposed to a full ASCI code set. Perhaps there are algorithms that can be coded (or developed and coded) to “weave” such messages within a protocol that can be unwoven without loss of content nor increase of ambiguity. When coded in the special 5-bit bytes suggested above, the above message requires 5*272 or 1360 bits, where the “message reconstruction” instruction requires 5*20 or 100 bits and the protocol constraints. If the weaving can be developed and expressed as one priority (unweaving order), one operator, and one operand per additional sentence within the order 10^272 numeral, 15 bits per additional short sentence will be required. Note that an additional 100 bits may be required to specify filler. All totaled we have cut the message transmission from 1360 (5 * 272) bits to 300 or 400 bits and if the weaving has allowed the inclusion of 2 additional sentences of equivalent length (within the order 10^272 decimal numeral), we have cut what otherwise would have required 4080 bits to perhaps 600 or 700 bits. Also, if the cleverness of the algorithm permits the transmission to be composed of a large integral power of 2 from which some easily expressed filler subtraction produces the message, additional savings may be available.

My guess is that the larger the message the greater the efficiency....and the more clever the algorithm has to be.

mugaliens
2005-Dec-11, 12:58 PM
If I'm understanding his intent, I think GOURDHEAD was attempting to find an encoding method (not compression method) that could somehow consistently fit a large amount of data into a smaller amount with no loss. And that just isn't possible.

For tax code, let's assume the simplest of cases - only text, upper and lower, and normal keyboard characters. No special characters.

That's 46x2 characters, or 92. Using bits, you require 7 bits which gives you 128 characters, leaving 128-92, or 36 characters left over.

Best way to do error-checking is to use something similar to CRC (cyclical redundancy check) and retransmit if anything's missing. More complicated methods can allow computer on receiving end to check and reconstruct any missing bit yet only adds about 15% in size.

What to do with the 36 characters left over? Assign them to high-frequency words! Like "the," "and," and like.

"The," for example, takes 21 bits. But if it's assigned to character 92, then it only takes 7 bits, saving 2/3.

If you want room for an additional 128 words, just go with 8-bit bytes. At some point you reach a maximum benefit and increasing the word size hurts, not helps. I estimate 8 or 9 would be the maxima.

And character (byte) assignments can be dynamic, given at the beginning of the message, aplicable to the whole message, and after anlyising the content of the message.

I remember in high school we used zmodem. Seemed to work very well, and would catch and reset file downloads. Then the Internet, and don't know what they use, but it seems to work.

Here's an arrticle on zmodem http://www.techfest.com/hardware/modem/zmodem.htm

One of the big factors with redundancy is latency. How long does it take for communication between sender and receiver? The larger the latency, the more required redundancy. For satellites, not to bad. For deep-space missions, very bad, and need perfect redunancy.

Finally, there's digital spread spectrum, which requires less energy to punch through the noise floor than using a single frequency. It's used in wireless and cell phones.