NIH: 100M Years to Change a Binding Site

Status
Not open for further replies.

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
I am OK with the receiver being the one measuring the information content. If that message was supposed to say “brown fur” but was corrupted to say “white fur”, is this corruption detrimental or beneficial to the snow-bound boder?

Your carefully phrased question is ignoring the issue. Fur colour and accidental advantages are irrelevant. Random changes are always detrimental to information and everything you propose to keep evolution afloat must hide this fact.
 

DavisBJ

New member
Your carefully phrased question is ignoring the issue. Fur colour and accidental advantages are irrelevant. Random changes are always detrimental to information and everything you propose to keep evolution afloat must hide this fact.
Of course I worded my response carefully. That’s because I don’t dispute the idea that random DNA changes are highly likely to be detrimental. But there is a huge difference between my “highly likely” and your “always”. Unless you can decree that fur color could never be changed by any random alteration to DNA, the potential fur color change is included as part of your declaration of “always” detrimental.

I say that if a fur color change is clearly an improvement to a boder’s genome, and if that change is possible via a DNA alteration, then that stands as a prima facie example which disproves your overly broad assertion about information theory.

You can obfuscate about fur coloration being intimately involved with other aspects of the genome, but your assertion that all random DNA changes are detrimental was not predicated on that, nor is that a requirement of information theory.
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
Of course I worded my response carefully. That’s because I don’t dispute the idea that random DNA changes are highly likely to be detrimental. But there is a huge difference between my “highly likely” and your “always”. Unless you can decree that fur color could never be changed by any random alteration to DNA, the potential fur color change is included as part of your declaration of “always” detrimental.
The difference between "always" and "highly likely" is the difference between information theory and evolution. So you're still not applying information theory in your fur colour discussions.

I say that if a fur color change is clearly an improvement to a boder’s genome, and if that change is possible via a DNA alteration, then that stands as a prima facie example which disproves your overly broad assertion about information theory.
It disproves nothing. It's utterly irrelevant.

You can obfuscate about fur coloration being intimately involved with other aspects of the genome, but your assertion that all random DNA changes are detrimental was not predicated on that, nor is that a requirement of information theory.

If you could show us how information theory does not apply to data from DNA (and quit pretending fur colour is data from DNA) that would be great. :thumb:
 

DavisBJ

New member
The difference between "always" and "highly likely" is the difference between information theory and evolution. So you're still not applying information theory in your fur colour discussions.
Then information theory is not an obstacle to evolution.
It disproves nothing. It's utterly irrelevant.
Sorry, but turning a blind eye to counter examples is dishonesty, not science.
If you could show us how information theory does not apply to data from DNA (and quit pretending fur colour is data from DNA) that would be great. :thumb:
If fur color is not controlled by information in the DNA, what biological mechanism determines what the fur color will be?
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
Then information theory is not an obstacle to evolution.
:rotfl:

Whose posts are you reading?

Sorry, but turning a blind eye to counter examples is dishonesty, not science.
Examining and critiquing your posts and pointing out exactly how they are inadequate is not "turning a blind eye". :nono:

If fur color is not controlled by information in the DNA, what biological mechanism determines what the fur color will be?
Fur colour is determined by the genetic code. Unfortunately you are refusing to understand the issue. Information theory works on the data, not on the expressions of that data.
 

DavisBJ

New member
Examining and critiquing your posts and pointing out exactly how they are inadequate is not "turning a blind eye". :nono:
Denying what you are doing is also not advisable.
Fur colour is determined by the genetic code. Unfortunately you are refusing to understand the issue. Information theory works on the data, not on the expressions of that data.
OK, assign a value of 1 to the DNA pattern that ultimately causes brown fur. Assign a value of 0 to the DNA pattern that causes white fur. Now we are looking at the DNA itself, and are not concerning ourselves with the details of how that information is processed by the boder’s biology. My points still stand.
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
OK, assign a value of 1 to the DNA pattern that ultimately causes brown fur. Assign a value of 0 to the DNA pattern that causes white fur. Now we are looking at the DNA itself, and are not concerning ourselves with the details of how that information is processed by the boder’s biology. My points still stand.

:rotfl: DNA is not 1 and 0.

If you want to talk about a binary code I can show you how random changes are always bad for information within that. You have to show us how information theory is not applicable to DNA.

Not binary. :)
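
A minimal sketch (not from any poster here; the function names are made up for illustration) of the uncontroversial channel-side version of this claim: in a binary symmetric channel, random bit flips always lower the mutual information between what is sent and what the receiver gets.

import math

def binary_entropy(p):
    # H2(p) in bits; taken as 0 at p = 0 and p = 1
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(flip_prob, p_one=0.5):
    # I(X;Y) in bits per bit sent for a binary symmetric channel.
    # flip_prob is the chance a bit is randomly corrupted in transit;
    # p_one is the source's probability of sending a 1.
    p_received_one = p_one * (1 - flip_prob) + (1 - p_one) * flip_prob
    return binary_entropy(p_received_one) - binary_entropy(flip_prob)

for p in (0.0, 0.01, 0.1, 0.25, 0.5):
    print("flip probability %.2f: I(X;Y) = %.3f bits per bit sent"
          % (p, bsc_mutual_information(p)))
# Falls from 1.000 bits with no corruption to 0.000 bits at 50% corruption.

The sketch only illustrates the channel-side statement; it does not by itself say anything about DNA.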
 

Yorzhik

Well-known member
LIFETIME MEMBER
Hall of Fame
So, since Stripe and Yorz have this DNA as information stuff all figured out, maybe they can tell me which of these two segments of DNA has more information?

MWN

1 atgcccatgc aggagcccca gaggaggcta ctgggtcctt tcaactccac ccgcacaggc
61 gttcctcacc tcgagctatc tgccaaccag actggaccct ggtgcctgca cgtatccatc
121 ccagatggcc tcttcctcag cctggggctg gtgagcttgg tggaaaatgt gctggtggtg
181 atttccattg ccaagaacca aaacctgcat tcccccatgt actacttcat ctgctgcctg
241 gctttgtctg acctgcttgt gagtgtgagc attgtgctgg agaccactct catcttggtg
301 ctagaggcag gggccctggc cacccgggtg actgtggtac agcagctgga caatgtcatc
361 gacgtgctca tctgtggctc catggtctca agtctgtgct tcctcggagc catcgctgtg
421 gaccggtaca tctccatctt ctatgcactg cgctatcaca gtattgtgac actgccccgg
481 gctcggtggg ccatcgtggc catctgggta gccagcatct cttccagcac tctttttgtt
541 gcctactaca accacacagc ggtcctgctt tgtctcgtca ccttttttct agccacgctg
601 gcactcatgg tagttctgta tgtgcacatg cttgcacggg cacaccagca tgctcaggcc
661 attgctcagc tccacaagag acagcacctt gtccaccaag gtttccgact caaaggcgcg
721 gccaccctca ctatcctctt gggcattttc ttcctgtgct ggggcccctt cttcctgtac
781 ctcactctca ttgtcctctg cccgaagcac cctacctgtg gctgtttctt caagaacctc
841 aatctcttcc ttgccctcat catcttcaac tccattgttg accccctcat ctatgccttc
901 cgaagtcagg agctccgcat gacgctcaag gaggtgctgc tgtgctcctg gtga





HEH589

1 atgcccatgc aggagcccca gaggaggcta ctgggtcctt tcaactccac ccgcacaggc
61 gttcctcacc tcgagctatc tgccaaccag actggaccct ggtgcctgca cgtatccatc
121 ccagatggcc tcttcctcag cctggggctg gtgagcttgg tggaaaatgt gctggtggtg
181 atttccattg ccaagaacca aaacytgcat tcccccatgt actacttcat ctgctgcctg
241 gctttgtctg acctgcttgt gagtgtgagc attgtgctgg agaccactct catcttggtg
301 ctagaggcag gggccctggc cacccgggtg actgtggtac agcagctgga caatgtcatc
361 gacgtgctca tmtgtggctc catggtctca agtctgtgct tcctcggagc catcgctgtg
421 gaccggtaca tctccatctt ctatgcactg cgctatcaca gtattgtgac actgccccgg
481 gctcggtggg ccatcgtggc catctgggta gccagcatct cttccagcac tctttttgtt
541 gcctactaca accacacagc ggtcctgctt tgtctcgtca ccttttttct agccacgctg
601 gcactcatgg tagttctgta tgtgcacatg cttgcacggg cacacmagca tgctcaggcc
661 attgctcagc tccacaagag acagcacctt gtccaccaag gtttccgact caaaggcgct
721 gccaccctca ctatcctctt gggcattttc ttcctgtgct ggggcccctt cttcctgtac
781 ctcactctca tcgtcctctg cccgaagcac cctacctgcg gctgtttctt caagaacctc
841 aatctcttcc ttgccctcat catcttcaac tccattgttg accccctcat ctatgccttc
901 cgaagtcagg agctccgcat gacgctcaag gaggtgctgc tgtgctcctg gtga




It should be really easy, right?
It depends on how the receiver meant to receive it. Which one can be translated correctly?

Let me give you an analogy. If one had the same phrase in a language you understood and a language you didn't, the language you understood would have more information content.
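
For anyone who wants to attempt the receiver-free comparison the question asks for, here is a minimal sketch using the two proxies that come up later in the thread (per-symbol Shannon entropy and compressibility). It is an illustration only, not any poster's method; SEQ_MWN and SEQ_HEH589 are truncated placeholders for the sequences quoted above.

import math
import zlib
from collections import Counter

# Truncated placeholders: paste the full MWN and HEH589 sequences quoted above
# (position numbers and whitespace removed) to run the comparison for real.
SEQ_MWN = "atgcccatgcaggagccccagaggaggctactgggtcctttcaactccacccgcacaggc"
SEQ_HEH589 = "atgcccatgcaggagccccagaggaggctactgggtcctttcaactccacccgcacaggc"

def entropy_bits_per_symbol(seq):
    # Empirical Shannon entropy of the symbol frequencies, in bits per symbol.
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

for name, seq in (("MWN", SEQ_MWN), ("HEH589", SEQ_HEH589)):
    compressed_size = len(zlib.compress(seq.encode()))
    print("%s: %.3f bits/symbol, %d bytes after zlib compression"
          % (name, entropy_bits_per_symbol(seq), compressed_size))

With only a handful of differing bases between them, the two full sequences come out nearly identical on either measure.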
 

Tyrathca

New member
It depends on how the receiver meant to receive it.
And if there is no relevant "meant"? Your application pre-supposes an intent/intelligence behind the transmitter and receiver.
Which one can be translated correctly?
By what standard do you determine what is "correct"?
Let me give you an analogy. If one had the same phrase in a language you understood and a language you didn't, the language you understood would have more information content.
But in genetics every "phrase" is "understood", at least in the sense that each codon results in either an amino acid or a stop signal, or can interact chemically with specific binding proteins, etc. Thus, according to this (silly) analogy, genetic mutations do not result in a loss of information, because no matter what the sequence is, it can be "understood" as much or as little as any other.

Furthermore, your analogy is irrelevant since it provides no way of quantifying information, and it is relative: someone who speaks the language of the other phrase but not English would think the reverse about the information content of the two. Also, this analogy allows for an increase in information: if a phrase in another language can, with a few changes, become a phrase in English, then this would be an increase in information. Note that since your argument is that an increase in information by random change has a probability = 0, any probability of this, no matter how small, would disprove your claim.
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
Your application pre-supposes an intent/intelligence behind the transmitter and receiver.
So?

But in genetics every "phrase" is "understood", at least in the sense that each codon results in either an amino acid or a stop signal, or can interact chemically with specific binding proteins, etc.
And with language every "word" can be "understood", at least in the sense that each wave results in a vibration in the inner ear that is transmitted to the brain.

We've been round this merry-go-round a number of times. If you're just going to "redefine" every term then "nothing" you "say" has any "concrete meaning".
 

Frayed Knot

New member
It depends on how the receiver meant to receive it.
But in the case of DNA, what is the receiver? The context of the question is how much information is inherent in the string of data, independent of any transmission/reception. If you're trying to stick to a data transmission model, we can read the data near perfectly, so the "reduction in uncertainty" at the receiver (the "receiver" being our understanding of what's in it) corresponds exactly to the amount of entropy in the source data.

This is the key concept that Stripe steadfastly and dishonestly refuses to acknowledge. If you can read the data without error, like we can with DNA, then the reduction of uncertainty on our end is exactly the entropy of the data set that we started with. Stripe even started to acknowledge this before I caught him, when he said that the amount of information in source data corresponds to its compressibility. He was admitting that it *is* valid to refer to the information content of a data set, independently of transmission, but I reminded him of that and he dug in to his previous stance.

Let me give you an analogy. If one had the same phrase in a language you understood and a language you didn't, the language you understood would have more information content.
What you're saying is that if a receiver doesn't know how to decode the data that it's sent, then there is no reduction in uncertainty, so no information is transmitted. This highlights the problem I had cornered Stripe on earlier - if you stick to a pure transmission model to define the concept of "information," then it's invalid to talk about the information content of a set of data. If that's the definition you're using, you have to stick with it, but he and you keep then referring to the information content of the source data set anyway, and as you highlighted here, that has no meaning. Under that definition, a paragraph of text in Swahili could have more or less information than a paragraph of text in English, depending on whether the reader understands the language.

Do you see the problem there? If that's the model you're using, you have to stick to it, so don't then try to refer to the information content of source data, because it's meaningless.

Alternatively, you could do what the mathematicians and real information theory specialists do, and measure the information content of a set of data by the theoretical reduction in uncertainty of an ideal receiver. This then corresponds to the entropy of the data set. But then you're left with the problem (you and Stripe, not me and A_O), that clearly a data set that has more characteristics of randomness has more information content. Oops.
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
he said that the amount of information in source data corresponds to its compressibility.
No, I didn't. :)

He was admitting that it *is* valid to refer to the information content of a data set, independently of transmission, but I reminded him of that and he dug in to his previous stance.
You can do anything you like when you have all the data. But information theory works on the principle that there is always noise in the data. If there were no noise, there would be no point in doing any analysis.

So you can suggest that if we know exactly what data is being sent and received, then information can be calculated from the entropy, but we are discussing a real-life application. And in real life you must always account for noise.

So, I'm not being dishonest. You just happen to be applying a different situation on top of the discussion I'm having with BJ. And I'm sure you'll agree that when there are random changes, entropy cannot be equated to the reduction of uncertainty at the receiver. Right?

What you're saying is that if a receiver doesn't know how to decode the data that it's sent, then there is no reduction in uncertainty, so no information is transmitted. This highlights the problem I had cornered Stripe on earlier - if you stick to a pure transmission model to define the concept of "information," then it's invalid to talk about the information content of a set of data. If that's the definition you're using, you have to stick with it, but he and you keep then referring to the information content of the source data set anyway, and as you highlighted here, that has no meaning. Under that definition, a paragraph of text in Swahili could have more or less information than a paragraph of text in English, depending on whether the reader understands the language.
This is correct, but misses the point of Y's analogy. There is a difference between "information" in an informal ( :chuckle: ) sense and Shannon information, but Y was trying to explain something other than what you've got him for here. Can you see what it is?

Alternatively, you could do what the mathematicians and real information theory specialists do, and measure the information content of a set of data by the theoretical reduction in uncertainty of an ideal receiver. This then corresponds to the entropy of the data set. But then you're left with the problem (you and Stripe, not me and A_O), that clearly a data set that has more characteristics of randomness has more information content. Oops.
All you're doing is insisting we use your definition instead of ours. We are explaining quite sensibly what is happening and you're insisting we call information "randomness" (which only makes the discussion insanely difficult to parse) instead of allowing that we use information to mean reduction of uncertainty at the receiver.

Here's a suggestion. We continue this discussion by sticking to the terms "entropy", "randomness" and "reduction of uncertainty at receiver" and leaving the I-word out of it. How does that sound?
 

Frayed Knot

New member
No, I didn't. :)
Yes, you did:
Stripe said:
A_O said:
So if they have different information content, how do you tell which one has more?
Find out which one is generally more compressible.
My statement was that you said "the amount of information in source data corresponds to its compressibility." Do you still deny this? Honestly?

I mean, really, honestly?


Here's a suggestion. We continue this discussion by sticking to the terms "entropy", "randomness" and "reduction of uncertainty at receiver" and leaving the I-word out of it. How does that sound?
I'm willing to do that. Here goes...


You can do anything you like when you have all the data. But information theory works on the principle that there is always noise in the data. If there were no noise, there would be no point in doing any analysis.

So you can suggest that if we know exactly what data is being sent and received, then information can be calculated from the entropy, but we are discussing a real-life application. And in real life you must always account for noise.
In the situations we're talking about, which is reading either text on a page or reading the sequence of DNA, we have the luxury of reading it as slowly and as carefully as we choose. Wouldn't you say that we can read either virtually without error (noise)? At a minimum, we can, as we get more and more careful, read it so that the error rate approaches zero...


And I'm sure you'll agree that when there are random changes, entropy cannot be equated to the reduction of uncertainty at the receiver. Right?
As the error of our reading data approaches zero, the "reduction in uncertainty of the receiver" approaches the value exactly equal to the entropy of the data set.


All you're doing is insisting we use your definition instead of ours.
No, not at all. I explicitly said that you're free to use your own conventions, but if you do so, you have to consistently use your own conventions and not equivocate by then switching definitions in the middle of your argument, which you repeatedly are doing.

In the convention that you've chosen here,

1) you may not refer to the information content of a set of data.

2) the reduction in uncertainty that someone will get by perfectly reading (receiving) the data set is exactly equal to the entropy of that data set. Even if you acknowledge that you can never be guaranteed to read the text on a page perfectly, the limit of RiU as the error approaches zero, is exactly equal to the entropy of the data set.

3) A data set that has more characteristics of randomness will equate to a higher reduction in uncertainty when it's read accurately by the receiver.
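
In standard Shannon notation (a conventional rendering of points 2 and 3, not Frayed Knot's own formulas), with source data X, the reading Y, and read-error rate p:

I(X;Y) = H(X) - H(X|Y),  and  lim_{p -> 0} I(X;Y) = H(X)

As the error rate goes to zero the equivocation H(X|Y) vanishes, so the reduction in uncertainty at the receiver equals the entropy of the data set, and a data set with higher entropy yields a larger reduction in uncertainty when read accurately.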
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
My statement was that you said "the amount of information in source data corresponds to its compressibility." Do you still deny this? Honestly?
:chuckle:

And if you'll notice, Alate was discussing the situation you have presented (where we have all the data) as opposed to the explanation I was giving to BJ where we did not.

I answered the question correctly and completely in sync with what I've been saying all along. You've just decided that I cannot talk about the compressibility while still insisting that the i-word is only to be understood as reduction in uncertainty at the receiver.

In the situations we're talking about, which is reading either text on a page or reading the sequence of DNA
Which is a completely different situation from what I am talking about with BJ. Please keep that in mind next time you want to accuse me of dishonesty. :)

we have the luxury of reading it as slowly and as carefully as we choose. Wouldn't you say that we can read either virtually without error (noise)? At a minimum, we can, as we get more and more careful, read it so that the error rate approaches zero... As the error of our reading data approaches zero, the "reduction in uncertainty of the receiver" approaches the value exactly equal to the entropy of the data set.
Here's what you can do. You can decode the genome of an organism. But in that process we are the receiver and must account for noise. We can decode the genome from another organism of the same kind. Again we are the receiver and must account for noise. Once we have the two genomes written down we can assume they are accurate. After we have assumed they are accurate we can compare them to see which contains more information.

Can you see how vastly different this process is from what I was discussing with BJ?

No, not at all. I explicitly said that you're free to use your own conventions, but if you do so, you have to consistently use your own conventions and not equivocate by then switching definitions in the middle of your argument, which you repeatedly are doing.
Defining the i-word as reduction of uncertainty at receiver does not prohibit me from comparing two sets of data.

In the convention that you've chosen here, 1) you may not refer to the information content of a set of data.
I can when we know that the entropy is equal to the reduction of uncertainty at receiver. Just because I define the product as the result of multiplication, doesn't mean I can't find a product by using addition.

But I cannot do this in the real world because in the real world we must always account for noise.

2) the reduction in uncertainty that someone will get by perfectly reading (receiving) the data set is exactly equal to the entropy of that data set. Even if you acknowledge that you can never be guaranteed to read the text on a page perfectly, the limit of RiU as the error approaches zero, is exactly equal to the entropy of the data set. 3) A data set that has more characteristics of randomness will equate to a higher reduction in uncertainty when it's read accurately by the receiver.

All this is perfectly accurate on paper, and there's no reason I cannot use these calculations when asked to answer a question requiring them.
 

Frayed Knot

New member
And if you'll notice, Alate was discussing the situation you have presented (where we have all the data) as opposed to the explanation I was giving to BJ where we did not.
So? You're attempting to dodge, but I'm not gonna let you. Alate_One asked which of two strings had more information content. This is well after you had dug in and declared that you're sticking with Tom Schneider's definition of "information", which only applies to the transmission of data, not the source data itself. And you answered A_O (correctly, according to standard usage of the word), stating that the information content is dependent on the compressibility of the source data. Then you denied that you did this very thing. Now you're trying to dodge it.

I answered the question correctly and completely in sync with what I've been saying all along.
Nope, you're equivocating, using whichever definition lets you squirm out of admitting your previous mistakes.

You've just decided that I cannot talk about the compressibility while still insisting that the i-word is only to be understood as reduction in uncertainty at the receiver.
Not at all, I will let you talk about the compressibility of data, or its entropy, but I will remind you that you have already stated that the definition of information that you yourself are sticking to cannot be applied to the source data.

Here's what you can do. You can decode the genome of an organism. But in that process we are the receiver and must account for noise. We can decode the genome from another organism of the same kind. Again we are the receiver and must account for noise.
It's not necessary to account for noise, when the accuracy of our ability to read it makes any tiny errors negligible.


Once we have the two genomes written down we can assume they are accurate.
Good - you are agreeing with what I just said, about any potential errors being negligible.

After we have assumed they are accurate we can compare them to see which contains more information.
We can't compare the information in the two sets of data, because according to your previous declaration, that word is off-limits for the discussion of sets of data.

Defining the i-word as reduction of uncertainty at receiver does not prohibit me from comparing two sets of data.
You can certainly compare two sets of data, measuring the compressibility or the entropy, but your own definition prohibits you from comparing the information in two sets of data. You've said this yourself, and you keep switching when it suits you.

I can when we know that the entropy is equal to the reduction of uncertainty at receiver. Just because I define the product as the result of multiplication, doesn't mean I can't find a product by using addition.
That first sentence makes no sense, and the second doesn't help shed any light. Care to restate that?

But I cannot do this in the real world because in the real world we must always account for noise.
We've already discussed the effect of noise. Since we can read the data as carefully as we want, we can arbitrarily reduce the noise so that it's below any relevant threshold. The noise can be neglected when we're talking about sets of data like text on a page or the sequence of DNA.
 

Yorzhik

Well-known member
LIFETIME MEMBER
Hall of Fame
And if there is no relevant "meant"?
Then there isn't a message.

Furthermore, your analogy is irrelevant since it provides no way of quantifying information...
Shannon cannot quantify?!?!?

... and it is relative: someone who speaks the language of the other phrase but not English would think the reverse about the information content of the two.
Right. If the receiver/decoder changes, the measurement of the information will change.

Also, this analogy allows for an increase in information: if a phrase in another language can, with a few changes, become a phrase in English, then this would be an increase in information.
Of course. Add information and there is an increase in information. That's, like, redundant and saying it twice.

Note that since your argument is that an increase in information by random change has a probability = 0, any probability of this, no matter how small, would disprove your claim.
Not at all. If this is what you think then you would have to be trying REALLY hard to misunderstand.
 

Yorzhik

Well-known member
LIFETIME MEMBER
Hall of Fame
But in the case of DNA, what is the receiver?
One example is the machinery that turns the DNA into a working protein.

If you're trying to stick to a data transmission model, we can read the data near perfectly, so the "reduction in uncertainty" at the receiver (the "receiver" being our understanding of what's in it) corresponds exactly to the amount of entropy in the source data.
Unless there is such a thing as mutations.

This is the key concept that Stripe steadfastly and dishonestly refuses to acknowledge.
I think Stripe would admit that mutations exist.

If you can read the data without error, like we can with DNA, then the reduction of uncertainty on our end is exactly the entropy of the data set that we started with.
I'll say it again for a third time. One must always account for noise (mutations).

What you're saying is that if a receiver doesn't know how to decode the data that it's sent, then there is no reduction in uncertainty, so no information is transmitted.
No. It means the information cannot be measured.

This highlights the problem I had cornered Stripe on earlier - if you stick to a pure transmission model to define the concept of "information," then it's invalid to talk about the information content of a set of data.
That makes no sense. What is being transmitted?

If that's the definition you're using, you have to stick with it, but he and you keep then referring to the information content of the source data set anyway, and as you highlighted here, that has no meaning. Under that definition, a paragraph of text in Swahili could have more or less information than a paragraph of text in English, depending on whether the reader understands the language.
Yeah.

Do you see the problem there? If that's the model you're using, you have to stick to it, so don't then try to refer to the information content of source data, because it's meaningless.
The content can only be measured if you have all 3 (or 4) parts: encoded message, transmission, and decoded message (plus the noise).

Alternatively, you could do what the mathematicians and real information theory specialists do, and measure the information content of a set of data by the theoretical reduction in uncertainty of an ideal receiver. This then corresponds to the entropy of the data set. But then you're left with the problem (you and Stripe, not me and A_O), that clearly a data set that has more characteristics of randomness has more information content. Oops.
You're joking with me. And by joking I mean, "It is thus clear where the joker is in saying that the received signal has more information. Some of this information is spurious and undesirable and has been introduced via the noise. To get the useful information in the received signal we must subtract out this spurious portion."
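
For reference, the passage quoted here corresponds to Shannon's standard correction for noise: the useful rate R is the entropy of the received signal minus the spurious part contributed by the noise, which equals the source entropy minus the equivocation:

R = H(y) - H_x(y) = H(x) - H_y(x)

Both sides are the same mutual information; subtracting the "spurious portion" and accounting for the receiver's remaining uncertainty about the source are two views of the same quantity.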
 

DavisBJ

New member
:rotfl: DNA is not 1 and 0.

If you want to talk about a binary code I can show you how random changes are always bad for information within that. You have to show us how information theory is not applicable to DNA.

Not binary. :)
Yes, I realize DNA is not binary. I also realize you once again are showing that you can’t or won’t look past the details to see the concept.

No matter what that actual arrangement of DNA base pairs is that codes for brown fur, it is perfectly valid for people above elementary school level to denote that exact arrangement as condition 0. A corruption of that message that would result in white fur could be denoted as a 1. Notice this is really deep – 1 and 0 are different. 1 is not 0, and therefore must be a “corruption” of a message that originally specified the 0. Is that way over your head?

Yorzhik, in those occasional posts he makes, uses 10% as many words as you do, and says ten times as much (maybe wrong, but still expressed more clearly than you have ever done). Why don’t you go back to your sandbox and we will see if Yorzhik picks up the idea that you are so incompetently defending?
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
So? You're attempting to dodge, but I'm not gonna let you.
I guess you need something to talk about. :think:

Alate_One asked which of two strings had more information content. This is well after you had dug in and declared that you're sticking with Tom Schneider's definition of "information", which only applies to the transmission of data, not the source data itself. And you answered A_O (correctly, according to standard usage of the word), stating that the information content is dependent on the compressibility of the source data. Then you denied that you did this very thing. Now you're trying to dodge it.
If you want to find out which has more reduction in uncertainty at receiver then in this case you can test the compressibility of both sets. It's "dependent" (your word, not mine) only in the sense that the math runs that way. The i-word is always reduction in uncertainty at receiver. Discovering that value by using known data does not change this.

Nope, you're equivocating, using whichever definition lets you squirm out of admitting your previous mistakes.
I haven't made a mistake. Or, at least, you're not pointing one out.

Not at all, I will let you talk about the compressibility of data, or its entropy, but I will remind you that you have already stated that the definition of information that you yourself are sticking to cannot be applied to the source data.
Luckily I did not apply it (your word, not mine). I just used the source to determine the value.

It's not necessary to account for noise, when the accuracy of our ability to read it makes any tiny errors negligible.
Yes it is necessary to account for noise. We are receiving data from a DNA analysis. Not reading letters on a page that we have assumed to be accurate.

Good - you are agreeing with what I just said, about any potential errors being negligible.
:chuckle:

We can't compare the information in the two sets of data, because according to your previous declaration, that word is off-limits for the discussion of sets of data. You can certainly compare two sets of data, measuring the compressibility or the entropy, but your own definition prohibits you from comparing the information in two sets of data. You've said this yourself, and you keep switching when it suits you.
Yes, we can. When we compare them we will know that their entropy is the same as the reduction in uncertainty at receiver. But this is not the case in the situation BJ presented. In the real world, we must always account for noise.

That first sentence makes no sense, and the second doesn't help shed any light. Care to restate that?
When we define five times four as equalling twenty, that does not stop us from adding five plus five plus five plus five to find twenty. I am perfectly justified in defining the i-word as being reduction in uncertainty at receiver and using the fact that entropy equals reduction in uncertainty when we have the source data. You're just claiming that I'm changing my definition. You're claiming that I'm contradicting my definition that a product is the result of multiplication because I can use addition to find the right answer.

We've already discussed the effect of noise. Since we can read the data as carefully as we want, we can arbitrarily reduce the noise so that it's below any relevant threshold. The noise can be neglected when we're talking about sets of data like text on a page or the sequence of DNA.
You cannot do this in the situation BJ gave.
 

Stripe

Teenage Adaptive Ninja Turtle
LIFETIME MEMBER
Hall of Fame
Yes, I realize DNA is not binary. I also realize you once again are showing that you can’t or won’t look past the details to see the concept.
The concept you are trying to show is evolutionary theory. You have to show how information theory does not apply to data from DNA, not just assert that evolution works.

No matter what that actual arrangement of DNA base pairs is that codes for brown fur, it is perfectly valid for people above elementary school level to denote that exact arrangement as condition 0. A corruption of that message that would result in white fur could be denoted as a 1. Notice this is really deep – 1 and 0 are different. 1 is not 0, and therefore must be a “corruption” of a message that originally specified the 0. Is that way over your head?
It's perfectly understandable. But when you reduce the data down to 1 and 0 you remove most of the possibility that noise might be found in the data. You're making a fundamental switch from talking about the actual situation, which is that DNA is corrupted by random changes, to talking about a contrived scenario where all the data is known.

You're hiding from the challenge. Your challenge is to show how random changes can improve the signal. How can random changes reduce the uncertainty at the receiver? Why does information theory not apply to evolution?

Yorzhik, in those occasional posts he makes, uses 10% as many words as you do, and says ten times as much.

He has been doing this a lot longer than I have and I've learnt a lot from him. Clearly I have a ways to go. :)

And since you seem to now fully understand information theory enough to say we are wrong, perhaps you'll be kind enough to show us how we are wrong. It'd sure beat you just insisting on the idea. :up:
 