NIH: 100M Years to Change a Binding Site

DavisBJ · Mar 31, 2011

Stripe said:
The concept you are trying to show is evolutionary theory. You have to show how information theory does not apply to data from DNA, not just assert that evolution works.

This is a pretty good example of why a discussion with you is usually a waste of time. Now listen, since I said this same thing before. I am not questioning information theory, nor even whether it applies to DNA. I am almost solidly convinced that you are misapplying it.

It's perfectly understandable. But when you reduce the data down to 1 and 0 you remove most possibility that noise might be found in the data. You're making a fundamental switch from talking about the actual situation which is that DNA is corrupted by random changes to talking about a contrived scenario where all the data is known.

So you think information theory is impotent in the hyper-simple case where the message can only have two values.

perhaps you'll be kind enough to show us how we are wrong. It'd sure beat you just insisting on the idea. :up:

Sure. Show me how the corruption of the brown-fur DNA message into a white-fur message in snow-bond boders degrades the receiver’s genome.

Tyrathca · Mar 31, 2011

Stripe said:
So?

So this is inappropriate for genetics. There is no intent/intelligence known to be involved normally.

And with language every "word" can be "understood", at least in the sense that each wave results in a vibration in the inner ear that is transmitted to the brain.

Which highlights the flaw of the analogy in the first place. That description for language and words is actually a fairly accurate analogy for DNA and codons. Thus treating the word "understood" in this way your analogy does not fit unless we assume that there is intelligence and intent behind the coded DNA (i.e. God the designer, which is essentially using your conclusion as part of your premise)

Tyrathca · Mar 31, 2011

Yorzhik said:
Then there isn't a message.

Then your application of information theory is inappropriate. If there is no message, from the perspective of the method of quantification you are using, then anything you say about it has little relevance.

Shannon cannot quantify?!?!?

I saw nothing about shannon there. You made an analogy, not an description of shannon information. By your own admission above there is no message in your analogy unless we assume there is a relevant "meant" (which in itself assumes intent/intelligence). Shannon information can quantify, but according to your analogy there is nothing there to quantify! (It seems you are only selectively using shannon information by using various definitions or information and using them interchangeably)

Of course. Add information and there is an increase in information. That's, like, redundant and saying it twice.

Not at all. If this is what you think then you would have be trying REALLY hard to misunderstand.

Well stripe has been saying that random changes can't ever increase information. Your description however seems to consider the intent (or lack of) for any change to be irrelevant, if a change creates a phrase in English from another language then the information increases regardless of whether it was due to random effects or not. You've jumped in and done a lot to defend him and little to differentiate yourself from him, excuse me if I lump you two together. I assume this then is an area you disagree with each other?

Frayed Knot · Mar 31, 2011

DavisBJ said:
So you think information theory is impotent in the hyper-simple case where the message can only have two values.

Apparently so - we people who work in the field of signal communications never ever transmit 1's and 0's.

Tyrathca said:
It seems you are only selectively using shannon information by using various definitions or information and using them interchangeably

That does seem to be the theme among the two of them, doesn't it?

Frayed Knot · Mar 31, 2011

Yorzhik said:
Frayed Knot said:

But in the case of DNA, what is the receiver?

Click to expand...

One example is the machinery that turns the DNA into a working protein.

That's not what we're talking about here. We're talking about taking a strand of DNA, and getting the sequence of its bases into our heads, or onto a sheet of paper. In Stripe's definition that the "information" word can only be used to gauge the reduction in uncertainty of a receiver, this means that "information" is a term that applies to a process and not to a set of data. I'm willing to use his definition (even though it is not standard in information theory), so I'm having to carefully word all my statements.

The "receiver" in this example is our brain, or the sheet of paper we write the sequence on. At the beginning of the process, I am completely uncertain about the sequence of DNA in the strand. At the end of the process, I will have read the information, which I am able to do with near complete precision - in fact, I have the luxury of taking as much time with it as I need, so I can reduce the potential for error arbitrarily. Tell me how low you want the error probability to be, and I can get under that by taking more time and doing it more carefully. Therefore, I can guarantee that any transcription errors will be low enough to be neglected in our analysis of information (how low does the error rate need to be to be negligible? I can get below that).

At the end of the process, I am near completely certain as to the sequence of DNA. Now the question becomes - what is the reduction in uncertainty about the DNA sequence after it's been read? This is the central core of information theory - the reduction in uncertainty is exactly equal to the entropy of the data that we started with. In other words, the information we get is equal to how unexpected the starting data set is, and a data set that has fewer patterns, is more like randomness, has more.

A pattern with more randomness results in more information.

Unless there is such a thing as mutations.

We are not talking about mutations, we are not talking about replication of DNA by a cell, we are talking about reading the DNA sequence.

I'll say it again for a third time. One must always account for ~~noise~~ mutations.

We did account for noise in our process of reading the data. The idea of mutations doesn't apply.

Do you see the problem there? If that's the model you're using, you have to stick to it, so don't then try to refer to the information content of source data, because it's meaningless.

Click to expand...

The content can only be measured if you have all 3 (4) parts: encoded message, transmission, and decoded message (noise).

In the scenario we're discussing, the message is the sequence of the DNA. Transmission is the act of reading that, and the decoded message is our understanding of what was in it at the beginning. The noise (errors) can be reduced to as low as we want them, low enough to be negligible. As the noise gets lower and lower, the information transmitted (reduction in the uncertainty in our understanding of what was there) approaches the limit of the entropy of the source data. Since we can get the errors as low as we desire, the information is equal to the entropy of the data.

I hope the non-technical people reading this thread can understand the key concepts.

Stripe · Mar 31, 2011

DavisBJ said:
This is a pretty good example of why a discussion with you is usually a waste of time.

Yes. It is a waste of time to try and convince me of your position without providing any reasons why I should accept your ideas.

Now listen, since I said this same thing before. I am not questioning information theory, nor even whether it applies to DNA. I am almost solidly convinced that you are misapplying it.

If you're convinced I am misapplying i-theory then you must have some convincing reasons. What are those reasons?

So you think information theory is impotent in the hyper-simple case where the message can only have two values.

No. Did you not read when I wrote:

"If you want to talk about a binary code I can show you how random changes are always bad for information within that."

?

Sure. Show me how the corruption of the brown-fur DNA message into a white-fur message in snow-bond boders degrades the receiver’s genome.

I can't. Even the most knowledgeable biologist has only scratched the surface of how DNA is encoded. I doubt they could easily identify which parts of the genome control fur colour, let alone make the necessary changes, let alone account for how this affects other functionality.

Tyrathca said:
So this is inappropriate for genetics. There is no intent/intelligence known to be involved normally.Which highlights the flaw of the analogy in the first place.

Says you. :idunno:

I think it's a very powerful analogy. But I can understand why you must insist it is not so.

That description for language and words is actually a fairly accurate analogy for DNA and codons. Thus treating the word "understood" in this way your analogy does not fit unless we assume that there is intelligence and intent behind the coded DNA (i.e. God the designer, which is essentially using your conclusion as part of your premise)

When we want to provide evidence for God's existence, we'll tell you. In this thread, however, we are providing compelling reasons why evolution cannot proceed by random mutations and natural selection. :thumb:

The most compelling reason is:
Random changes always increase uncertainty at the receiver.

Your job is to show us how information theory cannot be applied to data from DNA or to show us why we should not be concerned about noise. Or you can try FK's tactic and insist we equate randomness and information so the conversation turns into an incomprehensible quagmire.

Or you could accept the obvious.

Choose well. :up:

I assume this then is an area you disagree with each other?

Nope. I understand why you might think that though.

Jukia · Mar 31, 2011

Stripe said:
I can't. Even the most knowledgeable biologist has only scratched the surface of how DNA is encoded. I doubt they could easily identify which parts of the genome control fur colour, let alone make the necessary changes, let alone account for how this affects other functionality.

And the reason AiG or other creation scientists are not hot on this issue is...?

Frayed Knot · Mar 31, 2011

Stripe said:
The most compelling reason is:
Random changes always increase uncertainty at the receiver.

Yet again you're conflating different ideas.

If you have a set of data, and a receiver who is uncertain about the content of that data, then if you have a transmission path that introduces random errors, the receiver will have more uncertainty than if the transmission path had no errors.

That's fine, but it's not relevant here.

What we're talking about is that we have two sets of data, and we are completely uncertain about the content of each before we read them. Then we read them, with an error rate that's small enough for our purposes.

The question is, which one of those data sets resulted in more of a reduction in uncertainty (information transfer) to us (the receiver)? The answer is, whichever one had the highest entropy, and the entropy can be measured.

If you take a data set that has a lot of predictability, and call that Data Set A, then you randomly scramble some if it, and call the modified one Data Set B, then you can compare the entropy of those two sets of data. And Data Set B will have a higher entropy. Reading Data Set B will result in more of a reduction in the uncertainty at the receiver. Data Set B resulted in more information being read (transmitted).

These are the facts of information theory.

DavisBJ · Mar 31, 2011

Stripe said:
Yes. It is a waste of time to try and convince me of your position without providing any reasons why I should accept your ideas.

White furred boders

If you're convinced I am misapplying i-theory then you must have some convincing reasons. What are those reasons?

White furred boders.

No. Did you not read when I wrote:

"If you want to talk about a binary code I can show you how random changes are always bad for information within that."
?

White-fur – Brown fur is a binary choice. Get on it.

I can't.

The truth finally emerges.

Even the most knowledgeable biologist has only scratched the surface of how DNA is encoded. I doubt they could easily identify which parts of the genome control fur colour, let alone make the necessary changes, let alone account for how this affects other functionality.

Translation – obfuscate, rabbit trails, divert, do anything but show how information theory would force the degradation of even a binary element of a genome.

Alate_One · Mar 31, 2011

Stripe said:
I can't. Even the most knowledgeable biologist has only scratched the surface of how DNA is encoded. I doubt they could easily identify which parts of the genome control fur colour, let alone make the necessary changes, let alone account for how this affects other functionality.

:rotfl: you did NOT just say this! You know those two DNA codes I posted? They control fur color. One is sandy and the other is black. They are different from one another by 4 letters.

And the process is reasonably well understood. Certainly it is hard to actually know *everything* about biological components, but your assumption of cluelessness is totally off base.

Frayed Knot · Mar 31, 2011

Alate_One said:
:rotfl: you did NOT just say this!

You'd think he'd be continually embarrassed by his own statements, but then there's the Dunning-Kruger Effect. That explains a lot.

Stripe · Mar 31, 2011

Jukia said:
And the reason AiG or other creation scientists are not hot on this issue is...?

They are. :idunno:

Frayed Knot said:
Yet again you're conflating different ideas.

No, I'm not.

If you have a set of data, and a receiver who is uncertain about the content of that data, then if you have a transmission path that introduces random errors, the receiver will have more uncertainty than if the transmission path had no errors. That's fine, but it's not relevant here.

It is in the real world. It is when we are discussing the propagation of genetic code.

What we're talking about is that we have two sets of data, and we are completely uncertain about the content of each before we read them. Then we read them, with an error rate that's small enough for our purposes. The question is, which one of those data sets resulted in more of a reduction in uncertainty (information transfer) to us (the receiver)? The answer is, whichever one had the highest entropy, and the entropy can be measured.

When the reduction in uncertainty is the same as the entropy because of this kind of scenario then it is perfectly valid to calculate the reduction in uncertainty by calculating the entropy. But in the real world, we must always account for noise.

If you take a data set that has a lot of predictability, and call that Data Set A, then you randomly scramble some if it, and call the modified one Data Set B, then you can compare the entropy of those two sets of data. And Data Set B will have a higher entropy. Reading Data Set B will result in more of a reduction in the uncertainty at the receiver. Data Set B resulted in more information being read (transmitted).

Well, I'm not exactly sure what you've done wrong, but there must be something! :chuckle:

Tom says, "The uncertainty has its largest value only when the symbols are equally probable" so let's see if we can't figure this out.

If you have a data set with a lot of predictability and you're sending one character A,T,G or C then the receiver will be very able to predict what that character will be (reducing uncertainty). If you randomise the data then there will not be a lot of predictability so the receiver will not so easily be able to predict the next character (increasing uncertainty).

These are the facts of information theory.

DavisBJ said:
White furred boders White furred boders. White-fur – Brown fur is a binary choice. Get on it. The truth finally emerges. Translation – obfuscate, rabbit trails, divert, do anything but show how information theory would force the degradation of even a binary element of a genome.

:chuckle:

Stripe · Mar 31, 2011

Alate_One said:
:rotfl: you did NOT just say this!

Yes, I did.

You know those two DNA codes I posted? They control fur color. One is sandy and the other is black. They are different from one another by 4 letters.

OK.

Was it easy for biologists to determine these?

Certainly it is hard to actually know *everything* about biological components

I know. We're about a billion years away from that.

but your assumption of cluelessness is totally off base.

I never assumed cluelessness.

Frayed Knot · Mar 31, 2011

Stripe said:
It is in the real world. It is when we are discussing the propagation of genetic code.

Sure, there are situations where that's the question, but it's not the question that we've been discussing. If we're comparing two sets of data, we're not talking about propagating genes and the errors associated with that. That's a different subject entirely.

When the reduction in uncertainty is the same as the entropy because of this kind of scenario then it is perfectly valid to calculate the reduction in uncertainty by calculating the entropy. But in the real world, we must always account for noise.

I accounted for noise in the example. It can be as low as we need it to be in order to neglect it.

Well, I'm not exactly sure what you've done wrong, but there must be something! :chuckle:

Tom says, "The uncertainty has its largest value only when the symbols are equally probable" so let's see if we can't figure this out.

If you have a data set with a lot of predictability and you're sending one character A,T,G or C then the receiver will be very able to predict what that character will be (reducing uncertainty). If you randomise the data then there will not be a lot of predictability so the receiver will not so easily be able to predict the next character (increasing uncertainty).

Exactly! I'm glad you're starting to catch on. The initial uncertainty with a randomized data set is higher, therefore the amount that our uncertainty is reduced when we read the data is maximized, therefore the amount of information is maximized. With a data set that's completely unpredictable, the reduction in our uncertainty from reading it is the highest.

Alate_One · Mar 31, 2011

Stripe said:
Was it easy for biologists to determine these?

Pretty much. Not every trait is easy but a lot of them are. The way DNA sequencing is going it's going to get really easy really soon. Don't know what's causes a new trait? Just sequence em.

I never assumed cluelessness.

That's pretty much what you said.

Stripe · Mar 31, 2011

Frayed Knot said:
Exactly! I'm glad you're starting to catch on. The initial uncertainty with a randomized data set is higher, therefore the amount that our uncertainty is reduced when we read the data is maximized, therefore the amount of information is maximized. With a data set that's completely unpredictable, the reduction in our uncertainty from reading it is the highest.

OK, but there's still something you've missed and I think I've found it:

"There is one more thing to add: the more random the initial string is, the more information can be transmitted. But random is a matter of perspective in this case. It means that the perfect transmission code looks random to an outsider, because it has no correlations or your weightings. To someone who knows how to read the code, it is a wonderfully clear message because it can have most of the noise removed by proper decoding."

So when there is no decoding going on (like in your data sets A & B) you are perfectly correct. But when we do have a decoding process and we find random numbers coming in we will only get gibberish.

And this is where Shannon's theory gives way to reality.

Stripe · Mar 31, 2011

Stripe said:
So when there is no decoding going on (like in your data sets A & B) you are perfectly correct. But when we do have a decoding process and we find random numbers coming in we will only get gibberish.

If you have a decoding system then each time you receive a digit in a data set, your uncertainty will decrease about the rest of the data set. If you get a digit that is in error, then you may be able to use the rest of the data to correct that error, but you will never be able to use the error for anything useful.

Right?

Frayed Knot · Apr 1, 2011

Stripe said:
OK, but there's still something you've missed and I think I've found it:

"There is one more thing to add: the more random the initial string is, the more information can be transmitted. But random is a matter of perspective in this case. It means that the perfect transmission code looks random to an outsider, because it has no correlations or your weightings. To someone who knows how to read the code, it is a wonderfully clear message because it can have most of the noise removed by proper decoding."

That's right, but I think you're interpretation of what that says is off.

So when there is no decoding going on (like in your data sets A & B) you are perfectly correct. But when we do have a decoding process and we find random numbers coming in we will only get gibberish.

I'm not sure where exactly the disconnect is, but I'll take a stab.

I think you're saying that if we start with Data Set A, and that is our gold standard, then we scramble some of it randomly and transmit that, then the receiver will not be as certain about the contents of Data Set A as he would be if he read Data Set A without the scrambling. The information transmitted about A will be less if it's randomized first.

That is true, but it's not what the rest of us have been discussing.

We're talking about the situation where you have two gold standards: Data Set A and Data Set B. Now we happen to know that the way we made Data Set B was to take A and scramble some of it randomly, but all the transmitters and receivers know is that A is different from B. We also can measure the entropy of both, and almost certainly, B will have more entropy than A, because we took a portion of A which probably had some predictability, and scrambled it randomly so that section has no predictability.

So we have two setups, one transmitting the information from A, and the other transmitting the information from B. The receiver for B has more uncertainty before the data transmission, because Set B is less predictable. After transmission, there will be no uncertainty for either, because they've read the data with virtually no error. The receiver for B has had its uncertainty reduced more than the receiver for A, so there was more information.

And this is where Shannon's theory gives way to reality.

Oh come on now. Shannon's work is all about the real world.

Look, Stripe, I know you feel like you're married to the concept that randomness has less information, but it's time to let that go. You're waging a losing battle.

Stripe · Apr 1, 2011

Frayed Knot said:
That is true, but it's not what the rest of us have been discussing.

Well, perhaps you should be.

We're talking about the situation where you have two gold standards: Data Set A and Data Set B. Now we happen to know that the way we made Data Set B was to take A and scramble some of it randomly, but all the transmitters and receivers know is that A is different from B. We also can measure the entropy of both, and almost certainly, B will have more entropy than A, because we took a portion of A which probably had some predictability, and scrambled it randomly so that section has no predictability.

We don't have this luxury when trying to decode DNA. There is no gold standard.

Oh come on now. Shannon's work is all about the real world.

Yes, but the applications in this thread have not been. Except for mine.

Look, Stripe, I know you feel like you're married to the concept that randomness has less information, but it's time to let that go. You're waging a losing battle.

Not if I manage to stick to the right definitions and situations.

Thanks for your help on this thread. I think I learnt a lot! :up:

Frayed Knot · Apr 1, 2011

Stripe said:
We don't have this luxury when trying to decode DNA. There is no gold standard.

If we're comparing the DNA of two critters, each one of those sets of DNA is what we're trying to read - each is the "gold standard" in my example. In an ideal world, we can read each perfectly. In the real world, we can make the error rate as low as desired, as Tom Schneider said in the very next sentence after the snippet you quoted.

NIH: 100M Years to Change a Binding Site

New member

New member

New member

New member

New member

Teenage Adaptive Ninja Turtle

New member

New member

New member

Well-known member

New member

Teenage Adaptive Ninja Turtle

Teenage Adaptive Ninja Turtle

New member

Well-known member

Teenage Adaptive Ninja Turtle

Teenage Adaptive Ninja Turtle

New member

Teenage Adaptive Ninja Turtle

New member