20110526

Is DNA computing going to terminate Internet banking?

By Chris Lee
 
In computer science and mathematics, problems can be broadly categorized as hard or easy. This doesn't refer to the complexity of programming an algorithm to solve a problem, but rather to how the size of the problem influences the time it takes to find a solution. The existence of hard problems forms the very bedrock of computer security, so any paper that claims to be on the path to an efficient way of finding solutions to hard problems deserve some scrutiny.

Which is why a paper entitled "DNA-Based Computing of Strategic Assignment Problems" drew my skeptical eye. The paper implies (but does not actually state) that the strategic assignment problem falls into the category of hard, and claims that computing using DNA may efficiently solve such problems. Is that really so?
All math is hard, right?

Is the difference between hard and easy problems really so important? It is, and the difference allows us to engage in secure communications over the Internet (think online banking). Easy problems scale in a polynomial way, so if you have an eight-bit problem, then the time to solve it might scale as 83. If you increase the problem to a 16 bit problem, then the time taken to solve it increases to 163. Hard problems scale exponentially, so our eight-bit problem might be 38, and increasing to 16 bits increases the time taken to solve the problem to 316.

To illustrate the difference between hard and easy problems, let's take a specific example. Imagine that finding the prime factors of an eight-bit number takes 512 seconds. If this is an easy problem, then we can estimate that going to 256 bits will extend the computation time to 17 million seconds on the same computer. That seems like a large increase, but, in fact, you're looking at just a few years of computation time, ignoring any increase in computation speed. If it is a hard problem, however, the 256 bit problem will take 4.6×1087 seconds.

So you can see how such scaling is problematic. The difficulty of tackling a hard problem is what makes public-private key systems, such as the ones used in Internet banking, so secure at the moment. Now, if someone were to claim that they could solve a hard problem efficiently—that is, take a problem that scales exponentially and show that their solution scales polynomially—then that would be very surprising, and quite worrying.

The paper from Shu, Wang, and Yong, based at Nanyang Technological University in Singapore, doesn't make that explicit claim. But you would have to be unable to read between the lines to not get that idea from the paper.

The problem that Shu and colleagues have focused on is the strategic assignment problem, for which there is little information on the Internet—I suspect that it may have an alternative name that is more common, but I was unable to find it. They give a good (and historic) example in the paper. Imagine that you have three horses that you are going to race pair-wise against a competitor. Before the races begin, you observe that your competitor's fastest horse is faster than your fastest horse, and his second-fastest horse is faster than your second-fastest horse, and his slowest horse is faster that your slowest horse.

The problem, then, is how can you arrange them to win the majority of races? The answer is to race your fastest horse against your competitors second-fastest horse, your second-fastest horse against their slowest, and your slowest horse against their fastest horse. Baring injury, you should win two out of three races. The problem, very simply put, boils down to matching between items in pairs of lists.


A DNA computer program? Getting a solution to the problem via DNA turns out to be relatively easy. First, the researchers assigned a set of single-stranded DNA sequences to each item in each list. To control processing, they ensured that each assignment contains a sequence that is recognized by one of five different DNA-cutting restriction enzymes. They also assigned some additional sequences to denote stopping points. The researchers then generated all these sequences separately, and then mixed them together to allow them to become double stranded DNA. The DNA strands were then stitched together to create all possible sequences.

Most of these sequences are garbage, and the real work comes from finding the right bit of DNA. The first step is to only choose those sequences that start with the right list—the solution matches items from list A to items in list B, so if a sequence begins with list B, it should be thrown out. To do this, the researchers use a polymerase chain reaction (PCR) to amplify the DNA strands that started with one of the items in list A and ended at one of the stop sequences.

The next step is to choose only those strands that may have each sequence only once—the correct solution will have each list item only once. This is done by running the DNA through a gel that sorts the DNA strands by length, allowing the researchers to pick out the DNA band that corresponds to the right length. Unfortunately, the selected DNA may have repeated list items, and these must be eliminated.

This is what the restriction enzymes are for. The researchers add one of the restriction enzymes to the collection of DNA that was taken out of the gel. The enzyme breaks the DNA up wherever it finds a specific sequence. Any of the DNA strands that contain a specific list item more than once ends up being cut more than once, and thus is much shorter than the rest.

The solution containing these DNA fragments is then divided up and each solution has one of the remaining restriction enzymes added. These different restriction enzymes cut off different stretches of DNA—ultimately the strands that had the right solutions in a specific order end up being cut down to a particular length. Now, in this particular case, the order of the matches doesn't matters, so it is enough to run another gel and separate out the DNA corresponding to 100 base pairs (in this example).

Sequence those 100 bases, and you have your solution. I should also note that figuring out the right length to look for is not difficult, since it is simply one less than the number of items in the list. I think that's pretty cool. It's a smart, sophisticated, and elegant version of DNA computing. By itself, it warrants attention, but it probably isn't as general a result as the authors would like to think.


And that is fast and scalable? The authors claim this is fast and scaleable. I am skeptical. As it stands, we have two gels and one PCR step to go through, as well as two digestion steps—not even getting into generating the sequences and then sequencing the result. The nice thing is that most of these steps require a fixed amount of time, independent of the size of the problem. So in that respect, yes, at some point, DNA computing may well beat traditional computation.

But I worry about scalability in terms of the actual computing process. This was possibly the simplest version of their example of DNA computing. The choice of DNA restriction enzymes and the fact that the order of the matches doesn't matter were both critical to making this example work. I suspect that figuring out a sequence of DNA restriction enzymes to apply is itself a hard problem. If that is true, then this work simply transforms one hard problem into another hard problem with little gain. To make matters worse, if the order of the matched pairs was important, then I suspect that it's impossible to find a set of restriction enzymes that provide a solution.

So, the answer to the title of this post is: no, not now, and probably not ever. To be a little less flippant: the researchers now need to show that this method can, even in principle, be used for a wide range of problems—can it be turned into a universal computer? They also need to explain why the selection of restriction enzymes is an easy problem to solve. If they can show that, then the world of commerce will tremble at their feet.

No comments: