SAN FRANCISCO, May 15 (Xinhua) -- A new study suggests 13 tiny snippets of deoxyribonucleic acid (DNA) may be enough to infer hundreds of thousands more markers, capable of revealing a wealth of genetic information.
However, according to Noah Rosenberg, a Stanford University professor of biology and senior author of the study published Monday in the Proceedings of the National Academy of Sciences, the ability to infer so much on the basis of so little information raises privacy concerns.
The findings of Rosenberg and his colleagues are based on two sets of genetic data from 872 human genomes. The first comprised 13 markers that until this year were the basis of the Combined DNA Index System, or CODIS, the forensic genetic marker set of the Federal Bureau of Investigation (FBI). The system was recently upgraded to include seven additional markers, bringing the total to 20. The second, much broader dataset included 642,563 genetic markers that did not overlap with the first set.
The question was, how well could the researchers match a person's record in one dataset to their record in the other?
The team found there were strong enough patterns in human DNA, at least in the DNA of the diverse set of people studied, that they could match upward of 90 percent of the records. If 17 more forensic markers were added, bringing the total to 30, the researchers could match more than 99 percent of the records in the two datasets. This means that with the right combination of databases, it may be possible to infer a wealth of genetic information based on a very small set of markers.
The results suggest there may be flaws in the way law enforcement officials, courts and businesses that conduct genetic tests have thought about genetic privacy. Previously, it had been assumed that forensic DNA collections were useful only for matching DNA samples to names already in a database, for example to place a suspect at a crime scene, and could not reveal any information beyond identity matches.
The new findings indicate that when the same person is included in more than one genetic database, it may be possible to infer genetic traits from CODIS data or to find matches across different sets of DNA markers, Rosenberg noted, adding that privacy and legal issues aside, "there are several other places where this result is useful."
"The approach we are using dates back to the 1960s, when computer scientists and statisticians were first trying to figure out how to link records from the same people in different government, medical or corporate databases," said Michael Edge, a recent PhD graduate and lead author on the paper. "It is interesting to see that the same type of problem arises in so many contexts in genetics."
One such area is backward compatibility. The problems forensic geneticists face are often harder than simply matching profiles -- for example, determining whether one person's DNA is present in a mixture of several people's DNA left on a doorknob at a crime scene. With just 13 or 20 genetic markers, there is a substantial risk of false-positive matches. Using larger marker sets would reduce false-positive rates, but it might not be possible to check new profiles for matches against decades of profiles collected with the 13 markers that have been used to date.
The research gives a proof of principle that it may be possible to develop a forensic genetic system with new marker sets and still be able to test for matches against databases assembled with the earlier CODIS markers, Rosenberg was quoted as saying in a news release.