We will primarily be collecting samples from males in indigenous populations around the world, which will maximize the number of Y-chromosomes while providing mtDNA as well. It is also easier to study X-chromosome variation in men, since the X is present in only one copy and it is therefore easier to infer haplotypes.
The populations will be chosen through a process of consultation with elders and the people themselves. I have been in Australia for the past few days getting this started, and am off to Singapore and India this week. We will sample both ‘ethnically defined’ groups (by language, customs, etc.) and ‘geographically defined’ groups (i.e. if a group, such as the Kazaks of Central Asia, is widespread, then we will attempt to sample roughly on a grid).
Sampling from indigenous groups will be through blood draws, which will yield hundreds of micrograms of DNA. This amount of DNA is far more than we need for typing Y and mtDNA, and it will allow us to apply new markers to the study of migratory patterns in the future. Particularly as the HapMap data becomes available, new autosomal haplotype systems should provide great resolution for questions that are unanswerable using Y and mtDNA (remember that these only assess a tiny fraction of your complete genomic ancestry). The DNA will be stored at the regional center that collected it, and will be available for study in the future by all members of the scientific community – effectively a virtual, global biobank. These studies will only take place as collaborations, and the proposed genotyping must follow the guidelines for the study – e.g. only markers that give us historical or anthropological information. Also, the actual laboratory work will take place at the regional center(s) – one of our project goals is to build scientific capacity in the less developed countries (Brazil, South Africa, India, etc.) where we have centers. No medical research will ever be conducted using these samples, for reasons having to do with informed consent and intellectual property. We will release all of the anonymous data into the public domain as we analyze it. We feel that this information is part of the ‘commons’ of our species – it belongs to everyone – and no attempt will be made to patent it.
We will be testing every indigenous sample collected for Y and mtDNA. In the case of the former, a multiplex PCR technique will be used to type AT LEAST the 12 STR markers typed by Family Tree DNA. We will probably be typing more – perhaps as many as 20 – in the initial screen. We will also sequence HVR-1 in each individual. Initially, we will also SNP type every Y-chromosome and mtDNA to confirm the haplogroup. Once we have a sufficient database, we will probably be able to predict haplogroup affiliation with a high degree of precision, allowing us to simply type the STRs for most individuals. Over time new markers will be discovered – some perhaps by us, to answer specific questions – but the key will always be having access to the indigenous DNA samples to type these markers. These are the most valuable asset of the project, and we don’t have to limit ourselves to any particular markers – we’ll choose the best ones to answer the questions we are investigating.
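To make the idea of predicting a haplogroup from STRs alone more concrete, here is a minimal Python sketch. It is not the project's actual method, just one simple way such a prediction could work: compare a new STR haplotype against a reference panel of haplotypes whose haplogroups were confirmed by SNP typing, and take a majority vote among the closest matches. The panel values, the six-marker layout, the function names and the thresholds `k` and `max_distance` are all invented for illustration.

```python
from collections import Counter

# Hypothetical reference panel: each entry pairs STR repeat counts (same
# marker order throughout) with a SNP-confirmed haplogroup.
# All numbers are invented for illustration; they are not real project data.
REFERENCE = [
    ((13, 24, 14, 11, 11, 14), "R1b"),
    ((13, 24, 14, 10, 11, 14), "R1b"),
    ((13, 23, 14, 11, 11, 15), "R1b"),
    ((13, 25, 16, 10, 11, 14), "R1a"),
    ((13, 25, 15, 10, 11, 14), "R1a"),
    ((14, 22, 14, 10, 13, 14), "I"),
    ((14, 23, 14, 10, 13, 14), "I"),
]


def step_distance(a, b):
    """Sum of absolute repeat-count differences across markers."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be typed at the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_haplogroup(haplotype, reference=REFERENCE, k=3, max_distance=4):
    """Majority vote among the k nearest reference haplotypes.

    Returns None when the closest match is too distant or when no haplogroup
    wins a majority; such samples would still need SNP typing.
    """
    ranked = sorted((step_distance(haplotype, h), hg) for h, hg in reference)
    nearest = ranked[:k]
    if not nearest or nearest[0][0] > max_distance:
        return None
    winner, votes = Counter(hg for _, hg in nearest).most_common(1)[0]
    if votes <= k // 2:
        return None
    return winner


if __name__ == "__main__":
    print(predict_haplogroup((13, 24, 14, 11, 11, 14)))
    print(predict_haplogroup((20, 30, 20, 20, 20, 20)))
```

The fallback to `None` mirrors the point above: as the database of SNP-confirmed samples grows, fewer haplotypes should fall outside the range of confident prediction, so only the ambiguous ones need the extra SNP typing.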
The maps currently shown on the website atlas trace the routes followed by the markers that are being reported in the public component. Over time we will add more routes (= subhaplogroups) as the information on them improves. Remember that this is a GLOBAL project of enormous logistical complexity, and therefore that we may not show all of the details of a well-studied region like Europe at this time. We will be improving the level of detail over the coming years, and European users in particular should see their routes become much more detailed. Purchasing a participant’s kit is like purchasing a ‘subscription to your genome’, and you will be able to check back every few months to see what has been updated in your profile.
Finally, the data collected from the public part of the project will allow us to add an enormous number of genotypes to the database, giving us the power to answer some key questions. For instance, at the moment there is no evidence for interbreeding with Neanderthals as modern humans migrated into western Europe, but this is based on only 15,000-20,000 individuals who have been genotyped. Will we find a rare Neanderthal lineage in the 234,000th sample we type? Also, the public samples will allow us to assess patterns of genetic variation in admixed populations. There are some interesting studies we hope to do with the US census data, comparing Y and mtDNA patterns to that database. So these samples really are part of the project, not simply a way to raise funds - although that is a great aspect as well. Most people I've spoken to love the fact that all of the net proceeds from their kit - slightly more than 20% of the $99.95 price - get plowed back into the research and Legacy project.
See the earlier response as well.