A Y-chromosome clade is the set of Y-chromosomes descended from a single Y-chromosome (the founder). In human terms, it consists of all the patrilineal descendants of a single man.
Clades are usually defined in terms of unique event polymorphisms (UEPs). Such polymorphisms occur rarely enough to be useful for cladistic analysis and determination of the human Y-chromosome phylogeny. A clade defined on the basis of UEPs is a haplogroup.
There is a misconception among some people that haplotypes, i.e., the alleles at several Y-STR loci, can also define a clade. This is, however, impossible, for at least three reasons.
First, those who erroneously define clades based on Y-STR haplotypes do so by identifying a cluster of similar haplotypes.
But this isn't enough. Suppose you identify a cluster of haplotypes in which every pair has a genetic distance of at most 3. It must also be shown that the genetic distance between any haplotype in the cluster and any haplotype outside the cluster is greater than 3. Suppose you have identified a cluster of haplotypes {a, b, c} with dist(a, b) = 3, and suppose that there is another haplotype d with dist(a, d) = 3. You are not justified in excluding d from the proposed "clade", since it may share a common ancestor with a that is more recent than the common ancestor of a and b.
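The {a, b, c} versus d situation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not define dist, so the sketch assumes it is the sum of absolute repeat-count differences across loci, and the five-locus haplotypes and the helper names (is_tight, intruders) are hypothetical, chosen only so that dist(a, b) = 3 and dist(a, d) = 3.

```python
from itertools import combinations

def dist(h1, h2):
    """Assumed genetic distance: sum of absolute repeat-count differences."""
    return sum(abs(x - y) for x, y in zip(h1, h2))

def is_tight(cluster, max_dist):
    """True if every pair of haplotypes in the cluster is within max_dist."""
    return all(dist(p, q) <= max_dist for p, q in combinations(cluster, 2))

def intruders(cluster, outsiders, max_dist):
    """Names of outside haplotypes that lie within max_dist of some member."""
    return [name for name, h in outsiders.items()
            if any(dist(h, member) <= max_dist for member in cluster)]

# Hypothetical repeat counts at five loci (not real data).
haplotypes = {
    "a": (13, 24, 14, 11, 12),
    "b": (14, 25, 15, 11, 12),
    "c": (13, 25, 14, 11, 12),
    "d": (12, 23, 13, 11, 12),
}
cluster = [haplotypes[name] for name in ("a", "b", "c")]
outsiders = {"d": haplotypes["d"]}

print(is_tight(cluster, 3))              # the cluster passes the tightness check
print(intruders(cluster, outsiders, 3))  # yet d is as close to a as b is
```

A cluster that passes is_tight but has a non-empty intruders list fails the first requirement: nothing in the distances justifies keeping b and leaving d out.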
Moreover, since age estimates are associated with very wide confidence intervals, it is not guaranteed that greater genetic distance implies an older MRCA. To ensure that a group of Y-chromosomes forms a clade, you must show that all other Y-chromosomes lie at a genetic distance much greater than 3, so great that it is extremely unlikely that they are closely related to any Y-chromosome in the haplotype cluster.
Needless to say, none of the folks who propose various "clades" on the basis of Y-STR haplotypes have bothered to prove that the members of their haplotype clusters share a common ancestor more recent than any ancestor shared by cluster members and non-cluster members.
Second, suppose that you have identified a very distinctive haplotype cluster that addresses the first concern. Suppose that every pair of haplotypes within this cluster is within a short genetic distance (e.g., 3) and very far from any other haplotype (e.g., more than 15). Is this sufficient to define a clade?
It is not, since you are not certain that you have sampled the relevant Y-chromosomes, i.e., those that bridge the gap between your cluster and other Y-chromosomes, revealing them to be part of a continuum, rather than distinct members of a particular clade.
There are several cases in which supposed clades were defined, e.g., because a marker took the values 12 or 14 but no intermediate value (13), only to be invalidated later on when chromosomes with intermediate values turned up.
So, while the first concern identifies the need for clusters to be tight and distinct, the second concern identifies the problem that tight and distinct clusters may be spurious due to incomplete sampling of the genetic continuum.
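The sampling problem can be made concrete with a small sketch. It assumes the same sum-of-absolute-differences distance as an illustration, groups haplotypes by single linkage (a technique chosen here for simplicity, not one named in the text), and uses hypothetical three-locus haplotypes in which the first marker is 12 or 14 in the initial sample.

```python
def dist(h1, h2):
    """Assumed genetic distance: sum of absolute repeat-count differences."""
    return sum(abs(x - y) for x, y in zip(h1, h2))

def single_linkage_clusters(haplotypes, max_dist):
    """Group haplotypes connected by chains of steps no longer than max_dist."""
    clusters = []
    for h in haplotypes:
        linked, unlinked = [], []
        for c in clusters:
            near = any(dist(h, member) <= max_dist for member in c)
            (linked if near else unlinked).append(c)
        # Merge h with every existing cluster it touches.
        merged = [h] + [member for c in linked for member in c]
        clusters = unlinked + [merged]
    return clusters

# Hypothetical initial sample: first marker is 12 or 14, never 13.
sampled = [(12, 24, 14), (12, 25, 14), (14, 24, 14), (14, 25, 14)]
# A later sample turns up the intermediate value.
bridging = [(13, 24, 14)]

print(len(single_linkage_clusters(sampled, 1)))             # two apparent "clades"
print(len(single_linkage_clusters(sampled + bridging, 1)))  # one continuum
```

The two clusters in the first call are an artifact of what was sampled. A single intermediate haplotype is enough to join them.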
Third, suppose that you have identified a tight and distinct cluster, and that moreover you have extremely large and comprehensive samples that give you a strong degree of confidence in your cluster. Have you now identified a true clade of the Y-chromosome phylogeny?
The answer is still no, and the reason is the time symmetry of the mutation model of Y-STR loci. Consider the following Y-chromosome tree.

Nodes with capital letters are at most g=4 generations away from the clade founder. It is perhaps possible to devise a test that would detect all these haplotypes as related. But any test that would identify these haplotypes as descendants of the "founder" node, despite 4 generations of mutations, would also erroneously identify as members of the clade all the lowercase nodes, which are also at most 4 generations away from the "founder".
A haplotype cluster centered on a presumed founder who lived g generations ago will invariably include a set of Y chromosomes that do not form a clade.
Whereas a clade includes all the descendants of a single founder, a haplotype cluster will invariably include many men who are g generations away from the founder, whether they are his descendants or not.
Why are Y-STRs qualitatively different from UEPs? While a UEP at the founder defines a watershed moment, separating the founder's descendants (who possess the UEP derived state) from his other relatives (who do not), Y-STRs do not define such a moment: node "m", a cousin of the founder, will possess a haplotype that is 4 generations removed from the "founder", just like node "Q", who is a great-great-grandson. By looking at haplotypes it is impossible to distinguish between the two.
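The time symmetry can be checked with exact arithmetic. The sketch below assumes a symmetric single-step mutation model at one locus (each generation the repeat count moves up or down by one with equal probability mu/2), an illustrative mutation rate of 1/500, and a cousin reached by going 2 generations up from the founder and 2 generations down. None of these specifics come from the text, which only states that "m" and "Q" are both 4 generations removed from the founder.

```python
from fractions import Fraction

def step_distribution(mu):
    """Change in repeat count over one generation (symmetric stepwise model)."""
    return {-1: mu / 2, 0: 1 - mu, +1: mu / 2}

def convolve(p, q):
    """Distribution of the sum of two independent changes."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out

def walk(mu, generations):
    """Distribution of the net change after the given number of generations."""
    d = {0: Fraction(1)}
    for _ in range(generations):
        d = convolve(d, step_distribution(mu))
    return d

def reverse(d):
    """The same changes seen backward in time (child to parent)."""
    return {-k: v for k, v in d.items()}

mu = Fraction(1, 500)  # illustrative per-locus rate, an assumption

# "Q": a great-great-grandson, 4 generations below the founder.
descendant = walk(mu, 4)
# "m": assumed first cousin, 2 generations up to a shared ancestor, then 2 down.
cousin = convolve(reverse(walk(mu, 2)), walk(mu, 2))

print(descendant == cousin)  # identical distributions of allele differences
```

Because each step is as likely to go up as down, reversing the direction of time leaves the distribution unchanged, so the difference from the founder depends only on the number of generations separating the two men, not on whether the path runs through the founder's descendants.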
There is a practical reason why the distinction between haplotype clusters and clades is important, and this has to do with ancient DNA.
Suppose that a very old archaeological sample (of age A years) is Y-STR tested and reveals an R1b-like haplotype. Can we make the inference that this individual was a member of the R1b clade? No, since many (non-descendant) patrilineal relatives of the R1b founder would have similar haplotypes.
Are we justified in claiming that the founder of haplogroup R1b was earlier than A years? The answer is again no, as haplotypes similar to current R1b ones existed before R1b was founded.
How is this compatible with the known fact that haplogroups can be predicted from sufficiently long Y-STR haplotypes?
First, such predictions don't rely only on the Y-STR haplotypes, but also on a large number of haplotypes with known UEP results. Haplogroup prediction relies on UEPs and can't be made independently of them.
Second, such predictions don't rely only on the Y-STR haplotypes, but also on the knowledge that they are present-day haplotypes (last row in the figure).
Today, only the descendants of the clade founder survive in the haplotype cluster, but this was not necessarily true in earlier times.
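The dependence of prediction on UEP-tested references can be illustrated with a toy nearest-neighbour predictor. This is a sketch only and not a description of how any actual haplogroup predictor works: the distance measure, the function name predict_haplogroup, the four-locus haplotypes and the placeholder haplogroup labels "X" and "Y" are all hypothetical.

```python
def dist(h1, h2):
    """Assumed genetic distance: sum of absolute repeat-count differences."""
    return sum(abs(x - y) for x, y in zip(h1, h2))

def predict_haplogroup(query, reference):
    """Return the UEP-derived label of the closest reference haplotype.

    `reference` is a list of (haplotype, haplogroup) pairs in which every
    haplogroup label comes from UEP testing of a present-day man.
    """
    nearest = min(reference, key=lambda entry: dist(query, entry[0]))
    return nearest[1]

# Hypothetical present-day reference panel with UEP-confirmed labels.
reference = [
    ((13, 24, 14, 11), "X"),
    ((13, 25, 14, 11), "X"),
    ((15, 22, 15, 10), "Y"),
    ((15, 23, 15, 10), "Y"),
]

print(predict_haplogroup((13, 24, 15, 11), reference))
```

Every label the function can return was put into the reference panel by UEP testing, and the panel contains only present-day men. Applied to an ancient haplotype, the same lookup would silently assume that the ancient man's closest modern matches share his clade, which is exactly what the time-symmetry argument rules out.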
Conclusion
Clades cannot be defined based on Y-STR haplotype clusters for several reasons, both practical and theoretical.
On the practical side, it is extremely difficult to define a clade using Y-STRs because haplotype clusters must be shown to be distinctive (clearly separated from other Y-chromosomes) and genuine (separated because of common descent, and not incomplete sampling).
On the theoretical side, even if a clear-cut genuine haplotype cluster is detected, it does not constitute a clade, since the time symmetry of Y-STR mutations means that it will (erroneously) include non-descendant relatives of the founder.
There is nothing wrong with exploratory analysis of haplotype clusters, if one keeps in mind that such clusters are not and should not be thought of as clades of the Y-chromosome phylogeny.