Saturday, February 21, 2015

The East African Cluster

The East African cluster/component (which I used to dub AEA/Ancestral East African) is an ancestral component that, for the time being, peaks in populations such as the Dinka, Anuak, Gumuz and South Sudanese populations generally. Most of these are speakers of Nilo-Saharan languages (a family that includes Old Nubian, which some of you may be familiar with), and it is often dubbed "The Nilo-Saharan" component as a result.

The ADMIXTURE run from Hodgson et al. 2014, like many studies, picks up this cluster; it's the light blue one, which they duly name Nilo-Saharan.

At the lower Ks of this run, and in many other runs from both layman sources and peer-reviewed studies, this component also makes up a great deal of the ancestry in Horn Africans, who are more or less its next peak in East Africa, although we lack North Sudanese samples (Nubians, Sudanese Arabs, Beja etc.) for the time being.

This component tends to make up ~60% of the ancestry in Somalis and, for example, ~50% of the ancestry in Ḥabeshas such as Tigray-Tigrinyas. It also has a certain spread in West Asia & North Africa: most North Africans (Egyptians, Berbers) often prove to be almost 20% East African [3], the rest of their African admixture being "Niger-Congo", a component of West-Central African origin that can be found in varying degrees all over Africa (mostly excepting the Horn of Africa) while peaking in Niger-Congo speakers.

The East African cluster exists in Levantines, right down to Ashkenazi Jews & Negev Bedouin, at a rate of about 1-15%, and in Arabians themselves at a rate of ~5-15%, as you can observe in Hodgson et al. itself [4] [1]. Given how widely it is spread, much of its presence in these populations seems to be quite ancient (likely even more so in North Africans), so this component's spread is not to be conflated with the Arab Slave Trade, though some of it might be owed to that.

Mainland East Africa

Yemenite Jews, for one, are likely similar to some of the Christian populations of the rest of West Asia & Egypt in that they are perhaps a decent representation of the pre-Islamic inhabitants of their homeland (in this case, Yemen), seemingly having avoided a good degree of the admixture incurred by their Muslim counterparts [8]. It says a lot that they too have this component at a rate of ~5-15%. It's also quite prevalent in Southeast Africa, making up a non-negligible segment of the ancestry in various Southeast African Bantu-speaking peoples as well as the Hadza, and so on.

What makes this component intriguing, however, is not its spread, though its main geographic concentration (East Africa) is key. It's that this component and its carriers tend to show a greater affinity for Eurasians, or rather Out of Africa Populations, than other African populations do.

Above is a PCA (Principal Component Analysis) plot from Pagani et al., a study focused on Horn of Africa populations. In this plot you can see that the Anuak & South Sudanese samples have a greater pull towards West Asians & North Africans as well as Europeans (CEU) than the Yoruba, a Niger-Congo speaking West African ethnic group, do. The pull is small, but it is there despite the fact that Anuaks and the like have anywhere between ~10-30% Niger-Congo input (the component that makes up almost the entirety of Yoruba ancestry).
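For anyone curious how a plot like this is produced, here is a minimal sketch in Python. It is not Pagani et al.'s pipeline: the tiny genotype matrix is a made-up toy (rows are samples, columns are SNPs coded as 0/1/2 copies of an allele), the function name `genotype_pca` is my own, and I've left out the per-SNP frequency scaling that dedicated tools usually apply.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project samples onto their top principal components.

    genotypes: samples x SNPs array of allele counts (0, 1 or 2).
    Returns a samples x n_components array of PC coordinates.
    """
    g = np.asarray(genotypes, dtype=float)
    # Centre each SNP column so PCA captures variation around the mean.
    centered = g - g.mean(axis=0)
    # SVD of the centred matrix; columns of u scaled by s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical toy data: two groups of three samples, four SNPs.
toy_genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
coords = genotype_pca(toy_genotypes)
print(coords[:, 0])  # PC1 puts the two toy groups on opposite sides of zero
```

A "pull" on a real plot is read the same way: a sample sits closer to another cluster along a component when it shares more of the allele-frequency pattern that component captures.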

As many of you might know by now if you've been keeping up with this blog, Out of Africa populations in general are thought to have expanded out of East Africa, or at least to share more of a lineage with East Africans than with other Africans. This is noticeable in that every single mtDNA Haplogroup outside of Africa traces back to mtDNA L3, which is an East African marker.

A theoretical map of how L3 spread

The Y-Chromosome Haplogroups of all Out of Africa populations also trace back to CT which is for now thought to be East African although its ancestors BT & Haplogroup A (Anuaks, Dinkas etc. are rich in Haplogroup A and rich in L3 markers) are clearly African.

It is this general East African origin postulated for Out of Africa Populations (their ancestors honestly could have been anywhere in Africa; it's just that their lineage mainly pulls toward East Africans as we know them today) that seems to give modern East Africans a certain affinity for them that other Africans lack or have less of.

However, the East African cluster is most likely not a pure representation of the cluster that, for example, the Proto-Eurasians (the first and ancestral OoA population) descend from. Carriers of this component, including those whose ancestry it covers at a rate of ~70-80% (Anuaks etc. have a certain degree of Niger-Congo admixture), show markers such as L2 that are perhaps signals of more divergent African ancestry in them, and likely in their component.

At the lower Ks the component even shows a meager "Khoisan"-like influence, which you can see for yourself in Hodgson's own plot by observing the populations it peaks in from K=3 (numbers on the side) to K=5.

It even has a very small Eurasian/Out of Africa element that tends to show in these populations at the very lowest Ks (K=2 etc.).

It can only be assumed that if these more divergent influences (excluding the small Eurasian element) were not present in it, the component's affinity for Eurasians would become even more potent (we'll need ancient genomes from East Africa to truly understand the component). It could perhaps even seem Eurasian itself as a result... It could possibly be assumed that there is a very ancient East African cluster within it, marked by lineages such as L3, while the rest is more divergent African input marked by lineages such as L0 & L2.

Nevertheless, its current affinity for Eurasians, brought about for example by the richness of lineages such as L3 in its carriers, is quite evident. It shows in terms of genome-wide variance as well as Fst distances.

The Ethiopic (Omotic) component is a mixed cluster made up predominantly of this East African component, plus something related to the Arabian & Khoisan clusters in the Fst distance table above, so its greater closeness to Eurasia is explained mostly by direct and non-negligible West Asian input (~15%). But notice how Nilo-Saharan (the East African component we're discussing here) is closer to all of the Eurasian clusters than Niger-Congo, Pygmy & Khoisan are.

Now, Fst distance can be influenced by a variety of factors; e.g. small populations undergo stronger genetic drift, which will raise their Fst distance from other populations (it's ultimately not a 100% reliable way to gauge "distance"). But the thing is, even the Niger-Congo component likely shares in very ancient East African input.
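To make the drift caveat concrete, here is a rough Python sketch. It is not how the table above was computed: `wright_fst` is a simple textbook Fst (heterozygosity-based, two equally weighted populations, biallelic loci), and `expected_fst_from_drift` uses the idealized result F = 1 - (1 - 1/(2N))^t for isolated populations of constant size N after t generations. The function names and the numbers fed into them are my own illustrative assumptions, not real data.

```python
def wright_fst(freqs_pop1, freqs_pop2):
    """Fst between two populations from per-locus allele frequencies.

    Uses (H_T - H_S) / H_T, summed across loci before dividing
    (ratio of averages), with both populations weighted equally.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        # Expected heterozygosity if the two populations were pooled.
        h_total = 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within each population.
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


def expected_fst_from_drift(pop_size, generations):
    """Idealized expected Fst from drift alone (no migration, constant size)."""
    return 1 - (1 - 1 / (2 * pop_size)) ** generations


# Identical frequencies give no differentiation; diverged ones give a high Fst.
print(wright_fst([0.5], [0.5]))
print(wright_fst([0.2], [0.8]))

# Same number of generations, very different drift depending on population size.
small_pop = expected_fst_from_drift(pop_size=100, generations=100)
large_pop = expected_fst_from_drift(pop_size=10000, generations=100)
print(small_pop, large_pop)
```

The second function is the point of the caveat: over the same stretch of time, a small population racks up far more Fst than a large one without any difference in shared ancestry.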

Niger-Congo carriers such as Yorubans will often show L3, and as you can see their component has a certain affinity for Nilo-Saharan that "Pygmy" & "Khoisan" lack. They too prove closer to Eurasians than, let's say, the Khoisan do, even when we use whole genome data to look at the estimated divergence dates between these populations:

The Yoruba are much less divergent in this respect from the French & the Han than the San/Khoisan are. This is perhaps due to their ancient East African input, which is shared to some extent between their Niger-Congo component and, let's say, your average Anuak's Nilo-Saharan component. David Reich himself once suggested that Niger-Congo Africans could be made up of divergent African lineages.

It seems quite likely to me that components such as East African/Nilo-Saharan & Niger-Congo will crumble into other components once we have ancient genomes from all over Africa to work with, much like what happened with Europe and our current knowledge of the three-way mixture nature of Europeans [7]. One of the components they break down into will likely be a specifically East African cluster with what might be a very strong affinity for Eurasians (Out of Africa Populations), given the origins of Eurasians as a whole. For now, only time will tell.

For the time being we just know that this cluster has what can be assumed to be a somewhat greater affinity for Eurasians than other African clusters as displayed above.

Reference List:

6. The expansion of mtDNA haplogroup L3 within and out of Africa, Soares et al.

7. Ancient human genomes suggest three ancestral populations for present-day Europeans, Lazaridis et al.


1. Rummage through the Ethio-Helix blog's pages (the links I shared) on Haplogroups in East Africa. The author sets things up quite well in terms of interactivity and shares links to each of the studies he got the marker percentages from, so everything he's sharing is entirely reliable, no worries. It's a good blog in most other respects too, so I recommend it.

