Tuesday, July 5, 2016

Somali qpAdm models using new ancient genomes

So, I asked David over at Eurogenes to model Somalis as a mixture of South Sudanese people and Natufians, in order to see how well that would fit using a formal statistical method like qpAdm. He got some pretty surprising results overall:

Natufian + Sudanese (south):

Sudanese: 54%
Natufian: 46%

Neolithic Levant + Sudanese (south):

Sudanese: 54%
Neolithic Levant: 46%

Neolithic Levant + Chalcolithic Iran + Sudanese (south):

Sudanese: 55%
Neolithic Levant: 34%
Chalcolithic Iran: 11%

Now, what's going to surprise you is that the third model is the one that fits best, and by a long shot when compared to the first model. Natufian + Sudanese (south) fits the worst (chisq: 26.256, tail prob: 0.09%, std. errors: 0.009), Neolithic Levant + Sudanese (south) fits much better (chisq: 7.593, tail prob: 47%, std. errors: 0.006), and Neolithic Levant + Chalcolithic Iran + Sudanese (south) fits better still (chisq: 4.975, tail prob: 66%, std. errors: 0.057). A lower chisq and a higher tail probability mean a better fit.
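For anyone who wants to check how the chisq values map onto the tail probabilities quoted above, here is a rough Python sketch. The degrees of freedom (8 for the two-source models, 7 for the three-source one) are not stated in the results above; they are an assumption, picked because they reproduce the reported tail probabilities. `chi2_tail` is a helper name made up for illustration and is not part of qpAdm.

```python
import math

def chi2_tail(chisq, df):
    """Upper-tail probability P(X >= chisq) for a chi-square with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = chisq / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(chisq)
    normal_tail = math.erfc(root / math.sqrt(2.0))
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= chisq / (2 * r - 1)
        total += term
    return normal_tail + 2.0 * density * total

# (label, chisq from the post, ASSUMED degrees of freedom)
models = [
    ("Natufian + Sudanese (south)", 26.256, 8),
    ("Neolithic Levant + Sudanese (south)", 7.593, 8),
    ("Neolithic Levant + Chalcolithic Iran + Sudanese (south)", 4.975, 7),
]
for label, chisq, df in models:
    print(f"{label}: tail prob = {chi2_tail(chisq, df):.4f}")
```

With those assumed degrees of freedom, the three tail probabilities land close to the 0.09%, 47% and 66% reported above.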

The last one fits almost as well as a Corded Ware sample being modeled as ~70% Yamnaya & ~30% Esperstedt Middle-Neolithic (chisq: 2.621, std. errors: 0.060), which roughly fits with the data from peer-reviewed studies like Haak et al. 2015.

This oddly reminds me of some models shared in the new Lazaridis et al. pre-print, where they asserted that Somalis were a mixture between Mota & a population along the Iran_ChL→Levant_BA cline.

I didn't make much of the above at the time. For one, Mota is a poorer fit for Somalis' African ancestry than the South Sudanese (due to various reasons alluded to here), and it made little sense that Somalis' West Eurasian ancestry corresponded better with Bronze Age Levantines and Copper Age Iranians than Neolithic Levantines, for example. At least in my humble opinion.

I figured Lazaridis & company just didn't try a different model that would probably fit better, but it's now odd that their result lines up fairly well with what the above qpAdm models imply, which is that "Neolithic Levant + Chalcolithic Iran + Sudanese (south)" fits much better than "Natufian + Sudanese (south)" and somewhat better than "Neolithic Levant + Sudanese (south)".
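One rough way to put a number on "somewhat better" is to treat the two-source Neolithic Levant model as nested inside the three-source one and compare the drop in chisq against a chi-square distribution with 1 degree of freedom. This is only a sketch. It assumes both runs used the same outgroups and that the chisq difference behaves like a 1-df chi-square, neither of which is confirmed by the results above, and `nested_p_value` is a hypothetical helper name.

```python
import math

def nested_p_value(chisq_simple, chisq_full):
    """P-value for adding ONE extra source, assuming the drop in chisq
    follows a chi-square distribution with 1 degree of freedom."""
    drop = chisq_simple - chisq_full
    if drop <= 0:
        return 1.0  # the extra source did not improve the fit at all
    # For 1 df, P(X >= d) = erfc(sqrt(d / 2)).
    return math.erfc(math.sqrt(drop / 2.0))

# chisq values quoted in the post for the two Neolithic Levant models
p = nested_p_value(7.593, 4.975)
print(f"chisq drop = {7.593 - 4.975:.3f}, p = {p:.3f}")
```

Under those assumptions p comes out a little above 0.1, so the extra Chalcolithic Iranian source improves the fit but not decisively.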

Future analyses and data will be needed. I should point out that some current ADMIXTURE runs don't seem entirely supportive of such a model, but ADMIXTURE is not necessarily as precise as qpAdm can be.

For one, qpAdm is preferable because it lets you directly take a Natufian and then a Neolithic Levantine as a source and see which one you have a greater affinity for (mixture-wise). ADMIXTURE is a lot messier: it lets clusters form freely among various modern & pre-historic populations and could thus be more prone to clunky results. Formal statistical methods like qpAdm also seem to be "drift resistant", i.e. resistant to being skewed by recent genetic drift, and can thus pick up deeper ancestry better than ADMIXTURE to a certain degree.

But we should see what other analyses, like d-stats and tree-mix, say. I'm skeptical about the third model in particular (for the time being), despite how well it fits.

Reference List:


1. Link to the full qpAdm results.


  1. So does this mean that all previous ADMIXTURE runs that didn't pick up CHG admixture were wrong? Also, do you have the Gedmatch kit of Chalcolithic Iran? I'm curious as to what its components are.

    1. I honestly don't know, but I've asked another person to try similar models and they should be able to get started soon. We'll see if the models fit once again and if this can be backed up by something like d-stats. Some runs do notice some "CHG" type stuff (e.g. Punt's, or the K=16 for Haak et al.'s K=20 run) but yeah... Surprising stuff, and I'm not willing to accept it wholeheartedly just yet.

      And I'm sadly not aware of any Chalcolithic Iranian sample's kit number. Apologies, bro.

    2. Yeah, it should be interesting if these models do end up fitting. It should help put a timeframe on the "Eurasian" side of Horners.

      Also, I found the Gedmatch kit of the Chalcolithic Iranian sample; it's M124870.

    3. Well, the admixture in Horners is, especially for Highlanders, seemingly a bit "episodic". So if Somalis really do have some Chalcolithic Iranian-like admixture, it might only hint at later waves of admixture we hadn't noticed before, while most of the admixture is likely still as old as we used to assume. But yes, this'll be interesting if the models hold.

      And thanks a lot for the kit! :-)

  2. Yes I agree, it does seem like there were multiple admixture events. I don't know if you've seen it, but according to Kurd's new Near East K13 calculator, Mota comes out as 13% Natufian.

    Here are the results

    1 SUB_SAHARAN 79.93
    2 NATUFIAN 13.09
    4 PAPUAN 0.96
    5 KARITIANA 0.62
    6 EHG 0.31

  3. So if the best fit is "Neolithic Levant + Chalcolithic Iran + Sudanese (south)", then does this change what we previously thought about Somalis' Eurasian levels? Instead of being 60% SSA, are they actually 5% less? Or does this Levant_N sample also have SSA?

    1. Can't be sure on that count, to be honest. PCAs, tree-mix and ADMIXTURE imply these Neolithic Levantines are indeed a little "SSA", and that could be what's making Somalis seem a little extra West Eurasian, but d-stats don't support them being part "SSA". All a bit confusing right now, to be frank. We'll see with future samples and analyses.

  4. The Chalcolithic Iran admixture could account for the presence of T and non-Arabian J in Cushitic speakers. Chadic speakers also carry R1b, which could have been present at low levels amongst the earliest Levantine speakers due to Chalcolithic Iran admixture.

    1. Y-DNA T has already been found in Neolithic Levantines though, and these particular Neolithic Levantine samples don't seem to show any Chalcolithic or Neolithic Iranian-like admixture. They just seem to be a mixture between Villabruna/WHG-related ancestry and Basal Eurasian.

    2. Do you agree that the Natufians were speakers of Proto-Afroasiatic? Since the Natufians were mostly carriers of haplogroup E1b1, why are Semitic speakers overwhelmingly haplogroup J? Subsequent assimilation of peoples from the Caucasus?

    3. Y-DNA J1 so far looks like its origins lie more along the lines of the Iran~Transcaucasus~Anatolia area and SO FAR doesn't appear in Neolithic and Epipaleolithic Levantines but appears in Bronze Age ones. Your guess is as good as mine, but it was probably brought over to the region via Chalcolithic Iranian/CHG/Neolithic Iranian-related admixture and tacked onto early Semitic speakers, whose earliest linguistic predecessors were probably E-M35 carriers. But we'll see as more and more ancient DNA floods in.

      And I don't know if Natufians were speakers of Proto-Afro-Asiatic. They could've been, I suppose. But if your question means "Do I think they were THE Proto-Afro-Asiatic speakers" then no. These 6 samples are just carriers of E-Z830, which is one lineage out of dozens of E-M35 lineages found among Cushitic, Berber, Semitic and former "Egyptian" speaking Afro-Asiatic speakers. Besides, I'd have to ask my linguist compadres to chime in but, as various linguists I encounter nowadays seem to assert, Afro-Asiatic most likely came to be around the Southern Egypt~Northern Sudan area. Also, a linguistic issue is a linguistic issue. The Afro-Asiatic Urheimat will only be deciphered via linguistic study, not via population genetics, which can only lend credence to or shake up linguistic hypotheses, the way ancient DNA is helping reaffirm the already linguistically established Steppe Hypothesis in the case of Indo-European.

  5. It makes sense. The ancestry.com tests are telling us Horn of Africans that we are 55% south eastern Bantu and 45% Middle Eastern. We aren't Bantu, and they didn't have the DNA stuff for Nilotics; however, the percentages are pretty accurate.