Friday, August 14, 2015

Somali mtDNA frequencies

As I've said before, I don't touch upon uniparental data (Y-DNA & mtDNA) much, but I still thought sharing this data would be rather interesting, especially considering that I've gathered the mtDNA markers of about 75 ethnic Somalis on 23andMe alongside the data from Mikkelsen et al., which has a sample size of 190 Somalis.
And what's ultimately intriguing, and the reason I've taken the time to quickly share these frequencies, is how closely the study and the 23andMe set correlate.

I've colored things based on the larger mtDNA lineage each subclade belongs to: all L2 lineages are colored blue, M lineages crimson, and so on.
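For anyone who wants to reproduce this kind of grouping from raw subclade labels, here is a minimal sketch. It is not how the chart was actually made. It assumes the standard mtDNA tree placement (R, and with it HV, H, T, J, U, K and B, plus I, W and X all sit under N), and the helper names `macro_lineage` and `frequencies` are hypothetical.

```python
from collections import Counter

# Assumption: standard mtDNA tree placement, where R (and hence HV, H, T, J,
# U, K, B) as well as I, W and X all descend from N.
UNDER_N = ("N", "R", "HV", "H", "T", "J", "U", "K", "I", "W", "X", "B")


def macro_lineage(label: str) -> str:
    """Map a subclade label such as 'L2a1h' or 'R0a1a' to its colour group."""
    if len(label) > 1 and label[0] == "L" and label[1].isdigit():
        return label[:2]  # e.g. 'L2a1h' -> 'L2'
    if label.startswith("M"):
        return "M"
    if label.startswith(UNDER_N):
        return "N"
    return "other"  # e.g. M-derived clades such as D or C are not mapped here


def frequencies(labels):
    """Percentage of samples falling in each colour group."""
    counts = Counter(macro_lineage(label) for label in labels)
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}
```

As a usage example, the five additional lineages listed in the comments (L4a1a, R0a1a, M1a*, L0a'b'f*, N1e'I*) come out as 20% L4, 40% N, 20% M and 20% L0.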

The only meaningful difference is that in Mikkelsen et al. the non-L lineages (M & N) are slightly more common, at roughly 37-39%, whilst this goes down to 31-32% in the 23andMe set (granted, that sample size is smaller, I suppose).
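One rough way to judge whether that gap is more than sample-size noise is a two-proportion z-test. This is a sketch under stated assumptions: the counts 72 of 190 and 24 of 75 are illustrative values picked to sit inside the ranges above (the post does not give exact counts), and the test treats each set as a simple random sample of Somalis, which neither strictly is.

```python
from math import erf, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF via erf; two-sided tail probability.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Illustrative (assumed) non-L counts: Mikkelsen et al. vs. the 23andMe set.
z, p = two_proportion_z(72, 190, 24, 75)
```

With these assumed counts z comes out at about 0.9, well short of the usual 1.96 cut-off, which fits the hunch that the smaller 23andMe sample could account for the difference.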

Otherwise, the correlation is quite neat, I must say: L3 is the most prevalent lineage outside M & N, L2a is very common in both sets, and there is a general overlap in markers (the same markers appearing again).

At any rate, I figured this was interesting data and worth gathering in one place like this blog post for some to view. I'm not going to go into much detail here, such as the history of this or that haplogroup. Sharing these frequencies should be enough because, as I said, uniparentals are not my forte; some blunders on my part, pointed out by the helpful Zam in the comment section and which led to a remodelling of this post, are testament to that...

This is essentially the mtDNA data of about 265 Somalis, most of whom are unrelated [check notes] and in various cases from differing regions across Greater Somalia, drawn from two different sources. The two sources correlate, which in my humble opinion speaks rather strongly to how "representative" they are, so enjoy this interesting data.


Reference List:




Notes:

1. The methodology behind how the 23andMe samples were gathered is described here.

13 comments:

  1. I left a response on your thread at ABF: forumbiodiversity.com/showthread.php/44480-Sheikh-Awale-s-results/page11

    There is a small mistake you made. The two persons with L2c2 have an African-American mother while their father is a Gharri Somali. Thus, their maternal lineage did NOT come from Somalia and should not be included in your stats.

    I have a couple more Somalis that you probably did not include:

    L4a1a
    R0a1a
    M1a*
    L0a'b'f*
    N1e'I*

    IMG: http://i.imgur.com/eM2qVLD.png

    1. Ahh, I found the L2c odd... It's more common in West Africa, and I've only ever seen it in one Sudanese Arab I've encountered. I'll rework the chart immediately, thanks. I excluded a handful of others because their mtDNA markers looked off.

      But I told you to send me a visitor message on ABF, man. Elias takes too long to let new messages through sometimes. I think I kept going back for like 2 days to see your comment and it still wasn't let through.

  2. Lastly, I want to remind you that Boattini et al. is based on a meta-analysis of Soares et al. 2012, which sampled refugees in Yemen ''from Somalia''. In the study itself (Soares), they even hint at having included Bantu samples in their Somalian set. It says nothing about their ethnic group or clan whatsoever. Considering that Yemen has many refugees from southern Somalia, including Bantus, Bajuni, and Bravanese (a Swahili group) who fled to the country, one should take that result with a grain of salt.

    For example, in Boattini / Soares et al. the following lineages were all detected: T2e, T2c, M5, M51a1, M76, M7b1, L2a1b, L3e, L3f1b4a, L1c, L0d3, B4a1a1a.

    NONE of the above lineages exist in any other Cushitic populations, nor are they found in any of the ethnic Somalis on 23andMe. This only confirms that they have included non-Somalis in their set.

    I find Mikkelsen's study far more representative of Somali maternal DNA. Denmark, like most of Northern Europe, has mostly ETHNIC Somali refugees (mainly from the Darood & Hawiye clans). Therefore, the Mikkelsen et al. study is pretty much the best we have so far.

    1. Well, I'll just keep this comment here as a warning that Boattini shouldn't be taken very seriously, then. I only trusted it because I couldn't see the actual subclades and lacked access to the paper, but a Somali I was in contact with who'd seen it a long time ago (he was trustworthy to me) led me to believe that everyone tested was in fact an "ethnic Somali".

    2. I just opted to get rid of Boattini et al. entirely, as based on what you've shared it would just be misleading to keep it... You see, man... This is what happens when one leaves their comfort zone (mine being autosomal DNA-based data). Anyway, the original text has been "archived" here:

      https://docs.google.com/document/d/1h1cBdtx6XIxIyTLwadmo0St_Cp1m0z_kv_X39OuxQ6M/edit?usp=sharing


      But I'll keep this post strictly to Mikkelsen et al. and the 23andMe Somalis for now, just so as not to mislead anyone who comes across this post.

      Thanks a lot for the heads up.

    3. There is sometimes an overlap between Central Bantu African lineages and some Nilo-Saharan maternal lineages. The latter group is found in Northeast Africans due to the ancient connection between Nilo-Saharans and Cushites. This is akin to how E1b can be either: E1b1b-M35 is clearly East African, while E1b1a-M2 not so much.

      If one checks them carefully, the Bantu-originated lineages can almost always be differentiated from the older Nilo-Saharan lineages in the Horn.

      Some examples (in what follows, by "native" I mean present in the region before ~5000 BC):

      L3d1a1a with the back-mutation C150T is an undeniably Bantu lineage, found even in Mozambique, Cameroon, Oman and the Makran coast of Pakistan. Clearly not native.

      L3d1a2 and L3d1a1b (with mutation 146C), on the other hand, appear to be proto-Nilo-Saharan lineages found mainly in Chad, Sudan, and in Cushitic populations. They diverged from the above group around 18,000 years ago. This type is native & ancient to Somalia.

      L3b: only two sub-types are possibly old in the Horn, namely L3b1b and L3b1a1a. They are found in North Africans and Nilo-Saharan/Sahel groups but not in Southeast Africans. Other variants are likely not old and are just recent additions. Rare overall.

      There is an L1b lineage in the Horn that might seem Bantu at face value because of L1 in general, but it is actually not. This specific derived variant, L1b1a2a with mutation 16289G, exists in Cushitic & Ethio-Semitic groups and entered the region 11 kya (pre-Bantu). IMG: http://i.imgur.com/u8aSDJH.png . It is found in Egypt & Ethiopia. I believe this one is now native, but all other L1 lineages are likely not (especially L1c). Rare overall.

      For L3f, almost all are native to the Horn, especially L3f2, L3f3, L3f1a, various kinds of L3f1b* and of course the L3f* kinds. The exceptions are L3f1b1 (with C16292T), L3f1b3-4 (C150T) and L3f1b4 (A3505G), which appear exogenous. In general, most L3f is native, except for a few specific ones.

      As for L2a, this one is tough to analyze due to its age (27 kya) and widespread nature throughout Africa. The kinds native to the Horn are mainly L2a1h, L2a1d, pre-L2a1h/d variants, L2a1r (6743C & 15924G), certain L2a1c (sub-type L2a1c3a (7858T) and pre-L2a1c5 [16129A!]), L2a1j (8764A, 14464G), and some types of L2a1* (especially the kind carrying the 11016A mutation). Other L2a sub-lineages like L2a1a3 and L2a1b are not native and clearly exogenous.

      L2b: only one derived sub-line, L2b3c with mutation 12011C, is autochthonous to the Horn. Ethioboy of ABF (Amharic mother) has this, for example. Other L2b variants are not.

      For L0, I think the following are native to the Horn (most): L0f, L0b, L0a1d, L0a1a1, L0a1c, some L0a2c variants, L0a2d, L0a3, L0a4.
      The following are exogenous: L0a2a, L0a1b1, and L0a2b.
      I am undecided on L0d3, but lean towards it not being native to the Horn because of its connection with the Sandawe.

      Most other L lineages are much easier to analyze:

      L3a – all kinds native.
      L3i – all kinds native.
      L3x – all kinds native.
      L3e – all kinds exogenous.
      L1c - all kinds exogenous.
      L2c, L2e - all kinds exogenous.
      L3c, L3j – all kinds native.
      L3h – all kinds native.
      L4 – all kinds native.
      L5 – all kinds native (but very rare).
      L6 – all kinds native.
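The "all kinds" rules above, together with the L0 list, are essentially a longest-prefix lookup on the haplogroup label. Here is a minimal sketch of that idea, which is my own illustration rather than how Zam actually sorted the samples. The names `RULES` and `classify` are hypothetical, and the mutation-specific rules (L3b, L3f, L2a, L2b, L1b) are deliberately left out because they depend on individual mutations, not just the label.

```python
# Rules transcribed from the L0 list and the "all kinds" list above.
# Mutation-dependent rules (L3b, L3f, L2a, L2b, L1b) are NOT encoded here.
RULES = {
    "L0f": "native", "L0b": "native", "L0a1d": "native", "L0a1a1": "native",
    "L0a1c": "native", "L0a2d": "native", "L0a3": "native", "L0a4": "native",
    "L0a2a": "exogenous", "L0a1b1": "exogenous", "L0a2b": "exogenous",
    "L0d3": "undecided",
    "L3a": "native", "L3i": "native", "L3x": "native", "L3e": "exogenous",
    "L1c": "exogenous", "L2c": "exogenous", "L2e": "exogenous",
    "L3c": "native", "L3j": "native", "L3h": "native",
    "L4": "native", "L5": "native", "L6": "native",
}


def _on_boundary(label, prefix):
    """True if `prefix` ends exactly where a new level of `label` begins."""
    if not label.startswith(prefix):
        return False
    if len(label) == len(prefix):
        return True
    last, nxt = prefix[-1], label[len(prefix)]
    # Labels alternate letters and digits (L0a1b1), so a valid split point is
    # one where the character type changes; this stops 'L0a1' matching 'L0a12'.
    return nxt.isalnum() and last.isdigit() != nxt.isdigit()


def classify(label):
    """Return 'native', 'exogenous', 'undecided' or 'unclassified'."""
    label = label.rstrip("*")
    matches = [p for p in RULES if _on_boundary(label, p)]
    if not matches:
        return "unclassified"
    return RULES[max(matches, key=len)]  # the most specific rule wins
```

For instance, `classify("L2c2")` gives "exogenous" (matching the L2c case raised in the first comment), while an ancestral label like L0a'b'f* stays "unclassified" because none of the rules covers it.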

    4. With the above knowledge in mind, I analyzed the Boattini et al. / Soares et al. dataset and removed most of the disqualified non-Somali lineages (30 in total, or about 20%, which roughly coincides with the share of non-Somalis in Somalia). Considering that those Bantu & Persian-Indian lineages are totally irrelevant to our ancestry as ethnic Somalis (most of us have 0% Bantu & Persian-Indian admixture), I think what I did is fairly reasonable.

      https://docs.google.com/spreadsheets/d/1FfZHMsFttefSZxzKXgP_KvC12MrIOCqShjBYlCFyWio/edit#gid=824725636

      I get the following frequencies:

      L0 9.32%
      L1 0.85%
      L2 14.41%
      L3 25.42%
      L4 5.93%
      L5 0.85%
      L6 2.54%
      M 11.02%
      N 29.66%
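As a quick consistency check on these figures (my own arithmetic, not part of Zam's comment): the nine percentages sum to 100, and 118 is the smallest whole-number total whose counts reproduce every one of them after rounding. If that inferred total is right, the 30 removed lineages would be about 20% of a pre-filter set of 148, which matches the "about 20%" above. The total of 118 is an assumption recovered from the rounding; Zam does not state it.

```python
# Zam's reported percentages (filtered Boattini / Soares set).
REPORTED = {"L0": 9.32, "L1": 0.85, "L2": 14.41, "L3": 25.42, "L4": 5.93,
            "L5": 0.85, "L6": 2.54, "M": 11.02, "N": 29.66}


def implied_counts(percentages, n):
    """Whole-number counts out of n, plus whether they reproduce every figure."""
    counts = {k: round(v * n / 100) for k, v in percentages.items()}
    consistent = all(round(100 * c / n, 2) == percentages[k]
                     for k, c in counts.items())
    return counts, consistent


# Assumption: n = 118 is inferred from the rounding, not stated in the comment.
counts, consistent = implied_counts(REPORTED, 118)
kept = sum(counts.values())
removed = 30                                 # from Zam's comment
share_removed = removed / (kept + removed)   # fraction of the pre-filter set
```

Here `consistent` is True, `kept` is 118 and `share_removed` is roughly 0.20, whereas a total of 117 already fails to reproduce the L0 figure.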

      Seems fairly normal; only the HV1b frequency seems off in that study (I suspect they included people who are cousins, so it repeated too much).
      In any case, I hope this information has been useful to you.

      Kind Regards,
      Zam

    5. "I hope this information has been useful to you."

      Most certainly, my friend. Thanks a lot for helping out here and pointing out the issues with Boattini et al. and all of the other information you've shared is quite welcome too.

      "I get the following frequencies:

      L0 9.32%
      L1 0.85%
      L2 14.41%
      L3 25.42%
      L4 5.93%
      L5 0.85%
      L6 2.54%
      M 11.02%
      N 29.66%"

      I'll make a chart for these frequencies and share it here in the comment section soon. Thanks. :)

    6. Chart based on those percentages you ascertained:

      https://docs.google.com/spreadsheets/d/1ccpHA4xD2vgNQrKftOQqbNBqYQu2O1q00mCzMXnx52Y/pubchart?oid=1339914478&format=interactive

    7. mtDNA L2a is over 100 ky old because of mtDNA L2a5, which originated in the East to Southeast African region almost 100 kya, with a migration time of 95 kya to 45 kya.

  3. I got R0a for my mtDNA. Where would you say this came from, and around when? I'm a northern Somali (Isaaq).

  4. Hi Awale, 23andMe assigned me to the L0a'b'f mtDNA haplogroup, as it is probably the most precise haplogroup 23andMe can give me. From what a friend told me, L0a'b'f is one of the oldest lineages found in modern humans. In fact, it is so old that it doesn't really give you any genealogical information. But I seem to belong to a lineage between L0a'b'f and L0f (L0f descends from L0a'b'f), or pre-L0f if you will. I lack two of the mutations that define L0f (13145G and 16327C), meaning my lineage split off sometime before the other lineages defined as L0f split up. I haven't met anyone else with this mtDNA so far. I am from northeast Somalia.
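The reasoning in this last comment (carrying some, but not all, of a clade's defining mutations places a sample on the stem before that clade's shared ancestor) can be sketched as a simple set comparison. This is a hypothetical illustration: `stem_position` is not from any tool, only 13145G and 16327C are named in the comment, so the third marker below is an explicit placeholder, and real placements also have to account for back-mutations and recurrent mutations, which this ignores.

```python
def stem_position(sample, defining):
    """Place a sample relative to a clade, given the clade's defining mutations.

    sample   -- set of mutations the sample carries
    defining -- set of mutations that define the clade
    Ignores back-mutations and recurrent mutations.
    """
    carried = defining & sample
    if carried == defining:
        return "within"   # carries every defining mutation
    if carried:
        return "stem"     # split off before the clade's last common ancestor
    return "outside"      # none of the clade's defining mutations


# "OTHER_L0F_MARKER" is a placeholder, NOT a real position; only the first two
# markers are named in the comment above.
L0F_DEFINING = {"13145G", "16327C", "OTHER_L0F_MARKER"}
```

A sample carrying the placeholder marker but neither 13145G nor 16327C would come back as "stem", i.e. pre-L0f in the commenter's terms.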
