To further assess Africa's genetic landscape with the publicly available samples, I've taken a few steps toward investigating some equations that had not yet been experimented with. However, all statistical tools used in these analyses have been made publicly available and are referenced below. Please note that this project was not sponsored in any way at the time of investigation, and as a one-man team, I have only limited resources and time. Further investigation into whether any new model presented above is accurate is highly recommended.
Methods.
Samples used in the analyses above were taken from various repositories. To best suit the coverage requirements of each analysis, three datasets were merged and processed. The first dataset was built to include as many African samples as possible. It was used solely for dating admixture and, to a lesser extent, for estimating admixture amplitudes. The second dataset was merged with the intent of maximizing the variants shared across all populations. This lets tools such as ADMIXTURE [1] give more robust results in less running time, at the cost of reducing the number of populations available for analysis. The third dataset was a subset of the second, containing handpicked individuals for running qpGraph [2]. All datasets went through quality control with PLINK [3] and, to a lesser extent, vcftools and Eigenstrat. All sex-chromosome and mitochondrial variants were removed, as were all potentially triallelic markers. The SNP count was 2.25 million for dataset 1 and 1.3 million for datasets 2 and 3.
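The merging and filtering logic described above can be sketched as follows. This is a minimal illustration, not the PLINK/vcftools pipeline actually used. It assumes hypothetical variant records of the form (chromosome, position, allele1, allele2) with PLINK-style numeric chromosome codes, and it keeps only autosomal biallelic SNPs. It then keeps the sites present in every dataset with the same pair of alleles, so any site that would become triallelic after merging is dropped. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (chromosome, position, allele1, allele2),
# with PLINK-style chromosome codes ("1".."22", "X", "Y", "MT").
Variant = Tuple[str, int, str, str]
Site = Tuple[str, int]

AUTOSOMES = {str(i) for i in range(1, 23)}


def is_biallelic_snp(a1: str, a2: str) -> bool:
    """True for single-base A/C/G/T alleles that differ from each other."""
    alleles = {a1.upper(), a2.upper()}
    return len(a1) == 1 and len(a2) == 1 and len(alleles) == 2 and alleles <= set("ACGT")


def qc_filter(variants: Sequence[Variant]) -> List[Variant]:
    """Drop sex-chromosome, mitochondrial and non-biallelic records."""
    return [v for v in variants if v[0] in AUTOSOMES and is_biallelic_snp(v[2], v[3])]


def shared_sites(datasets: Sequence[Sequence[Variant]]) -> List[Site]:
    """Sites present in every dataset with an identical allele pair.

    A site whose allele pair differs between datasets (e.g. A/C vs A/T) would be
    triallelic after merging, so it is excluded. Strand flips are not resolved.
    """
    keyed = [
        {(c, p): frozenset((a1.upper(), a2.upper())) for c, p, a1, a2 in qc_filter(ds)}
        for ds in datasets
    ]
    common = set.intersection(*(set(k) for k in keyed))
    consistent = [s for s in common if len({k[s] for k in keyed}) == 1]
    return sorted(consistent, key=lambda s: (int(s[0]), s[1]))
```

For example, take two datasets that both carry chromosome 1, position 100 as A/G (in either allele order), while one carries chromosome 2, position 200 as A/C and the other carries it as A/T. Only the first site survives the merge.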
To infer dates of admixture, I ran MALDER [4] on selected populations in the first dataset. Apart from the filters mentioned above, no variants were pruned, so that linkage disequilibrium (LD) information would not be lost in the process. The populations in question are of greater antiquity and/or more genetically diverse than usual. For that reason, the minimum genetic distance between SNP pairs included in the fit was fixed with 'min-dis: 0.005'. Donor populations were pooled so that ancient populations with similar autosomal profiles were placed in the same quadrant.
Estimates for one (left) and two (right) admixture events in selected African populations. Prepublished details about the admixing populations can be seen here, on sheet two.
Visual for a single admixture event
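MALDER, like ALDER, dates admixture by fitting an exponential decay to a weighted LD curve. The curve is computed over pairs of SNPs as a function of their genetic distance, ignoring pairs closer than a minimum distance. The fitting step alone can be sketched as below. This is a simplified illustration, not MALDER's implementation:

- the weighted LD curve is assumed to be precomputed;
- distances are assumed to be in Morgans, so the decay rate reads directly as generations since admixture;
- the fit is a plain grid search with least squares;
- the 29-year generation time in the hypothetical `generations_to_years` helper is an assumed value.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_ld_decay(
    distances: Sequence[float],
    weighted_ld: Sequence[float],
    min_dis: float = 0.005,
    n_grid: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float]:
    """Fit weighted_ld ~ amp * exp(-n * d) + const using only d >= min_dis.

    Distances are assumed to be in Morgans, so n is read as generations since
    admixture. For each candidate n on a grid, amp and const come from ordinary
    least squares; the n with the smallest squared error is returned.
    """
    pts = [(d, y) for d, y in zip(distances, weighted_ld) if d >= min_dis]
    if len(pts) < 3:
        raise ValueError("need at least three points at or beyond min_dis")
    if n_grid is None:
        n_grid = [1.0 + 0.5 * k for k in range(599)]  # 1 to 300 generations
    ys = [y for _, y in pts]
    my = sum(ys) / len(ys)
    best = None
    for n in n_grid:
        xs = [math.exp(-n * d) for d, _ in pts]
        mx = sum(xs) / len(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0.0:
            continue
        amp = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        const = my - amp * mx
        sse = sum((amp * x + const - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, n, amp, const)
    if best is None:
        raise ValueError("no usable grid point")
    return best[1], best[2], best[3]


def generations_to_years(n: float, years_per_generation: float = 29.0) -> float:
    """Convert generations to years under an assumed generation time."""
    return n * years_per_generation
```

MALDER itself also handles multiple reference populations, and it can fit a sum of exponentials for two admixture events. The single-exponential fit above corresponds only to the one-event case.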
The admixture graph was generated with qpGraph [5], following the same principles as Lipson et al. (2017) [6]. Initially I created a base model including Chimp, Altai, Faraoskop, Mota, Shum_Laka, Ust'-Ishim, Russian and Sunghir. After constructing the base graph, I wrote a custom script to permute admixture placements, similar to what was done in Lazaridis et al. (2018) [7]. The script tried each pair of the nodes previously designated as candidate sources of admixture into the test population. Models producing any outlier Z-score with an absolute value above 3.00 were not considered viable phylogenies, regardless of whether they had been generally accepted in previous studies. Models with an outlier Z-score above 5.00 in absolute value were not even printed. If no fit achieved an acceptable score when building from the base model, alternative admixture events for populations within the base model were explored. The graph presented in the initial post was just one of many successful graphs, selected for parsimony and legibility.
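The screening rule described above can be sketched as follows: models with |Z| > 3 are rejected, and models with |Z| > 5 are not printed. The callback that actually builds and fits a graph is hypothetical. This sketch does not reproduce the author's script or qpGraph's interface. It only shows how candidate source pairs are enumerated and binned, with an empty viable list corresponding to the case where alternative admixture events in the base model would be explored.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def screen_models(
    candidate_sources: Sequence[str],
    worst_z_for: Callable[[Pair], float],
    viable_z: float = 3.0,
    print_z: float = 5.0,
) -> Tuple[List[Tuple[Pair, float]], List[Tuple[Pair, float]]]:
    """Try every pair of candidate source nodes and bin the fitted models.

    worst_z_for is a hypothetical callback: it should build the graph with the
    test population admixed from the given pair, fit it (e.g. with qpGraph) and
    return the worst outlier Z-score. Models with |Z| > print_z are dropped
    silently; models with |Z| <= viable_z are kept as viable.
    """
    viable: List[Tuple[Pair, float]] = []
    reported: List[Tuple[Pair, float]] = []
    for pair in combinations(candidate_sources, 2):
        worst = abs(worst_z_for(pair))
        if worst > print_z:
            continue  # not even printed
        reported.append((pair, worst))
        if worst <= viable_z:
            viable.append((pair, worst))
    viable.sort(key=lambda item: item[1])  # best-fitting first
    return viable, reported
```
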
ANA admixture proportions were determined using a series of formal statistics, computed with qpDstat from the ADMIXTOOLS package [2]. The calculations assumed the phylogeny shown in the graph below, where:
(O) is AWS, as indicated in a previous post, here represented by Mbuti; (C) is ASA, represented by the Ballitobay samples; (A) is ANA1, the population closest to ANA's progenitors, represented by the non-"basal" ancestry in Mota; (B) is ANA2, represented by the non-Eurasian portion of Taforalt; (X) is West Eurasian, represented by Sunghir IV; and (Y) is an early East Eurasian, represented by the earliest Hoabinhian sample. Mbuti was chosen as the AWS outgroup because previous evidence suggests they harbor deep ancestry that is not derived from ASA or AEA (dubbed Central African hunter-gatherers in Lipson et al. 2020 [8]). We are looking for ancestry that distinctly does not reflect Eurasian back-migration. For that reason, using the Hoabinhian, a sample with potential "Basal Eurasian–like" ancestry (as shown in my earlier graph), felt appropriate. With that said, calculating the proportion of ANA ancestry in selected samples should be fairly simple, using:
Here α is ANA-related ancestry, δ is drift shared with Eurasians, and γ is various "basal" ancestry. When the test population has no ancestry forming a clade with the populations carrying deep ancestry (represented by C and O in the figure), γ is ≤ 1. Because ANA ancestry had not yet been parsed or characterized in detail, two key assumptions are needed for this method to apply. The first is that Mota has no Eurasian ancestry. The second is that Taforalt has no ancestry more basal than ANA1. Under these assumptions, any population with γ ≤ 1 would be a mixture of pure ANA and some Eurasian source.
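The standard f4-ratio estimator from Patterson et al. (2012) [2] is a common way to turn formal statistics like these into admixture proportions. It can be sketched from allele frequencies as follows. This is an illustration under stated assumptions, not the exact equation used above. The argument names are hypothetical, and no particular mapping of Mota, Taforalt, Mbuti, Ballitobay or the Eurasian samples onto them is implied. In Patterson's notation, the test population X is modeled as a mixture of B (proportion α) and C, with A sharing drift with B and O as the outgroup.

```python
from typing import Sequence


def f4(p1: Sequence[float], p2: Sequence[float], p3: Sequence[float], p4: Sequence[float]) -> float:
    """f4(P1, P2; P3, P4): mean over SNPs of (p1 - p2) * (p3 - p4).

    Inputs are per-SNP allele frequencies, aligned SNP by SNP.
    """
    terms = [(a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)]
    return sum(terms) / len(terms)


def f4_ratio(sister, outgroup, test, source_b, source_c) -> float:
    """alpha = f4(A, O; X, C) / f4(A, O; B, C) in Patterson et al. (2012) notation.

    A = sister (shares drift with source_b), O = outgroup,
    X = test population modeled as alpha * B + (1 - alpha) * C.
    """
    return f4(sister, outgroup, test, source_c) / f4(sister, outgroup, source_b, source_c)
```

Because f4 is linear in each argument, a test population built exactly as α·B + (1 − α)·C returns α. Real data will deviate from this to the extent that the assumed topology is wrong.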
For populations that do harbor additional deep ancestry relative to ANA1, we can first define the quadrants we want to investigate, with deep ancestry defined by both Ballitobay (C) and Mbuti (O), giving us:
Here γa is Mbuti-related and γb is Ballitobay-related ancestry. We can then solve with:
Note that where γx ≤ 1, multiplying by 1⁄γx would undesirably inflate the estimates. Therefore, for all potential test populations, it suffices to calculate '1 − [(|1 − 1⁄γx| + 1 − 1⁄γx)/2]' and use it in place of the 1⁄γx factor. To get 1⁄α, the inverse of Mota's ANA proportion, we can use the ratio of Taforalt's shared drift with Mota's ANA1 ancestry alone:
Plugging in ANA1 and ANA2 ancestry is also straightforward. Here we look to deduct two portions: Mota's deep ancestry (γm), and Taforalt's non-basal ancestry (β), said to be most similar to Villabruna [7]. However, our outgroups are the best samples discovered so far for their respective quadrants of ancient African ancestry. We therefore have to look for a sample that falls between Mota, Mbuti, and Ballitobay. Using qpAdm, it was shown that a sample from roughly 4,000 years ago at Kakapel, Kenya could be modeled as ancient East African/Mota plus Mbuti. It is not clear whether Kakapel is a mixture of a Mota-like ancestor and Mbuti. Even so, we can still use his genotype to solve for γm with help from Taforalt's non-Eurasian ancestry (ANA2):
Doing the above, I got γm ≈ 0.42, meaning 42% of Mota's ancestry is deep ancestry not shared with Taforalt. I have found Mota to have roughly 21% AWS ancestry, similar to the figure for ghost human/Central African hunter-gatherer ancestry in Lipson et al. (2020) [8]. On that basis, I suggest that Mota's Ballitobay-related ancestry is also around 21%. In the previous analysis I used γm1 = 0.42 for Mbuti-related ancestry and γm2 = 0.22 for Ballitobay-related ancestry in Mota to parse ANA ancestry in selected samples. Raw data for alternative equations using different values of Mota's Ballitobay-related ancestry can be viewed here. Following the above example of deducting non-investigated ancestry from Taforalt, we can do the same for Mota to get ANA1.
and:
finally resulting in:
For more on deducting unrelated ancestry, see the method used in Lazaridis et al. (2014) [9] to identify Near Eastern ancestry in various West Eurasians.
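The capping expression used for γx in the ANA section, 1 − [(|1 − 1⁄γx| + 1 − 1⁄γx)/2], is easier to read once written out. Since (|u| + u)/2 equals max(u, 0), the expression reduces to min(1, 1⁄γx): it is 1 whenever γx ≤ 1 and 1⁄γx otherwise, so the factor can never exceed 1. The check below uses Python and assumes γx > 0; the function name is hypothetical.

```python
def capped_inverse(gamma_x: float) -> float:
    """Evaluate 1 - [(|1 - 1/gamma_x| + 1 - 1/gamma_x) / 2] for gamma_x > 0.

    (|u| + u) / 2 equals max(u, 0), so the result is min(1, 1/gamma_x):
    1 when gamma_x <= 1, and 1/gamma_x otherwise.
    """
    if gamma_x <= 0:
        raise ValueError("gamma_x must be positive")
    u = 1.0 - 1.0 / gamma_x
    return 1.0 - (abs(u) + u) / 2.0
```
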
Limitations & Future Experiments.
As I stated initially, representative samples for accurate analysis are limited. For instance, of the three-way split among anatomically modern humans, or more accurately early Africans, we really have only a single clear representative group: the South African hunter-gatherers, characterized by sequenced samples from Ballitobay and the Faraoskop Rock Shelter in South Africa. There has been little resolution on how ancient East Africans are related to, or diverged from, South African hunter-gatherers. Since Mota was sequenced, he has been used as a putative marker for ancient East African-related ancestry. Even so, it has not been proven with certainty that Mota himself was not an outlier for such ancestry. To elaborate, the Kakapel sample mentioned earlier (~3.9 kya) was described as a mixture of Mota and Mbuti, though he could very well have descended from a progenitor of East African ancestry. Similarly, his Mbuti-related ancestry could represent any of the early African branches suggested here and in Lipson et al. (2020) [8], such as AWS, ghost human, or Central African hunter-gatherers. It would be interesting to find out for certain whether East Africans formed their own branch of anatomically modern humans, were South African-admixed Ancient West Sudanic people, or vice versa.
Another concern is the set of assumptions made about ANA ancestry and how ANA relates to deep African ancestry. The method I used earlier consistently assumed that ANA is purely downstream of Mota, but this might not be the case in reality. In some cases, samples like Taforalt seem to show an affinity toward ancient South Africans. For instance, D(Taforalt, EE; WE, ASA) gives a more negative Z-score than D(Taforalt, EE; WE, Mbuti), even though Mbuti show evidence of North African-related ancestry (EE and WE being East and West Eurasian, respectively). We do not know for sure the extent to which unknown archaic admixture in AWS/Mbuti differentiates them from other populations. Likewise, we do not fully know how exclusive the early ANA components were to Northern Africa. An obvious limitation in parsing ANA ancestry is the inability to estimate ANA ancestry in the Mbuti and Ballitobay samples, because the assumptions used make them outgroups.
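The comparison above rests on D-statistics and their Z-scores. As a rough sketch of what qpDstat reports, D(W, X; Y, Z) can be computed from allele frequencies, with its Z-score estimated by a block jackknife. This is a simplified illustration, not ADMIXTOOLS' implementation. Here the blocks are equal-sized runs of SNPs rather than blocks defined by genetic distance, the jackknife is unweighted, and the frequency inputs are assumed. Take W as Taforalt, X as an East Eurasian, Y as a West Eurasian and Z as ASA or Mbuti. A negative D then points to extra allele sharing between Taforalt and Z (or between the East and West Eurasians).

```python
import math
from typing import Sequence, Tuple


def d_statistic(
    w: Sequence[float], x: Sequence[float], y: Sequence[float], z: Sequence[float],
    n_blocks: int = 20,
) -> Tuple[float, float]:
    """D(W, X; Y, Z) from allele frequencies, with a block-jackknife Z-score.

    D = sum (w - x)(y - z) / sum (w + x - 2wx)(y + z - 2yz). SNPs are assumed to be
    in genomic order; the jackknife drops equal-sized contiguous SNP blocks
    (unweighted), a simplification of ADMIXTOOLS' weighted block jackknife.
    """
    if n_blocks < 2:
        raise ValueError("need at least two jackknife blocks")
    num = [(a - b) * (c - d) for a, b, c, d in zip(w, x, y, z)]
    den = [(a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in zip(w, x, y, z)]
    total_num, total_den = sum(num), sum(den)
    d_all = total_num / total_den
    size = math.ceil(len(num) / n_blocks)
    leave_one_out = []
    for start in range(0, len(num), size):
        block_num = sum(num[start:start + size])
        block_den = sum(den[start:start + size])
        leave_one_out.append((total_num - block_num) / (total_den - block_den))
    g = len(leave_one_out)
    mean = sum(leave_one_out) / g
    se = math.sqrt((g - 1) / g * sum((v - mean) ** 2 for v in leave_one_out))
    z_score = d_all / se if se > 0 else float("nan")
    return d_all, z_score
```
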
With more time, I hope to get better resolution and test more populations for various forms of African ancestry. Hopefully there will be a way to visualize Ancient West Sudanic or "ghost human" ancestry using the available samples. We could also get a much better look at the genetic landscape with more samples from Africa, particularly from populations with Central Saharan or Sahelian ancestry. Populations from Chad, Niger, southern Libya, and Mali would be a great start for tracking down overlooked ancient components related to any quadrant of African ancestry. Substructure within West African-related populations could also be better understood, as we saw recently with Sahelian populations [10]. I also hope to disentangle substructure within Bantu-speaking populations. There is evidence of multiple waves of population movement involving Bantu speakers. It will be interesting to see whether there were minute differences among Bantu speakers by region or by Bantu daughter language.
References
1. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
2. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
3. Chang, C. C. et al. Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
4. Pickrell, J. K. et al. Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl. Acad. Sci. U. S. A. 111, 2632–2637 (2014).
5. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
6. Lipson, M., Reich, D. & Townsend, J. P. A working model of the deep relationships of diverse modern human genetic lineages outside of Africa. Mol. Biol. Evol. 34, 889–902 (2017).
7. Lazaridis, I. et al. Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry. bioRxiv (2018).
8. Lipson, M. et al. Ancient West African foragers in the context of African population history. Nature 1–6 (2020) doi:10.1038/s41586-020-1929-1.
9. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. bioRxiv preprint, pre-revision Fig. S3, chart 14. doi:10.1101/001552.
10. Babiker, H., Heath, J., Reed, F., Schiffels, S. & Gray, R. D. Striking Genetic Diversity Among Populations of West Africa Uncovers the Mystery of a Language Isolate. SSRN Electron. J. (2020) doi:10.2139/ssrn.3631471.