My Account Login | Contact Us
Tracking Five Millennia of Horse Management with Extensive Ancient Genome Time Series

Authors Antoine Fages, Kristian Hanghøj, Naveed Khan, Alan K. Outram, Pablo Librado, Ludovic Orlando

In Brief

Genome-wide data from 278 ancient equids provide insights into how ancient equestrian civilizations managed, exchanged, and bred horses and indicate vast loss of genetic diversity as well as the existence of two extinct lineages of horses that failed to contribute to modern domestic animals.


  • Two now-extinct horse lineages lived in Iberia and Siberia some 5,000 years ago
  • Iberian and Siberian horses contributed limited ancestry to modern domesticates
  • Modern breeding practices were accompanied by a significant drop in genetic diversity

Fages et al., 2019, Cell 177, 1–17 May 30, 2019 ª 2019 The Author(s). Published by Elsevier Inc.



Horse domestication revolutionized warfare and accelerated travel, trade, and the geographic expansion of languages. Here, we present the largest DNA time series for a non-human organism to date, including genome-scale data from 149 ancient animals and 129 ancient genomes (R1-fold coverage), 87 of which are new. This extensive dataset allows us to assess the modern legacy of past equestrian civilizations. We find that two extinct horse lineages existed during early domestication, one at the far western (Iberia) and the other at the far eastern range (Siberia) of Eurasia. None of these contributed significantly to modern diversity. We show that the influence of Persian-related horse lineages increased following the Islamic conquests in Europe and Asia. Multiple alleles associated with elite-racing, including at the MSTN ‘‘speed gene,’’ only rose in popularity within the last millennium. Finally, the development of modern breeding impacted genetic diversity more dramatically than the previous millennia of human management.

Part 3 of 4

Gait, Speed, and Selection

We next aimed to identify possible differences in the traits selected prior to and after the C7th–C9th transition. Only one subset of horses provided sufficient data for calculating the Population Branch Statistics (PBS) (Yi et al., 2010) considering at least 10 individuals above 1-fold depth-of-coverage per archaeological site (Tables S6 and S7; STAR Methods). It consisted of 11 Bronze Age Deer Stone horses (representing the pre-C7th–C9th Asian group), 11 Gallo-Roman horses (pre-C7th–C9th European horses), and 17 Byzantine horses (post-C7th–C9th). Enrichment analyses of the genes overlapping the top 1,000 50 kb windows revealed that functional categories related to cervical and thoracic vertebrae were over-represented in Byzantine horses (adjusted p values%0.05) (Figure 4A; STAR Methods; Figure S4). Eleven genes within the HOXB/C clusters, instrumental for the development of the main body plan and the skeletal system (Pearson et al., 2005), featured among the windows showing the strongest PBS values (Figure 4A). These findings were robust to the number of outlier windows considered and the significance threshold retained was conservative relative to neutral expectations (STAR Methods). Therefore, our results provide evidence for selection toward changes in the skeletal morphoanatomy of the post-C7th–C9th horses related to Sassanid Persians.

Figure 4
Figure 4. Selection Targets through Time (A) Population branch statistics (PBS) along the genome of 17 Byzantine horses, compared to 11 Gallo-Roman and 11 Deer Stone horses. The underlying tree topology consists of three groups with sufficient data and representing pre-C7th–C9th horses in Asia and Europe and post-C7th–C9th horses descending from Sassanid Persians. We used non-overlapping 50 kb genomic bins, and genes underlying enrichment for functional categories related to vertebral changes are indicated. These include Sf3b1 and seven HOXB/C genes. Hoxc11, Hoxb7, Hoxb13, and Hoxc12 are not annotated as related to vertebral modifications, but embedded within the two independent clusters of HOXB/C genes. The MSTN speed gene, one selection candidate in Byzantine horses, is also highlighted. See also Figure S4 and Tables S6 and S7 for further information. (B) Temporal allele trajectories for six SNPs associated with racing performance and locomotion patterns. (C) Variance in allele frequency over time for the 57 SNPs investigated, categorized according to their impact on racing performance, body conformation, diseases and coat-color variations. The red dashed line delimits the 95th percentile of the variance distribution obtained from all SNP positions segregating genome-wide. See also Figures S5 and S6 for the full list of the SNPs investigated.

We further explored temporal shifts in the traits that are commonly selected by modern breeders. We retraced allelic trajectories at key genomic locations associated with or causal for locomotion, body size, and coat-coloration phenotypes. We also tracked known variants underlying genetic disorders through time (Figure S5; STAR Methods). Allele frequencies were calculated every 1,000 years (step size = 250 years) and restricted to the lineage leading to modern domesticates (DOM2) (Figures 4B and 4C). Mutations causing genetic disorders were extremely rare, including the GYS1 H allele responsible for a severe myopathy in Quarter horses and other heavy and saddle horse breeds. This allele was almost absent across all archaeological sites and, thus, not particularly advantageous for past breeders despite the increased glycogen storage muscular capacity conferred in starch-poor diets (McCue et al., 2008). Spotted and dilution alleles also remained at low frequencies, in contrast to the MC1R chestnut coat-coloration allele, which was relatively common, except at the end of the Middle Ages (Figures 4B and S6). The DMRT3 allele that causes ambling and improves speed capacity in Icelandic horses (Kristjansson et al., 2014) was first seen in a Great Mongolian Empire horse (TavanTolgoi_ GEP14_730) and slowly gained in frequency thereafter (Figure S5). Interestingly, the MSTN ‘‘speed’’ gene was among the PBS selection candidates in the post-C7th–C9th branch (Figure 4A). We found that a number of alleles involved in racing performance, including at MSTN and PDK4 and ACN9 (Hill et al., 2010), rose in frequency in the last 600–1,100 years (100–1,100 and 600–1,600 years ago) (Figure 4B). Allele frequencies at these three loci also varied significantly more through time than other mutations genome-wide (Figure 4C). Altogether, this supports that speed capacity was increasingly selected in the last millennium.

Discovering Two Divergent and Extinct Lineages of Horses

Domestic and Przewalski’s horses are the only two extant horse lineages (Der Sarkissian et al., 2015). Another lineage was genetically identified from three bones dated to 43,000–5,000 years ago (Librado et al., 2015; Schubert et al., 2014a). It showed morphological affinities to an extinct horse species described as Equus lenensis (Boeskorov et al., 2018). We now find that this extinct lineage also extended to Southern Siberia, following the principal component analysis (PCA), phylogenetic, and f3- outgroup clustering of an 24,000-year-old specimen from the Tuva Republic within this group (Figures 3, 5A and S7A). This new specimen (MerzlyYar_Rus45_23789) carries an extremely divergent mtDNA only found in the New Siberian Islands some 33,200 years ago (Orlando et al., 2013) (Figure 6A; STAR Methods) and absent from the three bones previously sequenced. This suggests that a divergent ghost lineage of horses contributed to the genetic ancestry of MerzlyYar_ Rus45_23789. However, both the timing and location of the genetic contact between E. lenensis and this ghost lineage remain unknown.

PCA revealed that native Iberian horses (IBE) from the 3rd and early 2nd mill. BCE cluster separately from E. lenensis, Przewalski’s horses (and their Botai-Borly4 ancestors) and the lineage leading to modern domesticates (DOM2) (Figure 5A; STAR Methods). This indicates that a fourth lineage of horses existed during the early phase of domestication (Gaunitz et al., 2018; Outram et al., 2009). Members of this lineage possess their own distinctive mtDNA haplogroup (Figure 6A; STAR Methods) and are represented by two Spanish pre-Bell Beaker Chalcolithic settlements (Cantorella and Camino de Las Yeseras) and a Bronze Age village (El Acequio´ n), with archaeological contexts compatible with both wild and domestic status.

Figure 5
Figure 5. Genetic Affinities (A) Principal Component Analysis (PCA) of 159 ancient and modern horse genomes showing at least 1-fold average depth-of-coverage. The overall genetic structure is shown for the first three principal components, which summarize 11.6%, 10.4% and 8.2% of the total genetic variation, respectively. The two specimens MerzlyYar_Rus45_23789 and Dunaujvaros_Duk2_4077 discussed in the main text are highlighted. See also Figure S7 and Table S5 for further information. (B) Visualization of the genetic affinities among individuals, as revealed by the struct-f4 algorithm and 878,475 f4 permutations. The f4 calculation was conditioned on nucleotide transversions present in all groups, with samples were grouped as in TreeMix analyses (Figure 3). In contrast to PCA, f4 permutations measure genetic drift along internal branches. They are thus more likely to reveal ancient population substructure. (C) Population modeling of the demographic changes and admixture events in extant and extinct horse lineages. The two models presented show best fitting to the observed multi-dimensional SFS in momi2. The width of each branch scales with effective size variation, while colored dashed lines indicate admixture proportions and their directionality. The robustness of each model was inferred from 100 bootstrap pseudo-replicates. Time is shown in a linear scale up to 120,000 years ago and in a logarithmic scale above.

Modeling Demography and Admixture of Extinct and Extant Horse Lineages

Phylogenetic reconstructions without gene flow indicated that IBE differentiated prior to the divergence between DOM2 and Przewalski’s horses (Figure 3; STAR Methods). However, allowing for one migration edge in TreeMix suggested closer affinities with one single Hungarian DOM2 specimen from the 3rd mill. BCE (Dunaujvaros_Duk2_4077), with extensive genetic contribution (38.6%) from the branch ancestral to all horses (Figure S7B). This, and the extremely divergent IBE Y chromosome (Figure 6B), suggest that a divergent but yet unidentified ghost population could have contributed to the IBE genetic makeup.

To test this and further assess the underlying population history, we explicitly modeled demography and admixture by fitting the multi-dimensional Site Frequency Spectrum in momi2 (Kamm et al., 2018) (STAR Methods). The two best-supported scenarios (Figure 5C) provided divergence time estimates on par with previous work, first 113–119 kya for the E. lenensis split (Librado et al., 2015; Schubert et al., 2014a), then 34–44 kya for that of Przewalski’s horse and DOM2 lineages (Der Sarkissian et al., 2015). In both models, IBE and E. lenensis show strong genetic affinities, with no less than 93.2%–98.8% genetic input from the former into the branch ancestral to E. lenensis, some 285–333 kya. The magnitude of this pulse could suggest that the two lineages in fact split at that time, but that a more divergent ghost population contributed 1.2%–6.8% ancestry into IBE, pushing the momi2 estimate for the IBE divergence to deeper times (539–1,246 kya). The strong genetic affinity between IBE and E. lenensis is consistent with the results of Struct-f4, a new method developed here leveraging all possible combinations of f4-statistics to provide a 3D representation of ancestral population relationships that is robust to lineage-specific genetic drift (Figure 5B; STAR Methods), as opposed to PCA projections.

Figure 6
Figure 6. Phylogenetic Reconstructions Based on Uniparental Markers Tip labels are respectively composed of individual sample names, their reference number as well as their age (years ago, from 2017). Red, orange, light green, green, dark green and blue refer to modern horses, ancient DOM2, Botai horses, Borly4 horses, Przewalski’s horses and E. lenensis, respectively. Black refers to wild horses not yet identified to belong to any particular cluster in absence of sufficient genome-scale data. Clades composed of only Przewalski’s horses or ancient DOM2 horses were collapsed to increase readability. (A) Best maximum likelihood tree retracing the phylogenetic relationships between 270 mitochondrial genomes. (B) Best Y chromosome maximum likelihood tree (GTRGAMMA substitution model) excluding outgroup. Node supports are indicated as fractions of 100 bootstrap pseudoreplicates. Bootstrap supports inferior to 90% are not shown. The root was placed on the tree midpoint. See also Table S5 for dataset information.

This article originally appeared on The Cell and is being published here as an abstract in 4 parts, published weekly. Creative Commons License You can download a PDF of the complete study HERE.

You can find other interesting information and articles in our section on Health & Education.

Our Mission — Serving the professional horse person, amateur owners, occasional enthusiasts and sporting interests alike, the goal is to serve all disciplines – which often act independently yet have common needs and values.

Equine Info Exchange is totally comprehensive, supplying visitors with a world wide view and repository of information for every aspect related to horses. EIE provides the ability to search breeds, riding disciplines, horse sports, health, vacations, art, lifestyles…and so much more.

EIE strives to achieve as a source for content and education, as well as a transparent venue to share thoughts, ideas, and solutions. This responsibility also includes horse welfare, rescue and retirement, addressing the needs and concerns of all horse lovers around the world.