Introduction to the Peanut Genome Project

    Cultivated peanut (Arachis hypogaea), derived from two wild ancestors, A. duranensis and A. ipaensis, originated and was domesticated in South and Central America several thousand years ago. After Columbus reached the Americas, European colonizers distributed peanut to many countries. Peanut is now an important crop worldwide, especially in Asia and Africa, and also in North and South America. In China, peanut is one of the most important oil crops; it is grown in most provinces and produces 17.6 million metric tons of pods annually, more than 40% of total world production. Over the last 70 years, active research on peanut genetic breeding and cultivation innovation has led to the development of hundreds of peanut varieties and a number of cultivation models suited to different regions and ecosystems. These achievements significantly improved peanut production: yield in China increased from less than 1500 kg/ha in the 1950s to 3000 kg/ha in the 2010s. However, studies on peanut genetics, physiology, molecular biology, molecular breeding and genomics lag far behind those on other crops such as rice, maize, soybean and wheat, partly because of the complex tetraploid genome.

    Sequencing the cultivated peanut genome will greatly accelerate the understanding of peanut evolution and of the molecular mechanisms underlying yield, quality, and resistance to biotic and abiotic stress, which could significantly promote genetic improvement and molecular breeding. Cultivated peanut is an allotetraploid containing A and B subgenomes, derived from the two diploid wild species A. duranensis (2n=2x=20) and A. ipaensis (2n=2x=20), respectively. The estimated genome size of cultivated peanut is ~2800 Mb, similar to that of the human genome. The high proportion of repetitive sequence in the genome has made de novo sequencing and assembly of cultivated peanut difficult. Fortunately, whole-genome sequencing of the two ancestral diploid species, A. duranensis and A. ipaensis, was completed two years ago, providing valuable information for genetic breeding and genomics applications, and particularly for the assembly of the tetraploid peanut genome. With the advent of third-generation sequencing technology, deciphering the complex genome of tetraploid peanut is now within reach.

    We completed the de novo sequencing and assembly of a high-quality, chromosome-scale reference genome for cultivated peanut using single-molecule real-time (SMRT) sequencing together with Hi-C and genetic maps. We have also sequenced six wild species, three with AA genomes, one with BB genome, one with AABB genome (A. monticola), and 22 tetraploid peanut accessions, including cultivars and old landraces. Twenty-nine transcriptomes from different tissues, sampled under specific environmental conditions and hormone treatments, were sequenced on the Illumina platform. In addition, a full-length transcriptome from a mixed RNA sample was sequenced by SMRT sequencing on the Sequel platform, yielding nearly 40,000 unigenes. These cDNA sequences provided valuable information for accurate annotation of the genome. The reference genome of cultivated peanut provides direct information for peanut genetic improvement and molecular breeding. Comparative analysis between the cultivated peanut genome and those of the diploid wild species could help us understand peanut evolution and domestication and enable us to reconstruct the history of peanut origin and evolution. For example, the genome information could be used to explain R-gene evolution, differences in disease resistance between wild and cultivated species, lipid biosynthesis and accumulation, and the inheritance of other agronomically important traits.

    Whole-genome sequencing of cultivated peanut and the wild diploid species, together with re-sequencing of peanut accessions and landraces, enables us to develop genome-wide SNP and SSR markers. These markers could significantly promote the construction of high-density marker maps and, in turn, whole-genome selection in practical molecular breeding. In addition, map-based cloning of functional genes and major QTLs becomes feasible with the availability of high-density molecular marker maps.