Machine learning approaches to identify core and dispensable genes in pangenomes
Blog Article
Abstract A gene in a given taxonomic group is either present in every individual (core) or absent in at least one individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, determining whether a gene belongs to the core or dispensable portion of the genome requires the construction of a pangenome, which involves sequencing the genomes of many individuals.
Here we aim to leverage the previously characterized core and dispensable gene content for two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome. Such a model may mitigate the need for pangenome construction, an expensive hurdle especially in orphan crops, which often lack adequate genomic resources.
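To make the core/dispensable definition concrete, the following minimal Python sketch shows how training labels could be derived from a pangenome presence/absence table. The gene names, the presence/absence calls, and the `label_genes` helper are hypothetical and serve only as illustration; they are not taken from the study, and the study's actual labeling pipeline may differ.

```python
from typing import Dict, List


def label_genes(presence: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label a gene 'core' if present in every individual, else 'dispensable'.

    `presence` maps a gene identifier to one presence/absence call per
    sequenced individual (True = present). This layout is an assumption
    made for illustration only.
    """
    labels = {}
    for gene, calls in presence.items():
        if not calls:
            # Without any individuals the definition cannot be applied.
            raise ValueError(f"no presence/absence calls for {gene}")
        # Core: present in all individuals. Dispensable: absent in at least one.
        labels[gene] = "core" if all(calls) else "dispensable"
    return labels


# Hypothetical calls for three genes across four individuals.
presence = {
    "geneA": [True, True, True, True],
    "geneB": [True, False, True, True],
    "geneC": [False, False, True, False],
}

labels = label_genes(presence)
for gene, label in labels.items():
    print(gene, label)
```

Labels of this kind would be the prediction target for a classifier trained on features taken from a single annotated reference genome.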