Join us for our monthly webinar next Wednesday, Sept. 4th, at 12 PM CDT. Dr. David Emms from InstaDeep will discuss AgroNT, a foundational large language model for plant genomics.
AgroNT: A Foundational Large Language Model for Plant Genomics
Foundational large language models can be pre-trained on large unlabelled datasets and subsequently fine-tuned for a wide range of specific tasks. We’ll present AgroNT (Agro Nucleotide Transformer), a foundational DNA large language model pre-trained on reference genomes from 48 plant species, with a predominant focus on crops. We have shown that AgroNT can be fine-tuned to obtain state-of-the-art predictions of many genomic elements, including polyadenylation sites, splice sites, open chromatin and enhancer regions. AgroNT can also be fine-tuned for other tasks, such as predicting tissue-specific gene expression levels or prioritizing functional variants.
Building on our Nucleotide Transformer, the novel SegmentNT model makes nucleotide-resolution predictions, which makes it well suited to tasks such as de novo genome annotation of previously unseen species. Both our AgroNT and SegmentNT models are open-sourced for academic research and non-commercial use.
GitHub repository: https://github.com/instadeepai/nucleotide-transformer
Hugging Face space: https://huggingface.co/InstaDeepAI