Plants meet machines: Prospects in machine learning for plant biology

Pamela S. Soltis, Gil Nelson, Alina Zare, Emily K. Meineke

Year: 2020
Citations: 65

Abstract

Machine learning approaches are affecting all aspects of modern society, from autocorrect applications on cell phones to self-driving cars to facial recognition, personalized medicine, and precision agriculture. Although machine learning has a long history, recent drastic improvements in these application areas have been driven by improvements to computational infrastructure; increased computing power; increased ability to collect, manage, and store very large amounts of data; and algorithmic advances. Multiple types of machine learning have been developed, each with its own techniques, strengths, and weaknesses, making certain approaches better matches for certain problems than others. Supervised machine learning and the use of neural networks (e.g., deep learning; Table 1) underlie much of the recent accelerated application of machine learning to many biological problems, including those across a range of scientific questions in plant science. For example, deep learning technologies have recently achieved impressive performance on a variety of predictive tasks, such as species identification (Unger et al., 2016; Carranza-Rojas et al., 2017), plant species distribution modeling (e.g., Zhang and Li, 2017; Botella et al., 2018), weed detection (Yu et al., 2019), and detection of mercury damage to herbarium specimens (Schuettpelz et al., 2017). They are also being applied to questions of comparative genomics (e.g., Xu and Jackson, 2019) and gene expression (Mochida et al., 2018) and to conduct high-throughput phenotyping (e.g., Singh et al., 2016; Ubbens and Stavness, 2017) for agricultural and ecological research. Moreover, novel approaches are poised to revolutionize studies of plant phenology (e.g., Pearson et al., 2020) and functional traits through application to more than 30 million images of herbarium specimens now available at iDigBio (http://www.idigbio.org) as well as other digital repositories.
The application of machine learning methods to extract data from herbarium specimens has grown and diversified in a few short years, beginning with species identification in a specific geographic region (e.g., Unger et al., 2016). Subsequent attempts to use deep learning to tackle the difficult taxonomic task of identifying species in large collections of herbarium specimens showed that convolutional neural networks trained on thousands of digitized herbarium sheets are able to learn highly discriminative patterns (e.g., Carranza-Rojas et al., 2017). These results are very promising for extracting a broad range of accurate annotations in a fully automated way. Such approaches are also being applied to identification of plant phenophase (i.e., bud, flower, fruit), which is important for assessing the effects of climate change on plant growth and reproduction and for comparing plant responses with those of pollinators, migratory birds, and other species that rely on plants for food and/or nesting sites (see, e.g., Lorieul et al., 2019; Pearson et al., 2020; Brenskelle et al., 2020; Goëau et al., 2020). Likewise, other evolutionary or ecological traits, such as leaf shape and size, leaf margins, and flower color, could also potentially be scored from images of herbarium specimens. However, despite the promise of applying deep learning to herbarium specimen images to address a range of questions, this emerging field also raises challenging methodological questions about how to avoid bias and misleading conclusions when analyzing the produced data. Indeed, as with any statistical learning method, convolutional neural networks are sensitive to bias issues, including the way in which the training data sets are built.
Moreover, as good as the prediction might be on average, the quality of the produced annotations can be very heterogeneous from one sample to another, depending on various factors such as the morphology of the species, the storage conditions in which the specimen was preserved, and the age of the specimen when imaged. Given both

Keywords

Biology, Plant biology, Computational biology, Botany
