Revolutionising biodiversity monitoring with AI
State of nature assessment faces significant data challenges: at present, we lack the infrastructure needed to effectively measure biodiversity on a global scale.
Existing biodiversity monitoring methods are time-consuming and labor-intensive. Historically, ecologists have relied on ground surveys using basic tools and manual counting techniques, which are difficult to scale. As a result, current biodiversity data exhibits major gaps, with coverage skewed toward accessible, well-studied areas and a bias toward species and habitats that are easier to observe. Astonishingly, only around 7% of the Earth’s surface has been adequately surveyed, while more than half of all records focus on less than 2% of known species (Hughes et al., 2021).
To halt biodiversity loss and move toward a nature-positive future, scalable tools and technologies are essential to address these data challenges and provide accurate measurements of biodiversity impacts and ecosystem health. Fortunately, emerging techniques are making biodiversity measurement and analysis more accessible and scalable.
Biodiversity measurement & analysis technologies can be segmented based on the way data is collected.
Generally speaking, the field of Earth Observation (EO) involves gathering data about the physical, chemical, and biological systems of Earth. EO techniques are categorized based on how information is collected, either directly in the field or remotely through technologies like satellites and drones.
These technologies can be grouped as follows:
1. Traditional field-based EO
Ecological surveys: involves deploying people to physically collect specimens (samples), conduct point counts (e.g., for bird populations), and perform transects (systematic line surveys across an area).
Wildlife cameras (camera traps): involves recording photos and videos of passing animals. These can operate autonomously for months without human intervention.
These methods are often expensive, invasive, and difficult to scale across large areas.
2. Advanced field-based EO
eDNA (environmental DNA): involves collecting and analyzing traces of DNA left behind in the environment (e.g., water, soil, air) to identify the species present in an area. Each organism leaves DNA fragments (e.g., skin cells, fur, scales, excrement) in its habitat. The analysis of samples of soil or water allows for an inventory of species without direct observation, making it particularly useful in hard-to-access environments like marine ecosystems. These techniques are non-invasive and can be applied on large scales. However, they also have structural limitations: species can only be identified if their DNA is already cataloged in a database. Moreover, this method indicates the presence or absence of a species, but not its abundance.
Bioacoustics: involves the deployment of microphones and recording devices on the ground to capture the sound produced by species. This technology provides a low-cost, scalable, and non-invasive tool for detecting species presence, abundance, and behavior based on vocalizations and movements. However, it is limited to species that are acoustically active.
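The two eDNA limitations mentioned above (reliance on a reference database, and presence/absence rather than abundance) can be illustrated with a toy sketch. Everything here is hypothetical: the barcode sequences are made up, `REFERENCE_DB` and `detect_species` are illustrative names, and exact string matching stands in for the similarity searches that real pipelines run against curated reference libraries.

```python
# Toy illustration only: sequences are invented and matching is exact,
# which is a simplifying assumption, not how production eDNA tools work.

# Hypothetical reference database: DNA barcode -> species name.
REFERENCE_DB = {
    "ACGTTGCA": "Salmo trutta",
    "TTGACCGA": "Lutra lutra",
}


def detect_species(reads, reference_db):
    """Return (set of species present, number of reads with no match)."""
    present = set()
    unmatched = 0
    for read in reads:
        species = reference_db.get(read)
        if species is None:
            unmatched += 1  # not cataloged, so it cannot be identified
        else:
            present.add(species)  # a set records presence, not abundance
    return present, unmatched


# Hypothetical reads from a single water sample.
sample_reads = ["ACGTTGCA", "ACGTTGCA", "ACGTTGCA", "GGGGCCCC", "TTGACCGA"]
present, unmatched = detect_species(sample_reads, REFERENCE_DB)
print(sorted(present), unmatched)
```

The three identical reads contribute a single entry to the result, and the read missing from the database is only counted as unidentified.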
3. Remote EO
Remote EO involves using drones equipped with sensors (e.g., LiDAR, cameras, thermal imaging) or satellites in orbit. These systems utilize various remote sensing technologies, including:
Multispectral and hyperspectral imaging: captures data across specific wavelength bands, allowing for the analysis of different features like vegetation health and material composition. Hyperspectral imaging, in particular, provides finer spectral detail by capturing hundreds of narrow bands, making it useful for applications like mineral mapping and pollutant detection.
Radar (radio detection and ranging): uses radio waves to measure distances (by measuring the time it takes for them to bounce back after hitting an object) and map surface features. Unlike optical imaging, radar can penetrate clouds and function in all weather conditions, making it particularly valuable in regions with frequent cloud cover. Applications include monitoring land deformation, ice sheet dynamics, and forest biomass.
LiDAR (light detection and ranging): uses laser pulses to measure distances by calculating the time it takes for the laser to return from the ground. LiDAR creates highly accurate 3D maps of terrain and structures, even through dense vegetation. It is widely used in topographic mapping, forest inventories, and urban planning.
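Two of the measurements described above reduce to simple arithmetic, sketched here under stated assumptions. The first function computes NDVI, a widely used vegetation index built from near-infrared and red reflectance; it is not named in this article and is chosen only as an illustration of "vegetation health" from spectral bands. The second applies the time-of-flight principle shared by radar and LiDAR. The function names and sample values are hypothetical.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # propagation speed in vacuum


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from two band reflectances."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def range_from_round_trip(seconds):
    """Distance to a target from the round-trip travel time of a pulse."""
    # The pulse travels to the target and back, hence the division by two.
    return SPEED_OF_LIGHT_M_S * seconds / 2.0


# Illustrative reflectance values, not measurements.
print(round(ndvi(0.50, 0.08), 2))  # strong NIR, weak red: high index
print(round(ndvi(0.30, 0.25), 2))  # similar bands: index near zero
print(range_from_round_trip(2e-6))  # a 2 microsecond echo, in metres
```

Values close to 1 typically indicate dense, healthy vegetation, while values near zero suggest bare or sparsely vegetated ground.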
While remote sensing offers excellent scalability, it does not provide the detailed species diversity metrics available through field-based methods, and it may require specialized training and permits for legal operation. All in all, it cannot yet fully replace site-level sampling. That said, these newer EO techniques can produce high-frequency, high-resolution data with more comprehensive coverage, capturing changes at a global scale.
The combination of EO & AI technologies can be game changing for biodiversity measurement & monitoring.
To effectively analyze the vast amount of data produced by advanced EO techniques, AI techniques and capabilities are becoming increasingly crucial. Here's a high-level segmentation of new AI techniques that can play a key role in this context.
Machine learning (ML) and deep learning (DL): involves training algorithms on labeled datasets to classify objects, detect patterns, and make predictions (supervised learning), or finding hidden patterns in data without labeled input (unsupervised learning). Neural networks with multiple layers (hence "deep"), such as Long Short-Term Memory networks (LSTMs), can also be used to model complex patterns in large datasets.
(!) As such, ML can be used to identify species, detect land cover changes, or classify habitats based on satellite imagery. As an example, Meta and the World Resources Institute recently launched a revolutionary AI-powered global tree canopy height map with 1-meter resolution, offering unprecedented insights into forest structures. Monitoring tree growth in reforestation projects is typically challenging: tracking young and sparse trees (such as in agroforestry) or small project areas (such as in community-led efforts) requires individual tree-scale sensitivity across large areas. Meta addressed this challenge by deploying a state-of-the-art model, DINOv2, trained on 18 million satellite images from 2009 to 2020. Their data indicates that more than one third of the land on Earth (50 million km²) has a canopy height above 1 m, while 35 million km² has a canopy height greater than 5 m.
Computer vision: subset of ML that deals with analyzing and interpreting visual data, typically involving automated object detection and recognition within images or video streams.
(!) It could be essential for monitoring species using camera traps, drone footage, or satellite images. More specifically, “vision models” such as Convolutional Neural Networks (CNNs) are also a great fit for species recognition from sound: the audio is converted into spectrograms (frequency spectrum over time), which are then used as input to train a custom CNN on a labeled dataset. Species can thus be detected from live audio.
Natural Language Processing (NLP): subset of ML that deals specifically with understanding, interpreting, and generating human language. It typically involves text mining, semantic analysis and the use of Large Language Models (LLMs).
(!) It can be used to reconcile multiple labeled datasets and to automate the generation of summaries from vast datasets, making it easier for researchers to interpret EO data.
Generative models: subset of ML focused on generating new data samples.
They can be useful to create synthetic data to fill gaps in EO datasets and to augment datasets from third-party sources, providing more comprehensive coverage for analysis.
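The audio-to-spectrogram step described under computer vision above can be sketched as follows. This is a minimal illustration using only the Python standard library and a naive discrete Fourier transform; in practice one would more likely rely on a numerical or audio library. The function name, frame size, hop length, and the synthetic 1 kHz tone are all assumptions, and the CNN training step itself is not shown.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT over the non-negative frequency bins (0..frame_size/2).
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic stand-in for an animal call: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
          for t in range(512)]

spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index -> frequency in Hz
print(len(spec), len(spec[0]), peak_hz)
```

Each frame becomes one column of a time-frequency image, which is the kind of input a CNN would be trained on alongside species labels.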
Additionally, advancements in edge AI have opened up new possibilities for running AI models directly on monitoring devices, enhancing scalability and enabling real-time detection and analysis (useful, for instance, to detect illegal logging or poaching).
More generally, those new AI capabilities can translate EO data streams into tangible insights and enable optimised identification (of e.g. species), classification (of e.g. said species), extrapolation (to scale analyses over e.g. larger areas or over time) and forecasting (based on e.g. time series).
Interesting companies in this emerging field include Earthblox, Gentian, and Versant in France.
Yet some limits still need to be addressed before we can fully scale biodiversity & ecosystem measurement.
Indeed, while technology is rapidly advancing, several significant challenges still need to be tackled:
Underlying asset complexity: as extensively covered in our primer on biodiversity (accessible here), biodiversity is inherently multi-dimensional and local by nature. It involves millions of intricate interactions between species and their environment. Developing models that can capture even a fraction of these interactions is a massive undertaking and requires deep scientific understanding combined with robust computational power.
Gaps in scientific understanding: data availability remains limited for key ecosystems like grasslands, regions such as Africa, and critical dimensions of biodiversity such as population genetic diversity. Oceans, in particular, present unique challenges due to limited observational capacity in deep-water biomes.
High costs / low maturity of some of the technologies mentioned above: many of these technologies are still costly and not yet mature enough for large-scale, automated deployment. The integration of these methods into routine monitoring will require significant investment and refinement.
Data integration and standardization issues: there is no common standard for data collection. Each device may use its own data format, and taxonomies can vary widely. This results in highly heterogeneous datasets that are difficult to combine and analyze effectively.
AI-specific challenges: running AI models presents several hurdles, including the need for high computing power and large storage capacities. Ensuring accuracy in labeled datasets (e.g., audio) requires significant human verification. Additionally, challenges like filtering out human speech or sensitive data from audio/video streams, processing data with strong background noise, labelling datasets and managing issues like model hallucinations further complicate AI implementation.
That said, at darwin, we are firmly convinced that cutting-edge technologies like AI have a critical role to play in addressing the biodiversity crisis.
This article highlights how AI can revolutionize state of nature monitoring. That being said, identifying the precise causes or agents (“who” or “what”) behind observed nature shifts remains particularly challenging, especially given the complex interplay of natural and human-induced factors. At darwin, we focus on evaluating the impacts and dependencies of economic activities on biodiversity. This mission allows us to tackle this “attribution challenge”, that is, attributing ecological changes to specific companies. We also see numerous potential applications for AI in scaling biodiversity impact assessment.
Stay tuned for more insights on how we at darwin envision the role of technology.