PanAf-FGBG: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition

Otto Brookes1,2†*, Maksim Kukushkin3,4*, Majid Mirmehdi1, Colleen Stephens5, Paula Dieguez5, Thurston C. Hicks6, Sorrel Jones5, Kevin Lee5, Maureen S. McCarthy5, Amelia Meier5, Emmanuelle Normand2, Erin G. Wessling7, Roman M. Wittig5,8, Kevin Langergraber9, Klaus Zuberbühler10, Lukas Boesch2, Thomas Schmid3,11, Mimi Arandjelovic5, Hjalmar Kühl5,12, Tilo Burghardt1
1University of Bristol, 2Wild Chimpanzee Foundation, 3Martin Luther University Halle-Wittenberg, 4Leipzig University, 5Max Planck Institute for Evolutionary Anthropology, 6University of Warsaw, 7Harvard University, 8University of Lyon, 9Arizona State University, 10University of St Andrews, 11Lancaster University in Leipzig, 12Senckenberg Museum of Natural History
*Equal technical contribution, †Corresponding author
CVPR 2025 Best Paper Award Candidate

Here we show foreground-background video pairs from the PanAf-FGBG dataset. The first row contains foreground videos (F), and the second row contains background videos (B) for the corresponding camera location.

Abstract

Computer vision analysis of camera trap video footage is essential for wildlife conservation, as captured behaviours offer some of the earliest indicators of changes in population health. Recently, several high-impact animal behaviour datasets and methods have been introduced to encourage their use; however, the role of behaviour-correlated background information and its significant effect on out-of-distribution generalisation remain unexplored.

In response, we present the PanAf-FGBG dataset, featuring 20 hours of wild chimpanzee behaviours recorded at over 350 individual camera locations. Uniquely, it pairs every video containing a chimpanzee (referred to as a foreground video) with a corresponding background video (containing no chimpanzee) from the same camera location. We present two views of the dataset: one with overlapping camera locations and one with disjoint locations. This setup enables, for the first time, direct evaluation under in-distribution and out-of-distribution conditions, and allows the impact of backgrounds on behaviour recognition models to be quantified. All clips come with rich behavioural annotations and metadata, including unique camera IDs and detailed textual scene descriptions.

PanAf-FGBG

Multi-Label Data

The PanAf-FGBG dataset is a multi-label dataset, meaning each video may contain multiple behaviours drawn from 14 distinct classes: 1) Aggression, 2) Bipedal, 3) Camera reaction, 4) Climbing, 5) Display, 6) Feeding, 7) Grooming, 8) Object carrying, 9) Piloerection, 10) Playing, 11) Resting, 12) Tool use, 13) Travel and 14) Vocalisation.
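For illustration, such multi-label annotations can be encoded as 14-dimensional multi-hot vectors. The sketch below assumes the class names listed above and a hypothetical helper function; it does not reflect the dataset's actual annotation format.

import numpy as np

# The 14 behaviour classes listed above, in a fixed order.
BEHAVIOURS = [
    "aggression", "bipedal", "camera reaction", "climbing", "display",
    "feeding", "grooming", "object carrying", "piloerection", "playing",
    "resting", "tool use", "travel", "vocalisation",
]

def to_multi_hot(labels):
    """Encode a clip's behaviour labels as a 14-dimensional multi-hot vector."""
    vec = np.zeros(len(BEHAVIOURS), dtype=np.float32)
    for label in labels:
        vec[BEHAVIOURS.index(label)] = 1.0
    return vec

# Example: a clip showing an individual feeding while carrying an object.
print(to_multi_hot(["feeding", "object carrying"]))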

Overlapping (ID) and Disjoint (OOD) Dataset Views

The PanAf-FGBG dataset is organised into two distinct views: overlapping (D_overlap) and disjoint (D_disjoint). The overlapping view includes videos captured from camera locations that appear in both the training and testing sets. In contrast, the disjoint view consists of videos recorded from camera locations that are exclusive to either the training or the testing set, but not both.
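A minimal sketch of how the two views could be derived from per-clip camera identifiers is given below; the camera_id field and split fraction are illustrative assumptions, not the dataset's released split definitions.

import random

def disjoint_split(clips, train_fraction=0.8, seed=0):
    """Disjoint view: each camera location appears in train or test, never both."""
    cameras = sorted({c["camera_id"] for c in clips})
    random.Random(seed).shuffle(cameras)
    cut = int(len(cameras) * train_fraction)
    train_cams = set(cameras[:cut])
    train = [c for c in clips if c["camera_id"] in train_cams]
    test = [c for c in clips if c["camera_id"] not in train_cams]
    return train, test

def overlapping_split(clips, train_fraction=0.8, seed=0):
    """Overlapping view: clips are split at random, so a camera location may appear in both sets."""
    clips = list(clips)
    random.Random(seed).shuffle(clips)
    cut = int(len(clips) * train_fraction)
    return clips[:cut], clips[cut:]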

Synthetic Backgrounds

In addition to using the available foreground-background video pairs, we also generate synthetic background videos by applying the Segment Anything Model 2 (SAM2) to extract foreground object masks from each video. The masked foreground regions are then filled with the mean pixel value of the surrounding background pixels. The resulting videos serve as synthetic backgrounds for training and evaluation.
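The sketch below illustrates the masking-and-filling step for a single frame, assuming a binary foreground mask (e.g. produced by SAM2) is already available; for brevity it fills with the global background mean rather than a locally computed one.

import numpy as np

def synthesise_background(frame, fg_mask):
    """Fill masked foreground pixels with the mean value of the background pixels.

    frame:   H x W x 3 uint8 image.
    fg_mask: H x W boolean array, True where a chimpanzee was segmented (e.g. by SAM2).
    """
    out = frame.copy()
    bg_pixels = frame[~fg_mask]                     # all pixels outside the foreground mask
    mean_colour = bg_pixels.mean(axis=0)            # per-channel mean of the background
    out[fg_mask] = mean_colour.astype(frame.dtype)  # replace foreground with the mean colour
    return out

# Applied frame by frame, this turns a foreground video into a synthetic background video.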

Experiments

Background Reliance

Background reliance is assessed using three models: a dummy classifier that predicts according to the class distribution of the training set, a background model (B) trained exclusively on background videos, and a foreground model (F) trained solely on foreground videos. Both the background and foreground models are evaluated on the foreground test set.
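A rough sketch of this comparison is shown below; the metric (macro-averaged average precision, a common choice for multi-label recognition) and the array names are assumptions for illustration rather than the paper's exact evaluation protocol.

import numpy as np
from sklearn.metrics import average_precision_score

def evaluate(probs, labels):
    """Mean average precision over behaviour classes (multi-label)."""
    return average_precision_score(labels, probs, average="macro")

def dummy_probs(train_labels, n_test):
    """Dummy classifier: predict each class with its empirical training-set frequency."""
    prior = train_labels.mean(axis=0)   # per-class frequency in the training set
    return np.tile(prior, (n_test, 1))

# labels_train, labels_test: (N, 14) multi-hot arrays; probs_f / probs_b come from the
# foreground-only (F) and background-only (B) models, both run on the foreground test set.
# print("dummy:", evaluate(dummy_probs(labels_train, len(labels_test)), labels_test))
# print("B model:", evaluate(probs_b, labels_test))
# print("F model:", evaluate(probs_f, labels_test))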

Background Neutralisation

To mitigate reliance on background information, we propose a latent-space background compensation technique. This approach subtracts background features from foreground features in the latent space, reducing the influence of background cues on behaviour recognition. The subtraction is controlled by a hyperparameter α, which linearly decreases from 1 to 0 during training, allowing the model to gradually shift its focus from background to foreground features.
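A minimal PyTorch sketch of this idea is given below; the backbone, feature dimension and exact placement of the subtraction are illustrative assumptions rather than the paper's precise implementation.

import torch.nn as nn

class BackgroundCompensation(nn.Module):
    """Latent-space background compensation (sketch).

    Foreground and background clips are encoded with a shared backbone; the
    background features, scaled by alpha, are subtracted from the foreground
    features before classification.
    """

    def __init__(self, backbone, feat_dim=768, num_classes=14):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, fg_clip, bg_clip, alpha):
        z_fg = self.backbone(fg_clip)
        z_bg = self.backbone(bg_clip)
        z = z_fg - alpha * z_bg          # compensated representation
        return self.classifier(z)

def alpha_schedule(epoch, total_epochs):
    """Linearly decay alpha from 1 to 0 over the course of training."""
    return max(0.0, 1.0 - epoch / max(1, total_epochs - 1))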

Acknowledgements

We thank the Pan African Programme: ‘The Cultured Chimpanzee’ team and its collaborators for allowing the use of their data for this paper. We thank Amelie Pettrich, Antonio Buzharevski, Eva Martinez Garcia, Ivana Kirchmair, Sebastian Schütte, Linda Gerlach and Fabina Haas. We also thank management and support staff across all sites; specifically Yasmin Moebius, Geoffrey Muhanguzi, Martha Robbins, Henk Eshuis, Sergio Marrocoli and John Hart. Thanks to the team at https://www.chimpandsee.org particularly Briana Harder, Anja Landsmann, Laura K. Lynn, Zuzana Macháčková, Heidi Pfund, Kristeena Sigler and Jane Widness. The work that allowed for the collection of the dataset was funded by the Max Planck Society, Max Planck Society Innovation Fund, and Heinz L. Krekeler. In this respect we would like to thank: Ministre des Eaux et Forêts, Ministère de l’Enseignement supérieur et de la Recherche scientifique in Côte d’Ivoire; Institut Congolais pour la Conservation de la Nature, Ministère de la Recherche Scientifique in Democratic Republic of Congo; Forestry Development Authority in Liberia; Direction Des Eaux Et Forêts, Chasses Et Conservation Des Sols in Senegal; Makerere University Biological Field Station, Uganda National Council for Science and Technology, Uganda Wildlife Authority, National Forestry Authority in Uganda; National Institute for Forestry Development and Protected Area Management, Ministry of Agriculture and Forests, Ministry of Fisheries and Environment in Equatorial Guinea. This work was supported by the UKRI CDT in Interactive AI (grant EP/S022937/1). This work was in part supported by the US National Science Foundation Awards No. 2118240 "HDR Institute: Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning" and Award No. 2330423 and Natural Sciences and Engineering Research Council of Canada under Award No. 585136 for the “AI and Biodiversity Change (ABC) Global Center”.

Citation

@inproceedings{brookes2025fgbg,
        author    = {Otto Brookes and Maksim Kukushkin and Majid Mirmehdi and Colleen Stephens and Paula Dieguez and Thurston C. Hicks and Sorrel Jones and Kevin Lee and Maureen S. McCarthy and Amelia Meier and Emmanuelle Normand and Erin G. Wessling and Roman M. Wittig and Kevin Langergraber and Klaus Zuberbühler and Lukas Boesch and Thomas Schmid and Mimi Arandjelovic and Hjalmar Kühl and Tilo Burghardt},
        title     = {PanAfFGBG: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year      = {2025},
        month     = {June},
        pages = {5433-5443}
      }
      
Also consider citing PanAf20k:
 @article{brookes2024panaf20k,
        title={PanAf20K: A Large Video Dataset for Wild Ape Detection & Behaviour Analysis},
        author={Brookes, Otto and Mirmehdi, Majid and Stephens, Colleen and McCarthy, Maureen and Murai, Mizuki and Normand, Emmanuelle and Vergnes, Virginie and Meier, Amelia and Lapuente, Juan and Wittig, Roman and Dowd, Dervla and Jones, Sorrel and Leinert, Vera and Wessling, Erin and Corogenes, Katherine and Zuberb{\"u}hler, Klaus and Lee, Kevin and Angedakin, Samuel and Langergraber, Kevin and Dieguez, Paula and Maldonado, Nuria and Boesch, Christophe and Arandjelovic, Mimi and K{\"u}hl, Hjalmar and Burghardt, Tilo},
        journal={International Journal of Computer Vision (IJCV)},
        year={2024},
        doi={10.1007/s11263-024-02003-z},
        url={https://doi.org/10.1007/s11263-024-02003-z}
      }