Global Wheat Head Detection 2021: An Improved Dataset for Benchmarking Wheat Head Detection Methods

The Global Wheat Head Detection (GWHD) dataset was created in 2020 and assembled 193,634 labelled wheat heads from 4700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted on Kaggle, GWHD_2020 successfully attracted attention from both the computer vision and agricultural science communities. From this first experience, a few avenues for improvement were identified regarding data size, head diversity, and label reliability. To address these issues, the 2020 dataset has been reexamined, relabeled, and complemented by adding 1722 images from 5 additional countries, contributing 81,553 additional wheat heads. We now release in 2021 a new version of the Global Wheat Head Detection dataset, which is bigger, more diverse, and less noisy than the GWHD_2020 version.


Introduction
High-quality training data is essential for deploying deep learning (DL) techniques to obtain a general model that scales to all possible cases. Increasing dataset size, diversity, and quality is expected to be more efficient than increasing network complexity and depth [1]. Datasets like ImageNet [2] for classification or MS COCO [3] for instance detection are crucial for researchers to develop and rigorously benchmark new DL methods. Similarly, the importance of plant- or crop-specific datasets is recognized within the plant phenotyping community ([4][5][6][7][8][9][10], p. 2, [11][12][13]). These datasets allow benchmarking the performance of algorithms used to estimate phenotyping traits while encouraging computer vision experts to improve them further ([10], p. 2, [14][15][16][17]). The emergence of affordable RGB cameras and platforms, including UAVs and smartphones, makes in-field image acquisition easily accessible. These high-throughput methods are progressively replacing manual measurement of important traits such as wheat head density. Wheat is a crop grown worldwide, and the number of heads per unit area is one of the main components of yield potential. Creating a robust deep learning model that performs well in all situations requires a dataset of images covering a wide range of genotypes, sowing densities and patterns, plant states and stages, and acquisition conditions. To answer this need for a large and diverse wheat head dataset with consistent and quality labeling, we developed in 2020 the Global Wheat Head Detection (GWHD_2020) dataset [18], which was used to benchmark methods proposed in the computer vision community and to recommend best practices for acquiring images and keeping track of the metadata.
The GWHD_2020 dataset results from the harmonization of several datasets coming from nine different institutions across seven countries and three continents. There are already 27 publications (accessed July 2021) that have reported wheat head detection models using the GWHD_2020 dataset as the standard training/testing data. A "Global Wheat Detection" competition hosted by Kaggle was also organized, attracting 2245 teams from across the world [14] and leading to improvements in wheat head detection models [23,25,31,41]. However, issues with the GWHD_2020 dataset were detected during the competition, including labeling noise and an unbalanced test dataset.
To provide a better benchmark dataset for the community, the GWHD_2021 dataset was organized with the following improvements: (1) the GWHD_2020 dataset was checked again to eliminate a few poor-quality images, (2) images were relabeled to avoid consistency issues, (3) a wider range of developmental stages from the GWHD_2020 sites was included, and (4) datasets from 5 new countries (the USA, Mexico, Republic of Sudan, Norway, and Belgium) were added. The resulting GWHD_2021 dataset contains 275,187 wheat heads from 16 institutions distributed across 12 countries.

Materials and Methods
The first version of GWHD_2020, used for the Kaggle competition, was divided into several subdatasets. Each subdataset represented all images from one location, acquired with one sensor while mixing several stages. However, wheat head detection models may be sensitive to the developmental stage and acquisition conditions: at the beginning of head emergence, part of the head is barely visible because it has not fully emerged from the last leaf sheath and may be masked by the awns. Further, during ripening, wheat heads tend to bend and overlap, leading to more erratic labeling. A redefinition of the subdataset was hence necessary to help investigate the effect of the developmental stage on model performance. The new definition of a subdataset was formulated as "a consistent set of images acquired over the same experimental unit, during the same acquisition session with the same vector and sensor." A subdataset therefore defines a domain. This new definition required splitting the original GWHD_2020 subdatasets into several smaller ones. UQ_1 was split into 6 much smaller subdatasets, Arvalis_1 into 3 subdatasets, Arvalis_3 into 2 subdatasets, and utokyo_1 into 2 subdatasets. However, utokyo_2, a collection of images taken by farmers at different stages and in different fields, was kept as the original subdataset. Overall, the 11 original subdatasets in GWHD_2020 were distributed into 19 subdatasets for GWHD_2021.
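As an illustration of the subdataset definition, the following minimal sketch groups image records into domains keyed by experimental unit, acquisition session, vector, and sensor. The record fields (`unit`, `session`, `vector`, `sensor`, `image`) and the sample values are hypothetical and do not reflect the actual GWHD_2021 metadata schema.

```python
from collections import defaultdict


def group_into_domains(records):
    """Group image names by (unit, session, vector, sensor).

    Each distinct key is treated as one subdataset (domain).
    The field names are hypothetical, chosen for illustration only.
    """
    domains = defaultdict(list)
    for rec in records:
        key = (rec["unit"], rec["session"], rec["vector"], rec["sensor"])
        domains[key].append(rec["image"])
    return dict(domains)


# Hypothetical records: two sessions over the same experimental unit.
records = [
    {"image": "a.png", "unit": "field_1", "session": "2020-06-01",
     "vector": "handheld", "sensor": "camera_A"},
    {"image": "b.png", "unit": "field_1", "session": "2020-06-01",
     "vector": "handheld", "sensor": "camera_A"},
    {"image": "c.png", "unit": "field_1", "session": "2020-06-20",
     "vector": "handheld", "sensor": "camera_A"},
]
domains = group_into_domains(records)
```

In this toy example the two acquisition sessions over the same field yield two domains, which mirrors how a single original subdataset mixing several stages would be split.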
Almost 2000 new images were added to GWHD_2020, constituting a major improvement. Part of the new images comes from institutions already contributing to GWHD_2020 and was collected during a different year and/or at a different location. This was the case for Arvalis (Arvalis_7 to Arvalis_12), University of Queensland (UQ_7 to UQ_11), Nanjing Agricultural University (NAU_2 and NAU_3), and University of Tokyo (Utokyo_1). In addition, 14 new subdatasets were included, coming from 5 new countries: Norway (NMBU), Belgium (University of Liège [46]), United States of America (Kansas State University [47], TERRA-REF [7]), Mexico (CIMMYT), and Republic of Sudan (Agricultural Research Council). All these images were acquired at a ground sampling distance between 0.2 and 0.4 mm, i.e., similar to that of the images in GWHD_2020. Because none of them was already labeled, a sample was selected by taking no more than one image per microplot, which was randomly cropped to a 1024 × 1024 px patch. These patches are called images in the following for the sake of simplicity.
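The random cropping step can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name, the source image size, and the use of a seeded generator are assumptions made for the example.

```python
import random


def random_crop_box(width, height, size=1024, rng=None):
    """Return a random (left, upper, right, lower) box of size x size pixels.

    The box lies fully inside an image of the given width and height.
    """
    if rng is None:
        rng = random
    if width < size or height < size:
        raise ValueError("image is smaller than the requested patch size")
    left = rng.randint(0, width - size)   # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return (left, top, left + size, top + size)


# Hypothetical 6000 x 4000 px source image; seeded for reproducibility.
box = random_crop_box(6000, 4000, rng=random.Random(0))
```

The box follows the (left, upper, right, lower) convention, so it can be passed directly to a cropping routine that uses the same ordering, such as Pillow's `Image.crop`.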
With the addition of 1722 images and 86,000 wheat heads, the GWHD_2021 dataset contains 6500 images and 275,000 wheat heads. The increase in the number of subdatasets from 18 to 47 leads to a larger diversity between them, which can be observed in Figure 1. The subdatasets are described in Table 1. However, the new definition of a subdataset also led to more unbalanced subdatasets: the smallest (Arvalis_8) contains only 20 images, while the biggest (ETHZ_1) contains 747 images. This provides the opportunity to possibly take advantage of the data distribution to improve model training. Each subdataset has been visually assigned to one of several development stage classes depending on the respective color of leaves and heads: postflowering, filling, filling-ripening, and ripening. Examples of the different stages are presented in Figure 2. Although approximate, this metadata is expected to improve model training.

Dataset Diversity Analysis
In comparison to GWHD_2020, the GWHD_2021 dataset puts emphasis on metadata documentation of the different subdatasets, as described in the discussion section of David et al. [18]. Alongside the acquisition platform, each subdataset has been reviewed and assigned a development stage, except for Utokyo_3 (formerly utokyo_2), as it is a collection of images from various farmer fields and development stages. Globally, the GWHD_2021 dataset covers all development stages well, ranging from postanthesis to ripening (Figure 2).
The diversity between images within the GWHD_2021 dataset was documented using the method proposed by Tolias et al. [48]. Deep learning image features were first extracted with the VGG-16 deep network pretrained on the ImageNet dataset, which is considered to represent the general features of RGB images well. We then selected the last layer, which has a size of 14 × 14 × 512, and summed it over the spatial dimensions into a single vector of 512 channels, which was then normalized. Then, the UMAP dimensionality reduction algorithm [49] was used to project the representations into a 2D space. UMAP was chosen because it preserves the existing clusters during the projection to a low-dimensional space. This 2D space is expected to capture the main features of the images. Results (Figure 3) demonstrate that the test dataset used for GWHD_2020 was biased in comparison to the training dataset. The subdatasets added in 2021 populate the 2D space more evenly, which is expected to improve the robustness of the models.
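The pooling step described above, which turns an H × W × C feature map into a single C-dimensional descriptor, can be sketched in plain Python. This is an illustration under stated assumptions: the text does not specify which norm is used, so L2 normalization is assumed here, and the toy 2 × 2 × 3 feature map stands in for a real 14 × 14 × 512 VGG-16 output. Feature extraction and the UMAP projection are not shown.

```python
import math


def sum_pooled_descriptor(feature_map):
    """Sum an H x W x C feature map over its spatial positions, then L2-normalize.

    feature_map is a nested list indexed as [row][column][channel].
    L2 normalization is an assumption; the source only says "normalized".
    """
    channels = len(feature_map[0][0])
    pooled = [0.0] * channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                pooled[c] += value
    norm = math.sqrt(sum(v * v for v in pooled))
    if norm == 0.0:
        return pooled  # all-zero map: nothing to normalize
    return [v / norm for v in pooled]


# Toy 2 x 2 x 3 feature map (a real map would be 14 x 14 x 512).
toy_map = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
]
descriptor = sum_pooled_descriptor(toy_map)
```

The resulting descriptors, one per image, would then be stacked into a matrix and passed to a dimensionality reduction method to obtain the 2D projection.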

Presentation of Global Wheat Challenge 2021 (GWC 2021)
The results from the Kaggle challenge based on GWHD_2020 have been analyzed by the authors [14]. The findings emphasize that the design of a competition is critical to enable solutions that improve the robustness of the wheat head detection models. The Kaggle competition was based on a metric that was averaged across all test images, without

Plant Phenomics
accuracy is proposed as a new metric [14]. The accuracy computed over image i belonging to domain d, $AI_d(i)$, is classically defined as

$$AI_d(i) = \frac{TP}{TP + FN + FP},$$

where TP, FN, and FP are, respectively, the numbers of true positives, false negatives, and false positives found in image i. The weighted domain accuracy (WDA) is the weighted average of all domain accuracies:

$$WDA = \frac{1}{D} \sum_{d=1}^{D} \frac{1}{n_d} \sum_{i=1}^{n_d} AI_d(i),$$

where D is the number of domains (subdatasets) and $n_d$ is the number of images in domain d. The training, validation, and test datasets used are presented in Section 5.
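A minimal sketch of the metric follows, assuming the per-image accuracy is TP / (TP + FN + FP) and that each domain contributes the mean accuracy of its images with equal weight. The function names, the input layout, and the toy counts are illustrative; the handling of an image with no heads and no predictions (scored as 1.0 here) is an assumption, not something stated in the text.

```python
def image_accuracy(tp, fn, fp):
    """Per-image accuracy TP / (TP + FN + FP)."""
    total = tp + fn + fp
    if total == 0:
        return 1.0  # assumption: empty image with no predictions counts as correct
    return tp / total


def weighted_domain_accuracy(counts_by_domain):
    """Average the per-domain mean image accuracy over all domains.

    counts_by_domain maps a domain name to a list of (TP, FN, FP) tuples,
    one tuple per image in that domain.
    """
    domain_means = []
    for counts in counts_by_domain.values():
        accuracies = [image_accuracy(tp, fn, fp) for tp, fn, fp in counts]
        domain_means.append(sum(accuracies) / len(accuracies))
    return sum(domain_means) / len(domain_means)


# Toy counts for two hypothetical domains of different sizes.
counts = {
    "domain_a": [(8, 1, 1), (5, 5, 0)],  # image accuracies 0.8 and 0.5
    "domain_b": [(9, 0, 1)],             # image accuracy 0.9
}
wda = weighted_domain_accuracy(counts)   # (0.65 + 0.9) / 2 = 0.775
```

Averaging within each domain first means that a small domain weighs as much as a large one, whereas a plain average over all images would give 0.733 in this toy case and be dominated by the larger domain.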
The results of the Global Wheat Challenge 2021 are summarized in Table 2. The reference method is a Faster R-CNN with the same parameters as in the GWHD_2020 research paper [18], trained on the GWHD_2021 (Global Wheat Challenge 2021 split) training dataset. The full leaderboard can be found at https://www.aicrowd.com/challenges/global-wheat-challenge-2021/leaderboards.

How to Use/FAQ
(iii) How to cite the dataset? The present paper can be cited when using the GWHD_2021 dataset. However, cite [18] preferentially for wheat head detection challenges or when discussing the difficulty of constituting a large dataset.
(iv) How to benchmark? Depending on the objectives of the study, we recommend two sets of training, validation, and test splits (Table 3):
(a) The Global Wheat Challenge 2021 split, when the dataset is used for phenotyping purposes, to allow direct comparison with the winning solutions.
(b) The "GlobalWheat-WILDS" split, which is the one used for the WILDS paper [50]. We recommend using the GlobalWheat-WILDS split when working on out-of-domain distribution shift problems.
It is further recommended to keep the weighted domain accuracy for comparison with previous works.

Conclusion
The second edition of the Global Wheat Head Detection dataset, GWHD_2021, alongside the organization of a second Global Wheat Challenge, is an important step for illustrating the usefulness of open and shared data across organizations to further improve high-throughput phenotyping methods. In comparison to the GWHD_2020 dataset, it represents five new countries, 22 new subdatasets, 1200 new images, and 120,000 newly labeled wheat heads. Its revised organization and additional diversity are more representative of the type of images researchers and agronomists can acquire across the world. The revised metrics used to evaluate the models during the Global Wheat Challenge 2021 can help researchers benchmark one-class localization models on a large range of acquisition conditions. GWHD_2021 is expected to accelerate the building of robust solutions. However, progress on the representation of developing countries is still expected, and we are open to new contributions from South America, Africa, and South Asia. We have started to include nadir-view photos from smartphones, to get a more comprehensive dataset and train reliable models for such affordable devices. Additional work is required to adapt such an approach to other vectors, such as a camera mounted on an unmanned aerial vehicle, or to other high-resolution cameras working in other spectral domains. Further, it is planned to release wheat head masks alongside the bounding boxes, given the very large number of boxes that already exist, and to provide more associated metadata.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.