A 30 m Resolution Distribution Map of Maize for China Based on Landsat and Sentinel Images

As the second largest producer of maize, China contributes 23% of global maize production and plays an important role in guaranteeing the stability of maize markets. Despite this importance, there is no 30 m spatial resolution distribution map of maize for all of China. This study used a time-weighted dynamic time warping method to identify maize planting areas by comparing the time series of a satellite-based vegetation index at each pixel with a standard time series derived from known maize fields, and mapped the maize distribution from 2016 to 2020 over 22 provinces that account for more than 99% of the maize planting area in China. Based on 18800 field-surveyed pixels at 30 m spatial resolution, the distribution map yields producer's and user's accuracies of 76.15% and 81.59%, respectively, averaged over all investigated provinces. Comparison with municipality- and county-level census data also shows good performance in reproducing the spatial distribution of maize. This study provides an approach to mapping maize over large areas based on a small volume of field survey data.


Introduction
Maize plays a significant role in maintaining global food security. Since 2001, maize has surpassed rice to become the crop with the second largest production worldwide [1]. In 2018, maize accounted for about 13% of global crop production [1]. Maize is planted over a wide range of environmental conditions, from tropical to cold regions, including large areas with no irrigation [2,3]. With a warming climate, the observed frequency of natural disasters has increased substantially [4,5], which has substantially affected maize production. During 1979-2016, the maize yield in China decreased by 1.7% for every 1°C of temperature increase [6]. Determining the planting areas of maize is crucial for reducing economic losses, establishing effective mitigation actions, and ensuring food security. Therefore, against the background of climate change, accurate and timely identification of the maize planting area is all the more important [7].
Numerous methods have been developed to identify maize over large regions [8][9][10]. Remote sensing data play an irreplaceable role in mapping maize distribution because of their spatial and temporal continuity. The prevailing method is to use machine learning methods, including neural networks [11], support vector machines [12], random forests [13,14], and deep learning [15] to identify various crop types. A prime example is the Cropland Data Layer (CDL) updated annually by the U.S. Department of Agriculture (USDA), which provides 30 m resolution planting areas of more than 100 types of crop in the United States [8]. Although machine learning methods usually can accurately classify the crops, these methods are labor-intensive and time-consuming because of their strong reliance on large volumes of training samples [16,17]. The requirement of large volumes of training samples substantially limits the applications of machine learning methods over national and regional scales [18].
Another emerging approach is to distinguish crop types based on the seasonal characteristics of crop growth indicated by satellite-based vegetation indices [19][20][21][22][23]. For example, Dong et al. [24] used the dynamic time warping (DTW) method to contrast the seasonal variation of a vegetation index for winter wheat with that for other crops and identified the planting areas of winter wheat in China. This method produced accurate spatial maps of winter wheat while needing only a very low volume of field samples to determine the seasonal characteristics of winter wheat [24]. However, the method relies on the basic assumption that each crop type has seasonal characteristics distinct from those of other crops. Unlike winter wheat, maize grows from spring to autumn, and its growing season overlaps with those of numerous crop types and natural vegetation. Therefore, it has been a great challenge to distinguish maize from other summer crops. Recent studies have nevertheless shown good performance in distinguishing summer crops [25,26]. For example, Pan et al. [21] applied the time-weighted dynamic time warping (TWDTW) method to a test area in California to classify alfalfa, durum wheat, sugar beets, onions, and lettuce, achieving an overall accuracy of 74.92%. With the TWDTW method, Gella et al. [26] mapped maize, potato, summer wheat, and winter wheat using SAR imagery in complex farming areas, achieving an overall accuracy of 77.1%.
The DTW method is a practical tool for identifying crop types, and several studies have demonstrated its performance [20,27]. The algorithm was initially devised for speech recognition [28]. Maus et al. [29] accounted for the phenological change of different crop types and improved the DTW method by adding a time weight, yielding TWDTW. Csillik et al. [30] tested different time weights in DTW and found that the time weight required to achieve satisfactory accuracy differed among regions and that time-constrained DTW performed better than the Euclidean distance or non-time-constrained DTW.
As the second largest maize-producing country, China produced 25.73 million tons of maize in 2018, accounting for 22.42% of global maize production [1]. Several studies have attempted to map maize in China using phenology-based methods. Zhang et al. [31] tested the practicability of distinguishing maize from other summer crop types by analyzing the similarity of a satellite-based normalized difference vegetation index at two farms. Other studies have made great efforts to map the planting areas of maize at the provincial and national scales [9,32]. Zhang et al. [9] determined a standard seasonal curve of a vegetation index from field surveys and used the correlation and root mean square error between the seasonal vegetation index of a given location and the standard seasonal curve to determine maize areas. Based on this principle, Zhang et al. [9] mapped the planting areas of maize over 11 provinces and cities of Northeast and North China at 250 m spatial resolution. Luo et al. [32] generated a harvesting area dataset at 1 km spatial resolution for rice, wheat, and maize in China from 2000 to 2015 by comparing the phenophases of each pixel with the reference phenophases of the three crops. Although there have been many efforts to generate a distribution map of maize in China, a 30 m spatial resolution maize map is still not available. Identifying the planting areas of various crops at 30 m spatial resolution is especially important in China. A large share of Chinese farmers cultivate a very small planting area (i.e., 1.37 ha per household) [33]. Such a small farmland accounts for only about 20% of a single 250 m MODIS pixel (6.25 ha). In addition, cropland fields in China are quite fragmented, especially in mountainous regions [34]. Furthermore, farmers can freely select which crops to plant, resulting in high heterogeneity of crop types [35].
Previous studies highlighted large misclassification resulting from the use of coarse spatial resolution data in China [22]. The aim of this study is to identify the planting locations of maize using a phenology-based method with 30 m spatial resolution satellite datasets. The specific objectives are to (1) produce distribution maps of maize from 2016 to 2020 at 30 m spatial resolution using Landsat and Sentinel images and (2) evaluate the accuracy of the identified areas using county and municipal census data, field surveys, and visual interpretation of Google Earth images.
The proposed method can be used to update the map of the planting area of maize annually, providing an important data layer for estimating maize yield, mitigating the impacts of natural disasters, and monitoring food security.

Method
2.2.1. Time-Weighted Dynamic Time Warping. The methodology employed in this study relied mainly on the similarity between the phenological change in the normalized difference vegetation index (NDVI) of an unknown pixel and the known seasonal change of maize. A TWDTW method was used to measure the similarity [18,24]. Briefly, TWDTW is an improved version of DTW, which measures similarity as the minimum distance between two time series under a nonlinear alignment. Because crops have their own phenophases, the TWDTW method takes the temporal range into account when searching for the minimum path between two time series [29]. To distinguish maize from other crops, this study used the logistic TWDTW with open boundary, which has been shown to be more accurate than the linear TWDTW [29], with the parameters suggested by Pan et al. [21].
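The idea of a logistic time weight added to the DTW cost can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names are hypothetical, the steepness and midpoint values are illustrative placeholders rather than the parameters of Pan et al. [21], and for brevity the alignment covers both series end to end instead of using the open boundary.

```python
import math

def logistic_time_weight(day_gap, steepness=0.1, midpoint=50.0):
    """Logistic time weight: close to 0 for small date gaps, close to 1 for large ones.
    steepness and midpoint are illustrative values, not those used in the study."""
    return 1.0 / (1.0 + math.exp(-steepness * (day_gap - midpoint)))

def twdtw_dissimilarity(pattern, pattern_days, series, series_days,
                        steepness=0.1, midpoint=50.0):
    """Accumulated TWDTW cost between a standard curve and a pixel time series.
    Simplification (assumption): both series are aligned over their full length."""
    n, m = len(pattern), len(series)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            day_gap = abs(pattern_days[i - 1] - series_days[j - 1])
            # Local cost = NDVI difference + penalty for matching distant dates.
            cost = (abs(pattern[i - 1] - series[j - 1])
                    + logistic_time_weight(day_gap, steepness, midpoint))
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

# Hypothetical half-monthly NDVI values (day of year 121 to 286).
days = list(range(121, 301, 15))
maize_curve = [0.20, 0.22, 0.30, 0.45, 0.60, 0.75, 0.85, 0.85, 0.75, 0.55, 0.35, 0.25]
early_crop = [0.80, 0.70, 0.50, 0.30, 0.20, 0.20, 0.25, 0.30, 0.30, 0.25, 0.20, 0.20]

print(twdtw_dissimilarity(maize_curve, days, maize_curve, days))
print(twdtw_dissimilarity(maize_curve, days, early_crop, days))
```

A pixel whose NDVI series follows the standard curve on the same dates receives a small dissimilarity, whereas a crop that peaks at a different time is penalized both by the NDVI difference and by the time weight.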
Specifically, this study used the method proposed by Dong et al. [24] to identify the planting locations of maize. First, the dissimilarity between the NDVI time series of each pixel and the standard curve was calculated for all pixels. The lower the dissimilarity value, the more similar the time series is to the standard curve and the more likely the pixel is to correspond to planted maize. The dissimilarity thresholds were determined from province-level statistical areas: pixels with a dissimilarity lower than the threshold were identified as maize, and the threshold was chosen so that the total area of identified maize pixels equals the province-level statistical area. No other methods were used to differentiate maize from other crops in this study. The standard NDVI curves of seasonal change were determined for each province from 50 maize pixels randomly selected from the field surveys and Google Earth samples of that province (see Section 2.3.2). In general, the standard curves of all provinces show similar seasonal changes (Figure 2). To further examine the scalability of the method, this study used a single set of standard maize curves from 2019 to identify the planting areas of maize in all investigated years (i.e., 2016 to 2020).
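The area-matched threshold described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, every pixel is taken to cover 0.09 ha (30 m × 30 m), and ties at the threshold are counted as maize.

```python
def area_matched_threshold(dissimilarities, pixel_area_ha, statistical_area_ha):
    """Return the dissimilarity threshold at which the area of selected pixels
    matches the province-level statistical area (None if no pixel is selected)."""
    target_pixels = round(statistical_area_ha / pixel_area_ha)
    ranked = sorted(dissimilarities)
    target_pixels = max(0, min(target_pixels, len(ranked)))
    if target_pixels == 0:
        return None
    # The k-th smallest dissimilarity selects the k most maize-like pixels.
    return ranked[target_pixels - 1]

def classify_maize(dissimilarities, threshold):
    """Label pixels at or below the threshold as maize (True)."""
    return [threshold is not None and d <= threshold for d in dissimilarities]

# Hypothetical example: five pixels and a statistical area equal to three pixels.
pixel_dissimilarity = [0.9, 0.1, 0.5, 0.3, 0.7]
threshold = area_matched_threshold(pixel_dissimilarity, 0.09, 0.27)
print(threshold, classify_maize(pixel_dissimilarity, threshold))
```

Because the threshold is tied to the census area, the total mapped area in a province agrees with the statistics by construction, so the independent checks are the locations of the mapped pixels and the county-level areas.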

Accuracy Assessment.
The accuracy of the maize map was examined using both field samples and county-level agricultural census data. Using 50% of the field samples, which were set aside and not included in the calculation of the standard curve, three accuracy indexes were calculated: producer's accuracy (PA), user's accuracy (UA), and overall accuracy (OA). PA indicates the percentage of investigated maize samples correctly identified as maize; UA indicates the percentage of maize on the classification map that is confirmed by the field surveys. OA quantifies the overall effectiveness of the method and is calculated as the percentage of correctly identified samples.
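The three accuracy indexes defined above can be computed from paired reference and mapped labels, as in the following minimal sketch. The function name and the example labels are hypothetical, and the two-class (maize versus nonmaize) setting is assumed.

```python
def accuracy_indexes(reference, predicted):
    """Return (PA, UA, OA) for maize, given boolean labels (True = maize)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # maize mapped as maize
    fn = sum(1 for r, p in pairs if r and not p)      # maize missed by the map
    fp = sum(1 for r, p in pairs if not r and p)      # nonmaize mapped as maize
    tn = sum(1 for r, p in pairs if not r and not p)  # nonmaize mapped as nonmaize
    pa = tp / (tp + fn) if (tp + fn) else float("nan")
    ua = tp / (tp + fp) if (tp + fp) else float("nan")
    oa = (tp + tn) / len(pairs) if pairs else float("nan")
    return pa, ua, oa

# Hypothetical validation samples: 4 maize and 6 nonmaize.
reference = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
print(accuracy_indexes(reference, predicted))
```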
The areas of identified maize were also compared with the statistical areas at the county level. We calculated the coefficient of determination (R²), the slope of the regression line between identified and statistical areas, and the relative mean absolute error (RMAE) for the 12 provinces for which county-level statistical data were available. The RMAE is computed from SA_i and IA_i, the statistical area and identified area of the ith county, where n is the number of counties in a given province.
The study used data from 7.7 billion 30 m pixels covering the entire maize growing season (May to October for the northern provinces and March to October for the southern provinces) during 2016-2020. Figure 3 shows the half-monthly frequency of cloud-free images during the growing season for each pixel. Data filtering and gap-filling were conducted as follows. Linear interpolation based on adjacent observations was used to fill missing data so that all pixels in the study area have NDVI series of the same length. Then, a Savitzky-Golay filter, which can capture the seasonal cycles of vegetation greenness, was used to build a smoothed time series [38]. The FROM-GLC product was used to exclude noncultivated areas [39].
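The gap-filling and smoothing steps can be illustrated with a small sketch. The window length and polynomial order of the Savitzky-Golay filter are not stated here, so a 5-point quadratic filter with the standard coefficients (-3, 12, 17, 12, -3)/35 is assumed; the function names and the example series are hypothetical, and the two end points on each side are left unsmoothed for simplicity.

```python
def fill_gaps_linear(values):
    """Fill None entries by linear interpolation between the nearest valid
    neighbours; leading or trailing gaps copy the nearest valid value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("time series has no valid observation")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

# Assumed filter: 5-point quadratic Savitzky-Golay smoothing coefficients.
SG_COEFFS = (-3, 12, 17, 12, -3)

def savgol_smooth_5(values):
    """Apply the 5-point quadratic Savitzky-Golay filter to interior points."""
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / 35.0
    return smoothed

# Hypothetical half-monthly NDVI composites with two cloudy periods (None).
ndvi = [0.20, None, 0.32, 0.48, None, 0.78, 0.85, 0.80, 0.60, 0.35]
print(savgol_smooth_5(fill_gaps_linear(ndvi)))
```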

Field Data and Agricultural Census Data.
This study used field investigations and census data to examine the accuracy of the identified planting areas of maize. First, field surveys were conducted at 600 sites over 11 provinces in 2019 (Figure 1); the investigated sites were selected randomly and were relatively homogeneously distributed. An unmanned aerial vehicle (UAV; eBee, senseFly Ltd., Switzerland) was used to take pictures of the field sites. At each field site, the UAV took images covering about 0.1 km², providing field samples covering about 6100 pixels (30 m × 30 m each), of which 5200 pixels were maize and 900 were nonmaize. Second, very high-resolution Google Earth images from 2019 were visually interpreted to select large maize fields, yielding 18800 field samples, of which 6366 were maize and 12434 were other crops, natural vegetation types, and built-up areas. Third, county-level and province-level census data, obtained from the National Bureau of Statistics of China (NBS, http://data.stats.gov.cn/), were used for the accuracy assessment of the maize map. This study collected province-level census data for all 22 investigated provinces and county-level census data for 12 provinces covering a total of 1096 counties.

Results
This study produced distribution maps of maize over 22 provinces in China for 2016-2020 (Figure 4). Over the five years, the planting area of maize in China remained largely stable, decreasing slightly from 4.4 million ha to 4.1 million ha. Northeast and North China are the two major maize planting regions, and maize was planted more frequently in these regions (Figure 4). On average, the overall identification accuracy is 79.13% over all 22 provinces, and the average user's and producer's accuracies are 81.59% and 76.15%, respectively (Table 1). The highest overall accuracy is found in Liaoning Province (91.62%), and the lowest in Anhui Province (66.81%) (Table 1). User's and producer's accuracies differed considerably among provinces. For example, in several provinces of South China (i.e., Anhui, Jiangsu, Chongqing, and Guangxi), the user's and producer's accuracies are low (Table 1). Figure 5 shows zoomed-in images of two sites in Inner Mongolia and Shandong, with a UAV picture and observed samples of maize and other crops. The UAV picture was taken in August, and most of the area in the picture is planted with maize. The TWDTW method captures the detailed distribution of maize at these local sites. County-level census data indicated good performance of the proposed method in reproducing the spatial variation of the planting areas of maize (Figure 6). Over all counties, the mapped and statistical areas were strongly correlated and concentrated around the 1:1 line in all five years. The slopes of the regression lines between the mapped and statistical areas ranged from 0.85 to 0.99, and the R² values were larger than 0.7 (Figure 6).
For the 12 provinces with county-level statistical data, the R² between the mapped and statistical areas, averaged over the five years, ranged from 0.40 to 0.88 (Figure 7). The slope of the regression between the mapped and statistical areas, averaged over the five years, varied from 0.61 to 1.06, indicating good performance in reproducing the spatial variations of the planting areas of maize. The RMAE averaged over the five years, which indicates the relative identification error, ranged from 0.25 to 0.47. Several major producing provinces, such as Shandong, Henan, and Jilin, showed low RMAE, while large identification errors were found in South China (i.e., Anhui, Jiangsu, Hubei) and in mountainous regions (i.e., Shaanxi, Ningxia). The low accuracy in South China is mainly due to the low quality of satellite data during cloudy and rainy periods (Figure 3). Southern China is strongly affected by the East Asian summer monsoon, which brings long cloudy and rainy periods during the maize growing season. The low accuracy in mountainous regions is mainly due to the fragmentation of cropland. For example, mountainous areas make up more than 90% of Guizhou Province; therefore, the cropland there is very narrow and fragmented. In addition, the RMAE of 2018 was the lowest (i.e., 0.30) among the five years, mainly because that year had a higher percentage of good satellite observations (Figure 3).
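The county-level agreement statistics reported above (R², slope, and RMAE) can be sketched as follows. Two points are assumptions rather than statements of the study's exact procedure: the regression takes the statistical area as the independent variable, and RMAE is taken as the mean absolute difference between identified and statistical areas divided by the mean statistical area. The function name and the example areas are hypothetical.

```python
def county_agreement(statistical, identified):
    """Return (slope, r2, rmae) comparing identified with statistical county areas.
    Assumptions: identified area is regressed on statistical area, and
    RMAE = mean(|SA_i - IA_i|) / mean(SA_i)."""
    n = len(statistical)
    mean_sa = sum(statistical) / n
    mean_ia = sum(identified) / n
    sxx = sum((sa - mean_sa) ** 2 for sa in statistical)
    syy = sum((ia - mean_ia) ** 2 for ia in identified)
    sxy = sum((sa - mean_sa) * (ia - mean_ia)
              for sa, ia in zip(statistical, identified))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    mean_abs_error = sum(abs(sa - ia) for sa, ia in zip(statistical, identified)) / n
    rmae = mean_abs_error / mean_sa
    return slope, r2, rmae

# Hypothetical county areas (in thousand ha) for one province.
statistical_area = [10.0, 20.0, 30.0, 40.0]
identified_area = [9.0, 18.0, 27.0, 36.0]
print(county_agreement(statistical_area, identified_area))
```

In this toy case the map underestimates every county by 10%, so the slope is 0.9 and the RMAE is 0.1 even though R² equals 1, which shows why the three statistics are reported together.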

Discussion
Although maize is one of the three main staple crops, its planting area has changed substantially over recent decades. According to census data, the share of maize in the planting area in China increased continuously during the past six decades, from 11.51% in 1960 to 32.93% in 2014 [40], making maize the largest grain crop. The share of maize yield also kept increasing and reached 35.53% in 2014, while the shares of wheat and other crops gradually decreased [40]. Several efforts have been made to generate distribution maps of maize at the provincial and national scales at 250 m and 1 km resolution [9,32]. However, in China, the accuracy of such moderate-resolution maps may be limited by the large uncertainties caused by mixed pixels [35,41].
Compared with machine learning methods, the TWDTW method requires only a small volume of field samples to determine the seasonal curves of the vegetation index and can easily and rapidly update the annual maize maps [42]. In contrast, machine learning methods depend heavily on the quantity and quality of training samples, whose collection is labor-intensive and time-consuming [18]. Some studies have shown that the accuracy of machine learning methods drops when a small volume of training data is used [16,17]. More importantly, this method used a single set of standard maize curves to identify maize in other years; it can therefore identify maize with a small volume of training data and can be applied to other years [29]. A previous study showed good performance of the TWDTW method in identifying winter wheat, performing even better than machine learning methods [24]. This study confirmed the robustness of TWDTW for distinguishing maize from other summer crop types. The proposed method performed very well in identifying the planting locations of maize according to the comparison with field surveys and census data over all 22 provinces.
Although this study generally showed good performance in identifying maize, there were large identification errors in 2016. The low identification accuracy in 2016 probably resulted mostly from relatively poor satellite data quality [43,44] (Figure 8). A recent study also found that the number of cloud-free satellite images largely determines how well the seasonal change of a vegetation index can be retrieved, and thus affects the map accuracy [24]. This study created half-month maximum NDVI composites from multitemporal images of Landsat 7/8 and Sentinel-2 to ensure as many good satellite observations as possible. However, the availability of good-quality images differed greatly from 2016 to 2020 (Figure 3).
Although linear interpolation was used in this study to fill the gaps left by missing images, only part of the phenological information can be restored correctly. Therefore, the lack of good observations still seriously affects identification accuracy. The complete Sentinel-2 constellation consists of two satellites (A and B). Because Sentinel-2B was not launched until March 2017, its images were unavailable in 2016, which gave 2016 the poorest image quality and resulted in low identification accuracy in that year (Figure 6). In addition, the effects of data quality were also found in several cloudy provinces. Acquiring completely cloud-free images remains a challenge [45], which may be an important factor in the low identification accuracy in these provinces.
In addition, planting habits also affect the classification accuracy. According to planting habits, maize planted in China can be divided into two types: spring maize and summer maize. Summer maize is planted after the harvest of a winter crop, e.g., winter wheat, and winter crops are planted immediately after the maize harvest. Therefore, the growing season of summer maize is relatively short. This planting pattern is mainly found in Hebei, Tianjin, Shandong, Henan, Anhui, and Jiangsu. In other provinces, where spring maize is planted, the growing season of maize is longer because there is no need to rush to plant winter crops. The standard NDVI curve used in this study was limited to the growing season, and the length of the standard curve varies among provinces, which reflects the planting patterns of different provinces (Figure 2). However, a few provinces have both planting patterns. For example, most of Hebei Province plants summer maize, while a few areas in the north of the province plant spring maize because of climatic conditions. For these provinces, using only one standard curve in the identification also affects the accuracy.
This study examined the performance of the method in reproducing the spatial variations of the planting areas of maize. It should be noted that reproducing the temporal variability of the planting area is also important. Long-term statistical data are one of the most important datasets for examining how well temporal variations are reproduced. In addition, detecting crop types over multiple years at the same location is another important way of examining the method's performance. With field data from the same locations over multiple years, the performance in reproducing interannual variability could be evaluated from the accuracy of detecting unchanged and changed crop types.
In addition, as a summer crop, maize has phenophases similar to those of other summer crops, e.g., soybean, peanut, and rice, which substantially increases the difficulty of distinguishing maize from other summer crops. In most southern provinces, rice is the more important summer crop, and the accuracy in these regions is lower than that in the northern provinces. A previous study that used a 250 m MODIS dataset to identify crop distribution also found that the identification accuracy of maize was lower than that of paddy rice and winter wheat [32]. Therefore, the time series of only one index is not enough to completely distinguish maize from other summer crops. Further studies should investigate the differences in vegetation indices or surface reflectance between maize and other summer crops to improve the classification accuracy. Several comparison studies of various crop types showed distinct differences in the reflectance of maize compared with other major crops [46][47][48], which may be exploited to distinguish maize from other summer crops.

Conclusion
The aim of this study was to map the distribution of maize at 30 m spatial resolution in China using the time-weighted dynamic time warping method. Based on Landsat and Sentinel datasets, this study mapped the distribution of maize from 2016 to 2020 over 22 provinces, covering 99% of the maize planting area in China. The identification accuracy was examined using 18800 field-surveyed samples and county-level census areas. The distribution map captured the planting locations of maize very well over all 22 provinces. However, the quality of satellite data largely determined how effectively the phenological information of maize could be retrieved, which in turn strongly affected the classification accuracy. Overall, this study produced a 30 m maize map of China. Because it needs only a small volume of field survey data, the method allows the planting areas of maize to be updated annually.

Data Availability
The distribution maps of maize in China from 2016 to 2020 are available at doi:10.6084/m9.figshare.17091653.   Journal of Remote Sensing