Perspective: Advanced Information Mining from Ocean Remote Sensing Imagery with Deep Learning

Introduction
Over the past decades, the growing number of ocean-research-oriented satellites and sensors, together with new data acquisition and distribution channels, has created new tasks and challenges for mining information from such big data, whose information content is complex and sparse. The demand for information mining from big data and advances in deep learning (DL) have reinforced each other, benefiting both practical ocean information extraction and the development of DL-based frameworks. In 2020, scientists showed that most information retrieval tasks on ocean remote sensing images could be accomplished with existing DL network frameworks, namely U-Net for semantic segmentation and SSD (Single Shot MultiBox Detector) for object detection [1]. U-Net's nearly symmetric encoder-decoder structure, together with the skip connections between encoder and decoder, performs well in retrieving fundamental semantic segmentation information from ocean remote sensing imagery, such as coastal inundation extent [2]. SSD extracts feature maps at multiple scales and assigns prior (default) boxes of different scales to them. It therefore performs well on fundamental object detection problems in the ocean field, such as ship detection [3].
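SSD's multi-scale prior boxes can be illustrated with a short sketch. It assumes the linearly spaced scale rule of the original SSD design, with scales running from `s_min = 0.2` on the finest feature map to `s_max = 0.9` on the coarsest, and aspect ratios of 1, 2, and 1/2. The feature-map sizes are hypothetical. SSD's extra box per location is omitted, and none of this is the configuration used in [3].

```python
import numpy as np

def ssd_default_boxes(fmap_sizes, s_min=0.2, s_max=0.9, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return an (N, 4) array of default boxes (cx, cy, w, h) in normalized image units."""
    m = len(fmap_sizes)
    boxes = []
    for k, f in enumerate(fmap_sizes):
        # Linearly spaced scale: small boxes on fine maps, large boxes on coarse maps
        s_k = s_min + (s_max - s_min) * k / (m - 1) if m > 1 else s_min
        for i in range(f):
            for j in range(f):
                # One box centre per feature-map cell
                cx, cy = (j + 0.5) / f, (i + 0.5) / f
                for ar in aspect_ratios:
                    boxes.append((cx, cy, s_k * np.sqrt(ar), s_k / np.sqrt(ar)))
    return np.array(boxes)

# Three hypothetical feature maps: 8x8 (fine), 4x4, 2x2 (coarse)
boxes = ssd_default_boxes([8, 4, 2])
print(boxes.shape)  # (252, 4): (64 + 16 + 4) cells x 3 aspect ratios
```

Fine feature maps thus tile the image with many small boxes suited to small targets such as ships. Coarse maps contribute a few large boxes.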
Although off-the-shelf DL models are helpful, new developments are ushering in a new era of DL-based ocean remote sensing information mining. Specifically, two developments should be incorporated into task-driven DL models: advances in network architecture and domain-knowledge-based (expert knowledge) guidance in model parameter selection. The upper panel of Figure 1 shows the general framework used in [1], together with the two newly added boxes that represent the key elements addressed in this paper.
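A minimal NumPy forward pass can illustrate the encoder, bottleneck, decoder, skip-concatenation, and output modules of a U-Net-style segmentation network. This is a sketch under stated assumptions, not the configuration of [1] or [2]:

- weights are random and untrained;
- the network has only two resolution levels;
- nearest-neighbour upsampling stands in for transposed convolution;
- regularization layers are omitted;
- the band count and image size are hypothetical.

`sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3x3_relu(x, out_ch, rng):
    """'Same'-padded 3x3 convolution with random (untrained) weights, then ReLU.
    x has shape (channels, height, width)."""
    in_ch = x.shape[0]
    w = rng.standard_normal((out_ch, in_ch, 3, 3)) / np.sqrt(9 * in_ch)
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    windows = sliding_window_view(xp, (3, 3), axis=(1, 2))  # (C, H, W, 3, 3)
    return np.maximum(np.einsum("chwij,ocij->ohw", windows, w), 0.0)

def maxpool2(x):
    """2x2 max pooling (height and width must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling (stand-in for a transposed convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(image, n_classes=2, base=8, seed=0):
    rng = np.random.default_rng(seed)
    # Encoder: convolution + pooling, repeated twice
    e1 = conv3x3_relu(image, base, rng)
    e2 = conv3x3_relu(maxpool2(e1), 2 * base, rng)
    # Bottleneck: convolution at the coarsest resolution
    b = conv3x3_relu(maxpool2(e2), 4 * base, rng)
    # Decoder: upsample, concatenate the same-size encoder map (skip connection), convolve
    d2 = conv3x3_relu(np.concatenate([upsample2(b), e2], axis=0), 2 * base, rng)
    d1 = conv3x3_relu(np.concatenate([upsample2(d2), e1], axis=0), base, rng)
    # Output: 1x1 convolution to class scores, then per-pixel softmax
    logits = np.einsum("kc,chw->khw", rng.standard_normal((n_classes, base)), d1)
    logits -= logits.max(axis=0, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    return probs.argmax(axis=0), probs

image = np.random.default_rng(1).random((3, 32, 32))  # hypothetical 3-band, 32x32 scene
mask, probs = tiny_unet(image)
print(mask.shape, probs.shape)  # (32, 32) (2, 32, 32)
```

In a trained model, `mask` would be the per-pixel class map (for example, inundated versus dry). Here it only demonstrates that the decoder and skip connections restore the full input resolution.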

Deep Network Architecture with Attention Mechanism
Ocean remote sensing images and time series resemble the image and video data streams of computer vision. This similarity strongly suggests that emerging DL architectures from computer vision can be adopted to solve critical oceanographic problems. A significant emerging trend in computer vision has been attention-based neural networks [4], which have made notable progress in classification, regression, anomaly detection, and dynamic modeling [5]. The core of an attention-based neural network is the attention function. An attention function maps a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is determined by a compatibility function between the query and the corresponding key.

There are two reasons to add the attention mechanism. First, most existing DL-based ocean models, such as the U-Net-based inundation area detection in [2], rely on convolution operations. A convolution processes a local neighborhood in either space or time, so its limited receptive field prevents it from modeling long-range pixel dependencies. Second, convolution filters have static weights at inference and cannot adapt to the input content, which makes it difficult to capture signals of fast-changing variables. The attention mechanism computes the response at a given pixel as a weighted sum over all positions, with weights that depend on the input. It therefore captures long-range dependencies in deep neural networks and overcomes both issues above.

[Figure 1. Panel labels: Existing DL framework; Detection results.

Legend:
- Convolution operation: captures local information.
- Attention operation: captures global information.
- Encoder module: convolutional and pooling layers, repeated several times, for information extraction.
- Decoder module: upsampling/transposed-convolution and convolutional layers, repeated several times, for resolution restoration.
- Bottleneck module: convolutional layers.
- Output module: convolutional, regularization, and pixel-wise classification layers.
- Concatenation module: concatenates higher-resolution feature maps with lower-resolution ones to fuse multi-scale information.
- Attention module: captures global information.]

Journal of Remote Sensing
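The attention function can be written in a few lines of NumPy. The sketch assumes the scaled dot-product compatibility function used in Transformers. The feature-map size and the random projection matrices `Wq`, `Wk`, `Wv` are illustrative stand-ins for learned weights.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention.
    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v) -> (n_queries, d_v)."""
    scores = Q @ K.T / np.sqrt(K.shape[1])  # compatibility of each query with each key
    weights = softmax(scores, axis=-1)      # one weight per value; each row sums to 1
    return weights @ V, weights             # output = weighted sum of the values

# Self-attention over the pixels of a small feature map: every pixel attends to every pixel
rng = np.random.default_rng(0)
H, W, C = 8, 8, 4
pixels = rng.standard_normal((H * W, C))  # flatten the map into H*W vectors
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))  # stand-ins for learned projections
out, weights = attention(pixels @ Wq, pixels @ Wk, pixels @ Wv)
print(out.shape, weights.shape)  # (64, 4) (64, 64)
```

Every row of `weights` assigns a nonzero, input-dependent weight to all 64 pixels. By contrast, a single 3 x 3 convolution at a pixel sees only its 9-pixel neighborhood through fixed weights.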
In recent years, the attention-based Transformer architecture has outperformed earlier convolution-based architectures in many DL tasks. Several studies have also applied the attention mechanism to ocean remote sensing image processing, such as sea ice detection [6], sea ice prediction [7], and tropical cyclone intensity estimation [8].
Ren et al. [6] integrated position and channel attention modules into an original U-Net model to form a dual-attention U-Net model (DAU-Net) for sea ice detection. DAU-Net integrates both the long-range and local-range dependencies of the SAR image, which helps it extract more discriminating feature representations for classifying sea ice and open water. Experiments showed that the dual-attention mechanism helps DAU-Net extract more discriminating features than the original U-Net.

Ren et al. [7] also proposed an attention-based, data-driven model, termed SICNet, for predicting the daily sea ice concentration (SIC) of the Pan-Arctic. In the Pan-Arctic, spatiotemporal dependencies exist at both global and local ranges. Ren et al. [7] therefore designed a temporal-spatial attention module (TSAM) to help SICNet capture accurate spatiotemporal dependencies. The TSAM employs a temporal convolutional network (TCN) as a temporal attention module to capture long-range temporal dependencies, and a spatial attention module to capture long-range spatial dependencies.

Wang et al. [8] added spatial and channel attention mechanisms to a DL model that estimates tropical cyclone (TC) intensity from satellite images. The channel attention weights indicate that the 10.4 and 12.3 μm channels play a significant role in the estimation model. The spatial attention weights show that the model focuses on areas with low brightness temperature and on the TC eye.
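The position and channel attention modules that Ren et al. [6] added to U-Net can be sketched in the style of the Dual Attention Network (DANet), which we assume here as the reference design. The sketch makes several assumptions, and DAU-Net's actual layers may differ:

- projection weights are random stand-ins for learned 1x1 convolutions;
- the residual scales `alpha` and `beta`, which are learned and zero-initialized in DANet, are fixed to 0.5 for illustration;
- the feature-map size is hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(A, rng, alpha=0.5, reduction=8):
    """Position (spatial) attention. A: (C, N) feature map flattened over N = H*W pixels."""
    C = A.shape[0]
    # Random stand-ins for the learned 1x1 convolutions producing query, key, and value maps
    Wq = rng.standard_normal((C // reduction, C)) / np.sqrt(C)
    Wk = rng.standard_normal((C // reduction, C)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q, K, V = Wq @ A, Wk @ A, Wv @ A
    S = softmax(Q.T @ K, axis=-1)  # (N, N): S[i, j] = weight of pixel j when updating pixel i
    return alpha * (V @ S.T) + A   # each pixel aggregates all pixels (long range), plus residual

def channel_attention(A, beta=0.5):
    """Channel attention. A: (C, N); affinities are computed between whole channel maps."""
    X = softmax(A @ A.T, axis=-1)  # (C, C): X[i, j] = weight of channel j when updating channel i
    return beta * (X @ A) + A      # residual connection keeps the original local features

rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
feat = rng.standard_normal((C, H * W))  # hypothetical encoder feature map, flattened
# Sum the two attention outputs to fuse spatial and channel context
fused = (position_attention(feat, rng) + channel_attention(feat)).reshape(C, H, W)
print(fused.shape)  # (16, 8, 8)
```

The residual terms keep each pixel's local features, while the attention terms add context drawn from the whole image and from all channels. This is the combination of long-range and local-range dependencies described for DAU-Net.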
The attention mechanism emphasizes the combination of global and local information, which suits oceanographic problems that require combined multiscale analysis. The middle panel of Figure 1 shows the newly proposed DL framework, with the attention mechanism as a purple box module. As the expanded view of the attention module shows, an image is divided into multiple patches, and attention is computed between every pair of patches to capture global information.
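The patch-based attention can be sketched assuming a Vision Transformer-style design. The image is split into non-overlapping patches, each patch is linearly embedded, and scaled dot-product attention is computed between every pair of patches. Patch size, embedding width, and the random projection weights are illustrative assumptions. Positional embeddings, multiple heads, and normalization layers are omitted.

```python
import numpy as np

def to_patches(img, p):
    """Split a (C, H, W) image into non-overlapping p x p patches -> (num_patches, C*p*p)."""
    c, h, w = img.shape
    x = img.reshape(c, h // p, p, w // p, p)  # (C, nh, p, nw, p)
    x = x.transpose(1, 3, 0, 2, 4)            # (nh, nw, C, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)

def patch_self_attention(img, p=8, d=16, seed=0):
    rng = np.random.default_rng(seed)
    tokens = to_patches(img, p)
    # Random linear embedding and Q/K/V projections stand in for learned weights
    E = rng.standard_normal((tokens.shape[1], d)) / np.sqrt(tokens.shape[1])
    z = tokens @ E
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = z @ Wq, z @ Wk, z @ Wv
    scores = q @ k.T / np.sqrt(d)             # (P, P): one score for every pair of patches
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

img = np.random.default_rng(1).random((2, 64, 64))  # hypothetical 2-band, 64x64 scene
out, weights = patch_self_attention(img)
print(out.shape, weights.shape)  # (64, 16) (64, 64)
```

With 8 x 8 patches, the 64 x 64 scene yields 64 tokens. The 64 x 64 weight matrix links every patch to every other patch, which is how the module captures global information.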

Incorporating Domain Science Knowledge into the DL Architecture Design
Knowledge-driven and data-driven approaches complement each other. The knowledge-driven approach builds governing equations from physical rules, and these equations are directly interpretable. The data-driven approach is based on statistics, adapts flexibly to the data, and helps detect signals that the governing equations ignore. However, the "black box" nature of DL models limits their interpretability. Fusing domain science with data science is a growing trend in solving scientific problems with DL [9]. Oceanographic research is no exception: it requires combining ocean theory with DL methods. Domain knowledge can help tackle complex ocean problems and reduce the amount of data needed for modeling. Specifically, ocean theory can reduce the degrees of freedom (dimensionality) of the input data and thus the difficulty of training DL models. Ocean domain knowledge can be divided into physical constraints and spatiotemporal data processing methods, and it can be integrated into DL models using a multibranch network structure.
Physical knowledge can help constrain model construction. For example, Liu et al. [10] used satellite sea surface height (SSH) and sea surface temperature (SST) in a dual-branch convolutional neural network with dense connections. The network simultaneously obtains the dynamic and thermal characteristics of mesoscale ocean eddies. Zhang et al. [11] addressed the small-training-dataset problem in retrieving internal solitary wave amplitudes by combining satellite observations and laboratory experiments through transfer learning. The laboratory experiments were designed following similarity laws and basic principles of fluid mechanics. Furthermore, guided by domain knowledge, the model accounted for ocean background information and internal solitary wave characteristics that affect the wave amplitude. The results show that when domain knowledge informs the design, input parameters and model structures can be chosen carefully and model performance improves.
Knowledge of spatiotemporal data processing has contributed to sea fog detection and mesoscale eddy detection. Chen et al. [12] proposed a dual-branch network for efficient sea fog detection, comprising a statistical extraction module and a dual-branch optional module. Sea fog detection is more complicated than other segmentation tasks because clouds and fog are difficult to separate. To address this, Chen et al. [12] analyzed the differences between cloud and fog based on their reflection properties. They then designed a knowledge extraction module that uses prior knowledge to extract statistical information in the visual space. By introducing domain knowledge, the method outperforms advanced semantic segmentation algorithms in sea fog detection. In particular, it can effectively detect sea fog in images containing a mixture of cloud and fog.

Mu et al. [13] developed a hurricane wind retrieval model that exploits DL's data fusion capability to mine hurricane information from synthetic aperture radar (SAR) images. The model significantly improved wind speed retrieval accuracy by simultaneously using three inputs: SAR-measured physical parameters (backscattered energy), texture features represented by the gray-level co-occurrence matrix (GLCM), and the distinctive morphological features of hurricanes. In all these cases, the domain science knowledge incorporated into the DL-based framework aids model training and fitting.
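The GLCM texture features mentioned for [13] can be computed with a short NumPy sketch. The number of gray levels, the single non-negative pixel offset, and the three statistics shown are illustrative assumptions, not the settings of [13]. `homogeneity` uses the 1 / (1 + (i - j)^2) weighting. scikit-image offers library versions as `skimage.feature.graycomatrix` and `graycoprops`.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for a non-negative offset (dx, dy)."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    scaled = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    q = np.clip(np.floor(scaled * levels), 0, levels - 1).astype(int)  # quantize to gray levels
    h, w = q.shape
    ref = q[: h - dy, : w - dx]  # reference pixels
    nbr = q[dy:, dx:]            # neighbours at offset (dy, dx)
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)  # count co-occurring gray-level pairs
    P = P + P.T                                  # make symmetric (count both directions)
    return P / P.sum()

def glcm_features(P):
    """A few common Haralick-style texture statistics of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float((P * (i - j) ** 2).sum()),
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
        "asm": float((P ** 2).sum()),  # angular second moment
    }

# A checkerboard: every horizontal neighbour pair differs by one gray level
checker = np.indices((8, 8)).sum(axis=0) % 2
print(glcm_features(glcm(checker, levels=2)))  # {'contrast': 1.0, 'homogeneity': 0.5, 'asm': 0.5}
```

Applied to SAR image windows, such statistics summarize texture in a few numbers. A DL model can then take them as extra input features instead of learning them from raw pixels.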
The above studies use domain knowledge and perform well in sea fog detection, mesoscale eddy detection, internal solitary wave retrieval, and wind field retrieval. Furthermore, these methods jointly extract discriminative features from both the visual and knowledge domains. We therefore believe that in other fields of oceanographic research, multibranch networks can also combine domain knowledge to further improve model performance. The newly proposed DL framework (Figure 1, middle panel) is a typical dual-branch network, an example of a simple multibranch structure. First, the visual branch extracts features from ocean remote sensing images. Then, features derived from expert knowledge are fed into the network through the knowledge branch. The existing framework extracts features from images alone. Compared with it, the dual-branch network provides richer features for modeling, which reduces modeling difficulty.
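The dual-branch idea can be sketched as a forward pass with random, untrained weights. This is not the authors' implementation. The following are assumptions:

- the 1x1 channel-mixing visual branch, a stand-in for a full CNN encoder;
- the branch widths;
- the four hypothetical knowledge features, such as GLCM statistics or other expert-derived quantities.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class DualBranchNet:
    """Minimal dual-branch network with random (untrained) weights.
    Visual branch: 1x1 channel mixing + ReLU + global average pooling over the image.
    Knowledge branch: one dense layer over expert-derived features.
    Fusion: concatenate both embeddings and apply a linear output head."""

    def __init__(self, n_bands, n_knowledge, d_visual=16, d_knowledge=8, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.Wv = rng.standard_normal((d_visual, n_bands)) / np.sqrt(n_bands)
        self.Wk = rng.standard_normal((d_knowledge, n_knowledge)) / np.sqrt(n_knowledge)
        d_fused = d_visual + d_knowledge
        self.Wh = rng.standard_normal((n_out, d_fused)) / np.sqrt(d_fused)

    def forward(self, image, knowledge):
        # Visual branch: per-pixel features from the image bands, pooled to one vector
        v = relu(np.einsum("dc,chw->dhw", self.Wv, image)).mean(axis=(1, 2))
        # Knowledge branch: features supplied by expert knowledge
        k = relu(self.Wk @ knowledge)
        # Fusion: concatenate the two embeddings, then a linear output layer
        return self.Wh @ np.concatenate([v, k])

rng = np.random.default_rng(1)
image = rng.random((3, 32, 32))  # hypothetical 3-band scene
knowledge = rng.random(4)        # hypothetical expert-derived feature vector
net = DualBranchNet(n_bands=3, n_knowledge=4)
out = net.forward(image, knowledge)
print(out.shape)  # (2,)
```

Because the knowledge vector enters through its own branch, the network does not need to rediscover expert-derived quantities from pixels. This is the sense in which the second branch can reduce modeling difficulty.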
This article argues that advances in network architecture and domain-knowledge-based (expert knowledge) guidance should be incorporated into task-driven ocean remote sensing imagery processing. The attention mechanism emphasizes the combination of global and local information. Ocean domain knowledge provides informative input features for DL models and reduces the degrees of freedom of the input data.

Data Availability
There is no data associated with this article.