Unveiling Dynamic System Strategies for Multisensory Processing: From Neuronal Fixed-Criterion Integration to Population Bayesian Inference

Multisensory processing is of vital importance for survival in the external world. Brain circuits can both integrate and separate visual and vestibular senses to infer self-motion and the motion of other objects. However, it is largely debated how multisensory brain regions process such multisensory information and whether they follow the Bayesian strategy in this process. Here, we combined macaque physiological recordings in the dorsal medial superior temporal area (MST-d) with modeling of synaptically coupled multilayer continuous attractor neural networks (CANNs) to study the underlying neuronal circuit mechanisms. In contrast to previous theoretical studies that focused on unisensory direction preference, our analysis showed that synaptic coupling induced cooperation and competition in the multisensory circuit and caused single MST-d neurons to switch between sensory integration or separation modes based on the fixed-criterion causal strategy, which is determined by the synaptic coupling strength. Furthermore, the prior of sensory reliability was represented by pooling diversified criteria at the MST-d population level, and the Bayesian strategy was achieved in downstream neurons whose causal inference flexibly changed with the prior. The CANN model also showed that synaptic input balance is the dynamic origin of neuronal direction preference formation and further explained the misalignment between direction preference and inference observed in previous studies. This work provides a computational framework for a new brain-inspired algorithm underlying multisensory computation.


Introduction
The primate brain frequently combines multisensory information from different sensory modalities, such as information of visual, vestibular, auditory, and haptic origin, to improve the perception of the external world. Both visual and vestibular information is valuable for the multisensory cortex to infer self-motion and object motion direction accurately in real time. Previous experimental studies [1,2] showed that the macaque dorsal medial superior temporal region (MST-d) contains neurons responsible for multisensory encoding, e.g., vestibular and visual motion cues. Exper-imental studies observed that some MST-d neurons respond preferably to vestibular and visual motion in the same direction (called "congruent" neurons), while others prefer opposing directions (called "opposite" neurons) [1][2][3][4]. Recent theoretical studies suggested that congruent neurons mainly implement cue integration, while opposite neurons mainly perform segregation. The responses of the congruent and opposite neurons are critical for the animal to make inference about whether information from the visual and vestibular senses are attributed to a common source or two separated ones. However, the mechanism for the origin of congruent or opposite neurons in MST circuit is rarely studied.
Since sensory signals vary across modalities and conditions, Ernst and Banks proposed a general principle that the brain determines the degree to which a sense dominates the flow of information based on its reliability, which is defined as the variance of the sensory estimate [14]. Further works demonstrated that human and monkey subjects adjusted the weight of each sensor based on cue reliability (stimulus motion coherence) and made the decisions about motion directions in a near-optimal way [15][16][17][18]. Evidence from experiments suggested that multisensory cortical neurons, e.g., MST-d (dorsal medial superior temporal), could represent the weighing of cue reliability by the neural response [3], comprising a link between single-neuron activity and behavior. Combining the theories and behavioral data, many works assumed that the neural system performs causal inference in Bayesian approach [19][20][21], which redefines the inference problem as an assessment of the posterior probability of integration based on the measurement distribution of each cue. The Bayesian causal inference (BCI) model managed to explain the experimental findings that spatially concurrent visual and vestibular inputs improved direction discrimination performance [3], which is mainly attributed to congruent neurons [4], while spatially confined inputs are canceled out by opposing neurons [22,23]. Furthermore, by fitting the distribution width of sensory measurements, Bayesian computation captures the varying uncertainty that originates from both stimuli contrast and physiological noise [24]. Nonetheless, the BCI model does not explain the biophysical computation principle in neural systems due to the limit of its mathematical form. It remains to be solved how the Bayesian approach is achieved physically by neural systems, especially how the prior and posterior probability is represented by neuronal firing.
In this study, we seek to combine both physiological recordings and computational models and explore two key scientific questions: (1) How are visual and vestibular signals integrated and separated by neuronal circuits? (2) How do multisensory computing algorithms emerge from hierarchical cortical circuits for behavioral-level inference decisions? In contrast to previous works that attribute the inference to the interaction between the multisensory areas [13], this study first focuses on the multisensory computation of each MST-d neuron, which plays a more fundamental role. By investigating the physiological data from monkey MST-d neurons, we found that neuronal direction preference in the MST-d is correlated with relative synaptic input strength between the visual and vestibular inputs, which may entail multisensory computation. We further built a multipartite cortical circuit model that is composed of three continuous attractor neural networks (CANNs) [25]. The circuit consists of three ring attractor networks, mimicking input transmission from the unisensory visual and vestibular regions to the MST-d along the cortical hierarchy. We demonstrate by this model that MST-d neurons naturally compose the coding bases for integration or separation by various synaptic coupling strengths between the inputs. The change of direc-tion preference as observed in the data is an explicit form of the nonlinear coupling dynamic.
Next, we went a further step and applied this computational principle to hypothetical decisions about whether the inputs should be integrated. We revealed that individual MST-d neurons implement a fixed-criterion strategy, which makes decisions with deterministic boundaries. However, by pooling MST-d neurons with different synaptic coupling strengths, the MST-d population implements Bayesian inference that leads to distinct decisions based on cue reliability, constituting a bioplausible process for multisensory inference.

Analysis of Physiological Data.
We began with physiological recordings. We first characterized MST-d neurons with their tuning response functions to the input senses. The MST region is not only a crucial multisensory region (responding to both visual and vestibular inputs [1,26]) but also correlates with perception at the behavioral level [4,27,28]. Specifically, the dorsal part of the MST has a large receptive field that can respond to translational motion signals [29], which is well suited for detecting self-motion and object motion features in the horizontal plane.
In our experiments, monkeys were seated on a motion platform attached to a screen (Figure 1(a)). Without reporting, the subjects perceived visual motion through the optical flow presented on the screen or/and vestibular motion through the translation of the platform in the horizontal plane, comprising unisensory or multisensory conditions. Both visual and vestibular motion cues are designed to represent the same velocity and acceleration from one of 8 directions with 45°intervals. The multisensory condition contains 64 combinations of visual and vestibular input directions (Figure 1(b)), while the unisensory conditions contain 8 directions of each input. MST-d neuronal activities were recorded by a single-unit technique during the delivery of visual and/or vestibular stimuli. Figure 1(c) shows the response function of an example MST-d neuron in the unisensory condition (side panels) or multisensory condition (middle panel). Note that the unisensory condition means that either visual or vestibular cues are presented without the other cue (visual-only or vestibular-only), and the unisensory response is a function of each cue direction (θ).
where θ vis and θ ves represent the visual and vestibular cue directions, f vis and f ves are the neuronal spatial tuning response functions, and R max vis and R max ves are neuronal maximal responses to either visual or vestibular cues, respectively, across all directions. We categorized each MST-d neuron by the balanced or imbalanced response level based on the ratio (r) By definition, r ≥ 1. To make it simple, we defined a neuron as balanced neuron when 1 ≤ r ≤ 1:7 and as imbalanced when r ≥ 1:7 (the criteria r = 1:7 are explained later). In the data, 70 of 115 MST-d neurons were identified as balanced neurons, characterized by relatively balanced synaptic inputs (Figure 2(a) top panel), and the average max ðR max vis , R max ves Þ was 1.28 times the value of min ðR max vis , R max ves Þ for neurons in this group. On the other hand, 45 of 115 neurons were identified as imbalanced neurons, characterized by the dominance of one input over the other (Figure 2(a) bottom panel). The value of max ðR max vis , R max ves Þ in this group was 2.35 times that of min ðR max vis , R max ves Þ on average. Since the cues were applied with the same reliability (velocity and acceleration), the ratio r indeed measures the degree of contribution from one sense over the other in each neuron. It was proved that response amplitudes encode the input reliability that is linked to the signal-tonoise ratio [3]; thus, r is a neuronal precoded bias property of cue reliability, which is independent of real-time stimuli.
Following the classification, the two encoding bases contained distinct tuning weights in unisensory condition. To specify the encoding properties in multisensory condition where both cues are presented, we examined the neural response R mul that is a function of both cue directions θ vis and θ ves (Figure 1(c) middle panel) where f mul is the multisensory tuning curve function. R max mul = max ðR mul Þ on the response contour is appointed to the maximal neuronal response to a specific pair of visual and vestibular directions (denoted as θ pref vis,mul and θ pref ves,mul ). We defined the spatial disparity between the two directions as a multisensory-preferred disparity.
Similarly, the unisensory-preferred disparity was defined by the spatial disparity between the preferred directions of  Figure 1: Experimental protocol and multisensory preference schematics. (a) Schematic of the physiological experiment. The screen was placed on a platform with 6 degrees of motion. Visual motion stimuli were simulated by the movement of a dot cloud, and vestibular motion stimuli were simulated by the movement of the platform. (b) Systematic table of multisensory stimuli. Each stimulus took 8 discretized movement directions spaced by 45°; thus, 64 neural responses were recorded for each neuron under the multisensory condition. (c) Neuronal responses in unisensory (side panels) and multisensory (middle panel) conditions. Multisensory preference was identified from the maximum response (R max multi ) in the grid of 64 responses that specifies a joint visual and vestibular preferred direction. Multisensory tuning curves were identified by fixing the direction of one modality at the preferred value while varying the other (multisensory visual curve: solid red line, multisensory vestibular curve: dashed red line). Both multisensory tuning curves intersect at the maximal response R max multi .
Both Δθ uni and Δθ mul range from 0°to 180°. In contrast to Δθ mul , Δθ uni denotes the neuronal preference in the absence of multisensory interaction. As a result, Δθ uni is interpreted as the synaptic input disparity to each neuron, which represents the topological distance between the inherent preferences of two independent cues after synaptic learning.   . As a result, Δθ mul is usually different from Δθ uni in each neuron, which revealed the effect of multisensory operation on the encoding dynamics. Intuitively, Δθ mul is interpreted as the disparity between the preferred directions of two sensory cues. The top panels of Figures 2(b) and 2(c) present a typical balanced neuron that the disparity between the preferred directions increased when switching from the unisensory to multisensory condition. The bottom panels of Figures 2(b) and 2(c) present an example imbalanced neuron in which the preferred disparity exhibited a decrement. The disparity preference change in multisensory condition resulted from the shifted peak of tuning curves (multisensory tuning curves are derived from red lines in Figure 1(c)). The neural mechanism will be explained by a computational neural network in the next section.
To specify the disparity preference at the population level, we investigated the probability distribution of unisensory ( Figure 2(d)) and multisensory preferences (Figure 2(e)) in both balanced (left panels) and imbalanced neurons (right panel). The unisensory condition was characterized by polarized distributions. Balanced neurons generally preferred small Δθ uni values, while imbalanced neurons generally preferred large Δθ uni values. The summed population of MST-d thus had a bipolar distribution that parallels previous report [1]. Under multisensory conditions, imbalanced neurons presented a major transition to preferring a small disparity between both cues. On the other hand, balanced neurons featured a uniform-distributed preference. We further specified the relationship between Δθ uni and Δθ mul , as shown in Figure 2(f). It is clear that balanced neurons had a stronger correlation between the two disparities. A small Δθ uni generally led to a small Δθ mul and vice versa. In contrast, imbalanced neurons had a weaker correlation, and Δθ mul was generally biased to 0°regardless of Δθ uni .
Based on the results above, we hypothesize that imbalanced neurons may serve as a sensory integration encoding basis because (1) it is commonly acknowledged that animals are inclined to integrate stimuli with small spatial disparity [20], in which imbalanced neurons are more likely to respond strongly given the dominance of the small Δθ mul value. (2) If one cue is unreliable, the subject tends to integrate the cues by giving larger weights to the more reliable one [14], which matches the precoded reliability bias of imbalanced neurons. Accordingly, we postulated that balanced neurons serve as a separation encoding basis in neural circuits since they are the counterparts of each other. In short, the balanced and imbalanced neurons have the potential to encode the spatial disparity of visual and vestibular cues in a reliability-based manner. We assumed that the Δθ mul distributions are identical in all directions; thus, only disparity coding was considered in this study, while specific directions were omitted. Next, we aimed to prove this hypothesis by a computational model that the response ratio, for either balanced or imbalanced neurons, is the dynamic origin that determines whether the MST-d neuron is a multisensory integration or separation encoder.

Continuous Attractor Neural Network (CANN)
Modeling. To test our hypothesis that the response ratio for balanced or imbalanced neurons can determine neuronal encoding function, we constructed a multipartite cortical circuit model as a continuous attractor neural network (CANN, Figure 3(a); modified from [25], see Methods for details). The model simulated hierarchical sensory processing composed of three neural networks, two of which were the unisensory middle temporal region (MT) and parietalinsular vestibular cortex (PIVC). We assumed the third network to be a multisensory subnetwork that received the population outputs from the MT and PIVC regions and further determined the multisensory preference of MST-d neurons downstream. In other words, the CANN model simulated multisensory preference formation in specific synaptic input conditions; thus, it is independent of real stimuli.

Model Sketches.
Each network is composed of 180 neurons, whose positions denote preferences and are topologically aligned across networks. Neuronal dynamics are featured by a rate-based model in which activity ranges from 0 to 1 [25]. The inputs to MT and PIVC are visual and vestibular signals, respectively, and were simulated by a Gaussian function that centers at position θ vis or θ ves and has a wide range (Figure 3(b) top panel). Once the inputs were received, the neurons in two unisensory layers showed group response "bumps" mainly due to the lateral connections in a Mexican hat shape (Figure 3(b) bottom panel), and the center of the bump was usually distorted from θ vis or θ ves by neuronal intrinsic noise.
Then, the response bumps were sent forward to the multisensory subnetwork. Notably, the response ratio in the data reflects the synaptic property because the inputs were applied with the same motion intensity. It is well acknowledged that synaptic input is the product of synaptic weight and the input firing rate. Since the input firing rate is normalized to 1, we assumed that the synaptic input is proportional to the synaptic weight; thus, the response ratio is interpreted as the synaptic weight ratio (more concisely referred to as the synaptic ratio).
Therefore, balanced and imbalanced neurons were simulated by adjusting the proportion of the forward connection weights from the MT and PIVC to the subnetwork. The simulations were repeated 1000 times, and the occurrence of the weight ratio value followed the balanced (Figure 3 When the forward inputs from the MT and PIVC arrived at the subnetwork, they still carried the distorted displacement between θ vis and θ ves due to the topological alignment setting. The neurons in the subnetwork responded to the inputs and formed bumps due to lateral connections (with the same parameters as unisensory layers), and the dynamics of the subnetwork are characterized by bifurcation states. It is obvious that when the inputs are close in distance, the neurons are prone to form a common response 5 Research bump and cooperate (Figure 3(d)). In contrast, when the inputs are distant, the neurons are prone to form two independent bumps and compete (Figure 3(e)). Due to noise interference, both states occur by chance given fixed θ vis and θ ves ; thus, the final bump pattern is bifurcated. When neuronal responses are saturated in rate-based dynamic, both states are stable due to the countereffect of excitatory and inhibitory components in lateral connections, and the bumps do not collapse unless the inputs are removed.  Figure S1). Group cooperation indeed results in sensory integration because the visual and vestibular inputs share the same receptive field. In this manner, the response of MST-d neurons indicates that they originate from the same source. In contrast, group competition results in sensory separation because the MST-d response represents visual and vestibular inputs with distinct receptive fields and thus distinct sources. The integration and separation are robustly encoded by measurable neuronal responses. Consistent with data analysis, we denoted the distance jθ vis − θ ves j as unisensory disparity Δθ uni , which is a hyperparameter in 6 Research CANN simulations. Since MST-d neuronal preference is characterized by the subnetwork, the distance between the responding bumps on the subnetwork is denoted as Δθ mul . When only a common bump exists, Δθ mul is defined as 0°.

Simulation Results.
By introducing balanced and imbalanced forward ratios (Figure 4(a)), we first investigated the dynamic property in the time domain. Inputs were applied at t = 0 ms and maintained until the end of the trial (t = 3000 ms). Figure 4(b) shows that Δθ mul usually became stable after 1500 ms (arbitrary unit). In the balanced group, the majority of simulated neurons (658/1000 ≈ 66%) presented two independent response regions in the subnetwork, and the corresponding Δθ mul ranged from 60°to 180°in the end (Figure 4(b) top panel). However, in the imbalanced group, the majority (664/1000 ≈ 66%) had one common response region in the subnetwork, where Δθ mul = 0°in the end ( Figure 4(b) bottom panel). Consistent with the data analysis, we chose the time window of 500 ms (t 1 ) to 1500 ms (t 2 ) and computed the mean Δθ mul in this window. Figure 4(c) shows that when MST-d neurons encode integration by a common response region, the mean Δθ mul does not necessarily decrease to 0°in the time window, especially for neurons in the imbalanced group. This validated that integration is encoded not only by neurons with Δθ mul = 0°b ut also by neurons with Δθ multi approximately 45°. Conversely, separation is encoded by balanced neurons with Δθ mul ∈ ½60°, 180° and imbalanced neurons with Δθ mul ∈ ½120°, 180°.
Next, we correlated the encoding functions from CANN simulations to physiological data. As mentioned above, experiments involved a 45°range for each recorded direction; thus, the neurons with Δθ mul implied the range ½Δθ mul − 22:5°,Δθ mul + 22:5°. We concluded that balanced neurons with Δθ mul ∈ f90°, 135°, 180°g encode separation, which includes a spatial range ½67:5°, 180° that is close to CANN predictions. On the contrary, balanced neurons with Δθ mul ∈ f0°, 45°g encode integration. For imbalanced neurons, those with Δθ mul ∈ f135°, 180°g encode separation, while those with Δθ mul ∈ f0°, 45°g encode integration. Imbalanced neurons with Δθ mul = 90°are reasonably omitted because they are both rare in CANN prediction and data ( Figure 4(d) bottom panel, the 4 neurons with Δθ uni = 90°a nd Δθ mul = 90°were found to correlate more with balanced neurons with mean r = 1:84, which is significantly smaller than that of other imbalanced neurons, p = 0:021, n 1 = 11, n 2 = 34, two-sample t-test). Figure 4(e) demonstrates the integration-encoding probability (p int ) of MST-d neurons as a function of Δθ pref uni (hereafter briefed as integration function). p int denotes n int /ðn int + n sep Þ at each Δθ pref uni , where n int is the integration-encoding neuron count and n sep is the separation neuron count in the balanced or imbalanced group at each Δθ uni . In the CANN model, p int denotes n′ int /ðn′ int + n ′ sep Þ, where n ′ int is the number of trials that form one response region in the end and n′ sep forms two response regions in the end. The total trial count (n ′ int + n ′ sep ) is 1000 at each Δθ uni . The CANN model fitted well with the experimental data. Both the data and simulations showed that the integration-encoding probability decreased with increasing synaptic disparity (Δθ uni ). Accordingly, the separation-encoding probability increased. In general, the balanced neurons were biased to encode separation, while imbalanced neurons were biased to encode integration. In conclusion, the model simulated hierarchical processing from unisensory to multisensory regions and validated that MST-d neurons receiving balanced synaptic inputs generally encode sensory separation, while those receiving imbalanced synaptic inputs generally encode sensory integration. Jointly, the functional distinction enables MST-d balanced and imbalanced neurons to be effective bases for multisensory encoding. This proved our early hypothesis raised from experimental analysis that the balance level of synaptic coupling strengths from MT and PIVC input to MST-d neurons may be the key mechanism in driving individual MST-d neurons to be congruent neurons for integration or opposite neurons for segregation by a selforganization process.

Dynamic Modulations from the Synaptic Ratio and
Intrinsic Noise. We further investigated the role of neuronal intrinsic noise by altering noise intensities in the CANN model. Different noise intensities were simulated by Gaussian noise with 0 means and different standard deviations (σ noise ). In each noise condition, we simulated the neuronal integration probability p int with synaptic ratios of 1.0, 1.8, 2.2, and 2.6. Surprisingly, we found that neuronal intrinsic noise not only determined how distinct the encoding functions were but also altered the effective encoding bases themselves.
We first simulated a noise-free condition (σ noise = 0, Figure 5(a)). In this condition, it was obvious that the model MST-d neurons encoded the inputs based on a fixed boundary (criterion) [30], and the boundary increased with the synaptic ratio (the criterion approximately 70°, 90°, 130°, and 150°as the ratio increased from 1 to 2.6). If the synaptic disparity range was in the boundary, the neuron robustly encoded integration; otherwise, it encoded separation. At the computational level, it is commonly acknowledged that effective encoding requires distinct neuronal responses. In this condition, effective encoding obviously relies on neurons preferring congruent or opposite stimuli, which is the congruent or opposite neurons in [1,13]. When real inputs have a low degree of disparity, congruent neurons respond actively to infer integration, while opposite neurons remain inactive, and the individual can infer integration based on the response of congruent neurons and vice versa. Without loss of generality, we only discuss the encoding of disparity, while absolute direction is omitted.
With increasing noise levels ( Figure 5(b)), the integration functions of different ratios became increasingly distinct Sep (66%)  Research from each other across the range of Δθ uni values, and the boundary in each function was blurred. At the noise level that fitted the data best, effective encoding was produced by balanced and imbalanced groups as computational bases ( Figure 5(c)), which was the case we demonstrated above. In this case, integration was effectively inferred by a higher group response from imbalanced neurons and separation from balanced neurons. In other words, neuronal intrinsic noise transferred the encoding bases from congruent and opposite neurons to balanced and imbalanced neurons, where the synaptic ratio played a dominant role to discriminate the functions. The encoding of congruent and opposite neurons became less effective because their responses were usually similar under noisy conditions, shown as flatter integration functions across the range of Δθ pref uni values. At a high noise level (σ noise = 15, Figure 5(d)), all integration functions approximated 50% since randomization was dominant, and the modulation from the synaptic ratio also deteriorated. Thus, neither encoding by congruent and opposite neurons nor encoding by balanced and imbalanced neurons was efficient.
These results suggested that the synaptic ratio and intrinsic noise are both keys to neuronal multisensory com-putation. To quantify the modulation of the synaptic ratio, the integration functions were averaged as p int at the bestfit noise level ( p int = ∑ n i=1 p int ,Δθ i uni /n). With the increasing synaptic ratio, MST-d neurons were automatically classified as integration or separation encoders by a threshold of 1.8 ( Figure 5(e)). When the ratio was less than 1.8, neurons generally served as separation encoders on average. When the ratio was greater than 1.8, they generally became integration encoders. Thus, the encoding function of MST-d neurons was dynamically determined by the synaptic ratio. Notably, the threshold we chose to classify the data (1.7) in physiological analysis was close to this optimal threshold.
Moreover, we propose that the typical balanced (r = 1:28) and imbalanced (r = 2:35) encoding was enhanced by the stochastic resonance (SR) mechanism ( Figure 5(f)). SR refers to the situation in which the existing noise improves the input and output signal-to-noise ratio [31][32][33]. In multisensory encoding, the effect of SR was interpreted as inference efficiency (ε) measured by the revised Kullback-Leibler divergence between the integration functions of two encoders (see Methods). Intuitively, this can be interpreted as more effective encoding if the function curves deviate more from each other, allowing clearer representations.

Research
The model predicted that maximum efficiency was achieved when σ noise = 6:5 (≈0.26 times the peak amplitude of the model external input), which was close to the optimal noise level (σ opt noise ) of 6 in our model. This indicated that real MST-d neurons achieved near-optimal encoding in the noisy physiological environment. Figure 5(f) also demonstrates the rescaled efficiency of congruent and opposite encoding (see Methods). In this case, the efficiency was maximized when σ noise =0.8 (≈0.03 times the peak amplitude of the model external input), suggesting that encoding by congruent and opposite neurons was most efficient in the lownoise condition. Nevertheless, encoding by balanced and imbalanced neurons was more efficient over a wide range of noise intensities. In the predicted physiological noise condition (σ best-fit ), the efficiency of balanced-and-imbalanced encoding was 185% compared to that of congruent-andopposite encoding (in Supplementary Figure S3, the model results with intrinsic noise following a uniform distribution are also presented).

Dynamic Modulation from Lateral Connections.
Based on the CANN model results, we propose that balanced and imbalanced neurons comprise effective encoding with the help of intrinsic noise. We further investigated the role of lateral connections in the subnetwork. Potentially, these connections have a critical role in the encoding functions because the countereffect from excitatory and inhibitory connections is the main cause of the bifurcation states. We introduced a scale factor (β scale ) to both excitatory and inhibitory components to increase or decrease the lateral connection weights (Figure 6(a)), while other parameters held the same weights. The best-fit CANN model refers to β scale = 1:0. Figure 6(b) presents the integration-encoding functions in balanced simulations given different β scale values. Separation encoding was robust when lateral weights decreased by half (β scale = 0:5). Notably, the balanced neurons still encoded separation even when lateral connections were removed (β scale = 0), although the integration probability was slightly larger. This is possibly due to the removal of inhibitory components, which is critical in competition among regions. When lateral weights increased (β scale ∈ f1:5,2g), the function of balanced neurons was shifted to encode integration. For imbalanced neurons (Figure 6(c)), neuronal function was augmented to encode separation when lateral weights decreased (β scale ∈ f0:75,0g), but the function to encode integration became more robust (p int~1 ) when lateral weights increased (β scale ∈ f1:25,2g).
In the best-fit model (β scale = 1), the total lateral input to one neuron in the subnetwork was nearly one-fifth of the total forward inputs. Although the lateral inputs appeared to be subordinate, these results indicated that lateral connections in the subnetwork are also critical to the functional distinction. In conclusion, effective encoding must meet certain requirements of the synaptic ratio (ratio between forward weights), presence of noise, and adequate lateral weights on the multisensory layer. These three factors comprise the neural mechanism of MST-d dynamic encoding.

Inference Decision Arising from MST-d Populational
Network. The analysis above demonstrated the hierarchical mechanism of computing function of individual MST-d 10 Research neurons in multisensory integration and separation. In this section, we took a further step along the cortical hierarchy and investigated the underlying algorithm of multisensory decision based on the existing of integrators and separators in the circuit. Specifically, we sought to resolve the debate about whether the high-level cortex performs Bayesian decision or non-Bayesian decision, such as those with deterministic strategies [30]. By simulating a bioplausible decision-making process, we prove that a single MST-d neuron utilizes fixed-criterion (FC) strategy, which makes deterministic inferences based on explicit boundaries (non-Bayesian). However, at the population level, the MST-d neuron group may compute causal inference in a reliability-based Bayesian strategy, which takes each MST neuronal response as a sampling of prior distribution to calculate the posterior probability of the cue origin state accordingly.

2.3.1.
Single-Neuron Level. The causal inference here denoted the binary judgment of whether the observed visual and vestibular inputs (x vis and x ves ) are attributable to one common cause (C = 1) or two separate causes (C = 2).
We first focused on the inference performed by a single neuron. Under noise-free conditions, we verified in the last section that MST-d neurons adopt an FC strategy, which means that the neurons infer a common cause (C = 1) if the two measurements are closer than a fixed boundary κ (jx vis − x ves j < κ) and infer two separate causes (C = 2) otherwise. The strategy is deterministic because the boundary κ is explicit and determined [30]. However, the presence of neuronal intrinsic noise (ξ) blurs the measured disparity. To test whether MST-d neurons still follow the FC strategy in the noisy condition, we simulated the noisy FC strategy as jx vis − x ves j + ξ and fitted the integration decision from this strategy with the neuronal integration functions (see Methods). The parameter was boundary κ, which varied across functions, and the noise level was ξ. Figure 7(b) demonstrates that the noisy FC strategy closely reproduced integration functions in the case of different synaptic ratios (data here refer to CANN functions for higher Δθ uni resolution). This suggested that in the noisy condition, each MST-d neuron still performed deterministic inference with fixed and explicit boundaries, and the boundary varied among neurons due to different synaptic ratios.

Population Level.
At the MST-d population level, we postulated that a new strategy emerges due to the pooling of various boundaries. Since the decisions are decoded from responses in neural circuits, we first characterized the MSTd neuronal multisensory response properties based on the recorded data. The multisensory response was characterized as a function of both preferred disparity and real cue disparity (f bal and f imbal , Figure 7(c)). Both f bal and f imbal were obtained by averaging the multisensory responses relative to the preferred condition in real data (see Supplementary Figure S2 for methods), and they characterized the real-time balanced or imbalanced neuronal response to hypothetical cues as R = f ðΔθ ′ Þ, where Δθ ′ = jΔθ mul −Δθ cue j. For example, if Δθ cue is 30°, the neurons for which Δθ mul =30°exhibit a maximal response because f ðjΔθ mul −Δθ cue jÞ = f ðΔθ′ = 0°Þ = R max multi . However, the neurons for which Δ θ mul =180°have low responses because f ðΔθ ′ = 150°Þ. We simulated Δθ cue from 0°to 180°in the horizontal plane, which is the same as the experimental conditions. Second, we adopted a probabilistic Monte Carlo sampling process of neuronal responses in balanced and imbalanced groups individually. The response sampling is widely observed along the cortical hierarchy [34,35]. Here, the sampling of MST-d neuronal responses simulated that MST-d neurons were randomly activated by fixed Δθ cue in noisy physiological condition, and each neuronal response served as a prior sample of the cues based on the inherent tuning property. To seek a minimal requirement to realize flexible decisions, we sampled 15 neurons from MST-d group, 9 of which were balanced neurons and 6 of which were imbalanced neurons. The proportion followed data observations (n bal : n imbal = 70 : 45 ≈ 3 : 2).
Next, the sampled responses were summed as balanced or imbalanced group responses and sent to a decision neuron, which compared the group response amplitude to reach a binary decision (the limited sample size is plausible because it is little likely that responses of all MST-d neurons are sent to a common neuron downstream). Since we proved that imbalanced neurons are integration encoding bases and balanced neurons are separation encoding bases, the resultant decision was to integrate the senses and report common source if imbalanced groups had higher responses or to separate the senses and report different sources otherwise. At each Δθ cue , we simulated such binary decision-making for 100,000 times to obtain the averaged decision probability of reporting a common source (p common ) as a function of external cue disparity (Figure 7(d)).
It is crucial to note that balanced and imbalanced neurons are actually reliability-based. Previous works proved that imbalanced neurons are more sensitive to reliability changes in the dominant cue but less sensitive to those in the subordinate cue [36]. On the other hand, balanced neurons have equal sensitivity to both cue reliabilities. Thus, it is reasonable to expect that when one cue is unreliable, the response of the balanced neurons is weaker, while the imbalanced neurons that prefer the other cue maintain the response level (we assume each decision is made with imbalanced neurons with the same cue dominance). The response change was simulated by amplitude scaling (R′ = α•R), where α is the scale factor. In this situation, the amplitude change was expressed as α bal < 1 and α imbal = 1. We simulated a mild decrease to balanced responses as α bal ∈ f0:9, 0:8g, and we presented that the probability of reporting a common source (p common ) significantly shifted toward 100% (Figure 7(d), circles, from black to light blue), which means that the decision was prone to integrate cues when a specific cue was unreliable.
To specify the strategy during such decisions, we simulated classical Bayesian optimal inference [20] (see Methods).

Research
The Bayesian strategy differs substantially from the FC strategy because it computes the posterior probability of C = 1 or C = 2 based on sampled measurements x vis and x ves and prior pðC = 1Þ. We considered the Bayesian strategy here because the prior may be computed by pooling the fixed boundaries during the decision. In the simulation, the prior was set as a parameter that varied across the decision functions. As the curves in Figure 7(d) demonstrated, the Bayesian strategy approximated well with decisions from the sampling process. Notably, the case when α bal = 1 and α imbal = 1 matched the real neuronal recording when both cues represented the same motion. In this case, the best-fit prior was 0.48, which was close to the flat prior (0.5). This result suggested that the inference in the adult brain assumes a fair prior when both cues are equally reliable. Furthermore, when the reduced reliability of one cue was projected to the weaker response of balanced neurons, the prior increased accordingly (Figure 7(e)) and resulted in a higher probability of reporting a common source.
Next, we considered the other case in which the dominant cue is unreliable. Consequently, the imbalanced neuron response decreased more than the balanced neurons. It is obvious that when both responses are decreased proportionally, the decision is the same as that when α bal = 1 and α imbal = 1. Thus, we simulated the case in which α bal = 1 and α imbal ∈ f0:9, 0:8g instead (Figure 7(f)). We presented data in Figure 7(g) that decisions from the sampling process were biased to different sources in this case. The Bayesian strategy consistently approximated the decisions with decreasing prior (Figure 7(h)). In conclusion, we proved that the Bayesian strategy naturally emerged from the encoding of balanced and imbalanced encoding bases. The two bases linked disparity coding with cue reliability in the form of prior computation and produced flexible decisions that varied accordingly with cue reliability. In other words, the fMCS model provided a physical base to derive multisensory decision by the posterior probability of integration based on the response samples in MST-d.
Finally, we demonstrated the Bayesian interpretation of sample size (total size = n′ bal + n′ imbal , Figure 7(i)). Note that the sample size in this section is independent of the sample size in experimental recordings. The variation of sample size could result from inherent synaptic connection from MST-d neurons to the specific decision neuron or from different intensities of external cues, where strong intensity activates more MST-d neurons, and more samples participate in a decision trial therewith. With an increasing total sample size, the decision functions showed steeper gradients (Figure 7(j)), indicating that decisions were made with more confidence. In other words, accumulating evidence is represented by more neurons participating in the decision. This is consistent with behavioral conclusions that the more evidence accessible, the more confidence is associated with the decision [37,38]. The decision functions were nicely fitted by a Bayesian strategy with a narrower probability distribution, while other parameters were fixed (Figure 7(k), pðx vis jsÞ and pðx ves jsÞ, where s denotes the sensory source and x the sensory measurements; prior probability = 0:51 in these simulations; see Methods for details). The narrowing distribution in Bayesian fitting matched the statistical rules that standard error decreases with increasing sample size (standard error = σ/ ffiffiffi n p , where σ is the standard deviation of sensory measurements and n is the sample size). These results strongly proved the emergence of a probabilistic Bayesian strategy from coding in the balanced and imbalanced groups of neurons.
Previous works argued that both FC and Bayesian strategies are likely to underlie brain inference, and we validated that the two strategies function on different levels. Crucially, a Bayesian decision cannot be reduced to the summation of individual neurons that perform FC strategy only. This is because each neuron carries a fixed prior representation, but the change in the prior is computed by pooling neurons with different synaptic features during a decision. Such group coding allows the realization of a flexible probabilistic decision that is biased to integrate the inputs from different modalities when one of them is not reliable and separate them when both of them are reliable. This section proposed the Bayesian-decision emergence by varying neuronal response. We also presented in Supplementary Figure S4 that change of proportion between balanced and imbalanced neurons also produced Bayesian-like decision, where the prior has more drastic bias. The change of proportion could result from various neuronal activation threshold in physiological condition. The proportion change added to the multisensory inference flexibility in real brain.

Discussion
This study is aimed at revealing the neural encoding mechanism of multisensory causal inference in the MST-d. The novel focus of this paper is to prove computationally that the balance level of synaptic coupling strengths from MT and PIVC inputs to MST-d neurons may be the key mechanism in driving the MST-d neurons to be congruent encoders for integration or opposite/intermediate encoders for segregation by a self-organization process. The computational results also demonstrated another novel mechanism that produces the maximum coding capacity by stochastic resonance with the optimal intensity of intrinsic noise. Based on these mechanisms, we further demonstrated that each MST-d multisensory neuron implements a non-Bayesian FC strategy, but the prior emerges by pooling diversified criteria. Moreover, MST-d neuron populations implement a probabilistic Bayesian strategy. Therefore, this study established a computational framework that the feedforward inputs from early pathways play a key role in determining the emergence of congruent and opposite neurons in multisensory decoding, and sensory motion discrimination requires both Bayesian and non-Bayesian strategies, each serving at the single-neuron or populational level.

Synaptic Coupling Generates a Novel Set of Neural Bases
for Multisensory Encoding. Previous works generally assumed that unisensory congruent neurons perform sensory integration, while opposite neurons perform sensory separation; theoretical works verified this assumption through vector computation [1,13]. However, there are also works 13 Research demonstrating that congruent neurons can perform separation, and opposite neurons can perform integration [39].
In this study, we first pointed out that neuronal preference in multisensory conditions is usually different from that in unisensory conditions. Therefore, implementing multisensory coding by unisensory distributions generally led to confusion. Second, we proposed the novel encoding bases of balanced and imbalanced neurons, whose encoding properties originates from inherent synaptic features but not unisensory features. In this case, the congruent and opposite neurons are not tightly correlated with the balanced and imbalanced categories (Supplementary Figure S5). The CANN model further demonstrated a neuronal encoding mechanism based on balanced and imbalanced neuronal bases, which is nearly twice as efficient as the mechanism based on congruent and opposite bases ( Figure 5 We specified the role of these neurons as separation encoding, which explained behavioral observations that separation estimation generally has less estimated error than integration [20,40]. In our theory, the separation encoding basis is the balanced neuron group, which has more neurons encoding a wider range of disparity (from 60°to 180°). Therefore, separation encoding has higher resolution in comparison because the disparity is detected by more neurons.
In our theory, causal inference emerges from the sensory processing hierarchy from unisensory to multisensory cortical areas, which has been verified by fMRI studies [21]. As the means of connection of this hierarchy, the synaptic coupling mechanism explains the finding in evolutionary biology that multisensory computation emerges postnatally with the development of synaptic connections [41]. In the superior colliculus (SC) of the cat, multisensory neurons are initially unable to integrate combinations of sensory cues to produce significant enhancement or depression of responses [42], but the ability to integrate cross-modality inputs gradually increases with postnatal experience as cortical SC synapses are modified following Hebbian rules [43]. Furthermore, the findings of the present study are in line with functional conclusions that the spatial reference frame of the MST-d varies as a function of the relative strength of visual and vestibular inputs when different modalities are combined [44].

Model
Interpretation of the Subnetwork. We propose that the subnetwork is necessary for multisensory attractor computation, while the recorded MST-d neurons mainly serve to report the inference. There are a few plausible explanations for this subnetwork identity. First, this network belongs to an anatomical subregion in MST-d. In this case, the responding units involved in separation encoding may be recorded as singularly tuned neurons because they only respond to a single input, while in integration encoding, the responding units may be considered multisensory neurons with aligned preferences. This hypothesis will be further substantiated if structural organization in MST-d is revealed in the future. Another potential identity is the ventral intraparietal area (VIP). The VIP meets three requirements crucial to attractor computation: (1) the VIP is anatomically connected with the PIVC [45] and reciprocally connected with the MT [46]; thus, in the multisensory condition, the response of the VIP may be fed back to preceding cortical areas to change the receptive field property. (2) Almost half of the neurons in the VIP are dominated by congruent visual and vestibular preferences [47]. This is crucial to the multisensory attractor computation because it passes Δθ uni from unisensory areas to a common level and the disparity feature is preserved. (3) VIP neurons have a larger activity correlation than MST-d neurons measured by correlated noise [47], which indicates that VIP neurons process multisensory signals more intensively than other neurons. Although it seems inappropriate to assume a whole network computation for each recorded MST-d neuron, the majority of neurons in this subnetwork are redundant and can be released from the circuit after synaptic modification, and only specific neurons carrying multisensory direction preference are preserved. It is also possible that the hypothetical subnetwork stands for an undiscovered type of multisensory synaptic attractor learning rule, which requires further work.

Probabilistic Bases in Group Encoding
Are the Key to Flexible Probabilistic Decision-Making. MST-d neurons are predetermined as integration and separation encoders based on synaptic inputs. Due to noise, balanced and imbalanced groups have distinct functions in multisensory inference, which makes them ideal computational bases to maximize inference efficiency. Although we consider each MST-d neuron an essential encoding component in a decision, our theory still supports the idea that MST-d group encoding is the foundation of multisensory discrimination [20]. Since the FC strategy applied by each MST-d neuron also stems from the group coding of the preceding unisensory areas, it seems that the complexity increases as downstream cortical neurons receive preceding inputs and produce outputs. Although the probabilistic decision substantially matches the behavioral results of similar tasks [35,[48][49][50], the psychophysical results show low resolution, with the threshold between integration and separation discrimination being approximately 47° (  Figure 7(g)), whereas the threshold in behavioral tasks lies at 20°. Other mechanisms, such as top-down modulation or crosstalk between several multisensory areas, might be necessary to further improve the resolution [30].
While many works have focused on the direct correlation between one decision and a specific neural basis, our results indicated that real cognitive performance may not be able to be reduced to a corresponding smaller structure but may instead emerge naturally through a series of probability distributions of bases. We presented in Supplementary Figure S6 that a clear correlation between the causal scenario 14 Research and neurons only makes rigid decision, while the distributed preference as we observed in data produces maximal inference efficiency. In conclusion, the brain functions as an integrated hierarchy, and neural firing in different cortical areas represents different features. In multisensory areas, it is likely that single neuron encodes multiple features that is either reliability-based or modality-based.
The multidimensional encoding may serve as the key to produce flexible intelligence behavior.

Limitations of This Study and Suggestions for Potential
Future Research. The physiological results must be interpreted with caution because of the limited sample size. Furthermore, the model used a simplified structure that excludes real neuronal synaptic dynamics, such as neural firing heterogeneity within and across the cortical area. The diversity of synapses may enrich the dynamics of integration and separation inference. Moreover, the model did not employ a synaptic learning process to transfer the characteristics of the response subnetwork to the receptive field property. Regarding possible future directions, further study is needed concerning receptive field modification to develop a complete attractor computation process. Additionally, further studies can test this synaptic modulation theory in other multisensory areas, such as the VIP, to specify the scale on which synaptic encoding modulation functions as a general mechanism.

Subjects and Surgery.
Two male rhesus monkeys (Macaca mulatta) served as subjects. The general procedures followed in this study have been described previously [51,52]. Each animal was outfitted with a circular molded plastic ring anchored to the skull with titanium T-bolts and dental acrylic. To monitor eye movements, a scleral search coil was implanted in each monkey. The Institutional Animal Care and Use Committee at Washington University approved all animal surgeries and experimental procedures, which were performed following National Institutes of Health guidelines. Animals were trained to fixate on a central target for fluid rewards using operant conditioning.

Vestibular and Visual Stimuli.
A 6-degree-of-freedom motion platform (MOOG 6DOF2000E; Moog, East Aurora, NY) was used to passively translate the animals along one of eight directions in the horizontal plane, spaced 45°apart. A tangent screen was affixed to the front surface of the field coil frame, and visual stimuli were projected onto it by a three-chip digital light projector (Mirage 2000; Christie Digital Systems, Cypress, CA). The screen measured 60 × 60 cm and was mounted 30 cm in front of the monkey, thus sub-tending~90°× 90°. The visual stimuli simulated translational movement along the same eight directions through a three-dimensional cloud of stars. Each "star" was a triangle that measured 0:15 cm × 0:15 cm; the cloud measured 100 cm wide by 100 cm tall by 40 cm deep and had a star density of 0.01 per cm 3 . To provide stereoscopic cues, the cloud was rendered as a red-green anaglyph and viewed through custom red-green goggles. The optic flow field contained naturalistic cues mimicking the translation of the observer in the horizontal plane, including motion parallax, size variations, and binocular disparity.

Electrophysiological Recordings.
We recorded action potentials extracellularly from both hemispheres in each of the two monkeys. For each recording session, a tungsten microelectrode was passed through a transdural guide tube and advanced using a micromanipulator. An amplifier, an eight-pole bandpass filter (400-5000 Hz), and a dual voltage-time window discriminator (BAK Electronics, Mount Airy, MD) were used to isolate action potentials from single neurons. Action potential times and behavioral events were recorded with 1 ms accuracy by a computer. Eye coil signals were processed with a low-pass filter and sampled at 250 Hz. Magnetic resonance imaging (MRI) scans and Caret software analyses along with physiological criteria were used to guide electrode penetration into the MST-d area [1]. Neurons were isolated while a large field of flickering dots was presented. In some experiments, we further advanced the electrode tip into the lower bank of the superior temporal sulcus to verify the presence of neurons with response characteristics typical of the MT [1]. Receptive field locations changed as expected across guide tube locations based on the known topography of the MT [1].

Experimental Protocol.
We measured neural responses to eight heading directions evenly spaced every 45°in the horizontal plane. Neurons were tested under three experimental conditions. (1) In vestibular trials, the monkeys were required to maintain fixation on a central dot on an otherwise blank screen while being translated in one of the eight directions. (2) In visual trials, the monkeys were presented with optic flow simulating self-motion (in the same eight directions), while the platform remained stationary. (3) In bimodal trials, the monkeys experienced both translational motion and optic flow. We paired all eight vestibular headings with all eight visual headings for a total of 64 bimodal stimuli. Eight of these 64 combinations were strictly congruent, meaning that the visual and vestibular cues simulated the same heading. The remaining 56 cases had conflicting cue stimuli. This relative proportion of strictly congruent and conflicting stimuli was adopted purely to characterize the neuronal combination rule and was not intended to be ecologically valid. Each translation followed a Gaussian velocity profile. It had a duration of 2 s, an amplitude of 13 cm, a peak velocity of 30 cm/s, and a peak acceleration of 0:1 × g (981 cm/s 2 ).
These three stimulus conditions were interleaved randomly along with blank trials, which included neither translation nor optic flow. Ideally, five repetitions of each unique stimulus were collected for a total of 405 trials. Experiments with fewer than three repetitions were excluded from the analysis. When isolation remained satisfactory, we ran additional blocks of trials with the coherence of the visual stimulus reduced to 50% and/or 25%. Motion coherence was lowered by randomly relocating a percentage of the dots on every subsequent video frame. For example, we randomly 15 Research selected one-quarter of the dots in every frame at 25% coherence and updated their positions to new positions consistent with the simulated motion, while the other three-quarters of the dots were plotted at new, random locations within the 3D cloud. Each block of trials consisted of both unimodal and bimodal stimuli at the corresponding coherence level. When a cell was tested at multiple coherence levels, both unimodal vestibular tuning and unimodal visual tuning were independently assessed in each block.
Trials were initiated by displaying a 0:2°× 0:2°fixation target on the screen. The monkeys were required to fixate on the target for 200 ms before the stimulus was presented and to maintain fixation within a 3°× 3°window to receive a liquid reward. Trials in which the monkeys broke fixation were aborted and discarded.

Data
Analysis. The neural responses were binned in 25 ms time windows. Mean neural responses were averaged from 5 trials, and the units of measurement were spikes per second. The outliers in the 5 trials were removed, and the mean response was averaged from the remaining 4 trials. Using MATLAB (MathWorks, Natick, MA), we chose the window of 625 ms to 1250 ms to select valid data. We considered a neuron to have discriminative tuning properties to one specific stimulus modality if the maximum response was 5 spikes/s more than the minimum response of the same curve. Tuning curve symmetry was not considered. Neurons that failed to meet this requirement for either the visual or the vestibular unisensory condition were considered unisensory-tuned neurons or poorly tuned neurons and removed from further analysis. Then, we computed the response ratio based on the visual and vestibular unisensory tuning curves of that neuron in the same time window. A threshold of 1.7 was chosen to discriminate between balanced and imbalanced data. We analyzed the tuning curve in the time window from 1000 ms to 1250 ms, which corresponds to the maximum motion speed and maximum neural response (data not shown). The window parameters were first scanned and then selected comprehensively to show the discrimination of the integration probability P int between the balanced and imbalanced neurons and to be physiologically plausible.
The group Δθ distribution was rectified by doubling the probability at 0°and 180°, while the probabilities in other directions remained the same. This procedure was followed because of the experimental protocol in which directions were binned in 45°intervals; a 0°preference encompassed preferences from -22.5°to +22.5°, and a 180°preference encompassed preferences from 167.5°to 202.5°. However, each of the other preference bins was represented twice because both sides were included (for example, Δθ of 45°e ncompassed directions from 22.5°to 67.5°and from -22.5°to -67.5°). To align the widths of the probability bins, the data counted at 0°and 180°were included twice. 4.6. Continuous Attractor Neural Network Modeling. The model is composed of three identical neural networks with hierarchical structures, of which two simulate the unisensory middle temporal region (MT) and parietal-insular vestibular cortex (PIVC), while the third simulates a subnetwork. Each network is specified as a ring attractor with the same structure. We consider a network of L (180) neurons i ∈ ½1, 2, ⋯, L arranged along a ring topology, each representing a 2°direction range (preference) in the external world. Due to the circular structure, neuron i = L is a neighbor of neuron i = 1, and we defined a circular distance dði, jÞ between neurons i and j as dði, jÞ = min ðji − jj, L − ji − jjÞ, which takes a value from 0°~180°.
Without loss of generality, we adopted a normalized rate-based model to describe the neural dynamics, which was sufficient for attractor computation. The activity of neuron k in all layers is described by the following equation.
τ is the time scale, and y k ðtÞ is the neuronal activity. SðIÞ is the sigmoid activation function.
I k,r is the recurrent connection within each ring network.
The connection weights w j,k follow the Mexican hat profile, which is identical across layers.
where w ex and w in are excitatory and inhibitory components (w ex > w in ) and σ ex and σ in characterize the excitatory and inhibitory range (σ ex < σ in ). d j,k denotes the topological distance from the j th neuron to the k th neuron. As unisensory layers, MT and PIVC receive corresponding visual and vestibular external inputs with range σ from proceeding areas, whose intensity decays with the relative distance (d) from the input center (θ max,vis and θ max,ves ).
where w external is the maximal input weight and σ external is the input range. Once the inputs are applied, the neurons in the same layer interact with each other through lateral connections and eventually form the group response in a bump shape on the layer. Meanwhile, the responses of two unisensory areas are sent to the subnetwork by forward synaptic connections, which are topologically aligned such that one neuron receives maximally weighted input from the upstream neuron at the same position. 16 Research where w MT and w PIVC are forward synaptic weights from the MT and PIVC regions to the multisensory subnetwork. The ratio between two weights simulates balanced and imbalanced neurons. σ forward characterizes the forward input range. d denotes the topological distance from the i th MT neuron or j th PIVC neuron to the k th subnetwork neuron, whose activity is y MT i and y PIVC j individually. Finally, each neuron is independently subjected to Gaussian intrinsic noise except for the individual MST-d multisensory neuron downstream.
When inferring the integration or separation from the response of the subnetwork, the threshold is set as 0.15 to discriminate effective population response from an undetectable response or neural avalanche. The cases in which all neurons in the final state have responses lower than 0.15 or higher than 0.15 were excluded. The threshold was fixed and used only when inferring the integration and separation trials. The optimal values of the parameters are listed in Table 1.

Neuronal Response Curve Simulation.
We supplemented direction interpretation with observations from the data by projecting the computation of the CANN network on the MST-d neuron by the leaky integrate-and-fire model. The input for this model is an external stimulus with different directions and the output the neuronal fire rate. For a given MST-d neuron along with its circuit, the input direction is fixed by the receptive field, and the input intensity decreases when the real motion direction is misaligned with the preference. The unisensory preference is fixed by the assigned location of two inputs on the receptive field.
In the unisensory condition, each simulated neuron receives preceding inputs from the aforementioned directions when real input directions θ ∈ ½0°, 360°.
where w external is the external input weight at θ uni max (same as that in the CANN model), whose intensity decreases when the real input direction θ deviates from θ uni max . Thus, the group output Y from either the MT or PIVC is the summation of each neuronal response, where y i,c is the i th neuronal response on layer c. The afferent input current I sti ðtÞ is further specified as α denotes the rescaling factor (limited in this section), and thr denotes the signal detection threshold. The membrane potential (V) of neurons in the`MST-d is derived from the differential equation where V m is the resting potential (-65 mV) and VðtÞ denotes the membrane potential at time t.
In the multisensory condition, the subnetwork is set to update the preference after 30 ms, and both inputs are sent to the MST-d.   Figure 6. We measured the inference efficiency based on crossentropy (Kullback-Leibler divergence) since we measured the distribution of integration probability (though not the probability distribution, which requires ∑p int ðxÞ = 1). Cross-entropy achieves a maximum value if two distributions are identical; however, the computational principle requires the encoding bases to differentiate distributions. Thus, we denote the inference efficiency ε by negative cross-entropy. To balanced and imbalanced neuronal encoding, where Δθ 0 is 0°in congruent neurons and 90°in opposite neurons. The congruence and opposite cases are set as 17 Research mutually exclusive here to match the balanced and imbalanced classification. The congruent probability is the mean probability of congruent neurons (0°~90°) from both balanced and imbalanced groups and the opposite probability from opposite neurons (90°~180°) from both groups. To align with the scale of balanced and imbalanced encoding, we denote ε = 2ε ′ when plotting the congruent-andopposite inference efficiency in Figure 5(f).

Inference
Based on a Fixed-Criterion Strategy.
x vis and x ves are the visual and vestibular location samples from each Gaussian distribution, and jx vis − x ves j corresponds to one sampling of Δθ uni . The fixed-criterion (FC) strategy results in an inference based on the criterion κ, which is fixed and independent of the prior. The inference is binary: if the measured disparity jx vis − x ves j is smaller than κ, the unit infers robust integration (PðC = 1jx vis , x ves Þ = 1); otherwise, it infers separation (PðC = 1jx vis , x ves Þ = 0). When noise is present, the disparity measurement is modified as jx vis − x ves j + ξ, but κ remains fixed. In this case, both integration and separation inferences are possible if the noisy measurement approaches κ. The inference was repeated 100,000 times to represent the integration functions of MST-d neurons. When the integration functions were fitted, only κ was set as a free parameter. 4.10. Inference Based on a Bayesian Strategy. We assumed that the stimulus measurements would follow a Gaussian distribution with different mean positions (x vis~svis ðμ vis , σ vis Þ and x ves~sves ðμ ves , σ ves Þ), where σ vis = σ ves = σ is a free parameter. The Bayesian strategy computes the posterior probability based on the prior [46,53,54].
Aside from the sampling distribution σ, the common cause distribution and the prior probability PðC = 1Þ were also set as free parameters to provide an optimal fit for the biophysical simulation. To align the binary judgments, we adopted an optimal Bayesian reporter that reports integration if PðC = 1j x vis , x ves Þ > 0:5; otherwise, it reports separation [20,21,30]. As with the fixed-criterion strategy, reporting was repeated 100,000 times. Different integration functions and psychophysical functions were fitted by adjusting only the prior PðC = 1Þ.

Psychophysical Decisions by Neural Monte Carlo
Sampling. We adopted a biophysically plausible decision process that weighs the group response sampling of each type of multisensory encoder. The balanced group has been demonstrated to encode separation, and the imbalanced group has been demonstrated to encode integration. Considering both of them as computation bases, the decision neuron reported separation if the balanced group responded more strongly than the imbalanced group and vice versa. Since Δθ mul is characterized by distribution in both groups, the multisensory preference (Δθ mul ) of the MST-d inputs may not be traversed, as the decision neuron does not necessarily receive many inputs under real conditions. Based on this fact, the decision neuron model performed probabilistic sampling according to the observed distribution of Δθ mul in the balanced (D bal ) or imbalanced (D imbal ) group; meanwhile, the input components remain proportional (balanced group: 70/115 = 0:61; imbalanced group: 45/115 = 0:39; balanced : imbalanced ≈ 3 : 2). For example, if the decision neuron received 15 inputs from the MST-d, then 9 of them were from balanced neurons and 6 were from imbalanced neurons. Among the 9 balanced neurons, the distribution of Δθ mul was fairly uniform for each neuron, and there was a fairly good chance that neurons with various disparities were represented. On the other hand, each of the 6 imbalanced neurons was more likely to sample a small Δθ mul . The neuronal response was obtained from a data-averaged multisensory tuning function (f bal and f imbal ; see Supplementary Figure S2), which measures the difference between the preferred multisensory disparity and real input disparity (Δθ ′ ). Finally, the decision neuron computed the summed responses and decided which response was higher. We repeated such decision reports for 100,000 trials in each real input disparity condition.
Data Availability The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.  Figure S1: model simulation of response curves of MST-d neuron. Figure S2: data-derived multisensory tuning functions of balanced and imbalanced groups. Figure S3: stochastic resonance elicited by noise following a uniform distribution. Figure S4: simulated decision with varying category proportions. Figure S5: distribution of congruent and opposite neurons in balanced and imbalanced categories. Figure S6: simulation from uniform to polarized (skewed) computational bases in a decision role. (Supplementary  Materials)