Quantifying Information via Shannon Entropy in Spatially Structured Optical Beams

While information is ubiquitously generated, shared, and analyzed in a modern-day life, there is still some controversy around the ways to assess the amount and quality of information inside a noisy optical channel. A number of theoretical approaches based on, e.g., conditional Shannon entropy and Fisher information have been developed, along with some experimental validations. Some of these approaches are limited to a certain alphabet, while others tend to fall short when considering optical beams with a nontrivial structure, such as Hermite-Gauss, Laguerre-Gauss, and other modes with a nontrivial structure. Here, we propose a new definition of the classical Shannon information via the Wigner distribution function, while respecting the Heisenberg inequality. Following this definition, we calculate the amount of information in Gaussian, Hermite-Gaussian, and Laguerre-Gaussian laser modes in juxtaposition and experimentally validate it by reconstruction of the Wigner distribution function from the intensity distribution of structured laser beams. We experimentally demonstrate the technique that allows to infer field structure of the laser beams in singular optics to assess the amount of contained information. Given the generality, this approach of defining information via analyzing the beam complexity is applicable to laser modes of any topology that can be described by well-behaved functions. Classical Shannon information, defined in this way, is detached from a particular alphabet, i.e., communication scheme, and scales with the structural complexity of the system. Such a synergy between the Wigner distribution function encompassing the information in both real and reciprocal space and information being a measure of disorder can contribute into future coherent detection algorithms and remote sensing.


Introduction
An electromagnetic field is a fundamental physical carrier of information. It is capable of reliably transmitting a modulated signal and collecting information about the propagation channel itself. With relevance to this IT-driven age, the two longstanding goals in information processing are (i) achieving higher channel capacity (i.e., throughput) and (ii) (pre)processing of collected information. However, the fundamental challenge of rigorous qualification and quantification of information in EM waves still remains a topic of debate including both physical and even semiphilosophical notions. Information theory stands out from most of other approaches in physics. Being a higher level of abstraction, it focuses on a configuration of a system under consideration in the context of its prehistory, similarly to thermodynamics, without attachment to a particular class of objects under study in its axiomatics. Out of the broad scope of studies where information theory has a potential to contribute, at this point, we exemplary explore its applications to electrical engineering and signal processing.
One way of defining information is associating it with the presence of distinctive features. For instance, human speech can carry up to 2 14 distinctive sounds (due to 14 binary distinctive features, e.g., [1]), only a small subset of which is realized in any particular known language. The amount of information a human can transmit per unit sentence containing a fixed number of words is indirectly correlated with the amount of distinctive features the language can handle (i.e., language capacity). Translating this into optics, it has been understood that a monochromatic planewave photon in free space can carry a rather limited amount of distinctive features (i.e., polarization and wavelength), providing up to one bit of information per photon. Naturally, if one generates photons with up to one bit of information, one is also able to detect only 1-bit information per unit carrier. To use an analogy, consider "a marine biologist casting a fishing net with two inches wide meshes for exploring the life on the ocean, naturally one should not be surprised finding only sea-creature larger than two inches long" [2]. This stumbling block has been shifted with the seminal work by Allen et al. [3] where it was experimentally confirmed that laser beams are capable of carrying a well-defined orbital angular momentum (OAM). An ability of such laser modes to carry theoretically unbounded amount of information per photon, e.g., [4], dramatically expands the EMfield's "language capacity." There are several approaches to assess a signal's information capacity developed in modern information theory, e.g., [5,6]. In many cases, information is defined with respect to a particular alphabet, giving up the generality offered by statistics in the foundation of information theory. Several groups used conditional information approach to quantify the signal capacity [7]. Here, we introduce the concept of expressing information as a measure of structure in a physical system by applying the Shannon information theory to singular optical beams. We discuss how the Wigner distribution function (WDF) can be taken as a corresponding probability distribution function accounting for partial quantumness of a shaped photon source. The important synergy between a comprehensive description of physical systems in their phase space, delivered by the WDF, and the generalized axiomatics of the information theory has the potential to conduce a cumulative approach to highinformation-density telecommunications and adaptive signal processing techniques. We validate this theoretical framework by experimentally showing how increased structural complexity of wavefront-shaped optical beams, such as Hermite-Gauss (HG) modes and optical vortices [8,9], can be analyzed using wavefront sensors.

The WDF and Classical Information in Optics.
The WDF belongs to the generalized Cohen's class of dual-domain distributions. It is simultaneously the most complete analytical description of an optical beam, and an observable that can be experimentally measured. It provides access to the spatial beam profile and its Fourier transform. The WDF in one spatial dimension can be defined as where x is the variable in the coordinate space and k is the corresponding coordinate in reciprocal space. The following integrals have a probabilistic interpretation: where juðkÞj 2 is the momentum distribution, juðxÞj 2 is the intensity distribution, and U tot is the total energy of the incoming signal. For a fully coherent light source, the WDF's Fourier-transformed function ΓðxÞ = uðx + aÞu * ðx − aÞ is known as the mutual intensity used in wavefront sensing for turbulence analysis and adaptive detection techniques. For brevity, a list of useful optical properties of the WDF can be found elsewhere, e.g., [10,11]. Now, let us consider a Gaussian beam, expressed in the following form [8]: where ρ > 0 is the position vector in the beam profile ρ = ffiffiffiffiffiffiffiffiffiffiffiffiffi x 2 + y 2 p , w = wðzÞ is a beam waist, R = RðzÞ is the radius of curvature of the beam's wavefront, and A is the normalization constant. The wave-numberk = fk x ,k y ,k z g is distinctive from the Fourier transform parameter k = fk x , k y , k z g in (1). The corresponding 1D WDF is [12] which is plotted in position-momentum space fx, k x g in Figure 1(a). Here, x = ffiffi ffi 2 p x/w and κ x is the redefined momentum: This Wigner distribution is properly normalized, delivering the beam intensity distribution when integrated over the entire momentum space.
Classical Shannon information, see [13], represents the amount of structure in the corresponding system. Its analog for an optical mode, characterised by its WDF, is where the WDF is a scalar function of position r and momentum k vectors. Similar definitions of information applied to characterizing optical fields, while having been suggested in the field of optics earlier, e.g., [12,14], have not been explicitly applied to topological optical beams in the way introduced here, to the best of the authors' knowledge. The main problem with this definition, Equation (6), is that the WDF is not a positive-semidefinite function and, hence, does not represent 2 Research a proper distribution. Negativity of the WDF can be interpreted as a marker of a phase-space interference and nonclassicality [15,16]. However, in quantum description, interference is associated with a local violation of Heisenberg inequality, e.g., [17]. For these reasons, we seek a definition that would comply with the quantum nature of light.
Here, we propose to replace the regular product in the definition of information (6) with the Groenewold associative product [18].
and hereby call it Shannon-Groenewold information: where Wðr, kÞ is the four-dimensional (4D) WDF (see [19]). A manifestation of this definition is the Weyl quantization of a classical observable in phase space [20]. It respects the Heisenberg uncertainty principle and is normalized to the phase space volume, which is one of the main advantages of this definition of information over Shannon information Equation (6), defined via the WDF. As an entropic observable, it measures the structure present in the optical field analogous to [21], as opposed to the earlier attempts to utilize the conditional information [7], that is, by design, less fundamental due to being alphabet specific. In the context of Information Theory, it may be interpreted as a measure of the amount of classical information that can be extracted from the laser mode if all the quantum uncertainty is removed by an appropriate experiment.
Using the Shannon-Groenewold information (8), we obtain the following expression for the information in the Gaussian beam:S where U tot = A ffiffiffiffiffiffiffi π/2 p is the total EM energy of the beam per spatial degree of freedom. We assume that the radius of curvature of the beam wavefront is always larger than the physical dimensions of the beam spot size: R ≫ fx, yg. It is important to note that for the Gaussian mode, both definitions (6) and (8) produce the same result. This is what one would expect since the WDF of pure Gaussian sources is positive-semidefinite and can be interpreted as a classical proper distribution function [22]. The fact that information is energy-dependent comes as no surprise if one remembers the important synergy between information and entropy in classical Shannon theory of information (see [13]). Even though entropy in information theory and thermodynamic entropy are not exactly identical, one would expect similar behavior when it comes to the fundamental laws of physics  (5), and x-coordinate in the beam's transverse plane. One can see that in the case of the Gaussian mode, the WDF is positive, while for the first-order HG mode, there is a negative contribution in the near-zero region of the phase space.
3 Research [23,24]. In this context, total energy of a laser mode is in equivalence with total internal energy of a thermodynamic system.

Higher-Order Modes and Shannon-Groenewold
Information. The HG mode is a higher-order solution of the Gaussian family of beam-like solutions of the paraxial equation. It carries a nontrivial beam geometry with singularities, stable on propagation [25] due to being topologically protected. Hence, it is only logical to expect that the amount of information for this family of modes is higher than for the zero-order Gauss modesS G ≲S HG [26]. Let us investigate this statement next.
The general solution of the 1D paraxial wave equation in cylindrical coordinates is given by The normalization is consistent with the one in Equation (3). The WDF for this mode is known, e.g., [27]. In our case, to keep the normalization consistent, we obtain As expected, for m = 0, the WDF of HG mode is exactly equal to the Gaussian WDF (Equation (4)) and so is the corresponding information in Equation (9). Then, the first two higher-order WDFs have "elegant" analytical expressions: The WDF W ðHGÞ 1 ðx, k x Þ is plotted in Figure 1(b). One can see explicitly the negative contributions, in the 2nd-order mode in opposition to the positive-definite 0th-order Gauss beam. In a similar manner, one can work out higher-order modes using the integral form in Equation (12).
Next, we explore these WDF expressions further. We find that these WDFs take negative values in the central region of the phase space ( Figure 1). The fundamental statement of classical information theory that "the more we know about a system's parameter space, the less is its uncertainty" is inevitably broken in the quantum context when considering correlated (conjugate) variables, such as position x and momentum p. If such "quantumness" is present in the corresponding PDF, it pushes the distribution into the negative domain [16] (Figure 1(b)). By respecting Weyl-Wigner quantization in the definition of information (8), we work around the WDF not being a well-defined PDF from the statistics point of view. Groenewold's product, introduced instead of conventional multiplication, enforces the negative regions to be integrated out, in compliance with the Heisen-berg uncertainty relation. These lead to real and positive-valued, monotonously increasing with energy information: As HG modes form a complete orthonormal set, they can be used as an expansion basis [28]. Hence, this approach can be straightforwardly applied to OAM modes, such as Laguerre-Gauss (LG), e.g., [8]. Let us express the LG mode as follows: where ℓ is the vorticity of the twisted mode and ϕ is the azimuthal angle in cylindrical coordinates, where the remaining parameters follow the definitions of Gauss (3) and HG (11) modes. Supplying the results from (12), we can express LG modes in terms of HG modes with a straightforward calculation; for instance, for the WDF of the 2D LG mode with p = 0 and ℓ = 1, where p and ℓ are correspondingly the order and degree numbers of the generalized Laguerre polynomial L ℓ p ð·Þ; ρ = ffiffiffiffiffiffiffiffiffiffiffiffiffi x 2 + y 2 p . The corresponding entropy can be calculated in a similar manner: The classical assessment of the amount of order in the optical mode is clearly increasing with the increasing complexity of the beam profile (see Figure 2), as one would expect from general considerations. It is important to recall that in the context of physical meaning of Shannon unconditional information, the WDF is normalised to U tot -total energy. Consequently, constant A is bounded from above by ffiffiffiffiffiffiffi 2/π p . The similarity sign in the expressions for information (structural complexity) of the considered above modes (9), (15), (16), and (19) is due to the presence of terms containing an odd logarithmic integral of the form: where I 0 is a modified Bessel function of the first kind. Numerical estimations show a tendency for these terms to go to zero; however, mathematically rigorous study of their absolute convergence has not been performed.

Research
With this theoretical framework, we performed an experiment to explore the possibility of obtaining the amount of information by measuring the structure in optical laser beams of various topology. Provided the wavefront reading was obtained from an off-the-shelf Shack-Hartmann sensor, we assessed the amount of information in several fundamental laser modes (see Figure 3).

Materials and Methods
Among the tools of adaptive optics, Shack-Hartman sensors (SHS) [29] occupy a unique place as a fast, affordable, and compact off-the-shelf tool for simultaneous intensity and angular distribution measurements. Advanced techniques for SHS state tomography [30] and WDF reconstruction [31] One-dimensional case LG1 (experimental) LG1 (theoretical) Figure 2: The Shannon-Groenewold information S WG as a function of total energy U tot (8) for one-dimensional (a) HG 0 /Gauss mode (black), HG 1 (dashed-yellow), and HG 2 (dotted-green), and two-dimensional (b) theoretically predicted (black) and experimental (dashed-blue) Gauss mode, experimentally measured (long-dashed-green) and theoretically predicted (dot-dashed-red) LG 1 with the corresponding error bands resulting from the errors on fit parameters shown in light-red; see Section 3 for details. One notices the overall tendency for the amount of information to increase with the growth of the overall complexity of the corresponding optical signal. Also, the drop in the amount of information inferred from the experiment as compared to the theoretical curve is attributed to the SLM's beam conversion efficiency and expected information loss during the propagation in free space.

Research
have been suggested alongside with conventional aberration correction techniques. Measurements of the wavefront distortions in EM beams with nontrivial topology are also of interest for both communication and sensing purposes, e.g., [32].
For the aim of this work, we use the SHS to reconstruct the WDF for the purpose of discriminating between the modes, assessing the beam quality and ultimately the amount of information in a beam. While a SHS is utilized generally to facilitate beam alignment and assess distortions within an optical channel by calculating the higher-order Zernike moments, here, we omit the SHS's wavefront data output and only consider the raw data of the intensity distribution in the superpixel array of the SHS's camera ( Figure 4). We compare the results to the modelled intensity distribution based on the theoretical WDF calculation, whose approximation can be modelled as [31] The smooth WDF is defined as follows [33]: where the coordinate shift is defined as The functions W b and W a are the WDF of the incoming signal, e.g., (4), (13), (14) or (18), and the transmission function of a single lens aperture correspondingly. The parameters f (focal length) and w(width of a single lens in a lenslet array) are the parameters specific to the detector and define the angles in the local wavefront of the field.
Using this model, we first simulated the synthetic data sets for Gauss, HG, and LG modes. In the model, we considered the WDF W b in the following general form: where PðαÞ is the polynomial of α. We started by testing two cases, namely, a Gauss mode (4) and a LG of order 1 (18). The polynomial fit in Equation (24) was taken to be with a and b being the fitting parameters ( Figure 3). One can see that when a = const and b = 0, the fit corresponds to a Gaussian profile (4), and when a/b = 2, the model includes LG 1 -like distributions (18).
To assess the quality of the model, besides estimating the χ 2 per each fit, we run a simulation with 1000 fits to synthetic data with Equation (21), obtaining a histogram of the deviation between the supplied fitting parameters and those from the best fit ( Figure 5(b)). The resulting data show that the parameters are centered near the "true" values, showing the satisfactory quality of the fit.
The inferred fundamental amount of information carried in an experimentally measured photon beam appears to be lower than the theoretically predicted for the ideal LG 1 mode (see Figure 2). The resulting entropy as a function of beam energy is extremely sensitive to the fit parameters a and b (25).
Since we did not consider the wavefront readings of the SHS, this model does not harvest the information stored in the reciprocal domain at this point, but rather infers it from the intensity distribution. This capability is to be explored and utilized in our future research. However, even at this conceptual level, the fit already can discriminate between the two modes. Hence, it provides an experimental estimate for the WDF (Figure 5(a)) supplied, however a "good" guess about the possible wavefront shape of the incoming signal. Due to intensity-only detection, the central region of the LG beam, generated in the experiment, is left out. Hence, the setup is inherently classical, in compliance with the definition of information, used here.
The quality of the fit, alongside with the accuracy of the numerical integration algorithm, also depends on the 6 Research detector modelling scheme, (21) and (22), which in this case has been fairly generic. One of the crucial assumptions that has been made is the plane-wave approximation. This approximation, generally speaking, is too brave for the case of topological beams. Another unaccounted source of discrepancies is the optical cross-talk due to SHS's architecture that has been extensively discussed before on the level of mathematical modelling [31]. Hence, this model, thought already fruitful, has a great potential for improvement. These results were used to assess the information stored in a physical channel and to compare them to theoretical curves ( Figure 2). As for applications, the full 2D model can also provide information about the medium the beam interacted with that can be useful in remote sensing. Based on the results of Section 2.2, in the course of future research, we expect topological beams to outperform the modes with planar phase structure for two main reasons: (1) greater library of nontrivial signatures in the original beam profile; (2) reported robustness and self-healing properties of vortex modes.

Discussion
Interestingly, while information technologies have seen outstanding progress over the last century, which lead to the digital revolution and created flourishing businesses, the field of information theory has remained in a shade. We believe that a universal technique to assess both the quality and quantity of information in a received signal, if provided, could become a conceptually novel tool to physicists and engineers alike. The approach described in this work is by far not the first attempt, neither is it the most general. However, in this approach, classical information does not require an early choice of a communication scheme (i.e., alphabet). It is rather based on a fundamental assessment of an optical system's capability to carry information, based on its overall complexity. The WDF is uniquely used here as a probability density function for Shannon information in optics. The constraints of probability theory on the definition of information and of quantum mechanics on conjugate observables are satisfied working around the properties of the WDF-a pseudo-probability distribution. We foresee the relevance of this formalism in the context of recent developments for (i) free-space information-processing optics [34]; (ii) integrated photonics-based information processing [35] such as neural network-based accelerators [36] and photonic tensor cores [37]; (iii) adaptive sensing [38]; and (iv) analog optical and photonic processors [39][40][41]. As the data compression coefficient is naturally bounded by Shannon information, carried by the beam [42], this work indirectly points towards higher information capacity in beams with a nontrivial structure, like HG, LG, and Bessel-Gauss modes [43,44].
Due to the WDF's relation to the EM-field correlation function, we foresee our approach to be extremely useful in adaptive optics. The reconstruction algorithm, when fully developed, has the potential to characterize the effects of decoherence in turbulent media, the 2D ambiguity function, and time-resolved frequency distribution, alongside with commonly available corrections for aberration, astigmatism, peak valley, and rms deformation provided by the SHS measurements. The WDF formalism uniquely gives access to such characteristics as mutual intensity of stochastic wave fields, which is of high importance when describing partially coherent sources.
In perspective, as the demand on high-speed data transfer and streaming grows exponentially, ADSL and fiber-tohome technologies are less and less likely to satisfy even an average consumer's data hunger, not to mention business and government agency calls. These, together with the recent advents in optical processing [34], micro-and nanofabrication [45], and OAM communications [46] put forward the mid-20th century's excitement around free-space communications in a new light. The new-generation free-space links will require coherent detection techniques to realise their potential to the fullest. For instance, the idea of using classical information as a quantifier for the amount of information processing that can be done on a given metasurface has already been implemented [47]. This approach elegantly gauges a specific pattern on the surface of a metastructure (lens) with the corresponding distribution of electromagnetic radiation in the far field [48]. With this progress, we believe that our approach may result in a better understanding of which types of measurements and device architectures are needed to efficiently mine information from a free-space link.

Data Availability
The data used in this research have been acquired manually in the OPEN Lab facility, the George Washington University, and can be shared by request. x, a.u.
LG 1 WDF eory Experiment  Figure 5: Comparing the experiment and the theoretical prediction for the WDF of the LG mode of order ℓ = 1: (a) the WDF of the ideal LG 1 mode (solid black) and of the measured in an experiment laser beam (dashed yellow) with the corresponding error band; the frequency histogram of the deviation of the fitting parameters a (yellow) and b (blue) in the inset at the rightbottom, Equation (25), supplied to the model (21), and resulted from the fitting procedure: 1000 synthetic data sets have been generated and fitted with the model, mentioned above; for each run, the deviation between the supplied and fitted parameters has been calculated.