The Science of Harmony: A Psychophysical Basis for Perceptual Tensions and Resolutions in Music

This paper attempts to establish a psychophysical basis for both stationary (tension in chord sonorities) and transitional (resolution in chord progressions) harmony. Harmony studies the phenomenon of combining notes in music to produce a pleasing effect greater than the sum of its parts. Being both aesthetic and mathematical in nature, it has baffled some of the brightest minds in physics and mathematics for centuries. In the acoustics of stationary harmony, the widely accepted traditional theories explaining consonances and dissonances are centred around two schools: rational relationships (commonly credited to Pythagoras) and Helmholtz's beating frequencies. The first is more of an attribution than a psychoacoustic explanation, while electrophysiological (amongst other) discrepancies with the second remain disputed. Transitional harmony, on the other hand, is a more complex problem that has remained largely elusive to acoustic science even today. In order to address both stationary and transitional harmony, we first propose the notion of interharmonic and subharmonic modulations to address the summation of adjacent and distant sinusoids in a chord. Based on this, earlier parts of this paper bridge the two schools and show how they stem from a single equation. Later parts of the paper focus on subharmonic modulations to explain aspects of harmony that interharmonic modulations cannot. Introducing the concept of stationary and transitional subharmonic tensions, we show how it can explain perceptual concepts such as tension in stationary harmony and resolution in transitional harmony, by which we also address the five fundamental questions of psychoacoustic harmony, such as why the pleasing effect of harmony is greater than that of the sum of its parts. Finally, strong correlations with traditional music theory and perception statistics affirm our theory for both stationary and transitional harmony.


Introduction
Even though it is one of the most important components in music, and possibly the most widely studied [1], the definition of harmony differs vastly across time, genre, and individuals, reflecting how little is understood about it [2,3].
There are three aspects to the complete understanding of our perception of harmony, which we will, for brevity, refer to as what, why, and when. The what of harmony refers to an attribution to a defining quality. Its why goes further to explain the means by which such a quality ascribes to consonance or dissonance (or even sentiment or emotions). Finally, it should be recognized that the same harmony perceived as consonant in one context can be perceived as dissonant in another. This takes the what and why of stationary harmony (sonorities) into the context of transitional harmony (progression). We refer to this as the when of harmony and it has remained largely unaddressed by acoustic science.
1.1. Background. Early works effectively attributed the what of harmony to rational relationships [1,4]. This ascribes a chord's consonance to the ratios amongst its contributing string lengths (and, consequently, wave periods and fundamental frequencies) being fractions with integer numerators and denominators. A fascinating number of esteemed mathematicians, physicists, and philosophers have made different contributions in this aspect. The development of the Pythagorean tuning system is commonly credited to Pythagoras in the sixth century BC [3,5,6]. Euclid wrote the earliest surviving record on the tuning of the monochord [7] and documented numerous experiments on rational tuning [8]. Aristotle and Plato made various contributions to the development of ancient Grecian (rationally scaled) music that was later integrated into the diatonic system [8,9]. Ptolemy developed the syntonic diatonic system as early as the second century [10]. Euler proposed a grading system of chord aesthetics based on the assertion that the notes have a least common multiple (i.e., that they are rational) [11]. Since string lengths correspond to wavelengths, which correspond to wave period, and since notes used in harmony are taken from the scale, it can be said that the Pythagorean school effectively attributes harmony to temporal features.
It was not until 1877 that Helmholtz pioneered the psychoacoustic approach [3,8,12,13]. Isolating adjacent harmonic sinusoids from different notes using specifically devised acoustic resonators, he was able to record how amplitude modulation that resulted from their summation grew perceptually unpleasant as their modulation frequency increased towards a certain threshold [8], thus attributing dissonance to what he called beating frequencies and addressing the questions what sounds bad and why. Numerous others [14][15][16][17][18][19][20][21][22][23][24][25] conducted further studies in this approach, while others raised several questions with Helmholtz's theory [13,17,26]. For example, Plomp and Levelt [12] and Schellenberg and Trehub [27] have separately shown that consonances and dissonances are still perceived in harmonies with pure tones (tones without harmonics). Itoh [28] and Bidelman [29], amongst others, also showed that electrophysiological responses to pure-tone intervals did not agree with Helmholtz. All in all, the Helmholtz school attributes harmony to frequency features and comprises a large part of what is referred to in this paper as interharmonic modulations.
In 1898, a notable but short-lived [3] attempt at what sounds good and why was seen in Stumpf's tonal fusion theory [30], which theorized that harmony was the effect of the harmonics of its component notes fusing together to sound like a single note with a common fundament [12,13,26,30].
Because of the nonlinear relationship between tonal scale and frequency, scales derived from rational lengths of a string tended to leave certain intervals more rational than others. With this realization, Western music eventually adopted the 12-tone equal temperament scale. This equally segments the octave in the log-frequency scale [31] such that each semitone interval is a factor of $2^{1/12}$, evenly redistributing the dissonances to accommodate different keys. Despite its late adoption, the original development of this scale predates Helmholtz, dating to the 1500s. Vincenzo Galilei (father of Galileo Galilei) made the earliest known estimate of this in the West by approximating $2^{1/12}$ with 18/17 [32], while Zhu was credited for perfecting it in the East by computing it accurately to the 25th decimal, both in the 1580s [12]. The earliest recorded estimate of this in the East was by He in the 5th century, whose estimate was already about as accurate as Galilei's [33,34].
In Rameau's Treatise on Harmony [1], which paved the foundations of harmony in modern music theory, notes of basic chords are derived from the division of the length of a common string [35]. However, this remains disjoint with the rest of the treatise, and modern music theory remains more of a compilation of rules and deductions from the pattern clustering of perceptual experiences [36][37][38][39][40][41][42], addressing the questions what sounds good and when without the scientific reasoning of why [37].
More recently, several studies have found high correlations between harmony and periodicity measures of the resultant signal [43,44]. This novel leap advances the Pythagorean school while presenting a persuasive attribute of what sounds good and why.
Several notable studies have also been conducted that relate harmony to nonacoustic attributes such as statistics and geometry. An example is Tymoczko's exploration of how multidimensional geometric patterns correlate strongly with patterns that exist in historic harmony use, addressing what sounds good and when [45][46][47]. Authors in [48] explored properties of musical scales on the Euler lattice, addressing the what of harmony. Numerous others such as [49][50][51] have worked on other mathematical relationships in harmony, addressing its what.
Yet others have looked towards a biological rationale for our perception of harmony to address what sounds good and why. A recent example is Purves' attribution of the effect of the tonal scale to the familiarity of excited or subdued speech [14,52-54]. Other examples are the works of [43,55,56] on the neuronal mechanism of harmony perception.

Scope.
In this work, we first seek a mathematical resolution across both acoustic schools by a single psychophysical theory. To start off somewhere familiar, we first describe the concept of interharmonic modulations (which adopts and encompasses Helmholtz's beating frequencies), from which we then introduce the concept of subharmonic [57] modulations and show how the two categories of modulations relate. (At some point after which, we also show how a specific case of subharmonic modulations addresses Pythagoras, thus integrating the two schools.) After explaining how perceptual tensions [18,36,58,59] in musical harmony may be identified in subharmonic tension in the stationary context, we continue to explain how perceptual tension resolutions [18,42] in transitional harmony (chord progressions) may be visualized in subharmonic trajectories. By these, we address the what, why, and when of harmony. Numerical results show strong to near-complete correlations with perception and chord-use statistics that are presented towards the end of the paper.
By applying our theory and equations, we will answer the five fundamental questions of psychoacoustic harmony. These are as follows.
(1) The phenomenon that the effect of harmony is greater than the sum of its parts [18,60]:
$$\{n_1 + n_2 + n_3\} > \{n_1\} + \{n_2\} + \{n_3\},$$
where $\{\cdot\}$ denotes the harmonious effect, $n_1$, $n_2$, and $n_3$ represent notes of the chord, and '+' denotes simultaneous presentation or cumulation.
(4) We have the following phenomena.
(a) A chord that sounds better than another out of context can sound worse than it in context [42].
(b) A chord that sounds better than another in one context can sound worse than it in another context [42].
(5) We have the phenomenon that the transition from a low-tension chord to a high-tension one can still bring about the effect of tension release (resolution).

A Universal Theory of Harmony
In this section, a psychophysical basis for harmony is proposed as follows.
The human perception of harmony is composed of auditory events produced by the combination of sinusoids that make up each note in the harmony. These may be classified into interharmonic and subharmonic modulations.
First-order interharmonic modulations are those produced by the interplay amongst adjacent sinusoids across differing notes. These are loosely categorized by the frequency of the resultant amplitude modulation into dissonant beating frequencies [8] and consonant low-frequency modulations, triggering a variety of emotions according to their modulation and carrier frequencies. Second-order interharmonic modulations are produced by the alignment of first-order ones. The consonance types of different intervals may be identified according to patterns cast by interharmonic modulations on the interharmonic plot.
Despite the significance of interharmonic modulations, the effect of consonances and dissonances is still experienced in the absence of harmonics with pure-tone harmonies. This implies that interharmonic modulations are not exclusive in our perception of harmony [12,13,17,26-29]. From this, it may be deduced that subharmonic modulations also play a significant role.
Subharmonic modulations are produced by the interplay of sinusoids much further apart than interharmonic modulations. Unlike interharmonic modulations, which are analysed primarily in the frequency domain, subharmonic modulations are analysed primarily in the temporal domain and they are comprised of two parts. The first part is subharmonic wave formation, which occurs with the summation of component waveforms from each note to produce a waveform largely periodic to a common subharmonic frequency. The second is subharmonic wave deformation (an example is provided in Supplementary Video S1.), which is a distortion to every successive period of this composite subharmonic waveform due to the imperfect alignment of contributing wave periods. Stationary tension and transitional resolution may both be derived from subharmonic features which serve as measures of stationary and transitional harmony.
In order to explain interharmonic and subharmonic modulations in detail and how they unify the two prevailing schools of harmony, we will start from first principles by looking at the notes of a chord as the sum of their composite sinusoids.

Modulations in Sinusoidal Summation.
When waveforms of two notes, $n_1(t)$ and $n_2(t)$, at amplitudes $A$ and $B$, respectively, are presented together, the result may be expressed as a sum of their composite sinusoids such that
$$n_1(t) + n_2(t) = A \sum_{n=1}^{N} a_n \cos(2\pi n f_1 t + \phi_n) + B \sum_{m=1}^{M} b_m \cos(2\pi m f_2 t + \phi_m),$$
where, respectively, $n$ and $m$ represent the individual harmonics from each note, $N$ and $M$ represent the highest harmonics that need to be considered because of audible range, $a_n$ and $b_m$ represent the amplitude coefficients of each harmonic, $n f_1$ and $m f_2$ represent the frequencies of each harmonic with $f_1$ and $f_2$ representing the fundamental frequency of each note, $\phi_n$ and $\phi_m$ represent the starting phases of each harmonic, and $t$ represents monotonically increasing time.
Isolating a single pair of adjacent sinusoids from differing notes, we get
$$h_1(t) + h_2(t) = A \cos(\omega_1 t + \phi_n) + B \cos(\omega_2 t + \phi_m), \tag{3}$$
where $h_1(t)$ and $h_2(t)$ are the pair of harmonics from differing notes, $A$ and $B$ now absorb the amplitude coefficients $a_n$ and $b_m$ of the chosen harmonics, $\omega_1 = 2\pi n f_1$, and $\omega_2 = 2\pi m f_2$. Since we are considering the modulating frequency resulting from the summation of both sinusoids spanning all phase combinations, it no longer matters which starting phase we take reference from. Hence, $\phi_n$ and $\phi_m$ can both be set to zero.
In the case of $A = B$, the resultant amplitude modulation is trivial and, as illustrated in Figure 1 (left), is given by the sum-to-product rule
$$\cos \omega_1 t + \cos \omega_2 t = 2 \cos\left(\frac{\Delta\omega}{2} t\right) \cos(\bar{\omega} t), \tag{4}$$
where $\Delta\omega/2$ is the normalized modulating frequency, with $\Delta\omega = |\omega_1 - \omega_2|$, $\bar{\omega} = (\omega_1 + \omega_2)/2$ is the normalized carrier frequency, and the values of $A$ and $B$ are normalized to 1. However, in most cases, $A \neq B$, and the problem becomes nontrivial because the modulation frequency changes as the modulating waveform no longer crosses zero. This can be seen in Figure 1 (right).
We approximate the summation of these sinusoids by a carrier at $\omega_c$ whose amplitude is modulated by a term in $\|\cos^{2-A/B}(\Delta\omega t/2)\|$, where $\omega_c$ is bounded by $\omega_1$ and $\omega_2$, and $\|\cos^{2-A/B}(\Delta\omega t/2)\|$ denotes the magnitude of $\cos^{2-A/B}(\Delta\omega t/2)$ signed according to the quadrant of $(\Delta\omega t/2)$. $B$ denotes the larger of the two amplitudes, and $A$ and $B$ are normalized to $B = 1$. When $A = B$, this simplifies to (4), where the modulating frequency is $\Delta\omega/2$.
However, as $B$ increases with respect to $A$, $2 - A/B$ gravitates towards 2, for which the modulating frequency is $\Delta\omega$, since $\cos^2(\Delta\omega t/2) = (1 + \cos \Delta\omega t)/2$. We can see from the plots in Supplementary Figure S1 that this estimation is accurate for values of $B$ marginally larger than $A$ to much larger than $A$.
For consistency, the effective modulating frequency for the case of $A = B$ will be taken as the frequency of its rectified modulating waveform, which is then, similarly, $\Delta\omega$. In music, we are interested in this frequency in hertz. Hence, we denormalize this to be
$$\Delta f = \frac{\Delta\omega}{2\pi} = |n f_1 - m f_2|. \tag{9}$$
In the next two sections, we will move on to see how this is applicable not only to the summation of adjacent harmonics in interharmonic modulations but also to distant sinusoids in subharmonic modulations.
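As a numerical check of the sinusoidal summation discussed above, the following minimal Python sketch compares the direct sum of two equal-amplitude cosines with its sum-to-product form and evaluates the exact envelope of an unequal-amplitude pair. The function names, the 440 Hz and 446 Hz test tones, and the use of the standard phasor envelope $\sqrt{A^2 + B^2 + 2AB\cos(2\pi \Delta f t)}$ (in place of the approximation used in this paper) are illustrative assumptions.

```python
import math


def pair_sum(f1, f2, t, a=1.0, b=1.0):
    """Direct sum of two zero-phase cosines at f1 and f2 (Hz) at time t (s)."""
    return a * math.cos(2 * math.pi * f1 * t) + b * math.cos(2 * math.pi * f2 * t)


def product_form(f1, f2, t):
    """Sum-to-product form for equal unit amplitudes: modulator times carrier."""
    half_delta_omega = math.pi * abs(f1 - f2)   # (delta omega) / 2
    mean_omega = math.pi * (f1 + f2)            # (omega_1 + omega_2) / 2
    return 2 * math.cos(half_delta_omega * t) * math.cos(mean_omega * t)


def envelope(a, b, delta_f, t):
    """Exact envelope of a*cos(w1 t) + b*cos(w2 t) from phasor addition.

    Assumption: this is the textbook result, not the paper's approximation.
    It repeats delta_f times per second and reaches zero only when a == b.
    """
    inside = a * a + b * b + 2 * a * b * math.cos(2 * math.pi * delta_f * t)
    return math.sqrt(max(inside, 0.0))


def modulation_frequency_hz(f1, f2):
    """Effective (rectified) modulation frequency in hertz."""
    return abs(f1 - f2)


if __name__ == "__main__":
    print(modulation_frequency_hz(440.0, 446.0))      # beats per second
    print(envelope(0.5, 1.0, 6.0, 1.0 / 12.0))        # envelope minimum, |B - A|
```

With unequal amplitudes the envelope swings between $|B - A|$ and $A + B$ without crossing zero, which is consistent with the observation that the modulating waveform then repeats at $\Delta f$ rather than at $\Delta f / 2$.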

Interharmonic Modulations
Interharmonic modulation refers to modulations across adjacent pairs of sinusoids from different notes that fall within a certain threshold, with modulation frequency corresponding to $\Delta f$ in (9). Figure 2 shows a plot of all harmonics of notes $c_3$ (blue) and $e^{\flat}_3$ (red) under 3 kHz. All adjacent sinusoids less than 120 Hz apart are identified in the figure, with their centre, $f$, and modulating, $\Delta f$, frequencies labeled accordingly.
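The pairing of adjacent harmonics described above can be sketched in a few lines of Python. This is a minimal illustration, assuming equal temperament with A4 = 440 Hz, the MIDI convention in which $c_3$ is note 48 and $e^{\flat}_3$ is note 51, and hypothetical function names; "adjacent" is interpreted here as neighbouring entries in the merged, sorted list of harmonics.

```python
def midi_to_hz(midi_note):
    """Equal-tempered frequency, assuming A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def harmonics(f0, f_max):
    """All harmonics k * f0 that do not exceed f_max."""
    return [k * f0 for k in range(1, int(f_max // f0) + 1)]


def interharmonic_pairs(f0_a, f0_b, f_max=3000.0, max_gap=120.0):
    """Return (centre frequency, modulation frequency) for each adjacent
    pair of harmonics that belong to different notes and lie less than
    max_gap hertz apart."""
    merged = sorted(
        [(f, "a") for f in harmonics(f0_a, f_max)]
        + [(f, "b") for f in harmonics(f0_b, f_max)]
    )
    pairs = []
    for (f_low, note_low), (f_high, note_high) in zip(merged, merged[1:]):
        gap = f_high - f_low
        if note_low != note_high and gap < max_gap:
            pairs.append(((f_low + f_high) / 2, gap))
    return pairs


if __name__ == "__main__":
    for centre, gap in interharmonic_pairs(midi_to_hz(48), midi_to_hz(51)):
        print(f"centre = {centre:8.2f} Hz, modulation = {gap:6.2f} Hz")
```

Each returned tuple corresponds to one point on an interharmonic plot, with the centre frequency on the horizontal axis and the modulation frequency on the vertical axis.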

Beating Frequencies and Low-Frequency Modulations.
Interharmonic modulations with $\Delta f$ that increase towards a certain threshold are known to become increasingly dissonant and, as coined by Helmholtz, are known as beating frequencies [8]. Interharmonic modulations with small $\Delta f$, on the other hand, contribute to the harmonious effect perceived in consonance [65]. Figure 3 illustrates this.

Perceptual Responses across the $\Delta f$-Feature Space.
It is known that different combinations of notes contribute to different emotive valences [66]. This too may be decomposed into a sum of its harmonics. Hence, further to the consonances and dissonances, emotive responses may also be mapped onto the interharmonic plot. Although, as one might imagine, such responses would be different for every individual, we can plot the response for an individual as an example. Figure 4 shows an example of auditory responses triggered in the mind of the (first) author when exposed to frequencies in the horizontal ($f$) axis modulated by frequencies in the vertical ($\Delta f$) axis. The value of $f$ is indicated in the horizontal axis in both Hz and its corresponding note names. The degree of pleasure derived from interharmonic modulation is coded in the colored background as a reference. The green regions are perceived to be pleasing, yellow as somewhat pleasing, orange as unpleasant but not to the point of annoying, red as dissonant, and black as beyond beating range. The black dots mark the locations of the thoughts or emotions labelled. This shows that interharmonic modulations bring about a large variety of thoughts or emotions. If several of these are triggered simultaneously when just one pair of notes sounds, one can imagine how ten fingers on a piano or all the instruments in an orchestra could combine several (thoughts or emotions) to paint stories on the interharmonic feature space over time.

Figure 5: Interharmonic plots for all intervals within an octave regarded, in classical music theory, to be perfectly consonant with a root of $g_3$. These are, namely, the Perfect 4th ($g_3$ and $c_4$), Perfect 5th ($g_3$ and $d_4$), and Octave ($g_3$ and $g_4$) intervals.

Intervals and Second-Order
then populated with dissonance levels from [12]. These colors provide a simple background reference for the dark blue dots that each represent a modulation at their corresponding $\Delta f$ and $f$ values, which results from the summation of neighboring pairs of sinusoids (at frequencies $f + \Delta f/2$ and $f - \Delta f/2$) of the notes specified by the indicated interval. Also, for reference, two white lines run across each plot, indicating the locations where the values of $\Delta f$ coincide with a semitone (gentler slope) and a tone (steeper slope) of the corresponding values of $f$ (where $\Delta f = (2^{1/12} - 1) f$ and $\Delta f = (2^{2/12} - 1) f$, resp.). The semitone and the tone are regarded as the most dissonant intervals up to halfway in either direction around the cyclic chroma [12,21,54].
The plots of perfect consonances are presented in Figure 5. These intervals are described with a bit of a dilemma in classical music theory [67]. They may be described as so consonant that they sound almost like one note. As such, their use contributes in a limited way to harmony [15]. For example, the use of perfect fifths is forbidden in parallel motion and octaves are regarded as the same note in a different register [42].
The interharmonic plot reveals the perceived traits of each category of intervals in a way that explains why they sound the way they do, and in a way music theory alone has never been able to. As shown in Figure 5, the constellations formed by interharmonic modulations of perfect intervals line up almost horizontally. (While the methods used in this study are applicable with any form of tuning, only equitempered tuning is assumed in the computations in this section. This is consistent throughout this paper, unless otherwise stated.) Since each point that falls on the same horizontal has the same $\Delta f$, these points modulate synchronously and may be perceived collectively as a single modulation. This may be interpreted as fewer modulating microevents taking place, making them less interesting than other consonant intervals.
Dissonant intervals are presented in Figure 7. As can be seen in the figure, these intervals have points that fall mostly within the central dissonant region and line up along the two dissonant lines. Evenly spaced points along a line that passes through the origin also reveal that their $\Delta f$ share a harmonic relationship. This has a similar (although somewhat lesser) redundant effect to that of the synchronous modulation described with perfect consonances. Consonances that properly contribute to harmony are called imperfect consonances [67] and are presented in Figure 6. As can be seen in the figure, imperfectly consonant intervals have points that are better distributed. This may be interpreted as erratic modulations that create a continuous stream of unpredictable events to stimulate aural attention and, thus, interest.
A lot of work has already been done on interharmonics since Helmholtz [12, 19-21, 24, 25]. While the main focus of this work is not interharmonics, one purpose of this section is, nevertheless, to provide sufficient background to complete our theory of how the human experience of stationary harmony is based around modulations of both interharmonic and subharmonic nature. From the interharmonic plots in Figures 5-7, a simple predictor of dissonance may be identified to be the number of interharmonic modulations that fall within the central region of dissonance,
$$\#\{\, i \in \{1, \dots, N\} : \Delta\hat{f}_{lower} \le \Delta f_i / f_i \le \Delta\hat{f}_{upper} \,\},$$
where $\Delta\hat{f}$ will be our shorthand for $\Delta f / f$, $i$ iterates through all interharmonic modulations on the plot, $N$ is the total number of modulations considered, $\Delta f_i$ and $f_i$ refer to the pair of $\Delta f$ and $f$ that describe the $i$th interharmonic modulation, respectively, and $\Delta\hat{f}_{lower}$ and $\Delta\hat{f}_{upper}$ define the lower and upper boundaries of the region on the interharmonic plot, respectively.
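The counting predictor described above can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the band limits are set to the semitone and tone ratios purely as an assumption, since the boundaries of the dissonant region are not fixed here.

```python
# Assumed band limits for illustration: the semitone and tone ratios
# that mark the two reference lines on the interharmonic plots.
SEMITONE_RATIO = 2 ** (1 / 12) - 1
TONE_RATIO = 2 ** (2 / 12) - 1


def count_dissonant(modulations, lower, upper):
    """Count modulations whose ratio delta_f / f lies within [lower, upper].

    `modulations` is an iterable of (centre frequency, modulation frequency)
    pairs in hertz, one per point on the interharmonic plot.
    """
    return sum(
        1 for centre, delta_f in modulations if lower <= delta_f / centre <= upper
    )


if __name__ == "__main__":
    # Hand-made points (not measured data): two of them fall inside the band.
    sample = [(1000.0, 60.0), (1000.0, 10.0), (500.0, 60.0), (2000.0, 500.0)]
    print(count_dissonant(sample, SEMITONE_RATIO, TONE_RATIO))
```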
In this section, we have seen how interharmonic modulations are significant to our perception of consonance, dissonance, and emotive response in music. When listening to a duet of instruments with no overtones such as a sinewave theremin or a very pure musical saw, we realize that  consonance, dissonance, and emotion remain present even in harmony without harmonics (i.e., across a well-spaced pair of fundamental frequencies alone). This is just one amongst the several different ways [12,13,17,28,68,69] from which we can deduce that interharmonic modulations cannot be the only determinant of our perception of harmony, which thereby leads to our hypothesis on subharmonic modulations.

Subharmonic Modulations
Apart from the modulations that arise from the summation of adjacent harmonic sinusoids across differing notes, we can (as explained above) deduce that another category of modulations is significant to our perception of harmony. We call these subharmonic modulations. There are two levels of subharmonic modulations, which we dub subharmonic wave formation and subharmonic wave deformation. In this section, we will show how these are significant to our perception of not only stationary harmony, but also transitional harmony. Figure 8 shows the waveforms of a C Major chord (C) and a C minor 7 chord (Cm7) composed of the fundamental sinusoids of each composite note. We let each sinusoid start at phase zero since, for the purpose of this example, we are only interested in wave period. Only the fundamental needs to be considered for the same reason. In both cases, the waveform resulting from this summation repeats at a frequency approximately subharmonic to all its composite waveforms. In the figure, its period is marked $t_{sub}$. We call this subharmonic wave formation and say that $t_{sub}$ is a common subharmonic to all its composite waveforms.
In the case of the C chord, as shown in the figure, each composite sinusoid crosses zero at nearly the same point around $t = t_{sub}$. As marked in the figure, $\Delta t$ (which is the difference between the first and the last negative-to-positive zero-crossing around the $t = t_{sub}$ region) is small. However, in the case of the Cm7 chord, $\Delta t$ is much larger. One can imagine that each successive period of the resultant waveform looks less and less like the first as it gets more and more deformed. This happens slowly for the C chord because of the small $\Delta t$ but faster for the Cm7 because of the large $\Delta t$. We call this subharmonic wave deformation. Supplementary Video S1 compares subharmonic wave deformation in a low-tension C chord to that in a high-tension Cm7 chord.
Recalling our wave equation from (3), we can rewrite $\cos \omega_1 t + \cos \omega_2 t$, or $\cos 2\pi f_1 t + \cos 2\pi f_2 t$, as
$$\cos 2\pi f_1 t + \cos 2\pi f_2 t = \cos 2\pi (k_1 f_{sub} + \Delta f_1) t + \cos 2\pi (k_2 f_{sub} + \Delta f_2) t, \tag{11}$$
where $f_{sub}$ is an approximate common factor of $f_1$ and $f_2$, $k_1$ and $k_2$ are integer multipliers, and $\Delta f_1$ and $\Delta f_2$ are small values that balance the equation by making up for the discrepancies that arise with finding a common factor.
In (11), two fundamental frequencies $f_1$ and $f_2$ are described as multiples of a lower subharmonic frequency that is common to them ($f_{sub}$). We call this their common subharmonic.
Since all harmonics are multiples of their fundamental, a subharmonic to any fundamental would inherently be subharmonic to all its harmonics. For this reason, only the fundamental of each note needs to be considered.
Since harmony in music is commonly composed of more than just two notes, we generalize this to describe fundamentals and common subharmonics from any number of notes to get
$$\sum_{i=1}^{N} a_i \cos 2\pi f_i t = a_1 \cos 2\pi (k_1 f_{sub} + \Delta f_1) t + \cdots + a_N \cos 2\pi (k_N f_{sub} + \Delta f_N) t,$$
where $N$ is the number of notes in the chord, $i$ cycles through each of them, and $a_i$ is the amplitude coefficient of note $i$.
Beyond this point, it would be easier to visualize subharmonics in the time domain. With the fundamental frequency of note $i$ given by
$$f_i = k_i f_{sub} + \Delta f_i,$$
the fundamental period of each note is then
$$t_i = \frac{1}{f_i},$$
where $t_i$ is the fundamental period of the note. Hence, the period of any common subharmonic can be expressed as $k_i t_i$. We can then compensate for nonintegral discrepancies in period rather than in frequency. In doing so, we get
$$t_{sub} = k_i t_i + \Delta t_i$$
for all $i$, where $t_{sub}$ is the common subharmonic wave period (we will simply say common subharmonic) of the chord. What carries over as $k_i t_i$ is essentially just the $k_i$th subharmonic of note $i$ which lies in the region of $t_{sub}$. Since this is true for all pairs of $k_i$ and $t_i$ across all values of $i$ when they are each balanced by appropriate $\Delta t_i$, $i$ may be dropped from the left-hand side of the equation. Although the common subharmonic was introduced as the period between primary zero crossings as in Figure 8, we shall, for computational simplicity, redefine it as the mean of $k_i t_i$ across all notes of the chord. Hence,
$$t_{sub} = \frac{1}{N} \sum_{i=1}^{N} k_i t_i. \tag{16}$$
Figure 9 shows how the period of each subharmonic in the C Major chord from Figure 8 may be plotted. The left column first shows how the period of each subharmonic of $c_3$ may be plotted in red. The right column then extends this to every remaining note in the chord, with orange, yellow, and blue for the notes $e_3$, $g_3$, and $c_4$, respectively. It may be seen in the right column that a subharmonic period from every note in the chord nearly coincides at around 30 ms. Hence, we say that this is its common subharmonic, $t_{sub}$, as defined in (16).
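The coincidence at around 30 ms for the C Major chord can be reproduced with a short Python sketch. The search strategy below (stepping through multiples of the longest note period, rounding every other note to its nearest integer multiple, and accepting the first candidate whose spread falls under a tolerance) is an assumption made for illustration, as are the function names, the 0.5 ms tolerance, and the MIDI note numbers (48, 52, 55, 60 for $c_3$, $e_3$, $g_3$, $c_4$ with A4 = 440 Hz). Only the final averaging step follows the mean-of-multiples definition of the common subharmonic given above.

```python
def note_period_ms(midi_note):
    """Fundamental period in milliseconds, assuming equal temperament
    with A4 (MIDI note 69) = 440 Hz."""
    return 1000.0 / (440.0 * 2 ** ((midi_note - 69) / 12))


def common_subharmonic(periods_ms, max_multiple=32, tolerance_ms=0.5):
    """Find the shortest common subharmonic of a set of note periods.

    Returns (t_sub, multiples, spread) or None if nothing coincides
    within tolerance_ms. The tolerance is a hypothetical parameter.
    """
    longest = max(periods_ms)
    for k_ref in range(1, max_multiple + 1):
        target = k_ref * longest
        # Integer multiplier that brings each note closest to the target.
        multiples = [max(1, round(target / t)) for t in periods_ms]
        candidates = [k * t for k, t in zip(multiples, periods_ms)]
        spread = max(candidates) - min(candidates)
        if spread <= tolerance_ms:
            t_sub = sum(candidates) / len(candidates)  # mean of k_i * t_i
            return t_sub, multiples, spread
    return None


if __name__ == "__main__":
    c_major = [note_period_ms(m) for m in (48, 52, 55, 60)]
    print(common_subharmonic(c_major))
```

For this chord the sketch settles on the multipliers 4, 5, 6, and 8, whose subharmonic periods all fall between 30 ms and 31 ms.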
Having reduced the waveform plot to subharmonic periods in the vertical axis, we can represent time spanned by each subharmonic in the horizontal axis. We will do this for a song stanza in the next section, in a subharmonic plot.

Subharmonic Modulations in Stationary Harmony.
Figure 10 shows an example of a subharmonic plot. The horizontal axis shows time in bars and the vertical axis shows the subharmonic wave period in milliseconds. Note that the subharmonic axis runs top down to put shorter wave periods at the top because they correspond to higher frequencies. Larger wave periods, which correspond to lower frequencies, conversely sit at the bottom. The tails that run horizontally represent the span of time covered by each note. Subharmonics are colored to match their corresponding notes on the music score. For example, in the first bar, all subharmonics of $f^{\sharp}_5$ are marked out in red, followed by $d_5$ in orange, $a_4$ in yellow, $d_4$ in green, $a_3$ in blue, and $d_3$ in purple. The musical score runs in parallel at the bottom of the plot as reference. Once again, all plots and computations in our examples assume equal temperament unless stated otherwise. This example shows the opening stanza of Pachelbel's Canon in D [70] and focuses on stationary harmony, leaving transitional harmony to a later example.
Subharmonics. For every bar, the dashes that sit flush with the reference point at 0 ms mark $0 \times t_0$. Carrying on top down with each bar in accordance with color, we get subharmonics at successive integer multiples of $t_0$.
Notes and Melody Line. Since the topmost dash of each color for every bar below the 0 ms reference represents $1 \times t_0$, these dashes relate to the fundamental period of each note; of these, the topmost ones of every bar across all colors mark the melody line. (They are red in this particular example.) Hence, it is easy to interpret the melody line in a subharmonic plot. The periods, $t_i$, of each note of the melody are marked against the vertical axis in milliseconds as well as by their common note names.
Chords and Coincidence. Common subharmonics may be visualized in regions with the (approximate) coincidence of dashes of every color. Again, the common subharmonics ($t_{sub}$) of each chord in the stanza are marked out against the vertical axis in both milliseconds and their respective chord names.
Key. Every note of the diatonic scale shares a common subharmonic. Hence, it is possible to identify the key of a song by its common subharmonic, assuming minimal deviations from its key. The common subharmonic associated with the key of this song is marked out much further down the plot. Dotted lines indicate discontinuity. (This part of the figure is plotted in just intonation to avoid the snowballing of $\Delta t_i$ to better illustrate this.)
Stationary Tension. Most of the time, contributing subharmonics from different notes are not precisely coincident. Major chords have better coincidence than minor chords, and triads coincide better than sevenths and extended chords. With subharmonic modulations, perceptual tension arises with the noncoincidence of common subharmonics. Noncoincidence is measured by an overall $\Delta t$ as reflected in Figures 8 and 10. We call this its (stationary) subharmonic tension. This $\Delta t$ is given by the difference between the largest and smallest subharmonics in the chord that coincide around $t_{sub}$:
$$\Delta t = \max_i (k_i t_i) - \min_i (k_i t_i),$$
where $k_i t_i$ is the subharmonic period of note $i$ that lies nearest $t_{sub}$.
From Figure 3 in the section on interharmonic modulation, recall that dissonances increased and decreased with interharmonic modulation frequency while consonances behaved inversely. This happens only within a certain range. When interharmonic modulation frequency shrinks to the brink of zero, it falls below musical significance. Subharmonic tension behaves similarly. Figure 11 describes different types of harmony on the subharmonic tension scale. As can be seen in the figure, perceived dissonances increase and decrease with subharmonic tension while perceived consonances behave inversely within common range. Mathematically, the harmonious effect $\{X\}$ of chord $X$ decreases as its stationary subharmonic tension $\Delta t_X$ (its $\Delta t$) increases.
However, as described in the figure, the effect of harmony drops to zero as modulations from subharmonic tension fall below musical significance. Hence, $\{X\} \to 0$ as $\Delta t_X < \Delta t_{threshold}$, where $\Delta t_{threshold}$ is the said threshold of musical significance. Thus, perceptual tensions and consonances are experienced in slew-like modulations of the waveform at common subharmonic locations. (This is the effect of periodically changing phase relationships amongst the contributing waveforms, for which $\Delta t$ is a measure.) While there may be several common subharmonics for every chord within reasonable range, we theorize that our ears identify most with the shortest few. Subharmonic consonances are described by gentler modulations (small $\Delta t$) at the shortest common subharmonic locations (short $t_{sub}$), while subharmonic dissonances are described by more turbulent ones (associated with the absence of small $\Delta t$ at short $t_{sub}$). The sensation of a chord can be highly complex, with different tensions and consonances perceived simultaneously, an experience inadequately represented by a single term for dissonance. Attempting to rate every chord by its dissonance level alone can be compared to rating every variety of chocolate in a candy store by only how sweet or bitter it is. The advantage of $\Delta t$, as opposed to existing correlates of harmony [3,13,43,54], is the way it explains abstract notions of perceptual tensions and consonances by ascribing them to regions across the subharmonic spectrum with a strong sense of attribution or identification. While, for purpose of illustration, Figures 9 and 10 have shown examples where a modal $t_{sub}$ (shortest $t_{sub}$ with smallest $\Delta t$) is easiest to identify, we theorize that, for complex chords with ambiguous $t_{sub}$ (where it is difficult to attribute the collection of modulations experienced to a single modal), our ears often identify with several common subharmonics simultaneously.
In other words, indeterminate cases could possibly arise with particularly discordant harmonies without small $\Delta t$ at short $T_{sub}$. For programmatic analysis of a large number of chords, it is nevertheless useful to have a single term to represent the overall dissonance of each chord. For this, we use $\widetilde{\Delta t}$, given by (21), where the single term $\widetilde{\Delta t}$ represents the overall subharmonic tension, $T_{sub,j}$ and $\Delta t_j$ refer to individual candidates of $T_{sub}$ and $\Delta t$ with $j$ iterating through each candidate pair, a preemphasis is applied to each candidate (while its reciprocal serves as "post de-emphasis"), and $\Sigma_{n:m}$ denotes summing over the $n$ smallest values out of a range of $m$ values considered. In our work, $n$ is always chosen to be half of $m$ unless stated otherwise. Note that $T_{sub,j}$ here serves as a weighting factor to weight down higher subharmonics, which, as aforementioned, are less significant. Inverting before (and rectifying after) summation mimics our hearing by allowing smaller values of $\Delta t_j$ to contribute better towards a smaller $\widetilde{\Delta t}$.
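As a rough illustration of the stationary subharmonic tension described in this section, the following Python sketch lists candidate common subharmonics of a chord together with the spread $\Delta t$ of the note subharmonics that cluster around each. The function names, the choice of the lowest note's subharmonics as search targets, and the use of the cluster mean as $T_{sub}$ are assumptions made for this example and are not taken from the paper. The sketch does not implement the overall single-term measure.

```python
from fractions import Fraction


def nearest_subharmonic(period, target):
    """Return the integer multiple n * period (n >= 1) closest to target."""
    n = max(1, round(target / period))
    return n * period


def subharmonic_candidates(periods, max_multiple=12):
    """Return (T_sub, delta_t) pairs for a chord given its note periods.

    Assumption: candidate locations are searched at multiples of the
    longest note period (the lowest note). This is illustrative only.
    """
    reference = max(periods)
    candidates = []
    for n in range(1, max_multiple + 1):
        target = n * reference
        cluster = [nearest_subharmonic(p, target) for p in periods]
        delta_t = max(cluster) - min(cluster)  # largest minus smallest subharmonic
        t_sub = sum(cluster) / len(cluster)    # assumed location of the cluster
        candidates.append((t_sub, delta_t))
    return candidates


# Just-intonation triads, with periods in arbitrary time units.
major = [Fraction(1, 4), Fraction(1, 5), Fraction(1, 6)]     # frequency ratio 4:5:6
minor = [Fraction(1, 10), Fraction(1, 12), Fraction(1, 15)]  # frequency ratio 10:12:15

for name, chord in (("major", major), ("minor", minor)):
    cands = subharmonic_candidates(chord)
    exact = [n for n, (_, d) in enumerate(cands, start=1) if d == 0]
    print(name, "first exact coincidence at multiple", exact[0])
```

In just intonation the major triad coincides exactly at 4 periods of its lowest note, whereas the minor triad needs 10, which is in line with the statement that major chords coincide better at short $T_{sub}$. Passing equal-tempered periods as floats to the same function yields small nonzero spreads instead of exact coincidences.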
We will see how representative $\widetilde{\Delta t}$ is of stationary harmony in the next section. Before that, we first explain subharmonic modulations in transitional harmony.

Subharmonic Modulations in Transitional Harmony.
While stationary harmony studies chord sonorities (how a chord sounds on its own), transitional harmony deals with chord progressions and resolutions (how chords transit from one to another). It is remarkable how a low tension (consonant) chord can transit to a high tension (dissonant) one yet still bring about the perceptual effect of tension release (resolution) [18]. From this it may be deduced that transitional harmony stands largely independent of stationary harmony, even though both are considered when assigning harmony in composition. Even though numerous studies have been conducted on stationary harmony from the psychoacoustic approach, work on transitional harmony remains primarily nonpsychophysical.
Traditional classical music theory uses the term resolution to describe the perception of tension released when a chord is suitably followed by another chord [18]. With subharmonic modulation, we theorize that these abstract perceptions of tension release may be identified and quantified in the perceived trajectories of subharmonics as one chord progresses to the next. Figure 12 illustrates this with the opening line of Beethoven's Moonlight Sonata [71]. Before we begin our analysis, note that, unlike in Pachelbel's Canon, the use of arpeggios (broken chords) means that notes contributing to the harmony do not necessarily start at the same time; however, when the sustain pedal of the piano is applied, they sustain and overlap until the end of each bar. The names of the chords formed by the notes are labelled along the top of the score to aid the reader in this analysis. Note also that this piece maintains a strong sense of voice leading [72], which means that each note from a chord has strong progressive associations with a note from the previous chord and another from the succeeding chord. The subharmonics of all notes that are associated in this way (i.e., of the same voicing) across the song are coded with the same color. For example, all notes in red on the music score represent the bass (lowest) notes throughout the song, and every subharmonic of these notes is portrayed in red.
We theorize that in chord transitions every subharmonic ($n_i T_i$) that (nearly) coincides around the common subharmonic ($T_{sub}$) of a succeeding chord is perceived to transit from the nearest corresponding (i.e., of the same voicing) subharmonic in the preceding chord. These transitions are marked out by the arrows in Figure 12, which are colored according to the notes they are associated with. Arrows are usually convergent (with the exception of, for example, a basic triad progressing onto an extended chord of the same root) because the subharmonics of the succeeding chord always identify with a common subharmonic whereas those of the preceding chord usually do not.
The central hypothesis of transitional subharmonic theory is that perceptual tension resolution, which is so often described in traditional music theory but never physically identified in acoustics, lies in the degree of convergence seen here.
Assuming the transition to be abrupt (since notes do not commonly glide from one pitch to another in music), we compute a $\Delta t$ for the succeeding common subharmonic and a $\Delta t$ for its preceding corresponding subharmonics and simply measure this degree of convergence as the difference between the two. As such,

$$\Delta\Delta t = \Delta t_p - \Delta t_s, \qquad (22)$$

where $\Delta t_s$ refers to the $\Delta t$ of the succeeding chord and $\Delta t_p$ refers to the $\Delta t$ defined by its nearest preceding subharmonics. This can be normalized by dividing by $T_{sub}$ such that

$$\widehat{\Delta\Delta t} = \frac{\Delta\Delta t}{T_{sub}}, \qquad (23)$$

where $\widehat{\Delta\Delta t}$ denotes normalized $\Delta\Delta t$ and $T_{sub}$ refers to that of the succeeding chord. $\Delta\Delta t$ is, thus, a quantification of the tension, $\Delta t$, released over the transition at the wave period of the succeeding common subharmonic.
According to our theory, tension resolution is perceived in the release of this tension across each transition. Thus, mathematically,

$$\{X_1 \to X_2\} \propto \Delta\Delta t_{X_1 \to X_2}, \qquad (24)$$

where $\{X_1 \to X_2\}$ denotes the perceptual resolving effect of tension release and $\Delta\Delta t_{X_1 \to X_2}$ denotes the $\Delta\Delta t$ across the transition of chord $X_1$ to chord $X_2$.
Since resolution (tension release) [18,42] in harmony progression is perceived in the convergence of $\Delta t$, what we will refer to as complication (build-up of tension or negative resolution) is seen in its divergence, where $\Delta\Delta t < 0$ and $\{X_1 \to X_2\}$ is negative.
Three possibilities arise when looking at $T_{sub}$ and $\Delta t$ from this perspective, by which we can divide transitional harmony into three classes. As illustrated in Figure 13, these are as follows.
(1) Resolution, also called tension release: this is the most common occurrence and occurs with the convergence of $\Delta t$ (i.e., $\Delta t_p > \Delta t_s$) and a positive $\Delta\Delta t$. The larger the $\Delta\Delta t$, the larger the perceptual tension release.
(2) Complication, also called tension buildup: this is the least common occurrence and occurs with the divergence of $\Delta t$ (i.e., $\Delta t_p < \Delta t_s$) and a negative $\Delta\Delta t$. Just as negative aesthetics may be used expressively in a painting, it may similarly be used in music [73]. The larger the magnitude of $\Delta\Delta t$, the larger the perceptual tension buildup. Complications usually only occur when the preceding $T_{sub}$ is equal or nearly equal to the succeeding $T_{sub}$. Musically speaking, this usually occurs when a simpler chord is followed by a more complex chord of the same root.
(3) Excursion: because of the circular nature of the musical chroma, the preceding $T_{sub}$ and the succeeding $T_{sub}$ may be computed to differ by up to 6 semitones in either direction. When the difference is 1 or 2 semitones, this corresponds to a neighboring note, and the collective (uplifting or detrimental) effect of melodic movement (i.e., melody) across each note of the chord can overpower the effect of harmony. In such cases, our ears are persuaded to identify $\Delta t_p$ with $[n_i T_i]_{max} - [n_i T_i]_{min}$ of the nearest preceding $T_{sub}$. When this happens, $[n_i T_i]_{max}$ and $[n_i T_i]_{min}$ move in the same direction; hence, neither convergence nor divergence is perceived. There are 2 such cases as follows.
It is fascinating to note how the perceptual development (build-up and resolution) of tension that is so often described in music [18,42], but has never been identifiable with an acoustic attribute, may here be visualized in the convergence and divergence of common subharmonics. Figure 13 further illustrates how $n_i T_i$ trajectories reflect the development of tension build-up and release. Additionally, trajectories for excursions are illustrated in the same figure.
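The three classes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the `semitone_shift` argument (a hypothetical signed distance in semitones between the preceding and succeeding common subharmonics), the rule that a shift of 1 or 2 semitones always takes priority, and the "neutral" label for a zero difference are all choices made for this example rather than definitions from the paper.

```python
def transition_tension(delta_t_p, delta_t_s, t_sub):
    """Return (ddt, ddt_hat): tension released and its T_sub-normalized value."""
    ddt = delta_t_p - delta_t_s  # preceding minus succeeding tension
    return ddt, ddt / t_sub


def classify_transition(delta_t_p, delta_t_s, semitone_shift=0):
    """Label a transition as excursion, resolution, complication, or neutral.

    Assumption: a shift of 1 or 2 semitones between the preceding and
    succeeding common subharmonics is always treated as an excursion.
    """
    if abs(semitone_shift) in (1, 2):
        return "excursion"
    ddt = delta_t_p - delta_t_s
    if ddt > 0:
        return "resolution"    # convergence: tension is released
    if ddt < 0:
        return "complication"  # divergence: tension builds up
    return "neutral"           # assumed label when nothing changes


print(transition_tension(3.0, 1.0, 4.0))
print(classify_transition(3.0, 1.0))
print(classify_transition(1.0, 3.0))
print(classify_transition(3.0, 1.0, semitone_shift=-2))
```

The inputs are taken as already measured values of $\Delta t_p$, $\Delta t_s$, and $T_{sub}$ in the same time unit; locating them on a subharmonic plot is outside the scope of this sketch.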
Returning to Figure 12, the transitions between each chord are labeled 1 to 7 in the figure and correspond to 1 to 7 as follows.
(1) The song starts off with a C#m chord. Hence, the common subharmonic is observed around a wave period of c#. Our ears adhere especially to the shortest one, which is at c#2. The large $\Delta t$ is attributed to the complex tensions within a minor chord. At the region marked 1, this transits to a C#m/B chord. The tension built up with the divergence of $\Delta t$ may be visualized in the divergence of the arrows in the figure (of which the dotted ones across the plot are used to indicate the continuation of subharmonics, i.e., $n_i T_i$ that do not change). Both perceptually in music and acoustically, as defined above, this translates to a further complication to the existing minor tension.
(2) At region 2, there is a convergence to a momentary (half-bar) low-tension A chord. The uplifting effect of a large tension release, $\widehat{\Delta\Delta t} \gg 0$, is counterbalanced by the detrimental effect of a falling melodic sequence (lengthening $T_{sub}$), adding to the complexity of the song.
(3) At region 3, A transits to D/F#, which is a Neapolitan chord. The low f# bass extends over 2 octaves below the treble notes, putting a strong $T_{sub}$ at a nonroot period of f#1 and creating an amount of stationary tension that is unusual for a major chord. (In such cases, there is usually another common subharmonic with lower $\Delta t$ but at a wave period corresponding to a root at a much larger $T_{sub}$.)
(4) At region 4, the Neapolitan chord resolves to the Dominant 7th, marked G#7 in the figure, with a large perceptual resolution that is signature to ♭II6-V7 transitions in music [42]. This large tension release is visualized as a large convergence in the subharmonic plot as indicated by the arrows.
(5) Musically, the Dominant 7th typically plays the role of building an anticipation for the upcoming return to the Tonic [42]. Beethoven enhanced this function particularly well with a double suspension with staggered resolutions in regions 5a through 5c. The subharmonic plot gives tangibility to the perceptual details of suspension and resolution that have long been theorized about in music and can now be affirmed with visualization.
(a) At region 5a, the G#7 progresses to what is labeled C#m. However, this C#m is functionally still a G# with a double suspension of the 3rd (b#) to a 4th (c#) and the 5th (d#) to a 6th (e), respectively. The perceptual complication that arises with this transition can be visualized in the subharmonic plot as indicated by the divergence of the green and cyan arrows, respectively. The deviation of the suspended notes from the primary triad is visualized as a deviation of their $n_i T_i$ from $T_{sub}$.
(b) At region 5b, the tension resolution with the 6th being resolved back down to the 5th can be visualized in the subharmonic plot by its $n_i T_i$ resolving back to $T_{sub}$ as indicated by the convergent cyan arrow. The continuation of the suspended 4th is visualized in the dotted green arrow.
(c) At region 5c, the tension resolution with the 4th being resolved back down to the 3rd can be visualized in the subharmonic plot by its $n_i T_i$ resolving back to $T_{sub}$ as indicated by the solid green arrow. In preparation for a major resolution back to the upcoming tonic, Beethoven's touch of genius combines this resolution with a simultaneous complication in the introduction of the 7th at this point. This is visualized in the deviation of its $n_i T_i$ away from $T_{sub}$ as indicated by the divergent solid yellow arrow.
(6) At region 6, the Dominant 7th is resolved back to the Tonic with a tension release unique to V7-tonic cadences that is so immense that it has long been established as the de facto cadence for the end of musical passages [42]. This immense perceptual release of tension, too, is identifiable in the subharmonic plot. From the figure, it may be seen that the common subharmonic, $T_{sub}$, of C#m (located at the period of c#1 this time, because of the g#2 in purple) lies right in the middle of two common subharmonics of G#7 (located at the periods g#1 and g#0). This unique subharmonic behavior quite possibly allows our ears to identify with both $n_i T_i$ for the preceding $\Delta t$, making $\widehat{\Delta t}_p$ significantly larger than its $\widehat{\Delta t}_s$. Its staggering convergence produces an immense sense of tension resolution with this transition.
(7) A final landmark that is interesting to note is at region 7, where the triad in the treble flips from the 1st inversion to the 2nd inversion while the chord remains unchanged. Notice that this brings about no change to either $T_{sub}$ or $\widehat{\Delta t}$, while $\widehat{\Delta\Delta t} = 0$. This, again, shows how subharmonic analysis agrees with music theory where, despite the change of notes, harmony remains the same at this point.
In this section, we have seen how, even in the context of transitional harmony, perceptual tensions and resolutions in a song may be visualized in its subharmonic modulation. We will move on to see how well numerical values computed with such modulations verify against listening tests and chord use statistics.

Experiment and Results
For both stationary and transitional harmony, tensions computed from our models show strong correlations with consonance rankings and historical chord use statistics. Table 1 tabulates a summary of the results of our experiment. We will explain each of these results in detail in the following subsections.

Stationary Harmony.
For stationary harmony, we take the overall tension of a chord to be a simple weighted sum of $\Delta f$ and $\Delta t$:

$$\Delta f|\Delta t = w_i\,\Delta f + w_s\,\Delta t, \qquad (25)$$

where $\Delta f|\Delta t$ is the overall tension, $\Delta f$ and $\Delta t$ are taken to represent the tensions contributed by interharmonic and subharmonic modulations, respectively (normalized by linearly scaling to fit between 0 and 1), and $w_i$ and $w_s$ are their respective weights, or summing coefficients, where $w_i + w_s = 1$ and values of 0.61 and 0.39 are found to provide a good distribution.
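The weighted sum in (25) can be sketched in a few lines of Python. The function names are hypothetical, min-max scaling is assumed as the reading of "linearly scaling to fit between 0 and 1", and the assignment of 0.61 to the interharmonic term and 0.39 to the subharmonic term is assumed from the order in which the values are quoted.

```python
def scale_unit(values):
    """Linearly rescale values to span 0..1 (all zeros if they are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def overall_tension(delta_f, delta_t, w_i=0.61, w_s=0.39):
    """Weighted sum of normalized interharmonic and subharmonic tensions per chord.

    Assumption: w_i weights the interharmonic term and w_s the subharmonic term.
    """
    f_norm = scale_unit(delta_f)
    t_norm = scale_unit(delta_t)
    return [w_i * f + w_s * t for f, t in zip(f_norm, t_norm)]


# Made-up tension values for three chords, for illustration only.
print(overall_tension([0.0, 5.0, 10.0], [2.0, 2.0, 4.0]))
```

Because both inputs are rescaled across the whole set of chords, the result for one chord depends on which other chords are included in the comparison.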
We use a simple estimate of $\Delta f$ based on a tally of interharmonic modulations (given by (10)). For $\Delta t$, we use $(\widetilde{\Delta t})^2$, where $\widetilde{\Delta t}$ is given by (21), with a preemphasis of 2.1 across a range of 5. (A preemphasis of just over 2 provided sufficient discrimination without driving data into saturation. A broad range of values is suitable, but we settled on a smaller value of 5 for computational simplicity.)

Numerous previous authors have performed notable work on stationary harmony both within and outside the psychophysical context [8, 12, 13, 18, 21-25, 43, 53, 62-64]. For dyads (intervals, or two-note chords) and triads (three-note chords), we use the precollated information in Tables 2-5 of Stolzenburg [43] for comparison. Dyads are compared against an average across 7 notable studies collated by Schwartz et al. [54] on a ranking of 12 chords. Stolzenburg adds the unison, which he reasonably assumes to be the most consonant, to Schwartz's list; hence, we have included it as well. Triads are compared to results from an experiment by Johnson-Laird, Kang, and Leong [13] as cited in Stolzenburg [43]. For consistency with Stolzenburg's statistics in the comparison, these were first converted to ordinal rankings before computing the correlation, as practised by Stolzenburg [43]. Table 2 lists our correlations for dyads and triads in stationary harmony against known relevant work as taken from Stolzenburg [43]. A detailed tabulation of all available values for each chord is provided in the appendix.
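The step of converting values to ordinal rankings before computing a correlation can be sketched as follows. This is a generic rank-correlation sketch, not the exact procedure of [43]: the function names are hypothetical, ties are assumed to receive the average of their ranks, and the sample numbers are made up.

```python
import math


def ordinal_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rank_correlation(model_values, empirical_ranking):
    """Correlate the ordinal ranks of model values with an empirical ranking."""
    return pearson(ordinal_ranks(model_values), ordinal_ranks(empirical_ranking))


# Made-up model tensions and a made-up empirical ranking for four chords.
print(rank_correlation([0.1, 0.4, 0.2, 0.9], [1, 3, 2, 4]))
```

Ranking both series first means that only the ordering of the computed tensions matters, not their scale, which is why the comparison is insensitive to any monotonic rescaling of the model output.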

Transitional Harmony.
For transitional harmony, $\Delta\Delta t$ from (22) is suitable for hand computation across individual locations of succeeding common subharmonics, $\Delta t_s$, across the soundscape. While this is advantageous for visualizing individual complications and resolutions at multiple locations across the tensional soundscape, it requires manual identification of a modal $\Delta t_s$ for every transition, which can be ambiguous for particularly discordant harmonies. For a consistent programmatic approach with larger datasets, we take a measure of the overall $\Delta\Delta t$ of a transition, $\widetilde{\Delta\Delta t}$, which is representative of the overall tension resolved. In its definition, $\widehat{\Delta\Delta t}_j$, $\Delta t_{s,j}$, and $T_{sub,j}$ refer to individual candidates of $\widehat{\Delta\Delta t}$, $\Delta t_s$, and $T_{sub}$, respectively; $j$ iterates through all relevant common subharmonics of the succeeding chord within the range of nodes considered; $\Delta T_{sub}$ denotes the distance between two adjacent $T_{sub,j}$; the summation runs over all interior nodes wherever $\Delta t_{s,j}$ is less than half the distance between the adjacent $T_{sub,j}$ on either side, that is, $\Delta t_{s,j} < \frac{1}{2}\Delta T_{sub,j}$; $N$ is the number of nodes summed; and the preemphasis is as explained with (21).
This effectively computes the preemphasized, weighted, and compensated mean $\Delta\Delta t$ across all eligible common subharmonics within the chosen range for a given transition. $T_{sub}$ weights down larger subharmonics, which are less significant according to the theory. (It is a reciprocal as opposed to (21) because greater pleasure is associated with larger tension released.) $\Delta t_{s,j}$ compensates for the fact that, apart from tension resolution alone, stationary consonance also affects one's preference for the succeeding chord. The condition $\Delta t_{s,j} < \frac{1}{2}\Delta T_{sub,j}$ effectively sets the criterion for a node to be considered a common subharmonic. In our experiments, we set the range of nodes considered to 9. (A broad range of values will work, but we choose a smaller value for computational simplicity. Larger values may be required with larger range or dataset size.) In consideration of divergent transitions in the dataset, we set the preemphasis to 1 (no preemphasis) because divergent transitions have negative $\Delta\Delta t$, which can be distorted by preemphasis.
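The eligibility criterion described above, under which a node counts as a common subharmonic only when its tension is less than half the spacing to its neighbors, can be sketched as a simple filter. The function name is hypothetical, and the reading that the condition must hold against the nearer of the two neighbors, with the first and last nodes excluded, is an assumption made for this example. The sketch covers only the selection of nodes, not the weighted mean itself.

```python
def eligible_nodes(t_sub, delta_t_s):
    """Return indices of interior nodes that qualify as common subharmonics.

    t_sub: candidate common-subharmonic periods, in ascending order.
    delta_t_s: tension of the succeeding chord at each candidate.
    Assumption: a node qualifies when its tension is below half the gap
    to the nearer of its two neighbors.
    """
    keep = []
    for j in range(1, len(t_sub) - 1):
        gap_left = t_sub[j] - t_sub[j - 1]
        gap_right = t_sub[j + 1] - t_sub[j]
        if delta_t_s[j] < 0.5 * min(gap_left, gap_right):
            keep.append(j)
    return keep


# Made-up candidate periods and tensions, for illustration only.
print(eligible_nodes([1.0, 2.0, 3.0, 4.0], [0.1, 0.4, 0.6, 0.1]))
```

With the sample values, only the node at index 1 passes, because the tension at index 2 exceeds half of the unit spacing between candidates.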
Table 2 (partial; rows as they survive in this copy). Correlations for dyads and triads, with the parenthesized value that accompanies each correlation.

Measure                                  Dyads             Triads
… [43]                                   0.982 (0.0000)    0.831 (0.0002)
Rel. Periodicity Just [43]               0.982 (0.0000)    0.846 (0.0001)
Log Periodicity Rational [43]            0.936 (0.0000)    0.813 (0.0004)
Rel. Periodicity Rational [43]           0.936 (0.0000)    0.808 (0.0004)
Rel. Periodicity Pythagorean [43]        0.817 (0.0003)    -
Rel. Periodicity Kirnberger III [43]     0.796 (0.0006)    -
Ω measure [62]                           0…

Table notes: ‡ as cited in [43]; $ dyads from [64] and triads from Hofmann-Engl, 2004, both as cited in [43]; ‖ Brefeld, 2005, as cited in [43]; ¶ dyads from [23] and triads from Hutchinson & Knopoff, 1979, both as cited in [43]; # Euler, 1739, as cited in [43].

With transitional harmony, conducting an accurate listening test is less straightforward. Rather than attempting to acquire a small number of fresh, unproven opinions, it is reasonable to use statistics from a large number of well-esteemed premade decisions. A simple way to measure how well numerical values of subharmonic transition agree with the music theorists' school is to compare them with statistics of an expert music theorist's chord use. Capturing chord-use statistics from a music score is, however, a labor-intensive process requiring domain expertise [46,47,74]. Details such as melody-harmony discrimination, transition onset, and root ambiguity (e.g., Dm7/F versus F6) are often not precisely defined in a song. We find the largest relevant data readily available that also meets chord-spelling precision requirements in Tymoczko's Study on the Origins of Harmonic Tonality [45]. In this study, Tymoczko interpreted and recorded the statistics of 11,000 chord transitions from Palestrina's [75] corpus. Palestrina was highly regarded for his style of harmony by Helmholtz himself [76]. He is widely considered amongst music theorists to be the pinnacle of contrapuntal harmony [77]. Table 3 lists $\widetilde{\Delta\Delta t}$ against frequencies of occurrence for each of the 17 most frequently used chords that follow V, as read off Tymoczko's [45] chord tendency histogram.
C, D, X↑, and X↓ indicate the convergence type of the progression. Just intonation was used, as opposed to equal temperament, to be consistent with Palestrina.
Their correlations are listed in Table 4. $\widetilde{\Delta\Delta t}$ shows a strong positive correlation of 0.903 with Palestrina's chord tendencies in general. It is close to perfect at 0.996 for resolutions since the programmatic version of the model was designed with resolutions in mind. Complications may be interpreted as the negative release of tension. Even though a large number of contributing $\widehat{\Delta\Delta t}_j$ are negative, only one negative $\widetilde{\Delta\Delta t}$ can be seen in the table due to the influence of nonnegative candidates. Nevertheless, $\widetilde{\Delta\Delta t}$ shows a strong negative correlation of -0.761 with [45] for complications (agreeing with the fact that this resolution is negative). As explained earlier, with excursions the perception of a succeeding chord is also influenced by the rising or falling of parallel melodies. Unfortunately, descending excursions were insufficiently popular in Palestrina and only V-IV was tallied. For escalating excursions, however, we have enough statistics to compute a correlation of 0.863. We have also computed the correlation across all other chords separately from complications (because, as explained, they correlate negatively) to be 0.970.

Discussion
Addressing the Fundamental Questions of Psychoacoustic Harmony. At this point, let us address the fundamental questions of psychoacoustic harmony as promised at the start of this paper in the context of subharmonic modulations. We will begin with question 2 and leave the first question for the last.
(2) We discussed the definition and explanation of stationary harmony, i.e., what sounds good and why, or, mathematically, to quantify $\{X_n\}$, where $\{X\}$ denotes the harmonious effect of $X$ and $X_n$ represents chord $n$. With large subharmonic tension being perceived as dissonance while small subharmonic modulations are perceived as consonance, the aesthetics of a chord may be visualized in the subharmonic tension acting on its shortest common subharmonics. Mathematically, they are inversely related. As described by (19), $\{X\} \propto 1/\Delta t_X$.
(3) We have the definition and explanation of transitional harmony, i.e., what sounds good, why, and when, or, mathematically, to quantify $\{X_1 \to X_2\}$, where '$\to$' denotes transition from one chord to another.
The aesthetics of a chord transition may be visualized in the release of subharmonic tension at the shortest common subharmonics of the succeeding chord. As explained in (22) and indicated by the arrows in Figure 12, this refers to the transition to the shortest common subharmonics of the succeeding chord from the nearest subharmonics of the preceding chord. Thus, resolution (tension release) in a chord transition is perceived in the convergence of $\widehat{\Delta t}$ (where $\widehat{\Delta\Delta t} > 0$) while what we call complication (build-up of tension or negative resolution) is seen in its divergence (where $\widehat{\Delta\Delta t} < 0$). Mathematically, as described by (24), $\{X_1 \to X_2\} \propto \Delta\Delta t_{X_1 \to X_2}$. We have the following phenomena.
(a) A chord that sounds better than another out of context can sound worse than it in context [42]. Given $\{X_2\} > \{X_3\}$, this shows that the chord with the larger stationary effect does not necessarily produce the larger transitional effect. The section on subharmonic modulations differentiates between stationary tension and transitional tension.
(5) We have the phenomenon that the transition from a low-tension chord to a high-tension one can still bring about the effect of tension release (resolution). Given $\{X_1\} < \{X_2\}$, this shows that $\{X_1 \to X_2\} > 0$.

Conclusion
In this paper the notion of interharmonic and subharmonic modulations was proposed as a psychophysical basis for both stationary and transitional harmony.
In the domain of stationary harmony (tension in chord sonorities), this work presents subharmonic modulations as an integral complement to interharmonic modulations and shows how perceptual tensions [18,36,58,59] and consonances [17,19,44] may be visualized through them.
In the domain of transitional harmony (resolution in chord progression), it unlocks the means of physically identifying, quantifying, and, thus, verifying perceptual resolutions and complications [18,42] in acoustic features that have until now remained abstract and intangible.
Computed values correlate strongly with perception and harmony-use statistics for both stationary (tension) and transitional (resolution) harmony.
Finally, this paper presented a psychoacoustic solution to the five fundamental questions of harmony.

Conflicts of Interest
The authors declare no conflicts of financial interest.