Sound-source identity preservation in a stuffed-to-surfeit composition’s mix

As my first blog aptly demonstrated, it is extremely easy to get caught up in the sheer complexity of trying to come to grips with this new technological ecosystem we computer musicians find ourselves in.

One point I’d like to make clear up front: I realize that most of my fellow experimental composers take a small collection of tools as their palette in painting what P.K. calls “holographic” audio. I think this is a sensible approach, and a valid one, and one that dovetails nicely with their aesthetics.

I, myself, taking as a jumping off point something few have considered attempting, have a different set of problems to overcome. I firmly believe that “more is more” when it comes to having an embarrassment of riches, an overabundance of options.

To put my aesthetic into a graspable form, to make it more explicit - since audio itself does not tell you what its creator is attempting - and since words are fairly impotent at pointing directly with our visually-oriented language, let me toss out a few visual metaphors:

* the self-transforming machine entelechies of Terence McKenna’s travelogues of the DMT dimension
* the sensory overload of psilocybin-induced technologically-themed “download to consciousness” visions in the throes of a “heroic dose”
* the embarrassment of riches of godlings, avatars and multiplicity of aspects of Brahman Atman in a tapestry of the Vedic cosmology
* the sensorium-blast of popular movies’ conception of Virtual Reality (all of which seem to come from the same wellspring despite VR not actually existing yet): “Lawnmower Man,” “Johnny Mnemonic” (which of course is based on Gibson’s story), “The Cell,” “The Matrix”
* the blooming, buzzing confusion of Blake’s child
* the traceries of a Cordoban book cover or the Alhambra’s purulent, twining calligraphy

Not that I have these things in mind when composing. I simply recognize a common archetypal root. To impose the rococo upon the baroque, to gild the lily and then add a fiber optic light show to that. Sharks with friggin’ laser beams on their heads.

These choices of “depth of detail” in non-repeating variations are not arbitrary. Once one is committed to one sound-source which is peculiar to the digital techniques that we have access to now, and for which there is often no habituated or genre-based normative usage, it behooves one to carefully balance the other sounds “in the mix.” For let’s not kid ourselves: no one is going to have ears forgiving of these sounds the way they would of pre-conditioned auditory events like a cassette recording.

More than anything it leads me to making bizarre proof-of-concept compositions. Which is not to say that I don’t truly enjoy my KVR-type music. But I have a lot more on my “to-do list” and a 400 song backlog from the past plus new ideas all the time. I can’t seem to “commit” to a genre; at least with experimental digital composition it’s all so new that it somehow all fits together as a group. It’s not like if I made a calypso song and a 7/8 locrian blues song and then had to ponder whether they’d fit into the same performance or album.

Also, it’s very freeing to make such “avant” songs as 1) I am making them to listen to myself and 2) there’s no pre-conditioning in the listener. Point 2 has its good and bad points. The good is that you don’t have to deal with possible expectations in a genre-form that you disagree with. I’m not a purist by any means but it seems that our contemporary culture’s way of achieving “natural selection” amongst variations and elaborations on a theme rewards the least common denominator far more often than not. Well, one need only look at hip-hop’s current state, or electronica’s manifestations that actually cross popular consciousness etc. etc.

The drawback of not having a pre-conditioned tradition in the ear of the listener is, of course, that without a grounding in familiarity most people aren’t going to bother allowing the song to play in their presence in the first place let alone give it attention enough to decide whether they are intrigued by what they hear.

Which is, of course, why so many types of music find a loving home mostly amongst connoisseurs. If you know enough about the style of music, or the instrument, or whatever, you’re much more likely to be drawn in by the shock factor of an artist showing off what’s possible.

The reason that I use “naive” melodies and rhythms in this kind of music is so as not to distract the listener from the timbral morphing or the depth of field trickery and movement or whatever happens to be the non-standard compositional raison d’être of the song. I’ve spent a ludicrous amount of time testing - sadly, only my own - how human perception ends up focusing its attention during all the goings-on in my music, through various types of transformations. I recall dropping a track with the heading “foreground/background swivel” or something like that… the idea being that different elements, as they underwent a wide variety of transformations (pitch, including Doppler or filtering; loudness; time dilation - speeding and slowing; 3D movement, whether it be near/far or, in one case, falling into “orbit” at various speeds around the listener; &c.), would capture the attention of the listener at different times and in different ways.

Of course, the overall complexity of what kind of change would draw one’s attention was compounded by the fact that a large variety of different sounds were all in the same stereo field competing to be the one that “stood out.” As we all know (even if we’ve never consciously put it into words as such), loudness matters the most, followed by anything whose frequency range falls into the “sweet spot” that the length of our ear canal resonates at - the frequency of human voices - and then it gets really complicated really fast. Startling introductions of new sounds (or re-introduced intermittent sounds) work well because we’re wired to react quickly to being ambushed by something that’s likely to eat us.

After that… well, I’m not sure how to put anything else I discovered into words. But the attempt was a success. Although I’m the one who created this particular soundscape I find it fascinating to listen and observe how my attention flickers amongst the happenings. Sometimes the density of material crescendos and suddenly several foreground sounds fall away and you’re left involuntarily attending to the tapestry of background noises that’d been “edited out” by your ear while foreground sounds had demanded your focus. It’s almost as though it were crafted to train the mind to catch itself in the act of paying attention.

Anyone reading this is likely to be well advanced into the stage where it’s actually hard to turn off their critical listening faculties. That’s the curse of we who spend a lot of time mixing audio. At least we’re mixing our own; I’ve read about many studio engineers who can’t listen to music recreationally after decades of service to other people’s muses. But what of a normal listener? For the most part when people mix music a lot of effort goes into hiding the process of how one’s attention is directed here and there.

For the “more is more” composing approach one thing that immediately became an issue is that we still only use two speakers and all of the sounds have to deploy in space from them. When I first started out making computer music I knew nothing about mixing, had no decent equipment and, worst of all, no internet access whereby I could conceivably learn such things. The FL manual was no help. It’s fairly well organized but its job is not to explain the jargon of recording, DSP and computer audio in general. So my first forays into mercilessly packing the stereo field had, shall we say, a generous amount of amateur mistakes. But unlike in previous eras my amateurish mistakes were backed up by digital audio editing and a lot of effects processing power. Luckily for my mixes I had but a 1 GHz Pentium III processor so simply changing the volume on my 10-minute-long magnum opus could take 20 minutes of watching a blue progress bar slowly inching forward before I even got to hear if the new volume setting was correct.

About the only useful trick I learned from my first batch of freeware was abusing the hell out of multiband compression. If some of my cherished sounds were “hidden” in the mix simply smashing them with compression until they’re all equally audible (and equally lacking in dynamics) was the only solution I had available. I’m hardly unique in that. But I knew there had to be a better way.

As it turns out the best tricks I’ve found haven’t been used to the best of my knowledge, at least not as a conscious way around the limitations of our two-speaker stereo systems. I’m sure I’ve mentioned somewhere around here my use of 3D binaural panning and how, perceptually, it makes our incredibly amazing hearing system do the work for me. E.g. if you have a kick drum and a bassline if you pan the bass “back” away from the listener (and narrow the stereo field in doing so) and “downwards” the filtering that comprises the binaural effect to give your ears “directional cues” actually makes those two sounds, which share the same frequency range, more distinct. You can tell them apart easily when they’re positioned in three dimensions away from one another. This example is rather remarkable as binaural filtering takes place far above the bass frequency range, so you’re really getting these “directional cues” from the treble-range harmonics and attack transients and the like.

This is useful as far as it goes, and I’m sure I’ll get into it more elsewhere, especially given that sidechain compression, stereo width, traditional panning and the like can also contribute to differentiating two sound sources that share the same frequency range.

But there’s another overriding factor in the “more is more” approach. When that transformational filigree involves change over time, or morphing, or what have you, as the vast majority of my favorite effect processing does, it’s all too easy for the listener, especially one not already steeped in the esoteric DSP arts as we are, to “lose track” of the supposedly distinct threads of auditory events…

What do I mean by this? Well, take a more “minimal” approach to timbral morphing, say, a filtered bassline in an electronica song. Although the sound changes over time the listener is likely to realize that it comprises “one intentional sound source” because, although the filter cutoff may change suddenly, the resulting sound is still obviously a bass and it’s probably pretty repetitive, aiding the process of maintaining its identity over time due to the melodic content (which itself may be rather static).

Or let’s take the famous “quacking banjo” morph. If it starts out as a duck’s quack at point A and morphs to a banjo at point B, it has to occur slowly enough, with in-between states interpolated between the two sounds, for the listener to even realize the fun audio trick being played. In more minimal approaches to composition you’ll tend to hear quite slow transformations… sometimes even too slow, such that the listener isn’t paying attention closely enough to realize that the interpolated states are related to the intended beginning and end points!

So when packing a mix full of a number of wildly transforming sounds a lot more thought has to go into maintaining the integrity of the “identity” of the streams of sounds if that’s required by the composition. Obviously, there are plenty of examples of disregarding the “identity” of a piece’s used noises and just as many where only the creator knows of the original sound source, such as using pictures as oscillators or most granular synthesis. Those are of interest to me, too, but simply not germane to what I’m trying to explain. It’s an issue of the purpose of one’s use of transformative processing. The point is not that those hearing your sounds can identify the sound per se, but that if a changing sound source is to be regarded as one event-over-time despite significant alterations to its time domain, timbre, or what have you, then one must consider how it comes across to the beholder in the context of the mix which may also have many similarly drastically changing “sound sources.”

To illustrate that I’ll give you an example that utilizes no processing. Consider a clarinet. Most anyone who hears your recording of a clarinet is likely to recognize it (from its envelope and its timbre), even if they don’t specifically recognize it as a clarinet (don’t laugh, I’ve seen many, many people unable to do so, guessing as far afield as a “flute” and “some horn” or the like). Not everyone that reads this may be able to differentiate a clarinet from an oboe, for that matter. But in most cases a listener will recognize the sound for what it is. But what of a shrill squeak from a beginner? Well, it’s not all that obvious that a clarinet is causing that sound unless 1) there’s nothing else happening in the “audio timeline” to have caused the sound or 2) the listener has already encountered someone trying to play a woodwind for the first time. How about a purposefully-played overtone/harmonic? Well, the same is true as above except that, in the hands of a musician capable of executing this extended technique, its “identity over time” has a couple factors in its favor. It’s even more likely, due to rhythmic continuity, to be parsed as a purposefully-played part of what’s preceded in the song and their “breath technique” is likely to make the horn’s envelope even more instantly recognizable than a beginner’s mistake.

To take the example further, consider the extended technique of multiphonics. Depending on what “broken chord” is being played the sound ranges from something fairly obviously harmonized-woodwind-esque (albeit atonally) to something whose very “density” would seem to divest it of woodwind-ness, as clarinets’ normal sound is always monophonic and rather non-dense, not varying that much from a sine wave like the human voice or a flute. Is the average listener, who has possibly never even heard a woodwind player executing extended technique, likely to have any idea of what they’re hearing? Well, depending upon the context it might not matter. I’m not dealing strictly with a sound being “identifiable” as a specific, known instrument or sonic event but with its being regarded as being caused intentionally from the same sound-generating source regardless of what that is or whether one can give a name to it.

In this example I’m pretty sure that any listener would eventually figure it out. Even if the sound is startlingly different the player would likely repeat a similar technique more than once, or the mind would pick up on the mouth causing tremolo or vibrato in a recognizably woodwind-like way, or there simply wouldn’t be any other sonic material in the recording to confuse the issue. I can think of counterexamples I’ve heard, but I think I’m homing in on a description that conveys what I’m sketching out here.

When I look at my sound palette I see that I use both sounds that we haven’t got a name for and processes over time that can potentially obscure what is in reality a single source of sound (be it an instrument, a field recording, a voice, or whatever) and thus, if it’s compositionally important, ruin the whole point of morphing.

Consider some of the specifics: For the last several years I’ve used almost exclusively synthesis to generate sound, only rarely samples and those mostly for drums that’re processed beyond recognition anyways (though real drums are laudably capable of retaining their distinct attack and other envelope characteristics better than synth drums for my purposes). Synthesizers already are likely to create an “unnamed” sound in a recording. But I also love to change them in myriad ways over time with parameter automation. This immediately brings up the possibility of changing the sound significantly enough over time that it’s not grokked as “the same instrument.” Obviously this barely matters if the melodic content draws the listener’s attention through the timeline of change anyways. For synthesizers we typically have to take into account change to waveform type, waveform mix (in the case of multiple oscillators), waveform modulation (frequency modulation, ring modulation, phase distortion, even pulse width modulation or even wavecycle offset of supersaw effects), timbre (in the case of additive and some other types of synthesis), filter type, frequency and resonance, and most especially the ADSR envelope. Changing the envelope of anything obscures its “identity” much more drastically than any other parameter I can think of, especially the attack. This is just a short and simple list - there’s a great deal of variation amongst synthesizers that’d have to be considered on a case-by-case basis. For instance, I’m quite fond of automating the modulation in a modulation matrix, which dramatically increases the complexity of these considerations.

As a special case, because I use them so often, I’m going to mention physical model synthesizers. Given that one has access to the waveguide parameters that create a distinct “model” one can quite easily transform a PM sound source beyond recognition with tiny little parameter changes. I’ve played with the “identifiableness” of PM tuned plates, like metallophones, in any number of ways: subtly changing over time, allowing them to mutate beyond recognition suddenly and return, and even setting their parameters so it’s never obvious that they’re tuned plates in the first place. I love the fact that PM sounds have a visceral “there-ness” to them regardless of whether they’re emulating something that could actually exist or not.

Then there’s processing those possibly-already transforming synthesizer sounds with automated or modulated DSP. The number of effects boggles the mind: delay (which, when automated, creates often lovely “interpolation” effects beyond simple echo, resulting in a sound like the speeding up or slowing down of a tape recording which I shall hereafter refer to as “dub echo scrubbing”), flanger/chorus/phaser (which are all modulated micro-delays of one form or another but, given the right kind of modulation, can achieve a wide variety of effects, many of which can easily obscure the affected sound’s “identity”), comb filtering and other “resonators,” distortion (already fairly likely to mask a sound’s identity), panning (which includes autopanning and 3D binaural stereo placement/movement), vibrato, tremolo, bitcrushing and decimation, dopplering, filters (a wide variety typically with resonance, and including envelope wah effects), formant effects like the talkbox (vocoding isn’t really an effect since the “sound generation” is typically a synthesizer, but there’s also the “morph” effect of side-chaining two sounds through a vocoder but that typically refers to a type of spectral or convolution effect), gates (including rhythmic gating and the “swell” effect of a slow volume envelope), pitch- and formant-shifting (including interesting variations such as the auto-arpeggiator and harmonizers), reverb, ring modulation, (takes a deep breath)

Plus a whole list of even more severe or esoteric processes. There’s a variety of spectral effects such as warped filters (a la mutagene’s VSTs), spectral accumulation, spectral exaggeration, side-chaining to force the spectral overtone series of one sound onto another’s, spectral vocoding/morphing, resynthesis (which, like vocoding, is a bit beyond mere effect processing), a wide variety of “buffer override” effects which, though sharing a root DSP process don’t really constitute one “effect” but includes the stutter effect, time stretching in order to accentuate the algorithm’s artifacts, dub echo scrubbing (such as moving the tape across the read head like dub reggae or the “turntable scratching” effect which is entirely possible with realtime audio processing, or the “tapestop” effect, or the “playing backwards” effect), waveshaping (which like feedback delay networks can either be used with essentially random settings or set with a particular algorithm which can cause a wide variety of specific effects), convolution (which can range from reverbs to hardware emulation to purely abstract effects using waves not even meant to be impulses), and finally granulization which comes in many flavors that don’t necessarily sound anything alike any more so than chorus, phaser and slapback are all forms of delay effects.

Another layer of complexity is the automation and/or modulation you utilize. One popular form is parameters linked to a step sequencer. This aids sound-source identity over time due to our minds interpreting the periodicity of the rhythm as continuity regardless of how drastic a timbral change is. This is true even with non-integer ratios of rhythm: I love to automate the “clock” of step sequencers for “dilating” rhythms, and bouncing ball rhythms are popular, too. There’s a certain range of LFO frequency that we find doesn’t obscure our perception of “identity,” usually a different rate for each kind of effect, I suppose having something to do with how drastic a change the effect causes. I wonder if there’s some sort of formula for each type of process based on how we perceive, much as the “thickening” of a chorus sound happens around 20ms whereas we hear distinct echoes from a delay somewhere around 30ms and higher? How much timbral obscurement can our ability to identify a sound source tolerate? How fast and by how much can a vibrato fluctuate a sound source’s pitch before we re-categorize it as a newly-occurring sound? Or, for my purposes, how quickly can one transition from unaffected to wildly-affected sound without losing that sense of “identity?” Or how much can one rely on repetition, melodic continuity or rhythmicality in an effect to preserve that identity?

I don’t know that there are answers to such analytical questions. For my purposes, again, it probably doesn’t matter and will have to be dealt with on a case-by-case basis due to my insistence of putting several such transforming sounds into a soundscape simultaneously.
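For readers who prefer to poke at such questions numerically, here is a minimal sketch of the flanger/chorus family described earlier as modulated micro-delays: an LFO sweeps the delay time around a base value. The function name, its parameters, and the defaults are illustrative assumptions rather than any plugin's API, and the 20 ms default merely echoes the rough chorus figure mentioned above.

```python
import math


def modulated_delay(samples, sample_rate, base_ms=20.0, depth_ms=5.0,
                    lfo_hz=0.5, mix=0.5):
    """Mix a signal with a copy of itself whose delay time is swept by a sine LFO.

    Assumes base_ms >= depth_ms so the delay never becomes negative.
    """
    out = []
    for n, dry in enumerate(samples):
        lfo = math.sin(2.0 * math.pi * lfo_hz * n / sample_rate)
        # Current delay, converted from milliseconds to (fractional) samples.
        delay = (base_ms + depth_ms * lfo) * sample_rate / 1000.0
        pos = n - delay
        if pos < 0.0:
            wet = 0.0  # nothing has reached the delay output yet
        else:
            i = int(pos)
            frac = pos - i
            nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
            # Linear interpolation between the two neighbouring samples.
            wet = (1.0 - frac) * samples[i] + frac * nxt
        out.append((1.0 - mix) * dry + mix * wet)
    return out
```

Sweeping `base_ms` from a few milliseconds up past 30 on the same source is one quick way to hear where "thickening" turns into a distinct echo for that particular sound.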

I’d like to add to the foregoing, despite not fully fleshing out all types of modulation (with which you’re probably already familiar), audio-based modulation. In its simplest form you’ll encounter an envelope detector which then uses that data to modulate some parameter of an effect. The most famous is the auto-wah: the volume envelope of the incoming audio is used to modulate the cutoff of a filter, so that depending on how you play more or less wah effect occurs. To the best of my knowledge there are only three more types of audio input-dependent modulation: peak-detection (which is just a simpler version of envelope detection), zero-crossing detection (it counts the number of times the incoming waveform crosses the “zero” line, thus a simple sine causes little modulation whereas a chord causes more, and a “broken” chord of a woodwind multiphonic causes a great deal of very rapid modulation), and pitch-tracking (whereby the pitch is analyzed and sent as the modulation level).
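As a rough illustration of two of these detectors, the sketch below implements a one-pole envelope follower, a zero-crossing counter, and a linear envelope-to-cutoff mapping of the auto-wah kind. All names, time constants, and the 200 to 2000 Hz range are assumptions chosen for the example, not values taken from any particular effect.

```python
import math


def envelope_follower(samples, sample_rate, attack_ms=5.0, release_ms=50.0):
    """Rectify the signal, then smooth it with separate attack and release times."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Rise quickly when the input is louder than the envelope, fall slowly otherwise.
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    count = 0
    for a, b in zip(samples, samples[1:]):
        if (a < 0.0) != (b < 0.0):
            count += 1
    return count


def auto_wah_cutoff(env, low_hz=200.0, high_hz=2000.0):
    """Map an envelope value in 0..1 linearly onto a filter cutoff in Hz."""
    env = min(max(env, 0.0), 1.0)
    return low_hz + env * (high_hz - low_hz)
```

In use, the envelope would be computed per sample (or per block) and fed to whatever filter is doing the wah; the zero-crossing count would be taken over short windows and scaled to the modulation range.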

Each of these input-based modulation types can benefit from a modular DAW’s data manipulation. Give the modulation level an emphasis curve like you can to the velocity of a keyboard, reverse it entirely (so that, with a pitch-tracking modulation, lower notes cause faster modulation) and the like. You can easily dream up more complex ways of filtering, constraining and transforming the data which may be useful. Additionally, having a sidechain input go to the modulation source can be fun, for instance having a ring modulator on a bassline but have the pitch-tracking detector’s input be a trumpet track, so that the ringmod frequency always follows the trumpet melody instead of the bass’s frequency…
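The emphasis curve and the reversal can be sketched in a few lines, assuming the modulation level has already been normalized to the range 0..1. The function `shape_modulation` and its parameters are hypothetical names for illustration, and a power curve is only one of many possible emphasis shapes.

```python
def shape_modulation(value, curve=1.0, invert=False):
    """Reshape a normalized modulation level (0..1).

    curve > 1 pushes low values further down, curve < 1 lifts them,
    and invert=True flips the level before the curve is applied.
    """
    v = min(max(value, 0.0), 1.0)  # constrain to the expected range
    if invert:
        v = 1.0 - v
    return v ** curve
```

For example, `shape_modulation(pitch_level, invert=True)` would make low tracked notes produce the strongest modulation, as in the reversed pitch-tracking case above (`pitch_level` being whatever normalized value the tracker supplies).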

What I’d like to obtain (or, if I knew how to, I’d simply create myself) are other forms of modulation based on analyzing the incoming audio. How about the “average of the three most prominent harmonic overtones” as a modulation source? Or “variation of stereo phase” modulating a track’s 3D effect, making it tremble from “near” to “far”…?
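A very naive take on the first idea might look like the sketch below: take one short frame of audio, compute a magnitude spectrum with a plain DFT, pick the three strongest spectral peaks, and average their frequencies. This is only an assumption about what such a detector could do; it uses no windowing or peak interpolation, the DFT is far too slow for realtime use, and the function names and test frame are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. N/2 - 1 (slow, but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def mean_of_top_peaks_hz(frame, sample_rate, count=3):
    """Average frequency (Hz) of the `count` strongest spectral peaks in one frame."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    # A peak is a bin louder than the previous bin and at least as loud as the next;
    # bin 0 (DC) is skipped.
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    if not strongest:
        return 0.0
    return sum(k * sample_rate / n for k in strongest) / len(strongest)


# Hypothetical test frame: partials at 100, 200 and 300 Hz, sampled at 800 Hz.
SAMPLE_RATE = 800
frame = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    for t in range(80)
]
modulation_source_hz = mean_of_top_peaks_hz(frame, SAMPLE_RATE)
```

The resulting value would then be scaled into whatever range the target parameter expects, frame after frame.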

Having brought side-chaining into the discussion again let’s clamber up another level of complexity: all of these processes can be chained in the modern digital DAW. One has hardly learned some of the popular possibilities of the run of the mill effect processes! Now you’re expected to guess what one will sound like atop another? Or in a chain of eight sequentially? Well, why not, now that a near-infinitude of possibilities has suddenly been dropped into the laps of we who for whatever reason happened to get hipped to computer-based composition/recording/mixing/etc.?

I often hear the complaint that “there are too many options” - perhaps, for some, there are. I’d been pondering exactly this eventuality long before I had the slightest hope of getting my crazy mitts on all these new tools. I still recall how blown away I was by Cool Edit’s ability to slice and rearrange! It’s not as though we’re sitting before the modular synthesizer behemoths of over 30 years ago and having to heat the room up to inhuman temperatures while physically patching 1/4″ cables here and there and everywhere, and yet they did it. We should be thankful to be granted this digital paradise! I honestly can’t believe they managed to take some of those units on tour.

Well, that argument won’t be heard from my mouth anytime soon. If anything certain factors still work in our favor (re: my discussion of freeware and crowdsourcing). We don’t actually have to construct every synth and effect ourselves like with SuperCollider or Max/MSP. Nor do we have to learn the algorithms and the appropriate input values. It’s mostly pre-engineered instruments and effects like in previous eras, but now with exquisite control, modularity and instant recall of settings (not to mention composing the movement of those settings!).

So, has the idea of chaining effects sequentially settled in? Obviously, a great deal of practice is required to take full advantage of the giant list of effects I mentioned above. Why do we tend to call this practice “experimentation”… how is one to practice in a non-experimental way with audio processes anyways? It’s not as though words can convey what one should expect when you set DSP effects to certain settings most of the time and even if you could the result would still be awfully dependent on the audio source material. Likewise, even if each effect only did one distinct thing the number of possible ways to chain just four of them in a row is rather startlingly large. A guitar player will most likely know what a flanger put after a distortion will sound like. What is the sound of a spectral accumulation on a guitar that’s been Dopplered and then decimated? There’s really only one way to find out - until you’ve “experimented” sufficiently to know your tools, that is, and then it gets a little easier.
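To put a number on "startlingly large": counting only the order of the effects and ignoring every knob on them, the number of chains is a simple permutation count. The sketch below assumes, purely for illustration, a toolbox of 20 distinct effects; the function name is hypothetical.

```python
import math


def ordered_chains(num_effects, chain_length, allow_repeats=False):
    """Number of distinct ordered effect chains of a given length."""
    if allow_repeats:
        # The same effect may appear more than once in the chain.
        return num_effects ** chain_length
    # Each effect is used at most once; order matters.
    return math.perm(num_effects, chain_length)


chains_of_four = ordered_chains(20, 4)                       # 20 * 19 * 18 * 17
chains_of_four_with_repeats = ordered_chains(20, 4, allow_repeats=True)
```

With 20 effects that is 116,280 chains of four without repeats, or 160,000 if an effect may appear twice, before a single parameter has been touched.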

Eventually you can train your “inner ear” to have a rough estimate of what chains of eight effects processes are going to sound like, how to use such a massive amount of processing in a way that results in a useful and cogent sound, and then later use that knowledge so that most anything that passes through your imagination’s flights of fancy can be executed in reality with the vast array of available freeware. Most of the time…

When you get to this far flung point like I have the question becomes, how much can I get away with in one sound field? Contrary to what most people seem to think even more DSP will come to your rescue sometimes. At this level of complexity (transforming sound sources affected by chains of modulated or automated effects processes) you simply have to take it on a case-by-case basis, just like any other mix.

That is the reasoning behind some of my digital proof-of-concept compositions. There’s DSP being used for basic sound design, compositionally-mandated transformations-over-time, mixing clarity and that thing I’ve been talking about all along, the maintenance of each sound source’s audio stream as identifiably a part of the series of events despite all its changes.

Here’s a sample of things that’ve worked in various situations:

1] Rhythmicality to the modulation, as described above

2] Certain frequencies of modulation, as described above, depending upon the effect in question and what else is going on in the mix

3] Traditional panning because, obviously, if a sound stays resolutely hard-panned to the left it’s most likely going to be interpreted as having contiguity

4] Binaural 3D positioning, due to stereo image narrowing (which works with traditional panorama placement, too), the way our ears interpret binaural filtering lending itself to the maintenance of “sound-source identity” as well as simply moving the sound away from other, potentially conflicting sounds. It’s almost magical how this works but blame your audio sensory system, not me.

5] Rhythmic panning or certain speeds of panoramic movement and especially binaural 3D movement - it’s an art, not a science, but stereo movement can help maintain a sound’s identity regardless of what kinds of transformations it’s undergoing because against a static background our ears track a moving sound as arising from a constant source. One need only think of a rustle of leaves here, a second later a twig snaps over there to the right, and when you add a third unseen sound a bit further to the right you may have just identified something that’s going to eat your sorry ass. That there’s natural selection. It’s also why multichannel audio is used in such a conservative way: the picture’s only coming from a smallish area in front of you, but if you hear a startling sound behind you you’re going to involuntarily turn towards it, ruining the illusion of immersion in the video…

6] Speaker sims, ampsims, microphone simulators or the like: somehow the timbre imposed via a sound-source simulation like this helps maintain the sense of sound-source identity amazingly well even with wildly changing underlying overtone changes in whatever you’re running through the sim, plus like PM synthesis they just have a great sense of physicality and locality. Combined with stereo image narrowing and panoramic placement this is almost a sure-fire way to achieve maintenance of identity - possibly too much in the case of using the same sound source simulator to have different sound sources coming from it in the same song! Try it and see… I could also add to this list various other ways of imposing an overarching timbre to a sound source but there’s no good textual shortcut to naming any of the possibilities, just bear it in mind that the more complex the timbre character imposed the better it works. Bitcrushing/decimation and vinylization are two examples.

7] Resonators - this category can include a number of things. The traditional example is a sitar’s resonant strings. They’re called “drone” strings because, much like playing the open E string on a guitar while you play a melody on other strings, a sitar player often strums them as a drone but they’re actually resonators without being played at all. Whenever a note is played the “drone” strings resonate in sympathy with it and emphasize certain notes and harmonics because, obviously, all of the strings are connected to, and thus causing vibrations in, one instrument’s body. Sounds a bit like the previous example, no? There aren’t many resonator VSTs, sadly, but they work well for the same reason, as nothing else in your sound field is causing overtone resonance. Comb filters work great for this, too, but there aren’t too many VSTs that do that, either. Oddly, comb filter effects tend to have great modulation options whereas resonator VSTs tend to have no modulation and in fact the ones I have like to crash when automated. Various kinds of reverb can also have this effect, but that’s the next entry…

8] Reverberation. Anything processed in the same “reverberant” space tends to be interpreted, at the very least, as originating from within that space. Reverbs tend to fill up a stereo field with a lot of sound, though, so you’re equally likely to mask the identity of all of the sounds in your stereo field; precision and creativity are necessary. But you probably already know the challenges here - reverb can be the thing that perfects a mix or just muddles it in any engineering situation. Here I’ll only address special instances, such as putting a “tiny” space’s reverb on a sound that’s changing a great deal, preferably with stereo narrowing and panning of some sort. 3D binaural placement works wonders with this. Sometimes, with certain reverbs, you can even get away with panoramic movement, but more often this just sounds absurd to the human ear unless you’re trying to give the impression that an entire room is moving in relation to the static sound events in your mix! I’ve always used very specific VSTs for this, mostly AriesVerb for its odd and unique “spaces” such as the “shoebox,” small spherical reverbs or “harmonic resonance” reverbs… anything that doesn’t give the impression of an entire large space. Save that kind of reverb to “settle” the more static mix elements, such as drums. This is much like binaural filtering, as the “depth of field” placement differentiates a sound source from anything without reverberation and gives a sense of locality.

9] Doppler effects. As helpful as panoramic movement is for maintaining the sound-source identity of some transforming sound streams, it works even better with Dopplering movement. Of course, if you use too much Doppler pitch-shift you may alter your sound’s pitch content more than you care to, but for a subtle effect that helps a changing sound seem contiguous a little Dopplering goes a long way. This necessarily involves panoramic movement, and obviously you won’t always be willing to have movement in a composition.

10] Resonant filtering. Good old resonant filtering, depending on the speed of the modulation, can often achieve the desired effect.

11] Time-domain modulation. Echoes, choruses, flangers, etc. Echoes are dangerous because they quickly take up a lot of the “space” you have to work with. The ability of chorus to “thicken” a sound does a decent job here, but flangers and phasers do an even better job, especially if nothing else in the sound field has a modulated delay effect on it. For some reason negative-feedback phasers and flangers tend to work better for this job than positive-feedback ones, perhaps due to their “metallicity.”
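Since the comb filters from entry 7 and the feedback flangers from entry 11 are built on the same idea (a short delay fed back into itself), here is a minimal sketch of a feedback comb filter in plain Python. It is only an illustration under my own assumptions: the function names, the 48 kHz sample rate and the 1 ms delay are all made up for the example, the delay is fixed rather than modulated as it would be in a real flanger, and this is not how any particular VST is implemented. What it does show is that flipping the sign of the feedback moves the resonant peaks from every multiple of 1000 Hz to the odd multiples of 500 Hz, which may be one reason negative feedback sounds more hollow and metallic.

```python
import cmath
import math


def comb_feedback(signal, delay, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    out = []
    for n, x in enumerate(signal):
        # Before 'delay' samples have passed there is nothing to feed back.
        delayed = out[n - delay] if n >= delay else 0.0
        out.append(x + feedback * delayed)
    return out


def comb_gain(freq_hz, sample_rate, delay, feedback):
    """Magnitude of the filter's frequency response at freq_hz.

    |H(f)| = 1 / |1 - feedback * exp(-j * 2*pi*f * delay / sample_rate)|
    """
    w = 2.0 * math.pi * freq_hz / sample_rate
    return 1.0 / abs(1.0 - feedback * cmath.exp(-1j * w * delay))


SAMPLE_RATE = 48000  # assumed sample rate in Hz
DELAY = 48           # assumed delay in samples (1 ms at 48 kHz)

for fb in (0.7, -0.7):
    gains = {f: round(comb_gain(f, SAMPLE_RATE, DELAY, fb), 2)
             for f in (500, 1000, 1500, 2000)}
    print("feedback", fb, gains)
```

With positive feedback the boosted frequencies are 1000 Hz and 2000 Hz; with negative feedback they are 500 Hz and 1500 Hz, and the 1000 Hz and 2000 Hz bands are cut instead. Sweeping DELAY slowly over time would turn this into the core of a flanger.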

Having covered those bases, there are obviously nearly limitless other ways to accomplish this task, both with effects and more traditional mixing concepts, mixing and matching whatever works in a given situation. There’s no way to present a recipe; there’s not even any good way to put it all into words, evidently. I suppose I should buckle down and write a description of something I’ve actually recorded and explain the methodology sometime.

I hate to end it here. I’d love to present specific examples and abstruse solutions. I’d love to cover in detail how to use each of the available effect types, but then I’d want to elaborate on all the other mad tricks you can pull out. The important thing was to draw attention to this previously-undescribed aspect of the new era of digital music-making and to present some of the easier-to-describe ways of maintaining sound-source identity. Obviously we all have access to the same ways of looking at audio: rhythm, pitch, harmony and the suddenly much more necessary timbre, phase, panorama placement, modulation-over-time, etc. My hope is that my words can pare down the time between imagination and audible art for whoever is next in line learning all these miraculous tools and techniques. And, of course, the more we become aware of just how much is now possible, the more our imaginations can take flight into an ever-more vast array of possibilities and tear down any limitations to their realization in music.

Sound Transformations (by Leigh Landy)

rachmiel suggested that this excerpted text I posted in the forum (from the Composers Desktop Project documentation) might be an interesting addition to the blog. The original full article is available here:

http://www.composersdesktop.com/landyeam.htm

If this subject is of interest to you, at the end of the article I will also include links to some good texts by both this author and Trevor Wishart. Let me know what you think!

 

Sound Transformations in Electroacoustic Music

by Leigh Landy (5-10/91 - York)

The following text consists of an introduction to sound transformations in general, followed by a discussion of the aesthetics of sound transformations in electroacoustic music in Part II. Part III shows why a categorization system of sound transformations does not have to be based on a similar system categorizing all sound sources and/or timbres (colour, tone quality). In Part IV a number of categories for sound transformations is proposed and is accompanied by a couple of hints for future users.

During several months in 1991, I spent my sabbatical working on a long-postponed research project. Essentially I was interested in learning more about the highly undocumented world of sound transformations within electroacoustic music. (Most information in this area is of a specialized nature such as software manuals.) These transformations often turn out to be a powerful compositional device and are, according to many composers, a great challenge as well.

My specific interest was to investigate what has been called the ‘aesthetics of sound transformations’. I also wanted to create a classification system for sound transformations in the electroacoustic context, as again no one had treated this subject either. It is hoped that through this article a start may be made towards the better comprehension of the potential of sound transformations as a compositional device. Now that this first phase is completed, a second, collaborative phase of this project will commence, which calls for more technical, practical aspects of sound transformations to be tackled, including the creation of user-friendly graphic software for phase vocoding and a new user’s manual.
 

I – Background

A) What are sound transformations?

One of the many fascinating compositional possibilities among the musical extensions available in today’s electroacoustic music is the ability to create timbral development from one basic sound-texture to another. This can be achieved within the course of one ‘tone’ or within a musical gesture. An early relatively well-known example of this can be heard in Jonathan Harvey’s work Mortuos Plango, Vivos Voco (1980) in which the sound of a struck bell occasionally seems to melt into the singing voice of his son as if this were a normal timbral transition (CD – Wergo WER 2025-2). As will be shown below, this concept of timbral migration is by no means new. It is the extent of potential metamorphosis and the enormous increase of potential timbres that are of greatest interest here. Sound transformations may be defined as a timbral metamorphosis (i.e., from point ‘A’ seemingly naturally to point ‘B’) within one single sound event or sonorous gesture. In the latter case this can either take place within a sound continuum or by way of a discrete sound’s repetition being transformed into that of a second[, third, etc.] sound, again as ‘naturally’ as possible.

It should be mentioned immediately that there are several opinions concerning the borderline between a single sound’s morphology and a sound transformation. Think, for example, of the evolution of a sustained cello tone, growing from a somewhat dry timbre to a vibrato and so on. The difference can be found in the fact that, from the listener’s point of view, there should not be a perception of just one original sound source, the ‘A’, within any transformation. This implies that the subtle composer could in fact manipulate one sound source (therefore generating a morphology of that single source) and create what is presently defined as a sound transformation. Even if this line of separation is mildly vague, most consciously created sound transformations can in fact be heard as such.
 

B) Is this a new concept?

As mentioned above, it is the hypothesis of this text that sound transformation potential has grown enormously with the coming of electroacoustic music. Of course ‘grown’ is the key word here. Timbral metamorphosis is almost as old as music and knows a variety of forms within vocal and instrumental music historically as well as within contemporary styles. As a matter of fact, since music has rarely been composed in what might be called block-structures, one might conjecture that timbral transformation has always been one of many basic elements of orchestration. A small selection of timbral metamorphosis within non-electroacoustic music now follows.

The first example is not necessarily the earliest. In John Taverner’s Mass The Western Wind, non-notated gradual overlapping is said to take place between choral tuttis and solo voice passages so that a more dense choral texture seems to flow into or become a thinner solo texture and vice-versa. This sense of taking over parts can also be found in various instrumental works of Schütz and Gabrieli. Haydn was known to have an interest in instrumental exchanges as well. One of his favorite transformations is to have a pitch played by one instrument before a cadence and then have it seemingly magically taken over by another one immediately afterwards, thus changing the timbral texture entirely.

These metamorphoses become more sophisticated with time, especially during the last two centuries. Towards the end of Mahler’s ‘Abschied’ from Das Lied von der Erde there is some wonderful timbral interplay including brass sonorities being born of string textures. This might have inspired Berio, for example when he composed ‘O King’ (also part of his Sinfonia) which is a whole work centered on timbral metamorphosis. Not only does sophistication increase with time; the quantity of metamorphosis-users also expands. A historic climax can be found in the work of Ligeti, the uncrowned master of timbral metamorphosis. He has spent most of his career developing techniques within this area. His Cello Concerto and Requiem are but two excellent examples of his timbre transformational works. To conclude, the step from timbral metamorphosis in instrumental and vocal music to sound transformations in the electroacoustic genre was inevitable.
 

C) What is its particular relevance within electroacoustic music?

The tale has been told many times, but it deserves a quick mention here as well. The birth of electroacoustic music meant leaving the realm of ‘just’ notes and stepping into the realm of all sounds. This of course might be categorized as a case of historical necessity, just like the sound transformation story above, as music’s timbral expansion has largely characterized music of this century (especially music written after 1945).

Not only has the number of sound sources for music grown enormously, but the possible sound treatments have grown in equal proportion as our technology has evolved. If one studies the growth of timbral possibilities for the cello, for example – e.g. col legno and other indications found in Schoenberg scores – and a farther jump beginning at the time of the early Darmstadt years – e.g., all the timbral research Berio did in preparing his ‘Sequenze’ – one can see how timbral evolution has grown almost exponentially. Many ‘new’ sounds have been created on the front, back, top and bottom of the cello; furthermore innovative complex timbral combinations and sound morphologies have been created on the cello as well as on all our traditional instruments. It is this point of sound morphology which is now relevant.

Along with the known building blocks of electroacoustic music – composing with sounds, the search for new structures, new ways of listening (see IIB, IIC below) – researching a sound’s potential development in time (theme and variation?) can become equally as important to a composer as the prior choice of sound source materials. As mentioned above, this sort of sonorous development can take on the form of a sound morphology, similar to the evolution of a cello tone. It can also take on the form of a sound transformation.

What, then, given this introduction, is the relevance of sound transformations within electroacoustic music? These sounds can play a major role within a composition without falling into the category of special effects. Many commercially available works such as Trevor Wishart’s Red Bird (1973-77: an analog composition on the record YES-7) and Vox-5 (1986: a digital composition on the CDs – Wergo WER 2024-50 as well as Virgin Classics VC 7 91108-2 / 260 270-231) and John Chowning’s Phoné (1980-81: a digital work on the CD – Wergo WER 2012-50), just to name a few, illustrate this. The relevance of sound transformations can be found at the heart of what might be called their aesthetics, which can be a very individual question. The aesthetics of sound transformations is the subject of Part II.

 

II – Some words concerning the aesthetics of sound transformations
 

A) What is meant by the aesthetics of sound transformations?

This article is more concerned with why sound transformations are used in electroacoustic music than what they are made from and how they are made. Therefore this discussion is not one of the traditional aspect of aesthetics, ‘musical beauty’, but instead of what might be called ‘musical dramaturgy’ (see Landy: 33 and chapter 7). In this way this section focuses upon musical raisons d’être for the use of sound transformations. It should be stated here that, as this text is not so much ‘what’ and ‘how’ oriented, a clear choice has been made to discuss this issue from the listener’s point of view. We unfortunately have few documents (scores, print-outs) to deal with which could be used to discover how these sound transformations work and even less information concerning their perception within the musical context of a composition. The following paragraphs are based on personal experience, interviews with composers interested in sound transformations and more general texts on electroacoustic music that were found relevant to this subject.

To begin, in a recent article (in Paynter) by the composer/scholar Denis Smalley, the term, ‘transcontextuality’ (borrowed from L. Hutcheon) in electroacoustic music was introduced. Smalley writes,

‘Where the sounds taken from cultural activity or nature are used as recorded, or where transformation does not destroy the identity of the original context, the listener may become involved in a process of transcontextual interpretation. We should include any recorded sound event where we are simultaneously aware of two (or more) contexts. … In transcontexts the composer intends that the listener should be aware of the dual meanings of a source. The first meaning derives from the original, natural or cultural context of the event; the second meaning derives from the new, musical context created by the composer.’

Smalley believes in using concrete sounds as the basis for transcontextuality as he views their interpretation as being based on ‘a particular attitude towards the material’ as well as on a dependence on ‘shared norms and meanings’. Those sounds without ‘real-life context’ are by definition ambiguous and therefore would not fit into the above description. Do note that an electronic motor-like sound simulating crickets and a recording of real crickets may be similar from the listener’s point of view and are therefore both relevant here (Wishart: 1985, 82).

He continues along these lines, ‘The concept of transcontextuality is a useful way of understanding an indicative process since it is obvious that something outside the musical context is indicated.’ But Smalley also warns,

‘The more the composer intervenes in the spectro-morphology [see part IIIB for a description] of an identifiable source the more vestigial that source will become. … When transcontextual identity becomes less clearly defined the ear relies more on alternative indicative interpretations, and the composer remains the sole guardian of a dual knowledge.’

This lengthy borrowing from Denis Smalley has been chosen because many important reasons leading to an interest in the music of sound in general, and specifically the smaller world of sound transformations, are contained in his concept of transcontextuality (although this is not the only road leading to Rome). Obviously, as any transformation of representational ‘A’s’ and ‘B’s’ will contain a drop of surrealism (see part IIB below for an elaboration of the term), change of context is implicit. This is why this concept of Smalley’s has been chosen here as a starting point. His desire to offer tangible, that is representational, sources is important here. Surely one can construct a sound transformation using only electronic non-representational sounds, but its perception will be highly different from the examples mentioned in Part I, and such transformations tend to be less frequently applied than representational ones. One might therefore ask whether transformation is possible without changing or dualizing context.
 

B) One goal: new sounds and sound structures

One of the greatest attractions of electroacoustic music is the ability to search for new sounds (in a musical context) and sound structures. This is as old as the genre, but has not lost its impact after some forty years of electroacoustic composition.

As mentioned in Part I, timbral transformation as such is not new. Simply stated, the discovery of new sound structures within a musical context using sound transformations is inviting to those electroacoustic composers sharing this goal.

 

C) Another goal: new ways of listening

This subject also dates from the beginnings of the history of the genre. Think for example of the ‘Quatre Écoutes’ (Schaeffer: 112-128), Pierre Schaeffer’s approach to what he calls ‘reduced listening’.

The thesis here is a simple one. By expanding the music of notes to the music of sounds, the experience of listening to music has been widely broadened. The Schaeffer notions lay a theoretical foundation for discussing the phenomenon of listening to electroacoustic music. What is of interest to us is that through the application of sound transformations, creating as it were ‘impossible sound complexes’ in the sense that such things do not exist in nature, the listener is confronted with transcontextuality.

The combination of creating new timbral material and new musical circumstances for the listener must be underlined by a composer’s musical dramaturgy as such sound complexes can be ‘attention-grabbers’ and therefore should only be used with care. In a sense one might consider the composer’s paying special attention to the (passive) listening experience as the (active) search for new sounds and sound structures as seen through the looking glass.
 

D) Or a combination of these two goals, or even something else

Now that we have seen that the aesthetics of this subject can be approached from two points of view both involved with the discovery of sound, the story is not yet finished, as these two goals are but the obvious ones to attract a composer towards applying sound transformations within electroacoustic music. Individuals have their own personal reasons which sometimes complement the above. Before giving the word in Part IIE to the composer most associated with sound transformations, Trevor Wishart, the word will now be given to him anyway to exemplify one personal view. (Wishart: 1991)

He speaks of applying metamorphosis to sounds to create ‘musical metaphors’. He is interested in narrative (which may be difficult to grasp or ambiguous in some works) as opposed to art pour art abstraction. There have been many treatises written on metaphor within traditional European instrumental music, but those metaphors are rooted in music which is essentially abstract. It is through the concrete nature of most electroacoustic sound transformations (by their consisting of two or more representational sounds) that the ancient concept of metaphor within a musical framework can be modernized.

Wishart claims to be highly interested in the combination of mythology and music. Although this particular subject could take us on a large detour, one element associated with myths, ‘the symbol’ is essential here. Wishart speaks of the thought that a sound transformation can become ‘a truth that reveals itself’. (Wishart: 1991) Semiologists have been searching for signifiers and signifieds in music for a couple of decades. Wishart is of the firm belief that modern symbols can be created within the electroacoustic context and are therefore one of the main cornerstones of his approach to the choice and application of sound transformation in his oeuvre. Obviously one continually needs to search for new sounds to achieve this, and equally obviously the listening experience is at the heart of the matter. As Wishart has formulated his dramaturgy to this extent, it offers some food for thought for others who might be interested in a similar sonorous universe. These ideas are easily combined with the previously mentioned subject of musical surrealism.

This musical surrealism should finally be made more precise now. If we think of the traditional surrealist work, be it written, painted, sculpted or otherwise, elements of reality were combined in seemingly impossible ways (e.g., Salvador Dali’s melting watch). Might it be true that sound transformations using identifiable ‘A’s’ and ‘B’s’ are an excellent, albeit historically late, way of attaining a musical form of surrealism? Of course simply mixing sounds that do not belong together (e.g., breaking ice and tropical birds) forms another version of musical surrealism. In any event it is through the genre of musique concrète that musical surrealism has become feasible. An aesthetic for a surrealist approach to music, including sound transformations, is an interesting subject with respect to musical dramaturgy and should be looked into in more detail in the future. Returning to the concept of transcontextuality, although surrealism is just one way of articulating this, the composer should be aware of both contexts as articulated by Smalley: the sound’s own context as well as the musical/compositional one.

 

E) A case in point: Trevor Wishart

‘Making a good transformation is like writing a tune. … There are no rules.’

Trevor Wishart has done more for sound transformations than any other composer or researcher. He has used them in his compositions for some 15 years, beginning with his work Red Bird (see Wishart: 1978). His frequent choice of the voice within sound transformations turns out to be an obvious one for him as those who know his music realize that he is also one of the most inventive composers using extended vocal techniques. Convenience is not the only reason for this combination as we will now see.

Wishart claims to have become interested in the concept of transformation, not due to the evolutionary step from musique concrète techniques as one might assume, but through enthusiasm for the instrumental transformations he discovered in the early works of Iannis Xenakis. Again, as we found in part IB, the link to the tradition of transformation in instrumental music can even be found in the work of this highly experimental composer. Wishart’s goal is to make transformations with what he calls ‘real world sounds’.

As mentioned, Wishart has a preference for the narrative and the ‘objective’. He hopes that his music contains ‘more action than resonance’; therefore the identification of sources by the listener is fundamentally important in his works. He sometimes speaks of a sound transformation’s ‘story’.

Perhaps his use of the term objectivity can best be understood through the fact that he chooses his sounds so that the listener can construct a relation to that sound’s role in human experience. (Here the concept of transcontextuality again comes to the fore.) This by no means reduces the value of Wishart’s comment concerning metaphor and metamorphosis, for it is through these ‘real world sounds’ that he achieves his contemporary musical mythology, something he expects to work out further in his largest-scale work to date, his forthcoming opera Orpheus – The Pantomime.

One ‘real world sound’ is used most often, namely the human voice. Wishart writes: ‘Certain sounds retain their intrinsic recognizability under the most extreme forms of distortion. The most important sound of this type is the human voice, and particularly human language, although the particular formant structure of the human voice itself has a highly intrinsic recognizability for human beings. This is partly due to the obvious immediate significance of the human voice to the human listener, but also the unique complexity of the articulation of the source’ (Wishart: 1985, 82 – this brings us back to part IIC - ways of listening). Here one notes, as we so often hear, that the voice is the world’s most versatile and communicative instrument. It turns out as well that through the voice’s flexibility it can be used effectively at the beginning or at the end of a sound transformation due to its ability to match a corresponding sound. As transformations with similar ‘A’s’ and ‘B’s’ tend to be more successful than larger-scale interpolations, Wishart’s extended vocal techniques accordingly have led to successful transformations (see the [dis]similarity section in Part IVB; Wishart says, ‘The voice can be made to match almost any other sound, apart from bell-like stable inharmonic spectra’. [Wishart: 1988, 24]). So in this composer’s case, ‘Transformations come out of the voice’. Wishart’s goal is that ‘the voice speaks the world’; this is possible given all the potential ‘B’s’ to follow the voice’s ‘A’.

Before letting this composer ‘talk shop’ for a while, let’s briefly look into what he said about his composition, Vox-5. In the introduction to his Computer Music Journal article concerning the piece, Wishart begins by stating that this fifth of the six-part Vox series contains a primary aural focus, namely ‘a (super)human voice that metamorphoses into many recognizable sonic images, such as the sounds of crowds, bees, a horse, or bells.’ (Wishart: 1988, 21) Extended vocal techniques are of course at the heart of these sounds. He continues: ‘In all the spectral transformations one aesthetic aim has been to retain the “source credibility” of the resulting sounds; that is, they must always be believably vocal or naturalistic’ (although he often refers to a transformation’s dream-like quality [Wishart: 1988, 21]). Discussing the matching of sources (idem: 24), Wishart states that in sound transformations, ‘The two source sounds should have as many similarities as possible … [W]ith more complex sounds [like the voice-L.L.], a high level of acoustic/musical judgment is required to match the sources sufficiently before embarking on what can be a long interpolation computation.’

Wishart finds the subject of perceptual boundaries within sound transformations to be of particular importance. When does it sound more ‘A-like’, when more ‘B-like’? How dangerous or successful is the switch at the point of maximum ambiguity? (Wishart: 1988, 24-25)

Another subject of interest, as is often the case with composers of tape compositions, is Trevor Wishart’s involvement with ‘landscapes’, primarily in terms of the transcontextuality issues raised above, but also in terms of spatializing the performance of an electroacoustic work. (This latter subject refers to highly interesting diffusion questions, but cannot be further treated here.) In presenting his ideas about landscape, Wishart discusses three sorts of ‘imaginary landscapes’ which are important to transcontextuality in general, as well as sound transformations specifically (in Emmerson: 48, as well as Wishart: 1985, 79-80). The combination of unreal objects / real space is exemplified by the composer with the sonic image of substituting birds and animal sounds by arbitrary sound objects within a specific landscape. The converse, real objects / unreal space is exemplified by leaving the nature sounds and removing/ changing the context into an unreal (or ambiguous or fictitious, in Emmerson: 90) one. Finally the Wishart surrealistic pair consists of real objects / real space where the sounds do not belong to the space. These imaginary landscapes along with a real one where sounds and context belong together are a handy way of approaching sound transformations within acousmatic conditions.

In Part IVB we will return to this concept of landscape or context (see the ‘+1’ paragraph). One of the approaches to be dealt with there is Wishart’s keeping the sounds constant and instead transforming the landscape within a transformation.

Trevor Wishart has a definite preference for the use of what he calls dynamic or unstable (‘second-order’) morphologies for his ‘A’s’ and ‘B’s’ above clear, periodic ones. (In his vocal works he has even defined special notations for such timbres.) He emphasizes these morphologies while presenting the concept of ‘gestural structures of sound’ in his texts, a notion worthy of special attention at this point.

Denis Smalley has discussed the importance of the pair gesture and texture at length in his article, ‘Spectro-morphology and Structuring Processes’ (in Emmerson: 81-84 - also part IIIB). He defines gesture as being ‘concerned with direction away from a previous goal or towards a new goal’ and with musical causality. Texture, on the other hand, is ‘concerned with internal behavior patterns …’ (in Emmerson: 82). So it is primarily gesture that one is involved with when discussing sound transformations. They might best be called ‘gesture carried’ structures. Wishart’s version of gesture includes four distinguishable gesture types (Wishart: 1985, 67): stable, unstable, leading-to and leading-from. He claims to focus on such gestural types while making his transformations.

But returning to a more tangible level, this case-in-point composer – who has been known to contradict himself at times in a charming manner – has often stated that he finds the ‘–>’, the interpolation aspect of a given sound transformation, more important than its ‘A’ and ‘B’. Essentially his thought is that it is not just the transformational concept that is important, but the long, sometimes tedious and hopefully rewarding road to a successful transformation that is equally of relevance. In other words a transformation’s dramaturgy can also be found in its ‘–>’.

So what finally drives Trevor Wishart towards composing with sound transformations? When talking about Vox-5 he added a ‘Coda’: ‘The creation of Vox-5 helped me to test out many ideas about the control of musical articulation in a continuum, about spectral interpolation [i.e. sound transformation - L.L.], and about the organization of sound in space. … The computer opens up areas of compositional exploration that were previously inaccessible. The precision with which sound materials can be specified implies … [a]reas of sonic organization previously inaccessible to composers through the existing media of notation can be explored, opening a new world of dreamed of, but unsung possibilities.’ (Wishart: 1988, 26-27)

 

F) A Nota Bene: music above technique

Before leaving this section a few closing words are apropos. All the above has been written from the MUSICAL point of view. Most such articles tend to put a great deal of weight on the technological side of the affair. Certainly more information must be made public concerning the ‘how’s’ of sound transformations, but one point should be made at present which is at least as important as any other made here. The choice to use sound transformations should be born of a composer’s musical intention founded upon a solid musical purpose or dramaturgy (that ‘why’ spoken of above).

As we all know very well, in today’s ‘image culture’ many a video effect can be used to attract a viewer’s attention to something [10]. Overuse of visual effects, on the other hand, is simply not substantial aesthetically and often leads to inferior quality as far as content is concerned. In other words, by having a shaky aesthetic basis, or, conversely, choosing a technological approach to music where the music itself is subservient to a technological vision, one often arrives at a less satisfying musical result. It is the opinion of the author that music must preside above technique at all times (except perhaps when something is technologically impossible at present and needs to be invented). This must be kept in mind whenever one decides to apply the exciting sophisticated potential of sound transformations. Therefore only a few helping hints for users will be offered in Part IV of the current text; everything else here concerns the music or musical goals.

 

III – Before categorizing:

why the sound sources/timbres aren’t the central point here
 

A) Background

Do sound transformations have to be categorized in terms of sound sources and/or of timbre? Initially when embarking upon this project, the author was uncertain about its feasibility. The worry was that literally hundreds of transformations would have to be made before the results would be worthy of being called a categorization system. Then there was the question of whether timbre or specific sound sources should be at the focus. But fortunately this sticky problem was circumvented, as the idea of the detailed categorization system quickly turned out to be an unnecessary one, as will now be demonstrated.

In recent years there have been several attempts towards sound categorization (though not one turns out to be as rigorous as originally expected) – each seemingly from a different point of view. Others interested in this area have been satisfied with their own (general) descriptions of sound colour. Many of these systems are credible in their own right, but not necessarily universally applicable.

 

B) Examples: Seven approaches to the categorization of sounds

Pierre Schaeffer – Most obviously one should first think of the pioneering work done by Pierre Schaeffer and associates leading to his ‘Solfège de l’objet sonore’ (see Schaeffer, Reibel and Ferreyra). Although there never was a system made to classify all sound sources or timbres, this was the first important attempt within electroacoustic music to provide a reference. His ‘Scheme for the classification of sound objects’ is the heart of the system (see diagram in Schaeffer: 442). Even for those unable to read French, one sees immediately that no specific sounds or sound types (the likes of aerophones, chordophones, etc.) are presented here. Suffice it to say that the system works as a universe unto itself and that composers and musicologists may find it useful for their own individual purposes, but Schaeffer has not offered the musical world THE all-purpose sound source system.

Karlheinz Stockhausen’s score for Mikrophonie I for 6 players (1964) includes an introduction with an entry which is essentially its own sound reference list. This list consists of descriptive words to help the performers to approach timbre during playing. Although these terms are by no means exact, they are useful for the interpreters. Examples include: wailing, pitched sounds, crackling, grating, crunching and tromboning. This approach differs greatly from the above as it has an entirely different, though locally useful, goal.

R. Murray Schafer – One who might have taken the trouble to create such a system is R. Murray Schafer, who for many years headed the World Soundscape Project in which ‘acoustic communities’ were described. Schafer, a prime candidate for the ultimate sound source categorization system, only describes the presence of sounds locally in this project’s major texts (see the Schafer references).

Robert Erickson devoted an entire book to Sound Structures in Music, yet the only true step towards categorization can be found in his ‘transformation triangle’ which consists of ‘pitch (with timbre)’, ‘a sound’ and ‘a chord’. The book is of a more illustrative nature.

Wayne Slawson’s major work in this area, entitled Sound Color, is based on vowel formant analysis. He employs a terminology which can be of relevance to people who want to approach timbre in this manner, but is, as is often the case here, useful as A way of discussing sound colour, not THE way.

Denis Smalley has written two major texts on spectro-morphological thinking in electroacoustic music (see Emmerson, Paynter). Based on Schaeffer’s work concerning typo-morphology in his Traité des objets sonores, Denis Smalley writes, ‘Spectromorphology is an approach to sound materials and musical structures which concentrates on the spectrum of available pitches and their shaping in time.’ (in Emmerson: 61) This has been created from the listener’s point of view as opposed to the construction point of view of the composer and is in the opinion of the author the first major attempt to offer all involved in the music of sound a model for a potential terminology. Until now it has been primarily musicologists and a few composers who have been influenced by Smalley’s proposed way of thinking. It is expected in the future that his text will serve as a basis for greater, much needed developments in this area. Still it has not been created specifically for the categorization of sounds as such.

Trevor Wishart – Let’s call on Trevor Wishart once again. In his book, On Sonic Art (Wishart: 1985), he begins his treatment of the coherence of sound objects (idem: 37-38) by citing Steve McAdams’ four groups of sounds for ‘audio imaging’: similar envelopes, parallel frequency modulation, same formant characteristics and same apparent spatial location. We remain far from our desired detailed categorization system [11].

Again these examples are all potentially useful as far as sound transformations are concerned. However, none will be chosen for further consideration. To make an accurate categorization, one would need to choose at least one of these systems, including one with a largely detailed approach to sound sources and/or timbre, and try out virtually every sort of transformation on every sort of sound to create a sound transformation categorization system. However, re-reading the previous sentence leads us to the crux of the reason which made this sort of work finally unnecessary. ‘Every sort of transformation on every sort of sound’ implies that a sound transformation categorization system has already been made and then tested with every type of timbre and/or sound source. We will only be interested in the former subject in Part IV.

Having said that, we must let technology preside above musical application for a few lines for once in this text. As was proven in the experiments in creating sound transformation examples during the preparation of this article, certain types of transformations work differently/better using various types of techniques and are therefore worthy of specific attention. These will be specified in Part IVC below.

 

IV – A simple framework for categorizing sound transformations:

a ‘parametric’ approach
 

A) Introduction: The three ‘how’s’

This final section is a non-high tech attempt to delineate the world of sound transformations. This is not intended to be a user’s guide with how-to examples, but simply a basic introduction to what sorts of methods and types of transformations are possible offering the first model of its kind – be it a simple one – which hopefully will invite others to participate in its future expansion.

The three ‘how’s’ to make a sound transformation are:

Electroacoustically (through analog and/or digital treatment of recordings or previously generated sounds). This is most likely the most tedious way to go about this sort of work.

Through synthesis (digital sound generation using e.g. Csound or Cmusic leading to transformation), which sometimes may become a cumbersome process as well.

Through resynthesis (e.g., through the use of the Phase Vocoder, which first analyses input sound data and then allows sound manipulation to take place, including interpolation, shifting of spectra, stretching, etc., all of which can be used towards the creation of sound transformations). This is the method used most often currently. See Wishart (1988) for a general introduction to the relevant techniques and routines.
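As a rough illustration of the resynthesis route, the following Python sketch analyses one short frame from each of two sounds with a naive discrete Fourier transform, interpolates the magnitude of every frequency bin, and resynthesises the result. It is not a Phase Vocoder: there is no windowing, no overlap-add and no phase tracking. The function names dft, idft and interpolate_frames are invented for this example, and the rule of borrowing the phase of whichever sound is stronger in a bin is an assumption made for brevity, not a technique taken from Wishart.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); keeps the real part of each resynthesised sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def interpolate_frames(frame_a, frame_b, amount):
    """Blend the bin magnitudes of two equal-length frames.

    amount = 0.0 gives A's magnitudes, amount = 1.0 gives B's magnitudes.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length")
    blended = []
    for bin_a, bin_b in zip(dft(frame_a), dft(frame_b)):
        magnitude = (1.0 - amount) * abs(bin_a) + amount * abs(bin_b)
        # Assumption: borrow the phase of whichever sound is stronger in this bin.
        phase = cmath.phase(bin_a if abs(bin_a) >= abs(bin_b) else bin_b)
        blended.append(cmath.rect(magnitude, phase))
    return idft(blended)


# Two 16-sample test frames: one cycle and three cycles of a sine wave.
N = 16
frame_a = [math.sin(2 * math.pi * 1 * t / N) for t in range(N)]
frame_b = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
halfway = interpolate_frames(frame_a, frame_b, 0.5)
```

Because the two test frames keep their energy in different bins, the halfway frame here amounts to an equal mix of the two. Sounds with overlapping spectra would not reduce to a plain mix, which is where a real analysis and resynthesis tool becomes necessary.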

 

B) Four (plus one) categories of sound transformations

During the course of the project four categories, each in the form of a parameter, were found which seem to span the space of sound transformation types. These parameters are absolutely not mutually exclusive.

Comparability: A and B are very comparable to the ear … incomparable (do note that our technology is currently unable to transform sounds ‘naturally’ that are quite dissimilar, at least as far as our perception is concerned).

In fact the most similar ones most resemble sound morphological movement. Between the time this paper is written and the time it is printed the amount of potential dissimilarity in perceptually successful transformations will probably increase; therefore, it is useless to attempt to define where the borderline is or will be at any given point. This parameter therefore concerns the extent of transformation. (It should be stated that questions the likes of: are A and B based on stable or unstable morphologies, are they more pitch or noise based, may be added here but are not central to this categorization system.)

Again Denis Smalley’s terminology is of use to us. In his recent article (in Paynter) he discusses the concepts of ‘graduated’ and ‘interpolatory shifts’. Although Smalley is not directly addressing a categorization of sound transformations – he is addressing what he calls ‘shifts in sounding models’ – these terms are useful as ‘graduated’ refers to sounds ‘based on shared morphological attributes’, and interpolatory ‘ignores common attributes and emphasizes dramatic differences’.

Are the sounds both sequential (discrete) … continuous?

This leads to various approaches to transformation. Short sounds transform better through sequential repetition and development. Continuous sounds transform better within their own continuity. Anything in between is to be treated based on specific features.

Is the transformation to be short (from grain level) … long (eventually up to the structural level of a work)?

Trevor Wishart has said that a sound transformation using recognizable end points must last at least four seconds to be clearly perceived as being a sound transformation as such. Of course those composers who are not in need of this recognition or are working with abstract materials or structural approaches could essentially choose for transformation to take place at a granular or even on a large-scale structural level. Individual goals are perhaps more relevant here than elsewhere.

A and B are representational (clearly identifiable, mimesis intended) … ambiguous … abstract (unclear, nonrepresentational)

This final category does not need any further elaboration.
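To make the parametric character of the four categories concrete, they can be written down as a small record in which every transformation occupies a position on each axis. The Python sketch below is only one possible encoding: the class name TransformationProfile, its field names and the 0.0 to 1.0 scales are assumptions of this example, since the text names only the two ends of each continuum. The four-second check follows the rule of thumb attributed to Wishart above.

```python
from dataclasses import dataclass


@dataclass
class TransformationProfile:
    """Position of one A -> B transformation on the four parameters.

    The three 0.0-1.0 scales are an assumption of this sketch; the
    duration is simply given in seconds.
    """
    comparability: float     # 0.0 = very comparable ... 1.0 = incomparable
    continuity: float        # 0.0 = sequential (discrete) ... 1.0 = continuous
    duration_seconds: float  # from grain level up to the structural level
    abstraction: float       # 0.0 = representational ... 1.0 = abstract

    def __post_init__(self):
        for name in ("comparability", "continuity", "abstraction"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie between 0.0 and 1.0")
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")

    def long_enough_to_recognize(self, minimum_seconds=4.0):
        """Four-second rule of thumb for recognizable end points."""
        return self.duration_seconds >= minimum_seconds


# Invented values, purely for illustration.
example = TransformationProfile(
    comparability=0.3, continuity=0.9, duration_seconds=12.0, abstraction=0.1
)
```

Since the parameters are not mutually exclusive, nothing in the record prevents any combination of values; the check on duration matters only to composers who want the end points recognized.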

Some extras:

Here it should be reiterated that we have talked about the sorts of transformations, not how we go about them. Let’s return to Wishart’s mocking remark that the ‘–>’ between the A and B, i.e., the interpolation process itself is at least as important as its main characters. A sound transformation may be linear, it may follow a concave or convex curve of interpolation. Other even more complicated routes may be taken. There may be intermediate stages the transformation passes through. There may even be a ‘C’ and ‘D’…
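The possible routes of the ‘–>’ can be sketched in a few lines of Python. The power-law curve and the equal-length segments below are assumptions chosen for simplicity, and the names interpolation_weight and route_through_stages are hypothetical; the sketch only shows how a linear, a curved and a multi-stage path differ in principle.

```python
def interpolation_weight(progress, shape=1.0):
    """Map progress through the transformation (0.0 to 1.0) to the weight of B.

    shape = 1.0 is linear. shape > 1.0 gives a convex curve (in the
    mathematical sense) that starts slowly and finishes quickly;
    shape < 1.0 gives a concave curve that starts quickly.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie between 0.0 and 1.0")
    if shape <= 0:
        raise ValueError("shape must be positive")
    return progress ** shape


def route_through_stages(progress, stages):
    """Return (from_stage, to_stage, local_weight) for a multi-stage route.

    stages is a list such as ["A", "C", "B"]. The route is divided into
    equal segments, one per neighbouring pair (an assumption of this sketch).
    """
    if len(stages) < 2:
        raise ValueError("at least two stages are needed")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie between 0.0 and 1.0")
    segments = len(stages) - 1
    index = min(int(progress * segments), segments - 1)
    local_weight = progress * segments - index
    return stages[index], stages[index + 1], local_weight
```

Halfway through, a linear route has replaced half of A by B, whereas interpolation_weight(0.5, 2.0) has replaced only a quarter. A quarter of the way through a route via an intermediate stage, route_through_stages(0.25, ["A", "C", "B"]) places the sound midway between A and C.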

Furthermore, a composer may have personal preferences as to how such transformations should/must be made. Therefore a given A –> B may sound very different than another transformation with the same end points. One might choose to use characteristic analysis to feature certain parts of A’s and/or B’s spectrum during the interpolation or other transformation process.

(+1) An issue mentioned a number of times above is now added to our four categories here as an ‘extra’. This concerns keeping the sound objects, but changing the context, a sort of sound transformation which is the fifth of the four categories and is therefore named separately (as it concerns a spatial or contextual transformation, not directly a sound transformation). Smalley refers to this area as the outer limit of what he calls ‘spatio-morphology’ (morphology being in a sense singular, transformation plural, in Emmerson: 91). An excellent example of dealing with this element is Bernard Parmegiani’s Dedans dehors (1976 on record - INA-GRM 9102Pa). (What’s in a name?).
 

C) Some final tips

This final paragraph is offered to those who are working with sound transformations or are about to try creating some. Rajmil Fischman has provided the following seven suggestions which, depending on the circumstances and the context of a composition, may help in producing convincing results.

Differentiability:  The ‘A’ and ‘B’ of a sound transformation must be perceived as such. For example, a transformation from a soft trumpet timbre into a flute may not be perceived because the steady state of both of these sounds is similar. The main perceptual difference resides in their attacks.

Similarity:  As previously mentioned, some common characteristics between the ‘A’ and ‘B’ help in achieving continuity in transformations. For example, sounds that have the same pitch are recommended. (Sounds with different pitches can be problematic to convert into one another.) Transformation between a highly pitched and a non-pitched sound may require some pre-processing to achieve similarity: gradually making the pitched sound more noisy by means of thickening, frequency modulation, etc. and/or gradually making the non-pitched sound less noisy during the transformation by means of time variable filters, etc.

Duration:  The duration of a transformation is dependent on the type of sounds. This is probably the most difficult characteristic to generalize. Perhaps the closest one can get to this is saying that sounds with slowly changing morphologies will probably require longer transformations than quickly changing (‘gestural’) ones. Furthermore, transformation between slowly changing long sounds and short ones may not work.

Linearity: Whether to choose a linear or exponential transformation depends again on how fast and how violently it is to be perceived. If we take two transformations of the same duration, one linear and the other exponential (= exp(a) where a > 1), they may be perceived as having different durations because of the slowness of the change of the latter at the beginning of the transformation and its speedy variation toward the end.
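The difference in perceived duration can be made plausible with a small calculation. The Python sketch below compares a linear curve with an exponential one of the same length and asks how far into the transformation each has to travel before half of the change has taken place. The normalization (exp(a·t) − 1)/(exp(a) − 1), the value a = 4 and the function names are assumptions of this example; the text above does not fix the exact formula.

```python
import math


def linear_weight(progress):
    """Weight of B rises in direct proportion to elapsed time."""
    return progress


def exponential_weight(progress, a=4.0):
    """Exponential curve rescaled to run from 0.0 to 1.0 (assumed form)."""
    return math.expm1(a * progress) / math.expm1(a)


def time_to_reach(weight_function, target=0.5, steps=1000):
    """Return the first progress value at which the curve reaches the target."""
    for step in range(steps + 1):
        progress = step / steps
        if weight_function(progress) >= target:
            return progress
    return 1.0


linear_halfway = time_to_reach(linear_weight)
exponential_halfway = time_to_reach(exponential_weight)
```

With these assumptions the linear curve reaches its halfway point in the middle of the transformation, while the exponential one gets there only after roughly 83% of the time has passed, so most of the audible change is squeezed into the final stretch.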

Spatial movement:  One of the criteria used by our perception when separating sound sources is their position and movement in space. Therefore, if two sounds have the same spatial movement, we are cancelling one of the characteristics that make them appear as coming from different sources. However, convincing spatial movement of a transformation (including Doppler shifts) – e.g. the ‘A’ at one point, the interpolation moving along a continuous stream towards the ‘B’ at the final destination – can assist the listener’s perception of a sound transformation.
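A continuous movement from the position of the ‘A’ to that of the ‘B’ can be sketched with equal-power panning, a common stereo technique that is swapped in here for illustration; the text does not say which spatialization method was used. The sketch ignores Doppler shifts entirely, and the names equal_power_gains and pan_along_transformation are hypothetical.

```python
import math


def equal_power_gains(position):
    """Return (left_gain, right_gain) for a position from 0.0 (left) to 1.0 (right)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def pan_along_transformation(samples, start=0.0, end=1.0):
    """Move a mono signal from the start position to the end position.

    The position changes linearly over the length of the signal, so the
    sound arrives at its destination exactly when the transformation ends.
    """
    count = len(samples)
    stereo = []
    for index, sample in enumerate(samples):
        progress = index / (count - 1) if count > 1 else 0.0
        left, right = equal_power_gains(start + (end - start) * progress)
        stereo.append((sample * left, sample * right))
    return stereo
```

The squared gains always sum to one, so the overall level stays constant while the sound travels and the listener’s attention can remain on the transformation itself.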

Diversions:  Even if the first five points have been taken into account, there may be a problem that our perception does not accept the transformation, the ‘–>’ as containing either the ‘A’ or the ‘B’. Instead one may hear an unrelated unrecognizable sound, a sort of ‘grey area’. In this case, the ear needs to be ‘fooled’ by attracting the attention of the listener to something else, including a third sound, happening at the same time.

One such technique is called ‘blurring’ (also known as ‘wedging’/ ‘reverse wedging’, ‘flocking’ – see Wishart [1978]). Here during the transformation several layers of the same or slightly modified sound are used to aid in the natural flow from A –> B. This how-to recipe is taken from the sound transformation kitchen as it is used surprisingly often. (This technique was used originally by Wishart in Red Bird to hide analog technical limitations.) The classical example is probably Wishart’s vocal buzz –> bee swarm transformation in Vox-5. In order to capture the attention of the listener, the spectrum of the vocal buzz is made to change independently of the transformation while the latter is taking place.

Context:  The context of the actual piece may help overcome some other difficulties. When there is a common element, e.g., Wishart’s voice, in several of a work’s transformations, this can stimulate the perception of a ‘voice to …’ (or whatever) quality through a sort of pattern recognition. Perhaps a transformation might not sound convincing when heard outside of the context of a piece, but can work in context for this very reason.

Please see original article for proper acknowledgments, bibliography, and footnotes.

 

BOOK LINKS

Understanding the Art of Sound Organization - ISBN-10: 0262122928 or ISBN-13: 978-0262122924

http://www.amazon.com/Understanding-Sound-Organization-Leigh-Landy/dp/0262122928/ref=sr_1_1?ie=UTF8&qid=1244530840&sr=8-1 

Audible Design — ISBN 095103137
A plain and easy introduction to practical sound composition from Electronic Music Foundation or Digital Music Archive or ICR or bookshops.

On Sonic Art — ISBN 371865461 or (paperback) 37186547X
The aesthetics of composition in a digital age published by Taylor and Francis Books from bookshops.