Carlo A. Nardi

Università di Trento

Humboldt Universität, Berlin




Playing by eye

Music software and visuality

Logic will get you from A to B. Imagination will take you everywhere. (Albert Einstein)

Reason is a whore, surviving by simulation, versatility, and shamelessness. (E. M. Cioran)



This dissertation deals with the culturally informed relationships between the visual and the aural in music making, with particular attention to digital technologies. Using programs such as sequencers, virtual instruments and wave editors, music makers are confronted with a specific representation of sonic events. Having established that music software relies to a great extent on visual indices, I investigate the mechanisms arising from the relationship between this perceptual organization and music production, focusing on the role of digital instruments as a means of a supposed form of "democratization".


Historically and culturally derived hierarchies of the senses

First of all, I want to show how the aural and the visual are interrelated in ways that reflect the cultural meanings and functions assigned to these faculties. As several anthropological studies illustrate, our hierarchy of the senses, with sight at the top and hearing trailing behind, can be considered neither a natural predisposition nor a cultural universal. Steven Feld (1990), Paul Stoller (1989) and Alfred Gell (1995), among others, contrast the Western underrating of hearing with its importance in Kaluli, Songhay and Umeda cultures respectively. The main point is that there is a link between the environment and the organization of sensibility, with consequences in the domain of cognition and therefore in the way the world is perceived and events are described. Each sensory faculty is accorded a specific value in relation to the foundation of experience and the social construction of knowledge. For instance, the Suya of the Brazilian Mato Grosso "deem keen hearing to be the mark of the fully socialized individual. The Suya term ‘to hear’ […] also means to understand, while the expression ‘it is in my ear’ is used by the Suya to indicate that they have learned something, even something visual such as a weaving pattern. Sight, in fact, is considered by the Suya to be an anti-social sense, cultivated only by witches" (Classen 1993, p. 9).

On the other hand, our society can boast of essays such as psychoanalyst Imre Hermann’s Perversion und Hörwelt ("Perversion and the World of Hearing"; Hermann 1970), from which we learn that most of "those individuals whose psyche is governed by the aural" are predictably exhibitionists, fetishists, voyeurs, kleptomaniacs, pederasts, and so on.

Besides the variance between cultures, we can also observe diachronic change within the same civilization. Consider the shifting connotation of the dictum "Verba volant, scripta manent": in the past it meant that the written word is sterile and dead as a stone, while the spoken word can spread its wings and fly. Today that connotation has been radically reversed: the spoken word, that is, the oral/aural, is ephemeral, while the written word, the written/visual, is stable and substantial (see Borges 1978). This is in tune with the founding role we assign to written words, charts, photographs, clothing, numbers, fingerprints, etc. in the definition of reality.

*** *** ***

Nevertheless, no musical or visual event is "pure" enough to exclude the other four senses and, above all, the dense network of linguistic and behavioural codes that interweaves every human activity (Fabbri 1996, p. 180). This raises a question: is there anything specific to aural perception, as opposed to visual perception?

Adorno and Eisler (1969) stress that the ear, unlike the eye, is always open. This makes it a vulnerable receptor, both "passive" (it cannot decide when and what to perceive) and, at the same time, always "active" (or better, "activated"). Moreover, the ear receives stimuli from every direction and, unlike the eye, does not need to be focused on the source of information; hence the ear maintains an "archaic sense of participation" with the surrounding environment, rather than trying to control it by zooming in on specific objects.

Though revealing, the two scholars’ argument should be stripped of its implicit claims to universality and instead contextualized within a particular culture and a particular time. As we have seen, aural perception varies according to culturally constructed norms and practices; moreover, as social psychology has shown, perception is also an active process of selection (consider, for example, the distinction between sound and noise). In brief, what Adorno and Eisler put forward is that, in our society, the ear, being less directional, also tends to be less selective and less subjectively organized than the eye (ibidem).

Conversely, the eye is credited with classifying and systematizing qualities that are attuned to the most distinctive instances of Western rationalism; furthermore, directionality also implies an intention, and thus a will to control the environment.

This said, if music is also about organized sound, we can assume that music making requires a system of reference. This system does not necessarily refer only to sound qualities, because the latter are always structured within cultural patterns, which also include non-aural perceptual inputs as well as mental representations in general.


A visual representation of sound

Music software confronts us with a specific representation of sonic events, in which the signs we see on the screen do not exactly "look like" the signs we hear from the speakers. As we have seen, this representation is by no means necessary but culturally derived, all the more so since we can imagine different ones; indeed, different ones exist, such as the music score.

This representation very likely has an effect on which parameters are emphasized. Even at first glance, music software gives prominence to aspects such as wave shape, sound processing, dynamics, the rhythmic interplay between voices, accents, looping and repetition, texture, timbre, etc. Moreover, we get a closer view of details such as phase, clips and clicks, fades, frequency spectrum and panning. Then, we can open plug-ins such as virtual instruments, effects and sound-wave analyzers. Finally, we have MIDI, which reduces a musical event to some of its measurable traits: pitch, duration, velocity, amount of expression pedal, etc.

If we open an audio file in a wave editor, we see a familiar image: a Cartesian coordinate system. The plotted function represents the digitized sound wave, with time on the abscissa and dynamics (that is, amplitude, measured in dB) on the ordinate.
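To make the ordinate of such a display concrete, the following Python sketch converts normalized sample amplitudes (assumed to lie between -1.0 and 1.0, as in floating-point audio) into decibels relative to full scale (dBFS) with the usual formula 20 · log10(|a|). The function name and the -96 dB floor used for silence are illustrative assumptions, not the behaviour of any particular wave editor.

```python
import math


def amplitude_to_dbfs(sample: float, floor_db: float = -96.0) -> float:
    """Convert a normalized sample amplitude (-1.0 to 1.0) to dBFS.

    Full scale (|sample| == 1.0) maps to 0 dBFS. Silence is clamped to
    floor_db because log10(0) is undefined; -96 dB is an illustrative
    choice (roughly the dynamic range of 16-bit audio).
    """
    magnitude = abs(sample)
    if magnitude == 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(magnitude))


# A few points along the vertical axis of a hypothetical waveform display.
for amplitude in (1.0, 0.5, -0.5, 0.1, 0.0):
    print(f"{amplitude:+.1f} -> {amplitude_to_dbfs(amplitude):.2f} dBFS")
```

Halving the amplitude lowers the level by about 6 dB, and a tenfold reduction lowers it by 20 dB. The sign of a sample (its position above or below the time axis) does not affect its level.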

Sequencers take a similar approach: time again runs along the abscissa, while the ordinate is divided into discrete lanes called audio and MIDI tracks. Each audio track in turn contains another Cartesian system, namely the sound wave. This display highlights the interplay between events.

In a sequencer, time is divided into equal sections, which are multiples or fractions of a unit whose rate is measured in b.p.m. (beats per minute), so that a grid spatially organizes the domain of the song. An audio-MIDI project looks like an empty space to be filled with audio samples or MIDI phrases. Sampling Timothy Warner (2003, p. 26), "the visual nature of the computer screen presents musical material as simple blocks and, as a result, encourages the production of pieces with additive, rather than organic structures". Hence, "most change is produced by addition or subtraction of timbres, not through organic growth or musical development" (ibidem, p. 45), and, I would add, in an analytic fashion, as exemplified in particular by the cut-and-paste technique.

Continuity, again, is obtained through the repetition of events, or blocks, which in turn can be single samples (e.g. bass or snare drum), loops (e.g. a drum groove) or entire sections (e.g. verse or chorus).
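The b.p.m. grid described above rests on simple arithmetic. At a tempo of T beats per minute, one beat lasts 60/T seconds, and every grid cell is a multiple or fraction of that unit. The Python sketch below assumes a quarter-note beat and a 4/4 bar, and its function names are hypothetical. It computes these durations, locates a time on a bar/beat ruler, and snaps an event to the nearest grid line, a simplified version of what sequencers call quantization.

```python
def beat_seconds(bpm: float) -> float:
    """Duration of one beat at the given tempo (beat assumed to be a quarter note)."""
    return 60.0 / bpm


def grid_step_seconds(bpm: float, subdivision: int) -> float:
    """Duration of one grid cell when each beat is split into `subdivision` parts.

    With a quarter-note beat, subdivision=4 yields a sixteenth-note grid.
    """
    return beat_seconds(bpm) / subdivision


def quantize(time_s: float, bpm: float, subdivision: int) -> float:
    """Snap an event time (in seconds) to the nearest grid line.

    Python's round() sends exact midpoints to the even neighbour.
    """
    step = grid_step_seconds(bpm, subdivision)
    return round(time_s / step) * step


def bar_and_beat(time_s: float, bpm: float, beats_per_bar: int = 4) -> tuple[int, float]:
    """Express a time as (bar, beat), both counted from 1 as on a sequencer ruler."""
    total_beats = time_s / beat_seconds(bpm)
    bar = int(total_beats // beats_per_bar) + 1
    beat = total_beats % beats_per_bar + 1
    return bar, beat


# At 120 bpm: a beat lasts 0.5 s and a sixteenth-note cell 0.125 s.
print(beat_seconds(120))          # 0.5
print(grid_step_seconds(120, 4))  # 0.125
print(quantize(0.27, 120, 4))     # 0.25: a slightly late note pulled onto the grid
print(bar_and_beat(2.25, 120))    # (2, 1.5): halfway through the first beat of bar 2
```

Expressing positions in seconds is an arbitrary choice made for readability. Many sequencers count in ticks (pulses per quarter note) instead, but the proportional logic is the same.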

This "space" is virtual and can be conceived, following Pierre Lévy (1995), as a process of transformation from one mode of being into another. What I want to emphasize is that this virtual domain is an abstraction, and that sight is our primary means of reading this space and moving within it.

For instance, dynamic cues are fundamental organizing factors of visual perception. They serve as guides in building loops and in beat-matching, which is essential for placing events in time or for using samples from other records. Thus, in a drum loop, dynamic peaks are immediate cues for bass and snare drum hits, and consequently for recognizing beats and measures.
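The way dynamic peaks act as cues for drum hits can be imitated, very crudely, in code. The Python sketch below is a naive illustration, not the algorithm of any actual editor or beat-matching tool. It splits a signal into short windows, takes the peak absolute amplitude of each as a rough envelope, and marks a hit wherever that envelope crosses a threshold from below. The window size (441 samples, i.e. 10 ms at 44.1 kHz), the threshold and the synthetic test signal are all assumptions chosen for readability.

```python
import math


def envelope(samples: list[float], window: int) -> list[float]:
    """Peak absolute amplitude of each consecutive, non-overlapping window."""
    return [
        max(abs(s) for s in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


def onset_times(samples: list[float], sample_rate: int,
                window: int = 441, threshold: float = 0.5) -> list[float]:
    """Times (in seconds) where the envelope rises above `threshold`.

    A window counts as a hit when its level reaches the threshold while the
    previous window was below it, so each loud burst is reported once.
    """
    times = []
    previous = 0.0
    for index, level in enumerate(envelope(samples, window)):
        if level >= threshold and previous < threshold:
            times.append(index * window / sample_rate)
        previous = level
    return times


# Synthetic "drum loop": one second of silence with four hits a quarter of a
# second apart. Each hit is a crude stand-in (an exponentially decaying spike
# with alternating sign), not a realistic drum sound.
SAMPLE_RATE = 44100
signal = [0.0] * SAMPLE_RATE
for hit in (0.0, 0.25, 0.5, 0.75):
    start = int(hit * SAMPLE_RATE)
    for n in range(int(0.05 * SAMPLE_RATE)):  # 50 ms burst
        sign = 1.0 if n % 2 == 0 else -1.0
        signal[start + n] = sign * math.exp(-n / 400)

print(onset_times(signal, SAMPLE_RATE))
```

Reading a waveform by eye does something comparable, only far more flexibly. The eye picks out the tall, sharp transients that typically mark kick and snare hits, and infers beats and bars from their spacing.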

This process of formalization reaches its extreme in a MIDI sequencer, where musical events are reduced to rectangular bricks that might remind us of the coloured counting rods we used in our first encounters with mathematics. Moreover, MIDI sequencers are generally controlled through a keyboard, which works both as a hardware device and as a virtual interface. Needless to say, despite the pitch wheel, this encourages a particular organization of pitch into discrete steps, the very one Max Weber considered one of the characteristics of the rationalization of Western music (Weber 1921).
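What the piano roll keeps of a note can be written down in a few fields. In the MIDI convention, pitch is an integer note number from 0 to 127 (60 is middle C, 69 the A above it), and velocity is also a 7-bit value. A "brick" adds a start position and a length on the time grid. The Python sketch below uses hypothetical class and field names (not any sequencer's API). It maps note numbers to twelve-tone equal temperament with f = 440 · 2^((n - 69)/12), assuming the common A4 = 440 Hz reference. Pitch-bend messages, which the pitch wheel sends, fall outside this discrete grid and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteBrick:
    """A MIDI-style note as drawn in a piano roll: a rectangle on a grid."""
    pitch: int           # MIDI note number, 0-127 (60 = middle C)
    start_beat: float    # horizontal position on the time grid
    length_beats: float  # width of the rectangle
    velocity: int        # 1-127; many editors show it as colour or in a separate lane

    def rectangle(self) -> tuple[float, int, float, int]:
        """(x, y, width, height) in grid units: one row per semitone."""
        return (self.start_beat, self.pitch, self.length_beats, 1)


def midi_to_hz(pitch: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2 ** ((pitch - 69) / 12)


# A hypothetical one-bar bass line, reduced to numbers on a grid.
riff = [
    NoteBrick(36, 0.0, 1.0, 100),
    NoteBrick(36, 1.5, 0.5, 80),
    NoteBrick(43, 2.0, 1.0, 96),
    NoteBrick(41, 3.0, 1.0, 90),
]
for note in riff:
    print(note.rectangle(), f"{midi_to_hz(note.pitch):.2f} Hz")
```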

Metalinguistic functions of visuality

Now I would like to examine some of the functions of visuality as a metalanguage in music making.

Music makers and consumers need means of storing and transmitting musical knowledge in order to share it. This becomes crucial on certain occasions:

Both verbal language and visual communication can serve these purposes. Here the issue of the "relative reliability we ascribe to aural information compared to visual" (Thorn 1996) arises again: in order to assess that something is «real» or «true», we need to see it.

For instance, in the description of their activities, musicians, producers and engineers often use metaphors belonging to the visual arts:

"I’ve always described my job as painting a picture with sounds; I think of microphones as lenses. Engineers is such a wrong term for music mixing, really" (Geoff Emerick, in Massey 2000, p. 84).

"The more you know, the greater the palette of colors you have to choose from" (Nile Rodgers, ibidem, p. 185).

In this case, verbal communication serves as a second-level metalanguage: it speaks about the visual, which in turn "speaks about" music. We thus have a meta-metalanguage.



Taking for granted that music making is generally a collective process (see in particular Sorce Keller 2003), implying a more or less hierarchical division of labour, successful communication among the people involved is a precondition for teamwork, at least from a functional point of view. This requires a language that can be understood by people with different competences and training (musicians in the strict sense, engineers, producers, A&R, etc.). Hence it is not surprising that this language is very often borrowed from the visual domain, for at least two reasons: (a) this kind of discourse is shared at a more general level of competence (see Stefani 1987) and (b) it has a strong, albeit metaphorical, structuring quality. Moreover, this metaphorical quality can become a further stimulus for creation.

One of the most common complaints about digital technologies is that they are so powerful and versatile that they leave you alone with the entire burden of choice on your shoulders. Indeed, unlimited copying with no loss of quality and non-destructive, software-based editing are among the characteristics of digital production (see Warner 2003), which also means fewer physical limitations on production: "it’s not what you can do that counts, but what you choose to do" (Eno 1996, p. 394). In this regard, Leonard B. Meyer (1989) points out the role of constraints, whether explicit or not, as a conceptual prerequisite for creation: "without cultural constraints, memory is emasculated by the momentary" (ibidem, p. 349). Therefore, ideas or theories from extra-musical fields can also help music makers build solid creative strategies within a guiding and inspiring symbolic network, supplying values and suggesting cultural topics and tools.


Potential and complexity

In fact, there seems to be a direct relationship between the potential of an instrument and the complexity it presents to its user. For instance, George Martin said that the Beatles’ Sgt. Pepper would not have been as good had it been recorded on 24-track equipment: as the saying goes, "necessity is the mother of invention". […] Similarly, Tony Visconti stresses that external limits on the available options can help: "David Bowie and I have discussed this many times, that having less options in mixing is a positive thing. […] It would keep us going in that direction – we wouldn’t deviate from that sound" (in Massey 2000, p. 143).

At the same time, software manufacturers strive to provide powerful tools with user-friendly interfaces, continuously mediating between comprehensive features and accessibility. What emerges is a concern for functionality: in order to make the gear practical for users’ needs, manufacturers have to anticipate and provide for its possible uses. In this way the resulting tool, in a certain sense, also incorporates the consumers’ needs.

However, a deterministic view asserting that technology unidirectionally moulds its users is untenable, since instruments are also created (or re-created) by users in the act of playing them: "the ability of the consumer to define, at least partially, meaning and use of technology is an essential assumption and theoretical point of departure" (Théberge 1997, p. 160). This happens because every planned structure leaves a certain degree of choice among alternatives (see Middleton 1990). Still, the planned uses, and in particular the structuring character of the tool, which relate to broader cultural issues, have effects that cannot be overlooked.

On the psychology of the computer/user interface, Richard Thorn (1996) points out that "Icon-based, rather than word-based interaction is believed to be more 'user-friendly', permitting immediate, almost intuitive response. But technology and [visual] metaphors reflect only one cultural view of the world, and why should intuitive response be limited to a visual stimulus?"

On the other hand, consider Csound, a text-based program said to be capable of producing any sound and symbolizing Western abstraction at its extreme. It is no coincidence that text-based programs are more widespread among academic musicians. Nonetheless, potential is not something that belongs to the machine alone, as it arises only from the relationship with its user.


Storing sounds

Another function of visual representation is documenting the operations required to produce particular sounds. Indeed, it is sometimes necessary, or simply practical, to fix sound-making procedures so as to be able to replicate them at will. Many technical and instructional books use pictures, drawings and diagrams to show a variety of procedures: the correct posture of the body when playing a certain instrument, how to place microphones to record a particular instrument and obtain a certain nuance from it, the right position of monitors in a recording studio, equalization curves illustrating how to obtain a particular filtering effect, stylized sketches of scratch techniques, and so on. Musicians themselves use these visual means. Tony Visconti, for instance, says: "I take photographs of the mic placement. I have a series of photos from over the years – they’re like photographs of my drum sounds" (Massey 2000, p. 149).


Irreducible musical conducts

However, some specific musical conducts seem irreducible to visual terms as well as to verbal or mathematical language. I could mention concepts like "feel", "swing" or "groove", whose content is fuzzy in theory and yet clear in practice, at least for people acquainted with a certain musical style. It is like Saint Augustine’s remark about time: "What is it? If nobody asks me, I know. If I am asked, I do not know".

In the domain of artificial intelligence, programmers are concerned with giving computers all the abilities needed to make them play like a human, or to reproduce every human gesture. This aim requires formalizing these empirical conducts in order to digitize them. The aspects that have not yet been properly digitized, and can therefore be considered irreducible, if only for the time being, are among those that actually make the difference between human presence and the machine, especially in live performance (see Warner 2003, pp. 41-43).

An apparent paradox is that these "irreducible" traits, which I have been defining as intrinsically musical and which should therefore lead us towards more specialized musical knowledge, instead seem to refer to a more general competence. This is precisely what Gino Stefani would call a General Code, that is, "perceptive and logical schemes, anthropological behaviors, basic conventions according to which we perceive and interpret every experience, hence also aural ones" (Stefani 1987, p. 18). Equating musical ability with performing skills and theoretical knowledge rests on a misunderstanding and underrates the importance of listening. As John Blacking wrote: "Latent ability is rarely recognized or nurtured, [while] the creation and performance of most music is generated first and foremost by the human capacity to discover patterns of sound and to identify them on subsequent occasions" (Blacking 1973, p. 9).

From another point of view, we could say that a properly musical event, when it is irreducible to spoken language or to logically organized languages (e.g. binary code), is one of the few conducts in our culture that preserve a marked pre-logical character. I would add that this character is closely connected with body movement and the flow of time.

As for the importance of the body, from a poietic perspective, consider the role and implications of gesture in playing an instrument, compared with the abstraction that characterizes programming a machine.

As for the flow of time, I will not tackle such a problematic topic here, but only offer a few cues. Time is by definition analogue: digitization presupposes divisibility into a discrete number of intervals, or samples, which contrasts with the idea of an uninterrupted dimension, whether linear or circular. But if we do not consider time as a continuous flow, we risk being trapped in William James’s version of Zeno’s paradox, according to which the flow of time is impossible. For instance, if we imagine the present as a point, that is, without extension, it does not exist (Borges 1978). Hence the present so conceived is only an abstraction and not an immediate actuality of our consciousness, whereas what we experience is a continuum, as opposed to discrete sampling.

The point is that, according to the scheme I have drawn thus far, both logical thinking and these archetypal conducts seem to refer to the same cultural level of competence. This convergence should call attention to a contradiction at the heart of our culture, between rationality and "something else", which is the subject of an ongoing debate (see e.g. Touraine 1992). To a certain extent, this unmarked region could be identified with what we call "the body", as it is implicit in those irreducible musical events, which in turn are often connected with bodily gestures and with a presence in space. So a working definition of «body», conceived more in terms of "experience of sensuousness" (see Peter Wicke 1987) than as a physiological configuration, could be «the human that cannot be digitized».

This brings to mind Philip Tagg’s laboratory experiment on TV music, which highlights the contradiction between the progressive connotations of the verbal and visual content and the stereotypes carried by the music: "it appears that music in our culture, its digital technology notwithstanding, can categorize shared subjective experience of and relation to our social and natural environment at deeper, possibly more ‘archaic’, levels of consciousness than visual and, more notably, verbal symbols. […] Such asylums of nonverbal symbolism may be psychosocial necessities in a culture whose ideology of knowledge so one-sidedly invests certain symbolic systems, notably numbers and words, with great power and status as legitimate carriers of knowledge while banishing others to the freaky realms of ‘Art’ or ‘entertainment’ – the fact that this presentation about music is mostly words illustrates that point quite clearly" (Tagg 1989, p. 17).

To sum up, visualization, notably through digitization, represents an attempt to reduce sonic events to a logically organized structure. From this perspective, those musical elements that preserve their irreducibility can be considered a sort of archetypal residue, or stronghold, withstanding the supremacy of logical thinking.


Democratic claims

Now I’d like to examine how these digital technologies partly redraw the social definition of musicianship, in connection with their presumed democratic function.

First of all, we can observe a wide diffusion of computer hardware and software dedicated to music making. More specifically, three trends are documented: (a) a progressive increase in computer consumption, including for private use; (b) the practice of file sharing, through which anyone who owns a computer and a fast Internet connection can download a great variety of software at no cost; (c) the relatively low cost of basic home-studio equipment. In general, we can note a widespread familiarity with computers, especially among the younger generations.

A significant facet is the progressive hybridization of production and consumption: in the processes of music making, producers of cultural objects consume (a) gear, that is, the means of production, (b) educational tools and services, (c) techniques, (d) mass media, etc. This parallels what happens with devices such as samplers and synthesizers, when musicians also become consumers of pre-recorded sounds. In Paul Théberge’s words, recent innovations "alter the structure of musical practice and […] place musicians and musical practice in a new relationship with consumer practices and with consumer society as a whole" (Théberge 1997, p. 3).

One of the most important aspects is that all these processes pass through the same medium, namely the computer. Not incidentally, this fact is often invoked to support communitarian claims, as it offers a common basis in terms of subcultural capital (see Thornton 1995).

In a certain sense, then, we have an audience made up of music makers. Even if they are not professionals, people regard themselves less as "the audience" than as equals of "the artists", taking part in the same community. Moreover, they also manipulate music and discourses about music: editing compilations with WinAmp or VideoLan, mixing tracks with Traktor DJ Studio or Atomic Virtual DJ, sampling and editing loops with Steinberg WaveLab or Sony Sound Forge, composing songs with Acid or Cakewalk, sharing music files on the Internet through Kazaa or eMule, and talking about music in forums and user groups as experienced musicians rather than mere consumers.

Returning to the main argument of this paper, namely the claim that digital instruments are a means of ‘democratization’, it is time to ask whether it is precisely this reliance on visuality that makes music making more accessible in a society where the eye is better trained than the ear.

The popularity of interface-based music software could be viewed as the product of a dialectical relationship between two contrasting forces. As I have illustrated, there is a connection between visuality and accessibility. One could object that music scores also rely on visuality; the difference is that sequencers and similar programs can be used by drawing on a general competence rather than a specifically musical one. Moreover, this software emphasizes parameters whose conscious use requires less music-specific abstraction, such as dynamics or rhythmic patterns, rather than harmony. As a consequence, these visual indices can help people who have not been taught to play an instrument. Finally, and above all, digital tools make sound: even if you start making choices according to what you see, you can always test these choices in real time through the speakers. This feature is highly relevant because listening practice, though subject to distinctions (see Bourdieu 1979, Thornton 1995), is certainly more widespread than performing abilities. As John Blacking wrote, our "society claims that only a limited number of people are musical, and yet it behaves as if all people possessed the basic capacity without which no musical tradition can exist - the capacity to listen and to distinguish patterns of sound" (Blacking 1973, p. 8).

Hence we have a particular combination of aural and visual qualities: on one side, visuality as a source of abstracting and organizing skills; on the other, aural comprehension of a style acquired through listening practice.

Though apparently incompatible, the two terms of this dichotomy can instead help explain the wide diffusion of these technologies and their claimed democratic nature. They weave together the threads of creation and listening, and at the same time they enhance two qualities of popular competence, namely the structuring skills of sight and the ability to listen.


*** *** ***



Adorno, Theodor W., Eisler, Hanns (1969) Komposition für den Film (München: Rogner & Bernhard)

Bennett, H. Stith (1980) On Becoming a Rock Musician (Amherst: University of Massachusetts Press)

Blacking, John (1973) How musical is man? (Seattle: University of Washington Press)

Borges, Jorge Luis (1978) Borges oral (Buenos Aires: Emecé Editores, 1979)

Bourdieu, Pierre (1979) La distinction (Paris: Les Éditions de Minuit)

Chapman, Owen (2004) Sonic Sedition through Aural Audition: Who’s got the conch in sample-based music?, paper for On the Right Track/Sur la bonne piste, Carleton University, Ottawa, Saturday, 15th of May 2004

Classen, Constance (1993) Worlds of Sense: exploring the senses in history and across cultures (London: Routledge)

Eno, Brian (1996) A Year With Swollen Appendices (London: Faber and Faber)

Fabbri, Franco (1996) Il suono in cui viviamo (Milano: Feltrinelli)

Feld, Steven (1990) Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression, 2nd ed. (Philadelphia: University of Pennsylvania Press)

Feld, Steven (1994) From Ethnomusicology to Echo-muse-ecology: Reading R. Murray Schafer in the Papua New Guinea Rainforest, in The Soundscape Newsletter, 8:4-6

Foucault, Michel (1966) Les mots et les choses (Paris: Gallimard)

Gell, Alfred (1995) The Language of the Forest: Landscape and Phonological Iconism in Umeda, in Hirsch, E. & M. O'Hanlon (eds.), The Anthropology of Landscape: Perspectives on Place and Space (Oxford: Clarendon Press)

Hermann, Imre (1970) Perversion und Hörwelt, Psyche (Stuttgart)

Lévy, Pierre (1995) Qu’est-ce que le virtuel? (Paris: Éditions La Découverte)

Martin, George (1979) All You Need Is Ears (New York: St. Martin’s Press)

Massey, Howard (2000) Behind the Glass: Top Record Producers Tell How They Craft the Hits (San Francisco: Miller Freeman Books)

Meyer, Leonard B. (1989) Style and Music. Theory, History, and Ideology (Chicago: University of Chicago Press)

Middleton, Richard (1990) Studying Popular Music (Buckingham: Open University Press)

Sorce Keller, Marcello (2003) Siamo tutti compositori. Alcune riflessioni sulla distribuzione sociale del processo compositivo, (1) in Musica Realtà, 70, March 2003, pp. 26-62; (2) in Musica Realtà, 71, July 2003, pp. 25-53, formerly in Schweizer Jahrbuch für Musikwissenschaft, Neue Folge XVIII (1998), pp. 259-330

Stefani, Gino (1987) Il segno della musica (Palermo: Sellerio)

Stoller, Paul (1989) The Taste of Ethnographic Things (Philadelphia: University of Pennsylvania Press)

Tagg, Philip (1989) An Anthropology of Stereotypes in TV Music? (Göteborg: Svensk tidskrift för musikvetenskap, pp. 19-42)

Théberge, Paul (1997) Any Sound You Can Imagine: Making Music/Consuming Technology (London: University Press of New England)

Thorn, Richard (1996) Virtual Reality: A Sound Proposition?, paper for Hearing is Believing 2, University of Sunderland, Saturday, 2nd of March 1996

Thornton, Sarah (1995) Club Cultures: Music, Media and Subcultural Capital (Cambridge: Polity Press)

Touraine, Alain (1992) Critique de la modernité (Paris: Librairie Arthème Fayard)

Warner, Timothy (2003) Pop Music – Technology and Creativity (Aldershot, Hampshire: Ashgate)

Weber, Max (1921) Die rationalen und soziologischen Grundlagen der Musik (München: Drei Masken Verlag)

Wicke, Peter (1987) Rockmusik: zur Ästhetik und Soziologie eines Massenmediums (Leipzig: Verlag Philipp Reclam jun.) (Engl. trans. Rock music: Culture, aesthetics and sociology, Cambridge: Cambridge University Press, 1990)

Williams, Alan (2004) Science Fiction Double Feature: The impact of the computer monitor on the process of the digital audio workstation, paper for On the Right Track/Sur la bonne piste, Carleton University, Ottawa, Sunday, 16th of May 2004