
Anders Friberg

KTH—Royal Institute of Technology Speech, Music and Hearing Lindstedtsvägen 24 SE-100 44 Stockholm, Sweden andersf@speech.kth.se

pDM: An Expressive Sequencer with Real-Time Control of the KTH Music-Performance Rules

Experimental research in music performance is now an established field with a number of interesting advances relating to the understanding of the musical communication between performer and listener (Palmer 1997; Gabrielsson 1999). Research in this field includes the communication of structure (Friberg and Battel 2002), emotional expression (Juslin 2001), and musicians' body gestures (Davidson and Correia 2002; Wanderley et al. 2005; Dahl and Friberg forthcoming). From this research, it is evident that the performer has an important and essential role "giving life" to a composition by introducing gestural qualities, enhancing the musical structure, and providing a personal interpretation reflecting an expressive intention. For example, studies of emotional expression have shown that the same piece can communicate completely different emotions by changing the performance.

A part of this research field has attempted to find and model the underlying performance principles used by the musicians (De Poli 2004; Widmer and Goebl 2004). One such system, the KTH rule system for music performance, is a result of a long-term research project initiated by Johan Sundberg (Sundberg et al. 1983; Friberg 1991; Sundberg 1993). The KTH rule system currently consists of about 30 rules covering different aspects of music performance. The input is a symbolic score, and the output is the expressive rendering in which the performance parameters such as "microtiming," articulation, tempo, and sound level are shaped. The rules model typical performance principles such as phrasing, articulation, intonation, micro-level timing, rhythmic patterns, and tonal tension. High-level musical descriptions, such as emotional or motional expressions, can be modeled by using a selection of rules and rule parameters (Bresin and Friberg 2000).

Computer Music Journal, 30:1, pp. 37–48, Spring 2006 © 2006 Massachusetts Institute of Technology.

In the first development stage, rules were derived using the analysis-by-synthesis method: with the musician Lars Frydén as an expert judge, the computer was taught how to play more musically by implementing a tentative model, listening to the result, and then going back and making adjustments. Later, measurements of real performances were also employed for modeling some of the rules. The project has resulted in about 60 publications, including three doctoral dissertations. (See www.speech.kth.se/music/performance.)

The program Director Musices represents the main implementation of the rule system. It was designed primarily for research purposes and is not suitable for real-time control. However, real-time control of the rule system has a number of potential applications. It could be used for interactive expressive control of the music, for example in a conductor style, as is suggested below. Whereas previous conductor systems primarily controlled overall sound level and tempo (e.g., Mathews 1989; Borchers et al. 2004), this system would provide macro-level control of the overall expressive character by changing the musical performance details. Recently, a combination of the Radio-Baton system by Mathews (1989) and the rule system was tested. The score was preprocessed in Director Musices using only the micro-level rules, leaving overall tempo and dynamics to the real-time control (Mathews et al. 2003). This improved the performance, but performance variations like articulation remained static during the performance.

The aim of the present work was to develop the basic tools needed for real-time expressive control of music performance. In this article, the overall method is described, followed by a detailed description of the rule application and the pDM player, the real-time control system including mappers and a gestural interface. The article concludes with a discussion of possible applications aimed toward conductor systems.


Figure 1. The method used for real-time rule control. In the preprocessing phase (left), all rules are applied to the score. The score and the resulting deviations are stored in the pDM file. In the performance phase (right), these deviations can be applied and controlled in real time.

Method Overview

Director Musices (DM) is the implementation containing a major part of the rules (Friberg et al. 2000a). The use of Lisp as the programming language has proven to offer a number of advantages for research purposes. In particular, rule prototyping is rather quick and can produce reliable results owing to a number of specialized functions and macros that constitute the programming environment. In using DM, one typically loads a score file, selects a set of rules (the rule palette), applies the rules, and then listens to the result. The resulting performance variations can also be visualized in various graphs. A further description of all the rules, including further references, is provided in the DM manual (Friberg 2005).

DM was never intended to have any real-time control of rule application. First, the rules were not designed to be applied in real time. Some of the rules, such as the Phrase Arch rule, require the context of the whole piece to compute the final performance. Second, the Lisp environment is not a suitable platform for real-time control. The rules have been implemented and tested over many years; the prospect of re-implementing the rules in a new programming language was not tempting. It would require a fair amount of time to do this, because each rule must be verified using several musical examples. (Partial implementations have been made by Bresin 1993; Bresin and Friberg 1997; Kroiss 2000; and Hellkvist 2004.) This would also leave us with two different versions to support. Therefore, we decided to make a compromise in which the rules are preprocessed in DM, yielding a new file format that includes all rule deviations. This pDM file is then played using a real-time player programmed in Pure Data (also called pd; Puckette 1996).

As shown in Figure 1, the method is as follows: (1) all rules are applied, one at a time, to a given score in DM; (2) for each rule and note, the rule-generated deviations in tempo, sound level, and articulation are collected and stored in a pDM file; (3) this file is loaded into pDM, where the amount of each rule deviation can be controlled in real time.

This approach also has additional advantages. It is easy to implement the real-time module, because it is essentially a slightly extended sequencer with well-known technical solutions. It also makes modest claims on processor power, which makes it suitable for running on small devices such as hand-held computers, PDAs, or mobile phones.

The first prototype following this idea was implemented by Canazza et al. (2003a) using the EyesWeb platform (Camurri et al. 2000). It used the same method as described in steps 1-3, with score preprocessing in DM collecting the deviations for each rule. The current implementation in pd is a different design with a simpler structure that solves previous issues, such as polyphonic synchronization.

Rule Application in Director Musices

The first step is to apply the rules to the given score. This is done in DM using a fixed set of rules (see Table 1). In this selection, we tried to include all major rules in general use. These are then the rules that are available for real-time control in pDM. Having a fixed set of rules rather than a user-selected set was considered better from the user's point of view. Because there are no user choices to be made during the rule application, it is possible to further automate this step in the future. There are a few rule alterations that are difficult to control using the two-step approach of invoking DM followed by pDM. This includes the additional parameters available in the Phrase Arch rule, which are set at fixed values as specified in Table 1. Also, in the current implementation of pDM, it is not possible to control the onset time of one note relative to other simultaneous notes. Therefore, ensemble synchronization rules, such as the Swing Ensemble rule, were omitted.


Table 1. List of Performance Rules in Director Musices That Are Applied to the Input Score

i   Rule                         Descript.                  Add. Param.                  ΔT    ΔSL   ΔART
1   Phrase Arch 4                Friberg 1995               :Turn 0.7 :Next 1.3 :Amp 2   (1)   (1)
2   Phrase Arch 5                Friberg 1995               :Turn 0.7 :Next 1.3 :Amp 2   (2)   (2)
3   Phrase Arch 6                Friberg 1995               :Turn 0.5 :Amp 2             (3)   (3)
4   Phrase Arch 7                Friberg 1995               :Turn 0.5 :Amp 2             (4)   (4)
5   Phrase-Ritardando 4                                                                  (5)   (5)
6   Phrase-Ritardando 5                                                                  (6)   (6)
7   Phrase-Ritardando 6                                                                  (7)   (7)
8   Final-Ritard                 Friberg and Sundberg 1999                               (8)
9   Punctuation                  Friberg et al. 1998                                     (9)         (1)
10  High-Loud                    Friberg 1991                                                  (8)
11  Melodic-Charge               Friberg 1991                                            (10)  (9)
12  Harmonic-Charge              Friberg 1991                                            (11)  (10)
13  Duration-Contrast            Friberg 1991                                            (12)  (11)
14  Inegales                     Friberg 1991                                            (13)
15  Double-Duration              Friberg 1991                                            (14)
16  Repetition-Articulation-Dro  Friberg 1991                                                        (2)
17  Score-Legato-Art             Bresin and Battel 2000                                              (3)
18  Score-Staccato-Art           Bresin and Battel 2000                                              (4)
19  Overall-Articulation                                                                             (5)

Technical descriptions of the rules are found in the references given in the Descript. column. A marked position in the columns ΔT (relative tempo deviation), ΔSL (sound level deviation), and ΔART (articulation in terms of pause duration) indicates that the given rule affects that parameter. The numbers in parentheses refer to the rule parameter position used in the pDM file format.

The procedure of the rule application in DM is as follows: (1) the performance parameters are reset to their nominal values given by the original score file; (2) one rule is applied; (3) the resulting deviations are normalized and collected in a result vector. These three steps are repeated for all the rules in Table 1. After that, the resulting pDM file is written. Note that the deviations for each rule (and note) are stored separately. The deviations are collected for the note parameters ΔT (relative tempo deviation), ΔSL (sound level deviation in dB), and ΔART (articulation in terms of offset-to-next-onset duration in msec). Also stored in the pDM file are the nominal values (given by the original score) of the corresponding parameters (duration and sound level), as well as the MIDI note number and MIDI channel.

pDM File Format

A custom text file format is used for storing the score together with the rule deviations. It was tailor-made for simple parsing in pd. Each line in this file format constitutes one command. Each command starts with a delta time indicating the delay relative to the previous command, followed by a command name and data:

<delta time (msec)> <command name> <data>+ ;

This means that each note onset time is given implicitly by the accumulated delta times. This is the same method as used in MIDI files. A command can be either a new note (NOTE), a rule deviation in ΔT (DT), ΔSL (DSL), or ΔART (DART), or a header command:

<command name> = NOTE | DT | DSL | DART | <header command>

The NOTE data format gives the nominal values of each note:

<delta time> NOTE <MIDI note number> <MIDI channel (1-128)> <sound level (dB relative to 0)> <note duration (msec)>


Figure 2. Example of a pDM file. The score is the first two notes of a monophonic melody.

All rule deviation commands apply to the subsequent note in the list. The DT, DSL, and DART data formats give the rule-generated deviations in tempo (ΔT), sound level (ΔSL), and articulation (ΔART) for each note:

<delta time> DT <ΔT_1> <ΔT_2> <ΔT_3> <ΔT_4> . . .
<delta time> DSL <ΔSL_1> <ΔSL_2> <ΔSL_3> <ΔSL_4> . . .
<delta time> DART <ΔART_9> <ΔART_16> <ΔART_17> <ΔART_18> . . .

The indices refer to rule numbers in the leftmost column in Table 1. Note that only deviations affected by the rules are included. The position in the list is given by the numbers in parentheses in Table 1.

An example of the first part of a pDM file is shown in Figure 2. As seen in the figure, in addition to the note information, a header section in the beginning specifies track information such as MIDI program number and track name. In a polyphonic score, notes from different tracks are all combined, separated only by their MIDI channel numbers. This is roughly similar to a MIDI File Type 0 architecture in which all notes appear in one track.
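For readers who want to experiment with the format outside of pd, the following minimal Python sketch parses lines of the kind described above. It assumes the whitespace-separated, semicolon-terminated layout given earlier; the sample lines in the test string are invented for illustration and are not taken from an actual pDM file.

    # Minimal parser sketch for the pDM line format described above.
    # Assumes "<delta time (msec)> <command name> <data>+ ;" on each line.
    def parse_pdm(text):
        """Return a list of (onset_msec, command, fields), with onsets
        accumulated from the delta times as in a MIDI file."""
        events, onset = [], 0.0
        for raw in text.splitlines():
            line = raw.strip().rstrip(";").strip()
            if not line:
                continue
            fields = line.split()
            delta, command, data = float(fields[0]), fields[1], fields[2:]
            onset += delta                     # onset times are implicit
            events.append((onset, command, data))
        return events

    if __name__ == "__main__":
        # Hypothetical two-note example in the spirit of Figure 2.
        sample = """
        0 NOTE 60 1 -6 500;
        0 DT 0.02 -0.01 0 0.01;
        500 NOTE 62 1 -6 500;
        """
        for onset, command, data in parse_pdm(sample):
            print(onset, command, data)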

pDM Player

Because all the rule knowledge is situated in DM and has therefore already been embedded in the input file, the architecture of pDM is rather straightforward and simple. For each of the performance parameters tempo (T), sound level (SL), and tone duration (DUR), the nominal values given in the pDM file are modified according to the following:

T = T_{nom} \left( 1 + \sum_{i=1}^{14} k_i \, \Delta T_i \right)    (1)

SL = SL_{nom} + \sum_{i=1}^{11} k_i \, \Delta SL_i    (2)

DUR = DUR_{nom} + \sum_{i=1}^{5} k_i \, \Delta ART_i    (3)

Here, T_nom, SL_nom, and DUR_nom are the nominal tempo, nominal sound level, and nominal tone duration; i is the rule number; k_i is the weighting factor for the corresponding rule; and ΔT_i, ΔSL_i, and ΔART_i are the deviations given in the pDM file. All deviations not affected by rules are assumed to be zero. The k values are the dynamically applied weights that are used for the real-time control. The three parameters T, SL, and DUR are all computed just before each new note event is played. Therefore, there is a negligible delay from control to sound.
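As an illustration of Equations 1-3, the sketch below computes the three performance parameters for one note from its nominal values, the per-rule deviations stored in the pDM file, and the current k weights. The dictionary-based representation is an assumption made for readability, not the data structure actually used in the pd patch.

    # Sketch of Equations 1-3: weighted, additive rule deviations.
    # Deviations not present for a rule are treated as zero.
    def render_note(nominal, deviations, k):
        """nominal: dict with 'tempo', 'sound_level' (dB), 'duration' (msec).
        deviations: dict mapping rule index i to a dict with optional
        'dT', 'dSL', 'dART' entries (the values stored in the pDM file).
        k: dict mapping rule index i to the current weighting factor k_i."""
        dT = sum(k.get(i, 0.0) * d.get("dT", 0.0) for i, d in deviations.items())
        dSL = sum(k.get(i, 0.0) * d.get("dSL", 0.0) for i, d in deviations.items())
        dART = sum(k.get(i, 0.0) * d.get("dART", 0.0) for i, d in deviations.items())
        return {
            "tempo": nominal["tempo"] * (1.0 + dT),       # Equation 1
            "sound_level": nominal["sound_level"] + dSL,  # Equation 2
            "duration": nominal["duration"] + dART,       # Equation 3
        }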

The effect of several simultaneous rules acting on the same note is additive. This could, in theory, lead to the same note receiving too much or contrary deviation amounts. However, some of these interaction effects between rules are already taken into account in the DM rule application. For example, the Duration Contrast rule (shortening of relatively short notes) is not applied when the Double Duration rule would be applied and lengthen relatively short notes.

Figure 3. A schematic view of the rule interaction. The user can either control the rule parameters (k values) directly or via an expressive mapper.

The tempo deviations defined by Equation 1 are global. This implies that any type of polyphonic score can be manipulated by any of the tempo-changing rules without affecting the synchronization. However, currently there are no means for moving one note onset relative to other simultaneous onsets.

A known problem using MIDI is that MIDI velocity and MIDI volume are given in arbitrary units not specified in terms of sound pressure level. Rather, the effects of these commands are synthesizer-specific and sometimes also programmable by the user. In pDM and DM, this is solved by using sound levels specified in dB inside the program and then using mapping functions to MIDI velocity and volume based on measurements of different synthesizers. The mappings are rarely linear and are typically modeled using third- or fourth-order polynomials (Bresin et al. 2002).
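The sketch below illustrates the kind of polynomial mapping described here. The coefficients are invented for the example; in DM and pDM the corresponding curves are fitted to measurements of each synthesizer (Bresin et al. 2002).

    # Illustrative dB-to-MIDI-velocity mapping using a third-order polynomial.
    # The coefficients are invented; real ones come from measuring how a given
    # synthesizer's sound level responds to velocity.
    def db_to_velocity(level_db, coeffs=(64.0, 5.0, 0.15, 0.002)):
        a0, a1, a2, a3 = coeffs
        v = a0 + a1 * level_db + a2 * level_db ** 2 + a3 * level_db ** 3
        return max(1, min(127, int(round(v))))   # clamp to the MIDI range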

pDM is implemented within pd, which is a graphical programming language intended primarily for processing MIDI and audio in real time. pDM is a pd patch (also called an abstraction) using only the standard libraries included in the pd-extended distribution, version 0.37. The implementation is built around the qlist object provided in pd, which is essentially a flexible text-file sequencer that allows any data to be sequenced in time.
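The playback principle is simple enough to reproduce outside of pd. The toy loop below is not the qlist-based patch, only a sketch of the same idea under the parse_pdm representation assumed earlier: step through the event list, wait out the time to the next onset (scaled by the current tempo factor), and hand each event to a callback that applies Equations 1-3 before sending MIDI.

    import time

    # Toy playback loop illustrating the sequencer principle (not the pd patch).
    def play(events, send, tempo_factor=lambda: 1.0):
        """events: list of (onset_msec, command, data) as returned by parse_pdm.
        send: callback invoked for each event when its time arrives.
        tempo_factor: callable returning the current tempo scaling T / T_nom."""
        previous = 0.0
        for onset, command, data in events:
            wait = (onset - previous) / 1000.0
            time.sleep(wait / max(tempo_factor(), 1e-6))
            previous = onset
            send(command, data)   # a real player applies Eqs. 1-3 to NOTE events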

Real-Time Control

The rules can be controlled either directly or through a mapper (see Figure 3). The most direct control is available in terms of the rule-weighting parameters k that access each individual rule. The bottom-left window shown in Figure 4 is used for manipulating the k values. Here, a fine-tuning of the performance can be realized. This is useful for defining a fixed rule palette suitable for a certain musical piece. In addition to the possibility of modifying each individual k value, the user can also scale the rule effect for each affected parameter (tempo, sound level, and articulation). For simplicity, this control was not included in Equations 1-3.

However, for real-time control of a performance, a mapping from a more high-level expressive description to the rule parameters is needed (cf. Hunt et al. 2004). The mapping is by necessity a reduction of the control space. Therefore, the selection of control space must be a trade-off between control (many input dimensions) and usability (in the sense of few but effective parameters). Some examples of control spaces using only two dimensions will be presented here. Using two dimensions makes it easy to construct an interface, for example on the computer screen. The first two mappings use well-known dimensions previously used for categorizing emotions or for describing expressiveness in music. Thus, they may also have an intuitive meaning that is easily understood by the user.

Figure 4. A screen shot of pDM. The top-left window is the main window used for loading and starting performances. The bottom-left window allows control of individual k values for each rule using the sliders to the left. The top-right window is the mouse interface for the activity-valence space.

Activity-Valence Control Space

The first example is the emotional expression mapper in Figure 3. It uses as input the activity-valence two-dimensional space commonly suggested for describing emotional expression in the dimensional approach of emotion science (e.g., Sloboda and Juslin 2001). Activity is related to high or low energy, and valence is related to positive or negative emotions. The quadrants of the space can be characterized as happy (high activity, positive valence), angry (high activity, negative valence), tender (low activity, positive valence), and sad (low activity, negative valence). Activity and valence often appear as the two most prominent dimensions in, for example, experiments that apply factor analysis to similarity judgments of emotions in facial expressions. This space is the basis for Russell's circumplex model of emotions (Russell 1980), in which a large number of emotions appear on a circle in the space.

Table 2. Suggested Rule Parameters in Terms of k Values for Each of the Four Corners in the Activity-Valence Space

Rule                     Happy   Tender   Sad    Angry
Phrase Arch 5            1       1.5      3      -1
Phrase Arch 6            1       1.5      3      -1
Final Ritard             0.5     0.5      0.5    0
Duration Contrast        1.5     0        -2     2.5
Punctuation              1.8     1.2      1      1.4
Repetition Articulation  2       1        0.8    1.5
Overall Articulation     2.5     0.7      0      1
Tempo Scaling            1.1     0.8      0.6    1.2
Sound Level Scaling      3       -4       -7     7

The tempo scaling parameter defines a tempo factor. The sound level scaling is given in dB SPL.

It has been suggested as particularly suited to music applications owing to its ability to capture continuous changes that occur during a musical piece (Sloboda and Juslin 2001). Previous musical applications include the measurement of emotional reactions (Schubert 2001) as well as different computational models dealing with expressivity in music and dance (Camurri et al. 2005).


Figure 5. An example of the k value interpolation in the activity-valence space for the Phrase Arch rule. Corner values are given according to Table 2.

The activity-valence space was implemented by defining a set of k values for each of the four corners in the space, characterized by the emotions above. We used a limited set of seven rules combined with overall control of tempo and sound level. The rules and their k values were chosen such that they clearly conveyed each emotional expression according to previous research (Bresin and Friberg 2000; Juslin 2001). The resulting k values are listed in Table 2. These values should only be considered as a guideline for a suitable mapping. They are oriented towards Romantic Western classical music, in which large tempo variations are often used for marking phrases. For example, if a more stable tempo is preferred, the Phrase Arch rules can be replaced with the Phrase Ritardando rules, which only change the tempo at the end of phrases.

When moving in the space, intermediate positions are computed by interpolating the k values for each rule. Each k value is computed by a linear interpolation in two steps, resulting in a slightly skewed plane in the two-dimensional space. If x denotes activity and y denotes valence (each ranging from -1 to 1), the formula used for each k value is

k(x, y) = \frac{1}{4} \Big\{ \big[ (k_h - k_t - k_a + k_s)\,x + k_h + k_t - k_a - k_s \big]\, y + (k_h - k_t + k_a - k_s)\,x + k_h + k_t + k_a + k_s \Big\},    (4)

where k_h, k_t, k_a, and k_s refer to the k values defined at the corner positions (x, y) = (1, 1), (-1, 1), (1, -1), and (-1, -1), corresponding to the happy, tender, angry, and sad corners, respectively. An example is given in Figure 5 for the Phrase Arch rule.
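A compact sketch of Equation 4 is given below, written in the algebraically equivalent product form of the same bilinear interpolation. The function and variable names are illustrative only.

    # Bilinear interpolation of a rule's k value over a two-dimensional space,
    # equivalent to Equation 4. x and y both range over [-1, 1].
    def interp_k(x, y, k_pp, k_mp, k_pm, k_mm):
        """Corner values: k_pp at (x, y) = (1, 1), k_mp at (-1, 1),
        k_pm at (1, -1), and k_mm at (-1, -1). In the activity-valence space
        these are the happy, tender, angry, and sad corners, respectively."""
        return 0.25 * ((1 + x) * (1 + y) * k_pp
                       + (1 - x) * (1 + y) * k_mp
                       + (1 + x) * (1 - y) * k_pm
                       + (1 - x) * (1 - y) * k_mm)

    # Phrase Arch 5 from Table 2 (happy 1, tender 1.5, angry -1, sad 3):
    # at the center of the space the result is the mean of the four corners.
    assert abs(interp_k(0.0, 0.0, 1, 1.5, -1, 3) - 1.125) < 1e-9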

The activity-valence space has the advantage that it is simple to understand and use, even for musically naïve users. This makes it a good candidate for applications such as games and music therapy. The disadvantage is mainly the lack of fine control of musical parameters.

Kinematics-Energy Control Space

The kinematics-energy space has been suggested by Canazza et al. (2003b) for describing musical expressivity. This space emerged in several experiments relating to expressive performances characterized by different musical descriptions such as hard, light, heavy, and soft. This involved both production experiments, in which musicians were asked to play the same piece with the different descriptions, and listening experiments, in which the listeners were asked to rate played performances according to similar descriptions. In terms of musical parameters, they found that the energy dimension was associated primarily with dynamic level and articulation (high energy, staccato articulation), while the kinematics dimension was associated with tempo.

This space was rather straightforward to implement using the same interpolation scheme as above. The energy dimension was mapped to sound level scaling as well as all of the articulation rules (such as Punctuation, Repetition Articulation, and Overall Articulation). The kinematics dimension was simply mapped to the overall tempo control (cf. Canazza et al. 2000a, 2000b). In addition, some phrasing and duration contrast were added. The phrasing rules were mapped to the kinematics dimension (slow tempo, large phrasing), and the Duration Contrast rule was mapped to the energy dimension (high energy, large duration contrast). Table 3 gives the k values for the four corners. Similar to the activity-valence space above, the results from the research are qualitative. Thus, the actual values have been chosen by the author by testing with some musical examples and should only serve as a guideline.
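Assuming the interp_k sketch given earlier is reused, only the corner table changes. A hypothetical example for the Duration Contrast row of Table 3 follows, under the additional assumption that x is mapped to energy and y to kinematics (fast tempo = +1):

    # Reusing the interp_k sketch for the kinematics-energy space; the corner
    # arguments simply denote the (+-1, +-1) positions, here filled with the
    # Duration Contrast row of Table 3. The axis orientation is an assumption.
    k_duration_contrast = interp_k(
        x=0.5, y=-0.5,     # fairly high energy, somewhat slow tempo
        k_pp=3,            # high energy, fast tempo
        k_mp=-1,           # low energy, fast tempo
        k_pm=3,            # high energy, slow tempo
        k_mm=-1)           # low energy, slow tempo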


Table 3. Suggested Rule Parameters in Terms of k Values for Each of the Four Corners in the Kinematics-Energy Space

                         Low Energy,   High Energy,   Low Energy,   High Energy,
Rule                     Fast Tempo    Fast Tempo     Slow Tempo    Slow Tempo
Phrase Arch 5            0.5           1              2             1
Phrase Arch 6            0.5           1              2             1
Final Ritard             0             0              0.5           0
Duration Contrast        -1            3              -1            3
Punctuation              1             2              1             2
Repetition Articulation  1             2              1             2
Overall Articulation     0             2              0             2
Tempo Scaling            1.3           1.3            0.7           0.7
Sound Level Scaling      -7            7              -7            7

The tempo scaling parameter defines a tempo factor. The sound level scaling is given in dB SPL.

Table 4. Suggested Rule Parameters in Terms of k Values for Each of the Four Corners in the Gesture-Energy Space

Rule                     Gentle   Expressive   Light   Strong
Phrase Arch 5            2        2.5          0.5     -0.5
Phrase Arch 6            2        2.5          0.5     -0.5
Final Ritard             0        0.5          0.5     0.5
Duration Contrast        -1       2            1       3
Punctuation              1        2            2       1.5
Repetition Articulation  1        2            2       2
Overall Articulation     0        0            3       1
Tempo Scaling            0.8      1.1          1       1.2
Sound Level Scaling      -7       3            -7      7

The tempo scaling parameter defines a tempo factor. The sound level scaling is given in dB SPL.

Other Control Spaces

Using the same interpolation scheme as in the activity-valence and the kinematics-energy mappings above, one can assign any set of rule parameters to the four corners of the two-dimensional space. As an example, a space was defined that could be characterized as a gesture-energy space, with the gesture component connected to the phrasing rules and the energy component either connected to sound level and tempo (high gesture amount) or connected to articulation (small gesture amount). The four corners were labeled gentle (high gesture, low energy), expressive (high gesture, high energy), light (low gesture, low energy), and strong (low gesture, high energy). The k values are given in Table 4. This space has no justification in terms of scientific investigations as motivated above but serves as an example of how a slightly different interface could be defined using terms commonly used in music.

Of course, all interfaces using only two parameters will be limited in one sense or another. Certainly, musical expression is a rich and complex field involving numerous potential control possibilities. A finer control space would be to divide the rule system into its major components. This would imply a six-parameter space that includes overall tempo, overall sound level, phrasing (the Phrase Arch, Phrase Ritardando, High Loud, or Final Ritard rules), articulation (the Punctuation, Repetition Articulation, Overall Articulation, and Score Legato/Staccato Articulation rules), microtiming (the Duration Contrast, Inegales, and Double Duration rules), and tonal tension (the Harmonic and Melodic Charge rules). However, this would be quite difficult to control in real time owing to its many parameters and sometimes overlapping control possibilities. For example, emphasis on one note provided by the Melodic Charge rule can also be achieved by increasing overall sound level just for that note.


Gesture Interface

The two-dimensional spaces described above are well suited to be controlled by moving a cursor on a computer screen. However, music is intimately related to human gestures in several ways (Shove and Repp 1995; Friberg and Sundberg 1999; Clarke 2001; Juslin et al. 2002). First, music performed on traditional instruments is directly influenced by the gestures used to control the sound (Askenfelt 1989; Dahl 2004). Second, musical expressivity is often also evident in the body language of the musician (Davidson and Correia 2002; Wanderley et al. 2005; Dahl and Friberg forthcoming). Third, conductors use gestures to communicate their expressive intentions to the musicians. Fourth, music has the ability to evoke perceptual sensations of movement in the listener (Gabrielsson 1973; Friberg et al. 2000b). These results strongly suggest that it would be better and more natural to use more complex human gestures rather than merely positioning a computer mouse to control musical expressivity. Another advantage would be that bodily activation (perhaps even dancing) makes it much easier to be engaged in the music process and to get into the "flow" state of mind often found among successful musicians (Csikszentmihalyi 2000; Burzik 2002).

The idea of using gestures to control sound is not new, and a number of interesting interfaces have been developed in the past (see overviews in Wanderley and Battier 2000). Many of these interfaces are oriented toward music creation, meaning triggering or modifying sounds and sequences; however, some of them are aimed (as is pDM) at conducting a composed part (e.g., Mathews 1989; Borchers et al. 2004). With the pDM tool, however, it is possible to obtain finer-level control of the musical performance. Previous conductor-style interfaces, such as the Radio-Baton by Mathews (1989), mostly allowed control of sound level and tempo. We can now keep the expertise of the musician and control overall features such as phrasing and articulation, but in the way a musician would have interpreted them according to the musical context. This is closer to a real situation regarding the roles of the conductor and the musician.

Recently, there has been considerable progress in extracting expressive gesture data from video cameras. Several useful gesture cues have been identified and modeled for describing general expressive features in human movement and dance (Camurri et al. 2004). These include the overall quantity of motion, the contraction index (limbs close to the body/extended), and overall motion direction (up/down).

Following these ideas, a gesture interface was implemented. The aim was to try the simplest possible setup, both in terms of recognition techniques and in terms of mapping. A typical consumer "Webcam" video camera was used (a Logitech 4000 Pro). The video signal was analyzed by a computer patch written in EyesWeb (Camurri et al. 2000). The first step in the video analysis was to take the difference between consecutive video frames. This has the effect that any changing parts of the video will be visible and all static parts will be black. It means that only moving objects are recognized by the system, eliminating the need for any elaborate background subtraction. Thus, the camera can be used in any room without the need for special lighting or background screens. From this difference video signal, two cues were computed: (1) overall quantity of motion (QoM), in terms of the time-averaged number of pixels visible; and (2) the vertical position, computed as the center of gravity of the visible pixels. These two cues were directly mapped to the activity-valence space above, with QoM connected to activity (high QoM, high activity) and vertical position connected to valence (high position, positive valence). Informal experiments confirm that this simple interface is indeed much more fun to use (and more fun to see) than the mouse counterpart. Whether the music itself becomes more expressive is an interesting question for future research (see also Friberg 2005).
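The cue extraction described here can be sketched in a few lines of code. The version below assumes grayscale video frames delivered as NumPy arrays; the threshold and smoothing constants are invented, and the function illustrates the frame-differencing idea rather than the actual EyesWeb patch.

    import numpy as np

    # Sketch of the two gesture cues: quantity of motion (QoM) and vertical
    # position of the moving pixels, computed from a frame difference.
    def gesture_cues(prev_frame, frame, smoothed_qom, alpha=0.1, threshold=20):
        """Return (qom, vertical). Feed qom back in as smoothed_qom on the
        next call to obtain the time-averaged quantity of motion.
        vertical is 0 at the bottom of the image and 1 at the top."""
        diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
        moving = diff > threshold                  # only moving parts remain
        qom = (1 - alpha) * smoothed_qom + alpha * moving.mean()
        rows = np.nonzero(moving)[0]
        vertical = 1.0 - rows.mean() / frame.shape[0] if rows.size else 0.5
        return qom, vertical

    # The two cues would then be scaled to [-1, 1] and fed to the
    # activity-valence mapper: QoM -> activity, vertical position -> valence.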

Summary and Discussion

A new software application, pDM, allowing real-time control of expressivity by means of the KTH performance rule system, was described. The general idea is that all rule knowledge is kept in the original rule application Director Musices, which produces the pDM file. pDM then applies and scales the different rule-generated performance variations for a given score at playback time.

The development and testing of pDM triggered many new ideas. Previously, the rules were set to a static rule palette that did not change during the piece. (For an attempt to estimate the time variation of rule parameters, see Zanon and De Poli 2003.) The static rule setup worked well for shorter pieces. However, the real-time control opened up the possibility not only of making the "ultimate" performance but also of "conducting" the piece by continuously changing the rule parameters. This highlighted the need for different mappings simplifying the interface to a few but intuitive parameters. Two such mappings were suggested with support from previous research: the activity-valence space and the energy-kinematics space.

The choice of keeping the rule knowledge in DM as well as using the real-time programming environment pd had a number of advantages. It minimized the development time and future maintenance. The pDM application is automatically multi-platform, because pd runs on Windows, Linux, and Macintosh operating systems. Also, owing to its simple structure, it can easily be re-implemented in another program such as Max. Because the rule application in DM has no user options, it is straightforward to incorporate the preprocessing of the score in the pDM environment by calling the DM application implicitly. An important future addition would be to also include rule-based displacement of single notes, thus incorporating the ensemble rules into the system.

Using tools such as pDM, a completely new way of performing music can be envisioned in which the traditional boundaries between musician, conductor, and listener are less defined. A "home conductor" system can be constructed in which the listener controls the music performance through gestures. By varying the complexity of the interface, the role of the user could vary from the listener (no control) to the conductor (overall expression and tempo control) to the musician (individual note control). From previous research and experience, the following levels of control seem promising for such a system:

Level 1 (Listener Level)

Basic emotional expressions (happy, sad, and angry) are used for control. This creates an intuitive and simple interface comprehensible without the need for any particular musical knowledge. A prototype was described in the gesture interface section above.

Level 2 (Simple Conductor Level)

Basic overall musical features are controlled using, for example, the energy-kinematics space. As this space is defined in motion terms, it creates a simple relation to the gestures.

Level 3 (Advanced Conductor Level)

Overall expressive musical features or emotional expressions are combined with the explicit control of each beat, similar to the Radio-Baton system.

Such a system would finally allow the listener to actively participate in the process of music creation. In the past, this has been reserved for musicians, typically demanding a minimum of ten years of practice (Ericsson and Lehmann 1996). An interesting extension would be to develop an expressive player for people with limited movement capabilities. Previous experience indicates that music technology can provide several means for these people to access music therapy and music performance (Hunt et al. 2004). In this case, the gesture recognition interface must be specifically designed according to the movement capabilities of the user.

Downloads

DM and pDM are available for non-commercial use under the GPL license at our Web site, www.speech.kth.se/music/performance/download. There are DM versions for Windows, Macintosh, and Linux. The Linux version is also included in the GNU/Linux audio distribution provided by the AGNULA project (www.agnula.org). pDM was developed and tested on Windows. It may run on other platforms using the pd releases for Macintosh or Linux, but this has not been tested.


Acknowledgments

pDM was developed by the author during a stay at the Music and Media Technology group at the University of York, UK, supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT). Helpful comments on the manuscript were given by Roberto Bresin and two anonymous reviewers.

References

Askenfelt, A. 1989. "Measurement of the Bowing Parameters in Violin Playing II: Bow Bridge Distance, Dynamic Range, and Limits of Bow Force." Journal of the Acoustical Society of America 86:503-516.
Borchers, J., E. Lee, and W. Samminger. 2004. "Personal Orchestra: A Real-Time Audio/Video System for Interactive Conducting." Multimedia Systems 9(5):458-465.
Bresin, R. 1993. "MELODIA: A Program for Performance Rules Testing, Teaching, and Piano Scores Performing." Proceedings of the X Italian Colloquium on Musical Informatics (CIM). Padova: Associazione di Informatica Musicale Italiana.
Bresin, R., and G. U. Battel. 2000. "Articulation Strategies in Expressive Piano Performance. Analysis of Legato, Staccato, and Repeated Notes in Performances of the Andante Movement of Mozart's Sonata in G Major (K. 545)." Journal of New Music Research 29(3):211-224.
Bresin, R., and A. Friberg. 1997. "A Multimedia Environment for Interactive Music Performance." Proceedings of the AIMI International Workshop: KANSEI—The Technology of Emotion. Genoa: Associazione di Informatica Musicale Italiana, pp. 64-67.
Bresin, R., and A. Friberg. 2000. "Emotional Coloring of Computer-Controlled Music Performances." Computer Music Journal 24(4):44-63.
Bresin, R., A. Friberg, and J. Sundberg. 2002. "Director Musices: The KTH Performance Rules System." Proceedings of SIGMUS-46, Kyoto. Japan: Information Processing Society of Japan, pp. 43-48.
Burzik, A. 2002. "Practising in Flow—The Secret of the Masters." Stringendo 24(2):18-21.
Camurri, A., B. Mazzarino, and G. Volpe. 2004. "Analysis of Expressive Gesture: The EyesWeb Expressive Gesture Processing Library." In A. Camurri and G. Volpe, eds. Gesture-Based Communication in Human-Computer Interaction, LNAI 2915. Vienna: Springer Verlag, pp. 460-467.
Camurri, A., et al. 2000. "EyesWeb: Toward Gesture and Affect Recognition in Interactive Dance and Music Systems." Computer Music Journal 24(1):57-69.
Camurri, A., et al. 2005. "The MEGA Project: Analysis and Synthesis of Multisensory Expressive Gesture in Performing Art Applications." Journal of New Music Research 34(1):5-21.
Canazza, S., et al. 2000a. "Audio Morphing Different Expressive Intentions for Multimedia Systems." IEEE Multimedia 7:79-84.
Canazza, S., et al. 2000b. "Real-Time Morphing Among Different Expressive Intentions in Audio Playback." Proceedings of the 2000 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 356-359.
Canazza, S., et al. 2003a. "Expressive Director: A System for the Real-Time Control of Music Performance Synthesis." In R. Bresin, ed. Proceedings of the Stockholm Music Acoustics Conference 2003, Volume II. Stockholm: Speech, Music and Hearing, KTH, pp. 521-524.
Canazza, S., et al. 2003b. "An Abstract Control Space for Communication of Sensory Expressive Intentions in Music Performance." Journal of New Music Research 32(3):281-294.
Clarke, E. F. 2001. "Meaning and Specification of Motion in Music." Musicae Scientiae 5(2):213-234.
Csikszentmihalyi, M. 2000. Beyond Boredom and Anxiety: Experiencing Flow in Work and Play. San Francisco, California: Jossey-Bass.
Dahl, S. 2004. "Playing the Accent: Comparing Striking Velocity and Timing in an Ostinato Rhythm Performed by Four Drummers." Acta Acustica united with Acustica 90(4):762-776.
Dahl, S., and A. Friberg. Forthcoming. "Visual Perception of Expressiveness in Musicians' Body Movements."
Davidson, J. W., and J. S. Correia. 2002. "Body Movement." In R. Parncutt and G. E. McPherson, eds. The Science and Psychology of Music Performance. London: Oxford University Press, pp. 237-250.
De Poli, G. 2004. "Methodologies for Expressiveness Modeling of and for Music Performance." Journal of New Music Research 33(3):189-202.
Ericsson, K. A., and A. C. Lehmann. 1996. "Expert and Exceptional Performance: Evidence of Maximal Adaptation to Task Constraints." Annual Review of Psychology 47:273-305.
Friberg, A. 1991. "Generative Rules for Music Performance: A Formal Description of a Rule System." Computer Music Journal 15(2):56-71.
Friberg, A. 1995. "Matching the Rule Parameters of Phrase Arch to Performances of Träumerei: A Preliminary Study." In A. Friberg and J. Sundberg, eds. Proceedings of the KTH Symposium on Grammars for Music Performance. Stockholm: KTH, pp. 37-44.
Friberg, A. 2005. Director Musices 2.6 Manual. Available online at www.speech.kth.se/music/performance/download.
Friberg, A. 2005. "Home Conducting: Control the Overall Musical Expression with Gestures." Proceedings of the 2005 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 479-482.
Friberg, A., and G. U. Battel. 2002. "Structural Communication." In R. Parncutt and G. E. McPherson, eds. The Science and Psychology of Music Performance. London: Oxford University Press, pp. 199-218.
Friberg, A., and J. Sundberg. 1999. "Does Music Performance Allude to Locomotion? A Model of Final Ritardandi Derived from Measurements of Stopping Runners." Journal of the Acoustical Society of America 105(3):1469-1484.
Friberg, A., et al. 1998. "Musical Punctuation on the Microlevel: Automatic Identification and Performance of Small Melodic Units." Journal of New Music Research 27(3):271-292.
Friberg, A., et al. 2000a. "Generating Musical Performances with Director Musices." Computer Music Journal 24(3):23-29.
Friberg, A., J. Sundberg, and L. Frydén. 2000b. "Music from Motion: Sound Level Envelopes of Tones Expressing Human Locomotion." Journal of New Music Research 29(3):199-210.
Gabrielsson, A. 1973. "Adjective Ratings and Dimension Analysis of Auditory Rhythm Patterns." Scandinavian Journal of Psychology 14:244-260.
Gabrielsson, A. 1999. "Music Performance." In D. Deutsch, ed. The Psychology of Music, 2nd ed. New York: Academic Press, pp. 501-602.
Hellkvist, A. 2004. "Implementation of Performance Rules in Igor Engraver." Master's thesis, Department of Information Technology, Uppsala University, Sweden.
Hunt, A., R. Kirk, and M. Neighbour. 2004. "Multiple Media Interfaces for Music Therapy." IEEE Multimedia 11(3):50-58.
Juslin, P. N. 2001. "Communication of Emotion in Music Performance: A Review and a Theoretical Framework." In P. N. Juslin and J. A. Sloboda, eds. Music and Emotion: Theory and Research. London: Oxford University Press, pp. 309-337.
Juslin, P. N., A. Friberg, and R. Bresin. 2002. "Toward a Computational Model of Expression in Performance: The GERM Model." Musicae Scientiae Special Issue 2001-2002:63-122.
Kroiss, W. 2000. "Parameteroptimierung für ein Modell des musikalischen Ausdrucks mittels genetischer Algorithmen." Master's thesis, Department of Medical Cybernetics and Artificial Intelligence, University of Vienna.
Mathews, M. V. 1989. "The Conductor Program and the Mechanical Baton." In M. Mathews and J. Pierce, eds. Current Directions in Computer Music Research. Cambridge, Massachusetts: MIT Press, pp. 263-282.
Mathews, M. V., et al. 2003. "A Marriage of the Director Musices Program and the Conductor Program." In R. Bresin, ed. Proceedings of the Stockholm Music Acoustics Conference 2003, Volume I. Stockholm: Speech, Music and Hearing, KTH, pp. 13-16.
Palmer, C. 1997. "Music Performance." Annual Review of Psychology 48:115-138.
Puckette, M. 1996. "Pure Data." Proceedings of the 1996 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 269-272.
Russell, J. A. 1980. "A Circumplex Model of Affect." Journal of Personality and Social Psychology 39:1161-1178.
Schubert, E. 2001. "Continuous Measurement of Self-Report Emotional Response to Music." In P. N. Juslin and J. A. Sloboda, eds. Music and Emotion: Theory and Research. London: Oxford University Press, pp. 393-414.
Shove, P., and B. H. Repp. 1995. "Musical Motion and Performance: Theoretical and Empirical Perspectives." In J. Rink, ed. The Practice of Performance: Studies in Musical Interpretation. Cambridge: Cambridge University Press, pp. 55-83.
Sloboda, J. A., and P. N. Juslin. 2001. "Psychological Perspectives on Music and Emotion." In P. N. Juslin and J. A. Sloboda, eds. Music and Emotion: Theory and Research. London: Oxford University Press, pp. 71-104.
Sundberg, J. 1993. "How Can Music Be Expressive?" Speech Communication 13:239-253.
Sundberg, J., A. Askenfelt, and L. Frydén. 1983. "Musical Performance: A Synthesis-by-Rule Approach." Computer Music Journal 7:37-43.
Wanderley, M., and M. Battier, eds. 2000. Trends in Gestural Control of Music. Paris: IRCAM.
Wanderley, M. M., et al. 2005. "The Musical Significance of Clarinetists' Ancillary Gestures: An Exploration of the Field." Journal of New Music Research 34:97-113.
Widmer, G., and W. Goebl. 2004. "Computational Models of Expressive Music Performance: The State of the Art." Journal of New Music Research 33(3):203-216.
Zanon, P., and G. De Poli. 2003. "Estimation of Time-Varying Parameters in Rule Systems for Music Performance." Journal of New Music Research 32(3):295-316.
