From Sound Synthesis to Sound Retrieval and Back

MALE SPEAKER: Today, we're very happy to have, visiting

from Barcelona, Doctor Xavier Serra, who's the Director of

the Music Technology Group at the Pompeu Fabra

University in Barcelona.

And he's also a native of Barcelona.

And he comes out every summer and teaches some computer

music and sound processing at Stanford.

Take it away.

We're happy to have you.

XAVIER SERRA: Thank you.


So, thank you, very much.

And it's a pleasure to be here.

And what I will talk about today, if it's not clear

enough from the title, is the idea of sound synthesis and

some of the techniques that, historically, in the field of

computer music, we have been using and developing.

And, basically, the idea of how, out of that, a lot of the

sound retrieval work and music information

retrieval work developed.

And then, in fact, out of that, how, in sound synthesis,

we benefit again.

And how, nowadays, a lot of the sound retrieval techniques

and music information retrieval techniques are being

used creatively in sound synthesis.

But, basically, the excuse for all this talk is to also

present one site, a site that we have been developing

for a couple of years now, which is

the Freesound website.

It's a database of sounds under Creative Commons license

that are used for research and creative applications.

And, basically, that site and that project unifies a little

bit this idea of how, nowadays, large databases

like Freesound can be very interesting, can be of a lot

of use to both the creative people, the musicians, the

people that want to play around with sounds, some

recording people, and also the researchers doing work in

sound retrieval and music information retrieval.

OK, so the idea is that, first, I will present a little

bit of the group I am in so that you see a little bit the

context why we are interested in these things and what kind

of thing we are doing.

Then I will go a little bit back, historically, to three

techniques that have been used for some years now, some very

historical and some more recent, like musique concrète,

granular synthesis, and sampling.

And then the work on sound modeling, on the spectral

processing, that, basically, has allowed us to take all

the work on sound processing and

sound synthesis a step further.

And then, basically, how, out of all that, we have all this

work on sound retrieval and what is the state-of-the-art.

What are the things that are being worked on now and that

are of interest?

And then, taking advantage of all this work, the new sound

synthesis paradigms; and, especially, I will talk about

two: mosaicing in sound synthesis and the idea of

concatenative synthesis, and how they take

advantage of all that.

And it has taken the work of sound production into new, very

interesting, areas.

And, finally, I will present the Freesound Project, which

is a much-needed platform and environment, in fact, to make

some of these things possible.

Because that's one of the major problems or major

roadblocks, both in creative work and in research: to have

access to large, well-labeled databases of sound that can

be used freely and without all the legal

restrictions that, nowadays, for music and sounds especially,

are very strong, much stronger than in many other

content areas.

And I will give you some conclusions.

OK, so in the Music Technology Group at the Pompeu Fabra

University in Barcelona we are, basically, working a lot

on audio, starting from the sound, and building applications and

technologies based on that.

And here are some of the lines that we have been working

on lately, the main one being, originally, the one that I

started with.

I did my PhD here at Stanford, and then I went back.

And my PhD was in spectral processing and spectral-based

modeling and synthesis.

So I started the group, basically, on the tradition of

how to do processing and how to use spectral analysis for

analysis and synthesis.

Then, out of that, came the work on music description.

And this is the sound retrieval, music information

retrieval area: how to automatically extract

metadata from the signal, from sound

signals and music signals.

Also, we have a group that is developing interactive music

systems. And we have recently developed an instrument that

has become quite popular, which is the reacTable.

I will just briefly mention it.

And then we have been doing more basic research in areas like

music cognition and computer models for music cognition,

and in music performance understanding and

performance modeling.

So I will just, very briefly, just mention three topics, or

three projects, that we have been doing so that you can get

an idea of the kind of things we did.

One of the first projects I got involved in, when I went back

to Barcelona, was together with Yamaha.

We have been collaborating with Yamaha for

quite a while now.

In fact, I worked for Yamaha, also, for

a while doing research.

And we developed this Karaoke-based impersonation

system in which, in real-time, you sing and you can modify

your own voice to a target voice or to the voice of some

professional singer.

It's based on spectral processing and is, basically,

the idea of morphing in the sound domain.

And that was quite successful.

Then, as a continuation of that, we developed, basically, the

first singing voice synthesizer in existence,

which is now commercially called Vocaloid.

Also, it was a project in collaboration with Yamaha that

we developed.

We called it the Daisy Project, in honor of the first

singing synthesizer, the first song that was produced by a

computer at Bell Labs in the '50s.

So this is just a synthesizer, which is the first one that

can handle a singing voice.

Until now, all the synthesizers have handled,

basically, instrumental sounds.

So you type in the melody, the lyrics, and then you can hear.

And later, I will just play one example.

In the area of content processing, going towards the

sound retrieval aspect and coming from all these audio

analyses, we have been working a lot on the idea of

extracting features and descriptors, higher and higher up

in the semantic layer.

So we started from these, extracting the low-level

descriptors of a signal: the sinusoids, the harmonics, the

residual, the formants, this sort of signal-based

description.
And we have been going higher up.

And now we are working on things like identifying the

harmony, the rhythm, identifying the structure of a

piece of music, identifying the instruments.

So, for example, this is a video of some work that

identifies the chords.

So you see the signal on top.

And underneath the red is the proposed chord.


XAVIER SERRA: It tells you the probability of it being one chord

with respect to the others.

And so this is, again, spectral analysis-based

techniques that match the chords.
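
The kind of chord matching described here can be sketched with chroma templates. This is a generic illustration, not the group's actual system; the note names, templates, and scoring are a minimal version of the idea.

```python
import math

# Hypothetical sketch of template-based chord matching: compare an
# observed 12-bin chroma vector against binary triad templates.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Build binary 12-bin chroma templates for all major and minor triads."""
    templates = {}
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            t = [0.0] * 12
            for iv in intervals:
                t[(root + iv) % 12] = 1.0
            templates[f"{NOTE_NAMES[root]}:{quality}"] = t
    return templates

def match_chord(chroma):
    """Return (chord, score) pairs, best first: cosine similarity of the
    observed chroma vector against each triad template."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    scores = [(name, cosine(chroma, t)) for name, t in chord_templates().items()]
    return sorted(scores, key=lambda s: -s[1])

# Example: most energy on C, E, and G (a C-major triad)
chroma = [1.0, 0, 0, 0, 0.9, 0, 0, 0.8, 0, 0, 0, 0]
best, score = match_chord(chroma)[0]
print(best)  # C:maj is the closest template
```

A real system would compute the chroma vector from the spectrum frame by frame and smooth the chord decisions over time, which is roughly what the probability display in the video reflects.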

So, having these types of signal-level techniques, you

can, of course, start developing some interesting

applications.

And quite a number of companies and research labs are

doing this type of thing.

It has been evolving a lot lately.

So we are also developing music recommendation systems,

sound search, sound retrieval systems, so that, based on

these descriptors, you can look for songs that have the

same harmonic structure, or songs that may have the same

rhythm, so you can then recommend, or organize, or

navigate, through a collection of music with

these types of tools.
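
A minimal sketch of this descriptor-based recommendation idea, assuming each song is already summarized as a small feature vector; the descriptor names and values here are invented for illustration, and a real system would normalize each feature before measuring distance.

```python
import math

# Toy descriptor-based recommendation: each song is a small dict of
# features, and we recommend the nearest neighbors in descriptor space.
# Feature names and values are illustrative only.

library = {
    "song_a": {"tempo": 120.0, "brightness": 0.8, "harmonicity": 0.6},
    "song_b": {"tempo": 122.0, "brightness": 0.75, "harmonicity": 0.65},
    "song_c": {"tempo": 70.0, "brightness": 0.3, "harmonicity": 0.9},
}

def distance(a, b):
    """Euclidean distance over the shared descriptor keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def recommend(query, k=2):
    """Return the k songs whose descriptors are closest to the query's."""
    others = [(name, distance(library[query], feats))
              for name, feats in library.items() if name != query]
    return [name for name, _ in sorted(others, key=lambda x: x[1])[:k]]

print(recommend("song_a", k=1))  # song_b is nearest in descriptor space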

OK, so that's the retrieval aspect.

And then, just finally to put another example, we have been

developing some interfaces, tangible interfaces.

And lately, the one that we are concentrating on is called

the reacTable.

And, in the last few months, it has been in the

media quite a bit because Bjork is using it on her tour that, in fact,

started here a couple of months ago, the Volta Tour.

And there is a lot of interest among musicians in using these

types of tools.

So, it is a tool that recognizes objects, recognizes

fingers, and, out of that, you can hear the sounds.

And it gives you some visual feedback.

And that's one of the things that people are attracted to

because you get some sort of visual representation of the

sound that is being generated.

So it's a table, of regular table height.

And, of course, apart from music applications, it's not

that different from Microsoft's Surface in terms

of the technology.

This is much cheaper.

But we are interested in the musical and instrument

aspects.

So it has all kinds of objects with which you can do all kinds of

musical things.

Anyway, so that gives you an idea of the kinds of things we

do and where we come from.

So now, if we concentrate on the topic of the talk, we

start with sound synthesis.

We start from the '40s and '50s.

And one of the first works that was done in this electronic

medium, first analog, and then evolving into the digital

world, was musique concrète.

And the inventor was Pierre Schaeffer.

And the idea was to start from actual recordings.

At first it was with LPs, and then it was with analog tape.

And by cutting, splicing, and doing some processing, you

were able to make a piece of music from existing material.

And the term musique concrète comes from that.

So, for example, this is a piece from '48--

I think this was done with LPs and with very, very crude

techniques--

putting together a piece of music out of train sounds.

The sound is not loud enough.



XAVIER SERRA: And some of the idea behind this type of work

is to start from existing material, take it out of

context, and put it into a creative context that changes

the meaning and brings new meaning to this material.

So this tradition of musique concrète has been, basically,

continuing for years.

It started in Paris, and it has evolved for many years.


When the computers came into use to take musique

concrète into the digital world, of course, the major

advantage was the accuracy, and being able to cut, splice,

and algorithmically combine these

pieces of sound.

So granular synthesis is this idea of

starting, again, from grains.

In this case, we could go down to smaller grains and put

together new material out of existing material.

And I would say that this is one of the major idiomatic

sound-making techniques that differ from the tradition of

traditional instruments.

That is, it's impossible to think of traditional

interfaces and traditional musical instruments controlling

this type of thinking.
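
A toy version of granular synthesis, under the assumption that we simply window short grains of a source signal and overlap-add them in shuffled order; all parameter values here are arbitrary illustrations.

```python
import math
import random

# Toy granular synthesis: chop a source signal into short grains,
# window each grain, and overlap-add them in a shuffled order.

def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def granulate(source, grain_size=256, hop=128, seed=0):
    """Rebuild a signal from randomly reordered grains of the source."""
    rng = random.Random(seed)
    window = hann(grain_size)
    # cut the source into overlapping grains
    starts = range(0, len(source) - grain_size, hop)
    grains = [source[s:s + grain_size] for s in starts]
    rng.shuffle(grains)
    # overlap-add the shuffled grains at the same hop
    out = [0.0] * (hop * len(grains) + grain_size)
    for i, g in enumerate(grains):
        for j, sample in enumerate(g):
            out[i * hop + j] += sample * window[j]
    return out

# a 440 Hz test tone at 8 kHz, one second long
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
cloud = granulate(tone)
print(len(cloud))
```

Replacing the shuffle with any algorithmic ordering, or drawing grains from several sources, gives the kind of grain clouds heard in the pieces mentioned here.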

So, for example, this is a piece of music from an

American composer.


XAVIER SERRA: So these are just tiny grains

algorithmically put together.

And, basically, there is not much transformation.

The grains are left as they are.

There is sound filtering.

Maybe there is a little bit of pitch shifting.

But this has been very much in the tradition of computer

music and electroacoustic music; it's very much at the

core of a lot of the music that happened.

In the commercial world, this idea became sampling.

And so the first sampler was the Fairlight, from 1980.

And that was the first instrument, basically,

targeted at the commercial world: very

expensive at that time.

And it, of course, also targeted more the art-

music-type world.

And the idea is that, with the right control, with the right

software, by adding sequencing, by adding sound

transformations, like sampling rate conversion and some

filtering, you can record samples from an instrument,

record samples from anything, and then make music out of it

with a keyboard.
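
The sampler trick just described, playing one recording back at different rates to map it across a keyboard, can be sketched with linear-interpolation resampling; this is an illustrative reconstruction, not the Fairlight's actual algorithm.

```python
import math

# Early-sampler pitch mapping: play a recording back at a different
# rate (no timescale correction), so pitch and duration both change.

def resample(signal, ratio):
    """Play `signal` back `ratio` times faster via linear interpolation."""
    out_len = int(len(signal) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append(signal[j] * (1 - frac) + nxt * frac)
    return out

def play_key(sample, semitones):
    """Shift by a number of semitones (equal temperament) relative to
    the recorded pitch."""
    return resample(sample, 2.0 ** (semitones / 12.0))

sr = 8000
recording = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
up_an_octave = play_key(recording, 12)  # twice the rate, half the length
print(len(recording), len(up_an_octave))
```

The shortening audible when a sample is played far above its recorded key, the "chipmunk" effect, falls directly out of this rate change.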

So, for example, here are some of the original samples from

the Fairlight that have become quite used later on.


XAVIER SERRA: So this started from a single voice sample.

Of course, here, we're talking 8 bits.

We are talking a sample rate that

maybe was 11K, something like that.

A rate such that, quality-wise, there is

quantization all over.

But some of these sounds have become quite used later on.

And, for example, of course, you can just

record a dog sound.


XAVIER SERRA: And then, just map it onto the keyboard.

So anyway, so that's the commercial

version of all that.

OK, so that's that for the synthesis

based on existing material.

The idea of spectral processing, and sound

processing in general, focuses on: how can we process these

sounds beyond just cutting and splicing?

How can we change the sound in a more flexible way?

So the spectral processing idea is that you start from an

original sound, you do spectral analysis.

Then you extract some features, the features that

you want to process, or identify, or characterize

like, in this case, the pitch or the frequency of the sound.

Then you apply the transformations.

And then you do the inverse Fourier transform, or

whatever transform you're using, to go back to the

sound domain.

That gives you much more power over the control and the

transformation of the sound.
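
The analysis/transform/synthesis loop above in miniature: DFT a frame, edit it in the frequency domain, inverse-DFT back. A real spectral-processing system uses the STFT with overlapping windows and feature models; a single-frame DFT keeps the sketch short and generic.

```python
import cmath
import math

# Spectral processing round trip: analysis, a frequency-domain edit,
# then resynthesis back to a waveform.

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

n = 64
x = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # bin-4 sinusoid

X = dft(x)       # 1. spectral analysis
X[4] *= 0.5      # 2. transform a feature (halve one partial)
X[n - 4] *= 0.5  #    ...and its conjugate bin, so the result stays real
y = idft(X)      # 3. inverse transform back to a sound

print(round(max(abs(v) for v in y), 3))  # amplitude dropped from 1.0 to ~0.5
```

Swapping the one-line edit for pitch shifting, harmonic/residual separation, or morphing is what turns this skeleton into the transformations played in the flute example.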

We are not talking about time domain.

All what I have been talking until now is time domain.

Frequency domain, with these feature analyses and the

techniques that, in the past 15, 20 years, have been

developed has really taken sound to a much more powerful,

flexible material that can be used.

So, for example, we start from one sound like this.


XAVIER SERRA: This is just a tiny blow of a flute.


XAVIER SERRA: And, in fact, this was a student in my class

at Stanford, like, 10 or 15 years ago.

The first day of class, I gave them this sound.

And I told them OK, make some transformations out of that

without even understanding the signal processing behind it.

So, out of that, he came up with this.


XAVIER SERRA: So now, this sound, we can very easily

change completely and drastically in many

different ways.

So the potential to combine this with what I was just

mentioning is incredible.

So this is just a representation of some of the

data that is behind the harmonics, the residual and

some of the transformations that you can do from this

spectral data.

So, out of that then, a lot of the music information

retrieval work came about.

Music information retrieval, of course, didn't aim to make

sounds, didn't aim at the synthesis aspect.

It just stayed with the question: what can we do with sound

analysis in order to describe it?

So this is a very complicated diagram that I'm

not going to go through, but, at least, I wanted to show you

some of these things.

Of course, in music information retrieval, we are

not just interested in sound, we are interested in scores,

in editorial data, in any kind of data that

relates to music.

And, so, here at the bottom, you have the information that

is attached to the physical objects that carry music, like

the CD, or the record, or whatever.

Then, out of these, you have the digital information that

you can extract: the text, the lyrics, the digital sound data.

And that's what you have. And that's what

you normally have access to in libraries

and in digital archives.

And, out of that, you try to make the best of it to

search, to do recommendations.

So, in the music information retrieval, the idea has been

to go up the semantic ladder, obtain symbolic semantic

information, automatically, from the

information you have at hand.

So, by doing signal processing, you are able to get

to these low-level features, like the pitch, duration,

timbre, intensity, whatever.

You can go higher up and go to these more musical contexts,

like the one I just mentioned about the harmony, so you can

identify the chords, segment a piece of music,

identify the rhythm.

And then, hopefully--

and this is, basically, where the state of the art

is right now--we reach this level.
But, ideally, you want to go higher and identify more

musically meaningful descriptors and what type of

music it is, what key, to identify the melody within a

very polyphonic piece of music, et cetera, et cetera.

And even, ideally, you want to go to more cognitive aspects

and bridge this semantic gap in some way or another.

So, in fact, the semantic gap, of course, is one of the hard

challenges in audio.

So this represents the music domain, in which, apart from

audio, you have text.

You have image, all the information that surrounds

music, on these different levels of abstraction: the

signal features that can be extracted automatically;

content object features, or mid-level features, that are

starting to be extracted automatically; and, of course,

this more human knowledge, where, in music, I believe we

are a little bit further ahead than in other content areas,

in that we are starting to be able to bridge, a little bit,

this semantic gap.

Until now, most of the techniques that we've been

using are signal processing.

Of course, that was the original methodological approach.

Then, with statistical modeling, machine learning,

music theory models, and web mining, we have been able to go

further in the last few years; many projects are pushing that.

And the next wave is to go higher up, incorporating

computational neuroscience models to try to go a bit

higher in music cognition, computational musicology,

text understanding, ontologies and rules,

multi-modal processing, combining image, text, and

music in a more unified way.

So that's, sort of, what current research is aiming at

and is trying to push.

And I believe that, in some very narrow areas, we are

really crossing the semantic gap, in music especially.

And so we are getting to be able to have these

applications that are starting to be very interesting.

So anyway, so that's the sound retrieval work and what is

being worked on.

So now, combining that plus--

oh no.

I just left this slide out.

This is about sound retrieval.

For sound effects and sound, music information

retrieval hasn't focused so much on sounds in general.

It has focused, mainly, on music.

Sound is a completely specific world, very complex, in which,

again, you have very different areas of description and

representation of the sound.

You have the perceptual world.

You have the source: the real world origin of the sounds.

You have all these recording and this post-production and

format tools.

And, of course, ideally, you would like, out of the data

that you have, to be able to generate as much of all this

data as possible to do some search, retrieval, processing,

generation, music making, et cetera.

So, given that, in synthesis, now we are at synthesis II.

So now that we have all these techniques in sound synthesis,

in sound processing, and techniques in information

retrieval, what can we do?

And, of course, this is before all this work, but I use this

reference because I think it triggers quite a few of the

issues that are now--

not just technologically, but also socially and legally--

coming up because of these new technologies, because of

the potential that we currently have.

You might know John Oswald.

He has been, basically, making music out of existing music.

But not just samples.

Not just snippets, pieces of sound, like in sampling,

like in musique concrète, like in granular synthesis, but

existing music material: cutting it up, splicing it, and

making new pieces of music.

As you can imagine, one of the main roadblocks

was the legal aspect.

So he hasn't been able to release any

of his music, legally.

And it's on the web.

And you can find it.

And it's nice, I mean, if you like this type of music.

So this is based on Michael Jackson.


XAVIER SERRA: So, of course, this is very much at the core

of all the DJ'ing and DJ music-making.

And, nowadays, that's a very big part of our music

tradition and music-making in the world: the idea of

starting from existing material, manipulating it, and

making a new one.

So that, of course, has socially triggered a lot

of debate and interesting things.

And, in the history of music, that had

never happened before.

People were able to borrow without any problem.

And there has been a long tradition of music built on

existing music.

Now, with all these record companies behind, it's very

difficult to do that.

But, again, creatively, that's what people are doing now.

And that's why, in fact, I think, that's having some

strong consequences on the creative world, on not being

able to promote certain kinds of creative thinking and

creative processes.

So now, in this field of computer music and sound, the

digital music world, one of the inspiring ideas was

image mosaicing, which has been done for some years now:

the idea of starting from an existing picture and filling

it with tiny pictures that match certain

parts of the image.

So you recreate the image by mosaicing, by a

puzzle-type of thing.

What is the equivalent in music?

So there has been some work on that, and the idea is that,

while there are several approaches, you can do exactly

what we just saw.

So you have a piece of music, either a score or an audio file,

and that would be set up on one side.

So you start from a symbolic score, or from an existing

piece of music, and you analyze it.

And that's your target.

And then you have pieces of sounds, or other pieces of

music, samples, whatever.

That's the source sound.

That's your starting database, and it can be as

large as you want.

And then, basically, what you do is matching.

You do a unit selection of this database based on the

target information, and then, out of

that, you put it together.

And, of course, as part of this, there are all these

transformations being used, more sophisticated than

what we previously had for granular

synthesis or sampling.
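
The matching step can be sketched as a bare-bones unit selection: each target segment carries a few features (the pitch and loudness fields here are purely illustrative), and the closest database unit is chosen for each one.

```python
import math

# Toy unit selection for mosaicing: for each analyzed target segment,
# pick the database unit with the lowest feature-distance cost.

database = [
    {"id": "u1", "pitch": 220.0, "loud": 0.4},
    {"id": "u2", "pitch": 440.0, "loud": 0.8},
    {"id": "u3", "pitch": 330.0, "loud": 0.5},
]

target = [
    {"pitch": 225.0, "loud": 0.45},
    {"pitch": 445.0, "loud": 0.7},
    {"pitch": 320.0, "loud": 0.5},
]

def cost(unit, segment):
    """Distance between a database unit and a target segment. A real
    system also adds a concatenation cost between adjacent chosen units."""
    return math.hypot(unit["pitch"] - segment["pitch"],
                      100.0 * (unit["loud"] - segment["loud"]))

def select_units(target, database):
    """Greedy unit selection: best-matching unit for each target segment."""
    return [min(database, key=lambda u: cost(u, seg))["id"] for seg in target]

print(select_units(target, database))  # one database unit per target segment
```

The greedy choice here ignores how units join; systems like the one in Jehan's thesis also score transitions so that consecutive units concatenate smoothly.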

So here, we are now able to do more interesting things.

In this example, it is not so much about the transformations.

This is from a thesis at the MIT Media Lab

by Tristan Jehan.

And it's a piece of music that started from an audio score,

and then, slowly, as you will hear, it keeps replacing

elements with others, automatically--



XAVIER SERRA: --and maintaining the tempo, the

beat, and some of the structure of this.

OK, so that's the idea of mosaicing.

Another type of concept in this synthesis II is what we can

call concatenative synthesis.

In fact, also in speech, that's a common technique.

And the term is also used like that: concatenative synthesis.

And the idea is, again, start from large databases of spoken

words, large databases of music material or of recorded

sound from existing instruments,

and then put it together.

But now, with these tools that we have of sound retrieval, of

spectral processing, we can do a much better job:

concatenating and transforming to make all these

transitions smoother.


So, for example, this is an example that I will show later

of the singing voice that we have developed

that went into Vocaloid.

And the idea is that you start from two samples of voice--

and, of course, the voice is very clear--

that, just by splicing the two, it won't sound any good.

Or, even cross-fading, there is no way that we can get a

singing voice, a smooth transition, by just

concatenating like that.

I mean, speech is easier, still not ideal.

But, in speech, there's not so much pitch variation or timbre

variation between the sounds, so you can fake it by

concatenating in a very crude way.

In singing voice, that's clearly not the case, so you

need to worry about timbre evolution.

I'm not going to explain this.

But, basically, you do all this spectral analysis on one

side, spectral analysis on the other side, feature selection,

feature analysis.

And then, you try to interpolate whatever

parameters are needed to make a smooth transition.
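
The interpolation step can be sketched as a blend of analysis parameters across the transition; the frame fields here are illustrative stand-ins for the real spectral parameters.

```python
# Transition smoothing for concatenative synthesis: rather than
# butt-splicing two recorded units, interpolate their analysis
# parameters across an overlap region and resynthesize from the blend.

def blend(a, b, alpha):
    """Linear interpolation of two parameter frames, alpha in [0, 1]."""
    return {k: (1 - alpha) * a[k] + alpha * b[k] for k in a}

def smooth_transition(unit_a, unit_b, n_frames=5):
    """Build the parameter frames that bridge the end of unit_a into
    the start of unit_b."""
    end_a, start_b = unit_a[-1], unit_b[0]
    return [blend(end_a, start_b, i / (n_frames - 1)) for i in range(n_frames)]

# last frame of a note sung at 220 Hz, first frame of one at 247 Hz
unit_a = [{"pitch": 220.0, "amp": 0.9}]
unit_b = [{"pitch": 247.0, "amp": 0.7}]

for frame in smooth_transition(unit_a, unit_b):
    print(round(frame["pitch"], 1), round(frame["amp"], 2))
```

In a real singing synthesizer the blended frames would include spectral envelopes and phase, which is why the phase matching mentioned next matters.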

And also, of course, in singing voice, it's impossible

to record every possible nuance, so you have to

transform all this recorded material to make it

appropriate for whatever context you are.

And even something that may not sound too important,

like the phase of the signal, has to be matched.

And that's not an obvious thing to worry about when you

splice, so that you make a smooth transition.

Well, let me just first play you--

so, this is the singing voice, the Vocaloid.


XAVIER SERRA: Of course, it's not perfect.

But, definitely, it's much better than anything that

existed before, and it's smooth.

I mean, it sort of holds together.

Sometimes you hear some timbre that does not sound that

natural, but it works.

So, what is the problem?

What is the issue?

Well, with this diagram, it explains, a

little bit, the problem.

The big circle, the A circle, is, basically, all sounds that

a given instrument, the voice, a person, can produce.

Of course, in a musical context, you don't use all the

sound that you can produce, you use a sub-set.

So that's circle B. You sort out the sounds that you

are interested in reproducing in a specific

context to make music.

But your recordings, the kind of things you can sample, it's

impossible to sample everything.

And, on top of that, we are talking about time-varying

material, things that evolve, so what you're going to record

is these wiggles, these

trajectories, within this space--

whether they are melodies, phrases,

spoken phrases, whatever--

and that's what you have. That's what you're sampling.

That's your database.

That's the kind of things that you have access to.

And, of course, the idea of making music or creating some

new sounds out of that means to be able to draw any

trajectory within this space.

So, in a synthesizer context like in the Vocaloid, you

start from the score, a performance score.

Then you have some model of performance that you can add,

some vibrato, some off-setting of some of the timing, some

small transformations that the performer does to the existing

notated score.

And, of course, you look in your database, where there

is some information about

performance aspects.

And then, out of that, you generate the trajectories that

you're interested in obtaining, the trajectories

within that space, like this one, that you want to be able

to generate, OK?

So you are proposing a trajectory

within that sonic space.

And then, the sound rendering.

And, of course, in here, what you need is to search this

database for the best samples.

You need to do spectral processing to transform them.

You need to select which ones and how to

concatenate the sample.

So then you have the sound rendering and, therefore, you

have the sound output, OK?

So the sound you heard was basically based on this idea.

This could be called performance sampling or some

kind of thing like that.

Before, I talked about granular synthesis as more of

the art tradition, the contemporary

music type of tradition.

And then, the sampling became more the commercial thing.

Now, mosaicing is more the artistic type of tradition:

very free, without any boundaries.

And this type of concatenative synthesis is more for the

commercial type of application, so that you can

make music with this existing material.

OK, so that's, basically, in terms of

synthesis, retrieval, synthesis.

And then, a very important, needed tool for making all

that possible is to have access to databases.

And databases, especially for normal users, that are

free and that they can access; so we started this

Freesound project.

This Freesound project was started as part of the

International Computer Music Conference that we organized in

Barcelona a couple of years ago, and so that was the

excuse to promote this idea.

And, originally, the idea was to just put it online and ask

people to contribute with their sounds:

sounds that they made.

Sounds that they recorded, so that they own it, so that they

could be put under Creative Commons, and so that other

people could share.

And, in these two years, it has evolved very much, and it

has gone beyond our expectations.

And, basically, it has become a social network of some kind.

It is a network of sound freaks much wider

than we ever expected.

We thought that we would be targeting the electroacoustic

music and computer music types of crazy guys, and the MIR,

music information retrieval, guys, who would need

some of that, and it definitely

has gone beyond that.

And there are many people downloading ring tones, that's

obvious, but also people from all kinds of applications, sound

recording people.

It has gone, also, to Hollywood, and there have been

some Hollywood movies using these sounds.

And so there is a lot of good feedback and a lot of usage in

different ways.

And it has gone beyond our small resources, so we are

trying to rethink, a little bit, the whole thing and make

it, really, a much more powerful tool.

So, basically, some of the numbers: it was created in

2005, in April.

Right now, there are 35,000 sound files

of different kinds.

We have some sort of filtering, manual filtering,

so that the sounds that go in are not from some

proprietary source, or some record, or something like that.

So we don't aim at music, we aim at sound material.

Some are small music excerpts that can be used to make

something out of that, or it can be used

for research purposes.

So, out of these 35,000, there's all kinds of things.

Basically, we could say there are two big communities.

One, that we did not expect to grow that much, is the

sound environment and soundscape-type people: people

that go around when they travel with a mic and with a

small, handheld recording system.

And they keep recording

everything, and that's amazing.

There are all these people and, because of Freesound, some

people have gotten into that.

So, when they travel, apart from their camera, they take

their sound recorder.

They record things from all over and then they put them on

Freesound.

The interesting thing is that there are 370-something

thousand registered users.

And the interesting thing is that the

growth of users is enormous.

It's really growing exponentially.

But the growth of sounds is linear.

In the last two years, it has been steadily growing in a

straight line.

But the access, and the users, and the downloads are

growing exponentially.

In some countries, and Spain is one of them, people, very

much, use the internet to get stuff, but they're not so much

into the idea of participating, and blogging,

and doing collaborative work.

So, we have to change that somehow, and we are interested

in exploring some new things so that people--

there are so many people out there with sounds, and

especially musicians, that could contribute them, but

they are not contributing as much as we would have liked.

So, there are 15,000 visitors per day, but most of them are

just simply getting sounds, using them, and having fun

navigating through the database, which is another

interesting thing.

People are not just there accessing the sounds and using

them somewhere; they are also just having fun navigating, playing

around with them, even just clicking

different play buttons.

And they have it on all the time so that they can have a

constantly changing sonic environment.

And so there is a stable, sort of, community of people.

They chat.

They do things.

And there are a lot of people requesting a sound, and

people recording the sound and putting it back,

et cetera, et cetera.

So one of the main features is, again, the Creative

Commons license.

We started with Creative Commons Sampling Plus, and now

we are rethinking whether to open it a little bit more and

explore other licenses so that people are freer to choose

what license they want.

There are some social network features, like the forums, and

the chat, and the profiles of people, et cetera, but it

could be much more.

And people are really asking for much more, and that can

help build a better community.

The tagging, the folksonomy tagging, has worked very

well; people put tags.

And you can visualize the tags and the weights in quite

different ways.

Since we are working on the sound retrieval stuff, and

search by similarity based on timbre and rhythm,

and things like this, there is a functionality which had

been disabled for a while, and now it's back in.

And we are promoting it again: once you have these large

databases of tags and, especially, sounds, the tags

sometimes don't mean that much, especially when you are

in sound post-production activities.

Some sounds whose origin is completely different

might be appropriate for a given application.

So you can search and organize things by similarity of the

audio signal.

And that gives you a nice organization of the sounds.
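
The kind of similarity search described here can be sketched as a nearest-neighbour lookup over descriptor vectors. This is only a minimal illustration, not Freesound's actual implementation: the filenames and feature values below are invented, and real systems use much richer timbre and rhythm descriptors.

```python
import math

# Hypothetical descriptor vectors: each sound reduced to a few
# timbre/rhythm features (e.g., brightness, noisiness, rhythmicity).
# All values are made up for illustration.
database = {
    "door_slam.wav":  [0.82, 0.40, 0.10],
    "thunder.wav":    [0.78, 0.45, 0.05],
    "bird_chirp.wav": [0.15, 0.90, 0.70],
    "drum_loop.wav":  [0.50, 0.60, 0.95],
}

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_sounds(query, db, k=2):
    """Return the k sounds whose descriptors are closest to the query."""
    ranked = sorted(db, key=lambda name: distance(query, db[name]))
    return ranked[:k]

# A query that "sounds like" a low rumble: the nearest matches can have
# completely different origins (a door slam, thunder), which is exactly
# what makes this useful in sound post-production.
print(similar_sounds([0.80, 0.42, 0.08], database))
```
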

Geotags, that's a nice feature.

Because of all this soundscape work, with people

recording things all over the world, it has been

used quite a bit.

And there are quite a few thousand geotagged sounds.

And you can navigate through them with the geomap, or

with Google Earth.
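
A minimal sketch of what "find geotagged sounds near a point" involves, assuming each sound carries a (latitude, longitude) pair. The filenames and coordinates are invented, and the haversine great-circle formula stands in for whatever the geomap actually uses.

```python
import math

# Hypothetical geotagged sounds: (latitude, longitude) in degrees.
geotagged = {
    "barcelona_market.wav": (41.3851, 2.1734),
    "stanford_quad.wav":    (37.4275, -122.1697),
    "tokyo_crossing.wav":   (35.6595, 139.7005),
}

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def nearest(point, db):
    """Name of the geotagged sound closest to the given (lat, lon)."""
    return min(db, key=lambda name: haversine_km(point, db[name]))

# Querying from near Mountain View should surface the Stanford recording:
print(nearest((37.4220, -122.0841), geotagged))
```
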

And, of course, our interest is to really support these

communities, and especially this creative community and

the music research community.

And that has been going quite well.

In terms of research, there are many places that use these

sounds for setting up test benches and comparing, and we

want to promote that.

We need to add a few more APIs and things like this so that

people can take more advantage of searching certain sounds.

And in the creative community, there have been so many people

coming out with records, and DJs doing things with these

sounds, and doing installations,

all kinds of things.

So that's going quite well.

So what do we want out of Freesound

related to all this?

Well, basically, I just mentioned some of that.

Well, this is just sort of a little joke.

Last year, they just opened a super-computing center in a

church in Barcelona.

It's too bad for the church because it was very nice.

But anyway, and then they offered us to use it.

But we still don't have enough sounds and enough things to be

able to take advantage of that.

So, we would like to be able to take advantage of it.

We have our small cluster that is sufficient for this

database, so, ideally, we would like to take

advantage of that.

In order to take advantage of that, we need to grow the

database quite a bit more.

So we definitely need to increase this.

We have to change this linear evolution into an exponential

evolution, and we have some initiatives to promote that.

And the software, of course-- we were thinking of a

small-scale type of thing.

The software is holding up pretty well.

And there are some great students improving it, but we

need to make sure that it can scale up.

We have also been implementing mirroring and different

strategies, so that we now have two servers to

handle different things.

And so that's going all right, but we have to rethink some of

the core software that is behind that.

And, definitely, we need to add functionality to help

people access it in different ways, APIs that people

can then use.

And so, for example, some of the mosaicing work has been

using Freesound very much for the more

concatenative synthesis.

This is sophisticated.

The Vocaloid-type thing requires very specialized

recording, so that's more difficult.

But, for this community that is into DJing, mosaicing,

creative work, that has been going quite well.

So people can download according to some criteria.

There are sample tags.

There is a whole bunch of tools to select a set of

sounds, put them into your instrument, into your setup,

and then make music out of that.
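
As a sketch of downloading according to some criteria, the snippet below builds a tag-filtered search URL. The endpoint and parameter names follow the present-day public Freesound API (apiv2) and are an assumption here, not something specified in the talk; real requests would also need an authentication token.

```python
from urllib.parse import urlencode

# Assumed Freesound-style text-search endpoint (apiv2 naming).
BASE = "https://freesound.org/apiv2/search/text/"

def build_query(text, tags=(), max_duration=None):
    """Build a search URL filtering by tags and, optionally, duration
    in seconds, using the Solr-style filter syntax the API documents."""
    filters = [f"tag:{t}" for t in tags]
    if max_duration is not None:
        filters.append(f"duration:[0 TO {max_duration}]")
    params = {"query": text}
    if filters:
        params["filter"] = " ".join(filters)
    return BASE + "?" + urlencode(params)

# Select short drum loops to load into an instrument or setup:
url = build_query("kick", tags=["drum", "loop"], max_duration=5)
print(url)
```
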

So anyway, that's, basically,

what I wanted to tell you.

And then, as a conclusion, well, I just

thought of some ideas.

Well, of course, Freesound, I believe, is becoming a great

tool for the research community and for the

creative one, and the mix of the two, which I believe is a

very fruitful combination.

And that very much relates to the second point: we

tend to see sound retrieval and synthesis, the creative

community together with the research

community, as quite separate.

In our field, in computer music, they have been quite

tightly connected from the very beginning.

And in these areas, it's very nice to see that maintained in

the kind of things I have been talking about.

It shows, very well, the need to exchange and to cross-feed

most of this work because, I believe, it's beneficial for

both types of activities.

And then, again, this idea--

and Freesound is an example--

that, by adding some interaction to many of these

sound-listening tools or sound-searching tools, you are

bridging the more passive listening approach to the

creative, active, music-making type of approach.

We, in the computer music field, when we

started, were basically targeting

professional musicians.

Those were the people that were using these tools.

Now, it has changed completely, and we are

targeting general users too.

And so we have tools that, basically, bridge these users,

standard users, with musicians, with professionals.

And we are seeing all these applications, this potential

to turn music-listening, sound-listening, sound

interaction into something very creative, very fun.

People log in to Freesound not just to access a specific

sound, but to have fun, to interact, and to do things.

Anyway, and just to finish, I have been involved, as part of

a European project, to write a road map of our field, of what

are the challenges for the future, what is the state of

the art, what is the complex industrial context,

and the social and the research context.

So I believe it's a nice document to sort of get an

idea of what I have just been talking about.

And some other things feed into the overall picture of

what we call the sound and music computing field, the

research in sound and music computing, and how the current

context, the current technologies, are really

reshaping a lot of the goals and research directions that

have been established over all these years.

So anyway, so that's all I wanted to say.

So thank you, very much.

And maybe, if there are some questions, I

would be glad to--



Maybe you can take this, I guess?

Is that right?



Can we turn on the hand mic?


I think you have to--


MALE SPEAKER: Is that working?


AUDIENCE: All right.

I was just wondering if you had any Vocaloid samples in

English, just so we could have a better understanding of how

that sounds and understand the language.

XAVIER SERRA: The Vocaloid recording?


XAVIER SERRA: Ah, the Vocaloid.


XAVIER SERRA: If I can play more?




Japanese is better.

Let me just see if I--


Let me--

Oh, come on.

Let me see.



I don't know if I have recent ones, but let's try

"Fly Me to the Moon."


XAVIER SERRA: So, here, you see the problems.

MALE SPEAKER: Thanks a lot.

Are there any other questions?

AUDIENCE: Yeah, here.

So, a while back, I was reading documentation on

something called Pure Data.

You may have heard of it.


AUDIENCE: Pure Data.

XAVIER SERRA: Pure Data, PD.


AUDIENCE: Kind of like Max/MSP.


AUDIENCE: And I came across someone talking about how he

didn't think that we should be using samples anymore at all.

We should be using synthesis to create all of our sounds,

like in video games, and movies, and stuff.

So, what do you think about that?

Do you think it's feasible?

Do you think it's worth pursuing?

Do you think it's even possible to pursue?

XAVIER SERRA: OK, basically, right now, there are two big

approaches to sound synthesis.

One, which is basically a pure synthesis approach based on

physical modeling, of having models of the instruments, of

the objects that have been producing those sounds; and

this more sampling,

spectral-based type of approach.

Sincerely, by far, I believe this sampling approach has

taken over and displaced pure synthesis, except in art

music, very experimental types of things.

In terms of commercial products, they

have completely vanished.

I mean, nowadays, sampling really is taking over.

And if you add all these possibilities to transform

sound, I don't think there is a question that synthesis,

pure synthesis, faces very tough odds.

It's really difficult.


MALE SPEAKER: We have to finish up, but

let's take one more.

AUDIENCE: As far as performance and interaction

with the synthesizer, that was the reacTable you

had shown us earlier?


AUDIENCE: I noticed that we didn't talk too much about the

sound generation used by that particular device.

It sounded subtractive.

It looked like you had a square wave, and a

filter, and an LFO.

XAVIER SERRA: Yeah, that was the example.

It was a very simple synthesis strategy.

In fact, the synthesis, the sound, is PD.

In fact, underneath, there is PD, and you can do anything

with this Pure Data stuff.

Or you can put anything you want, like Bjork.

What she does is put all her drum loops and her samples

from recordings, and then manipulates the recordings

from the device.

So, this device is not particularly attached to any

synthesis strategy in particular.

It can be used for anything.

So, in fact, we are working on DJ'ing-type of applications,

sampling-type of applications.

But the basic one is sound synthesis, and it has

oscillators and all the typical things.
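
The chain the questioner describes-- a square wave through a filter swept by an LFO-- is classic subtractive synthesis, and can be sketched in a few lines. The sample rate, frequencies, and coefficient range below are illustrative only, not what the reacTable demo actually used.

```python
import math

SR = 8000  # sample rate in Hz (illustrative)

def square(freq, n):
    """n samples of a naive square wave at the given frequency."""
    return [1.0 if math.sin(2 * math.pi * freq * i / SR) >= 0 else -1.0
            for i in range(n)]

def lfo_lowpass(signal, lfo_freq=2.0):
    """One-pole low-pass filter whose coefficient is swept by a slow
    sine LFO, giving the characteristic subtractive filter sweep."""
    out, y = [], 0.0
    for i, x in enumerate(signal):
        # LFO sweeps the smoothing coefficient between ~0.05 and ~0.95
        a = 0.5 + 0.45 * math.sin(2 * math.pi * lfo_freq * i / SR)
        y = a * y + (1 - a) * x  # one-pole low-pass recurrence
        out.append(y)
    return out

# One second of a 110 Hz square wave through the swept filter.
tone = lfo_lowpass(square(110, SR))
print(len(tone))
```

Because each output sample is a convex combination of the previous output and a signal bounded by 1, the result stays within [-1, 1] without extra limiting.
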

AUDIENCE: What's the sound source that you showed?

Is it an ESP instrument or a real synthesizer?

XAVIER SERRA: That was a real synthesizer, but it was PD.

It's this software synthesizer, and it had, just

for demonstration, oscillators, and very simple

filters, and things like that.

MALE SPEAKER: OK, thanks, again.

XAVIER SERRA: OK, thank you.

