When the two methods were run side by side, both returned the same result (a distance of zero) for perfect matches, but the GD proved much more effective and robust at identifying calls that were similar, rather than identical, to the reference. The GD is computationally more expensive than the Euclidean Distance, but its advantages soon became obvious.
Calls can be analysed at rates as high as 100,000 per second per processor, but more typically the rate is 2,000-3,000 per second when a large number of different reference templates are used simultaneously. The rate depends on the call settings and on the number of reference calls in a reference template, and increases for larger templates. A template is a file holding a mathematical image that represents a collection of WAV file examples of the call; essentially, a template contains the information needed to build the images in Figures 1 and 2 for a collection of WAV reference files.
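As a rough sketch of what such a template might hold, the fragment below simply gathers one transformed image per reference WAV file. The directory layout, the function names and the FFT stand-in transform are illustrative assumptions only; the actual images used in the system described here are built with the LPC transform discussed in the following sections.

```python
import glob
import numpy as np
from scipy.io import wavfile

def transform_call(samples: np.ndarray) -> np.ndarray:
    # Stand-in transform so the sketch runs end to end; the real images are
    # built with the LPC transform described later in the article.
    return np.abs(np.fft.rfft(samples.astype(float)))

def build_template(wav_dir: str) -> list[np.ndarray]:
    # A "template" here is simply the collection of transformed images,
    # one per reference WAV example of the call.
    template = []
    for path in sorted(glob.glob(f"{wav_dir}/*.wav")):
        _rate, samples = wavfile.read(path)
        if samples.ndim > 1:              # mix stereo references down to mono
            samples = samples.mean(axis=1)
        template.append(transform_call(samples))
    return template
```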
The Geometric Distance Concept
The most common measure of similarity is the Euclidean Distance which, as the name implies, uses the linear distance between two patterns as a measure of the difference. The Geometric Distance is instead measured as the angle between two vectors that are the result of transforms on the original data. For our purposes the GD is measured in degrees, with 90 degrees being the distance between two totally dissimilar images. In CD-quality audio, differences of 3 to 3.5 degrees are found between sounds that a human listener subjectively judges to be different. In real-world soundscapes, sounds that are similar typically lie within a GD of 5-6 degrees of each other.
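A minimal sketch of the angle measure itself (not the optimised GD computation of Jinnai et al [7]) might look like the following; the function name and the use of NumPy are illustrative choices only.

```python
import numpy as np

def geometric_distance_deg(a: np.ndarray, b: np.ndarray) -> float:
    # Angle, in degrees, between two flattened transformed images:
    # 0 degrees for identical patterns, 90 degrees for totally dissimilar ones.
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clamp to [-1, 1] to guard against floating-point drift before arccos.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))
```

Because the transformed images are non-negative, the angle between any two of them naturally falls between 0 and 90 degrees, matching the scale described above.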
Dimensionality
Most of the early work that we did used a 2-dimensional image (see the example of two different Australian bird species in Figure 1: a Kookaburra call in Figure 1A compared with a Pale-headed Rosella call in Figure 1B). It is easy to see that the images are rather different, and it is these transformed images of the call that we compare.
To derive these images, the Linear Predictive Coding (LPC) transform is used to calculate the frequency-versus-amplitude spectrum of a frame, typically of 2001 data points. This was found to work well for most sounds, but it inherently loses temporal information, which is sometimes important.
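One way to sketch such an LPC spectrum with off-the-shelf tools is to fit an all-pole model to the frame and evaluate its frequency response; the model order and frequency grid below are illustrative choices, not the settings used in the system described here.

```python
import numpy as np
import librosa
import scipy.signal

def lpc_spectrum(frame: np.ndarray, order: int = 24, n_freqs: int = 1024) -> np.ndarray:
    # Fit an all-pole (LPC) model to the frame, then evaluate the model's
    # frequency response to obtain a smooth frequency-vs-amplitude spectrum.
    a = librosa.lpc(frame.astype(float), order=order)
    _w, h = scipy.signal.freqz([1.0], a, worN=n_freqs)
    return 20.0 * np.log10(np.abs(h) + 1e-12)   # amplitude in dB
```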
The GD concept is N-dimensional (see Jinnai et al [7]), so it is possible to apply it to the more conventional 3-dimensional spectrogram (see the 3-D image below in Figure 2), again calculated using the LPC. The 2-dimensional GD process was found to be much faster than the 3-D one, so we use both, choosing the 3-D only where it is needed. It is also easier to visualise how the pattern matching can be done with the 2-dimensional LPC image than with higher-dimensional ones.
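To make the matching step concrete, a hypothetical classifier could simply take the smallest angle between the unknown call's image and every reference image in every template; the threshold mentioned in the comment reflects the 5-6 degree similarity range quoted above, and all names are illustrative.

```python
import numpy as np

def geometric_distance_deg(a, b):
    # Same angle measure as in the earlier sketch.
    a, b = np.ravel(a).astype(float), np.ravel(b).astype(float)
    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def best_match(unknown_image, templates: dict[str, list[np.ndarray]]):
    # `templates` maps a species label to its list of transformed reference
    # images. A caller might accept the match only below some threshold
    # (e.g. around 5-6 degrees for similar sounds in real-world soundscapes).
    best = (None, 90.0)
    for species, images in templates.items():
        for ref in images:
            d = geometric_distance_deg(unknown_image, ref)
            if d < best[1]:
                best = (species, d)
    return best
```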
The Spectrum Transform
Initially, the Fast Fourier Transform (FFT) was used as the default transform and, while some success was had with it, we became concerned that artifacts of the transform were making the matching process less exact. After running a large number of tests against the LPC on the same data, we concluded that the LPC transform, though significantly slower computationally than the FFT, was the more appropriate transform for our purposes.
Fig. 1A: The 2-D image of a Kookaburra call
Fig. 1B: The 2-D image of a Rosella call
Fig. 2: The 3-D image of the same Kookaburra call as in Figure 1A
The LPC, first mooted in 1966 by S. Saito and F. Itakura of
NTT Japan, is widely used as a telecommunications speech
compression transform. Its use as a spectral transform
gives results that are largely consistent with the FFT but
with fewer artifacts. It can also be used to resolve small
signal fragments without the same loss of spectral resolu-
tion that is characteristic of the FFT. It is of course subject
to the limitations of the uncertainty principle and, as imple-
mented by us, does produce some artifacts.
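For readers who want to see the difference for themselves, a small, self-contained comparison of the two spectra on a short synthetic fragment can be put together as follows; the tone frequencies, frame length, LPC order and libraries used are illustrative choices only.

```python
import numpy as np
import librosa
import scipy.signal
import matplotlib.pyplot as plt

# A short synthetic fragment: two tones in noise, standing in for a small call fragment.
sr = 22050
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 1200 * t) + 0.5 * np.sin(2 * np.pi * 3100 * t)
frame += 0.05 * np.random.default_rng(0).standard_normal(frame.size)

# FFT magnitude spectrum of the windowed frame.
fft_freqs = np.fft.rfftfreq(frame.size, 1 / sr)
fft_mag = 20 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(frame.size))) + 1e-12)

# LPC (all-pole) spectrum of the same frame; order 16 is an illustrative choice.
a = librosa.lpc(frame, order=16)
w, h = scipy.signal.freqz([1.0], a, worN=frame.size // 2 + 1, fs=sr)
lpc_mag = 20 * np.log10(np.abs(h) + 1e-12)

plt.plot(fft_freqs, fft_mag, label="FFT")
plt.plot(w, lpc_mag, label="LPC")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude (dB)")
plt.legend()
plt.show()
```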