Special issue: "Acoustics 2012" Congress
A new method for optimising the geometric distance in an automatic birdsong recognition system
Testing with the Dawn Chorus
By November 2011 we had begun testing the system with
the dawn chorus. One of us (Boucher) lives in a semi-rural
location in Australia that was suitable for the testing.
The recordings were made with a tripod-mounted LS-11,
which was set to record for a few hours on successive
mornings.
The following species are known to take part in
the chorus:
- Australian Crow (Corvus spp.)
- Pied Currawong (Strepera graculina)
- Eastern Whipbird (Psophodes olivaceus)
- Grey Shrike-thrush (Colluricincla harmonica)
- Guineafowl (Numida spp.)
- Kookaburra (Dacelo spp.)
- Lewin’s Honeyeater (Meliphaga lewinii)
- Magpie (Gymnorhina tibicen)
- Noisy Miner (Manorina melanocephala)
- Pale-headed Rosella (Platycercus adscitus)
- Pied Butcherbird (Cracticus nigrogularis)
- Spur-winged Plover (Vanellus spp.)
- Rainbow Lorikeet (Trichoglossus haematodus)
- Eastern Sedgefrog (Litoria fallax)
There were 14 reference templates, comprising a total
of 566 individual examples of calls (from various sources),
loaded into the software. Some of the templates
were rather scant (e.g. the Eastern Sedgefrog was represented
by only 3 examples of the call), but most had 25
or more.
In a random sample of the dawn chorus field recordings
totalling just over an hour (1 hour 8 minutes) the following
results were returned:
Table 1: Recognised calls from 1 hour 8 minutes of recording
So we have a total of 41,991 recognised calls, or about 37,000
per hour. Only the Kookaburra (which can be heard in the
original recordings) was not in the references and so
was not found.
The very large number of calls in such a short time partly
reflects the fact that the recogniser looks for call segments
of about 0.05 seconds in duration (the length varies by call
type). Typically the recogniser will find 3 call segments
per call.
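The figures above can be checked with a few lines of arithmetic. The sketch below is only a rough estimate: it assumes that the 41,991 identifications are counts of call segments and that the typical figure of 3 segments per call applies throughout. The variable and function names are illustrative and are not part of the software.

```python
# Rough check of the rates quoted in the text.
# Assumption: the 41,991 identifications count call segments,
# and a call typically yields 3 segments.
RECOGNISED = 41_991        # identifications reported for the sample
RECORDING_MINUTES = 68     # 1 hour 8 minutes
SEGMENTS_PER_CALL = 3      # typical figure given in the text


def per_hour(count: float, minutes: float) -> float:
    """Scale a count observed over `minutes` to an hourly rate."""
    return count * 60.0 / minutes


segments_per_hour = per_hour(RECOGNISED, RECORDING_MINUTES)
approx_calls = RECOGNISED / SEGMENTS_PER_CALL
approx_calls_per_hour = per_hour(approx_calls, RECORDING_MINUTES)

print(round(segments_per_hour))       # 37051
print(round(approx_calls))            # 13997
print(round(approx_calls_per_hour))   # 12350
```

Under these assumptions the sample holds roughly 14,000 distinct calls, or about 12,000 per hour.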
For other dawn chorus recordings the recogniser typically
returned 10,000 to 40,000 identifications per hour with
better than 95% accuracy (as determined by a human
listener). That accuracy rate can be improved with more
and better references.
Soon after this test a CD arrived that contained 45
minutes of professionally recorded frog calls. A quick
check revealed that the Sedgefrog was included. From
these calls a reference file of 218 calls was made and
run. These were then compared to the original 3 and no
match was found between the two groups. A listening
test confirmed that the calls did not match. A quick
study revealed that “Sedgefrog” is a generic name for a
number of different species of frog and, even though both
recordings came from the same region, they were not of
the same frog.
Just to see what would happen, the new reference file was
run against a 3-hour dawn chorus recording (which included
the one above) and a total of 27 matches were found.
They were all false positives, which corresponds to a false
positive rate of 0.00014%. The three matches found with the
original file (Table 1 above) were confirmed to be correct.
Performance
The software is suitable for processing terabytes of data
and can be set to run all of the files on an HDD. It essentially
searches the nominated HDD for all .WAV files and processes
them sequentially. The approximate speed of processing
on a 3.0 GHz processor is 50 times faster than real time
per reference file. This typically translates to about 10
times faster than real time for about 5 reference files.
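The batch behaviour described above can be sketched as follows. This is a minimal illustration, not the software's actual implementation: the function names are hypothetical, `recognise` stands in for whatever processing is applied to each file, and the speed estimate assumes that processing cost grows linearly with the number of reference files, which is what the quoted figures (50 times real time for one file, about 10 times for five) suggest.

```python
from pathlib import Path
from typing import Callable, Iterator

# Figure quoted in the text: about 50x real time per reference file.
SPEEDUP_PER_REFERENCE = 50.0


def find_wav_files(root: Path) -> Iterator[Path]:
    """Yield every .wav/.WAV file under `root`, in sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".wav":
            yield path


def effective_speedup(n_reference_files: int) -> float:
    """Speed relative to real time, assuming linear cost per reference file."""
    if n_reference_files < 1:
        raise ValueError("at least one reference file is required")
    return SPEEDUP_PER_REFERENCE / n_reference_files


def process_drive(root: Path, recognise: Callable[[Path], None]) -> int:
    """Run `recognise` (a placeholder) on each file in turn; return the count."""
    count = 0
    for wav in find_wav_files(root):
        recognise(wav)
        count += 1
    return count
```

For example, `effective_speedup(5)` returns `10.0`, so under this assumption a 3-hour recording checked against 5 reference files would take about 18 minutes.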
A template could be a collection of files for one species,
but more generally it can be any collection of reference
files (perhaps of multiple species) for which there is a
common setting. The setting is the window through
which we view each target call. Thus if we look at Figures
1A and 1B (which are displays of the viewing window)
we notice that they span different frequency bands and
have different noise floors. The frequency band and noise
floor are two of the settings, so these two species would
have unique settings. Other settings include target GD,
weighting vector, and frame size (the number of points
used for matching). Changing even one of these makes the
reference file unique.
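One way to picture the settings is as an immutable record attached to each reference file. The sketch below is hypothetical: the class name, field names, units (Hz, dB) and all numeric values are placeholders chosen for illustration, and only the list of settings itself comes from the text.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TemplateSettings:
    """Hypothetical container for the settings named in the text."""
    freq_band_hz: Tuple[float, float]     # lower and upper edge of the viewing window
    noise_floor_db: float                 # noise floor of the viewing window
    target_gd: float                      # target geometric distance
    weighting_vector: Tuple[float, ...]   # per-point weights
    frame_size: int                       # number of points used for matching


# Placeholder values, for illustration only.
base = TemplateSettings(
    freq_band_hz=(2000.0, 6000.0),
    noise_floor_db=-40.0,
    target_gd=0.1,
    weighting_vector=(1.0, 1.0, 1.0, 1.0),
    frame_size=4,
)

# Changing a single setting yields a distinct settings record.
variant = replace(base, noise_floor_db=-35.0)

print(base == variant)         # False
print(base == replace(base))   # True
```

Because the record is frozen and compared field by field, two reference files share a template only if every setting is identical.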
The exact processing speed depends on the number of
signals on the recording (the software looks for significant
energy levels and interprets these as signals to be
processed) and on the settings used.
The speed of processing can be compromised by noise
such as wind and rain, both of which generate significant
energy levels. A good microphone windshield is thus
strongly recommended.
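The effect of noise on processing load can be illustrated with a simple energy gate. The text does not say how the software measures energy, so this is a generic sketch rather than the authors' method: frame-wise mean-square energy with a fixed threshold is an assumption, and the function names, frame length and threshold are illustrative.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of each complete, non-overlapping frame."""
    if frame_len < 1:
        raise ValueError("frame_len must be positive")
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def candidate_frames(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[int]:
    """Indices of frames whose energy exceeds the threshold."""
    energies = frame_energies(samples, frame_len)
    return [i for i, energy in enumerate(energies) if energy > threshold]


# A short burst between two silent stretches.
quiet = [0.0] * 8
burst = [0.5, -0.5] * 4
clean = quiet + burst + quiet

# The same signal with a crude stand-in for wind noise added.
noise = [0.2, -0.2] * 12
windy = [s + n for s, n in zip(clean, noise)]

print(candidate_frames(clean, 8, 0.01))   # [1]
print(candidate_frames(windy, 8, 0.01))   # [0, 1, 2]
```

In the clean signal only the burst is passed on for matching; with the added noise every frame exceeds the threshold, so three times as much audio would have to be processed.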
The accuracy depends primarily on the quality of the reference
recordings, but also on the settings.
When correctly set up, the accuracy should easily exceed 95% (a
figure which corresponds to human accuracy when asked
to recognise a group of words out of context).