Replies: 5 comments
-
Hi @ceperman, I formatted your table, as I couldn't easily read it. To be honest, as this is a comparison between BirdNET Analyzer and Merlin, the best place for feedback about your experience is https://github.com/kahst/BirdNET-Analyzer/discussions

Regarding your last paragraph: Chirpity has two Nocmig models and BirdNET available for detection, but it does not use a BirdNET web service (there isn't one, to my knowledge*). When using the BirdNET option, it uses the same model as BirdNET Analyzer (v2.4) ported to JavaScript. It can be run offline and should give identical results. If it doesn't, you can raise a bug report; please share the audio file if you do.

'* BirdNET is available here: https://birdnet.cornell.edu/api/, but this is indeed a very old version of BirdNET, and despite it having /api in the URL, I don't think the endpoints are documented.
-
Hi @Mattk70, in as much as you can see this as a comparison between BirdNET and Merlin, you're right that it probably belongs elsewhere, and I may post it there. I thought it would be interesting for Chirpity/BirdNET users to understand that when it comes to AI identification, opinions differ even between AIs from the same stable.

Perhaps more significant is the difference between BirdNET and what I could hear. I'm no expert, but I know a blackbird and a pheasant when I hear one; BirdNET did pick them up, but only with low confidence (around 0.2). To catch these birds and others that were clearly present, I would have to lower the Chirpity confidence limit to 0.2, which would pick up many false positives and give me a lot more work eliminating them. Chirpity obviously makes this a lot easier to do, but it would be operating at a much lower confidence level than is usually recommended. At a more usual level of 0.7 I would have picked up only the Green Woodpecker and missed the other 8 species that were present. I'm not sure where this leaves us, other than pointing out that using a high confidence level means eliminating false positives at the cost of more false negatives. To repeat what I said earlier, my gut feeling is that BirdNET is much better at identifying isolated birds than birds all jumbled together, as in a dawn chorus. So in the latter case, use a lower confidence level.

Re. what I said about Chirpity using the BirdNET web service: I know that BirdNET has a web interface, and admittedly I was guessing that you used it somehow, purely because Chirpity is getting different results from my desktop version. But as far as I can tell I'm also using v2.4 (BirdNET-Analyzer doesn't have a -version option, but 2.4 is mentioned in the Readme.adoc file), so I don't know why I'm getting different results. I've included the files in question (mp3 files don't appear to be supported, so I've zipped them). Dawn chorus:
-
Thanks @ceperman. I think it's generally accepted that all AI models struggle when presented with a busy soundscape of overlapping species' calls. The very best performing ensemble models, such as those that win BirdCLEF, achieve < 70% accuracy even after deploying many bespoke tricks to optimise for the test sounds. I am not sure of the BirdNET v2.4 model's "soundscape" accuracy, but I suspect it will be closer to 50% (maybe @kahst can comment?).

I looked at both files in Chirpity vs. BirdNET and can account for the different results. The main difference is that Chirpity does not report all the detections at a specific timecode, only the top one. (You can see the others if you look at the results for a single species: a clickable grey circle indicates there are additional results.)

The other difference you will notice is that the reported confidence differs very slightly. This is due to slightly different floating-point rounding errors when your files are resampled to BirdNET's expected 48 kHz. One of your files' rates is 44.1 kHz, the other an unusual 46 kHz. Neither gives a nice decimal when you convert it (e.g. 48/44.1 = 1.0884353741...). Python's resampling algorithm differs from the one in ffmpeg, so they round in slightly different ways. For practical purposes it makes no difference, and you see no difference at all if you use a sample rate that divides 48 kHz evenly. I've shown in the table below how that affects the results:
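The resampling-ratio point can be checked with a quick calculation (a sketch of the arithmetic only, not code from either application): 44.1 kHz to 48 kHz is the ratio 160/147, which never terminates as a decimal, whereas 24 kHz divides 48 kHz exactly.

```python
from fractions import Fraction

# Ratio needed to resample 44.1 kHz audio up to BirdNET's 48 kHz.
ratio_441 = Fraction(48_000, 44_100)
print(ratio_441)         # 160/147, i.e. 1.0884353741... (non-terminating)

# By contrast, 24 kHz divides 48 kHz exactly, so no rounding is involved.
ratio_24 = Fraction(48_000, 24_000)
print(ratio_24)          # 2
```

Because the first ratio has no exact decimal representation, different resamplers (ffmpeg's versus Python's) round the intermediate samples slightly differently, which is why the reported confidences drift by a fraction of a percent.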
-
@Mattk70 Thanks for the extremely detailed response and analysis. I've still a lot to learn about Chirpity!

FYI, the file sampling rates: the 44.1 kHz file was created using a commercial mp3 recorder, and this rate is fairly typical for CD-quality recording. The other was from my home-grown recorder, which creates WAV files at 24 kHz, 16 bits, mono. Because they are smaller, I attached mp3 versions that I'd created some while ago by exporting from Audacity, using its default export values. When using Chirpity, I process the WAV files.

Did you get to look at the Barn Owl file 03395116.mp3? I still see a significant difference in the owl detection between BirdNET used as a command (77%) and via Chirpity (48%).

BTW, you said "...resampled to fit BirdNET's expected 48K...". I'm not aware of this requirement. When I use BirdNET I don't do anything special with the input files. Can you explain?
-
I did look at the Barn Owl file. If I run the mp3 file in BirdNET Analyzer, it picks up the Barn Owl at 47%. I can see from the BirdNET csv that you got 77% when analysing the original wav file. Is this a coincidence, or did you compare predictions from the WAV in BNA to predictions from the mp3 in Chirpity? If you think the wav file shows a discrepancy, maybe share the wav file?

Re resampling: BirdNET requires audio with a 48 kHz sample rate. Both the Chirpity and BirdNET Analyzer applications resample the audio internally to match that.

As a side note, I misread the properties of the barn owl file you shared: 46 kbps is the bitrate; the sample rate is actually 24 kHz. A file this heavily compressed has a lot of compression artefacts, and doesn't provide the full frequency range used by BirdNET (0-15 kHz) for its predictions. I suspect this is the reason the results from the WAV and the mp3 differ so much.

An entirely different possibility is that you had applied audio filters in Chirpity and enabled "send filtered audio for analysis" in the settings. This will definitely result in differences.
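The frequency-range point follows directly from the Nyquist limit: a recording can only contain frequencies up to half its sample rate, so a 24 kHz file is physically missing part of the band BirdNET analyses. A minimal illustration (the function name is my own, for clarity):

```python
# The Nyquist limit: a sample rate of N Hz can only represent
# frequencies up to N/2 Hz.
BIRDNET_BAND_TOP_HZ = 15_000   # upper edge of the range BirdNET uses

def highest_representable_hz(sample_rate_hz: int) -> float:
    """Return the Nyquist frequency for a given sample rate."""
    return sample_rate_hz / 2

print(highest_representable_hz(48_000))  # 24000.0 -> covers the full 0-15 kHz band
print(highest_representable_hz(24_000))  # 12000.0 -> the 12-15 kHz band is absent
```

So even before mp3 compression artefacts are considered, a 24 kHz recording cannot contain anything between 12 kHz and 15 kHz.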
-
I've built myself a sound recorder that I can deploy remotely to record birds over a period of days or weeks. It alternates between recording and sleeping (both periods are configurable), creating files on an SD card which I can then analyse back home using BirdNET-Analyzer, which I have installed locally on my PC, i.e. not the web version. I'm going to be relying on BN to do the identifications, and I'm looking at Chirpity to see how it can enhance the analysis process.
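For readers curious what such a record/sleep duty cycle looks like, here is a minimal sketch; the function and parameter names are my own invention, not the recorder's actual firmware:

```python
from datetime import datetime, timedelta

def recording_slots(start: datetime, record_min: int, sleep_min: int, count: int):
    """Yield the start time of each recording in a record/sleep duty cycle."""
    t = start
    for _ in range(count):
        yield t
        t += timedelta(minutes=record_min + sleep_min)

# e.g. a 3-minute recording followed by a 27-minute sleep (a 30-minute cycle)
slots = list(recording_slots(datetime(2024, 1, 1, 6, 0), 3, 27, 4))
print([t.strftime("%H:%M") for t in slots])  # ['06:00', '06:30', '07:00', '07:30']
```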
BirdNET-Analyzer appears to give credible results, at least until I compare them with Merlin (the phone app I use when I'm out and about) and sometimes with the Mk1 ear. I'm interested to know how others feel about its accuracy and whether they have their own comparisons.
To put the following examples in context: I live in Warwickshire, UK.
First example
I've made some dawn chorus recordings over the years for a local website, and identified what I could by ear (this was before AI identification was around, or at least before I discovered it). Having found Merlin and BirdNET I analysed some of them for interest.
See (or hear!) https://www.oakleywood.org.uk/2020/05/dawn-chorus-2020/ (the second recording)
This recording was made with the equipment mentioned on the web page.
I played it with Merlin listening, and also ran it through BirdNET-Analyzer. Merlin largely agreed with what I could hear; BN at 0.7 confidence detected just the Green Woodpecker at 33 seconds in. I'm not doubting this identification (confidence 0.9320); I was surprised at everything it missed. This is the comparison table I made:
Admittedly I was using a very low confidence threshold for BN (this was before I knew what sort of level I should typically be using, perhaps nothing less than 0.7).
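The threshold trade-off can be illustrated with a short filtering sketch. The column names follow BirdNET-Analyzer's CSV result table as I understand it (check your own output file), and the sample rows are illustrative, using confidences of the kind mentioned above:

```python
import csv
import io

# Illustrative sample of a BirdNET-Analyzer CSV result table (not real output).
sample_csv = """Start (s),End (s),Scientific name,Common name,Confidence
33.0,36.0,Picus viridis,Green Woodpecker,0.9320
75.0,78.0,Turdus merula,Eurasian Blackbird,0.21
120.0,123.0,Phasianus colchicus,Ring-necked Pheasant,0.19
"""

def detections_above(file_obj, threshold):
    """Return the common names of detections at or above the given confidence."""
    return [row["Common name"] for row in csv.DictReader(file_obj)
            if float(row["Confidence"]) >= threshold]

print(detections_above(io.StringIO(sample_csv), 0.7))   # only the Green Woodpecker
print(detections_above(io.StringIO(sample_csv), 0.15))  # all three species
```

At 0.7 only the high-confidence Green Woodpecker survives; dropping the threshold recovers the quieter species at the cost of letting more false positives through.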
Second example
This was made recently with my home-grown still-in-development recorder in a rural garden, making a 3 minute recording every 30 minutes. These are the BN results (xn = number of detections in the 3 min period):
| Time slot | BirdNET | Merlin |
| --- | --- | --- |
| 15:10 | Blue Tit, Great Tit (x2) | same |
| 15:39 | Robin (x2) | Blue Tit, Robin |
| 16:09 | Robin | same |
| 20:33 | Tawny Owl (x2) | nothing, although I could clearly hear it |
| 03:54 | Barn Owl | nothing, I couldn't hear it |
| 07:49 | Robin (x21), Redwing, Dunnock, Long-tailed Tit | Robin |
| 08:18 | Wren, Robin (x3) | Great Tit, Blue Tit, Robin, Chaffinch |
| 08:47 | Robin (x6) | Robin, Great Tit, Blue Tit, Great Spotted Woodpecker |
| 09:17 | Dunnock, Robin | Robin, Greenfinch, I only heard a Pheasant! |
| 09:46 | Robin (x16), Pheasant | Robin, Greenfinch, Blackbird |
| 10:16 | Robin (x3) | |
| 10:45 | Robin (x3), Great Tit (x8), Long-tailed Tit | |
| 11:15 | Robin (x2) | |
| 11:44 | Robin (x30), Blue Tit, Dunnock, Water Rail(!!*) | Robin, Blue Tit, Dunnock |
Over-high gain (perhaps) in the recorder created crackle and distortion of close/loud sounds, which may account for the unexpected Water Rail. Apart from this, the BN results are all quite credible.
However, when compared with the Merlin identifications, it all looks a bit uncertain. Which do you believe? Both these products come from the same stable (the Cornell Lab of Ornithology), but my understanding is that they use different AI implementations. I've used Merlin for some while now and have come to have confidence in its identifications. It alerted me to the presence of Spotted Flycatchers in our local wood. Initially I didn't believe it, but it was persistent in a particular place, where I eventually spotted (sic) them.
So, where does that leave me? I know that these products are not infallible, but the disagreement between them is disappointing. My feeling is that Merlin may be better at dealing with overlapping sounds, as in the dawn chorus, and BN happier with discrete sounds. I obviously want to have confidence in BN because that's what I will use for my recordings.
One more thing. I think Chirpity uses the BirdNET web service. I use the desktop version, and it detects the Barn Owl (above) with a 77% confidence level. I ran the recording through Chirpity and it detected nothing at a 70% confidence level, but when I dropped it to 40% it did detect it, at 48%. Why is there a difference? This is confusing!