Dr. Grant Van Horn is a computer scientist at UMass Amherst. He is one of the minds behind the computer vision and machine learning technology driving community science apps like Merlin, Seek, and iNaturalist. We had a great evening talking to Grant about the intersection between computer science and community science!

This event was held at the Hadley Public Library and on Zoom on November 12, 2025 from 6:00 – 7:00 PM.

You can view the event recording on YouTube here and read the transcript below.

TRANSCRIPT

THOMAS:
Welcome, everybody. Thanks for being here. Welcome to our Science Cafe.
My name is Thomas Nuhfer. I’m a PhD student in the Organismic and Evolutionary Biology program at UMass Amherst. I’m going to be MCing for us tonight. Just to give you a little bit of background about Science Cafes and Science Stories, Science Cafes began in 2011. These interviews are organized by graduate student researchers at UMass Amherst through the Science Stories graduate student organization. So during the fall and spring semester, we invite experts like Dr. Grant Van Horn here to come to public spaces, share a little bit about their research, answer questions from you all, and of course, we all get to eat as well.
We also have a website, scistories.org. We’ll get some QR codes on the tables, and we’ll also share it at the end of the presentation.
You can go there, you can subscribe to our mailing list to get notified about future Science Cafes. So we do these about once a month during the academic year. You can also subscribe to the That’s Life Science Blog, which are short, accessible science blog posts written by graduate student scientists at UMass Amherst. So that’s a good way, especially, to tide you over while we’re on break for these science cafes. You can go read our writing about various scientific topics.

And now I want to introduce our speaker here. Dr. Grant Van Horn is a computer scientist at UMass Amherst. He’s also a researcher with the Cornell Lab of Ornithology. He got his doctorate at Caltech and has since been involved in programming apps that you may know and use, like iNaturalist, Merlin, and Seek. He’s here, he’s going to talk to us today about machine learning, computer vision, and how those sorts of technologies and approaches can be leveraged to help us understand and connect with the natural world. Please join me in an exuberant loud welcome to Dr. Grant Van Horn. So the structure of this event, I’m going to ask Grant some questions.
We’ll chat a little bit. We’ll have a chance for you all to ask him questions, and so on until the end of the evening. So I want to start out, Grant, learning a little bit about you. What got you into coding or computer science?
Do you remember a first computer or a first coding project?

GRANT:
Yeah, I did my undergrad at UC San Diego. I went in as a chemical engineer. Took a few chemistry classes, didn’t like those. So I started shopping around.
And at that time, computer science was not impacted as it is now.
So it was easy to actually switch into a computer science major.
So I took a few classes of that and fell in love. The intro to CS at UCSD was a Java class, and one of the projects was to take a picture of yourself with a green screen and then replace all of the green pixels with something fun. And as soon as I was able to do that, take my own picture and then modify it, I was like, wow, this is awesome. And that was it. So I switched over to computer science and then pursued stuff there.
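A toy version of that green-pixel swap can be written in a few lines. This is a hypothetical sketch in Python with NumPy, not the original Java coursework, and the threshold values are made up:

```python
import numpy as np

def replace_green(photo, background, threshold=100, margin=40):
    """Swap green-screen pixels in `photo` for the matching pixels in `background`.

    Both arguments are HxWx3 uint8 arrays of the same shape.
    """
    r = photo[..., 0].astype(int)
    g = photo[..., 1].astype(int)
    b = photo[..., 2].astype(int)
    # Treat a pixel as green screen when green is bright and clearly
    # dominates both red and blue; the cutoffs here are arbitrary.
    mask = (g > threshold) & (g > r + margin) & (g > b + margin)
    result = photo.copy()
    result[mask] = background[mask]
    return result
```

The point of the exercise, then as now, is that a photo is just an array of numbers you can test and rewrite.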

THOMAS:
Great. Can I ask what you didn’t like about chemistry?

GRANT:
I probably just didn’t give it enough time. That’s probably the real reason. I wanted that quick progress. Yeah, it felt a bit like the high school classes that I had just gotten done with. So there wasn’t kind of this new feeling that computer science had.

THOMAS:
I did the same thing with ancient Greek in college and I still can’t speak ancient Greek.
So you sort of fell in love with Java, making Zoom backgrounds, maybe pre-Zoom. How did that course you took in undergrad set the trajectory for you into your current career and the work you’re doing now?

GRANT:
Yeah, so I… The professor for that was Beth Simon, and I really wanted to work with her as like an undergrad researcher. So I knocked on her door.
She didn’t have any open spots in her lab, and so she sent me down the hall to this guy, Serge Belongie. She was like, he’s really cool. He’s in a band, and he works on this new project with birds.
And I was like, well, okay. Like at that time, I wasn’t really into birds.
But I was an eager undergrad researcher, or wanted to be one. So yeah, she introduced me to Serge, and Serge and his group were computer vision researchers, and they were just starting to study how you get computers to recognize bird species.

THOMAS:
Are you a birder now?

GRANT:
I am now like a full-blown birder.

THOMAS:
So you really came into natural history through that computer science introduction. They weren’t at first parallel tracks.

GRANT:
Yeah, not at all.

THOMAS:
Wow. That anecdote makes me feel better about turning down undergraduate research assistants, actually, because I may be setting them on some path towards greatness.
So some of these computer vision projects that you’re best known for are smartphone apps and websites that help people identify what they see in nature, right, or what they hear.
Where did you get the idea to build those tools? Were you just looking for applications of the technology? Were you responding to some particular need that you had heard or seen expressed? Or was this coming from your advisor?

GRANT:
It was, I’d say a lot of it was opportunistic. So when I first met Serge, this was 2010, and the iPad also came out in 2010. So it was kind of this fun new technology. And I wasn’t skilled enough to contribute to the core computer vision research at that time, but I could make a demo. And Serge was always into kind of the latest and greatest gadget. So he immediately bought one of these iPads and asked me to make a demo of some of his graduate students’ work running on the iPad, and that kind of got me really into how you transfer some of this technology into a really accessible format.

THOMAS:
Yeah, and I mean, so it’s a very different way of identifying a species than I was trained in, which was to take a dichotomous key and, you know, go out and go through this sort of decision tree identifying features.
So that identification ability, what was the sort of impetus behind that? Had you thought very much about species identification or using these sort of technologies that way?

GRANT:
Yeah, so again, at all this time, really a lot of the credit goes to Serge and some of the other kind of PhD students in the lab at the time.
But the feeling was human face recognition and fingerprint recognition were kind of like very in topics at the time, along with general object categories, like water bottle versus chair versus human. Those were kind of like the big problems. And actually, UMass was doing a lot of the face recognition work.
So Serge’s graduate students were just trying to look at other problem spaces, like where else could they be working where they’re not necessarily going to get scooped by folks at UMass or Berkeley or wherever. And at the time, if you went to photo sharing websites like Flickr, you would find landscape pictures, portrait pictures, but inevitably one or two bird photos were almost always in their top 20 photos of the day. And that got us thinking that birds might be this interesting data set, where we could grab data from the internet readily and study whether we could get computers to recognize different bird species. Because people love taking pictures of birds. We really like taking pictures of birds.

THOMAS:
Yeah, right. Well, birds are beautiful.
So when you mentioned just now face recognition, fingerprint recognition, and then these, I mentioned these apps, which I have on my phone, probably some of you have on your phone, it makes me think that maybe there’s a lot of ways, even when you unlock your cell phone, that computer vision is present in our day-to-day lives these days. So I wonder, though, with this phrase computer vision, does a computer actually see in anything like the way that we see? Or what is computer vision? Can you describe that to me?

GRANT:
Yeah, so yeah, quick answer is no. The computers aren’t looking at photos in the same way that we’re experiencing perception. So we store photos right in a computer format. At the end of the day, these end up being encoded as just ones and zeros. And so the computer sees those numbers. It gets to look at basically a matrix of the ones and zeros. And it’s looking for patterns in those ones and zeros, right? Your face shows up as a pattern of ones and zeros.
Birds show up as a pattern of ones and zeros. So for a long time I was pretty against the term AI, because it’s such a grandiose term. And then even machine learning is a bit grandiose.
I feel like really a lot of the work that I do and my fellow researchers do is pattern recognition. We’re just teaching the machines to find patterns in data. Images get represented in a certain format, and a lot of our work is just how you find reliable patterns in that data.

THOMAS:
Yeah, I’m going to come back to that phrase to use machine learning in a little bit. But I think that point about pattern recognition, you know, I talked about these sort of traditional ways of identifying species that I learned, but
I think like naturalists in the audience know that you spend enough time doing that, you kind of put the dichotomous key away and you’re able to get just a flash looking at something, even if not maybe the individual species.
Maybe you can identify the family or the genus of something even that you’ve never seen before. So that’s a sort of pattern recognition, you know, too, from repeated exposure to data. So maybe not similar vision, but similar learning process.

GRANT:
Sure.

THOMAS:
Yeah. I want to actually ask you in terms of vision, you’re talking about images getting translated into data. And some of the other projects that you’ve worked on are not image-based.
So I use Merlin to do… Who in the audience uses Merlin?

Wow. What about iNaturalist? I’m an iNaturalist user. Great. Wow.

Okay. So we have a bunch of users in the crowd. So I use Merlin because I’m terrible at like… pattern recognition for birdsong, so I need the help. So that’s sound, right, that’s being translated into an identification. And I think that the Seek app uses video, if I’m remembering correctly.

GRANT:
We process video, but we actually just process it frame by frame.

THOMAS:
Okay, right.

GRANT:
We treat it as still images. The machine, it just… all it perceives is still images.

THOMAS:
So what about sound then?

GRANT:
Yeah, so sound is interesting. Again, you can represent data in various formats. The way that Merlin processes sound is actually through images. We convert audio into spectrograms, that thing that scrolls across the top of the app as you’re using it. Merlin doesn’t see that exact view of the audio, but it sees a very similar view of it, and again, it’s looking for those squiggly lines, those patterns. When it detects the cardinal chip note or its song, it will find that pattern and show the user: hey, I just heard a cardinal. But hearing is a strong word, because it’s all visual processing.
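As a rough illustration of that audio-to-image step (a minimal sketch using SciPy, not Merlin’s actual pipeline), you can turn a waveform into a spectrogram array and then treat it exactly like picture data:

```python
import numpy as np
from scipy.signal import spectrogram

# One second of a pure 4 kHz tone stands in for a recorded bird vocalization.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 4000.0 * t)

# The spectrogram slices the audio into short windows and measures the energy
# at each frequency in each window, yielding a 2-D array: rows are frequencies,
# columns are time steps. That array is the "image" a vision model can look at.
freqs, times, power = spectrogram(audio, fs=sample_rate, nperseg=512)

# The brightest row of that image sits at the tone's frequency.
peak_hz = freqs[power.mean(axis=1).argmax()]
```

A real bird song would show up as the squiggly lines Grant describes, rather than one flat bright row, but the representation is the same.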

THOMAS:
Wow, so you have this sort of real background in the… the programming side of things. How much did you have to learn about birds? Because it would seem to me that there must be even individual to individual variation in that spectrogram or…

GRANT:
Yeah, well, I think that that’s kind of like been the real success of these apps is like none of this was ever a solo effort. This was always like working with the lab of ornithology and all the experts that they have in-house or working with the folks at iNaturalist and their broad citizen science community.
So I don’t have that expertise, but we can build teams that have both the machine learning expertise and the kind of the domain expertise.
And that’s really what it takes, I feel like, to build a performant machine learning system these days, right? So companies like Waymo with their self-driving cars, right, a lot of that is like they’re driving those cars around to collect data, which is then being annotated by professional annotators, whether it’s to find pedestrians or cyclists or other cars. OpenAI is annotating a ton of text data to make these really nice conversations that they can train their machines on. We do the exact same thing at iNaturalist and the Lab of Ornithology where either the community is submitting data to us or we’re going out and collecting data specifically to get the examples of all that diversity that we want and then teach the machine how to find, or at least show the machine those patterns and kind of hope it can find them reliably.

THOMAS:
I feel like I get a lot of those CAPTCHA things that must be for Waymo, like identifying which squares have somebody on the bicycle, but I’ve never had one yet ask me to identify a white-throated sparrow or something, thankfully. I’d be locked out of my account forever. So I imagine you must be spending a lot of time to build these models. You must be spending a lot of time kind of immersed in the training data, which I want to ask you about in a minute.
But first I’m wondering, does that change the way that you hear birdsong?
Like, do you find yourself thinking in these sort of spectrograms or interpreting that as data differently than you maybe would have prior to working on these projects?

GRANT:
Oh, funny, yeah. I mean, I definitely, if I, you know, like a brown thrasher or something, if I hear something that’s just kind of like all over the spectrum, I’m definitely like, oh, I have to get my phone out and just see the spectrogram because I bet this thing is just going to be beautiful.
You know, or if you get like, yeah, like white throated sparrows with like those constant tones, you’re just like, oh, I need to like see this because I bet this is going to be really nice. I also just use Merlin a lot to test these models because I’m just constantly like, oh, is it going to get that distant note? Is it going to get this jumble? I’ve got titmice and chickadees kind of mixed in. Is it going to pick them up? Will it pick up that creeper behind me?
So it’s yeah, it’s kind of now it’s just like I know my birds well enough where I can kind of gauge if the system is performing how I’d want it to perform.
Yeah, so it’s changed both in the way like I experience birdsong, as well as even just how I’m walking outside. Now it’s often everything else is kind of like, car tires are so noisy and the dogs and the lawn mowers are so annoying. I just want to be able to hear my birds. Yeah, it’s been like, you know, none of this, like, you know, I was never this way as a kid. Like birds were just another background noise.

THOMAS:
Right. I think that’s a real strength of iNaturalist as well, to help people really start to see plants. You know, I think that’s, that seems like one of the most common uses that I encounter is for plant identification. And I remember, when I started training as a plant ecologist, that there were maybe some
digital tools to identify plants, and they really seemed not good.
You would take a picture, a really great picture of something, and it would come back and say, Maybe it’s a flowering plant. It’s like, Oh, that’s really helpful, and I’d steer people away from them, and then something changed.
It seems like a few years ago, and now… now, field ecologists are often using these tools to identify plants in the field, and they just have an incredible, I mean, I’ve really been amazed by how much things seem to have progressed in the last 10 years. Do you know what changed? Did something specific happen, or was it just more of a continuing progress, accumulation of data?

GRANT:
Yeah, I think it was a combination of factors. I don’t think you can point to any one thing. And this is why we’re seeing machine learning technology in many aspects of our lives these days. The field settled on a particular design for machine learning models that just works well across many problems. It works well on audio, because you can convert audio into a different format; it works well on photos; it works well on text. With enough trial and error, the field got to a point where the models themselves are quite good. Then the internet had been around for over 30 years, and enough data had accumulated, mainly from these citizen science type applications, whether it’s eBird or iNaturalist or various other smaller groups. They’ve collected enough data to feed these machines in the particular format that they want. And the third thing that came together was hardware. We can run these models on the phones that are in your pocket. We can train them on machines that not just Google can afford, but universities and smaller groups can also afford. So it’s those three pieces, the science and research, the data, and the hardware, all coming together to power this stuff.

THOMAS:
Right, that makes sense. Yeah, I wonder even if phone cameras are progressing. I mean, I think for moving things. I still have a cheap, crummy Android, so I can never get good bird pictures. But I want to pause here. I have some more questions to ask, but I want to turn it over to the audience, who I’m sure has questions for you. Does anybody in the audience have a question that they’d like to ask? And please wait for a microphone to be brought to you.

AUDIENCE:
My name is Paul Peeley, and I’m just curious about the history of its development. Because, for example, in police detective work, we’ve been using fingerprints for a really long time, and sketch artists to try to describe, you know, an assailant and things like that. And especially looking through fingerprints a century ago, what technology did we use back then? Obviously it’s changing very quickly now, but have you looked at that history to see how you can develop it more efficiently?

GRANT:
Yeah, so the fingerprint stuff is all super cool. That’s another situation where the data had come together much earlier than in some other domains. Reading checks and reading fingerprints were some of the first pieces that the research community was able to crack, because
the federal government wanted those problems solved, or the big banking industry wanted things solved. So a lot of the techniques that we developed for those things kind of helped carry the field along. We’ve since done away with a lot of that, because we’ve got so much data and the data’s become so complicated. Fingerprints and handwritten digits on checks can be complicated, they can be messy.
But it’s just not the same thing as driving at nighttime, or a dawn forest with a bunch of bird species. So we’ve had to evolve some of our techniques for the modeling process. But yeah, those kind of fundamental problems that humans were able to do, and just wanted to do a lot more efficiently, were a lot of the inspiration for the current state of machine learning.

AUDIENCE:
Hi, Grant. Thanks so much for coming and talking, this is awesome. I just got an email in my inbox from AOS, the American Ornithological Society, and one of the little headlines was: humans still outperforming AI on bird call recognition. And I guess I’m just interested in this conflict between people who have trained for years to conduct these remote point counts, or volunteers who’ve done breeding bird surveys, or been responsible for curating some long-term project that relies upon bird recognition by ear, and how you see people navigating this tool that’s suddenly so widely available. I imagine maybe in extreme cases it could replace a job, but it also could shape how we collect data and really change the type of data, and potentially the quality and the consistency. So yeah, just curious how you’ve interacted with that conflict.

GRANT:
Man, that’s such a good question, and so many pieces. So starting with the humans better than the machine: definitely, with regional knowledge, absolutely. I mean, that’s why we still have experts annotating all of our data to teach this machine. There are amazing birders here in Amherst, and I’m a good friend of Marshall Iliff in the Boston area. If you guys know him, he’s an incredible birder by ear. And I would never say that Merlin is, quote unquote, better. The point of Merlin was never to be better than Marshall Iliff. It was meant to open up the world of birds to as many people as possible.
One of the big powers of Merlin, and iNaturalist is an even better example, is the breadth of knowledge that these machines can have. So while I would never say that Merlin’s better than Marshall for the birds of Massachusetts, Merlin is definitely better than Marshall for birds around the world. Merlin now knows about 3,000 species that it can identify relatively reliably. And Marshall,
for as much of a superhuman as he is, is not everywhere all the time learning the bird songs. So yeah, I think that’s the true power of the machine learning stuff. It’s not necessarily about being better than a human at one particular thing; there’s always going to be an amazing human that can do that. It’s the breadth of knowledge these things can bring together, and our ability to train them once and then ship them around the world to everyone’s cell phone.
We’ve been super cautious about releasing Merlin in a kind of research mode, because we are very aware of the impact that this could have on bird surveys. It just changes a lot of the statistics that people are collecting. So we’ve been super cautious to not advertise this as something that we really want people to swap in for doing point counts.
However, I do think we’re getting to the point where that’s actually pretty viable, a viable path forward. I think the model’s quite good, and certainly in the United States and a few other parts of the world. And there, I don’t ever imagine it as like a replacement for the human researcher. I imagine it more as like a superpower. You know, like I want to help them just get through more data, you know, conduct, you know, even more detailed analyses of their study site.
So I’m hoping to be kind of more of a human in the loop or a machine in the loop type system where we kind of alleviate a lot of their burden of going back to these data backlogs. Yeah, so absolutely, AI, there will always be a human that will, I think, confidently say that they’re better than some AI system. And I think it’s not, I think it’s our opportunity to help turbocharge those particular people to process, get through more of their work. So, yeah, does that answer all the questions?

AUDIENCE:
Yeah. So, I’ve got a little anecdote. I live on a dirt road in the middle of a beech, hemlock, oak forest. And the people across the road have chickens. And when I put Merlin out and the rooster crows, it keeps telling me it’s a ring-necked pheasant. And most of us are probably aware of the amount of variation in species calls: young birds, regional variants, and so forth.

And you partially answered this, but I’ll ask it again. How much room for improvement is left in Merlin, particularly if you start using LLMs, you know, to get at some of that really deep variation and so on?

GRANT:
Yeah, that’s a great question. And I’m sure we all have anecdotes like that; I mean, even today I’ve got false positives on Merlin. So yeah, the system’s got plenty of room for improvement. And the thing I always like to tell an audience like this is that one of the easiest ways we can make Merlin better is by continuing to collect and submit that data. I just gave a talk to Mass Audubon folks, and they were like, wait, what, you want more data? I thought Massachusetts was covered. And it’s like, no, we absolutely need more examples of all these different call types, because they’re just not super well represented in the Macaulay Library at Cornell right now. One of the next exciting steps for Merlin that we’re taking right now is doing call versus song, doing vocalization types. We’re starting with some of the common eastern backyard birds, but…
Instead of just telling you that it’s a cardinal, we’ll be able to tell you like, hey, that’s a cardinal chip note going on in the background there along with that chickadee song. So we’ll be able to get more specific. And actually, this kind of goes back to the previous question of kind of helping the researchers who are doing these surveys of areas. We’ll be able to tell them like, hey, it’s not just that we detected a cardinal, but maybe we detected a cardinal doing a breeding song so that we know that we’ve got breeding cardinals in this location. So that’s definitely one direction is just to make Merlin more aware of the specific vocalization types. And then we’re also broadening it into different taxonomic groups. So again, starting in sort of Eastern backyard stuff, but we’re getting mammals and amphibians in the model as well.
So that we have probably folks have their own stories of this, but you know, it’s not uncommon to see folks like trying to identify a chipmunk as a bird.
And so trying to get to the point where we can tell like, no, that’s a chipmunk, not a bird.

THOMAS:
Okay, we have time for two more short questions before we get back into this.
So I think there’s two up front. Okay, so yeah, up front here, and then maybe we’ll have another chance for questions later on as well.

AUDIENCE:
Okay, so I’m a really pretty frequent user of iNaturalist. I also study insects.
So something that I’ve noticed a lot using iNaturalist is kind of the idea of frequency bias: you have these insects that are very well documented, and then you have issues where, if someone posts an image of an insect that is less well documented, it thinks that it is the more common insect. And so I guess I was curious about whether there’s a way to deal with that using the computer science, or if the only way to deal with that is just by uploading more and more data and having better reviews of that data.

GRANT:
So the second solution is definitely a solution, and that will just happen naturally, if the iNat community continues to catch these identification mistakes on the part of the computer vision system.
And your first solution is where all the research happens. So that’s where folks at UMass and other places are thinking through how we can squeeze out more performance and deal with some of these, we call them long tail biases, where the species with lots of data end up hogging the inference predictions when the users are actually using the system.
And so that’s what we work on on a daily basis, just how do we improve these things. It’s still an open problem. It’s really hard. I think one of the lessons we’ve learned recently is that just having more data is such a good way to solve problems. Everything else is hard to even get close to what just more data will get you. But that’s what keeps it fun too, those open challenges.
But yes, those problems that you pointed out are definitely real.
And yeah, and that’s why we still rely on the community to help kind of
clean those up and improve the quality of the data set.
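One common, generic way researchers attack that long-tail bias in code is to weight the training loss inversely to how often each class appears, so rare species aren’t drowned out by common ones. This is only an illustrative sketch, not what iNaturalist actually ships; the function name and numbers are made up:

```python
import numpy as np

def inverse_frequency_weights(labels, smoothing=1.0):
    """Per-class loss weights that boost rare classes.

    labels: 1-D array of integer class ids from the training set.
    Returns weights scaled so their mean is 1.0.
    """
    counts = np.bincount(labels).astype(float) + smoothing
    weights = 1.0 / counts
    # Rescale so the average weight stays 1.0 and only the balance shifts.
    return weights * (len(weights) / weights.sum())

# Toy long-tailed dataset: a common species with 90 photos, a rare one with 10.
labels = np.array([0] * 90 + [1] * 10)
weights = inverse_frequency_weights(labels)
```

These per-class weights would then be handed to the training loss (for example, the `weight` argument of a cross-entropy loss), which is one of many partial fixes; as Grant notes, more data usually beats clever reweighting.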

THOMAS:
Can we get the question from Zoom?

ZOOM:
So the Zoom question is more of a comment. Someone, Tom, had said that one of the fantastic things about the app, for some of us who cannot hear the scratchy, high-pitched bird noises up in the top of the forest crown, is that they light up on your phone, and you end up getting species that you never knew were there. And then the question along with that is: will you also eventually be including insects, insect calls?

GRANT:
Yeah, we love getting those comments from folks who are either hard of hearing or, you know, have aged to a point where they’ve lost some of their hearing. There’s actually kind of a cool deaf community that uses the app in that exact way as they go on their walks. Yeah, insects are really tough. I’d like to say that they’re on our radar, but that would not be true. They’re just not even on our… Not because we can’t… We can collect data, but getting it labeled, knowing what it is, is really hard. And we can do things like…
There’s a cicada. But it’s not quite the same as like getting kind of species information. And so we’ve been pretty hesitant to just be like, yeah, that’s lots of cicadas going on back there. We can’t then get them onto that species page that has the cool pictures that kind of becomes that gateway to more information. So yeah, insects are just undeniably difficult. Don’t have a current plan for them right now with Merlin. But yeah, I mean, obviously at some point we’d like to get to it.

THOMAS:
Cool. Yeah, maybe not in the computer vision realm, but I know there are some folks at UMass doing audio detection for insects.
So maybe we’ll have to bring some of those researchers out for a future cafe. We’ll have a chance for more audience questions, but I have a few more questions for you. So we keep tossing around all these names of apps, and I saw a lot of hands in the audience, so maybe you’re all old hat with the differences between each of these programs. But we’ve talked about Merlin, we’ve talked about eBird, we’ve talked about Seek, and we’ve talked about iNat. You’ve worked on all of these. So can you give us a real quick breakdown of what each of these does and how they differ from each other? Should I have them all on my phone?

GRANT:
I mean, I do. So, I guess I definitely use them in different ways. The general timeline is: eBird gets envisioned in 2005, and that’s also approximately the same time that iNaturalist is envisioned as a master’s project at Berkeley. That was Scott Loarie and Ken-ichi doing that at Cal. So those projects have been around for a long time. eBird caters specifically to birders; this is where you go outside and do what we call a checklist. You add all the birds that you’re hearing or seeing to your checklist, and by virtue of that, we also know which birds are absent, which birds you didn’t observe. So this is a complete checklist: we know everything that was encountered by you and not encountered by you, and we get your time, the duration of the checklist, the location, and all this other metadata. That’s just been an incredible tool for studying bird populations. iNaturalist is this broad identification tool, and really iNaturalist itself is more of a community. I feel like that’s the true sense of iNaturalist. It has these really nice identification tools, but the best part of iNaturalist is its amazing identifier community, and they’re there to help you identify everything, kind of everything under the tree of life. Merlin came about, we started that project in about 2011, and I think it was released to the public in 2013.
I released photo ID for that in 2017, and then Sound ID came out in 2021.
Seek is technically the most recent app. An early version of it was released before I actually worked with the iNaturalist folks, maybe in the 2016 time frame, but the real-time version you experience now was released in, I think, 2019. As a vision researcher working with iNaturalist, we thought, this is really cool, but there are all these steps:
you take your picture, you identify it on iNat, you upload it, you do all these things. And I thought, really what I want is to open up my camera, scan it around, and just identify the stuff around me. That desire is what gave rise to Seek. And we’ve since moved a lot of that back; I don’t know if you’ve seen the newest version of iNaturalist, but it actually has the Seek-style real-time ID built into it. So that’s the future of iNaturalist: a merger of the two.
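The complete-checklist idea is easy to see in code. Here is a minimal Python sketch, with a made-up regional species pool and made-up observations, of how reporting everything you detected lets absences be inferred for free:

```python
# Toy sketch of a "complete checklist" (species pool and observations
# are hypothetical). Because the birder reports every species detected,
# anything in the regional pool that is NOT on the checklist can be
# treated as a non-detection (an inferred absence).

REGIONAL_POOL = {
    "Northern Cardinal", "Blue Jay", "Black-capped Chickadee",
    "Tufted Titmouse", "Carolina Wren",
}

def presence_absence(checklist: set[str], pool: set[str]) -> dict[str, bool]:
    """Map every species in the pool to detected (True) or absent (False)."""
    return {species: species in checklist for species in sorted(pool)}

observed = {"Blue Jay", "Black-capped Chickadee"}
records = presence_absence(observed, REGIONAL_POOL)

assert records["Blue Jay"] is True
assert records["Carolina Wren"] is False  # inferred absence, never reported
```

An incomplete list, say only the highlights of an outing, would not support that last inference, which is why eBird asks whether a submitted checklist is complete.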

THOMAS:
And while I’m asking you to define and differentiate some things, computer vision, machine learning, are those the same thing?

GRANT:
For a while there, they were very different, but now I’d say computer vision is a subfield of machine learning.

THOMAS:
And there’s AI in this sort of umbrella.

GRANT:
Yeah. I still feel like AI is more of a marketing term, but yes, I think that’s the way people view it: all of this sits under the AI umbrella.

THOMAS:
Great. So I’m going to shift us into talking about the community science aspect; we’ve made that transition naturally already. We had some great questions about other taxa: using Merlin for mammals, amphibians, maybe insects. I’m wondering if there’s something special about birds, either biologically or in terms of the community of birders, that makes them a good place to start, a natural fit. I mean, you mentioned there were a lot of bird pictures on Flickr, maybe not as many bug pictures.

GRANT:
Yeah, that birding community, there’s just a long legacy there that way predates any of these apps, right? Both in Europe and in the United States and elsewhere.
So I feel like we’re benefiting from the hard work of multiple centuries of people having that interest and passion to look at birds and to study them.
And birds are these really cool things that you can hear and see, and you can identify them in both of those modalities quite readily. They migrate. They’re this iconic group of taxa that captures people’s imaginations. They’re beautiful, absolutely stunning when you get to photograph them. So they lend themselves well, I think, to our human desire to categorize things and look at pretty things. Insects, unfortunately, have some of this ew-and-gross factor that we need to overcome.
Mammals are gorgeous as well, but they’re a little more difficult to observe sometimes. People can attract birds to their backyard with feeders.

THOMAS:
Yeah, so you can attract bugs to your backyard as well.

GRANT:
That’s true.

THOMAS:
I’ve been trying, really trying to get the bugging community to take off like birding has, but maybe these apps are the path to that.
I use iNat most frequently, and something I think is really cool about it is this function it serves of engaging people with the natural world: helping people recognize species in their community, supporting educational events, things like that. So that’s one function. And then as a researcher,
I use data from iNaturalist to study patterns of plant movement in response to climate change. It’s really helpful to have all this distributional data collected by people all around the world.
So I’m benefiting from their engagement, their learning, their relationship with nature. I wonder if you can share any stories about that. I loved the anecdote you shared about deaf communities using the Merlin app to engage with birdsong. Are there other stories you can share, either about how people are using these apps to build community and learn about nature, or about how scientists are using the data for cool applications?

GRANT
Yeah, so I don’t know if folks are aware, but on iNat there’s this concept of a project, right? Anyone can start a project, and it is so fun to just go look at the different projects people have created. There’ll be the Massachusetts grasshopper society, the California bird group, or regional ones, the Amherst whatever. You can just browse these things, and it’s really cool to see what people are into, because everyone’s got their own thing that they’re really into, or maybe they migrate from project to project as their interests change. Projects were a very easy thing to build, and we haven’t really touched that feature in a long time, but it’s evolved into this really powerful concept. When you make a project, you can put a little geofence around an area and say, all observations made inside this geographic area should be part of the project. Or you can say, all observations, but they have to be an insect, or a mammal, or a bird. So people take these really powerful tools to form communities, and then everyone can join. I really enjoy seeing that type of passion, because from a computer science perspective, it’s pretty rare to see that kind of passion not associated with monetary things, right? At least when I was in undergrad and grad school, you wanted to go work at Google or Facebook, and it was about doing ads.
That’s what you were going to do: get hired by one of those folks and work on how to serve ads. And I think the thing that’s always attracted me to the Merlin project or eBird is just the crazy passion that people have that’s not associated with monetary means. Then on the science side, I think, like you were saying, both iNat and eBird make their data available in slightly different ways, and iNat has just made it insanely easy.
And it’s really cool to see folks just like kind of take stuff and run with it.
So for a long time I was following this project where people were studying the molt patterns of mountain goats from the photos uploaded to iNaturalist. Any photo of a mountain goat that got uploaded, these researchers would download, and then they were able to track when mountain goats molt based on how much molt was visible in the photos. Others track flowering changes over time and link them to climate change. So the sky’s the limit on what questions scientists want to ask, right?
We have no idea what they’ll be. In some sense, we’re just trying to make the data available in a format that lets them ask whatever they want.

THOMAS:
Yeah. I work with invasive species, and early detection and tracking the spread of new invasive species is something it’s really great to have community eyes and ears out there for. I love being able to direct people to these tools for that kind of reporting, as well as, of course, calling things in to natural resources agencies. I think that point you made about not being monetarily driven is an interesting one. As an ecologist, I run into it in my work a lot: I really need a computer scientist to do this. We can produce all this research data, but what we keep running into is that people don’t want to download enormous CSVs and look at them in Excel, for some reason. So we really have this great need for interactive, accessible digital tools. But it’s hard. Ecology doesn’t pay as well as working for Meta might, if you have computer science skills.

GRANT:
Yeah, no, it’s not. But that’s where I feel like the universities have a really nice role to play. There are grad students who are hungry to publish papers. And at least for my students, what I try to identify is the sweet spot where an ecologist can meet us halfway on providing their expertise, in the form of data and annotations and analysis. Because we need feedback on whether the model’s working: is anything going in the right direction, or is it all going wrong? The tough part is, like you said, there are lots of people with challenging problems, but it’s really hard when someone comes to us and nothing’s been organized. You’ve got to meet us halfway, and then we can make a ton of progress. That was actually one of the nice things when we first started working with iNaturalist. We were like, wait, you have a bunch of images, and they’ve been identified to species by a community of people? Wow, that’s a huge head start on training a computer vision system. Most people would come to us with, hey, we have a million images, but we don’t know what they are. And it was like, well, we can’t really do much with that. So anyway, that’s where I think iNaturalist and Merlin and eBird were just these really awesome opportunities: the data was kind of ready, or it was obviously coming in, and it just married nicely with some of the computer science work.

THOMAS:
What are some of the current challenges you’re facing with these technologies? It seems like the landscape is shifting a lot with large language models. So I wonder if there are challenges or struggles, either in developing these tools or in connecting with the communities that are using them.

GRANT:
It’s funny, yeah. There aren’t necessarily challenges per se; there are a ton of exciting opportunities. The language model stuff is really cool. At the end of the day, it’s kind of the same thing: a particular style of model paired with a ton of high-quality data, and then groups like OpenAI showed that that type of system could work. We haven’t brought language models into Merlin yet because we haven’t quite found that awesome need. Although one thing I’ve been wanting to do: the five-step workflow in Merlin, which not a lot of people use any longer, was what the app was first released with. It walked you through bird ID: where did you see it? What was it doing? What were the three primary colors? I would love to replace that with just, tell me what you saw. Whether you do speech-to-text or you type it in, just tell me what you think you saw and what it was doing, however you want, in free-form text, and we’ll try to give you some suggestions based on that. I think that could be a very powerful way to let people communicate however they want to. So there are really fun opportunities there. With Sound ID, we’re very interested in making it more knowledgeable about the different song types and call types. And we’re back on photo ID, actually, with iNaturalist and the Merlin camera, because, like you were saying, smartphones have gotten really good.

THOMAS:
Other people’s smartphones.

GRANT:
Other people’s smartphones. The new Pixels from Google and the new iPhones from Apple have really good cameras, so we’d like to revisit a lot of our photo work. One of the original criticisms we used to get for working on birds was that no one can take a picture of a bird unless they have a nice camera. That argument is sort of being eroded away now. Cell phones are obviously never going to replace a nice telephoto lens, but you can get a decent photo for identification purposes, maybe not for National Geographic.

THOMAS:
You know, I think my hardware limitation there isn’t even the phone, it’s how quickly I can get it out of my pocket. I’m not a fast enough draw. So, how far do you think we are from… direct human-to-bird translation?
Both ways.

GRANT:
I mean, I feel like scientists need to agree first. There’s this vocalization-type project for Merlin that we’ve been talking about since we envisioned Sound ID.
It was like, wait, let’s not just say Cardinal, let’s try to say which vocalization type the Cardinal is making. But it is so difficult to get ornithologists to agree on the different call types for a species. And this is for eastern backyard birds, let alone the Neotropics or somewhere else. So I feel like there’s actually a lot of room for ornithologists and ecologists and biologists to make some agreements on what things are, and then we can figure out how to bring those in too.

THOMAS:
I’m with you. All right, I want to give a chance for some more audience questions, if there’s anybody in the room or on Zoom. I think there was a hand up over here earlier that we didn’t get to.

AUDIENCE:
Thank you so much for this evening. The question I was going to ask was already answered, so I have a new question. I have all your programs on my phone, and I use them almost every week. But every time I open a program, I never pay you a penny. Zero. And I hear you continue to improve the programs, continue developing them. So what’s the mechanism that keeps this sustainable?

GRANT:
Yeah. So both apps are powered by donations. If you haven’t gotten a little donate button, I’d be pretty surprised. iNaturalist is getting those, a little too frequently, I feel like. I’m like, I’m working on you right now, stop asking me to donate. But yeah, the Lab of Ornithology has a long tradition of fostering a big donor base, and iNaturalist is now starting to do that. iNaturalist was actually owned by National Geographic and the Cal Academy of Sciences for a long time, so those institutions were footing the bill for the software development and server storage. And we do work with industry. iNaturalist stores all of its photos and audio on the Amazon cloud, and Amazon picks up that bill, sort of free of charge. So there are collaborations, and we find opportunities to get money and cut deals. But we could always use more resources.

AUDIENCE:
I think next time I use the program, I’ll make sure to enter a donation. Thank you.

THOMAS:
Yeah, we should have put a QR code. Any other questions? Yeah, in the back there.

AUDIENCE:
So, I know you touched on this a little before, and I know Cornell is doing this quite a bit, but I want to talk for a second about aggregating data from individual bird observations, and what that can tell you about, for example, how climate change is changing the composition of the biome, including birds: what kind of competitive exclusion you get between different bird species. It seems to me that it’s a really rich area to explore by aggregating data from the millions of people who are out birding.
Like I said, Cornell is doing some of this, but I don’t know to what extent.

GRANT:
Yeah, it’s a great question. Okay, so
sometimes this comes as a shock to people, but Merlin right now is kind of a black hole. You run Sound ID or photo ID on Merlin, and Cornell does not get that data back; we keep that data local to you. The only data we’re really getting is if you then do an eBird checklist and let us know what you saw. We’re currently working through changes to Merlin that would allow you to save detections back to Cornell. And as soon as we flipped the switch, so to speak, on a system like that, we would probably create a citizen science database larger than anything we’ve ever built, in a matter of weeks.
We would quickly surpass eBird and we’d quickly surpass iNaturalist.
So we’re currently thinking through what data we really want from users and how to store it efficiently, because we’re not Google; we can’t just store everything indefinitely. The eBird team is super excited to have a data set like this, specifically for what you just mentioned: this aggregation of human-generated checklists and what we’re calling machine-generated acoustic checklists. And then you can bring things like iNaturalist into the aggregation too. People are already pursuing the kinds of science questions you’re asking. We were actually working with some researchers in Ecuador earlier this year who have been doing long-term manual studies of a few sites in the Ecuadorian Amazon.
And it’s pretty humbling, the changes that they’re seeing, right? Even in these pristine forests that haven’t necessarily been touched directly by humans, the impact of climate change is absolutely felt. I mean, there was the 2019 report that North America has lost 3 billion birds, and that’s just continued over the last five years. And we’re seeing this in these, again, pristine, untouched forests. For a long time it felt like our best strategy was, okay, we’ll just carve off sections of land and not touch them, but that doesn’t seem like it’s going to be enough if we want to maintain some of these bird populations.

THOMAS:
Any other audience questions, or does anybody on Zoom have something they’d like to ask?

AUDIENCE:
So I have a question. Thank you. I actually didn’t realize that when you use Merlin, those recordings are not going anywhere. So, am I keeping them on my phone? Is that what’s happening?
And at some point, do they fill up my memory?

GRANT:
They do.

AUDIENCE:
Is that what’s slowing down my phone? Like, do I have to go in there and start deleting a bunch of these recordings?

GRANT:
If your phone’s getting slow, yeah, you can go into Merlin and clear out all those recordings. In some sense, we knew Sound ID was going to be really cool.
I don’t think we quite realized how much people were going to like it.
So from the beginning, we just never built in the mechanisms to get that data back off your phone. Up until now, five years into the project, we’ve basically been playing catch-up. Coming back to your earlier question, all of our resources have gone into keeping the program running and expanding the core Sound ID model. Now we’re at a point where we can revisit that: okay, let’s get this data back off of people’s phones if they want, and make it very obvious, hey, you’re going to share these detections back to Cornell. Let people opt into that, and then start aggregating all that information. But right now, I’ll admit it:
because the audio takes up so much space, I often use Sound ID and then just cancel out of it rather than stopping it properly. So we’ve trained all these weird human behaviors because we hadn’t thought this through. But yeah, we’ll work on that.

AUDIENCE:
Thanks so much for being here. I know it’s really annoying to ask app developers, hey, what about this? What about this? But ecology would be fundamentally changed if we could do individual ID. So I’m curious what you think about the future of individual ID, maybe not from Merlin, but from images in particular.

THOMAS:
And can you define that for the audience?

GRANT:
Yeah, so this is closely related to facial recognition with humans. With humans, we can recognize who you are by your face, and it would be really nice if we could do this with other species through different modalities, whether photos or acoustics. For species that have unique patterns, like snow leopards, or whales with the marks on their flukes, individuals can be readily identified.
For birds, there have been studies; it’s very difficult to do visually.
As a machine learning researcher, I look at situations where humans can definitely do a problem, and then I think, okay, a machine’s got a chance.
It might not succeed, but so far the track record has been: if humans can do it, we can make a machine that can do it.
If humans can’t do something, that’s not usually the problem I try to crack, just because there are many other problems where I feel like we can make more progress. So birds visually, maybe. We can definitely pick up the banding tags. The cool thing acoustically, though, is that some researchers in Pennsylvania are finding, for Ovenbirds, for example, that they’ve been able to re-ID individuals based on some of their song characteristics. So that actually makes me kind of excited that maybe we can do something with audio, again, more as a researcher-focused tool as opposed to a broad consumer-facing thing. The eBird team is actually who’s been knocking on my door about this, because with this Merlin acoustic-checklist idea, what we would give eBird is a presence-absence list, but what they really want is counts. They’re like, how many cardinals did this person have here? If we could do individual ID, or at least distinguish individuals, that would get us a bit closer, which would be pretty cool.
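The gap between a presence-absence list and the counts eBird wants can be shown with a toy aggregation. The detections and individual labels below are made up; the re-ID step that would produce such labels is exactly the open problem being discussed:

```python
# Toy illustration of presence/absence vs counts from acoustic detections.
# A presence/absence list only needs the set of species detected; counts
# require attributing detections to individuals. The detections and the
# individual labels here are hypothetical.

detections = [  # (species, hypothetical individual label from re-ID)
    ("Northern Cardinal", "cardinal-A"),
    ("Northern Cardinal", "cardinal-A"),   # same bird singing again
    ("Northern Cardinal", "cardinal-B"),   # a second, distinguishable bird
    ("Ovenbird", "ovenbird-A"),
]

# Presence/absence: what an acoustic checklist can provide today.
present = {species for species, _ in detections}

# Counts: distinct individuals per species, only possible with re-ID.
individuals: dict[str, set[str]] = {}
for species, individual in detections:
    individuals.setdefault(species, set()).add(individual)
counts = {species: len(ids) for species, ids in individuals.items()}

assert present == {"Northern Cardinal", "Ovenbird"}
assert counts["Northern Cardinal"] == 2  # two individuals, not three detections
```

With only the species column available, the `present` set is the best possible output; the counts need the second column, which is what individual ID would supply.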

THOMAS:
That’s a better app pitch for individual ID than the one I recently gave Grant, which was an astrology app that gives you a horoscope based on your Merlin observations. We’ve just got time to wrap up here.
I want to ask one last question: in just a couple of words, what are you most proud of from the work you’ve done? What’s an outcome that you’re really pleased with?

GRANT:
It always makes me so happy when people come up and say how much Sound ID has changed the way they experience the outdoors. As soon as I say that, though, I’m quick to tell them that I was one part of a team that made that system. So I’m really happy that Sound ID has turned into this really cool project. But the other thing I’m really proud of is that we keep putting together really cool teams to solve these problems, and then we get to work with amazing communities, whether it’s iNaturalist or the birding and eBird community. That’s just really nice. It’s really fun.

THOMAS:
What a good note to end on. Thank you so much, everybody. We’re really appreciative of Grant. We’re appreciative of all of you making the time on a cold, dark November evening. It’s felt like evening for hours and hours now already. Please feel free to stick around for a little bit. Grant, if you’re able to linger, then people can come up and ask you some questions. Please eat some pizza, take it with you. And I want to, again, remind all of you to please check out our Science Stories website. You can subscribe to our events listserv and our blog listserv. They’re separate listservs, so if you’re interested in both, go ahead and subscribe to both. Please also fill out our post-event survey.
This will also be emailed to you after the event if you registered, and it’s really helpful for us in improving how we host these cafes.
You’ll also have the chance to suggest science topics you’d like to see us cover that we haven’t covered yet. Our next event won’t be until the spring semester, after the new year, as we’re taking a winter break. So please stay in touch by checking out our website and our blog. Please join me again in thanking Dr. Grant Van Horn for this great interview.

GRANT:
I have a few stickers as well, if folks want some Merlin or eBird stickers. I ran out of iNat stuff, sorry.

THOMAS:
So please come up and get some stickers and get home safe, everybody. Thank you so much.