Old MacDonald

Harys Dalvi

July 2021

Does a dog say bark bark, arf arf, ruff ruff, or woof woof? Perhaps the answer is none of the above; Spanish speakers might argue for guau guau. In Chinese, dogs say 汪汪 (wāng wāng). In Hindi/Urdu dogs say भौंक भौंक / بھونک بھونک (bhaunk bhaunk). Who's right?

The imitation of animal sounds has fascinated humanity for ages. Players of the Australian didgeridoo are not concerned primarily with pitch, but instead traditionally use the instrument to imitate animal sounds and other sounds of nature [1]. In American culture, the most famous example of this fascination is the nursery rhyme "Old MacDonald Had a Farm". It is often used to teach kids what sounds animals make: sheep go "baa baa", cows go "moo moo", cats go "meow meow". But sometimes the sound of an animal doesn't become a part of our culture. Norwegian duo Ylvis took advantage of this in "What Does The Fox Say?", which went viral on YouTube.

Imitating animal sounds may be common across cultures and across time, but the actual imitations are far from universal. There is incredible variation between languages, and even within languages: "bark bark" versus "woof woof" is a good example. I decided to use acoustic phonetics to determine the most accurate onomatopoeias for animal sounds and other sounds, removing cultural bias. First, I'll briefly go over my methodology, some principles behind it, and limitations. Then comes the big reveal: I'll show my conclusions for which human speech sounds come closest to the source sounds. Finally, I'll go over my thought process for each sound in detail, in case you're interested or you disagree.

Methodology

My method was based on spectrograms. These are graphs that show time on the x-axis, and the various frequencies of a sound at that time on the y-axis. Color represents amplitude, with darkness towards black indicating higher amplitude at a given time for a given frequency. They are created using the Fourier transform, which can be used to represent a function as the sum of many sine waves with different amplitudes and frequencies.
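As a rough illustration of that idea, here is a minimal sketch that builds a tiny spectrogram by cutting a signal into short overlapping frames and taking a discrete Fourier transform of each frame. It is not Praat's implementation: the function names, the Hann window, the frame size, and the hop length are all assumptions chosen for readability, and a real tool would use a fast Fourier transform rather than this slow direct sum.

```python
import cmath
import math


def hann(n):
    """Hann window: tapers each frame so its edges do not smear the spectrum."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Amplitude of each sine-wave component from 0 Hz up to half the sample rate."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        magnitudes.append(abs(total))
    return magnitudes


def spectrogram(samples, frame_size, hop):
    """One column of magnitudes per time step (x-axis), one row per frequency bin (y-axis)."""
    window = hann(frame_size)
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        columns.append(dft_magnitudes(frame))
    return columns


# Hypothetical test signal: a pure 200 Hz tone sampled at 1600 Hz.
SAMPLE_RATE = 1600
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(256)]

spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)
peak_bins = [column.index(max(column)) for column in spec]
peak_hz = [b * SAMPLE_RATE / FRAME_SIZE for b in peak_bins]
print(peak_hz)  # the strongest frequency in each time slice
```

Each bin is `SAMPLE_RATE / FRAME_SIZE` Hz wide, so in a plot the darkest cell in every column would sit at the tone's frequency.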

I started trying to make my own program to record and analyze sounds, and to make spectrograms. My spectrograms looked like this:

It's a spectrogram, but not a great one. Then I found the software Praat [2], which is designed for this sort of thing. I used Praat for most of this project.

Of course, I also had to figure out how to look at this complicated chart and come up with human speech. For this, I relied on Rob Hagiwara's page "How to read a spectrogram" [3] and other pages. I learned how to use a spectrogram of human speech to decode (to some extent) what's being said in the recording, without listening to the recording.

I wondered if I could apply the same ideas for decoding human speech to decoding other sounds as if they were human speech. I analyzed each spectrogram and made my best guesses for what human sound they might represent if they came from a human. I also listened to the recordings to help guide me, because reading spectrograms from humans can be very difficult; reading spectrograms and pretending they're from humans is even more difficult. With this, I was able to put together ways for humans to imitate various sounds.

I am far from an expert in acoustic phonetics, so there are probably better interpretations than what I came up with. This is especially true because the online resources I used almost entirely discussed reading spectrograms of English speech. I tried to be as unbiased as possible, but I'm sure my answers would've been different if I were more knowledgeable about reading spectrograms in a diverse set of languages. Maybe one day I'll revisit this page and see if I would change anything with my new knowledge.

The sounds I looked at are often very different from human speech, so deciding which human sound to use was a bit of a judgement call at times. There are probably other answers that are equally valid or more valid given the same data.

Reading into sounds looking for human speech where there is none is difficult, but I am somewhat satisfied with most of my answers. If you are unsatisfied, and you know about acoustic phonetics (or are just curious), you can read my detailed analysis where I explain my reasoning for each sound.

Results

Here's what I came up with. Rhymes are American English unless indicated otherwise.

Sound	Imitation (IPA)	English approximation	Rhymes with
Bird (chirp)	[i˥˧w]	"eew" /iw/	Seem
Cat	[ɲaːw]	"neow" /njaʊ/	Meow
Cow	[nwɹ̩ː]	"nwer" /nwɝ/	Twirl
Dog	[ɹʊw]	"row" /ɹow/	Sow
Donkey	[i˧˥˧ a˩˩]	"eeh ah" /i ɑː/	See bar
Duck	[wæ̃ː]	"wan" /wæn/	Fan
Elephant	[ɹ̩ː˥˦]	"rerr" /ɹɝ/	Her
Fly buzzing	[zː]	"zoo" /zu/	Zoo
Frog	[oː˩˩ o˨]	"ohwa" /oʊ.ˈwə/	Show uh
Horse (neigh)	[hɛ˧˥˧]	"heh" /hɛ/	Let
Horse (gallop)	[ǃ ǃ]	"ta ta" /tɑː tɑː/	Ha ha
Monkey	[wʊ˩˨ haː˦˥˦]	"woo ha" /wu hɑː/	Shoe mart
Owl	[uh uh]	"ooh-hoo" /u hu/	Woo-hoo
Sheep	[æː˧˦˧]	"aah" /æ/	Add
Snake	[sː]	"siss" /sɪs/	Hiss
Whale	[uː]	"ooh" /uː/	You
Wolf (howl)	[hɹuːw˦˥˦]	"huroo" /hɝuː/	Fir two
Ambulance	[ɹãːɹ˧˥˧]	"rahn" /ɹɑːn/	Palm
Bamboo flute	[ɒː]	"au" /ɔː/	Thought (British RP pronunciation)
Metal flute	[ɔː]	"au" /ɔː/	Thought (British RP pronunciation)
Gong	[ksː]	"kuss" /kʊs/	Book
Guitar	[zu]	"zoo" /zu/	Zoo
Piano	[du]	"doo" /du/	Zoo
Sitar	[ʒu]	"zhoo" /ʒu/	zh like s in measure, oo as in zoo
Violin	[ɔː]	"au" /ɔː/	Thought (British RP pronunciation)

Detailed Analysis

Duck

I'll start with the duck sound, because I think it was easy yet instructive. This is also the sound I'll explain in the most detail so you can understand my method. Later on I describe my thoughts in decreasing detail for reasons of brevity and laziness.

First, here is the spectrogram for a duck noise.

I've labeled the fundamental frequency in green, and the formants F1/F2/F3 in orange/blue/yellow. I will continue this color scheme. The fundamental frequency (F0) seems to be about 170 Hz and doesn't change much. It's important to note F0 because the distance between F0 and F1 seems more important than the frequency of F1 in perception of vowel quality [4]. Similarly, the distance between F2 and F1 seems more important than the frequency of F2 [5]. I labeled these distances in red.
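The distances described here can be written down as a small calculation. This is only a sketch: the F0 of 170 Hz comes from the text, but the F1 and F2 values are hypothetical placeholders, not measurements from the duck recording. The conversion to the Bark scale uses Traunmüller's approximation, which is an added assumption, since the article does not say which scale the distances were judged on.

```python
def hz_to_bark(f_hz):
    """Traunmüller's approximation of the Bark auditory scale (added assumption)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def formant_gaps(f0, f1, f2):
    """Distances between F0 and F1, and between F1 and F2, in Hz and in Bark."""
    return {
        "F1-F0 (Hz)": f1 - f0,
        "F2-F1 (Hz)": f2 - f1,
        "F1-F0 (Bark)": hz_to_bark(f1) - hz_to_bark(f0),
        "F2-F1 (Bark)": hz_to_bark(f2) - hz_to_bark(f1),
    }


# F0 = 170 Hz is from the text; the formant values below are hypothetical.
start_of_sound = formant_gaps(f0=170, f1=700, f2=1000)  # F2 close to F1
end_of_sound = formant_gaps(f0=170, f1=700, f2=2200)    # F2 has risen

for label, gaps in (("start", start_of_sound), ("end", end_of_sound)):
    print(label, {name: round(value, 2) for name, value in gaps.items()})
```

With these placeholder numbers the F1-F0 gap stays fixed while the F2-F1 gap grows, which is the kind of change the red distance markers are meant to track.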

Note the general shape of this sound. F1 and F3 stay roughly constant, while F2 starts near F1 and rises to near F3. Also look at the red dotted lines between F1 and F2. There doesn't seem to be much noise in this area, hinting at a nasal, but I'm not really sure since there are a lot of empty dots that make it hard to read. Anyways, a nasal sounds right to me when I listen to the recording, and a duck's beak is basically a mouth and a nose together so a nasal sound makes sense.

Let's try "quack quack" and see how it compares.

This is a recording of me saying "quack quack", and I clearly do not sound like a duck. In the regions marked A and B, you can see bursts from the "q" /k/ in "quack". In region C, you can see some aspiration. I think there might be some aspiration after region A too, but I can't see it clearly. I'm bringing your attention to these regions because the actual recording of the duck has nothing like them. We definitely don't want a [k] sound when imitating a duck.

Thinking back to the spectrogram of the duck, F1 and F2 were close at the start, with F3 higher. This reminds me of /w/. Later, F2 rose, becoming something like the /æ/ you can see in the "quack quack" recording. So maybe try [wæ̃ wæ̃] instead of "quack quack"?

Here you can see the overall shape is quite close to the actual duck: F2 rises from near F1 to near F3. The big difference is that F2 falls again. I think this is because I said [wæ̃ wæ̃] continuously, like [wæ̃wæ̃], instead of separating it into two sounds. Overall [wæ̃ wæ̃] seems great, much better than "quack quack". The only problem is that English doesn't have nasal vowels. I decided to use /wæn wæn/ to give the /æ/ a bit of a nasal quality.

This is pretty good too. So duck is complete.

Sheep

In the recording of the sheep, there was a notable rise and then fall in pitch. I decided to start with a narrow band spectrogram to tell where F0 was and how it was changing so I could find F1 and isolate the effects of moving formants from changing pitch.

The change in pitch is clear from this spectrogram. Then I switched to a wide band spectrogram to continue analysis.
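The difference between the two kinds of spectrogram comes down to the length of the analysis window. A rough rule of thumb is that a window lasting T seconds can separate frequencies about 1/T Hz apart. The sketch below uses that rule; the 30 ms and 5 ms window lengths and the 170 Hz pitch are illustrative assumptions, and the exact constant depends on the window shape.

```python
def frequency_resolution_hz(window_seconds):
    """Rule-of-thumb frequency resolution of an analysis window (about 1 / duration)."""
    return 1.0 / window_seconds


def resolves_harmonics(window_seconds, f0_hz):
    """Harmonics sit F0 apart, so they only show up as separate lines
    when the resolution is finer than F0."""
    return frequency_resolution_hz(window_seconds) < f0_hz


# Assumed, typical-looking values; not settings taken from the article.
NARROW_BAND_WINDOW = 0.030  # long window: fine frequency detail, blurry timing
WIDE_BAND_WINDOW = 0.005    # short window: sharp timing, harmonics merge into formant bands
F0 = 170.0

for name, window in (("narrow band", NARROW_BAND_WINDOW), ("wide band", WIDE_BAND_WINDOW)):
    print(name,
          round(frequency_resolution_hz(window), 1), "Hz resolution,",
          "harmonics visible" if resolves_harmonics(window, F0) else "harmonics blurred")
```

This is why the narrow band view is handy for following pitch, while the wide band view makes the broader formant bands easier to pick out.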

Here I have marked higher formants just in case. I notice that all the formants point downwards at the beginning and the end. At first I thought this might indicate a bilabial, making sheep sound something like /bæb/. Then I remembered this is probably just because of the change in pitch, and it's the gap in formants that really matters. The gaps stay fairly constant, except F2 moves farther from F1 in the middle, indicating a less rounded/more front vowel. I tried saying something like [äæä] for this, but it was too hard, so I just chose [æː˧˦˧].

Dog

This is a common one, but one that has produced an infamous amount of disagreement across and within cultures. From the spectrogram, I could see why.

Region A is fine: there are three low formants, so I'm thinking [ɹ]. Region B is extremely difficult to read; there's just sound everywhere. Then region C seems to have just one formant, with everything above slowly fading out.

For B, I thought that since there is sound across many frequencies, maybe that can be interpreted as a large number of formants all close together. This would make a low F1 and a low F2, so that's a high back rounded vowel: [u]. Then region C seems to continue region B in some ways, but it gets softer, more like a consonant. I picked [w] here. So that's [ɹuw], but when I said that I felt like the [w] didn't shine as a separate region from the [u]. I changed it to [ɹʊw] to fix that.
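The rule of thumb used here (low F1 and low F2 suggest a high back rounded vowel) can be sketched as a nearest-neighbor lookup in F1/F2 space. The reference values below are rough textbook-style averages for adult male American English speakers; they are an assumption for illustration and were not measured for this article. The sketch also ignores F0 and F3 and compares raw Hz values, which is a simplification of the reasoning in the text.

```python
import math

# Approximate (F1, F2) values in Hz; assumed reference points, not article data.
REFERENCE_VOWELS = {
    "i": (270, 2290),
    "ɛ": (530, 1840),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "ɔ": (570, 840),
    "ʊ": (440, 1020),
    "u": (300, 870),
}


def nearest_vowel(f1_hz, f2_hz):
    """Return the reference vowel whose (F1, F2) point is closest to the measurement."""
    return min(
        REFERENCE_VOWELS,
        key=lambda vowel: math.dist((f1_hz, f2_hz), REFERENCE_VOWELS[vowel]),
    )


# Hypothetical measurements: low F1 with low F2, low F1 with high F2, high F1 with high F2.
print(nearest_vowel(320, 900))
print(nearest_vowel(280, 2250))
print(nearest_vowel(700, 1700))
```

A lookup like this only gives a starting guess; listening to the recording is still needed to choose between close neighbors such as [u] and [ʊ].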

Cat

Like the sheep, I heard changing pitch in this one. I checked that with a narrow band spectrogram.

There's a modest drop in pitch here, and also on the main spectrogram:

The beginning of the spectrogram definitely looks like a nasal, but I couldn't find any evidence to support [m] in particular over any other nasal. So I was thinking [n], but I also heard a very strong [j]-like sound in the recording, just like "meow". I thought it might be cultural bias but it wouldn't go away. So I decided on [ɲ]. Then there's a vowel with both F1 and F2 fairly high, so a front open vowel like [a]. Finally, F3 becomes extremely weak. This sounds a lot like a [w] to me somehow. I found some evidence for that in Rob Hagiwara's May 2007 mystery spectrogram [6], in which the F3 becomes weak in the /ʊ/ of /oʊ/. So I went with [ɲaːw].

Monkey

As a fellow primate, I was excited to see if a monkey spectrogram looked more human than some of the other animals. Indeed it did.

There are actually two different sounds here. First is the "ooh" of "ooh ooh ah ah", repeated three times; then come multiple "ah"s. First the "ooh".

There isn't much going on here. The F1 and F2 are fairly close together, and it's hard to see a clear F3. The beginning looks a little like a consonant though. After listening to the recording, I picked [wʊ].

Here there seem to be three sounds. The first one looks a little like a fricative, then there's a pause; the second one looks like a nasal or approximant maybe? And then the last is clearly a vowel. The first fricative had similar formants to the rest, so I classified it as [h]. I wasn't sure what to make of the second sound, and I couldn't hear much of it in the recording, so I ignored it. Finally, the vowel has high F1 and F2, so I chose [a].

Horse (neigh)

This might have been the hardest sound, and it's the one I'm least satisfied with.

We can see (and hear) wild variations in pitch, making it hard to see changes in the formants relative to each other and F0.

Also I see that F2 is quite weak, and I'm not sure what to think about that. The three formants are quite evenly spaced, and follow a fricative with similar formants. For the fricative I picked [h], and for the vowel I thought it looked a little like [ɛ] but I'm not sure. This needs an expert.

Horse (gallop)

Compared to the neigh, this wasn't too bad.

This looks a lot like an alveolar click, as seen in the Yeyi language (Fulop et al. 2003) [7]. Alveolar clicks are also used by speakers of non-click languages like English to imitate horses galloping ("clip-clop") so this looks good. However, since English isn't a click language, I chose "ta ta" for the English approximation. Like "ta ta", this sound has a strong burst ([t]), and then I just picked a vowel to make it possible to say.

Donkey

Unlike the horse, this was surprisingly humanlike, like a monkey. Checking the narrow band spectrogram, I can see that there are two distinct pitches, one much higher than the other.

Based on this, I can find F0 and I know the formants must be above F0 in each region.
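For comparison, F0 can also be estimated numerically instead of being read off a narrow band spectrogram. The sketch below uses a plain autocorrelation search, which is a different technique from the one used in this article. The synthetic tones, the 8000 Hz sample rate, and the 75 to 500 Hz search range are all assumptions; a real donkey recording is far noisier than a pure sine wave.

```python
import math


def estimate_f0(samples, sample_rate, min_hz=75.0, max_hz=500.0):
    """Estimate pitch as the lag at which the signal best matches a shifted copy of itself."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def tone(freq_hz, n_samples, sample_rate):
    """Synthetic pure tone standing in for one region of a recording."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


SAMPLE_RATE = 8000
high_region = tone(400, 400, SAMPLE_RATE)  # hypothetical high-pitched region
low_region = tone(200, 400, SAMPLE_RATE)   # hypothetical low-pitched region

print(estimate_f0(high_region, SAMPLE_RATE), estimate_f0(low_region, SAMPLE_RATE))
```

Once each region has an F0 estimate, any band at or below that frequency can be ruled out as a formant.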

The low pitch region has F1 a fair distance above F0, which is easier to see in the narrow band spectrogram. Then F2 is clearly high above F1. So this is a low, front, unrounded vowel: [a]. For the high pitch region, all the formants are high. The high F2 makes it sound like [i] to me, although the F1 is a little high for that. I went with my ear.

Elephant

All the formants are low, so this looks like a syllabic R.

Frog

It's hard to see from the graph, but the second region is audibly higher in pitch than the first. Also, I don't see a clear third formant. F1 is low but with some gap between it and F0, so this is probably a mid vowel. The distance between F1 and F2 looks similar to [o] in Rob Hagiwara's guide [3]. So that's what I chose.

Fly buzz

This is clearly a voiced fricative. I chose [z], although there is a little too much noise at lower frequencies. I couldn't find a better match though.

Bird chirp

At first I wondered if there was only one formant high above the fundamental frequency.

Then I wondered if the ear interpreted this as F1 and F0 being extremely close together, with a high F2. Listening to the recording, this made sense to me. Towards the end of the chirp, the high formant disappears, and the low region becomes perhaps a little wider, maybe making room for the ear to interpret a new low F2.

I interpreted the main part of the chirp as [i], and the final consonant-looking part as [w].

Owl

First I looked at the narrow band spectrogram.

Looking at the main spectrogram in this context, it looks like there might be a low F1, almost touching F0. It also looks like there is a weak F2 right above the F1. This all points towards [u].

But there is also a section after each [u] that looks more like a fricative. It seems to have similar formants to the [u], so I considered it [h]. When multiple hoots are together, the [uh uh uh] can sound like [u.hu.hu], making the "hoo hoo" that we are familiar with.

Cow

The beginning part definitely looks like a nasal, but I couldn't find anything pointing to the [m] of "moo". After this there is a time with low F1 and F2 but high F3, like [w]. Then as F1 and F2 rise, the F3 is lowered relative to them, making a syllabic R.

Wolf howl

At first, F3 isn't really visible (or maybe is the very high stripe), and F2 is low. The beginning looks a little like a fricative, with a lot of noise. Then the F1 and F2 stay similar, but a new low F3 is more clear. This looks like something rhotic, similar to the beginning of the dog barking. Then F2 fades out, leaving what the ear might interpret as an F2 low and blending with the more prominent F1. This is [u]. Finally the F2 and F3 are visible again, with the F2 near the F1 but the F3 much higher, like [w].

Whale

At first this looked a little like a high F1 above a thick band of F0.

Listening to the recording, I thought it made more sense to interpret this as an extremely low F1 and F2 both blending with F0, similar to [u].

Ambulance siren

The formants alternate between all being close like an R, and spreading out like an open vowel, maybe [a]. Also during the vowel section, there is almost no noise between formants, like a nasal.

Piano

It looks like there is a voiced burst at the beginning. After this, the F2 stays close to F1 while F3 dies off, like [u]. The burst looks like it has some high noise, like an alveolar. So I picked [d].

Guitar

This is somewhat similar in structure to the piano, but a lot more complicated, with formants extending to much higher frequencies at the beginning. So I swapped the stop consonant for a voiced fricative to account for the extremely high frequencies.

Sitar

This is similar to the guitar, except the sound lingers a lot longer, and the higher frequencies don't fade out as quickly. The thick region near the bottom but above the bottom formants reminded me of [ʒ] from Rob Hagiwara's guide.

Indian bamboo flute (bansuri)

The F1 is quite high, so this seems to be an open vowel. The F2 is quite low, indicating a back rounded vowel. Interestingly, this reminds me of "om" but without the final nasal, as if the flute is part of a meditation.

Western metal flute (concert flute)

Here the F1 isn't quite as high, so the vowel is more closed. The F2 is still low, so it is a back rounded vowel.

Gong

There is a burst at the beginning like a stop consonant, and then what looks like an unvoiced fricative including high frequency noise.

Violin

This is surprisingly similar to the metal flute. The higher formants are more prominent than the lower ones in comparison to the flute, but I don't know how this affects vowel perception so I considered it to be the same sound.

Fox

The secret of the fox is an ancient mystery. We will never know.

References

[1] The Didgeridoo and Aboriginal Culture (Aboriginal Art & Culture)

[2] Praat (Paul Boersma and David Weenink, University of Amsterdam)

[3] How to read a spectrogram (Rob Hagiwara, University of Manitoba)

[4] The role of F0 in vowel perception (Stockholm University)

[5] Formants Spectrograms and Vowels (University of Arizona)

[6] May 2007 Mystery Spectrogram (Rob Hagiwara, University of Manitoba)

[7] Yeyi Clicks: Acoustic Description and Analysis (Sean A Fulop, Peter Ladefoged, Fang Liu, & Rainer Vossen)

Identifying sounds in spectrograms (University of Manitoba)
Mystery Spectrogram Archive (Rob Hagiwara, University of Manitoba)

Resources

Animal Sound Effects (Wavsource)
Musical Instrument Samples (University of Iowa)
Ambulance siren - Sound effect (Suono Electronico)
Sitar C5 (Free Wave Samples)
Rattlesnake Sound Effects (Fesliyan Studios)