Created: 2022-01-03 18:51
Completed:
Type: #code #sound
Tags: #project #Supercollider #colab #python #birds #school #dxarts
## Introduction
**Description:** [[Contacts#James Coupe|James Coupe]] agreed to supervise a few credits of DXARTS 499 A: Undergraduate Research, and suggested that I investigate the possibility of generating synthesized bird calls based on image analysis of imaginary birds.
---
## Tasks
#### Catalog image analysis methods and approaches
Assigned: 2022-04-07
Due date: flexible (2 weeks?)
Status: #InProgress
##### Notes:
segmentation of image
background removal
bird feature recognition classifier
dft on individual body parts
background classifier
---
#### Extend song patterns
Assigned: #unassigned
Due date: None
Status: #deferred
##### Notes:
[THE FOUR BASIC SONG PATTERNS](https://earbirding.com/blog/archives/4598)
- **Phrases** are clusters of _unique notes_ that are _slow enough to count_;
- **Series** are clusters of _repeated notes_ that are _slow enough to count_;
- **Warbles** are clusters of _unique notes_ that are _too fast to count_;
- **Trills** are clusters of _repeated notes_ that are _too fast to count_.
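If this task gets picked up, the taxonomy could be encoded directly as synth control presets; a minimal sketch (names hypothetical, just capturing the 2×2 structure above):
```python
# hypothetical encoding of the 2x2 taxonomy above:
# repeated vs. unique notes, countable vs. too-fast delivery
SONG_PATTERNS = {
    'phrase': {'repeated_notes': False, 'countable': True},
    'series': {'repeated_notes': True,  'countable': True},
    'warble': {'repeated_notes': False, 'countable': False},
    'trill':  {'repeated_notes': True,  'countable': False},
}
```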
---
#### Adjust acoustic environment
Assigned: #unassigned
Due date: None
Status: #deferred
##### Notes:
use background information to adjust reverb, etc.
caged bird different acoustics from forest/field/etc.
distance/amplitude
---
---
## Timeline
**2022-01-03:** Initial Discussions
**2022-02-03:** Initial Meeting
**2022-02-20:** Switched to Colab
**2022-03-22:** Initial workflow completed
---
## Project Log
### Spring Quarter
#### 2022-05-09
Time Spent: 6-7 hrs
##### Trying to build a segmentation model:
11:30 I feel like I'm hopelessly behind on this project this quarter, but I suppose I'll never get anything done if I don't do anything, so:
[segmentation of image](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/images/segmentation.ipynb): beyond image classification, intended to identify the shape of the object, works by classifying each pixel of the image.
went through the TensorFlow tutorial at https://colab.research.google.com/drive/1os3MdOtgddkuonTlom6GsT2Ul7bxa6GT?usp=sharing
![[ImaginariumAvem1.png]]
12:09 identified data sets
1. [BIRDS 400 - SPECIES IMAGE CLASSIFICATION]( https://www.kaggle.com/datasets/gpiosenka/100-bird-species?resource=download)
2. [Caltech-UCSD Birds-200-2011](http://www.vision.caltech.edu/datasets/cub_200_2011/) with [segmentation data](https://data.caltech.edu/records/20097)
downloading data, then realized it's available as a TFDS: https://www.tensorflow.org/datasets/catalog/caltech_birds2011
```python
import tensorflow_datasets as tfds
dataset, info = tfds.load('caltech_birds2011', with_info=True)
```
but:
```python
NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_1hbzc_P1FuxMkcabkgn9ZKinBw1sbZS_7v82APqMiPk0wej8WzZ5MmtVy61NUy-F6708.tmp.7bf4c584fe3e49cca44d4b8e9ae222d0/uc, has wrong checksum
```
ugh. I don't know what I'm doing and can't tell if it's a bug, so I'm trying some more basic TensorFlow tutorials to get some context.
starting with [basic classification](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/classification.ipynb)
uses: numpy, matplotlib.pyplot and [tf keras](https://www.tensorflow.org/guide/keras/sequential_model)
from dataset: [fashion-mnist](https://github.com/zalandoresearch/fashion-mnist)
##### load & normalize dataset
```python
import tensorflow as tf

fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
```
produces 4 arrays:
train_images,
train_labels,
test_images,
test_labels
images are 28 by 28 pixels
to get the size and details of an array:
```python
array.shape
```
```python
from matplotlib import pyplot as plt

for x in range(100, 115):
    print(train_labels[x])
    plt.imshow(train_images[x], interpolation='nearest')
    plt.show()
```
oh, even better:
```python
# class_names is the list of 10 label names defined earlier in the tutorial
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
```
##### define layers
[Layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers)
[layers explanation](https://youtu.be/oXMEeGrAuk0)
define model in terms of layers
```python
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
```
input layer flattens the 28x28 array to 784 pixel values
then add two keras layers to the network
1. "dense" or fully connected (128 neurons)
2. returns a [[logits]] array of length 10
##### compile the model
- [_Loss function_](https://www.tensorflow.org/api_docs/python/tf/keras/losses) —This measures how accurate the model is during training. You want to minimize this function to "steer" the model in the right direction.
- [_Optimizer_](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) —This is how the model is updated based on the data it sees and its loss function.
- [_Metrics_](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) —Used to monitor the training and testing steps. The following example uses _accuracy_, the fraction of the images that are correctly classified.
```python
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```
##### Train the model
1. Feed the training data to the model. In this example, the training data is in the `train_images` and `train_labels` arrays.
a. [model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit)
b. [model.evaluate](https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate)
- [Demonstrate overfitting](https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#demonstrate_overfitting)
- [Strategies to prevent overfitting](https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#strategies_to_prevent_overfitting)
The model learns to associate images and labels.
2. You ask the model to make predictions about a test set—in this example, the `test_images` array.
1. With the model trained, you can use it to make predictions about some images. Attach a softmax layer to convert the model's linear outputs—[logits](https://developers.google.com/machine-learning/glossary#logits)—to probabilities, which should be easier to interpret.
```python
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
```
3. Verify that the predictions match the labels from the `test_labels` array.
```python
import numpy as np
predictions = probability_model.predict(test_images)
np.argmax(predictions[0])
test_labels[0]
```
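For my notes, the fit/evaluate calls from step 1 look roughly like this in the tutorial:
```python
# train on the labelled training images (10 epochs is the tutorial default)
model.fit(train_images, train_labels, epochs=10)

# then measure accuracy on the held-out test set
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:', test_acc)
```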
###### second attempt
okay, I kind of see the overall workflow now, so time to try again. [this is my error](https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror), but the URL is ok.
*hours later:* **huzzah!** the issue was with the pip version of tfds; use the nightly build instead:
```python
!pip install -q tfds-nightly tensorflow matplotlib
```
![[ImaginariumAvem2.png]]
yes! okay, so after some more massaging the [colab file](https://colab.research.google.com/drive/1os3MdOtgddkuonTlom6GsT2Ul7bxa6GT?usp=sharing) seems to be seeing the training data correctly!
![[ImaginariumAvem3.png]]
the model:
![[Pasted image 20220509180442.png]]
before training:
![[ImaginariumAvem4.png]]
but it isn't getting better with training: ![[ImaginariumAvem5.png]]
hm. all is birds. nothing is birds. very zen.
![[ImaginariumAvem7.png]]
---
#### 2022-04-23
13:12
##### [[Background extraction]]
#TAGS: #log-entry
Time Spent: 6h
**Notes:**
made a little progress on this and learned some simple OpenCV object interactions, but honestly I'm not sure this method of background extraction works on still frames, since it's looking for data that isn't changing much from frame to frame.
https://docs.opencv.org/4.x/d7/df6/classcv_1_1BackgroundSubtractor.html
https://www.geeksforgeeks.org/python-background-subtraction-using-opencv/
https://docs.opencv.org/4.x/d1/dc5/tutorial_background_subtraction.html
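For reference, the frame-based approach from those tutorials looks roughly like this (a sketch assuming a video source, which is exactly why it doesn't map onto single stills):
```python
import cv2

# OpenCV background subtraction: the model accumulates statistics over a
# sequence of frames and flags pixels that change, so a single still image
# gives it nothing to work with
cap = cv2.VideoCapture('birds.mp4')  # placeholder file name
subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)  # foreground (i.e. moving) pixels
    cv2.imshow('foreground', fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```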
#### 2022-04-07
##### Initial Spring Setup
#log-entry
**Time Spent:** 4hrs
##### Notes:
Initial meeting with James to set goals
set up Project page in [[Obsidian]]
imported content from Winter emails
### Winter Quarter
#### **3/22/22:** Coding: Refactor and DFT Sampling
I spent a good part of today cleaning up code and setting up the dft sampling. I attached four audio files generated from four bird images.
The “main” bit of code is in the section “generate dft of image, sample dft”:
```python
# folder of downloaded bird images
source_folder = '/content/drive/MyDrive/DX499/source-images'
# generate a name from the color, part, and family text files
bird_name = random_bird_name()
# choose a random image and load it as a numpy array
bird_image = random_bird_image(source_folder)
# generate a dft with numpy fft
dft_array = generate_dft(bird_image)
# normalize the dft
norm_dft_array = normalize_dft(dft_array)
# take samples from the dft, take abs value, modulo 1, round to 4 places, etc.
sample_list = sample_array(norm_dft_array, 15)
# convert the sample list to parameters for the SC synth (this could be cleaned up)
bird_dict = parameter_list_to_dict(bird_name, sample_list)
bird_list = parameter_dict_to_list(bird_dict)
# issue the SC command line instruction that generates the WAV file
generate_birdsong(bird_list)
```
The DFT sampling just takes 15 evenly spaced values from the DFT data array, normalizes each by absolute value, drops any integer part, and rounds to 4 places. It's arbitrary, but as we discussed, the goal at this point was to have a functioning workflow, and now it is technically converting an image into the control parameters and generating the wav file from that.
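Roughly what that sampling step boils down to (a simplified re-creation, not the exact notebook code):
```python
import numpy as np

# simplified re-creation of sample_array: take n evenly spaced values from the
# flattened DFT, keep only the fractional part of the magnitude, and round to
# 4 decimal places
def sample_array(norm_dft_array, n_samples=15):
    flat = norm_dft_array.flatten()
    idx = np.linspace(0, flat.size - 1, n_samples).astype(int)
    return [round(float(abs(v)) % 1, 4) for v in flat[idx]]
```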
---
#### **3/4/22:** Image DFT and Species Name Generator
I spent several hours this week on the project. I added a couple of small text files with names of colors, body parts, and some common bird families; now, given a folder of images in google drive, for each one it generates a name for the audio file:
cinnamon-throated-Grassbird
coral-tailed-White-eye
misty-rose-backed-Parrot
eggplant-rumped-Boobie
brass-toed-Shrikes
gold-toed-Berrypecker
etc.
loads the image, converts it to greyscale and produces a DFT of the image:
![[fftimage1.gif]]
![[fftimage2.gif]]
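Both steps are simple; a rough sketch of what they do (the helper names match my notebook, but the paths and file names here are placeholders, not the exact code):
```python
import random
import numpy as np
from PIL import Image

def random_bird_name(folder='/content/drive/MyDrive/DX499'):
    # pick one entry each from the colour, body-part, and family text files
    # (file names here are placeholders)
    def pick(name):
        with open(f'{folder}/{name}.txt') as f:
            return random.choice(f.read().splitlines()).strip()
    return f"{pick('colors')}-{pick('parts')}-{pick('families')}"

def generate_dft(image_path):
    # load the image, convert it to greyscale, and take the 2-D FFT with numpy
    grey = np.asarray(Image.open(image_path).convert('L'), dtype=float)
    return np.fft.fft2(grey)
```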
With a little more time I’ll sample the images in a few places to produce the fifteen decimal values and generate a sound. Then I’ll refactor the code to clean it up, split it into sensible objects/methods/functions and add a little bit of error checking (and maybe add some testing.) That will likely take us through the end of this quarter. At that point we’ll have a minimal complete process/outline for the image-to-birdsong conversion.
And then next quarter we can improve it. I’m imagining we can start looking at other options for image analysis, as you suggested, extracting the bird from the background, doing segmentations etc, and maybe look at the synthesis model in a little more depth, to extend the vocabulary of the birds to include phrases and groupings.
---
#### **2/24/22:** More research and Review (Books and Tape)
That [Alku tape + book](http://alkualkualkualkualkualkualkualkualkualku.org/pmwiki/pmwiki.php/Main/ALKU99) came today and it's good. And it's canary yellow, which is even better. I've just skimmed the book, but there are some good bits about fictional vs fictive birds, an Avian Turing Test, feedback training of birds for a Spanish birdsong competition, and an essay by the great Goodiepal on his [“Mechanical Birds”](https://youtu.be/R5qO2wFQ0wU). It also looks like one of the artists created another album of [“Wildlife Recordings of Desktop Birds”](http://www.underhund.com/anders/digitalbirds/index.html).
Also, I found a pretty good book called [The Physics of Birdsong](https://link.springer.com/book/10.1007/3-540-28249-1). I scanned [chapter 7](https://drive.google.com/file/d/14AGnsMfvUG4jA5FWJsLyMsIDamMkC5Ml/view?usp=sharing) and added it to the drive. It has a circuit diagram for an analog electronic syrinx if that’s something you are interested in.
And one more [book](https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71133679860001451) which comes with a set of 2 flexidiscs and a bunch of nice mid-sixties spectrographs of a variety of bird species.
---
#### **2/22/22**: Command Line parameters and NRT SuperCollider
happy twosday.
I made some more progress yesterday and today, and now I’m running a supercollider subprocess from python in a colab notebook that generates a wav file from the HansM/Farnell/Olofsson model using a given list or dictionary of parameters. (huzzah!)
[https://github.com/wtkns/birdsong/blob/main/birdsong.ipynb](https://github.com/wtkns/birdsong/blob/main/birdsong.ipynb)
There’s still a little refactoring to do of the Python and SC code, but it looks like I need to start looking at some image processing options.
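The Python side of that call has roughly this shape (a sketch; the script name here is hypothetical, the real code is in the notebook linked above):
```python
import subprocess

def generate_birdsong(params, out_path='bird.wav'):
    # hypothetical: hand the control parameters to the NRT SuperCollider script
    # as command-line arguments (read in sclang via thisProcess.argv), which
    # renders the wav file offline
    args = ['sclang', 'birdsong_nrt.scd', out_path] + [str(p) for p in params]
    subprocess.run(args, check=True)
```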
---
#### **2/20/22**: Supercollider in Colab
This week I successfully built supercollider and the sc3-plugins inside a colab notebook; and using it I was able to generate a wav file using command line arguments.
For SC to run on the colab hosted runtime, it can’t just be installed via apt. It must be built from source, with the graphical tools turned off. Building it initially wasn’t that difficult, but when I tried to run it, I hit a roadblock with a weird error that Google had never seen before. I opened an issue on the SC git repository, but I kept working on it and realized that if I checked out the previous “release” version branch rather than the default “develop” branch, it would run properly. Turns out this was a valid bug and they’re looking at fixing it. I’m so proud to have contributed to the open-source software movement. 😉 [https://github.com/supercollider/supercollider/issues/5731#event-6108032349](https://github.com/supercollider/supercollider/issues/5731#event-6108032349).
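For future me, the build cell looks roughly like this (from memory; the exact apt packages, cmake flags, and branch name should be checked against the SC build docs and the repo):
```python
# rough shape of the colab build cell -- check the SC build docs for the full
# dependency list and current cmake flags, and the repo for the right branch
!apt-get -qq install -y build-essential cmake libsndfile1-dev libfftw3-dev libjack-jackd2-dev libavahi-client-dev
!git clone --recursive https://github.com/supercollider/supercollider.git
# use the release branch rather than the default develop branch (see the issue above)
!cd supercollider && git checkout 3.12
# SC_QT=OFF / SC_IDE=OFF skip the graphical tools, which can't run on a headless runtime
!cd supercollider && mkdir -p build && cd build && cmake -DSC_QT=OFF -DSC_IDE=OFF .. && make -j2 && make install
```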
Cloning and building SC and the plugins takes 20+ minutes, so I also did some simple initial testing with a GCE hosted runtime. This worked as expected, with the advantage that I can retain the built SC from session to session, and the disadvantage that it costs money for as long as it's running. Not sure if I'll use it; it probably depends on whether the project ends up needing a GPU. Hosting my own “compute optimized” VM without a GPU is pretty reasonable in cost ($0.17 hourly) for a much faster (4 vCPU, 16GB) colab runtime. The cheapest GPU option starts around $0.28 hourly and gets expensive quickly from there.
Next up will be getting the NRT Server generating audio files from a python command.
---
#### **2/3/22**: Progress Meeting
meeting with James
---
#### **1/30/22**: DFT Research
Not a lot of focus this week. Spent a few hours but didn’t feel productive, hoped I would have more attention this weekend but still didn’t make much progress. sorry this update is later than usual.
I spent a little time this week going further into DFT and a fair amount of time looking for anything on image classification using DFT. I didn’t find anything particularly relevant. (though there was a paper from a Kurdish University on [“Race Classification from Face Images Using Fast Fourier Transform and Discrete Cosine Transform”](https://drive.google.com/file/d/1Cs27eltFVF6xVliLB0Xmzeq2v3hI0g4T/view?usp=sharing) ).
I’m not thinking this is really the way to go, but I’ve found some stuff on [Image hashing with OpenCV and Python](https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/) so we could create a 64 bit perceptual hash of the image and then use 4 bits of the hash for each of 15 control parameters. At least it would mean that the same image of the bird wouldn’t produce two different calls, but honestly other than that it’s not much more than “random” association.
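Something like this is what I have in mind, sketched with the imagehash package rather than the article's hand-rolled OpenCV dHash (so treat the details as assumptions):
```python
import imagehash
from PIL import Image

def image_to_params(path, n_params=15):
    # 64-bit perceptual hash of the image (phash here; the article uses dHash)
    h = int(str(imagehash.phash(Image.open(path))), 16)
    # peel off 4 bits per parameter and scale each nibble into [0, 1]
    return [((h >> (4 * i)) & 0xF) / 15.0 for i in range(n_params)]
```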
I also moved a little bit forward on SC, tested some basic NRT Code, ran it from the command line and figured out roughly how to use command line arguments with thisProcess.argv[n]; relevant code is in the github repository.
PyImageSearch. “Image Hashing with OpenCV and Python,” November 27, 2017. [https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/](https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/).
----
#### **1/21/22:** Supercollider Synthesis Model
I spent about five hours on the project this week. I spent some time with the Farnell book, began evaluating the patch more closely for porting to SC, and found some good news.
Looks like Frederik Olofsson has come through again! He helped me out on the forums when I got stuck on my SC particle system in 462, so I was already a fan. I found his SC [version](https://fredrikolofsson.com/f0blog/hansm-bird/) of a slightly different Farnell Pd patch ported from this [tutorial](https://web.archive.org/web/20151025212054/http:/www.obiwannabe.co.uk:80/tutorials/html/tutorial_birds.html). This version of the patch uses 15 control parameters and seems like it’s at least a good place to start trying to test. I added everything to the google drive.
I started a new public github [repository](https://github.com/wtkns/birdsong) for coding. In the repo I set up a venv python environment and installed the library scikit-image which has both [FFT](https://scikit-image.org/docs/0.3/api/scikits.image.transform.html#fft) and [edge detection](https://scikit-image.org/docs/dev/auto_examples/edges/plot_edge_filter.html) plus other tools. I’ll probably just work in a single development branch (simpler). I’m assuming I’ll follow a roadmap something like this:
1. Get some image fft results
2. Figure out how to simplify the results to 15 real numbers
3. Export a few of these number sets to a text file
4. Test the numbers in SC to see how they actually sound
5. If they sound “right” then figure out how to pipe OSC from Python into SC
6. Figure out how to generate the audio in SC as non-realtime (NRT) and save it as a .wav
7. Figure out the scripting to tie all the bits together. Ideally the final product will take a folder of images and output a folder of wav files.
I’m not confident enough in my understanding of Fourier analysis to have a sense yet of what step 1+2 looks like. Most of the fft image research I’ve found online is about converting an image via fft; then filtering or otherwise modifying the image in the frequency domain and converting it back. What I need to accomplish is converting an image into a unique set of 15 fractions. I’m guessing I filter the image frequency data into 15 bins and get the magnitude of each?
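A very rough sketch of that guess (untested, just to make the idea concrete; scikit-image is the library I installed in the repo):
```python
import numpy as np
from skimage import io, color

# FFT the greyscale image, split the magnitude spectrum into 15 bins, and keep
# each bin's share of the total magnitude as a fraction in (0, 1)
def image_to_fractions(path, n_bins=15):
    grey = color.rgb2gray(io.imread(path))
    mags = np.abs(np.fft.fft2(grey)).flatten()
    bin_sums = np.array([b.sum() for b in np.array_split(mags, n_bins)])
    return np.round(bin_sums / bin_sums.sum(), 4)
```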
Next week I’ll keep looking and trying to grok DFT and generate some image fft results in python. I’ve requested the following from the library based on a recommendation in the [scipy documentation](https://docs.scipy.org/doc/scipy/tutorial/fft.html#nr07). If you have a book or tutorial suggestion on DFT/FFT let me know.
Press, William H. [*Numerical Recipes: The Art of Scientific Computing*](https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71137747220001451). Cambridge, UK; New York: Cambridge University Press, 2007.
----
#### **1/14/22:** Research and Literature Review (bird-sounds)
This week was kind of low energy for me (allergies or covid, still undecided) and it’s been a busy week at work since we’re trying to pivot to a reduced staffing model at the library. I’ve spent a few hours today on the project.
I reviewed [Neurally driven synthesis of learned, complex vocalizations](https://drive.google.com/file/d/1qyZLEqlHgDef9XpE8oJRTqL_imtF9aEn/view?usp=sharing) and I did understand a few of the words that the authors used. It seems they are working toward building a functioning “neurally driven vocal prosthesis for songbirds” and I think the synthesis model is [here](https://github.com/zekearneodo/syrinxsynth). It sounded kind of sweet at first, an artificial syrinx for birds. When I realized their data was collected via bird craniotomy and implantation of “an in-house designed, printable microdrive,” I lost my personal enthusiasm for the project. But, if you need an STL model to 3D-print a gas anesthesia mask for a zebra finch, they have one of [those](https://github.com/singingfinch/bernardo/tree/master/hardware/finch_anesthesia_mask).
Not especially practical, but it seems relevant to me -- I spent a little time reading about [Messiaen’s](https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71251774490001451) exploration of birdsong, found some references to Deleuze’s opinions about Messiaen but have yet to look for the source materials, and listened to [Oiseaux Exotiques](https://youtu.be/dttUzAlDsRg) (1955-56) and a few other samples from his work. Call me a philistine, I preferred [Le Merle Noir](https://music.youtube.com/watch?v=hT8MQpg7oTo&feature=share).
More practically, I’ve installed Pd and downloaded the Farnell patch. It works, but I haven’t used Pd enough to understand exactly what I’m looking at (image attached). The full Farnell textbook arrived today at the library. I’m only onsite on Wednesday next week, so I’ll pick up the book then.
I have also looked into a web-based method for compiling Pd to C source code. Unfortunately, the patch didn’t compile right out of the box: vline~ isn’t implemented in the compiler. I switched that to line~ (naïve, but hey, maybe?) and it [compiled](https://www.rebeltech.org/patch-library/patch/Farnell_Birdcall), but it doesn’t seem to work properly. Rather than C source code it gives me .js and sysex code. The python-based [compiler](https://github.com/enzienaudio/hvcc) is also on github.
Oh, and I’ve attached the sample audio from the Fornari EA paper using the Farnell PD patch. It’s also in the shared drive. It should give you some idea of the range of the patch.
Next week I’ll try to spend some time reading Farnell and understanding the Pd model. I’ll also look to see how we can connect python to Pd, in case we can’t get the Pd to compile. Presumably osc? I’ll also look to see if there’s another better option for porting the code to make it easier to script.
I haven’t looked at the FFT side of this yet, but I’ll try to start on that as well.
----
#### **1/7/22:** Research and Literature Review (synthesis)
This week I’ve spent about 4 hours on research and documentation. I set up Zotero and a google drive to hold resources and [articles](https://drive.google.com/drive/folders/1o6pTveWFVR2g3UBCQHz2bkmfdueGxzhs?usp=sharing). There’s also a google doc for my [literature review](https://docs.google.com/document/d/1OHmvjhWQkqAmF9wMjCKmohbFrY3HhWMAgQFO9z0rczk/edit?usp=sharing). It’s just a collection of notes for now, but could be formalized later.
I’ve collected and reviewed some potentially useful sources. One source that I’m following up on is a [2012 conference paper](https://drive.google.com/file/d/16vVoKDrzE7YapvK_mjKFXH9qqwwE8NWT/view?usp=drivesdk) describing “A Computational Environment for the Evolutionary Sound Synthesis of Birdsongs” (Fornari). Fornari created a Pure Data (PD) patch which converts Twitter data (ha ha) into control parameters for birdsong synthesis based on physical modeling of the syrinx.
The synthesis model is documented thoroughly in the book _[Designing Sound (2010)](https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71185021090001451)_ (Farnell), which is available online, and the code is [here](http://mitpress2.mit.edu/designingsound/birds.asp). It has 16 control parameters describing aspects of the birdsong (e.g. beak articulation, fundamental and formant frequencies, resonance rates, etc.). The Farnell patch is “loosely” based on an earlier Csound program by Hans Mikelson. I’ve also seen a [Max port](https://github.com/unriginal/Designing-Sound-Max-Patches) of the Farnell patch, and I wouldn’t be surprised if there’s a SuperCollider version out there.
I’ve only skimmed the technical part of Fornari’s paper, but it appears each 16-tuple collection of parameter values comprises an individual “bird” object, and then the author uses an evolutionary algorithm (EA) to iteratively modify the parameters as generations of “birds” reproduce, varying the sound over time. FFT data from an image wouldn’t be any more arbitrary than a string of characters from a Twitter tweet.
The sonic results of the EA were pretty varied. Some of them sounded like birds, some sounded like R2-D2, some sounded like Bernard Parmegiani whistling underwater. There is an audio sample of the EA soundscape at this tragically spammy and [highly unsafe website link](http://www.4shared.com/audio/gEsDwkNw/soundsample.html.) (warning: I was only able to get it to work once on my ipad, and otherwise it’s just produced a miasma of link popups and probably toxic cookies).
**Moving forward,** I’d like to continue literature review and see what else is out there, I’m still not totally sold on physical modelling, even though it is low-hanging fruit. I also found a paper on “Pure-tone birdsong by resonance filtering of harmonic overtones” and _NIPS4Bplus,_ a “richly annotated birdsong audio dataset” that could possibly be used for training _SampleRNN_.
I’ll read the Farnell chapter and try to generate some sample samples. I’ll also extend my research to start including converting images to spectral domain with FFT, and see what libraries we might use for analysis. I’ll also continue tuning my zotero/goodnotes workflow.
---
#### **1/3/22:** Initial Discussions
initial discussions with [James Coupe](http://jamescoupe.com/)
---
---
## Code:
**github repository:** https://github.com/wtkns/birdsong
Open in Colab: https://colab.research.google.com/github/wtkns/birdsong/blob/main/birdsong.ipynb
---
## Meeting Notes and Agenda
### Meeting: 2022-04-07
**Participants:** me, [[Contacts#James Coupe|james]]
#### Agenda
* Review Progress from Winter
* Outline direction for spring
#### Notes
possible directions:
##### generate or modify from descriptions of sounds
DALL-E
DALL-E 2
(vocabulary?)
how to describe the song?
bird song descriptions (in words?)
learn relationship between audio and text
describe and then generate the song
gan (text to sound)
##### compare sound output for multiple images
image features > sound
circular image vs rectangle?
##### morphology
image features (what counts?)
* zoom in on features
* long beak?
* Bird size?
* other features?
---
### Meeting: 2022-02-23
**Participants:** me, [[Contacts#James Coupe|james]]
#in-person #Raitt
#### Agenda
* review progress
* outline direction and goals
#### Notes
Goal: a minimum viable workflow to take in an image and generate a birdlike sound
---
## Resources
### Contacts
### Links
### Bibliography
Press, William H. [*Numerical Recipes: The Art of Scientific Computing*](https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71137747220001451). Cambridge, UK; New York: Cambridge University Press, 2007.
Arneodo, Ezequiel M., Shukai Chen, Daril E. Brown, Vikash Gilja, and Timothy Q. Gentner. “Neurally Driven Synthesis of Learned, Complex Vocalizations.” _Current Biology_ 31, no. 15 (August 9, 2021): 3419-3425.e5. [https://doi.org/10.1016/j.cub.2021.05.035](https://doi.org/10.1016/j.cub.2021.05.035).
Beckers, Gabriël J. L., Roderick A. Suthers, and Carel ten Cate. “Pure-Tone Birdsong by Resonance Filtering of Harmonic Overtones.” _Proceedings of the National Academy of Sciences of the United States of America_ 100, no. 12 (2003): 7372–76.
Berliner Philharmoniker. _Messiaen: Oiseaux Exotiques / Uchida · Rattle · Berliner Philharmoniker_, 2014. [https://www.youtube.com/watch?v=dttUzAlDsRg](https://www.youtube.com/watch?v=dttUzAlDsRg).
Chapin, Keith, and Andrew H. Clark. _Speaking of Music: Addressing the Sonorous_. US, UNITED STATES: Fordham University Press, 2013. [http://ebookcentral.proquest.com/lib/washington/detail.action?docID=3239827](http://ebookcentral.proquest.com/lib/washington/detail.action?docID=3239827).
Rebel Technology. “Compile Pure Data Patches with Free Online Heavy Compiler,” September 12, 2018. [https://www.rebeltech.org/2018/09/12/compile-pure-data-patches-with-free-online-heavy-compiler/](https://www.rebeltech.org/2018/09/12/compile-pure-data-patches-with-free-online-heavy-compiler/).
Rebel Technology. “Compiling a Pd Patch - OWL,” November 13, 2017. [https://community.rebeltech.org/t/compiling-a-pd-patch/895](https://community.rebeltech.org/t/compiling-a-pd-patch/895).
Dingle, Christopher. “A Catalogue of Messiaen’s Birds.” In _Messiaen Perspectives 2: Techniques, Influence and Reception_, 133–66. Routledge, 2013. [https://doi.org/10.4324/9781315595061-19](https://doi.org/10.4324/9781315595061-19).
Fornari, José. “A Computational Environment for the Evolutionary Sound Synthesis of Birdsongs.” In _Evolutionary and Biologically Inspired Music, Sound, Art and Design_, 96–107. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, n.d. [https://doi.org/10.1007/978-3-642-29142-5_9](https://doi.org/10.1007/978-3-642-29142-5_9).
Kahrs, Mark, and Federico Avanzini. “Computer Synthesis Of Bird Songs And Calls.” _The Journal of the Acoustical Society of America_ 111 (December 2, 2001). [https://doi.org/10.1121/1.4778131](https://doi.org/10.1121/1.4778131).
“Kinematics of Birdsong: Functional Correlation of Cranial Movements and Acoustic Features in Sparrows | Journal of Experimental Biology | The Company of Biologists.” Accessed January 5, 2022. [https://journals-biologists-com.offcampus.lib.washington.edu/jeb/article/182/1/147/6508/Kinematics-of-birdsong-functional-correlation-of](https://journals-biologists-com.offcampus.lib.washington.edu/jeb/article/182/1/147/6508/Kinematics-of-birdsong-functional-correlation-of).
medici.tv. “Olivier Messiaen, The Crystal Liturgy.” Accessed January 14, 2022. [https://edu-medici-tv.offcampus.lib.washington.edu/en/documentaries/olivier-messiaen-the-crystal-liturgy/](https://edu-medici-tv.offcampus.lib.washington.edu/en/documentaries/olivier-messiaen-the-crystal-liturgy/).
Philharmonia Orchestra (London, UK). _Olivier Messiaen 1908-1992: Messiaen’s Use of Birdsong_, 2008. [https://www.youtube.com/watch?v=0MgLXeaf3zc](https://www.youtube.com/watch?v=0MgLXeaf3zc).
UpLevel. “[Project] Using Deep Learning to Identify Birdsong.” Accessed January 5, 2022. [https://projects.uplevel.work/features/using-deep-learning-identify-birdsongs-convolutional-neural-network](https://projects.uplevel.work/features/using-deep-learning-identify-birdsongs-convolutional-neural-network).
Turčoková, Lucia. “Diversity in Bird Song.” _Sylvia_ 47 (January 1, 2011): 1–16.
Zúñiga, Jorge, and Joshua D. Reiss. “Realistic Procedural Sound Synthesis of Bird Song Using Particle Swarm Optimization.” Audio Engineering Society, 2019. [https://www.aes.org/e-lib/browse.cfm?elib=20578](https://www.aes.org/e-lib/browse.cfm?elib=20578).
### data:
@techreport{WahCUB_200_2011,
  Title = {{The Caltech-UCSD Birds-200-2011 Dataset}},
  Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},
  Year = {2011},
  Institution = {California Institute of Technology},
  Number = {CNS-TR-2011-001}
}