Saturday, February 4, 2017

Generating predictive videos using deep-learning

http://robohub.org/generating-predictive-videos-using-deep-learning/

SoundNet: Learning Sound Representations from Unlabeled Video

SoundNet

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation from two million unlabeled videos. We propose a student-teacher training procedure that transfers discriminative visual knowledge from well-established visual models (e.g., ImageNet and PlacesCNN) into the sound modality, using unlabeled video as a bridge.
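The student-teacher idea above can be sketched as follows: a pretrained vision network (the teacher) produces a class posterior for a video frame, and the sound network (the student) is trained so that its posterior over the synchronized audio clip matches it, typically by minimizing a KL divergence. This is a minimal NumPy sketch under that assumption, not the authors' Torch code; the logits here are random stand-ins for real network outputs.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): how far the student's posterior q is from
    # the teacher's posterior p. Minimized w.r.t. the student.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

rng = np.random.default_rng(0)

# Hypothetical logits: teacher = vision CNN on a video frame,
# student = sound CNN on the synchronized audio clip.
teacher_logits = rng.normal(size=1000)  # e.g. 1000 ImageNet classes
student_logits = rng.normal(size=1000)

p_teacher = softmax(teacher_logits)
q_student = softmax(student_logits)

# Distillation loss the student would be trained to minimize.
loss = kl_divergence(p_teacher, q_student)
```

In the actual system the teacher is frozen and only the student's parameters receive gradients, so the unlabeled video acts purely as a bridge between the two modalities.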
Visualization of learned conv1 filters: 

Requirements

  • torch7
  • torch7 audio (and sox)
  • torch7 hdf5 (only for feature extraction)
  • probably a GPU