Saturday, February 4, 2017
Generating predictive videos using deep-learning
http://robohub.org/generating-predictive-videos-using-deep-learning/
SoundNet: Learning Sound Representations from Unlabeled Video
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation from two million unlabeled videos. We propose a student-teacher training procedure that transfers discriminative visual knowledge from well-established visual models (e.g. ImageNet and PlacesCNN) into the sound modality, using unlabeled video as a bridge.
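The core of student-teacher transfer is a distillation loss: the teacher (a pretrained visual network) produces a soft class distribution for each video frame, and the student (the sound network) is trained to match that distribution from the audio track alone, typically via KL divergence. A minimal sketch of that loss in plain Python follows; the logit vectors are hypothetical toy values standing in for the real model outputs:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits):
    # KL(teacher || student): pushes the student's sound-based class
    # distribution toward the teacher's vision-based distribution for
    # the same (unlabeled) video clip.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# The loss is zero when the distributions agree and positive otherwise.
matched = distillation_loss([2.0, 1.0, 0.5], [2.0, 1.0, 0.5])
mismatched = distillation_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0])
```

Because the targets come from the teacher rather than human labels, the two million videos need no annotation at all: vision supervises sound.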
Requirements
- torch7
- torch7 audio (and sox)
- torch7 hdf5 (only for feature extraction)
- probably a GPU
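Assuming Torch7 is already installed and its LuaRocks environment is on the PATH, the extra packages above would typically be pulled in with `luarocks`; the exact rock names here are assumptions, so check the repository's README:

```shell
# sox must be installed system-wide first (e.g. via your package manager),
# since the torch audio rock wraps it.
luarocks install audio   # assumed rock name for torch7 audio
luarocks install hdf5    # assumed rock name; only needed for feature extraction
```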

