Multi-modal learning builds a joint representation of data from different modalities. The model combines two deep Boltzmann machines, each corresponding to one modality, and an additional hidden layer is placed on top of the two Boltzmann machines to form the joint representation. The goal of this research is to profile a person's preferences from verbal and non-verbal data (audio and video) using a deep learning architecture.
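The architecture above can be sketched as follows. This is a minimal NumPy illustration of the joint-representation idea only: simple sigmoid feed-forward encoders stand in for the two deep Boltzmann machine stacks, the weights are randomly initialised rather than trained, and all layer sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid_layer(x, w, b):
    # One hidden layer with sigmoid units, standing in for a
    # deep Boltzmann machine pathway (training is out of scope here).
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Hypothetical feature and layer sizes for the two modalities.
audio_dim, video_dim, hidden_dim, joint_dim = 40, 64, 32, 16

# Randomly initialised parameters; a real model would learn these,
# e.g. with contrastive divergence for the Boltzmann machines.
w_a, b_a = rng.normal(0, 0.1, (audio_dim, hidden_dim)), np.zeros(hidden_dim)
w_v, b_v = rng.normal(0, 0.1, (video_dim, hidden_dim)), np.zeros(hidden_dim)
w_j, b_j = rng.normal(0, 0.1, (2 * hidden_dim, joint_dim)), np.zeros(joint_dim)

def joint_representation(audio, video):
    h_a = sigmoid_layer(audio, w_a, b_a)   # audio-specific hidden layer
    h_v = sigmoid_layer(video, w_v, b_v)   # video-specific hidden layer
    # Shared hidden layer on top of both modality pathways
    # gives the joint representation.
    return sigmoid_layer(np.concatenate([h_a, h_v], axis=-1), w_j, b_j)

batch = 8
z = joint_representation(rng.normal(size=(batch, audio_dim)),
                         rng.normal(size=(batch, video_dim)))
print(z.shape)  # (8, 16)
```

A downstream preference classifier would then be trained on this joint representation rather than on either modality alone.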
Profiling using Multi-modal Deep Learning
Deep Learning Architecture for Gesture Analysis