Abstract
Affective computing is a field of growing importance as human society becomes more integrated with machines. This work introduces EmoHydra, a multimodal emotion-recognition model built by fusing three state-of-the-art models fine-tuned on text, vision, and speech, respectively. Each modality is first processed by its own model; the resulting features are then fused in a single framework, allowing EmoHydra to learn how each modality contributes to the prediction of human emotion.
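The fusion scheme described above can be sketched as a simple late-fusion pipeline: three modality-specific encoders each produce a fixed-size embedding, the embeddings are concatenated, and a classification head maps the fused vector to emotion probabilities. The snippet below is a minimal illustration of that idea, not the authors' actual architecture; the embedding sizes, emotion count, and linear encoders are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8        # per-modality embedding size (assumed)
NUM_EMOTIONS = 6   # number of emotion classes (assumed)

def encode(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for a fine-tuned unimodal encoder: one linear layer + tanh."""
    return np.tanh(features @ weights)

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Placeholder input features for the three modalities.
text_feat   = rng.normal(size=16)
vision_feat = rng.normal(size=32)
speech_feat = rng.normal(size=24)

# Randomly initialised encoder weights (in practice, fine-tuned models).
w_text   = rng.normal(size=(16, EMB_DIM))
w_vision = rng.normal(size=(32, EMB_DIM))
w_speech = rng.normal(size=(24, EMB_DIM))

# Late fusion: concatenate the three embeddings into one multimodal vector.
fused = np.concatenate([
    encode(text_feat, w_text),
    encode(vision_feat, w_vision),
    encode(speech_feat, w_speech),
])

# A linear head maps the fused representation to emotion probabilities.
w_head = rng.normal(size=(3 * EMB_DIM, NUM_EMOTIONS))
probs = softmax(fused @ w_head)

print(probs.shape)  # (NUM_EMOTIONS,) — one probability per emotion class
```

Because the head sees the concatenated vector, its weights can learn how much each modality's embedding influences each emotion class, which is the intuition behind the fusion step described in the abstract.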
Research Paper
For a detailed exploration of our methods and findings, read our preprint:
EmoHydra on arXiv

Demonstration Video
See EmoHydra in action:
About the Authors
EmoHydra is a collaboration between researchers at the Department of Computing and Software Engineering, Kennesaw State University.