Abstract
Affective computing is a field of growing importance as human society becomes more integrated with machines. This work introduces EmoHydra, a multimodal emotion-recognition model built by fusing three state-of-the-art models fine-tuned on text, vision, and speech, respectively. Each modality is first processed by its own model; the resulting features are then fused in a single framework, allowing EmoHydra to learn how each modality contributes to the prediction of human emotion.
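The fusion scheme described above can be sketched as a simple late-fusion pipeline: three modality-specific encoders each produce a fixed-size embedding, the embeddings are concatenated, and a classification head maps the fused vector to emotion probabilities. The snippet below is a minimal illustration of that idea, not the authors' actual architecture; the embedding sizes, emotion count, and linear encoders are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8        # per-modality embedding size (assumed)
NUM_EMOTIONS = 6   # number of emotion classes (assumed)

def encode(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for a fine-tuned unimodal encoder: one linear layer + tanh."""
    return np.tanh(features @ weights)

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Placeholder input features for the three modalities.
text_feat   = rng.normal(size=16)
vision_feat = rng.normal(size=32)
speech_feat = rng.normal(size=24)

# Randomly initialised encoder weights (in practice, fine-tuned models).
w_text   = rng.normal(size=(16, EMB_DIM))
w_vision = rng.normal(size=(32, EMB_DIM))
w_speech = rng.normal(size=(24, EMB_DIM))

# Late fusion: concatenate the three embeddings into one multimodal vector.
fused = np.concatenate([
    encode(text_feat, w_text),
    encode(vision_feat, w_vision),
    encode(speech_feat, w_speech),
])

# A linear head maps the fused representation to emotion probabilities.
w_head = rng.normal(size=(3 * EMB_DIM, NUM_EMOTIONS))
probs = softmax(fused @ w_head)

print(probs.shape)  # (NUM_EMOTIONS,) — one probability per emotion class
```

Because the head sees the concatenated vector, its weights can learn how much each modality's embedding influences each emotion class, which is the intuition behind the fusion step described in the abstract.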
Research Paper
For a detailed exploration of our methods and findings, read our preprint:
EmoHydra on arXiv

Demonstration Video
See EmoHydra in action:
About the Authors
EmoHydra is a collaboration between researchers at the Department of Computing and Software Engineering, Kennesaw State University.