VibeTune: Multi-Modal Emotion-Based Music Recommender
Production-ready emotion detection with real-time Spotify integration
Overview
VibeTune is a production-ready web application that analyzes your emotional state through three modalities (face, voice, and text) and provides personalized music recommendations from Spotify to match your current vibe.
Deployed to Render with one click, featuring lazy-loaded ML models, an automated CI/CD pipeline, Docker containerization, and comprehensive monitoring with Sentry error tracking and Prometheus metrics.
Key Features
Face Analysis
ResNet50 fine-tuned on the RAF-DB dataset, achieving 74% validation accuracy. Detects 7 emotions from webcam capture or uploaded images.
Text Analysis
DistilRoBERTa transformer model from Hugging Face for contextual emotion classification of user-provided text, with support for 6 emotion classes.
Voice Analysis
Wav2Vec2-based speech emotion recognition supporting both live recording and audio file uploads with multi-emotion detection.
Spotify Integration
Real-time playlist generation with emotion-aligned tracks, album art, and 30-second audio previews across 8 emotion categories.
Production Features
Lazy-loaded ML models, automated CI/CD pipeline with GitHub Actions, Docker containerization with health checks, and comprehensive error tracking.
Monitoring & Metrics
Sentry error tracking, Prometheus metrics exposure, Grafana dashboards, and in-app memory management controls.
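The Spotify integration above can be sketched as a mapping from detected emotions to recommendation parameters. `seed_genres`, `target_valence`, `target_energy`, and `limit` are real query parameters of Spotify's `GET /v1/recommendations` endpoint; the emotion names, genre seeds, and tuning values below are illustrative assumptions, not VibeTune's actual configuration:

```python
# Hypothetical emotion-to-parameter table for Spotify's recommendations
# endpoint. Valence (musical positiveness) and energy are 0.0-1.0 audio
# features; the specific values here are illustrative tuning choices.
EMOTION_TO_SPOTIFY = {
    "happy":     {"seed_genres": "pop,dance",      "target_valence": 0.9, "target_energy": 0.8},
    "sad":       {"seed_genres": "acoustic,piano", "target_valence": 0.2, "target_energy": 0.3},
    "angry":     {"seed_genres": "metal,rock",     "target_valence": 0.3, "target_energy": 0.9},
    "surprised": {"seed_genres": "electronic",     "target_valence": 0.7, "target_energy": 0.7},
    "fearful":   {"seed_genres": "ambient",        "target_valence": 0.3, "target_energy": 0.2},
    "disgusted": {"seed_genres": "punk",           "target_valence": 0.3, "target_energy": 0.7},
    "calm":      {"seed_genres": "chill,jazz",     "target_valence": 0.6, "target_energy": 0.3},
    "neutral":   {"seed_genres": "indie",          "target_valence": 0.5, "target_energy": 0.5},
}

def recommendation_params(emotion: str, limit: int = 20) -> dict:
    """Build query params for GET /v1/recommendations, falling back to
    the neutral profile for any emotion label we don't recognize."""
    params = dict(EMOTION_TO_SPOTIFY.get(emotion.lower(), EMOTION_TO_SPOTIFY["neutral"]))
    params["limit"] = limit
    return params
```

The resulting dict can be passed directly as query parameters to the Spotify Web API via any HTTP client, with the track objects in the response supplying album art and 30-second `preview_url`s.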
Model Architecture
System Architecture Diagram
End-to-end multi-modal emotion detection pipeline with Spotify integration

Face Emotion Detection
Model: ResNet50 fine-tuned on RAF-DB dataset
Performance: 74% accuracy on the RAF-DB validation set
Emotions: Happy, Sad, Angry, Surprised, Fearful, Disgusted, Neutral
Input: Webcam capture or image upload
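As a sketch of the face pipeline's input stage: pretrained ResNet50 backbones conventionally take 224x224 RGB inputs normalized with the ImageNet mean and standard deviation. The NumPy-only version below uses a nearest-neighbour resize for self-containedness (the real app may well use OpenCV or torchvision transforms instead):

```python
import numpy as np

# ImageNet normalization constants used by standard pretrained ResNet50s.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_face(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an HxWx3 uint8 RGB image (nearest neighbour) and normalize
    it into the 1x3xHxW float layout a ResNet50 expects."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size        # source row per output row
    cols = np.arange(size) * w // size        # source col per output col
    resized = image[rows[:, None], cols[None, :]]   # (size, size, 3)
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)[None, ...]  # (1, 3, size, size)
```

The batched array can then be fed to the fine-tuned model, with an argmax over the 7-way output giving the predicted emotion.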
Text Emotion Classification
Model: j-hartmann/emotion-english-distilroberta-base
Architecture: DistilRoBERTa transformer (pre-trained)
Emotions: 6-class emotion classification
Input: User-provided text (truncated to the model's 512-token limit)
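The text path maps almost directly onto the Hugging Face `pipeline` API. A minimal sketch, assuming a recent Transformers release where `top_k=None` returns scores for every label; `build_classifier` and `top_emotion` are illustrative helper names, not VibeTune's actual code:

```python
def build_classifier():
    """Lazily construct the Hugging Face text-classification pipeline
    (downloads the model weights on first call)."""
    from transformers import pipeline
    return pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
        top_k=None,       # return a score for every emotion label
        truncation=True,  # respect the 512-token input limit
    )

def top_emotion(scores: list) -> tuple:
    """Pick the highest-scoring label from pipeline output shaped like
    [{"label": "joy", "score": 0.93}, ...]."""
    best = max(scores, key=lambda s: s["score"])
    return best["label"], best["score"]
```

Usage would look like `label, score = top_emotion(build_classifier()(text)[0])`, with the label then routed into the shared emotion-to-playlist step.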
Voice Emotion Recognition
Model: ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition
Architecture: Wav2Vec2 trained on speech emotion datasets
Input: Live audio recording or file upload
Processing: Librosa for audio feature extraction
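A minimal sketch of the input-shaping step for the voice path, assuming the audio has already been decoded and resampled to 16 kHz mono upstream (e.g. via `librosa.load(path, sr=16000)`, matching the sample rate Wav2Vec2 checkpoints are trained on); batched inference needs fixed-length inputs, so the waveform is peak-normalized and padded or trimmed:

```python
import numpy as np

TARGET_SR = 16_000  # Wav2Vec2 models expect 16 kHz mono audio

def prepare_waveform(samples: np.ndarray, seconds: float = 4.0) -> np.ndarray:
    """Peak-normalize a mono float waveform and pad or trim it to a fixed
    duration so every clip in a batch has the same shape."""
    target_len = int(TARGET_SR * seconds)
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples = samples / peak        # scale into [-1, 1]
    if len(samples) >= target_len:
        return samples[:target_len]     # trim long recordings
    return np.pad(samples, (0, target_len - len(samples)))  # zero-pad short ones
```

The 4-second window is an illustrative choice; the fixed array then goes through the model's feature extractor before classification.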
Tech Stack
ML Frameworks
Audio & Vision Processing
Web & API
DevOps & Monitoring
Implementation Highlights
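One highlight worth sketching is the lazy-loading pattern: each heavy model is loaded on first request rather than at startup and cached afterwards, which keeps boot time and idle memory low on small instances. The registry below is a hypothetical illustration of the pattern, not VibeTune's actual code:

```python
from functools import lru_cache

_LOADERS: dict = {}

def register_loader(name: str):
    """Decorator that registers a model-loading function under a name."""
    def wrap(fn):
        _LOADERS[name] = fn
        return fn
    return wrap

@lru_cache(maxsize=None)
def get_model(name: str):
    """Load the named model on first request and cache it, so heavy
    weights never load at startup and load at most once overall."""
    return _LOADERS[name]()
```

A loader would then be registered per modality, e.g. `@register_loader("face")` on a function that loads the fine-tuned ResNet50 checkpoint; only the first request for that modality pays the load cost.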
Deployment & Monitoring
Deployment Stack
Docker runtime with automatic builds
Health checks and auto-deploy on push
Pre-configured environment variables
Prometheus metrics on port 9100
Render one-click deployment
Monitoring & Observability
Sentry for error tracking & tracing
Prometheus metrics exposure at /metrics
Grafana dashboards (optional local stack)
Environment-based configuration
In-app memory management controls
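The /metrics endpoint above serves the Prometheus text exposition format. In practice this is usually produced by the official `prometheus_client` library; the sketch below hand-renders the same wire format for counters only, to show what Prometheus actually scrapes (the metric name in the usage example is hypothetical):

```python
def render_metrics(metrics: dict) -> str:
    """Render counters in the Prometheus text exposition format:
    a # HELP line, a # TYPE line, then the sample itself."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Serving this string with content type `text/plain` from /metrics (or port 9100) is all a Prometheus scrape job needs.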