Serenity
Architected a LangGraph-based multi-agent pipeline, exposed through a FastAPI streaming interface, that orchestrates NLP workflows including language detection across 176 languages, ALBERT-based emotion classification, and LLM-powered safety guardrails. Engineered a hybrid RAG system on Pinecone that combines sparse (BM25) and dense vector search, then applies cross-encoder reranking, relevance grading, and an automated query-rewrite loop. Fine-tuned a Qwen2.5 LLM with Direct Preference Optimization (DPO) to align generation with safe, empathetic, domain-specific guidelines, and integrated the local checkpoint alongside low-latency Groq API routing.
Problem Statement
Enterprises deploying conversational AI face critical risks to brand safety, compliance, and user trust. Unpredictable LLM behavior, hallucinations, and inappropriate responses can lead to severe PR crises and legal liability. Generic chatbots also fail to recognize user sentiment or to handle multi-step NLP workflows efficiently across diverse languages, which makes them ineffective for sensitive customer service operations.
Solution
Serenity provides an enterprise-grade, guardrailed conversational architecture. Its LangGraph-based multi-agent pipeline runs language detection (176 languages), emotion classification, and safety guardrails before generation. A hybrid RAG system (Pinecone dense search plus BM25 sparse search) with cross-encoder reranking grounds responses in retrieved data. A Qwen2.5 model fine-tuned via Direct Preference Optimization (DPO) aligns model behavior with safe, empathetic corporate guidelines, reducing reputational risk while keeping responses accurate and emotionally aware.
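The write-up does not say how the dense and sparse scores are combined. One common choice is a weighted (convex) combination of normalized scores, sketched below in plain Python. The names fuse_scores, min_max, and alpha are hypothetical, and the sketch works on already-retrieved score dictionaries rather than calling Pinecone.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend dense and sparse scores; alpha=1.0 is dense only, 0.0 sparse only.

    Assumption: a document missing from one retriever contributes 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc_id: alpha * d.get(doc_id, 0.0) + (1.0 - alpha) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

In a sketch like this, alpha would be tuned on held-out queries: keyword-heavy traffic tends to favor the sparse side, paraphrased questions the dense side.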
Key Features
LangGraph-based multi-agent pipeline orchestration
FastAPI streaming interface for responsive user experiences
176-language detection and ALBERT-based emotion classification
Advanced hybrid RAG system (Pinecone dense + BM25 sparse search)
Cross-encoder reranking, relevance grading, and automated query rewriting
Qwen2.5 LLM fine-tuned via Direct Preference Optimization (DPO)
Low-latency routing via Groq API integrated with local checkpoints
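How requests are routed between the Groq API and the local checkpoint is not described here. As a rough illustration only, the sketch below tries backends in order and falls through to the next one on failure. The function generate_with_fallback and the backend callables are hypothetical stand-ins, and the order used in the usage note (hosted first, local second) is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Backend = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Return (backend_name, completion) from the first backend that succeeds."""
    errors: List[str] = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc!r}")
    # Nothing worked: surface every failure so the caller can fail safely.
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

A caller might pass [("groq", call_groq), ("local", call_local)], where both callables are placeholders for real client code. A production version would also want per-backend timeouts, which are left out here.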
Engineering Challenges
Balancing low-latency streaming requirements with the overhead of cross-encoder reranking
Curating high-quality preference data for effective DPO fine-tuning
Orchestrating complex, multi-stage LangGraph cyclic workflows reliably
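One common way to ease the first challenge is to run the cross-encoder on only the top few first-stage candidates, since its cost grows with the number of query-document pairs it scores. Whether Serenity does this is not stated. The sketch below assumes a hypothetical score_pair callable standing in for the cross-encoder and a hypothetical cutoff k.

```python
from typing import Callable, List


def rerank_top_k(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    k: int = 5,
) -> List[str]:
    """Rerank the first k candidates; keep the rest in first-stage order.

    score_pair is called exactly once per reranked candidate, so k caps
    the extra latency added before the first streamed token.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    head, tail = candidates[:k], candidates[k:]
    # sorted() is stable, so equal scores keep their first-stage order.
    reranked = sorted(head, key=lambda doc: score_pair(query, doc), reverse=True)
    return reranked + tail
```

Lowering k trades ranking quality for latency, so it is the natural knob to tune against a streaming time-to-first-token budget.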
Results & Metrics
Successfully aligned local LLM generation to strict domain-specific empathetic guidelines
Eliminated hallucination-based edge cases via strict relevance grading loops
Achieved high-throughput conversational AI safe for enterprise deployment
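The relevance-grading loop behind the second result can be sketched as a bounded retry. This is a plain-Python illustration, not the LangGraph graph itself: retrieve, grade, rewrite, and max_rewrites are hypothetical stand-ins for the retriever, the grader, the query rewriter, and the retry limit.

```python
from typing import Callable, List, Tuple


def retrieve_with_rewrites(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_rewrites: int = 2,
) -> Tuple[str, List[str]]:
    """Return (final_query, relevant_docs); relevant_docs may be empty.

    Documents are graded against the original query so that a rewritten
    query cannot drift away from what the user actually asked.
    """
    current = query
    for attempt in range(max_rewrites + 1):
        relevant = [doc for doc in retrieve(current) if grade(query, doc)]
        if relevant:
            return current, relevant
        if attempt < max_rewrites:
            current = rewrite(current)
    # Bounded loop: give up with no context rather than cycle forever.
    return current, []
```

When the loop returns an empty list, the caller can decline to answer or ask a clarifying question instead of generating from ungrounded context.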
Lessons Learned
Direct Preference Optimization (DPO) provides powerful behavioral alignment without the separate reward model and reinforcement-learning loop that PPO-based RLHF requires
Hybrid search and reranking are non-negotiable for enterprise-grade RAG systems
Agentic architectures require robust fallback mechanisms for safe failure handling
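To make the first lesson concrete, the standard DPO objective for a single preference pair can be computed from four sequence log-probabilities, with no reward model or sampling loop. The sketch below uses only the standard library; the function name and the default beta of 0.1 are illustrative choices, not values taken from this project.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(beta * margin)).

    margin measures how much more the policy prefers the chosen response
    over the rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference, the margin is zero and the loss is log 2. It falls below that as the policy shifts probability toward the chosen response.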