Generative AI & Agents

Serenity

Architected a LangGraph-based multi-agent pipeline, exposed through a FastAPI streaming interface, that orchestrates multi-stage NLP workflows: 176-language detection, ALBERT-based emotion classification, and LLM-powered safety guardrails. Engineered a hybrid RAG system on Pinecone that combines sparse (BM25) and dense vector search with cross-encoder reranking, relevance grading, and an automated query-rewrite loop. Fine-tuned a Qwen2.5 LLM with Direct Preference Optimization (DPO) to align its generations with safe, empathetic, domain-specific guidelines, and served the local checkpoint alongside low-latency Groq API routing.

March 15, 2026

Technologies Used

LangGraph, FastAPI, Pinecone, BM25, Qwen2.5, DPO, Groq, Python

Problem Statement

Enterprises deploying conversational AI face serious risks to brand safety, compliance, and user trust. Unpredictable LLM behavior, hallucinations, and inappropriate responses can lead to PR crises and legal liability. Generic chatbots also fail to recognize user sentiment or to handle multi-step NLP workflows efficiently across languages, which makes them a poor fit for sensitive customer service operations.

Solution

Serenity provides an enterprise-grade, guardrailed conversational architecture. A LangGraph-based multi-agent pipeline runs language detection (176 languages), ALBERT-based emotion classification, and safety guardrails before generation. A hybrid RAG system (Pinecone dense search plus BM25 sparse search) with cross-encoder reranking grounds responses in retrieved data. A Qwen2.5 model fine-tuned via Direct Preference Optimization (DPO) aligns model behavior with safe, empathetic corporate guidelines, reducing reputational risk while keeping responses accurate and emotionally aware.
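
The ordering described above (language detection, emotion classification, and safety checks all run before any text is generated) can be sketched in plain Python. This is a minimal illustration, not Serenity's code: it does not use LangGraph, and every name in it (TurnState, BLOCKLIST, the stage functions, the keyword rules) is a hypothetical stand-in for the real models.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder; a real guardrail would call an LLM or classifier.
BLOCKLIST = ("weapon",)


@dataclass
class TurnState:
    """State passed from stage to stage for a single user turn."""
    text: str
    language: Optional[str] = None
    emotion: Optional[str] = None
    blocked: bool = False
    reply: Optional[str] = None
    trace: List[str] = field(default_factory=list)


def detect_language(state: TurnState) -> None:
    # Stand-in for the 176-language detector: always reports English.
    state.language = "en"


def classify_emotion(state: TurnState) -> None:
    # Stand-in for the ALBERT-based classifier: a single keyword rule.
    state.emotion = "sadness" if "sad" in state.text.lower() else "neutral"


def safety_check(state: TurnState) -> None:
    # Stand-in for the LLM-powered guardrail: a keyword blocklist.
    lowered = state.text.lower()
    state.blocked = any(term in lowered for term in BLOCKLIST)


def generate(state: TurnState) -> None:
    # Stand-in for the LLM call; only reached when the turn is not blocked.
    state.reply = f"[{state.language}/{state.emotion}] placeholder reply"


PRE_GENERATION_STAGES = [detect_language, classify_emotion, safety_check]


def run_turn(text: str) -> TurnState:
    """Run every pre-generation stage, then generate only if the turn is safe."""
    state = TurnState(text=text)
    for stage in PRE_GENERATION_STAGES:
        stage(state)
        state.trace.append(stage.__name__)
    if state.blocked:
        state.reply = "Sorry, I can't help with that."
        return state
    generate(state)
    state.trace.append("generate")
    return state
```

Calling run_turn("I feel sad today") walks all four stages, while a blocked input returns a refusal without ever reaching generate. In a graph framework the same gate would typically be a conditional edge after the safety node.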

Key Features

LangGraph-based multi-agent pipeline orchestration

FastAPI streaming interface for responsive user experiences

176-language detection and ALBERT-based emotion classification

Advanced hybrid RAG system (Pinecone dense + BM25 sparse search)

Cross-encoder reranking, relevance grading, and automated query rewriting

Qwen2.5 LLM fine-tuned via Direct Preference Optimization (DPO)

Low-latency routing via Groq API integrated with local checkpoints
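
One common way to combine dense and sparse results in a hybrid search is a weighted sum of normalized scores. The sketch below assumes that approach; the source does not say how Serenity fuses the two result sets, and fuse_scores, its alpha parameter, and the min-max normalization are illustrative choices rather than Pinecone's API.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * dense + (1 - alpha) * sparse.

    alpha=1.0 is pure dense search, alpha=0.0 is pure BM25.
    A document missing from one result set contributes 0 for that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    d, s = _min_max(dense), _min_max(sparse)
    fused = {
        doc_id: alpha * d.get(doc_id, 0.0) + (1.0 - alpha) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

The fused list would then be the candidate set handed to the cross-encoder reranker, which is more expensive per document and so benefits from a short, high-recall input.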

Engineering Challenges

Balancing low-latency streaming requirements with the overhead of cross-encoder reranking

Curating high-quality preference data for effective DPO fine-tuning

Orchestrating complex, multi-stage LangGraph cyclic workflows reliably
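
A cyclic retrieve, grade, and rewrite workflow stays reliable only if the cycle is bounded. The following is a minimal sketch of that idea under stated assumptions: answer_with_retries, max_rewrites, and the three callables are hypothetical names, and the real pipeline expresses this as graph nodes rather than a Python loop.

```python
from typing import Callable, Dict, List


def answer_with_retries(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_rewrites: int = 2,
) -> Dict[str, object]:
    """Retrieve documents, keep the relevant ones, and rewrite the query on a miss.

    The loop runs at most max_rewrites + 1 retrievals, so it always terminates.
    If no attempt yields a relevant document, the caller gets a 'fallback'
    status instead of an ungrounded answer.
    """
    tried: List[str] = []
    current = query
    for attempt in range(max_rewrites + 1):
        tried.append(current)
        docs = retrieve(current)
        # Grade against the original query so rewrites cannot drift the intent.
        relevant = [doc for doc in docs if grade(query, doc)]
        if relevant:
            return {"status": "grounded", "docs": relevant, "queries": tried}
        if attempt < max_rewrites:
            current = rewrite(current)
    return {"status": "fallback", "docs": [], "queries": tried}
```

The retry budget also caps worst-case latency, which matters for a streaming interface: each extra cycle adds a retrieval and a grading pass before the first token can be sent.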

Results & Metrics

Successfully aligned local LLM generation to strict domain-specific empathetic guidelines

Eliminated hallucination-based edge cases via strict relevance grading loops

Achieved high-throughput conversational AI safe for enterprise deployment

Lessons Learned

Direct Preference Optimization (DPO) provides powerful behavioral alignment without the complexity of PPO

Hybrid search and reranking are non-negotiable for enterprise-grade RAG systems

Agentic architectures require robust fallback mechanisms for safe failure handling
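
The first lesson is easier to see with the DPO objective written out. For one preference pair, the loss needs only the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model; there is no separate reward model and no sampling loop as in PPO. The sketch below computes that per-pair loss in plain Python. The function name and the default beta of 0.1 are illustrative assumptions, not values taken from this project.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected),
    i.e. how much more the policy prefers the chosen response than the
    reference model does.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy matches the reference the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen response and rises when it shifts toward the rejected one. In practice a training library computes this over batches of token-level log-probabilities.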
