Pac-Man Deep Q-Network (DQN)
Authored a Deep Q-Network (DQN) from scratch using TensorFlow and the OpenAI Gym library, handling the full reinforcement learning lifecycle to train an autonomous agent to master Ms. Pac-Man. Designed a custom Convolutional Neural Network (CNN) that processes raw pixel states, fed by a custom preprocessing pipeline that prepares game frames for efficient feature extraction. Implemented key RL stability mechanisms, including an Experience Replay buffer to break temporal correlations and an epsilon-greedy decay strategy to balance exploration and exploitation.
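At its core, a DQN trains the network's estimate Q(s, a) toward the one-step Q-learning target r + γ · max over a' of Q(s', a'), dropping the bootstrap term when s' is terminal. The NumPy sketch below computes that target and the mean squared TD error for a batch. The function names `dqn_targets` and `td_loss` and the γ = 0.99 default are hypothetical, not taken from the project; in a full agent the Q-values would come from the CNN's forward pass.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Hypothetical helper: one-step Q-learning targets for a batch.

    rewards:       shape (batch,), reward r for each transition
    next_q_values: shape (batch, n_actions), Q(s', a') estimates
    dones:         shape (batch,), 1.0 if s' is terminal, else 0.0
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)
    # Greedy bootstrap: best Q-value available from the next state.
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float32), axis=1)
    # Terminal transitions contribute only their immediate reward.
    return rewards + gamma * (1.0 - dones) * max_next_q

def td_loss(q_values, actions, targets):
    """Hypothetical helper: mean squared error between Q(s, a) and the targets."""
    q_values = np.asarray(q_values, dtype=np.float32)
    actions = np.asarray(actions, dtype=np.int64)
    # Select Q(s_i, a_i) for each row i with advanced indexing.
    q_taken = q_values[np.arange(len(actions)), actions]
    return float(np.mean((np.asarray(targets, dtype=np.float32) - q_taken) ** 2))

# Second transition is terminal, so its target is just its reward (0.0).
targets = dqn_targets([1.0, 0.0], [[0.5, 2.0], [1.0, 3.0]], [0.0, 1.0], gamma=0.9)
```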
Technologies Used
Problem Statement
Developing autonomous systems capable of sequential decision-making in dynamic, complex environments is a fundamental challenge in robotics, logistics, and automated trading. Traditional rule-based systems scale poorly when the state space explodes: a single raw game frame contains tens of thousands of pixels, so the number of possible states is far too large to enumerate by hand. Businesses need intelligent agents that can learn optimal strategies directly from environmental feedback rather than from hand-written rules.
Solution
The Pac-Man DQN serves as a foundational proof of concept for deep reinforcement learning in complex state spaces. By authoring a Deep Q-Network from scratch and designing a custom CNN to process raw pixel data, the project demonstrates how an agent can learn a policy purely through trial and error guided by reward. Implementing stability mechanisms such as Experience Replay and epsilon-greedy decay shows the ability to engineer reliable RL systems. The same underlying techniques apply to business problems such as dynamic pricing, autonomous supply chain routing, and adaptive control systems.
Key Features
End-to-end Deep Q-Network (DQN) authored from scratch
Integration with OpenAI Gym for environment simulation
Custom Convolutional Neural Network (CNN) for processing raw pixel states
Experience Replay buffer to break temporal correlations in training data
Epsilon-greedy decay strategy for exploration-exploitation balancing
Custom game frame preprocessing pipeline
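A minimal sketch of an Experience Replay buffer in plain Python, assuming transitions are stored as (state, action, reward, next_state, done) tuples and sampled uniformly at random. The `ReplayBuffer` class and its method names are illustrative, not the project's actual code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes transitions from many
        # points in time, breaking the correlation between consecutive frames.
        batch = random.sample(self.buffer, batch_size)
        # Transpose the list of tuples into per-field lists.
        states, actions, rewards, next_states, dones = zip(*batch)
        return list(states), list(actions), list(rewards), list(next_states), list(dones)

    def __len__(self):
        return len(self.buffer)
```

Training typically waits until the buffer holds at least one batch of transitions before sampling, since `random.sample` raises `ValueError` when asked for more items than the buffer contains.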
Engineering Challenges
Stabilizing the reinforcement learning training loop to prevent catastrophic forgetting
Managing the high dimensionality and partial observability of raw pixel inputs
Tuning the epsilon decay and learning rate for efficient convergence
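Tuning exploration is easier when the schedule is written down explicitly. This sketch assumes a linear anneal from ε = 1.0 to ε = 0.1 over one million steps; those values and the helper names `epsilon_at` and `select_action` are assumptions for illustration, since the project's actual schedule is not stated, and exponential decay is an equally common choice.

```python
import random

def epsilon_at(step, eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    """Hypothetical schedule: anneal epsilon linearly, then hold at eps_end."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Calling `epsilon_at(step)` once per environment step and passing the result to `select_action` yields mostly random play early in training and mostly greedy play once ε reaches its floor.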
Results & Metrics
Successfully trained an autonomous agent capable of mastering Ms. Pac-Man gameplay
Maintained a sustained balance between exploration and exploitation throughout training
Validated the custom CNN's ability to extract meaningful features from raw pixels
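To make the feature-extraction claim concrete, the arithmetic below traces how a stack of unpadded ("valid") convolutions shrinks a stacked-frame input into a compact feature map. It assumes an 84 × 84 × 4 input and the three-layer (filters, kernel, stride) layout used by the original Atari DQN; the project's custom CNN may differ, and `LAYERS` and `feature_dims` are hypothetical names.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

# Hypothetical layer stack as (filters, kernel, stride), Atari-DQN style.
LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

def feature_dims(height, width, in_channels, layers=LAYERS):
    """Return the (height, width, channels) shape after each layer."""
    h, w, c = height, width, in_channels
    shapes = [(h, w, c)]
    for filters, kernel, stride in layers:
        h = conv_output_size(h, kernel, stride)
        w = conv_output_size(w, kernel, stride)
        c = filters
        shapes.append((h, w, c))
    return shapes

print(feature_dims(84, 84, 4))
# → [(84, 84, 4), (20, 20, 32), (9, 9, 64), (7, 7, 64)]
```

The final 7 × 7 × 64 map holds 3,136 values versus 28,224 in the input; a dense layer would then map that representation to one Q-value per action.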
Lessons Learned
Experience Replay is mandatory for breaking temporal correlations in sequential RL data
Hyperparameter tuning in RL requires significantly more patience and systematic tracking than supervised learning
State preprocessing (e.g., grayscale conversion, downsampling) drastically reduces computational overhead, while frame stacking supplies the motion information a single frame lacks
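The preprocessing lesson can be illustrated with a small NumPy pipeline. It assumes 210 × 160 RGB Atari frames, converts them to grayscale with ITU-R BT.601 luminance weights, halves the resolution by striding (many DQN implementations instead resize to 84 × 84, which needs an image library), and stacks the last four frames. `preprocess`, `FrameStack`, and k = 4 are hypothetical choices, not the project's code.

```python
import numpy as np
from collections import deque

# BT.601 luminance weights for R, G, B.
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def preprocess(frame):
    """Hypothetical: (210, 160, 3) uint8 RGB -> (105, 80) float32 grayscale in [0, 1]."""
    # Weighted sum over the color channel collapses RGB to luminance.
    gray = frame[..., :3].astype(np.float32) @ GRAY_WEIGHTS
    # 2x downsample by keeping every other row and column.
    small = gray[::2, ::2]
    return small / 255.0

class FrameStack:
    """Hypothetical: keeps the k most recent preprocessed frames so motion is visible."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, fill the stack with copies of the first frame.
        processed = preprocess(frame)
        for _ in range(self.k):
            self.frames.append(processed)
        return self.state()

    def step(self, frame):
        # Appending evicts the oldest frame once k frames are held.
        self.frames.append(preprocess(frame))
        return self.state()

    def state(self):
        # Channels-last (height, width, k), the Keras Conv2D default layout.
        return np.stack(list(self.frames), axis=-1)
```

Grayscale plus 2× downsampling cuts each frame from 100,800 values to 8,400, so even a four-frame stack (33,600 values) is a third the size of one raw RGB frame.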