Reinforcement Learning

Pac-Man Deep Q-Network (DQN)

Built a Deep Q-Network (DQN) from scratch using TensorFlow and the OpenAI Gym library, implementing the full reinforcement learning loop (environment interaction, experience collection, and network training) to train an autonomous agent to play Ms. Pac-Man. Designed a custom Convolutional Neural Network (CNN) that processes raw pixel states, fed by a custom preprocessing pipeline that prepares game frames for feature extraction. Implemented an Experience Replay buffer, a key stability mechanism that breaks temporal correlations in the training data, and an epsilon-greedy strategy with decay to balance exploration and exploitation.

January 20, 2026

Technologies Used

TensorFlow, OpenAI Gym, CNN, Python

Problem Statement

Developing autonomous systems capable of sequential decision-making in dynamic, complex environments is a fundamental challenge in robotics, logistics, and automated trading. Traditional rule-based systems scale poorly as state spaces grow; the set of possible raw pixel frames in a video game, for example, is far too large to enumerate by hand. Many applications call for agents that learn effective strategies directly from environmental feedback rather than from explicitly programmed rules.

Solution

The Pac-Man DQN serves as a foundational proof of concept for deep reinforcement learning in complex state spaces. By building a Deep Q-Network from scratch and designing a custom CNN to process raw pixel data, the project shows how an agent can learn an effective policy through trial and error, guided only by the game's reward signal. Mechanisms such as Experience Replay (for training stability) and epsilon-greedy decay (for controlled exploration) are what make this training process reliable in practice. The same techniques apply to business problems such as dynamic pricing, autonomous supply chain routing, and adaptive control systems.
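Learning from reward in a DQN comes down to regressing the network's Q-value predictions toward the one-step Q-learning target y = r + gamma * max over a' of Q(s', a'), with the bootstrap term dropped when the episode ends. The minimal sketch below computes that target for a batch of transitions in plain Python. `td_targets` is a hypothetical helper name and gamma = 0.99 is an assumed default; in the actual project the next-state Q-values would come from the CNN rather than being passed in as lists.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor
) -> List[float]:
    """Compute one-step Q-learning targets y = r + gamma * max_a' Q(s', a').

    For terminal transitions (done=True) there is no next state to
    bootstrap from, so the target is just the immediate reward.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets
```

For example, `td_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)` bootstraps from the best next action in the first transition and ignores the next state in the terminal second one.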

Key Features

End-to-end Deep Q-Network (DQN) authored from scratch

Integration with OpenAI Gym for environment simulation

Custom Convolutional Neural Network (CNN) for processing raw pixel states

Experience Replay buffer to break temporal correlations in training data

Epsilon-greedy action selection with a decaying epsilon to balance exploration and exploitation

Custom game frame preprocessing pipeline
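An Experience Replay buffer stores past transitions and trains on random minibatches drawn from them instead of on consecutive frames. A minimal sketch, assuming a fixed-capacity ring buffer with uniform sampling; the `Transition` and `ReplayBuffer` names are hypothetical and not taken from the project's source.

```python
import random
from typing import Any, List, NamedTuple, Optional


class Transition(NamedTuple):
    """One step of experience: (s, a, r, s', done)."""
    state: Any
    action: int
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-capacity ring buffer of transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self._storage: List[Transition] = []
        self._next = 0  # slot the next push writes to once the buffer is full
        self._rng = random.Random(seed)

    def push(self, state: Any, action: int, reward: float,
             next_state: Any, done: bool) -> None:
        transition = Transition(state, action, reward, next_state, done)
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            # Overwrite the oldest transition once capacity is reached.
            self._storage[self._next] = transition
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement draws transitions from many
        # different time steps and episodes, so a minibatch is no longer a
        # run of consecutive, highly correlated frames.
        return self._rng.sample(self._storage, batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```

A Python list gives O(1) indexed writes and reads, which keeps both `push` and `sample` cheap; at Atari scale the frames themselves would typically live in preallocated arrays rather than Python objects.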

Engineering Challenges

01

Stabilizing the reinforcement learning training loop to prevent catastrophic forgetting

02

Managing the high dimensionality and partial observability of raw pixel inputs

03

Tuning the epsilon decay and learning rate for efficient convergence
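Tuning the epsilon decay means choosing how quickly the agent moves from random exploration to acting on its learned Q-values. The source does not say which schedule was used, so the sketch below assumes a linear anneal from 1.0 to 0.1 over a hypothetical 100,000 steps (an exponential decay is a common alternative). `linear_epsilon` and `epsilon_greedy` are illustrative names.

```python
import random
from typing import Sequence


def linear_epsilon(step: int, start: float = 1.0, end: float = 0.1,
                   decay_steps: int = 100_000) -> float:
    """Anneal epsilon linearly from `start` to `end` over `decay_steps`, then hold it at `end`."""
    remaining = max(0.0, 1.0 - step / decay_steps)
    return end + (start - end) * remaining


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """With probability epsilon pick a random action, otherwise the argmax-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Early in training almost every action is random; once `step` reaches `decay_steps`, the agent takes a random action on only 10% of steps. The start value, end value, and decay length are exactly the knobs that interact with the learning rate during tuning.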

Results & Metrics

Trained an autonomous agent to play Ms. Pac-Man directly from raw pixel input

Demonstrated that the epsilon decay schedule balanced exploration and exploitation over the course of training

Validated the custom CNN's ability to extract meaningful features from raw pixels

Lessons Learned

💡

Experience Replay is essential for breaking the temporal correlations in sequential RL data

💡

Hyperparameter tuning in RL requires significantly more patience and systematic tracking than supervised learning

💡

State preprocessing such as grayscale conversion drastically reduces computational overhead, while frame stacking restores the motion information that a single frame cannot capture
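The grayscale conversion and frame stacking from this lesson can be sketched in plain Python. This assumes RGB frames given as nested lists of (R, G, B) tuples (Gym's Atari environments return 210 x 160 RGB frames), ITU-R BT.601 luminance weights, and a stack of k = 4 frames, a common choice in the DQN literature; the project's actual pipeline, which would run on NumPy or TensorFlow tensors, is not shown in the source. `to_grayscale` and `FrameStack` are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[float]]


def to_grayscale(rgb_frame: Sequence[Sequence[Pixel]]) -> Frame:
    """Collapse an H x W x 3 RGB frame to H x W luminance in [0, 1] (BT.601 weights)."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in rgb_frame
    ]


class FrameStack:
    """Keep the k most recent grayscale frames as the agent's state."""

    def __init__(self, k: int = 4) -> None:
        self.k = k
        # deque(maxlen=k) drops the oldest frame automatically on append.
        self._frames: Deque[Frame] = deque(maxlen=k)

    def reset(self, first_rgb_frame: Sequence[Sequence[Pixel]]) -> List[Frame]:
        # At episode start there is no history, so repeat the first frame k times.
        gray = to_grayscale(first_rgb_frame)
        self._frames.clear()
        for _ in range(self.k):
            self._frames.append(gray)
        return list(self._frames)

    def step(self, rgb_frame: Sequence[Sequence[Pixel]]) -> List[Frame]:
        self._frames.append(to_grayscale(rgb_frame))
        return list(self._frames)
```

Grayscale cuts each frame from three channels to one, and stacking lets the CNN infer direction and speed of Ms. Pac-Man and the ghosts from differences between consecutive frames, which a single frame cannot show.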