Journal Reflection #3: Insights into your first AI/ML Pygame project

From https://sjpl.bibliocommons.com/events/uploads/images/full/ad1b09d8293117cbbc8bfadf3b63caaf/10-Programming-Languages-for-Game-Development-e1622712606533.jpg

Please use your reply to this blog post to detail the following:

  1. Please give a full description of the nature of your first AI/ML game project.
  2. What was the steepest part of the learning curve for this project? Was it learning how to implement the AI or how to use the Pygame/Pyglet library? Please elaborate and explain your answer.
  3. What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
  4. Describe the AI/ML algorithm your game implements. Did you work through a tutorial you found online? Did you start from scratch because you were motivated by a particular game or algorithm and you wanted to implement it using Pygame/Pyglet?
  5. If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
  6. Include your Github repo URL so your classmates can look at your code.

Take the time to look through the project posts of your classmates. If you saw any project or project descriptions that pique your interest, please reply or respond to their post with feedback. Constructive criticism is allowed, but please keep your comments civil.


12 Responses to Journal Reflection #3: Insights into your first AI/ML Pygame project

  1. Aarav Prakash says:

    Please give a full description of the nature of your first AI/ML game project.
    My AI/ML project is an AI that plays the first level of Super Mario Bros (the original). It uses reinforcement learning through a torch neural network. Also, I used the OpenAI Gym Environment called Gym Super Mario Bros to train my AI. After this, I replicated the gym environment in PyGame and ran my AI. I did not write the PyGame project for Super Mario Bros; I only modified its main loop to allow my AI to get the game’s input and control Mario.
    What was the steepest part of the learning curve for this project? Was it learning how to implement the AI or how to use the Pygame/Pyglet library? Please elaborate and explain your answer.
    The steepest part of the learning curve for me was learning how to implement the AI. I had never used torch before, and the concept of storing certain events in the “memory” was difficult for me to understand. I also had quite a bit of difficulty understanding the theoretical elements of implementing the AI. I had to watch a bunch of youtube videos explaining the concepts and breaking down how the Agent I made actually works.

    What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
    I think that I had a lot of difficulty with this project. Nothing really went seamlessly when I was developing the AI. However, in the end, I think the training algorithm for my AI worked well, but required quite a bit of processing power. If I had known how much processing power my AI would require, I would have definitely tried training it on a GPU earlier.
    Describe the AI/ML algorithm your game implements. Did you work through a tutorial you found online? Did you start from scratch because you were motivated by a particular game or algorithm and you wanted to implement it using Pygame/Pyglet?
    I worked off a youtube tutorial that gave me the concepts I needed to implement, and then I looked up how to implement those concepts. Usually, Gemini was able to give me those concepts or I would dive deep into stack overflow. Surprisingly, there were quite a few tutorials on how to make a Super Mario Bros AI, but the majority of them aren’t recent, and so I ended up having difficulties with package management. I decided to use the most recent one, but it didn’t provide any actual code, only pseudocode.

    If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
    I would recommend that students learn something that has multiple controls and an environment to read. Super Mario Bros was quite challenging because it took a lot of processing power. However, I think it helped me gain an understanding of how game AIs analyze their environment. I would recommend something simple, two dimensional, with no more than 5 possible controls, and a simple environment to input.
    Include your Github repo URL so your classmates can look at your code.
    https://github.com/A0Prakash/Project_02_gAIme-Super-Mario-Bros.git

  2. Ori Moore says:

    Please give a full description of the nature of your first AI/ML game project.
    My AI game project was a bot to play the game Hnefatafl, a viking board game sometimes nicknamed “viking chess”. This project was a collaboration with Ryan, hoping to have our bots play each other.

    What was the steepest part of the learning curve for this project? Was it learning how to implement the AI or how to use the Pygame/Pyglet library? Please elaborate and explain your answer.
    The hardest part for me was figuring out how to do the training along with PyGame. My solution was to not actually use the PyGame UI in my training, but that made it difficult to follow what was going on. I also didn’t have experience with neural networks before doing this project, and I had trouble conceptually understanding what goes on in a neural network and what type of network would work the best for my project. Because of the complicated spatial relationships in Hnefatafl, I chose to do Deep Q learning with a Convolutional Neural Network.

    What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
    I eventually got a bot that plays very well and beats me practically every time, but that was after a lot of model changing. There were several challenges though. One was that the two teams in Hnefatafl have different objectives, so I had to train separate models. Additionally, since Hnefatafl is a pretty obscure game, there’s no data to train on, so I had to train on self-play. The issue with self-play with my own models playing each other is that it’s hard to track progress of the model. Is one model doing better because it’s getting better or because the other one is getting worse? I did track win percentages over episodes and tried to make sure I didn’t have a model that was winning super rarely (<10% or so). I considered that a sign that one of my models was missing strategic moves, so I adjusted rewards and eventually decided to add simulations (explained in the next question).
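    The win-percentage check described above could be sketched roughly as follows. This is only an illustration, not the code from the repo: the class name, the window size, and the way the 10% floor is applied are all assumptions.

```python
from collections import deque


class WinRateTracker:
    """Rolling win rate for one side over the most recent self-play episodes."""

    def __init__(self, window=100, floor=0.10):
        self.results = deque(maxlen=window)  # 1 = this side won, 0 = it lost
        self.floor = floor                   # assumed alert threshold (about 10%)

    def record(self, won):
        self.results.append(1 if won else 0)

    def win_rate(self):
        return sum(self.results) / len(self.results) if self.results else 0.0

    def is_struggling(self):
        # Only raise the flag once the window is full, to avoid noisy early readings.
        return len(self.results) == self.results.maxlen and self.win_rate() < self.floor


tracker = WinRateTracker(window=20)
for episode in range(20):
    tracker.record(won=(episode == 0))  # made-up results: a single win in 20 games
print(tracker.win_rate(), tracker.is_struggling())
```

    A rolling window only shows how the two models compare to each other; it cannot tell whether one side is improving or the other is getting worse, which is the limitation the answer points out.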

    Describe the AI/ML algorithm your game implements. Did you work through a tutorial you found online? Did you start from scratch because you were motivated by a particular game or algorithm and you wanted to implement it using Pygame/Pyglet?
    I started from scratch and had some conversations with ChatGPT about approaches that might work for Hnefatafl. I started with a pretty simple DQN, but eventually realized that using a CNN in my DQN would help it understand Hnefatafl better. Later, I realized that with the number of possible actions in Hnefatafl, I would have to use some simulation in addition to my reinforcement learning to get a model that made good moves. I designed my own simulation algorithm that creates a tree of possible future moves and then feeds its data into my DQN model’s decision making.

    If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
    To be honest, I don’t really think doing a board game was the best implementation of reinforcement learning. I definitely think I got to learn about reinforcement learning, but I think I might have been able to get more out of reinforcement learning techniques with more of a video game than a board game.

    Include your Github repo URL so your classmates can look at your code.
    https://github.com/Ryan-Bauroth/Project02_PygAIme.git (see mooreo directory)

  3. Preston Swigart says:

    My project was based on The Dinosaur Game, which is a game about jumping over obstacles. It uses DeepQ learning to create an AI to play.
    The steepest part of the learning curve was figuring out how to implement the AI. I have some experience with PyGame (although not much), so figuring out how to make the game was not that challenging in comparison. It also helped that I forked someone’s repo and made changes, so the base game itself was already made; I just had to make some changes so it ran better. The AI was entirely unfamiliar to me, so it took a very long time to understand how to implement it well (and some time with chatgpt too).
    I think the game itself functions very well with the AI. I did not expect this level of functionality from my code, and it was a pleasant surprise. What didn’t work well for me was figuring out how to prevent my AI from running itself into the ground. Sometimes, my AI would just stop jumping after a while, which was incredibly hard to deal with as I didn’t know how to change it. I also didn’t realize that my AI was random for an embarrassingly long time (thanks cameron you saved my life).
    I used DeepQ learning, and I chose this because everywhere I looked online had information about it and it seemed like a very good method. DeepQ learning is a process using a neural network to approximate “Q-Values”, which are values assigned to an action in a state. I followed a tutorial commonly known as ChatGPT, and had to work on adapting it to my code.
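    To make the idea of a “Q-Value” concrete, here is a minimal tabular sketch of the update that a Deep Q network approximates with a neural network. The state names, rewards, discount factor, and learning rate are invented for illustration and are not taken from the project.

```python
from collections import defaultdict

ACTIONS = ("run", "jump")   # hypothetical action set for a jumping game
GAMMA = 0.9                 # discount factor (assumed value)
ALPHA = 0.5                 # learning rate (assumed value)

# Q[state][action] -> estimated long-term reward; unseen entries start at 0.0
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(state, action, reward, next_state, done):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(Q[next_state].values())
    target = reward + GAMMA * best_next
    Q[state][action] += ALPHA * (target - Q[state][action])
    return Q[state][action]


# Made-up transitions: jumping near a cactus survives (+1), running into it ends the game (-10).
q_update("cactus_near", "jump", 1.0, "clear", done=False)
q_update("cactus_near", "run", -10.0, "dead", done=True)
best = max(Q["cactus_near"], key=Q["cactus_near"].get)
print(best)
```

    In a Deep Q setup the table is replaced by a network that takes the state as input and outputs one Q-value per action, trained toward the same target.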
    If I taught the class, I would definitely recommend a board game rather than a video game, as I was much more interested in others’ projects than my own: watching/playing the board game AI looked super cool, and watching my AI was much less fun. I also think that a board game would be able to give a good overview of the algorithms, which would be helpful.
    https://github.com/PrestonSwigart/Project02_pygAIme

  4. Joshua Yoon says:

    My pygAIme project is an Omok AI that plays and theoretically should be good at Omok. The AI combines two approaches: the Monte Carlo Tree Search algorithm and Convolutional Neural Networks. Monte Carlo Tree Search is not in itself a learning algorithm; it simply finds better moves as the game continues on, simulating optimal moves. So I attempted to combine it with a convolutional neural network, which would effectively summarize the game state in a form the AI could process easily, allowing a more efficient decision-making process. The game state, in this case the board with black and white stones in a certain configuration, would be fed into the AI, which is connected to the source code of the game itself and moves every time it is its turn. I was inspired by the game between DeepMind’s AlphaGo and Lee Sae Dol from 2016, as well as DeepMind’s Maister and Magister AIs that beat every single top ranking Go player afterwards in the biggest online Go platform shortly after the game happened. Omok was similar to Go in how it looked but a lot simpler, allowing me to take on the challenge.
    Implementing and constructing the AI was the hardest part for me. In order to create nodes in the tree search class, it had to carry out simulations of random moves to explore which moves yielded the best outcomes. I initially tried using the deepcopy method to make an immediate copy of the Omok board and then to simulate it from there. For some reason, pygame was not compatible with using deepcopy, or any method utilizing the copy package of python. The copy wasn’t working properly and the simulations were being carried right on the actual game class as opposed to a copy. So instead, I had to manually generate all the simulations by making a numpy array to replicate the board and copy the information down for every simulation.
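    The workaround described above, simulating on a separate copy of the board data instead of deep-copying the game object, might look roughly like this. It is a sketch under assumptions: plain nested lists stand in for the numpy array, the 15x15 board size is a guess, and the win rule is simplified to "five or more in a row".

```python
import random

SIZE = 15                      # assumed board size; the post does not state one
EMPTY, BLACK, WHITE = 0, 1, 2


def copy_board(board):
    """Independent copy of the grid, so simulations never touch the live game."""
    return [row[:] for row in board]


def five_in_a_row(board, r, c):
    """True if the stone at (r, c) is part of a line of five or more."""
    player = board[r][c]
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):   # walk both ways along the line
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if count >= 5:
            return True
    return False


def random_rollout(board, to_move, rng):
    """Play random moves on a copy until someone wins; EMPTY means a draw."""
    sim = copy_board(board)
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE) if sim[r][c] == EMPTY]
    rng.shuffle(empties)
    for r, c in empties:
        sim[r][c] = to_move
        if five_in_a_row(sim, r, c):
            return to_move
        to_move = WHITE if to_move == BLACK else BLACK
    return EMPTY


live = [[EMPTY] * SIZE for _ in range(SIZE)]
live[7][7] = BLACK
winner = random_rollout(live, WHITE, random.Random(0))
```

    Because the rollout only ever writes to `sim`, the live board keeps its single stone no matter how many simulations run.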
    The part that was hard for me was combining the mcts and cnn. Normally, the CNN is supposed to generate a probability distribution of moves, but since my cnn model generated one best move, I had to make modifications so that it relied more on traditional mcts searching methods, where the best move is selected as the node to expand onto while still having back propagation carried out the same way, randomly. The part that worked well was constructing the mcts overall. I expected it to have a lot of problems at first regarding compatibility issues with the actual pygame, but the systematic implementation where I was able to connect the Omok pygame to the mcts go back and forth retrieving information to make the AI play was not hard at all.
    I used a combination of Monte Carlo Tree Search and Convolutional Neural Networks. The convolutional neural network would output an “ideal” move based on a deep analysis of the board. Then the move would be inputted into the monte carlo tree search so that the tree can focus more on the simulations based on that move. I first came across monte carlo tree search when I was researching AlphaGo and how it was very effective for two player strategy games that don’t involve luck. I found a connect four AI article that I based a lot of my project on. As I studied more about it, I found more limitations on implementing only the tree search, so I used ChatGPT and a youtube video I found online to construct and train a CNN omok AI.
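    One way a single suggested move could steer a tree search is sketched below. The UCB1 formula is the selection rule commonly used in Monte Carlo Tree Search, but the flat bonus for the suggested move, the constant values, and the function names are assumptions made for illustration, not the method used in this project.

```python
import math


def uct_score(child_wins, child_visits, parent_visits, c=1.4):
    """UCB1 score: average result plus an exploration term."""
    if child_visits == 0:
        return math.inf          # always try unvisited moves first
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore


def select_move(stats, parent_visits, suggested=None, bonus=0.5):
    """Pick the child with the best UCB1 score; `suggested` gets a flat bonus.

    `stats` maps move -> (wins, visits). The bonus is a hypothetical way to
    bias the search toward one move proposed by another model.
    """
    def score(move):
        wins, visits = stats[move]
        extra = bonus if move == suggested else 0.0
        return uct_score(wins, visits, parent_visits) + extra

    return max(stats, key=score)


stats = {(7, 7): (6, 10), (7, 8): (5, 10), (8, 8): (1, 10)}
print(select_move(stats, parent_visits=30))
print(select_move(stats, parent_visits=30, suggested=(7, 8)))
```

    Without a suggestion the move with the best record is selected; with a suggestion, the bonus is enough here to redirect the search to the proposed move while the statistics still count.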
    I think focusing more on game theory and on how trees (like the one I used) work could be interesting, because it connects back to fundamental computer science and mathematical theory on data structures, including depth-first search and tree traversal, which I had to learn from the internet. I think it would have been a lot more helpful if I had started off knowing those things.
    https://github.com/dbstjrgus/Magister.git

  5. Ryan Bauroth says:

    1. My project is an AI model that is trained to play the game Hnefatafl. To do this, it uses a Deep Q-Learning Algorithm alongside Monte Carlo simulations and hard coding victory conditions.
    2. The hardest part of the learning curve for me was understanding the Q-Learning Algorithm. I initially had ChatGPT generate a lot of my code because my understanding of the algorithm wasn’t very strong to start with. This led me down a rabbit hole with a ton of errors because ChatGPT would miss small details—for example, rewards needed to be updated not only based on the original turn, but also on the opponent’s turn after as rewards needed to account for lost pieces. This meant that I spent a good bit of time debugging ChatGPT’s code until eventually I gave up. While I still did use AI to generate some of my code, I spent a lot more time learning how everything worked before I implemented it. In comparison to my last project, for which I generated my models and had them work first try and then I figured out how they functioned, this meant I had a lot more control over how I set up my algorithm.
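    The detail about rewards depending on the opponent’s next turn can be illustrated with a small sketch: a move’s reward is only finalized after the reply, so pieces lost to that reply count against it. The class, the penalty value, and the piece counts are hypothetical and not taken from the repo.

```python
def finalize_reward(own_reward, pieces_before, pieces_after_reply, loss_penalty=1.0):
    """Reward for a move once the opponent has replied.

    own_reward: reward earned by the move itself (for example, a capture).
    pieces_before / pieces_after_reply: the mover's piece count right after
    the move and again after the opponent's answer.
    loss_penalty: assumed cost per piece lost to the reply.
    """
    lost = pieces_before - pieces_after_reply
    return own_reward - loss_penalty * lost


class PendingTransitions:
    """Holds each player's last transition until the opponent has moved."""

    def __init__(self):
        self.pending = {}   # player -> (state, action, own_reward, pieces)
        self.memory = []    # finished (state, action, reward) tuples

    def on_move(self, player, state, action, own_reward, pieces):
        self.pending[player] = (state, action, own_reward, pieces)

    def on_opponent_reply(self, player, pieces_now):
        state, action, own_reward, pieces = self.pending.pop(player)
        reward = finalize_reward(own_reward, pieces, pieces_now)
        self.memory.append((state, action, reward))
        return reward


buf = PendingTransitions()
buf.on_move("attacker", "s0", "a0", own_reward=0.5, pieces=16)
final = buf.on_opponent_reply("attacker", pieces_now=15)  # one piece lost to the reply
print(final)
```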
    3. As you might imagine, the area I was most comfortable with (basic python script), was the area that felt the most seamless for my project. While learning PyGame did pose a bit of a challenge, once I understood it, building the majority of the game logic from scratch only took a couple hours one night. On the other hand, tuning my approach in order to have a successful playthrough was quite the challenge for me (and one that I didn’t fully succeed in). Starting with only the Q-Learning approach, I found that my model struggled to figure out how to win—a pretty important thing to learn. I believe this is because my model didn’t have enough time to train and might have gotten there eventually, but given my setup on a macbook running occasionally, I could only train it so much. Instead, I decided to pivot to including a Monte Carlo approach; however, this only meant more confusion as the documentation on the library I decided to use was terrible and I really had to dig through their example code to even get anything to work. Plus, once I got something running, it didn’t have the possibility to edit certain key features I would have liked. As such, I had to pivot once again, hard coding the victory conditions into the project. Ultimately, even with all this effort, I wasn’t able to build a bot as strong as I would have liked.
    4. I mention the process of building my algorithms in the answer above. All of them were built from scratch + some AI assistance. I did some reading on chess and snake AI, but ultimately these resources were not helpful to my end product (although fascinating).
    5. For this class next year, I probably wouldn’t recommend a board game as I have done as it was pretty complicated and wasn’t the easiest intro project. Instead, I might suggest something like the snake game. It seems pretty straightforward for those that want to get a good grasp of how the model they are training works, plus there are plenty of resources out there.
    6. https://github.com/Ryan-Bauroth/Project02_PygAIme

  6. Danielle Li says:

    1. I programmed an AI that learned how to play Flappy Bird using NEAT (NeuroEvolution of Augmenting Topologies) in a gym environment. I had obtained the basic Flappy Bird code from Tech With Tim’s tutorial. The pipes were adjusted so that they moved up and down at random speeds and the direction that they moved was random as well in order to make the game harder. The bird didn’t move horizontally but just up and down. Instead, the background/pipes moved horizontally. I implemented NEAT in a gym environment to create my AI model. The model would run multiple generations of flappy birds with 20 birds, and at the end of each generation, the bird that survived the longest would be used to create the next generation. The model then learned which neural networks were strong and slowly learned how to play the game until it could completely play the game (highest score before having to manually quit was 12293).

    2. I think learning how the actual neural network works as well as reinforcement learning was the steepest part of the learning curve for this project. In the beginning, most of the information I received was only about how the nodes worked and how there were connections. However, I didn’t actually conceptually understand how the neural networks worked and mutated, so when I went to adjust the model and try to use it for my project, it brought about a lot of bugs. I just didn’t understand how the code that was explained on youtube videos was related to machine learning. I didn’t know the difference between Deep Q learning and NEAT, so I wasn’t sure which to use in the beginning. I especially wasn’t sure how rewards worked because I heard my classmates understood this concept, but I didn’t know how to implement the AI into my project. I had to do a lot of personal research through youtube videos and reading papers, and then I started to understand how the genomes and nodes/connections worked. Learning how this worked mathematically helped a lot because I thought of it like my linear regression AI model from my last project where I had multiple inputs which had their own weightings or “slopes” but in this project, the weighting was constantly adjusted to find the best fit. The functions between the variables were sometimes created and deleted in order to find the best fit (bias acted like the y-intercept to vertically shift the values). Using the Pygame library was also difficult because I didn’t understand how to download the different libraries in the beginning, so I had a lot of trouble trying to work the terminal. I had downloaded some libraries like tensorflow but the code refused to acknowledge that it was downloaded onto my computer. I also tried to fix things by adjusting my PATH but that ended up messing things up even more. I had to use chatgpt to get my terminal fixed, and I used other libraries to obtain my results.

    3. Adding and adjusting my inputs went well because it was similar to my last project with using multiple variables. It didn’t work seamlessly like in the content of my code, but implementing it went seamlessly except for when I forgot to adjust my configuration.txt to my input nodes being more than 3. Adding certain inputs had led my bird to learn less effectively, so that was something I had to later adjust, but the actual code implementation went pretty well. My biggest problem debugging was when it said that I didn’t have the right libraries in the beginning. I later found out that this was because my python libraries didn’t match up with the libraries I downloaded. I also had an issue when I tried to save my best bird model. For some reason, my pickle library wasn’t accepting the format that I was saving the file. Calling the file that contained my best bird model specifically to later play the game with only the best bird also caused a lot of trouble because I had to adjust the rest of my code so that it could then run a “best bird” model separately from a “train” model. The best bird model could then run by itself forever until it died as opposed to quitting when it reached the fitness threshold. The code writing here was really difficult because I wasn’t familiar with how the code for NEAT worked as well as pickle so I had used some of the commands wrong. My other problem that I ran into was that when I originally had multiple inputs, it caused my bird to overtrain and it ended up taking more generations to learn the game. I had to adjust this with different variables, in order to find a consistent best bird agent. I had to talk to ChatGPT in order to fix some of these problems because my code just wouldn’t run on those specific lines.

    4. I originally followed a few tutorials I found online using youtube and previously created githubs. I decided to pick Tech With Tim because he had an in depth explanation for all the coding decisions he had made. I had originally had a different flappy bird game but it used human controls and had a lot of methods that dealt with sound effects so I decided to not choose that one. Instead I decided to use the tutorial by Tech With Tim and work through it, but I also wanted to adjust the game in my own way so I didn’t keep everything that he had. I wanted to initially change how the game was played (new obstacles) as well as the inputs in order to see if the bird learned faster with more inputs. The project first creates an initial population of 20 birds with the neural network containing random weights for the connections between the input, hidden, and output nodes. The input nodes were the Bird’s y position, Bird’s velocity, distance from the pipe, y position of the top of the pipe, y position of the bottom pipe, the distance from the bird to the ground, the pipe’s speed, and the vertical direction of the pipe. These were deemed useful to determine if the bird should jump or not at that certain frame. Then these input values are connected to the hidden nodes, which mutate and evolve to allow the bird to perform better. The weighted sum and the bias are added together and then inputted into the activation function, and we used TanH in this project because that was the one that was used by Tech With Tim (successful). The function would then tell us a value between -1 and 1, and if the value is above 0.5, then the bird would jump. The bird would then fly with this neural network. A fitness reward system was then incorporated. First, each bird’s “fitness” increased by 0.1 for each frame the bird stayed alive, 5 for each pipe passed, 1 for being very close to the pipe, and -1 for colliding with the pipe. The birds, therefore, are encouraged to navigate through the pipes successfully. For each generation, the bird with the highest fitness score is kept and its neural network is mutated in order to create a new generation using the bird with the highest survival. This way, the birds can slowly evolve and learn how to play the game. This continues for multiple generations until the fitness threshold is reached, and the bird that reaches this fitness threshold (should be able to completely play flappy bird) is saved as the “best bird”.
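    The jump decision and fitness rules described above can be sketched in a few lines. This is a simplified illustration with a single output node and hand-picked weights; the function names are hypothetical, and a real NEAT network would also have evolved hidden nodes and connections.

```python
import math

JUMP_THRESHOLD = 0.5   # jump when the tanh output is above this value


def should_jump(inputs, weights, bias):
    """Single tanh output node: weighted sum plus bias, jump if output > 0.5."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(total) > JUMP_THRESHOLD


def frame_fitness(alive, passed_pipe, near_pipe, collided):
    """Fitness change for one frame, using the reward values listed above."""
    delta = 0.0
    if alive:
        delta += 0.1    # survived another frame
    if passed_pipe:
        delta += 5.0    # cleared a pipe
    if near_pipe:
        delta += 1.0    # stayed very close to the pipe
    if collided:
        delta -= 1.0    # hit the pipe
    return delta


# Made-up example: one input with weight 1.0 and no bias.
print(should_jump([1.0], [1.0], 0.0))   # tanh(1.0) is about 0.76
print(should_jump([0.2], [1.0], 0.0))   # tanh(0.2) is about 0.20
print(frame_fitness(alive=True, passed_pipe=True, near_pipe=False, collided=False))
```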

    5. I would start with Flappy Bird or the Google Dinosaur game because they’re the easiest to program and understand even if they had never learned AI algorithms for games because all the algorithm had to decide was if the agent (bird or dinosaur) should jump or not depending on predetermined inputs/reward system. The neural network needed to understand these games is relatively simple, and customizable if the student wanted to adjust to a different game that needed a reward system. If this class wanted to do more projects that were based on an AI game, then I would shift to what Aarav had used for his Mario Bros game because that required not just the fitness system that I had implemented but also more depth because of the different states that the agent had to learn (obstacle, springing, enemies, etc.). It required a lot more complex training and multiple AI algorithms. Personally, I wouldn’t recommend a board game because that requires a lot of strategy which is a bit hard for the AI to learn. There is so much strategy and different states/possible outcomes for each move that the AI model would need a lot of time and resources to learn even the first few steps of a game like chess or viking chess.

    6. https://github.com/dani0621/Project02_PygAIme.git

  7. Shreya Rao says:

    Please give a full description of the nature of your first AI/ML game project.
    My “PianoTiles_PygAIme” project is a reinforcement-learning-based game developed with Python and Pygame. The game simulates a classic tile-clicking experience where the objective is to “click” or “select” falling tiles before they reach the screen’s bottom. I used a Deep Q-Network (DQN) as the AI agent, enabling the AI to learn the game rules through interaction, optimizing its performance over time by maximizing a cumulative reward system. The DQN is structured to learn over several episodes, with each episode being a full run of the game where tiles progressively fall faster, thus increasing difficulty. The AI observes tile positions and other relevant game states, then selects actions (tile clicks) using a Q-value policy. Through multiple runs, the agent learns to make better choices, improving its score and accuracy in tile selection. This setup simulates a simplified reinforcement learning environment where the AI’s reward function encourages it to “click” tiles correctly, penalizing it for mistakes or missed tiles.

    What was the steepest part of the learning curve for this project?
    Reinforcement Learning Concepts: Since reinforcement learning is conceptually different from other machine learning approaches (e.g., supervised learning), the initial challenge was understanding the balance between exploration (trying new actions) and exploitation (choosing known best actions). Implementing the epsilon-greedy policy, which gradually decreases exploration as the AI learns, was essential but tricky to fine-tune for stable and effective performance.
    Experience Replay and Memory Management: The DQN agent uses an experience replay buffer to store past experiences (state-action-reward-next state tuples). Understanding how to manage this memory buffer efficiently was vital, especially because removing older experiences (FIFO approach) and random sampling helped stabilize training by breaking the correlation between consecutive game states.
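    A minimal sketch of the two ideas above, an epsilon-greedy policy with decay and a FIFO replay buffer with random sampling, is shown below. The capacity, decay rate, and minimum epsilon are assumed values, and the names are illustrative rather than taken from the project.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest entries drop off first (FIFO)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Random sampling breaks the correlation between consecutive frames.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Random action with probability epsilon, otherwise the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decay(epsilon, rate=0.995, minimum=0.05):
    """Shrink epsilon after each episode, never going below the minimum."""
    return max(minimum, epsilon * rate)


memory = ReplayBuffer(capacity=3)
for step in range(5):
    memory.push(step, 0, 1.0, step + 1, False)   # made-up transitions
batch = memory.sample(2)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

    With a capacity of 3, the two oldest of the five transitions are discarded automatically, and with epsilon at 0.0 the policy always picks the action with the largest Q-value.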

    What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
    What Went Right:
    One of the major successes was the DQN agent’s ability to learn tile-clicking dynamics and improve over episodes. The neural network, composed of three fully connected layers with ReLU activations, worked seamlessly once properly trained. The epsilon decay strategy helped balance exploration and exploitation, enabling the AI to make increasingly accurate decisions as it learned. Also, the reward system, designed with progressive rewards based on streaks and score benchmarks, motivated the agent effectively, enhancing the AI’s learning rate and improving its overall performance. Additionally, Pygame’s rendering capabilities worked well to maintain game flow, and the game’s difficulty scaling contributed to a dynamic experience that allowed the AI to adapt to increasing challenges, providing a rewarding demonstration of AI learning.
    What Went Wrong:
    One of the main difficulties was debugging reward calculations and determining the optimal penalty for mistakes. Initially, the agent had difficulty distinguishing correct from incorrect tile-clicking actions, leading to erratic behavior. After refining the reward structure to incorporate progressive rewards for streaks and scaling rewards based on the score, the AI’s performance became more consistent.
    Another hurdle was managing the epsilon decay rate and the DQN’s learning rate. Setting these too high caused the agent to learn slowly, missing opportunities for rapid improvement, while setting them too low made the agent’s behavior overly erratic (which is what happened with my first version). Furthermore, implementing the experience replay buffer required careful tuning; too few samples would hinder learning, while too many samples created slow training times and increased computational demands.

    Describe the AI/ML algorithm your game implements.
    This project applies a Deep Q-Network (DQN), a neural network-based approach to reinforcement learning, to predict Q-values for state-action pairs in a tile-clicking game. The algorithm relies on several core components to guide its learning process. First, state representation uses normalized tile positions and the game’s difficulty level to provide the AI with an overview of the game environment. For consistency, a maximum of nine tiles is assumed, and padding with zeros standardizes input size, which supports stable learning. The Q-network architecture comprises three fully connected layers with ReLU activation functions, which allows the DQN to learn patterns within tile positions and game dynamics and to generalize across game states without overfitting. An epsilon-greedy policy manages the exploration-exploitation tradeoff, with epsilon decaying over episodes so the AI gradually shifts from exploration to exploiting learned strategies. The agent begins with high exploration, reducing random actions as it gains confidence through accumulated learning. Experience replay enhances the DQN’s training by storing tuples of state-action-reward-next state and sampling random batches for each training step, which breaks temporal correlations and prevents overfitting by diversifying training examples. While the DQN framework was inspired by online resources and reinforcement learning tutorials, this project’s specific application to a tile-clicking game was adapted independently in Pygame. Custom reward functions and difficulty scaling add layers of complexity to align the AI’s learning process with the game mechanics.
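    The fixed-size state representation described above could be built along these lines. This is a sketch under assumptions: the screen dimensions, the maximum difficulty, and the choice of (x, y) pairs per tile are hypothetical and may differ from the actual project.

```python
MAX_TILES = 9                   # maximum number of tiles, as stated above
SCREEN_W, SCREEN_H = 400, 600   # hypothetical window size in pixels


def encode_state(tiles, difficulty, max_difficulty=10):
    """Flatten tile positions into a fixed-length, normalized feature list.

    tiles: list of (x, y) pixel positions, at most MAX_TILES are used.
    Missing tiles are padded with zeros so the input size never changes.
    """
    features = []
    for x, y in tiles[:MAX_TILES]:
        features.extend([x / SCREEN_W, y / SCREEN_H])
    features.extend([0.0] * (2 * MAX_TILES - len(features)))  # zero padding
    features.append(difficulty / max_difficulty)
    return features


state = encode_state([(100, 300)], difficulty=5)
print(len(state), state[:2], state[-1])
```

    Every call returns 19 values (two per tile slot plus the difficulty), so the same network input layer works whether one tile or nine are on screen.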

    If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
    If I were to teach this class, I would recommend a “Snake AI” project for students, as it provides an introduction to AI and reinforcement learning within a structured environment. The Snake game allows students to implement basic to intermediate AI algorithms without being overwhelmed by game dynamics. This project lets students work with grid-based state representations and explore reinforcement learning techniques, beginning with rule-based AI and progressing to Q-learning or DQN if they choose. The structure of Snake’s gameplay supports visualizing the learning process, as students can see the agent learning to avoid obstacles and pursue food, making reinforcement learning concepts visible. This project offers a broad overview of foundational AI topics, from state representation to action selection, making it an ideal introduction to reinforcement learning in gaming.

    Github link: https://github.com/Srao2020/PianoTiles_PygAIme.git

  8. Alex Ru says:

    My project is an AI that plays Tetris. My model is a Recurrent PPO (proximal policy optimization) model that is trained via reinforcement learning in a Gymnasium environment. It is a policy-based model that limits how much the policy is updated. The model also implements LSTM (long short-term memory), which lets the AI store previous states. My AI takes in an observation dictionary, which represents what the AI sees. Specifically, the AI sees the board (including the active piece), the active piece mask (a square that masks the active piece), the hold piece, and the queue. To apply my AI to a PyGame, I extracted the necessary information and passed it into the AI. The AI returns an integer, which represents a move.
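
    To make "limits how much the policy is updated" concrete, here is a minimal sketch of the clipped objective that PPO is built around, written for a single action. It is only an illustration of the idea, not how Stable Baselines 3 implements it internally; the function name and the clip_range default of 0.2 are assumptions.

```python
import math


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_range=0.2):
    """PPO clipped objective for one action (to be maximized)."""
    # How much more (or less) likely the action is under the new policy.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Keep the ratio inside [1 - clip_range, 1 + clip_range].
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    # Taking the minimum removes any incentive to move the policy too far.
    return min(ratio * advantage, clipped_ratio * advantage)
```

    If the new policy makes a good action twice as likely, the objective only gives credit for a 20% increase, so a single update cannot change the policy drastically.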

    The steepest part of the learning curve was figuring out what model to use and understanding the Gym environment. I use Stable Baselines 3, which offers a lot of different models to use. I started with DQN, but I wanted to try other models because DQN is somewhat simplistic. Also, some of the models work with specific types of Gym environments, so I also had to figure out which models would work with my environment. I ended up reading a lot of the documentation on the Stable Baselines 3 page to figure out which model to use and how. Even after figuring out which model to use, I still had trouble fine-tuning my model. To test my model after fine-tuning, I had to let it train for a day or two to see substantial results. This made fine-tuning a massive pain because it took a long time. So I would say creating the model was the hardest part.

    Training the model went pretty smoothly. I read the documentation for the Gym environment I used, which helped me train and test my AI. I had to learn a bit about the parameters for the train function, but they were generally self-explanatory. Applying the model to the PyGame was pretty difficult. The Gym environment and the PyGame are not consistent with each other, so I had to figure out how the PyGame worked and modify it accordingly. For example, the board variable in the Gym environment includes the active piece, while the board variable in the PyGame did not include the active piece. Also, to understand the input, I had to print out the observation dictionary and test in a separate Gym environment. Overall, it was tedious and annoying because I had to make sure everything was consistent. But what’s weird is that it still doesn’t work perfectly. The performance in the two environments is not entirely the same, and I can’t seem to figure out why.
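
    One way to bridge a mismatch like the board example is to draw the active piece onto a copy of the board before handing it to the model. The sketch below assumes the board and the piece are nested lists where 0 means empty, and that the piece position is given as a top-left offset. Those representations and the function name are hypothetical and are not taken from the repo.

```python
def overlay_piece(board, piece, top, left):
    """Return a copy of the board with the active piece drawn onto it."""
    merged = [row[:] for row in board]  # copy so the game's own board is untouched
    for r, piece_row in enumerate(piece):
        for c, cell in enumerate(piece_row):
            if cell:  # only copy the filled cells of the piece
                merged[top + r][left + c] = cell
    return merged
```

    Checking the result of a helper like this against a printed observation from the Gym environment is one way to confirm that both sides agree on the layout.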

    The AI I implemented was provided by Stable Baselines 3, but was fine-tuned by me (with assistance from ChatGPT). I first tried DQN, but found it not complex enough. After some more digging, I decided to use Recurrent PPO because it fit my environment and seemed promising. After some training, I increased the entropy coefficient, made the network more complex, and changed some other values. I wanted to make it try a lot of different options because Tetris has a lot of different options. I also wanted to make it more complex, since the AI takes in a lot of numbers and can return 7 different moves. Most of the fine-tuning was by hand and ChatGPT was not very helpful.

    I would recommend working with a game that is less complex. A game with a simple premise and one or two keys would work significantly better with reinforcement learning than something complex. I would also recommend looking through a lot of existing libraries to see which are the best. If you end up using a subpar library, it’ll take a lot of work to get your project to work.

    https://github.com/alexru26/Project02_ML.git

  9. Tyler Slomianyj says:

    Please give a full description of the nature of your first AI/ML game project.
    This project implements a reinforcement learning-based model to drive in a car game using Double Deep Q-Learning (DDQN). The model learns to drive by optimizing its actions based on predicted rewards through the use of a neural network and adjusts the weights of different NN nodes based on a comparison between the eval and target NN predictions.
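
    As a rough illustration of how the eval and target networks interact in Double DQN, the sketch below computes the training target for one transition. The Q-values are passed in as plain lists rather than coming from real networks, and the function name and gamma value are assumptions, so this is not the code from the repo.

```python
def double_dqn_target(reward, next_q_eval, next_q_target, done, gamma=0.99):
    """Target value for one transition under Double DQN.

    next_q_eval:   Q-values for the next state from the eval (online) network
    next_q_target: Q-values for the next state from the target network
    """
    if done:
        return reward  # no future reward after a terminal state
    # The eval network chooses the action...
    best_action = max(range(len(next_q_eval)), key=lambda a: next_q_eval[a])
    # ...and the target network scores it, which reduces overestimation.
    return reward + gamma * next_q_target[best_action]
```

    The eval network is then trained to move its prediction for the taken action toward this target.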

    What was the steepest part of the learning curve for this project? Was it learning how to implement the AI or how to use the Pygame/Pyglet library? Please elaborate and explain your answer.
    Definitely the topics behind the models. Q-learning in itself is already a complicated topic which involves large matrices and complicated equations. When you combine that with the intricacies of neural networks it gets pretty tricky pretty quickly.

    What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
    What went right with my project was probably the planning. I definitely think I had a good base and idea of what I was going to do. But obviously, I wasn’t able to execute well. The training and creating of the model didn’t work as I wanted it to. I believe if I was given more time to look more in-depth into the issue and change how my algorithms are working, I would be able to get the car to drive at least a little bit.

    Describe the AI/ML algorithm your game implements. Did you work through a tutorial you found online? Did you start from scratch because you were motivated by a particular game or algorithm and you wanted to implement it using Pygame/Pyglet?
    As described above, I used DDQN. I worked through a tutorial I found online because I thought it would help me understand better and get the outline of what I needed for the project I wanted to do.

    If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
    I feel that my project may have been a little too ambitious in the sense that I tried to start out with one that wasn’t very feasible to learn and use in the little time we had. I would recommend a simpler game and maybe try to just master Q-learning instead or another more basic algorithm that will help boost their understanding.

    Include your Github repo URL so your classmates can look at your code.
    https://github.com/tslom/Project02_pygAIme

  10. Andrew Lim says:

    Please use your reply to this blog post to detail the following:
    Please give a full description of the nature of your first AI/ML game project.
    This project utilizes deep Q-networks to train an agent to play Pacman. It uses OpenAI’s Gymnasium environments (specifically gymnasium[atari]). It uses CNNs to process observations, which are images that are grayscaled and scaled to 84×84. Then I stack 4 images so that the DQN can understand direction. Then it trains by using epsilon-greedy, and it has a dueling architecture that helps the network separately assess the value of a state and the advantage of each action, which helps the model learn more efficiently. I also included my own reward function that results in a -100 reward for dying to a ghost, and I normalize the rewards to fix abnormally high reward values. It also has two networks: a Q-network and a target network. The purpose of separating them is to stabilize training, since the targets come from a network that changes more slowly than the one being trained.
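
    Two of the ideas above, stacking 4 frames and the dueling split into state value and action advantages, can be sketched without any deep learning library. The class and function names are hypothetical, and frames are treated as opaque objects here instead of real 84×84 arrays.

```python
from collections import deque


class FrameStack:
    """Keeps the most recent frames so the agent can infer direction of motion."""

    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def reset(self, frame):
        # At the start of an episode, repeat the first frame to fill the stack.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return list(self.frames)

    def step(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return list(self.frames)


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V + A - mean(A)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]
```

    Subtracting the mean advantage keeps the value and advantage streams from drifting against each other, which is the usual way the two are combined.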

    What was the steepest part of the learning curve for this project? Was it learning how to implement the AI or how to use the Pygame/Pyglet library? Please elaborate and explain your answer.
    I’d say trying to create an AI with the correct architecture was the hardest, seeing as I did not do so successfully. I was set back by the Gym’s lack of information, so the only states that I could input into the AI for processing were images. As a result, I had to learn a lot about CNNs, preprocessing observations (grayscaling and lowering pixels), and memory usage (reducing batch size, lowering replay buffer memory that stores experiences). Lastly, it was very confusing trying to combine all of these elements to create a logically sound AI that could beat Pacman.

    What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
    I was able to 1) Finally install the necessary packages to import the Pacman Gymnasium (this took more hours than I would have liked), 2) Create an AI that can play Pacman. However, my successes end there. Because my AI uses so much memory storing and analyzing images, training takes too long and crashes before reaching an adequate number of training episodes. As a result, I am unable to determine if my AI can improve with training over time, and my AI also cannot beat the game, or play it well at all. I had the most trouble debugging the frame stacking and replay buffer because trying to maintain a strict guideline for inputs was confusing. If at any point the DQN received the wrong input scaling, it would crash, and it took a lot of debugging.

    Describe the AI/ML algorithm your game implements. Did you work through a tutorial you found online? Did you start from scratch because you were motivated by a particular game or algorithm and you wanted to implement it using Pygame/Pyglet?
    The algorithm I presented to class was a Double DQN. I already explained how this works in the first question, so I won’t go over that again. I did not use a tutorial, but found a nice Medium article (https://towardsdatascience.com/advanced-dqns-playing-pac-man-with-deep-reinforcement-learning-3ffbd99e0814). I basically went off the architecture it explains to try to replicate that myself. I did try the final model, Noisy Networks N-Step Prioritized Double Dueling DQN. To go over this model quickly, it introduces a noise layer into the DQN that randomly adds noise to adjust exploration in the model. It’s dynamic, so it can be better than epsilon-greedy. N-step allows the model to evaluate rewards over multiple steps, rather than one, so it helps with understanding how a set of actions influences rewards. Prioritized is short for prioritized replay buffer, so instead of a standard replay buffer with experiences and associated rewards, it calculates the importance of each experience based on the difference between the predicted Q-value and the actual Q-value. The greater the difference, the more important the experience. It required too much memory to use effectively, however.
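
    The n-step and prioritized parts can be written down in a few lines. This is only a sketch under assumed names and hyperparameters (gamma, alpha, and the small eps constant), not the code from the article or the repo.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of the next n rewards plus the estimated value after them."""
    g = bootstrap_value
    for r in reversed(rewards):  # work backwards from the last reward
        g = r + gamma * g
    return g


def priority(predicted_q, target_q, alpha=0.6, eps=1e-6):
    """Replay priority: a bigger gap between prediction and target is sampled more."""
    # eps keeps experiences with zero error from never being sampled again.
    return (abs(target_q - predicted_q) + eps) ** alpha
```

    Experiences would then be drawn from the buffer with probability proportional to their priority instead of uniformly.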
    If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
    I would definitely recommend not inputting raw images into the AI. It is hard for AI to learn effectively on sets of images, and they would be better off on other inputs, such as distance from Pacman to ghost or Pacman to pellet. I would also suggest not overcomplicating the project, and focusing on refining it further to make it better. With all these in mind, I recommend doing Pacman, but not using a Gymnasium environment. Actually, UC Berkeley has a project that is entirely dedicated to Pacman. I think it can be great for learning about reinforcement learning, Pygame, and even recursive techniques. I know we’re doing something on finding the closest path, and that can be implemented in Pacman actually if you want to find distances between objects. This guy, https://pacman.calvinjc.com/, uses recursion to calculate future ghost and Pacman positions. I think Pacman has a lot to explore, and is pretty complex, so it would be a good game.
    Include your Github repo URL so your classmates can look at your code.
    https://github.com/AndrewLim0314/Project02_PygAImev2.1

  11. Cameron Morris says:

    Please give a full description of the nature of your first AI/ML game project.
    My project uses deep learning to optimize the way that it plays Othello, also called Reversi by some. Using epsilon to introduce randomness, the model transitions from exploration to exploitation. It uses a normal torch model as well.
    What was the steepest part of the learning curve for this project? Was it learning how to implement the AI or how to use the Pygame/Pyglet library? Please elaborate and explain your answer.
    I think it was generally the AI and trying to figure out whether the AI was good. Since I was using a two-player game, it was hard to tell if the players were getting better. As such, I found the hardest part to be evaluating my model, but also trying to understand some of the concepts such as the loss and how to implement it.
    What went “right” with your project? As in, what worked seamlessly? What went “wrong” with your project? As in, what were your biggest hurdles or where did you have the most trouble debugging or getting your project to run?
    It worked well at the start because it was getting better, but later on it got harder and harder to tell if it was getting better, since I was only basing it off itself instead of trying to train it against something like a minimax. Most of the bugs I ran into had to do with the shape of my matrices, because I was inputting the data as a group of 30 8 x 8 boards, so they sometimes got reshaped into 240 x 8 instead of 30 x 64.
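
    The shape bug can be reproduced with plain nested lists. In this sketch (function names are hypothetical), the first function produces the intended layout of one 64-value row per board, while the second shows the mistaken layout where every board row becomes its own sample.

```python
def flatten_boards(boards):
    """Intended layout: a batch of 8 x 8 boards becomes one 64-value row per board."""
    return [[cell for row in board for cell in row] for board in boards]


def stack_rows(boards):
    """Mistaken layout: board rows are stacked, so 30 boards become 240 rows of 8."""
    return [row for board in boards for row in board]
```

    Printing the number of rows and the row length right before the data enters the model makes this kind of mismatch easy to spot.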
    Describe the AI/ML algorithm your game implements. Did you work through a tutorial you found online? Did you start from scratch because you were motivated by a particular game or algorithm and you wanted to implement it using Pygame/Pyglet?
    I just browsed online and used the different sources to learn how to use deep Q-learning. I chose it because it was the one recommended to start with, since it is simple. In the future, I might want to try different models, including something like Monte Carlo, which Ryan used, so it won’t be based entirely on itself.
    If you had to teach this class next year, what project would you recommend to students in the Advanced Topics class to give them a broad and comprehensive overview of some fundamental AI algorithms to implement in a game?
    I would recommend a non-simple single-player game, because it is easier to evaluate and also will allow them to experiment with different kinds of models. An example of this would be a jumping game similar to Aarav’s, because it allows the person to try to maximize different aspects.
    Include your Github repo URL so your classmates can look at your code.
    https://github.com/CameronJMorris/Project02_PygAIme.git

  12. Matthew Guo says:

    2048 Game with Reinforcement Learning Bot
    This project is a Python implementation of the popular 2048 game, with an added feature of a reinforcement learning (RL) bot trained using Q-learning. Players can interact with the game via a Pygame-based GUI, and the project includes a Tkinter interface for configuration. The bot learns to play the game by optimizing moves based on rewards, allowing it to achieve higher scores over time.

    Features
    Interactive Game Mechanics
    A 4×4 grid where players can merge tiles by moving them in four directions.
    GUI implemented using Pygame for smooth tile movement and animations.
    Optional bot player using RL for automated gameplay.
    Bot Features
    Utilizes Q-learning to evaluate actions based on states and rewards.
    Supports saving/loading the Q-table for persistent training results.
    Configurations
    Tkinter interface to choose between human or bot players.
    Option to enable/disable animations.
    Learning Process Visualization
    Matplotlib integration for visualizing training results, such as the improvement in the bot’s performance over time.
    Steepest Part of the Learning Curve
    Implementing Q-learning

    Mapping the 2048 game mechanics to an RL framework (state representation, action space, and reward calculation) required a deep understanding of the algorithm.
    Debugging incorrect Q-table updates and transitions delayed progress.
    Game Mechanics in Pygame

    Early issues with tile merging and movement logic caused cascading problems for bot training.
    Synchronizing animations with game state updates was tricky, especially when toggling between human and bot players.
    What Went Right
    The Pygame-based GUI worked seamlessly, providing a responsive and visually engaging experience for human players.
    The Tkinter interface made it easy to switch between player modes and customize settings.
    Saving/loading Q-tables helped streamline the training process by avoiding repetitive initial learning phases.
    What Went Wrong
    Game Logic Bugs: Incorrect tile merging behavior disrupted early bot training. Fixing this required extensive debugging and unit testing.
    Slow Training Convergence: The Q-learning bot required a large number of iterations to show meaningful improvements, highlighting the game’s inherent complexity.
    Hyperparameter Tuning: Balancing exploration and exploitation, along with fine-tuning learning rate and discount factor, was time-consuming but crucial for the bot’s success.
    Algorithm Description
    The RL bot uses Q-learning, a model-free RL algorithm, to train itself to play 2048:

    State Representation
    The board’s current state is represented by the positions and values of all tiles.
    Action Space
    The bot can choose between four possible actions: moving tiles up, down, left, or right.
    Reward System
    Rewards are based on the sum of tile values merged during a move, encouraging the bot to create higher-value tiles.

    Training Process

    The bot starts with a random Q-table and gradually improves by playing games and updating Q-values.
    Saving/loading the Q-table ensures progress persists between sessions.
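
    A minimal sketch of the reward and update steps described above, assuming the Q-table is a dictionary keyed by a hashable board state. The names, the alpha and gamma values, and the zero-initialized table are assumptions for illustration (the description mentions a random starting table), so this is not the code from the repository.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")


def merge_row_left(row):
    """Slide one row to the left; return the new row and the merge reward."""
    tiles = [v for v in row if v != 0]
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            reward += tiles[i] * 2  # reward is the value of each merged tile
            i += 2                  # a tile can only merge once per move
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), reward


def new_q_table():
    # Unseen states start with a value of 0.0 for every action (an assumption).
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update for one (state, action, reward, next_state) step."""
    best_next = max(q_table[next_state].values())
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
```

    A full move would apply merge_row_left to every row (rotating or reversing the board for the other three directions) and sum the rewards before calling q_update.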
    Best Online Resources
    Q-learning Basics: Reinforcement Learning Explained
    2048 Implementation in Python: 2048 Python Tutorial
    Reinforcement Learning in Games: GeeksforGeeks – RL in Games
    Future Recommendations for Advanced Topics Class
    For next year’s class, consider assigning projects that combine AI algorithms with gameplay, such as:

    Pac-Man AI: Implement pathfinding for ghosts or reinforcement learning for Pac-Man’s decision-making.
    Chess AI: A simplified chess engine using minimax and alpha-beta pruning.
    Maze Generation and Solving: Incorporate BFS, DFS, and A* algorithms for generating and solving mazes.
    These projects cover fundamental AI topics like pathfinding, decision-making, and RL while being engaging and creative.

    GitHub Repository
    The project repository URL: https://github.com/a-me-lia/2048.git
