Journal Reflection #2: Insights into your first ML project

Please use your reply to this blog post to detail the following:

  1. A full description of the nature of your first ML project.
  2. Some insights into what predictions you were able to make on the dataset you used for your project.
  3. What types of conclusions can you derive from the images/graphs you’ve created with your project? If you didn’t create charts or graphs and instead explored things like Markov Chains, how much work do you think you need to do to further refine your project to make its output more realistic?
  4. Did you create any formulas to examine or rate the data you parsed?

Take the time to look through the project posts of your classmates. You don’t have to comment on the posts of your classmates, but if you want to give them praise or just comment on their project if you found it cool, please do.



  1. A full description of the nature of your first ML project.

    This project has evolved in three stages. In the first stage, I used OpenCV and a CascadeClassifier with pretrained data to recognize faces on my computer’s webcam and draw a box around them. Facial detection models find facial features like the nose, mouth, and eyes, and use the distances between them to distinguish faces.

    Using a pretrained model, my next stage was matching names to faces. The model assigned encoding values for each specific face in the dataset and stored them. Once the webcam opened, the model scanned for faces, and once it found one, it compared the face’s landmarks and encodings to the encodings in the dataset to see if there were any matches. If there were close matches, the program would draw a box around that face and label it with the closest match. Once matched, the program would also record the person’s name and the time their face was detected in a .csv file. A future use of this could be an attendance system where a student just needs to look at a camera and attendance is taken automatically. If a student was already marked on the attendance sheet, the program did not mark them a second time.

    The third stage, which is still in progress, is figuring out how to train the model to detect faces with masks better. I found a really good dataset on Kaggle which has people with masks, without masks, and with incorrectly worn masks, which would be really useful for making sure people are wearing their masks correctly. I think you might be able to do two transfer learning steps, where you first teach the model to recognize people with masks using that dataset and then extract the important landmarks and compare them to a dataset.
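
    A minimal sketch of the matching-and-attendance step described above. This is not the project's actual code: numpy arrays stand in for the 128-dimensional encodings a library like face_recognition would produce, and the names, tolerance, and encoding values are hypothetical.

```python
import csv
from datetime import datetime

import numpy as np

# Hypothetical stand-ins for the per-face encodings stored from the dataset.
known_names = ["alice", "bob"]
known_encodings = np.array([
    np.full(128, 0.1),
    np.full(128, 0.9),
])

def best_match(encoding, tolerance=0.6):
    """Return the closest known name, or None if nothing is close enough."""
    distances = np.linalg.norm(known_encodings - encoding, axis=1)
    idx = int(np.argmin(distances))
    return known_names[idx] if distances[idx] <= tolerance else None

def mark_attendance(name, seen, writer):
    """Record each person at most once, as the project does."""
    if name and name not in seen:
        writer.writerow([name, datetime.now().isoformat(timespec="seconds")])
        seen.add(name)

# Usage: a detected encoding from one webcam frame is matched and logged once.
seen = set()
with open("attendance.csv", "w", newline="") as f:
    writer = csv.writer(f)
    detected = np.full(128, 0.12)   # pretend this came from the webcam
    mark_attendance(best_match(detected), seen, writer)

print(seen)  # {'alice'}
```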

    Some insights into what predictions you were able to make on the dataset you used for your project.

    The algorithm was able to predict pretty accurately whose face was whose and match it to a name. When it was unsure, it would cycle through a bunch of possibilities, because it takes the highest-percentage one, which can vary if all of them are possible matches. In these cases, people were wearing masks, so it was hard to detect a face at all. The model was really good at recognizing my face with a mask on, though.

    What types of conclusions can you derive from the images/graphs you’ve created with your project? If you didn’t create charts or graphs and instead explored things like Markov Chains, how much work do you think you need to do to further refine your project to make its output more realistic?

    I can conclude that the model is really good at detecting full faces and matching them with a name if the name is in the dataset; however, in the age of COVID, masks make it really difficult. I recently read an article where some companies are now able to match faces with masks with up to 96% accuracy, which is really impressive. To make my project more realistic, I could implement transfer learning and use MobileNetV2, a really powerful model for face detection, and teach it to detect faces with masks. One possible implementation is to have the algorithm place masks where it would expect the facial landmarks (like the mouth) and train it from there.

    Link to article: “Facial recognition up to 96 percent accurate with … 2020 Biometric Technology Rally tests”

    Did you create any formulas to examine or rate the data you parsed?

    If the algorithm could not detect a face in an image it was supposed to label, it would skip it. While I was trying to have the algorithm detect images with masks, it skipped a lot of them because it was originally trained on full faces.

  2. 1. The main component of my project was a price predictor for rideshare fares in Boston, Massachusetts. I looked through a few datasets to find one that I could make a map with, because I was really excited about using the Geopandas library that I read about online. I also really like Boston as a city and I’m interested in transportation, so this dataset about ridesharing services was a great choice. I spent a lot of time figuring out how to use Geopandas, and I finally made the blue map of average rideshare prices in different districts in Boston that I shared in my presentation. I then went back and made a lot of other kinds of plots (violin plots, box plots, dot plots, bar charts) to help me explore other relationships between variables, like weather, time, and price. Next, I spent a lot of time cleaning the data and created my prediction model with the SVM, or Support Vector Machine, model. I also created a second SVM model whose sole role was to predict the distance between the source and destination districts in Boston. I pickled both of those SVM models so I could use them in a different Python file, where I opened up a GUI with the Tkinter library in which the user could input the information, including source, destination, number of passengers, temperature, time of day, brand, and weather. After taking in the information, the algorithm runs every option for car type within the brand selected by the user and displays those prices in ascending order in the GUI.
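
    The final step, running every car type for the chosen brand through the model and showing the predicted prices in ascending order, might look roughly like this. The predictor below is a stub standing in for the pickled SVM, and all names, car types, and base rates are made up for illustration.

```python
# Hypothetical stand-in for the pickled SVM price model described above.
def predict_price(car_type, distance_miles, surge=1.0):
    base = {"shared": 1.0, "standard": 1.5, "xl": 2.2, "lux": 3.5}
    return round((2.0 + base[car_type] * distance_miles) * surge, 2)

CAR_TYPES = {"uber": ["shared", "standard", "xl", "lux"],
             "lyft": ["shared", "standard", "xl"]}

def price_options(brand, distance_miles, surge=1.0):
    """Predict a fare for every car type of the brand, cheapest first."""
    options = [(car, predict_price(car, distance_miles, surge))
               for car in CAR_TYPES[brand]]
    return sorted(options, key=lambda pair: pair[1])

for car, price in price_options("lyft", 3.0):
    print(f"{car}: ${price}")
```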

    2. The prediction model for price based on all those mentioned factors achieved about an 80% accuracy rate on the testing data, and the prediction model for distance based on source and destination achieved about a 98% accuracy rate on the testing data, so I trust the conclusions they predicted. One insight from the map I created of average price by district in Boston was that the average price tended to be highest in Fenway. Both Northeastern University and Boston University are located in Fenway, so my guess is that rideshare services increase the prices in the Fenway region because they know that most college students don’t have cars and so have to take a rideshare service. Also, the tuition at those universities is pretty expensive, so they know that those students can probably afford a more expensive ride. From my other graphs, I found that time of day and weather also have big impacts on the surge multiplier (which determines the price) of these rides. This isn’t necessarily surprising, as Uber shares this information, but it was cool to see such a clear trend. Also, from my prediction model and GUI, I found that Lyft tends to be cheaper than Uber: I could easily click back and forth between the two brands and see that Lyft rides tended to be cheaper.

    3. Besides what I mentioned above, I think there would be some useful conclusions/applications of this project if I posted it online. It would allow people to predict the different prices of rideshare apps depending on several important factors that they themselves could control. In the Uber and Lyft apps, you can only see the pricing options for the current time, current weather, current temperature, etc. This would help consumers make more informed choices about when to use rideshare apps and which services to use. Posting this online might also help create more accountability and transparency from these rideshare services, because people could see clearly how these apps can manipulate their prices to take advantage of consumers. Also, I could post the graphs and maps I made and discussed above to provide extra evidence for these patterns that I have noticed.

    4. I did not create my own formula to classify the data or create the model, as I just used the prewritten SVM model. However, I did have to write a few different algorithms to sort through each possible price of the different types of rides for each brand. Also, as the rides were classified and their prices predicted, I sorted them in ascending order. I also wrote a few algorithms to clean the data that I got in the datasets, especially so that I could properly merge it with the location data and the map of Boston data from online to make my Geopandas plot. In the future, I definitely want to explore OpenAI more to find some other cool machine learning models that I can use on future projects, and also write more of my own models. I’m also excited to learn more about graphics with the pygame project, as learning how to use Tkinter and Geopandas was one of the highlights of this project for me.

    Here is a picture of my GUI:

  3. A full description of the nature of your first ML project.

    For my first real machine learning project (I have messed around with libraries before but never fully fleshed out a project), I decided to go into the field of sentiment analysis and made a project that aimed to discern a user’s political leaning based on their Reddit comments. I made two versions of the model, one using the GPT-3 API and the other using a basic Multinomial Naive Bayes model trained on prelabeled Reddit comments. For the sake of readability, I will split this question into three sections:

    data preparation/cleaning
    training and running the model(s)
    general principles that I learned

    Data Preparation/Cleaning

    Data Model
    I found that most Reddit posts were heavily dependent on images and thus could not easily be processed and fed into a model, so I focused on comments instead. I also decided that manually labeling each comment as democratic or republican would be a massive waste of time. To this end, I scraped the subreddits r/Conservative and r/Democrats for their top posts, found the users who created those posts, and scraped those users’ top comments. This would guarantee that the comments I selected came from clearly democratic or republican users, and it would circumvent the need for manual labeling.

    Pulling the Data
    I requested access to the Reddit API and used the PRAW Python wrapper to pull the data. I then used the included pandas dataframe tooling to filter out links, non-alphanumeric characters, strange spacing (newlines and double spacing), as well as lines that were under 30 characters.
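
    The filtering step might look roughly like this in plain Python (a sketch only; I used the pandas tooling, and the exact regexes and the 30-character threshold shown here are assumptions about that filter):

```python
import re

def clean_comment(text, min_len=30):
    """Apply the filters described above; return None if the result is too short."""
    text = re.sub(r"https?://\S+", "", text)        # strip links
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)      # drop non-alphanumeric characters
    text = re.sub(r"\s+", " ", text).strip()        # collapse newlines/double spacing
    return text if len(text) >= min_len else None

raw = "Check this:  https://example.com \n\nI think the policy debate was really interesting!!"
print(clean_comment(raw))
```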

    Creating the Model
    For this project, I used the Multinomial Naive Bayes model from sklearn. I first combined all of the data and labeled it with 0 being conservative and 1 being democrat. From this I created a bag of words model and used the joblib package to save the model.
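
    A minimal sketch of that bag-of-words pipeline with sklearn. The four toy comments and their labels below are invented stand-ins for the scraped Reddit data; the real model was then saved with joblib as described.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in comments; 0 = conservative, 1 = democrat, as in the project.
texts = [
    "lower taxes strong borders",
    "defend the second amendment",
    "medicare for all workers",
    "climate justice and unions",
]
labels = [0, 0, 1, 1]

vectorizer = CountVectorizer()            # bag-of-words counts
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

def classify(comment):
    return int(model.predict(vectorizer.transform([comment]))[0])

print(classify("taxes and borders"))   # 0
print(classify("medicare and unions")) # 1
```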

    Some insights into what predictions you were able to make on the dataset you used for your project.
    From my dataset, I predicted whether a user was conservative or democratic with 65% accuracy (0.65) using the Naive Bayes model. This was mostly based on keywords likely to be used by either conservatives or democrats, since the Naive Bayes model essentially just iterates over the words in a comment and assigns the probability of each word, in particular, appearing in a conservative or democratic comment.

    What types of conclusions can you derive from the images/graphs you’ve created with your project? If you didn’t create charts or graphs and instead explored things like Markov Chains, how much work do you think you need to do to further refine your project to make its output more realistic?

    I can conclude that my hypothesis was correct and that conservatives and democrats have different vocabularies. I know this since the Naive Bayes model had an accuracy of 0.65, and the sample size for the test data was over 600, so it is very unlikely that this result was merely a coincidence. For further refinement of this project, I think I could have used an SVM model, which, according to the article linked below, usually yields a five-point increase in accuracy. I could also have spent more time on the data selection part of the project, scraping more user comments and eliminating all non-political comments from the sample data. One way to filter out non-political comments is to only admit comments that were made in political subreddits, or to eliminate all of the data for which the model was less than 40% confident that a comment was democratic or republican. If I had even more time, I would have implemented a more complex method to train the model, perhaps not something as simple as bag-of-words but instead a method like word2vec that would not eliminate sentence structure and word order from the data.

    Note: one method I tried was to add a third category for non-political data and have the model differentiate between democratic, republican, and non-political comments. This approach only reduced the accuracy, to 0.49. This could be because the test data was not actually labeled with non-political comments, artificially making the model appear less accurate. Another reason it might not have worked is that I used the model itself to add the third category of data, so running it twice on a data sample could have compounded the uncertainty.

    Did you create any formulas to examine or rate the data you parsed?

    As explained above, I used the predict_proba attribute of the naive Bayes model in sklearn to create a segment of non-political comments, but this only reduced the accuracy. I also wrote a simple function that would give me the model’s accuracy from the training and test data.
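
    The confidence-bucketing idea might look roughly like this (the probability rows and the 0.6 cutoff are invented for illustration; predict_proba returns one row of class probabilities per comment in this shape):

```python
# Hypothetical rows as predict_proba would return them:
# each row is [P(conservative), P(democrat)] for one comment.
probabilities = [
    [0.92, 0.08],   # confidently conservative
    [0.55, 0.45],   # model is unsure -> treat as non-political
    [0.10, 0.90],   # confidently democrat
]

def bucket(probs, threshold=0.6):
    """Label a comment non-political when the model's best guess is weak."""
    best = max(probs)
    if best < threshold:
        return "non-political"
    return "conservative" if probs[0] > probs[1] else "democrat"

print([bucket(p) for p in probabilities])
```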

  4. #1. My original plans for my first ML project fell through as the time to complete it dwindled and I kept being beckoned by rabbit holes related to the project at hand. While I originally planned to find some way to generate haikus, my final project turned out to be one that generated sample dialogue from the (original) Star Wars movies. There are a number of reasons I didn’t end up going down the haiku route, most notably the amount of time it takes to create poetry following a specific meter with a computer. The meter is actually the most technically challenging part, and in my research, I stumbled across a couple of dissertations on how best to count syllables (in English) with a computer. While I was researching/planning a program to write haikus, I wanted to get more familiar with Markov chains, and with Python in general (a language I was completely new to). This practice project with Markov chains developed into my final product. Essentially, what the program does is sort through the entire dialogue of the original Star Wars movies, removing anyone who did not speak often. After some data manipulation to sort everything out, the function marcov was called, which executed a Markov chain on the dialogue associated with each character I fed it. In combination with a Markov-determined order of characters, a sample dialogue was produced.
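
    A word-level Markov chain of the kind described can be sketched as follows. This is not the project's marcov function; the two dialogue lines are invented stand-ins for a character's real lines.

```python
import random

def build_chain(lines):
    """Map each word to the words that follow it in the character's dialogue."""
    chain = {}
    for line in lines:
        words = line.lower().split()
        for current, nxt in zip(words, words[1:]):
            chain.setdefault(current, []).append(nxt)
    return chain

def generate(chain, start, length=8, seed=0):
    """Walk the chain from a start word, picking a random follower each step."""
    random.seed(seed)
    word, out = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

luke_lines = ["i want to come with you", "i want to learn the ways of the force"]
chain = build_chain(luke_lines)
print(generate(chain, "i"))
```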

    #2. It was interesting seeing what words commonly came up for each character and the “tone” each character typically had after their words were parsed through the algorithm. There were no great revelations formed by the lines of dialogue it came up with, but it was good fun.

    #3. There’s a lot that could be done to make the dialogue more ‘realistic’, as it was far from being so. The dialogue itself was nonsensical and, while it did resemble sentences, it carried no meaning or continuity. What would have been helpful is having far more lines per character to train on, although we are limited by the number of Star Wars movies that exist (that number does seem to be steadily increasing though, so maybe we aren’t limited by that). The Markov chain only has so much it can do, and unfortunately continuity is not one of those things. One way to confront this issue, as continuity is often valued in movies, would be to train the “text generation” using a neural network (as Ben and Tyler did) or some other form of ‘smart’ AI text generation.

    #4. There weren’t any formulas to parse through the results, and I’m not entirely sure how to implement one for Markov chains. Any function would probably have to be more complex than the chain itself, and at that point I’d rather just implement a better alternative to the Markov chain.

  5. 1. My project was mostly just me trying to figure out how pandas worked so it was relatively simple. You put in mechanics from the dataset and it gives you the first game it can find with all of those mechanics, or tells you there isn’t one if it cannot find one. I’m not sure if this really counted as machine learning, but pandas is used for machine learning so I guess it counts.
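
    The lookup described might look roughly like this in pandas. The toy table, column names, and mechanics below are assumptions, not the real board-game dataset.

```python
import pandas as pd

# Toy stand-in for the board-game dataset.
games = pd.DataFrame({
    "name": ["Catan", "Pandemic", "Azul"],
    "mechanics": [["trading", "dice rolling"],
                  ["cooperative", "hand management"],
                  ["tile placement", "pattern building"]],
})

def first_game_with(df, wanted):
    """Return the first game whose mechanics include everything requested."""
    mask = df["mechanics"].apply(lambda ms: set(wanted) <= set(ms))
    matches = df.loc[mask, "name"]
    return matches.iloc[0] if not matches.empty else "No game found"

print(first_game_with(games, ["trading"]))                 # Catan
print(first_game_with(games, ["trading", "cooperative"]))  # No game found
```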

    2. I mostly realized that pretty much any mechanic could go hand in hand with another since I never seemed to find a group (at least when using pairs) that wasn’t in one of the top 100.

    3. If I were to refine this project, I’d have it give you an actual score based on your mechanics, domains, minimum age, min/max players, etc. As is, it just checks whether a set of mechanics has done well before or not.

    4. No formulas really, the main part of this project was just figuring out how to do literally anything with pandas.

  6. 1. A full description of the nature of your first ML project.

    At first, I was completely stumped as to what I wanted to do for this project. I had never made an AI/ML project before and didn’t know where to start. My first bit of inspiration came when Mr. Cochran made a Markov chain in class. Although it was not something I made myself, I still wanted to use it for a personal project. I have a friend who types in a very distinct way on a certain online chat app. I figured out that if I used a bot made for said online chat app and scraped one server for all of this friend’s messages, I could get a text file with all of them in it. Then, I could run the Markov chain and use the bot to spit back out a fake message made by that friend. Was this fully my own idea and code? No. Was this an invasion of privacy? Maybe (don’t worry, he was okay with it). But was this a good source of inspiration, which allowed me to realize that I wanted to focus on artificial text generation for this project? Yes.

    My first idea for a website to use for this project was Pitchfork. If you know me, you know I really enjoy performing, writing and listening to music, and while I don’t believe every word Pitchfork says, I find joy in looking at other people’s opinions of music that I like. Most importantly, Pitchfork reviews are written in a very pretentious tone, which was perfect for the kind of AI I wanted to make.

    I was lucky enough to be sitting across from Ben, who introduced me to GPT-2. GPT-2 is an AI pre-trained model which can be used quite easily to generate text. What was interesting to me about GPT-2 is that it can be fine-tuned to one dataset, meaning that it would follow any sort of differences in the text that you fine-tune it towards.

    I found a large SQLite file containing around 25,000 (!) Pitchfork reviews, which made their way into a dictionary and through a tokenizer, a function that assigns integer values to different words or pieces of words. These tokens were then used to train and test my own model of GPT-2, fine-tuned towards the dataset of reviews. This is when I handed the project off to Ben to run on his computer, because running this even with an upper limit of 100 reviews would have taken upwards of 12 hours on my MacBook. Then, once I got the generated model back from Ben, I used the pipeline function to generate reviews based on user input.
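
    A toy word-level tokenizer illustrating the idea of assigning integer values to words (GPT-2's real tokenizer uses byte-pair encoding on subword pieces, so this is only a sketch, and the example reviews are invented):

```python
def build_vocab(texts):
    """Assign an integer id to every distinct word, as a tokenizer does."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(text, vocab):
    """Turn a string into the integer ids the model actually trains on."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

reviews = ["a stunning debut album", "a tedious debut"]
vocab = build_vocab(reviews)
print(vocab)
print(tokenize("a debut album", vocab))  # [0, 2, 3]
```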

    2. Some insights into what predictions you were able to make on the dataset you used for your project.

    What was interesting to me was how much this model used terms commonly used in music reviews. “Guitar solo”, “synth loop”, “debut album” and similar terms appear in the majority of reviews generated by this model. One thing that really stuck out to me was this model’s generation of album and band names. GPT-2 is trained on the whole internet, so sometimes it will auto-generate a title based on a movie, book, or other kind of work. My favorite example of this is the “No Country for Old White Men” instance from today’s class. Another interesting thing was the details the AI would spit out about existing celebrities — if you input “Kanye West”, you’re almost sure to hear the phrase “Chicago native” in the review.

    3. What types of conclusions can you derive from the images/graphs you’ve created with your project? If you didn’t create charts or graphs and instead explored things like Markov Chains, how much work do you think you need to do to further refine your project to make its output more realistic?

    My model wasn’t trained on the entire dataset, as that would take too much time. I think the biggest potential change for making the AI more realistic is training the model only on newer reviews. Pitchfork’s modern reviews incorporate more commonly used lingo and pop culture references of the past couple of years, which would make the model do the same. Right now, you’ll hear phrases used in reviews dating back decades, like 2002.

    4. Did you create any formulas to examine or rate the data you parsed?

    No, but I think that when Ben ran it on his computer he was given a value for the total loss of the model. He reported that the loss was low, meaning that this model does a good job mimicking the dataset but may occasionally copy large portions of text from reviews.

  7. My first project explored text generation through GPT-2. It seemed fun to pick Yo La Tengos as the generation target, as they are quite relevant to the class. YLTs have the added bonus of being numerous and fairly consistent, faring well with my intended outcome. I was able to source every YLT on the mail server up to the start of my project, updating manually two or so times. I then used a Python library to parse the .msg files and extract the body. However, I didn’t feel comfortable feeding the raw HTML body to GPT-2, so I used BeautifulSoup to parse the body into clean text. After doing some cleaning, I cut out the actual YLT part, leaving behind the Week Ahead and any club announcements. After converting the cleaned text into tokens, then PyTorch tensors, I stored them in a dataset. From my research, I found that HuggingFace was a good repository for pre-trained models. I fine-tuned their pretrained GPT-2 model using their included trainer. After some testing, I found the parameters that gave me the lowest loss (a measure of how well the model generates text compared to the eval set). I wanted the generated outputs to be accessible despite the long waits required for generating text. To do this, I devised a webserver that had generation workers pushing text to a queue. When a user requests a new YLT, the server pulls from the queue. If the queue is empty, it errors out.
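
    The worker/queue idea can be sketched without the webserver itself (function names are hypothetical, and queue.Queue stands in for whatever queue the real server used):

```python
import queue

# Pre-generated texts sit in a queue so a request never waits on the model.
pending = queue.Queue()

def worker_push(text):
    """A generation worker pushes each finished text onto the queue."""
    pending.put(text)

def handle_request():
    """The server pops a ready text, or errors out when none is ready."""
    try:
        return pending.get_nowait()
    except queue.Empty:
        raise RuntimeError("no generated YLT available yet")

worker_push("Welcome to another week...")
print(handle_request())  # Welcome to another week...
```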

    The filtered dataset was surprisingly clean. There were very few strange formatting issues (possibly because I filtered out Unicode characters). Because of my filtering, I lost the sentence newlines that lanis is so used to doing, which was an unfortunate occurrence. After a brief analysis, I noticed that lanis loved starting his YLTs with “I hope that…”. Thus, I predicted that most of my outputs would also start with that statement. Knowing how YLTs generally sound, I thought that the topics would revolve around “the week ahead” and various pop culture references.

    The generated output was surprisingly coherent. Here is an example:
    “Welcome to our first official week of classes. Last week was full of suspense; anticipation, joy and uncertainty. We spent Monday reading and contemplating various ways to avoid the daily grind of studying, practicing and grading (with a particular emphasis on how to properly process information). Throughout the week there was talk, lots of emails and lots of tutorials to prepare for exams. In the end, though, the bulk of our students had little choice but to finish their studies. In fact, most did not finish. At least one graduate told me that her final homework consisted of two days of tedious, futile, repeatable homework. I saw her frustration grow when she continued to repeat the same excuses, but she kept coming back to the same tired excuses….”
    The passage continues. Unlike a Markov chain, GPT-2 is already trained on English text, which makes it useful for small datasets (around 175 YLTs). Imagine teaching a baby English and how to write YLTs versus teaching someone who already knows English how to write YLTs. Regardless, I don’t think I could have made the model more realistic. It already knew context and kept references throughout the text, despite the nonsensical nature of it. Maybe if I had a better understanding of the model/neural networks, I could tailor my trainer to be more effective. I considered training GPT-2 on sentences rather than whole bodies of text if the outcome was of poor quality. This could increase the effectiveness of the bodies (and may also solve the issue of not being able to generate YLT endings).

    I used premade algorithms.
