2018 Winter Data Analytics Competition – MCM

I took a brief break from working on side projects to participate in the MCM (Mathematical Contest in Modeling) competition. I decided to compete in this competition mainly after enjoying and seeing success in the previous semester’s GDMS data analytics competition (post linked here) in which my team received an honorable mention in the beginner’s division. Again, my team consisted of friends Kali Liang (CMDA major) and Eric Fu (BIT major). Looking back on both of these competitions, I would definitely say the diversity in majors really helped in our ability to come together and perform, mainly due to our differences in perspective when it came to looking at the problem as a whole. Similar to the previous competition, we were given a .csv file with about 600 categories involving energy consumption and production rates per year in Arizona, California, New Mexico, and Texas, and were asked to come up with energy profiles for each state and to predict energy consumption and production in 2025 and 2050, all in the context of using cleaner energy sources. For reference, we tackled problem C (all problems linked here).

We used a mixture of VBA (Excel), Python, and R to analyze the given data. My main job was to determine which subsets of data were interesting and useful and to plot them using Python/matplotlib.

Moving on to the actual data analysis, we started by creating energy profiles for each of the four given states. We split our profiles into 3 graphs we deemed relevant, including overall energy consumption, overall energy production, and energy usage by sector in each state. The main hurdle for this section was deciding how to cut down the data to where each graph wouldn’t be too cluttered but the data would still be accurate enough to use. For instance, for energy consumption alone, each state’s initial graph had over 20 sub-plots, making the graph hard to read at first glance. This was due to the inclusion of ultimately insignificant values (for instance wood and waste energy consumption). As a result, we cut these insignificant sub-sets, facilitating visual analysis on each graph. The downside to this was not having a fully accurate representation of total consumption and production in each state, but we decided the sacrifice was worth it for the visual clarity provided. az_profile_cons.png
Energy consumption graph for Arizona. You can see that as the Y values approach 0, the higher the concentration of plot points (and this is after cutting more than 10 sub-sets).

Next, we moved onto attempting to define a more accurate illustration of the data set in regards to the context of the problem, renewable energy usage. We decided the best way to do this was to put everything in terms of the estimated amount of pollution each state was producing per year. We determined this value for each state per year by calculating the summation of each states energy consumption value multiplied by the transfer rate between that specific energy type and CO2 emissions. Essentially, we arranged Jet Fuel, Natural Gas, Coal, Petroleum, Motor Gasoline, and LPG consumptions per year in a single-column matrix and found its dot product with a constant single-column matrix that consisted of the conversion coefficients for that specific energy type and CO2 emissions, labeled C in the graphic below. We determined the conversion coefficients by pulling data from the EIA (here).

Screen Shot 2018-02-19 at 2.08.00 PM.png

Screen Shot 2018-02-19 at 2.09.34 PM.pngExcerpt from our final paper detailing the values of the C conversion constant and giving Arizona’s numbers for 2009.

Using the information above (Note: the structure of X_AZ follows that of C (in order of energy type from top to bottom, given in billion BTU. A conversion factor of 1000 Mill BTU / 1 Bill BTU was added) we are able to calculate an estimation of the CO2 emissions in Arizona in 2009 (X_AZ_2009 * C = 93307874.83 metric tons of CO2. Given the population of Arizona in 2009, we can then determine an estimation of CO2 emissions per capita of Arizona in 2009 (abt. 14 metric tons of C02 per capita). We then generalized this process in order to create a graph of CO2 emissions in each state per year.

total_c02_capita.png
Graph detailing CO2 emissions per capita per year in each of the states. We used this information to generally rank each state in terms of pollution production, in order to circle back and create a discussion surrounding using cleaner energy sources.

Next came predicting energy consumption/production in 2025 in the context of cleaner energy usage. For this, as we began to run out of time (we only had four days for this competition) we resorted to using a basic linear regression to predict the values for 2025 and 2050. The main notable conclusion from this experiment was seeing how high energy consumption values were predicted to become, especially in regards to the very rapidly increasing CO2 emissions per capita for each state. As an example, total energy consumption in Arizona, using the linear regression model, was projected to increase by over 800,000 BTU. This, paired with the estimated increase in CO2 emissions per capita of abt. 5 metric tons per capita shows just how quickly pollution is projected to get out of control. Of the four states, Texas had the worst CO2 emission rates per capita estimated by 2050, seeing an increase of 37 metric tons per capita from 2009 (a projected value of abt 76 metric tons per capita). Again, we used these values to stress the importance of switching to more renewable energy sources in the coming years, before damage becomes irreversible.

Using all of the above data, we discussed in our conclusion the importance of moving away from commonly-used, bad for the environment energy sources. With the exponential increase in population in all of these states and really throughout the world, energy consumption and production has understandably seen a drastic increase in recent years. However, with the threat of global warming, each state needs to do whatever it can in order to slow or even reverse the effects brought on by these increased consumption and production rates.

I’m incredibly thankful to have competed in the GDMS competition prior to this competition, because without prior experience, we definitely would have had no idea where to start and probably would not have finished on time (of which we only barely did given the work detailed above, after making a numerous amount of assumptions ultimately weakening our argument). All in all, this competition was equally taxing on our mental health as it was fun to compete in. We look forward to participating in future data analysis competitions in the future. Unfortunately, we don’t hear back about the results until some time in April. If you want to see our final paper (written in LaTeX) in detail, you can find it here.

Advertisements

2018 Winter AI Project – Connect 5 Minimax

After finishing my previous project, I still found myself with plenty of time in-between classes (thankfully I’m only taking 4 courses this semester as opposed to my usual 6, so hopefully this trend continues throughout the semester). Also, as I recently took up a UTA position for CS2505 at VT (Intro to Comp Org), I found myself with plenty of down time on-shift as the early assignments are easy and most students don’t need help as of yet. Thankfully, some friends come and spend time with me to make the time go by faster. One of these friends typically brings a Go board along with her and loves to play connect 5 with us (think tic-tac-toe, but on a Go board, and you have to connect 5 instead of 3). After countless games, I thought back to my minimax algorithm for tic-tac-toe (here), and wondered if it applied smoothly to connect 5. And so, I decided to start working on this project.

First was to create the game. I was drawn to Java because I’m most comfortable working with object-heavy projects in Java (as opposed to something like Python). Coding the basic game was simple. The board itself is basically an 18×18 tic-tac-toe board. As such, I just used a 2D integer array size 18×18 and used constants for black, white, and empty spaces. I used swing for rendering, and KeyListeners for user input. Arrow keys navigate and enter places a piece. Upon placing a piece, control is handed over to the AI to decide the move they are going to take.

Screen Shot 2018-02-05 at 2.24.53 PM.png
Empty gameboard. The cursor is indicated with the red square (moves with arrow key input). As you’ll see later, the blue cursor (not shown here) indicates the previous move.

Next came calculating the heuristics for scoring any given board. I used this post by Ofek Gila as a foundation and built on that. Essentially, the board is traversed in every direction (horizontal, vertical, topleft->bottomright diagonal, topright->bottomleft diagonal) and consecutive pieces are counted and scored, with the sum total being the overall score of the current board state. The scores for consecutive pieces are based on the perceived value of that shape (i.e. a 5 in a row is worth much more than a single piece). I wanted this version to play very defensively (an annoying strategy for veterans of the game), so there are noticeable score differences based on what the color of the consecutive pieces is. Afterward, I began writing the minimax algorithm. The basic jist of minimax is to look a set number of moves into the future, score the current board state, and play off of the assumption that your opponent is going to make the best move for them at their given board state (hence the name minimax — you’re trying to maximize your gains and the opponent is trying to minimize them). I settled on looking 2 moves into the future, as it fit the balance of running quickly and decent decision making. If you want to learn more about minimax, feel free to look at the code for this project (link at the bottom).

Screen Shot 2018-02-05 at 2.23.14 PM.png
Using minimax, the AI (white) was able to defeat the user (black) here, while also stopping a win for black (if the white piece at the upper-right was not placed, black would have won)

Using this method, I ran into a couple of speed bumps. The major issue was performance. Especially at the beginning of the game, even after the user’s move, there are still 323 possible moves (18^2=324, -1 for the placed piece). That in addition to looking n moves into the future, meant that there are about 323^n checks that the algorithm needs to make (the number is actually slightly less than 323^n, as it doesn’t factor in placing new pieces and repeating the process, but it is essentially that value for small values of n, so we will ignore it for this basic analysis). To solve this, I cropped the board down to only ‘relevant’ spaces (I defined relevant here as any space around all of the current pieces in play with a buffer of 2 spaces on every side). Using this method, I was able to reduce the number of checks for 2 moves from 323^2=104,329 to 25^2=625 (placing one piece and creating a buffer of size 2 on every side creates a 5×5 grid, resulting in the 25 seen there (5^2 = 25)). Other than that, there is a bug in which the AI would let the player win if it won later down the decision-tree (in other words, it detected a win but didn’t take into account the opponent winning sooner). I expect this to be an error with scoring, but as I want to continue working on other projects, I will leave this bug in for now.

Screen Shot 2018-02-05 at 2.21.13 PM.png
You can see that the AI (white) detected a win, so it ignored black’s 3 in a row, ensuring a win for black.

Testing this out on some of my friends, I found that it played fairly decently. However, due to some minor bugs in the score evaluation (detailed at the end of the previous paragraph), veterans could consistently beat the bot (albeit after hard, long-fought games). This project was a fun one that I could share with friends, so I definitely enjoyed working on it and seeing it get better along the way. I encourage you to download the project and try playing against it yourself. You can find the github for this project here.