For my machine learning class I made a program that uses the history of the PvP world map (Cyrodiil) to predict what the map will look like in an hour. It eventually reached an average accuracy of 95%. I used the Wabbajack NA PvP Campaign, as it was the most active at the time and would have many changes every hour. I just graduated with a BS in Computer Science and thought I would share this project that used ESO. (If anything, it was a good excuse to play more ESO during finals week.)
Overview: Each keep, outpost, and temple was given a number as an index/identification. Every hour, each of these buildings was assigned a value representing which faction controlled it: 1 for AD, 2 for DC, and 3 for EP. The array of values for all buildings makes up a map, and the history of a campaign is made up of the changes between consecutive maps. Each change is represented by an alphanumeric key: if a faction's control of a building held through the hour, the value stays the same (1->1=1), but if control changed, a letter represents which faction lost the building and which faction gained it (1->2=A, 1->3=B, 2->1=C, ...). A change therefore looks like this: 2222333322132111 -> 2223331312232111 = 222D33E3C2A32111. This key serves three purposes: 1) it is saved as a historical change to reference later when determining a prediction; 2) the direction of offensive pushes or losses can be derived from it; and 3) it can be used to calculate a "momentum," the rate at which each faction is gaining or losing buildings.
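To make the encoding concrete, here is a minimal sketch in Python of how two consecutive maps could be turned into a change key. The function and variable names are just for illustration and aren't from the actual program; only the 1/2/3 faction values and the A-F change codes come from the scheme described above.

```python
# Change codes for a building whose owner switched between two snapshots.
# Unchanged buildings keep their faction number (1=AD, 2=DC, 3=EP).
CHANGE_CODES = {
    ("1", "2"): "A", ("1", "3"): "B",
    ("2", "1"): "C", ("2", "3"): "D",
    ("3", "1"): "E", ("3", "2"): "F",
}

def change_key(old_map: str, new_map: str) -> str:
    """Build the alphanumeric change key between two hourly map snapshots."""
    key = []
    for old, new in zip(old_map, new_map):
        if old == new:
            key.append(new)                       # ownership held: keep the faction number
        else:
            key.append(CHANGE_CODES[(old, new)])  # ownership flipped: letter encodes who lost/gained
    return "".join(key)

print(change_key("2222333322132111", "2223331312232111"))  # -> 222D33E3C2A32111
```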
Determining a prediction: The momentum of each faction is calculated from the last hour and compared against the entire history to find the closest match. If several points in history have the same momentum (highly likely as the history grows), their keys are compared to the key from the last hour of changes. The historical key that matches the current momentum and is closest to the latest key is chosen as the closest historical timestamp. From that timestamp, the map recorded in the following hour is pulled as the most likely outcome for the next hour.
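For anyone curious how that lookup might look in code, here is a rough sketch. The momentum definition (net buildings gained or lost per faction over the hour) is my simplification, and the names and history layout are assumptions for illustration, not the program's actual code.

```python
def momentum(key: str) -> dict:
    """Net buildings gained (+) or lost (-) per faction over one change key.
    Assumes the 1/2/3 and A-F codes from the encoding above."""
    net = {"1": 0, "2": 0, "3": 0}
    transfers = {"A": ("1", "2"), "B": ("1", "3"), "C": ("2", "1"),
                 "D": ("2", "3"), "E": ("3", "1"), "F": ("3", "2")}
    for c in key:
        if c in transfers:
            loser, winner = transfers[c]
            net[loser] -= 1
            net[winner] += 1
    return net

def predict_next(history_keys: list, history_maps: list, latest_key: str) -> str:
    """Find the historical hour whose momentum (and, as a tie-break, change key)
    best matches the latest hour, and return the map recorded the hour after it.
    Assumes history_keys[i] is the change that produced history_maps[i]."""
    latest_m = momentum(latest_key)
    # candidates: hours whose momentum matches the current momentum
    candidates = [i for i in range(len(history_keys) - 1)
                  if momentum(history_keys[i]) == latest_m]
    if not candidates:
        candidates = list(range(len(history_keys) - 1))
    # tie-break: the change key with the fewest per-building differences from the latest key
    best = min(candidates,
               key=lambda i: sum(a != b for a, b in zip(history_keys[i], latest_key)))
    return history_maps[best + 1]   # the map that followed the best-matching hour
```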
Issues and Errors: When I started recording data, ESO had just come out and was plagued by bots, and ZeniMax was cracking down on automation. To avoid the risk of my account being banned, I did not set up an automation program to keep logging into my game and recording the PvP map. Instead, I played ESO or tried to log on every hour to collect data manually. As a result, the time between data collections wasn't always consistent and had large gaps for nights I actually went to sleep or weekends when I physically couldn't log on. This produced some odd keys, where changes that actually happened over several days were treated by the program as if they happened in the course of an hour, causing some inaccuracies.
Results: When new map data is added, the predicted map is compared to it to produce an error value: the number of buildings that were incorrectly predicted. Every 5 hours, the last five error values are summed and averaged to measure the average accuracy. This average is used to determine whether the prediction software is learning and becoming more accurate; if it isn't, special error-prevention methods recalculate how the data is weighed and put the predictor back on track. In the end, after a long series of consistent data and a lot of history to reference, the prediction software finished with a final series of predictions averaging only 1.4 buildings incorrect every 5 hours, with 4 out of 5 predictions having no errors at all (100% accuracy).
Future of the Project: This project was very basic as I was very limited on time. I would have liked to add the connections between buildings, scrolls, camps around keeps, etc. I also hope to one day make it accessible online as an interactive website across multiple campaigns. But I will first need to be able to collect the data consistently without playing through my own game client, which is probably the biggest barrier holding back continued use of the project.
Thank you for reading and I hope you all enjoyed my academic excuse to play more ESO. I would love to know if you have any thoughts, questions, or suggestions about my project. For anyone interested in a bit more technical detail, the final report written up by my partner MaslabDroid is below.
Abstract
ESO Predictor is a learning program that attempts to determine which keeps in the Player-Versus-Player (PVP) area of Elder Scrolls Online (ESO) will be taken by which faction next. The program uses Naïve Bayesian Classification, looking over a history of map ownership and changes in order to make that prediction.
Introduction
In the Massively Multiplayer Online (MMO) game of ESO, there exists a PVP area consisting of 18 keeps that can be held by any one of three factions at a time. It is critical for defenders to know which keep will be assaulted next and which one needs a little less defending. ESO Predictor aims to help them in that regard by predicting, using an overall history of keep ownership changes, which keeps are most likely to be attacked next.
Methods
The following are the basic methods ESO Predictor uses to function.
Naïve Bayesian Classification
Naïve Bayesian Classification is the primary tool that ESO Predictor uses in order to predict which keeps will be taken by which faction next. This form of classification looks over the entire history of keep ownership changes. For the purposes of this program, the history is a series of vectors containing alphanumeric codes that indicate the keep and how it changed. For instance, a 1 would indicate that the keep stayed in the possession of faction 1, a B would mean the keep was taken from faction 1 by faction 3, and an F means the keep was taken from faction 3 by faction 2.
1->1=1, 2->2=2, 3->3=3, 1->2=A, 1->3=B, 2->1=C, 2->3=D, 3->1=E, 3->2=F
When the latest map is updated, the Bayesian algorithm looks over the entire history of changes to find the vector that is most similar to the new one. Once it finds a similar vector, it constructs a prediction vector based off the vector that comes after the similar vector. In order to find the most similar vector, however, the distances between the latest actual vector and the other vectors in the history must be calculated. The first step in narrowing down which ones are suitable is the Euclidean Distance.
Euclidean Distance
The Euclidean Distance is fairly simple in this regard: it is an evaluation of the absolute difference between each element of the actual vector and the corresponding element of one of the historical vectors. Once all element-wise distances have been calculated, the historical vector with the lowest total Euclidean Distance is considered the closest to the actual vector.
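A minimal sketch of that distance as the report describes it (a sum of element-wise absolute differences), assuming each code character is first mapped to an integer; the report does not specify a mapping, so the ordering below is purely illustrative:

```python
# Illustrative numeric values for the change codes; the actual mapping is an assumption.
CODE_VALUES = {c: i for i, c in enumerate("123ABCDEF")}

def euclidean_distance(vec_a: str, vec_b: str) -> int:
    """Sum of element-wise absolute differences between two change vectors."""
    return sum(abs(CODE_VALUES[a] - CODE_VALUES[b]) for a, b in zip(vec_a, vec_b))
```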
Hamming Distance
The Euclidean Distance, however, is not enough. It is possible for multiple vectors to have the same Euclidean Distance while being significantly dissimilar to each other, so ESO Predictor uses the Hamming Distance to further narrow down the possibilities. Using the results from the previous distance calculations, the Hamming Distance algorithm compares the set of ‘close’ vectors. Instead of the total difference in distance, the Hamming Distance is simply the number of elements per vector that differ from the actual vector. Again, the vector with the lowest score at the end wins and is used as the comparison to the newest vector.
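The Hamming distance itself is a one-liner over the same string vectors (again an illustrative sketch, not the program's actual code):

```python
def hamming_distance(vec_a: str, vec_b: str) -> int:
    """Number of positions at which two change vectors differ."""
    return sum(a != b for a, b in zip(vec_a, vec_b))
```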
After these calculations are done, the historical vector that immediately follows the closest match is used to predict what the next vector will be.
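Putting the two distances together, the selection step might look something like the following sketch, which reuses the hypothetical euclidean_distance and hamming_distance functions above and assumes the history is an ordered list of change vectors:

```python
def closest_next_vector(history: list, latest: str) -> str:
    """Pick the historical vector closest to `latest` (Euclidean first,
    Hamming as a tie-break) and return the vector that followed it."""
    # exclude the last entry so a "following" vector always exists
    scores = [euclidean_distance(v, latest) for v in history[:-1]]
    best_euclidean = min(scores)
    candidates = [i for i, s in enumerate(scores) if s == best_euclidean]
    best = min(candidates, key=lambda i: hamming_distance(history[i], latest))
    return history[best + 1]
```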
Benchmarks
Error Rate
To track how accurately the program was predicting the next possible map, the predicted map was saved and compared to the next inputted map. Using a Hamming distance method, the number of keep ownerships that did not match became the error of the last prediction. Since our data had consistency problems due to manual collection, the sum of the last 5 errors was used to output an average error rate, which served as a benchmark to determine whether the program's predictions were becoming more accurate. The error benchmarks also revealed how inconsistency in the collected data affected the accuracy of the program. When data was collected at a consistent rate (roughly every hour), the average error rate would eventually decrease, signifying that the program was predicting more accurately in relation to the correct momentums of the 3 factions. However, when long breaks between data occurred, the error rate would spike because the changes in momentum were not weighted properly. This problem eventually corrects itself as the momentums shift to more accurately reflect the direction and rate at which faction control spreads.
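As a small sketch of that benchmark (the function names are illustrative), the Hamming-style error simply counts mismatched keeps, and the benchmark averages the last five errors:

```python
def prediction_error(predicted_map: str, actual_map: str) -> int:
    """Hamming-style error: number of keeps whose predicted owner is wrong."""
    return sum(p != a for p, a in zip(predicted_map, actual_map))

def benchmark_error_rate(errors: list, window: int = 5) -> float:
    """Average error over the last `window` predictions, used as the benchmark."""
    recent = errors[-window:]
    return sum(recent) / len(recent)
```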
Time
The more data that is retrieved, the faster the algorithm performs overall: starting from a worst case of O(n^3), it gradually approaches an average of O(n log n), where n is the number of timestamps in the history. This is because, as more historical vectors are added, it becomes less likely for multiple vectors to share the same minimum Euclidean distance. Eventually only one minimum value is found, and running the Hamming step has no effect on the time needed to narrow down a prediction, because duplicate vectors that reach a Euclidean distance of 0 would have the same Hamming distance as well. That leaves just the momentum checks to find the best map out of the filtered candidate vectors. Performance was measured by timing, in microseconds, how long it took to calculate a prediction each time a historical vector was added.
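That measurement could be reproduced with a simple loop over growing prefixes of the history; this sketch reuses the hypothetical closest_next_vector function above and Python's standard time.perf_counter:

```python
import time

def time_predictions(history: list, latest: str) -> list:
    """Time one prediction (in microseconds) for each growing prefix of the history."""
    timings = []
    for n in range(2, len(history) + 1):
        start = time.perf_counter()
        closest_next_vector(history[:n], latest)
        elapsed_us = (time.perf_counter() - start) * 1e6
        timings.append((n, elapsed_us))
    return timings
```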
Results
As can be seen in figure 3, our program successfully converges to a relatively low error rate, meaning that it becomes increasingly successful at predicting which keeps will be taken by which faction next. This, of course, assumes that data collection is consistent and constant. However, even when data was not collected consistently and the error rate went up, the continuous addition of new, consistent data and the use of momentums as weights eventually brought the error rate back down and raised the average accuracy.
Discussion
While the error does indeed converge and decrease over time, there is one massive barrier standing in the way of this being an effective way to predict faction movements: data collection must be done by hand. This means getting a screenshot of the PVP map as it stands every hour or so and translating that into a vector of current faction holdings. Elder Scrolls Online has recently been plagued by automated scripts and bots designed to maximize the amount of resources a player gains by automating boss killing and treasure collecting. As such, Zenimax has begun to crack down on the use of any automated scripts, including those that simply collect data. Therefore, all data collection for this program must be done by hand. This can leave massive gaps between collection periods, in which significant new data may have been generated. As the program is more efficient and effective the more data it has, a significant amount of time must be spent simply collecting and translating the data. Should this crackdown be lifted, or permission from Zenimax be gained to collect data autonomously, this program would be much more viable in the long run. If the data could be collected autonomously, the program could also be expanded to include connections between keeps to account for such things as travel difficulty, as well as strategic resource locations. These could be used in conjunction with the current keep ownership vectors to more easily narrow down troop movements and achieve higher prediction accuracies. However, as long as autonomous data collection is not allowed, these options are not viable.