The following visualizations explore the pokemon dataset, including the various attributes (e.g., speed, defense) of the pokemon. I hope you are able to learn something new and just have a good time exploring!
Getting familiar with your dataset is always a top priority.
Use the parallel coordinates plot below to get a better understanding of the creatures in this universe. To take advantage of the plot, make sure to play with it! For example, try the following:
Filter the Data: Click and drag vertically along one of the axes to show only the pokemon that meet that criterion. Click anywhere along the axis outside the filter to reset it.
Drag the Axes: If you want to see two columns closer to each other, just click on the title of one and drag it closer to the other.
Hover over the lines: Hover over a line to see that pokemon's name.
This dataset contains information on every pokemon (721 at present). I think you will find it much more useful than an empty pokedex!
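The axis filter described above can be sketched in plain JavaScript. This is only a minimal illustration, not the page's actual code: the rows, the field names, and the `filterByExtents` helper are all hypothetical, and I am assuming each brushed axis is stored as a `[min, max]` pair.

```javascript
// Hypothetical rows; names and values are placeholders, not the real dataset.
const pokemon = [
  { name: "A", speed: 90, defense: 40 },
  { name: "B", speed: 45, defense: 110 },
  { name: "C", speed: 100, defense: 95 },
];

// Keep only rows whose value lies inside every active brush extent.
// `extents` maps an axis name to [min, max]; unbrushed axes are simply absent.
function filterByExtents(rows, extents) {
  return rows.filter((row) =>
    Object.entries(extents).every(
      ([axis, [min, max]]) => row[axis] >= min && row[axis] <= max
    )
  );
}

// Brush the speed axis to 80..120; the defense axis is left unbrushed.
const fast = filterByExtents(pokemon, { speed: [80, 120] });
console.log(fast.map((p) => p.name)); // [ 'A', 'C' ]

// Resetting a brush is the same as removing its entry.
console.log(filterByExtents(pokemon, {}).length); // 3
```

Brushing a second axis just adds another entry to `extents`, so the filters combine with a logical AND.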
Another Perspective
Now that you have a better understanding of the dataset, let's take another look using a statistical technique called Principal Component Analysis (PCA).
PCA is a dimension reduction technique. If you look at the parallel coordinates plot above, you can see that we have six dimensions to view (e.g., speed and health points). PCA answers the question: How much of the information could I keep if I had to use fewer columns? On this dataset, just four columns explain almost 90% of the information (variance) contained in the six columns above.
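To make the "share of information explained" idea concrete, here is a small sketch of how a cumulative share is computed from the variance captured by each component. The eigenvalues below are made-up placeholders chosen only for illustration; they are not the values from the pokemon analysis, and the function names are hypothetical.

```javascript
// Hypothetical variances (eigenvalues) for six components, largest first.
// Placeholders for illustration only, NOT computed from the pokemon data.
const eigenvalues = [2.7, 1.1, 0.8, 0.7, 0.4, 0.3];

// Fraction of the total variance explained by the first 1, 2, ... components.
function cumulativeExplained(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let running = 0;
  return values.map((v) => {
    running += v;
    return running / total;
  });
}

// Smallest number of components whose cumulative share reaches `threshold`.
function componentsNeeded(values, threshold) {
  const index = cumulativeExplained(values).findIndex((c) => c >= threshold);
  return index === -1 ? values.length : index + 1;
}

console.log(cumulativeExplained(eigenvalues).map((c) => c.toFixed(2)));
console.log(componentsNeeded(eigenvalues, 0.85));
```

With these placeholder numbers, the first four components cover a bit under 90% of the total, which is the kind of trade-off described above.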
Each axis in the graphs below corresponds to a combination of columns that together capture one underlying trait or trade-off. For example, take a look at the second principal component: Speed vs Defense. As you go to one side of the axis, the pokemon listed there have much higher speed than they do defense, whereas going in the other direction will yield pokemon with much higher defense than speed. A pokemon with very high values in both categories will be closer to the middle. This area will also include pokemon which score very low in both categories. So whether weak or strong, balanced pokemon should stick around zero, whereas those specialized in a certain area will appear closer to the end of one of the axes.
Keep in mind that these graphs will really be useful in telling the difference between pokemon in general rather than finding an individual pokemon. The point here is to get an understanding of how certain groups of attributes move together.
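The "balanced pokemon sit near zero" behavior follows from how a component score is built: it is a weighted sum of standardized attributes, with opposite signs on the two sides of the trade-off. The sketch below assumes hypothetical weights of +0.5 and -0.5 and made-up standardized values; real PCA weights would come from the fitted model and would involve all six attributes.

```javascript
// Hypothetical weights for a "Speed vs Defense" component (assumed, not fitted).
const loadings = { speed: 0.5, defense: -0.5 };

// Score = sum over attributes of weight * standardized value.
function componentScore(zScores, weights) {
  return Object.keys(weights).reduce(
    (sum, attr) => sum + weights[attr] * zScores[attr],
    0
  );
}

// Made-up standardized values (standard deviations above or below the mean).
const fastFragile = { speed: 2, defense: -1 };
const strongBoth = { speed: 2, defense: 2 };
const weakBoth = { speed: -2, defense: -2 };

console.log(componentScore(fastFragile, loadings)); // 1.5 (far toward the speed side)
console.log(componentScore(strongBoth, loadings));  // 0 (balanced and strong)
console.log(componentScore(weakBoth, loadings));    // 0 (balanced and weak)
```

The two balanced examples cancel out to zero regardless of how strong they are, which is why strength has to be read from the Overall component instead.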
To better understand what you are viewing, each axis of the subplots below corresponds to one of the following (attributes in parentheses are less significant than those outside of parentheses):
Overall (This is an overall measure of the strength of a pokemon, with a slight advantage given to Attack, Special Attack, and Special Defense; the higher the score of a pokemon here, the better.)
Speed (and Special Attack) vs. Defense (and Special Defense)
Attack (and Health Points) vs. Special Defense (and Special Attack)
Health Points vs. Defense, Speed, and Attack
Once again, this plot is interactive: Click and drag across any of the subplots to filter the pokemon that you are looking at. The pokemon that are not within that area will be dimmed across all of the other subplots as well. To remove a filter, just click anywhere outside of it and all the pokemon will be displayed again.
Legend: Generation (1, 2, 3, 4, 5, 6)
Make your team!
Now that you have had the opportunity to explore the dataset, try building some teams for comparison. The final plot will take two teams of six pokemon and compute the average attributes for each.
Highlight the dots with your mouse to see the actual averages across the teams. This will also make that team's color more prominent in the plot.
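The per-team averaging behind this plot can be sketched as follows. The attribute keys, the `teamAverages` name, and the sample members are assumptions made for illustration; the page's own field names may differ.

```javascript
// Assumed attribute keys; the real dataset's column names may differ.
const ATTRIBUTES = ["hp", "attack", "defense", "spAttack", "spDefense", "speed"];

// Average each attribute over the members of one team.
function teamAverages(team, attributes = ATTRIBUTES) {
  if (team.length === 0) throw new Error("A team needs at least one member.");
  const averages = {};
  for (const attr of attributes) {
    const total = team.reduce((sum, member) => sum + member[attr], 0);
    averages[attr] = total / team.length;
  }
  return averages;
}

// Two hypothetical members for brevity (a team on this page has six).
const teamA = [
  { name: "A1", hp: 40, attack: 60, defense: 50, spAttack: 70, spDefense: 50, speed: 90 },
  { name: "A2", hp: 60, attack: 80, defense: 70, spAttack: 50, spDefense: 70, speed: 50 },
];

console.log(teamAverages(teamA));
```

Each of the six averages becomes one vertex of that team's hexagon, so two calls (one per team) give everything the radar plot needs.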
Writeup
Discussion
A critical part of how your visualizations are interpreted and graded will be based on your discussion. Your discussion should include the following sections:
Techniques (per visualization)
Interactivity (per visualization -or- overall)
Feedback (overall)
Challenges (overall)
Conclusion (per visualization -or- overall)
Each section in your discussion should be well-marked on your webpage and be approximately 2 to 5 paragraphs each. See below for details on what you should include in each section.
Techniques (per visualization) [2-5 Paragraphs]
Question
For each of your D3 visualizations, include the following information:
Describe how you encoded the data. Include specifics on which columns/rows are mapped to which pre-attentive attributes in your visualization, and describe the design choices you made.
Provide an evaluation of the lie factor, data density, and data-ink ratio of your visualization.
Discuss what you think the visualization excels at, such as showing an overview, identifying outliers, patterns, clusters, or trends, or providing context for other visualizations. In other words, describe why you chose to include this particular visualization versus other alternatives.
Include a brief summary of any additional visualizations you provided as well. If you implemented them using something other than D3, please state as much in your discussion.
Answer
Parallel Coordinates
The data is encoded with lines that pass through various points on poles, showing the values of many points at once in two dimensions, and also allowing users to easily see how different observations are similar or different.
I mainly used colors as preattentive attributes. I chose a different color for each generation (which I continue in the scatterplot matrix as well). This highlights the differences between generations so that users can get a better understanding of how the creators' creature design changed over the years. I also used different color intensities when brushing and filtering to bring the desired information into focus and make it easier to ignore information not included in the filter. To further assist users in seeing the results of their filter, I have made a tooltip that boldens and blackens a line and shows the pokemon's name on mouseover.
This plot has no distortion (a lie factor of about 1): every pole uses an ordinary linear scale, and nothing is encoded by area or any other quantity that is difficult to judge.
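For reference, the lie factor is usually defined as the size of the effect shown in the graphic divided by the size of the effect in the data, where "size of effect" is a relative change. The sketch below uses two made-up values and compares a length encoding with an area encoding; the function names and numbers are hypothetical, and it assumes drawn lengths are directly proportional to the values.

```javascript
// Relative change between two values: |second - first| / |first|.
function effectSize(first, second) {
  return Math.abs(second - first) / Math.abs(first);
}

// Lie factor: effect shown in the graphic divided by effect in the data.
function lieFactor(dataPair, graphicPair) {
  return effectSize(...graphicPair) / effectSize(...dataPair);
}

// Two hypothetical stat values.
const data = [50, 100];

// Length encoding: drawn length is proportional to the value (2 px per unit).
const lengths = data.map((v) => v * 2);

// Area encoding: the circle's radius is proportional to the value,
// so the drawn area grows with the square of the value.
const areas = data.map((v) => Math.PI * v * v);

console.log(lieFactor(data, lengths)); // 1
console.log(lieFactor(data, areas));   // approximately 3
```

A value of 1 means the graphic changes exactly as much as the data does, which is why position along a linear pole is a safe encoding and area is not.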
The data density of this plot is high. There is a lot of information in a small amount of space so brushing is very helpful to minimize what you are seeing.
The data-to-ink ratio of this plot is high; removing anything would lead to loss in information.
This visualization excels at letting users get into as much detail with the original dataset as they could possibly want.
I chose to include this particular visualization instead of others because of how easy it is to get information out of and its high level of interactivity.
Scatterplot Matrix
Each pokemon is encoded as a circle in each subplot of the scatterplot matrix.
As in the parallel coordinates plot, I colored the pokemon by generation (including a legend for reference). This lets the user continue with this plot without having to learn another coloring scheme. Also, when filtering the data, the unchosen creatures lose their color and fade into the background, which makes it easier to see where the selected pokemon appear in every subplot.
The plot does not inherently have any lie factor, but some distortion exists because of the principal component analysis. This is the hardest plot to interpret, so I made sure to explain things more thoroughly here in order to minimize the information the user loses due to (1) not seeing the actual scores and (2) lack of familiarity with PCA. My hope here really was to emphasize general principles and good creature design rather than to give particular values.
This plot has a high data density; there is a lot of information in a small amount of space. This makes brushing very helpful to minimize the amount of information you have to process at once.
The data-to-ink ratio here is medium since scatterplot matrices have the same information twice. However, because people are not as good at switching axes mentally, I still think it makes sense to have both so that you can get two looks and a better view of the same story.
This visualization excels at finding outliers and patterns within the dataset. And, since the circles are encoded by color, it really allows you to see how the creators' patterns changed (or didn't) throughout the games.
I chose to include this particular visualization instead of others because I wanted to be able to see how several factors influence each other at once. Other visualizations I considered would not have allowed such straightforward comparisons.
Radar Plot
The average across all statistics on both teams is encoded using hexagons, though the point of interest is really where on an axis one team's point falls versus the other's.
This plot uses a different color for each team as a preattentive attribute to make it easy to separate the teams selected. Also, the color intensity of the hexagons changes on mouseover to make each team's attributes easier to see.
There should not be any lie factor since the comparison being made is the point to which each team reaches on each pole and not the area itself.
The data density of the plot is rather low. The whole thing only encodes 12 different points across two groups. Compare that with the two other plots and this is easily the outlier.
The data-to-ink ratio is low. A lot of color is used just to fill area; an outline alone would have given a better ratio, but I think the overall user experience of having a filled-in hexagon is worth the sacrifice in this category.
This visualization excels at interactivity with users. I see this as being the most fun to play around with as you can use what you learned in the previous plots to compare new ideas.
I chose to include this particular visualization instead of others because of how simple it is to use in the comparison of teams. Also, pokemon has traditionally used radar plots, so people interested in this dataset will most likely have seen these before and understand them without any additional effort.
I used CrossFilter in order to filter data based on what the user chooses in the drop-down menu.
This plot does a great job of contrasting two different groups of pokemon, letting users see how their team's overall statistics can change by changing their teams.
Interactivity (per visualization -or- overall) [2-5 Paragraphs]
Question
Discuss the interactivity implemented in your project. Indicate the type of interactivity and describe how it enhances your visualization. For example, interactivity can help provide focus + context, help overcome overplotting issues, decrease or increase data density, and so on.
Answer
Parallel Coordinates
Tooltip.
The ultimate goal of my visualization is to help users form better teams as they play through the games. (See the radar plot for more on this.) After filtering the data, I thought it would be very beneficial to see exactly which pokemon meet the filtering criteria. So when a user hovers over a line in the parallel coordinates plot, the line indicated by the mouse becomes much easier to see in comparison with the rest, and the pokemon's name is displayed over it so that the user can learn which pokemon it is.
Filtering Poles.
There are over 700 pokemon, so allowing the user to decrease the amount of information being viewed at a time is really helpful for answering questions they may have.
Movable Poles.
It can be difficult to compare two metrics that are not close together, so the user can drag metrics into an order that makes more sense or allows easier comparison.
Scatterplot Matrix
Brushing.
The point of the scatterplot matrix is to allow users to explore the data in a different way than the first plot (parallel coordinates). In the first plot they are given access to the raw data itself, and allowed to filter and understand which pokemon excel in which areas. In this visualization, the data has gone through principal component analysis to give users a general sense of how pokemon's skills move together and an idea of what stat-combos they should focus on. (E.g., if their team has a lot of heavy hitters, what stats might they be lacking and therefore need to be on guard against?) Brushing in one subplot affects all of the other subplots in the scatterplot matrix.
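The linked brushing described above can be sketched as a single shared "dimmed" flag per pokemon that every subplot reads when it draws. This is an illustration under assumptions, not the page's implementation: the `pc1`..`pc3` field names, the sample scores, and the `applyBrush` helper are all hypothetical.

```javascript
// Hypothetical PCA scores per pokemon; names and values are placeholders.
const points = [
  { name: "A", pc1: 1.2, pc2: -0.4, pc3: 0.1 },
  { name: "B", pc1: -0.5, pc2: 0.9, pc3: -1.0 },
  { name: "C", pc1: 0.3, pc2: 1.5, pc3: 0.7 },
];

// A brush is a rectangle drawn in one subplot: the two components it spans
// and the selected [min, max] range on each. A null brush means "no filter".
function applyBrush(rows, brush) {
  if (!brush) return rows.map((row) => ({ ...row, dimmed: false }));
  const { x, y, xRange, yRange } = brush;
  return rows.map((row) => {
    const inside =
      row[x] >= xRange[0] && row[x] <= xRange[1] &&
      row[y] >= yRange[0] && row[y] <= yRange[1];
    return { ...row, dimmed: !inside };
  });
}

// Brush drawn in the pc1-vs-pc2 subplot.
const brushed = applyBrush(points, {
  x: "pc1", y: "pc2", xRange: [0, 2], yRange: [-1, 0],
});
console.log(brushed.map((p) => `${p.name}:${p.dimmed}`)); // [ 'A:false', 'B:true', 'C:true' ]
```

Because the flag lives on the pokemon rather than on a subplot, a point dimmed by a brush in one subplot is dimmed in all of them, and clearing the brush restores everything at once.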
Radar Plot
Filtering via Drop-Down Menu.
I think this plot will have the most reusability of all my visualizations because there are so many different combinations for users to investigate.
Rather than relying on the user's memory of so many names, I thought it would be easier for them to select from a menu. Also, you can use the keyboard to jump to a particular letter, which gives faster turnarounds in team comparisons.
Color Change.
If you want to focus on one team's statistics, you can highlight that hexagon and it will become more prominent as the other fades.
Show Values.
For more curious users, the attribute's average value will appear on mouseover.
Feedback (overall) [2-5 Paragraphs]
Question
Discuss the prototype you demonstrated in-class, the changes you made based on feedback, the feedback you found particularly helpful (and why), and feedback that you did not agree with (and why). If nothing falls under each category (for example, you did not make any changes based on feedback), please state as much so you are not docked points for missing information.
Answer
The prototype I demonstrated in class was the parallel coordinates plot. The feedback on it was really helpful in pointing out ways to make it easier to understand the information being displayed. In particular, the ideas that I implemented from those given to me were to change the opacity of the lines to make it easier to see what was behind them, and to make the lines thinner for a similar effect. In order to make sure that the pokemon of interest did not get lost with these effects I added hover actions (in particular displaying the pokemon's name and boldening the line) in order to make sure that the user could see which pokemon remained after the filters were applied.
Since I decided to color by generation, it was suggested that I put the generation axis at the far left, which I think was a great idea to make it easier for users to interpret.
Challenges (overall) [2-5 Paragraphs]
Question
Discuss the challenges you encountered during this project, how you addressed the challenge, or why you did not address the challenge. Use this discussion to (a) help illustrate for others how difficult small changes can be, and (b) to try and earn some credit for your work that did not make it into the final visualizations.
If you ran into 0 challenges, I'll assume you are a visualization/JavaScript/D3 expert and raise my expectations of your work accordingly!
Answer
I ran into many challenges while working through this project. Below are the ones that stand out the most from my experiences:
Separation of Data.
Initially I was hoping to have one set of data that all of the plots used, but instead I tailored one dataset to each visualization so that each would be easiest to work with. For the radar plot this makes sense, since I have allowed users to refresh the data and do not want to have to redraw all of the plots to update that one, but I think I could have been more clever and used one pass of the data to draw the first two if I had more time.
Parsing.
Some pokemon names contain non-standard characters (such as dashes and apostrophes), which made parsing a little funny as I was cleaning my dataset. While getting my dataset ready for the scatterplot matrix, I did not realize I had parsing errors, which led to dots being placed in seemingly random locations.
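One way to catch this kind of problem before plotting is to check that every field expected to be numeric really is. The sketch below is a hypothetical check, not something from the original project; the row contents, column names, and helper names are made up for illustration.

```javascript
// True only for values that parse to a finite number (rejects "", null, "abc").
function isNumeric(value) {
  if (value === null || value === undefined) return false;
  const text = String(value).trim();
  return text !== "" && Number.isFinite(Number(text));
}

// Rows in which at least one of the expected numeric columns is unusable.
function findBadRows(rows, numericColumns) {
  return rows.filter((row) => numericColumns.some((col) => !isNumeric(row[col])));
}

// Hypothetical parsed rows: the second was split badly around an apostrophe,
// and the third is missing a column entirely.
const parsedRows = [
  { name: "X-Y", attack: "55", speed: "60" },
  { name: "O", attack: "Z", speed: "" },
  { name: "Q", attack: "70" },
];

const bad = findBadRows(parsedRows, ["attack", "speed"]);
console.log(bad.map((row) => row.name)); // [ 'O', 'Q' ]
```

Running a check like this right after loading the file points at the offending rows directly, instead of leaving them to show up as misplaced dots.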
Scatterplot Matrix Legend.
I spent a lot of time trying to get the legend to work within the realm of the matrix before coming to the conclusion that I could just put it near the graph in a way that was easy for the user to digest.
CrossFilter.
It took about a day to be able to get enough out of this library to use it for my project. It works much better with .json than .csv data, so I created another version of my dataset in this format.
Understanding what was being returned ended up being the hardest part of working with this library. typeof was very useful in determining what I was working with and what I needed to search for in order to extract information. I had to go to many blogs for this since people assumed a certain level of JavaScript knowledge.
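One caveat when inspecting return values this way is that typeof reports "object" for arrays and for null, so a slightly richer check can be helpful. The `describe` helper below is a hypothetical convenience for illustration, not part of any library.

```javascript
// typeof alone cannot tell arrays, plain objects, and null apart:
console.log(typeof [1, 2]);   // object
console.log(typeof { a: 1 }); // object
console.log(typeof null);     // object

// A small helper that separates those cases before falling back to typeof.
function describe(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return `array(${value.length})`;
  return typeof value;
}

console.log(describe([1, 2]));   // array(2)
console.log(describe({ a: 1 })); // object
console.log(describe("name"));   // string
console.log(describe(() => 1));  // function
```

Knowing whether a call returned an array of records, a single object, or a function narrows down what to search for next.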
JavaScript.
When looking at JavaScript code, it is not too difficult to understand what is happening, but I do not know enough to be able to generate it from scratch, so I spent a lot of time examining implementations that were similar to what I was working with in order to find useful code that I could alter for my purposes. Even after doing so, it was difficult to get exactly what I wanted, so there were times I had to resort to other methods (such as drawing my legend outside of the plot margins on the scatterplot matrix).
Filtering.
When I tried to display the generation as a category and not a real number, the filtering stopped working correctly: All of the values were only filterable from a very small portion of the pole towards the top. I looked into the issue but was not able to figure out how to make it work properly, so I just left them as real numbers. This adds a few unnecessary labels to the plot but I do not feel it is too distracting and the correct functionality really makes it worthwhile.
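I never found the cause of the problem above, so the following is only a sketch of one possible workaround rather than a fix I tested in the project. The idea is to give each category its own pixel position along the pole and then select the categories whose positions fall inside the brushed pixel range. The function names and the 300-pixel pole height are assumptions.

```javascript
// Evenly spaced pixel positions for each category along a pole of the given height.
function pointPositions(categories, height) {
  const step = categories.length > 1 ? height / (categories.length - 1) : 0;
  return new Map(categories.map((category, i) => [category, i * step]));
}

// Categories whose position falls inside the brushed pixel extent [y0, y1].
function brushedCategories(positions, [y0, y1]) {
  const low = Math.min(y0, y1);
  const high = Math.max(y0, y1);
  return [...positions]
    .filter(([, y]) => y >= low && y <= high)
    .map(([category]) => category);
}

const generations = ["1", "2", "3", "4", "5", "6"];
const positions = pointPositions(generations, 300); // 0, 60, 120, 180, 240, 300

console.log(brushedCategories(positions, [50, 190])); // [ '2', '3', '4' ]
```

Comparing in pixel space avoids having to convert a brushed position back into a category, which is the step that tends to be awkward with categorical axes.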
Color.
Initially I was hoping to give users the ability to color by a pokemon's types as well as generation but when I actually did there were too many colors to make sense of, and it ended up being overwhelming and unhelpful.
Overplotting.
The parallel coordinates plot has too much information in it. I got some great feedback to try thinner, more transparent lines, which make it easier to see.
The problem is that each successive generation is laid over the previous one, making the earliest generations difficult to see without filtering (by brushing). Filtering is really key to making this plot helpful.
Conclusion (per visualization -or- overall) [2-5 Paragraphs]
Question
What did you learn about the dataset from your visualizations? This is a difficult but critical part of your discussion. We care most about accuracy and informativeness in the field of information visualization, and you must convince me that your visualizations were informative. Use your visualizations to make conclusions about the data, explain those conclusions, and explain how your visualization supports those conclusions.
Answer
Parallel Coordinates.
Just looking at different generations it was easy to see when new types were introduced, which I found very interesting from a world-building perspective.
It was interesting seeing Special Attack and Special Defense because the distinction came in after the first generation, which is where I logged the most hours. Having them separate adds another factor to consider when choosing your team, and it was nice having such a simple way to filter and compare between all pokemon.
Scatterplot Matrix.
Seeing the dimensions grouped together really made me realize how (from a stats standpoint) evenly the different creatures were created. From this point of view it looks like the creators did their best to make sure that pokemon were not too disparate in their abilities.
Running PCA was not enough to understand everything, however. Visualizing really helped drive home two facts:
Powerful pokemon tend to be evenly balanced across "Attack vs Special Defense" and "Speed vs Defense", but vary from a "Health Points vs Defense" point of view.
Overall, across generations, the creators made balanced creatures. It is great to see that some time obviously went into the creation of pokemon.
Radar Plot.
I created and compared two teams, one obviously superior to the other, to see what would happen, and the difference was definitely reflected well in the statistics.
I created two more even teams and found that it is easy to see how one can be favored in certain areas. For example, I created my usual team (which I always made while playing) and compared it to the rival's team and found that mine favored special ability and speed whereas the rival's favored defense, attack, and health points.
After these scenarios I think this visualization is best at getting a basic understanding of where one team will be favored compared to the other, and whether or not you would need to change your team in order to get an advantage there.
From playing the game you get a naive perspective on strong versus weak pokemon, but seeing this in action really highlighted the disparity in ability between strong and weak teams.