Pokemon Analysis Project in ML and Data Science using Python

Description of DataSet

We selected a dataset of Pokémon attributes with 721 rows and 23 columns. The isLegendary column serves as the target column in each of our machine learning models; it is a binary categorical column.
Below is the description of dataset columns:

  1. Type_1: Primary type of the Pokémon. It is related to the Pokémon's nature, its lifestyle, and the moves it is able to learn for fighting. This categorical variable can take 18 different values: Bug, Dark, Dragon, Electric, Fairy, Fighting, Fire, Flying, Ghost, Grass, Ground, Ice, Normal, Poison, Psychic, Rock, Steel, and Water.
  2. Type_2: Pokémon can have two types, but not all of them do. The possible values this second type can take are the same as the variable Type_1.
  3. Total: The sum of all the base battle stats of a Pokémon. It should be a good indicator of the overall strength of a Pokémon. It is the sum of the next six variables, each of which represents a base battle stat. All the battle stats are integer-valued, but they can take so many different values that they are treated as continuous variables.
  4. HP: Base health points of the Pokémon. The bigger it is, the longer the Pokémon will be able to stay in a fight before they faint and leave the combat.
  5. Attack: Base attack of the Pokémon. The bigger it is, the more damage its physical attacks will deal to the enemy Pokémon.
  6. Defense: Base defense of the Pokémon. The bigger it is, the less damage it will receive when being hit by a physical attack.
  7. Sp_Atk: Base special attack of the Pokémon. The bigger it is, the more damage its special attacks will deal to the enemy Pokémon.
  8. Sp_Def: Base special defense of the Pokémon. The bigger it is, the less damage it will receive when being hit by a special attack.
  9. Speed: Base speed of the Pokémon. The bigger it is, the more times the Pokémon will be able to attack the enemy.
  10. Generation: The generation in which the Pokémon was released. It is an integer between 1 and 6, so it is a discrete numerical variable. It could let us analyze the development or the growth of the game through the years.
  11. isLegendary: Boolean indicating whether the Pokémon is legendary or not. Legendary Pokémon tend to be stronger, have unique abilities, be really hard to find, and be even harder to catch.
  12. Color: Color of the Pokémon according to the Pokédex. The Pokédex distinguishes between ten colors: Black, Blue, Brown, Green, Grey, Pink, Purple, Red, White, and Yellow.
  13. hasGender: Boolean indicating whether the Pokémon can be classified as male or female.
  14. Pr_Male: If the Pokémon has a gender, the probability of its being male. The probability of being female is, of course, 1 minus this value.
  15. Egg_Group_1: Categorical value indicating the egg group of the Pokémon. It is related to the race of the Pokémon.
  16. Egg_Group_2: Similarly to the case of the Pokémon types, Pokémon can belong to two egg groups.
  17. hasMegaEvolution: Boolean indicating whether a Pokémon can mega-evolve or not. Mega-evolving is the property that some Pokémon have and allows them to change their appearance, types, and stats during combat into a much stronger form.
  18. Height_m: Height of the Pokémon according to the Pokédex, measured in meters. It is a numerical continuous variable.
  19. Weight_kg: Weight of the Pokémon according to the Pokédex, measured in kilograms. It is also a numerical continuous variable.
  20. Catch_Rate: Numerical variable indicating how easy it is to catch a Pokémon when trying to capture it to make it part of your team. It is bounded between 3 and 255.
  21. Body_Style: Body style of the Pokémon according to the Pokédex. 14 categories of body style are specified: bipedal_tailed, bipedal_tailless, four_wings, head_arms, head_base, head_legs, head_only, insectoid, multiple_bodies, quadruped, serpentine_body, several_limbs, two_wings, and with_fins.
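The description of Total as the sum of the six base stats can be checked directly. The following is a minimal sketch, assuming the column names listed above; the two rows are a small illustrative sample typed in by hand rather than the loaded dataset.

```python
import pandas as pd

# The six base battle stats that Total is described as summing.
STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed"]

# Small illustrative frame; in the project this would be the loaded dataset.
df = pd.DataFrame({
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
    "Sp_Atk": [65, 60],
    "Sp_Def": [65, 50],
    "Speed": [45, 65],
    "Total": [318, 309],
})

# Recompute the total row by row and compare it with the stored column.
recomputed_total = df[STAT_COLUMNS].sum(axis=1)
mismatches = df[recomputed_total != df["Total"]]
print(f"Rows where Total differs from the stat sum: {len(mismatches)}")
```

If the count is not zero on the real data, the affected rows are worth inspecting before using Total as a feature.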

Code for Pokemon Analysis Project in ML and Data Science using Python

EDA

importing required libraries

In the above cell, we import the libraries: seaborn, matplotlib, numpy, and pandas.
We use the pd.read_excel() function to read the Excel dataset.

datasheet

We use the shape attribute of the dataframe to print the number of rows and columns in the dataset.

df.shape

Next, we use the dtypes attribute to print the data type of each column. The screenshot is attached below.

df.dtypes

Afterward, we used the isnull() function followed by sum() to find the number of null values in each column.

df.isnull

We used the describe() function to print the basic summary statistics of all the numeric columns.

df.describe
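The inspection steps in this section can be sketched together as follows. This is only an illustration on a tiny hand-made dataframe (the values are stand-ins, not the real file), using the column names from the dataset description.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the Pokémon dataframe; values are illustrative.
df = pd.DataFrame({
    "Type_1": ["Grass", "Fire", "Water", "Psychic"],
    "Type_2": ["Poison", None, None, None],
    "HP": [45, 39, 44, 106],
    "Pr_Male": [0.875, 0.875, 0.875, np.nan],
    "isLegendary": [False, False, False, True],
})

n_rows, n_cols = df.shape          # shape is an attribute, not a method
print(n_rows, n_cols)

print(df.dtypes)                   # dtypes is also an attribute: one dtype per column

null_counts = df.isnull().sum()    # number of missing values in each column
print(null_counts)

summary = df.describe()            # summarises the numeric columns by default
print(summary)
```

Columns such as Type_2 and Pr_Male are expected to contain nulls, since not every Pokémon has a second type or a gender.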

Data Visualization

sns.heatmap

In the above screenshot, we plotted a heat map of the correlation matrix returned by df.corr(), using the seaborn library.
Afterward, we plotted box plots for all the numeric columns present in our dataset.
First, we plotted the box plot for the ‘HP’ column of our dataset.

sns.boxplot
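The correlation heat map and the HP box plot described here could be produced along these lines. This is a sketch on synthetic random data, not the project's actual cell; the annot and cmap settings are our own choices. Selecting the numeric columns before calling corr() is a precaution, because recent pandas versions raise an error when text columns are included.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (random values, not the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "HP": rng.integers(1, 256, size=50),
    "Attack": rng.integers(5, 166, size=50),
    "Defense": rng.integers(5, 231, size=50),
    "Type_1": rng.choice(["Fire", "Water", "Grass"], size=50),
})

# Correlate only the numeric columns; Type_1 is text and is left out.
corr = df.select_dtypes(include="number").corr()

fig, (ax_heat, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax_heat)  # correlation heat map
sns.boxplot(x=df["HP"], ax=ax_box)                          # box plot of one column
fig.tight_layout()
```

The same boxplot call can be repeated for each of the other numeric columns by changing the column name.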

Next, we have plotted the box plot for the ‘Attack’ column of our dataset.

attack

Next, we have plotted the box plot for the ‘Defense’ column of our dataset.

defense boxplot

Next, we have plotted the box plot for the ‘Sp_Atk’ column of our dataset.

sp_atk boxplot

Next, we have plotted the box plot for the ‘Sp_Def’ column of our dataset.

sp_def boxplot

Next, we have plotted the box plot for the ‘Speed’ column of our dataset.

speed boxplot

Next, we have plotted the box plot for the ‘Generation’ column of our dataset.

generation boxplot

Next, we have plotted the box plot for the ‘Height_m’ column of our dataset.

height boxplot

Next, we have plotted the box plot for the ‘Weight_kg’ column of our dataset.

weight boxplot

Next, we have plotted the box plot for the ‘Catch_Rate’ column of our dataset. We can see that many outliers are present in the column values.

boxplot catch_rate
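The points a box plot draws as outliers are, by default, the values lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal sketch of that rule is given below; the helper name iqr_outliers and the sample catch rates are hypothetical and serve only as an illustration.

```python
import pandas as pd

def iqr_outliers(values: pd.Series) -> pd.Series:
    """Return the entries lying more than 1.5 * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return values[(values < lower) | (values > upper)]

# Illustrative catch rates, not taken from the real dataset.
catch_rate = pd.Series([45, 45, 45, 60, 60, 75, 90, 255], name="Catch_Rate")
outliers = iqr_outliers(catch_rate)
print(outliers.tolist())
```

Applying the same helper to the real Catch_Rate column would give a count of the outliers that the box plot shows only visually.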

We have also generated a line graph that plots the weight of the Pokémon for each body style.

lineplot

We have also presented the count of pokemon for each body style. We have differentiated them based on their gender.

counting for pokemon

Lastly, we have plotted the number of legendary and non-legendary pokemon.

legendary and non-legendary pokemon
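The numbers behind these two count plots can also be tabulated with pandas. The sketch below assumes that "gender" refers to the hasGender column, which the article does not state explicitly, and it uses a handful of made-up rows.

```python
import pandas as pd

# Made-up rows using column names from the dataset description.
df = pd.DataFrame({
    "Body_Style": ["quadruped", "quadruped", "bipedal_tailed", "two_wings", "head_only"],
    "hasGender": [True, True, True, True, False],
    "isLegendary": [False, False, False, True, False],
})

# Count of Pokémon per body style, split by hasGender.
# A seaborn count plot with x="Body_Style" and hue="hasGender" draws these counts.
style_by_gender = pd.crosstab(df["Body_Style"], df["hasGender"])
print(style_by_gender)

# Count of legendary versus non-legendary Pokémon.
legendary_counts = df["isLegendary"].value_counts()
print(legendary_counts)
```

The legendary counts are worth noting before modelling, because a strongly imbalanced target makes a high accuracy score easier to reach.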

Machine Learning part for Pokemon Analysis Project in ML and Data Science using Python

First, we need to prepare the dataset for training and testing the models.
To do that, we drop the columns that are irrelevant to the models.

dropping

Then, we print the information of the resulting (relevant) dataframe columns.

df.info
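A minimal sketch of this preparation step is shown below. The article does not list which columns were dropped, so the Number and Name columns here are hypothetical examples of identifier columns that carry no predictive information.

```python
import pandas as pd

# Stand-in dataframe; "Number" and "Name" are hypothetical identifier columns.
df = pd.DataFrame({
    "Number": [1, 4],
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "isLegendary": [False, False],
})

# Hypothetical choice of irrelevant columns; adjust to the real dataset.
columns_to_drop = ["Number", "Name"]
model_df = df.drop(columns=columns_to_drop)

# info() prints the remaining columns, their dtypes, and non-null counts.
model_df.info()
```

drop() returns a new dataframe, so the original df stays available for further exploration.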

Model 1: RANDOM FOREST CLASSIFIER

First, we import the sklearn modules required for the random forest classifier.

importing required libraries

Then, we use the isLegendary column as the target because it is a binary categorical column. We dropped the isLegendary column from the dataframe and stored the remaining columns in the X variable. The values of the isLegendary column are stored in the Y variable.
We used train_test_split() to split the data in a 20:80 test-to-train ratio.

divide data into train and test data
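The split into features, target, and train/test sets can be sketched as follows. The data is synthetic, and the random_state and stratify arguments are our assumptions; the article does not say whether they were used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 100 rows, 20 of them marked legendary.
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "Total": rng.integers(180, 721, size=n),
    "Catch_Rate": rng.integers(3, 256, size=n),
    "isLegendary": np.arange(n) % 5 == 0,
})

X = df.drop(columns="isLegendary")   # features: every column except the target
y = df["isLegendary"]                # target column

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
# random_state and stratify are assumptions, added for reproducibility and
# to keep the share of legendary rows similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```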

We obtained an accuracy score of 99.31% and a ROC AUC score of 0.996, so the model classifies the test set very accurately.

random forest classifier object creation
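Training the random forest and computing the two scores might look like the sketch below. It runs on synthetic data from make_classification, so the scores it prints will not match the figures reported above. The hyperparameters are assumptions, and the ROC AUC is computed here from predicted probabilities; the original code may have used predicted labels instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the Pokémon features.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# n_estimators and random_state are assumed values, not taken from the article.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                # hard class predictions
y_score = model.predict_proba(X_test)[:, 1]   # probability of the positive class

accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_score)
print(f"accuracy={accuracy:.4f}, roc_auc={roc_auc:.4f}")
```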

We have used a classification report from sklearn metrics. We have generated the confusion matrix for the random forest model.

heatmap
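The classification report and the confusion matrix heat map can be sketched like this. The label lists are hypothetical stand-ins for the test labels and the model's predictions, and the class names passed to target_names are our own.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels standing in for y_test and the model's predictions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Precision, recall, and F1 score for each class.
print(classification_report(
    y_true, y_pred, target_names=["non-legendary", "legendary"]
))

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred)
ax = sns.heatmap(cm, annot=True, fmt="d", cbar=False)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
```

Labelling the axes makes it harder to mix up false positives and false negatives when reading the plot.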

Model 2: DECISION TREE CLASSIFIER

First, we import the sklearn modules required for the decision tree classifier.

decision tree classifier object initialization

We have stored the decision tree classifier in the ‘MODEL’ variable.

isLegendary data division into train and test data

Then, we use the isLegendary column as the target because it is a binary categorical column. We dropped the isLegendary column from the dataframe and stored the remaining columns in the X variable. The values of the isLegendary column are stored in the Y variable.
We used train_test_split() to split the data in a 25:75 test-to-train ratio.

predicted accuracy

We obtained an accuracy score of 97.24% and a ROC AUC score of 0.9523, so this model is also accurate on the test set, although less so than the random forest.

heatmap

We used the classification report from sklearn metrics. We generated the confusion matrix for the decision tree classifier and plotted it using the heat map function of the seaborn library. The matrix shows 163 true negatives and 13 true positives.
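To read the four cells of a binary confusion matrix without ambiguity, the matrix can be unpacked with ravel(). In sklearn the rows are the true classes and the columns are the predicted classes, so the order is true negatives, false positives, false negatives, true positives. The sketch below uses synthetic data and an assumed random_state, so its counts differ from the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the Pokémon features.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.9, 0.1], random_state=1
)
# test_size=0.25 corresponds to a 25:75 test-to-train split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model = DecisionTreeClassifier(random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = accuracy_score(y_test, y_pred)
print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}, accuracy={accuracy:.4f}")
```

Accuracy is the number of correct predictions (tn + tp) divided by the size of the test set, which gives a quick consistency check between the matrix and the reported score.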

Thank you for reading our “Pokemon Analysis Project in ML and Data Science using Python” article.


Author: Harry
