
In this article, we build a hate speech detection project in Python. Almost everyone now uses social media apps to connect and interact with people around the world. Social media is also a place where people share personal opinions about others, and many of those opinions are offensive or hateful.
Project Overview: Hate Speech Detection
Project Name: | Hate Speech Detection in Machine Learning with Python |
Abstract: | In this project, we learn how to detect hate speech using the Python programming language |
Language/Technologies Used: | Python, NLTK, Pandas, NumPy |
IDE: | Google Colab or Jupyter |
Python version (Recommended): | 3.8 or 3.9 |
Type: | Machine Learning and Deep Learning Project |
Developer: | Keerthana Buvaneshwaran |
Updates: | 0 |
What is Hate Speech detection?
Hate speech detection is the task of identifying hateful and offensive language posted online, usually with a model trained on labeled examples. Because social media makes it easy to post hateful and offensive comments about others, automatic detection has become an important tool for moderating online platforms.
Now that we understand the goal of the project, let’s start building it in Python.
Steps in building Hate Speech detection using Machine Learning
Before moving into the implementation part directly, let us get an insight into the steps in building a Hate Speech detection project with Python.
- Set up the development environment
- Understand the data
- Import the required libraries
- Preprocess the data
- Split the data
- Build the model
- Evaluate the results
Setting up the development environment
The first step is to set up the development environment. You need a system with Jupyter Notebook installed. Alternatively, you can use Google Colab (https://colab.research.google.com/), which runs in the browser and needs no local installation.
Understanding the data
The dataset for our hate speech detection model is available on Kaggle (www.kaggle.com). It consists of Twitter data collected for research on hate speech detection. Each tweet is classified as hate speech, offensive language, or neither. Due to the nature of the study, note that this dataset contains text that can be considered racist, sexist, homophobic, or generally offensive.
You can find the dataset for hate speech detection here https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset
There are 7 columns in the hate speech detection dataset: index, count, hate_speech, offensive_language, neither, class and tweet. The columns are described below.
index – the row index
count – the number of users who coded each tweet
hate_speech – the number of users who judged the tweet to be hate speech
offensive_language – the number of users who judged the tweet to be offensive
neither – the number of users who judged the tweet to be neither hate speech nor offensive
class – the class label chosen by the majority of users, where 0 denotes hate speech, 1 denotes offensive language and 2 denotes neither
tweet – the text of the tweet
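To make the relationship between the vote columns and the class label concrete, here is a minimal sketch. The rows are hand-made for illustration and are not taken from the dataset, and the rule that class equals the position of the most-voted column is an assumption based on the description above.

```python
import pandas as pd

# Tiny hand-made sample that mimics the dataset's vote columns (not real data).
sample = pd.DataFrame({
    "count": [3, 3, 6],
    "hate_speech": [0, 2, 1],
    "offensive_language": [3, 1, 1],
    "neither": [0, 0, 4],
})

vote_columns = ["hate_speech", "offensive_language", "neither"]

# Position of the column with the most votes: 0, 1 or 2,
# which lines up with the class coding described above.
sample["majority_class"] = sample[vote_columns].to_numpy().argmax(axis=1)

print(sample)
```

On the real data, data["class"].value_counts() is a quick way to see how many tweets fall into each class before training.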
Importing the required libraries
After looking at the data, our next step is to import the required libraries. In this project we use pandas, NumPy, scikit-learn, and NLTK.
# Importing the packages
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
Next, we import NLTK (the Natural Language Toolkit), a Python library for symbolic and statistical natural language processing. We use its English stopword list and its Snowball stemmer.
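import nltk
import re
import string  # needed later by clean() for string.punctuation
# Uncomment on the first run to download the stopword list
# nltk.download('stopwords')
from nltk.corpus import stopwords
stopword = set(stopwords.words('english'))
stemmer = nltk.SnowballStemmer("english")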
After importing the required libraries, it is time to load the data in our project.
data = pd.read_csv("data.csv")
# To preview the data
print(data.head())
Preprocessing the data
In data preprocessing, we prepare the raw data and make it suitable for a machine learning model. It is the first and a crucial step in building a model, because real-world data is rarely clean and well formatted. Here we first map the numeric class codes to readable labels and keep only the two columns we need, the tweet text and its label.
data["labels"] = data["class"].map({0: "Hate Speech", 1: "Offensive Speech", 2: "No Hate and Offensive Speech"})
data = data[["tweet", "labels"]]
print(data.head())
Earlier we set up two important natural language processing tools, a stopword list and a stemmer. Stopwords are very common words (such as "the", "is" and "and") that carry little information for classification, so we remove them from the input. Stemming reduces the morphological variants of a word to a common root, so that, for example, "running" and "runs" are treated as the same word. Stemming each tweet gives the model a smaller, more consistent vocabulary and makes prediction easier.
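The two ideas can be tried on a single sentence before applying them to the whole dataset. This is a minimal sketch: the stopword set demo_stopwords is a tiny hand-picked set used only so that the example runs without downloading NLTK data, and the helper name drop_stopwords_and_stem is hypothetical. The article itself uses NLTK's full English stopword list.

```python
from nltk.stem import SnowballStemmer

# Hand-picked stopwords for illustration only; the article uses
# stopwords.words('english') from NLTK instead.
demo_stopwords = {"you", "are", "the", "and", "is", "a"}
demo_stemmer = SnowballStemmer("english")

def drop_stopwords_and_stem(sentence):
    """Lowercase the sentence, drop stopwords, and stem the remaining words."""
    words = [w for w in sentence.lower().split() if w not in demo_stopwords]
    return [demo_stemmer.stem(w) for w in words]

tokens = drop_stopwords_and_stem("You are running and the insults keep coming")
print(tokens)
```

The stopwords disappear and words such as "running" and "insults" are reduced to their stems. The clean function that follows applies the same two steps after removing URLs, punctuation and other noise.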
def clean(text):
    # Lowercase, then strip bracketed text, URLs, HTML-like tags,
    # punctuation, newlines, and words that contain digits
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Drop stopwords, then reduce each remaining word to its stem
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["tweet"] = data["tweet"].apply(clean)
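To see what each substitution in the cleaning step contributes, the sketch below applies the patterns one at a time to a made-up tweet and prints the intermediate text. The patterns are an assumption about what the cleaning regexes are meant to match (bracketed text, URLs, tags, punctuation, newlines, and words containing digits); the sample tweet and the helper trace_cleaning are hypothetical.

```python
import re
import string

# (description, pattern) pairs, applied in order.
STEPS = [
    ("bracketed text", r"\[.*?\]"),
    ("URLs", r"https?://\S+|www\.\S+"),
    ("HTML-like tags", r"<.*?>+"),
    ("punctuation", "[%s]" % re.escape(string.punctuation)),
    ("newlines", r"\n"),
    ("words with digits", r"\w*\d\w*"),
]

def trace_cleaning(text):
    """Apply each pattern in turn, printing the text after every step."""
    text = str(text).lower()
    for label, pattern in STEPS:
        text = re.sub(pattern, "", text)
        print(f"after removing {label:<18}: {text!r}")
    # Collapse the leftover runs of spaces into single spaces.
    return " ".join(text.split())

cleaned = trace_cleaning(
    "RT @user: Check <b>this</b> out!!! http://t.co/abc123 [link] 2day\n"
)
print(cleaned)
```

The order matters: URLs have to be removed before punctuation, otherwise stripping ":" and "/" would leave fragments of the URL behind that no longer match the URL pattern.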
Splitting the data
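The next step is to turn the cleaned tweets into numeric features and divide the dataset into training and testing data. CountVectorizer converts each tweet into a vector of word counts, and train_test_split holds out 33 percent of the rows for testing.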
x = np.array(data["tweet"])
y = np.array(data["labels"])
cv = CountVectorizer()
X = cv.fit_transform(x)
# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
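The vectorizing step is easier to follow on a toy corpus. The sketch below uses three made-up sentences (not from the dataset) and shows the vocabulary CountVectorizer learns and the resulting count matrix, with one row per document and one column per word.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up documents, for illustration only.
docs = ["you are awful", "you are great", "awful awful day"]

cv_demo = CountVectorizer()
counts = cv_demo.fit_transform(docs)  # sparse matrix: documents x vocabulary

# Vocabulary terms ordered by their column index in the matrix.
vocab = sorted(cv_demo.vocabulary_, key=cv_demo.vocabulary_.get)

print(vocab)             # ['are', 'awful', 'day', 'great', 'you']
print(counts.toarray())  # rows: [1 1 0 0 1], [1 0 0 1 1], [0 2 1 0 0]
```

Each column counts how often one word appears in each document, so the third document has a 2 in the "awful" column. Word order is discarded, which is why this representation is often called a bag of words.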
Building the model
After splitting the data, our next task is to choose an algorithm for the model. We use a Decision Tree classifier for this project. Decision trees are a supervised machine learning method that learns a series of if-then rules from the features, and they are commonly used for classification problems like this one.
# Model building
model = DecisionTreeClassifier()
# Training the model
model.fit(X_train, y_train)
Evaluating the results
The final step is to evaluate the model. We predict labels for the test set and measure how closely they match the true labels.
# Testing the model
y_pred = model.predict(X_test)
y_pred
# Accuracy score of our model
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
Output:
0.8745567917838366
In this run, the model classifies about 87 percent of the held-out test tweets correctly.
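Accuracy alone can hide poor performance on a rare class. The sketch below uses made-up labels (not results from this project) to show how a confusion matrix and a per-class report expose that: a model that always predicts the common class still reaches 80 percent accuracy here while never finding the rare one.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for illustration: 8 offensive tweets, 2 hate speech tweets.
y_true_demo = ["Offensive Speech"] * 8 + ["Hate Speech"] * 2
# A degenerate model that always predicts the majority class.
y_pred_demo = ["Offensive Speech"] * 10

class_names = ["Hate Speech", "Offensive Speech"]

acc = accuracy_score(y_true_demo, y_pred_demo)
# Rows are true classes, columns are predicted classes, in class_names order.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=class_names)

print(acc)
print(cm)
print(classification_report(y_true_demo, y_pred_demo, labels=class_names, zero_division=0))
```

Running the same three calls on y_test and y_pred from this project would show whether the 87 percent figure is spread evenly across the three labels or carried mostly by one of them.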
# Predicting the outcome
inp = "You are too bad and I dont like your attitude"
inp = cv.transform([inp]).toarray()
print(model.predict(inp))
Output:
['Offensive Speech']
inp = "It is really awesome"
inp = cv.transform([inp]).toarray()
print(model.predict(inp))
Output:
['No Hate and Offensive Speech']
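For predictions on new text, it can be convenient to bundle the vectorizer and the classifier into one object so that raw strings go in and labels come out. The sketch below is one possible arrangement using scikit-learn's Pipeline, not the article's method; the training sentences and their labels are made up, and the name text_model is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Made-up training data, for illustration only.
toy_texts = [
    "i hate you",
    "you are awful",
    "awful awful person",
    "have a nice day",
    "what a lovely day",
    "nice to meet you",
]
toy_labels = ["Offensive Speech"] * 3 + ["No Hate and Offensive Speech"] * 3

# The pipeline fits the vectorizer and the tree together,
# and applies the same vectorizer again at prediction time.
text_model = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tree", DecisionTreeClassifier(random_state=42)),
])
text_model.fit(toy_texts, toy_labels)

prediction = text_model.predict(["you are an awful person"])
print(prediction)
```

With this arrangement the vectorizer is fitted only on the text passed to fit, so calling fit on the training split alone keeps test tweets out of the vocabulary. New text should also go through the same cleaning function as the training tweets before it is passed to predict.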
Complete code for Hate Speech Detection in Python
# Importing the packages
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import nltk
import re
import string  # needed by clean() for string.punctuation
nltk.download('stopwords')
from nltk.corpus import stopwords
stopword = set(stopwords.words('english'))
stemmer = nltk.SnowballStemmer("english")

data = pd.read_csv("data.csv")
# To preview the data
print(data.head())

data["labels"] = data["class"].map({0: "Hate Speech", 1: "Offensive Speech", 2: "No Hate and Offensive Speech"})
data = data[["tweet", "labels"]]
print(data.head())

def clean(text):
    # Lowercase, then strip bracketed text, URLs, HTML-like tags,
    # punctuation, newlines, and words that contain digits
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Drop stopwords, then reduce each remaining word to its stem
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["tweet"] = data["tweet"].apply(clean)

x = np.array(data["tweet"])
y = np.array(data["labels"])
cv = CountVectorizer()
X = cv.fit_transform(x)

# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Model building
model = DecisionTreeClassifier()
# Training the model
model.fit(X_train, y_train)

# Testing the model
y_pred = model.predict(X_test)

# Accuracy score of our model
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

# Predicting the outcome
inp = "You are too bad and I dont like your attitude"
inp = cv.transform([inp]).toarray()
print(model.predict(inp))
Conclusion
In this article, we built a hate speech detection project using machine learning. We loaded a labeled Twitter dataset, cleaned the tweets, converted them to word counts, trained a Decision Tree classifier, and measured its accuracy on held-out data. Hate speech is a serious issue on social media platforms like Facebook and Twitter, and we hope you enjoyed building a project to detect it with Python.