How to extract YouTube comments using Python?

Today, we will learn how to extract YouTube comments using Python with the help of Google's APIs. Extracting comments from YouTube can seem like a daunting task, but the YouTube Data API makes it simple. All you need is a Google API key for the YouTube Data API.

Getting started

Let’s view the steps to get the API key:

  • Go to https://console.cloud.google.com/
  • Create a project
  • Click on APIs and services
  • Click on enable APIs and services
  • Search for YouTube Data API v3
  • Click on it in the search results
  • Enable this API
  • Go to credentials and click create new credentials
  • Select API key
  • You will get your API key. Copy this and save it for future use.
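Rather than pasting the key straight into a script, one option is to keep it in an environment variable. The sketch below assumes a variable named YOUTUBE_API_KEY; both that name and the load_api_key helper are chosen here for illustration and are not something Google's tooling requires.

```python
import os


def load_api_key(env=os.environ, name="YOUTUBE_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing.

    Both this helper and the variable name are hypothetical.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your API key")
    return key
```

With this in place, the script could use api_key = load_api_key() instead of a hard-coded string.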

Now let’s start building the project.

You will need to install the Google API client library for Python. Run the command below in your terminal:

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Approach to Extract YouTube Comments

YouTube does not return all of a video's comments at once. The API returns them in pages of 20 by default, and a request can raise this to at most 100 per page with maxResults. To get further comments, we read the nextPageToken from each response and pass it as pageToken in the next request, repeating until the response no longer contains a token.
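The token-following idea can be tried without an API key. The sketch below uses a plain loop rather than recursion, and a dictionary of fake pages stands in for the real service; fetch_all, fetch_page and FAKE_PAGES are hypothetical names used only for this illustration.

```python
def fetch_all(fetch_page):
    """Collect items from every page by following nextPageToken.

    `fetch_page` is a hypothetical callable that takes a page token
    (None for the first page) and returns a dict shaped like a
    YouTube API response.
    """
    items, token = [], None
    while True:
        response = fetch_page(token)
        items.extend(response.get("items", []))
        token = response.get("nextPageToken")
        if not token:
            # no token in the response means this was the last page
            return items


# Stand-in for the real API: three small "pages" keyed by token.
FAKE_PAGES = {
    None: {"items": ["a", "b"], "nextPageToken": "p2"},
    "p2": {"items": ["c", "d"], "nextPageToken": "p3"},
    "p3": {"items": ["e"]},
}

print(fetch_all(FAKE_PAGES.get))  # ['a', 'b', 'c', 'd', 'e']
```

Against the real service, fetch_page would wrap a commentThreads().list(...).execute() call and add pageToken only when a token is present.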

We create two recursive functions for our task. get_comments collects all the top-level comments, and get_replies collects all the replies in a particular comment's thread.

Replace the api_key placeholder in the code with your own API key.

Python Code to Extract YouTube Comments

from googleapiclient.discovery import build

video_id = "CfttIk4Yjqg"
api_key = '<Enter your api key here>'

# recursive function to get all replies in a comment thread
def get_replies(youtube, comment_id, token=''):
    # only send pageToken when we actually have one
    params = dict(part='snippet', maxResults=100, parentId=comment_id)
    if token:
        params['pageToken'] = token
    replies_response = youtube.comments().list(**params).execute()

    replies = [reply['snippet']['textDisplay'] for reply in replies_response['items']]

    # keep following nextPageToken until the last page of replies
    if replies_response.get('nextPageToken'):
        replies += get_replies(youtube, comment_id, replies_response['nextPageToken'])
    return replies


# recursive function to get all comments
def get_comments(youtube, video_id, next_view_token):
    global all_comments

    # an empty token means this is the first page, so start with a fresh list
    if len(next_view_token.strip()) == 0:
        all_comments = []

    # only send pageToken when we actually have one
    params = dict(part='snippet', maxResults=100, videoId=video_id, order='relevance')
    if next_view_token.strip():
        params['pageToken'] = next_view_token
    comment_list = youtube.commentThreads().list(**params).execute()

    # loop through all top level comments
    for comment in comment_list['items']:
        text = comment['snippet']['topLevelComment']['snippet']['textDisplay']
        # fetch replies only when the thread has any
        replies = []
        if comment['snippet']['totalReplyCount'] > 0:
            replies = get_replies(youtube, comment['id'])
        # store each entry as [comment text, list of reply texts]
        all_comments.append([text, replies])

    # recurse into the next page, or return everything collected so far
    if "nextPageToken" in comment_list:
        return get_comments(youtube, video_id, comment_list['nextPageToken'])
    return all_comments


all_comments = []

# build a youtube object using our api key
yt_object = build('youtube', 'v3', developerKey=api_key)

# get all comments and replies
comments = get_comments(yt_object, video_id, '')

for comment, replies in all_comments:
    print(comment)
    if len(replies) > 0:
        print("There are", len(replies), "replies")
        print("\tReplies:")
        for reply in replies:
            print("\t" + reply)
    print()

Output:

Vir is actually a philosopher disguised as a comedian because if not with humor, how else will the world understand bitter truths? <br>This was beyond amazing!!! &lt;33

When he started with the monkey and scientists, I thought, &quot;This is gonna be a boring one. &quot;.... But oh maan, you rocked it!! 🥵🔥💯💯
There are 3 replies
	Replies:
	Well i was going to skip the video<br><br>But your comment made me stay<br>And i must stay<br>It was worth it
	@Sridhar C Me 2 😉
	Must admit... I was confused for a while there too

He just said a lot and expressed a lot, only if one really understands 👌🏻👌🏻👏🏻👏🏻

This guy can do both Delhi Belly and a set on Indian Feminism, incredible range man.
There are 3 replies
	Replies:
	TIL i realised that was vir das lmao😂😂😂achcha hua ye comment padha
	@Snehil Vishwakarma just reading that movies name has ruined my day
	I think a better comparison would be Mastizaade and this set

This is Hard Slap Of Reality Check With Flavour of Humor!!!! 🔥💯
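The output above still contains markup such as <br> and entities such as &quot; because textDisplay holds the comment as HTML. One way to get plain text is to clean each string after fetching it; the sketch below uses only the standard library, and clean_comment is a hypothetical helper name. The API reference also documents a textFormat parameter that can request plain text directly, which may be simpler if it fits your use.

```python
import html
import re


def clean_comment(text):
    """Turn the HTML in `textDisplay` into plain text (hypothetical helper)."""
    # <br> tags mark line breaks in the comment
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    # drop any other tags, such as <a> or <b>
    text = re.sub(r"<[^>]+>", "", text)
    # decode entities like &quot; and &lt; last, so a decoded "<" is not mistaken for a tag
    return html.unescape(text)


print(clean_comment("I thought, &quot;boring&quot;<br>But no &lt;33"))
```

Calling clean_comment on each comment and reply before printing would leave the rest of the script unchanged.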

Author: Mohsin Shaikh