Regular Expression In Python – RegEx



Definition


A regular expression (regex) is a sequence of characters that defines a search pattern. With Python's regex support, the user can check whether a string contains a given pattern or find every substring that matches it. To use regular expressions, the user must first import the built-in re module.

Syntax:

import re

To build a regular expression, the user may use metacharacters, special sequences, and sets.


Metacharacters

Metacharacters are characters that have a special meaning inside a regular expression instead of matching themselves. Some of the commonly used metacharacters are:


Metacharacter   Description                                                        Example
[]              Matches any one character from the set given inside the brackets.  [a-j], [0-5]
\               Introduces a special sequence, or escapes a metacharacter.         \d
.               Matches any single character except a newline.                     py...n
^               Matches only at the beginning of the string.                       ^the
$               Matches only at the end of the string.                             python$
*               Matches zero or more occurrences of the preceding pattern.         oo*
+               Matches one or more occurrences of the preceding pattern.          oo+
{}              Matches exactly the number of occurrences given in the braces.     oo{1}
|               Matches either the pattern on its left or the one on its right.    practice|study

Example1

import re
txt = "Learning python is easy"
x = re.findall("^Learning.*easy$", txt)
print(x)

Output

['Learning python is easy']

Example2

import re
txt = "Learning python is easy"
x = re.findall("py...n", txt)
print(x)

Output

['python']
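
The quantifiers and the | metacharacter from the table can be tried out in the same way. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the examples above.

```python
import re

txt = "so soo sooo"  # hypothetical sample text

# "so*"  : an "s" followed by zero or more "o"
star = re.findall(r"so*", txt)
# "soo+" : "so" followed by one or more "o"
plus = re.findall(r"soo+", txt)
# "so{2}": an "s" followed by exactly two "o"
exact = re.findall(r"so{2}", txt)
# "a|b"  : either the pattern on the left or the one on the right
either = re.findall(r"practice|study", "study first, then practice")

print(star)    # ['so', 'soo', 'sooo']
print(plus)    # ['soo', 'sooo']
print(exact)   # ['soo', 'soo']
print(either)  # ['study', 'practice']
```

Note that a quantifier applies only to the element directly before it, so in "so*" the * repeats the "o", not the whole "so".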

Special sequences


A special sequence is a backslash \ followed by a single character, and it has a special meaning in the pattern. Some of the commonly used special sequences are:


  • \A – Matches only at the very beginning of the string. Example: \AHello
  • \b – Matches at a word boundary, i.e. at the beginning or at the end of a word. Example: r"\bello" finds ello only at the beginning of a word; r"ello\b" finds ello only at the end of a word.
  • \B – The opposite of \b. It matches only where the pattern is not at the beginning or at the end of a word.
  • \d – Matches any digit, such as 0 to 9.
  • \D – The opposite of \d. It matches any character that is not a digit.

Example1

import re
test_string = "Hello world"
x = re.findall(r"\AHello", test_string)  # raw string keeps the backslash intact
print(x)

Output

['Hello']

Example2

import re
test_string = "Hello world"
x = re.findall(r"ello\b", test_string)
print(x)

Output

['ello']
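
The remaining special sequences (\B, \d and \D) behave as sketched below. The sample strings and variable names are assumptions chosen for illustration. Raw strings (the r prefix) are used so that Python passes the backslash to re unchanged.

```python
import re

words = "Hello yellow ello"          # hypothetical sample text
order = "Order 66 shipped on day 7"  # hypothetical sample text

# \b: "ello" only where it starts a word
at_word_start = re.findall(r"\bello", words)
# \B: "ello" only where it does NOT start a word
inside_word = re.findall(r"\Bello", words)

# \d: every digit; \D: every character that is not a digit
digits = re.findall(r"\d", order)
non_digits = "".join(re.findall(r"\D", order))

print(at_word_start)  # ['ello']
print(inside_word)    # ['ello', 'ello']
print(digits)         # ['6', '6', '7']
```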

Sets


A set is a group of characters written inside square brackets []. It matches any one character from that group. With findall(), the matched characters are returned in the order in which they appear in the string. Some of the common usages of sets are:


  • [xyz] – Matches any one of the characters x, y, or z.
  • [a-j] – Matches any one lowercase letter from a to j.
  • [^xyz] – Matches any one character except x, y, and z.
  • [123] – Matches any one of the digits 1, 2, or 3.
  • [0-9] – Matches any one digit from 0 to 9.
  • [0-4][0-4] – Matches a two-digit sequence in which each digit is between 0 and 4, such as 00, 13, or 44.
  • [*+] – Inside a set, metacharacters lose their special meaning, so this matches a literal * or +.

Example1

import re
txt = "Happy learning..."
x = re.findall("[hpyz]", txt)
print(x)

Output

['p', 'p', 'y']

Example2

import re
txt = "Happy learning..."
x = re.findall("[...]", txt)
print(x)

Output

['.', '.', '.']
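
The other set forms listed above can be checked with a short sketch like the following. The sample text and variable names are hypothetical.

```python
import re

txt = "cab 2041 x+y*z"  # hypothetical sample text

# Range: matches come back in order of appearance, not alphabetical order
letters = re.findall(r"[a-j]", txt)
# Digit range
digits = re.findall(r"[0-9]", txt)
# Two consecutive digits, each between 0 and 4
pairs = re.findall(r"[0-4][0-4]", txt)
# Negated set: everything except x, y and z
not_xyz = re.findall(r"[^xyz]", "x+y*z")
# Metacharacters are literal inside a set
symbols = re.findall(r"[*+]", txt)

print(letters)  # ['c', 'a', 'b']
print(digits)   # ['2', '0', '4', '1']
print(pairs)    # ['20', '41']
print(not_xyz)  # ['+', '*']
print(symbols)  # ['+', '*']
```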

Functions


The re module provides four commonly used functions. They are:

  • findall()
  • search()
  • split()
  • sub()

findall()


The findall() function finds all non-overlapping matches of the pattern and returns them as a list of strings. If nothing matches, it returns an empty list.

Example

import re
txt = "Happy learning..."
x = re.findall("earn", txt)
print(x)

Output

['earn']
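
As a further sketch, here is how findall() behaves when the pattern occurs more than once or not at all. The sample text and variable names are made up for illustration.

```python
import re

txt = "Happy learning, happy earning"  # hypothetical sample text

all_matches = re.findall(r"earn", txt)   # one entry per occurrence
no_matches = re.findall(r"python", txt)  # pattern is absent

print(all_matches)  # ['earn', 'earn']
print(no_matches)   # []
```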

search()


The search() function scans the string for the first location where the pattern matches. It returns a Match object for that first match, or None if the pattern is not found.

Example1

import re
txt = "Happy learning..."
x = re.search("earn", txt)
if x:
    print('Match found')
else:
    print('No match found')

Output

Match found

Example2

import re
txt = "Happy learning..."
x = re.search("python", txt)
if x:
    print('Match found')
else:
    print('No match found')

Output

No match found
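
Beyond a yes/no check, the Match object returned by search() also reports what was matched and where. A minimal sketch, with illustrative variable names:

```python
import re

txt = "Happy learning..."
match = re.search(r"earn", txt)

if match:
    found = match.group()        # the text that matched
    start, end = match.span()    # start and end index of the match
    print(found, start, end)     # earn 7 11

missing = re.search(r"python", txt)
print(missing)                   # None
```

Because search() returns None when there is no match, calling group() or span() without the if check would raise an AttributeError.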

split()


The split() function splits the string at every match of the pattern and returns the resulting pieces as a list.

Example

import re
txt = "Happy learning..."
x = re.split(r"\s", txt)  # split the text at each whitespace character
print(x)

Output

['Happy', 'learning...']
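
When the separator can repeat, the choice of pattern matters. The sketch below uses a made-up string with three consecutive spaces, and also shows the optional maxsplit argument of re.split(), which limits the number of splits.

```python
import re

txt = "Happy   python learning"  # hypothetical text with three spaces after "Happy"

# Splitting on a single whitespace character leaves empty strings
single = re.split(r"\s", txt)
# Splitting on one or more whitespace characters treats a run as one separator
runs = re.split(r"\s+", txt)
# maxsplit=1 stops after the first split
limited = re.split(r"\s+", txt, maxsplit=1)

print(single)   # ['Happy', '', '', 'python', 'learning']
print(runs)     # ['Happy', 'python', 'learning']
print(limited)  # ['Happy', 'python learning']
```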

sub()


The sub() function replaces every match of the pattern in the string with the given replacement text and returns the new string.

Example

import re
txt = "Happy python learning..."
x = re.sub(r"\s", "-", txt)  # replace each whitespace character with a hyphen
print(x)

Output

Happy-python-learning...

Note: sub() also accepts an optional count argument that limits how many matches are replaced.

Example

import re
txt = "Happy python learning..."
x = re.sub(r"\s", "-", txt, count=1)  # replace only the first whitespace character
print(x)

Output

Happy-python learning...
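
sub() works with any pattern, including sets. The following sketch, using a hypothetical sample string, removes punctuation by substituting an empty string, and shows that the string is returned unchanged when nothing matches.

```python
import re

txt = "Happy, python; learning..."  # hypothetical sample text

# Remove every comma, semicolon and dot
cleaned = re.sub(r"[,;.]", "", txt)
# Remove only the first of them
first_only = re.sub(r"[,;.]", "", txt, count=1)
# No digits in the text, so nothing is replaced
unchanged = re.sub(r"\d", "#", txt)

print(cleaned)     # Happy python learning
print(first_only)  # Happy python; learning...
print(unchanged)   # Happy, python; learning...
```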
