Introduction to Regular Expressions in Python

In this class, we introduce regular expressions in Python.

In our previous lessons, we discussed the numpy module.

Regular expressions are widely used in many applications.

Let us take an example to understand the concept of regular expressions.

Regular Expressions

dot symbol

In regular expressions, the dot symbol (.) represents any single character (except a newline, by default).

Take an example string: string = "hello how are youh".

Suppose we want to find the character that follows each h in the given string. Regular expressions let us do this.

We use the findall function from the regular expression module, re.

We discuss the other methods and functions of the re module in later classes.

Remember, the findall function takes the expression and the string as input.

It searches the given string for text that matches the expression.

The findall function returns all the matches as a list.

The example below shows a program using the findall function.

import re
string="hello how are youh"
x=re.findall('h.',string)
print(x)

Output:
['he', 'ho']

In x=re.findall('h.',string), the regular expression is ‘h.’.

The expression ‘h.’ matches the character h followed by any one character. The dot stands for any character.

The output of the program is ['he', 'ho']. These are the matches found in the string.

The example string also has the character ‘h’ at the end of the line. The expression does not match this last ‘h’ because no character follows it.
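As a small sketch of the point above, the snippet below runs the same expression on two short strings. The sample strings and variable names are made up for illustration. When no character follows the h, findall returns an empty list.

```python
import re

# 'h' is the last character, so there is nothing left for the dot to match
ends_with_h = re.findall('h.', 'oh')
print(ends_with_h)   # []

# the dot also accepts a space, so 'h ' counts as a match
h_then_space = re.findall('h.', 'oh no')
print(h_then_space)  # ['h ']
```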

Use of the dot symbol.

Example: the spellings ruby and rubi are both in use. We write the regular expression ‘rub.’ to match both words.

The expression ‘rub.’ will find both words in a text document.
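The idea can be tried with a short, hypothetical sample text (the text below is an assumption, not part of the lesson). Because the dot accepts any character, the expression also picks up the first four letters of rubric.

```python
import re

# hypothetical sample text used only for illustration
text = "ruby rubi rubric"

matches = re.findall('rub.', text)
print(matches)  # ['ruby', 'rubi', 'rubr']
```

If only the two spellings should match, a character set such as ‘rub[yi]’ is one possible stricter choice.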

^ symbol

The symbol ‘^’ tells the expression to match only at the beginning of the line.

Example: string = "hello how are you"

re.findall("^hello", string) checks for the word hello at the beginning of the line.

Now take a multi-line string and apply the symbol ‘^’.

By default, the expression matches only at the start of the whole string, not at the start of each line.

To apply ‘^’ to each line of a multi-line string, we need to pass a flag to the findall function.

We discuss flags in the following classes.
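As a brief preview, the sketch below assumes the flag in question is re.MULTILINE from the standard re module. The three-line sample string is made up for illustration.

```python
import re

# hypothetical three-line sample string
text = "hello how\nhello you\nhi there"

# without a flag, '^' matches only at the start of the whole string
first_line_only = re.findall('^h.', text)
print(first_line_only)  # ['he']

# with re.MULTILINE, '^' matches at the start of every line
every_line = re.findall('^h.', text, re.MULTILINE)
print(every_line)       # ['he', 'he', 'hi']
```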

The example below shows the use of the ‘^’ symbol on a single-line string.

import re
string="hello how are you"
x=re.findall('^h.',string)
print(x)

Output:
['he']

$ symbol

We use the symbol ‘$’ to mark the end of the line.

The character ‘$’ is the counterpart of the ‘^’ symbol. It matches at the end of the line.

The example below shows a program using the character ‘$’.

import re
string="hello how are you"
x=re.findall('you$',string)
print(x)

Output:
['you']
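To see that ‘$’ really anchors the match to the end, here is a minimal sketch with two made-up strings. Only a you that sits at the very end is returned.

```python
import re

# 'you' appears twice, but only the one at the end satisfies 'you$'
at_end = re.findall('you$', 'you are you')
print(at_end)      # ['you']

# 'you' is present but not at the end, so nothing matches
not_at_end = re.findall('you$', 'how are you today')
print(not_at_end)  # []
```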

[] symbol

We use square brackets to specify a set of characters.

Take a few examples to understand the symbol ‘[]’ better.

The expression ‘[hs].’ matches either ‘h’ or ‘s’, followed by any one character.

Any character after ‘h’ or after ‘s’ is accepted because a dot follows the square brackets.

import re
string="hello how are you so good"
x=re.findall('[hs].',string)
print(x)

Output:
['he', 'ho', 'so']

The expression ‘[^hs].’ matches any character except ‘h’ and ‘s’, followed by any one character.

Inside square brackets, a leading ‘^’ does not mean the start of the line. It negates the set, so ‘[^hs]’ means a character that is neither ‘h’ nor ‘s’.

import re
string="hello how are you so good"
x=re.findall('[^hs].',string)
print(x)

Output:
['el', 'lo', ' h', 'ow', ' a', 're', ' y', 'ou', ' s', 'o ', 'go', 'od']

The expression ‘[a-z].’ matches any lowercase letter from a to z, followed by any one character.

import re
string="hello how are you so good"
x=re.findall('[a-z].',string)
print(x)

Output:
['he', 'll', 'o ', 'ho', 'w ', 'ar', 'e ', 'yo', 'u ', 'so', 'go', 'od']
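The examples above always place a dot after the brackets. As a sketch, the snippet below drops the dot so that each match is a single character from the set. The variable names are chosen only for illustration.

```python
import re

string = "hello how are you so good"

# a set on its own matches exactly one character from the set
only_h_or_s = re.findall('[hs]', string)
print(only_h_or_s)    # ['h', 'h', 's']

# a range can be narrower than a-z
a_to_e = re.findall('[a-e]', string)
print(a_to_e)         # ['e', 'a', 'e', 'd']

# a negated range matches everything outside it, here the five spaces
not_lowercase = re.findall('[^a-z]', string)
print(not_lowercase)  # [' ', ' ', ' ', ' ', ' ']
```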

| symbol

The symbol ‘|’ means either-or.

For example, hello|how matches either hello or how.

The example below shows a program using the symbol ‘|’.

import re
string="hello how are you so good"
x=re.findall('hello|how',string)
print(x)

Output:
['hello', 'how']

* symbol

We use the symbol ‘*’ to represent zero or more occurrences of the preceding expression.

The character ‘*’ applies only to the item immediately before it. In ‘ab*a’, it applies only to ‘b’.

The example below shows a program using the symbol ‘*’.

import re
string="aba abba aa acba"
x=re.findall('ab*a',string)
print(x)

Output:
['aba', 'abba', 'aa']

In the above program, we used the expression ‘ab*a’.

The expression ‘ab*a’ matches ‘aba’ in the string because, after the character ‘a’, the character ‘b’ can occur zero or more times.

Here, ‘b’ occurs one time.

The expression ‘ab*a’ matches ‘abba’ in the string because, after the character ‘a’, the character ‘b’ occurs two times.

Finally, ‘ab*a’ matches ‘aa’ in the string because the character ‘b’ occurs zero times. The word ‘acba’ is not matched because ‘c’ appears where only ‘b’ is allowed.

+ symbol

We use the symbol ‘+’ to represent one or more occurrences of the preceding expression.

The symbol ‘+’ is similar to ‘*’, but ‘*’ means zero or more occurrences, while ‘+’ means one or more occurrences.

The example below shows a program using the symbol ‘+’.

import re
string="aba abba aa acba"
x=re.findall('ab+a',string)
print(x)

Output:
['aba', 'abba']

{} symbol

The symbol ‘{}’ specifies the number of occurrences of the preceding expression.

The expression ‘ab{2}a’ matches ‘abba’ because ‘b’ must occur exactly two times.

The example below shows a program using the symbol ‘{}’.

import re
string="aba abba aa acba"
x=re.findall('ab{2}a',string)
print(x)

Output:
['abba']

The expression ‘ab{2,5}a’ accepts 2, 3, 4, or 5 occurrences of the character b.

import re
string="aba abba aa acba abbba"
x=re.findall('ab{2,5}a',string)
print(x)

Output:
['abba', 'abbba']

? symbol

The symbol ‘?’ matches zero or one occurrence of the preceding expression.

The example below shows a program using the character ‘?’.

import re
string="aba abba aa acba"
x=re.findall('ab?a',string)
print(x)

Output:
['aba', 'aa']
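To compare the symbols ‘*’, ‘+’, ‘?’, and ‘{}’ side by side, the sketch below runs each expression from this class on one string. The dictionary and variable names are chosen only for illustration.

```python
import re

string = "aba abba aa acba abbba"
patterns = ['ab*a', 'ab+a', 'ab?a', 'ab{2}a', 'ab{2,5}a']

# map each expression to the list of matches it finds in the string
results = {p: re.findall(p, string) for p in patterns}

for p in patterns:
    print(p, results[p])

# ab*a ['aba', 'abba', 'aa', 'abbba']
# ab+a ['aba', 'abba', 'abbba']
# ab?a ['aba', 'aa']
# ab{2}a ['abba']
# ab{2,5}a ['abba', 'abbba']
```

Only the number of ‘b’ characters allowed between the two ‘a’ characters changes from one expression to the next.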