Some advanced regex usage
Get float numbers:
import re
target = 'total growth is : 8%-8.5%'
rate = re.findall(r'(\d+(?:\.\d+)?)', target)
print(rate)
['8', '8.5']
# \d matches any single digit; + means one or more.
# (?:...)
# A non-capturing group lets you apply a quantifier to part of your regex without capturing it or assigning it a group number.
# So the pattern means: match an integer and, if it is followed by a "." and more digits, include that ".digits" part in the match.
# For example, 1-3 digits followed by a period, repeated 3 times, then 1-3 more digits, can be written like this: (?:\d{1,3}\.){3}\d{1,3}
# The regex above matches a string like 219.254.232.123. In 232.212.222.3434 it matches only 232.212.222.343, because the last part takes at most 3 digits.
# It cannot match 232.2122.222.3434 or 232.212.22a.3434.
# The outer parentheses in `(\d+(?:\.\d+)?)` are the only capturing group, so findall returns a list of the strings that group captured.
# If you also want the fractional part (.xxx) returned separately, make the inner group capturing as well:
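rate = re.findall(r'(\d+(\.\d+)?)', target)
print(rate)
[('8', ''), ('8.5', '.5')]
# With more than one capturing group, findall returns a list of tuples, one tuple per match; a group that did not take part in the match shows up as ''.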
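When the inner group has to stay capturing but you still want the whole match as a plain string, one option is re.finditer with group(0). The sketch below assumes the same target string as above; the names pattern, whole and fractions are only for illustration.

```python
import re

target = 'total growth is : 8%-8.5%'

# Same pattern as above, with the inner group left capturing.
pattern = re.compile(r'(\d+(\.\d+)?)')

# group(0) is always the whole match, however many groups the pattern has.
whole = [m.group(0) for m in pattern.finditer(target)]

# group(2) is the optional fractional part; it is None when the group
# did not take part in the match (findall shows this as '').
fractions = [m.group(2) for m in pattern.finditer(target)]

print(whole)      # ['8', '8.5']
print(fractions)  # [None, '.5']
```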
# If you only want floats (numbers with a decimal point), use the pattern below:
target = 'total growth is : 8%-8.5% or 9.546%'
rate = re.findall(r'(\d+\.\d+)', target)
print(rate)
['8.5', '9.546']
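To check the earlier claims about the repeated non-capturing group, here is a minimal sketch that runs the dotted pattern against the four strings mentioned in the comments. The names dotted, samples and results are only for illustration.

```python
import re

# Three "1-3 digits plus a dot" units, then 1-3 more digits.
dotted = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}')

samples = [
    '219.254.232.123',    # matches completely
    '232.212.222.3434',   # matches only up to 232.212.222.343
    '232.2122.222.3434',  # no match: the second field has 4 digits
    '232.212.22a.3434',   # no match: the third field contains a letter
]

results = {}
for s in samples:
    m = dotted.search(s)
    # Store the matched text, or None when nothing matched.
    results[s] = m.group(0) if m else None
    print(s, '->', results[s])
```

Note that this pattern only checks the shape of the string; it does not check that each field is in the range 0-255.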
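A bit deeper:
A(?=B)
: Match A only if it is immediately followed by B (positive lookahead)
A(?!B)
: Match A only if it is NOT immediately followed by B (negative lookahead)
(?<=B)A
: Match A only if it is immediately preceded by B (positive lookbehind)
(?<!B)A
: Match A only if it is NOT immediately preceded by B (negative lookbehind)
In all four forms B is only checked; it is not part of the returned match.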
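A small sketch of the four forms above on one toy string, assuming the literal letters A and B as the two sub-patterns. It prints the span of every match, which shows that B is never consumed: each span covers only the single character A.

```python
import re

text = 'AB AC BA CA'

patterns = [
    r'A(?=B)',    # A followed by B
    r'A(?!B)',    # A not followed by B
    r'(?<=B)A',   # A preceded by B
    r'(?<!B)A',   # A not preceded by B
]

spans = {}
for pat in patterns:
    # span() gives (start, end) of the matched text only.
    spans[pat] = [m.span() for m in re.finditer(pat, text)]
    print(pat, spans[pat])
```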
target = 'in 2021 financial year total growth is : 8%-8.5% or 9.546%'
rate = re.findall(r'(\d+(?:\.\d+)?)(?![\d financial])', target)
print(rate)
['8', '8.5', '9.546']
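# ?! means **NOT followed by**
# [\d financial] is a character class, not the word " financial". (?![\d financial]) means the number must not be followed by a digit, a space, or any of the letters f, i, n, a, c, l. The space after 2021 rules out 2021 itself, and the \d rules out shorter pieces of it, so 2021 is totally excluded. With only (?![ financial]), it will match 202.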
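Because [\d financial] is a character class, it also rejects numbers followed by any space or by a single letter such as "a". The sketch below compares it with an alternation, (?!\d| financial), which only rejects a following digit or the literal text " financial". The alternation is a suggested variant, not the pattern used in these notes, and the sample sentence is made up for the comparison.

```python
import re

# Made-up sample: a year, a plain count followed by a space, and a percentage.
target = 'in 2021 financial year 8 apples grew 8.5%'

# Character class: next char must not be a digit, a space, or f/i/n/a/c/l.
with_class = re.findall(r'(\d+(?:\.\d+)?)(?![\d financial])', target)

# Alternation: next text must not be a digit or the literal " financial".
with_alternation = re.findall(r'(\d+(?:\.\d+)?)(?!\d| financial)', target)

print(with_class)        # ['8.5']
print(with_alternation)  # ['8', '8.5']
```

The class version drops the 8 in "8 apples" because of the space after it; the alternation keeps it and still excludes 2021.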
Find any digits directly before %:
target = 'in 2021 financial year total growth is : 8%-8.5% or 9.546%'
rate = re.findall(r'\d+(?=%)', target)
print(rate)
['8', '5', '546']
# to find a float before %:
rate = re.findall(r'(\d+\.\d+)(?=%)', target)  # combine with the previous float regex
print(rate)
['8.5', '9.546']
# to find an int or a float before %:
rate = re.findall(r'(\d+(?:\.\d+)?)(?=%)', target)  # combine with the previous int-or-float regex
print(rate)
['8', '8.5', '9.546']
# find year:
rate = re.findall(r'\d+(?= financial)', target)
print(rate)
['2021']
No. 2 test:
A(?!B)
: Match A only if it is NOT immediately followed by B
# find any run of digits that is not followed by a digit, a space, or one of the letters f, i, n, a, c, l
rate = re.findall(r'\d+(?![\d financial])', target)
print(rate)
['8', '8', '5', '9', '546']
No. 3 test:
(?<=B)A
: Match A only if it is immediately preceded by B
text = 'Price: $10000.00'
print(re.findall(r'(?<=\$)\d+', text))
['10000']
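The lookbehind can be combined with the int-or-float pattern from earlier to pick up the whole price, including the decimals. This is a minimal sketch; the sample text is assumed to hold one price with a "$" directly in front of it.

```python
import re

text = 'Price: $10000.00, Quantity: 5'

# Lookbehind for "$", then the int-or-float body used earlier in these notes.
price = re.findall(r'(?<=\$)\d+(?:\.\d+)?', text)
print(price)  # ['10000.00']
```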
No. 4 test:
(?<!B)A
: Match A only if it is NOT immediately preceded by B
text = 'Price: $10000.00, Quantity: 5'
print(re.findall(r'(?<![\d\$\.])\d+', text))
['5']
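Now for the important one: a regex for an IPv4 address
# First octet: 1-255 with no leading zero. Remaining octets: 0-255, leading zeros allowed.
# The lookbehind and lookahead stop the pattern from matching inside a longer run of digits and dots.
IPPattern = re.compile(r'(?<![\.\d])(?:25[0-5]\.|2[0-4]\d\.|[1]\d\d\.|[1-9]\d\.|[1-9]\.)(?:25[0-5]\.|2[0-4]\d\.|[01]?\d\d?\.){2}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)(?![\.\d])')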
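A quick way to see what this pattern accepts and rejects is to run it over a few sample strings. The sketch below repeats the pattern as a raw string so it runs on its own; find_ip and the sample strings are hypothetical names and values chosen for illustration.

```python
import re

# The IPv4 pattern from above, written as a raw string.
IPPattern = re.compile(r'(?<![\.\d])(?:25[0-5]\.|2[0-4]\d\.|[1]\d\d\.|[1-9]\d\.|[1-9]\.)(?:25[0-5]\.|2[0-4]\d\.|[01]?\d\d?\.){2}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)(?![\.\d])')


def find_ip(text):
    """Return the first match of IPPattern in text, or None (hypothetical helper)."""
    m = IPPattern.search(text)
    return m.group(0) if m else None


samples = [
    'server 192.168.1.10 is up',  # ordinary address inside a sentence
    '10.01.002.3',                # leading zeros are accepted after the first octet
    '256.1.1.1',                  # first octet out of range
    '10.0.0.256',                 # last octet out of range
    '1.2.3.4.5',                  # five fields: rejected by the lookarounds
    '0.0.0.0',                    # first octet may not be 0 in this pattern
]

for s in samples:
    print(repr(s), '->', find_ip(s))
```

Only the first two samples produce a match. Whether rejecting 0.0.0.0 and accepting leading zeros is what you want depends on the use case, so treat these as properties of this particular pattern rather than of IPv4 itself.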
