
Python Regular Expressions: Extracting and Matching with Regular Expressions in Python

Python training | cdadata


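Python's re module provides full support for Perl-like regular expressions. If an error occurs while compiling or using a regular expression, the re module raises the exception re.error.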

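We will cover two important functions used to work with regular expressions. But one small thing first: various characters have a special meaning when they are used in a regular expression. To avoid any confusion while working with regular expressions, we will write patterns as raw strings, in the form r'expression'.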
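To make the raw-string advice and the re.error exception concrete, here is a minimal sketch. The sample text and the deliberately broken pattern are assumptions chosen for illustration, not part of the original tutorial.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08);
# in a raw string r"\b" stays backslash + b, which re reads as a word boundary.
plain = "\bcat\b"
raw = r"\bcat\b"

plain_hit = re.search(plain, "a cat sat")   # looks for backspace characters
raw_hit = re.search(raw, "a cat sat")       # looks for the whole word "cat"

print(plain_hit)          # None
print(raw_hit.group())    # cat

# An invalid pattern raises re.error when it is compiled.
try:
    re.compile(r"(unclosed")
    compile_failed = False
except re.error as exc:
    compile_failed = True
    print("bad pattern:", exc)
```

Writing every pattern as a raw string avoids having to remember which escapes Python itself would interpret before the re module sees them.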

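The match function

This function attempts to match an RE pattern against the beginning of a string, with optional flags.

Here is the syntax for this function: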

re.match(pattern, string, flags=0)

Here is a description of the parameters:

Parameter   Description
pattern     The regular expression to be matched.
string      The string that is searched for the pattern.
flags       Optional modifiers; you can specify several by combining them with bitwise OR (|). The modifiers are listed in the table below.

The re.match function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to retrieve the matched text.

Match object method   Description
group(num=0)          Returns the entire match (or the specific subgroup num).
groups()              Returns all matching subgroups in a tuple (empty if there weren't any).

Example:

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# (.*) captures everything before " are"; (\.*) captures zero or more literal dots
matchObj = re.match(r'(.*) are(\.*)', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

This produces the following result:

matchObj.group() :  Cats are
matchObj.group(1) :  Cats
matchObj.group(2) :
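The groups() method is described above but not demonstrated. The following sketch shows group() and groups() side by side; the pattern used here is a hypothetical one picked for illustration, not the pattern from the tutorial's example.

```python
import re

line = "Cats are smarter than dogs"

# Hypothetical pattern: one word before " are " and one word after it.
m = re.match(r"(\w+) are (\w+)", line)

if m:
    whole = m.group()     # the entire match
    first = m.group(1)    # the first subgroup only
    both = m.groups()     # every subgroup, as a tuple
    print(whole)          # Cats are smarter
    print(first)          # Cats
    print(both)           # ('Cats', 'smarter')
```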

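The search function

This function searches for the first occurrence of an RE pattern anywhere within a string, with optional flags.

Here is the syntax for this function: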

re.search(pattern, string, flags=0)

Here is a description of the parameters:

Parameter   Description
pattern     The regular expression to be matched.
string      The string that is searched for the pattern.
flags       Optional modifiers; you can specify several by combining them with bitwise OR (|). The modifiers are listed in the table below.

The re.search function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to retrieve the matched text.

Match object method   Description
group(num=0)          Returns the entire match (or the specific subgroup num).
groups()              Returns all matching subgroups in a tuple (empty if there weren't any).

Example:

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# Same pattern as in the match example; search may start anywhere in the string
matchObj = re.search(r'(.*) are(\.*)', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

This produces the following result:

matchObj.group() :  Cats are
matchObj.group(1) :  Cats
matchObj.group(2) :

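Matching vs. searching:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (which is what Perl does by default).

Example: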

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# match only looks at the start of the string, so "dogs" is not found
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# search scans the whole string and finds "dogs" at the end
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

This produces the following result:

No match!!
search --> matchObj.group() :  dogs

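Search and replace:

One of the most important re methods that use regular expressions is sub.

Syntax:

re.sub(pattern, repl, string, count=0)

This method replaces all occurrences of the RE pattern in string with repl, unless count is given, in which case at most count occurrences are replaced. It returns the modified string.

Example: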

#!/usr/bin/env python3
import re

phone = "2004-959-559 #This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

This produces the following result:

Phone Num :  2004-959-559
Phone Num :  2004959559
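The example above always replaces every occurrence. As a small sketch of the count argument of re.sub, the snippet below limits the number of replacements; the sample string is an assumption for illustration.

```python
import re

text = "2004-959-559"

# Default count=0 replaces every occurrence of the pattern.
all_replaced = re.sub(r"-", "", text)

# count=1 replaces only the first occurrence and leaves the rest alone.
first_replaced = re.sub(r"-", "", text, count=1)

print(all_replaced)     # 2004959559
print(first_replaced)   # 2004959-559
```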

 

Regular expression modifiers: option flags

Regular expression functions accept optional modifiers that control various aspects of matching. The modifiers are specified as an optional flags argument. You can provide multiple modifiers by combining them with bitwise OR (|), as shown previously. Each modifier is one of the following:

Modifier   Description
re.I       Performs case-insensitive matching.
re.L       Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns.
re.M       Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
re.S       Makes a period (dot) match any character, including a newline.
re.U       Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
re.X       Permits "cuter" regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash), and treats unescaped # as a comment marker.
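The effect of the flags is easiest to see by running the same pattern with and without them. This is a minimal sketch; the sample text and the phone-style pattern are assumptions made up for the demonstration.

```python
import re

text = "first line\nPYTHON rocks\nlast line"

# Without flags, ^ only matches at the start of the string and case matters.
no_flags = re.search(r"^python", text)
# re.I ignores case, re.M lets ^ match at the start of every line.
with_flags = re.search(r"^python", text, re.I | re.M)

# By default the dot does not match a newline; re.S changes that.
dot_default = re.search(r"line.PYTHON", text)
dot_all = re.search(r"line.PYTHON", text, re.S)

# re.X allows whitespace and # comments inside the pattern.
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
""", re.X)
parts = phone.search("call 555-1234").groups()

print(no_flags, with_flags.group())    # None PYTHON
print(dot_default, dot_all is not None)  # None True
print(parts)                           # ('555', '1234')
```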

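Regular expression patterns:

Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.

The following table lists the regular expression syntax available in Python.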

Pattern        Description
^              Matches the beginning of the string (and of each line with re.M).
$              Matches the end of the string or just before a trailing newline (and the end of each line with re.M).
.              Matches any single character except newline. Using the s option (re.S) allows it to match newline as well.
[...]          Matches any single character in brackets.
[^...]         Matches any single character not in brackets.
re*            Matches 0 or more occurrences of the preceding expression.
re+            Matches 1 or more occurrences of the preceding expression.
re?            Matches 0 or 1 occurrence of the preceding expression.
re{n}          Matches exactly n occurrences of the preceding expression.
re{n,}         Matches n or more occurrences of the preceding expression.
re{n,m}        Matches at least n and at most m occurrences of the preceding expression.
a|b            Matches either a or b.
(re)           Groups regular expressions and remembers the matched text.
(?imx)         Turns on the i, m, or x options. In Python this applies to the whole pattern and must appear at the start of the expression.
(?-imx)        Turns off the i, m, or x options in Perl-style engines. Python supports this only in the scoped form (?-imx: re).
(?: re)        Groups regular expressions without remembering the matched text.
(?imx: re)     Temporarily toggles on the i, m, or x options within the parentheses.
(?-imx: re)    Temporarily toggles off the i, m, or x options within the parentheses.
(?#...)        Comment.
(?= re)        Specifies a position using a pattern. Doesn't have a range.
(?! re)        Specifies a position using pattern negation. Doesn't have a range.
(?> re)        Matches an independent pattern without backtracking (atomic group; available in Python 3.11 and later).
\w             Matches word characters.
\W             Matches nonword characters.
\s             Matches whitespace. Equivalent to [ \t\n\r\f\v] for ASCII text.
\S             Matches nonwhitespace.
\d             Matches digits. Equivalent to [0-9] for ASCII text.
\D             Matches nondigits.
\A             Matches the beginning of the string.
\Z             Matches the end of the string. In Python it matches only at the very end (in Perl it also matches just before a final newline).
\z             Matches the end of the string in some regex flavors; in Python's re module use \Z instead.
\G             Matches the point where the last match finished (Perl; not supported by Python's re module).
\b             Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\B             Matches nonword boundaries.
\n, \t, etc.   Matches newlines, carriage returns, tabs, etc.
\1...\9        Matches the nth grouped subexpression.
\10            Matches the 10th grouped subexpression. An escape that starts with 0 or has three octal digits (such as \012) is read as the octal code of a character instead.

 

Regular expression examples:

Literal characters:

Example   Description
python    Match "python".

Character classes:

Example       Description
[Pp]ython     Match "Python" or "python"
rub[ye]       Match "ruby" or "rube"
[aeiou]       Match any one lowercase vowel
[0-9]         Match any digit; same as [0123456789]
[a-z]         Match any lowercase ASCII letter
[A-Z]         Match any uppercase ASCII letter
[a-zA-Z0-9]   Match any of the above
[^aeiou]      Match anything other than a lowercase vowel
[^0-9]        Match anything other than a digit

Special character classes:

Example   Description
.         Match any character except newline
\d        Match a digit: [0-9]
\D        Match a nondigit: [^0-9]
\s        Match a whitespace character: [ \t\r\n\f]
\S        Match nonwhitespace: [^ \t\r\n\f]
\w        Match a single word character: [A-Za-z0-9_]
\W        Match a nonword character: [^A-Za-z0-9_]
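A quick way to see what the special character classes pick up is re.findall, which returns every non-overlapping match as a list. The sample string below is an assumption for illustration.

```python
import re

sample = "Room 42, floor_3\tEast"

digits = re.findall(r"\d", sample)    # single digits
words = re.findall(r"\w+", sample)    # runs of word characters (letters, digits, _)
spaces = re.findall(r"\s", sample)    # whitespace characters, including the tab

print(digits)   # ['4', '2', '3']
print(words)    # ['Room', '42', 'floor_3', 'East']
print(spaces)   # [' ', ' ', '\t']
```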

Repetition cases:

Example   Description
ruby?     Match "rub" or "ruby": the y is optional
ruby*     Match "rub" plus 0 or more ys
ruby+     Match "rub" plus 1 or more ys
\d{3}     Match exactly 3 digits
\d{3,}    Match 3 or more digits
\d{3,5}   Match 3, 4, or 5 digits
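The repetition operators can be checked against a few candidate strings. This sketch uses re.fullmatch, which succeeds only when the whole string fits the pattern; the candidate list and the digit string are assumptions for illustration.

```python
import re

candidates = ["rub", "ruby", "rubyy"]

# Keep only the candidates that each pattern matches in full.
optional = [s for s in candidates if re.fullmatch(r"ruby?", s)]
zero_or_more = [s for s in candidates if re.fullmatch(r"ruby*", s)]
one_or_more = [s for s in candidates if re.fullmatch(r"ruby+", s)]

print(optional)       # ['rub', 'ruby']
print(zero_or_more)   # ['rub', 'ruby', 'rubyy']
print(one_or_more)    # ['ruby', 'rubyy']

# {n,m} takes as many repetitions as it can, up to m.
run = re.search(r"\d{3,5}", "id 1234567").group()
print(run)            # 12345
```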

Nongreedy repetition:

This matches the smallest number of repetitions:

Example   Description
<.*>      Greedy repetition: matches "<python>perl>"
<.*?>     Nongreedy: matches "<python>" in "<python>perl>"
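The difference between greedy and nongreedy repetition shows up as soon as the text contains more than one closing delimiter. A minimal sketch, assuming the sample string "<python>perl>":

```python
import re

text = "<python>perl>"

# Greedy: .* runs to the last ">" it can find.
greedy = re.search(r"<.*>", text).group()

# Nongreedy: .*? stops at the first ">" that completes a match.
lazy = re.search(r"<.*?>", text).group()

print(greedy)   # <python>perl>
print(lazy)     # <python>
```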

Grouping with parentheses:

Example             Description
\D\d+               No group: + repeats \d
(\D\d)+             Grouped: + repeats the \D\d pair
([Pp]ython(, )?)+   Match "Python", "Python, python, python", etc.
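The following sketch shows how parentheses change what a quantifier repeats. The test strings are assumptions for illustration.

```python
import re

# Without a group, + applies only to \d: one nondigit, then one or more digits.
ungrouped_ok = re.fullmatch(r"\D\d+", "a123")
ungrouped_fail = re.fullmatch(r"\D\d+", "a1b2")

# With a group, + repeats the whole nondigit-digit pair.
grouped = re.fullmatch(r"(\D\d)+", "a1b2")
last_pair = grouped.group(1)   # a repeated group keeps its last repetition

listing = re.fullmatch(r"([Pp]ython(, )?)+", "Python, python, python")

print(ungrouped_ok is not None, ungrouped_fail)   # True None
print(last_pair)                                  # b2
print(listing is not None)                        # True
```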

Backreferences:

This matches a previously matched group again:

Example              Description
([Pp])ython&\1ails   Match python&pails or Python&Pails
(['"])[^\1]*\1       Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc.
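A backreference must repeat exactly what the group captured, including its case. The sketch below illustrates this; the quoted-string pattern is a simplified variant written for this example (it uses a nongreedy .*? instead of a negated class), so treat it as an assumption rather than the tutorial's own pattern.

```python
import re

pattern = r"([Pp])ython&\1ails"

same_case = re.search(pattern, "Python&Pails")    # \1 repeats the captured "P"
mixed_case = re.search(pattern, "Python&pails")   # "p" is not what group 1 captured

# Simplified quoted-string pattern: the closing quote must equal the opening one.
quoted = re.search(r"""(['"])(.*?)\1""", 'say "hi" now')

print(same_case is not None, mixed_case)   # True None
print(quoted.group(2))                     # hi
```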

Alternatives:

Example         Description
python|perl     Match "python" or "perl"
rub(y|le)       Match "ruby" or "ruble"
Python(!+|\?)   "Python" followed by one or more ! or one ?
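Here is a short sketch of alternation in practice, including the escaped question mark. The sample strings are assumptions for illustration.

```python
import re

# Top-level alternation: either whole word may match.
langs = re.findall(r"python|perl", "perl, python, ruby")

# Alternation limited to the group: "rub" followed by "y" or "le".
ruble = re.search(r"rub(y|le)", "a ruble").group()

# \? is a literal question mark; an unescaped ? would be a quantifier.
pattern = r"Python(!+|\?)"
excited = re.fullmatch(pattern, "Python!!!")
question = re.fullmatch(pattern, "Python?")
mixed = re.fullmatch(pattern, "Python?!")

print(langs)                                           # ['perl', 'python']
print(ruble)                                           # ruble
print(excited is not None, question is not None, mixed)  # True True None
```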

Anchors:

These specify the position of a match:

Example       Description
^Python       Match "Python" at the start of a string or internal line (internal lines require re.M)
Python$       Match "Python" at the end of a string or line
\APython      Match "Python" at the start of a string
Python\Z      Match "Python" at the end of a string
\bPython\b    Match "Python" at a word boundary
\brub\B       \B is a nonword boundary: match "rub" in "rube" and "ruby" but not alone
Python(?=!)   Match "Python", if followed by an exclamation point
Python(?!!)   Match "Python", if not followed by an exclamation point
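Anchors and lookaheads match positions rather than characters. The sketch below checks several of them; all sample strings are assumptions for illustration.

```python
import re

# \b requires a word boundary on both sides, so "Python3" is skipped.
word = re.search(r"\bPython\b", "Python3 and Python")
print(word.start())                       # 12

# \B requires that "rub" is followed by another word character.
inside = re.search(r"\brub\B", "ruby")
alone = re.search(r"\brub\B", "rub")
print(inside is not None, alone)          # True None

# With re.M, ^ also matches after a newline, while \A never does.
text = "first\nPython"
caret = re.search(r"^Python", text, re.M)
string_start = re.search(r"\APython", text, re.M)
print(caret is not None, string_start)    # True None

# Lookaheads check what follows without consuming it.
followed = re.search(r"Python(?=!)", "Python!")
not_followed = re.search(r"Python(?!!)", "Python!")
print(followed.group(), not_followed)     # Python None
```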

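Special syntax with parentheses:

Example        Description
R(?#comment)   Matches "R". All the rest is a comment
R(?i)uby       Case-insensitive while matching "uby" in Perl-style engines. Python requires global flags such as (?i) at the start of the pattern, so use the scoped form below
R(?i:uby)      Case-insensitive only while matching "uby"
rub(?:y|le)    Group only, without creating a \1 backreference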
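The parenthesized forms can be tried directly. This is a sketch under the assumption of Python 3.6 or later, where the scoped flag syntax (?i:...) is available; the sample strings and the comment text are made up for illustration.

```python
import re

# (?#...) is ignored by the engine, so this is just the pattern "Ruby".
commented = re.search(r"R(?#capital R only)uby", "Ruby")

# (?i:...) makes only "uby" case-insensitive; the leading R stays case-sensitive.
scoped_hit = re.search(r"R(?i:uby)", "RUBY")
scoped_miss = re.search(r"R(?i:uby)", "rUBY")

# (?:...) groups without capturing, so groups() is empty.
non_capturing = re.search(r"rub(?:y|le)", "ruble")
capturing = re.search(r"rub(y|le)", "ruble")

print(commented.group())            # Ruby
print(scoped_hit.group(), scoped_miss)   # RUBY None
print(non_capturing.groups())       # ()
print(capturing.groups())           # ('le',)
```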

 
