Python Regular Expressions
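The re module provides full support for Perl-like regular expressions in Python. If an error occurs while compiling or using a regular expression, the re module raises the exception re.error.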
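The following sketch assumes Python 3 and shows re.error being raised for a malformed pattern. The helper name try_compile is hypothetical and is used only for illustration. re.compile, re.error, and the error's msg and pos attributes are part of the standard re module.

```python
import re


def try_compile(pattern):
    """Hypothetical helper: compile pattern, or report why it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # exc.msg describes the problem; exc.pos is its index in the pattern
        print(f"Invalid pattern {pattern!r}: {exc.msg} (position {exc.pos})")
        return None


bad = try_compile(r"(abc")     # unbalanced parenthesis raises re.error
good = try_compile(r"a{2,3}")  # a valid pattern compiles normally
```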
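We will cover two important functions that are used to handle regular expressions. First, though, a small point: various characters have a special meaning when they are used in a regular expression. To avoid any confusion when dealing with regular expressions, we will use raw strings, written as r'expression'.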
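The next example compares how Python itself reads \b in a normal string literal and in a raw one, which shows why raw strings matter. It is a small illustration that assumes Python 3.

```python
import re

# In a normal literal, "\b" is ONE character: a backspace (0x08).
# In a raw literal, r"\b" is TWO characters, which re reads as a word boundary.
print(len("\b"), len(r"\b"))  # 1 2

text = "This is Python"
word = re.search(r"\bis\b", text)    # finds the standalone word "is", not the "is" in "This"
nothing = re.search("\bis\b", text)  # looks for literal backspace characters: None
print(word.group(), nothing)
```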
The match Function
This function attempts to match the RE pattern at the beginning of string, with optional flags.
Here is the syntax for this function:
re.match(pattern, string, flags=0)
The parameters are described below:
Parameter | Description |
---|---|
pattern | This is the regular expression to be matched. |
string | This is the string that will be searched to match the pattern at the beginning of the string. |
flags | You can specify different flags using bitwise OR (\|). These are modifiers, which are listed in the modifiers table below. |
The re.match function returns a match object on success and None on failure. Use the group(num) or groups() method of the match object to get the matched expression.
Match Object Methods | Description |
---|---|
group(num=0) | This method returns the entire match (or the specific subgroup num). |
groups() | This method returns all matching subgroups in a tuple (empty if there weren't any). |
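The sketch below (Python 3) calls these methods on a pattern that has an optional group. groups() reports a subgroup that did not take part in the match as None. group() also accepts several group numbers at once.

```python
import re

# Two required groups and one optional group (!)?
m = re.match(r"(\w+) (\w+)(!)?", "Hello world")

print(m.group())      # entire match: Hello world
print(m.group(1))     # one subgroup: Hello
print(m.group(1, 2))  # several subgroups as a tuple: ('Hello', 'world')
print(m.groups())     # all subgroups: ('Hello', 'world', None)
```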
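Example:
```python
#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# match() only succeeds if the pattern matches at the start of line
matchObj = re.match(r'(.*) are (.*)', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())    # the entire match
    print("matchObj.group(1) :", matchObj.group(1))  # first parenthesized subgroup
    print("matchObj.group(2) :", matchObj.group(2))  # second parenthesized subgroup
else:
    print("No match!!")
```
This will produce the following result:
```
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter than dogs
```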
The search Function
This function searches for the first occurrence of the RE pattern anywhere within string, with optional flags.
Here is the syntax for this function:
re.search(pattern, string, flags=0)
The parameters are described below:
Parameter | Description |
---|---|
pattern | This is the regular expression to be matched. |
string | This is the string that will be searched to match the pattern anywhere in the string. |
flags | You can specify different flags using bitwise OR (\|). These are modifiers, which are listed in the modifiers table below. |
The re.search function returns a match object on success and None on failure. Use the group(num) or groups() method of the match object to get the matched expression.
Match Object Methods | Description |
---|---|
group(num=0) | This method returns the entire match (or the specific subgroup num). |
groups() | This method returns all matching subgroups in a tuple (empty if there weren't any). |
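Example:
```python
#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# search() scans line for the first place where the pattern matches
matchObj = re.search(r'(.*) are (.*)', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())    # the entire match
    print("matchObj.group(1) :", matchObj.group(1))  # first parenthesized subgroup
    print("matchObj.group(2) :", matchObj.group(2))  # second parenthesized subgroup
else:
    print("No match!!")
```
This will produce the following result:
```
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter than dogs
```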
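The match and search examples print the same thing because their pattern can already match at position 0. The short sketch below (Python 3) makes the difference visible. search returns the first occurrence even when it is not at the start of the string, and the match object's start() and end() give the position of that occurrence.

```python
import re

line = "Order 66 shipped on day 12"

print(re.match(r"\d+", line))  # None: line does not start with a digit

m = re.search(r"\d+", line)    # scans left to right and stops at the first hit
print(m.group(), m.start(), m.end())  # 66 6 8
```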
Matching Versus Searching:
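Python offers two different primitive operations based on regular expressions. match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).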
Example:
```python
#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# "dogs" is not at the start of line, so match() fails
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")

# search() looks anywhere in line, so it finds "dogs"
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")
```
This will produce the following result:
```
No match!!
search --> matchObj.group() : dogs
```
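The re.M flag does not change where match() starts looking, as the sketch below (Python 3) shows. match() still only tries the beginning of the string. By contrast, search() combined with ^ and re.M can find a pattern at the start of any line.

```python
import re

text = "first line\nsecond line"

print(re.match(r"second", text, re.M))  # None: match() only tries index 0
print(re.search(r"^second", text))      # None: without re.M, ^ is the start of the string
m = re.search(r"^second", text, re.M)   # with re.M, ^ also matches right after "\n"
print(m.group(), m.start())             # second 11
```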
Search and Replace:
One of the most important re methods that use regular expressions is sub.
Syntax:
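re.sub(pattern, repl, string, count=0, flags=0)
This method replaces all occurrences of the RE pattern in string with repl. If count is provided, it replaces at most count occurrences. The method returns the modified string.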
Example:
```python
#!/usr/bin/env python3
import re

phone = "2004-959-559 #This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num :", num)

# Remove anything other than digits (\D matches any non-digit)
num = re.sub(r'\D', "", phone)
print("Phone Num :", num)
```
This will produce the following result:
```
Phone Num : 2004-959-559
Phone Num : 2004959559
```
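The following sketch (Python 3) covers the count argument. It also shows two related features of the standard re module that the example above does not use. Group references such as \1 can appear in repl, and re.subn additionally returns the number of substitutions made.

```python
import re

phone = "2004-959-559"

print(re.sub(r"-", " ", phone))           # every "-" replaced: 2004 959 559
print(re.sub(r"-", " ", phone, count=1))  # only the first one: 2004 959-559

# \1, \2, \3 in repl insert the text captured by each group
print(re.sub(r"(\d+)-(\d+)-(\d+)", r"(\1) \2-\3", phone))  # (2004) 959-559

# subn returns (new_string, number_of_substitutions)
print(re.subn(r"-", " ", phone))          # ('2004 959 559', 2)
```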
Regular Expression Modifiers: Option Flags
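Regular expression literals may include an optional modifier to control various aspects of matching. In Python, modifiers are passed as the optional flags argument. You can combine several modifiers using bitwise OR (|), as shown previously. Each modifier is one of the following: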
Modifier | Description |
---|---|
re.I | Performs case-insensitive matching. |
re.L | Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 it can only be used with bytes patterns. |
re.M | Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string). |
re.S | Makes a period (dot) match any character, including a newline. |
re.U | Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns. |
re.X | Permits "cuter" (verbose) regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats an unescaped # as a comment marker. |
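The sketch below (Python 3) combines some of these flags. It uses re.findall, which returns every non-overlapping match as a list, only to keep the output compact.

```python
import re

# Combine flags with bitwise OR: case-insensitive AND multiline
text = "Python\npython\nPYTHON"
print(re.findall(r"^python$", text, re.I | re.M))  # ['Python', 'python', 'PYTHON']

# re.S lets "." match a newline too
print(re.search(r"a.b", "a\nb"))        # None
print(re.search(r"a.b", "a\nb", re.S))  # matches 'a\nb'

# re.X (verbose): whitespace in the pattern is ignored and # starts a comment
phone_re = re.compile(r"""
    (\d{3})   # first three digits
    [-.\s]?   # optional separator
    (\d{4})   # last four digits
""", re.X)
print(phone_re.match("555-1234").groups())  # ('555', '1234')
```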
Regular Expression Patterns:
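Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.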
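As an illustration (a sketch, Python 3), matching a dollar amount requires escaping $ and the period. When the text to match is an arbitrary literal string, the standard function re.escape() adds the backslashes for you.

```python
import re

price_text = "Total: $19.99 (tax incl.)"

# "$" and "." are control characters, so they are escaped with a backslash
print(re.search(r"\$\d+\.\d\d", price_text).group())  # $19.99

# re.escape() backslash-escapes every special character in a literal string
literal = "(tax incl.)"
pattern = re.escape(literal)
print(re.search(pattern, price_text).group())  # (tax incl.)
```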
The following table lists the regular expression syntax that is available in Python.
Pattern | Description |
---|---|
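^ | Matches beginning of string; with re.M, also the beginning of each line. |
$ | Matches end of string (or just before a trailing newline); with re.M, also the end of each line. |
. | Matches any single character except newline. Using the s option (re.S) allows it to match newline as well. |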
[…] | Matches any single character in brackets. |
[^…] | Matches any single character not in brackets |
re* | Matches 0 or more occurrences of preceding expression. |
re+ | Matches 1 or more occurrences of preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re{n} | Matches exactly n occurrences of preceding expression. |
re{n,} | Matches n or more occurrences of preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?imx) | Turns on the i, m, or x options. In Python this applies to the whole regular expression, and since Python 3.11 it must appear at the start of the pattern. |
(?-imx) | Not supported on its own in Python; use the scoped form (?-imx:re) instead. |
(?:re) | Groups regular expressions without remembering matched text. |
(?imx:re) | Temporarily toggles on i, m, or x options within parentheses. |
(?-imx:re) | Temporarily toggles off i, m, or x options within parentheses. |
(?#…) | Comment. |
(?=re) | Positive lookahead: succeeds if re matches next, without consuming any characters. |
(?!re) | Negative lookahead: succeeds if re does not match next, without consuming any characters. |
(?>re) | Matches independent pattern without backtracking (atomic group, supported since Python 3.11). |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v]. |
\S | Matches nonwhitespace. |
\d | Matches digits. For ASCII text, equivalent to [0-9]. |
\D | Matches nondigits. |
\A | Matches beginning of string. |
\Z | Matches end of string. In Python, unlike Perl and Ruby, it does not also match just before a trailing newline. |
\z | Matches end of string in Perl and Ruby. In Python, use \Z (older Python versions reject \z). |
\G | Matches point where last match finished. Not supported by Python's re module. |
\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
\B | Matches nonword boundaries. |
\n, \r, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\1…\9 | Matches nth grouped subexpression. |
\10 | Matches the 10th grouped subexpression. An escape starting with 0, or made of three octal digits such as \101, is read as an octal character code instead. |
Regular Expression Examples:
Literal Characters:
Example | Description |
---|---|
python | Match “python”. |
Character Classes:
Example | Description |
---|---|
[Pp]ython | Match “Python” or “python” |
rub[ye] | Match “ruby” or “rube” |
[aeiou] | Match any one lowercase vowel |
[0-9] | Match any digit; same as [0123456789] |
[a-z] | Match any lowercase ASCII letter |
[A-Z] | Match any uppercase ASCII letter |
[a-zA-Z0-9] | Match any of the above |
[^aeiou] | Match anything other than a lowercase vowel |
[^0-9] | Match anything other than a digit |
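These classes can be tried directly with re.findall, which returns every non-overlapping match as a list (a sketch, Python 3).

```python
import re

print(re.findall(r"[Pp]ython", "Python and python"))  # ['Python', 'python']
print(re.findall(r"rub[ye]", "ruby rube rubs"))       # ['ruby', 'rube']
print(re.findall(r"[0-9]", "R2-D2"))                  # ['2', '2']
# A negated class also matches spaces and punctuation
print(re.findall(r"[^aeiou]", "rub ye"))              # ['r', 'b', ' ', 'y']
```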
Special Character Classes:
Example | Description |
---|---|
. | Match any character except newline |
\d | Match a digit: [0-9] |
\D | Match a nondigit: [^0-9] |
\s | Match a whitespace character: [ \t\r\n\f] |
\S | Match nonwhitespace: [^ \t\r\n\f] |
\w | Match a single word character: [A-Za-z0-9_] |
\W | Match a nonword character: [^A-Za-z0-9_] |
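Here is a quick check of the shorthand classes (a sketch, Python 3). re.split is the standard function that splits a string at every match of a pattern.

```python
import re

s = "id_42 = x7"
print(re.findall(r"\d", s))   # ['4', '2', '7']
print(re.findall(r"\w+", s))  # ['id_42', 'x7']: "=" is not a word character
print(re.split(r"\s+", s))    # ['id_42', '=', 'x7']
```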
Repetition Cases:
Example | Description |
---|---|
ruby? | Match “rub” or “ruby”: the y is optional |
ruby* | Match “rub” plus 0 or more ys |
ruby+ | Match “rub” plus 1 or more ys |
\d{3} | Match exactly 3 digits |
\d{3,} | Match 3 or more digits |
\d{3,5} | Match 3, 4, or 5 digits |
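The sketch below (Python 3) checks the bounds of \d{3,5}. It uses re.fullmatch, which requires the whole string to match, so a string that is too long is rejected instead of being partially matched.

```python
import re

for s in ["12", "123", "12345", "123456"]:
    print(s, bool(re.fullmatch(r"\d{3,5}", s)))  # only 123 and 12345 print True

# ruby? : the trailing y is optional
print(re.findall(r"ruby?", "rub ruby rubyy"))  # ['rub', 'ruby', 'ruby']
```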
Nongreedy Repetition:
This matches the smallest number of repetitions:
Example | Description |
---|---|
<.*> | Greedy repetition: matches “<python>perl>” |
<.*?> | Nongreedy: matches “<python>” in “<python>perl>” |
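The two patterns behave as follows on the same input (a sketch, Python 3).

```python
import re

s = "<python>perl>"
print(re.search(r"<.*>", s).group())   # <python>perl>  (greedy: as much as possible)
print(re.search(r"<.*?>", s).group())  # <python>       (nongreedy: as little as possible)
```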
Grouping with Parentheses:
Example | Description |
---|---|
\D\d+ | No group: + repeats \d |
(\D\d)+ | Grouped: + repeats \D\d pair |
([Pp]ython(, )?)+ | Match “Python”, “Python, python, python”, etc. |
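The sketch below (Python 3) contrasts the two forms. It also shows a detail of Python's re module: when a group is repeated, group(n) keeps only the text of its last repetition.

```python
import re

print(re.match(r"\D\d+", "a123b4").group())  # a123: + repeats only \d

m = re.match(r"(\D\d)+", "a1b2c3x")
print(m.group())   # a1b2c3: + repeats the whole \D\d pair
print(m.group(1))  # c3: a repeated group keeps only its last repetition
```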
Backreferences:
This matches a previously matched group again:
Example | Description |
---|---|
([Pp])ython&\1ails | Match python&pails or Python&Pails |
(['"]).*?\1 | Single- or double-quoted string. \1 matches whatever the 1st group matched, \2 matches whatever the 2nd group matched, etc. |
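The sketch below (Python 3) shows backreferences in action. The quoted-string pattern uses a lazy .*? before \1. Inside a character class such as [^\1], Python treats \1 as an octal character code, not as a backreference.

```python
import re

print(bool(re.fullmatch(r"([Pp])ython&\1ails", "Python&Pails")))  # True
print(bool(re.fullmatch(r"([Pp])ython&\1ails", "Python&pails")))  # False: \1 must repeat "P"

# The closing quote must be the same character as the opening one,
# so the apostrophe inside "it's" does not end the match.
text = """He said "it's fine" and left"""
print(re.search(r"(['\"]).*?\1", text).group())  # "it's fine"
```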
Alternatives:
Example | Description |
---|---|
python\|perl | Match “python” or “perl” |
rub(y\|le) | Match “ruby” or “ruble” |
Python(!+\|\?) | “Python” followed by one or more ! or one ? |
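The following sketch (Python 3) tries these alternatives. When a pattern has a capturing group, re.findall returns only the group's text. For that reason the last two lines use re.finditer and group() to show the whole match.

```python
import re

print(re.findall(r"python|perl", "perl, python, ruby"))  # ['perl', 'python']

words = "ruby ruble rube"
print([m.group() for m in re.finditer(r"rub(y|le)", words)])  # ['ruby', 'ruble']

shouts = "Python!!! Python? Python"
print([m.group() for m in re.finditer(r"Python(!+|\?)", shouts)])  # ['Python!!!', 'Python?']
```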
Anchors:
These specify the position where a match must occur:
Example | Description |
---|---|
^Python | Match “Python” at the start of a string or internal line (internal lines require re.M) |
Python$ | Match “Python” at the end of a string or line (lines other than the last require re.M) |
\APython | Match “Python” at the start of a string |
Python\Z | Match “Python” at the end of a string |
\bPython\b | Match “Python” at a word boundary |
\brub\B | \B is a nonword boundary: match “rub” in “rube” and “ruby” but not alone |
Python(?=!) | Match “Python”, if followed by an exclamation point |
Python(?!!) | Match “Python”, if not followed by an exclamation point |
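The sketch below (Python 3) lists where several of these anchors match in one string, using re.finditer and each match's start() offset.

```python
import re

text = "Python! Pythonic Python"
print([m.start() for m in re.finditer(r"\bPython\b", text)])   # [0, 17]: not inside "Pythonic"
print([m.start() for m in re.finditer(r"Python(?=!)", text)])  # [0]: only the one before "!"
print([m.start() for m in re.finditer(r"Python(?!!)", text)])  # [8, 17]
print(re.findall(r"\brub\B", "rube ruby rub"))                # ['rub', 'rub']: not the lone "rub"
```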
Special Syntax with Parentheses:
Example | Description |
---|---|
R(?#comment) | Matches “R”. All the rest is a comment |
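R(?i)uby | Case-insensitive while matching “uby” (Perl/Ruby behavior; in Python a global (?i) applies to the whole pattern and, since Python 3.11, must appear at its start) |
R(?i:uby) | Case-insensitive only while matching “uby”; this scoped form is the Python way to get the behavior above |
rub(?:y\|le) | Group only, without creating a \1 backreference |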
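The sketch below assumes Python 3.6 or later, which is required for the scoped (?i:...) form. It shows how these parenthesized constructs behave.

```python
import re

# (?i:...) scopes case-insensitivity to the enclosed part only
print(bool(re.fullmatch(r"R(?i:uby)", "RUBY")))  # True
print(bool(re.fullmatch(r"R(?i:uby)", "rUBY")))  # False: the leading R is still case-sensitive

# A global inline flag belongs at the very start of the pattern in Python
print(bool(re.fullmatch(r"(?i)ruby", "RUBY")))   # True

# (?#...) is a comment and matches nothing
print(bool(re.fullmatch(r"R(?#the letter R)uby", "Ruby")))  # True

# (?:...) groups without capturing, so groups() is empty
print(re.match(r"rub(?:y|le)", "ruble").groups())  # ()
```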