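Synopsis: If you only need to match a literal string, the basic string methods such as str.find(), str.endswith(), str.startswith() or similar are usually enough. For more complex matching, use regular expressions and the re module. If you will use the same pattern for many matches, compile the pattern string into a pattern object first. match() always matches from the beginning of the string. To find a pattern anywhere in the string, use search() (first match) or findall() (all matches) instead.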
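The next snippet is a minimal sketch of that split between str methods and re. The sample values `filename` and `url` are made up for illustration. Literal checks use plain string methods, and a regex is used only when the rule is not a fixed string.

```python
import re

filename = 'spam.txt'
url = 'http://www.python.org'

# Literal checks: the plain str methods are enough, no regex needed
print(filename.endswith('.txt'))              # True
print(url.startswith(('http:', 'https:')))    # True; a tuple checks several prefixes at once
print(url.find('python'))                     # 11; find() returns -1 if the text is absent

# A non-literal rule ("http or https, then a colon") is where re starts to pay off
print(re.match(r'https?:', url) is not None)  # True
```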

Highly recommended online regex tester: https://regex101.com/

1. The standard library module re

For more details, see the official documentation:

In Python 3, regular expressions are supported by the re module. You define a pattern string to match with, plus the string to match against. A simple match:

In [1]: import re

In [2]: m = re.match('My', 'My name is wangy')

In [3]: m
Out[3]: <_sre.SRE_Match object; span=(0, 2), match='My'>

In [4]: m.group()  # equivalent to m.group(0)
Out[4]: 'My'

In [5]: m.start(), m.end()
Out[5]: (0, 2)

In [6]: m.span()
Out[6]: (0, 2)

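Here, My is the regular expression pattern; in this simplest case it matches only the characters My themselves. My name is wangy is the string being checked, and re.match() tests whether that string begins with a match for the pattern.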
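The pattern above has no groups, so group(), start(), end() and span() all describe the whole match. As a small sketch (the two-word pattern is an illustrative choice, not from the original), adding capture groups lets the same Match methods address each group by number:

```python
import re

# Two capture groups: each (\w+) captures one word
m = re.match(r'(\w+) (\w+)', 'My name is wangy')

print(m.group(0))   # -> My name  (the whole match, same as m.group())
print(m.group(1))   # -> My
print(m.group(2))   # -> name
print(m.groups())   # -> ('My', 'name')
print(m.span(2))    # -> (3, 7)  start and end of group 2
```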

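If you are doing only a single simple match or search, you can call the module-level functions such as re.match() directly. If you plan to do many matches and searches, it is better to compile the regular expression first and then reuse it: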

In [1]: import re

In [2]: p = re.compile('[a-z]+')  # [a-z]+ is the pattern: one or more lowercase letters

In [3]: p
Out[3]: re.compile(r'[a-z]+', re.UNICODE)

In [4]: if p.match('hello123'):   # p is a compiled pattern; its match() takes only the string, not the pattern. Does 'hello123' start with one or more lowercase letters?
   ...:     print('yes')
   ...: else:
   ...:     print('no')
   ...:     
yes

In [5]: if p.match('123hi'):      # reuse the compiled pattern
   ...:     print('yes')
   ...: else:
   ...:     print('no')
   ...:     
no

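The module-level functions cache recently compiled patterns, so they do not cost much extra. Using a precompiled pattern still saves the cache lookup and some additional processing.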
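A minimal sketch of the compile-once, reuse-many-times pattern. The name `DATE` and the sample `lines` are hypothetical; the point is that the regex is parsed once and the pattern object is reused inside the loop:

```python
import re

# Compile once at module level, then reuse the pattern object for every input
DATE = re.compile(r'\d+/\d+/\d+')

lines = [
    'Today is 11/27/2012.',
    'No date here.',
    'PyCon starts 3/13/2013.',
]

found = []
for line in lines:
    m = DATE.search(line)   # search() scans the whole line for the first date
    if m:                   # None means this line has no date
        found.append(m.group())

print(found)  # ['11/27/2012', '3/13/2013']
```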

1.1 Use `match()` to match from the beginning of a string

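You can use either the module-level re.match() or the compiled pattern's p.match(). If the beginning of the string matches the regular expression, the match succeeds and a Match object is returned, such as <_sre.SRE_Match object; span=(0, 2), match='My'> (Python 3.7 and later print it as <re.Match object; ...>). If the match fails, None is returned.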

In [1]: import re

In [2]: m1 = re.match('wangy', 'wangy is a handsome boy.')  # module-level match function

In [3]: m1  # match succeeded, so a Match object is returned
Out[3]: <_sre.SRE_Match object; span=(0, 5), match='wangy'>

In [4]: m1.group()  # Match objects have group(), start(), end(), span() and other methods
Out[4]: 'wangy'

In [5]: m2 = re.match('mayun', 'wangy is a handsome boy.')  # the match fails

In [6]: type(m2)  # None was returned
Out[6]: NoneType

In [7]: p = re.compile('wangy')  # a precompiled pattern works too

In [8]: p.match('wangy is a handsome boy.')  # call the compiled pattern's match method
Out[8]: <_sre.SRE_Match object; span=(0, 5), match='wangy'>
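Because a failed match returns None, calling group() on the result without checking raises AttributeError. A short sketch of the usual guard, plus a note that match() anchors only at the start, not the end; re.fullmatch() (available since Python 3.4) requires the whole string to match:

```python
import re

m = re.match('mayun', 'wangy is a handsome boy.')
# match() returns None on failure, so test the result before calling group()
if m:
    print(m.group())
else:
    print('no match')

# match() only anchors at the start; anything may follow the match
print(re.match('wangy', 'wangy123') is not None)     # True
# fullmatch() succeeds only if the pattern covers the entire string
print(re.fullmatch('wangy', 'wangy123') is None)     # True
print(re.fullmatch('wangy', 'wangy') is not None)    # True
```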

1.2 Use `search()` to find the first match

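Unlike match(), search() scans through the whole string; if several places match the regular expression, it returns the first one. From help(re.search):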

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.

In [1]: import re

In [2]: s = 'I wish I may, I wish I might have a dish of fish tonight.'

In [3]: re.search('wish', s)
Out[3]: <_sre.SRE_Match object; span=(2, 6), match='wish'>

In [4]: re.search('wish', s).span()
Out[4]: (2, 6)
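To make the difference from match() concrete, here is a small sketch on the same sentence: match() only tries position 0, while search() keeps scanning. Anchoring search() with ^ (without the MULTILINE flag) restores match()-like behavior:

```python
import re

s = 'I wish I may, I wish I might have a dish of fish tonight.'

# match() only tries position 0, where the string starts with 'I ', so it fails
print(re.match('wish', s))             # None
# search() scans forward and stops at the first match
print(re.search('wish', s).span())     # (2, 6)
# ^ anchors search() to the start of the string, like match()
print(re.search('^wish', s))           # None
```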

1.3 Use `findall()` or `finditer()` to find all matches

The two functions above stop after finding one match. To find every match in a string, use findall():

In [1]: import re

In [2]: text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

In [3]: p = re.compile(r'\d+/\d+/\d+')

In [4]: p.findall(text)
Out[4]: ['11/27/2012', '3/13/2013']
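One detail worth knowing: when the pattern contains capture groups, findall() no longer returns the whole matched strings but a tuple of the groups for each match. A short sketch using the same text with the date parts grouped:

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# With capture groups, findall() returns one tuple of groups per match
p = re.compile(r'(\d+)/(\d+)/(\d+)')
print(p.findall(text))   # [('11', '27', '2012'), ('3', '13', '2013')]

# The tuples unpack directly in a loop
for month, day, year in p.findall(text):
    print('{}-{}-{}'.format(year, month, day))   # 2012-11-27, then 2013-3-13
```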

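findall() searches the text and returns all matches as a list. If you would rather get the matches one at a time through an iterator, use finditer() instead, for example: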

In [5]: iters = p.finditer(text)

In [6]: iters
Out[6]: <callable_iterator at 0x7f94c1703f98>

In [7]: for m in iters:
   ...:     print(m)
   ...:     
<_sre.SRE_Match object; span=(9, 19), match='11/27/2012'>
<_sre.SRE_Match object; span=(34, 43), match='3/13/2013'>
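Since finditer() yields full Match objects instead of plain strings, it also gives access to each match's position. A minimal sketch that collects the matched text together with its start and end offsets:

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
p = re.compile(r'\d+/\d+/\d+')

# Each item from finditer() is a Match object, so start() and end() are available
positions = [(m.group(), m.start(), m.end()) for m in p.finditer(text)]
print(positions)  # [('11/27/2012', 9, 19), ('3/13/2013', 34, 43)]
```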

1.4 Use `split()` to split on matches

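The string method str.split() only suits very simple splitting: it does not allow multiple delimiters or a varying amount of whitespace around them. When you need more flexible splitting, use re.split():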

In [1]: import re

In [2]: line = 'asdf fjdk;   afed,  fjek,asdf,   foo'

In [3]: re.split(r'[;,\s]\s*', line)  # pattern: a ; or , or whitespace character, followed by zero or more whitespace characters
Out[3]: ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
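If the pattern contains a capture group, re.split() also keeps the matched delimiters in the result, which is handy when you want to rebuild the line later. A short sketch on the same line; a non-capturing group (?:...) groups without keeping the delimiters:

```python
import re

line = 'asdf fjdk;   afed,  fjek,asdf,   foo'

# A capture group makes split() keep each matched delimiter in the result
fields = re.split(r'(;|,|\s)\s*', line)
print(fields)
# ['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']

values = fields[::2]       # even indices: the actual values
delimiters = fields[1::2]  # odd indices: the delimiters that separated them

# A non-capturing group (?:...) groups the alternatives but drops the delimiters
print(re.split(r'(?:,|;|\s)\s*', line))
# ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```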

1.5 Use `sub()` to replace matches

For simple literal patterns, the string method str.replace() is enough, for example:

In [1]: text = 'yeah, but no, but yeah, but no, but yeah'

In [2]: text.replace('yeah', 'yep')
Out[2]: 'yep, but no, but yep, but no, but yep'
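The reason str.replace() is preferred for literal text is that it never interprets metacharacters. A small sketch with a made-up string shows what happens when the same literal is used as a regex, and how re.escape() turns a literal into a safe pattern:

```python
import re

s = 'a.b.c'

# str.replace() treats '.' as a plain character
print(s.replace('.', '-'))               # a-b-c
# In a regex, '.' matches any character, so every character gets replaced
print(re.sub('.', '-', s))               # -----
# re.escape() backslash-escapes metacharacters, giving a literal pattern
print(re.sub(re.escape('.'), '-', s))    # a-b-c
```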

For complex patterns, use sub() from the re module. For example, to convert dates of the form 11/27/2012 into 2012-11-27:

In [1]: import re

In [2]: text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

In [3]: re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
Out[3]: 'Today is 2012-11-27. PyCon starts 2013-3-13.'

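The first argument to sub() is the pattern to match, and the second is the replacement pattern. A backslash followed by a number, such as \3, refers to the 3rd capture group in the pattern. Write the replacement as a raw string (with the r prefix); otherwise Python treats \3 as an octal escape and turns it into the character \x03.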
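A short sketch of the raw-string pitfall, and of named groups as a more readable alternative to numbered backreferences. The group names month, day and year are illustrative choices; \g<name> in the replacement and subn(), which also returns the number of substitutions, are standard re features:

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Without r'', '\3' is the single control character chr(3), not a backreference
print('\3' == '\x03')   # True
print(len(r'\3'))       # 2: a backslash followed by '3'

# Named groups make the replacement template self-describing
p = re.compile(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)')
new_text, n = p.subn(r'\g<year>-\g<month>-\g<day>', text)
print(new_text)         # Today is 2012-11-27. PyCon starts 2013-3-13.
print(n)                # 2 replacements were made
```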

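For more complex replacements, you can pass a replacement callback function instead. The callback receives a Match object, the same kind of object returned by match() or search(). Use its group() method to extract specific parts of the match, and return the replacement string from the callback. For example: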

In [1]: import re

In [2]: from calendar import month_abbr

In [3]: def change_date(m):
   ...:     mon_name = month_abbr[int(m.group(1))]
   ...:     return '{} {} {}'.format(m.group(2), mon_name, m.group(3))
   ...: 
   ...: 

In [4]: text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

In [5]: p = re.compile(r'(\d+)/(\d+)/(\d+)')

In [6]: p.sub(change_date,