Python Regular Expressions: Extracting and Replacing Strings with Groups (Callback Functions)

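In Python, regular expressions can extract and replace matched substrings, and a callback function can be run on every match to decide the replacement. This article shows how to match, extract, and replace strings with regular-expression groups in Python, with example code.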

1. Python regular expressions

Reference: Python Regular Expressions (RegEx)

2. re.sub(pattern, repl, string, count=0, flags=0)

Regex-based extraction and replacement in Python is done with re.sub(). Its parameters are described below:

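1) The pattern parameter

pattern is the regular expression, given as a string. It is usually written with the r prefix (a raw string) so that backslashes reach the regex engine unchanged.

Reference: The r, b, u, and f string prefixes in Python and how to use them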
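To illustrate why the r prefix matters, here is a minimal sketch; the sample text and variable names are made up for this example. Without the prefix, Python turns "\b" into a backspace character before the regex engine ever sees it, so the word-boundary pattern silently stops matching.

```python
import re

text = "cat concat cat"

# Raw string: \b reaches the regex engine as a word boundary.
raw = re.sub(r"\bcat\b", "dog", text)

# Plain string: "\b" is a backspace character, so nothing matches
# and the text comes back unchanged.
plain = re.sub("\bcat\b", "dog", text)

print(raw)    # dog concat dog
print(plain)  # cat concat cat
```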

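2) The repl parameter

repl is the replacement. It can be a string or a function. A string is substituted for each match directly (backreferences such as \1 refer to captured groups). A function is called once for every match with the match object as its argument, and the string it returns is used as the replacement.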
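The two forms of repl can be compared in a short sketch. The date text and the next_year helper are hypothetical, chosen only to show a string replacement with backreferences next to a callback that computes its result.

```python
import re

text = "2024-01-15 and 2023-12-31"
date_pattern = r"(\d{4})-(\d{2})-(\d{2})"

# String repl: \3, \2, \1 refer to the captured groups.
swapped = re.sub(date_pattern, r"\3/\2/\1", text)

# Function repl: called once per match with a match object;
# the returned string replaces that match.
def next_year(match):
    year = int(match.group(1)) + 1
    return f"{year}-{match.group(2)}-{match.group(3)}"

bumped = re.sub(date_pattern, next_year, text)

print(swapped)  # 15/01/2024 and 31/12/2023
print(bumped)   # 2025-01-15 and 2024-12-31
```

A callback is the better fit whenever the replacement depends on the matched text in a way a fixed template cannot express, such as arithmetic or a dictionary lookup.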

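3) The string parameter

string is the input text to be searched and replaced. re.sub() returns a new string and leaves the original unchanged.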

4) The count parameter

count is the maximum number of matches to replace. The default, 0, replaces every match.
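A small sketch of how count limits the replacements; the input string is made up for illustration. re.subn() is shown as well because it reports how many substitutions were made.

```python
import re

text = "a-b-c-d"

first_only = re.sub(r"-", "+", text, count=1)  # replace the first match only
first_two = re.sub(r"-", "+", text, count=2)   # replace the first two matches
all_of_them = re.sub(r"-", "+", text)          # count=0 (default): replace all

# re.subn() returns a (new_string, number_of_substitutions) tuple.
replaced, how_many = re.subn(r"-", "+", text)

print(first_only)   # a+b-c-d
print(first_two)    # a+b+c-d
print(all_of_them)  # a+b+c+d
print(how_many)     # 3
```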

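5) The flags parameter

flags changes how the pattern is matched. Several flags can be combined with |. The common ones are:

IGNORECASE (short form I): matching is case-insensitive.

LOCALE (short form L): makes \w, \W, \b, \B and case-insensitive matching depend on the current locale, a C library facility for programs that must take regional language conventions into account. In Python 3 it can only be used with bytes patterns.

MULTILINE (short form M): ^ matches at the start of the string and at the start of every line. Likewise, $ matches at the end of the string and at the end of every line.

DOTALL (short form S): . matches any character at all, including a newline. By default . does not match a newline.

VERBOSE (short form X): verbose mode. Whitespace in the pattern and comments starting with # are ignored, so a long pattern can be laid out over several lines.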
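The effect of these flags on re.sub() can be sketched as follows. All sample strings and variable names are assumptions made for this example. LOCALE is left out because its behavior depends on the machine's locale settings.

```python
import re

text = "Hello\nhello world\nHELLO"

# IGNORECASE: "hello" matches in any letter case.
ci = re.sub(r"hello", "hi", text, flags=re.I)

# MULTILINE: ^ also matches at the start of each line.
ml = re.sub(r"^hello", "hi", text, flags=re.M)

# Flags are combined with |.
both = re.sub(r"^hello", "hi", text, flags=re.I | re.M)

# DOTALL: "." also matches the newline inside the tag.
html = "<b>one\ntwo</b>"
dotall = re.sub(r"<b>(.+?)</b>", r"\1", html, flags=re.S)
no_dotall = re.sub(r"<b>(.+?)</b>", r"\1", html)  # no match, unchanged

# VERBOSE: whitespace and # comments in the pattern are ignored.
phone = r"""
    (\d{3})   # first part
    -         # separator
    (\d{4})   # second part
"""
verbose = re.sub(phone, r"\2-\1", "555-1234", flags=re.X)

print(repr(ci))      # 'hi\nhi world\nhi'
print(repr(ml))      # 'Hello\nhi world\nHELLO'
print(repr(dotall))  # 'one\ntwo'
print(verbose)       # 1234-555
```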

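3. Extracting and replacing strings with re.sub()

re.sub() can extract specific content from HTML and replace it. The example below matches every <pre class="prettyprint linenums"> block, prints the whole match and its two groups from inside the callback, and replaces each block with an empty string: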

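import re

def replace_num(match):
    # Called once per match; match is a match object (avoid naming it str,
    # which would shadow the built-in type).
    print("--------------")
    print(match.group())   # the entire matched text
    print(match.group(1))  # group 1: any extra class name after "linenums"
    print(match.group(2))  # group 2: the content between <pre ...> and </pre>
    print("--------------")
    return ""  # replace the whole match with an empty string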
my_str = '''
<p><strong>1、new 运算符</strong>:用于创建对象和调用构造函数。这个我们创建对象实例就比较常用了,比如:</p><pre class="prettyprint linenums">     StringBuilder str=new  StringBuilder();</pre><p><strong>2、new 修饰符</strong>:在用作修饰符时,new 关键字可以显式隐藏从基类继承的成员。简单的说,就是现在写的这个类,想写一个和基类中相同的成员,而不继承基类的。运用多态的特性时,也不会调用这个显示隐藏的方法。具体看下如下代码:</p><pre class="prettyprint linenums">using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApp2
{
    public class Program
    {
        static void Main(string[] args)
        {
            animal a = new animal();
            a.name = "animal";
            a.say();
            cat c = new cat();
            c.name = "cat";
            c.say();
            dog d = new dog();
            d.name = "dog";
            d.say();
            sheep s = new sheep();
            s.name = "sheep";
            s.say();
            animal a1 = new cat();
            a1.say();
            animal a2 = new dog();
            a2.say();
            animal a3 = new sheep();
            a3.say();
        }
    }
    class animal
    {
        public string name { get; set; }
        public virtual void say()
        {
            Console.WriteLine("animal say");
        }
    }
    class cat : animal
    {
        public override void say()
        {
            Console.WriteLine("cat say");
        }
    }
    class dog : animal
    {
        public new void say()   //这个方法被显示隐藏了
        {
            Console.WriteLine("dog say");
        }
    }
    class sheep : animal
    {
    }
}<br></pre><p><strong>3、new 约束</strong>:用于在泛型声明中约束可能用作类型参数的参数的类型。举个例子看一下,泛型类中T要求有一个无参的构造函数,代码如下,</p><pre class="prettyprint linenums">using System;<br>using System.Collections.Generic;<br>namespace ConsoleApplication2<br>{<br>&nbsp; &nbsp; public class Employee<br>&nbsp; &nbsp; {<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;private string name;<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;private int id;<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;public Employee()<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;{<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;name = "Temp";<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;id = 0;<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;}<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;public Employee(string s, int i)<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;{<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;name = s;<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;id = i;<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;}<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;public string Name<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;{<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;get { return name; }<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;set { name = value; }<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;}<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;public int ID<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;{<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;get { return id; }<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;set { id = value; }<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;}<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; class ItemFactory where T : new()<br>&nbsp; &nbsp; {<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;public T GetNewItem()<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;{<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;return new T();<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;}<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; public class Test<br>&nbsp; &nbsp; {<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;public static void 
Main()<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;{<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;ItemFactory EmployeeFactory = new ItemFactory();
            //若没有则会有The Employee must have a public parameterless constructor 错误。
            Console.WriteLine("{0}'ID is {1}.", EmployeeFactory.GetNewItem().Name, EmployeeFactory.GetNewItem().ID);<br>&nbsp; &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;}<br>&nbsp; &nbsp; }<br>}</pre>
'''
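# re.S lets "." span the newlines inside each <pre> block; re.I ignores case.
result = re.sub(r'<pre class="prettyprint linenums\s*([a-z]*?)"\s*>(.+?)</pre>', replace_num, my_str, flags=re.I | re.S)
print(result)  # my_str with every matched <pre> block removed; the callback ran once per block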
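When the extracted groups are needed after the substitution rather than just printed, the callback can store them. The following is a minimal sketch of that idea, not part of the example above: the short html sample, the stash_block helper, and the placeholder text are all hypothetical, while the pattern is the same one used in the article.

```python
import re

# Small stand-in for the long HTML string used above.
html = (
    '<p>Intro</p>'
    '<pre class="prettyprint linenums">print("a")\nprint("b")</pre>'
    '<p>More</p>'
    '<pre class="prettyprint linenums csharp">int x = 1;</pre>'
)

pattern = re.compile(
    r'<pre class="prettyprint linenums\s*([a-z]*?)"\s*>(.+?)</pre>',
    flags=re.I | re.S,
)

extracted = []  # (extra class name, block content) for each match

def stash_block(match):
    # Keep both groups, then leave a numbered placeholder in the text.
    extracted.append((match.group(1), match.group(2)))
    return f"[code block {len(extracted)}]"

result = pattern.sub(stash_block, html)

print(result)     # <p>Intro</p>[code block 1]<p>More</p>[code block 2]
print(extracted)
```

Compiling the pattern once with re.compile() is optional here; it mainly keeps the long pattern and its flags in one place when the same expression is reused.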

