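Regular Expression Cheat Sheet: A Complete Guide

What Is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Regular expressions are among the most powerful tools for text processing, pattern recognition, and data extraction.

Why Use Regular Expressions?

Regular expressions are useful for:

Form validation (emails, phone numbers, passwords)
Data extraction (log parsing, web scraping)
Text processing (find and replace, formatting)
Code refactoring (renaming variables, updating syntax)
Input sanitization (as one layer of defense against injection attacks)

How to Read This Cheat Sheet

This guide is organized into progressive parts:

Basics - core syntax and patterns
Advanced features - lookaround assertions, named groups, Unicode
Language specifics - examples in Python, JavaScript, PHP, C#, Java, Go, and Ruby
Regex in VS Code - find/replace transformations
Common patterns - copy-and-paste patterns for emails, URLs, dates, and more
Practical use cases - real-world applications
Troubleshooting - common mistakes and performance tips

Each part includes:

Clear explanations
Visual examples
Copyable code snippets
Pro tips and pitfalls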
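Literal Characters

The simplest regular expression is a sequence of literal characters:

abc

Matches: the "abc" in "the sequence abc"

Case Sensitivity

By default, regular expressions are case-sensitive:

hello matches "hello" but not "Hello"
Use the i flag for case-insensitive matching: /hello/i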

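Special Characters (Metacharacters)

These 14 characters have special meaning in a regular expression and generally must be escaped with \ to be matched literally:

. ^ $ * + ? { } [ ] \ | ( )

Examples:

\. matches a literal dot
\$ matches a dollar sign
\( matches a literal opening parenthesis

Character	Escaped	Example	Matches
`.` (dot)	`\.`	`3\.14`	"3.14"
`$` (dollar)	`\$`	`\$100`	"$100"
`*` (asterisk)	`\*`	`a\*b`	"a*b"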

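Character Classes

Predefined Character Classes

Pattern	Description	Equivalent to	Example	Matches
`\d`	Any digit	`[0-9]`	`\d\d`	"42"
`\D`	Any non-digit	`[^0-9]`	`\D+`	"abc"
`\w`	Word character	`[a-zA-Z0-9_]`	`\w+`	"hello_123"
`\W`	Non-word character	`[^a-zA-Z0-9_]`	`\W`	"@", "#"
`\s`	Whitespace	`[ \t\n\r\f\v]`	`\s+`	" " (space)
`\S`	Non-whitespace	`[^ \t\n\r\f\v]`	`\S+`	"hello"
`.`	Any character except a newline	-	`a.c`	"abc", "a1c"

Pro tip: The equivalents above are the ASCII definitions. Whether \w also matches non-ASCII letters depends on the engine: JavaScript keeps \w ASCII-only, while Python 3 matches Unicode word characters in str patterns unless re.ASCII is set. For explicit Unicode letter matching, use \p{L} (JavaScript with the u flag, or Python's third-party regex module; the built-in re module does not support \p{...}).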
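
How \w treats non-ASCII text is engine-specific, so it is worth checking in the language you use. As a minimal sketch in Python 3 (the sample string is made up for illustration), the same pattern gives different results with and without re.ASCII:

```python
import re

# Sample text (an assumption for illustration):
# "café", two Chinese characters, and an ASCII identifier
text = "caf\u00e9 \u4f60\u597d data_1"

# On str patterns, Python 3 treats \w as Unicode-aware by default
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['café', '你好', 'data_1']
print(ascii_words)    # ['caf', 'data_1']
```

If a pattern is meant to validate ASCII identifiers only, passing re.ASCII (or spelling out [a-zA-Z0-9_]) makes that intent explicit.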

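Custom Character Classes

Pattern	Description	Example	Matches
`[abc]`	Any one of a, b, or c	`[aeiou]`	Vowels: "a", "e", "i", "o", "u"
`[^abc]`	Any character except a, b, or c	`[^0-9]`	Non-digits
`[a-z]`	Range: lowercase letters	`[a-z]+`	"hello"
`[A-Z]`	Range: uppercase letters	`[A-Z]+`	"HELLO"
`[0-9]`	Range: digits	`[0-9]{4}`	"2025"
`[a-zA-Z]`	Combined ranges: all ASCII letters	`[a-zA-Z0-9]`	Alphanumerics

Examples:

[aeiou]       → matches any lowercase vowel
[^aeiou]      → matches any character that is not a lowercase vowel (consonants, but also digits, spaces, and punctuation)
[a-z0-9]      → matches a lowercase letter or a digit
[a-zA-Z0-9_]  → same as the ASCII \w (word characters)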

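Quantifiers

A quantifier specifies how many times the preceding element must match.

Basic Quantifiers

Pattern	Description	Example	Matches
`*`	0 or more	`ab*c`	"ac", "abc", "abbc", "abbbc"
`+`	1 or more	`ab+c`	"abc", "abbc" (not "ac")
`?`	0 or 1 (optional)	`colou?r`	"color", "colour"
`{n}`	Exactly n	`\d{4}`	"2025" (exactly 4 digits)
`{n,}`	n or more	`\d{2,}`	"42", "123", "9999"
`{n,m}`	Between n and m	`\d{2,4}`	"42", "123", "2025"

Greedy vs. Lazy Quantifiers

Greedy (default): match as much as possible

<.*>      → matches "<div>hello</div>" (the whole string)

Lazy (non-greedy): match as little as possible (add ? after the quantifier)

<.*?>     → matches "<div>" and "</div>" separately

Greedy	Lazy	Description
`*`	`*?`	0 or more (lazy)
`+`	`+?`	1 or more (lazy)
`?`	`??`	0 or 1 (lazy)
`{n,m}`	`{n,m}?`	n to m (lazy)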
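
The difference is easiest to see by running both forms against the same input. A small Python sketch using the tag example from this part:

```python
import re

html = "<div>hello</div>"

# Greedy: .* runs to the end of the string, then backs off to the last ">"
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first ">" it can reach
lazy = re.findall(r"<.*?>", html)

# A negated class gives the same result without relying on laziness
negated = re.findall(r"<[^>]*>", html)

print(greedy)   # ['<div>hello</div>']
print(lazy)     # ['<div>', '</div>']
print(negated)  # ['<div>', '</div>']
```

The negated class is often the better choice in practice, since it states exactly which characters may appear inside the tag.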

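Example:

Text: "hello" and "world"

".*" matches: "hello" and "world" as a single match, from the first quote to the last (greedy)
".*?" matches: "hello" and "world" as two separate matches (lazy)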

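Anchors and Boundaries

Anchors match positions, not characters.

Pattern	Description	Example	Matches
`^`	Start of string/line	`^hello`	"hello world" (at the start)
`$`	End of string/line	`world$`	"hello world" (at the end)
`\b`	Word boundary	`\bcat\b`	"the cat sleeps" (not "category")
`\B`	Non-word boundary	`\Bat`	"cat" (the "at" is not at a word boundary)
`\A`	Start of string (never a line)	`\Ahello`	Only when "hello" is at the very start
`\z`	End of string (never a line)	`world\z`	Only when "world" is at the very end
`\Z`	End of string, or before a final newline	`world\Z`	"world" or "world\n"

Support varies: JavaScript has none of \A, \z, and \Z, and in Python's re module \Z matches only at the very end of the string. Because \b is defined in terms of \w, an ASCII-only engine does not treat characters from scripts such as Chinese as word characters, so \b does not delimit them.

Examples:

^cat$         → matches "cat" (the whole line is "cat")
\bcat\b       → matches the "cat" in "the cat sleeps" (a whole word)
\Bat          → matches the "at" in "cat" (not at a word boundary)

Multiline mode (m flag):

Without m: ^ and $ match the start/end of the whole string
With m: ^ and $ match the start/end of each line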
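
To see the effect of the m flag and of \b on real input, here is a short Python sketch (the sample strings are illustrative assumptions):

```python
import re

text = "line 1\nline 2"

# Without re.MULTILINE, ^ matches only at the start of the whole string
no_flag = re.search(r"^line 2", text)

# With re.MULTILINE, ^ also matches right after each newline
with_flag = re.search(r"^line 2", text, flags=re.MULTILINE)

# $ behaves the same way at line ends
last_digits = re.findall(r"\d$", text)
all_line_digits = re.findall(r"\d$", text, flags=re.MULTILINE)

# \b keeps "cat" from matching inside longer words
whole_words = re.findall(r"\bcat\b", "cat category concat")

print(no_flag)            # None
print(with_flag.group())  # line 2
print(last_digits)        # ['2']
print(all_line_digits)    # ['1', '2']
print(whole_words)        # ['cat']
```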

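Grouping and Alternation

Capturing Groups

A capturing group (...) remembers the text it matched:

(\d+)-(\d+)   → matches "123-456"
                group 1: "123"
                group 2: "456"

Backreferences (reusing a captured group):

(\w)\1        → matches "aa", "bb", "cc" (a repeated character)
(\w+) \1      → matches "hello hello" (a repeated word)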
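
Backreferences are handy for spotting accidental repetition. The sketch below, in Python, finds doubled words and collapses them; the sample sentence is an assumption for illustration:

```python
import re

# \1 must repeat what group 1 captured; \b keeps the match on whole words
doubled = re.compile(r"\b(\w+) \1\b", flags=re.IGNORECASE)

text = "This is is a test of the the parser"

# Which words are repeated, and the text with each pair collapsed to one word
repeated = [m.group(1) for m in doubled.finditer(text)]
fixed = doubled.sub(r"\1", text)

print(repeated)  # ['is', 'the']
print(fixed)     # This is a test of the parser
```

Without the leading \b, the "is" inside "This" would pair up with the following "is" and produce a false positive.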

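Non-Capturing Groups

Use (?:...) when you need grouping but not capturing:

(?:https?://)  → groups "http://" or "https://" without capturing it

Why use non-capturing groups?

Slightly less overhead (the engine does not have to store the submatch)
Cleaner backreferences (only capturing groups are numbered)

Alternation (OR)

Use | to mean "match this or that":

cat|dog        → matches "cat" or "dog"
gr(e|a)y      → matches "grey" or "gray"

Examples:

(Mrs|Mr|Dr)\.?      → matches "Mrs", "Mr", or "Dr", each with an optional trailing dot (the longer "Mrs" is listed first because alternation takes the first branch that matches)
https?://           → matches "http://" or "https://"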

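Lookaround Assertions

Lookarounds are zero-width assertions: like anchors they match a position, but only if a condition holds.

Positive Lookahead `(?=...)`

Matches if the pattern ahead matches (without consuming it):

\d+(?=px)     → matches the "10" in "10px" (the "px" is not part of the match)

Use case: password validation

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$

Breakdown:

(?=.*[A-Z]) - must contain an uppercase letter
(?=.*[a-z]) - must contain a lowercase letter
(?=.*\d) - must contain a digit
(?=.*[@$!%*?&]) - must contain a special character
.{8,} - at least 8 characters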
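
The breakdown above can be sketched in Python as follows. PASSWORD_RE is the pattern from this part; the RULES table and both function names are hypothetical helpers added here so that a caller can report which rule failed:

```python
import re

# Same pattern as above: four lookaheads plus a length check
PASSWORD_RE = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$")

# Hypothetical per-rule checks mirroring the lookaheads one by one
RULES = {
    "uppercase letter": r"[A-Z]",
    "lowercase letter": r"[a-z]",
    "digit": r"\d",
    "special character": r"[@$!%*?&]",
}

def is_strong(password: str) -> bool:
    """Return True if the password satisfies every rule at once."""
    return PASSWORD_RE.search(password) is not None

def missing_rules(password: str) -> list:
    """List the rules the password fails, in the order of RULES."""
    missing = [name for name, rule in RULES.items() if not re.search(rule, password)]
    if len(password) < 8:
        missing.append("at least 8 characters")
    return missing

print(is_strong("Passw0rd!"))     # True
print(is_strong("password"))      # False
print(missing_rules("password"))  # ['uppercase letter', 'digit', 'special character']
```

The single combined pattern is compact, while the per-rule version is easier to turn into user-facing error messages.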

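Negative Lookahead `(?!...)`

Matches if the pattern ahead does not match:

\d+(?!\d|px)  → matches the "10" in "10em" (no match in "10px")

The \d inside the lookahead matters: with a plain \d+(?!px), the engine backtracks and still matches the "1" in "10px".

Use case: excluding certain words

\b(?!test)\w+  → matches words that do not start with "test"

Positive Lookbehind `(?<=...)`

Matches if the pattern behind matches:

(?<=¥)\d+     → matches the "100" in "¥100" (the "¥" is not part of the match)

Use case: extracting prices

(?<=Price: ¥)\d+\.\d{2}  → matches the "29.99" in "Price: ¥29.99"

Negative Lookbehind `(?<!...)`

Matches if the pattern behind does not match:

(?<![¥\d])\d+  → matches "100" on its own, but nothing in "¥100"

The \d inside the lookbehind is needed for the same reason: a plain (?<!¥)\d+ still matches the "00" in "¥100".

Summary table:

Type	Syntax	Description	Example
Positive lookahead	`(?=...)`	Matches if followed by ...	`q(?=u)` matches the "q" in "queen"
Negative lookahead	`(?!...)`	Matches if not followed by ...	`q(?!u)` matches the "q" in "iraq"
Positive lookbehind	`(?<=...)`	Matches if preceded by ...	`(?<=¥)\d+` matches the "10" in "¥10"
Negative lookbehind	`(?<!...)`	Matches if not preceded by ...	`(?<![¥\d])\d+` matches the "10" in "10"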
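
Lookarounds interact with backtracking in ways that are easy to miss, so it helps to run them. The following Python sketch (sample strings are assumptions for illustration) shows each of the four assertions, including a naive negative lookbehind that matches more than intended:

```python
import re

css = "10px 20em 30px"

# Lookahead: numbers followed by "px" (the unit is not part of the match)
px_values = re.findall(r"\d+(?=px)", css)

# Negative lookahead: the \d alternative stops the engine from
# backtracking to a shorter number such as the "1" in "10px"
non_px_values = re.findall(r"\d+(?!\d|px)", css)

prices = "¥100 and 200"

# Lookbehind: digits preceded by the currency sign
after_yen = re.findall(r"(?<=¥)\d+", prices)

# A naive negative lookbehind still matches the tail of "¥100"
naive = re.findall(r"(?<!¥)\d+", prices)

# Excluding a preceding digit as well gives the intended result
strict = re.findall(r"(?<![¥\d])\d+", prices)

print(px_values)      # ['10', '30']
print(non_px_values)  # ['20']
print(after_yen)      # ['100']
print(naive)          # ['00', '200']
print(strict)         # ['200']
```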

Named Capturing Groups

Named groups (?<name>...) make a regular expression more readable. ASCII group names are the portable choice: Java and Go reject non-ASCII names.

JavaScript:

const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2025-01-17'.match(dateRegex);

console.log(match.groups.year);    // "2025"
console.log(match.groups.month);   // "01"
console.log(match.groups.day);     // "17"

Python:

import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2025-01-17')

print(match.group('year'))    # "2025"
print(match.group('month'))   # "01"
print(match.group('day'))     # "17"

C#:

using System;
using System.Text.RegularExpressions;

var pattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";
var match = Regex.Match("2025-01-17", pattern);

Console.WriteLine(match.Groups["year"].Value);   // "2025"

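Atomic Groups and Possessive Quantifiers

Atomic Groups `(?>...)`

Once an atomic group has matched, the engine will not backtrack into it. This helps prevent catastrophic backtracking:

(?>\d+)bar    → matches "123bar"; on an input such as "123baz" it fails without retrying shorter digit runs

Without the atomic group:

\d+bar        → on "123baz" retries "123", then "12", then "1" before failing (wasted work)

Possessive Quantifiers

Greedy	Possessive	Description
`*`	`*+`	0 or more (no backtracking)
`+`	`++`	1 or more (no backtracking)
`?`	`?+`	0 or 1 (no backtracking)

Use case: preventing catastrophic backtracking in complex patterns.

Support varies: PCRE (PHP), Java, Ruby, and Python 3.11+ support both features; .NET supports atomic groups but not possessive quantifiers; JavaScript and Go support neither.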

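Unicode Support

Modern regex engines support Unicode categories and scripts.

Unicode Categories `\p{...}`

JavaScript (ES2018+):

const letters = /\p{L}+/u;     // any letter (in any language)
const numbers = /\p{N}+/u;     // any numeric character
const currency = /\p{Sc}/u;    // currency symbol

Python:

import regex  # note: requires the third-party 'regex' module, not the built-in 're'

letters = regex.compile(r'\p{L}+')

Common Unicode categories:

Category	Description	Examples
`\p{L}`	Letter	"a", "字", "א"
`\p{N}`	Number	"1", "①", "½"
`\p{S}`	Symbol	"$", "©", "♥"
`\p{Sc}`	Currency symbol	"$", "€", "¥"
`\p{P}`	Punctuation	".", "!", "?"
`\p{Z}`	Separator	space, no-break space, line separator (a tab is a control character, not a separator)

Unicode scripts:

/\p{Script=Greek}/u     → matches Greek letters: "α", "β", "γ"
/\p{Script=Cyrillic}/u  → matches Cyrillic letters: "а", "б", "в"
/\p{Script=Han}/u       → matches Han characters such as "字"

Negation:

/\P{L}+/u   → matches a run of characters that are not letters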

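Modifiers and Flags

Flags change how a pattern is interpreted.

Flag	Name	Description	Example
`i`	Case-insensitive	Ignores case	`/hello/i` matches "Hello"
`g`	Global	Finds all matches	`/cat/g` finds every "cat"
`m`	Multiline	`^` and `$` match at line starts/ends	`/^hello/m`
`s`	Dotall	`.` also matches newlines	`/a.b/s` matches "a\nb"
`u`	Unicode	Enables Unicode features	`/\p{L}+/u`
`x`	Extended	Ignores whitespace (free-spacing)	Allows comments; not available in JavaScript
`y`	Sticky	Matches only at the exact position	JavaScript only

Examples:

Case-insensitive (i):

/hello/i.test('HELLO') // true
/hello/.test('HELLO')  // false (case-sensitive by default)

Global (g):

'cat dog cat'.match(/cat/g)   // ["cat", "cat"]

Multiline (m):

const text = 'line 1\nline 2';
/^line 2/m.test(text)   // true (without 'm': false)

Dotall (s):

/a.b/s.test('a\nb')   // true (without 's': false)

Inline Modifiers

Apply flags to part of a pattern:

(?i)hello      → "hello" is case-insensitive from this point on
(?-i)WORLD     → "WORLD" is case-sensitive from this point on
(?i:hello)     → only "hello" is case-insensitive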

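Conditional Patterns

Syntax: (?(condition)yes|no)

Conditionals are supported by PCRE, .NET, and Python's re module, but not by JavaScript, Java, or Go.

Example: matching a string with or without quotes

^("|')?[^"'\r\n]*(?(1)\1)$

Breakdown:

^ and $ - anchor the pattern so that the whole input must match
("|')? - optionally captures an opening quote
[^"'\r\n]* - matches the content
(?(1)\1) - if group 1 matched (an opening quote), match the same closing quote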
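
Python's re module supports this conditional syntax, so the pattern can be tried directly. In the sketch below, fullmatch() plays the role of start and end anchors, and the function name is a hypothetical helper; the last line shows why anchoring matters:

```python
import re

# The conditional pattern from this part, without explicit anchors
QUOTED = re.compile(r"""("|')?[^"'\r\n]*(?(1)\1)""")

def is_consistently_quoted(s: str) -> bool:
    # fullmatch() requires the pattern to cover the whole string
    return QUOTED.fullmatch(s) is not None

samples = ['"hello"', "'world'", "test", "\"mixed'"]
results = [is_consistently_quoted(s) for s in samples]

# Unanchored, the pattern can match an empty string anywhere,
# so search() "succeeds" even on the mismatched input
unanchored = QUOTED.search("\"mixed'") is not None

print(results)     # [True, True, True, False]
print(unanchored)  # True
```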

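Matches:

"hello" ✅
'world' ✅
test ✅ (no quotes)
"mixed' ❌ (mismatched quotes)

Comments in Regular Expressions

Inline Comments `(?# comment)`

\d{3}(?# area code)-\d{4}(?# first four digits)-\d{4}(?# last four digits)

Free-Spacing Mode (`x` flag)

Ignores whitespace and allows comments:

(?x)
  \d{3}     # area code
  -         # separator
  \d{4}     # first four digits
  -         # separator
  \d{4}     # last four digits

This is much more readable for complex patterns.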

This part shows how to use regular expressions in 7 popular programming languages. Each language has its own regex API, but the pattern syntax is largely the same.

JavaScript / Node.js

Creating Patterns

// Literal notation (most common)
const pattern1 = /\d{3}-\d{4}/;

// Constructor (when the pattern is dynamic)
const pattern2 = new RegExp('\\d{3}-\\d{4}');
// Note: backslashes must be escaped inside a string

// With flags
const pattern3 = /hello/gi;  // global, case-insensitive

String Methods

// .match() - find matches
const text = 'Contact: 138-1234-5678 or 139-8765-4321';
const matches = text.match(/\d{3}-\d{4}-\d{4}/g);
console.log(matches);  // ["138-1234-5678", "139-8765-4321"]

// .matchAll() - all matches with their groups (ES2020)
const emailPattern = /([\w.-]+)@([\w.-]+\.[a-z]{2,})/gi;
const emails = 'admin@example.com, user@test.org';
for (const match of emails.matchAll(emailPattern)) {
  console.log(`User: ${match[1]}, domain: ${match[2]}`);
}
// User: admin, domain: example.com
// User: user, domain: test.org

// .search() - index of the first match
const pos = 'hello world'.search(/world/);
console.log(pos);  // 6

// .replace() - replace matches
const phone = '(+86) 138 1234 5678';
const cleaned = phone.replace(/[^\d]/g, '');
console.log(cleaned);  // "8613812345678"

// .replaceAll() - replace every match (ES2021; a regex argument must have the g flag)
const text2 = 'cat dog cat';
const result = text2.replaceAll(/cat/g, 'bird');
console.log(result);  // "bird dog bird"

// .split() - split on a pattern
const csv = 'apple,banana, orange , grape';
const fruits = csv.split(/\s*,\s*/);
console.log(fruits);  // ["apple", "banana", "orange", "grape"]

RegExp Methods

// .test() - returns a boolean
const isEmail = /^[\w.-]+@[\w.-]+\.[a-z]{2,}$/i;
console.log(isEmail.test('user@example.com'));  // true

// .exec() - returns match details (or null)
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = pattern.exec('Date: 2025-01-17');
if (match) {
  console.log(match[0]);  // "2025-01-17" (full match)
  console.log(match[1]);  // "2025" (group 1)
  console.log(match[2]);  // "01" (group 2)
  console.log(match[3]);  // "17" (group 3)
}

Named Groups (ES2018+)

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2025-01-17'.match(pattern);

console.log(match.groups.year);    // "2025"
console.log(match.groups.month);   // "01"
console.log(match.groups.day);     // "17"

// Named backreference
// Note: \w is ASCII-only in JavaScript, so this does not detect repeated non-ASCII words
const dupeWord = /\b(?<word>\w+)\s+\k<word>\b/i;
console.log(dupeWord.test('hello hello'));  // true

Unicode Support (ES2018+)

// Match any letter (accented, Chinese, Arabic, and so on)
const letters = /\p{L}+/u;
console.log(letters.test('café'));   // true
console.log(letters.test('你好'));    // true

// Match emoji
// Note: \p{Emoji} also matches digits, "#", and "*"; \p{Emoji_Presentation} is stricter
const emoji = /\p{Emoji}/u;
console.log(emoji.test('hello 👋'));  // true

Python

The `re` Module

import re

# Compile the pattern (recommended for reuse)
pattern = re.compile(r'\d{3}-\d{4}-\d{4}')

# Or use it directly
re.search(r'\d{3}-\d{4}-\d{4}', 'Call 138-1234-5678')

Core Functions

import re

# re.search() - find the first match
match = re.search(r'\d{3}-\d{4}-\d{4}', 'Contact: 138-1234-5678 or 139-8765-4321')
if match:
    print(match.group())  # "138-1234-5678"
    print(match.start())  # 9 (position)
    print(match.end())    # 22

# re.match() - match at the start of the string
match = re.match(r'\d+', '123 Main Street')
print(match.group() if match else None)  # "123"

match = re.match(r'\d+', 'Main Street 123')
print(match)  # None (does not start with a digit)

# re.fullmatch() - match the whole string
result = re.fullmatch(r'\d{3}-\d{4}-\d{4}', '138-1234-5678')
print(bool(result))  # True

result = re.fullmatch(r'\d{3}-\d{4}-\d{4}', 'Call 138-1234-5678')
print(bool(result))  # False (extra text)

# re.findall() - find all matches (returns a list)
text = 'Prices: ¥10, ¥25, ¥100'
prices = re.findall(r'¥(\d+)', text)
print(prices)  # ['10', '25', '100']

# re.finditer() - find all matches (returns an iterator)
for match in re.finditer(r'¥(\d+)', text):
    print(f'Found ¥{match.group(1)} at position {match.start()}')
# Found ¥10 at position 8
# Found ¥25 at position 13
# Found ¥100 at position 18

# re.sub() - replace matches
phone = '(+86) 138 1234 5678'
cleaned = re.sub(r'[^\d]', '', phone)
print(cleaned)  # "8613812345678"

# re.split() - split on a pattern
csv = 'apple,banana, orange , grape'
fruits = re.split(r'\s*,\s*', csv)
print(fruits)  # ['apple', 'banana', 'orange', 'grape']

Groups and Named Groups

import re

# Numbered groups
pattern = r'(\d{4})-(\d{2})-(\d{2})'
match = re.search(pattern, 'Date: 2025-01-17')
if match:
    print(match.group(0))  # "2025-01-17" (full match)
    print(match.group(1))  # "2025"
    print(match.group(2))  # "01"
    print(match.group(3))  # "17"
    print(match.groups())  # ('2025', '01', '17')

# Named groups (?P<name>...)
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2025-01-17')
if match:
    print(match.group('year'))   # "2025"
    print(match.group('month'))  # "01"
    print(match.group('day'))    # "17"
    print(match.groupdict())     # {'year': '2025', 'month': '01', 'day': '17'}

Flags

import re

# Case-insensitive
re.search(r'hello', 'HELLO', re.IGNORECASE)  # or re.I

# Multiline (^ and $ match at line starts/ends)
re.search(r'^line 2', 'line 1\nline 2', re.MULTILINE)  # or re.M

# Dotall (. matches newlines)
re.search(r'a.b', 'a\nb', re.DOTALL)  # or re.S

# Verbose (free-spacing mode with comments)
pattern = re.compile(r'''
    \d{3}     # area code
    -         # separator
    \d{4}     # first four digits
    -         # separator
    \d{4}     # last four digits
''', re.VERBOSE)  # or re.X

# Combine flags with |
pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)

Replacing with a Function

import re

# Use a function for dynamic replacements
def double_number(match):
    num = int(match.group())
    return str(num * 2)

text = 'I have 5 apples and 10 oranges'
result = re.sub(r'\d+', double_number, text)
print(result)  # "I have 10 apples and 20 oranges"

# With named groups
def format_name(match):
    return f"{match.group('last').upper()}, {match.group('first')}"

pattern = r'(?P<first>\w+)\s+(?P<last>\w+)'
text = 'John Smith'
result = re.sub(pattern, format_name, text)
print(result)  # "SMITH, John"

PHP

PCRE Functions

<?php

// preg_match() - find the first match
$pattern = '/\d{3}-\d{4}-\d{4}/';
$text = 'Contact: 138-1234-5678 or 139-8765-4321';

if (preg_match($pattern, $text, $matches)) {
    echo $matches[0];  // "138-1234-5678"
}

// preg_match_all() - find all matches
preg_match_all('/\d{3}-\d{4}-\d{4}/', $text, $matches);
print_r($matches[0]);  // ["138-1234-5678", "139-8765-4321"]

// preg_replace() - replace matches
$phone = '(+86) 138 1234 5678';
$cleaned = preg_replace('/[^\d]/', '', $phone);
echo $cleaned;  // "8613812345678"

// preg_split() - split on a pattern
$csv = 'apple,banana, orange , grape';
$fruits = preg_split('/\s*,\s*/', $csv);
print_r($fruits);  // ["apple", "banana", "orange", "grape"]

// preg_grep() - filter an array by pattern
$words = ['apple', 'banana', 'apricot', 'orange'];
$aWords = preg_grep('/^a/', $words);
print_r($aWords);  // [0 => "apple", 2 => "apricot"] (original keys are preserved)
?>

Named Groups

<?php
$pattern = '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/';
$text = '2025-01-17';

if (preg_match($pattern, $text, $matches)) {
    echo $matches['year'];    // "2025"
    echo $matches['month'];   // "01"
    echo $matches['day'];     // "17"
}
?>

Modifiers (Flags)

<?php
// i - case-insensitive
preg_match('/hello/i', 'HELLO');  // matches

// m - multiline
preg_match('/^line 2/m', "line 1\nline 2");  // matches

// s - dotall (. matches newlines)
preg_match('/a.b/s', "a\nb");  // matches

// x - free-spacing (ignores whitespace)
$pattern = '/
    \d{3}     # area code
    -         # separator
    \d{4}     # first four digits
    -         # separator
    \d{4}     # last four digits
/x';

// u - UTF-8 support
preg_match('/\w+/u', 'café');  // matches the whole word, non-ASCII letters included

// Combining modifiers
preg_match('/hello/imu', 'HELLO');
?>

Replacing with a Callback

<?php
$text = 'I have 5 apples and 10 oranges';

$result = preg_replace_callback('/\d+/', function($matches) {
    // Return a string explicitly rather than relying on implicit conversion
    return (string)((int)$matches[0] * 2);
}, $text);

echo $result;  // "I have 10 apples and 20 oranges"
?>

C# (.NET)

The Regex Class

using System;
using System.Text.RegularExpressions;

// Static methods (simple use)
string text = "Contact: 138-1234-5678 or 139-8765-4321";
Match match = Regex.Match(text, @"\d{3}-\d{4}-\d{4}");
if (match.Success)
{
    Console.WriteLine(match.Value);  // "138-1234-5678"
}

// Find all matches
MatchCollection matches = Regex.Matches(text, @"\d{3}-\d{4}-\d{4}");
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// Output:
// 138-1234-5678
// 139-8765-4321

// Replace
string phone = "(+86) 138 1234 5678";
string cleaned = Regex.Replace(phone, @"[^\d]", "");
Console.WriteLine(cleaned);  // "8613812345678"

// Split
string csv = "apple,banana, orange , grape";
string[] fruits = Regex.Split(csv, @"\s*,\s*");
// ["apple", "banana", "orange", "grape"]

Compiled Regex (Better Performance)

using System;
using System.Text.RegularExpressions;

// Compile for reuse (faster matching when used repeatedly, at the cost of slower construction)
Regex pattern = new Regex(@"\d{3}-\d{4}-\d{4}", RegexOptions.Compiled);

string text = "Contact: 138-1234-5678";
Match match = pattern.Match(text);
if (match.Success)
{
    Console.WriteLine(match.Value);
}

RegexOptions (Flags)

using System.Text.RegularExpressions;

// Case-insensitive
Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase);

// Multiline
Regex.Match("line 1\nline 2", "^line 2", RegexOptions.Multiline);

// Singleline (. matches newlines)
Regex.Match("a\nb", "a.b", RegexOptions.Singleline);

// Compiled (better performance)
var digits = new Regex(@"\d+", RegexOptions.Compiled);

// Combining options (the static overload takes the pattern as a string)
var opts = RegexOptions.IgnoreCase | RegexOptions.Multiline;
Regex.Match("line 1\nLINE 2", "^line 2", opts);

Named Groups

using System;
using System.Text.RegularExpressions;

string pattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";
Match match = Regex.Match("2025-01-17", pattern);

if (match.Success)
{
    Console.WriteLine(match.Groups["year"].Value);    // "2025"
    Console.WriteLine(match.Groups["month"].Value);   // "01"
    Console.WriteLine(match.Groups["day"].Value);     // "17"
}

Replacing with a MatchEvaluator

using System;
using System.Text.RegularExpressions;

string text = "I have 5 apples and 10 oranges";

string result = Regex.Replace(text, @"\d+", match =>
{
    int num = int.Parse(match.Value);
    return (num * 2).ToString();
});

Console.WriteLine(result);  // "I have 10 apples and 20 oranges"

Java

The Pattern and Matcher Classes

import java.util.regex.Pattern;
import java.util.regex.Matcher;

// Compile the pattern
Pattern pattern = Pattern.compile("\\d{3}-\\d{4}-\\d{4}");
String text = "Contact: 138-1234-5678 or 139-8765-4321";

// Create a matcher
Matcher matcher = pattern.matcher(text);

// Find the first match
if (matcher.find()) {
    System.out.println(matcher.group());  // "138-1234-5678"
}

// Find all matches
matcher.reset();  // rewind to the start
while (matcher.find()) {
    System.out.println(matcher.group());
}
// Output:
// 138-1234-5678
// 139-8765-4321

String Methods

// matches() - checks whether the whole string matches
boolean isPhone = "138-1234-5678".matches("\\d{3}-\\d{4}-\\d{4}");
System.out.println(isPhone);  // true

// replaceAll() - replace every match
String phone = "(+86) 138 1234 5678";
String cleaned = phone.replaceAll("[^\\d]", "");
System.out.println(cleaned);  // "8613812345678"

// replaceFirst() - replace the first match
String text = "cat dog cat";
String result = text.replaceFirst("cat", "bird");
System.out.println(result);  // "bird dog cat"

// split() - split on a pattern
String csv = "apple,banana, orange , grape";
String[] fruits = csv.split("\\s*,\\s*");
// ["apple", "banana", "orange", "grape"]

Pattern Flags

import java.util.regex.Pattern;

// Case-insensitive
Pattern pattern = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);

// Multiline
Pattern.compile("^line 2", Pattern.MULTILINE);

// Dotall (. matches newlines)
Pattern.compile("a.b", Pattern.DOTALL);

// Comments (free-spacing); text blocks require Java 15+
Pattern.compile("""
    \\d{3}     # area code
    -          # separator
    \\d{4}     # first four digits
    -          # separator
    \\d{4}     # last four digits
    """, Pattern.COMMENTS);

// Combining flags
int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
Pattern.compile("pattern", flags);

Named Groups

import java.util.regex.Pattern;
import java.util.regex.Matcher;

// Java group names must be ASCII letters and digits, starting with a letter
Pattern pattern = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher matcher = pattern.matcher("2025-01-17");

if (matcher.find()) {
    System.out.println(matcher.group("year"));    // "2025"
    System.out.println(matcher.group("month"));   // "01"
    System.out.println(matcher.group("day"));     // "17"
}

Advanced Replacement

import java.util.regex.Pattern;
import java.util.regex.Matcher;

String text = "I have 5 apples and 10 oranges";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(text);

// appendReplacement/appendTail build the result piece by piece
StringBuffer result = new StringBuffer();
while (matcher.find()) {
    int num = Integer.parseInt(matcher.group());
    matcher.appendReplacement(result, String.valueOf(num * 2));
}
matcher.appendTail(result);

System.out.println(result);  // "I have 10 apples and 20 oranges"

Go (Golang)

The `regexp` Package

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile the pattern
    pattern := regexp.MustCompile(`\d{3}-\d{4}-\d{4}`)
    text := "Contact: 138-1234-5678 or 139-8765-4321"

    // Find the first match
    match := pattern.FindString(text)
    fmt.Println(match)  // 138-1234-5678

    // Find all matches
    matches := pattern.FindAllString(text, -1)
    fmt.Println(matches)  // [138-1234-5678 139-8765-4321]

    // Check for a match
    isMatch := pattern.MatchString("138-1234-5678")
    fmt.Println(isMatch)  // true

    // Replace all
    phone := "(+86) 138 1234 5678"
    cleaned := regexp.MustCompile(`[^\d]`).ReplaceAllString(phone, "")
    fmt.Println(cleaned)  // 8613812345678

    // Split
    csv := "apple,banana, orange , grape"
    fruits := regexp.MustCompile(`\s*,\s*`).Split(csv, -1)
    fmt.Println(fruits)  // [apple banana orange grape]
}

Submatches (Groups)

package main

import (
    "fmt"
    "regexp"
)

func main() {
    pattern := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    text := "Date: 2025-01-17"

    // FindStringSubmatch returns [full, group1, group2, ...]
    matches := pattern.FindStringSubmatch(text)
    if matches != nil {
        fmt.Println(matches[0])  // 2025-01-17 (full match)
        fmt.Println(matches[1])  // 2025 (group 1)
        fmt.Println(matches[2])  // 01 (group 2)
        fmt.Println(matches[3])  // 17 (group 3)
    }

    // FindAllStringSubmatch handles every match
    text2 := "Dates: 2025-01-17 and 2024-12-31"
    allMatches := pattern.FindAllStringSubmatch(text2, -1)
    for _, match := range allMatches {
        fmt.Printf("year: %s, month: %s, day: %s\n", match[1], match[2], match[3])
    }
    // year: 2025, month: 01, day: 17
    // year: 2024, month: 12, day: 31
}

Named Groups

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Go group names are limited to ASCII letters, digits, and underscores
    pattern := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
    text := "2025-01-17"

    match := pattern.FindStringSubmatch(text)
    if match != nil {
        // Map each group name to its submatch
        names := pattern.SubexpNames()
        result := make(map[string]string)
        for i, name := range names {
            if i != 0 && name != "" {
                result[name] = match[i]
            }
        }
        fmt.Println(result["year"])    // 2025
        fmt.Println(result["month"])   // 01
        fmt.Println(result["day"])     // 17
    }
}

Replacing with a Function

package main

import (
    "fmt"
    "regexp"
    "strconv"
)

func main() {
    text := "I have 5 apples and 10 oranges"
    pattern := regexp.MustCompile(`\d+`)

    result := pattern.ReplaceAllStringFunc(text, func(s string) string {
        // The pattern guarantees s is all digits, so the error is ignored here
        num, _ := strconv.Atoi(s)
        return strconv.Itoa(num * 2)
    })

    fmt.Println(result)  // I have 10 apples and 20 oranges
}

Ruby

Regexp Literals

# Literal notation
pattern = /\d{3}-\d{4}-\d{4}/

# With flags
pattern_ci = /hello/i  # case-insensitive
pattern_dotall = /a.b/m  # in Ruby, m makes . match newlines; ^ and $ always match at line boundaries

# Constructor (for dynamic patterns)
pattern = Regexp.new('\d{3}-\d{4}-\d{4}')

String Methods

text = 'Contact: 138-1234-5678 or 139-8765-4321'

# match() - returns MatchData or nil
match = text.match(/\d{3}-\d{4}-\d{4}/)
if match
  puts match[0]  # "138-1234-5678"
end

# scan() - find all matches
matches = text.scan(/\d{3}-\d{4}-\d{4}/)
p matches  # ["138-1234-5678", "139-8765-4321"]

# =~ operator - returns the index of the first match
index = text =~ /\d{3}-\d{4}-\d{4}/
puts index  # 9

# sub() - replace the first match
result = 'cat dog cat'.sub(/cat/, 'bird')
puts result  # "bird dog cat"

# gsub() - replace every match
phone = '(+86) 138 1234 5678'
cleaned = phone.gsub(/[^\d]/, '')
puts cleaned  # "8613812345678"

# split() - split on a pattern
csv = 'apple,banana, orange , grape'
fruits = csv.split(/\s*,\s*/)
p fruits  # ["apple", "banana", "orange", "grape"]

Capturing Groups

pattern = /(\d{4})-(\d{2})-(\d{2})/
match = '2025-01-17'.match(pattern)

if match
  puts match[0]  # "2025-01-17" (full match)
  puts match[1]  # "2025" (group 1)
  puts match[2]  # "01" (group 2)
  puts match[3]  # "17" (group 3)
end

Named Groups

pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
match = '2025-01-17'.match(pattern)

if match
  puts match[:year]    # "2025"
  puts match[:month]   # "01"
  puts match[:day]     # "17"
end

Replacing with a Block

text = 'I have 5 apples and 10 oranges'

result = text.gsub(/\d+/) { |num| (num.to_i * 2).to_s }
puts result  # "I have 10 apples and 20 oranges"

# With named groups (\w is ASCII-only in Ruby)
pattern = /(?<first>\w+)\s+(?<last>\w+)/
text = 'John Smith'

result = text.gsub(pattern) do |match|
  m = Regexp.last_match
  "#{m[:last].upcase}, #{m[:first]}"
end
puts result  # "SMITH, John"

Flags

# i - case-insensitive
/hello/i.match('HELLO')  # matches

# m - multiline (. matches newlines)
/a.b/m.match("a\nb")  # matches

# x - free-spacing (ignores whitespace)
pattern = /
  \d{3}     # area code
  -         # separator
  \d{4}     # first four digits
  -         # separator
  \d{4}     # last four digits
/x

# o - perform #{...} interpolation only once, the first time the literal is evaluated
pattern = /\d+/o

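Visual Studio Code's Find and Replace supports regular expressions, including case-changing replacements. This part walks through a practical example.

Opening Find and Replace

Keyboard shortcuts:

Find: Ctrl+F (Windows/Linux) / Cmd+F (Mac)
Replace: Ctrl+H (Windows/Linux) / Option+Cmd+F (Mac)
Enable regex: click the .* button or press Alt+R (Option+Cmd+R on Mac)

Tips:

Use Ctrl+Alt+Enter (Option+Cmd+Enter on Mac) to replace all
Preview the highlighted matches before replacing
Use F3 / Shift+F3 to move between matches

Case Conversion

VS Code supports special replacement sequences for changing case:

Sequence	Effect	Example
`\l`	Lowercases the next character	`\l$1`
`\u`	Uppercases the next character	`\u$1`
`\L`	Lowercases the rest of the capture group	`\L$1`
`\U`	Uppercases the rest of the capture group	`\U$1`

The VS Code documentation lists only these four modifiers; an `\E` terminator, as found in some other editors, is not among them, and `\U` and `\L` apply to the capture group that follows.

Example 1: Capitalize the first letter of each word

Find:

\b(\w)(\w*)

Replace:

\u$1$2

Before:

hello world

After:

Hello World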
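
Outside the editor, the same transformation needs a different mechanism, because replacement modifiers such as \u are an editor feature. As a sketch, Python's re.sub accepts a function as the replacement (the function name here is a hypothetical helper):

```python
import re

def capitalize_words(text: str) -> str:
    # Group 1 is the first character of each word, group 2 the rest
    return re.sub(r"\b(\w)(\w*)", lambda m: m.group(1).upper() + m.group(2), text)

print(capitalize_words("hello world"))  # Hello World
```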

This part provides ready-to-use regular expression patterns.

Email Validation

^[\w.-]+@[\w.-]+\.[a-z]{2,}$

Phone Number (China)

^(\+?86)?[-.\s]?1[3-9]\d{9}$

Date (ISO 8601)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
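
A pattern like this checks the shape of a date, not the calendar: it accepts "2025-02-30". One way to close that gap, sketched here in Python with a hypothetical helper name, is to follow the regex with a real date parse:

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$", flags=re.ASCII)

def is_valid_date(value: str) -> bool:
    # Step 1: cheap format check with the regex
    if ISO_DATE.fullmatch(value) is None:
        return False
    # Step 2: calendar check (rejects dates such as February 30)
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

print(is_valid_date("2025-01-17"))  # True
print(is_valid_date("2025-02-30"))  # False (well-formed, but not a real date)
print(is_valid_date("2025-13-01"))  # False (rejected by the regex)
```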

URL

^https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b

IPv4 Address

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

Hex Color

^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Username (3-16 characters)

^[a-zA-Z0-9_-]{3,16}$

UUID v4

^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
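
A quick way to sanity-check the UUID pattern is to run it against identifiers generated by a standard library. The Python sketch below does that; adding re.IGNORECASE is an assumption on my part, made so that uppercase hex digits are accepted too:

```python
import re
import uuid

UUID_V4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    flags=re.IGNORECASE,  # assumption: accept uppercase hex digits as well
)

generated = str(uuid.uuid4())   # random (version 4) UUID
time_based = str(uuid.uuid1())  # version 1, so the third group starts with "1"

print(bool(UUID_V4.fullmatch(generated)))          # True
print(bool(UUID_V4.fullmatch(generated.upper())))  # True
print(bool(UUID_V4.fullmatch(time_based)))         # False
```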


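Real-world applications of regular expressions in development.

Form Validation

const validators = {
  email: /^[\w.-]+@[\w.-]+\.[a-z]{2,}$/i,
  phone: /^1[3-9]\d{9}$/
};

Data Extraction

Extracting emails from text:

import re
text = 'Contact admin@example.com or user@test.org'
emails = re.findall(r'[\w.-]+@[\w.-]+\.[a-z]{2,}', text)

Log Parsing

# Assumes an access-log line in which the status code follows the quoted request
pattern = r'(?P<ip>[\d.]+) .*?" (?P<status>\d{3}) '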
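
A log-parsing pattern is easiest to judge against a concrete line. The sketch below assumes the Common Log Format and a made-up sample line; it also shows why a greedy .+ placed directly before the status group is risky:

```python
import re

# Hypothetical access-log line in Common Log Format (an assumption for this sketch)
line = '127.0.0.1 - - [17/Jan/2025:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1234'

# Greedy .+ runs to the end of the line and backs off only as far as
# needed, so "status" captures the last three digits of the line
loose = re.search(r"(?P<ip>[\d.]+).+(?P<status>\d{3})", line)

# Spelling out each field keeps every group where it belongs
CLF = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)
strict = CLF.match(line)

print(loose.group("status"))    # 234
print(strict.group("ip"))       # 127.0.0.1
print(strict.group("status"))   # 200
print(strict.group("request"))  # GET /index.html HTTP/1.1
```

Real log formats vary, so the field layout should be adapted to the server configuration in use.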

Code Refactoring

Converting legacy API calls in VS Code:

Find: apiClient\.get\('([^']+)'\)
Replace: fetch('$1').then(r => r.json())

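Common Mistakes

1. Forgetting to escape special characters

❌ Wrong: file.txt (the dot matches any character, so "file-txt" also matches)
✅ Right: file\.txt

2. Greedy vs. lazy

❌ Greedy: <.*> matches the whole of <div>text</div>
✅ Lazy: <.*?> matches <div> and </div> separately

3. Not using anchors

❌ /\d{3}/ matches the "123" in "abc123def"
✅ /^\d{3}$/ matches only a string that is exactly three digits, such as "123"

Performance Tips

Prefer specific character classes over .
Use anchors where possible
Avoid nested quantifiers (catastrophic backtracking)
Use atomic groups where the engine supports them
Compile patterns that are reused

Debugging

Test on regex101.com
Use verbose mode with comments
Break complex patterns into parts
Test edge cases

Online Tools

Regex Testers

regex101.com - tester with explanations
regexr.com - visual regex builder
regexpal.com - quick and simple testing

Visualizers

debuggex.com - railroad diagrams
regexper.com - visualizer

Learning Resources

regexone.com - interactive lessons
regexlearn.com - step-by-step guide
regular-expressions.info - reference documentation

IDE Extensions

Regex Previewer (VS Code)
Regex Tester (VS Code)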

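Character Classes

Pattern	Matches
`\d`	Digit [0-9]
`\w`	Word character [a-zA-Z0-9_]
`\s`	Whitespace
`.`	Any character except a newline

Quantifiers

Pattern	Meaning
`*`	0 or more
`+`	1 or more
`?`	0 or 1
`{n}`	Exactly n

Anchors

Pattern	Meaning
`^`	Start of string (or line with m)
`$`	End of string (or line with m)
`\b`	Word boundary

Flags

Flag	Meaning
`i`	Case-insensitive
`g`	Global
`m`	Multiline
`s`	Dotall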

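General Questions

Q: What is the difference between greedy and lazy quantifiers?

A: Greedy quantifiers match as much as possible. Lazy ones (*?, +?) match as little as possible.

Q: How do I match a literal dot?

A: Escape it with a backslash: \.

Q: What is catastrophic backtracking?

A: It happens when a backtracking engine has to try a very large, often exponential, number of ways to match, which makes matching extremely slow. Avoid nested quantifiers such as (a+)+.

Q: Can a regex validate an email address perfectly?

A: No. Use a regex for a basic format check, then confirm the address by sending a verification email.

Q: How do I match across multiple lines?

A: Use the s flag, or use [\s\S]* instead of .*.

VS Code Specific

Q: How do I replace with uppercase in VS Code?

A: Use \u (uppercase the next character) or \U (uppercase the rest of the capture group) in the replacement.

Q: Can I use regex in VS Code's file search?

A: Yes. Press Ctrl+Shift+F and enable regex (Alt+R).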

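Regex Data Extractor Chrome Extension

Our Regex Data Extractor helps you extract data from web pages using the patterns in this guide.

Core Features

Pattern library: prebuilt patterns
Live testing: test a regex on any web page
Multi-format export: CSV, JSON, Excel, PDF
Batch extraction: extract from multiple pages

Example: Extracting Emails

Install Regex Data Extractor
Navigate to any web page
Click the extension icon
Enter the pattern: [\w.-]+@[\w.-]+\.[a-z]{2,}
Click "Extract"
Export as CSV/JSON

Example: Extracting Prices

Pattern: ¥([0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?)
Captures: ¥1,234.56, ¥99.99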
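
The price pattern can also be used from code. As a sketch in plain Python (independent of the extension; the sample text is an assumption for illustration), the captured strings are converted to Decimal values after the thousands separators are removed:

```python
import re
from decimal import Decimal

PRICE = re.compile(r"¥([0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?)")

# Sample page text (an assumption for illustration)
text = "Laptop ¥1,234.56, cable ¥99.99, shipping ¥8"

# Group 1 holds the amount without the currency sign
raw = PRICE.findall(text)

# Strip thousands separators before converting to exact decimals
amounts = [Decimal(p.replace(",", "")) for p in raw]

print(raw)      # ['1,234.56', '99.99', '8']
print(amounts)  # [Decimal('1234.56'), Decimal('99.99'), Decimal('8')]
```

Decimal is used rather than float so that monetary amounts keep their exact value.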

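Pro Tips

Save the patterns you use often
Use named groups to get structured data
Test a pattern before relying on it
Export the results for data analysis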

