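PowerShell Tips Series - Advanced Regular Expression Techniques

Applies to PowerShell 5.1 and later

Regular expressions are the Swiss Army knife of text processing. PowerShell offers rich regex support through the -match and -replace operators and the [regex] class. In day-to-day operations work, log parsing, data extraction, configuration validation, and file renaming all rely on regular expressions. With the advanced techniques in hand, text operations that would otherwise take several steps can be condensed into a single expression.

This article covers named capture groups, zero-width assertions, regex compilation for performance, and practical text-processing patterns.

Named Capture Groups and Match Results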

# Extract structured data with named capture groups
$logLine = '2025-08-27 14:30:15 [ERROR] [SRV01] Connection timeout to db.prod.local:5432'

$pattern = '^(?<Date>\d{4}-\d{2}-\d{2})\s+(?<Time>\d{2}:\d{2}:\d{2})\s+\[(?<Level>\w+)\]\s+\[(?<Server>\w+)\]\s+(?<Message>.+)$'

if ($logLine -match $pattern) {
    # Each named group is a key of the automatic $Matches hashtable
    $Matches.Date
    $Matches.Time
    $Matches.Level
    $Matches.Server
    $Matches.Message

    # Convert to a custom object
    $logEntry = [PSCustomObject]@{
        Date    = $Matches.Date
        Time    = $Matches.Time
        Level   = $Matches.Level
        Server  = $Matches.Server
        Message = $Matches.Message
    }
    $logEntry | Format-List
}

# Parse a batch of log lines
$logContent = @'
2025-08-27 14:30:15 [ERROR] [SRV01] Connection timeout to db.prod.local:5432
2025-08-27 14:31:02 [WARN] [SRV02] Disk usage at 85%
2025-08-27 14:32:10 [INFO] [SRV01] Backup completed successfully
2025-08-27 14:33:45 [ERROR] [SRV03] Service IIS crashed
2025-08-27 14:35:00 [INFO] [SRV02] User login: admin
'@

$pattern = '^(?<Date>\S+)\s+(?<Time>\S+)\s+\[(?<Level>\w+)\]\s+\[(?<Server>\w+)\]\s+(?<Message>.+)$'

# Split on LF or CRLF so the here-string parses the same way on every platform
$entries = $logContent -split '\r?\n' | ForEach-Object {
    if ($_ -match $pattern) {
        [PSCustomObject]@{
            Date    = $Matches.Date
            Time    = $Matches.Time
            Level   = $Matches.Level
            Server  = $Matches.Server
            Message = $Matches.Message.Trim()
        }
    }
}

$entries | Where-Object { $_.Level -eq 'ERROR' } | Format-Table -AutoSize

Sample output:

2025-08-27
14:30:15
ERROR
SRV01
Connection timeout to db.prod.local:5432

Date    : 2025-08-27
Time    : 14:30:15
Level   : ERROR
Server  : SRV01
Message : Connection timeout to db.prod.local:5432

Date       Time     Level Server Message
----       ----     ----- ------ -------
2025-08-27 14:30:15 ERROR SRV01  Connection timeout to db.prod.local:5432
2025-08-27 14:33:45 ERROR SRV03  Service IIS crashed
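
When the whole log is already in memory as a single string, splitting it into lines first is optional. The following is a minimal sketch, assuming the same log layout as above; the sample text and variable names are illustrative. The inline (?m) option makes ^ match at the start of every line, and [regex]::Matches returns all entries in one call.

```powershell
# Sketch: parse a multi-line string in one call instead of splitting it first.
# The sample text and variable names are illustrative.
$text = "2025-08-27 14:30:15 [ERROR] [SRV01] Connection timeout`n" +
        "2025-08-27 14:31:02 [WARN] [SRV02] Disk usage at 85%`n" +
        "2025-08-27 14:33:45 [ERROR] [SRV03] Service IIS crashed"

# (?m) lets ^ match at every line start; [^\r\n]+ keeps Message on one line
$linePattern = '(?m)^(?<Date>\S+)\s+(?<Time>\S+)\s+\[(?<Level>\w+)\]\s+\[(?<Server>\w+)\]\s+(?<Message>[^\r\n]+)'

$parsed = foreach ($m in [regex]::Matches($text, $linePattern)) {
    [PSCustomObject]@{
        Level   = $m.Groups['Level'].Value
        Server  = $m.Groups['Server'].Value
        Message = $m.Groups['Message'].Value
    }
}

# Collect the servers that logged an error
$errorServers = @($parsed | Where-Object Level -eq 'ERROR' | ForEach-Object Server)
$errorServers -join ','    # SRV01,SRV03
```

Unlike -match, which stops at the first hit, [regex]::Matches yields one Match object per entry, so the named groups are read through Groups['Name'] rather than through $Matches.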

Zero-Width Assertions and Precise Extraction

# Positive lookahead (?=...): matches a position that is followed by the given content
# Extract the domain from a URL: the host must be followed by a slash
$url = 'https://blog.vichamp.com/2025/08/powershell-tips/'
if ($url -match 'https?://(?<Domain>[^/]+)(?=/)') {
    Write-Host "Domain: $($Matches.Domain)"
}

# Positive lookbehind (?<=...): matches a position that is preceded by the given content
# Extract key/value pairs from a JSON string
$jsonText = '{"name":"MyApp","version":"3.2.1","port":8080}'
$keyValuePairs = [regex]::Matches($jsonText, '(?<=")(\w+)":\s*"?([^",{}]+)"?')
foreach ($match in $keyValuePairs) {
    Write-Host "Key: $($match.Groups[1].Value) Value: $($match.Groups[2].Value)"
}

# Negative lookahead (?!...): match lines that do not start with a comment marker
$lines = @(
    '# This is a comment'
    'server = prod-db01'
    '# Another comment'
    'port = 5432'
    'timeout = 30'
)

# Configuration lines that are not comments
$configPattern = '^(?!\s*#)(?<Key>\w+)\s*=\s*(?<Value>.+)$'
foreach ($line in $lines) {
    if ($line -match $configPattern) {
        Write-Host "$($Matches.Key) => $($Matches.Value.Trim())"
    }
}

# Use the [regex] class for precise replacement
$template = 'Hello {name}, your order {orderId} has been shipped to {city}.'

$replacements = @{
    name    = '张三'
    orderId = 'ORD-20250827-001'
    city    = '北京'
}

# Replace dynamically with a MatchEvaluator (the script block is converted to the delegate)
$result = [regex]::Replace($template, '\{(\w+)\}', {
    param($match)
    $key = $match.Groups[1].Value
    if ($replacements.ContainsKey($key)) {
        $replacements[$key]
    } else {
        # Leave unknown placeholders untouched
        $match.Value
    }
})

Write-Host $result

Sample output:

Domain: blog.vichamp.com
Key: name Value: MyApp
Key: version Value: 3.2.1
Key: port Value: 8080
server => prod-db01
port => 5432
timeout => 30
Hello 张三, your order ORD-20250827-001 has been shipped to 北京.
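
Lookarounds are most useful when the delimiter should be checked but not returned. The following is a small sketch with made-up sample strings; it combines a lookbehind with a negative lookahead, and then uses a negative lookbehind to skip numbers that follow a dollar sign.

```powershell
# Sketch: lookarounds check the surrounding text without consuming it.
# The sample strings are illustrative.
$endpoint = 'Connection timeout to db.prod.local:5432'

# (?<=:) requires a preceding colon; (?!\d) requires that no digit follows
$port = [regex]::Match($endpoint, '(?<=:)\d+(?!\d)').Value    # 5432

# Negative lookbehind: numbers NOT preceded by a dollar sign or another digit
$line = 'Items: 3, price $25, discount 10'
$plainNumbers = [regex]::Matches($line, '(?<![$\d])\d+') | ForEach-Object Value

$plainNumbers -join ','    # 3,10
```

The \d inside the lookbehind class matters: without it, the engine would skip the 2 in $25 but still match the 5 that follows it.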

Regex Compilation and Performance Optimization

# Compile regular expressions to speed up repeated matching
$patterns = @{
    IPv4     = '^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$'
    Email    = '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    DateTime = '^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?(\.\d+)?$'
    URL      = '^https?://[^\s/$.?#].[^\s]*$'
    GUID     = '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Precompile every pattern once
$compiled = @{}
foreach ($entry in $patterns.GetEnumerator()) {
    $compiled[$entry.Key] = [regex]::new($entry.Value, 'Compiled, IgnoreCase')
}

# Validation function
function Test-Format {
    param(
        [Parameter(Mandatory)]
        [string]$Value,

        [Parameter(Mandatory)]
        [ValidateSet('IPv4', 'Email', 'DateTime', 'URL', 'GUID')]
        [string]$Format
    )

    # Look up the precompiled regex built above
    $regex = $compiled[$Format]
    return $regex.IsMatch($Value)
}

# Batch validation test
$testCases = @(
    @{ Value = '192.168.1.100'; Format = 'IPv4' }
    @{ Value = '999.999.999.999'; Format = 'IPv4' }
    @{ Value = 'admin@example.com'; Format = 'Email' }
    @{ Value = 'not-an-email'; Format = 'Email' }
    @{ Value = '2025-08-27T14:30:00'; Format = 'DateTime' }
    @{ Value = 'https://blog.vichamp.com'; Format = 'URL' }
    @{ Value = 'a3e1f2b4-5c6d-7e8f-9a0b-1c2d3e4f5a6b'; Format = 'GUID' }
    @{ Value = 'not-a-guid'; Format = 'GUID' }
)

foreach ($case in $testCases) {
    $isValid = Test-Format -Value $case.Value -Format $case.Format
    $status = if ($isValid) { 'valid' } else { 'invalid' }
    Write-Host ("{0,-40} [{1}] {2}" -f $case.Value, $case.Format, $status)
}

# Performance comparison: compiled vs. non-compiled
$sampleText = "User admin (admin@example.com) logged in from 192.168.1.100 at 2025-08-27T14:30:00"
$emailPattern = '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Non-compiled: the -match operator (which also fills $Matches on every hit)
$sw = [System.Diagnostics.Stopwatch]::StartNew()
for ($i = 0; $i -lt 10000; $i++) {
    $sampleText -match $emailPattern | Out-Null
}
$sw.Stop()
Write-Host "`nNon-compiled, 10000 iterations: $($sw.ElapsedMilliseconds) ms"

# Compiled: one [regex] instance reused for every iteration
$compiledRegex = [regex]::new($emailPattern, 'Compiled')
$sw.Restart()
for ($i = 0; $i -lt 10000; $i++) {
    $compiledRegex.IsMatch($sampleText) | Out-Null
}
$sw.Stop()
Write-Host "Compiled, 10000 iterations: $($sw.ElapsedMilliseconds) ms"

Sample output (timings vary by machine and PowerShell version):

192.168.1.100                            [IPv4] valid
999.999.999.999                          [IPv4] invalid
admin@example.com                        [Email] valid
not-an-email                             [Email] invalid
2025-08-27T14:30:00                      [DateTime] valid
https://blog.vichamp.com                 [URL] valid
a3e1f2b4-5c6d-7e8f-9a0b-1c2d3e4f5a6b     [GUID] valid
not-a-guid                               [GUID] invalid

Non-compiled, 10000 iterations: 128 ms
Compiled, 10000 iterations: 34 ms
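
Compilation speeds up a well-behaved pattern, but it does not protect against a pattern that backtracks heavily on unexpected input. The .NET Regex constructor accepts a match timeout for that case. The following is a minimal sketch: the 200 ms budget, the sample pattern, and the Test-Guarded helper are assumptions made for illustration, not recommendations.

```powershell
# Sketch: bound the time a single match may take.
# The 200 ms budget, the pattern, and Test-Guarded are illustrative assumptions.
$timeout = [timespan]::FromMilliseconds(200)
$options = [System.Text.RegularExpressions.RegexOptions]::Compiled
$guarded = [regex]::new('^(\w+\s?)+$', $options, $timeout)

function Test-Guarded {
    param([string]$Text)
    try {
        $guarded.IsMatch($Text)
    }
    catch {
        # PowerShell wraps .NET exceptions; the timeout is the innermost one
        if ($_.Exception.GetBaseException() -is [System.Text.RegularExpressions.RegexMatchTimeoutException]) {
            Write-Warning "Match exceeded $($timeout.TotalMilliseconds) ms; treating input as non-matching"
            $false
        }
        else { throw }
    }
}

Test-Guarded -Text 'backup completed successfully'    # True
```

Whether a given input actually trips the timeout depends on the pattern and the .NET version, so the catch branch is a safety net rather than something to rely on for control flow.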

Practical Text-Processing Patterns

# 1. Extract quoted CSV fields (handles embedded commas)
$csvLine = '张三,"Engineering,Senior Engineer",北京,100001'
$fields = [regex]::Matches($csvLine, '(?<=^|,)(?:"(?<q>[^"]*)"|(?<n>[^,]*))')

# Group q holds a quoted field, group n an unquoted one
$values = foreach ($f in $fields) {
    if ($f.Groups['q'].Success) { $f.Groups['q'].Value }
    else { $f.Groups['n'].Value }
}
Write-Host "CSV fields: $($values -join ' | ')"

# 2. Clean up redundant whitespace: collapse runs, then trim both ends
$messy = ' Hello World this is a test '
$clean = $messy -replace '\s+', ' ' -replace '^\s+|\s+$', ''
Write-Host "Before: [$messy]"
Write-Host "After: [$clean]"

# 3. Make file names safe
$unsafeNames = @(
    'Report: Q3/2025 <Final>.xlsx'
    'Notes (draft #2).txt'
    'Data | Backup & Archive.csv'
    '配置文件 - 生产环境.json'    # non-ASCII names pass through unchanged
)

$safeNames = $unsafeNames | ForEach-Object {
    # Replace the characters Windows does not allow in file names
    $safe = $_ -replace '[\\/:*?"<>|]', '_'
    $safe = $safe -replace '\s+', ' '
    $safe = $safe.Trim()
    [PSCustomObject]@{
        Original = $_
        Safe     = $safe
    }
}
$safeNames | Format-Table -AutoSize

# 4. Batch-rename files (based on regex extraction)
function Invoke-RegexRename {
    param(
        [string]$Path = ".",
        [string]$Pattern,
        [string]$Replace,
        [switch]$WhatIf
    )

    $files = Get-ChildItem $Path -File
    $renamed = 0

    foreach ($file in $files) {
        $newName = $file.Name -replace $Pattern, $Replace
        if ($newName -ne $file.Name) {
            Write-Host "$($file.Name) -> $newName"
            if (-not $WhatIf) {
                # -LiteralPath avoids wildcard expansion of names containing [ or ]
                Rename-Item -LiteralPath $file.FullName -NewName $newName
            }
            $renamed++
        }
    }

    Write-Host "`nProcessed $renamed file(s)" -ForegroundColor Green
}

# Example: rename IMG_20250827_143015.jpg to 2025-08-27_14-30-15.jpg
# Invoke-RegexRename -Pattern 'IMG_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})' `
#     -Replace '$1-$2-$3_$4-$5-$6' -WhatIf

# 5. Filter log lines by time range
function Select-LogByTimeRange {
    param(
        [string]$LogPath,
        [datetime]$Start,
        [datetime]$End,
        [string]$TimePattern = '(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})'
    )

    $regex = [regex]::new($TimePattern, 'Compiled')
    $lines = Get-Content $LogPath

    # @() keeps .Count reliable when zero or one line matches
    $filtered = @(foreach ($line in $lines) {
        $match = $regex.Match($line)
        if ($match.Success) {
            # The pattern allows several spaces between date and time; ParseExact expects one
            $stamp = $match.Groups[1].Value -replace '\s+', ' '
            $timestamp = [datetime]::ParseExact($stamp, 'yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
            if ($timestamp -ge $Start -and $timestamp -le $End) {
                $line
            }
        }
    })

    Write-Host "Time range $Start ~ ${End}: $($filtered.Count) line(s)" -ForegroundColor Cyan
    return $filtered
}

Sample output:

CSV fields: 张三 | Engineering,Senior Engineer | 北京 | 100001
Before: [ Hello World this is a test ]
After: [Hello World this is a test]

Original                     Safe
--------                     ----
Report: Q3/2025 <Final>.xlsx Report_ Q3_2025 _Final_.xlsx
Notes (draft #2).txt         Notes (draft #2).txt
Data | Backup & Archive.csv  Data _ Backup & Archive.csv
配置文件 - 生产环境.json     配置文件 - 生产环境.json

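Notes

  1. Greedy vs. lazy: the default quantifiers (* and +) are greedy and match as many characters as possible; use *? and +? to switch to lazy matching.
  2. Performance traps: complex nested quantifiers can cause catastrophic backtracking. For large inputs, constrain the pattern (for example with (?=...) assertions) and precompile regexes that are reused.
  3. Escaping: PowerShell strings use the backtick, not the backslash, as the escape character, so \d never needs to be doubled. Prefer single-quoted strings for patterns so that PowerShell does not interpret $ or the backtick.
  4. Unicode support: in .NET, \w matches Unicode word characters (including Chinese); to match ASCII only, use [a-zA-Z0-9_].
  5. RegexOptions: the most common are IgnoreCase (case-insensitive matching), Multiline (^ and $ match at the start and end of each line), and Singleline (. also matches newlines).
  6. Testing tools: regex101.com is recommended for testing regular expressions online; it supports .NET-flavor syntax highlighting.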
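
Several of these notes are easier to see side by side. The following is a short sketch with made-up inputs; the Chinese sample text is built from character codes only so that the snippet does not depend on the script file's encoding.

```powershell
# Sketch: greedy vs. lazy quantifiers, Unicode \w, and the Multiline option.
# All sample inputs are illustrative.
$html = '<b>first</b> and <b>second</b>'
$greedy = [regex]::Match($html, '<b>.*</b>').Value     # spans both tags
$lazy   = [regex]::Match($html, '<b>.*?</b>').Value    # stops at the first </b>

# Two CJK characters (U+4E2D, U+6587) followed by ASCII letters
$mixed = "$([char]0x4E2D)$([char]0x6587)abc"
$unicodeWord = [regex]::Match($mixed, '\w+').Value             # all five characters
$asciiWord   = [regex]::Match($mixed, '[a-zA-Z0-9_]+').Value   # abc

# Multiline: ^ matches at the start of every line, not only at the start of the string
$withoutOption = [regex]::Matches("a1`nb2`nc3", '^\w').Count               # 1
$withMultiline = [regex]::Matches("a1`nb2`nc3", '^\w', 'Multiline').Count  # 3
```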

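PowerShell Tips Series - Regular Expressions in Depth

Applies to PowerShell 5.1 and later

Regular expressions are the ultimate tool for text processing: from log analysis and data extraction to input validation, nearly every text-processing scenario relies on them. PowerShell is built on the .NET regex engine and supports the full syntax, including advanced features such as named capture groups, zero-width assertions, and balancing groups. Mastering regular expressions can greatly reduce the amount of text-processing code, compressing dozens of lines of string manipulation into a single pattern match.

This article takes a closer look at regular expressions in PowerShell, including common patterns, advanced features, and performance optimization.

Basic Pattern Matching

The main ways to use regular expressions in PowerShell are the -match operator, the -replace operator, and the Select-String cmdlet: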

# The -match operator (fills the automatic $Matches variable)
$text = "User admin logged in at 2025-06-10 08:30:15"
if ($text -match 'User (\w+) logged in at (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})') {
    Write-Host "Username: $($Matches[1])"
    Write-Host "Date: $($Matches[2])"
    Write-Host "Time: $($Matches[3])"
}

# The -replace operator (replaces the matched text)
$phone = "Phone: 13812345678"
$masked = $phone -replace '(\d{3})\d{4}(\d{4})', '$1****$2'
Write-Host "Masked: $masked"

# Batch data cleaning
$logs = @(
    "192.168.1.100 - - [10/Jun/2025:08:30:15 +0800] GET /api/users 200 1234"
    "10.0.0.55 - - [10/Jun/2025:08:30:22 +0800] POST /api/login 401 56"
    "172.16.0.10 - - [10/Jun/2025:08:31:01 +0800] GET /static/css 304 0"
)

$pattern = '^(\S+)\s.*\[(.+?)\]\s+(\w+)\s+(\S+)\s+(\d+)\s+(\d+)'
# A foreach statement cannot be piped, so ForEach-Object is used here
$logs | ForEach-Object {
    if ($_ -match $pattern) {
        [PSCustomObject]@{
            IP     = $Matches[1]
            Time   = $Matches[2]
            Method = $Matches[3]
            Path   = $Matches[4]
            Status = $Matches[5]
            Size   = $Matches[6]
        }
    }
} | Format-Table -AutoSize

Sample output:

Username: admin
Date: 2025-06-10
Time: 08:30:15

Masked: Phone: 138****5678

IP            Time                       Method Path        Status Size
--            ----                       ------ ----        ------ ----
192.168.1.100 10/Jun/2025:08:30:15 +0800 GET    /api/users  200    1234
10.0.0.55     10/Jun/2025:08:30:22 +0800 POST   /api/login  401    56
172.16.0.10   10/Jun/2025:08:31:01 +0800 GET    /static/css 304    0
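
Select-String is listed above as the third way to apply a regex but is not shown in the example. The following is a brief sketch with illustrative in-memory lines; against files, the same cmdlet is typically used with its -Path parameter.

```powershell
# Sketch: Select-String applies a regex to each input line.
# The sample lines are illustrative.
$sample = @(
    'GET /api/users 200'
    'POST /api/login 401'
    'GET /static/css 304'
)

# Each result is a MatchInfo object; .Matches holds the underlying [regex] Match objects
$failed = $sample | Select-String -Pattern '\s(?<Status>4\d{2})$' | ForEach-Object {
    [PSCustomObject]@{
        Line   = $_.Line
        Status = $_.Matches[0].Groups['Status'].Value
    }
}

# -AllMatches returns every hit on a line instead of only the first
$numbers = ('a1 b22 c333' | Select-String -Pattern '\d+' -AllMatches).Matches.Value
$numbers -join ','    # 1,22,333
```

Unlike -match, Select-String does not touch $Matches; everything needed is carried on the MatchInfo object it emits, which makes it convenient inside pipelines.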

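Named Capture Groups

Named capture groups make a regular expression more readable, because match results are referenced by name rather than by number: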

# Use named capture groups: (?<name>pattern)
$logLine = '2025-06-10 08:30:15 ERROR [Database] Connection timeout after 30s'
$pattern = '(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?<level>\w+)\s+\[(?<module>\w+)\]\s+(?<message>.+)'

if ($logLine -match $pattern) {
    Write-Host "Time: $($Matches.timestamp)"
    Write-Host "Level: $($Matches.level)"
    Write-Host "Module: $($Matches.module)"
    Write-Host "Message: $($Matches.message)"
}

# Use the [regex] class for finer control
$regex = [regex]::new($pattern)
$match = $regex.Match($logLine)

if ($match.Success) {
    foreach ($name in @('timestamp', 'level', 'module', 'message')) {
        Write-Host "$name = $($match.Groups[$name].Value)"
    }
}

# Extract structured data from a log file in bulk
# (C:\Logs\app.log is an example path; the table in the output below comes from such a file)
$structuredLogs = Get-Content "C:\Logs\app.log" -Tail 100 | ForEach-Object {
    if ($_ -match $pattern) {
        [PSCustomObject]@{
            Timestamp = $Matches.timestamp
            Level     = $Matches.level
            Module    = $Matches.module
            Message   = $Matches.message
        }
    }
}

$structuredLogs | Where-Object { $_.Level -eq 'ERROR' } |
    Format-Table -AutoSize

Sample output:

Time: 2025-06-10 08:30:15
Level: ERROR
Module: Database
Message: Connection timeout after 30s
timestamp = 2025-06-10 08:30:15
level = ERROR
module = Database
message = Connection timeout after 30s

Timestamp           Level Module   Message
---------           ----- ------   -------
2025-06-10 08:30:15 ERROR Database Connection timeout after 30s
2025-06-10 08:35:22 ERROR API      Upstream timeout (504)

A Library of Common Regex Patterns

# Common validation patterns
$patterns = @{
    IPv4     = '^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$'
    Email    = '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    URL      = '^https?://[\w\-]+(\.[\w\-]+)+[/#?]?.*$'
    GUID     = '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
    Chinese  = '[\u4e00-\u9fff]+'      # CJK Unified Ideographs, U+4E00 to U+9FFF
    Phone    = '^1[3-9]\d{9}$'         # mainland China mobile number
    DateISO  = '^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$'
    HexColor = '^#(?:[0-9a-fA-F]{3}){1,2}$'
    SemVer   = '^\d+\.\d+\.\d+(?:-[a-zA-Z0-9.]+)?(?:\+[a-zA-Z0-9.]+)?$'
    MAC      = '^[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}$'
}

# Validation function
function Test-Pattern {
    param([string]$InputObject, [string]$Pattern)

    # Look up the pattern by name in the table above
    $regex = $patterns[$Pattern]
    if (-not $regex) {
        Write-Error "Unknown pattern: $Pattern"
        return $false
    }
    return $InputObject -match $regex
}

# Tests
Test-Pattern -InputObject "192.168.1.100" -Pattern "IPv4"       # True
Test-Pattern -InputObject "not-an-ip" -Pattern "IPv4"           # False
Test-Pattern -InputObject "admin@contoso.com" -Pattern "Email"  # True
Test-Pattern -InputObject "1.2.3" -Pattern "SemVer"             # True

Sample output:

True
False
True
True
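
For parameters of your own functions, the same kind of pattern can be enforced at binding time with the built-in ValidatePattern attribute, so no explicit check is needed in the body. The following is a minimal sketch; Get-VersionLabel is a hypothetical function used only for illustration.

```powershell
# Sketch: validate a parameter against a pattern at binding time.
# Get-VersionLabel is a hypothetical function used only for illustration.
function Get-VersionLabel {
    param(
        [Parameter(Mandatory)]
        [ValidatePattern('^\d+\.\d+\.\d+$')]
        [string]$Version
    )
    "v$Version"
}

Get-VersionLabel -Version '3.2.1'    # v3.2.1

# An invalid value is rejected before the function body runs
try {
    Get-VersionLabel -Version 'three' -ErrorAction Stop
    $rejected = $false
}
catch {
    $rejected = $true
}
```

The trade-off is that the error message comes from the parameter binder rather than from your own code, so a lookup table like the one above remains the better fit when the pattern is chosen at run time.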

Advanced Replacement Techniques

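# Conditional replacement with a script block
# ([regex]::Replace runs on PowerShell 5.1; -replace accepts a script block only from PowerShell 6.1)
$text = "Order ORD-2025-001, amount ¥1234.56, customer 张三"
$result = [regex]::Replace($text, '(\d{4})', {
    param($match)
    $year = $match.Groups[1].Value
    # Only four-digit numbers above 2020 are bracketed, so 1234 is left alone
    if ([int]$year -gt 2020) { "[$year]" } else { $year }
})
Write-Host $result

# Complex replacement with a MatchEvaluator
$names = "john doe, JANE smith, bob wilson"
$capitalized = [regex]::Replace($names, '\b(\w)(\w+)', {
    param($m)
    $m.Groups[1].Value.ToUpper() + $m.Groups[2].Value.ToLower()
})
Write-Host $capitalized

# Template variable substitution
$template = "Dear {{name}}, your order {{orderId}} has shipped and should arrive by {{date}}."
$data = @{ name = '张三'; orderId = 'ORD-2025-001'; date = '2025-06-12' }

$result = [regex]::Replace($template, '\{\{(\w+)\}\}', {
    param($m)
    $key = $m.Groups[1].Value
    # The ?? operator requires PowerShell 7; this form also runs on 5.1
    if ($data.ContainsKey($key)) { $data[$key] } else { "{{$key}}" }
})
Write-Host $result

Sample output:

Order ORD-[2025]-001, amount ¥1234.56, customer 张三
John Doe, Jane Smith, Bob Wilson
Dear 张三, your order ORD-2025-001 has shipped and should arrive by 2025-06-12.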
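
When the replacement only rearranges captured text, a script block is not needed: the replacement string can reference groups directly. The following is a small sketch with illustrative inputs, using the .NET substitution syntax ${name} for named groups and $0 for the whole match.

```powershell
# Sketch: reference groups from the replacement string.
# Single quotes keep PowerShell from expanding ${d} as a variable.
$isoDate = '2025-06-10'
$reordered = $isoDate -replace '(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})', '${d}/${m}/${y}'
$reordered    # 10/06/2025

# $0 stands for the entire match
$quoted = 'error warning info' -replace '\w+', '[$0]'
$quoted       # [error] [warning] [info]
```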

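Notes

  1. Performance first: when processing large volumes of text, a precompiled regex, [regex]::new($pattern, 'Compiled'), can improve performance.
  2. Greedy vs. lazy: matching is greedy by default (.* matches as much as possible); use .*? for lazy matching.
  3. Escape special characters: use [regex]::Escape($text) to escape regex metacharacters so that user input cannot break the pattern.
  4. Multiline mode: when processing multi-line text, use the (?m) flag so that ^ and $ match at the start and end of each line.
  5. Unicode support: PowerShell regexes support Unicode categories; for example, \p{L} matches a letter in any language.
  6. Testing tools: regex101.com is recommended for testing regular expressions online; it supports live matching and explanations.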
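
Notes 3 to 5 can be tried out in a few lines. The following is a short sketch; the inputs are invented for illustration.

```powershell
# Sketch: escaping user input, Unicode categories, and the (?m) flag.
# All inputs are illustrative.
$userInput = 'file(1).txt'
$escaped = [regex]::Escape($userInput)    # file\(1\)\.txt

# Unescaped, the parentheses form a group and the dot matches any character
$accidental = 'file1xtxt' -match $userInput               # True (unintended)
$literal    = 'backup of file(1).txt' -match $escaped     # True
$rejected   = 'file1xtxt' -match $escaped                 # False

# \p{L} matches letters in any script; digits and spaces are skipped
$letters = [regex]::Matches('PowerShell 7 rocks', '\p{L}+').Value -join ','    # PowerShell,rocks

# (?m) makes ^ match at the start of each line
$keys = [regex]::Matches("x=1`ny=2", '(?m)^\w+(?==)').Value -join ','          # x,y
```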