Applies to PowerShell 5.1 and later.
PowerShell's convenience often comes at the cost of performance: passing objects through the pipeline, flexible type conversion, and rich .NET integration are all very handy on small data sets, but with large amounts of data (tens of thousands of CSV rows, thousands of files, hundreds of servers) the bottlenecks become glaring. Understanding PowerShell's performance characteristics and mastering a few optimization techniques can cut a script's run time from hours to seconds.
This article covers common performance traps, optimization techniques, memory management strategies, and how to measure and compare script performance.
Measuring Performance
Measure before you optimize. PowerShell provides several tools for measuring performance:
```powershell
# Measure-Command: time an entire script block
Measure-Command {
    Get-ChildItem C:\ -Recurse -File | Where-Object { $_.Extension -eq '.log' }
} | Select-Object TotalSeconds, TotalMilliseconds

# Stopwatch: finer-grained timing under your control
$sw = [System.Diagnostics.Stopwatch]::StartNew()
1..10000 | ForEach-Object { $_ * 2 }
$sw.Stop()
Write-Host "Elapsed: $($sw.Elapsed.TotalMilliseconds) ms"

# Compare several approaches to the same work
$iterations = 10000

$results = @(
    @{
        Name = 'ForEach-Object (pipeline)'
        Time = (Measure-Command { 1..$iterations | ForEach-Object { $_ * 2 } }).TotalMilliseconds
    }
    @{
        Name = 'foreach statement'
        Time = (Measure-Command { foreach ($i in 1..$iterations) { $i * 2 } }).TotalMilliseconds
    }
    @{
        Name = 'LINQ'
        Time = (Measure-Command { [System.Linq.Enumerable]::Range(1, $iterations) | ForEach-Object { $_ * 2 } }).TotalMilliseconds
    }
)

$results | ForEach-Object {
    [PSCustomObject]@{
        Method = $_.Name
        TimeMs = [math]::Round($_.Time, 2)
    }
} | Sort-Object TimeMs | Format-Table -AutoSize
```
Sample output:
```
TotalSeconds TotalMilliseconds
------------ -----------------
        2.34           2345.67

Elapsed: 15.23 ms

Method                    TimeMs
------                    ------
foreach statement          12.34
LINQ                       45.67
ForEach-Object (pipeline) 234.56
```
Note: the foreach statement is typically 10-20x faster than ForEach-Object, because the latter runs every item through full pipeline processing. Prefer the foreach statement in performance-sensitive code.
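For collections that are already in memory, the .ForEach() intrinsic method (available since PowerShell 4) is a middle ground worth knowing: it avoids the per-item pipeline overhead while keeping a compact syntax. A minimal sketch:

```powershell
# .ForEach() intrinsic method: faster than the ForEach-Object cmdlet,
# though usually still slower than a plain foreach statement
$doubled = (1..10000).ForEach({ $_ * 2 })
$doubled.Count   # 10000
```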
String Concatenation Optimization
String operations are one of the most common performance traps in PowerShell:
```powershell
# Slow: += allocates a brand-new string on every iteration
Measure-Command {
    $result = ""
    1..10000 | ForEach-Object { $result += "Line $_`n" }
} | Select-Object TotalMilliseconds

# Fast: StringBuilder appends in place
Measure-Command {
    $sb = [System.Text.StringBuilder]::new()
    1..10000 | ForEach-Object { [void]$sb.AppendLine("Line $_") }
    $result = $sb.ToString()
} | Select-Object TotalMilliseconds

# Fast: collect into an array, then -join once
Measure-Command {
    $lines = 1..10000 | ForEach-Object { "Line $_" }
    $result = $lines -join "`n"
} | Select-Object TotalMilliseconds

# Join-String (PowerShell 6.2+/7; not available in Windows PowerShell 5.1)
Measure-Command {
    $result = 1..10000 | ForEach-Object { "Line $_" } | Join-String -Separator "`n"
} | Select-Object TotalMilliseconds
```
Sample output:
```
TotalMilliseconds
-----------------
          2345.67   # +=
            45.23   # StringBuilder
            38.45   # array + -join
            32.12   # Join-String
```
Optimizing Collection Operations
When processing large amounts of data, the choice of collection is critical:
```powershell
# Slow: array += copies the entire array on every append
Measure-Command {
    $array = @()
    1..5000 | ForEach-Object { $array += "Item-$_" }
} | Select-Object TotalMilliseconds

# Fast: List<T> grows in place
Measure-Command {
    $list = [System.Collections.Generic.List[string]]::new()
    1..5000 | ForEach-Object { $list.Add("Item-$_") }
    $result = $list.ToArray()
} | Select-Object TotalMilliseconds

# Lookups: linear scan vs. hashtable
$data = 1..10000 | ForEach-Object { @{ Id = $_; Name = "Item-$_" } }

# Linear scan with Where-Object
Measure-Command {
    $data | Where-Object { $_.Id -eq 5000 }
} | Select-Object TotalMilliseconds

# Build a hashtable index once, then each lookup is O(1)
$lookup = @{}
$data | ForEach-Object { $lookup[$_.Id] = $_ }

Measure-Command { $lookup[5000] } | Select-Object TotalMilliseconds
```
Sample output:
```
TotalMilliseconds
-----------------
          1234.56   # array +=
            12.34   # List<T>

# Where-Object filter
TotalMilliseconds: 45.67

# hashtable lookup
TotalMilliseconds: 0.12
```
Optimizing File Processing
When processing lots of file data, I/O is usually the biggest bottleneck:
```powershell
# Slow: read line by line and accumulate with +=
Measure-Command {
    $lines = @()
    Get-Content "C:\Logs\large-app.log" | ForEach-Object {
        if ($_ -match 'ERROR') { $lines += $_ }
    }
} | Select-Object TotalMilliseconds

# Faster: Select-String is purpose-built for pattern matching
Measure-Command {
    $lines = Select-String -Path "C:\Logs\large-app.log" -Pattern 'ERROR'
} | Select-Object TotalMilliseconds

# Faster: read in batches with -ReadCount
Measure-Command {
    $lines = Get-Content "C:\Logs\large-app.log" -ReadCount 5000 | ForEach-Object {
        $_ | Where-Object { $_ -match 'ERROR' }
    }
} | Select-Object TotalMilliseconds

# Fastest: StreamReader reads one line at a time with minimal overhead
Measure-Command {
    $reader = [System.IO.StreamReader]::new("C:\Logs\large-app.log")
    $errors = while (-not $reader.EndOfStream) {
        $line = $reader.ReadLine()
        if ($line -match 'ERROR') { $line }
    }
    $reader.Close()
} | Select-Object TotalMilliseconds

# CSV: Import-Csv is convenient but slow on very large files
Measure-Command {
    Import-Csv "C:\Data\large.csv" | Where-Object { $_.Status -eq 'Active' }
} | Select-Object TotalMilliseconds

# Manual parsing with StreamReader (assumes Status is the 4th column
# and no quoted fields contain commas)
Measure-Command {
    $reader = [System.IO.StreamReader]::new("C:\Data\large.csv")
    $header = $reader.ReadLine() -split ','
    $active = while (-not $reader.EndOfStream) {
        $values = $reader.ReadLine() -split ','
        if ($values[3] -eq 'Active') {
            $obj = [ordered]@{}
            for ($i = 0; $i -lt $header.Count; $i++) {
                $obj[$header[$i]] = $values[$i]
            }
            [PSCustomObject]$obj
        }
    }
    $reader.Close()
} | Select-Object TotalMilliseconds
```
Sample output:
```
TotalMilliseconds
-----------------
          5678.90   # line-by-line +=
           123.45   # Select-String
            89.23   # -ReadCount batching
            34.56   # StreamReader
```
Memory Management
When processing large data sets, memory management matters just as much:
```powershell
# Inspect the current process's memory usage
$proc = Get-Process -Id $PID
Write-Host "Working set: $([math]::Round($proc.WorkingSet64/1MB, 2)) MB"
Write-Host "Private memory: $([math]::Round($proc.PrivateMemorySize64/1MB, 2)) MB"

# Managed heap size before collection
Write-Host "Before GC: $([math]::Round([System.GC]::GetTotalMemory($false)/1MB, 2)) MB"

# Force a full garbage collection
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
[System.GC]::Collect()

Write-Host "After GC: $([math]::Round([System.GC]::GetTotalMemory($true)/1MB, 2)) MB"

# Pre-size collections when the count is known, to avoid repeated resizing
$list = [System.Collections.Generic.List[PSObject]]::new(10000)

# PowerShell has no C#-style using statement; use try/finally
# to guarantee the reader is disposed
Measure-Command {
    $reader = [System.IO.StreamReader]::new("C:\Logs\large-app.log")
    try {
        while (-not $reader.EndOfStream) {
            $line = $reader.ReadLine()
        }
    }
    finally {
        $reader.Dispose()
    }
} | Select-Object TotalMilliseconds

# Process a large file in fixed-size chunks to bound memory usage
# (Process-Chunk is assumed to be defined elsewhere)
function Process-LargeFile {
    param([string]$Path, [int]$ChunkSize = 10000)

    $reader = [System.IO.StreamReader]::new($Path)
    $chunk = [System.Collections.Generic.List[string]]::new($ChunkSize)
    $lineNum = 0

    while (-not $reader.EndOfStream) {
        $chunk.Add($reader.ReadLine())
        $lineNum++

        if ($chunk.Count -ge $ChunkSize) {
            Process-Chunk -Data $chunk -LineStart ($lineNum - $ChunkSize)
            $chunk.Clear()
            [System.GC]::Collect()
        }
    }

    if ($chunk.Count -gt 0) {
        Process-Chunk -Data $chunk -LineStart ($lineNum - $chunk.Count)
    }

    $reader.Close()
}
```
Sample output:
```
Working set: 245.67 MB
Private memory: 312.34 MB

Before GC: 156.78 MB
After GC: 89.23 MB
```
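One point the examples above leave implicit: the GC can only reclaim a collection once nothing references it. A minimal sketch of releasing a large variable before collecting (the variable name is illustrative):

```powershell
# Drop the reference first, otherwise GC.Collect() cannot reclaim the data
$bigData = 1..100000 | ForEach-Object { "Item-$_" }
# ... process $bigData ...
Remove-Variable bigData          # or: $bigData = $null
[System.GC]::Collect()           # use sparingly; a full GC pauses all threads
```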
Pipeline Optimization
Understanding how the pipeline executes is essential for optimization:
```powershell
# Slow: three separate passes over the same data
Measure-Command {
    $data = Get-ChildItem C:\Projects -Recurse -File
    $largeFiles  = $data | Where-Object { $_.Length -gt 1MB }
    $recentFiles = $data | Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-7) }
    $ps1Files    = $data | Where-Object { $_.Extension -eq '.ps1' }
} | Select-Object TotalMilliseconds

# Fast: classify everything in a single pass
Measure-Command {
    $largeFiles  = [System.Collections.Generic.List[IO.FileInfo]]::new()
    $recentFiles = [System.Collections.Generic.List[IO.FileInfo]]::new()
    $ps1Files    = [System.Collections.Generic.List[IO.FileInfo]]::new()

    foreach ($file in (Get-ChildItem C:\Projects -Recurse -File)) {
        if ($file.Length -gt 1MB) { $largeFiles.Add($file) }
        if ($file.LastWriteTime -gt (Get-Date).AddDays(-7)) { $recentFiles.Add($file) }
        if ($file.Extension -eq '.ps1') { $ps1Files.Add($file) }
    }
} | Select-Object TotalMilliseconds

# Filter as early as possible.
# Slow: enumerate everything, then filter in PowerShell
Get-ChildItem C:\ -Recurse -File | Where-Object { $_.Extension -eq '.log' }

# Fast: -Filter is applied at the provider level
Get-ChildItem C:\ -Recurse -Filter *.log
```
Sample output:
```
TotalMilliseconds
-----------------
           345.67   # multiple passes
           123.45   # single pass
```
Notes
- **Measure first, then optimize**: don't rely on intuition; use Measure-Command to confirm where the bottleneck actually is
- **Prefer foreach over ForEach-Object**: when you don't need pipeline streaming, use the foreach statement instead of ForEach-Object
- **Avoid array +=**: use [List<T>] or [ArrayList] instead, or assign pipeline output directly to an array
- **Use hashtables for lookups**: when you need frequent lookups, build the data into a hashtable instead of scanning linearly
- **Stream your data**: when processing large data sets, process items as they arrive rather than loading everything into memory
- **Use GC sparingly**: don't call [GC]::Collect() frequently; it pauses all threads. Trigger it manually only after you're done with large objects
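The "assign directly to an array" advice above can be sketched like this: capturing the loop's output in a single assignment avoids both += and manual list bookkeeping, since PowerShell collects all output in one pass.

```powershell
# Direct assignment: no per-iteration array copying
$items = foreach ($i in 1..10000) { "Item-$i" }
$items.Count   # 10000
```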