PowerShell 技能连载 - CSV 高级处理
适用于 PowerShell 5.1 及以上版本
CSV(逗号分隔值)是运维中最常见的数据交换格式——导出用户清单、导入配置数据、处理监控报表、批量操作清单。虽然 PowerShell 的 Import-Csv 和 Export-Csv 命令简单易用,但面对大数据量、复杂转换、编码问题、多文件合并等场景时,需要掌握更多技巧。
本文将讲解 CSV 处理的高级技巧和实用场景。
CSV 读写基础
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
| $employees = @( [PSCustomObject]@{ Name = "张三"; Department = "工程部"; Salary = 25000; JoinDate = "2023-03-15" } [PSCustomObject]@{ Name = "李四"; Department = "市场部"; Salary = 20000; JoinDate = "2023-06-01" } [PSCustomObject]@{ Name = "王五"; Department = "工程部"; Salary = 30000; JoinDate = "2022-01-10" } [PSCustomObject]@{ Name = "赵六"; Department = "人事部"; Salary = 22000; JoinDate = "2024-02-20" } [PSCustomObject]@{ Name = "孙七"; Department = "工程部"; Salary = 28000; JoinDate = "2023-09-05" } )
$employees | Export-Csv -Path "C:\Data\employees.csv" -NoTypeInformation -Encoding UTF8 Write-Host "已导出 CSV" -ForegroundColor Green
$data = Import-Csv -Path "C:\Data\employees.csv" -Encoding UTF8 Write-Host "读取 $($data.Count) 条记录" -ForegroundColor Cyan
$data | Where-Object { $_.Department -eq "工程部" } | Sort-Object Salary -Descending | Format-Table Name, Salary -AutoSize
|
执行结果示例:
数据转换与计算
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
| $data = Import-Csv "C:\Data\employees.csv"
$enhanced = $data | Select-Object *, @{ N = 'SalaryK' E = { [math]::Round([int]$_.Salary / 1000, 1) } }, @{ N = 'YearsOfService' E = { [math]::Round(((Get-Date) - [datetime]$_.JoinDate).TotalDays / 365, 1) } }, @{ N = 'AnnualSalary' E = { [int]$_.Salary * 12 } }
$enhanced | Format-Table Name, Department, SalaryK, YearsOfService, AnnualSalary -AutoSize
$deptStats = $data | Group-Object Department | ForEach-Object { $avgSalary = [math]::Round(($_.Group | ForEach-Object { [int]$_.Salary } | Measure-Object -Average).Average) $maxSalary = ($_.Group | ForEach-Object { [int]$_.Salary } | Measure-Object -Maximum).Maximum
[PSCustomObject]@{ Department = $_.Name Count = $_.Count AvgSalary = $avgSalary MaxSalary = $maxSalary } }
Write-Host "`n部门统计:" -ForegroundColor Cyan $deptStats | Format-Table -AutoSize
$pivot = $data | Group-Object Department | ForEach-Object { $names = ($_.Group | Select-Object -ExpandProperty Name) -join ', ' [PSCustomObject]@{ Department = $_.Name Count = $_.Count Members = $names } } $pivot | Format-Table -AutoSize
|
执行结果示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
| ---- ---------- ------- -------------- ------------ .. .. .. .. ..
---------- ----- ---------- ---------
---------- ----- ------- , ,
|
大文件高效处理
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
| function Process-LargeCsv { param( [Parameter(Mandatory)] [string]$Path,
[scriptblock]$ProcessBlock,
[int]$ReportInterval = 10000 )
$reader = [System.IO.StreamReader]::new($Path, [System.Text.Encoding]::UTF8) $header = $reader.ReadLine()
if (-not $header) { $reader.Close() return }
$columns = $header -split ',' $count = 0 $output = @()
while ($null -ne ($line = $reader.ReadLine())) { $count++ $values = $line -split ',' $obj = [ordered]@{} for ($i = 0; $i -lt $columns.Count; $i++) { $obj[$columns[$i].Trim('"')] = if ($i -lt $values.Count) { $values[$i].Trim('"') } else { "" } }
$result = & $ProcessBlock ([PSCustomObject]$obj) if ($result) { $output += $result }
if ($count % $ReportInterval -eq 0) { Write-Host " 已处理 $count 行..." -ForegroundColor DarkGray } }
$reader.Close() Write-Host "处理完成:$count 行" -ForegroundColor Green return $output }
$filtered = Process-LargeCsv -Path "C:\Data\large-dataset.csv" -ProcessBlock { param($row) if ([int]$row.Amount -gt 10000) { $row } } -ReportInterval 50000
Write-Host "筛选结果:$($filtered.Count) 条"
function Export-CsvFast { param( [Parameter(Mandatory)] [object[]]$Data,
[Parameter(Mandatory)] [string]$Path )
$writer = [System.IO.StreamWriter]::new($Path, $false, [System.Text.Encoding]::UTF8)
try { if ($Data.Count -gt 0) { $properties = $Data[0].PSObject.Properties.Name $writer.WriteLine(($properties | ForEach-Object { "`"$_`"" }) -join ',')
foreach ($item in $Data) { $values = foreach ($prop in $properties) { $val = $item.$prop if ($null -eq $val) { '""' } elseif ($val -match '[,""\r\n]') { "`"$($val -replace '"', '""')`"" } else { "`"$val`"" } } $writer.WriteLine(($values -join ',')) } } } finally { $writer.Close() } }
|
执行结果示例:
1 2 3 4 5
| 已处理 10000 行... 已处理 20000 行... 已处理 30000 行... 处理完成:32500 行 筛选结果:1250 条
|
多文件合并与对比
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
| function Merge-CsvFiles { param( [Parameter(Mandatory)] [string[]]$Paths,
[string]$OutputPath )
$allData = @() foreach ($path in $Paths) { if (Test-Path $path) { $data = Import-Csv $path -Encoding UTF8 $source = [System.IO.Path]::GetFileName($path) $data | ForEach-Object { $_ | Add-Member -NotePropertyName "Source" -NotePropertyValue $source -Force } $allData += $data Write-Host "已加载:$source ($($data.Count) 行)" -ForegroundColor Green } }
$allData | Export-Csv $OutputPath -NoTypeInformation -Encoding UTF8 Write-Host "合并完成:$($allData.Count) 行 => $OutputPath" -ForegroundColor Cyan }
$dateFiles = 1..5 | ForEach-Object { "C:\Data\report-$(Get-Date).AddDays(-$_).ToString('yyyyMMdd').csv" } Merge-CsvFiles -Paths $dateFiles -OutputPath "C:\Data\report-weekly.csv"
function Compare-CsvData { param( [Parameter(Mandatory)][string]$ReferenceFile, [Parameter(Mandatory)][string]$DifferenceFile, [Parameter(Mandatory)][string]$KeyColumn )
$ref = Import-Csv $ReferenceFile -Encoding UTF8 $diff = Import-Csv $DifferenceFile -Encoding UTF8
$refKeys = $ref | Select-Object -ExpandProperty $KeyColumn $diffKeys = $diff | Select-Object -ExpandProperty $KeyColumn
$added = $diffKeys | Where-Object { $_ -notin $refKeys } $removed = $refKeys | Where-Object { $_ -notin $diffKeys } $common = $refKeys | Where-Object { $_ -in $diffKeys }
Write-Host "CSV 对比结果:" -ForegroundColor Cyan Write-Host " 新增:$($added.Count) 条" -ForegroundColor Green Write-Host " 删除:$($removed.Count) 条" -ForegroundColor Red Write-Host " 共有:$($common.Count) 条" -ForegroundColor Gray
if ($added) { Write-Host "`n新增记录:" -ForegroundColor Green; $added | ForEach-Object { Write-Host " + $_" } } if ($removed) { Write-Host "`n删除记录:" -ForegroundColor Red; $removed | ForEach-Object { Write-Host " - $_" } } }
Compare-CsvData -ReferenceFile "C:\Data\servers-old.csv" -DifferenceFile "C:\Data\servers-new.csv" -KeyColumn "ServerName"
|
执行结果示例:
1 2 3 4 5 6 7 8 9 10 11 12
| CSV 对比结果: 新增:3 条 删除:1 条 共有:47 条
新增记录: + SRV-NEW-01 + SRV-NEW-02 + SRV-NEW-03
删除记录: - SRV-OLD-05
|
注意事项
- 编码问题:
Import-Csv 在 PowerShell 5.1 中默认使用 ANSI 编码,中文 CSV 务必指定 -Encoding UTF8
- 引号处理:CSV 中包含逗号的字段应该用双引号包裹,
Import-Csv 会自动处理
- 大文件性能:
Import-Csv 会将整个文件加载到内存,超大文件使用流式处理
- 类型转换:CSV 所有值都是字符串,数值比较和计算前需要显式转换类型
- NoTypeInformation:
Export-Csv 务必加 -NoTypeInformation,否则第一行会输出 .NET 类型信息
- 日期格式:CSV 中的日期格式不统一时,使用
[datetime]::ParseExact() 指定格式解析