Applies to PowerShell 7.0 and later
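In cloud-native observability stacks, Prometheus has become the de facto standard for metrics collection and monitoring. Its data model is built on time series: each series is uniquely identified by a metric name and a set of key-value labels. When an operations script needs to collect system metrics, push an application's performance data to the Prometheus Pushgateway, or query historical data from a Prometheus server for capacity planning, talking to Prometheus directly over HTTP is the most flexible option.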
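Invoke-RestMethod, built into PowerShell 7, handles JSON natively, which makes it a good fit for Prometheus's RESTful API, and the text-based exposition format is plain text that ordinary string handling can produce. No extra SDK is needed: a few lines of script are enough to collect, push, and query metrics. This article walks through three scenarios: collecting local system metrics and emitting them in the Prometheus text format, pushing custom metrics to the Pushgateway, and running PromQL queries against a Prometheus server and analyzing the results.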
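Scenario 1: Collect local system metrics and output them in Prometheus format

The Prometheus text exposition format is a human-readable, plain-text protocol. Each metric is introduced by a # HELP line and a # TYPE line that declares its type, followed by the lines that carry the actual sample values. The script below collects CPU, memory, and disk metrics through CIM (Get-CimInstance) on Windows and through top, /proc/meminfo, and df on Linux, then prints them in the Prometheus text format.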
    function Get-SystemPrometheusMetrics {
        [CmdletBinding()]
        param(
            # [Environment]::MachineName works on Windows and Linux;
            # $env:COMPUTERNAME is only set on Windows.
            [Parameter()]
            [string]$InstanceHostname = [Environment]::MachineName
        )

        $onWindows = $IsWindows -or $PSEdition -eq 'Desktop'

        # CPU usage in percent
        $cpuUsage = 0
        if ($onWindows) {
            $cpu = Get-CimInstance -ClassName Win32_Processor -ErrorAction SilentlyContinue
            if ($cpu) {
                $cpuUsage = [math]::Round(($cpu | Measure-Object -Property LoadPercentage -Average).Average, 2)
            }
        }
        else {
            # top prints a summary line such as "%Cpu(s):  3.1 us, ... 95.2 id, ..."
            $cpuLine = top -bn1 | Select-String '^%?Cpu' | Select-Object -First 1
            if ($cpuLine -and $cpuLine.Line -match '(\d+\.?\d*)\s*id') {
                $cpuUsage = [math]::Round(100 - [double]$Matches[1], 2)
            }
        }

        # Memory: both sources report KiB, so multiply by 1KB to get bytes
        $osInfo = if ($onWindows) {
            $os = Get-CimInstance -ClassName Win32_OperatingSystem
            @{
                TotalBytes  = $os.TotalVisibleMemorySize * 1KB
                UsedBytes   = ($os.TotalVisibleMemorySize - $os.FreePhysicalMemory) * 1KB
                FreeBytes   = $os.FreePhysicalMemory * 1KB
                UsedPercent = [math]::Round(($os.TotalVisibleMemorySize - $os.FreePhysicalMemory) / $os.TotalVisibleMemorySize * 100, 2)
            }
        }
        else {
            $memInfo = Get-Content /proc/meminfo
            # [int64] instead of [int] so large byte counts do not overflow 32 bits
            $total = [int64]($memInfo | Select-String 'MemTotal:\s+(\d+)' |
                ForEach-Object { $_.Matches[0].Groups[1].Value }) * 1KB
            $available = [int64]($memInfo | Select-String 'MemAvailable:\s+(\d+)' |
                ForEach-Object { $_.Matches[0].Groups[1].Value }) * 1KB
            @{
                TotalBytes  = $total
                UsedBytes   = $total - $available
                FreeBytes   = $available
                UsedPercent = [math]::Round(($total - $available) / $total * 100, 2)
            }
        }

        # Disk: largest fixed disk on Windows, root file system on Linux
        $drive = if ($onWindows) {
            Get-CimInstance -ClassName Win32_LogicalDisk -Filter 'DriveType=3' |
                Sort-Object -Property Size -Descending |
                Select-Object -First 1
        }
        else {
            # -P keeps each file system on one line, -k forces 1024-byte blocks
            $parts = (df -Pk / | Select-Object -Last 1) -split '\s+'
            [PSCustomObject]@{
                Size       = [int64]$parts[1] * 1KB
                FreeSpace  = [int64]$parts[3] * 1KB
                VolumeName = '/'
            }
        }

        $diskTotal = $drive.Size
        $diskFree = $drive.FreeSpace
        $diskUsedPercent = [math]::Round(($diskTotal - $diskFree) / $diskTotal * 100, 2)

        # The text exposition format expects timestamps in milliseconds since the Unix epoch
        $timestamp = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
        $labels = "instance=`"$InstanceHostname`""

        $lines = @(
            '# HELP system_cpu_usage_percent CPU usage percentage'
            '# TYPE system_cpu_usage_percent gauge'
            "system_cpu_usage_percent{$labels} $cpuUsage $timestamp"
            ''
            '# HELP system_memory_total_bytes Total physical memory in bytes'
            '# TYPE system_memory_total_bytes gauge'
            "system_memory_total_bytes{$labels} $($osInfo.TotalBytes) $timestamp"
            ''
            '# HELP system_memory_used_bytes Used physical memory in bytes'
            '# TYPE system_memory_used_bytes gauge'
            "system_memory_used_bytes{$labels} $($osInfo.UsedBytes) $timestamp"
            ''
            '# HELP system_memory_used_percent Memory usage percentage'
            '# TYPE system_memory_used_percent gauge'
            "system_memory_used_percent{$labels} $($osInfo.UsedPercent) $timestamp"
            ''
            '# HELP system_disk_total_bytes Total disk space in bytes'
            '# TYPE system_disk_total_bytes gauge'
            "system_disk_total_bytes{$labels} $diskTotal $timestamp"
            ''
            '# HELP system_disk_used_percent Disk usage percentage'
            '# TYPE system_disk_used_percent gauge'
            "system_disk_used_percent{$labels} $diskUsedPercent $timestamp"
        )

        return ($lines -join "`n")
    }

    $metrics = Get-SystemPrometheusMetrics
    Write-Output $metrics

Example output:

    # HELP system_cpu_usage_percent CPU usage percentage
    # TYPE system_cpu_usage_percent gauge
    system_cpu_usage_percent{instance="WEB-SVR01"} 23.45 1729137600000

    # HELP system_memory_total_bytes Total physical memory in bytes
    # TYPE system_memory_total_bytes gauge
    system_memory_total_bytes{instance="WEB-SVR01"} 34359738368 1729137600000

    # HELP system_memory_used_bytes Used physical memory in bytes
    # TYPE system_memory_used_bytes gauge
    system_memory_used_bytes{instance="WEB-SVR01"} 20132659200 1729137600000

    # HELP system_memory_used_percent Memory usage percentage
    # TYPE system_memory_used_percent gauge
    system_memory_used_percent{instance="WEB-SVR01"} 58.59 1729137600000

    # HELP system_disk_total_bytes Total disk space in bytes
    # TYPE system_disk_total_bytes gauge
    system_disk_total_bytes{instance="WEB-SVR01"} 536870912000 1729137600000

    # HELP system_disk_used_percent Disk usage percentage
    # TYPE system_disk_used_percent gauge
    system_disk_used_percent{instance="WEB-SVR01"} 72.31 1729137600000
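The script above interpolates the host name straight into the label. In the text exposition format, a backslash, double quote, or line feed inside a label value has to be escaped, so a small formatting helper may be worth having once label values come from less predictable sources. The following is a minimal sketch; the function names ConvertTo-PrometheusLabelValue and Format-PrometheusSample are made up for this illustration and are not part of PowerShell or of any Prometheus tooling.

```powershell
# Hypothetical helpers (names invented for this example).
function ConvertTo-PrometheusLabelValue {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Value
    )
    # Escape backslashes first so the backslashes added for quotes
    # and line feeds are not escaped a second time.
    $Value.Replace('\', '\\').Replace('"', '\"').Replace("`n", '\n')
}

function Format-PrometheusSample {
    param(
        [Parameter(Mandatory)][string]$Name,
        [Parameter(Mandatory)][double]$Value,
        [System.Collections.IDictionary]$Labels = @{}
    )
    # Sort label names so the output is stable between runs.
    $pairs = foreach ($key in ($Labels.Keys | Sort-Object)) {
        $escaped = ConvertTo-PrometheusLabelValue -Value ([string]$Labels[$key])
        '{0}="{1}"' -f $key, $escaped
    }
    $labelBlock = if ($pairs) { '{' + ($pairs -join ',') + '}' } else { '' }
    # Invariant culture keeps "." as the decimal separator on every locale.
    $number = $Value.ToString([System.Globalization.CultureInfo]::InvariantCulture)
    "$Name$labelBlock $number"
}

Format-PrometheusSample -Name 'system_cpu_usage_percent' -Value 23.45 -Labels @{ instance = 'WEB-SVR01' }
```

The sketch leaves the timestamp out on purpose; it is optional in the format, and a scraper assigns its own scrape time when none is given.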
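Scenario 2: Push custom metrics to the Pushgateway

Short-lived jobs such as batch scripts and CI/CD build pipelines may finish before Prometheus, which pulls metrics by default, gets a chance to scrape them. The Pushgateway adds a push model: a script pushes its metrics to the intermediate gateway when the job completes, and Prometheus scrapes the gateway on its regular schedule. The script below pushes the duration and success status of a build pipeline to the Pushgateway.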
    function Push-PrometheusMetric {
        [CmdletBinding()]
        param(
            [Parameter(Mandatory)]
            [uri]$PushgatewayUrl,

            [Parameter(Mandatory)]
            [string]$JobName,

            [Parameter(Mandatory)]
            [string]$MetricName,

            [Parameter(Mandatory)]
            [double]$MetricValue,

            [Parameter()]
            [string]$MetricType = 'gauge',

            [Parameter()]
            [string]$HelpText = $MetricName,

            [Parameter()]
            [hashtable]$Labels = @{}
        )

        # The grouping key lives in the URL path: /metrics/job/<job>/<label>/<value>/...
        $pushPath = '/metrics/job/' + [uri]::EscapeDataString($JobName)
        foreach ($key in $Labels.Keys) {
            $encodedKey = [uri]::EscapeDataString($key)
            $encodedValue = [uri]::EscapeDataString($Labels[$key])
            $pushPath += "/$encodedKey/$encodedValue"
        }
        $fullUrl = [uri]::new($PushgatewayUrl, $pushPath)

        $labelPairs = foreach ($key in $Labels.Keys) {
            "$key=`"$($Labels[$key])`""
        }
        $labelBlock = if ($labelPairs) { '{' + ($labelPairs -join ',') + '}' } else { '' }

        # No timestamp here: the Pushgateway rejects pushes that carry one.
        # The body must also end with a line feed.
        $body = @(
            "# HELP $MetricName $HelpText"
            "# TYPE $MetricName $MetricType"
            "$MetricName$labelBlock $MetricValue"
        ) -join "`n"
        $body += "`n"

        try {
            # POST replaces only metrics with the same name inside the group
            $null = Invoke-RestMethod -Method Post -Uri $fullUrl -Body $body `
                -ContentType 'text/plain; version=0.0.4; charset=utf-8' `
                -ErrorAction Stop

            Write-Verbose "Metric pushed: $MetricName = $MetricValue -> $fullUrl"
            return $true
        }
        catch {
            Write-Error "Metric push failed: $($_.Exception.Message)"
            return $false
        }
    }

    $pushgateway = 'http://prometheus-pushgateway.monitoring.svc.cluster.local:9091'

    $buildLabels = @{
        branch   = 'main'
        pipeline = 'deploy-production'
        stage    = 'build'
        runner   = 'ps-runner-01'
    }

    # Simulated build duration in seconds
    $buildDuration = Get-Random -Minimum 120 -Maximum 480

    Push-PrometheusMetric -PushgatewayUrl $pushgateway `
        -JobName 'ci_build_pipeline' `
        -MetricName 'ci_build_duration_seconds' `
        -MetricValue $buildDuration `
        -MetricType 'gauge' `
        -HelpText 'Duration of CI build in seconds' `
        -Labels $buildLabels

    Push-PrometheusMetric -PushgatewayUrl $pushgateway `
        -JobName 'ci_build_pipeline' `
        -MetricName 'ci_build_success' `
        -MetricValue 1 `
        -MetricType 'gauge' `
        -HelpText 'Whether the CI build succeeded (1=yes, 0=no)' `
        -Labels $buildLabels

    Write-Host "Build metrics pushed to Pushgateway"

Example output (each successful call returns True):

    True
    True
    Build metrics pushed to Pushgateway
The following commands can be used to verify the metrics stored in the Pushgateway:
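    $groups = Invoke-RestMethod -Uri "$pushgateway/api/v1/metrics"
    # ConvertTo-Json emits one multi-line string, so split it before taking the first 30 lines
    ($groups.data | ConvertTo-Json -Depth 5) -split '\r?\n' | Select-Object -First 30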
Example output:

    {
      "type": "gauge",
      "help": "Duration of CI build in seconds",
      "metrics": [
        {
          "labels": {
            "branch": "main",
            "instance": "",
            "job": "ci_build_pipeline",
            "pipeline": "deploy-production",
            "runner": "ps-runner-01",
            "stage": "build"
          },
          "value": "347"
        }
      ]
    }
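Scenario 3: Query the Prometheus server and analyze metric data

Prometheus exposes a rich HTTP query API that supports instant queries and range queries. The script below wraps both in functions: one returns a snapshot of a metric at a single point in time, the other returns time series data over an interval. Both convert the results into PowerShell objects for further analysis.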
    function Invoke-PrometheusQuery {
        [CmdletBinding()]
        param(
            [Parameter(Mandatory)]
            [uri]$PrometheusUrl,

            [Parameter(Mandatory)]
            [string]$Query,

            [Parameter()]
            [datetime]$Time = (Get-Date)
        )

        $timestamp = [DateTimeOffset]::new($Time.ToUniversalTime()).ToUnixTimeSeconds()

        # For GET requests, Invoke-RestMethod appends a hashtable body as query parameters
        $queryParams = @{
            query = $Query
            time  = $timestamp
        }

        $baseUrl = $PrometheusUrl.AbsoluteUri.TrimEnd('/')
        $response = Invoke-RestMethod -Method Get `
            -Uri "$baseUrl/api/v1/query" `
            -Body $queryParams

        if ($response.status -ne 'success') {
            throw "Prometheus query failed: $($response.errorType) - $($response.error)"
        }

        $results = foreach ($result in $response.data.result) {
            # metric is a PSCustomObject, so labels are read as properties, not by index
            $labels = $result.metric
            $value = $result.value

            [PSCustomObject]@{
                MetricName = $labels.'__name__'
                Instance   = $labels.instance
                Labels     = $labels
                Timestamp  = [DateTimeOffset]::FromUnixTimeSeconds([int64]$value[0]).UtcDateTime
                Value      = [double]$value[1]
            }
        }

        return $results
    }

    function Invoke-PrometheusRangeQuery {
        [CmdletBinding()]
        param(
            [Parameter(Mandatory)]
            [uri]$PrometheusUrl,

            [Parameter(Mandatory)]
            [string]$Query,

            [Parameter(Mandatory)]
            [datetime]$StartTime,

            [Parameter(Mandatory)]
            [datetime]$EndTime,

            [Parameter()]
            [string]$Step = '5m'
        )

        $startTs = [DateTimeOffset]::new($StartTime.ToUniversalTime()).ToUnixTimeSeconds()
        $endTs = [DateTimeOffset]::new($EndTime.ToUniversalTime()).ToUnixTimeSeconds()

        $queryParams = @{
            query = $Query
            start = $startTs
            end   = $endTs
            step  = $Step
        }

        $baseUrl = $PrometheusUrl.AbsoluteUri.TrimEnd('/')
        $response = Invoke-RestMethod -Method Get `
            -Uri "$baseUrl/api/v1/query_range" `
            -Body $queryParams

        if ($response.status -ne 'success') {
            throw "Prometheus range query failed: $($response.errorType) - $($response.error)"
        }

        $results = foreach ($series in $response.data.result) {
            $labels = $series.metric
            $values = $series.values

            # Each entry in values is a [timestamp, "value"] pair
            foreach ($pair in $values) {
                [PSCustomObject]@{
                    MetricName = $labels.'__name__'
                    Instance   = $labels.instance
                    Job        = $labels.job
                    Timestamp  = [DateTimeOffset]::FromUnixTimeSeconds([int64]$pair[0]).UtcDateTime
                    Value      = [double]$pair[1]
                }
            }
        }

        return $results
    }

    $prometheus = 'http://prometheus.monitoring.svc.cluster.local:9090'

    # Current CPU usage per instance. Aggregation and arithmetic drop __name__,
    # so only the instance label is left on each result.
    $cpuData = Invoke-PrometheusQuery -PrometheusUrl $prometheus `
        -Query '100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

    $cpuData |
        Select-Object Instance, Value, Timestamp |
        Sort-Object Value -Descending |
        Format-Table -AutoSize

    # Memory usage over the last hour, summarized per instance
    $rangeData = Invoke-PrometheusRangeQuery -PrometheusUrl $prometheus `
        -Query '(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100' `
        -StartTime (Get-Date).AddHours(-1) `
        -EndTime (Get-Date) `
        -Step '5m'

    $rangeData | Group-Object Instance | ForEach-Object {
        $values = $_.Group.Value
        [PSCustomObject]@{
            Instance = $_.Name
            Min      = [math]::Round(($values | Measure-Object -Minimum).Minimum, 2)
            Max      = [math]::Round(($values | Measure-Object -Maximum).Maximum, 2)
            Avg      = [math]::Round(($values | Measure-Object -Average).Average, 2)
            Samples  = $values.Count
        }
    } | Sort-Object Avg -Descending | Format-Table -AutoSize

Example output:

    Instance           Value Timestamp
    --------           ----- ---------
    web-svr01:9100     78.34 10/17/2025 8:00:00 AM
    db-svr01:9100      45.21 10/17/2025 8:00:00 AM
    api-svr01:9100     32.67 10/17/2025 8:00:00 AM
    monitor-svr01:9100 12.45 10/17/2025 8:00:00 AM

    Instance             Min   Max   Avg Samples
    --------             ---   ---   --- -------
    web-svr01:9100     55.32 82.17 68.45      12
    db-svr01:9100      40.11 62.89 51.33      12
    api-svr01:9100     28.45 48.22 37.61      12
    monitor-svr01:9100  8.12 22.34 15.88      12
Notes
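Metric naming: Prometheus metric names should follow the namespace_subsystem_name_unit convention. In node_memory_MemAvailable_bytes, for example, the parts are the namespace (node), the subsystem (memory), the measured quantity (MemAvailable), and the unit (bytes). Consistent naming keeps PromQL queries short and makes Grafana panels easier to reuse.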
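A quick check before pushing can catch names that Prometheus's classic naming rules would not accept. The sketch below assumes the traditional character set for metric names (letters, digits, underscores, and colons, not starting with a digit). The function name Test-PrometheusMetricName and the default suffix list are illustrative assumptions, not an official list.

```powershell
# Hypothetical helper; the suffix list is only an example of a team convention.
function Test-PrometheusMetricName {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Name,
        [string[]]$AllowedSuffixes = @('seconds', 'bytes', 'percent', 'ratio', 'total', 'info')
    )
    # Classic metric name pattern: [a-zA-Z_:][a-zA-Z0-9_:]*
    $isValid = $Name -cmatch '^[a-zA-Z_:][a-zA-Z0-9_:]*$'
    # The last underscore-separated part is treated as the unit or type suffix.
    $suffix = ($Name -split '_')[-1]
    [PSCustomObject]@{
        Name           = $Name
        IsValidName    = $isValid
        HasKnownSuffix = $suffix -cin $AllowedSuffixes
    }
}

'ci_build_duration_seconds', 'ci_build_success', '2xx-count' |
    ForEach-Object { Test-PrometheusMetricName -Name $_ }
```

A name such as ci_build_success is syntactically valid but has no suffix from the list; whether that matters is a convention choice rather than something Prometheus enforces.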
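Timestamp precision and clock synchronization: In the Prometheus text exposition format, the optional timestamp is an int64 in milliseconds since the Unix epoch (OpenMetrics uses seconds instead). Make sure the host running PowerShell is synchronized through NTP; otherwise Prometheus may reject samples whose timestamps are too far off. When pushing to the Pushgateway, leave the timestamp out: the Pushgateway rejects pushes that contain timestamps, and Prometheus assigns the scrape time to each sample.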
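Pushgateway cleanup: The Pushgateway never removes pushed metrics on its own, even after the job that produced them has stopped. Prometheus then keeps scraping stale, static values. Call the Pushgateway's DELETE API to remove the metric group when a job ends, or choose sensible grouping labels (such as instance) when pushing so that groups can be cleaned up in bulk.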
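The cleanup described above could look roughly like the following sketch. A DELETE request goes to the same /metrics/job/<job>/<label>/<value> path that was used for the push, and it removes only the group whose grouping key matches exactly. The function names Get-PushgatewayGroupUri and Remove-PushgatewayGroup and the host pgw.example are placeholders invented for this example; label values that contain "/" are not handled here.

```powershell
# Hypothetical helpers for removing one metric group from a Pushgateway.
function Get-PushgatewayGroupUri {
    param(
        [Parameter(Mandatory)][uri]$PushgatewayUrl,
        [Parameter(Mandatory)][string]$JobName,
        [System.Collections.IDictionary]$GroupingLabels = [ordered]@{}
    )
    $path = '/metrics/job/' + [uri]::EscapeDataString($JobName)
    foreach ($key in $GroupingLabels.Keys) {
        $name = [uri]::EscapeDataString([string]$key)
        $value = [uri]::EscapeDataString([string]$GroupingLabels[$key])
        $path += "/$name/$value"
    }
    # AbsoluteUri always ends with "/" for a bare host, so trim it before appending.
    $PushgatewayUrl.AbsoluteUri.TrimEnd('/') + $path
}

function Remove-PushgatewayGroup {
    param(
        [Parameter(Mandatory)][uri]$PushgatewayUrl,
        [Parameter(Mandatory)][string]$JobName,
        [System.Collections.IDictionary]$GroupingLabels = [ordered]@{}
    )
    $uri = Get-PushgatewayGroupUri @PSBoundParameters
    # Deletes every metric in the group identified by the job name and labels.
    $null = Invoke-RestMethod -Method Delete -Uri $uri -ErrorAction Stop
}

# Build (but do not send) the URL for the group used in the build example.
Get-PushgatewayGroupUri -PushgatewayUrl 'http://pgw.example:9091' `
    -JobName 'ci_build_pipeline' `
    -GroupingLabels ([ordered]@{ branch = 'main'; stage = 'build' })
```

Keeping the URL construction separate from the request makes it possible to log or review the target before anything is deleted.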
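PromQL injection risk: If a PromQL query string includes user input (such as a host name or an application name), escape and validate it first. PromQL is read-only, so SQL-style injection that modifies data is not possible, but a malicious label value can still change what a query returns or make it select enough series to exhaust the Prometheus server's memory.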
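One way to approach the escaping mentioned above is to build every label matcher through a single helper instead of concatenating raw input. The sketch below assumes double-quoted PromQL string literals, where backslashes, double quotes, and line feeds need escaping. The function name ConvertTo-PromQLLabelMatcher is hypothetical, and stricter allow-list validation (for example, a host name pattern) may still be appropriate on top of it.

```powershell
# Hypothetical helper that builds an equality matcher such as instance="web-svr01:9100".
function ConvertTo-PromQLLabelMatcher {
    param(
        [Parameter(Mandatory)][string]$Label,
        [Parameter(Mandatory)][AllowEmptyString()][string]$Value
    )
    # Label names are never taken on trust: only the plain identifier form is accepted.
    if ($Label -cnotmatch '^[a-zA-Z_][a-zA-Z0-9_]*$') {
        throw "Invalid label name: $Label"
    }
    # Escape backslashes first, then quotes and line feeds.
    $escaped = $Value.Replace('\', '\\').Replace('"', '\"').Replace("`n", '\n')
    '{0}="{1}"' -f $Label, $escaped
}

# Input that tries to close the selector early stays inside the string literal.
$userInput = 'web"} or up{job="x'
$matcher = ConvertTo-PromQLLabelMatcher -Label 'instance' -Value $userInput
$query = "node_memory_MemAvailable_bytes{$matcher}"
$query
```

The resulting query still matches only on the instance label; the injected text becomes part of the label value and simply matches nothing.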
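Performance of wide range queries: The step parameter of a range query (query_range) directly determines how many data points come back; each series yields roughly (end - start) / step points. Seven days of data at a 15-second step works out to about 40,000 points per series, which slows down PowerShell's object handling and also exceeds the 11,000-points-per-series limit that Prometheus enforces, so the server rejects such a query. Choose the step to match the time range: 1m for 1 hour, 5m for 1 day, 15m for 7 days.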
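The step can also be derived from the time range so that the number of points per series stays under a chosen budget. The following sketch assumes a target of at most 700 points and a 15-second floor (a common scrape interval); both defaults and the function name Get-PrometheusQueryStep are assumptions made for this example.

```powershell
# Hypothetical helper: picks a step (in seconds) that keeps
# (end - start) / step at or below MaxPoints.
function Get-PrometheusQueryStep {
    param(
        [Parameter(Mandatory)][datetime]$StartTime,
        [Parameter(Mandatory)][datetime]$EndTime,
        [ValidateRange(1, 11000)][int]$MaxPoints = 700,
        [ValidateRange(1, 86400)][int]$MinStepSeconds = 15
    )
    if ($EndTime -le $StartTime) {
        throw 'EndTime must be later than StartTime.'
    }
    $rangeSeconds = ($EndTime - $StartTime).TotalSeconds
    # Round up so the point count never exceeds the budget.
    $step = [int][math]::Ceiling($rangeSeconds / $MaxPoints)
    # Never go below the floor; finer steps than the scrape interval add no information.
    $step = [math]::Max($step, $MinStepSeconds)
    "${step}s"
}

$start = [datetime]'2025-10-10T00:00:00'
Get-PrometheusQueryStep -StartTime $start -EndTime $start.AddDays(7)   # 7 days
Get-PrometheusQueryStep -StartTime $start -EndTime $start.AddHours(1)  # 1 hour
```

The returned string can be passed as the -Step argument of Invoke-PrometheusRangeQuery, since the query_range endpoint accepts durations such as 864s.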
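Authentication and network security: Production Prometheus deployments usually sit on an internal network and may require mTLS or Bearer token authentication. With Invoke-RestMethod, pass the token through -Headers @{Authorization = 'Bearer <token>'}, and use -SkipCertificateCheck to accept self-signed certificates (internal test environments only). Store credentials with the PowerShell SecretManagement module rather than hard-coding them in scripts.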
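As an alternative to building the Authorization header by hand, Invoke-RestMethod in PowerShell 7 also accepts -Authentication Bearer together with a SecureString -Token. The sketch below collects the request parameters in a hashtable for splatting. The function name New-PrometheusRequestParams, the secret name prometheus-token, and the host prometheus.example are placeholders, and the commented usage assumes the Microsoft.PowerShell.SecretManagement module is installed with a vault that holds the token.

```powershell
# Hypothetical helper that prepares splatting parameters for an authenticated GET.
function New-PrometheusRequestParams {
    param(
        [Parameter(Mandatory)][uri]$Uri,
        [Parameter(Mandatory)][securestring]$Token,
        [switch]$AllowSelfSignedCertificate
    )
    $params = @{
        Uri            = $Uri
        Method         = 'Get'
        Authentication = 'Bearer'
        Token          = $Token
        ErrorAction    = 'Stop'
    }
    # Only for internal test environments: disables certificate validation.
    if ($AllowSelfSignedCertificate) {
        $params.SkipCertificateCheck = $true
    }
    $params
}

# Usage sketch (not executed here; needs a reachable server and a stored secret):
# $token = Get-Secret -Name 'prometheus-token'      # returns a SecureString
# $params = New-PrometheusRequestParams -Uri 'https://prometheus.example/api/v1/query' -Token $token
# Invoke-RestMethod @params -Body @{ query = 'up' }
```

Note that Invoke-RestMethod refuses to send -Authentication credentials over plain HTTP unless -AllowUnencryptedAuthentication is added, which is a useful guard against leaking tokens on unencrypted connections.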