When loading data from a CSV file into Excel, we cannot specify the format.
```
PS> Get-Quote

Text
----
If you don't know anything about computers, just remember that they are machines that do exactly w...
```
```
PS> Get-Quote -Topics men

Text                                                                     Author
----                                                                     ------
But man is not made for defeat. A man can be destroyed but not defeated. Ernest Hemingway (1899–1...
```
```
PS> Get-Quote -Topics jewelry
WARNING: Topic 'jewelry' not found. Try a different one!

PS> Get-Quote -Topics jewel

Text
----
Cynicism isn't smarter, it's only safer. There's nothing fluffy about optimism . … People have th...
```
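The following script first loads the HTML content and then uses a regular expression to collect the quotes from the HTML. Of course, this only works when the source markup follows a regular pattern. Quotes on Wikiquote follow this pattern: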
```
<li>Quote<ul><li>Author</li></ul></li>
```
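To make the matching step concrete, here is a minimal sketch that runs a regular expression for this structure against a small, made-up HTML fragment. The fragment and the variable names `$sampleHtml` and `$found` are assumptions for illustration only; real Wikiquote pages are much larger and less tidy. Note that `.` does not match line breaks by default, so the sample is kept on a single line.

```powershell
# Hypothetical sample markup (not taken from a real Wikiquote page)
$sampleHtml = '<ul><li>First quote<ul><li>Author A</li></ul></li><li>Second <b>quote</b><ul><li>Author B</li></ul></li></ul>'

# Group 1 captures the quote text, group 2 captures the author
$pattern = '(?im)<li>(.*?)<ul><li>(.*?)</li></ul></li>'

$found = [Regex]::Matches($sampleHtml, $pattern) | ForEach-Object {
    [PSCustomObject]@{
        Text   = $_.Groups[1].Value
        Author = $_.Groups[2].Value
    }
}

$found
```

The second quote still contains the `<b>` tags at this point, which is why a separate cleanup step is needed afterwards.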
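So the following code searches for this pattern and then cleans up the text it finds inside the structure: HTML tags such as links are removed, and runs of multiple spaces are merged into a single space (via the nested function Remove-Tag).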
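The cleanup idea can be tried in isolation. The sketch below is not the Remove-Tag function from the script, which walks the string character by character. It is a simplified, regex-based variant under the assumption that the markup is well formed. The name `Remove-HtmlTag`, the sample string, and the final trim are assumptions added for illustration.

```powershell
# Hypothetical helper, a simplified stand-in for the script's Remove-Tag
function Remove-HtmlTag ([string]$Text)
{
    # Replace every <...> tag with a single space
    $noTags = $Text -replace '<[^>]+>', ' '

    # Merge runs of whitespace into one space, then trim both ends
    ($noTags -replace '\s{2,}', ' ').Trim()
}

# Made-up input containing a link, a bold tag, and extra spaces
$sample = 'A <a href="/wiki/Man">man</a>   can be <b>destroyed</b>'
$clean = Remove-HtmlTag $sample
$clean   # A man can be destroyed
```

Replacing each tag with a space rather than with nothing keeps words apart when a tag was the only thing separating them; the second replace then removes the surplus spaces this creates.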
Here is the complete code:
```powershell
function Get-Quote ($Topics='Computer', $Count=1)
{
    # Strip HTML tags from a string and merge runs of whitespace
    function Remove-Tag ($Text)
    {
        $tagCount = 0
        $text = -join $Text.ToCharArray().Foreach{
            switch($_)
            {
                '<' { $tagCount++}
                '>' { $tagCount--; ' '}
                default { if ($tagCount -eq 0) {$_} }
            }
        }
        $text -replace '\s{2,}', ' '
    }

    # Group 1 = quote text, group 2 = author
    $pattern = "(?im)<li>(.*?)<ul><li>(.*?)</li></ul></li>"

    Foreach ($topic in $topics)
    {
        $url = "https://en.wikiquote.org/wiki/$Topic"

        try
        {
            $content = Invoke-WebRequest -Uri $url -UseBasicParsing -ErrorAction Stop
        }
        # Windows PowerShell raises a WebException for HTTP errors such as 404
        catch [System.Net.WebException]
        {
            Write-Warning "Topic '$Topic' not found. Try a different one!"
            # return leaves the function, so any remaining topics are skipped
            return
        }

        # Remove line breaks so that "." in the pattern can span the whole page
        $html = $content.Content.Replace("`n",'').Replace("`r",'')
        [Regex]::Matches($html, $pattern) |
        ForEach-Object {
            [PSCustomObject]@{
                Text = Remove-Tag $_.Groups[1].Value
                Author = Remove-Tag $_.Groups[2].Value
                Topic = $Topic
            }
        } | Get-Random -Count $Count
    }
}

Get-Quote
Get-Quote -Topic Car
Get-Quote -Topic Jewel
Get-Quote -Topic PowerShell
```