bash shell脚本编程经典实例 (第2版)

by Carl Albing, JP Vossen
January 2021
Intermediate to advanced
581 pages
15h 7m
Chinese
Posts & Telecom Press
13.3 Parsing HTML
13.3.1 Problem
You want to extract strings from HTML. For example, you want to pull the href="urlstringstuff" strings out of the <a> tags in a pile of HTML.
13.3.2 Solution
For a quick shell parse of HTML that does not need to be foolproof, you can try the following.
# Split after each '>', keep lines with an <a tag, print the first double-quoted value
cat "$1" | sed -e 's/>/>\
/g' | grep '<a' |
while IFS='"' read -r a b c ; do echo "$b"; done
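As a rough illustration of how the pipeline above might be used, the sketch below wraps it in a function and runs it on a made-up one-line document. The function name first_quoted_in_a_tags, the sample HTML, and the example.com URLs are assumptions for the demo, not part of the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the recipe's pipeline (the name is ours).
first_quoted_in_a_tags() {
    # 1. Put a newline after every '>' so tags land on separate lines.
    # 2. Keep only lines that contain '<a'.
    # 3. Split each line on double quotes and print the second field,
    #    i.e. the text inside the first pair of quotes.
    sed -e 's/>/>\
/g' "$1" | grep '<a' |
    while IFS='"' read -r before value rest; do
        echo "$value"
    done
}

# Demo input, invented for illustration.
sample=$(mktemp)
printf '%s\n' '<p>See <a href="http://example.com/one">one</a> and <a href="http://example.com/two">two</a>.</p>' > "$sample"
first_quoted_in_a_tags "$sample"
rm -f "$sample"
```

On this sample the function prints the two URLs, one per line. Note that grep '<a' also matches other tags whose names begin with "a", which is one reason the approach is only quick and approximate.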
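13.3.3 Discussion
Parsing HTML with bash is far from easy, mainly because bash is essentially line oriented, whereas HTML is designed to treat newlines as whitespace. As a result, HTML tags that span multiple lines are common.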
<a
href="blah..." rel="blah..." media="blah..."
target= "blah..."
>
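For a tag laid out across lines like this, a per-line grep for '<a' finds only the line holding the tag name, not the line holding href. One possible workaround, which is our assumption and not something the recipe prescribes, is to fold all newlines into spaces before splitting on '>'. The function name join_then_extract is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: join the whole file into one line first, then apply the same
# split / filter / read steps as the recipe. The function name is ours.
join_then_extract() {
    # tr turns every newline into a space, so a tag spread over several
    # lines becomes a single run of text before sed splits after each '>'.
    tr '\n' ' ' < "$1" | sed -e 's/>/>\
/g' | grep '<a' |
    while IFS='"' read -r before value rest; do
        echo "$value"
    done
}
```

This reads the entire file as one long line, so it is meant for small inputs; it still prints only the first quoted value of each matching tag.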
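The <a> tag can be written in two ways: one uses a separate closing </a> tag, and the other does not, being a single <a> tag that ends with /> (XHTML style). Between the opening and closing tags there may be several other tags spread over one or more lines, which makes parsing something of a headache, and the simple bash approach given here often cannot be guaranteed foolproof.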
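One further case where the simple approach can go wrong, offered here as our own observation: it prints whatever sits inside the first pair of double quotes, so a tag such as <a class="nav" href="..."> yields nav rather than the URL. A minimal sketch of a variant that looks for href explicitly follows. It assumes a grep that supports the -o option (GNU and BSD grep do), and the function name href_values is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: select the href attribute by name instead of by position.
href_values() {
    # Split after each '>', keep lines containing '<a', then print only
    # the href="..." part of each line and cut out the quoted value.
    sed -e 's/>/>\
/g' "$1" | grep '<a' | grep -o 'href="[^"]*"' | cut -d '"' -f 2
}
```

This handles only double-quoted attribute values and shares the other limitations of the line-based approach.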
Next, we walk through the solution step by step. First, split lines that contain several tags so that each tag is on its own line.
cat file | sed -e 's/>/>\
/g'
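Yes, the backslash is immediately followed by a newline; this replaces each tag's closing character (>) with that same character plus a newline. As a result, each line holds only one tag, possibly along with a few extra blank lines ...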
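To see the effect of this substitution in isolation, the sketch below feeds one made-up line of nested tags through it. The function name split_tags and the sample markup are assumptions for the demo.

```shell
#!/usr/bin/env bash
# Sketch: the splitting step on its own, reading from standard input.
split_tags() {
    # The backslash-newline inside the replacement inserts a literal
    # newline after every '>' character.
    sed -e 's/>/>\
/g'
}

# Demo input, invented for illustration.
printf '%s\n' '<ul><li><a href="a.html">A</a></li></ul>' | split_tags
```

On this input the output has one piece per line (<ul>, <li>, the <a ...> tag, A</a>, </li>, </ul>) followed by one empty line, because the final '>' is itself followed by the original end of line.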

ISBN: 9787115553782