Practical Examples of Text-Manipulation Tricks on the Linux Command Line

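Regular Expressions
Technical translation has produced no shortage of baffling terms, such as the Chinese renderings of "handle" (句柄), "socket" (套接字), and "robustness" (鲁棒性). The Chinese term for "regular expression" (正则表达式) belongs to the same club. When I first ran into it, the name thoroughly confused me. Only after digging deeper did it suddenly click: a regular expression is simply a string that describes text with a regular, recognizable pattern.

Few technologies return so much value for so small a learning investment, and regular expressions are one of them. Unfortunately, many people are scared off by the cryptic, code-like syntax and never get past the front door.

Why should you learn regular expressions? First, applying them in practice is not hard: once you understand a handful of metacharacters and a fairly simple syntax, you gain powerful control over text. Second, regular expressions often offer the simplest and most efficient solution to a text-processing problem (and sometimes the only one). Without them, a complicated case can leave you stuck and out of ideas.

Regular expressions are easy to pick up but hard to master. This article makes no attempt at the latter. ^^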

Text Search
The grep command handles simple text-search tasks.

First, let's prepare some sample text by saving grep's man page to a text file:

> man grep | col -b > grephelp.txt

Next, let's find every line in grephelp.txt that contains the word "find":

> grep "find" grephelp.txt
To find all occurrences of the word `patricia' in a file:
To find all occurrences of the pattern `.Pp' at the beginning of a line:
To find all lines in a file which do not contain the words `foo' or
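Plain grep matches substrings, so the pattern "find" would also hit lines containing only "finds" or "findings". The -w option restricts matches to whole words. A minimal sketch, assuming a small hypothetical sample file rather than grephelp.txt:

```shell
#!/bin/sh
# Hypothetical sample file used only to contrast substring and whole-word matching.
tmp=$(mktemp)
printf '%s\n' 'To find a word' 'It finds nothing' 'findings so far' > "$tmp"

# Plain pattern: "find" anywhere in the line, including inside "finds" and "findings".
substring_hits=$(grep -c 'find' "$tmp")
echo "$substring_hits"   # 3

# -w: the match must form a whole word, so only the first line qualifies.
word_hits=$(grep -c -w 'find' "$tmp")
echo "$word_hits"        # 1

rm -f "$tmp"
```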

To display the matched text in a different color, add the --color option; the default highlight color is red.

> grep --color "find" grephelp.txt

To show the file name and line number with each match, use -H for the file name and -n for the line number:

> grep -H -n --color "find" grephelp.txt
grephelp.txt:252: To find all occurrences of the word `patricia' in a file:
grephelp.txt:256: To find all occurrences of the pattern `.Pp' at the beginning of a line:
grephelp.txt:265: To find all lines in a file which do not contain the words `foo' or

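We often need to see the context around a matching line, and the -A and -B options are your friends here: -A n prints the matching line plus the n lines after it, and -B n prints the matching line plus the n lines before it. Here we show two extra lines before and after each match: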

> grep -A 2 -B 2 -H -n --color "find" grephelp.txt
grephelp.txt-250-
grephelp.txt-251-EXAMPLES
grephelp.txt:252: To find all occurrences of the word `patricia' in a file:
grephelp.txt-253-
grephelp.txt-254- $ grep 'patricia' myfile
--
--
grephelp.txt-254- $ grep 'patricia' myfile
grephelp.txt-255-
grephelp.txt:256: To find all occurrences of the pattern `.Pp' at the beginning of a line:
grephelp.txt-257-
grephelp.txt-258- $ grep '^\.Pp' myfile
--
--
grephelp.txt-263- match any character.
grephelp.txt-264-
grephelp.txt:265: To find all lines in a file which do not contain the words `foo' or
grephelp.txt-266- `bar':
grephelp.txt-267-

What if you want every line that does not contain "find"? Simple: use the -v option.
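A minimal sketch of -v on a hypothetical three-line file. It also shows two common companions: -c to count the non-matching lines instead of printing them, and repeated -e options to exclude lines matching any of several patterns, in the spirit of the man page's "foo or bar" example:

```shell
#!/bin/sh
# Hypothetical sample lines; in the article the input would be grephelp.txt.
tmp=$(mktemp)
printf '%s\n' 'To find a word' 'EXAMPLES' 'grep patricia myfile' > "$tmp"

# -v inverts the match: print only the lines that do NOT contain "find".
grep -v 'find' "$tmp"

# -v combined with -c counts the non-matching lines.
nonmatching=$(grep -v -c 'find' "$tmp")
echo "$nonmatching"   # 2

# Several -e patterns: keep only lines matching none of them.
neither=$(grep -v -e 'find' -e 'grep' "$tmp")
echo "$neither"       # EXAMPLES

rm -f "$tmp"
```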

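grep has two variants, egrep and fgrep. By default grep interprets patterns as basic regular expressions (BREs); egrep, equivalent to grep -E, uses extended regular expressions (EREs) and is therefore more expressive. fgrep, equivalent to grep -F, does not interpret regular expressions at all and matches fixed strings, which can make it the fastest of the three. Recent GNU grep releases warn that egrep and fgrep are obsolescent, so grep -E and grep -F, the forms POSIX specifies, are the safer spellings.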
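The difference between the three matching modes is easiest to see side by side. A minimal sketch, assuming a hypothetical four-line file: in an ERE "?" makes the preceding character optional, in a POSIX BRE a bare "?" is an ordinary character, and -F treats "." literally instead of as "any character":

```shell
#!/bin/sh
# Hypothetical sample input; the pattern choices are what matter here.
tmp=$(mktemp)
printf '%s\n' 'color' 'colour' 'a.b' 'axb' > "$tmp"

# ERE (grep -E, same as egrep): "u?" means an optional "u".
ere_hits=$(grep -c -E 'colou?r' "$tmp")            # 2: color, colour

# BRE (plain grep): "?" is literal, so this looks for the text "colou?r".
# grep exits 1 when nothing matches, hence "|| true".
bre_hits=$(grep -c 'colou?r' "$tmp" || true)       # 0

# Regex "." matches any character; -F (same as fgrep) matches it literally.
regex_dot=$(grep -c 'a.b' "$tmp")                  # 2: a.b, axb
fixed_dot=$(grep -c -F 'a.b' "$tmp")               # 1: a.b

rm -f "$tmp"
```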

Text Replacement
The tr command handles simple character-translation tasks. For example, tr can convert grephelp.txt to all uppercase:

> cat grephelp.txt | tr '[:lower:]' '[:upper:]'

In short, tr maps each character in the first set to the corresponding character in the second set. Commonly used character classes include:

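[:alnum:]: letters and digits
[:alpha:]: letters
[:cntrl:]: control characters
[:digit:]: digits
[:graph:]: graphical characters, i.e. printable characters other than space
[:lower:]: lowercase letters
[:print:]: printable characters, including space
[:punct:]: punctuation
[:space:]: whitespace
[:upper:]: uppercase letters
[:xdigit:]: hexadecimal digits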
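Besides one-to-one translation, tr can delete (-d) or squeeze (-s) the characters in a set, and the classes above work in all three modes. A minimal sketch on a hypothetical input string:

```shell
#!/bin/sh
# Hypothetical input string used only for illustration.
input='Room  101,   Floor 3'

# -d deletes every character in the set: here, all digits.
no_digits=$(printf '%s\n' "$input" | tr -d '[:digit:]')
# -> 'Room  ,   Floor '

# -s squeezes each run of repeated spaces down to a single space.
squeezed=$(printf '%s\n' "$input" | tr -s ' ')
# -> 'Room 101, Floor 3'

# Plain translation with classes: uppercase letters become lowercase.
lowered=$(printf '%s\n' "$input" | tr '[:upper:]' '[:lower:]')
# -> 'room  101,   floor 3'
```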
tr is useful only in a narrow range of situations. For more flexible, pattern-based replacement we have sed (short for stream editor).

Replace every "find" in the file with "search":

> sed "s/find/search/g" grephelp.txt

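In this command, s performs a substitution, /find/search/ replaces "find" with "search", and the g flag replaces every match on a line rather than only the first. By default sed prints the result to standard output; you can redirect it to a new file, or use the -i option to write the result straight back to the original file (risky, so use it with care). The bare -i form below is GNU sed syntax; BSD and Mac OS X sed expect a backup suffix right after -i (-i '' for no backup):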

> sed -i "s/find/search/g" grephelp.txt
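Because -i overwrites the file, a safer habit is to keep a backup. Attaching the suffix directly to the option (-i.bak) is a spelling both GNU and BSD sed accept. A minimal sketch, assuming a hypothetical scratch file named sample.txt:

```shell
#!/bin/sh
# Work on a scratch copy; "sample.txt" is a hypothetical file name.
dir=$(mktemp -d)
printf '%s\n' 'To find all lines' 'nothing here' > "$dir/sample.txt"

# Edit in place, keeping the untouched original as sample.txt.bak.
sed -i.bak 's/find/search/g' "$dir/sample.txt"

edited=$(head -n 1 "$dir/sample.txt")       # To search all lines
backup=$(head -n 1 "$dir/sample.txt.bak")   # To find all lines

# Once the result looks right, the backup (here, the whole scratch dir) can go.
rm -rf "$dir"
```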

Wrap every number n in the file as "--n--":

> sed -E "s/([0-9]+)/--\1--/g" grephelp.txt

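The -E option tells sed to use extended regular expressions (EREs), and \1 in the replacement refers to the first capture group in the pattern. Note that -E is the BSD spelling (Mac OS X, FreeBSD); GNU sed has traditionally documented the equivalent -r option, and current GNU sed releases accept -E as well.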
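Capture groups become more useful with several of them. A minimal sketch that swaps the two comma-separated columns of a line shaped like the report.txt data used later in this article; the BRE variant shows what the same pattern costs without -E, where groups need \( \) and "+" has to be spelled out:

```shell
#!/bin/sh
# A single hypothetical "month,count" line.
line='March,19'

# ERE: \1 is the month, \2 the number; swap them.
swapped=$(printf '%s\n' "$line" | sed -E 's/^([^,]+),([0-9]+)$/\2,\1/')
echo "$swapped"       # 19,March

# Same substitution as a BRE: escaped groups, and X+ written as XX*.
swapped_bre=$(printf '%s\n' "$line" | sed 's/^\([^,][^,]*\),\([0-9][0-9]*\)$/\2,\1/')
echo "$swapped_bre"   # 19,March
```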

sed can do far more than this, but space does not permit a detailed tutorial here. If you want to learn more, please see this article.

Removing Duplicate Lines

> cat -n sonnet116.txt
1 Let me not to the marriage of true minds
2 Admit impediments. Love is not love
3 Which alters when it alteration finds,
4 Or bends with the remover to remove:
5 O, no! it is an ever-fix`ed mark,
6 O, no! it is an ever-fix`ed mark,
7 That looks on tempests and is never shaken;
8 It is the star to every wand'ring bark,
9 Whose worth's unknown, although his heighth be taken.
10 Love's not Time's fool, though rosy lips and cheeks
11 Love's not Time's fool, though rosy lips and cheeks
12 Love's not Time's fool, though rosy lips and cheeks
13 Within his bending sickle's compass come;
14 Love alters not with his brief hours and weeks,
15 But bears it out even to the edge of doom:
16 If this be error and upon me proved,
17 I never writ, nor no man ever loved.

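This is Shakespeare's Sonnet 116, except that line 5 is duplicated and line 10 appears three times in a row. How do we find the repeated lines? The uniq command can help. Note that uniq only compares adjacent lines, which is enough here because each repeated line directly follows its twin.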

> uniq -d sonnet116.txt
O, no! it is an ever-fix`ed mark,
Love's not Time's fool, though rosy lips and cheeks

The -d option prints only the duplicated lines, one copy of each. To remove the duplicates, run uniq with no options:

> uniq sonnet116.txt
Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,
Or bends with the remover to remove:
O, no! it is an ever-fix`ed mark,
That looks on tempests and is never shaken;
It is the star to every wand'ring bark,
Whose worth's unknown, although his heighth be taken.
Love's not Time's fool, though rosy lips and cheeks
Within his bending sickle's compass come;
Love alters not with his brief hours and weeks,
But bears it out even to the edge of doom:
If this be error and upon me proved,
I never writ, nor no man ever loved.

Want to know exactly how many times each line occurs? No problem, use the -c option:

> uniq -c sonnet116.txt
1 Let me not to the marriage of true minds
1 Admit impediments. Love is not love
1 Which alters when it alteration finds,
1 Or bends with the remover to remove:
2 O, no! it is an ever-fix`ed mark,
1 That looks on tempests and is never shaken;
1 It is the star to every wand'ring bark,
1 Whose worth's unknown, although his heighth be taken.
3 Love's not Time's fool, though rosy lips and cheeks
1 Within his bending sickle's compass come;
1 Love alters not with his brief hours and weeks,
1 But bears it out even to the edge of doom:
1 If this be error and upon me proved,
1 I never writ, nor no man ever loved.
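Since uniq only collapses neighbouring lines, duplicates scattered through a file (like "May,18" and "August,16" in the report.txt data in the next section) slip past it. The usual fix is to sort first. A minimal sketch, assuming a hypothetical excerpt of such data; the final sort -rn lists the most frequent lines first:

```shell
#!/bin/sh
# Hypothetical data whose duplicates are NOT adjacent.
data='March,19
May,18
August,16
May,18
August,16
June,50'

# Unsorted: no two neighbouring lines are equal, so uniq -d reports nothing.
unsorted_dups=$(printf '%s\n' "$data" | uniq -d)

# Sorting groups identical lines; uniq -c counts each group;
# sort -rn orders the counts from highest to lowest.
counts=$(printf '%s\n' "$data" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"
```

The width of the count column printed by uniq -c differs between GNU and BSD, so scripts that parse it should split on whitespace (for example with awk) rather than fixed positions.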

Sorting Text
Suppose we have a report file in which the first column is the month and the second is the number of units sold that month:

> cat report.txt
March,19
June,50
February,17
May,18
August,16
April,31
May,18
July,26
January,24
August,16

This file is not only out of order but also contains duplicates. To sort it alphabetically, use the following command:

> sort report.txt
April,31
August,16
August,16
February,17
January,24
July,26
June,50
March,19
May,18
May,18

The -u option (short for unique) drops duplicate lines from the sorted output:

> sort -u report.txt
April,31
August,16
February,17
January,24
July,26
June,50
March,19
May,18

Can we sort by calendar month instead? The -M option (month sort) does just that:

> sort -u -M report.txt
January,24
February,17
March,19
April,31
May,18
June,50
July,26
August,16

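Sorting by the number in the second column is just as easy:

> sort -u -t',' -k2,2n report.txt
August,16
February,17
May,18
March,19
January,24
July,26
April,31
June,50

In the example above, -t',' tells sort to split each line into columns at commas, and -k2,2n sorts numerically on column 2 alone. Without the n modifier, sort compares the field as text, which gives the right order here only because every number has two digits. Keep in mind, too, that -u treats lines whose sort keys compare equal as duplicates, so two different months with the same sales figure would be collapsed into a single line.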
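To see why numeric comparison matters once the numbers have different widths, here is a minimal sketch on hypothetical sales figures. It compares a text sort of column 2 with a numeric one, and adds the descending variant with r attached to the key:

```shell
#!/bin/sh
# Hypothetical figures with numbers of different widths.
data='June,150
March,19
May,8'

# Text comparison of field 2: "150" < "19" < "8", character by character.
as_text=$(printf '%s\n' "$data" | sort -t',' -k2,2)
# -> June,150 / March,19 / May,8

# Numeric comparison orders by value.
as_number=$(printf '%s\n' "$data" | sort -t',' -k2,2n)
# -> May,8 / March,19 / June,150

# Descending numeric: put r on the key, next to n.
descending=$(printf '%s\n' "$data" | sort -t',' -k2,2nr)
# -> June,150 / March,19 / May,8
```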

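And of course, the order can be reversed as well. Attach r to the key itself (-k2,2nr) for a descending numeric sort; a global -r is not applied to a key that already carries its own modifiers such as n:

> sort -u -t',' -k2,2nr report.txt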
June,50
April,31
July,26
January,24
March,19
May,18
February,17
August,16

Text Statistics
The wc command handles text statistics. Depending on the option, it counts the bytes (-c), characters (-m), words (-w), or lines (-l) in a file.

For example, to count the words in grephelp.txt:

> wc -w grephelp.txt
1571 grephelp.txt
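One detail worth knowing: wc -l counts newline characters, so a final line that lacks a trailing newline is not counted. A minimal sketch on hypothetical input, where printf controls the trailing newline exactly; tr -d ' ' strips the padding some wc implementations put before the number:

```shell
#!/bin/sh
# Hypothetical two-line input: "one two" and "three", each ending in a newline.
lines=$(printf 'one two\nthree\n' | wc -l | tr -d ' ')   # 2
words=$(printf 'one two\nthree\n' | wc -w | tr -d ' ')   # 3
bytes=$(printf 'one two\nthree\n' | wc -c | tr -d ' ')   # 14 (8 + 6, newlines included)

# Same text without the final newline: wc -l now reports only 1.
no_final_newline=$(printf 'one two\nthree' | wc -l | tr -d ' ')
echo "$lines $words $bytes $no_final_newline"   # 2 3 14 1
```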

Count the distinct lines in sonnet116.txt (no surprise: a sonnet has 14 lines):

> uniq sonnet116.txt | wc -l
14

You Should Also Try Awk and Perl
If the tools above still don't meet your needs, perhaps you need heavier firepower. Give Awk and Perl a try.

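Awk is another venerable power tool, probably about as old as sed. It was practically born for text processing: its syntax and features are well suited to manipulating text and generating reports. To learn it, see this article; you will come to love it.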
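As a taste of what Awk adds over the tools above, here is a minimal sketch on a hypothetical excerpt of the report.txt data: -F',' sets the field separator, an END block prints a total, and the common !seen[$0]++ idiom removes duplicate lines while keeping their original order, something sort -u cannot do:

```shell
#!/bin/sh
# Hypothetical excerpt of report.txt-style data, with one duplicate line.
data='March,19
June,50
May,18
May,18'

# Sum column 2 over the distinct lines: 19 + 50 + 18.
total=$(printf '%s\n' "$data" | sort -u | awk -F',' '{ sum += $2 } END { print sum }')
echo "$total"   # 87

# Print each line only the first time it appears, preserving input order.
first_seen=$(printf '%s\n' "$data" | awk '!seen[$0]++')
printf '%s\n' "$first_seen"   # March,19 / June,50 / May,18
```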

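Perl has long carried a reputation as a "write-only language". In fact, handled properly, Perl can be used to write well-structured, readable, easy-to-understand code. In my experience, more than 80% of the situations where I use Perl involve text processing. Perl's built-in regular-expression support may be the best of any language, and together with its concise syntax and convenient operators, it has made Perl the undisputed leader in text processing.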
