Sample data:
Lorem dolor sit amet consectetur
Lorem ipsum dolor sit ,
Lorem dolor sit amet ,
Lorem dolor ipsum sit !
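Save the sample data as note.txt and run the script below. It counts how often each word occurs in each column, then prints every (column, word) count once, on the line where that pair first appears. The trailing note.txt{,} is Bash brace expansion for "note.txt note.txt", so awk reads the file twice: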
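awk 'NR==FNR {for (i=1; i<=NF; i++) c[i,$i]++; next}   # pass 1: count each (column, word) pair
{
    f = line = ""
    for (i=1; i<=NF; i++) {
        k = i SUBSEP $i
        if (k in c) {   # pair not yet printed: emit its count once, then forget it
            f = 1; line = line sprintf("%d %s", c[k], $i); delete c[k]
        }
        line = line "\t"   # keep later fields in their original column
    }
    if (f) print line   # skip lines that contribute nothing new
}' note.txt{,}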
Output (tab-separated, shown here aligned by column):
4 Lorem   3 dolor   2 sit     2 amet   1 consectetur
          1 ipsum   1 dolor   2 sit    2 ,
                    1 ipsum            1 !
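The two-pass idiom above depends on NR==FNR being true only while the first file argument is read. The following is a minimal sketch of the same idea, assuming a POSIX shell and awk. The function name first_seen_counts is hypothetical and not part of the original. It names the file twice explicitly instead of relying on Bash brace expansion and, unlike the script above, does not append a trailing tab to each line:

```shell
# first_seen_counts FILE  (hypothetical helper, for illustration only)
# Prints each (column, word) count once, in the column where the pair first appears.
first_seen_counts() {
    awk '
        # Pass 1: NR==FNR holds only for the first file argument.
        NR == FNR { for (i = 1; i <= NF; i++) c[i, $i]++; next }
        # Pass 2: emit each count once, keeping the original column position.
        {
            found = 0; line = ""
            for (i = 1; i <= NF; i++) {
                k = i SUBSEP $i
                if (k in c) { found = 1; line = line c[k] " " $i; delete c[k] }
                if (i < NF) line = line "\t"
            }
            if (found) print line
        }
    ' "$1" "$1"
}
```

Calling first_seen_counts note.txt on the sample data should give the same three lines as the original script, minus the trailing tabs.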
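Alternatively, print one line per (column, word) pair and let sort order the result by column, then by descending count: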
awk '{
    for (col = 1; col <= NF; ++col) {
        ++count[col " " $col]
    }
} END {
    for (colWord in count) {
        split(colWord, s, " ")
        col = s[1]
        word = s[2]
        print col " " count[colWord] " " word
    }
}' note.txt | sort -k1,1n -k2,2nr
Output (column, count, word):
1 4 Lorem
2 3 dolor
2 1 ipsum
3 2 sit
3 1 dolor
3 1 ipsum
4 2 amet
4 2 sit
5 2 ,
5 1 !
5 1 consectetur