Is there a grep that is better, faster, and more powerful than grep?

Published: 2008-06-10 10:21

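Unless you have seen this software, read its introduction, and actually tried it, it is hard to believe.

The software is called ack, and its website bills it as "better than grep, a search tool for programmers".

A Gentoo developer described it as "an exciting new tool that can replace grep in some situations", and went through each of the "top ten reasons" listed on the ack site.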

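Even if it is faster, though, ack cannot fully replace grep, because it needs the Perl interpreter and Perl's standard modules to run.
And since it runs on the Perl interpreter, how can it be faster than grep, which is written in C? That's really a problem.

The same Gentoo developer is also writing a series of daily ack tips; the link is titled "ack daily tips".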

The top ten reasons:

It's blazingly fast because it only searches the stuff you want
searched.

Wait, how does it know what I want? A DWIM interface
(http://en.wikipedia.org/wiki/DWIM) at last? Not
quite. First off, ack is faster than grep
for simple searches. Here's an example:
```
$ time ack 1Jsztn-000647-SL exim_main.log >/dev/null

real    0m3.463s
user    0m3.280s
sys     0m0.180s

$ time grep -F 1Jsztn-000647-SL exim_main.log >/dev/null

real    0m14.957s
user    0m14.770s
sys     0m0.160s
```
Two notes: first, yes, the file was in the page cache before I
ran ack; second, I even made it easy for grep by telling
it explicitly I was looking for a fixed string (not that it helped
much, the same command without -F was faster by about
0.1s). Oh and for completeness, the exim logfile I searched has
about two million lines and is 250M. I've run those tests ten times
for each, the times shown above are typical.

So yes, for simple searches, ack is faster than grep.
Let's try with a more complicated pattern, then. This time, let's
use the pattern (klausman|gentoo) on the same file. Note
that we have to use -E for grep to use extended
regexen, which ack in turn does not need, since it
(almost) always uses them. Here, grep takes its sweet
time: 3:56, nearly four minutes. In contrast, ack
accomplished the same task in 49 seconds (all times averaged over
ten runs, then rounded to integer seconds).
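
Why grep needs -E for this pattern can be seen on a tiny sample. This is a minimal sketch, assuming GNU grep; the three log lines are made up for illustration and stand in for the much larger exim log.

```shell
# Hypothetical sample standing in for the exim log.
log=$(mktemp)
printf '%s\n' \
    'delivery to klausman@example.org' \
    'relay from mail.gentoo.org' \
    'unrelated line' > "$log"

# Without -E, GNU grep uses basic regexes: ( | ) are literal characters,
# so the pattern only matches the literal text "(klausman|gentoo)".
basic_count=$(grep -c '(klausman|gentoo)' "$log" || true)

# With -E, ( ) group and | alternates, as in the search described above.
extended_count=$(grep -c -E '(klausman|gentoo)' "$log")

echo "basic: $basic_count, extended: $extended_count"

# With ack the pattern would be passed as-is, with no extra switch:
#   ack '(klausman|gentoo)' exim_main.log
rm -f "$log"
```

On this sample the basic form counts no lines and the extended form counts two. It says nothing about speed, which depends on the file and the tool versions.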

As for the "being clever" side of speed, see below, points 5 and 6.

ack is pure Perl, so it runs on Windows just fine.

This isn't relevant to me, since I don't use windows for
anything where I might need grep. That said, it might be a killer
feature for others.

The standalone version uses no non-standard modules, so you can
put it in your ~/bin without fear.

Ok, this is not so much of a feature than a hard criterion. If I
needed extra modules for the whole thing to run, that'd be a deal
breaker. I already have tons of libraries, I don't need more
undergrowth around my dependency tree.

Searches recursively through directories by default, while
ignoring .svn, CVS and other VCS directories.

This is a feature, yet one that wouldn't pry me away from grep:
-r is there (though it distinctly feels like an
afterthought). Since ack ignores a certain set of files
and directories, its recursive capabilities were there from the
start, making it feel more seamless.

ack ignores most of the crap you don't want to search

To be precise:
- VCS directories
- blib, the Perl build directory
- backup files like foo~ and #foo#
- binary files, core dumps, etc.
Most of the time, I don't want to search those (and have to
exclude them with grep -v from find results). Of
course, this ignore-mode can be switched off with ack
(-u). All that said, it sure makes command lines shorter
(and easier to read and construct). Also, this is the first spot
where ack's Perl-centricism shows. I don't mind, even though I
prefer that other language with P.
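
For comparison, the find-plus-grep -v construction mentioned above might look roughly like this. It is only a sketch: the directory tree is a throwaway built for the example, the exclusion patterns cover just a few of the cases listed, and -I (skip binary files) assumes GNU or BSD grep.

```shell
# Build a small hypothetical tree containing files that ack would skip.
tree=$(mktemp -d)
mkdir -p "$tree/src" "$tree/.svn" "$tree/CVS"
echo 'needle in source' > "$tree/src/main.c"
echo 'needle in backup' > "$tree/src/main.c~"
echo 'needle in svn'    > "$tree/.svn/entries"
echo 'needle in cvs'    > "$tree/CVS/Entries"

# List all files, drop VCS directories and backup files (foo~ and #foo#),
# then search what is left. Assumes file names without spaces or newlines.
hits=$(cd "$tree" && find . -type f \
    | grep -v -e '/\.svn/' -e '/CVS/' -e '~$' -e '/#[^/]*#$' \
    | xargs grep -I -l 'needle')

echo "$hits"
rm -rf "$tree"
```

Only ./src/main.c survives the filtering here. The article's point is that ack applies this kind of filtering by default, so the pipeline shrinks to a single short command.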

Ignoring .svn directories means that ack is faster than grep
for searching through trees.

Dupe. See point 5.

Lets you specify file types to search, as in --perl or
--nohtml.

While at first glance, this may seem limited, ack comes
with a plethora of definitions (45 if I counted correctly), so it's
not as perl-centric as it may seem from the example. This feature
saves command-line space (if there's such a thing), since it avoids
wild find-constructs. The docs mention that --perl also
checks the shebang line of files that don't have a suffix, but make
no mention of the other "shipped" file type recognizers doing
so.
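
The idea of recognizing a file type by suffix, with a fallback to the shebang line for files without one, can be sketched in plain shell. The helper name is_perl and the suffix list are assumptions made for this illustration; they are not ack's actual type definitions.

```shell
tree=$(mktemp -d)
printf '%s\n' '#!/usr/bin/perl' 'print "hi";' > "$tree/tool"        # no suffix, Perl shebang
printf '%s\n' 'package Foo;' '1;'             > "$tree/Foo.pm"
printf '%s\n' '<html></html>'                 > "$tree/index.html"

# Hypothetical helper: succeed if the file looks like Perl,
# either by its suffix or by its first line.
is_perl() {
    case "$1" in
        *.pl|*.pm|*.t) return 0 ;;
    esac
    head -n 1 "$1" | grep -q '^#!.*perl'
}

perl_files=$(for f in "$tree"/*; do
    if is_perl "$f"; then basename "$f"; fi
done | sort)

echo "$perl_files"
rm -rf "$tree"
```

Foo.pm is picked up by its suffix and tool by its shebang line, while index.html is left out.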

File-filtering capabilities usable without searching with ack
-f. This lets you create lists of files of a given type.

This mostly is a consequence of the feature above. Even if it
weren't there, you could simply search for "."
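
The "search for ." trick works with grep as well, with one caveat. A small sketch, assuming GNU grep's -r and a throwaway directory:

```shell
tree=$(mktemp -d)
echo 'some text' > "$tree/a.txt"
echo 'more text' > "$tree/b.txt"
: > "$tree/empty.txt"      # an empty file has no character for "." to match

# -l prints only file names; "." matches any line with at least one character.
listed=$(cd "$tree" && grep -rl . . | sort)

echo "$listed"
rm -rf "$tree"
```

The empty file is missing from the listing, so searching for "." is only an approximation of a real file-listing mode.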

Color highlighting of search results.

While I've looked upon color in shells as kinda childish for a
while, I wouldn't want to miss syntax highlighting in vim, colors
for ls (if they're not as sucky as the defaults we had for years)
or match highlighting for grep. It's really neat to see that yes,
the pattern you grepped for indeed matches what you think it does.
Especially during evolutionary construction of command lines and
shell scripts.

Uses real Perl regular expressions, not a GNU subset

Again, this doesn't bother me much. I use
egrep/grep -E all the time, anyway. And I'm no
Perl programmer, so I don't get withdrawal symptoms every time I
use another regex engine.

Allows you to specify output using Perl's special
variables

This sounds neat, yet I don't really have a use case for
it. Also, my perl-fu is weak, so I probably won't use it anyway.
Still, might be a killer feature for you.

The docs have an example:

ack '(Mr|Mr?s)\. (Smith|Jones)' --output='$&'
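
In Perl, $& holds the text of the last successful match, so this example prints only the matched part of each line. For this particular case, grep -o gives a similar result. The sketch below assumes a grep that supports -o (GNU and BSD grep do) and uses made-up sample lines.

```shell
sample=$(mktemp)
printf '%s\n' \
    'Dear Mr. Smith, thank you.' \
    'Ms. Jones called.' \
    'Nothing to see here.' > "$sample"

# -o prints only the matched part of each line instead of the whole line.
matched=$(grep -oE '(Mr|Mr?s)\. (Smith|Jones)' "$sample")

echo "$matched"
rm -f "$sample"
```

This prints "Mr. Smith" and "Ms. Jones" on separate lines. What grep -o cannot do is build arbitrary output from other Perl variables, which is where the ack option goes further.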

Many command-line switches are the same as in GNU grep:

Specifically mentioned are -w, -c and
-l. It's always nice if you don't have to look up all the
flags every time.
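
As a quick refresher, here is what those three switches do in grep, which is the behaviour ack is said to share. The files and their contents are invented for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' 'ack is here' 'acknowledge this' 'back again' > "$dir/one.txt"
printf '%s\n' 'nothing relevant'                            > "$dir/two.txt"

# -w: match whole words only, so "acknowledge" and "back" do not count.
word_hits=$(grep -w 'ack' "$dir/one.txt")

# -c: print the number of matching lines instead of the lines themselves.
line_count=$(grep -c 'ack' "$dir/one.txt")

# -l: print only the names of files that contain a match.
files=$(cd "$dir" && grep -l 'ack' one.txt two.txt)

echo "word hits: $word_hits"
echo "line count: $line_count"
echo "files: $files"
rm -rf "$dir"
```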

Command name is 25% fewer characters to type! Save days of
free-time! Heck, it's 50% shorter compared to grep -r

Okay, now we have proof that not only can the ack webmaster
not count, he's also making up reasons for fun. Works for me.

Original: http://qtchina.tk/?q=node/182
