the_silver_searcher: Intergalactic searcher smackdown

Uh-oh. I can already sense some friction between the_silver_searcher and ack, which you might remember as a sort of replacement for grep.

[Screenshots: ack, ag and grep runs]

Left to right: ack, the_silver_searcher and grep. Pay particular attention to the output of time.

That’s because both the_silver_searcher (which I will henceforth call by its executable name, ag, since the full name is a little cumbersome to type) and ack pound their chests on their home pages and suggest they are faster than grep. ag definitely holds itself out as faster than ack, and I could swear ack claimed a speed advantage over grep.

And yet in subsequent tests, I get results a lot like this …

Command: time … "Time" list.txt

         grep        ack         ag
real     0m0.025s    0m0.387s    0m0.051s
user     0m0.010s    0m0.280s    0m0.017s
sys      0m0.003s    0m0.010s    0m0.007s

And grep is regularly faster than ack or ag. Q.E.D. :\
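
For each run, time reports three numbers: real is elapsed wall-clock time, user is CPU time the program spent in its own code, and sys is CPU time spent in the kernel on its behalf. As a rough sketch of where those numbers come from, the Python below times one child process. It assumes a Unix-like system, since `resource.getrusage` with `RUSAGE_CHILDREN` is not available on Windows, and the function name `time_command` is made up for this example.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (real, user, sys) in seconds, like the shell's `time`.

    Unix-only: RUSAGE_CHILDREN accumulates CPU time for every child that has been
    waited for, so we take a snapshot before and after and subtract.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    # Discard the tool's output; a non-zero exit (e.g. grep finding nothing) is fine.
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=False)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return real, user, system


if __name__ == "__main__":
    # Self-contained demo: time a trivial Python child process.
    real, user, system = time_command([sys.executable, "-c", "pass"])
    print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

Calling `time_command(["grep", "Time", "list.txt"])` would time a single grep run against the test file. Output goes to /dev/null here, so the cost of drawing results in a terminal is left out, which may make these numbers differ from runs that print to the screen.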

For the record, list.txt is a 75K list of television and movie titles that I scraped off the Internet just for this searching smackdown. No funky characters or encoding issues that I could see, and nothing but simple ASCII characters.

Maybe it’s my imagination and none of these programs attempted to be faster than another. And it’s important to remember that my tests are particularly gummy. I have an unnatural knack for screwing up some of the simplest setups, and an unscientific, unprofessional, unstatistical freestyle search-to-the-death would be no exception.
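
One cheap way to make a freestyle test a little less gummy is to run each command several times, throw away the first run so the file is likely already in the cache, and report the median instead of a single number. A minimal sketch, assuming Python 3 and that the tools are on the PATH; `median_wall_time` and its defaults (one warm-up run, five measured runs) are illustrative choices, not a standard.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(argv, runs=5, warmup=1):
    """Run argv warmup + runs times and return the median of the measured runs.

    The first `warmup` runs are timed but discarded, so the files being searched
    are likely already in the OS file cache when measurement starts.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        # Discard the tool's output; a non-zero exit (e.g. grep finding nothing) is fine.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=False)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return statistics.median(timings)


if __name__ == "__main__":
    # Self-contained demo: time a trivial Python child process.
    print(f"{median_wall_time([sys.executable, '-c', 'pass']):.3f}s")
```

For example, `median_wall_time(["ag", "Time", "list.txt"])` would give one number per tool that is less sensitive to a single slow run than any individual timing.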

All that aside, ag strikes me as every bit as useful and friendly as ack, and it seems to follow many of the same flag and output conventions. In fact, some of them appear identical. 😕

I suppose that means your choice of search tools will depend mostly on … your choice. In the meantime, I’m going to turn up the heat on all three of these and check how they do with time {grep,ack,ag} main /lib/ … 😯 That will separate the men from the boys. 👿
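
Turning the tools loose on a directory tree changes the job: they now have to decide which files to open at all. The Python below is a minimal sketch of a recursive search in that spirit. It skips version-control directories, skips files whose first 512 bytes contain a NUL byte (a common binary-file heuristic, assumed here and not necessarily the exact test ag or ack use), prints line numbers, and searches several files at once with a thread pool. It does not read .gitignore files. The names `search_tree` and `SKIP_DIRS` and the worker count of 4 are hypothetical.

```python
import os
import re
import sys
from concurrent.futures import ThreadPoolExecutor

# Directories to skip; an assumed list, not any tool's exact defaults.
SKIP_DIRS = {".git", ".svn", ".hg"}


def looks_binary(path, probe=512):
    """Heuristic: treat a file as binary if its first `probe` bytes contain NUL."""
    try:
        with open(path, "rb") as f:
            return b"\0" in f.read(probe)
    except OSError:
        return True  # unreadable files are skipped too


def candidate_files(root):
    """Walk root, pruning VCS directories and yielding files that look like text."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not looks_binary(path):
                yield path


def search_file(path, pattern):
    """Return (path, line_number, line) for each matching line in one file."""
    hits = []
    try:
        # Undecodable bytes are replaced rather than raising an error.
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if pattern.search(line):
                    hits.append((path, lineno, line.rstrip("\n")))
    except OSError:
        pass
    return hits


def search_tree(root, regex, workers=4):
    """Search every candidate file under root, several files at a time."""
    pattern = re.compile(regex)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns per-file results in the same order the files were found.
        per_file = pool.map(lambda p: search_file(p, pattern), candidate_files(root))
        return [hit for hits in per_file for hit in hits]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: PATTERN [DIRECTORY]; the directory defaults to the current one.
    root = sys.argv[2] if len(sys.argv) > 2 else "."
    for path, lineno, line in search_tree(root, sys.argv[1]):
        print(f"{path}:{lineno}:{line}")
```

Run as a script with the arguments `main /lib/`, it mirrors the test above and prints matches as path:line:text. In CPython, the global interpreter lock means the threads mostly overlap file reads rather than the regex matching itself, so treat the thread pool as the shape of the idea rather than a faithful model of ag, which is written in C.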

P.S.: Thanks to riptor for the tip. 😉

11 thoughts on “the_silver_searcher: Intergalactic searcher smackdown”

    1. K.Mandla Post author

      I thought that might be affecting things, and the numbers in the table are third or fourth attempts. I wouldn’t lend too much credibility to my tests either; it was a terrible way to see how they perform. 😉

  1. thameera

    As for me, what drew me to ack (instead of grep) was the ease. For example, not searching in .git or .svn folders, and lots of other sane defaults for programmers.

    1. K.Mandla Post author

      True. I don’t program much (actually, never, if I can help it … I don’t want to embarrass myself) so features like that don’t really shine for me. To each his own! 😀

  2. Pingback: Links 31/10/2014: Rubin Leaves Google, Neelie Kroes Ends EU Career | Techrights

  3. Geoff Greer

    Hi there. I’m the author of ag. I commend you for putting claims to the test, but I feel the need to address your conclusions.

    Ack and ag are optimized for searching entire codebases. The main reason you’re not seeing much of a performance difference is that your benchmark searches a single file. That means many of ag’s tricks (multiple threads, ignoring binary files, obeying gitignore/hgignore/etc.) are unavailable.

    You’ll also notice that ag printed line numbers, while grep didn’t. This may surprise you, but counting line numbers is an expensive operation. It requires re-scanning the entire file to count up newlines. That means ag is reading the file twice in your benchmark, while grep only scans it once. Although it hurts performance, I think the trade-off is worthwhile. Line numbers are a useful default for a code searching tool.

    If you want a more typical benchmark, I suggest grabbing a large codebase (say… PHP: https://github.com/php/php-src), building it, and then trying a recursive search. Below are my results on a 2013 MacBook Air. Times are medians of 5 runs, so the fs cache is hot.

    ggreer@carbon:~/code/php-src% time grep -r time_t .

    grep -r time_t . 10.11s user 1.80s system 76% cpu 15.489 total

    ggreer@carbon:~/code/php-src% time ack time_t

    ack time_t 5.93s user 0.69s system 98% cpu 6.730 total

    ggreer@carbon:~/code/php-src% time ag time_t

    ag time_t 1.35s user 0.52s system 171% cpu 1.090 total

    These numbers are impressive when you notice the size of the directory:

    ggreer@carbon:~/code/php-src% du -sh
    354M .

    Benchmarking is great for verifying performance claims, but it’s important to choose a benchmark that accurately reflects the typical use case. Otherwise, one risks being misled.

    1. K.Mandla Post author

      Thanks Geoff, that’s a much better analysis of how they differ. I don’t get many opportunities to work with — or search through — huge trees of code, so it’s good to see a proper breakdown on how all three differ. Cheers, and thanks again! 🙂

  4. Pingback: cscope: The code navigator | Inconsolation

  5. Pingback: Bonus: A dozen more remainders | Inconsolation
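
Geoff Greer’s point that line numbers are not free can be seen in a small sketch. Finding match offsets takes one scan of the data; turning those offsets into line numbers means counting newline bytes before each match, which reads those bytes a second time. This is an illustration in Python under that assumption, not how ag actually implements it. Both function names are hypothetical, and the search reports each occurrence rather than each matching line.

```python
def find_offsets(data, needle):
    """Return the byte offset of every occurrence of needle; no line numbers."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def find_with_line_numbers(data, needle):
    """Return (line_number, offset) for every occurrence of needle.

    Line numbers come from counting newline bytes between consecutive matches,
    so every byte before the last match is read a second time.
    """
    results = []
    line = 1
    counted_up_to = 0
    for pos in find_offsets(data, needle):
        line += data.count(b"\n", counted_up_to, pos)
        counted_up_to = pos
        results.append((line, pos))
    return results


if __name__ == "__main__":
    sample = b"a\nTime\nb\nTime x\n"
    print(find_offsets(sample, b"Time"))            # [2, 9]
    print(find_with_line_numbers(sample, b"Time"))  # [(2, 2), (4, 9)]
```

Counting incrementally between matches keeps the extra work to one more pass over the data up to the last match, which lines up with the “reading the file twice” cost described in the comment.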
