Uh-oh. I can already sense some friction between the_silver_searcher and ack, which you might remember as a sort of replacement for grep.
Left to right: ack, the_silver_searcher and grep. Please pay attention to the results of time.
Because it seems to me that both the_silver_searcher (which I will henceforth abbreviate by its executable name, ag, because the full name is a little cumbersome to type) and ack pound their chests on their home pages and suggest they are faster than grep. And ag definitely holds itself out as faster than ack. And I could swear ack claimed a speed advantage over grep.
And yet in subsequent tests, I get results a lot like this …
                           grep             ack              ag
time … “Time” list.txt     real 0m0.025s    real 0m0.387s    real 0m0.051s
                           user 0m0.010s    user 0m0.280s    user 0m0.017s
                           sys  0m0.003s    sys  0m0.010s    sys  0m0.007s
And grep is regularly faster than ack or ag. Q.E.D.
For the record, list.txt is a 75K list of television and movie titles that I scraped off the Internet just for this searching smackdown. No funky characters or encoding issues that I could see; nothing but plain ASCII.
Maybe it’s my imagination and none of these programs attempted to be faster than another. And it’s important to remember that my tests are particularly gummy. I have an unnatural knack for screwing up some of the simplest setups, and an unscientific, unprofessional, unstatistical freestyle search-to-the-death would be no exception.
All that aside, ag strikes me as every bit as useful or friendly as ack, and seems to follow a lot of the same flag and output conventions. In fact, some of them appear identical. 😕
I suppose that means your choice of search tools will depend mostly on … your choice. In the meantime I’m going to turn up the heat on all three of these and check their results when I turn them loose on time {grep,ack,ag} main /lib/
… 😯 That will separate the men from the boys. 👿
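(A side note on that shorthand: at least in bash, the braces expand into a single command line rather than three separate timed runs, so a loop is one way to sketch the actual showdown. This assumes all three tools are installed, and remembers that grep needs -r to recurse while ack and ag do so by default.)

```shell
# Time each tool on the same search; grep needs -r to recurse,
# while ack and ag descend into directories on their own.
for tool in 'grep -r' ack ag; do
  echo "== $tool =="
  time $tool main /lib/ > /dev/null
done
```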
P.S.: Thanks to riptor for the tip. 😉
One issue that often skews benchmarks is the filesystem cache. Try running the same command multiple times (3-5 should be enough) until the result stabilizes (a “hot cache” benchmark). Or you could clear the cache before each benchmark (a “cold cache” benchmark) by using these instructions: http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system .
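In shell terms, the hot-cache approach might look something like this (the sample file here is just a stand-in for the 75K title list from the post):

```shell
# Stand-in sample for the title list used in the post.
printf 'Time After Time\nAdventure Time\nFrasier\n' > list.txt

# Hot-cache benchmark: repeat the same search until the numbers stabilize.
for i in 1 2 3 4 5; do
  time grep "Time" list.txt > /dev/null
done

# Cold-cache variant (Linux, needs root): drop the page cache between runs.
# sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
```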
I thought that might be affecting things, and the numbers in the table are third or fourth attempts. I wouldn’t lend too much credibility to my tests either; it was a terrible way to see how they perform. 😉
As for me, what drew me to ack (instead of grep) was the convenience. For example, it skips .git and .svn folders, and has lots of other sane defaults for programmers.
True. I don’t program much (actually, never, if I can help it … I don’t want to embarrass myself) so features like that don’t really shine for me. To each his own! 😀
A similar tool is The Platinum Searcher: https://github.com/monochromegane/the_platinum_searcher
Oh no, not another one. … 😯 😉 Thanks! I’ll add it to the list. Cheers!
Hi there. I’m the author of ag. I commend you for putting claims to the test, but I feel the need to address your conclusions.
Ack and ag are optimized for searching entire codebases. The main reason you’re not seeing much of a performance difference is because your benchmark is searching a single file. That means many of ag’s tricks (multiple threads, ignoring binary files, obeying gitignore/hgignore/etc) are unavailable.
You’ll also notice that ag printed line numbers, while grep didn’t. This may surprise you, but counting line numbers is an expensive operation. It requires re-scanning the entire file to count up newlines. That means ag is reading the file twice in your benchmark, while grep only scans it once. Although it hurts performance, I think the trade-off is worthwhile. Line numbers are a useful default for a code-searching tool.
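For a slightly fairer single-file comparison, grep can be asked to print line numbers too with its -n flag. A small sketch, using a throwaway file (the list.txt from the post would work the same way):

```shell
# ack and ag print line numbers by default; grep only does so with -n,
# which makes it track newline counts as well.
printf 'one\ntwo Time\nthree\n' > sample.txt
grep -n 'Time' sample.txt          # prints: 2:two Time
time grep -n 'Time' sample.txt > /dev/null
```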
If you want a more typical benchmark, I suggest grabbing a large codebase (say… PHP: https://github.com/php/php-src), building it, and then trying a recursive search. Below are my results on a 2013 MacBook Air. Times are medians of 5 runs, so the fs cache is hot.
ggreer@carbon:~/code/php-src% time grep -r time_t .
…
grep -r time_t . 10.11s user 1.80s system 76% cpu 15.489 total
ggreer@carbon:~/code/php-src% time ack time_t
…
ack time_t 5.93s user 0.69s system 98% cpu 6.730 total
ggreer@carbon:~/code/php-src% time ag time_t
…
ag time_t 1.35s user 0.52s system 171% cpu 1.090 total
These numbers are impressive when you notice the size of the directory:
ggreer@carbon:~/code/php-src% du -sh
354M .
Benchmarking is great for verifying performance claims, but it’s important to choose a benchmark that accurately reflects the typical use case. Otherwise, one risks being misled.
Thanks Geoff, that’s a much better analysis of how they differ. I don’t get many opportunities to work with — or search through — huge trees of code, so it’s good to see a proper breakdown of how all three compare. Cheers, and thanks again! 🙂