datamash: Statistical tools for raw numerical data

I like tools that do simple things in obvious ways. I like tools that have color too, but sometimes I’m willing to forgive that, and award points on cleverness.

Here’s datamash, a GNU tool, doing something fairly straightforward.


Forgive my rotten formatting. I was trying to line up the sums under their appropriate columns, but I ran out of patience with it. What you should see there are two arbitrary columns of numbers, and datamash summing both on the fly.
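
In case the screenshot doesn't come through, here is a rough sketch of the same idea. The file name numbers.tsv and the values in it are made up for illustration, and I'm assuming a datamash recent enough to accept several operations on one command line.

```shell
# Build a small tab-separated file with two columns of made-up numbers.
printf '%s\t%s\n' 10 1 20 2 30 3 > numbers.tsv

# Sum column 1 and column 2 in one pass. datamash expects tab-separated
# fields by default; -W tells it to accept any run of whitespace instead.
sums=$(datamash sum 1 sum 2 < numbers.tsv)

# Prints 60 and 6, separated by a tab.
printf '%s\n' "$sums"
```

Each operation is followed by the column it applies to, so adding a third column only means adding another "sum 3" at the end.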

It’s not a terribly earth-shattering function, but it does make a lot of otherwise tricky number functions accessible to flat numeric data files. So you don’t have to import into a spreadsheet to get a sum, a mean, a max, a min or whatever.
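
A quick sketch of those other operations, using seq to generate the numbers 1 through 10 as stand-in data rather than a real file:

```shell
# Feed the numbers 1..10 (one per line) to datamash and ask for
# the sum, mean, minimum and maximum of column 1.
stats=$(seq 10 | datamash sum 1 mean 1 min 1 max 1)

# Prints 55, 5.5, 1 and 10, separated by tabs.
printf '%s\n' "$stats"
```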

And you don’t have to rely on statistical packages like R or Octave to do some simple budget analysis. 😉

datamash also does some rather clever text formatting tricks, which might be reason enough to keep it installed. Observe:


So you can feed datamash a series of columns vertically, and it will run them out horizontally. Omigoshthatissocool.
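
Something along these lines should reproduce the trick. I'm assuming the transpose operation here, which swaps rows and columns, and the file pairs.tsv and its contents are invented for the example.

```shell
# Three rows of two tab-separated fields: a label and a number.
printf '%s\t%s\n' a 1 b 2 c 3 > pairs.tsv

# transpose turns the two vertical columns into two horizontal rows.
flipped=$(datamash transpose < pairs.tsv)

# Prints "a b c" on the first line and "1 2 3" on the second,
# with tabs between the fields.
printf '%s\n' "$flipped"
```

Note that transpose wants every row to have the same number of fields; a ragged file will make it complain.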

datamash is in the AUR for Arch and in testing/unstable for Debian. I don’t know why it’s not in the standard repos for either distro, except that it may be too new. The development pages for datamash suggest it started about a year and a half ago, but saw most of its activity within the past six months. Give it time.

A lot of what datamash can do, particularly the higher statistical functions, is much more than I would need on a day-to-day basis. But if you keep something like R or Octave on board for regular data analysis, you might consider datamash as a lighter alternative.
