Tag Archives: statistics

genstats: Quick statistical reports

I can see the usefulness of genstats almost immediately. Given a text file, you can pull out a frequency report with very little effort.

[Screenshot: genstats]

And since genstats handles its input and output in much the same way as cut, you should be able to get the information you want within just a few minutes of compiling it.

Tracking word frequency in a flat text file may not appeal to you, but consider what you could do with genstats and your average log file. That is the author’s suggestion, and the screenshot on the home page gives a good example of some advanced genstats usage.

My only complaint about genstats is so trivial that I’m embarrassed to mention it. In a file with 12 lines, where four of the words in the second field are the same, the display should read “33%,” not “0.33%.” The latter is a third of a percent, while the former is a third of the whole. Or perhaps there is another calculation at work there that I’m not aware of.
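genstats’ own source and options aren’t covered here, so this is not how genstats works internally. It’s a minimal Python sketch of the arithmetic behind a frequency report, assuming a cut-style 1-based field number and a tab delimiter by default; `field_frequencies` is a hypothetical name. It shows why four matches out of twelve lines should come out as roughly 33%.

```python
from collections import Counter


def field_frequencies(lines, field, delimiter="\t"):
    """Count the values in one 1-based field (cut-style) and return
    (value, count, percent) rows, most frequent first."""
    values = []
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        if len(parts) >= field:  # skip lines too short to have this field
            values.append(parts[field - 1])
    total = len(values)
    # Percent means count / total * 100, so 4 out of 12 is 33.33%, not 0.33%.
    return [(value, n, 100.0 * n / total)
            for value, n in Counter(values).most_common()]


if __name__ == "__main__":
    # Hypothetical 12-line file: four lines share "x" in field 2.
    sample = ["a x", "b x", "c x", "d x"] + [f"e{i} y{i}" for i in range(8)]
    for value, n, pct in field_frequencies(sample, 2, delimiter=" "):
        print(f"{value}\t{n}\t{pct:.2f}%")
```

Keeping the fraction (0.333…) and the percentage (33.33%) as separate ideas is the whole point; printing the first with a % sign attached is what produces the confusing “0.33%.”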

I get the picture, though. And having said that, I suppose it’s worth mentioning that genstats doesn’t give you a lot of control over the output. As best I can tell, that is the only style of report you’ll see from genstats.

genstats appears to be a free-roaming program; it’s not in Arch/AUR or Debian. So if you want something quick and easy to package, this might be one. 😉

datamash: Statistical tools for raw numerical data

I like tools that do simple things in obvious ways. I like tools that have color too, but sometimes I’m willing to forgive its absence and award points for cleverness.

Here’s datamash, a GNU tool, doing something fairly straightforward.

[Screenshot: datamash summing two columns]

Forgive my rotten formatting. I was trying to line up the sums under their appropriate columns, but I ran out of patience with it. What you should see there are two arbitrary columns of numbers, and datamash summing both on the fly.

It’s not a terribly earth-shattering function, but it does bring a lot of otherwise tricky number functions to flat numeric data files. So you don’t have to import into a spreadsheet to get a sum, a mean, a max, a min or whatever.
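For a sense of what the column-summing job involves, here is a rough Python equivalent of something like `datamash -W sum 1 sum 2` on two whitespace-separated columns. It is a sketch, not datamash’s implementation; `column_sums` is a hypothetical helper, and unlike datamash it simply sums every column it finds.

```python
def column_sums(lines):
    """Sum each whitespace-separated column of numbers; short rows are allowed."""
    sums = []
    for line in lines:
        for i, token in enumerate(line.split()):
            if i == len(sums):  # first time we've seen this column
                sums.append(0.0)
            sums[i] += float(token)
    return sums


if __name__ == "__main__":
    # Two arbitrary columns of made-up numbers; pass sys.stdin instead to use it as a filter.
    data = ["12 7.5", "3 2.5", "40 10"]
    print("\t".join(f"{s:g}" for s in column_sums(data)))
```

The point of datamash is that you don’t have to write even this much.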

And you don’t have to rely on statistical packages like R or octave to do some simple budget analysis. 😉

datamash also does some rather clever text formatting tricks, which might be reason enough to keep it installed. Observe:

[Screenshot: datamash turning columns into rows]

So you can feed datamash a series of columns vertically, and it will run them out horizontally. Omigoshthatissocool.
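datamash calls this operation `transpose`. The idea fits in a few lines of Python; this is a sketch under the assumption of tab-separated input, and the hypothetical `transpose` helper refuses rows with differing field counts rather than silently dropping fields.

```python
def transpose(rows):
    """Turn rows into columns; refuse ragged input instead of truncating it."""
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError("every row must have the same number of fields")
    return [list(column) for column in zip(*rows)]


if __name__ == "__main__":
    # A vertical list of tab-separated records...
    lines = ["Jan\t120", "Feb\t95", "Mar\t143"]
    table = [line.split("\t") for line in lines]
    # ...printed out horizontally, one former column per line.
    for row in transpose(table):
        print("\t".join(row))
```

The length check matters because Python’s `zip` would otherwise stop at the shortest row without complaint.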

datamash is in AUR and in Debian’s testing/unstable. I don’t know why it’s not in the standard repos for either distro, except that it may be too new. The development pages for datamash suggest it started about a year and a half ago, but saw most of its activity within the past six months. Give it time.

A lot of what datamash can do, particularly the higher statistical functions, is much more than I would need on a day-to-day basis. But if you keep something like R or octave on board for regular data analysis, you might consider datamash as a lighter alternative.

mange: The program with a delicate name

Every time I find a csv tool of some sort, I end up wishing I had more chances to work with csv files. The first program for today is a great example, even if I have to be careful how I phrase these next few sentences. This is mange:

[Screenshots: mange, three images]

Before I am hounded by rabid animal rights activists, just let me say I didn’t pick the name. I can’t find any sort of explanation as to why “mange” is the title, unless there are non-English and non-other-languages-I-speak references. If you know, let me in on the secret.

And I’d like to know, because mange is a pretty good program. I don’t come across many csv editors — viewers, yes, and utilities, yes. Even spreadsheets for the console. But now that I think of it, not many editors. Finding mange is a lucky event.

mange works in a straightforward fashion: arrow keys to navigate cells, Enter to edit them. mange will stick to an editing mode and move down cell by cell as you edit, which makes data entry much easier.

mange also has enough sense to display and keep a header row, as you can see in the images above. And it seems to handle terminal width and four-way scrolling without too much effort.
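mange’s internals aren’t described here, but the header-row idea is easy to sketch with Python’s standard `csv` module: keep the first line aside as column names, and address cells by data-row number and column name. `edit_cell` is a hypothetical helper, not part of mange.

```python
import csv
import io


def edit_cell(csv_text, row, column, value):
    """Return CSV text with one data cell replaced.

    row is 0-based and counts data rows only (the header is not row 0);
    column is looked up by its header name.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)          # keep the header row separate from the data
    rows = list(reader)
    col = header.index(column)     # raises ValueError for an unknown column
    rows[row][col] = value
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)        # write the header back unchanged
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    text = "name,qty\napple,3\npear,5\n"
    print(edit_cell(text, 1, "qty", "7"), end="")
```

Treating the header as names rather than data is what lets an editor keep it pinned while the rest of the sheet scrolls.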

I did see a couple of screen corruption problems, usually when editing a long field on a wide spreadsheet that was pressed up against the rightmost edge. I have a feeling there might be a small tweak to get the screen to refresh properly after editing a cell that stretches over the screen width.

mange has a couple of features I didn’t get to, just because they’re tied to the statistical package R, and the time it would take me to learn to work them together would delay this post until about Thursday. So take it on faith that mange can feed data into R and generate plots and graphs.

Your best bet for getting started with mange is the man page, where most of the controls and the editing-command-navigation modes are explained. It won’t take long.

I’m sad to see that the last update to mange was around three years ago, which makes me wonder if the list of coming attractions in the README file is ever going to materialize. I guess that remains to be seen. :\

st and st: One for numbers, one for … numbers and letters?

I’ve got two programs named “st” on the list, which caused no end of confusion for about an hour today. Oddly, I can’t show much of either one.

The first one comes, yet again, from the suckless.org gang, because everything in the suckless stable starts with the letter S.

Just kidding. In this case st is a simple terminal and unfortunately every rendition — which is quite a few according to the AUR — failed to build for me. Most of those seemed to be from git, so maybe I just grabbed it at the wrong time.

I’m going to keep trying though, mostly because suckless.org usually does things right, and I plan on trying out some of their other toys too.

The other st is a miniature statistics tool, capable of skimming through numbers in a text file and providing some fundamental statistical information — a minimum, maximum, mean, standard deviation and a few other things.
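Since I couldn’t get st running, here is a swapped-in Python sketch of the same kind of report rather than anything from st itself: pull every number out of some text and compute the count, minimum, maximum, sum, mean and standard deviation with the standard `statistics` module. Whether st reports the sample or population standard deviation isn’t stated; this sketch assumes the sample version. `summarize` is a hypothetical name.

```python
import math
import statistics


def summarize(lines):
    """Collect every number found in the text and report basic statistics."""
    values = []
    for line in lines:
        for token in line.split():
            try:
                values.append(float(token))
            except ValueError:
                continue  # skip words and other non-numeric tokens
    if not values:
        raise ValueError("no numbers found")
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        # sample standard deviation needs at least two values
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


if __name__ == "__main__":
    report = summarize(["2 4 4 4", "5 5 7 9"])
    for name, value in report.items():
        print(f"{name}\t{value:g}")
```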

I could build st, but I’m not a perl pro, and it only gave me error messages when I tried to kick it into action. I have a feeling I hadn’t put all of st’s important little parts in the right places.

The second st isn’t in AUR or Debian, so if you decide to try it out, there might be a little work involved. Not that a little work ever hurt anybody. … 😉

ss: A quick dump of socket statistics

There’s a nifty little socket reader lumped in with iproute2, and if you didn’t think to look for it, you probably wouldn’t know it was there. Here’s ss:

[Screenshot: ss]

I’ve prodded ss a little more than is necessary there, just to keep the results on the screen and lined up neatly; that’s what head and column are for. ss works fine on its own, but then the results wouldn’t make a good screenshot. 🙄

ss can report on network sockets or on local Unix domain sockets, which is how it can show connections to the X server; the man page lists quite a few options for each, as well as some examples to get you started.

ss will also allow you to filter stats by TCP states, match ports or addresses, and a slew of other nifty tricks. The man page compares ss to netstat; I’d probably lump them into the same category too, just out of naiveté.
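To show what “filter by TCP state” means underneath, here is a sketch that reads the Linux /proc/net/tcp table, the older text interface netstat reads (ss itself, as I understand it, asks the kernel over netlink instead). The state codes follow the kernel’s TCP state numbering; the sketch handles IPv4 only and assumes a little-endian host when decoding the hex addresses. `sockets_in_state` and `decode_address` are hypothetical helpers.

```python
import socket
import struct

# State column of /proc/net/tcp, in the kernel's TCP state numbering.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_address(field):
    """Turn '0100007F:0277' into ('127.0.0.1', 631), assuming a little-endian host."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def sockets_in_state(proc_text, state="LISTEN"):
    """Yield (local, remote) address pairs for IPv4 sockets in the given state."""
    for line in proc_text.splitlines()[1:]:  # the first line is a column header
        fields = line.split()
        if len(fields) < 4:
            continue
        if TCP_STATES.get(fields[3]) == state:
            yield decode_address(fields[1]), decode_address(fields[2])


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for (ip, port), _remote in sockets_in_state(text):
        print(f"{ip}:{port}")
```

On a Linux box the listening ports it prints should roughly match what `ss -ltn` shows for IPv4, which is a handy way to check the decoding.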

And yet strangely, I’ve not heard as much about ss. Perhaps this is another one of those Linux secrets I keep stumbling into. … 😐