
numfmt: Saving you a trip to Google

Sometimes people accuse me of misnumbering things when I type “4Mhz” or “64Kb,” because there is a difference in numbering and how the abbreviations are used. Which is true.

In my defense, I’ll say that I adopted the uppercase-lowercase style as a concession to the AP Stylebook, which (last time I checked) doesn’t rule on things like mebibytes. I have other reasons, which are rather boring (skip to the end if you want to know).

Regardless, “Mb” seems to cover the idea for me, and I can’t recall ever running into problems, except perhaps when talking about labeling actual, exact, bit-for-bit drive storage space. Which is a scam, anyway. 👿

But the next time you want to scold your friendly neighborhood Linux blogger, you can critique the writer’s integrity with numfmt. Another of the coreutils gems, you can get almost any proper formatting for a number in as simple a command as this:

kmandla@6m47421: ~$ numfmt --to=iec-i 64738
64Ki

Or …

kmandla@6m47421: ~$ numfmt --to=iec 64738
64K

Or …

kmandla@6m47421: ~$ numfmt --to=si 64738
65K
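If 64Ki looks generous given that 64738 / 1024 is only about 63.2, that is because numfmt rounds away from zero by default. A quick sketch of the difference, using the --round option on the same number as above:

```shell
# Default rounding is "from-zero", so 63.2K gets pushed up to 64K.
default_out=$(numfmt --to=iec 64738)

# Ask for conventional rounding to the nearest unit instead.
nearest_out=$(numfmt --to=iec --round=nearest 64738)

echo "default: $default_out  nearest: $nearest_out"
```

The same option accepts up, down, from-zero and towards-zero, which matters if you are checking whether something will fit.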

And so forth, and so on. numfmt will also convert back from abbreviated formats, like this:

kmandla@6m47421: ~$ numfmt --from=si 65K
65000

kmandla@6m47421: ~$ numfmt --from=iec 65K
66560

kmandla@6m47421: ~$ numfmt --from=iec-i 65Ki
66560

A little boring, but useful when I need to convert 59.75Gi back into a full string of numbers. …

kmandla@6m47421: ~$ numfmt --from=iec-i 59.75Gi
64156073984

Remember this, the next time you need to partition a drive with fdisk. It might just save you a trip to Google. 😉
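For the partitioning case, here is a rough sketch of going from a human-readable size to a sector count. The 512-byte sector size is an assumption for illustration; check what your drive actually reports before trusting the result.

```shell
# Convert a human-readable size into bytes, then into sectors.
sector_size=512                       # assumed; some drives use 4096
bytes=$(numfmt --from=iec-i 59.75Gi)  # same conversion as shown above
sectors=$(( bytes / sector_size ))

echo "$bytes bytes = $sectors sectors of $sector_size bytes"
```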

P.S.: For those who really need to know … my rationale on deviating from pure AP Style on MB and MHz and so forth, is that the words are spelled out in full as “megabyte” and “megahertz.” We don’t write “megaByte” or “megaHertz,” or for that matter, split the words as “mega byte” or “mega Hertz.” Since the abbreviations “MB” and “MHz” are deviations from other abbreviations like “km” for kilometer and “kHz” for kilohertz, my mind says “Mb” and “Mhz” aren’t polluting any rules on uppercase and lowercase as they should appear in print. It seems these are determined on a case-by-case basis of what looks right. Of course, you could say that about a lot of things in the AP Stylebook. …

paste: What I thought join would be

I just showed paste in the last post but I haven’t mentioned it on this site. I probably should have reversed the order here, since paste is one of the last coreutils toys I was holding back from the leftover slurry.

paste does what I thought join should do — concatenate two separate data files, line by line. Again, this is something easier done than said:

[Screenshot: 2014-10-02-6m47421-paste-01]

That looks almost identical to what join does; here’s where they differ:

[Screenshot: 2014-10-02-6m47421-paste-02]

paste at least hints that there were omissions in one column or the other. join, on the other hand, skipped over those items, and demanded they be sorted. :\

Of course, seeing paste and join side by side makes it a lot clearer why they’re named as they are. join links together corresponding entries according to a sorted order. paste just forces them together, even when something is missing.

I’d still like to see paste insert a tab where the first list is missing a line, but at least now I get the picture.
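Since the difference is easier to see than to describe, here is a small reconstruction with made-up data. The file names and contents are invented for illustration and are not the ones from the screenshots. Both files are keyed on the first field and already sorted, which is what join expects.

```shell
# Work in a scratch directory so nothing gets clobbered.
tmp=$(mktemp -d)
printf '1 apple\n2 banana\n3 cherry\n' > "$tmp/names.txt"
printf '1 red\n3 dark-red\n'           > "$tmp/colors.txt"

# paste glues line N to line N, whether or not they belong together.
pasted=$(paste "$tmp/names.txt" "$tmp/colors.txt")

# join matches on the first field and drops anything without a partner.
joined=$(join "$tmp/names.txt" "$tmp/colors.txt")

printf 'paste:\n%s\n\njoin:\n%s\n' "$pasted" "$joined"
rm -r "$tmp"
```

With this data, paste puts "2 banana" beside "3 dark-red" because it only counts lines, while join leaves banana out entirely because key 2 has no match in the second file.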

I handed datamash a small gold star for transposing its output, and paste has a similar function in its -s flag.

[Screenshot: 2014-10-02-6m47421-paste-03]

So you can run out vertical lists horizontally, if you are so inclined.
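A minimal sketch of the -s flag with throwaway input (the words are placeholders). The trailing - tells paste to read standard input, and -d swaps the default tab for another delimiter.

```shell
# Three lines in, one tab-separated line out.
row=$(printf 'alpha\nbeta\ngamma\n' | paste -s -)

# Same thing, but joined with commas instead of tabs.
csv=$(printf 'alpha\nbeta\ngamma\n' | paste -s -d, -)

printf '%s\n%s\n' "$row" "$csv"
```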

I’m quickly running out of coreutils titles, and I do so enjoy learning about them. Perhaps one day I shall start a blog that only steps through that and the util-linux package, and looks at each tool one at a time. … Nah, who’s got time for that? 😕

column: With oddly satisfying results

I’m going to stick to the C section for a day or two, and hopefully whittle down the disproportionate number of titles I have listed there. I’m not sure why, but it seems that between October of last year and now, I managed to collect 30+ titles in the C section alone. And it hasn’t helped that ls vimwiki/ | shuf -n1 kept pulling stuff from outside that band.

column is on the list, and is something I use on a weekly, if not daily, basis. Here’s column, in its most daring escapade yet: 🙄

[Screenshot: 2014-07-28-lv-c5551-column]

And the attraction should be immediate. If we’re going to talk about tools that improve readability, column needs to be at the top of the list. Even when combined with yesterday’s deluge of colorificated diff tools, column makes things better.

[Screenshot: 2014-07-27-lv-c5551-wdiff-colordiff-column]

It’s not always perfect, but I have a feeling that the escape sequences used to trigger colors might interfere with the final results. No major loss.

The point is that column, by default, and especially when used in conjunction with the -t flag, is going to be a real improvement for scanning lists of data and finding corresponding entries. Keep that in mind next time you’re working with csv files.
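As a small sketch of the csv case, assuming the util-linux or BSD column is installed: the data below is invented, -s, tells column to split on commas, and -t lines the fields up as a table.

```shell
# A tiny made-up CSV, piped through column to get an aligned table.
table=$(printf 'name,size,type\nnotes.txt,12,text\nholiday-photo.jpg,2048,image\n' \
    | column -t -s,)

printf '%s\n' "$table"
```

Note that this is a plain split on commas; quoted fields that contain commas will not survive it.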

column takes very few options, and in general they only affect how the rows and columns are generated, or determine the display width. You won’t find a whole lot of frills with column, even if it does amazing work.

I know what you’re thinking at this point: You’re imagining that a utility as simple and cool as this could only come from one place: coreutils. Surprise: This appears in util-linux in Arch, and bsdmainutils in Debian. 😯 O_o Why? IDK. IANADD. 😀

xlhtml and ppthtml: Not only Word needs converting

It’s true, there is more to life than just .doc files — there are .xls and .ppt files to worry about.

xlhtml and its cohort ppthtml take some of the fun of working with proprietary closed-source office file formats, and whirl them around until they make HTML pages.

[Screenshots: 2014-07-04-6m47421-ppthtml, 2014-07-04-6m47421-xlhtml]

I have to admit that I touched up the HTML on one file, because the original spreadsheet had a black background and the final product wasn’t quite visible.

But other than that small point, which was specific to the sample spreadsheet I downloaded, what you see is what xlhtml will get you. No muss, no fuss, piece of cake, easy as pie.

There have been quite a few file format converters in this little adventure, and I daresay there may be even more before the journey is through.

Where some other converters took the low road and just extracted the information held in xls or ppt files, xlhtml and Co. do the right thing and convert them into proper HTML pages, where the table format comes through perfectly.

Unfortunately, xlhtml and ppthtml fall down in the same way as many of their brethren — years out of date, there’s not much aside from turn-of-the-century formats they can handle.

So if you’re looking to convert that Microsoft Commercial Office Professional Gold Pack Business Edition 2016 file into a nifty HTML page to astound your boss, it may take a different tool.

Which makes me wonder … why do most of these file converter tools seem to fall flat after 2006 or so? For some reason the jump to docx with Word 2007 seems to be the jumping-off point for most file converters — not all, but most.

I daresay the shift to docx occurred alongside the rise of OpenOffice.org and the golden years of Ubuntu, as well as the ascent of multiprocessor home machines. Call me crazy, but I bet the prevalence of higher-powered machines and better graphical software support in free software systems inadvertently led to a decreased need for text-based document conversion tools.

When I write K.Mandla’s Big Book of Text-based Linux Software History, I’ll devote a whole chapter to it. I promise. 😉

par: After pandoc, par for the course

Since we just got done converting documents, it only makes sense to jump into formatting them too.

MrFrood mentioned par waaay back when we were in the F section, and so as promised. …

[Screenshot: 2014-02-21-lv-r1fz6-par]

The home page for par gives some credit to fmt and I suppose if you read about its history, it makes sense.

And as you can see above, or on the home page, it produces a similar effect. The downside is … it is considerably more complex than fmt.

The man page and help flags might give you some guidance, but what little I could get done up there was shamelessly stolen from the examples.

From what I can tell, learning to use par effectively is going to take two things: a lot of time and enough of a demand to keep your focus on it.

Unfortunately, I’m just flitting past par and don’t have enough of a workload to make regular use of it. I have a feeling though, that if fmt is interesting but not powerful enough for what you need, par is the answer.
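For a sense of the baseline that par builds on, here is a minimal fmt sketch. Only the fmt side is shown; the sample sentence and the 40-column width are arbitrary choices for illustration.

```shell
# One long line of filler text, reflowed to at most 40 columns.
text='The quick brown fox jumps over the lazy dog, then circles back and does it all over again for good measure.'
wrapped=$(printf '%s\n' "$text" | fmt -w 40)

printf '%s\n' "$wrapped"
```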

o3read and odt2txt: Converters, by the numbers

O-section titles are few and far between; I think I have about 15 total, and already I’ve discarded one for requiring a particular brand of cell phone. 🙄

On the menu today is o3read, with a quick bounce forward to odt2txt. Both are file converters.

[Screenshots: 2014-02-15-lv-r1fz6-o3read, 2014-02-15-lv-r1fz6-odt2txt]

odt2txt behaves much in the way you might expect: Tack on the file and it will draw out the content and send it to STDOUT. It works with .odt and .sxw files, making it a good choice for sending the contents to a new destination.

On the other hand, o3read is a little more cumbersome. A holdover from the Siag Office suite, it requires you to unzip the target .sxw file, pluck out the content.xml file, then pipe the results on through.

Otherwise it will sit and stare at you or, worse, trickle out a spiel of garbage. 😕 For a while there, I thought it was broken.

o3read itself is less to my liking than its accompanying tools, o3totxt and o3tohtml. o3totxt is useful in sending raw text to the console (odt2txt style), and o3tohtml, as you might have inferred, couches everything in HTML. Which is useful.

Of course, you have to manhandle both in the same way as o3read, to get the results you want. This is where your razor-sharp scripting skills come together to create that singular command to encapsulate all three o3read tools. 😀
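As a rough sketch of what such a command might look like, here is a small shell function wrapping the unzip-and-pipe routine. The name sxw2txt is made up for illustration, and the sketch assumes unzip and o3totxt are installed and that o3totxt reads content.xml on standard input, as described above.

```shell
# Hypothetical helper: dump the text of an .sxw file to STDOUT.
# Assumes unzip and o3totxt are on the PATH.
sxw2txt() {
    # -p sends the named archive member to STDOUT instead of writing it to disk.
    unzip -p "$1" content.xml | o3totxt
}

# Usage (the file name is a placeholder):
#   sxw2txt report.sxw | less
```

Swapping o3totxt for o3tohtml in the function body should give the HTML flavor in the same way.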

Or you could just stick with odt2txt. Up to you. 😉

highlight: Converting code for display

Next is highlight (not to be confused with highlighter) which changes source code to formatted text.

[Screenshot: 2013-11-23-lv-r1fz6-highlight]

It’s not a code converter so much as a formatter, as I understand it. I doubt it’s going to change your old Fortran code to PET Basic. 🙄

But as you can see, it plucks through C and formats it neatly in HTML, for display on web pages and so forth.

highlight has a lot of other features I just couldn’t show right now. And the home page has a huge list of languages it can interpret and convert.

This is definitely one of those tools that I would have more use for if I did much coding. As it is though, it’s a curiosity, but not something I can really put to use.

You, on the other hand. … 🙂