Tag Archives: filter

html-xml-utils: A sweet suite

I’m in favor of any tool that can strip away the manure that masquerades as XML files. I have no earthly idea why anyone would use that style or arrangement voluntarily, especially when simpler and cleaner arrangements are so much … cleaner and simpler to work with. :\

So if you hand me a suite of 10 or 12 tools that scrape away at XML and HTML files, I’m like a kid on Christmas Day. Here’s html-xml-utils, which is just a toy box full of goodies. Which unfortunately means I can only show one or two.

hxnormalize, I imagine, improves readability for pages with frequent links. Go from this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <title>Simple page</title>


<h1>A simple HTML page</h1>

<p>This is a very simple HTML page, made from scratch for the purpose of testing some <a href="http://www.w3.org/Tools/HTML-XML-utils/man1/" target="_blank">tools</a> in the <a href="http://www.w3.org/Tools/HTML-XML-utils/" target="_blank">html-xml-utils</a> package.


to this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "">

    <title>Simple page</title>

    <h1>A simple HTML page</h1>

    <p>This is a very simple HTML page, made from scratch for the
      purpose of testing some <a
      target="_blank">tools</a> in the <a
      target="_blank">html-xml-utils</a> package.</p>

Not only does every line break at a link, which makes them easy to spot, but some closing tags have been corrected, because I gave hxnormalize the -x flag.

I can re-use my example with hxprintlinks, which will number every link in the document, and add a reference list at the bottom of the page.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <title>Simple page</title>
<h1>A simple HTML page</h1>
<p>This is a very simple HTML page, made from scratch for the purpose of testing some <a href="http://www.w3.org/Tools/HTML-XML-utils/man1/" target="_blank">[1]tools</a> in the <a href="http://www.w3.org/Tools/HTML-XML-utils/" target="_blank">[2]html-xml-utils</a> package.


Of course, pipe hxnormalize into hxprintlinks, and some of that will be cleaned up a little. 😉
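For anyone who wants to try this at home, here’s a rough sketch of both steps. The scratch directory and the shortened test page are my own stand-ins, not the exact file shown above. The only flag used is the -x mentioned earlier, and I’m assuming hxprintlinks will read from standard input when no file is named.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
page="$workdir/simple.html"

# A deliberately sloppy page: no <html>/<body>, no closing </p>.
cat > "$page" <<'EOF'
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title>Simple page</title>
<h1>A simple HTML page</h1>
<p>Testing some <a href="http://www.w3.org/Tools/HTML-XML-utils/">tools</a>.
EOF

if command -v hxnormalize >/dev/null 2>&1; then
    # -x: XML conventions, the flag used for the cleaned-up example above.
    hxnormalize -x "$page"

    # Normalize first, then number the links and append the reference list.
    hxnormalize -x "$page" | hxprintlinks
else
    echo "html-xml-utils does not appear to be installed" >&2
fi
```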

If you remember xidel or xmlstarlet, you might remember how it’s possible to pull single elements out of an XML file, for further editing. hxextract can do that, and here are the results of hxextract command .config/openbox/rc.xml on my system:

kmandla@6m47421: ~/downloads$ hxextract command rc.xml 
<command>gmrun</command><command>urxvtc -e alpine -d 0</command><command>urxvtc -e wicd-curses</command><command>urxvtc -g 142x60 -e /home/kmandla/.scripts/mc.sh</command><command>/home/kmandla/.scripts/cleanup.sh</command><command>urxvtc -e htop</command><command>urxvtc -e alsamixer</command><command>/home/kmandla/.scripts/volume.sh</command><command>urxvtc -e alsamixer -D equal</command><command>urxvtc -g 142x60 -e elinks</command><command>/home/kmandla/.scripts/browser.sh</command><command>urxvtc -g 35x9 -e tty-clock -x -t -B</command><command>urxvtc -g 24x12 -e clockywock</command><command>urxvtc -e vim</command><command>urxvtc -e sc</command><command>urxvtc -e wyrd</command><command>urxvtc -e tudu</command><command>urxvtc -e mocp</command><command>pidgin</command><command>urxvtc -g 80x24 -title rhapsody -e /home/kmandla/.scripts/chatnews.sh</command><command>urxvtc</command>

Not pretty, but a step forward in terms of finding miscreant keyboard commands in my rc.xml file. 😐
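If the one-long-line output bothers you, a little grep and sed will break it up. This is only a sketch: the list_commands helper is a name I made up, and the sample string is a shortened copy of the output above rather than the real file.

```shell
#!/bin/sh
# Sample input: a shortened copy of the one-line hxextract output above.
extracted='<command>gmrun</command><command>urxvtc -e htop</command><command>pidgin</command>'

# list_commands (hypothetical helper): read concatenated <command>
# elements on stdin and print the text of each one on its own line.
list_commands() {
    # grep -o emits every match on a separate line; sed strips the tags.
    grep -o '<command>[^<]*</command>' | sed -e 's/<[^>]*>//g'
}

printf '%s\n' "$extracted" | list_commands
```

With the real file, that would be something like hxextract command rc.xml | list_commands.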

There is a lot more — a lot more — available in html-xml-utils that I just don’t have the time and resources to touch on. Look for tools that will convert from XML to asc files, tools that will build tables of contents and bibliographies for entire trees of files, and even a few that transpose tables or just pull out links. That one, hxwls, is mighty clever. …

I leave it to you to explore the rest of that suite. If you’re like me and can only scratch your head at the ascent of XML as a data format, this will be fun for you to play with.

Oh, and I almost forgot: Theodore gets credit for mentioning this one. Thanks, Theodore. 😉

ack: A grep for programmers

I am vastly underqualified to speak about ack, since the home page makes it abundantly clear that it’s a Perl-based grep-like tool intended for programmers.

I am not a programmer, unless an emergency arises, and then there would be a better chance of success by employing a blind, one-armed, drunk monkey to smack at a keyboard for a few hours. I am quite confident of that.

But ack is on my list and I’d feel slightly guilty if I didn’t include it, since it was relayed to me by email from a reader (who asked to remain nameless). So here is my best attempt at ack, and the wizardry it reportedly can perform.


Let’s see. Color? Check. Easily readable output? Check. Searches entire tree? Check. Returns results with file and line number annotated? Check.

Well then, that’s my whole list.

As I understand it (because again, this is not a tool intended for the peons, like me) ack is supposedly faster or more complete than the traditional grep, and carries defaults specific to searching trees of source code files.

Not that grep can’t do those things. …


Only that ack supposedly does them better. Or at least that’s the impression I get from the Web site.
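To make the comparison a little more concrete, here’s a minimal sketch that runs both tools over a throwaway tree. The file names and the TODO search string are invented for illustration. The grep flags are the usual -r (recurse) and -n (line numbers), and ack is only called if it happens to be installed.

```shell
#!/bin/sh
# Build a throwaway source tree to search (hypothetical file names).
tree=$(mktemp -d)
mkdir -p "$tree/src"
printf 'int main(void) {\n    return 0; /* TODO: real work */\n}\n' > "$tree/src/main.c"
printf '# TODO: write docs\n' > "$tree/README"

# grep needs to be told to recurse (-r) and to number lines (-n);
# add --color=auto if you want highlighting in a terminal.
grep -rn 'TODO' "$tree"

# ack recurses and annotates file names and line numbers on its own.
if command -v ack >/dev/null 2>&1; then
    ack 'TODO' "$tree"
fi
```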

Like I said, I’m not a programmer so I don’t feel qualified to pick between the two. I have no preference for either, and I’m likely to stick with grep just because grep appears on my Arch system (and some others) by default.

I can be lazy that way. 🙄

grc: More colorizing for terminal output

I really should have sifted through the titles I had waiting back in June, when I finished the alphabetical sequence, and plucked out all the colorizers. I’m still finding them, months later. Like sand in my shoes after going to the beach. 🙄

Here’s grc, which is a lightweight colorizer for common terminal commands.

[Screenshots: grc, two examples]

grc works on the same premise as some others we’ve seen; there is a shortlist of commands it knows and can colorize, but beyond that you’ll need to roll up your sleeves.

For what I see in the Arch version, which installed color profiles to /usr/share/grc/, grc can handle configure, cvs, diff, dig, esperanto (?), gcc, ifconfig, irclog, ldap, log, ls, mount, mtr, netstat, ping, proftpd, ps, traceroute and wdiff by default. So as you can see, it isn’t limited to commands.

I was going to question the logic of a few of those additions — mostly ls — because they already have color options. But this is oh-so-lovely. …


Which just goes to show that there’s nothing that has been done that can’t possibly be done better. Nice work.

If grc encounters something it doesn’t know, you get the program’s natural output, uncolored and as best I can tell, unchanged. So you might be safe using grc within an alias, and just calling for it as part of your daily routine. I like the idea of seeing that grc ls -lha all the time. … 😀
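A minimal sketch of that alias idea, for a shell startup file such as ~/.bashrc. The alias name ll is just my placeholder, and the fallback branch is an assumption about what you’d want on a machine where grc isn’t installed.

```shell
# Lines for a shell startup file such as ~/.bashrc.
if command -v grc >/dev/null 2>&1; then
    # Route the long listing through grc for its color profile.
    alias ll='grc ls -lha'
    # ping is another command on grc's default list.
    alias ping='grc ping'
else
    # No grc available: keep the same alias name, minus the color.
    alias ll='ls -lha'
fi
```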

Diving into grc won’t cost you much in disk space either. The tarball is barely 25K, and even when packaged, you’re talking about possibly 27K, for example, for the Debian version.

What’s clear at this point is, I need to find a way to distinguish between programs like grc, which colorize specific output, and color filters that pluck out specific words and add color to them. Not like it would matter though, since I’d be the only one making that distinction. … :\

acoc: More filter-based colorizing for the terminal

I had some personal issues that needed attention yesterday, so I didn’t get the chance to post anything. I’ll make up for it today though; here’s acoc to start us off right.


My “week” of output colorizers never really ends. 😉 acoc works in a similar fashion to colorwrapper and rainbow, by using command-specific filters to predict and assign color to a program’s output.

That means acoc has specific programs that it knows how to colorize. It’s not a grep-ish tool like colorex; instead it’s looking for specific output from specific programs. In some cases that’s preferable, in others it’s not. It will depend on what you need a colorizing tool for.

Right now, acoc’s list of recognized commands is somewhat small. There are about 30 here, and perhaps six to eight of those are distro-specific. There’s also an added complication that some of the commands don’t seem to produce any color with acoc, which might mean it’s looking for something different from what my Arch system can produce.

acoc allows for custom filters, and the syntax appears to be fairly straightforward. So if there’s something that you need and acoc doesn’t support it out-of-the-box, you can build it in a few minutes.

Altogether, acoc reminds me of colour more than anything, since it too will probably require a little attention before it meets all your needs. acoc definitely has more filters available by default, but still not as many as colorwrapper.

Now let’s see if I can find something that’s not a text colorization tool. … O_o

colorex: I might as well include them all

This randomized revisitation of the C section has been a veritable tapdance through colorization tools. If I had known I had so many, I would have devoted a whole week to them.

I have one more that starts with C, and then I’ll return to the entire alphabet, as determined by shuf and ls. And since I’m on a roll with colorizing tools, I might as well finish up with colorex.


A lot of the “shortcomings” I mentioned in clog are resolved in colorex. For example, it will, as you can see, colorize specific chunks or strings of letters, for as many times as they appear on a line. It can also bounce between colors even when they are exactly adjacent.

And most of colorex’s syntax is at the command line, so you can declare a color as you build the filter command. It also adds a blink code, underlining effects and bolding … only some of which is visible in a virtual console, but you get the idea.

clog had a very straightforward configuration style, but colorex will require you to be a little more adept at the command line. Expect to escape some of your more complex searches and/or regexes to make sure colorex understands what you want.

As an added touch, colorex has a randomization command, which will either surprise you with its results or drive you batty with the spattered color effects.


Not since toilet has there been such a commanding use of color on my lowly X41. …

I should mention that the random effects only seem to work on a full line. And out of fairness, I should mention that colorex doesn’t have the same degree of control over color — like red on purple text — that you can get quite easily with clog. Perhaps that will be in future versions.

In spite of those shortcomings, I’m more inclined to adopt colorex than clog, just because it feels like there’s a stronger sense of control with the former than with the latter. It may not offer the same range of controls and it might be a little more challenging to configure, but it definitely picks up what clog stepped over.

clog: Custom color for logs

I seem to be awash in colorizing programs as I chip away at the C section. This is clog, which the home page describes as a colorized log tail utility.


That’s mostly true. It does colorize text and it does apply more to logs than straight text files. It lacks a feature or two that would make it a peer of tail.

Mostly, it lacks the ability to strain out the last lines of a log. By default, clog dumps everything to STDOUT, and ignores flags like -10 that are native to tail. A bit of a misnomer, then.
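One possible workaround, assuming clog is being used as a filter on standard input: let tail do the trimming and hand the result to clog. The last_lines helper and the generated log are hypothetical, and the sketch falls back to plain tail output if clog isn’t installed.

```shell
#!/bin/sh
# last_lines FILE [N] (hypothetical helper): print the last N lines of
# FILE (default 10), passed through clog when it is available.
last_lines() {
    file=$1
    count=${2:-10}
    if command -v clog >/dev/null 2>&1; then
        # Assumption: clog colorizes whatever arrives on stdin.
        tail -n "$count" "$file" | clog
    else
        tail -n "$count" "$file"
    fi
}

# Demo on a generated 25-line log (made-up content).
log=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
    echo "line $i: sample log entry" >> "$log"
    i=$((i + 1))
done

last_lines "$log" 10
```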

You have two options in your .clogrc file: either highlight an entire line, or highlight matching sequences of letters. (You can also suppress lines, which might be useful.) In that way you could use clog as a kind of stylized grep, and add a few more color options.

Some shortcomings: When you highlight a string of letters or a word, you will only see highlighting on the first occurrence. As far as I could tell, there was no way to highlight the same sequence multiple times in the same line. Several different colors on the same line will work, but only the first match for each color.

Furthermore, if you ask for full-line highlighting and the line matches more than one filter, you’ll only see one highlighting. I couldn’t make clog split highlighting. You can, however, highlight letter sequences overtop of line highlighting.

Those are shortcomings, but only if you’re trying to make clog behave like a strict color filter, and not a log colorizer. Think of clog like ccze, not like colout (or highlighter or pygments).

On the plus side, clog’s syntax for screening colors is terrifically simple. Step through the first three or four examples and you’ll have multicolor log displays in no time. And clog supplies date and time functions in case you want to stamp the output with either of those.

clog is a good tool, but not one I plan on adopting. I rarely peek at my logs anyway, and clog doesn’t handle enough grep-like colorizing to take over that role. If its abilities expand, I might consider it.

colour: Something to build on to

I’ll show you a snapshot of colour, and you can decide if you’re willing to build upon it.


colour is remarkably similar to colorwrapper, but will probably require a little more effort to bring up to the same level of flexibility. Both programs use profiles to filter program output and insert color.

But while colorwrapper (and rainbow for that matter) seems to have a few more profiles available to it, colour has only two that I see — and both of them are intended for nodetool, which I think is somehow related to Apache Cassandra. That’s waaay beyond my scope.

But colour comes with a couple of example files by default, so what you see in the screenshot is just those examples pumped through their respective profiles. I’m afraid I can’t show much more than that.

And since colour doesn’t seem to have any other configurations, I’m out of ways to use it. I know I should probably start making my own, but I’m short on time and only halfway interested in building up a collection of configurations for colour.

And probably you’ll be in the same boat. If you find colour promising, you’ll likely spend some time drafting your own configurations, for the programs you like and use.

It’s your decision: colorwrapper or rainbow, the more complete utilities … or colour, the underdog with potential. 😉

colorwrapper: Quite obviously, the author had me in mind

Boring. Dull. Uninteresting.

[Screenshots: date and pstree; netstat]

Exciting! Interesting! Readable!

[Screenshots: colorwrapper, two examples]

I get razzed occasionally about my preference for color in … well, in just about anything that passes through the console. I must not be the only one, for as many colorize tools as there are.

colorwrapper — instead of screening program output for colorizable (is that a word?) strings, or requiring you to select a color scheme from a list — uses an executable preset profile to catch and colorize the results of a command. (I’m avoiding the word “wrapper” here, just for clarity.)

I don’t know how to explain that properly. But here’s a snippet from the profile that produced the colorized version of pstree, above.

path /bin:/usr/bin:/sbin:/usr/sbin:
ifnarg -G:-U
base cyan
digit green+:default
match white:default |
match white:default )
match white:default init

I trimmed away a little bit for the sake of space. But I think you get the idea — the profile tells colorwrapper what to pluck out, and where the colors appear.

By default, colorwrapper comes with a huge list of profiles for commonplace Unix-y commands, and if you take a little time you should be able to either edit those to your particular color scheme, or to produce some of your own. For what I’ve seen of colorwrapper, the profile markup is fairly intuitive.

This is something we’ve seen before, in programs like rainbow. Compare what rainbow does with ping, to what colorwrapper does:


Not everything is green and cyan with colorwrapper, by the way. It depends on the tool and the profile. 😉

A few caveats, because there always are some: colorwrapper is pushing toward five years without an apparent update; I mention that out of a sense of obligation and because I know there are some folks who won’t touch a program older than a few weeks, regardless of how well it works. Whatever. ❓

That does suggest though that some profiles (I didn’t check them all) might not be current with what the tool can do. The author talks about colorizing top, for example, and as we all know, top has its own built-in color scheme system. We all know that, right? RIGHT?! 👿

Second, I am sure you noticed that there was a slight difference between the output of straight pstree and colorwrapper’s version of pstree. You might want to step through the profiles you like best, to make sure the output you prefer is what appears with colorwrapper.

Also, if you want to give colorwrapper a test run before committing, notice that the make command has an option for localinstall which will drop everything into $HOME/.cw … which makes it easier to get rid of, when you realize K.Mandla is stark raving mad. 🙄

I’ll leave the rest for you to discover. The home page for colorwrapper is particularly sparse, but the bundled documentation is quite good. Watch for headers and footers too, which throw an added color element into the equation. Have fun! 🙂

xidel: Taking away the pain of XML

I avoid XML like the plague. I am not a programmer, so configuration files and software that use XML are anathema to me. And where I have to use it, like in Openbox’s rc.xml and menu.xml files, I look for just about any way out of it.

xidel describes itself as a tool that will “download and extract data from HTML/XML pages.” The home page supplies quite a few examples of that.

[Screenshots: xidel, two examples]

Yes, xidel can retrieve web pages, and yes, xidel can extract the data that’s embedded in them, so you don’t have to pick through it to find what you need.

But it can also sift through configuration files and pull out, for example, the programs executed in an Openbox menu.xml file.


For someone like me, who considers XML to be cruel and unusual punishment, that is a very nifty trick. The next time I need to switch window managers and want to convert the list of keybindings I know, xidel will be there to expedite it.

At this point you might ask, “What’s the benefit of this over an HTML stripper, perhaps like dehtml?”

Mostly in its flexibility, I would answer. dehtml yanks the core text out of an HTML page, but xidel allows you to filter or search through a file, and control the output.

I’m definitely no expert, but it only took me about 20 minutes with a few examples to get xidel working how I wanted. If you need to wrangle XML pages on a regular basis (and I feel bad for you if you do), I’m sure you can get xidel to work on your project in a matter of minutes.

Spend a little time with the parser documentation, and you’ll see how you can send extracted data into variables, loop through documents for specific tags, and otherwise make your life sooo much easier.

I like it when a program makes my life easier. 😀

rainbow: Color filtering in a different way

I’ve seen quite a few text colorizers in the past year or so, the most recent being pygments, but even as far back as highlighter or colout.

rainbow seems to work a little differently from those.


rainbow has quite a few presets to correspond to common commands and applications. As you can see, you can tack on the command to rainbow, and the results are “converted” from dull black and white to something a bit more … rainbow-y. 😀

You can make your own configurations to match the applications you use most, and voila! Your boring old text app is now in vibrant living color.

This sounds like my kind of program. :mrgreen:

Unfortunately (there had to be an “unfortunately” coming, didn’t there?), it seems some of the best effects are intended for terminal emulators. Some effects were lost or mangled at the framebuffer, and just didn’t line up quite right.

And there seems to be a slight discrepancy between what rainbow can accept as a command, and what it reads as flags for its own interpretation. In other words, sometimes flags for the target application were being seized by rainbow, which triggered errors. I’m still working on ways around that.

And while I appreciate the length and breadth of available sample configurations, I have to wonder if it’s really necessary to have some of them.


top, when injected into rainbow’s sample config, isn’t nearly as impressive as top on its own.

So it’s a bit caveat emptor: For some things, it’s sheer genius. For others, not so much.

In the meantime, I intend to work out a configuration for nethack. The world needs a colorized version of nethack. 🙄