
highlight: Converting code for display

Next is highlight (not to be confused with highlighter), which turns source code into formatted text.


It’s not a code converter so much as a formatter, as I understand it. I doubt it’s going to change your old Fortran code to PET BASIC. 🙄

But as you can see, it plucks through C and formats it neatly in HTML, for display on web pages and so forth.

highlight has a lot of other features I just couldn’t show right now. And the home page has a huge list of languages it can interpret and convert.

This is definitely one of those tools that I would have more use for if I did much coding. As it is though, it’s a curiosity, but not something I can really put to use.

You, on the other hand. … 🙂

diffh: Make your diff easier to see

This one is similar to dailystrips, in that it generates an HTML page as its main output. But this time, it’s working in tandem with diff, to make things a little easier on the eyes.

Definitely a picture is worth a thousand words here.


Not that there’s a lot to point out there, but run diff with the -u flag, pipe the result through diffh, and you come up with a visually clear representation of what diff is trying to tell you.
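
For the curious, here is a rough sketch of the same idea in Python, using only the standard library. This is not how diffh is written, just an illustration of taking a unified diff and wrapping each line in colored HTML. The function name `diff_to_html` and the colors are my own inventions.

```python
import difflib
import html


def diff_to_html(old_lines, new_lines, old_name="old", new_name="new"):
    """Render a unified diff of two lists of lines as a small HTML page."""
    # Hypothetical color scheme: green for added, red for removed, blue for hunks.
    colors = {"+": "#ddffdd", "-": "#ffdddd", "@": "#ddddff"}
    rows = []
    diff = difflib.unified_diff(old_lines, new_lines, old_name, new_name, lineterm="")
    for index, line in enumerate(diff):
        if index < 2:
            # The first two lines are the "---" and "+++" file headers.
            color = "#eeeeee"
        else:
            color = colors.get(line[:1], "#ffffff")
        rows.append(
            '<span style="display:block;background:%s">%s</span>'
            % (color, html.escape(line))
        )
    return "<html><body><pre>\n%s\n</pre></body></html>" % "\n".join(rows)
```

Feed it two lists of lines (say, from `open(path).read().splitlines()`), write the returned string to a file, and open that file in a browser.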

I’m not a programmer so maybe there are better, more obvious ways to show diff visually, but I can see where a couple of large files would be easier to understand this way.

And that’s all I can think of to say. 😐

dailystrips: Gasping for air

Just for the record, I don’t expect a program 10 years out of development to sing along without a care in the world.

On the other hand, I do make use of software that pushes possibly as far back as the 1980s, not counting core programs that have been around since the dawn of technology.

Point being, 10 years without attention is not too far gone.

For dailystrips though, it’s not the underlying software that changed, it’s the targets of the software.


I’m being cryptic, and I apologize. See, dailystrips is a great idea — a simple perl script that seeks out the day’s comic strips that you like, downloads the images and lumps them all together on a simple HTML page.

The more you think about it, the more brilliant it is: Rather than wander from site to site loading up all the garbage that comes with those comics, dailystrips peels out the image you actually want to see, and puts it on a vanilla page that loads in seconds. Fractions of seconds.
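
To make that concrete, here is a minimal sketch in Python of the general technique: scan a page for image tags, keep the one whose address matches a pattern, and put it on a bare page. It is not the dailystrips code (which is Perl), and the names `ImageFinder`, `find_strip` and `build_page` are made up for illustration. Fetching the page is left out on purpose.

```python
import html
import re
from html.parser import HTMLParser


class ImageFinder(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_strip(page_html, pattern):
    """Return the first image address matching pattern, or None."""
    finder = ImageFinder()
    finder.feed(page_html)
    for src in finder.sources:
        if re.search(pattern, src):
            return src
    return None


def build_page(strips):
    """Lay out {name: image_url} pairs on one plain HTML page."""
    parts = ["<html><body>"]
    for name, url in strips.items():
        parts.append("<h2>%s</h2>" % html.escape(name))
        parts.append('<img src="%s" alt="%s">' % (html.escape(url), html.escape(name)))
    parts.append("</body></html>")
    return "\n".join(parts)
```

The weak point is the pattern: if a site renames its image folder or reshuffles its markup, `find_strip` simply returns None until someone updates the pattern by hand.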

The problem is, as time has gone on, those sites have either changed or rearranged their content. And like I hinted, dailystrips has gone without updates since (apparently) 2003.

Long and short, for every four or five comics I tried to use with dailystrips, I got one, maybe two that still worked. You can see in the screenshot that two out of four there were working, at best.

It depends on the host and probably the comic too. If you’ve got time on your hands I suppose you could pick through and see which ones don’t work, but the home page brags that dailystrips — in its prime — supported more than 550 comics.

You’d really have your work cut out for you.

Personally I’m a fan of any application or program that does the work of yanking actual content out of the swirling pool of muck that obscures the Internet.

The fact that this one is gasping for air makes the state of affairs all the more … disheartening. 😦

P.S. To get this rolling, you’ll need your distro’s version of perl’s lwp-protocol-https. In Arch, that’s perl-lwp-protocol-https, and in Debian it should be liblwp-protocol-https-perl.

html2text: And then there were three

I’ll go ahead and close off these three applications by showing html2text.


It’s true, there’s not a lot that html2text does that I haven’t already attributed to either dehtml or vilistextum.

One thing I have noticed, after using all three programs in similar situations, is that html2text tends to be kinder to tables.

I don’t know if that’s something I can quantify, but if you give it a spin, you might get different results, depending on the software.

But that’s always the case, isn’t it? 😉

dehtml: Another scraping tool

A long time ago I talked about vilistextum, and in passing noted both dehtml and html2text as alternatives.


Today is dehtml. Tomorrow … well, let’s pretend it’s a surprise. 🙂

I suppose there’s nothing particularly unique in pulling out text from HTML-coded documents.

So it probably shouldn’t surprise you that there are three tools vying for the job.

Choosing one or another will depend on your preference for the way they approach the task, I suppose.

dehtml tends to be my favorite, only because it seems to handle the job cleanly and without too many leftovers.

(For what it’s worth, all three tools tend to leave in some code, depending on how complex the page is.)
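
If you wonder what the core of such a tool looks like, here is a bare-bones sketch in Python using the standard library’s HTML parser. It is not how dehtml, html2text or vilistextum are implemented; the names `TextOnly` and `strip_html` are hypothetical, and real pages will trip it up in ways the real tools handle better.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep text nodes, drop tags, and skip <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs defaults to True, so &amp; arrives as a plain &.
        super().__init__()
        self.chunks = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside a skipped element.
        if not self.skipping and data.strip():
            self.chunks.append(data.strip())


def strip_html(page_html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextOnly()
    parser.feed(page_html)
    parser.close()  # flush any text still buffered at the end of input
    return "\n".join(parser.chunks)
```

Even this much gets rid of most of the markup, which may be why there are several small tools that all do roughly the same job.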

And now, just to be fair, here’s the obligatory ultra-minimalist web browser screenshot.


But I don’t recommend surfing that way. 😉

archmage: Prettifying chm files

File converters ride that fine line between console applications and tools, in my book.

They’re useful of course, and occasionally serve as important tools for other, larger programs.

But as applications … they’re a little slim.

archmage does a decent job recasting chm files as either pdf files or nicely arranged html documents. There’s nothing to see while archmage is at work; the final product looks like this, depending on how much graphical leeway you give it.


I mentioned pdf output, but to be honest, I didn’t see that from archmage. The option is there and it seems to work, but it also needs htmldoc (?), and it ran for an awfully long time with no product.

If it works for you, let me know and I’ll try again.

Beyond that … there isn’t much archmage does. And since there’s not much of an interface to speak of, it’s pretty much a one-shot application.

On the other hand, it does a darned good job prettifying your chm files. 😐

vilistextum: Stripping out the code

Here’s a tool that you might find useful, even if it doesn’t have much in the way of an “interface.”


vilistextum (which doesn’t seem to follow any naming convention 😉 ) does a very clean job of stripping code from html pages. If I understand it correctly, it was intended to complement mutt, the holy grail of text-based mail readers.

On its own though, you can use it to pull the text from web pages, or display only the goods from anything hidden in dense html.

vilistextum isn’t the only tool like this (dehtml is another, and html2text is out there too), and I’m sure there are a few more that do much the same thing. vilistextum seems to have nice clean output though, and will reformat its output to fit the terminal width or to use UTF-8.

And so, with a certain sense of ceremony, I give you a clumsy but effective text-based browser.


curl inconsolation.wordpress.com | vilistextum - - | most

Enjoy. 🙂