Tag Archives: html

rfc and httpdoc: Two terminal references

I have a couple of simple but related tools today, both from the same author. At left is rfc, and at right is httpdoc.

[Screenshots: rfc (left) and httpdoc (right)]

I’ve known about rfc for a while, but got a reminder about httpdoc earlier this week via e-mail. Since they both have the same style and same creator, it makes sense to lump them together.

rfc, when supplied with a number or a topic line, will pull the text of that RFC from the web and dump it into your $PAGER. No fancy formatting, no color-coded document histories, just one-shot quick access to RFCs all the way back to … well, back to number 1.

The home page has a three-step process for “installing” rfc into your $HOME directory, although I daresay it could be rearranged so that more than one person could use it. In any case, it takes very little effort and rfc itself won’t bog down your system, seeing as it’s just a bash creation.

As an added bonus, rfc will keep its documents stored locally, so you don’t have to re-download a request. If you rely on rfc frequently, you’ll probably be interested in some of the built-in actions — like update or list, which give rfc a little more oomph, and search, which … well, you should be able to figure that one out. 🙄
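For the curious, here is roughly how that keep-a-local-copy trick might look. This is a minimal sketch in Python, not rfc's actual bash code: the URL pattern is the public RFC Editor layout (rfc itself may fetch from somewhere else), and the names `get_rfc`, `download_rfc` and the cache directory are all my own inventions for illustration.

```python
from pathlib import Path
from urllib.request import urlopen

# Assumption: the RFC Editor's plain-text URL layout. The rfc script may use another source.
RFC_URL = "https://www.rfc-editor.org/rfc/rfc{number}.txt"


def download_rfc(number):
    """Fetch the plain-text RFC over the network."""
    with urlopen(RFC_URL.format(number=number)) as response:
        return response.read().decode("utf-8", errors="replace")


def get_rfc(number, cache_dir, fetch=download_rfc):
    """Return the RFC text, downloading only when no local copy exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"rfc{number}.txt"
    if cached.exists():
        # A previous request already saved this document; skip the network.
        return cached.read_text(encoding="utf-8")
    text = fetch(number)
    cached.write_text(text, encoding="utf-8")
    return text
```

The `fetch` argument is only there so the download step can be swapped out, which makes the caching logic easy to try without a network connection.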

httpdoc is similar, in a way. As you can see above, httpdoc becomes an offline reference tool for HTTP documentation. In the screenshot above, I only showed the 404 status code, but httpdoc can also return documentation on header fields, if you need that.
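If you just want the flavor of an offline status-code lookup, Python's standard library carries a small table of its own. This is not httpdoc and has nothing like its depth (no header fields, for one thing); it is only a sketch of the idea, and `describe_status` is a name I made up.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a one-line summary for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The standard library table does not know this code.
        return f"{code}: unknown status code"
    return f"{status.value} {status.phrase}: {status.description}"


print(describe_status(404))
```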

I can see where httpdoc is still being updated even in the past few days, so I expect there will be more references to come.

httpdoc is written in Go, so you’ll need that installed before it will play along. There are also some environment variables that you’ll want to adjust before using it, but it’s nothing complicated.

Both of these tools might strike you as too simple to be noteworthy, but that will depend a lot on your perspective. I use things like dict on a daily basis, and even have it hot-wired for thesaurus entries as part of my .bashrc.

If you have a similar need for RFC or HTTP documentation at the command line, then you might find both of these install-worthy. Necessity is the mother of invention. Or is it the other way around … ? 😉

html2wikipedia: Converting back and forth

A long time ago I mentioned wikipedia2text, and not long after we ran past wikicurses as an alternative. In both of those cases, the goal was to show Wikipedia pages in the console, without so much congealed dreck. wikicurses in particular seemed like a good option.

But considering that much of Wikipedia is put together in a markdown-ish fashion, wouldn’t it make sense to have some sort of conversion between HTML and Wikipedia format? You could conceivably take a dull .html file and send it straight through, coded and set.

Never fear, true believer.


html2wikipedia is a free-ranging program that does very much that. In this case, I grabbed kernel.org, pumped it through html2wikipedia, and got something very close to markdown.
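To give a sense of what that kind of conversion involves, here is a toy version in Python. It is emphatically not how html2wikipedia works inside (I haven't read its source); it only maps a handful of tags onto MediaWiki-style markup, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class WikiMarkupConverter(HTMLParser):
    """Translate a handful of HTML tags into MediaWiki-style markup."""

    HEADINGS = {"h1": "=", "h2": "==", "h3": "==="}
    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            # External links are written as [url label] in wiki markup.
            href = dict(attrs).get("href") or ""
            self.parts.append(f"[{href} ")
        elif tag == "li":
            self.parts.append("\n* ")
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.parts.append(" " + self.HEADINGS[tag] + "\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.parts.append("]")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_wiki(html):
    """Convert an HTML string and return the wiki-style text."""
    converter = WikiMarkupConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()
```

Anything outside that short tag list (tables, images, nested lists) simply falls through as plain text, which is a fair hint at why the real thing is harder than it looks.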

I should mention that it’s not perfect; I wouldn’t blithely slap the results of html2wikipedia straight into a Wikipedia page, mostly because I think the formatting would be off kilter here or there.

But at first glance, it’s certainly in a workable state. The author suggests it should work in Windows too, so if you’re an avid Wiki-gnome (I am not), this might save you time and work in the future.

I don’t see html2wikipedia in either Arch or Debian, but I don’t take the time to go through every distro out there. 😯 Whether it is or isn’t, this is one of those times where it might be quicker and easier to download the source code and build it manually than download all the other packaging materials that accompany a 59KB executable. 🙄

unhtml: Peeling away the layers

Last week I ran across three gold-star-winner programs in a matter of days; this week I seem to have run aground on one-shot command-line tools, clients or application frameworks.

No matter. It takes all kinds. Here’s unhtml, and you can probably guess as to its goal.


unhtml is one of probably two or three (or four or five …) html-strippers that I’ve seen since the start of this silly little site, and while it’s not the most elegant or flexible, it might be the oldest.

The man page for unhtml has a date of 1998, and if that’s its inception date, then it has done well to survive this long.

Of course, it probably has Debian to thank for that — I looked briefly for an original home page, but didn’t find anything that satisfied me. And considering that the AUR page for unhtml simply pulls the source code from Debian, it might be that it’s only around now because of the way Debian preserves code.

I can’t speak very highly of unhtml no matter its age; its only flag is a call for the version, and even the man page is exceedingly terse. You have the flexibility to pipe code through unhtml or to aim it at a file, but that’s about it.
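Stripping tags is a small enough job that you can sketch it in a few lines. The Python below is my own stand-in, not a port of unhtml: it keeps the text, throws away the tags, and also drops the contents of script and style blocks, which is my assumption about what a reader would want rather than something unhtml's man page promises.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text content while discarding tags, scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def strip_html(html):
    """Return the text of an HTML string with all markup removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.chunks)
```

Feeding it `sys.stdin.read()` or the contents of a file gets you the same pipe-or-file flexibility, with about as many options.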

All the same, I think it does the job, and with the exception of a few oddball tags like you see above, it did what it promised. Given the option, I might rely on something else though. Personally, that is. :\

pup: Playing fetch with HTML

Every month I export the posts from this site, grind away at the XML file, pluck out titles and links, and rearrange them to form an index page. Don’t say thank you; I do it for me as much as anyone else. I can’t remember everything I’ve covered in the past two years, and that index has saved me more than once. :\

Point being, it takes a small measure of grep, plus some rather tedious vim footwork to get everything arranged in the proper order and working.

You know what would be nice? If some tool could skim through that XML file, extract just the link and title fields, and prettify them to make my task a bit easier.

pup can do that.


Oh, that is so wonderful. … 🙄

In that very rudimentary example, pup took the file, the field I wanted, and sifted through for all the matching tags before dumping it into the index file.
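For comparison, here is the same chore sketched in Python with nothing but the standard library. It assumes the export is RSS-shaped, with each post in an `<item>` element carrying plain `<title>` and `<link>` children; the function names are mine, and this is not what pup does internally.

```python
import xml.etree.ElementTree as ET
from html import escape


def extract_index(xml_text):
    """Return (title, link) pairs for every <item>, sorted by title."""
    root = ET.fromstring(xml_text)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:
            pairs.append((title, link))
    return sorted(pairs, key=lambda pair: pair[0].lower())


def to_html_list(pairs):
    """Render the pairs as a simple HTML list for an index page."""
    rows = [
        f'<li><a href="{escape(link, quote=True)}">{escape(title)}</a></li>'
        for title, link in pairs
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

One wrinkle: `ET.fromstring` wants bytes rather than a string if the file starts with an XML declaration naming an encoding, so read a real export in binary mode.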

pup will also colorize and format HTML for the sake of easy viewing, and the effect is again, oh-so wonderful.


That might remind you of tidyhtml, the savior of sloppy HTML coders everywhere, and you could conceivably use it that way. pup can do a lot more than that, though.

You can parse for multiple tags with pup, filter out specific IDs nestled in <span> tags, print from selected nodes and pluck out selectors. And a lot more that I don’t quite understand fully. 😳
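The "specific IDs nestled in span tags" part is the one I can at least picture, so here is a rough Python sketch of it. It only understands one kind of query (a span with a given id), whereas pup takes proper selectors; `SpanById` and `text_of_span` are hypothetical names.

```python
from html.parser import HTMLParser


class SpanById(HTMLParser):
    """Gather the text inside every <span> whose id matches."""

    def __init__(self, wanted_id):
        super().__init__(convert_charrefs=True)
        self.wanted_id = wanted_id
        self.depth = 0  # greater than zero while inside a matching span
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # A span nested inside the match; count it so we know when to stop.
            self.depth += 1
        elif dict(attrs).get("id") == self.wanted_id:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def text_of_span(html, wanted_id):
    """Return a list with the text of each matching span."""
    finder = SpanById(wanted_id)
    finder.feed(html)
    finder.close()
    return finder.matches
```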

It is possible that you could do some of what pup does with a crafty combination of things like sed or grep. Then again, pup seems confident in its HTML expertise, and the way it is designed is easy to figure out.

And for those of you who won’t deal with software more than a few months old, I can see that at the time of this writing, pup had been updated within the week. So it’s quite fresh. Try pup without fear of poisoning your system with year-old programs. 😉

csstidy: Tidy and neat

I like tidy tools. I like tools that take my mess of HTML and turn it into the stuff of legend. No doubt if I were a programmer I’d think a similar tool for perl was as cool as sliced bread. csstidy presses that same button, and makes me wish I had a reason to use it.


As you can see there, csstidy lopped off the start of my ugly-as-sin HTML file, and sent me back a corrected, clean and spaced version, ready for editing or to be injected back into the file. It even went so far as to make some small improvements.

csstidy has a few options, which will reveal themselves to you if you invoke csstidy without a target. (Don’t try -h or --help.) Most of them are more than I would ever need to dress up my lowly web pages, but there might be something there that enthuses you.
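If you only want the clean-and-spaced part, the core idea fits in a short Python sketch. This is nowhere near csstidy's real parser: it assumes flat rules with no nesting (so no @media blocks) and no semicolons or braces hiding inside strings or URLs, and `tidy_css` is a name I picked for the example.

```python
import re


def tidy_css(css):
    """Reformat flat CSS rules with one declaration per line."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
    rules = []
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        selector = " ".join(selector.split())
        lines = []
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # skip empty fragments such as trailing whitespace
            prop, value = declaration.split(":", 1)
            prop = prop.strip().lower()
            value = " ".join(value.split())
            lines.append(f"    {prop}: {value};")
        rules.append(selector + " {\n" + "\n".join(lines) + "\n}")
    return "\n\n".join(rules)
```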

Short of that, there’s not a lot for me to say about csstidy. I was scolded a long time ago for not coding like a girl, and I know I shouldn’t rely on tools like this if I ever want to be a rock-and-roll-web-page-designer, but hey … it works clean and neat for me. 😀

html2ps: You thought it wasn’t possible

I made a big deal the other day about converting csv files to Excel files, and from there possibly even generating HTML files as a result. Here’s one that will reverse directions on you again, converting an HTML page into PostScript. And of course, it’s called html2ps.


Nothing much to show, of course, but the end result is undeniable. And of course it’s only one step from there to …


Acrobat Reader?! 😯 Yuck! 😡

Sorry, I do that just because I know I’ll get an e-mail from a certain reader, reminding me that ‘Reader is evil, and there are dozens of free alternatives. It amuses me.

The point of this little soliloquy is to mention that you can dump HTML pages into PostScript files, and maintain most of the information, if not all the style. html2ps does it fairly effortlessly, and fairly quietly. Meaning your csv files can now be converted to Excel, then to HTML, and from there to PostScript.

And of course, from PostScript it’s just a quick hop to PDF format, or whatever your heart desires. Why do that? Well … because you can! :mrgreen:
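Here is that two-hop trip sketched as a small Python wrapper. It assumes html2ps accepts `-o` for its output file and that Ghostscript's `ps2pdf` is installed for the second hop; the file naming and the function names are my own choices, so treat it as a starting point rather than a recipe.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(html_file):
    """Build the html2ps and ps2pdf command lines for one HTML file."""
    source = Path(html_file)
    ps_file = source.with_suffix(".ps")
    pdf_file = source.with_suffix(".pdf")
    return [
        ["html2ps", "-o", str(ps_file), str(source)],
        ["ps2pdf", str(ps_file), str(pdf_file)],
    ]


def convert(html_file):
    """Run both steps in order, stopping at the first failure."""
    for command in conversion_commands(html_file):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} is not installed")
        subprocess.run(command, check=True)
```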

As a console application, html2ps doesn’t rank very high, since it has little to offer in the way of interaction except through option flags and error messages. I don’t hold any ill-will for that, but I will repeat my preference for applications that use space and color and some sort of interface.

I guess programs are like people. Some don’t care to say too much, while others will take up as much as you’ll give them. 😉

wkhtmltopdf and wkhtmltoimage: The clever twins

I was very much torn on whether or not to include wkhtmltopdf and wkhtmltoimage here, or just throw them into the leftovers at the end of the W section.

As I understand it, these twins use WebKit to convert HTML documents into either PDF format, or to generate an image. It’s a nice idea, and works great.

kmandla@6m47421: ~/downloads$ wget https://inconsolation.wordpress.com
--2014-06-29 08:18:55--  https://inconsolation.wordpress.com/
Resolving inconsolation.wordpress.com (inconsolation.wordpress.com)...,,, ...
Connecting to inconsolation.wordpress.com (inconsolation.wordpress.com)||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

    [                                                                  ] 65,344       354KB/s   in 0.2s   

2014-06-29 08:18:56 (354 KB/s) - ‘index.html’ saved [65344]

kmandla@6m47421: ~/downloads$ wkhtmltopdf index.html index.html.pdf
libpng warning: iCCP: Not recognizing known sRGB profile that has been edited
Loading page (1/2)
Printing pages (2/2)                                               
Exit with code 1 due to network error: ProtocolInvalidOperationError

My hesitation comes in the fact that, as best I can tell, neither program can run without qtwebkit or X (although the Arch package page suggests Xvfb will work). So I’m fudging the definition of “text-based program” again, for a tool that doesn’t show much “text-based” to start with.

But this wouldn’t be the first time. We endured a barrage of pdf-conversion programs back in February, and some of those are no more graphical than these two. So I guess it’s okay.

What’s the output look like? Not bad at all.


PDFs are about the same quality, but of course split into pages and probably bigger.

Both wkhtmltopdf and its sister have an immense number of flags and options that allow an impressive level of control over how the pages are rendered and the quality you achieve. So don’t dismiss either as a fire-and-forget conversion tool; the opportunity is there to fine-tune the experience.

Just remember that you will more than likely incur the need for X if you want to try either one. :\
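If you end up scripting either twin, a little helper for assembling the command line keeps the flags manageable. The Python below is only a sketch: the option names in the usage note (`--page-size`, `--orientation`) are ones I believe wkhtmltopdf accepts, the `xvfb-run -a` prefix is my guess at the simplest way to follow the Xvfb suggestion, and `wk_command` is a made-up name.

```python
import subprocess

TOOLS = ("wkhtmltopdf", "wkhtmltoimage")


def wk_command(tool, source, target, options=None, headless=False):
    """Assemble a command line for wkhtmltopdf or wkhtmltoimage."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    command = [tool]
    for name, value in (options or {}).items():
        command.append(f"--{name}")
        if value is not None:  # None marks a flag that takes no argument
            command.append(str(value))
    command += [source, target]
    if headless:
        # Assumption: xvfb-run supplies a throwaway X display on a machine without X.
        command = ["xvfb-run", "-a"] + command
    return command


def run(command):
    """Execute the assembled command, raising if the tool reports failure."""
    subprocess.run(command, check=True)
```

Something like `wk_command("wkhtmltopdf", "index.html", "index.html.pdf", {"page-size": "A4", "orientation": "Landscape"}, headless=True)` would then build the whole line in one go. Note that `check=True` will raise on a nonzero exit code like the one in my session above, even if a usable file was written.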