Tag Archives: html

rfc and httpdoc: Two terminal references

I have a couple of simple but related tools today, both from the same author. At left is rfc, and at right is httpdoc.

[Screenshots: rfc (left) and httpdoc (right)]

I’ve known about rfc for a while, but got a reminder about httpdoc earlier this week via e-mail. Since they both have the same style and same creator, it makes sense to lump them together.

rfc, when supplied with a number or a topic line, will pull the text of that RFC from the web and dump it into your $PAGER. No fancy formatting, no color-coded document histories, just one-shot quick access to RFCs all the way back to … well, back to number 1.

The home page has a three-step process for “installing” rfc into your $HOME directory, although I daresay it could be rearranged to let more than one person use it. In any case, it takes very little effort, and rfc itself won’t bog down your system, seeing as it’s just a bash creation.

As an added bonus, rfc will keep its documents stored locally, so you don’t have to re-download a request. If you rely on rfc frequently, you’ll probably be interested in some of the built-in actions — like update or list, which give rfc a little more oomph, and search, which … well, you should be able to figure that one out. 🙄
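To give you a feel for it, this is roughly how a session goes; the RFC number and the search term here are just my own picks, so treat it as a sketch:

rfc 2616        # fetch RFC 2616 from the web (or the local cache) into $PAGER
rfc update      # one of the built-in actions: refresh the local store
rfc list        # another built-in action; the home page explains exactly what it lists
rfc search http # and the self-explanatory one: hunt for a keyword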

httpdoc is similar, in a way: as you can see above, it serves as an offline reference tool for HTTP documentation. In the screenshot I showed only the 404 status code, but httpdoc can also return documentation on header fields, if you need that.
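I won’t swear to the exact syntax, but judging from the screenshot it’s along these lines; the header-field argument is my own guess:

httpdoc 404          # show the documentation for the 404 status code
httpdoc Content-Type # presumably the same trick for a header field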

I can see that httpdoc was updated as recently as a few days ago, so I expect there will be more references to come.

httpdoc is written in Go, so you’ll need that installed before it will play along. There are also some environment variables you’ll want to adjust before using it, but it’s nothing complicated.

Both of these tools might strike you as too simple to be noteworthy, but that will depend a lot on your perspective. I use things like dict on a daily basis, and even have it hot-wired for thesaurus entries as part of my .bashrc.
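For the curious, that hot-wiring is nothing more than a one-liner; the alias name is mine, and the database name depends on what dict -D reports on your system:

alias thes='dict -d moby-thesaurus'   # query only the thesaurus database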

If you have a similar need for RFC or HTTP documentation at the command line, then you might find both of these install-worthy. Necessity is the mother of invention. Or is it the other way around … ? 😉

html2wikipedia: Converting back and forth

A long time ago I mentioned wikipedia2text, and not long after we ran past wikicurses as an alternative. In both of those cases, the goal was to show Wikipedia pages in the console, without so much congealed dreck. wikicurses in particular seemed like a good option.

But considering that much of Wikipedia is put together in a markdown-ish fashion, wouldn’t it make sense to have some sort of conversion between HTML and Wikipedia format? You could conceivably take a dull .html file and send it straight through, coded and set.

Never fear, true believer.

[Screenshot: html2wikipedia]

html2wikipedia is a free-ranging program that does exactly that. In this case, I grabbed kernel.org, pumped it through html2wikipedia, and got something very close to markdown.
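If you want to repeat the trick, it’s only a couple of steps; I’m assuming html2wikipedia takes the file name as an argument and writes to standard output, so check its documentation if your copy disagrees:

wget -O kernel.html https://www.kernel.org/   # grab the page
html2wikipedia kernel.html > kernel.wiki      # convert the HTML to wiki markup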

I should mention that it’s not perfect; I wouldn’t blithely slap the results of html2wikipedia straight into a Wikipedia page, mostly because I think the formatting would be off kilter here or there.

But at first glance, it’s certainly in a workable state. The author suggests it should work in Windows too, so if you’re an avid Wiki-gnome (I am not), this might save you some time and work in the future.

As I mentioned, I don’t see html2wikipedia in either Arch or Debian, but I don’t take the time to go through every distro out there. 😯 Whether it’s packaged or not, this is one of those times when it might be quicker and easier to download the source code and build it manually than to download all the other packaging materials that accompany a 59KB executable. 🙄

unhtml: Peeling away the layers

Last week I ran across three gold-star-winner programs in a matter of days; this week I seem to have run aground on one-shot command-line tools, clients or application frameworks.

No matter. It takes all kinds. Here’s unhtml, and you can probably guess its goal.

[Screenshot: unhtml]

unhtml is one of probably two or three (or four or five …) html-strippers that I’ve seen since the start of this silly little site, and while it’s not the most elegant or flexible, it might be the oldest.

The man page for unhtml has a date of 1998, and if that’s its inception date, then it has done well to survive this long.

Of course, it probably has Debian to thank for that — I looked briefly for an original home page, but didn’t find anything that satisfied me. And considering that the AUR page for unhtml simply pulls the source code from Debian, it might be that it’s only around now because of the way Debian preserves code.

I can’t speak very highly of unhtml, no matter its age; its only flag prints the version, and even the man page is exceedingly terse. You can pipe HTML through unhtml or aim it at a file, but that’s about it.
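Both routes look about like you’d expect; the file name and URL are just examples:

unhtml index.html                      # strip the tags from a file
curl -s https://example.com | unhtml   # or pipe HTML straight through it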

All the same, I think it does the job, and with the exception of a few oddball tags like you see above, it did what it promised. Given the option, I might rely on something else though. Personally, that is. :\

pup: Playing fetch with HTML

Every month I export the posts from this site, grind away at the XML file, pluck out titles and links, and rearrange them to form an index page. Don’t say thank you; I do it for me as much as anyone else. I can’t remember everything I’ve covered in the past two years, and that index has saved me more than once. :\

Point being, it takes a small measure of grep, plus some rather tedious vim footwork to get everything arranged in the proper order and working.

You know what would be nice? If some tool could skim through that XML file, extract just the link and title fields, and prettify them to make my task a bit easier.

pup can do that.

[Screenshot: pup extracting fields from the export]

Oh, that is so wonderful. … 🙄

In that very rudimentary example, pup took the file and the field I wanted, sifted through for all the matching tags, and dumped them into the index file.
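I won’t reproduce my exact command, but the flavor of it is something like this, with a plain HTML file standing in for the export:

pup 'title text{}' < index.html   # keep only the text inside the title tags
pup 'a attr{href}' < index.html   # or pull out just the href of every link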

pup will also colorize and format HTML for the sake of easy viewing, and the effect is, again, oh-so-wonderful.

[Screenshot: pup colorizing and formatting HTML]

That might remind you of tidyhtml, the savior of sloppy HTML coders everywhere, and you could conceivably use it that way. pup can do a lot more than that, though.

You can parse for multiple tags with pup, filter out specific IDs nestled in <span> tags, print from selected nodes and pluck out selectors. And a lot more that I don’t quite understand fully. 😳
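A couple of quick illustrations of those, with a made-up ID and file name, and the usual caveat that pup’s README is the real authority on its selector syntax:

pup 'h1, h2' < page.html                # match more than one kind of tag at once
pup 'span#sidebar text{}' < page.html   # the text inside a span with a specific ID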

It is possible that you could do some of what pup does with a crafty combination of things like sed or grep. Then again, pup seems confident in its HTML expertise, and the way it is designed is easy to figure out.

And for those of you who won’t deal with software more than a few months old, I can see that at the time of this writing, pup had been updated within the week. So it’s quite fresh. Try pup without fear of poisoning your system with year-old programs. 😉

csstidy: Tidy and neat

I like tidy tools. I like tools that take my mess of HTML and turn it into the stuff of legend. No doubt if I were a programmer I’d think a similar tool for Perl was as cool as sliced bread. csstidy presses that same button, and makes me wish I had a reason to use it.

[Screenshot: csstidy]

As you can see there, csstidy lopped off the start of my ugly-as-sin HTML file, and sent me back a corrected, clean and spaced version, ready for editing or to be injected back into the file. It even went so far as to make some small improvements.

csstidy has a few options, which will reveal themselves to you if you invoke csstidy without a target. (Don’t try -h or --help.) Most of them are more than I would ever need to dress up my lowly web pages, but there might be something there that enthuses you.
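For what it’s worth, the basic invocation is about as plain as it gets; as best I can tell it takes an input file and an optional output file:

csstidy ugly.css tidy.css   # read ugly.css, write the cleaned-up version to tidy.css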

Short of that, there’s not a lot for me to say about csstidy. I was scolded a long time ago for not coding like a girl, and I know I shouldn’t rely on tools like this if I ever want to be a rock-and-roll-web-page-designer, but hey … it works clean and neat for me. 😀

html2ps: You thought it wasn’t possible

I made a big deal the other day about converting csv files to Excel files, and from there possibly even generating HTML files as a result. Here’s one that will reverse directions on you again, converting an HTML page into PostScript. And of course, it’s called html2ps.

[Screenshot: the PostScript result of html2ps]

Nothing much to show, of course, but the end result is undeniable. And of course it’s only one step from there to …

[Screenshot: the same result opened in Acrobat Reader]

Acrobat Reader?! 😯 Yuck! 😡

Sorry, I do that just because I know I’ll get an e-mail from a certain reader, reminding me that ’Reader is evil, and there are dozens of free alternatives. It amuses me.

The point of this little soliloquy is to mention that you can dump HTML pages into PostScript files, and maintain most of the information, if not all the style. html2ps does it fairly effortlessly, and fairly quietly. Meaning your csv files can now be converted to Excel, then to HTML, and from there to PostScript.

And of course, from PostScript it’s just a quick hop to PDF format, or whatever your heart desires. Why do that? Well … because you can! :mrgreen:
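The whole trip is only a couple of commands; the file names are mine, and I’m assuming your html2ps writes to standard output by default:

html2ps index.html > index.ps   # HTML to PostScript
ps2pdf index.ps index.pdf       # and PostScript to PDF, courtesy of ghostscript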

As a console application, html2ps doesn’t rank very high, since it has little to offer in the way of interaction except option flags and error messages. I don’t hold any ill will over that, but I will repeat my preference for applications that use space and color and some sort of interface.

I guess programs are like people. Some don’t care to say too much, while others will take up as much as you’ll give them. 😉

wkhtmltopdf and wkhtmltoimage: The clever twins

I was very much torn on whether or not to include wkhtmltopdf and wkhtmltoimage here, or just throw them into the leftovers at the end of the W section.

As I understand it, these twins use WebKit to convert HTML documents into either PDF format, or to generate an image. It’s a nice idea, and works great.

kmandla@6m47421: ~/downloads$ wget https://inconsolation.wordpress.com
--2014-06-29 08:18:55--  https://inconsolation.wordpress.com/
Resolving inconsolation.wordpress.com (inconsolation.wordpress.com)... 66.155.9.238, 192.0.81.250, 192.0.80.250, ...
Connecting to inconsolation.wordpress.com (inconsolation.wordpress.com)|66.155.9.238|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

    [                                                                  ] 65,344       354KB/s   in 0.2s   

2014-06-29 08:18:56 (354 KB/s) - ‘index.html’ saved [65344]

kmandla@6m47421: ~/downloads$ wkhtmltopdf index.html index.html.pdf
libpng warning: iCCP: Not recognizing known sRGB profile that has been edited
Loading page (1/2)
Printing pages (2/2)                                               
Done                                                           
Exit with code 1 due to network error: ProtocolInvalidOperationError

My hesitation comes from the fact that, as best I can tell, neither program can run without qtwebkit or X (although the Arch package page suggests Xvfb will work). So I’m fudging the definition of “text-based program” again, for tools that don’t show much that’s text-based to start with.

But this wouldn’t be the first time. We endured a barrage of pdf-conversion programs back in February, and some of those are no more graphical than these two. So I guess it’s okay.

What’s the output look like? Not bad at all.

[Screenshot: wkhtmltoimage rendering of index.html]

PDFs are about the same quality, but of course split into pages and probably bigger.

Both wkhtmltopdf and its sister have an immense number of flags and options that allow an impressive level of control over how the pages are rendered and the quality you achieve. So don’t dismiss either as a fire-and-forget conversion tool; the opportunity is there to fine-tune the experience.
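Just as a taste, and with the caveat that I’m quoting these flags from memory, so double-check them against --help:

wkhtmltopdf --page-size A4 --grayscale index.html index.pdf   # set the paper size and drop the color
wkhtmltoimage --width 1024 index.html index.png               # render the page at a fixed width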

Just remember that you will more than likely incur the need for X if you want to try either one. :\

tidyhtml: Erasing your coding sins

Woe betide the uninitiated first arriving in the realm of HTML coding. Your work is shoddy, your style is crude, and your mismatched closing font tags betray the awful depths of your ignorance. Return to your hovel, peasant. Meditate on your sins and return when you have abandoned the wickedness of your ways.

Know ye first that this is foul. Slatternly. Slovenly.

[Screenshot: messy HTML, before]

Such petulance will be erased by repeatedly striking a rod across the palms. The HTML elite do not suffer such insolence.

This instead is the first step on the path of enlightenment:

[Screenshot: the same HTML, tidied]

Beauty. Elegance. Symmetry. Balance. Two-space indents. Only years of practice, asceticism and adherence to the principles of coding like a girl can produce such delicate, exquisite HTML.

The cultured elite can produce code of such perfection with ease. They do it daily. It is as natural for them as an eagle soaring on the wind.

For others, there are no shortcuts. Only toil and tedium. No easy path. No tricks or gimmi

[Screenshot: tidyhtml in action]

What devilry is this?! Begone, you monstrosity! There are no short routes to achieving nirvana! Your sorcery will not go unpunished! The Yama kings gnash their teeth in anticipation of your arrival in the afterworld. 👿

Blasphemy!
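(And if you must know, the forbidden shortcut is a single incantation; the flags are the usual ones from tidy’s man page, and the file names are mine, so consult your own copy if it objects.)

tidy -indent -quiet -output clean.html messy.html   # rewrite messy.html as neatly indented clean.html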

mp3report: We’re not even close to finished yet

The beat goes on. Among esoteric and erstwhile intriguing mp3 accessories is mp3report, which — again, as you might have divined — is really pretty cool.

[Screenshot: mp3report]

Fully known as the MP3 Report Generator, this nifty little tool spits out a classy table in HTML, showing all the mp3s within its reach.

All very clever, you say, but I have over 3TB of mixed and moshed audio files, arranged by genre, artist, album, release, edition, quality and embedded image data, through a series of 4000 folders.

Apparently not a problem, since mp3report can recurse through directories, and carries support for ID3 tags in both version 1 and version 2.

All very clever, you say, but I would like to see more detail in a report.

In that case, I would suggest checking out the documentation, which allows you to customize the report results, adding or subtracting as much data as your little heart desires.

All very clever, you say, but it’s … it’s … it’s old.

Yes, well, it’s true, it technically dates back to 2000, but I could find no rough spots, except when I tried to force-feed it a few ogg files. 🙄

But considering it does most of its work in textmode and only outputs to a file, what exactly are you missing in terminal evolution over the last 14 years?

I found mp3report in the Debian-Ubuntu-Mint chain gang, but not in Arch or AUR. If it’s not in your distro, I believe it could just be downloaded and run; I’m hopeless when it comes to Perl, but it looks like it only needs MP3::Info.
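If you do go the download-and-run route, I’d guess at something like the following; the CPAN line is standard fare, but the script name and arguments on the second line are only my assumption, so read its documentation first:

perl -MCPAN -e 'install MP3::Info'        # pull in the one module it seems to need
perl mp3report.pl ~/music > report.html   # hypothetical invocation: scan a directory, write the HTML report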

Next up, believe it or not … more mp3-related tools! 😯 🙄

markdown: A great idea in action

I don’t code much. In fact, I don’t code at all, by most yardsticks.

I have been accused of making web pages in the past though, and while it is sort of fun to flow through the creative process, it is, in my opinion, a somewhat tedious business of matching brackets and hunting down transposed characters.

Which makes me a prime candidate for markdown. I have almost no need for the complications of the latest and greatest Saviour-Of-The-Web Protocol, I hate tracking down teeny tiny mismatches, and I’d much rather be spending my time searching for a baseball bat than searching for a lost bracket.

Our guest, in action:

[Screenshots: markdown in action]

So much nicer, cleaner and prettier. Between this and tidyhtml, my days of sifting through HTML files looking for a stray backslash are over.
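The command itself couldn’t be much simpler; the file names are mine, and markdown prints its HTML to standard output, at least in the versions I’ve tried:

markdown notes.md > notes.html   # convert the marked-up text file into HTML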

Of course, the truth of the matter is that this type (style?) of tool is the de facto standard these days.

WordPress uses something similar when it converts these posts. Wikipedia does much the same thing. I’ve seen other sites offer the same style of conversion.

Can’t say as I blame them. There is a beauty and genius in cleverly and cleanly delivered code. But for those of us who are neither beautiful nor geniuses, there’s markdown. 😉