Tag Archives: wiki

html2wikipedia: Converting back and forth

A long time ago I mentioned wikipedia2text, and not long after we ran past wikicurses as an alternative. In both of those cases, the goal was to show Wikipedia pages in the console, without so much congealed dreck. wikicurses in particular seemed like a good option.

But considering that much of Wikipedia is put together in a markdown-ish fashion, wouldn’t it make sense to have some sort of conversion between HTML and Wikipedia format? You could conceivably take a dull .html file and send it straight through, coded and set.

Never fear, true believer.

[Screenshot: html2wikipedia]

html2wikipedia is a free-ranging program that does very much that. In this case, I grabbed kernel.org, pumped it through html2wikipedia, and got something very close to markdown.

I should mention that it’s not perfect; I wouldn’t blithely slap the results of html2wikipedia straight into a Wikipedia page, mostly because I think the formatting would be off kilter here or there.

But at first glance, it’s certainly in a workable state. The author suggests it should work in Windows too, so if you’re an avid Wiki-gnome (I am not), this might save you time and work in the future.
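To give a sense of what that kind of conversion involves, here is a minimal sketch in Python. It is not html2wikipedia's code and only handles a handful of tags (headings, bold, italics, links, list items); the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class WikiMarkupSketch(HTMLParser):
    """Toy converter: headings, bold, italics, links and list items only."""

    # Wiki markup emitted when each supported tag opens and closes.
    WRAP = {
        "h1": ("= ", " =\n"),
        "h2": ("== ", " ==\n"),
        "h3": ("=== ", " ===\n"),
        "b": ("'''", "'''"),
        "strong": ("'''", "'''"),
        "i": ("''", "''"),
        "em": ("''", "''"),
        "li": ("* ", "\n"),
        "p": ("", "\n\n"),
    }

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # External links in wiki markup look like [url label].
            href = dict(attrs).get("href") or ""
            self.out.append("[" + href + " ")
        elif tag in self.WRAP:
            self.out.append(self.WRAP[tag][0])

    def handle_endtag(self, tag):
        if tag == "a":
            self.out.append("]")
        elif tag in self.WRAP:
            self.out.append(self.WRAP[tag][1])

    def handle_data(self, data):
        self.out.append(data)


def html_to_wiki(html):
    """Convert a small HTML fragment to wiki markup (unknown tags are dropped)."""
    parser = WikiMarkupSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(html_to_wiki('<h2>Kernels</h2><p>The <b>latest</b> is at '
                   '<a href="https://kernel.org">kernel.org</a>.</p>'))
```

Anything the sketch doesn't recognize (tables, images, nested lists) falls through as bare text, which is about where the "off kilter" formatting would come from in a real page.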

I don’t see html2wikipedia in either Arch or Debian, but I don’t take the time to go through every distro out there. 😯 Whether it is or isn’t, this is one of those times where it might be quicker and easier to download the source code and build it manually than download all the other packaging materials that accompany a 59KB executable. 🙄

wiki-stream: Less than six degrees of separation

I didn’t intend for there to be two Wikipedia-ish tools on the same day, but one good wiki-related utility deserves another. Or in this case, deserves a gimmick.

Josh Hartigan’s wiki-stream (executable as wikistream) tells you what you probably already know about Wikipedia: that the longer you spend daydreaming on the site, the more likely you are to find yourself traveling to oddball locations.

[Screenshot: wiki-stream]

You might not think it possible to travel from “Linux” to “physiology” in such a brief adventure, but apparently there are some tangential relationships that will lead you there.

I don’t think Josh would mind if I said out loud that wiki-stream has no real function other than to show the links that link between links, and how they spread out over the web of knowledge. Best I can tell, it takes no flags, doesn’t have much in the way of error trapping, and can blunder into logical circles at times.

But it’s kind of fun to watch.
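The circling problem is easy enough to picture. Below is a minimal sketch in Python of a link walk that remembers where it has been; it is not how wiki-stream chooses its links (I can't say how it does), and the link graph is invented purely for illustration.

```python
def walk_links(graph, start, max_hops=10):
    """Follow the first unvisited link from each page until stuck or out of hops."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        # Skip links already visited so the walk cannot loop back on itself.
        next_page = next((p for p in graph.get(current, []) if p not in seen), None)
        if next_page is None:
            break
        path.append(next_page)
        seen.add(next_page)
        current = next_page
    return path


# Hypothetical link graph, invented for illustration only.
LINKS = {
    "Linux": ["Unix", "Kernel"],
    "Unix": ["Linux", "Bell Labs"],
    "Bell Labs": ["Unix", "Physics"],
    "Physics": ["Physiology"],
}

print(" -> ".join(walk_links(LINKS, "Linux")))
# Linux -> Unix -> Bell Labs -> Physics -> Physiology
```

Without the `seen` set, the walk above would bounce between "Linux" and "Unix" until it ran out of hops.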

wiki-stream is in neither Arch nor AUR nor Debian, most likely because it’s only about a month old. You can install it with npm, which might be slightly bewildering since the Arch version placed a symlink to the executable at ~/node_modules/.bin. I’m sure you can correct that if you know much about nodejs.

Now the trick is to somehow jam wiki-stream into wikicurses, and create the ultimate text-based toy for time-wasting. … :\

wikicurses: Information, in brief

If you remember back to wikipedia2text from a couple of months ago, you might have seen where ids1024 left a note about wikicurses, which intends to do something similar.

[Screenshot: wikicurses, Linux page]

Ordinarily I use most as a $PAGER and it might look like most is working there, but it’s not. That’s the “bundled” pager, with the title of the Wikipedia page at the top, and the body text formatted down the space of the terminal.

wikicurses has a few features that I like in particular. Color, of course, and the screen layout are good. I like that the title of the page is placed at the topmost point, and in a fixed position. Score points for all that.

Further, wikicurses can access (to the best of my knowledge) just about any MediaWiki site, and has hotkeys to show a table of contents, or to bookmark pages. Most navigation is vi-style, but you can use arrow keys and page up/down rather than the HJKL-etc. keys.

Pressing “o” gives you a popup search box, and pressing tab while in that search box will complete a term — which is a very nice touch. There are a few other commands, accessible mostly through :+term formats, much like you’d see in vi. Press “q” to exit.

From the command line you can feed wikicurses a search term or a link. You can also jump straight to a particular feed — like Picture of the Day or whatever the site offers. If you hit a disambiguation page, you have the option to select a target and move to that page, sort of like you see here.

[Screenshot: wikicurses, disambiguation page]

That’s a very nice way to solve the issue.
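Since wikicurses works against MediaWiki sites in general, it may help to see what asking one for a page's plain text can look like. This is a sketch in Python that only builds the request URL; it assumes the site has the TextExtracts extension installed (Wikipedia does), and it is not a claim about how wikicurses itself fetches pages. The function name is hypothetical.

```python
from urllib.parse import urlencode


def extract_url(api_base, title):
    """Build a MediaWiki API query URL asking for a page's plain-text extract."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the real page
        "format": "json",
        "titles": title,
    }
    return api_base + "?" + urlencode(params)


print(extract_url("https://en.wikipedia.org/w/api.php", "Linux kernel"))
```

Swapping the first argument for another wiki's api.php address is all it should take to point the same request at a different MediaWiki site, provided that site offers the same extension.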

There are a couple of things that wikicurses might seem to lack. First, short of re-searching a term, there’s no real way to navigate forward or back through pages. Perhaps that is by design, since adding that might make wikicurses more of an Internet browser than just a data-access tool.

It does make things a little clumsy, particularly if you’ve “navigated” to the wrong page and just want to work back to correct your mistake.

In the same way, pulling a page from Wikipedia and displaying it in wikicurses removes any links that were otherwise available. So if you’re tracking family histories or tracing the relationships between evil corporate entities, you’ll have to search, read, then search again, then read again, then search again, then. …

But again, if you’re after a tool to navigate the site, you should probably look into something different. As best I can tell, wikicurses is intended as a one-shot page reader, and not a full-fledged browser, so limiting its scope might be the best idea.

There are a couple of other minor points I would suggest. wikicurses might offer the option to use your $PAGER, rather than its built-in format. I say that mostly because there are minor fillips that a pager might offer — like, for example, page counts or text searching — that wikicurses doesn’t approach.

But wikicurses is a definite step up from wikipedia2text. And since wikicurses seems to know its focus and wisely doesn’t step too far beyond it, it’s worth keeping around for one-shot searches or for specialized wikis that don’t warrant full-scale browser searches. Or for times like nowadays, when half of Wikipedia’s display is commandeered by a plea for contributions. … 🙄 😡

wikipedia2text: Looking well-preserved, thanks to Debian

Conversion scripts are always good tools to know about, even if I don’t need them frequently enough to keep them installed. wikipedia2text is one that, in spite of its age, still seems sharp.

[Screenshot: wikipedia2text]

Technically, the script’s name was just “wiki,” and technically the source link listed on the home page is dead. Late in 2005 though, it made its way into Debian, and is still in a source tarball there. So it seems that it is possible to achieve immortality — all you need to do is somehow find your way into Debian. 😉

The script works fine outside of Debian; just decompress it and go. You’ll need to install perl-uri if you’re using Arch. But if you’re in something Debian-ish, it should pull in liburi-perl as a dependency when you install it.

One thing that’s not mentioned outright in the blog post but does appear in the help flag: wikipedia2text will need one of about a half-dozen text-based browsers, to do the actual fetching of the page. I used lynx because … well, just because. Which leads me to this second screenshot.

[Screenshot: lynx]

At this point I’m wondering if wikipedia2text is an improvement over what a text-based browser can show. After all, lynx is showing multiple colors, uses the full terminal width, and I have the option of following links.

What’s more, wikipedia2text — strangely — offers a flag to display its results in a browser, and in my case it was possible to send the output back into lynx. So if you’re keeping track, I ran a script that called a browser to retrieve a page, then rerouted that page back into the browser for my perusal. 😕 :\

In the absence of any other instruction, wikipedia2text will default to your $PAGER, which I like because mine is set to most, and I prefer that over almost anything else. Perhaps oddly though, if I ask specifically for pager output, wikipedia2text will arbitrarily commandeer less with no option to change that. Without any instruction for a pager, the output is $PAGER. But with the instruction it jumps to less? That’s also a little confusing. …
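For comparison, the behavior I would have expected is simple to write down. This is a sketch in Python, not wikipedia2text's logic: an explicit choice wins, then $PAGER, then less as a last resort. The `forced` parameter is a hypothetical stand-in for whatever a command-line flag might supply.

```python
import os
import shlex


def pager_command(env=None, forced=None, fallback="less"):
    """Return the pager argv: explicit choice first, then $PAGER, then a fallback."""
    env = os.environ if env is None else env
    choice = forced or env.get("PAGER") or fallback
    # shlex.split lets $PAGER carry its own flags, e.g. "less -R".
    return shlex.split(choice)


print(pager_command(env={"PAGER": "most"}))  # ['most']
print(pager_command(env={}))                 # ['less']
```

With that ordering, asking for pager output and not asking for it would land on the same program unless you said otherwise.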

Furthermore, I couldn’t get the options for color output to work. And I don’t see a flag or an option to expand the text width beyond what you see in the screenshot, which I believe to be around 80 columns. That alone is almost a dealbreaker for me.

I suppose if I were just looking for a pure text extraction of a page, wikipedia2text has a niche. And it’s definitely worth mentioning that wikipedia2text has a text filtering option with color, which makes for a grep-like effect.
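That grep-like effect is not hard to imitate. Here is a minimal sketch in Python, assuming a terminal that understands ANSI color codes; it illustrates the general technique and is not taken from the wikipedia2text script.

```python
import re

RED = "\033[1;31m"   # bold red, the color grep commonly uses for matches
RESET = "\033[0m"


def highlight(lines, pattern):
    """Keep only lines matching pattern, with each match wrapped in ANSI color."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if regex.search(line):
            yield regex.sub(lambda m: RED + m.group(0) + RESET, line)


text = ["Linux is a kernel.", "Tux is a penguin.", "The kernel is monolithic."]
for line in highlight(text, "kernel"):
    print(line)
```

Only the first and third lines are printed, with "kernel" picked out in red.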

So all in all, wikipedia2text may have a slim focus that you find useful. I might pass it by as an artifact from almost 10 years ago — mostly on the grounds that it has some odd default behavior, and I fail to see a benefit of using this over lynx (or another text-based browser) by itself. 😐

cliwiki: Needing attention

I try to be as honest as possible when I look over software. If it seems like I’m too-often enthusiastic about programs, it’s probably because the ones that were less than gratifying were cast aside out of frustration.

I’ll list cliwiki though, since I sense it has a little potential, and a tool that pulls pages from Wikipedia is worth pursuing.

[Screenshot: cliwiki]

I don’t recall where I got the link to cliwiki, but my assessment is that it needs more attention. Here’s why:

  • The few arguments cliwiki claims to support produce nothing. The home page suggests potd, featured and onthisday, but none of those generates any different results.
  • cliwiki requires you answer a prompt to trigger the search. You can’t tack on the topic as a command line argument.
  • cliwiki incorporates no pager, and because of the prompt, it’s exceedingly difficult to page or redirect the output of very long pages.
  • Because of those shortcomings, I’ve tried to funnel cliwiki into text files with the standard > mark or a pipe symbol, but the prompt confounds things. I get half-formed pages or worse, lockups. For what it’s worth I’ve also tried to echo my search topic into a text file and use < and xargs in one fashion or another, with no real success.
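For a program that reads its prompt from standard input, answering it from a script usually looks something like the Python sketch below. The tool here is a hypothetical stand-in that prompts and prints, not cliwiki itself; cliwiki may read its prompt some other way, which could explain the trouble.

```python
import subprocess
import sys

# Stand-in for an interactive tool: prompts, reads one line, prints a "page".
# This is NOT cliwiki, just a hypothetical program with the same habit.
FAKE_TOOL = 'topic = input("Search: "); print("Page for " + topic)'


def run_with_answer(argv, answer):
    """Run a prompting program, answer its prompt on stdin, return its stdout."""
    result = subprocess.run(
        argv,
        input=answer + "\n",   # what a person would have typed at the prompt
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


output = run_with_answer([sys.executable, "-c", FAKE_TOOL], "Linux")
print(output)
```

Even when this works, the prompt text itself can end up mixed into the captured output, so a tool that accepts the topic as an argument is still the cleaner arrangement.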

All of this means to me that cliwiki is only half-formed, and partially useful on pages that don’t overflow your terminal window. I daresay on very shallow screens it will be nigh-on useless.

cliwiki does do some things right; I like that it lists links at the end of the page, and adds links to images in the text. But cliwiki still needs a bit more work before it’s functional, either to the degree it promises or to a point of usability.

vimwiki: The reason, in due season

I promised I’d let on to why I’ve relied on vim all these years, and the time has finally come. As luck would have it, the only reason I put up with that cumbersome, unfriendly, cryptic and sadistic text editor is because of vimwiki.

[Screenshot: vimwiki]

I know, I’m a bit of a hypocrite for clinging to a particular text editor for years, all because of one silly plugin that it supports.

I can’t rationalize that, except to say that vimwiki has vastly simplified the task of managing The List — with a thousand program titles, each (supposedly O_o ) with a one-line synopsis and a link to a home page, plus some notes. I couldn’t imagine trying to handle that in a flat text file, or something like hnb. It would not function nearly as cleanly.

vimwiki has evolved over the years I have used it, and I’m comfortable with it in its current rendition. Press enter on a word to convert it to a link, press enter on a link to jump to its page. Press backspace to work your way back through the breadcrumb trail.

Master that — within the convolutions of vim, of course — and you’ve gotten everything that you need to keep hyperlinked text files organized.

vimwiki also builds calendars and tables, exports to different formats and handles some markdown-ish syntax, although it’s incredibly rare that I need those features.

vimwiki will require some settings in your .vimrc that might prove confusing; the conceallevel in particular might make URLs contract and that was irritating for the first few days. Over the years I’ve learned to live with that.

vimwiki is smart enough to carry a few housekeeping features too, though. It’s a simple three-key command to delete or rename a page, which are both crucial functions in my project. And it’s smart enough to riffle through every other page, and correct links therein.
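The rename-and-fix-links chore is the part worth appreciating, so here is a rough idea of what it involves. This is a Python sketch over an in-memory set of pages, assuming vimwiki's default [[page]] and [[page|description]] link forms; vimwiki itself is written in Vimscript and surely does this differently, and the page names are invented for illustration.

```python
import re


def rename_page(pages, old, new):
    """Rename a page and rewrite [[old]] and [[old|text]] links in every page."""
    link = re.compile(r"\[\[" + re.escape(old) + r"(\|[^\]]*)?\]\]")
    renamed = {}
    for name, body in pages.items():
        # Keep any "|description" part of the link untouched.
        fixed = link.sub(lambda m: "[[" + new + (m.group(1) or "") + "]]", body)
        renamed[new if name == old else name] = fixed
    return renamed


# Hypothetical wiki contents, invented for illustration.
wiki = {
    "index": "See [[hnb]] and [[vimwiki|the plugin]].",
    "hnb": "An outliner. Back to [[index]].",
}
print(rename_page(wiki, "hnb", "hnb-outliner"))
```

Doing that by hand across a thousand entries is exactly the sort of job I would rather leave to the plugin.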

I know it’s not much of an endorsement, but vimwiki is probably one of the few note-taking tools that I immediately embraced, as soon as I saw how easily and cleanly it worked. The fact that I was willing to overlook all the fatal eccentricities of vim should be an indicator of how good I think it is.

After years of dedication and service, at last, a well-deserved K.Mandla gold star for vimwiki: ⭐ 😉

One last note: This is the only vim plugin I’ll discuss, mostly because there are literally hundreds out there. Some are good and some are bad, but mostly you know what you need and like. There’s no need for me to traipse through each one. Go on your own little adventure. 😉