httrack: The website copier

I could have used httrack about four months ago, when I wanted to mirror a fairly large website for my offline perusal, and lacked a proper tool. I tried bew and another graphical webcrawler, and even fell back on wget, but nothing was 100 percent successful. I ended up mass-downloading most of what I needed, and it wasn’t a pretty sight.

httrack might have saved me the trouble, and probably would have done a much better job.

2014-11-04-2sjx281-httrack

httrack is more than capable of patiently stepping through the architecture of a website, and bringing you a copy of everything there.

But on top of that, httrack, like a lot of good network-based software, has so many options, it can be a bit bewildering. If you open the --help flag, be prepared. It’s a couple hundred lines long at least.

For example, there are flags to save files in a cache, to skip files that are available locally, four options for logging, flags to create an index, screen for particular types of files (ie., HTML only, etc.), set directions for following directories (only up or only down), disable bandwidth abuse limits, cap the number of links, continue a broken-off mirror attempt, enter an interactive mode, confine the search to a single site, and dozens upon dozens more.

Most of those other ones are far and beyond anything I would ever need, let alone understand. If you know what they mean, you might find them quite useful. And maybe best of all, httrack has about a dozen shortcuts for common flag combinations, meaning you can ask for just --spider, instead of typing out -p0C0I0t.

The first time you use it, I’d recommend just httrack though, since by itself the command steps you through a simple wizard, letting you pick options menu-style. If you’ve never used httrack before, it’s a good introduction, and will finish with the command line needed to recall the same options you set. Very helpful, if you’re like me and you learn by example.šŸ™‚

Once you get the hang of it, try things like httrack http://example.com -W%v2, which will give you a nice fullscreen progress display and prompt you if it finds any eccentricities. Quite useful.

I’m going to go back now and re-mirror the site I mangled back in July, and hope I can get a cleaner, more complete copy.šŸ˜‰

7 thoughts on “httrack: The website copier

  1. Pingback: Links 7/11/2014: War Thunder on GNU/Linux, KaOS ISO 2014.11 | Techrights

  2. Ander

    You need to use setfont to establish a 8×16 font for the tty fist. Also, I don’t think fbterm is compatible.

  3. Pingback: Bonus: 2014 in review | Inconsolation

Comments are closed.