
httrack: The website copier

I could have used httrack about four months ago, when I wanted to mirror a fairly large website for my offline perusal, and lacked a proper tool. I tried bew and another graphical webcrawler, and even fell back on wget, but nothing was 100 percent successful. I ended up mass-downloading most of what I needed, and it wasn’t a pretty sight.

httrack might have saved me the trouble, and probably would have done a much better job.


httrack is more than capable of patiently stepping through the structure of a website and bringing you a copy of everything there.

But on top of that, httrack, like a lot of good network-based software, has so many options that it can be a bit bewildering. If you run it with the --help flag, be prepared: the output is a couple hundred lines long, at least.

For example, there are flags to save files in a cache, skip files that are available locally, create an index, screen for particular types of files (e.g., HTML only), set the direction for following directories (only up or only down), disable bandwidth abuse limits, cap the number of links, continue a broken-off mirror attempt, enter an interactive mode, confine the search to a single site, and dozens upon dozens more, plus four separate options for logging.

Most of the others are far beyond anything I would ever need, let alone understand. If you know what they mean, you might find them quite useful. And maybe best of all, httrack has about a dozen shortcuts for common flag combinations, meaning you can ask for just --spider instead of typing out -p0C0I0t.

The first time you use it, though, I’d recommend running plain httrack with no arguments, since on its own the command steps you through a simple wizard, letting you pick options menu-style. If you’ve never used httrack before, it’s a good introduction, and it finishes by showing the command line that reproduces the options you set. Very helpful if you’re like me and learn by example. 🙂

Once you get the hang of it, try things like httrack http://example.com -W%v2, which will give you a nice fullscreen progress display and prompt you if it finds any eccentricities. Quite useful.
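If you’d rather script a mirror than retype it, here’s a rough Python sketch that builds the same kind of command line and hands it to httrack. It assumes httrack is on your PATH; the helper names and the -O output directory (httrack’s option for where the mirror goes) are my own additions, not something the wizard produced.

```python
import shlex
import subprocess
from typing import List


def httrack_command(url: str, output_dir: str, spider: bool = False) -> List[str]:
    """Build an httrack argument list (hypothetical helper).

    spider=True uses the --spider shortcut (short for -p0C0I0t);
    otherwise -W%v2 asks for the fullscreen progress display and
    prompts on anything unusual.
    """
    if spider:
        cmd = ["httrack", "--spider", url]
    else:
        cmd = ["httrack", url, "-W%v2"]
    cmd += ["-O", output_dir]  # where the mirror (and logs) end up
    return cmd


def run_httrack(url: str, output_dir: str, spider: bool = False) -> None:
    """Actually run httrack; needs the httrack binary installed."""
    subprocess.run(httrack_command(url, output_dir, spider), check=True)


# Print the equivalent shell command instead of running it:
print(shlex.join(httrack_command("http://example.com", "./example-mirror")))
# prints: httrack http://example.com -W%v2 -O ./example-mirror
```

Building a list instead of one long string keeps odd characters in URLs or paths from being mangled by a shell.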

I’m going to go back now and re-mirror the site I mangled back in July, and hope I can get a cleaner, more complete copy. 😉

reflector: One more for Archers

In the interest of parity, and since there have been a lot of Debian-only posts in the past, here’s reflector — an Arch-only trick.


Mirror management is usually an easy-to-forget, one-time task when building a system, but it might be worth keeping reflector in mind.

I’ve used rankmirrors plenty of times, and if there’s no other available option, it does a fine job. But rankmirrors does expect you to do a little background work (like backing up your mirrorlist and uncommenting the servers you want ranked), and at times it can be a bit time-consuming. All of which is easy to work around, of course.
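To give a feel for that background work, here’s a loose Python sketch of the idea. It is not rankmirrors itself. The Arch wiki has you uncomment every server in a backup copy of the mirrorlist before ranking (its sed line turns #Server into Server), and ranking then means timing each mirror and keeping the fastest. The function names and the measure callback are hypothetical, and the timings below are made up rather than measured.

```python
import re
from typing import Callable, List


def uncomment_servers(mirrorlist: str) -> str:
    """Turn '#Server = ...' lines into 'Server = ...', like the wiki's sed step."""
    return re.sub(r"^#Server", "Server", mirrorlist, flags=re.MULTILINE)


def server_urls(mirrorlist: str) -> List[str]:
    """Collect URLs from active (uncommented) 'Server = URL' lines."""
    urls = []
    for line in mirrorlist.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Server":
            urls.append(value.strip())
    return urls


def rank_mirrors(urls: List[str], measure: Callable[[str], float], keep: int) -> List[str]:
    """Return the `keep` fastest mirrors; `measure` gives seconds per mirror."""
    return sorted(urls, key=measure)[:keep]


# Hypothetical mirrorlist and invented timings, just to show the flow.
backup = (
    "#Server = https://slow.example.org/$repo/os/$arch\n"
    "#Server = https://fast.example.org/$repo/os/$arch\n"
)
fake_seconds = {
    "https://slow.example.org/$repo/os/$arch": 2.5,
    "https://fast.example.org/$repo/os/$arch": 0.4,
}
active = uncomment_servers(backup)
print(rank_mirrors(server_urls(active), fake_seconds.__getitem__, keep=1))
```

In real use, measure would time a download from each mirror, and that is where the waiting comes from.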

reflector, in my humble opinion, has the added bonus of being able to filter mirrors by geographical area, which is great if you’re a world traveler and want to update between stopovers.

Or it might just be that some of the mirrors rankmirrors gave you are sluggish or remote, in which case reflector might have a few better ideas for you.
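As a rough example of the geographic filtering, here’s a Python sketch that builds a reflector call restricted to one or more countries. --country, --latest, --sort and --save are real reflector options, but the particular choices (the 10 most recently synced mirrors, sorted by download rate) are my own picks, and the helper names are hypothetical. Writing to /etc/pacman.d/mirrorlist needs root, so back up the old list first.

```python
import shlex
import subprocess
from typing import List, Sequence


def reflector_command(countries: Sequence[str], latest: int = 10,
                      save: str = "/etc/pacman.d/mirrorlist") -> List[str]:
    """Build a reflector call limited to the given countries."""
    cmd = ["reflector"]
    for country in countries:
        cmd += ["--country", country]  # name or country code, may repeat
    cmd += ["--latest", str(latest), "--sort", "rate", "--save", save]
    return cmd


def refresh_mirrorlist(countries: Sequence[str]) -> None:
    """Run reflector for real; needs reflector installed and root access."""
    subprocess.run(reflector_command(countries), check=True)


# Show the command for a stopover in the UK without running it:
print(shlex.join(reflector_command(["United Kingdom"])))
# prints: reflector --country 'United Kingdom' --latest 10 --sort rate --save /etc/pacman.d/mirrorlist
```

Swapping the country list between stops is the whole trick; the rest of the options can stay the same.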

And of course, the best place to learn about reflector is on the one-and-only Arch wiki, which is only the best source for Linux information in the universe. Regardless of your distro. 😉