Tag Archives: parallel

puf: Waiting for its moment to shine

Right off the bat, let’s make it clear that puf is short for parallel URL fetcher.

Now that that’s out of the way, let’s take a closer look.


What’s happening here is that puf is … well, downloading from a URL, much like some other download tools we’ve seen. 😐

It’s not visible in my screenshot there, but apparently puf’s claim to fame is that it can handle multiple connections to retrieve files.

If you squint at the screenshot, you might make out “Connections” with a current and max column beneath it. I had only one connection working, but as you can also see, puf seems to be prepared to handle as many as 20.

All of which is subject to your line speed and bandwidth controls, but as you might imagine, on a proper high-speed connection, puf might prove … very intriguing.
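Invoking it is about as simple as it gets; a minimal sketch, assuming puf takes URLs straight on the command line the way wget does (the URL below is a stand-in, not one from my tests, and the snippet shrugs politely if puf isn't around):

```shell
# Fetch a file with puf, wget-style: URLs go straight on the command line.
# (example.com is a placeholder, not a URL from the original post.)
puf http://example.com/index.html \
    || echo "puf not installed, or network unavailable"
```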

Now for the downsides: puf is already a decade old, and doesn’t seem to have been updated in quite a while. Further, puf isn’t really so much a download manager as a souped-up rendition of wget or curl, prepared to make multiple connections to yank files out of the ether.

And it doesn’t handle multiple simultaneous downloads. Or have a download queue to speak of. And it doesn’t search out alternative targets, like axel does.

And judging by the home page, puf really only handles HTTP addresses. aria2 this is not.

And of course, if you’re trapped on a low-speed network, multiple connections downloading at 56K modem rates isn’t going to thrill anyone. (You have my sympathies, by the way. I lived through the 56K era. 😯 )

Point being, puf may on occasion, in certain situations, when the time is right, at a precious moment, in dire circumstances … be just the right tool.

Outside of that though, it might only be a second-string download utility 10 years beyond its last update. 😕 😦

pssh: Still more parallelized tools

I think so far, every parallelized tool I’ve discovered in the P section has been new to me.

pssh is new to me too, even if it dates back to at least 2009, if not further.

pssh is a collection of ssh-oriented tools written in Python and mimicking a lot of the standard openssh-style fare. There is a strict pssh application, a psshscp tool for scp-ish adventures, a prsync utility and some others, along with a library to assist with creating new tools.
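As a taste of the flagship tool, here is a sketch of pssh firing one command at a list of machines; host1 and host2 are placeholders for your own boxes, and the trailing echo keeps the snippet from grumbling if pssh or the hosts are unavailable:

```shell
# Run `uptime` on two machines at once; -H takes an inline user@host
# string, and -i prints each host's output inline as it arrives.
# The hosts below are placeholders.
pssh -H "kmandla@host1" -H "kmandla@host2" -i uptime \
    || echo "pssh not installed, or hosts unreachable"
```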

My escapades with the pssh tools were a little less than successful, something I am always willing to blame on myself first.

Part of my difficulty may lie in the fact that the flag options for prsync (and I use that only as an example; some of the other tools gave me trouble too) are very different from vanilla rsync’s. prsync, as best I can tell, also demands that you declare a host and a user in the command, or face error messages.

The odd thing being, if I tried to just sync two folders in my home directory, a la

prsync -r -h kmandla@ -l kmandla source/ /home/kmandla/dest/

I met with an error exit code of 255 — which I can’t seem to track down in the man pages or on the web site.
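For what it’s worth, my best guess at the culprit: in the pssh suite, -h expects a file listing hosts, while an inline “user@host” string goes to -H instead, so that trailing @ was probably being read as a file name. A sketch of what I think prsync wants (kmandla@localhost is a stand-in, and the snippet won’t complain if prsync or the host isn’t reachable):

```shell
# A tiny source tree to sync.
mkdir -p source
echo "hello" > source/f.txt

# -H takes an inline "user@host" string; -h would want a hosts FILE.
# The host and paths are stand-ins for the author's setup.
prsync -r -H "kmandla@localhost" -l kmandla source/ /home/kmandla/dest/ \
    || echo "prsync not installed, or host unreachable"
```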

There were some other issues too: slight inconsistencies in the documentation. The man page for psshscp is titled “pscp.” The man page for pslurp says it’s an application to kill parallelized processes, but the extended description talks about copying, and sources and destinations, and so forth. I admit I was confused. (All this in the Arch version, by the way.)

And beyond that — and I’m not afraid to display my ignorance here — I’m not sure if “parallelized” means “optimized for multiprocessor machines,” as was the case with pbzip2 and pigz, or “optimized for high-bandwidth connections,” since most of these are aimed at networking tasks. From what I can tell.

It seems it should be the latter … or at least that’s what I’d be looking for. I’m probably splitting hairs here, but I can say that most of my bottlenecks when I use things like rsync or ssh are not at the processor. And that’s all the more I’ll say, at risk of embarrassing myself.

I’ll let you give them a try, and see if they behave any better for you, or if their focus is a little more clear. It’s good to know they’re available, and maybe they’ll brighten someone’s day. 😉

pnscan: A parallel network scanner

Back to network tools for a bit. We entertained two parallelized compression tools; here’s one that claims to be a parallel network scanner.


pnscan admits it’s not the tool nmap is, but for what little I’ve seen, it does a pretty good job. As you can see above, it did find the only other machine on my network, and the ssh port that’s open there.

As you can see in the help flags, it can also filter out results that don’t match a string, and also send specific information to specific points. The README file has some good examples too.
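Going roughly by the README’s examples, the string-matching flags look something like this: -w writes a probe to each open port, and -r keeps only the hosts whose response matches. Localhost stands in below for a real range like 192.168.1.0/24, so treat this as a sketch rather than gospel:

```shell
# Probe a range for web servers: -w sends an HTTP request to each
# open port, -r keeps hosts whose reply contains "Server:".
# 127.0.0.1/32 is a fast stand-in for a real network range.
pnscan -w"GET / HTTP/1.0\r\n\r\n" -r"Server:" 127.0.0.1/32 80 > scan.txt \
    || echo "pnscan not installed here"
```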

Networking is still my weakest point though, so while I can see that it works and I have a vague idea of what it’s doing, I can’t imagine how I would use it.

More distressing, though, is that I can’t really be sure it’s “parallelized,” mostly because all my tests finished way too fast to check.

On the plus side though, pnscan didn’t seem to care if I was a privileged user or not; it did everything I asked without calling out sudo to check my credentials. Maybe that’s on the minus side, depending on your perspective. 🙄

I’m willing to keep pnscan in mind if I need a fundamental network tool that doesn’t need superuser powers and seems fairly potent.

pnscan is in Debian; I didn’t see it in Arch/AUR. If you decide to try it and you need to compile it yourself, just make crashed for me. make lnx did the trick though. 😉

pigz: Equality among parallelized compression tools

Miguel called me out the other day, for including pbzip2 when I mentioned repeatedly that I wouldn’t include esoteric compression tools in this little adventure.

He’s right on the one hand, since pbzip2 — and now pigz — are specific to one particular algorithm. But they both do such cool things:


I don’t think I can add much more to the 1000 words that image is worth. Same flags and arrangement as pbzip2, only this time I used a 256MB file of random characters, because I am impatient. 😈

I should offer the same caveat this time as I did last time: You may not see much improvement on a single-core machine.
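Since pigz keeps gzip’s flags and file format, a test run looks just like gzip, and whatever it produces unpacks with plain old gzip. A sketch, with base64-over-urandom standing in for my file of random characters (and far less of it, for patience); pigz is assumed installed:

```shell
# A few megabytes of random text to chew on.
head -c 2097152 /dev/urandom | base64 > random-1.txt

# Same flags as gzip: -k keeps the original, -9 squeezes hardest.
pigz -f -k -9 random-1.txt \
    || echo "pigz not installed here"

# The result is ordinary gzip format; vanilla gzip can verify or
# unpack it:  gzip -t random-1.txt.gz
```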

And now for the daring feat of the day, jamming this, pbzip2 and parallel all into the same command …

ls random-{1,2}.txt | parallel pbzip2 -f -k -9 | parallel pigz -f -k -9

Let me just press enter and we’ll see if I spawn a singularity aga

pbzip2: The luxury of multiprocessing

This is one of those times when a screenshot will tell you a lot more than I can, with words:


pbzip2, the parallificated bzip2, chopping a good 20 seconds off the compression time on a 256MB clump of random text.

In that situation, nothing else is running and this laptop has an SSD in it, so it’s fairly quick to start with. But pbzip2 still manages to slash the time it takes to smush it down a bit.
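If you want to reproduce the race at home, the flags are identical to bzip2’s. Here’s a sketch, with base64-over-urandom standing in for my clump of random text (and far less of it, for patience), and a polite shrug if either compressor is missing:

```shell
# A clump of random text to compress (much smaller than 256MB).
head -c 2097152 /dev/urandom | base64 > random-2.txt

# Vanilla first, then the parallel version; the flags are identical.
time bzip2 -f -k -9 random-2.txt || echo "bzip2 unavailable?"
time pbzip2 -f -k -9 random-2.txt || echo "pbzip2 not installed here"
```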

The fun part of pbzip2 is watching htop while it’s running. In the case of vanilla bzip2, the system load meter on one processor spikes to 100 percent, while the other sits near idle.

But pbzip2 kicks both of them up to max on this Core2 Duo, and the fan suddenly starts to whine a little louder. 😉

That does, of course, suggest that on a single-core machine, you might not see any improvement at all. Logic says that without a second core, there’s no spare processor to share the load.

Give it a try and see what happens; you never know, there might be a tiny bump.

In closing, I’m a little surprised pbzip2 isn’t more famous. Perhaps there’s something sketchy in its history that I don’t know about.

For now, I’m going to tempt fate and try

ls random-{1,2}.txt | parallel pbzip2 -f -k -9

and see what happens. Yes, combining parallel and pbzip2 might just trigger a black hole in the center of my computer. But just let me press Enter now and see wha

parallel: Working along the same lines

I have been messing with parallel all morning, trying to get it to do the same things that I see in the videos, tutorial and examples.

If you’ve not heard of it, parallel should (and by all accounts usually does) split CPU-intensive jobs across processors, which should drastically reduce the time they require to finish.

But I’m not getting much in the way of speed increases, and in some cases it seems to be taking longer.


I’ve tried most combinations I could think of, done them all again as root, even followed examples letter-for-letter from the explanatory videos.

But in almost every case I’m getting much the same performance, so long as the jobs don’t bottleneck at hard drive writes, or something like that.

The coolest thing about parallel — that of course I can’t take advantage of 🙄 — is that it can farm out work to other machines.

Yes, that’s what I meant. It can distribute the workload to networked computers and retrieve the results when they all finish.

I think that trumps xargs, which I often see mentioned in the same breath as parallel because it too will take a --max-procs argument and split jobs across several processors.
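To make that comparison concrete, here are the two side by side on a throwaway pair of files, using everyday gzip as the worker; the parallel line is commented out in case GNU parallel isn’t installed, while the xargs form should work anywhere:

```shell
# Two small files to compress.
echo "the quick brown fox" > demo-1.txt
echo "jumps over the lazy dog" > demo-2.txt

# GNU parallel: one gzip job per file, spread across cores.
# ls demo-1.txt demo-2.txt | parallel gzip -f -k -9

# xargs with an explicit process cap does much the same locally:
# -n 1 means one file per job, -P 2 means two jobs at once.
ls demo-1.txt demo-2.txt | xargs -n 1 -P 2 gzip -f -k -9
```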

But hey, if I could set four computers on the task of compressing my family photos, I’d be all for that.

I’m going to keep tinkering with parallel and if I can get it working in a promising way, I’ll let it make cameos in future posts.

But you’ve got to earn a place on the big screen, friend. :mrgreen: