Tag Archives: parallelize

pssh: Still more parallelized tools

I think so far, every parallelized tool I’ve discovered in the P section has been new to me.

pssh is new to me too, even if it dates back to at least 2009, if not further.

pssh is a collection of ssh-oriented tools written in Python and mimicking a lot of the standard openssh-style fare. There is a strict pssh application, a psshscp tool for scp-ish adventures, a prsync utility and some others, along with a library to assist with creating new tools.

My escapades with the pssh tools were a little less than successful, something I am always willing to blame on myself first.

Part of my difficulty may lie in that the flag options for prsync (and I use that only as an example; some of the other tools also gave difficulty) are very different from vanilla rsync. prsync, as best I can tell, also demands that you declare a host and a user in the command or face error messages.

The odd thing is, when I tried to just sync two folders in my home directory, a la

prsync -r -h kmandla@127.0.0.1 -l kmandla source/ /home/kmandla/dest/

I met with an error exit code of 255 — which I can’t seem to track down in the man pages or on the web site.

Some other issues too; there were slight inconsistencies in the documentation. The man page for psshscp is titled “pscp.” The man page for pslurp says it’s an application to kill parallelized processes, but the extended description talks about copying and source and destinations and so forth. I admit I was confused. (All this in the Arch version, by the way.)

And beyond that, and I’m not afraid to display my ignorance here, I’m not sure if “parallelized” means “optimized for multiprocessor machines,” as was the case with pbzip2 and pigz, or “optimized for high bandwidth connections,” since most of these are aimed at networking tasks. From what I can tell, anyway.

It seems it should be the latter … or at least that’s what I’d be looking for. I’m probably splitting hairs here, but I can say that most of my bottlenecks when I use things like rsync or ssh are not at the processor. And that’s all the more I’ll say, at risk of embarrassing myself.

I’ll let you give them a try, and see if they behave any better for you, or if their focus is a little more clear. It’s good to know they’re available, and maybe they’ll brighten someone’s day. 😉

pigz: Equality among parallelized compression tools

Miguel called me out the other day, for including pbzip2 when I mentioned repeatedly that I wouldn’t include esoteric compression tools in this little adventure.

He’s right on the one hand, since pbzip2 — and now pigz — are specific to one particular algorithm. But they both do such cool things:

[Screenshot: 2014-03-03-lv-r1fz6-pigz]

I don’t think I can add much more to the 1000 words that image is worth. Same flags and arrangement as pbzip2, only this time I used a 256MB file of random characters, because I am impatient. 😈

I should offer the same caveat this time as I did last time: You may not see much improvement on a single-core machine.

And now for the daring feat of the day, jamming this, pbzip2 and parallel all into the same command …

ls random-{1,2}.txt | parallel pbzip2 -f -k -9 | parallel pigz -f -k -9

Let me just press enter and we’ll see if I spawn a singularity aga

pbzip2: The luxury of multiprocessing

This is one of those times when a screenshot will tell you a lot more than I can, with words:

[Screenshot: 2014-02-25-lv-r1fz6-pbzip2]

pbzip2, the parallificated bzip2, chopping a good 20 seconds off the compression time on a 256MB clump of random text.

In that situation, nothing else is running and this laptop has an SSD in it, so it’s fairly quick to start with. But pbzip2 still manages to slash the time it takes to smush it down a bit.

The fun part of pbzip2 is watching htop while it’s running. In the case of vanilla bzip2, the system load meter on one processor spikes to 100 percent, while the other sits near idle.

But pbzip2 kicks both of them up to max on this Core2 Duo, and the fan suddenly starts to whine a little louder. 😉

That does, of course, suggest that on a single-core machine, you might not see any improvement at all. Logic says without a second core, there’s little room to share the load.

Give it a try and see what happens; you never know, there might be a tiny bump.
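If you’re not sure how many cores you have to play with, something like this should tell you. It’s only a sketch: as far as I know getconf _NPROCESSORS_ONLN works on Linux and most BSDs, but it is a common extension rather than strict POSIX, and the messages are my own invention.

```shell
# Ask the system how many processors are currently online.
# _NPROCESSORS_ONLN is an assumption: widely supported, but not guaranteed everywhere.
cores=$(getconf _NPROCESSORS_ONLN)

if [ "$cores" -gt 1 ]; then
    echo "$cores cores: a parallel compressor has room to work"
else
    echo "1 core: expect little or no gain"
fi
```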

In closing, I’m a little surprised pbzip2 isn’t more famous. Perhaps there’s something sketchy in its history that I don’t know about.

For now, I’m going to tempt fate and try

ls random-{1,2}.txt | parallel pbzip2 -f -k -9

and see what happens. Yes, combining parallel and pbzip2 might just trigger a black hole in the center of my computer. But just let me press Enter now and see wha

parallel: Working along the same lines

I have been messing with parallel all morning, trying to get it to do the same things that I see in the videos, tutorial and examples.

If you’ve not heard of it, parallel should (and by all accounts usually does) split CPU-intensive jobs across processors, which should drastically reduce the time they require to finish.

But I’m not getting much in the way of speed increases, and in some cases it seems to be taking longer.

[Screenshot: 2014-02-22-lv-r1fz6-parallel]

I’ve tried most combinations I could think of, done them all again as root, even followed examples letter-for-letter from the explanatory videos.

But in almost every case I’m getting much the same performance with parallel as without it, so long as the jobs don’t bottleneck at hard drive writes, or something like that.

The coolest thing about parallel — that of course I can’t take advantage of 🙄 — is that it can farm out work to other machines.

Yes, that’s what I meant. It can distribute the workload to networked computers and retrieve the results when they all finish.

I think that trumps xargs, which I often see mentioned in the same breath as parallel because xargs will take a --max-procs argument and run several processes at once.
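For the record, the xargs approach looks roughly like this. It’s a minimal sketch, assuming GNU or BSD xargs (both accept -P; --max-procs is the GNU long form) and plain gzip standing in for whatever job you actually want to run. The sample file names and the job count of 2 are made up for illustration.

```shell
#!/bin/sh
# Sketch: compress several files with xargs, at most two jobs at a time.
set -eu

# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create four small files of random bytes (hypothetical sample data).
for i in 1 2 3 4; do
    head -c 65536 /dev/urandom > "sample-$i.bin"
done

# -n 1 : hand each gzip process exactly one file name
# -P 2 : keep up to two gzip processes running at once
# gzip -f replaces each sample-N.bin with sample-N.bin.gz
ls sample-*.bin | xargs -n 1 -P 2 gzip -f

# Count the compressed files that came out the other end.
ls sample-*.bin.gz | wc -l
```

Bump the number after -P up to your core count and xargs will keep that many jobs in flight. Whether that buys any time depends, as ever, on where the bottleneck is.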

But hey, if I could set four computers on the task of compressing my family photos, I’d be all for that.

I’m going to keep tinkering with parallel and if I can get it working in a promising way, I’ll let it make cameos in future posts.

But you’ve got to earn a place on the big screen, friend. 😁