parallel: Working along the same lines

I have been messing with parallel all morning, trying to get it to do the same things that I see in the videos, tutorial and examples.

If you’ve not heard of it, parallel should (and by all accounts usually does) split CPU-intensive jobs across processors, which should drastically reduce the time they require to finish.
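For reference, here is a minimal sketch of the kind of job parallel is meant for. It assumes GNU parallel is installed (not the unrelated parallel from moreutils) and that there are several files to work on. The temporary directory and the sample file names are invented for the illustration.

```shell
#!/usr/bin/env bash
# Sketch: compress several files at once, one gzip process per file.
# The directory and file names are made up for this example.

workdir=$(mktemp -d)

# Create four sample files so there is more than one job to hand out.
for i in 1 2 3 4; do
    seq 1 200000 > "$workdir/sample$i.txt"
done

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Each argument after ":::" becomes its own gzip job; by default
    # GNU parallel runs as many jobs at a time as there are CPU cores.
    parallel gzip -1 -f ::: "$workdir"/sample*.txt
else
    # Fallback so the script still finishes without GNU parallel.
    echo "GNU parallel not found; compressing serially instead" >&2
    gzip -1 -f "$workdir"/sample*.txt
fi

ls "$workdir"
```

With only one input file there is only one job to run, so there is nothing to spread across cores.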

But I’m not getting much in the way of speed increases, and in some cases it seems to be taking longer.


I’ve tried most combinations I could think of, done them all again as root, even followed examples letter-for-letter from the explanatory videos.

But in almost every case I’m getting much the same performance with or without parallel, so long as the job doesn’t bottleneck at hard drive writes, or something like that.

The coolest thing about parallel — that of course I can’t take advantage of 🙄 — is that it can farm out work to other machines.

Yes, that’s what I meant. It can distribute the workload to networked computers and retrieve the results when they all finish.
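I haven’t been able to try it, so treat this as an untested sketch of how the remote side is usually invoked. It assumes GNU parallel locally, plus passwordless ssh access and gzip on every remote host. REMOTE_HOSTS, PHOTO_DIR and the distribute_gzip function are hypothetical names of my own, not anything parallel defines.

```shell
#!/usr/bin/env bash
# Sketch only: REMOTE_HOSTS and PHOTO_DIR are hypothetical variables.
# Example: REMOTE_HOSTS="user@host1,user@host2" PHOTO_DIR=~/photos

distribute_gzip() {
    local hosts=${REMOTE_HOSTS:-}
    local dir=${PHOTO_DIR:-.}

    if [ -z "$hosts" ]; then
        echo "REMOTE_HOSTS is not set; nothing to distribute"
        return 0
    fi

    # -S takes a comma-separated list of ssh logins.
    # --trc {}.gz is shorthand for: transfer each input file to the remote
    # host, return {}.gz when the job is done, then clean up the remote copies.
    # Relative file names keep the transferred paths simple.
    ( cd "$dir" && parallel -S "$hosts" --trc {}.gz gzip -1 -f {} ::: *.jpg )
}

distribute_gzip
```

Without any hosts configured the function only prints a message, so it is safe to paste in and adjust before pointing it at real machines.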

I think that trumps xargs, which I often see mentioned in the same breath as parallel because xargs will take a --max-procs argument and split work across several processors on the same machine.
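For comparison, a minimal sketch of the xargs approach, assuming an xargs that supports -P (GNU and BSD versions do; --max-procs is the GNU long form). The sample files and the temporary directory are again invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: run up to four gzip processes at once with xargs.
# The directory and file names are made up for this example.

xdir=$(mktemp -d)

for i in 1 2 3 4; do
    seq 1 200000 > "$xdir/photo$i.txt"
done

# -print0 and -0 keep unusual file names intact.
# -n 1 hands exactly one file to each gzip process.
# -P 4 allows up to four of those processes to run at the same time.
find "$xdir" -type f -name '*.txt' -print0 | xargs -0 -n 1 -P 4 gzip -1 -f

ls "$xdir"
```

Everything here still runs on one machine, which is the limit parallel gets around.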

But hey, if I could set four computers on the task of compressing my family photos, I’d be all for that.

I’m going to keep tinkering with parallel and if I can get it working in a promising way, I’ll let it make cameos in future posts.

But you’ve got to earn a place on the big screen, friend. 😁

3 thoughts on “parallel: Working along the same lines”

  1. benuwa

    Hi, as far as I can see, the difference between your example and the ones in the tutorial video of the home page is that you use only one file to compress.
    With only sample.txt, “parallel” can only launch one instance of gzip to compress this one file.
    With multiple files (a bunch of log files in the video), parallel can process each file with its own gzip instance.
    Another difference between your screenshot (fbshot?) and the video is that you use the “time” command (well, the “time” shell builtin, but that is irrelevant) inside the command executed by parallel. That means that if you have 2 jobs (2 gzip processes, each working on its own file) you should see 2 “time” outputs as well. That may not be what you want.
    Try something like this:
    $ time gzip -1 -f sample.txt sample2.txt
    $ ls sample.txt sample2.txt | time parallel 'gzip -1 -f'

    If you already went through that and it did not work, sorry for the babble.

    1. K.Mandla Post author

      Good point. Let me try those again. I know when it didn’t seem to be making much difference I tried to stay close to the examples, but I might have taken a screenshot at a different time.

      Thanks for the help. Cheers! 🙂

  2. Pingback: pbzip2: The luxury of multiprocessing | Inconsolation
