blog dds

2009.03.04

Parallelizing Jobs with xargs

With multi-core processors sitting idle most of the time and workloads always increasing, it's important to have easy ways to make those cores earn their keep. My colleague Georgios Gousios told me today how the Unix xargs command can help in this regard.

The GNU xargs command that comes with Linux and the one distributed with FreeBSD support a -P option, which specifies the maximum number of jobs to run in parallel. Using this flag (perhaps in conjunction with -n to limit the number of arguments passed to each invocation of the program) makes it easy to launch commands in parallel in a controlled fashion.
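To see how the two flags interact, here is a minimal sketch using harmless commands. The variable name batches is only illustrative; the point is that -n 2 makes each echo invocation receive two arguments, while -P 4 lets up to four invocations run at once.

```shell
# Feed eight arguments to xargs, two per invocation (-n 2),
# with up to four invocations running at the same time (-P 4).
batches=$(printf '%s\n' 1 2 3 4 5 6 7 8 | xargs -n 2 -P 4 echo)

# Parallel jobs finish in no particular order, so sort before displaying.
printf '%s\n' "$batches" | sort
```

After sorting, this should list the four pairs 1 2, 3 4, 5 6 and 7 8, one per line. Without the sort, the order of the lines can differ from run to run.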

Georgios sent me an example in which this technique made a job run almost seven times faster (54 seconds of elapsed time down to about 8).

$ ls -l *.eps|wc
    192    1537   17651
$ time find . -type f|xargs -Istr epstopdf str

real    0m54.395s
user    0m42.539s
sys    0m12.645s

$ time find . -type f|xargs -n 8 -P 8 -Istr epstopdf str

real    0m8.189s
user    0m43.363s
sys    0m13.273s
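A few details of the commands above are worth tightening for reuse. The find invocation matches every file, not just the .eps ones, and as far as I can tell from the GNU xargs manual, -I and -n are mutually exclusive, so the -n 8 probably had no effect there (with -I, each input line gets its own invocation anyway). What follows is a hedged sketch rather than the commands Georgios ran: the function name convert_eps and its parameters are my own, and the converter defaults to epstopdf.

```shell
# convert_eps DIR [JOBS] [CONVERTER]
# Run CONVERTER on every .eps file under DIR, up to JOBS at a time.
# The function name and parameters are illustrative, not a standard tool.
convert_eps() {
    ce_dir=$1
    ce_jobs=${2:-8}
    ce_converter=${3:-epstopdf}
    # -name '*.eps' skips other files, such as PDFs left by earlier runs.
    # -print0 and -0 keep file names containing spaces intact.
    # -n 1 hands each invocation a single file, which suits epstopdf.
    find "$ce_dir" -type f -name '*.eps' -print0 |
        xargs -0 -n 1 -P "$ce_jobs" "$ce_converter"
}
```

Calling convert_eps . 8 would then process the current directory tree with up to eight converters running at once. The user and sys times in the measurements above barely change between the two runs, which is what one would expect: the total work stays the same and is merely spread over more cores.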

The xargs -P flag can also be useful for parallelizing commands that depend on a large number of high-latency systems. Only a week ago I spent hours writing a script that would resolve IP addresses into host names in parallel. (Yes, I know about the logresolve.pl script that comes with the Apache web server distribution, and the speedup it provides leaves a lot to be desired.) Had I known about the xargs -P option, I would have finished my task in minutes.
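For the name resolution task, something along the following lines would probably have been enough. This is a sketch under assumptions, not the script I actually wrote: the function name resolve_parallel and its parameters are made up for illustration, and the resolver defaults to the host command from the BIND utilities, which performs a reverse lookup when given an IP address.

```shell
# resolve_parallel JOBS [RESOLVER]
# Read one IP address per line from standard input and run RESOLVER
# (default: host) on each distinct address, up to JOBS at a time.
# The function name and parameters are illustrative.
resolve_parallel() {
    rp_jobs=$1
    rp_resolver=${2:-host}
    # sort -u removes duplicates so that each address is looked up once.
    # -n 1 passes a single address to each resolver invocation.
    sort -u | xargs -n 1 -P "$rp_jobs" "$rp_resolver"
}
```

With a hypothetical file addresses.txt holding one address per line, resolve_parallel 20 < addresses.txt would keep up to twenty lookups in flight. Because these jobs spend most of their time waiting on the network rather than computing, a -P value well above the number of cores is reasonable here. Results arrive in completion order, not input order.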



Last modified: Wednesday, March 4, 2009 11:16 pm
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.