Parallelizing Jobs with xargs
With multi-core processors sitting idle most of the time and workloads ever increasing, it's important to have easy ways to make the CPUs earn their keep. My colleague Georgios Gousios told me today how the Unix xargs command can help in this regard.
The GNU xargs command that comes with Linux and the one distributed with FreeBSD support a -P option through which one can specify the number of jobs to run in parallel. Using this flag (perhaps in conjunction with -n, which limits the number of arguments passed to each invocation of the program) makes it easy to run commands in parallel in a controlled fashion.
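A minimal sketch of how the two flags interact, using a one-second sleep as a hypothetical stand-in for real work: eight arguments are split into four commands of two arguments each (-n 2), and all four run at the same time (-P 4), so the run should take about one second instead of four.

```shell
#!/bin/sh
# Feed the numbers 1..8 to xargs.
#   -n 2  at most two arguments per command, so four commands are built
#   -P 4  run up to four of those commands at the same time
# Each command sleeps one second to stand in for real work.
# With -P the order of the output lines is not guaranteed.
seq 1 8 | xargs -n 2 -P 4 sh -c 'sleep 1; echo "job got: $*"' sh
```

The trailing sh becomes $0 of the inline script, so the arguments supplied by xargs land in $1, $2, and so on.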
Georgios sent me an example in which this technique sped up a job almost sevenfold, cutting its wall-clock time from 54 to 8 seconds.
$ ls -l *.eps|wc
     192    1537   17651
$ time find . -type f|xargs -Istr epstopdf str

real    0m54.395s
user    0m42.539s
sys     0m12.645s
$ time find . -type f|xargs -n 8 -P 8 -Istr epstopdf str

real    0m8.189s
user    0m43.363s
sys     0m13.273s
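Note that the user time stays at about 43 seconds in both runs: the same amount of work is simply spread over the cores. Also, GNU xargs documents that -I implies -L 1, so with -Istr each epstopdf already receives a single file and the speedup comes from -P 8. The following is a sketch of a variant, not Georgios's command; the function name convert_all_eps is hypothetical, and it assumes nproc (GNU coreutils) or, failing that, the BSD sysctl hw.ncpu is available to pick the job count.

```shell
#!/bin/sh
# convert_all_eps (hypothetical name): convert every .eps file below
# directory $1 (default: .) to PDF, one epstopdf process per CPU core.
#   -print0 / -0  keep file names containing spaces or newlines intact
#   -r            run nothing if no files are found (GNU; FreeBSD accepts it)
#   -n 1          pass exactly one file name to each epstopdf process
#   -P "$ncpu"    keep up to $ncpu conversions running at once
convert_all_eps() {
  # nproc is from GNU coreutils; sysctl -n hw.ncpu is the BSD fallback.
  ncpu=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
  find "${1:-.}" -type f -name '*.eps' -print0 |
    xargs -0 -r -n 1 -P "$ncpu" epstopdf
}
```

For example, convert_all_eps figures would convert the files under a directory named figures.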
The xargs -P flag can also help parallelize large numbers of commands that spend most of their time waiting for high-latency remote systems. Only a week ago I spent hours writing a script that would resolve IP addresses into host names in parallel. (Yes, I know about the logresolve.pl script that comes with the Apache web server distribution; the speedup it provides leaves a lot to be desired.) Had I known about the xargs -P option, I would have finished my task in minutes.
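A sketch of how such a task might look with xargs, assuming an Apache access log whose first field is the client IP address and a system with getent (glibc-based Linux; FreeBSD also ships one). The function name resolve_log_hosts and the job count of 16 are illustrative choices; because each lookup mostly waits on the network, running more jobs than there are cores is reasonable here.

```shell
#!/bin/sh
# resolve_log_hosts (hypothetical name): reverse-resolve the distinct
# client addresses in the Apache access log $1, up to 16 lookups at once.
# "getent hosts ADDRESS" asks the system resolver for the name of ADDRESS;
# addresses without a name produce no output line.
resolve_log_hosts() {
  cut -d' ' -f1 "$1" | sort -u |
    xargs -r -n 1 -P 16 getent hosts
}
```

Usage might be resolve_log_hosts access_log > hosts.txt. Since getent exits with a nonzero status for addresses it cannot resolve, xargs will usually report a nonzero exit status as well, even when most lookups succeed.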