The Relativity of Performance Improvements
Today, after receiving a 1.7MB daily security log message containing thousands of failed ssh login attempts from bots around the world, I decided I had had enough. I enabled IPFW on a FreeBSD system I maintain, and added a script to find and block the offending IP addresses. In the process I improved the script's performance. The results of the improvement were unintuitive.
This is the original version of the script I found on a FreeBSD wiki.
#!/bin/sh
if ipfw show | awk '{print $1}' | grep -q 20000 ; then
    ipfw delete 20000
fi
for ips in `cat /var/log/auth.log | grep sshd | grep "Illegal" |
    awk '{print $10}' | uniq -d` ; do
    ipfw -q add 20000 deny tcp from $ips to any
done
cat /var/log/auth.log | grep sshd | grep "Failed" | rev |
    cut -d' ' -f 4 | rev | sort | uniq -c | \
( while read num ips; do
    if [ $num -gt 5 ]; then
        if ! ipfw show | grep -q $ips ; then
            ipfw -q add 20000 deny tcp from $ips to any
        fi
    fi
done
)
To improve the script I:
- used one search pass for catching failed attempts for both legal and illegal users,
- integrated the counting of multiple attempts into awk using an associative array,
- replaced the superfluous cat command with input redirection, and
- removed the check for duplicate entries, since the IPFW rule is always deleted at the beginning of the script.
This is the improved version.
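#!/bin/sh
# Remove the existing rule, so that it can be rebuilt from scratch
if ipfw show | awk '{print $1}' | grep -q 20000 ; then
    ipfw delete 20000
fi
# Count failed attempts per host (the last field of each matching line)
# and block every host with more than five of them
awk '/sshd.*authentication error/ {try[$(NF)]++}
END {for (h in try) if (try[h] > 5) print h}' /var/log/auth.log |
while read ip
do
    ipfw -q add 20000 deny tcp from "$ip" to any
done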
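To see the counting step in isolation, here is a minimal sketch that runs the same awk idiom on a few made-up log lines instead of /var/log/auth.log. The function names count_offenders and sample_log, the threshold parameter, and the sample lines themselves are assumptions for illustration only; real sshd messages may be laid out differently.

```shell
#!/bin/sh
# Print every host that appears as the last field of more than "limit"
# matching lines. The limit is a hypothetical parameter; the script
# above hard-codes 5.
count_offenders() {
    awk -v limit="$1" '
        /sshd.*authentication error/ { try[$NF]++ }
        END { for (h in try) if (try[h] > limit) print h }
    '
}

# Made-up log lines, for illustration only
sample_log() {
    cat <<'EOF'
Jan  1 00:00:01 host sshd[100]: error: PAM: authentication error for root from 192.0.2.10
Jan  1 00:00:02 host sshd[101]: error: PAM: authentication error for admin from 192.0.2.10
Jan  1 00:00:03 host sshd[102]: error: PAM: authentication error for illegal user guest from 192.0.2.10
Jan  1 00:00:04 host sshd[103]: error: PAM: authentication error for root from 198.51.100.7
Jan  1 00:00:05 host sshd[104]: Accepted publickey for alice from 203.0.113.5
EOF
}

# Three failures from 192.0.2.10 exceed a limit of 2; the other hosts do not
sample_log | count_offenders 2
```

With a limit of 2 only 192.0.2.10 is printed. The associative array does the work of the original sort and uniq -c stages in a single pass over the input.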
I made the changes just because the original code offended my sense of parsimony, not because I believed that the code was fundamentally inefficient. Having made them, I decided to measure their impact. On my system the new version of the command runs more than twice as fast as the old one (20ms against 50ms). However, the reduction in the overall system load is negligible. I calculated that if I run the command every 3 minutes, it will take up about 0.01% of the system's resources; the old one would consume about 0.03%. No wonder we're seeing software bloat everywhere. For most cases tuning a non-optimal design is simply not worth the effort. Then, when the need for performance truly arises, the knowledge and experience of how to improve it are probably missing.
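The load estimate can be reproduced with a short calculation. This is only a sketch: it assumes that the measured run time is the only cost of each invocation and that "system resources" means the share of wall-clock time on a single processor. The helper name load_percent is hypothetical.

```shell
#!/bin/sh
# Percentage of time taken by a job that runs for run_ms milliseconds
# once every interval_s seconds (assumes a single processor).
load_percent() {
    awk -v run_ms="$1" -v interval_s="$2" \
        'BEGIN { printf "%.3f\n", run_ms / (interval_s * 1000) * 100 }'
}

load_percent 20 180   # new version: prints 0.011
load_percent 50 180   # old version: prints 0.028
```

Both results are a few hundredths of a percent, close to the figures quoted above, which is why the speedup makes no practical difference to the system's load.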