I'm making extensive whois queries in generating spam reports (everyone
needs a hobby, right...). Which...is...slow....
...so I was excited to discover "jwhois", a caching whois client. This
creates a cache (/var/cache/jwhois/jwhois.db) of previously requested
domains. In spam lookups this is convenient, as 100 domains account for
well over half my spam (1403 total domains recorded).
Problem is: jwhois only caches lookups where it already knows the
server.
The trick, then, is to seed the cache. I'd already performed a host
lookup on some 5000+ spams I've received (since early November! -- and
yes, caching DNS helps tons), so the following does the trick:
Assuming domains in /tmp/spamdomains-ranked, in the following format
(rank, count, domain; modify recipe to suit):
------------------------------------------------------------------------
1 345 kornet.net
2 156 freeserve.com
3 148 comcast.net
4 138 rr.com
5 132 guangzhou.gd.cn
6 107 uu.net
7 104 attbi.com
8 95 dacom.co.kr
9 67 pacbell.net
10 64 wanadoo.fr
------------------------------------------------------------------------
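For anyone wanting to build a ranked list like the one above, here is a
minimal sketch. It assumes a plain input file with one sending domain
per line (one line per spam); the file name /tmp/spamdomains and the
function name rank_domains are made up for illustration and are not
part of the original recipe.

```shell
# rank_domains: read one domain per line on stdin and print
# "rank count domain", most frequent first (ties sorted by name).
rank_domains() {
    sort | uniq -c | sort -k1,1nr -k2,2 |
        awk '{print NR, $1, $2}'
}

# Hypothetical usage:
# rank_domains < /tmp/spamdomains > /tmp/spamdomains-ranked
```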
for dom in $(
    # Extract domains from list, get rid of any numeric IPs which
    # have snuck through.
    awk '{print $3}' /tmp/spamdomains-ranked |
        sed -e '/^[0-9]\{1,3\}\.[0-9]\{1,3\}/d'
)
do
    echo -e "\n>>> $dom <<<"
    # Recursive query.  Query the second time, using the
    # WHOIS server indicated by the first pull.  If the first pull
    # names no server, fall back to whois.internic.net.
    server=$(
        jwhois -h whois.internic.net "$dom" |
            head |
            grep '^\[' |
            tail -1 |
            sed -e 's/[][]//g'
    )
    jwhois -h "${server:-whois.internic.net}" "$dom" |
        head -2
done
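The server-picking step in that loop can be pulled out into a function,
which makes it easy to try against canned text without touching the
network. This is only a sketch: the name referral_server is made up,
and it assumes the first pull shows the server names as bracketed lines
within its first ten lines, which is what the pipeline above greps for.

```shell
# referral_server: read whois output on stdin and print the last
# "[...]" line found in the first ten lines, brackets removed.
# Prints whois.internic.net when no bracketed line is present.
referral_server() {
    server=$(head | grep '^\[' | tail -1 | sed -e 's/[][]//g')
    echo "${server:-whois.internic.net}"
}

# Assumed usage (needs jwhois and network access):
# dom=kornet.net
# jwhois -h "$(jwhois -h whois.internic.net "$dom" | referral_server)" \
#     "$dom" | head -2
```

The default is applied with ${server:-...} rather than inside sed,
since sed prints nothing at all when grep passes it no lines.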
...that's a serial query, which can bog down on timeouts for any given
domain. To speed processing, batch requests, e.g.:
step=40     # Number of requests to batch in simultaneous submits
for s in $(
    seq 1 $step $( wc -l /tmp/spamdomains-ranked | awk '{print $1}' )
)
do
    e=$(( s + step - 1 ))
    echo "e: $e"
    for dom in $(
        awk '{print $3}' /tmp/spamdomains-ranked |
            sed -e '/^[0-9]\{1,3\}\.[0-9]\{1,3\}/d' |
            sed -ne "${s},${e}p"
    )
    do
        # Background each lookup so the whole batch runs at once.
        # Output from different domains may interleave.
        (
            server=$(
                jwhois -h whois.internic.net "$dom" |
                    head |
                    grep '^\[' |
                    tail -1 |
                    sed -e 's/[][]//g'
            )
            echo -e "\n>>> $dom <<<"
            jwhois -h "${server:-whois.internic.net}" "$dom" | head -2
        ) &
    done
    wait    # Let this batch finish before starting the next
done
Alternatively, sleep for 5-20 seconds between batches rather than
'wait'ing.
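The batching idea can also be written as a small reusable function. A
rough sketch, not the recipe above: run_batched is a made-up name, and
the command it runs is whatever you hand it, so it can be tried with a
harmless stand-in before pointing it at jwhois.

```shell
# run_batched STEP CMD [ARGS...]: read one item per line on stdin,
# start "CMD ARGS ITEM" in the background for each, and pause after
# every STEP starts so at most STEP are in flight.
run_batched() {
    step=$1; shift
    n=0
    while read -r item
    do
        "$@" "$item" </dev/null &
        n=$(( n + 1 ))
        if [ $(( n % step )) -eq 0 ]
        then
            wait    # or: sleep 10, to move on past stragglers
        fi
    done
    wait            # catch the final, partial batch
}

# Assumed usage, with "lookup" being your own per-domain function:
# awk '{print $3}' /tmp/spamdomains-ranked | run_batched 40 lookup
```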
What I don't have is a way to periodically repeat this seeding, which
would be useful, though using the recursive lookup in scripts could
satisfy most needs.
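One possible way to approach the repeat, offered as an untested idea
rather than a solution: keep a timestamp file and rerun the seeding
loop only when it has gone stale, calling the check from cron or a
login script. The function name needs_reseed and the stamp path are
made up, and whether a rerun actually refreshes entries depends on how
the jwhois cache expires them, which I haven't checked.

```shell
# needs_reseed STAMP DAYS: succeed if STAMP is missing or was last
# modified more than DAYS days ago.
needs_reseed() {
    [ ! -e "$1" ] || [ -n "$(find "$1" -mtime +"$2" 2>/dev/null)" ]
}

# Hypothetical usage:
# if needs_reseed /var/tmp/jwhois-seeded 7
# then
#     ...seeding loop from above...
#     touch /var/tmp/jwhois-seeded
# fi
```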
Peace.
--
Karsten M. Self <kmself@ix.netcom.com> http://kmself.home.netcom.com/
What Part of "Gestalt" don't you understand?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?