Focused web crawler: Good Starting URLs

by bose on May 4, 2011

[…] is a one-stop URL because all its links are to final targets. The script stays within the applegate domain.

A good starting URL should be a hub: a page that leads to likely sites, which in turn have lots of good links to final targets.

It should FIND relevant new directories, not BE a relevant directory.
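One rough way to tell a hub from a one-stop page is to look at where its links point. The sketch below is only an illustration, not the crawler script itself. It assumes the page's HTML is already in hand as a string, and the names `LinkCollector` and `hub_score` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def hub_score(page_url, html):
    """Return (distinct external hosts, fraction of links leaving the host).

    Hypothetical heuristic: a hub points at many different hosts, while a
    one-stop directory mostly links within its own domain.
    """
    parser = LinkCollector()
    parser.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    external_hosts = set()
    total = 0
    external = 0
    for href in parser.hrefs:
        # Resolve relative links against the page they came from.
        parsed = urlparse(urljoin(page_url, href))
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        total += 1
        host = parsed.netloc.lower()
        if host != own_host:
            external += 1
            external_hosts.add(host)
    fraction = external / total if total else 0.0
    return len(external_hosts), fraction
```

A page that scores high on both numbers is more likely to FIND new directories. A page whose links nearly all stay on its own host looks more like a directory itself.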

So I’m thinking a good starting URL changes quickly and constantly, and so gives the script a random element. Press release sites and magazine news sites would be a good bet.
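To put a number on "changes quickly and constantly", one option is to compare the links on a page across two visits. This is a minimal sketch under my own assumptions: the function name `churn` is hypothetical, and the two link lists are assumed to have been collected already on separate fetches.

```python
def churn(old_links, new_links):
    """Fraction of links in the newer snapshot that were not in the older one.

    Hypothetical measure: close to 1.0 means the page is mostly fresh links
    on each visit; close to 0.0 means it barely changes.
    """
    old_set, new_set = set(old_links), set(new_links)
    if not new_set:
        return 0.0
    return len(new_set - old_set) / len(new_set)
```

A press release page fetched a day apart would be expected to score well above a static directory page, though the useful threshold is something to find by trial.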
