by Michael Schrenk
March 2012, 392 pp.
There's a wealth of data online, but sorting and gathering it by hand can be tedious and time-consuming. Rather than click through page after endless page, why not let bots do the work for you?
Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions. Michael Schrenk, a highly regarded webbot developer, teaches you how to develop fault-tolerant designs, how best to launch and schedule the work of your bots, and how to create Internet agents that:
Sample projects for automating tasks like price monitoring and news aggregation will show you how to put the concepts you learn into practice.
This second edition of Webbots, Spiders, and Screen Scrapers includes tricks for dealing with sites that are resistant to crawling and scraping, writing stealthy webbots that mimic human search behavior, and using regular expressions to harvest specific data. As you discover the possibilities of web scraping, you'll see how webbots can save you precious time and give you much greater control over the data available on the Web.
About the Author
Michael Schrenk has developed webbots for over 15 years, working just about everywhere from Silicon Valley to Moscow, for clients like the BBC, foreign governments, and many Fortune 500 companies. He's a frequent Defcon speaker and lives in Las Vegas, Nevada.
Table of Contents
Part I: Fundamental Concepts and Techniques
Part II: Projects
Part III: Advanced Technical Considerations
Part IV: Larger Considerations
Appendix A: PHP/CURL Reference
"Webbots, Spiders, and Screen Scrapers is well-written and easy to read. Schrenk will encourage you to look at the web as a data resource and inspire you to write useful code which saves time and money."
"This book is a great resource for those looking to move beyond the Internet browser with automated solutions for collecting and using data. It should prove to stimulate your imagination with the possibilities of what can be done."
"There are certainly many ways that a web developer can learn to code webbots and spiders, but one would be hard pressed to find a better starting point than reading Schrenk's second edition. The text and its associated code library lay an excellent foundation from which almost no webbot project is out of reach."
"Overall the book is interesting and readable, and the code is straightforward and easy to follow even for those without a solid grounding in PHP."
"This book is one of the few that attempts to gather together the range of techniques that you need to write programs that work with web sites intended to be used by humans."
"Overall, I found this a very clear, very readable, and thorough presentation of the topic. Given that this is the second edition of this volume, others before have realized that Schrenk has written probably the definitive introduction to this topic and made the whole field of crawlers, spiders, and bots an approachable and interesting area to explore. Highly recommended."