Scraping Search Engines

Scraping search engines is fun.


  3. inurl:profile 
  4. female inurl:profile inurl:about


  • Note the use of the keywords ‘inurl’ and ‘site’. ‘site’ is straightforward: it restricts results to that specific site. ‘inurl’ requires that the word be present in the URL.
  • Notice the use of the wildcard operator (*) in 3 above.
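To make the operator syntax concrete, here is a minimal sketch that assembles such queries and URL-encodes them. The helper names `build_query` and `search_url`, and the `base` address, are hypothetical and used for illustration only; they are not part of any search engine's API.

```python
from urllib.parse import urlencode


def build_query(terms=(), site=None, inurl=()):
    """Compose a query string from plain terms plus site:/inurl: operators."""
    parts = list(terms)
    if site:
        # site: restricts results to one site
        parts.append(f"site:{site}")
    # inurl: requires each word to appear in the result URL
    parts.extend(f"inurl:{word}" for word in inurl)
    return " ".join(parts)


def search_url(base, query):
    """Attach the query to a search endpoint as an encoded `q` parameter.

    `base` is a placeholder supplied by the caller (an assumption).
    """
    return f"{base}?{urlencode({'q': query})}"


query = build_query(terms=["female"], inurl=["profile", "about"])
print(query)  # female inurl:profile inurl:about
```

Building the string in one place keeps the operators consistent and makes it easy to drop an operator for an engine that does not support it.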


  • Interestingly, Bing doesn’t support ‘inurl’ or wildcards (*).
  • So 2 & 3 above don’t work on Bing.
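If the same query list is sent to both engines, a small pre-check can flag the queries that rely on these operators. This is only a sketch based on the observation above; `works_on_bing` is a hypothetical helper, and it checks the query text rather than asking Bing anything.

```python
import re

# Operators reported above as unsupported on Bing: inurl: and the * wildcard.
_UNSUPPORTED_ON_BING = re.compile(r"\binurl:|\*")


def works_on_bing(query):
    """Return True if the query avoids the operators Bing is said to ignore."""
    return _UNSUPPORTED_ON_BING.search(query) is None


print(works_on_bing("site:example.com profile"))
print(works_on_bing("female inurl:profile inurl:about"))
```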

Google API

  • I played around with the Google APIs to scrape some profiles. However, they seemed to limit the number of results sent back.
  • The code below should give you a sense of my trials in scraping Google programmatically.
    • Google detects fairly easily that you are a bot, and throttles you.
  • Also, Google sometimes redirects, so be careful with the URL format you use for querying.
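One common way to cope with throttling is to retry with an increasing delay. The following is a minimal sketch of that idea, not the original code from these trials. It assumes throttling shows up as HTTP status 429 or 503, which is an assumption rather than documented Google behavior, and `fetch` is a hypothetical callable you supply that returns a `(status_code, body)` pair.

```python
import time


def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it is not throttled, doubling the wait each time.

    `fetch` is any caller-supplied callable returning (status_code, body).
    Status codes 429 and 503 are treated as throttling (an assumption).
    """
    delay = base_delay
    for attempt in range(max_tries):
        status, body = fetch(url)
        if status not in (429, 503):
            return status, body
        if attempt < max_tries - 1:
            sleep(delay)  # back off before the next attempt
            delay *= 2
    # Still throttled after max_tries: hand back the last response.
    return status, body
```

Passing `fetch` and `sleep` in as parameters keeps the retry logic independent of any particular HTTP library and lets it be exercised without touching the network.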

Bing API

For the Bing API, I use the one exposed in the Azure Marketplace.




