Downloading bears

Well, I had an Azure API key, but apparently my free account ran out, and now I would need to pay 25 dollars a month.

So I’ll try some less advanced methods for getting my pictures of bears. Jeremy did say there were methods that break the terms of service of various tools, so I will try to do this discreetly and without causing trouble.

First I do a web search for grizzly images:

https://duckduckgo.com/?q=grizzly+bear&t=brave&iax=images&ia=images

That gives me an infinitely scrolling page of nice grizzly bear pictures. I need 150…

I can download at least the beginning of it:

curl "https://duckduckgo.com/?q=grizzly+bear&t=brave&iax=images&ia=images" > grizzly_bear.html

But that’s a big pile of pretty unreadable HTML all in one line. I could try prettifying it and then grepping through it, but…

Time to do some screen-scraping. I haven’t done that in a while. I used to do it in Perl, but let’s try the languages I use most these days, Python or JavaScript. What are some good tools?

In Python… BeautifulSoup?… or in JavaScript/Node… There was also that kibitzr tool I played with recently…

OK, I ended up in JavaScript with this nice tutorial:

https://www.freecodecamp.org/news/the-ultimate-guide-to-web-scraping-with-node-js-daa2027dcd3/

Stop the presses!

This page lets you test out the Bing API and see results as images, and also get a JSON version. Maybe I can use that.

https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/

Ah, but it only returns 35 results per request…
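For a sense of what that limit means for my 150 pictures, here is a rough bit of arithmetic in Python. It assumes the API lets you page through results with a starting offset (I believe Bing Image Search has `count` and `offset` query parameters, but I have not tried them here). The function name `page_offsets` is my own invention, just for illustration.

```python
import math


def page_offsets(total_needed, per_request):
    """Return the starting offset of each request needed to cover total_needed results."""
    n_requests = math.ceil(total_needed / per_request)
    return [i * per_request for i in range(n_requests)]


# 150 images at 35 per request means five requests
print(page_offsets(150, 35))  # [0, 35, 70, 105, 140]
```

So it would take five requests per bear type, with the last one returning more than I strictly need.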

OK, so back to using DuckDuckGo, this time in a browser to avoid the limited number of results returned when using curl. Zoom out to 33% and scroll down a bit until enough images are displayed. Then right-click and choose Inspect, right-click the html tag, choose Copy, then Copy outerHTML, and paste that into a file.

OK I have three HTML files, back to parsing them…
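As a first stab at the parsing, here is a minimal sketch in Python using only the standard library's `html.parser`, rather than the Node approach from the tutorial. It pulls the `src` of every `img` tag out of the saved files and drops duplicates. I am assuming the image URLs live in plain `src` attributes and that some of them may be protocol-relative (starting with `//`); I have not checked that against the real pages. The function and class names are made up for this sketch.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <img ... />
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                self.sources.append(value)


def normalize(url):
    # Assumption: protocol-relative URLs ("//host/path") should get https
    return "https:" + url if url.startswith("//") else url


def image_urls(html_text):
    """Return image URLs found in one HTML document, skipping inline data: URIs."""
    parser = ImageSrcCollector()
    parser.feed(html_text)
    parser.close()
    return [normalize(u) for u in parser.sources if not u.startswith("data:")]


def unique_urls_from_files(paths):
    """Merge the image URLs from several saved pages, keeping first-seen order."""
    seen = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for url in image_urls(text):
            seen.setdefault(url, None)  # dicts preserve insertion order
    return list(seen)
```

Calling `unique_urls_from_files` with the names of the three saved HTML files should give one list of URLs to download, which I can then count to see whether I have my 150.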

(to be continued)