
I'm trying to write my first web scraper and keep getting errors for HTTPS URLs but not for HTTP ones.

I am running Python 2.7.9 on an RPi2 with Raspbian Jessie, and I am SSHing into the Pi from my laptop. All of this is being done on my home network.

I do not get these errors when running the code below on my laptop, only when running it on the RPi2.

I looked for solutions online and found some people talking about proxy issues, but I'm not sure whether that applies here.

>>> import requests
>>> page = requests.get("http://www.google.com")
>>> page.status_code
200
>>> page = requests.get("https://www.timeanddate.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.timeanddate.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x764fbd90>: Failed to establish a new connection: [Errno 101] Network is unreachable',))

3 Answers

1

I think you are running into timeout issues. Here is an example with a 1-second timeout, which works fine for me:

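import requests
from requests.exceptions import ConnectionError, Timeout

try:
    # Give up if the server does not respond within 1 second.
    r = requests.get("https://www.timeanddate.com", timeout=1)
    print(r.status_code)
    print(r.headers)
    print(r.encoding)
    print(r.text)
except (ConnectionError, Timeout) as e:
    # ConnectionError covers failed connections; Timeout also catches a slow response.
    print(e)
    r = "No response"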
Dr.Rabbit
  • I tried your code and I get the following output: HTTPSConnectionPool(host='www.timeanddate.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7645f910>: Failed to establish a new connection: [Errno 101] Network is unreachable',)) Using a longer timeout value doesn't help either. – Korean_Of_the_Mountain Oct 05 '17 at 22:30
  • do they even have port=443 open? – Dr.Rabbit Oct 05 '17 at 22:43
  • I just followed instructions here (https://www.techwalla.com/articles/how-to-check-if-port-443-is-open) and it looks like port 443 is open for timeanddate.com – Korean_Of_the_Mountain Oct 06 '17 at 00:45
  • remove it. Use my code then work from there – Dr.Rabbit Oct 06 '17 at 00:47
  • Remove what? I was using your code earlier. I didn't specifically use port 443 or anything. That error message is the result from using your code. – Korean_Of_the_Mountain Oct 06 '17 at 00:50
  • Strange. Is it in a new blank file? Also, can you ping www.timeanddate.com? Or are you behind a proxy? – Dr.Rabbit Oct 06 '17 at 01:03
  • All of this is being done at home and I don't believe I'm behind a proxy. Your code works fine on my laptop. However, I'm trying to run this on my Raspberry Pi 2, which I'm SSHing into from my laptop. I have run your code in interactive mode and as a script, but I still get the same error. I recently set up a static IP on the RPi2 for SSHing. Might that have something to do with this error? – Korean_Of_the_Mountain Oct 06 '17 at 01:14
  • Hmm I get "network is unreachable" if I try to ping websites from RPi2. On the other hand, I can get your code to work for http websites (eg google.com), just not https. – Korean_Of_the_Mountain Oct 06 '17 at 01:22
  • ah nevermind i figured it out and posted the answer as a comment under the OP – Korean_Of_the_Mountain Oct 06 '17 at 01:30
1

From the

Failed to establish a new connection: [Errno 101] Network is unreachable

error message, it seems that you are not able to connect to the server. Check that you can reach www.timeanddate.com on port 443 (HTTPS):

$ curl -I https://www.timeanddate.com

If you get something like:

HTTP/1.1 200 OK
(...)

then there may be a problem with your Python installation. Otherwise, you probably have a network issue. How is your Pi connected to the internet?
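
If curl is not handy, roughly the same check can be done from Python itself with only the standard library. This is a minimal sketch, not a definitive diagnostic: the helper name check_tcp and its default timeout are my own choices for illustration. It only tries to open a plain TCP connection, so it tells you whether the port is reachable at all, not whether TLS or requests works.

```python
import errno
import socket


def check_tcp(host, port=443, timeout=5.0):
    """Try to open a plain TCP connection and return (ok, detail).

    check_tcp is a hypothetical helper name used for illustration only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error as exc:
        # socket.error is an alias of OSError on Python 3; DNS failures
        # and timeouts are subclasses of it on both Python 2.7 and 3.
        code = getattr(exc, "errno", None)
        name = errno.errorcode.get(code, "unknown")
        return False, "%s (%s)" % (name, exc)
    sock.close()
    return True, "connected"
```

Calling check_tcp("www.timeanddate.com", 80) and check_tcp("www.timeanddate.com", 443) on the Pi and comparing the two results should show whether only port 443 is affected. If the detail string starts with ENETUNREACH, the failure happens below Python, in the Pi's network configuration, which matches the "Network is unreachable" message in the traceback.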

Nuno
-1

I would try running: sudo pip install requests --upgrade