November 2017
Intermediate to advanced
226 pages
5h 59m
English
It is impractical to remember which characters are invalid in a URL and to escape them by hand with percent signs, but the built-in Python module urllib.parse provides the functions needed to do this.
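As a quick illustration of what that escaping looks like, here is a minimal sketch using urllib.parse.quote. The URL is a made-up example, not one taken from the page being scraped.

```python
from urllib.parse import quote, unquote

# Hypothetical scheme-relative image URL containing spaces and parentheses
raw_path = "//example.com/images/my photo (1).png"

# quote() percent-encodes unsafe characters; by default '/' is left untouched
encoded = quote(raw_path)
print(encoded)

# unquote() reverses the encoding
print(unquote(encoded))
```

By default, quote() leaves letters, digits, and the characters `_.-~/` as they are and replaces everything else with a `%XX` escape.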
Now we can try to fix this by URL-encoding (percent-escaping) the request. Rewrite the script as follows:
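import re
import urllib.parse
import urllib.request
from os.path import basename

# The second group (m[1]) captures the scheme-relative image URL, such as //host/path/pic.png
pattern = r'(http)?s?:?(\/\/[^"]*\.(?:png|jpg|jpeg|gif|svg))'
for line in open('packtpub.txt'):
    for m in re.findall(pattern, line):
        print('https:' + m[1])
        # The path component of the URL gives the local file name
        fileName = basename(urllib.parse.urlsplit(m[1])[2])
        print(fileName)
        # Percent-encode unsafe characters such as spaces; '/' is left as-is
        request = 'https:' + urllib.parse.quote(m[1])
        img = urllib.request.urlopen(request).read()
        with open(fileName, "wb") as file:
            file.write(img)
        break  # stop after the first match on this line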
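One caveat: calling quote() on the whole scheme-relative URL would also escape the ':' before a port number and the '?' that starts a query string. A possible variation, sketched here under the assumption that only the path needs escaping, splits the URL first and quotes the path alone. The helper name encode_image_url and the sample URL are hypothetical, for illustration only.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_image_url(scheme_relative_url, scheme="https"):
    """Hypothetical helper: percent-encode only the path of a URL."""
    # Split into scheme, netloc, path, query, fragment; use https if no scheme is given
    parts = urlsplit(scheme_relative_url, scheme=scheme)
    # Escape unsafe characters in the path only, keeping '/' separators
    safe_path = quote(parts.path)
    # Reassemble the URL with the host, query, and fragment unchanged
    return urlunsplit((parts.scheme, parts.netloc, safe_path,
                       parts.query, parts.fragment))

print(encode_image_url("//example.com:8080/img/my photo.png?x=1"))
```

Note that a path that is already percent-encoded would be escaped a second time by this sketch; passing safe="/%" to quote() is one way to leave existing escapes alone.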