trbot Posted June 19, 2020 (edited)
Hi! A few days ago everything was fine and my script worked well. Here it is:

#!/usr/bin/python3.6
print("Content-Type: text/html\n\n")
import requests
response = requests.get('https://www.ya.ru')
print(response.url + response.text)

The problem is the requests module: it just doesn't work anymore! I only get Content-Type: text/html and that's all. Can you fix it, please? I run it in the cgi-bin directory with 0775 permissions.
Edited June 19, 2020 by trbot
trbot Posted June 20, 2020 (Author)
On 6/20/2020 at 1:04 AM, flazepe said: It should be 0755, not 0775.
Yes, correct, that was a typo: I meant 0755. The requests module still doesn't work.
Krydos Posted June 20, 2020
It has nothing to do with our servers. It has to do with https://www.ya.ru/. Most websites don't like being scraped, so most likely they noticed you scraping their website and implemented some mechanism to block you. Try using requests to scrape some other website and you'll see that it works just fine. If a website is going to these lengths to block you, perhaps you should just leave it alone, but if you insist on scraping it anyway, you might find that using a public proxy gets around their block.
trbot Posted June 21, 2020 (Author) (edited)
On 6/20/2020 at 9:18 PM, Krydos said: It has nothing to do with our servers. It has to do with https://www.ya.ru/ ...
I tried everything before starting this topic. At first it worked well. I also used https://www.alphavantage.co with Python, and they can't be banning me, because it gives stock info to everyone! So I created a PHP script with these lines on your server:

echo file_get_contents('http://www.ya.ru/');
echo file_get_contents('https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=MSFT&apikey=demo');

and it works! In Python:

#!/usr/bin/python3.6
print("Content-Type: text/html\n\n")
import requests
response = requests.get('http://www.ya.ru')
print(response.url + response.text)
response2 = requests.get('https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=MSFT&apikey=demo')
print(response2.url + response2.text)

and it does NOT work! I see only Content-Type: text/html, and it just creates json.cpython-36.pyc in the __pycache__ directory and nothing else happens. So how can you explain that it works fine in PHP but not in Python? There is 100% no ban: I've tried a lot of different sites and none of them work. Something is wrong with the requests module in Python.
Edited June 21, 2020 by trbot
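When a CGI page shows nothing but the header, the script most likely printed the header and then raised an exception; the traceback goes to stderr, which Apache writes to the server's error log rather than to the browser. A minimal debugging sketch, assuming you only enable it while troubleshooting (tracebacks can reveal paths and code to visitors): the hypothetical `run()` wrapper prints the header first and sends any traceback to stdout, and `page()` is a placeholder for the real work.

```python
#!/usr/bin/python3.6
import sys
import traceback


def run(page):
    """Print the CGI header, then page()'s output, or its traceback if it raises."""
    # Header plus the mandatory blank line, so the response is valid even on failure.
    print("Content-Type: text/plain\n")
    try:
        print(page())
    except Exception:
        # By default the traceback would go to stderr (the server's error log);
        # print it to stdout instead so it shows up in the browser while debugging.
        traceback.print_exc(file=sys.stdout)


def page():
    # Hypothetical placeholder for the real work (e.g. the requests.get calls above).
    # Importing inside the function means an exception raised during import is reported too.
    import requests
    return "requests {} loaded from {}".format(requests.__version__, requests.__file__)


if __name__ == "__main__":
    run(page)
```

Using text/plain keeps the traceback readable; switch back to text/html once the script works.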
Krydos Posted June 21, 2020
I've tested requests on Python 3.6 on Ricky and I can't find anything wrong with it. Try this code:

#!/usr/bin/python3.6
import requests
print("Content-Type: text/html\n\n")
response = requests.get('https://www.heliohost.org/ip.php')
print(response.url + response.text)

Working example on Ricky: https://krydos1.heliohost.org/cgi-bin/req.py
As for the Content-Type header being printed twice, that might mean you have a file name conflict. For instance, if you named a file requests.py and then did import requests, it would import that file instead, causing the header to be printed twice.
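One quick way to check for the kind of file name conflict described above is to ask Python where it would load each module from. A sketch using the standard `importlib.util.find_spec`; the helper name `module_origin` is made up for this example. Place it in the same directory as the failing script, since Python puts the running script's directory at the front of `sys.path`. If the path printed for `json`, `re` or `requests` points into your own directory instead of the Python installation, a local file is shadowing the real module.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it isn't found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # In a CGI script, print the Content-Type header first so this shows up in the browser.
    for name in ("requests", "json", "re"):
        print(name, "->", module_origin(name))
```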
trbot Posted June 22, 2020 (Author)
On 6/21/2020 at 8:49 PM, Krydos said: I've tested requests on python 3.6 on Ricky and I can't find anything wrong with it. ...
Finally I found it! Python is so weird... If I name the file re.py (with requests inside), it doesn't work; if I rename the file to something else, it works! And if I have a file called json.py in the same directory, the requests module doesn't work in any .py file there! So my conclusion: don't name files json.py or re.py! Json.py and Re.py are OK.
Can you tell me whether it's possible to configure the .htaccess file (or some other config) so that, when a directory has index.py inside it, I can open the base directory directly, like this: http://host/cgi-bin/test/ -> http://host/cgi-bin/test/index.py, the way it works with index.php/index.html files? I'm a newbie in Python, so I'm taking my first steps...
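The behaviour trbot found can be reproduced in isolation. A sketch that creates a throwaway directory containing an empty `json.py` plus a tiny script that imports `json`, runs it with the current interpreter, and reports whether the local file won; the function name `local_file_wins` is hypothetical. It works because Python puts the running script's directory first on `sys.path`, ahead of the standard library. Python also matches module names against file names case-sensitively, which is why `Json.py` does not collide with `json`.

```python
import os
import subprocess
import sys
import tempfile


def local_file_wins(module_name, create_local_file=True):
    """Return True if a script imports `module_name` from its own directory.

    Runs a throwaway script in a temporary directory, optionally next to an
    empty <module_name>.py, and checks which file the import actually loaded.
    """
    with tempfile.TemporaryDirectory() as tmp:
        if create_local_file:
            # An empty local file with the same name as the real module.
            open(os.path.join(tmp, module_name + ".py"), "w").close()
        script = os.path.join(tmp, "check.py")
        with open(script, "w") as f:
            f.write("import {0}\nprint({0}.__file__)\n".format(module_name))
        # In the child process, sys.path[0] is tmp, the directory of check.py.
        result = subprocess.run(
            [sys.executable, script],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        loaded = os.path.realpath(result.stdout.strip())
        return os.path.dirname(loaded) == os.path.realpath(tmp)


if __name__ == "__main__":
    print("with json.py next to the script:", local_file_wins("json"))
    print("without it:", local_file_wins("json", create_local_file=False))
```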
Krydos Posted June 22, 2020
Create a /home/trbot/public_html/.htaccess file and put these contents in it:

Options +ExecCGI
AddHandler cgi-script .py
DirectoryIndex index.py

The first two lines make .py files executable outside cgi-bin, and the last line makes index.py show up when someone goes to domain.heliohost.org, without having to type out the filename like domain.heliohost.org/index.py.
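For completeness, a minimal `index.py` that this `.htaccess` could serve. It is a hypothetical example: the page text and the function name `render_page` are made up, while `SCRIPT_NAME` is a standard CGI environment variable. As with the scripts earlier in the thread, the file needs 0755 permissions.

```python
#!/usr/bin/python3.6
import html
import os


def render_page(environ):
    """Build the complete CGI response: header, blank line, then the HTML body."""
    # SCRIPT_NAME is a standard CGI variable; escape it before putting it into HTML.
    script = html.escape(environ.get("SCRIPT_NAME", "index.py"))
    body = "<!DOCTYPE html>\n<html><body><p>Served by {}</p></body></html>\n".format(script)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # end="" because render_page() already supplies every newline.
    print(render_page(os.environ), end="")
```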
trbot Posted June 22, 2020 (Author) (edited)
On 6/22/2020 at 4:50 AM, Krydos said: Create a /home/trbot/public_html/.htaccess file and put these contents in it: ...
Thanks! Is it possible to configure index.php and index.py so that, if index.py is absent, it doesn't show the directory's contents but tries to open index.php instead, and if that is also absent, shows a 404?
Edited June 22, 2020 by trbot
Krydos Posted June 22, 2020
Sure, put this line in your .htaccess:

Options -Indexes

With that, it will show 403 Forbidden if you don't have an index file.
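trbot's question combines two settings: `DirectoryIndex` accepts several file names, which Apache tries in the order listed (for example `DirectoryIndex index.py index.php`), and `Options -Indexes` turns off the automatic directory listing. Below is a toy model of that decision in Python, not Apache's actual code; the function name, the return values and the two-name default are assumptions made for illustration.

```python
import os


def resolve_directory(path, directory_index=("index.py", "index.php"), indexes=False):
    """Toy model of how Apache answers a request for a directory URL.

    Returns (200, filename) for the first index file found, (200, "<listing>")
    if directory listings are enabled, otherwise (403, None).
    """
    # DirectoryIndex candidates are tried in the order they are listed.
    for name in directory_index:
        if os.path.isfile(os.path.join(path, name)):
            return (200, name)
    # No index file: Options +Indexes would produce a listing, -Indexes a 403.
    if indexes:
        return (200, "<listing>")
    return (403, None)
```

With `Options -Indexes`, a directory containing neither file gets 403 Forbidden, not the 404 trbot asked about.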
trbot Posted June 22, 2020 (Author)
On 6/22/2020 at 6:46 PM, Krydos said: Sure, put this line in your .htaccess: Options -Indexes ...
Thanks!