Posted (edited)

Hi!
A few days ago everything was fine and my script was working well.

here it is:

 

 

#!/usr/bin/python3.6

print("Content-Type: text/html\n\n")
import requests
response = requests.get('https://www.ya.ru')
print(response.url + response.text)

 

The problem is in the requests module: it just doesn't work anymore! I get only Content-Type: text/html and that's all.

Can you fix it, please?

I run it in the cgi-bin dir with 0775 permissions.

Edited by trbot
Posted

It has nothing to do with our servers. It has to do with https://www.ya.ru/. Most websites don't like to be scraped, so most likely they noticed you scraping their website and implemented some mechanism to block you. Try using requests to scrape some other website and you'll see that it's working just fine. If a website is going to these lengths to block you, perhaps you should just leave them alone, but if you insist on scraping them anyway, you might find that using a public proxy will get around their block.

Posted (edited)

Before starting this topic I tried everything.

At first it worked well. I used https://www.alphavantage.co with Python, and that site can't be banning me because it gives stock info to everyone!

OK, now I have created a PHP script with this line on your server:

 

 
echo  file_get_contents('http://www.ya.ru/');

 

and it works!

In Python:

 

#!/usr/bin/python3.6
 
print("Content-Type: text/html\n\n")

 

 

import requests
 
response = requests.get('http://www.ya.ru')
print(response.url + response.text)

 

print(response2.url + response2.text)

 

and it is NOT working!!! I see only Content-Type: text/html

It just creates json.cpython-36.pyc in the __pycache__ dir and nothing else happens.

So how can you explain that it works fine in PHP but does not work in Python?

There is definitely no ban. I have tried a lot of different sites and none of them work!

Something is wrong with the requests module in Python.

Edited by trbot
Posted

I've tested requests on python 3.6 on Ricky and I can't find anything wrong with it. Try this code:

 

#!/usr/bin/python3.6
 
import requests
 
print("Content-Type: text/html\n\n")
 
response = requests.get('https://www.heliohost.org/ip.php')
print(response.url + response.text)
Working example on Ricky https://krydos1.heliohost.org/cgi-bin/req.py

 

As for the Content-Type header being printed twice, that might mean you have a file name conflict. For instance, if you named a file requests.py and then did import requests, it would import your own file, causing the header to be printed twice.
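When a CGI script fails with an uncaught exception, the traceback typically goes to the web server's error log rather than to the page, which would explain seeing only the header. A minimal sketch of one way to surface the error in the browser while debugging is below. The helper names `render_page` and `fetch` are hypothetical, and the URL is the test URL from the post above; this is an illustration, not the way HelioHost recommends doing it.

```python
#!/usr/bin/python3.6
import sys
import traceback


def render_page(body_func):
    """Run body_func and return the full CGI response as text.

    If body_func raises, the traceback is returned in the page body
    instead of being lost to the server's error log.
    """
    header = "Content-Type: text/plain\n\n"
    try:
        return header + body_func()
    except Exception:
        return header + "Script failed:\n" + traceback.format_exc()


def fetch():
    # Imported inside the function so that a failing import
    # (for example, one caused by a file name conflict) is caught too.
    import requests

    response = requests.get("https://www.heliohost.org/ip.php", timeout=10)
    return response.url + "\n" + response.text


if __name__ == "__main__":
    sys.stdout.write(render_page(fetch))
```

With a wrapper like this, a file name conflict should show up as a traceback pointing at the conflicting local file rather than as an empty page. It is best removed once the script works, since tracebacks can reveal server paths.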

Posted


I finally figured it out.

Python is so weird...

If I name the file re.py (with requests imported inside), it does not work; if I rename the file to anything else, it works!

But if there is a file called json.py in the same directory, the requests module stops working in every .py file there!

So my conclusion: don't give files names like json.py or re.py!

Json.py and Re.py are OK.
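The conflict happens because Python puts the script's own directory first on its module search path, so a local json.py is found before the standard-library json that requests needs. The sketch below is one possible way to scan a directory for such names; `stdlib_module_names` and `shadowing_files` are hypothetical helper names, and the check only covers standard-library modules, not third-party ones such as requests.

```python
import pkgutil
import sys
import sysconfig
from pathlib import Path


def stdlib_module_names():
    """Return the set of top-level standard-library module names."""
    names = set(sys.builtin_module_names)
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    names.update(info.name for info in pkgutil.iter_modules([stdlib_dir]))
    return names


def shadowing_files(directory):
    """List .py files in `directory` named like a standard-library module."""
    known = stdlib_module_names()
    return sorted(
        path.name for path in Path(directory).glob("*.py") if path.stem in known
    )


if __name__ == "__main__":
    # Scan the directory given on the command line, or the current one.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in shadowing_files(target):
        print("Rename this file, it hides a standard module:", name)
```

The comparison is case-sensitive, which matches the observation that Json.py and Re.py cause no trouble.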

 

Can you tell me whether it is possible to configure the .htaccess file, or some other config, so that when I open a directory with index.py inside it, I can open the base directory directly, like this:

http://host/cgi-bin/test/  -->  http://host/cgi-bin/test/index.py

the same way it works with index.php/index.html files?

I'm a newbie in Python, so I'm taking my first steps...

Posted

Create a /home/trbot/public_html/.htaccess file and put these contents

Options +ExecCGI
AddHandler cgi-script .py
DirectoryIndex index.py
The first two lines make .py files executable outside cgi-bin, and the last line makes the filename index.py show up if someone goes to domain.heliohost.org without having to type out the filename like domain.heliohost.org/index.py
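For illustration, a minimal index.py that a configuration like this could serve might look like the sketch below. The function name `build_response` and the page text are made up for the example, and the file would still need to be executable by the server (the 0775 permissions mentioned earlier in the thread would be one way to do that).

```python
#!/usr/bin/python3.6
import html


def build_response(title):
    """Return a complete CGI response: header, blank line, then HTML body."""
    body = "<html><body><h1>{}</h1></body></html>".format(html.escape(title))
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    print(build_response("index.py is being served"))
```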
Posted (edited)

Thanks!

Is it possible to configure index.php and index.py so that, if index.py is absent, the server does not show the directory contents but tries to open index.php instead, and if that is also absent, shows a 404?

Edited by trbot
Posted

Sure, put this line in your .htaccess

Options -Indexes
With that it will show 403 forbidden if you don't have an index file.
Posted


Tnx
