allu62 Posted July 17, 2020 Hi. Last month, I made a post about the drop in visitor numbers on my site to 1–2 per day during May and the first half of June, thinking there were problems with the server or AWStats. I now believe the statistics were correct and that the cause had nothing to do with HelioHost. In any case, visitor numbers are returning to normal. But what could the reason have been? Searching the Internet, I found several articles saying that excessive site access by bad crawlers can drastically affect normal site traffic. And indeed, looking more closely at the Apache log files, there are some robots that download tons of megabytes from my site. My questions: - Does HelioHost have any recommendations on which robots to block? - Does any experienced user have such recommendations? Perhaps a list of bad robots? Or maybe block them all and only allow some known to be OK? - What should be blocked in robots.txt and what in .htaccess? - Should some of them be blocked using IP Blocking in cPanel? Hoping to find some help here, because I have no real knowledge of or experience with these things. Thanks.
wolstech Posted July 17, 2020 Sounds like snake oil to me. I've never heard of crawlers causing an impact like that (unless the site is down because it's being overloaded by them...). The only time I really see bot-blocking scripts used is on phishing and other illegal sites... and there it's generally done to hide from automated anti-abuse services (ironically, implementing a block like this actually makes abuse easier to identify).
Krydos Posted July 18, 2020 Trying to block certain IPs or certain bots just makes you look guilty of something. I wouldn't be surprised if legitimate crawlers like Google downranked you for doing suspicious stuff like that. Maybe link to the articles you found so we can read them ourselves?
allu62 Posted July 18, 2020 (Author) There are dozens of sites (SEO and others) recommending blocking what they call "bad robots", or even suggesting to allow only "good" ones... My AWStats from yesterday: - SemrushBot: 18,536+356 hits, 84.77 MB - AhrefsBot: 4,591+261 hits, 46.92 MB - Unknown robot identified by bot\*: 2,908+147 hits, 37.03 MB and similar for other days. In comparison, Googlebot: 1,248+85 hits, 3.70 MB, and this only a few times a month... Totals for this month: 29,808 pages (428.45 MB) of not-viewed traffic vs. 630 pages (199.83 MB) of normal traffic. Should I really let these crawlers carry on? And in doing so, isn't that a senseless "overload" of Tommy?
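For reference, if one did decide to ask these crawlers to stay away, the conventional place is a robots.txt in the site root. Both SemrushBot and Ahrefs​Bot state that they honor robots.txt, so a sketch along these lines (the user-agent names are taken from the AWStats report above) should be enough for them:

```
# robots.txt -- served from the site root (e.g. public_html/robots.txt)
# Ask Semrush's and Ahrefs' crawlers not to fetch anything.
User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

# Everyone else (including Googlebot) may crawl normally.
User-agent: *
Disallow:
```

This only works for crawlers that voluntarily obey robots.txt; it is a request, not an enforcement mechanism.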
Krydos Posted July 19, 2020 We have an account on Tommy that did 47 GB of traffic last month. That's more than 100 times as much traffic as you got. Here is your load graph for the last week: You definitely don't need to worry about overloading Tommy.
MoneyBroz Posted September 29, 2020 Just use this code in your index page: <meta name="robots" content="noindex"> Note: it will block all search engines from indexing your website.
Krydos Posted September 29, 2020 <meta name="robots" content="noindex"> The funny thing about this is that if it's a good robot, it will obey this rule and not index your site. If it's a "bad" robot, which is what the OP is trying to block, it will ignore the rule and crawl anyway.
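Since robots.txt and the noindex meta tag are both honor-system measures, the only way to actually refuse requests from a bot that ignores them is server-side. As a sketch, an Apache .htaccess rule matching the User-Agent header (the bot names are the ones from the AWStats report earlier in the thread) could look like this; note that a truly "bad" robot can trivially fake its User-Agent, so this is still only a soft barrier:

```
# .htaccess -- return 403 Forbidden to requests whose User-Agent
# contains one of these bot names ([NC] = case-insensitive match).
# A dishonest crawler can spoof its User-Agent, so this only stops
# bots that identify themselves truthfully.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (SemrushBot|AhrefsBot) [NC]
RewriteRule .* - [F,L]
</IfModule>
```

Whether doing this is worth the risk Krydos describes above is a separate question.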