Ashoat Posted July 26, 2010
Okay, so this problem was pretty complicated. It turns out that some system upgrade or something made it so that batch (not bash) scripts don't show up in the process list while they run. One of our cron scripts checks whether it's already running by looking at the process list, so it basically started spawning lots of copies of itself. It only did that when another condition was met, which explains the frequent downtime recently. However, it took the server down one morning, and I was forced to hard reset it. This caused some data corruption. On boot, some partitions would set themselves to read-only despite not being mounted as such. This caused inexplicable errors that took a while to debug. Once I figured out the problem, I knew the fix was to run fsck before boot. That had problems of its own: apparently the standard fsck command broke, so I had to get the datacenter tech to run a command that wouldn't prompt the user. Looks like everything's working now, though...
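For readers wondering how a cron job can avoid this failure mode, one common alternative to scanning the process list is to hold an exclusive lock on a file for the lifetime of the job. The sketch below is only an illustration, not the script HelioHost actually runs: the function name `acquire_single_instance_lock` and the lock file path are hypothetical, and it assumes a Unix-like system where Python's `fcntl.flock` is available.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held.

    Hypothetical helper: the lock belongs to the open file, so the kernel
    releases it when the file is closed or the process exits, even after a crash.
    """
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


if __name__ == "__main__":
    # Assumed location; a real job would pick a fixed, job-specific path.
    path = os.path.join(tempfile.gettempdir(), "example_cron_job.lock")
    lock = acquire_single_instance_lock(path)
    if lock is None:
        print("another copy is already running; exiting")
    else:
        try:
            print("lock acquired; doing the job's work")
        finally:
            lock.close()  # closing the file releases the lock
```

Because the check no longer depends on how the job appears in the process list, a change in process naming would not cause duplicate copies to start.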
Byron Posted July 26, 2010
Glad you got it ALL figured out, djbob. I was afraid Stevie was going to have to take a trip back home.
prithwis Posted July 27, 2010
Hello DJBOB, thanks for getting us back in operation. But believe me, when the service was down, I was clueless about what was going on, because heliohost and helionet were both unavailable. This is very unnerving. May I humbly suggest that you have a separate "face" -- in the form of, say, a blog on an independent server, like blogspot.com -- which (a) is unlikely to be affected by whatever is happening on your servers and (b) can act as a channel for distribution of news and status updates. I know you are offering a free service, so I really cannot demand anything in terms of customer service ... but this is a simple, inexpensive solution that you can consider. Cheers and have a good day. prithwis
rvt Posted July 27, 2010
prithwis wrote: ... thanks for getting us back in operation. But believe me, when the service was down, I was clueless about what was going on, because heliohost and helionet were both unavailable. This is very unnerving. May I humbly suggest that you have a separate "face" -- in the form of, say, a blog on an independent server, like blogspot.com -- which (a) is unlikely to be affected by whatever is happening on your servers and (b) can act as a channel for distribution of news and status updates. I know you are offering a free service, so I really cannot demand anything in terms of customer service ... but this is a simple, inexpensive solution that you can consider. ...
Actually, I sent an email to DJBob regarding this during the downtime. (Not sure if it made it to you, because I sent it to the Helionet admin email that kept sending me forum notifications. If you want me to send my ideas elsewhere, just let me know.) And this is what kills many small, free hosts. Communication is king in today's world, and if people don't get the information they are looking for, they pack up and move on.
coozzle Posted July 27, 2010
Ashoat wrote: Okay, so this problem was pretty complicated. It turns out that some system upgrade or something made it so that batch (not bash) scripts don't show up in the process list while they run. One of our cron scripts checks whether it's already running by looking at the process list, so it basically started spawning lots of copies of itself. It only did that when another condition was met, which explains the frequent downtime recently. However, it took the server down one morning, and I was forced to hard reset it. This caused some data corruption. On boot, some partitions would set themselves to read-only despite not being mounted as such. This caused inexplicable errors that took a while to debug. Once I figured out the problem, I knew the fix was to run fsck before boot. That had problems of its own: apparently the standard fsck command broke, so I had to get the datacenter tech to run a command that wouldn't prompt the user. Looks like everything's working now, though...
What matters in the end is that the server is up and running again. BTW, just a suggestion: can we have helionet moved to a different server than heliohost? That way we'll have a way to get some insight into what's happening in the back end while the server is out for some reason. This time I was anxious because neither of the sites was available.
Ashoat (Author) Posted July 27, 2010
As soon as our new server (Charlie) is up and running, we'll have HelioHost and HelioNet on an isolated virtual machine.
rvt wrote: Actually, I sent an email to DJBob regarding this during the downtime. (Not sure if it made it to you, because I sent it to the Helionet admin email that kept sending me forum notifications. If you want me to send my ideas elsewhere, just let me know.)
I got the email. The admin team is considering your offer, and we'll get back to you once we've come to a consensus.
Ashoat (Author) Posted July 27, 2010
Okay, so it looks like when I fixed the script that restarts our server when it's down, I also broke it.
Guest Geoff Posted July 27, 2010
Ashoat wrote: As soon as our new server (Charlie) is up and running, we'll have HelioHost and HelioNet on an isolated virtual machine.
OK, that's good, because I was considering suggesting a move to a VPS to reduce downtime.
Garret Posted July 28, 2010
Free service or not, people using it require proper service. This was the last straw for me because I just lost 2 clients. Thanks for nothing. If it was just free, great. But look at the first page:
"Professional-grade web hosting... with a twist. You know that old saying: 'You get what you pay for'? Well, HelioHost is changing that around. Our feature set beats out most professional web hosts, yet we come without a price tag."
Guess what: Helio isn't changing anything, and we ARE getting what we pay for. Which pisses me off, because it's false advertising. Good luck with your future endeavors, but I will be moving to a new host by the end of the week.
Safiria Posted July 28, 2010
This downtime was the last straw. A friend of mine just gave me free unlimited hosting off his account, and I'm moving Safiria there. Thanks for nothing.
Byron Posted July 29, 2010
In return, thanks for your nothing. Couldn't have said it any better myself.
Audality Posted August 3, 2010
I know you're saying everything is fixed, but I can't access my site through my URL. My account hasn't been cancelled, because I can still log into cPanel, and I can't think of anything else being wrong since this site is working. Punctured-drum.com, however, throws me an error message saying the site is unavailable.
Byron Posted August 3, 2010
Audality wrote: Punctured-drum.com, however, throws me an error message saying the site is unavailable.
I'm still seeing your site here: Punctured-drum.com. Refresh your browser's cache.
Facha Posted August 4, 2010
You need to have two servers to minimize the load... I'm the CEO of a hosting company, but my private website is here because I think that HelioHost is better than my host (in cPanel and disk space, but not in speed, uptime, and FTP speed): www.galaticohost.0lx.net. I think that HelioHost is better, and I will keep my site here on HelioHost.