Ashoat

August 16, 2011

Looks like everything is okay right now.

This sort of issue occurs when cPanel's build of httpd.conf (rather randomly) fails part-way through, leaving a broken httpd.conf.
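
A rough sketch of how a half-written httpd.conf could be caught and rebuilt. It assumes the Apache binary is on the PATH as httpd, that the config lives at /usr/local/apache/conf/httpd.conf, and that cPanel's rebuild script is at /scripts/rebuildhttpdconf. The function name and the REBUILD_CMD override are made up for illustration.

```shell
# Hypothetical helper: syntax-check the Apache config and rebuild it only if
# the check fails. All paths are assumptions, not confirmed for these servers.
check_and_rebuild_httpd_conf() {
    conf="${1:-/usr/local/apache/conf/httpd.conf}"
    rebuild="${REBUILD_CMD:-/scripts/rebuildhttpdconf}"
    # "httpd -t" only parses the configuration; it does not restart Apache.
    if httpd -t -f "$conf" >/dev/null 2>&1; then
        echo "httpd.conf OK"
        return 0
    fi
    echo "httpd.conf broken, rebuilding"
    # Keep the broken copy around for later inspection.
    cp -p "$conf" "$conf.broken" || return 1
    "$rebuild" || return 1
    # Re-check so the caller knows whether the rebuild actually helped.
    httpd -t -f "$conf"
}
```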

August 16, 2011

What domain are you trying to change settings for? I can just manually sync the record and see if that fixes the issue.

August 16, 2011

Closing due to inactivity and probable resolution...

PS: SSH access granted on Cody to all admins.

August 16, 2011

Huh, that's weird... aren't you seeing http://stevie.heliohost.org/cgi-sys/suspendedpage.cgi and http://johnny.heliohost.org/cgi-sys/suspendedpage.cgi?

[Solved] One Account per User. · August 9, 2011

jje: Yeah, that's fine. There's actually a list of IPs from which we ignore that rule in the settings table, but the settings module in the ACP is long broken.

August 9, 2011

We actually don't run upcp regularly due to the high CPU and disk I/O necessary for that process.

August 9, 2011

Sorry for the slowness. Johnny required a manual fsck, as there was some disk corruption that resulted from the power failure. I just finished resolving those issues, and Johnny should be up soon.

August 8, 2011

HE's report:

At approximately 6:00am PDT, at the Fremont 1 facility, power was interrupted to
the datacenter floor when the main UPS input breaker tripped. The breaker was reset,
the maintenance bypass closed, and power was restored at approximately 6:20am PDT.
Support technicians on site visually inspected customer equipment located in the
affected area, restarting any that needed assistance.

UPS technicians will be onsite today to assist us in determining the cause of the failure.

Moving forward, we are now going to replace this UPS system with another brand as
soon as possible and we are going to go from an N+1 UPS configuration to a 1+1
electrical system, with each PDU having a static transfer switch to a separate electrical
distribution system so that if something happens all the PDUs can be switched over.
The time line for this is in the next few months due to the lead time on getting this type
of equipment.

The power failure actually only lasted about twenty minutes, but since most of our servers are running ext2 (to avoid wasting too much CPU time on kjournald), we had to run a long fsck on them. Not sure what's going on with Johnny right now. I can't SSH in, so I'll have to log in via VMware vSphere when I get home.

August 7, 2011

Geoff: are you sure you're phrasing that correctly? A nameserver is authoritative for a domain if that domain (or its earliest parent domain) has an NS record for that nameserver. In this sense, I would imagine that ns2.heliohost.org is authoritative for all HelioHost domains.
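
A related check, sketched under the assumption that dig is installed: ask the nameserver directly, without recursion, and see whether the reply carries the "aa" (authoritative answer) flag. This tests whether the server considers itself authoritative, which is not quite the same thing as the NS-record definition above. The function names and example.heliohost.org are placeholders.

```shell
# Succeeds if the dig output on stdin has the "aa" flag set in its header
# line, which looks like ";; flags: qr aa rd; QUERY: 1, ANSWER: 1, ...".
has_aa_flag() {
    grep -Eq '^;; flags:[^;]* aa( |;)'
}

# Hypothetical wrapper: query one nameserver directly about one domain.
# Usage: is_authoritative ns2.heliohost.org example.heliohost.org
is_authoritative() {
    dig +norecurse "@$1" "$2" SOA | has_aa_flag
}
```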

Are you saying that Cody doesn't have the records for a lot of *.heliohost.org domains? That might be the case. I'm running the following command on Cody right now to sync all the records:

/scripts/dnscluster syncall --full

If this happens again in the future, you could try running that command and seeing if it helps. Make sure to run it in screen, though, as it takes a while.
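
For example, something along these lines starts the sync in a detached screen session so it keeps running if the SSH connection drops. The session name "dnssync" and the function name are arbitrary choices.

```shell
# Start the full DNS sync in a detached screen session.
# -d -m: create the session detached; -S: give it a name to reattach to.
start_dns_sync() {
    screen -dmS dnssync /scripts/dnscluster syncall --full
}

# Reattach later to watch progress:
#   screen -r dnssync
```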

UPDATE

Syncing is done. "rndc reload" is failing for some unknown reason (possibly related to the issue at hand), so I'm running /scripts/restartsrv_named.

UPDATE

Looks like named.conf was broken on Cody. Consequently, ns2.heliohost.org is currently down. Running a script (mv /etc/named.conf /etc/named.conf.broken; /scripts/rebuildnamedconf) on Cody to rebuild named.conf...
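
A slightly more defensive version of that one-liner might look like the sketch below. It adds a named-checkconf step (a tool that ships with BIND) before restarting, which is an addition and not what was actually run. SCRIPTS_DIR is a hypothetical override for the /scripts directory.

```shell
# Hypothetical wrapper around the rebuild: keep the broken file for later
# inspection and only restart named if the rebuilt file parses cleanly.
rebuild_named_conf() {
    conf="${1:-/etc/named.conf}"
    scripts="${SCRIPTS_DIR:-/scripts}"
    mv "$conf" "$conf.broken" || return 1
    "$scripts/rebuildnamedconf" || return 1
    # named-checkconf exits non-zero on syntax errors.
    named-checkconf "$conf" || return 1
    "$scripts/restartsrv_named"
}
```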

UPDATE

Okay, named.conf rebuilt and named restarted on Cody. Both nameservers seem to be correctly responding to DNS requests now.

August 6, 2011

Hmm... I'd say we have enough space on both Stevie and Johnny right now to increase limits to 500 MiB. Unfortunately, the process of increasing disk limits is a slow and disk-I/O-heavy one. I'll see if I can figure out a time to do it in the future.

August 6, 2011

My plan is to keep an SSH connection open with Cody and then log in and try to see what's going on next time he goes down.

August 6, 2011

It seems that ns1.heliohost.org (Stevie) has the right results for this MX record, but ns2.heliohost.org (Cody) does not. However, the actual zone files seem to be intact on Cody. I'm syncing zone files right now, but once that's done I'll restart named on Cody in the hopes that that will correct this issue.
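
A quick way to spot this kind of mismatch, assuming dig is available, is to ask both nameservers for the same record and compare the answers. The function name is made up and example.heliohost.org is a placeholder.

```shell
# Compare the answers ns1 and ns2 give for the same record.
# Usage: compare_ns MX example.heliohost.org
compare_ns() {
    a=$(dig @ns1.heliohost.org "$2" "$1" +short | sort)
    b=$(dig @ns2.heliohost.org "$2" "$1" +short | sort)
    if [ "$a" = "$b" ]; then
        echo "in sync"
    else
        echo "MISMATCH"
        echo "ns1: $a"
        echo "ns2: $b"
        return 1
    fi
}
```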

Please continue discussion of this issue in this thread. Closing...

August 6, 2011

Seems to work now. This issue probably occurred when ns1 was down and ns2 wasn't synced. I'm syncing Cody's records right now; continue discussion in this thread.

Closing...

August 6, 2011

I'm generally on IRC, but I won't notice your message unless you type my username in it or /msg me. My username is "ashoat" (my first name).

We definitely can't upgrade our install of Perl. I tried this once and ran into a lot of difficulties, since cPanel is mostly written in Perl. If there were a way to install a newer version side-by-side then I'd be up for that, but I wasn't able to do it last time (if I recall correctly).

August 6, 2011

I'm guessing that cPanel EasyApache just disables those extensions by default, even though PHP has them on by default. We'll enable them next time we rebuild.

List of PHP modules we are currently going to add when we rebuild:

fileinfo

intl
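
To check which of these are actually missing on a given server, a small sketch like this could be used. It relies on "php -m", which lists the loaded modules; the function name is hypothetical.

```shell
# Print each wanted PHP module that "php -m" does not list as loaded.
# Names are compared case-insensitively because php -m mixes cases.
missing_php_modules() {
    loaded=$(php -m | tr '[:upper:]' '[:lower:]')
    for mod in "$@"; do
        echo "$loaded" | grep -qxF "$mod" || echo "$mod"
    done
}

# Usage: missing_php_modules fileinfo intl
```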

As for Roundcube, it has a process time leak somewhere. I disabled it (the most recent disabling) after I found it using far too much CPU time on one of the servers. This issue might have since been patched, though... I'm not sure.

August 6, 2011

Might just be me, but I can't seem to access it. I'm getting a 404...

August 6, 2011

Geoff: no, that script is definitely not a good idea. If we really wanted to fill up resources on the queue then we could do so by changing bihourly.php settings to make a single process stay permanently open. That way, we won't have two account creations happening simultaneously, which usually is really bad for the HDD.

July 30, 2011

Okay, try logging in now.

July 30, 2011

I think we have a consensus against a paid site builder. Closing...

July 30, 2011

Closing due to inactivity...

July 30, 2011

I think my code was set up to treat a 0 limit as no limit. I've switched Johnny's daily registration limit to 1 to get around this. Hopefully Johnny will clear up his account queue soon.
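
To illustrate why 0 doesn't work as "no registrations", here is a sketch of that kind of check, assuming 0 really is treated as unlimited. This is not the actual signup code; the function name and arguments are made up.

```shell
# Hypothetical check: a limit of 0 means "unlimited", so it is never reached.
# Usage: limit_reached <registrations_today> <daily_limit>
limit_reached() {
    count=$1
    limit=$2
    [ "$limit" -ne 0 ] && [ "$count" -ge "$limit" ]
}
```

With a limit of 1, the check trips after the first registration of the day, which is the closest thing to "closed" under this logic.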

July 30, 2011

Hmm, looks like this is occurring due to bihourly.php sometimes being run while heliohost_nightly is being run. I've rearranged the run times to try to avoid collisions.
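
A different technique from rearranging run times would be a shared lock file, so that one job is simply skipped while the other is still running. The sketch below assumes flock from util-linux is installed; the function name and lock path are hypothetical.

```shell
# Run a command only if no other job holds the lock file; otherwise skip it.
# Usage: run_exclusive /var/lock/heliohost-jobs.lock php bihourly.php
run_exclusive() {
    lock=$1
    shift
    # -n: fail immediately instead of waiting for the lock to be released.
    flock -n "$lock" "$@"
}
```

Both cron entries would need to go through the same lock file for this to prevent collisions.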

July 27, 2011

Okay, I restarted Tomcat. If it's still not working, I'll need your HelioHost username.

July 25, 2011

I usually do my killing through htop. I open it up, set it to tree mode (F5), find the bihourly tree (/bihourly), select everything in the tree (go up and down with the arrow keys and select with the spacebar) and then kill everything selected (F9, enter).

You could alternatively use killall. killall bihourly_wrapper should do the trick; let me know if it doesn't.
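
For a non-interactive equivalent of the htop approach, a sketch along these lines kills a process together with its descendants. It assumes pgrep from procps is available; kill_tree is a made-up name, and the PID would come from something like "ps aux | grep bihourly".

```shell
# Send SIGTERM to a process and all of its descendants.
# Usage: kill_tree <pid>
kill_tree() {
    # Collect direct children first; once the parent dies they get reparented.
    children=$(pgrep -P "$1") || true
    # Ignore errors from processes that have already exited.
    kill -TERM "$1" 2>/dev/null || true
    for child in $children; do
        kill_tree "$child"
    done
}
```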

July 25, 2011

Server and username?

Posts posted by Ashoat

[Solved] Website doesn't show, error message

[Solved] Can't configure domain DNS

[Solved] Account not found

[Solved] Suspended page outdated

[Solved] One Account per User.

[Solved] PHP Extensions

Power Failure

Power Failure

[Solved] Account not found

[Solved] Increased Web Space For Forum Posts ?

[Solved] Why is HelioNet unstable?

[Solved] Site email is sent but not received

[Solved] Queued: lolita

[Solved] Can't install JSON:XS module for Perl

[Solved] PHP Extensions

HelioPanel Beta

[Solved] Johnny signups frozen

[Solved] locked out of account due to brute force attack

[Solved] Site builder

[Solved] Queued: koyal142

[Solved] account setup date

[Solved] Johnny signups frozen

[Solved] [SOLVED] not execute jsp and java

[Solved] account setup date

[Solved] locked out of account due to brute force attack
