
BIG downtime


Ashoat


Today's downtime, like Sunday's earlier in the week, was the result of electrical problems at our datacenter (Hurricane Electric). I had also recently de-journaled our /home filesystem (converting it from ext3 to ext2) to improve disk I/O performance. With no journal to replay after an unclean shutdown, the power cycle forced our machine to run a very long e2fsck to confirm disk integrity.
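For anyone curious how to tell the two cases apart: ext3 is essentially ext2 with the has_journal feature turned on, and that feature shows up in the "Filesystem features" line of a `tune2fs -l` listing. Below is a minimal Python sketch that checks such a listing. The function name and both sample listings are made up for illustration (they are not output from our server), and the sketch only parses text, so it never touches a real disk.

```python
def has_journal(tune2fs_listing: str) -> bool:
    """Return True if the 'Filesystem features' line lists has_journal.

    Expects text in the style printed by `tune2fs -l <device>`.
    """
    for line in tune2fs_listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return "has_journal" in value.split()
    raise ValueError("no 'Filesystem features' line found")


# Abbreviated, hypothetical listings (assumed for illustration only).
EXT3_SAMPLE = (
    "Filesystem state:         clean\n"
    "Filesystem features:      has_journal ext_attr dir_index filetype sparse_super\n"
)
EXT2_SAMPLE = (
    "Filesystem state:         not clean\n"
    "Filesystem features:      ext_attr dir_index filetype sparse_super\n"
)

if __name__ == "__main__":
    print("ext3-style sample journaled:", has_journal(EXT3_SAMPLE))
    print("ext2-style sample journaled:", has_journal(EXT2_SAMPLE))
```

A filesystem whose listing lacks has_journal is the kind that needs a full check after an unclean shutdown, which is the trade-off described above.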

 

Short answer: we should be a good bit faster because of the de-journaling, but when electrical problems like this force a power cycle, we might stay down for a day.

 

Electrical problems are rare for datacenters and hopefully this won't happen again.


They do use UPSes. Those were actually what broke in this incident.

 

On Saturday night at Fremont 1 around 9pm during a thunderstorm, there was a power incident involving the electric power utility lasting approximately 3 seconds that damaged two UPSes causing them to fail. The UPS paralleling system automatically went into bypass in order to restore power. The UPS service technicians inspected the units and determined the specific components that failed and ordered parts.

 

The failed UPSes were damaged by power fluctuations that occurred during the thunderstorm.

 

On Tuesday morning at Fremont 1 at 07:27am PST, the electric power utility had a 1 second power incident. Due to running on bypass this had an effect, where it normally would not have.

 

The UPS service technicians have the replacement parts and are on site performing the necessary repairs to restore the 2 failed UPS units to normal operational status.

