Okay, so this problem turned out to be pretty complicated. Some system upgrade changed things so that batch (not bash) scripts no longer show up in the process list while they run. One of our cron scripts checks the process list to see whether a copy of itself is already running, so it could no longer see itself and started spawning lots of copies. It only did that when another condition was also met, which explains why we've had so much downtime lately.
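For anyone with a similar cron job, here is a minimal sketch of a guard that doesn't depend on the process list at all. It is not what our script does; it swaps the process-list check for an exclusive file lock via Python's `fcntl.flock`. The function name `try_acquire_lock` and the lock path are made up for illustration, and this assumes a Unix-like system.

```python
import fcntl


def try_acquire_lock(path):
    """Return an open file holding an exclusive lock on `path`,
    or None if something else already holds the lock."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    # Hypothetical lock path; pick one that suits your job.
    lock = try_acquire_lock("/tmp/example-cron-job.lock")
    if lock is None:
        raise SystemExit("already running, exiting")
    # ... do the real work here; the lock is released when the process exits.
```

The kernel drops the lock when the process dies, so a crashed run can't leave a stale "already running" marker behind.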
However, it finally took the server down one morning, and I was forced to hard reset it. This caused some data corruption. On boot, some partitions would end up read-only, despite not being set to mount that way. This caused seemingly inexplicable errors that took a while to debug. Once I figured out the problem, I knew the fix was to run fsck during boot, before the partitions get mounted. This had some problems of its own: apparently the standard fsck command was broken, so I had to get the datacenter tech to run a command that wouldn't prompt the user.
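In hindsight, a quick check for partitions that had gone read-only would have saved a lot of debugging time. A rough sketch, assuming a Linux box where `/proc/mounts` lists the live mount options (the function name `read_only_mounts` is made up for illustration):

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose live options include 'ro'.

    `mounts_text` is the content of /proc/mounts, where each line is:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the exact option 'ro', not substrings like 'errors=remount-ro'.
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point in read_only_mounts(f.read()):
            print("read-only:", mount_point)
```

Some pseudo-filesystems are legitimately mounted read-only, so the output still needs comparing against what you expect to be writable.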
Looks like everything's working now, though...