shiney
07-22-2010, 08:52 AM
Alright, the explanation you all have probably not been caring about:
We did a semi-major system update, in a manner of speaking. In short, the Linux equivalent of a (very overdue) Windows Update. Everything downloaded, installed, etc, it all looked fine, it all went fine. No hiccups. But our MySQL server ran out of memory, and when it does that the entire server gets angry and won't move or do anything. It won't even let us shut down MySQL -- we usually have to reboot the machine. (This is the main misconfiguration problem that I just can't seem to nail down for some idiot reason.)
So I was at work and another kind of co-admin was going to do this. Unfortunately it happened over lunch break so I wasn't immediately available. I got back to find a few emails saying "Yep locked up" "Yep rebooting" "Yep it never came back WTF" "Yep we're screwed". So I do the obvious thing and I call Josh (whom you all typically know as ProphetX). He tells me initially that I'm stupid, second that I'm a moron, and summarizes the conversation with an insinuation that I'm retarded. Par for the course. In the midst of the slew of insults you come to expect in a friendship with the man, he said he would contact the datacenter man who could get a tech down there to help troubleshoot briefly.
And so it began.
I spent 3+ hours on the phone the night the server went down with this guy (Jim), trying to figure out why the server wouldn't come back up. We didn't have access to standard setup utilities, and when we attempted modifications we found commands disabled or unavailable. Repeated reboots did nothing to help. Every time we built a new kernel, it didn't help. Everything we did ended in failure. After 3 straight hours of stealing my wife's computer and neglecting my family I called it, sent Jim home, wrote Josh something of a frantic "It's possibly on fire and/or infested with demons" message and canned it. Jim was very gracious about the whole ordeal, as his presence was not included in our hosting contract, and we should all be very thankful - as it stands we had to drop some funds for the man.
Anyhoo that began Waiting Game 1: The Waitening. We had to wait until Josh could find some free time to get to the datacenter and look into things himself. That did not occur until the weekend due to obligations. When he got down there he spent a few hours looking into stuff, a few more hours poking things with sticks, and when he finally got frustrated he opened the machine to find that the insides had exploded!
It is of particular note that right around this time is when Krylo texted my cell phone and I spoke with him. Also apparently Nikose? I got a new cell phone the next day. Then another new one because I was like "well this is only one cell phone away from the phone that had Nikose messaging it, I need another one". Seven cell phones and acid baths later I felt clean again. I was going to burn the old one, but I didn't want to let the smoke get into the air, so it's hermetically sealed and buried at sea. My final plan is to jettison it into the well cap in the Gulf of Mexico just before they shoot all the permanent sealant in there.
Yep the server exploded. Rock! A daughter card, which apparently assists the mainframe in communicating with the network interface card, had gone 'splode and leaked oil everywhere. So Josh had to order a new part from his supplier, which fortunately was also under contract. That began Waiting Game 2: Electric Boogaloo. Turns out Dell (Yeah, we're on a Dell...sigh...) determined that because the part was ordered Saturday, that didn't count as a "day" according to them, and while the part should have arrived Monday it didn't until Tuesday. So there was an encore performance on the end of Waiting Game 2: Electric Boogaloo, none of which was particularly enticing except for the Bollywood part.
So Josh finally managed to get back down to the datacenter to look into things once the new part was installed and was simply unable to get things up and running. He and I had a lengthy email conversation about it and what to do, I came up with 90 ideas, he shot them down for reasons why they wouldn't work, I came up with 90 more, he had a few thoughts, we determined we'd take another crack at it. This was around last Wednesday so yeah, thus began Waiting Game 3: Return of the Jedi.
Fast forward past about 15-20 frustrated emails of me bitching him out and him in turn bitching me out and we reach yesterday. We finally determined the system update could not have caused the problems we experienced. We also determined that the broken daughter card likely was a simple byproduct as well. No, what we finally found out yesterday was that the primary data-holding partition on the server had become unmounted, the partition that holds the master boot record, primary and newly built kernels, the partition that is necessary for all new updates and the partition that had all the information for how to connect to the goddamned internet. Our conversation started yesterday around 3PM CST with him saying "This thing is hosed and we're going to have to wipe" and ended at 3:15 CST with him saying "Well you're online, I'm going home".
And now we're back online and angry as ever. I'm sure that, if pressured, POS or maybe NonCon could relate what I said above through the power of out-of-context manga illustrations.
If you want to help me and/or Josh any in the "Repay Jim For His Assistance" fund then you can always PayPal a few bucks toward elandriel at gmail (dot) com. One of the unfortunate reasons this took a while is that none of us are paying clients (nuklear power forums, the RO server, etc) and as a result the priority level of this server as compared to other real life pursuits is somewhat lower. Not to say it's not important, I did impress upon Josh how very important it is to me, but unfortunately he (and I) don't get anything for these sites realistically. I'm not about to charge Brian for forums he doesn't particularly want, I can't charge people to play an unlicensed game server, and I don't have the money myself to pay for anything!
So yeah. That's the story in a nutshell.