20070331 AM Problems



Maksutov
2007-Mar-31, 11:33 AM
I noticed it took me about one hour to finally get a response posted to a thread.

On the way, there were numerous database error messages, long waits for edits, etc.

Was this a DoS situation, or was the BAUT database undergoing some kind of extensive update?

:think:


PS: Heck, this post took a long time to "take". Anything unusual going on?

Moose
2007-Mar-31, 11:54 AM
The board does its backups at roughly this time of day, Mak. It may well have something to do with that.

Jeff Root
2007-Mar-31, 11:56 AM
Look at the "sticky" thread
http://www.bautforum.com/showthread.php?t=52864
"FYI: Forum is backed up starting at 1100 UTC daily"

When ToSeek posted that, the forum immediately stopped
working at 5 AM. Now it stops working at 5:53 or some such,
apparently due to a combination of daylight savings time and
a fast computer clock.

-- Jeff, in Central Daylight Savings Time

Jeff Root
2007-Mar-31, 11:57 AM
I just had to go get that link.

-- Fejj, mi Nimminapliso

Maksutov
2007-Mar-31, 12:04 PM
So we might as well call the BAUT kaput from 5AM to 6AM CDT?

OK.

antoniseb
2007-Mar-31, 12:08 PM
I had some unusual problems myself this morning (not the usual backup problems). It seemed like parts of the software were being removed and upgraded.

Serenitude
2007-Mar-31, 12:57 PM
There's a time in the morning (local time) when I just avoid posting or reading - the forum is pretty much unresponsive. Database errors, all sorts of demonic activity during the backup.

Jeff Root
2007-Mar-31, 04:41 PM
So we might as well call the BAUT kaput from 5AM to 6AM CDT?

OK.
I think it is more like 5:50 or so to maybe 6:30 AM CDT, or
4:50 or so to maybe 5:30 CST. An hour late because of the
change to daylight time; several minutes early because of a
fast computer clock.

I think.

At least I think I think.

-- Jeff, in Minneapolis
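
The arithmetic above can be checked with a small sketch. It assumes the nominal 1100 UTC start from the sticky thread, fixed UTC-6 and UTC-5 offsets for CST and CDT, and a purely hypothetical 7 minutes of server clock drift, chosen only because it matches the "5:53 or some such" mentioned earlier; the real drift is not stated anywhere in the thread.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CST = timezone(timedelta(hours=-6), "CST")  # Central Standard Time
CDT = timezone(timedelta(hours=-5), "CDT")  # Central Daylight Time


def backup_start_local(day, tz, clock_fast_minutes=0):
    """Local wall-clock time at which a nominal 11:00 UTC job starts.

    clock_fast_minutes is a hypothetical amount by which the server clock
    runs ahead of true time, which makes the job fire that much early.
    """
    nominal = datetime(day.year, day.month, day.day, 11, 0, tzinfo=UTC)
    actual = nominal - timedelta(minutes=clock_fast_minutes)
    return actual.astimezone(tz)


winter = backup_start_local(datetime(2007, 1, 31), CST)
summer = backup_start_local(datetime(2007, 3, 31), CDT)
early = backup_start_local(datetime(2007, 3, 31), CDT, clock_fast_minutes=7)

for label, t in (("standard time", winter), ("daylight time", summer), ("fast clock", early)):
    print(label, t.strftime("%H:%M %Z"))
```

The same function works for any other poster's zone by passing a different fixed offset.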

Fraser
2007-Mar-31, 04:58 PM
Yeah, that's all during the backup. The backup is taking just too long now, so I need to figure out a new solution. I've got a few ideas; I just need to test them out. I can run the backup routine at a lower priority, so the forum gets priority, and there are other programs out there to back up the database.
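
Running a job at lower priority could look roughly like the sketch below. It is only an illustration, not the actual backup routine: the child command is a stand-in that reports its own niceness, `run_niced` is a made-up helper name, and the increment of 10 is an arbitrary choice. It is POSIX-only, and it lowers the priority of the launched process alone, not of any server that process talks to.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd with its niceness raised by `increment` (lower CPU priority)."""
    return subprocess.run(
        cmd,
        # Runs in the child just before exec, so only the child is affected.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for the real backup command: a child that prints its own niceness.
child = [sys.executable, "-c", "import os; print(os.nice(0))"]
result = run_niced(child, increment=10)

child_niceness = int(result.stdout.strip())
parent_niceness = os.nice(0)  # an increment of 0 just reads the current value
print("parent:", parent_niceness, "child:", child_niceness)
```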

Smashing Young Man
2007-Apr-02, 10:43 PM
Yeah, that's all during the backup. The backup is taking just too long now, so I need to figure out a new solution. I've got a few ideas; I just need to test them out. I can run the backup routine at a lower priority, so the forum gets priority, and there are other programs out there to back up the database.

What's the attachment and picture storage method you have set for the forums? File or database? Moving these to a file system can really help cut down on the database size, and should make performing backups a much speedier process.

Maksutov
2007-Apr-06, 09:59 AM
20070407: Wow, that was some time out! 12:00 to 4:45 AM CDT.

Backup and software upgrade?

:think:

01101001
2007-Apr-06, 03:15 PM
20070407: Wow, that was some time out! 12:00 to 4:45 AM CDT.
When I tried during that, it wasn't even answering pings. No... worse, I couldn't get DNS resolution.

Fraser
2007-Apr-06, 03:47 PM
Oh, our hosting provider upgraded their network last night, so the site was inaccessible during that period.

NEOWatcher
2007-Apr-06, 04:15 PM
Oh, our hosting provider upgraded their network last night, so the site was inaccessible during that period.
As long as we find out beforehand. :shhh:

Sarcasm comes from somebody in IT but not in operations who keeps getting stung by operations doing changes that (they say/think) should be transparent, resulting in us spending at least a day to find out why things are crashing.

Moose
2007-Apr-06, 04:48 PM
I hear you. I still remember the time they pushed a patch across the network at morning log-in time. Because of the server load, the patch took 45+ minutes to download and install, and we were all stuck just past the login screen for the entire time.

And of course, everybody comes to me for the answers/fixes I don't have if tech support doesn't do me the courtesy of a quick email.

NEOWatcher
2007-Apr-06, 04:59 PM
...And of course, everybody comes to me for the answers/fixes I don't have if tech support doesn't do me the courtesy of a quick email.
Well, I can see their side of the coin too. If anything breaks around the time of their change, everyone comes down on them like a ton of bricks.

But in our case, it's complete denial.
I once spent about a week and a half with some intermittent communications issues between computers on different platforms. They kept saying my software was flawed. We pointed out other similar issues occurring that they said had no relationship.
Finally in a large conference call, we asked if there were any changes in the network the day we started having the issues.
They had the nerve to say NO, demean us by saying we don't know what we are talking about, and later in the conversation say "All we did was move a (router?) to the other side of the firewall, and change the (bridge?) into a (router?), that should have had no effect."

Moose
2007-Apr-06, 05:29 PM
Well, I can see their side of the coin too. If anything breaks around the time of their change, everyone comes down on them like a ton of bricks.

One of my job tasks is to act as a filter to keep the users from overwhelming our DBA/Security admin with requests that can be handled at a lower level. Our local users know they're supposed to contact me and I contact the DBA, so they come to me directly. On everything even remotely computer related. That's fine. That's part of my job. And I can help them on most issues.

If the network admin (the guy who pushes the button on patches) would just set up a simple mailing list to notify the 14 or so tech support departments province-wide (one per major site) whenever they do anything that could potentially cause outages, we could act as a filter for him just as easily.

It would save him time, and it would save us time investigating these things.

Whirlpool
2007-Apr-14, 01:35 PM
Is there any development on this problem?
I'm still experiencing "Database Errors" every day at 6 PM my time, which last almost an hour and sometimes longer.

:think:

Fraser
2007-Apr-18, 08:14 AM
I haven't resolved it yet. The solution is going to be pretty difficult, since the backup is becoming gigantic. I don't want to lose our backups in the mean time.

Maksutov
2007-Apr-18, 09:01 AM
I haven't resolved it yet. The solution is going to be pretty difficult, since the backup is becoming gigantic. I don't want to lose our backups in the mean time.

I can send you some ZIP disks if you need them.

http://img137.imageshack.us/img137/566/iconwink6tn.gif

Jeff Root
2007-Apr-18, 12:24 PM
The down time does seem to keep expanding, in both directions.

I don't understand why the backup is a problem.
What software are you using and what are you telling it to do?

-- Jeff, in Minneapolis

HenrikOlsen
2007-Apr-18, 02:44 PM
I remember someone asking about this a while ago, but I can't remember the answer.

Are attachments saved as files, or in the database?
If the latter, a great saving in backups can be got by moving the attachments to files.

Fraser
2007-Apr-18, 05:28 PM
The big processor hit is the mysql dump; the second one is when the whole backup is zipped up so I can take it offsite. I've got both processes running under nice in Unix, but they still don't play nicely. So my first plan is to use a Perl script that lets you extract a mysql database directly from the filesystem. Apparently, it's very efficient and low impact.

The second option is to mirror BAUT on a second server that acts as a slave, so that records are written to both at the same time. I can then back up the slave system without hampering the live server.

The third option is to do an incremental backup, where only the new records added or changed today are backed up.

I've got a couple of days of finals left in school, and then I'll solve this problem once and for all.

HenrikOlsen
2007-Apr-18, 11:36 PM
The problem is that even though the mysqldump is running nice'ed, it's getting the data by querying the mysql server, which is not, and it's the database querying that interferes with the site running.

What OS is it running on?

If *BSD, you could issue a FLUSH TABLES WITH READ LOCK, do a snapshot of the filesystem, then UNLOCK TABLES.
Then at your ease, mount the snapshot somewhere else.

Alternately, if you're on Linux or some other OS that can't take snapshots, accept that you're off the air entirely for a few minutes longer than would be the case with the snapshot, do FLUSH TABLES WITH READ LOCK, copy the data directory somewhere else, then UNLOCK TABLES.

Once you have a clean copy of the datafiles you can either run a second mysqld (niced) with the datadir on the snapshot and mysqldump from that one, or zip the copied files and back them up; either way they will be consistent and clean.

The important trick is that FLUSH TABLES WITH READ LOCK will write out all pending changes, and close all datafiles until UNLOCK TABLES, so you can manipulate the files without worrying about concurrent changes by the server.
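
The lock, copy, unlock sequence can be sketched as follows. This is a minimal outline under stated assumptions, not a tested backup tool: `execute` is a hypothetical callable that runs one SQL statement on a single open connection, and the demonstration uses a fake connection and a throwaway directory instead of a live server. The lock is released with MySQL's UNLOCK TABLES statement, and it is also dropped if the session that took it disconnects, so the same connection has to stay open for the whole copy.

```python
import shutil
import tempfile
from pathlib import Path


def copy_under_read_lock(execute, datadir, dest):
    """Copy datadir to dest while a global read lock is held.

    `execute` is a hypothetical callable that runs one SQL statement on a
    single open connection; the lock only lasts while that connection lives.
    """
    execute("FLUSH TABLES WITH READ LOCK")
    try:
        shutil.copytree(datadir, dest)
    finally:
        # Always release the lock, even if the copy fails.
        execute("UNLOCK TABLES")
    return Path(dest)


# Demonstration with a fake connection that just records the statements.
statements = []
workdir = Path(tempfile.mkdtemp())
fake_datadir = workdir / "data"
fake_datadir.mkdir()
(fake_datadir / "post.MYD").write_bytes(b"rows")

copied = copy_under_read_lock(statements.append, fake_datadir, workdir / "copy")
print(statements)
```

Putting the unlock in a `finally` block keeps a failed copy from leaving the board locked.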

Fraser
2007-Apr-19, 10:11 PM
Give me a couple more days and then I'll pick your brains to help devise a backup strategy. It's running on CentOS Linux. The problem is that the backup script is provided as a service by my management system, so I'd need to hack it.

Jeff Root
2007-Apr-19, 10:40 PM
Is there any reason not to make it an incremental backup?
I bet it would reduce the size by three orders of magnitude and
cut the time from what appears to be more than an hour to less
than three minutes.

-- Jeff, in Minneapolis

Fraser
2007-Apr-20, 12:11 AM
I don't know of a way to make an incremental backup of a database. The file system sees it all as one great big file which is being updated all the time. So every day, the system sees that the database has changed so it gets backed up too.
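
One way around the "one great big file" problem is to work at the row level instead of the file level, exporting only rows whose modification time is later than the previous backup. The sketch below assumes a hypothetical `post` table with a `modified` timestamp column and uses in-memory SQLite as a stand-in for the forum's MySQL database; the forum's real schema is not given in this thread. It does not capture deleted rows, which a real scheme would need to handle separately.

```python
import sqlite3


def rows_changed_since(conn, table, ts_column, since):
    """Return rows whose modification timestamp is later than `since`."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    query = f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}"
    return conn.execute(query, (since,)).fetchall()


# In-memory SQLite stands in for the forum database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT, modified INTEGER)")
conn.executemany(
    "INSERT INTO post (id, body, modified) VALUES (?, ?, ?)",
    [(1, "old post", 100), (2, "edited post", 250), (3, "new post", 300)],
)

last_backup = 200  # timestamp of the previous backup run
delta = rows_changed_since(conn, "post", "modified", last_backup)
print(delta)
```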

HenrikOlsen
2007-May-03, 12:38 PM
It looks like the move of images from the database to files has done a lot to cut down the time where the board is unresponsive.

I didn't check earlier, but before the move it would be well past now before it became usable again, and it seems to have done quite well this time.

I'll have a look again tomorrow to see if I can see how long it takes.

NEOWatcher
2007-May-03, 12:58 PM
I didn't check earlier, but before the move it would be well past now before it became usable again, and it seems to have done quite well this time.

Yes, it has been unusable... But this morning, I have experienced no noticeable delays.

Fraser
2007-May-03, 04:56 PM
It's still bottoming out, though. There's a period for a few minutes when the server is overloaded. I'm going to try some tricks to optimize the server.

HenrikOlsen
2007-May-04, 11:16 AM
A few minutes is a lot better than several hours :)

Occam
2007-May-06, 12:03 PM
I cannot access this site for at least an hour each night, usually between 10.30pm and 11.30pm New Zealand time. Sometimes it's longer.

Gillianren
2007-May-07, 07:07 PM
I couldn't access the site for over a day--but it's because my boyfriend's brother screwed up while networking our computers and I couldn't get online at all. (Graham's computer still won't access the internet, and we don't know why.)