server issues

Discussion about the Geocaching Australia web site
ideology
Posts: 2763
Joined: 28 March 03 4:01 pm
Location: Sydney

Post by ideology » 30 June 07 11:14 pm

we are having some server issues at the moment, with apache crashing 6 or 7 times this weekend already.

we have turned off the gca site database to investigate and hopefully fix the issue. it is unlikely to be fixed tonight due to the complexity of the issue

sorry for the inconvenience

ideology
Posts: 2763
Joined: 28 March 03 4:01 pm
Location: Sydney

Post by ideology » 01 July 07 1:03 am

we are slowly returning the pages to normal functionality. so far we have done the homepage and cacher pages. we need to spend time on each page so it definitely won't be fixed tonight. we will definitely have the site operating in limited functionality mode* by the end of this weekend

* where "limited functionality mode" = whatever we can get working this weekend!

ideology
Posts: 2763
Joined: 28 March 03 4:01 pm
Location: Sydney

Post by ideology » 01 July 07 11:47 am

- the cache page is now back online
- we got the daily emails out this morning
- the statpack did not update but it should work tomorrow at 3am as normal
- automated cache archiving was offline but should run tomorrow morning at 7am as normal
- future cache releases did not run at midnight but should start at midnight tonight

overall the issue is a combination of
1. a growing database
2. some sloppy database administration (oops!)
3. various web robots and spiders crawling the pages
4. some code that works okay on a development machine but doesn't quite scale on a system that is impacted by 1-3 above!
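
on item 3: one common way to turn crawlers away from expensive pages is a robots.txt file. the sketch below is only an illustration and not necessarily what was done on the gca server; the paths and the "ExampleBot" name are hypothetical. it uses python's standard urllib.robotparser to check which urls a well-behaved robot would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep all robots off two expensive sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
Disallow: /dashboard/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.org and these paths are placeholders, not real site URLs.
for path in ("/cache/1", "/stats/overall"):
    allowed = parser.can_fetch("ExampleBot", "http://example.org" + path)
    print(path, "allowed" if allowed else "blocked")
```

note that robots.txt only affects robots that choose to obey it; badly behaved spiders need to be blocked some other way.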

we thought the issue had been solved a couple of months ago, but it turned out that one of our sysadmins had been constantly nursing apache but didn't tell us, so we didn't know there was a problem.

so in an effort to solve the problem we are doing a code walkthrough of every page of the site. we have written some database logging routines which log each database query and the elapsed time taken, and we are looking for ways to optimise the slow queries.
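
a rough sketch of what a query logging routine like this could look like, not the actual gca code. it assumes python and an in-memory sqlite database purely for illustration; the TimedConnection class and the caches table are made-up names.

```python
import sqlite3
import time


class TimedConnection:
    """Wraps a sqlite3 connection and records each query with its elapsed time."""

    def __init__(self, conn):
        self.conn = conn
        self.log = []  # list of (sql, elapsed_seconds) tuples

    def execute(self, sql, params=()):
        start = time.perf_counter()
        try:
            rows = self.conn.execute(sql, params).fetchall()
        finally:
            # Log the query even if it raised, so failures are timed too.
            self.log.append((sql, time.perf_counter() - start))
        return rows

    def slowest(self, n=5):
        """Return the n slowest logged queries, slowest first."""
        return sorted(self.log, key=lambda entry: entry[1], reverse=True)[:n]


# Hypothetical schema, used only to generate a few log entries.
db = TimedConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE caches (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO caches (name) VALUES (?)", ("example",))
rows = db.execute("SELECT name FROM caches WHERE id = ?", (1,))

for sql, elapsed in db.slowest():
    print(f"{elapsed * 1000:8.3f} ms  {sql}")
```

sorting the log by elapsed time is what surfaces the "killer" queries first.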

the killer one was a query that was taking 0.7 of a second. that doesn't sound like much, but even this morning at 8am we were averaging 3 queries per second from the main gca site (ie excluding forum, gallery, subsites, etc) and that was at a non-peak time and after we'd turned away the robots last night. so little wonder that it was going slow during peak times.
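
to put numbers on that, here is a back-of-the-envelope calculation. it assumes the worst case, where every one of the 3 queries per second costs the full 0.7 seconds and the database works through them one at a time; real traffic would be a mix of fast and slow queries.

```python
def db_busy_fraction(queries_per_second, seconds_per_query):
    """Seconds of query work arriving per second of wall-clock time.

    A value above 1.0 means work arrives faster than a single
    serial worker can finish it, so a backlog builds up.
    """
    return queries_per_second * seconds_per_query


load = db_busy_fraction(3, 0.7)
print(f"{load:.1f} seconds of query work arrive every second")
print("backlog grows" if load > 1 else "server keeps up")
```

under those assumptions the database is asked to do roughly twice as much work as it has time for, even off-peak.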

our biggest problem is that our mac mini just died, so it looks like it's off to the apple store to pick up one of those new whiz-bang powerbooks!

our plan today is to continue walking through the code, re-enabling each page once we are satisfied that it is 80% optimal.

sorry for the inconvenience, but we need to keep the server alive for other services we provide. we figure if we put our heads down today and focus on each page in turn, we should get a fair chunk* of it operational again by the end of the weekend.*

* see previous disclaimer!

Mr Router
1500 or more caches found
Posts: 2782
Joined: 22 May 05 11:59 am
Location: Bathurst

Post by Mr Router » 01 July 07 11:54 am

We can see no issues! Keep up the fine work.
Although we must look into your Mac habits a bit closer :oops:

ideology
Posts: 2763
Joined: 28 March 03 4:01 pm
Location: Sydney

Post by ideology » 01 July 07 9:40 pm

we've walked through and made minor modifications to the code behind 20 or so more pages, with more to do! so hopefully the site is getting back to a reasonable amount of functionality now.

we have also put in place a nightly database "vacuum" process which should keep things running smoother
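
a minimal sketch of a vacuum job, for anyone curious. this thread doesn't say which database engine the site runs on, so the example assumes sqlite (via python's standard sqlite3 module) just to have something runnable; postgresql has a VACUUM command of its own. the file and table names are made up, and scheduling it nightly (for example from cron) is left out.

```python
import os
import sqlite3
import tempfile


def vacuum_database(path):
    """Rebuild the database file, reclaiming space left behind by deleted rows."""
    # isolation_level=None gives autocommit; VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()


# Demonstration on a throwaway database (hypothetical table and data).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("BEGIN")
conn.executemany(
    "INSERT INTO logs (body) VALUES (?)",
    [("x" * 500,) for _ in range(2000)],
)
conn.execute("COMMIT")
conn.execute("DELETE FROM logs")  # rows are gone, but the file keeps its size
conn.close()

size_before = os.path.getsize(path)
vacuum_database(path)
size_after = os.path.getsize(path)
print(size_before, "->", size_after, "bytes")
```

the point is that deleting rows does not by itself shrink or tidy the file; a periodic vacuum does that housekeeping.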

our plan is to review and re-enable a few more pages tonight and see how the server goes over the next few days before re-enabling the rest of the pages (ie we want to make sure what we are doing is making a difference!)

solomonfamily
1700 or more caches found
Posts: 238
Joined: 28 September 05 9:02 am

Post by solomonfamily » 01 July 07 10:07 pm

thanks for your efforts and keeping us all informed. Cheers

Knot_gillty
100 or more tracks walked
Posts: 249
Joined: 29 January 07 9:19 pm
Location: Trafalgar VIC,

Post by Knot_gillty » 01 July 07 11:32 pm

Great work you are doing. 8) Most appreciated.

zactyl
Posts: 1171
Joined: 28 July 04 6:40 pm
Location: Mullumbimby, NSW

Post by zactyl » 02 July 07 2:47 am

Thanks for all the hard work :D and sorry you lost your weekend to it. :(

ideology
Posts: 2763
Joined: 28 March 03 4:01 pm
Location: Sydney

Post by ideology » 02 July 07 9:32 am

thanks for your comments
we think we have about 80%* of the site up and running at the moment
we are going to monitor this for a couple of days, and if the server stays stable and fast, we'll conclude we're on the right track and review the rest of the code. if the server still has problems, we'll analyse the code of the enabled pages even more carefully to try to work out what's going on.

if there's a particular page of interest that you'd like re-enabled, please let us know and we'll take a look at it. the ones we are avoiding at the moment are the stats and the dashboard because we know they are pretty hard on the database. but if there are simple pages like logging a gca cache or whatever that we've forgotten to re-enable, please let us know.


* where 80% is measured on pageviews, not pages. more than 20% of pages still aren't operational, but they aren't used much. we focused on the ones that were being used.
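
to show the difference between the two measures, here is a small sketch. the page names and pageview counts are invented for illustration and are not real gca traffic figures.

```python
def coverage(pageviews, enabled):
    """Return (share of pageviews served, share of pages enabled)."""
    total_views = sum(pageviews.values())
    enabled_views = sum(views for page, views in pageviews.items() if page in enabled)
    enabled_pages = sum(1 for page in pageviews if page in enabled)
    return enabled_views / total_views, enabled_pages / len(pageviews)


# Hypothetical traffic: a few pages attract most of the views.
pageviews = {"home": 500, "cache": 300, "cacher": 100, "stats": 60, "dashboard": 40}
enabled = {"home", "cache"}

by_views, by_pages = coverage(pageviews, enabled)
print(f"{by_views:.0%} of pageviews, {by_pages:.0%} of pages")
```

with these made-up numbers, enabling only two of five pages still covers most of what visitors actually request, which is why the busiest pages were tackled first.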

Sunshine Toledo
5500 or more caches found
Posts: 439
Joined: 07 August 06 6:07 pm
Location: Wavell Heights, Brisbane

Post by Sunshine Toledo » 02 July 07 10:10 am

You are doing a fabulous job. Thanks...

edmil
1000 or more caches found
Posts: 149
Joined: 24 June 05 11:01 am
Location: Upwey

Post by edmil » 02 July 07 5:06 pm

I echo the words of thanks for the work you are doing to keep the site up and running. What many users take for granted equates to a lot of work behind the scenes and it often goes unrecognised. It is but a small inconvenience as you work your fingers to the bone to keep us happy. Thanks :)

mtrax
Posts: 1974
Joined: 19 December 06 9:57 am
Location: Weston Creek, Canberra

Post by mtrax » 02 July 07 5:29 pm

I have one gripe..
what's with the m$soft ram drive.. placeholder on the page? can we have something more geo :lol: eg. a broken sat or gps
ie [image]
thanks for your work

SUBYDAZZ
600 or more caches found
Posts: 81
Joined: 20 June 06 8:38 pm
Location: Singleton, Hunter Valley, NSW

Post by SUBYDAZZ » 02 July 07 10:23 pm

I take it the cache editing pages are not back up then? Just tried to update the co-ords of one of my locationless caches (http://geocaching.com.au/cache/ga0643) but no go. No rush, love your work.

caughtatwork
Posts: 17017
Joined: 17 May 04 12:11 pm
Location: Melbourne

Post by caughtatwork » 02 July 07 10:51 pm

Works for me SUBYDAZZ

SUBYDAZZ
600 or more caches found
Posts: 81
Joined: 20 June 06 8:38 pm
Location: Singleton, Hunter Valley, NSW

Post by SUBYDAZZ » 03 July 07 10:05 am

caughtatwork wrote: Works for me SUBYDAZZ
Apparently not for me though - just tried again: "This page is currently offline while we re-tune the database."
