Posted: 10 February 07 9:43 pm
by Shifter Brains
3679.55 miles (5921.67 kilometres)
But this doesn't include the 70 odd GCA caches we have done. :)

Posted: 10 February 07 9:59 pm
by caughtatwork
Bear_Left wrote:
caughtatwork wrote:
CraigRat wrote:hey c@w, the cacher stats page looks lacking now :lol:

What do you think the computational load would be for a distance feature like that?
Will be available in the next release. Slightly different numbers due to differences in the circumference of the earth in our two calculations. Close enough for what the number represents.
An average (km/cache) and a median would be interesting.
The median should take care of the skewing of the figures by the frequent flyer cachers.
Your wish is my command.
Will be available in the next release.

By the way, according to my GC (only) stats from the new calculations:
40,560.54 total
6.54 median
37.18 average

Good suggestion.
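
As a rough illustration of why the median and the average quoted above can sit so far apart, here is a minimal sketch using Python's standard `statistics` module. The distances are made-up values, not anyone's real stats; a single long "frequent flyer" hop is enough to drag the average well above the median.

```python
from statistics import mean, median

# Hypothetical per-cache step distances in km (illustrative only, not real stats).
distances_km = [1.2, 3.5, 4.0, 6.5, 8.1, 12.0, 950.0]

total_km = sum(distances_km)
median_km = median(distances_km)    # middle value, barely affected by the 950 km hop
average_km = mean(distances_km)     # pulled upwards by the single long trip

print(f"total:   {total_km:.2f}")
print(f"median:  {median_km:.2f}")
print(f"average: {average_km:.2f}")
```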

Posted: 10 February 07 10:03 pm
by caughtatwork
Wingaap wrote:It'd also be interesting to see the most economical cacher ie most caches in the least distance.
I would too, but I don't think that's going to be possible.

The problem becomes one of having to calculate the incremental step distance for every single log in the database in order to compare the distances between cachers.

With over 450,000 (and growing daily) logs, the CPU to dedicate to this type of calculation would be too significant.

inatn.com can do some of this differently to us here as they calculate everything based on a loaded file, so they only have to do the calculation once. As we get logs throughout the day, we would need to recalculate it every time and I just don't think we can afford the CPU time.

Great idea, but I think impractical at this stage.
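
For anyone curious what the "incremental step distance" involves, the following is a sketch only, not the site's actual code. It assumes each find is a (latitude, longitude) pair in decimal degrees, already sorted by log date, and uses the haversine formula with an assumed mean earth radius of 6371 km. The names `haversine_km`, `step_distances` and `caches_per_km` are hypothetical. A different radius (or circumference) changes every result by the same small factor, which would explain slightly different totals between two sites.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; other values shift totals slightly


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


def step_distances(finds):
    """Distances between consecutive finds; `finds` is a date-ordered list of (lat, lon)."""
    return [haversine_km(*a, *b) for a, b in zip(finds, finds[1:])]


def caches_per_km(finds):
    """Caches found per km travelled; reported as 0.0 when there is no distance yet."""
    total = sum(step_distances(finds))
    return len(finds) / total if total else 0.0


# Hypothetical finds near Melbourne, Sydney and Brisbane (illustrative coordinates).
example_finds = [(-37.8136, 144.9631), (-33.8688, 151.2093), (-27.4698, 153.0251)]
print(step_distances(example_finds))
print(caches_per_km(example_finds))
```

A cacher with n logs needs n - 1 of these evaluations, so ranking every cacher means touching every log in the database, which is the cost described above.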

Posted: 10 February 07 10:06 pm
by Cached
An average (km/cache) and a median would be interesting.
The median should take care of the skewing of the figures by the frequent flyer cachers.
Can we have standard deviation as well?

Posted: 10 February 07 10:23 pm
by caughtatwork
Cached wrote:
An average (km/cache) and a median would be interesting.
The median should take care of the skewing of the figures by the frequent flyer cachers.
Can we have standard deviation as well?
Sheesh! I only just learnt what a median was.
What's a standard deviation?

Posted: 10 February 07 10:53 pm
by Cached
Standard Deviations are a really important statistical tool.

From wikipedia:
Another way of seeing it is to consider sports teams. In any set of categories, there will be teams that rate highly at some things and poorly at others. Chances are, the teams that lead in the standings will not show such disparity, but will be pretty good in most categories. The lower the standard deviation of their ratings in each category, the more balanced and consistent they might be. So, a team that is consistently bad in most categories will have a low standard deviation indicating they will probably lose more often than win.
People with a large standard deviation (SD) do more long-distance trips than those with a smaller SD, whose distances vary less.

Image
Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for 68.27% of the set; while two standard deviations from the mean (blue and brown) account for 95.45%; and three standard deviations (blue, brown and green) account for 99.73%.
If none of this makes any sense, I'll have another attempt tomorrow!
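
The percentages in that caption can be checked with a few lines of Python. This is only a sketch using the standard library's `NormalDist` (Python 3.8 or later), and it applies to a perfectly normal distribution, which real caching distances may not follow. The helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```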

Posted: 10 February 07 11:02 pm
by caughtatwork
Hmmmmmmm.
Ahhhhhhhhhhh.
Gotcha.
I checked my result against an excel spreadsheet of all my individual distances and came out with the same number, so I'm happy that it works (even if I'm still unsure of exactly what I'm doing)

So this gets me:
40,560.54 total
6.54 median
37.18 average
180.64 std dev

Posted: 10 February 07 11:04 pm
by CraigRat
Could be easy:
from: http://dev.mysql.com/doc/refman/5.0/en/ ... tions.html

STD(expr) STDDEV(expr)

Returns the population standard deviation of expr. This is an extension to standard SQL. The STDDEV() form of this function is provided for compatibility with Oracle. As of MySQL 5.0.3, the standard SQL function STDDEV_POP() can be used instead.

These functions return NULL if there were no matching rows.

STDDEV_POP(expr)

Returns the population standard deviation of expr (the square root of VAR_POP()). This function was added in MySQL 5.0.3. Before 5.0.3, you can use STD() or STDDEV(), which are equivalent but not standard SQL.

STDDEV_POP() returns NULL if there were no matching rows.

STDDEV_SAMP(expr)

Returns the sample standard deviation of expr (the square root of VAR_SAMP()). This function was added in MySQL 5.0.3.

STDDEV_SAMP() returns NULL if there were no matching rows.
EDIT: You beat me to it.... :lol:
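
The difference between the population and sample forms quoted above is just the divisor: the population version divides by N, the sample version by N - 1. A small sketch with Python's `statistics` module, using made-up values, shows the two side by side. `pstdev` corresponds to what STD()/STDDEV_POP() describe and `stdev` to STDDEV_SAMP().

```python
from statistics import pstdev, stdev

# Illustrative values only; their mean is 5.0.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_sd = pstdev(values)  # divides the squared deviations by N
sample_sd = stdev(values)       # divides the squared deviations by N - 1

print(f"population SD: {population_sd:.4f}")
print(f"sample SD:     {sample_sd:.4f}")
```

With a full list of a cacher's own distances the population form is the natural one; the gap between the two shrinks as the number of logs grows.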

Posted: 10 February 07 11:06 pm
by caughtatwork
Yeah, except that the distances aren't stored in the tables now are they. That would make life a lot easier.

It's OK, I've worked it out. If you want a sneak peek, it's up in the SVN now. See if it's right for you.
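
Since the distances aren't stored, one possible way to get a mean and standard deviation without keeping every distance around is a running (Welford-style) update. This is a sketch of that general technique, not what is actually in the SVN; the class name `RunningStats` is made up. Note that a median cannot be maintained this way and still needs the full list of values.

```python
from math import sqrt


class RunningStats:
    """Running mean and population standard deviation, updated one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def add(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def population_std(self):
        return sqrt(self._m2 / self.count) if self.count else 0.0


stats = RunningStats()
for distance_km in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # illustrative values
    stats.add(distance_km)

print(stats.mean, stats.population_std())
```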

Posted: 10 February 07 11:40 pm
by TeamAstro
for the record:

Approximate cache-to-cache distance: 67551.35 miles (108713.36 kilometres) (Excludes locationless and known traveling caches)

Active Caches: 1213 of the caches you've found are still active (84.9%)

Average log size: 67.4 words - Biggest log: 708 words - Shortest log: 1 word - Number of one-word logs: 1

Geeez, that's a lot of K's.



Year Total
2002 23
2003 237
2004 379
2005 345
2006 453
2007 1


....... mmmm, only 1 this year eh?? I don't think so. (yep, up to date PQ) Cool site though.

Astro.

Posted: 11 February 07 2:23 pm
by Cached
caughtatwork wrote:Hmmmmmmm.
Ahhhhhhhhhhh.
Gotcha.
I checked my result against an excel spreadsheet of all my individual distances and came out with the same number, so I'm happy that it works (even if I'm still unsure of exactly what I'm doing)

So this gets me:
40,560.54 total
6.54 median
37.18 average
180.64 std dev
Which means about 68% of your finds are within 180km.

See, nice useful statistic.

Posted: 11 February 07 2:47 pm
by dak's Emu Mob
348783.94 miles (561313.35 kilometres) (Excludes locationless and known travelling caches)

I couldn't resist fixing the spelling error (traveling -> travelling). :wink:

Cheers,
dak

Posted: 11 February 07 4:17 pm
by Team Falling Numerals
Cached wrote:
Which means about 68% of your finds are within 180km.

See, nice useful statistic.
within 180km of where?

home?
the cache found previously?
the cache found next?
the nearest pie shop?

Are we measuring cache to cache distance or home to cache distance? Need to make sure that the conclusions that we make for any statistic tie back to the population.

and is our population normally distributed - I see an argument that it would be quite significantly skewed towards shorter distances?

oh, head spins, time to lie down :shock:
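
To put the skew question in concrete terms, here is a small sketch with made-up, heavily skewed distances (lots of short hops and two long trips). It counts what share of the values fall within one standard deviation of the mean. For data like this the share comes out well above the 68% that only holds for a normal distribution.

```python
from statistics import mean, pstdev

# Hypothetical, heavily right-skewed step distances in km (illustrative only).
distances_km = [1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 400, 900]

mu = mean(distances_km)
sigma = pstdev(distances_km)

# Values whose distance from the mean is at most one standard deviation.
within_one_sd = [d for d in distances_km if abs(d - mu) <= sigma]
share = len(within_one_sd) / len(distances_km)

print(f"mean {mu:.2f} km, SD {sigma:.2f} km, share within one SD: {share:.0%}")
```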

Posted: 11 February 07 5:40 pm
by CraigRat
caughtatwork wrote:Yeah, except that the distances aren't stored in the tables now are they. That would make life a lot easier.

It's OK, I've worked it out. If you want a sneak peek, it's up in the SVN now. See if it's right for you.

Total distance between attempted caches: 	27,954.12 km (44,987.80 mi)
Median distance between attempted caches: 	19.95 km (32.11 mi)
Average distance between attempted caches: 	58.36 km (93.92 mi)
Standard deviation distance between attempted caches: 	134.53 km (216.50 mi)
My Dev data is a little dodgy, but it looks like it works ok.....
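
A quick sanity check on the units, offered as a sketch only: a kilometre figure should be the larger number in any km/mi pair, since one mile is 1.609344 km. In the dev output above the second number of each pair is about 1.609 times the first, so the km and mi labels may be swapped there. The pairs below are copied from that output; the function names are made up.

```python
KM_PER_MILE = 1.609344  # exact by definition of the international mile


def km_to_miles(km):
    return km / KM_PER_MILE


def miles_to_km(miles):
    return miles * KM_PER_MILE


# (first figure, second figure) as printed in the dev output above.
pairs = [(27954.12, 44987.80), (19.95, 32.11), (58.36, 93.92), (134.53, 216.50)]

for first, second in pairs:
    # A ratio near 1.609 means the second figure is the kilometre value.
    print(f"{first:>10.2f} -> {second:>10.2f}  ratio {second / first:.4f}")
```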

Posted: 11 February 07 7:31 pm
by JackHenry
Is '?' a word.