How far can you go...or have been

Post by **Shifter Brains** » 10 February 07 9:43 pm

3679.55 miles (5921.67 kilometres)
But this doesn't include the 70-odd GCA caches we have done.

Post by **caughtatwork** » 10 February 07 9:59 pm

Bear_Left wrote:
caughtatwork wrote:
CraigRat wrote:hey c@w, the cacher stats page looks lacking now

What do you think the computational load would be for a distance feature like that?
Will be available in the next release. Slightly different numbers due to differences in the circumference of the earth in our two calculations. Close enough for what the number represents.
An average (km/cache) and a median would be interesting.
The median should take care of the skewing of the figures by the frequent flyer cachers.

Your wish is my command.
Will be available in the next release.

By the way, according to my GC (only) stats from the new calculations:
40,560.54 total
6.54 median
37.18 average

Good suggestion.
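For anyone wanting to reproduce these figures at home, here is a minimal sketch of how a total, median and average cache-to-cache distance could be worked out. It is not the site's actual code: the haversine formula, the function names and the sample coordinates are all assumptions made for illustration. The `radius_km` parameter shows why two calculations can disagree slightly, since the result scales directly with whichever earth radius is plugged in (6371.0 km mean radius versus 6378.137 km equatorial radius differ by roughly 0.1%).

```python
import math
import statistics

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def step_distances(finds, radius_km=6371.0):
    """Distance from each find to the next, with finds in logged order."""
    return [haversine_km(*a, *b, radius_km=radius_km)
            for a, b in zip(finds, finds[1:])]

def summarise(steps):
    """Total, median and average of a list of step distances."""
    return {
        "total": sum(steps),
        "median": statistics.median(steps),
        "average": statistics.mean(steps),
    }

# Hypothetical finds (lat, lon): two in Melbourne, then two in Sydney.
finds = [(-37.8136, 144.9631), (-37.8200, 144.9800),
         (-33.8688, 151.2093), (-33.8700, 151.2100)]

steps = step_distances(finds)
summary = summarise(steps)
print(summary)
```

One interstate hop among short local steps drags the average well above the median, which is the skewing effect mentioned above.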

Post by **caughtatwork** » 10 February 07 10:03 pm

Wingaap wrote:It'd also be interesting to see the most economical cacher ie most caches in the least distance.

I would too, but I don't think that's going to be possible.

The problem becomes one of having to calculate the incremental step distance for every single log in the database in order to compare the distances between cachers.

With over 450,000 (and growing daily) logs, the CPU to dedicate to this type of calculation would be too significant.

inatn.com can do some of this differently to us here as they calculate everything based on a loaded file, so they only have to do the calculation once. As we get logs throughout the day, we would need to recalculate it every time and I just don't think we can afford the CPU time.

Great idea, but I think impractical at this stage.
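One possible way around the recalculation cost, sketched purely as an illustration and not as anything the site does: keep a running total per cacher and add just one step when a new log arrives. The class and method names are hypothetical, and the sketch assumes logs arrive in date order. A backdated log would still mean recomputing that cacher's whole chain.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) pairs in decimal degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

@dataclass
class RunningDistance:
    """Per-cacher running total (hypothetical structure for this sketch)."""
    last: Optional[Tuple[float, float]] = None
    total_km: float = 0.0
    finds: int = 0

    def add_find(self, position):
        # Only the step from the previous find is computed, so each new
        # log costs one distance calculation instead of a full recount.
        if self.last is not None:
            self.total_km += haversine_km(self.last, position)
        self.last = position
        self.finds += 1

    def finds_per_km(self):
        # "Economy": more caches in less distance gives a higher number.
        return self.finds / self.total_km if self.total_km > 0 else None

cacher = RunningDistance()
for pos in [(-37.8136, 144.9631), (-37.8200, 144.9800), (-37.8300, 144.9900)]:
    cacher.add_find(pos)
print(cacher.total_km, cacher.finds_per_km())
```

The trade-off is extra stored state per cacher in exchange for constant work per log.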

Post by **Cached** » 10 February 07 10:06 pm

An average (km/cache) and a median would be interesting.
The median should take care of the skewing of the figures by the frequent flyer cachers.

Can we have standard deviation as well?

Post by **caughtatwork** » 10 February 07 10:23 pm

Cached wrote:
An average (km/cache) and a median would be interesting.
The median should take care of the skewing of the figures by the frequent flyer cachers.
Can we have standard deviation as well?

Sheesh! I only just learnt what a median was.
What's a standard deviation?

Post by **Cached** » 10 February 07 10:53 pm

Standard Deviations are a really important statistical tool.

From wikipedia:

Another way of seeing it is to consider sports teams. In any set of categories, there will be teams that rate highly at some things and poorly at others. Chances are, the teams that lead in the standings will not show such disparity, but will be pretty good in most categories. The lower the standard deviation of their ratings in each category, the more balanced and consistent they might be. So, a team that is consistently bad in most categories will have a low standard deviation indicating they will probably lose more often than win.

People with a large standard deviation (SD) do more long-distance trips than those with a smaller SD, whose distances vary less.

Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for 68.27% of the set; while two standard deviations from the mean (blue and brown) account for 95.45%; and three standard deviations (blue, brown and green) account for 99.73%.
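To make the definition concrete, here is a small sketch that computes a standard deviation by hand: take the mean, average the squared differences from it, then take the square root. The two lists of distances are made up for illustration; they are not anyone's real stats.

```python
import math

def population_sd(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

# Hypothetical step distances in km.
local_cacher = [2.0, 3.0, 2.5, 4.0, 3.5]    # always caches close by
traveller = [2.0, 3.0, 2.5, 4.0, 900.0]     # same, plus one long trip

print(population_sd(local_cacher))
print(population_sd(traveller))
```

A single long trip is enough to make the second figure far larger than the first.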

If none of this makes sense, I'll have another attempt tomorrow!

Post by **caughtatwork** » 10 February 07 11:02 pm

Hmmmmmmm.
Ahhhhhhhhhhh.
Gotcha.
I checked my result against an Excel spreadsheet of all my individual distances and came out with the same number, so I'm happy that it works (even if I'm still unsure of exactly what I'm doing).

So this gets me:
40,560.54 total
6.54 median
37.18 average
180.64 std dev

Post by **CraigRat** » 10 February 07 11:04 pm

Could be easy:
from: http://dev.mysql.com/doc/refman/5.0/en/ ... tions.html

Code:

#

STD(expr) STDDEV(expr)

Returns the population standard deviation of expr. This is an extension to standard SQL. The STDDEV() form of this function is provided for compatibility with Oracle. As of MySQL 5.0.3, the standard SQL function STDDEV_POP() can be used instead.

These functions return NULL if there were no matching rows.
#

STDDEV_POP(expr)

Returns the population standard deviation of expr (the square root of VAR_POP()). This function was added in MySQL 5.0.3. Before 5.0.3, you can use STD() or STDDEV(), which are equivalent but not standard SQL.

STDDEV_POP() returns NULL if there were no matching rows.
#

STDDEV_SAMP(expr)

Returns the sample standard deviation of expr (the square root of VAR_SAMP()). This function was added in MySQL 5.0.3.

STDDEV_SAMP() returns NULL if there were no matching rows.

EDIT: You beat me to it....
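The difference between the population and sample versions quoted above comes down to the divisor: N for the population form, N - 1 for the sample form. A minimal sketch using Python's standard `statistics` module (not MySQL) shows the same split, with `pstdev` playing the role of the population form and `stdev` the sample form. The data is invented for the example.

```python
import math
import statistics

# Hypothetical step distances in km.
distances = [2, 4, 4, 4, 5, 5, 7, 9]

population = statistics.pstdev(distances)  # divides by N
sample = statistics.stdev(distances)       # divides by N - 1

print(population, sample)
```

For a cacher's complete log history the population form seems the natural fit, since every find is included rather than a sample of them. With hundreds of finds the two values are nearly identical anyway.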

Post by **caughtatwork** » 10 February 07 11:06 pm

Yeah, except that the distances aren't stored in the tables now, are they? That would make life a lot easier.

It's OK, I've worked it out. If you want a sneak peek, it's up in the SVN now. See if it's right for you.

Post by **TeamAstro** » 10 February 07 11:40 pm

for the record:

Approximate cache-to-cache distance: 67551.35 miles (108713.36 kilometres) (Excludes locationless and known traveling caches)

Active Caches: 1213 of the caches you've found are still active (84.9%)

Average log size: 67.4 words - Biggest log: 708 words - Shortest log: 1 word - Number of one-word logs: 1

Geeez, that's a lot of K's.

Year Total
2002 23
2003 237
2004 379
2005 345
2006 453
2007 1

....... mmmm, only 1 this year eh?? I don't think so. (yep, up to date PQ) Cool site though.

Astro.

Post by **Cached** » 11 February 07 2:23 pm

caughtatwork wrote:Hmmmmmmm.
Ahhhhhhhhhhh.
Gotcha.
I checked my result against an Excel spreadsheet of all my individual distances and came out with the same number, so I'm happy that it works (even if I'm still unsure of exactly what I'm doing).

So this gets me:
40,560.54 total
6.54 median
37.18 average
180.64 std dev

Which means about 68% of your finds are within 180km.

See, nice useful statistic.

Post by **dak's Emu Mob** » 11 February 07 2:47 pm

348783.94 miles (561313.35 kilometres) (Excludes locationless and known travelling caches)
I couldn't resist fixing the spelling error (traveling -> travelling).

Cheers,
dak

Post by **Team Falling Numerals** » 11 February 07 4:17 pm

Cached wrote:
Which means about 68% of your finds are within 180km.

See, nice useful statistic.

within 180km of where?

home?
the cache found previously?
the cache found next?
the nearest pie shop?

Are we measuring cache-to-cache distance or home-to-cache distance? Need to make sure that the conclusions that we make for any statistic tie back to the population.

and is our population normally distributed - I see an argument that it would be quite significantly skewed towards shorter distances?
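The skew question can be checked directly rather than assumed. With the figures posted earlier (37.18 average, 180.64 std dev, 6.54 median), the average minus one standard deviation is negative, so "within one SD of the average" really means step distances between 0 and about 217.82 km, and the median sitting far below the average points to a right-skewed spread. The sketch below counts the share of steps inside one SD for a deliberately skewed, made-up data set; the function name and the numbers are assumptions for illustration only.

```python
import statistics

def share_within_one_sd(distances):
    """Fraction of values lying within one population SD of the mean."""
    mean = statistics.mean(distances)
    sd = statistics.pstdev(distances)
    inside = [d for d in distances if abs(d - mean) <= sd]
    return len(inside) / len(distances)

# Hypothetical step distances in km: mostly short hops, a few long trips.
steps = [1.0] * 90 + [50.0] * 8 + [1500.0] * 2

share = share_within_one_sd(steps)
print(share)  # 0.98 for this made-up data, well above the 68% of a normal curve
```

So the 68% rule of thumb only holds if the distances are roughly normally distributed, and it is worth counting the actual share before quoting it.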

oh, head spins, time to lie down

Post by **CraigRat** » 11 February 07 5:40 pm

caughtatwork wrote:Yeah, except that the distances aren't stored in the tables now are they. That would make life a lot easier.

It's OK, I've worked it out. If you want a sneak peek, it's up in the SVN now. See if it's right for you.

Code:

Total distance between attempted caches: 	27,954.12 km (44,987.80 mi)
Median distance between attempted caches: 	19.95 km (32.11 mi)
Average distance between attempted caches: 	58.36 km (93.92 mi)
Standard deviation distance between attempted caches: 	134.53 km (216.50 mi)

My Dev data is a little dodgy, but it looks like it works ok.....
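A side note on the units in that output: the bracketed figures equal the km figures multiplied by 1.609344 (27,954.12 × 1.609344 ≈ 44,987.80), so it looks as though the conversion may have run in the miles-to-km direction. That could simply be part of the dodgy dev data. A minimal sketch of the conversion in both directions, with hypothetical function names:

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile

def km_to_miles(km):
    """Convert kilometres to miles (divide by km per mile)."""
    return km / KM_PER_MILE

def miles_to_km(miles):
    """Convert miles to kilometres (multiply by km per mile)."""
    return miles * KM_PER_MILE

def format_distance(km):
    """Render a distance as 'x km (y mi)' with thousands separators."""
    return f"{km:,.2f} km ({km_to_miles(km):,.2f} mi)"

print(format_distance(27954.12))
```

A quick sanity check is that the mile figure should always be the smaller of the two.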

Post by **JackHenry** » 11 February 07 7:31 pm

Is '?' a word?