Stats idea - Top/Bottom caches by find rate

Discussion about the Geocaching Australia web site
Laighside Legends
10000 or more caches found
Posts: 1281
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Stats idea - Top/Bottom caches by find rate

Post by Laighside Legends » 19 May 20 11:11 pm

This might already exist but... (if so, where do I find it?)

I was looking for a list of caches with the highest finds per day, is that possible?
And also, the inverse of that: the caches with the least finds per day (or most days per find)?

Would need to exclude caches with zero finds and new caches (eg. less than 1 month old)

caughtatwork
Posts: 16207
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Stats idea - Top/Bottom caches by find rate

Post by caughtatwork » 20 May 20 10:03 am

This doesn't exist. We will need to check every geocache and order them, so it's a large dataset to manipulate, but it can most likely be done.

I need to know how you want to handle dodgy data, so if you could think about the following items, that will help me create the function to determine the range and rates.

We would need to calculate the "days" that the geocache is "active", so do you consider the hidden or published date? Then for those geocaches that do not have published logs should we assume the hidden date?

Do we need to count the days between "disabled" and "enabled" and remove them?

If we do not have an archived log, then what would you like to consider the "end date"? The last log of any type?

If a geocache was archived, then unarchived, do we count the days between and remove them? What if we only have one of those log types?

If a geocache is published, found and archived on the same day, is that 0 or 1 days? i.e. What is a day? e.g. Monday to Monday = 1? Then Monday to Tuesday is 2?

Are you interested in all geocaches or only those which are active?

Not being difficult, but I need to know what you would like for the ranges so I can just get it done.
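
The "what is a day?" question above comes down to whether both endpoints are counted. A minimal sketch of the two conventions, assuming plain calendar dates; the function name `active_days` and its `inclusive` flag are made up for illustration and are not part of the site's code:

```python
from datetime import date


def active_days(start: date, end: date, inclusive: bool = True) -> int:
    """Number of days a cache counts as active between start and end.

    inclusive=True counts both endpoints, so publishing and archiving
    on the same day is 1 day. inclusive=False is the plain date
    difference, which gives 0 for that case.
    """
    delta = (end - start).days
    return delta + 1 if inclusive else delta


monday = date(2020, 5, 18)   # 18 May 2020 was a Monday
tuesday = date(2020, 5, 19)

print(active_days(monday, monday))                   # 1
print(active_days(monday, tuesday))                  # 2
print(active_days(monday, monday, inclusive=False))  # 0
```

The inclusive convention has the side benefit that the day count is never zero, so a finds-per-day division cannot fail on a same-day publish and archive.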

Laighside Legends
10000 or more caches found
Posts: 1281
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Stats idea - Top/Bottom caches by find rate

Post by Laighside Legends » 20 May 20 9:24 pm

I'm not sure of the right answer without some trial and error to see what gives sensible results.

Probably just exclude archived caches since that solves a lot of the problems.

Start date would be publish date if it is available, hidden date if there is no publish date.
End date is the current date. (exclude archived caches so assume everything else is still in play)

I don't think it's worth worrying about disabled periods, they are usually too short to affect it? (but again, some trial and error might suggest otherwise)
Probably likewise for archived and then unarchived caches. (are there any of these that cause problems?)

The publish/archive on the same day problem can be solved by excluding new caches.
ie. only include caches where (end date - start date) is greater than 30 days. (the 30 days is just a guess here, might need to be adjusted depending on results)
This also solves the problem of a cache published at an event that gets twenty finds on the first day.
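
The rules proposed in this post can be sketched roughly as follows. This is only an illustration under assumptions: the `Cache` record, its field names, and the example cache codes are hypothetical, and the 30-day cutoff is the guess mentioned above rather than a tested value.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Cache:
    # Hypothetical record layout, for illustration only.
    code: str
    hidden: date
    published: Optional[date]
    finds: int
    archived: bool = False


def finds_per_day(cache: Cache, today: date, min_age_days: int = 30) -> Optional[float]:
    """Return finds per day, or None when the cache is excluded."""
    if cache.archived or cache.finds == 0:
        return None
    start = cache.published or cache.hidden  # publish date if available, else hidden date
    age_days = (today - start).days          # end date is the current date
    if age_days <= min_age_days:             # skip new caches (30 days is a guess)
        return None
    return cache.finds / age_days


today = date(2020, 4, 10)
caches = [
    Cache("EX1", hidden=date(2019, 12, 20), published=date(2020, 1, 1), finds=50),
    Cache("EX2", hidden=date(2020, 1, 1), published=None, finds=25),
    Cache("EX3", hidden=date(2020, 4, 1), published=date(2020, 4, 1), finds=20),  # too new
    Cache("EX4", hidden=date(2015, 1, 1), published=None, finds=0),               # no finds
]
rates = {c.code: finds_per_day(c, today) for c in caches}
ranked = sorted((code for code, r in rates.items() if r is not None),
                key=lambda code: rates[code], reverse=True)
print(ranked)  # ['EX1', 'EX2']
```

EX3 shows the event-cache case: twenty finds in nine days would top the list, but the minimum-age rule drops it before the rate is calculated.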

caughtatwork
Posts: 16207
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Stats idea - Top/Bottom caches by find rate

Post by caughtatwork » 21 May 20 9:33 am

Thanks for getting back to me. I'll have a look and see what I can do.

caughtatwork
Posts: 16207
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Stats idea - Top/Bottom caches by find rate

Post by caughtatwork » 21 May 20 2:08 pm

Have a look at this.
https://geocaching.com.au/stats/graphs/ ... ds_per_day

We're going to have weird results while the @home style virtual geocaches are still in play as they're extremely popular right now.

I didn't pursue the lowest as the lowest values are all going to have 4 decimals of zeros (e.g. 0.00003, 0.00002, etc) so it's not going to be useful unless we go into a lot of decimals and even then, there will be more than 50 with the same number, all pointing to a not useful outcome.
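
To illustrate the tie problem described here, a small sketch with made-up numbers (the cache names and day counts are invented, not taken from the site's data): three caches with one find each over different lifetimes all display the same value once the rate is shown to 4 decimal places.

```python
# (finds, active days) for three hypothetical, rarely found caches
caches = {"A": (1, 5000), "B": (1, 5500), "C": (1, 6000)}

raw = {name: finds / days for name, (finds, days) in caches.items()}
shown = {name: f"{rate:.4f}" for name, rate in raw.items()}

for name in caches:
    print(name, shown[name])   # every line shows 0.0002

print(len(set(raw.values())))    # 3 distinct underlying rates
print(len(set(shown.values())))  # 1 distinct displayed value
```

The underlying rates still differ, so the ties come from the display precision rather than from the data itself.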

Laighside Legends
10000 or more caches found
Posts: 1281
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Stats idea - Top/Bottom caches by find rate

Post by Laighside Legends » 21 May 20 4:37 pm

That's quite interesting, not the results I was expecting. It does seem a little skewed towards the newer caches (even if you ignore the @home ones) so maybe the 30 day limit needs to be increased - eg. to 100 days?

The difference between the caches with the most finds and the caches with the highest find rate is quite a bit bigger than I expected.

caughtatwork wrote:
21 May 20 2:08 pm
I didn't pursue the lowest as the lowest values are all going to have 4 decimals of zeros (e.g. 0.00003, 0.00002, etc) so it's not going to be useful unless we go into a lot of decimals and even then, there will be more than 50 with the same number, all pointing to a not useful outcome.
Would this work better if you flipped the division - ie. calculate days/finds instead of finds/days ? (and sort by largest days/finds value)

Laighside Legends
10000 or more caches found
Posts: 1281
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Stats idea - Top/Bottom caches by find rate

Post by Laighside Legends » 21 May 20 4:46 pm

Actually thinking about it further, it'll always be skewed towards the newer caches. Since the average find rate across all caches is higher now than it was 10 years ago. (and presumably the find rate will keep increasing again once coronavirus is over)

caughtatwork
Posts: 16207
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Stats idea - Top/Bottom caches by find rate

Post by caughtatwork » 22 May 20 9:38 am

Laighside Legends wrote:
21 May 20 4:37 pm
Would this work better if you flipped the division - ie. calculate days/finds instead of finds/days ? (and sort by largest days/finds value)
You get the wrong result. 5000 days / 1 find = 5000. 5000 days / 2 finds = 2500. This counts the wrong thing.

caughtatwork
Posts: 16207
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Stats idea - Top/Bottom caches by find rate

Post by caughtatwork » 22 May 20 9:41 am

Laighside Legends wrote:
21 May 20 4:46 pm
Actually thinking about it further, it'll always be skewed towards the newer caches. Since the average find rate across all caches is higher now than it was 10 years ago. (and presumably the find rate will keep increasing again once coronavirus is over)
The total number of finds goes up on older geocaches that are "popular", but the rate may not be sustainable as it then depends on people joining the game so they can find the geocache.
e.g. [Image], which tracks my most popular geocache in the Melbourne CBD. Great find total, but it's also been there 15 years :-)

Laighside Legends
10000 or more caches found
Posts: 1281
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Stats idea - Top/Bottom caches by find rate

Post by Laighside Legends » 22 May 20 7:14 pm

caughtatwork wrote:
22 May 20 9:38 am
Laighside Legends wrote:
21 May 20 4:37 pm
Would this work better if you flipped the division - ie. calculate days/finds instead of finds/days ? (and sort by largest days/finds value)
You get the wrong result. 5000 days / 1 find = 5000. 5000 days / 2 finds = 2500. This counts the wrong thing.
That looks correct to me?
One cache that averages 5000 days between finds and one cache that averages 2500 days between finds. The one with 5000 days/find is the one with the lowest find rate. What part of that is wrong?

caughtatwork
Posts: 16207
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Stats idea - Top/Bottom caches by find rate

Post by caughtatwork » 23 May 20 1:08 pm

Well, it's not the top finds per day. So it's not the right result for the graph that's been asked for. That number is the days per find.

Laighside Legends
10000 or more caches found
Posts: 1281
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Stats idea - Top/Bottom caches by find rate

Post by Laighside Legends » 23 May 20 6:31 pm

I'm confused.

The cache that is at "top days per find" will be the same cache that is at "bottom finds per day"? I think both of those will result in the same list of caches?
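
A quick way to check this claim is to sort the same caches both ways. The sketch below uses invented figures (the first two rows reuse the 5000 days / 1 find and 5000 days / 2 finds example from earlier in the thread) and assumes zero-find caches have already been excluded, since days/finds would otherwise divide by zero.

```python
# (name, finds, active days); hypothetical values for illustration
caches = [("A", 1, 5000), ("B", 2, 5000), ("C", 40, 4000), ("D", 3, 900)]

# "Bottom finds per day": smallest finds/days first
bottom_finds_per_day = sorted(caches, key=lambda c: c[1] / c[2])

# "Top days per find": largest days/finds first
top_days_per_find = sorted(caches, key=lambda c: c[2] / c[1], reverse=True)

print([c[0] for c in bottom_finds_per_day])  # ['A', 'B', 'D', 'C']
print([c[0] for c in top_days_per_find])     # ['A', 'B', 'D', 'C']
```

Because days/finds is the reciprocal of finds/days and both are positive, a smaller value of one always means a larger value of the other, so the two lists come out in the same order. Days per find also gives values such as 5000 and 2500 that stay distinct without needing many decimal places.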
