Stats Graphs

Discussion about the Geocaching Australia web site
caughtatwork
Posts: 17015
Joined: 17 May 04 12:11 pm
Location: Melbourne

Post by caughtatwork » 06 August 05 9:08 pm

Hi Guys.

I enjoyed creating the stats banners so as a possible enhancement to the website, I've also written this:

<<EDIT: REMOVE OLD GRAPHS THAT ARE NOW BROKEN>>

From my database (at the moment of course) it pulls all my finds by month and then in real time it will generate the graph. The same code (with a different MySQL query) does the job for Terrain and Difficulty.

It scales itself based on the highest value it gets and it's very much customisable as far as fonts, text sizes and colors go. You also pass it a string for the title display so you can use the same code over and over.

It's a bar graph rather than a column graph, simply because for cachers that have been around for a couple of years, it ends up pushing a horizontal scroll bar as the volume of data increases. Using a bar graph means the width can be set and it will increase in height which stops annoying scrolling to the right.

It's a PNG at the moment, but there is some commented code that allows it to be output as a GIF instead.

All it needs is two arrays: one containing the descriptors (eg. dates or terrain / difficulty ratings) and one containing the counts / values. So you could use it to represent all finds across the board by month or by cacher, and as long as you set up the queries it could also provide the graph for terrain and difficulty ratings.

There's probably a multitude of uses for the graphs; as long as you feed it two arrays, it should pretty much do anything.
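
The two-array idea and the self-scaling can be sketched roughly as follows. This is a minimal illustration in Python, not the code offered in this post; the function name `layout_bars`, its pixel parameters and the sample data are all assumptions made up for the example.

```python
def layout_bars(labels, values, max_bar_px=400, bar_height_px=18, gap_px=4):
    """Return one (label, x0, y0, x1, y1) rectangle per entry.

    Hypothetical helper: bar lengths are scaled against the largest value,
    and each extra entry adds height rather than width.
    """
    if len(labels) != len(values):
        raise ValueError("labels and values must be the same length")
    peak = max(values, default=0)
    bars = []
    for row, (label, value) in enumerate(zip(labels, values)):
        # The longest bar spans the full width; the rest are proportional.
        length = round(max_bar_px * value / peak) if peak else 0
        top = row * (bar_height_px + gap_px)
        bars.append((label, 0, top, length, top + bar_height_px))
    return bars


# Made-up sample data: descriptors in one array, counts in the other.
months = ["2005-05", "2005-06", "2005-07"]
finds = [4, 10, 5]
bars = layout_bars(months, finds)
```

Because the width is fixed and only the number of rows changes, a longer history makes the image taller instead of forcing a horizontal scroll bar.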

If you're interested, just tell me and I'll PM you the code.

Regards.
caughtatwork.
Last edited by caughtatwork on 12 August 05 9:46 pm, edited 1 time in total.

ideology
Posts: 2763
Joined: 28 March 03 4:01 pm
Location: Sydney

Post by ideology » 07 August 05 2:31 am

excellent stuff thanks!
we like the way the bars have the value written inside them
did you see the scatterplots we did on the other server which showed the correlation between terrain and difficulty (and the giant blob that represented the 1/1s)?
we will probably move the current stats from /stats to /stats/summary so we can fit in /stats/graph or similar to start these types of plots
is there any way we could do it as columns with a scalable x-axis? that way time would flow to the right which is a more common representation
we would also like to do line graphs, particularly for /stats/graph/cacher/Ideology or whatever which could possibly display a line graph of cumulative finds
we have half got the banners working - they now query fields in the user table rather than try to count caches each time. we are just hooking up a routine to recount the hides and finds. once that is done the banners will be live. the only remaining problem is that our ad-blocking software filters out anything with "banner" in the URL. oops!

Post by caughtatwork » 07 August 05 5:09 pm

The main reason I initially stayed away from a traditional column graph is that over time the x-axis will grow, which means the columns get narrower and the available space to draw the x-axis dates gets smaller.

A quickie test reveals that this may be a concern around the year 2010, about 10 years after the first caches were hidden, so it's not as bad as I initially thought. The text still gets pretty small, but is quite readable as a PNG.

<<EDIT: REMOVE OLD GRAPHS THAT ARE NOW BROKEN>>

The x-axis scales as the number of x-axis entries gets larger.
An example with made up data going back to January 2001.
You'll see the columns get narrower and if the label doesn't fit nicely with no rotation, it gets rotated 90 degrees.
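
The narrowing and rotation behaviour can be sketched like this. It is only an illustration in Python: `layout_columns`, the fixed per-character width and the choice to rotate all labels together are assumptions, not details taken from the actual graph code.

```python
def layout_columns(labels, plot_width_px=600, char_width_px=6, padding_px=2):
    """Return (column_width, rotate_labels) for a fixed-width column graph.

    Hypothetical helper: assumes a fixed-width font so that a label's
    pixel width is simply its length times char_width_px.
    """
    if not labels:
        return 0, False
    # Every extra entry shares the same plot width, so columns get narrower.
    column_width = plot_width_px / len(labels)
    widest_label = max(len(label) for label in labels) * char_width_px
    # Rotate the labels 90 degrees once the widest one no longer fits flat.
    rotate = widest_label + padding_px > column_width
    return column_width, rotate


one_year = layout_columns(["200501"] * 12)
five_years = layout_columns(["200501"] * 60)
```

With twelve entries the labels still fit flat; with sixty the columns are narrower than the text, so the labels would be turned on their side.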

<<EDIT: REMOVE OLD GRAPHS THAT ARE NOW BROKEN>>

You'll probably notice that the y-axis labels are not exactly aligned. Not sure why this is. I used the correct method to identify their size. It looks like 1/2 pixels are causing this type of problem. Same as for some of the data labels in the last example. 1/2 pixels cause them to align with the edge of the column. Maybe white isn't the best color, but the color is easily customisable.

For these tests there are actually 4 files because each file contains the MySQL query. I'm sure we can get it working with the query run outside of the graph code so the graph code only needs to exist once. Maybe via session data, or possibly by running the SQL and then immediately calling the graph code to output the results.
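
One way to picture that split is a query function that returns the two arrays and a drawing function that never sees any SQL. The sketch below is in Python and uses an in-memory SQLite database purely as a stand-in for MySQL; the `finds` table, its columns, the sample rows and both function names are invented for the illustration.

```python
import sqlite3


def finds_by_month(conn):
    """Run the query once and return the two arrays the graph code expects."""
    rows = conn.execute(
        "SELECT substr(found_on, 1, 7) AS month, COUNT(*) "
        "FROM finds GROUP BY month ORDER BY month"
    ).fetchall()
    labels = [month for month, _ in rows]
    values = [count for _, count in rows]
    return labels, values


def draw_graph(title, labels, values):
    """Placeholder for the drawing code: it only receives a title and two arrays."""
    return {"title": title, "labels": labels, "values": values}


# Hypothetical schema and made-up rows, just to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finds (cache_name TEXT, found_on TEXT)")
conn.executemany(
    "INSERT INTO finds VALUES (?, ?)",
    [("a", "2005-06-04"), ("b", "2005-06-18"), ("c", "2005-07-02")],
)
labels, values = finds_by_month(conn)
graph = draw_graph("Finds by month", labels, values)
```

A terrain or difficulty graph would then need only a different query function, with the drawing code left untouched.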

Well, that's most of today gone :-)

I'll have a look at a line graph during the week. Shouldn't be too hard now that I have the basics of the drawing down. It will probably be in a different file though. I'm not sure I would want to put all of the graph types in the one file. I think it would end up a horrid mess.

I'll get back to you.

caughtatwork.
Last edited by caughtatwork on 12 August 05 9:46 pm, edited 1 time in total.

Post by caughtatwork » 07 August 05 8:12 pm

Wheeeeeeeeeee!

<<EDIT: REMOVE OLD GRAPHS THAT ARE NOW BROKEN>>

Line graph. Again, as long as you feed it two arrays, it will plot what you want. So if you can set up the array to be cumulative, then the linegraph should display as it accumulates.
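
Turning a per-month array into a cumulative one is a small step. A minimal Python sketch, with made-up numbers and variable names:

```python
from itertools import accumulate

# Made-up sample data: finds in each month.
months = ["2005-05", "2005-06", "2005-07"]
finds_per_month = [4, 10, 5]

# Running total: each entry is the sum of all finds up to and including that month.
cumulative_finds = list(accumulate(finds_per_month))
```

Feeding `months` and `cumulative_finds` to the line graph instead of the raw monthly counts would give a line that only ever climbs.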

I made this one 600x400 just to show you what happens when you squeeze the graph down. Still looks pretty readable. The line and bar graphs are customisable so you can set the height and width as you need.

This one didn't take long at all. The hardest part was trying to work out how ImageFilledPolygon worked.

I'll think about your scatter graph for 1/1, 1/1.5 etc too. I've got to read up on ImageFilledEllipse to see how that works.

Enjoy.
caughtatwork.
Last edited by caughtatwork on 12 August 05 9:47 pm, edited 1 time in total.

Post by caughtatwork » 07 August 05 11:32 pm

Oooooooooooh, ugly!

<<EDIT: REMOVE OLD GRAPHS THAT ARE NOW BROKEN>>

Don't know what it used to look like but this is what I came up with.

The most obvious ugly is the 1/1 bubble. It will be huge compared to the remainder. You need to squint pretty darn close to see that there are actually some other combinations that I've only found 1 of. eg. 4.5/5

I haven't done the x-axis and y-axis titles yet, but given this looks so ugly I'm not sure I want to continue. God knows what it would look like if I used maccamob's stats. The 1/1 bubble would take over the page. I can't really decrease the size of this either, because if I do then, at relative size, the combinations that have only been found once would have a spot less than 1 pixel by 1 pixel, which won't be seen.

Any suggestions?
Last edited by caughtatwork on 12 August 05 9:47 pm, edited 1 time in total.

swampgecko
It's all in how you get there....
Posts: 2185
Joined: 28 March 03 6:00 pm

Post by swampgecko » 08 August 05 7:08 am

caughtatwork wrote:
Oooooooooooh, ugly!

Don't know what it used to look like but this is what I came up with.

The most obvious ugly is the 1/1 bubble.

That pretty well sums it up, and that is a very familiar bubble pattern...

Post by ideology » 08 August 05 9:01 am

awesome stuff!
perhaps the size of the bubbles could be a log scale?
we'll be in touch to see how we can best get you access to the live data

Post by caughtatwork » 08 August 05 12:54 pm

A little time at morning tea this morning and I've had a look at the scale.

A log scale for the bubbles doesn't quite work. The Log10 of the larger numbers makes them just too small, while the Log10 of the smaller numbers all ends up somewhere around the same size. Maybe I don't understand logs :-)

I made a quick and dirty change to take the square root of the number and then multiply it by 5 (just to give a better visual). This gives a better result (see above). It may be misleading though as a bubble that is twice the size of another bubble is not simply double the number of finds.

eg.
sqrt(144) = 12, and 12 * 5 = 60, so you get a radius of 60.
sqrt(400) = 20, and 20 * 5 = 100, so you get a radius of 100.

So even though the second bubble looks around twice the size of the first, it represents way more than double the actual finds of the first.

Should work well for maccamob though :-)
Assume 1600 1/1s: sqrt(1600) = 40, and 40 * 5 = 200, so the bubble would have a radius of 200. Still quite large but probably viewable.
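
The arithmetic above, next to the log scale that was tried first, can be sketched in Python as follows. The function names are made up, and applying the same factor of 5 to the log version is an assumption made only so the two are comparable.

```python
import math


def bubble_radius(count, scale=5):
    """Square-root scaling: the radius grows with the square root of the count."""
    return scale * math.sqrt(count)


def log_radius(count, scale=5):
    """Log10 scaling, shown for comparison only (count must be at least 1)."""
    return scale * math.log10(count)


radii = [bubble_radius(n) for n in (144, 400, 1600)]
```

On the log scale a single find gets a radius of 0 and 1600 finds only about 16, which matches the complaint that everything ends up small and much the same size.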

I might need to do a little testing to see what very large numbers look like, but I'm pretty sure they will look OK.

Just on an aesthetic thing, the bubbles are aligned in the x-axis space, but on the y-axis line. Do you think I should keep the gridlines the same but move the x-axis descriptors and therefore the bubble onto the gridline?

muzza
2500 or more caches found
Posts: 354
Joined: 05 April 03 7:00 pm
Location: Melbourne Australia

Post by muzza » 08 August 05 2:18 pm

Using the square root is actually the best thing to do. Because the area of a circle is pi x r squared, the area of the drawn circles represents the number found.

For 144 finds, the area is pi x 60 x 60 = 11310
For 400 finds, the area is pi x 100 x 100 = 31416

The ratio of these areas is about 1:2.778
The ratio of 144 to 400 is also about 1:2.778
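
Written out in general, assuming the radius rule from the earlier post (the square root of the number of finds n, times 5), the area of each bubble is directly proportional to n:

```latex
\[
r = 5\sqrt{n}
\quad\Longrightarrow\quad
A = \pi r^{2} = \pi \left(5\sqrt{n}\right)^{2} = 25\pi n
\]
\[
\frac{A_{400}}{A_{144}} = \frac{25\pi \cdot 400}{25\pi \cdot 144} = \frac{400}{144} = \frac{25}{9} \approx 2.78
\]
```

The factor of 5 cancels out of the ratio, so it changes how big the bubbles are drawn but not how they compare to each other.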

Post by caughtatwork » 08 August 05 3:17 pm

Thanks Muzza. I should have thought to ask Mr. Puzzlemania :D

Post by caughtatwork » 08 August 05 11:36 pm

I've realigned the bubble graph such that the bubbles now sit in the grid space rather than on the grid lines. I know this is exactly the opposite to the way Excel presents data, but I reckon it's easier to read (and I hate Microsoft products more and more just recently).
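
Sitting a bubble in the grid space rather than on a grid line comes down to a half-cell offset. A tiny Python sketch, where `cell_centre` and its parameters are hypothetical names chosen for the example:

```python
def cell_centre(column_index, row_index, cell_width_px, cell_height_px,
                origin_x=0, origin_y=0):
    """Return the pixel centre of a grid cell, counting cells from 0."""
    # Adding half a cell moves the bubble off the grid line and into the middle of the cell.
    x = origin_x + (column_index + 0.5) * cell_width_px
    y = origin_y + (row_index + 0.5) * cell_height_px
    return x, y


first = cell_centre(0, 0, 40, 30)
other = cell_centre(2, 1, 40, 30)
```

Dropping the `+ 0.5` would put the bubbles back on the grid lines.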

I've also enabled the bubble graph to have two parameters passed as the first array. This way you can not only use it for Terrain / Difficulty bubbling but others as well.

eg. You could have States down the side and Date along the bottom. Then you would be able to see how many (according to the bubble size) had been found by date within state.

Mocked up data of course, but you can see from the example that on 200507 NT, Tas, WA and Vic all had around the same number of finds. You can also see that for 200503 NSW was the outright leader in number of finds. This is just an example. Any two axes you can think of can be displayed.
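
Passing two parameters per entry in the first array might look something like the Python sketch below. The pair layout, the `build_bubble_grid` name and the numbers are all assumptions for illustration, and the data is mocked up just as it is in the example graph.

```python
from collections import defaultdict


def build_bubble_grid(keys, counts):
    """keys is a list of (row, column) pairs; counts is the matching list of values."""
    grid = defaultdict(int)
    for (row, column), count in zip(keys, counts):
        grid[(row, column)] += count
    # The distinct values of each parameter become the two axes.
    rows = sorted({row for row, _ in keys})
    columns = sorted({column for _, column in keys})
    return rows, columns, dict(grid)


# Mocked-up data: state down the side, month along the bottom.
keys = [("NSW", "200503"), ("Vic", "200503"), ("Vic", "200507"), ("WA", "200507")]
counts = [90, 20, 30, 30]
rows, columns, grid = build_bubble_grid(keys, counts)
biggest = max(grid, key=grid.get)
```

Swapping the state for terrain and the month for difficulty gives the original terrain / difficulty bubbles from the same structure.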

Saves you having to interpret numbers. You just look for the biggest bubble :D

<<EDIT: REMOVE OLD GRAPHS THAT ARE NOW BROKEN>>

OK. I'm getting carried away, but I think the visualisation of the stats is interesting. And yes, if you do care to ask, I like my numbers :D
Last edited by caughtatwork on 12 August 05 9:53 pm, edited 1 time in total.

The Ginger Loon
450 or more roots tripped over
Posts: 824
Joined: 28 March 03 9:09 pm
Location: Tamworth

Post by The Ginger Loon » 09 August 05 10:08 pm

This has been a fascinating thread and I await the final outcome and implementation into the GCA site, but I have one little question.

In the previous example (State vs Date) how come there are so many finds for October05, when it's still only August05?

Post by caughtatwork » 09 August 05 11:12 pm

'tis mocked up data.
I just threw the data together to show what 'could' be achieved rather than a reflection of reality.

The real data comes directly from my own site database, so it bears a striking similarity to my stats at GCA :D

Occasionally you will notice the graphs above changing. As they are redrawn in realtime from my database and code, whenever I release something new, the graphs may get redrawn and look different.

This may negate some of the previous comments I've made.

For example (in a few minutes) you should see that the x and y axes now have titles. This is a small enhancement that I think is working well enough to show.

caughtatwork.

Post by ideology » 10 August 05 1:15 am

cool
we are investigating ways of opening up the database for this type of thing
that way you could run the stats on the live data here
we've never done it before so we're trying to suss it out

Post by caughtatwork » 10 August 05 11:38 pm

You can open your database through your CPanel (if you have one). In the MySQL Database section add my URL (http://www.caughtatwork.net) to the Access Hosts.

I would then strongly recommend granting READ ONLY access to a userid and password combo that you could PM me.

If you're not using CPanel, ummm... you'll need a Linux expert (which ain't me).

From there I should be able to remote connect to the database. Not sure how this is done, but I can easily read up on it.

Not sure you would want to use any wildcards in your granting of Access Hosts. Every man and his dog could then hit your DB like nothing.

I'm not even sure I want to do this. I write really crappy SQL statements. They get the job done, but probably suck CPU like a $2 whore :shock:

Maybe it would be better if you gave me access to the test environment. I can be responsible (when I have to). I could give you a cash deposit to be forfeited if I crash anything :D

The graphs are pretty much ready. There's one or two small things for me to look at (like the space after the X axis titles when they are not rotated 90 degrees) and some naming conventions that I need to get a better handle on. I need to get a color check on the bubble graph too. The bubbles need to be lighter than the text so when I draw the text on top of the bubbles they can still be seen. I'm not sure about the color I've chosen, but as it's parameter driven, this can be changed easily.

Any other comments or feedback? Not sure whether I can do everything (or anything) but I'll certainly have a look.

Anyone? Anyone? Bueller?

Regards.
C@W
