Report Site Issues Here

Discussion about the Geocaching Australia web site
Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 28 February 19 11:49 am

Caches with names containing non-ASCII characters still seem to be causing us problems. For example, see https://geocaching.com.au/cache/gc7xan2

The name starts with a non-ASCII character, but for some reason the rest of the name is missing too. The individual GPX file from Groundspeak shows the entire name correctly in Unicode, but when I import it into GCA something goes wrong and the entire name is lost. (GSAK was not involved at all, so that's not the problem.)

I tried it locally on my machine: the importer swapped the non-ASCII character(s) for question marks and left the rest of the name intact. Not the best solution, but it's probably OK. So why doesn't the website do the same?
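For comparison, the local importer's behaviour (each non-ASCII character becomes a question mark, the rest of the name survives) can be reproduced with a minimal Python sketch. This assumes the importer replaces decoded characters rather than raw bytes; if it worked on raw bytes, the 4-byte emoji would become four question marks instead of one. The function name and the sample cache name are hypothetical.

```python
def to_ascii_with_placeholders(name: str) -> str:
    """Replace each non-ASCII character with '?' and keep the rest of the name."""
    # errors="replace" makes the ASCII codec emit b"?" for every character it cannot encode
    return name.encode("ascii", errors="replace").decode("ascii")


# Hypothetical cache name that starts with U+1F984 (the unicorn emoji)
print(to_ascii_with_placeholders("\U0001F984 Unicorn Trail #1"))  # → ? Unicorn Trail #1
```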

caughtatwork
Posts: 16115
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Report Site Issues Here

Post by caughtatwork » 28 February 19 12:52 pm

It's not a UTF-8 character, so:
a. The GPX file should not contain it, as the character encoding is UTF-8.
b. Our DB is UTF-8 encoded, so it should not store it.
Anything after that is up for grabs.
Please email Groundspeak and advise them to stop using non-UTF-8 characters in their UTF-8 encoded file.

Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 28 February 19 3:27 pm

Are you sure? The first character in the example above appears to be a valid UTF-8 character to me.

The 4 bytes in the GPX file are:
f0 9f a6 84

Which is exactly how U+1F984 is supposed to be encoded (according to https://en.wikipedia.org/wiki/UTF-8)
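The four bytes can be checked against the UTF-8 bit layout for code points U+10000 to U+10FFFF (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx). A short Python sketch that packs the code point's bits into that pattern by hand and compares the result with Python's built-in encoder; the function name is hypothetical.

```python
def encode_4byte_utf8(code_point: int) -> bytes:
    """Pack a code point in U+10000..U+10FFFF into the 4-byte UTF-8 pattern."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use 4 bytes")
    return bytes([
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: low 6 bits
    ])


manual = encode_4byte_utf8(0x1F984)
print(manual.hex(" "))                         # → f0 9f a6 84
print(manual == "\U0001F984".encode("utf-8"))  # → True
```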

Can you explain which part of that you think doesn't fit the UTF-8 specs?

caughtatwork
Posts: 16115
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Report Site Issues Here

Post by caughtatwork » 28 February 19 3:43 pm

utf8 is limited to the 1- to 3-byte utf8 codes. This leaves out Emoji.

Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 28 February 19 3:53 pm

Are we using an old version of UTF-8 or something?

Every reference I look at says UTF-8 can use up to 4 bytes per character, which suggests that the GPX file meets the standard.
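The references agree: UTF-8 uses 1 to 4 bytes per character depending on the code point, and only code points above U+FFFF (which includes most emoji) need the 4-byte form. A quick Python check; the sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Arbitrary samples: ASCII letter, accented Latin letter, euro sign, unicorn emoji
for ch in ["A", "\u00e9", "\u20ac", "\U0001F984"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")
# U+0041: 1 byte(s)
# U+00E9: 2 byte(s)
# U+20AC: 3 byte(s)
# U+1F984: 4 byte(s)
```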

CraigRat
850 or more found!!!
Posts: 6944
Joined: 23 August 04 3:17 pm
Twitter: CraigRat
Facebook: http://facebook.com/CraigRat
Location: Launceston, TAS

Re: Report Site Issues Here

Post by CraigRat » 28 February 19 3:56 pm

Our instance of MySQL/MariaDB uses utf8. There is a utf8mb4 format that extends it to 4-byte characters, but I don't know what the ramifications are if we change our encoding.

Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 28 February 19 4:12 pm

CraigRat wrote:Our instance of MySQL/MariaDB uses utf8. There is a utf8mb4 format that extends it to 4-byte characters, but I don't know what the ramifications are if we change our encoding.
Ah, that makes sense. It looks like MySQL only started supporting 4-byte UTF-8 in 2010. I guess we should probably think about upgrading at some point...

Or, at the very least, filter out any 4-byte characters before they get to the database.
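Filtering before the database could look like the minimal Python sketch below. It assumes "4-byte character" means any code point above U+FFFF, which is exactly the set MySQL's 3-byte utf8 cannot store; the function name and sample name are hypothetical. Some emoji are built from several code points (for example with U+FE0F variation selectors or U+200D joiners), and those 3-byte leftovers survive this filter.

```python
def strip_4byte_chars(name: str) -> str:
    """Remove characters that need 4 bytes in UTF-8, keeping the rest of the name."""
    # Code points above U+FFFF are the ones UTF-8 encodes with 4 bytes
    kept = "".join(ch for ch in name if ord(ch) <= 0xFFFF)
    # Trim the space left at either end where an emoji was removed
    return kept.strip()


print(strip_4byte_chars("\U0001F984 Unicorn Trail #1"))  # → Unicorn Trail #1
```

Unlike the question-mark approach, this drops the emoji entirely; either way the rest of the name survives.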

caughtatwork
Posts: 16115
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Report Site Issues Here

Post by caughtatwork » 28 February 19 5:55 pm

Laighside Legends wrote:
CraigRat wrote:Our instance of MySQL/MariaDB uses utf8. There is a utf8mb4 format that extends it to 4-byte characters, but I don't know what the ramifications are if we change our encoding.
Ah, that makes sense. It looks like MySQL only started supporting 4-byte UTF-8 in 2010. I guess we should probably think about upgrading at some point...

Or, at the very least, filter out any 4-byte characters before they get to the database.
As a senator, please read the Senate thread titled "Remove 4 byte UTF-8 emoj" and see what the Senate decided to do.

Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 28 February 19 11:00 pm

It's a bit all over the place, but this seemed to be the conclusion in the thread from 2016:
Long term:
I think we should update to UTF8MB4 but that requires the new server and some time.

Short term:
Strip the emoji on input and avoid the garbage from GC.

Neither of these seems to have actually happened, though. Instead of stripping out the 4-byte characters, the entire name gets removed.
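One possible explanation for the whole name disappearing, not confirmed anywhere in this thread: if a name reaches a MySQL utf8 (3-byte) column without being stripped and strict SQL mode is off, MySQL is commonly reported to truncate the value at the first character it cannot store, with an "Incorrect string value" warning. When the emoji is the first character, nothing is left. A hypothetical Python model of that truncation (not MySQL code):

```python
def simulate_utf8mb3_truncation(name: str) -> str:
    """Hypothetical model of a non-strict MySQL utf8 (3-byte) column:
    keep everything before the first character that needs 4 bytes."""
    for i, ch in enumerate(name):
        if ord(ch) > 0xFFFF:  # first character outside the 3-byte range
            return name[:i]
    return name


print(repr(simulate_utf8mb3_truncation("\U0001F984 Unicorn Trail #1")))  # → ''
print(repr(simulate_utf8mb3_truncation("Unicorn \U0001F984 Trail")))     # → 'Unicorn '
```

If this is what is happening, stripping the 4-byte characters before the insert (or moving the column to utf8mb4) would avoid it.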

caughtatwork
Posts: 16115
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Report Site Issues Here

Post by caughtatwork » 01 March 19 6:27 am

The emoji gets stripped, as discussed, as agreed.

Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 01 March 19 10:02 am

Ok then, if you insist, let's have caches without names. It's not useful to anyone but at least it meets some ill-defined specs from years ago.

caughtatwork
Posts: 16115
Joined: 17 May 04 12:11 pm
Location: Melbourne

Re: Report Site Issues Here

Post by caughtatwork » 01 March 19 2:12 pm

Laighside Legends wrote:Ok then, if you insist, let's have caches without names. It's not useful to anyone but at least it meets some ill-defined specs from years ago.

Why are you being so antagonistic?
When we get a GPX file with an emoji we strip the emoji from the name.
I don't know what happened when you tried to load the GPX file, but when I load it the emoji is stripped out and the cache name minus the emoji is used.
Please try a different cache with an emoji and report back as to whether what I am saying is right or it's a bug.
If there is something wrong please let me know the cache you're loading.

Laighside Legends
10000 or more caches found
Posts: 1272
Joined: 05 October 10 10:20 pm
Location: Yorke Peninsula, South Australia

Re: Report Site Issues Here

Post by Laighside Legends » 01 March 19 5:17 pm

This appears to be a pretty minor bug but for some reason we have to spend 3 days arguing before anything is done about it.

First you blame Groundspeak with no evidence. Then you insist there is no bug and everything is fine, even though several caches clearly don't have names. (I acknowledge the bug could be in something other than Groundspeak or GCA, but I can't see how that is possible if I haven't used GSAK or other 3rd party software.)

The cache I mentioned in the first post is just one of an entire powertrail without names. I didn't do the initial load, so someone else could've uploaded a dodgy GPX to start with. I then tried to update it with a GPX file directly from Groundspeak and it didn't fix the problem. I then tried to replicate the problem on my machine but it worked (mostly) as it should - the caches had names anyway. Hence I was a bit confused as to where the problem is...

caughtatwork
Posts: 16115
Joined: 17 May 04 12:11 pm
Location: Melbourne
Contact:

Re: Report Site Issues Here

Post by caughtatwork » 01 March 19 5:28 pm

Nothing has been done. The bug seems to be in the data, not the code. If you load the GPX file, you get a name. I don't know who did what, but if it works for you now, then great. Maybe if you load a whole PQ you can get the names corrected. That's probably better than getting into an argument about technical items which may behave differently on your own installation and ours. If there are no issues when you load the PQ, then we're done.

Luckyl10n
10000 or more caches found
Posts: 81
Joined: 19 January 13 7:16 pm
Location: Australian Capital Territory

Re: Report Site Issues Here

Post by Luckyl10n » 12 March 19 10:30 pm

Hi Guys

I just published GA13696 and used incorrect posted coords first up. These have been corrected and it shows up in Marrickville dZ, but the cache page shows it as Strathfield. Can you fix it for me, please?

Thanx
LL
