Making Bad Characters Good.
As promised, I am pulling this from a thread I started about the issue of bad characters, in the hope of seeing it resolved in new releases of GLM. Essentially, unusual characters like é, or even the apostrophe, become corrupted when data is exported to a CSV file for use in Excel. At least one other person has responded that they are seeing this with typical US English text. Since so many of us depend on using this data in Excel or other programs, it would be nice to have the characters exported correctly without extra cleanup after the fact.
jim
Here is what I wrote before in two previous comments:
My organization runs an international grants program to support research in Parkinson's. As a result, the applications we receive use a broad range of characters, e.g., Université Laval. Oddly, these characters appear fine on the Foundant website, but they get jumbled when exported to a CSV file or used in the filenames of the print packets. Reviewers complain, and it is a bit of a pain to always have to clean up the exported data. For example:
Université Laval becomes UniversitÃ© Laval
The lowercase Greek alpha in α-synuclein becomes Î±-synuclein
Even standard text like the apostrophe in Parkinson's disease becomes Parkinsonâ€TMs disease.
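For anyone curious, this looks like the classic symptom of UTF-8 bytes being read as Windows-1252. Here is a minimal sketch (not Foundant's code, just an illustration of that assumption) that reproduces the garbling above:

```python
# Sketch: reproduce the garbling by decoding UTF-8 bytes as Windows-1252 (cp1252).
# This assumes the export really is UTF-8 and Excel is guessing cp1252.
for original in ["Université", "α-synuclein", "Parkinson\u2019s"]:  # \u2019 = curly apostrophe
    garbled = original.encode("utf-8").decode("cp1252")
    print(original, "->", garbled)
# Université -> UniversitÃ©
# α-synuclein -> Î±-synuclein
# Parkinson’s -> Parkinsonâ€™s
```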
When I use a text editor, like TextWrangler, to view the text in my export file, the text looks perfectly normal. TextWrangler reports the encoding as "Unicode (UTF-8, with BOM)".
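If you want to double-check what the text editor reports, a UTF-8 BOM is just the three bytes EF BB BF at the start of the file. A quick check (assuming Python is handy; "export.csv" is a placeholder name):

```python
# Check for a UTF-8 byte-order mark (BOM) at the start of the export file.
with open("export.csv", "rb") as f:  # "export.csv" is a placeholder file name
    has_bom = f.read(3) == b"\xef\xbb\xbf"
print("UTF-8 BOM present:", has_bom)
```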
I did a bit more researching and it has a lot to do with how Excel imports the CSV file and deals with the text encoding. If you are like me, you simply drag the CSV file into Excel and it opens up in nice columns, etc....only some characters are mangled. However, if you use the import wizard to bring your CSV file into Excel, you can control what the source character encoding is and see if that minimizes the issue. (You have to have a worksheet open in order to use the File > Import function.) I chose Unicode 6.1 (Little-Endian). This did not fully solve my problem, as it removed characters that were previously there. For instance, I lose all the apostrophes in possessive words, e.g., "Dean's list" becomes "Deans list".
http://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0
Of course, even using this approach, my problems continue, as I cannot get the import wizard to work properly--all my columns get muddled and the result is useless even though I set the comma as my delimiter.
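As a stopgap until there is a real fix, one workaround that has worked for UTF-8 CSVs elsewhere (assuming Python is available and the export really is UTF-8) is to re-save the file as UTF-16 tab-delimited text, which Excel generally opens with both the accents and the columns intact. A rough sketch, with placeholder file names:

```python
import csv

# Convert a UTF-8 CSV export into a UTF-16 tab-delimited file that Excel
# opens directly, keeping accented characters and column boundaries intact.
# "export.csv" and "export.txt" are placeholder file names.
with open("export.csv", newline="", encoding="utf-8-sig") as src, \
     open("export.txt", "w", newline="", encoding="utf-16") as dst:
    writer = csv.writer(dst, dialect="excel-tab")
    for row in csv.reader(src):
        writer.writerow(row)
```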
More research shows that the easiest solution may be for Foundant to offer an export-as-Excel function--that might fix it. Also, they may be able to easily tack some data onto the beginning of the file that would allow Excel to recognize the correct text encoding. But I have quickly reached (and am beyond!) the extent of my knowledge.
http://www.roosmaa.net/importing-utf-8-csvs-in-excel/
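On the export-as-Excel idea above, here is a rough sketch of what that conversion could look like on the user side, assuming the openpyxl library is installed and the export is UTF-8 (this is not Foundant's code, just an illustration that an .xlsx file carries its own Unicode text and sidesteps the CSV encoding guesswork entirely):

```python
import csv
from openpyxl import Workbook

# Convert the UTF-8 CSV export into an .xlsx workbook; .xlsx stores text as
# Unicode internally, so Excel never has to guess the encoding.
# "export.csv" and "export.xlsx" are placeholder file names.
wb = Workbook()
ws = wb.active
with open("export.csv", newline="", encoding="utf-8-sig") as src:
    for row in csv.reader(src):
        ws.append(row)
wb.save("export.xlsx")
```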
Regardless, I would love some guidance on how to get the Foundant export data into Excel properly.
Idea posted March 6, 2013 by Jim Beck, Parkinson's Disease Organization
Moved to Archive during a clean-up effort in April 2024.
-
Chris Dahl commented
Hi Jim,
Thanks for the detailed information as well as the research you were able to do. I certainly agree that's a frustrating experience. We're currently in the middle of trying to get our 3.4.0 release out the door and finishing the specifications for the 3.5.0 release, so it's a bit hectic. But assuming 3.4.0 doesn't generate too much support work, our Quality Assurance engineer should have some time after the release to look at this issue a bit. While he might not be able to fix the issue, he may be able to identify some steps to help with the export from GLM into Excel. In any case, having him spend some time on the issue is a necessary step before we can slot this into a release.
I'll bring this up to discuss with our product team in the next week or so and we can go from there.
Let me know if you have any questions in the meantime. Thanks,
-chris
posted March 27, 2013 by Chris Dahl, Foundant Technologies