Opened 8 years ago

Closed 6 years ago

#641 closed defect (fixed)

SasView won't start if user has name with accented/national characters - possible unicode error

Reported by: smk78
Owned by:
Priority: critical
Milestone: SasView 5.0.0
Component: SasView
Keywords: unicode
Cc:
Work Package: SasView Bug Fixing

Description

Reported by Wojciech Zając wojciech.zajac@… 31 August 2016 to help@…

I would like to report what seems to be a minor yet annoying technical problem. On my computer at work I have to use a user name identical to my family name, which is Polish and reads: Zając. Both the previous and the latest (4 beta) versions just would not start if launched under my user name. To test this I created another user whose name is "English", i.e. without national characters. Then physically the same SasView installation works fine. It also works on my laptop, where I use an "international" user name. Changing the user name does not help, since that procedure in fact changes nothing in the "hard" directory naming; it only creates an alias. Since this might be an issue for other non-English people as well, I dare to ask for this to be fixed if possible.

Change History (14)

comment:1 Changed 8 years ago by smk78

Reply by PaulK:

Most likely a unicode error, so probably this affects all platforms:

    >>> os.path.join(u"Zając", "sasview.log")
    u'Zaj\u0105c/sasview.log'

    >>> str(os.path.join(u"Zając", "sasview.log"))
    ---------------------------------------------------------------------------
    UnicodeEncodeError                        Traceback (most recent call last)
    <ipython-input-171-e8abaa6a4ecd> in <module>()
    ----> 1 str(os.path.join(u"Zając", "sasview.log"))

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 3: ordinal not in range(128)

Finding all the problems will take some time. For example, copy latex_smeared.xml to Zając.xml and it won't load. Similarly, create a Zając directory and copy latex_smeared.xml into that directory and it won't load.

All of the readers will need to be tested, as well as all the functions which use loaded data since unicode may pop up in captions or output that use the file name. The str() function is called about 2000 times in sasview+sasmodels+bumps.
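A minimal Python 3 sketch of the failure shown above, assuming the root cause is the implicit ASCII encode that Python 2's str() performs on a unicode object. Python 3 has no implicit encode, so the sketch makes it explicit. The variable names are illustrative and not taken from the SasView code.

```python
import os

# In Python 3 every str is unicode, so joining works without any conversion.
path = os.path.join("Zając", "sasview.log")

# Python 2's str(unicode_obj) was roughly equivalent to unicode_obj.encode("ascii").
# Doing that encode explicitly reproduces the error from the traceback above.
try:
    path.encode("ascii")
    failed_char = None
    position = None
except UnicodeEncodeError as exc:
    failed_char = exc.object[exc.start:exc.end]  # the character that cannot be encoded
    position = exc.start                         # its index in the path string

print(repr(failed_char), position)
```

The practical consequence is that any str() call whose argument may hold a file name or caption is a candidate for review, while calls on numbers are unaffected.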

comment:2 Changed 8 years ago by smk78

Response from Wojciech:

I am using it under Windows 10 (64-bit) at work, at home, and on my laptop. It refuses to cooperate only if I am Zając. If I am Zajac (laptop) or Wojtek (home) it works fine. That's why I created another user environment at work, just to test physically the same installation.

A couple of months ago I came across exactly the same problem with Gabedit but then I was taught a workaround by the authors. Apparently this has to do with internal character representation in filenames or something similar. On all three computers I have the same locale and the same preferred codepage.

comment:3 Changed 8 years ago by smk78

Response from Wojciech:

Also on Linux platforms.

comment:4 Changed 8 years ago by smk78

Reply by PaulK:

I put the directory Zając in …/sasview, and glob() lists it at the end.

What glob returns depends on whether it is passed a byte string or a unicode string:

    f = list(glob(os.path.expanduser('~/Source/sasview-new/sasview/*')))[-1]
    g = list(glob(os.path.expanduser(u'~/Source/sasview-new/sasview/*')))[-1]

    type(f) => str
    type(g) => unicode
    f == g => False
    f.decode('utf-8') == g => True
    os.path.exists(f) => True
    os.path.exists(g) => True

In python 3, the default is unicode, but it exhibits the same behaviour:

    f = list(glob(os.path.expanduser(b'~/Source/sasview-new/sasview/*')))[-1]
    g = list(glob(os.path.expanduser(u'~/Source/sasview-new/sasview/*')))[-1]

    type(f) => bytes
    type(g) => str
    f == g => False
    f.decode('utf-8') == g => True
    os.path.exists(f) => True
    os.path.exists(g) => True

This was on the mac.

Trying the same thing on Windows with Python 2.7, the byte-string glob returned "Zajac", with the accented character replaced by plain ASCII:

    type(f) => str
    type(g) => unicode
    f == g => False
    f.decode('utf-8') == g => False
    os.path.exists(f) => False
    os.path.exists(g) => True

So the fix is going to be messy.
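One possible way to tame the bytes/unicode split seen on the Mac is to convert every path to text at the point where it enters the program. The sketch below is Python 3 and assumes UTF-8 file names; as_text is a hypothetical helper, not an existing SasView function.

```python
def as_text(path, encoding="utf-8"):
    """Return a path as str; bytes are decoded with the given encoding (assumed UTF-8)."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path

g = "sasview/Zając"       # the kind of value glob returns for a text pattern
f = g.encode("utf-8")     # the kind of value glob returns for a bytes pattern

same_raw = (f == g)                      # bytes never compare equal to str
same_text = (as_text(f) == as_text(g))   # equal once both are text

print(same_raw, same_text)
```

The standard library's os.fsdecode() performs a similar conversion using the filesystem encoding instead of a fixed one. Neither approach would recover the Windows Python 2.7 case above, where the byte-string result had already lost the accented character.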

comment:5 Changed 8 years ago by smk78

Response from Wojciech:

This explains a lot. As soon as I am back at the Institute tomorrow morning I will try a workaround by creating a symbolic name of Zajac pointing to Zając user area. Then, if python really discards diacriticals then this could help.

BTW: It may be worth checking whether the problem occurs for German or Nordic names like Würflinger, Mößbauer, Ångström or Øresund :)

comment:6 Changed 8 years ago by smk78

Reply by PaulK:

The latin-1 character set might be in effect:

https://en.wikipedia.org/wiki/ISO/IEC_8859-1

The following are on the code page:

ü: 252
ö: 246
ß: 223
Å: 197
Ø: 216

Many of the str() calls are really for conversion, such as str(k) when k is an integer, so it isn't all bad.
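The code points listed above can be checked directly, along with whether each of the suggested names fits in latin-1. A small Python 3 sketch; fits_latin1 is a hypothetical helper written for this illustration.

```python
def fits_latin1(text):
    """Return True if every character of text exists in ISO 8859-1 (latin-1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

names = ["Würflinger", "Mößbauer", "Ångström", "Øresund", "Zając"]
results = {name: fits_latin1(name) for name in names}

# Unicode code points; for latin-1 characters these equal the code page positions.
code_points = {ch: ord(ch) for ch in "üößÅØą"}

print(results)
print(code_points)
```

The German and Nordic names encode in latin-1, while Zając does not, because ą (U+0105) lies outside that character set.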

comment:7 Changed 8 years ago by smk78

Response from Wojciech:

You are right. Eastern European characters are on ISO 8859-2 or CP 1250. The problem is that the two differ in the position a given character occupies, which can be a real pain when, e.g., encoding LaTeX documents.
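The difference in character positions between the two code pages can be seen with Python's built-in codecs. A short Python 3 sketch, using the codec names "iso8859-2" and "cp1250" from the standard library:

```python
char = "ą"

latin2 = char.encode("iso8859-2")   # byte value in ISO 8859-2
cp1250 = char.encode("cp1250")      # byte value in Windows code page 1250

# Decoding bytes with the wrong code page silently yields a different character.
mojibake = cp1250.decode("iso8859-2")

print(latin2.hex(), cp1250.hex(), mojibake == char)
```

The same letter maps to a different byte in each code page, so text written under one and read under the other comes back wrong without any error being raised.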


comment:8 Changed 8 years ago by smk78

Response from Wojciech:

Just in case it helps: I created both a symlink and a "junction" to my user area with an ASCII name. Neither works, unfortunately. However, the log file is placed and updated correctly, so the problem is deeper than just finding "C:\Users\Zając", and probably not worth solving at the moment.

Here is a log from an unsuccessful attempt:

2016-09-01 10:37:14,256 INFO  --- SasView session started ---
2016-09-01 10:37:14,256 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 10:37:14,256 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 10:37:14,381 INFO Wx version: 3.0.2.0

And this is a log from successful launch under another username:

2016-09-01 07:25:58,750 INFO  --- SasView session started ---
2016-09-01 07:25:58,750 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 07:25:58,750 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 07:25:58,868 INFO Wx version: 3.0.2.0
2016-09-01 07:28:30,017 INFO  --- SasView session started ---
2016-09-01 07:28:30,017 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 07:28:30,017 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 07:28:30,148 INFO Wx version: 3.0.2.0
2016-09-01 07:33:34,242 INFO  --- SasView session started ---
2016-09-01 07:33:34,243 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 07:33:34,243 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 07:33:34,368 INFO Wx version: 3.0.2.0
2016-09-01 08:38:27,358 INFO  --- SasView session started ---
2016-09-01 08:38:27,358 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 08:38:27,358 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 08:38:27,489 INFO Wx version: 3.0.2.0
2016-09-01 10:37:09,792 INFO  --- SasView session started ---
2016-09-01 10:37:09,792 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 10:37:09,792 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 10:37:09,917 INFO Wx version: 3.0.2.0
2016-09-01 10:37:14,256 INFO  --- SasView session started ---
2016-09-01 10:37:14,256 INFO Python: 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)]
2016-09-01 10:37:14,256 INFO You have not set the SASVIEW_WX_VERSION environment variable, so using default version of wxPython.
2016-09-01 10:37:14,381 INFO Wx version: 3.0.2.0

comment:9 Changed 8 years ago by smk78

  • Priority changed from blocker to critical

comment:10 Changed 8 years ago by smk78

Response from Wojciech:

Workaround.

Recently I had a chance conversation with people from Rigaku, who develop software for their X-ray diffractometers. They use C++, C#, and, quite extensively, Python. They also get the occasional complaint from Eastern European users with non-8859-1 names similar to mine. A simple workaround then came to my mind, which they also liked: create another user on your computer (it's wise to do so anyway, for crash recovery) with a simple international name. Then run the software in question as that other user (<shift><right-click> on the executable or its link for the pull-down menu). Then everything works fine.

comment:11 Changed 7 years ago by butler

  • Milestone changed from SasView Next Release +1 to SasView 5.0.0

Not sure whether the full UTF-8 support hoped for in release 5.x will fix this, but that release is the next opportunity for it to be fixed. Otherwise we may just need to close this ticket by documenting the limitation in some requirements somewhere.

comment:12 Changed 7 years ago by piotr

This has been addressed in 5.0. Full UTF-8 support: we can install SasView to directories with non-ASCII characters and run it successfully.

comment:13 Changed 6 years ago by tim

Tested and seems to be running correctly. I can start SasView from a folder called CødeCämpß and have it fit data as well as save the project.

comment:14 Changed 6 years ago by tim

  • Resolution set to fixed
  • Status changed from new to closed