Fix for wide-Unicode little-endian Py3k. #2
Inside ae.c, AE_GetCFStringRef assumes that the data inside a wide PyUnicode is in the same endianness that CF wants. But PyUnicode is native-endian, while kCFStringEncodingUTF32 is big-endian (if there is no BOM). We could write code to explicitly use kCFStringEncodingUTF32[LE|BE] as appropriate, or tack a BOM onto the start of a copy of the UTF-32 and use kCFStringEncodingUTF32 as-is, or various other possibilities... but it's a lot simpler to use UTF-8, and I doubt the performance will ever be an issue.
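For illustration only, here is a rough Python sketch of the three options weighed above, working on raw bytes. Python's own codecs stand in for CoreFoundation's decoders here, which is an assumption made for the sake of the demo; the helper names `utf32_native` and `utf32_with_bom` are hypothetical and are not part of ae.c or appscript.

```python
import codecs
import sys

# Hypothetical helpers; Python's codecs stand in for CoreFoundation here.
LITTLE = sys.byteorder == "little"
NATIVE_CODEC = "utf-32-le" if LITTLE else "utf-32-be"
NATIVE_BOM = codecs.BOM_UTF32_LE if LITTLE else codecs.BOM_UTF32_BE


def utf32_native(text):
    """BOM-less UTF-32 in the machine's byte order (what a wide build stores)."""
    return text.encode(NATIVE_CODEC)


def utf32_with_bom(text):
    """The same bytes, prefixed with the matching byte-order mark."""
    return NATIVE_BOM + utf32_native(text)


name = "iTunes"

# Option 1: tell the decoder the byte order explicitly.
assert utf32_native(name).decode(NATIVE_CODEC) == name

# Option 2: prepend a BOM so a generic UTF-32 decoder can detect the order.
assert utf32_with_bom(name).decode("utf-32") == name

# Option 3: sidestep byte order entirely by going through UTF-8.
assert name.encode("utf-8").decode("utf-8") == name
```

UTF-8 has no byte-order variants, which is why the third option needs no platform check at all.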
To see the problem, you need a wide-Unicode Python 3 on an Intel Mac. The current python.org 3.3.0 installer is fine.
$ python3 -c "import appscript; print appscript.app('iTunes')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/abarnert/src/github/appscript/py-appscript/trunk/build/lib.macosx-10.6-intel-3.3/appscript/reference.py", line 799, in __call__
return self._appclass(*args, **kargs)
File "/Users/abarnert/src/github/appscript/py-appscript/trunk/build/lib.macosx-10.6-intel-3.3/appscript/reference.py", line 734, in __init__
constructor, identifier = 'path', aem.findapp.byname(name)
File "/Users/abarnert/src/github/appscript/py-appscript/trunk/build/lib.macosx-10.6-intel-3.3/aem/findapp.py", line 47, in byname
name = _findapp(name)
File "/Users/abarnert/src/github/appscript/py-appscript/trunk/build/lib.macosx-10.6-intel-3.3/aem/findapp.py", line 15, in _findapp
return findapplicationforinfo(creator, id, name)
aem.ae.MacOSError: -50
You might be able to cause the same problem with a wide-Unicode (UTF-32) Python 2 by using u'iTunes' instead of 'iTunes', but I haven't tested. Narrow Unicode (UTF-16) has a different, less serious bug. None of this is a problem on PowerPC builds, because the underlying problem is endianness-related. Error -50 means "bad parameters": we're handing CoreFoundation a native-endian (here little-endian) UTF-32 buffer that it reads as big-endian UTF-32, which gives us invalid Unicode, which LaunchServices rejects.
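A small Python sketch of why the reinterpreted buffer is invalid. It only mimics the byte-order mismatch with Python's codecs and does not call CoreFoundation or LaunchServices; `reinterpret_as_big_endian` is a hypothetical helper written for this example.

```python
def reinterpret_as_big_endian(text):
    """Encode as little-endian UTF-32, then try to read the bytes as big-endian."""
    raw = text.encode("utf-32-le")
    try:
        return raw.decode("utf-32-be")
    except UnicodeDecodeError as exc:
        return "invalid: " + exc.reason


# 'i' is U+0069; little-endian UTF-32 stores it as the bytes 69 00 00 00.
first = int.from_bytes("i".encode("utf-32-le"), "big")
print(hex(first))        # 0x69000000
print(first > 0x10FFFF)  # True: far beyond the last valid code point
print(reinterpret_as_big_endian("iTunes"))
```

Every ASCII character ends up above U+10FFFF when read the wrong way round, so nothing in the string survives as valid Unicode.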
I wanted first to test the original problem, so I started with py-appscript as it stands. I installed Python 3.3 and ran the py-appscript python3 setup.py. I can't get as far as your test. This works fine:
(I chose the Finder because we know there are problems with iTunes.) But with python3:
I don't know much about Python; can you explain how I can get further? It isn't just a command-line thing; the same thing happens when running a script file.
Okay, we got past that; it turns out you have to say:
Let's try to be accurate here.
Next question. I'm not seeing any problem with Python 2, so why are we patching appscript_2x/ext/ae.c?
Apologies for the parens.
The main reason to fix 2x is that the code is identical. The relevant types are defined the same way, so this can't possibly work. If it can be triggered, it will have the same bug; if it can't be triggered it doesn't matter either way.
Meanwhile, the fact that the two functions are nearly identical, as with most of the other code in ae.c, makes maintenance and debugging easier. For example, if they had been radically different when I was looking to fix this bug, I would have wasted time trying to figure out why they're different, which change is relevant to the bug, etc., before spotting the problem.
I'll make a UTF-32 py2.7 build to test it, because I'm not positive that app(u'iTunes') would trigger this code. It's worth having a test to be sure. And if it can't be triggered, it might be more reasonable to just remove the code. But I think the virtue of having the two branches be identical would still be a good argument.
Verified that the problem exists in Python 2. First you need to get a wide-Unicode build. Apple's builds are narrow, as are the python.org installers. If you're not sure about what you have:
python -c 'import sys; print(sys.maxunicode)'
For 2.2 through 3.2, this will be 65535 for narrow, 1114111 for wide. The easiest way to get a wide-Unicode build is to build it:
./configure --enable-unicode=ucs4
make
You can then build and run appscript against that interpreter in place:
../Python-2.7.3/python.exe setup.py build_ext -i
PYTHONPATH=appscript_2x/lib ../Python-2.7.3/python.exe -c 'import appscript; print(appscript.app("Finder"))'
You'll get the exact same error as with the official 3.3.0 package. And the same fix works.
One more thing: The official Python 3.3.0 installer isn't actually wide Unicode; that distinction no longer matters. But it acts as if it were wide for the purposes of Py_UNICODE_WIDE, PyUnicode_AsUnicode, etc. See http://docs.python.org/py3k/whatsnew/3.3.html#pep-393 for details. Anyway, I tested with an actual wide 3.x (built from the 3.2.3 source, just like the 2.7.3 above) to verify that the same problem exists, and the same fix works.
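As a rough sketch, the sys.maxunicode check and the 3.3 caveat can be folded into one helper. The name `unicode_build` and its three labels are made up for this example; the only facts relied on are that sys.maxunicode is 65535 on narrow builds and 1114111 on wide builds up to 3.2, and that it is always 1114111 from 3.3 on.

```python
import sys


def unicode_build():
    """Classify the running interpreter's Unicode storage (hypothetical labels)."""
    if sys.version_info >= (3, 3):
        # PEP 393: storage is chosen per string, so narrow vs. wide no longer
        # applies and sys.maxunicode is always 0x10FFFF.
        return "flexible"
    # 2.2 through 3.2: the choice was fixed when the interpreter was built.
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"


print(unicode_build(), sys.maxunicode)
```

On a 2.7.3 built with --enable-unicode=ucs4 this should report "wide"; on Apple's or python.org's 2.x builds, "narrow".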