[PD] locales for Pd WAS: japanese encoded chars in PD

Fri Feb 13 22:59:41 CET 2009

On Feb 13, 2009, at 4:38 AM, Bryan Jurish wrote:

> moin all,
>
> On 2009-02-13 03:14:20, Hans-Christoph Steiner <hans at eds.org>
> appears to have written:
>> On Thu, 12 Feb 2009, Bryan Jurish wrote:
>>> Are we certain that Tk is actually translating at all, and not just
>>> using some 8-bit default like latin-1 when it finds non-UTF-8 input?
>>> I ask because that's what Perl does by default, a behavior which
>>> continues to give me headaches. In Perl, each string has its own
>>> internal "utf8" flag which tells you whether Perl is currently
>>> thinking of that string as a raw byte-string in some unknown encoding
>>> or as a "native" (utf8) character string... I assume Tcl/Tk does
>>> something similar, but don't know how to test for this property
>>> there.
>>
>> Here's the doc that I read on this topic, but it probably doesn't
>> have the level of detail that you require:
>>
>> http://tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm#M8
>
> Had a look at that last night, but the 'fconfigure' command only
> applies to Tcl streams (analogous to the PerlIO layer, which I abhor
> and try my best to avoid, as it doesn't provide a sufficient level of
> control for most of my purposes... fconfigure would be ok for Pd-devel
> if we say we're dealing exclusively with utf-8... but then again, I
> don't know if Tcl streams ("channels") are used at all by the GUI...
> maybe on the socket to the backend, but that's probably it; IMHO it's
> safer to explicitly generate byte strings in a known encoding and just
> pass those around).
>
> Also useful is the 'encoding' command family ('encoding convertfrom',
> 'encoding convertto', 'encoding names', 'encoding system'). Tried
> this with some explicit escapes as well as a tester widget from
> http://en.wikibooks.org/wiki/Tcl_Programming/Internationalization, and
> I get decent display (Japanese still doesn't display with any Tk fonts
> I tried, but I think that's just a font problem). Also tested the bind
> substitutions with a dummy "puts" script, and managed to get real
> utf-8 sent out over the stdout channel for keyboard input. Still not
> 100% sure how well it's working, since my keyboard only produces
> latin-1 symbols (maybe I'll hack my xmodmap for some real testing ;-)
>
> Unfortunately, I still haven't found a way to get Tcl to tell me what
> encoding (if any) it thinks a given string is using, analogous to the
> Perl predicate "utf8::is_utf8($string)". Maybe Tcl doesn't track this
> information on a per-string level at all, but assumes [encoding
> system] for all strings? That seems pretty inflexible to me, but after
> another look at http://www.tcl.tk/man/tcl8.5/TclCmd/encoding.htm , it
> does indeed seem to be the case. So I guess the only safe way to
> handle things is (as you suggest) to select an internal encoding (e.g.
> UTF-8) and enforce its use with {encoding system "utf-8"}, and
> possibly {fconfigure $ch -encoding "utf-8"} for whatever channels we
> want. The fconfigure manpage says the default channel encoding is
> [encoding system]; but I suspect that perhaps it's really the value of
> [encoding system] at the time of the channel's opening which has an
> effect, so we either have to make some accommodations for the standard
> channels (stdin, stdout, stderr), or just leave that up to Tcl (which
> probably defaults to the current locale's LC_CTYPE, but I haven't
> tested that yet)...
>
>> As for Tk hacking for Pd, a big part of the pd-devel effort is to
>> make the Tk GUI code readable, and even extendable! Feel free to hit
>> me with questions, either here, or I am in #dataflow quite a bit
>> these days.
>
> Groovy. I don't think I'll make the devel meeting today, but it's
> beginning to look as if I've got a bit of a bug in my bonnet about
> this ;-)

Hey,

It's good to see someone willing to dive in deep. It'll be great to
have full UTF-8 support. Patko and I were looking into how to do it
on the C side; I think what you mentioned, using locale.h and
setlocale(), should be enough. Maybe Patko will chime in with some
details.

.hc

>
>
> marmosets,
> 	Bryan
>
> -- 
> Bryan Jurish                           "There is *always* one more bug."
> jurish at ling.uni-potsdam.de      -Lubarsky's Law of Cybernetic Entomology

----------------------------------------------------------------------------

Programs should be written for people to read, and only incidentally  
for machines to execute.
  - from Structure and Interpretation of Computer Programs