STR #3355: Support generation of UTF-8 file from FLUID

Application:	FLTK Library
Status:	1 - Closed w/Resolution
Priority:	1 - Request for Enhancement, e.g. asking for a feature
Scope:	3 - Applies to all machines and operating systems
Subsystem:	FLUID
Summary:	Support generation of UTF-8 file from FLUID
Version:	1.4-feature
Created By:	JYG
Assigned To:	matt
Fix Version:	1.4.0
Fix Commit:	b490ce3463e9008d03224feb44c8b365a8e21954

Trouble Report Files:

#1	AlbrechtS 05:24 Nov 22, 2016

test.fl
0k

#2	AlbrechtS 05:27 Nov 22, 2016

main.cxx
0k

#3	AlbrechtS 05:27 Nov 22, 2016

fluid_write_code_utf8.patch
1k

Trouble Report Comments:

#1	JYG 05:42 Nov 21, 2016

FLUID generates .cxx files in which UTF-8 text is written as ASCII octal escapes. It's annoying to see "\303\251" instead of "é", and it makes it impossible to search for strings in the code. I think FLUID should have an option for more modern file generation: a UTF-8 encoded file, with or without a BOM.
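For reference, a minimal sketch of why the two forms are interchangeable for the compiler: "\303\251" is simply the two UTF-8 bytes of U+00E9 (0xC3 0xA9) written in octal. The names kEscaped, kHex and same_bytes are made up for this illustration and do not come from FLUID.

```cpp
#include <cassert>
#include <cstring>

// Octal-escaped form as described in the report (pure ASCII source text).
static const char kEscaped[] = "\303\251";
// The same two bytes written as hex escapes: the UTF-8 encoding of U+00E9.
static const char kHex[] = "\xC3\xA9";

// Returns true if both literals hold exactly the same bytes,
// including the terminating NUL.
bool same_bytes() {
  return sizeof(kEscaped) == sizeof(kHex) &&
         std::memcmp(kEscaped, kHex, sizeof(kEscaped)) == 0;
}
```

Whether a literal "é" typed directly into the source yields the same bytes depends on the compiler treating the source file as UTF-8, which is the open question in this request.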

#2	AlbrechtS 05:24 Nov 22, 2016

For more information and the full discussion of this topic please see this thread in fltk.general: https://groups.google.com/forum/#!topic/fltkgeneral/gf0Z3BW-zuc

This is an edited excerpt of one of my replies:

There can always be characters inside a string that must be quoted: control characters (decimal 0-31, e.g. 10 = 0x0a = <LF> = '\n') and DEL (decimal 127). The current fluid code also quotes all values in the range 128 to 255. I did not write the code, but I can only assume that this [was done because it] is always safe for all compilers...

The patch I append should work for all Unicode characters if the compiler interprets strings as UTF-8.

Now to the patch: I attach three files to this post for later reference:

(1) test.fl: a fluid file with all ISO-8859-1 characters encoded as UTF-8 (extended range only, not the ASCII part; Unicode range U+00a0 to U+00ff). This is also a subset of Microsoft's Windows Codepage 1252 ("Western").

(2) main.cxx: a main program to compile test.cxx. This #include's test.cxx and indirectly test.h, both generated by fluid from test.fl.

(3) fluid_write_code_utf8.patch: the patch against FLTK 1.3.4 (stable release).

This patch basically does three things:

- Fix reading character string bytes "unsigned", i.e. in range 0-255.
- Don't limit line length, to avoid breaking lines inside UTF-8 characters.
- Write all ASCII and UTF-8 characters literally, i.e. without quoting.

You may use this patch if it works for you. Note that it is tested with the posted test cases, but I'm not sure if it will be okay for all users and compilers.

A "complete" solution would split strings (limit line length) without breaking inside UTF-8 characters and would presumably have an option to switch literal UTF-8 output on and off (on: literal/new behavior vs. off: octal-quoted/old behavior).

Note: the posted patch is for FLTK 1.3 and contains only the minimal changes. The complete solution should be in FLTK 1.4 with an option to switch formats as described above.
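To make the "complete" solution described above more concrete, here is a rough sketch of what such a string writer could look like. It is neither the attached patch nor the code that went into FLTK 1.4; the function quote_c_string and its parameters utf8_literal and max_line are hypothetical names chosen for this example. It reads bytes as unsigned, always quotes control characters and DEL, writes bytes 128-255 either literally or as octal depending on the switch, and splits long strings only in front of a byte that is not a UTF-8 continuation byte (0x80-0xBF).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (an assumption for illustration, not FLUID's code):
// returns `s` as one or more adjacent C string literals.
std::string quote_c_string(const std::string &s, bool utf8_literal,
                           std::size_t max_line = 72) {
  std::string out = "\"";
  std::size_t line_len = 1;  // bytes written on the current output line
  for (std::size_t i = 0; i < s.size(); ++i) {
    // Read the byte unsigned so values 128..255 do not turn negative.
    unsigned char c = static_cast<unsigned char>(s[i]);
    bool continuation = (c & 0xC0) == 0x80;  // UTF-8 continuation byte
    // Split only between characters, never inside a multi-byte sequence.
    if (line_len >= max_line && !continuation) {
      out += "\"\n\"";
      line_len = 1;
    }
    std::string piece;
    if (c == '"' || c == '\\') {
      piece = std::string("\\") + static_cast<char>(c);
    } else if (c < 32 || c == 127 || (c >= 128 && !utf8_literal)) {
      // Always three octal digits, so a following digit cannot be
      // absorbed into the escape sequence.
      char buf[5];
      std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
      piece = buf;
    } else {
      piece = std::string(1, static_cast<char>(c));  // ASCII or literal UTF-8
    }
    out += piece;
    line_len += piece.size();
  }
  out += "\"";
  return out;
}
```

With utf8_literal set to false this reproduces the old octal-quoted style for the extended range; with it set to true the bytes pass through unchanged, which only works if the compiler interprets the generated file as UTF-8.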

#3	matt 12:29 Dec 17, 2021

Fixed in Git repository.
