* Merged XML Unicode extensions
Jan Wielemaker committed Nov 10, 2006
1 parent 9f02f0e commit bda8e3d
Showing 26 changed files with 2,987 additions and 1,142 deletions.
5 changes: 5 additions & 0 deletions ChangeLog
@@ -1,3 +1,8 @@
+Oct 27, 2006
+
+* ENHANCEMENT: Started branch XML_UNICODE to provide support for Unicode
+filenames, tags and elements.
+
Aug 28, 2006

* DOCUMENTATION: Moved to sgml.doc, using the same system as the
3 changes: 2 additions & 1 deletion Makefile.in
@@ -41,7 +41,8 @@ INSTALL=@INSTALL@
INSTALL_PROGRAM=@INSTALL_PROGRAM@
INSTALL_DATA=@INSTALL_DATA@

-LIBOBJ= parser.o util.o charmap.o catalog.o model.o xmlns.o utf8.o
+LIBOBJ= parser.o util.o charmap.o catalog.o model.o xmlns.o utf8.o \
+xml_unicode.o
PLOBJ= $(LIBOBJ) error.o sgml2pl.o quote.o
SGMLOBJ= $(LIBOBJ) sgml.o
DTD2PLOBJ= $(LIBOBJ) dtd2pl.o prolog.o
2 changes: 1 addition & 1 deletion Makefile.mak
@@ -13,7 +13,7 @@ PLHOME=..\..
PKGDLL=sgml2pl

LIBOBJ= parser.obj util.obj charmap.obj catalog.obj \
-model.obj xmlns.obj utf8.obj
+model.obj xmlns.obj utf8.obj xml_unicode.obj
OBJ= $(LIBOBJ) sgml2pl.obj error.obj quote.obj
SGMLOBJ= $(LIBOBJ) sgml.obj
DTDFILES= HTML4.dcl HTML4.dtd HTML4.soc \
16 changes: 16 additions & 0 deletions TODO
@@ -8,3 +8,19 @@ TODO LIST:
<tag name="value>
* Allow for (a,b) as attribute-type (name-group)
* Handle source-info in included SYSTEM entities
+
+UNICODE:
+
+* Started full unicode support under the CVS branch XML_UNICODE.
+* Old version only allows for unicode in CDATA (attribute values
+and element content)
+* New adds throughout: element-names, tags, filenames, etc.
+
+ISSUES:
+- Port to Windows
+- Testing
+- File-entities
+- Leak setting filename using istrdup() in sgml2pl
+- Verify performance. Optimise.
+- Own character classification and conversion
+- Copy from Prolog?
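
The ISSUES entry "Own character classification and conversion" can be illustrated with a minimal sketch. This is not the code in xml_unicode.c, which this page does not show. The function name `sketch_is_xml_name_start` and the range table are assumptions made for illustration, and the ranges follow the NameStartChar production of XML 1.0 Fifth Edition, which postdates this commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT taken from xml_unicode.c.
 * Ranges are the NameStartChar production of XML 1.0 (Fifth Edition);
 * the code in this commit may well use different tables. */
typedef struct
{ uint32_t lo;
  uint32_t hi;
} cp_range;

static const cp_range name_start_ranges[] =
{ { ':', ':' },       { 'A', 'Z' },       { '_', '_' },
  { 'a', 'z' },       { 0xC0, 0xD6 },     { 0xD8, 0xF6 },
  { 0xF8, 0x2FF },    { 0x370, 0x37D },   { 0x37F, 0x1FFF },
  { 0x200C, 0x200D }, { 0x2070, 0x218F }, { 0x2C00, 0x2FEF },
  { 0x3001, 0xD7FF }, { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD },
  { 0x10000, 0xEFFFF }
};

/* Return true if code point cp may start an XML name.
 * A linear scan over 16 ranges; a real parser would likely
 * special-case ASCII first for speed. */
bool
sketch_is_xml_name_start(uint32_t cp)
{ size_t n = sizeof(name_start_ranges) / sizeof(name_start_ranges[0]);
  size_t i;

  for (i = 0; i < n; i++)
  { if ( cp >= name_start_ranges[i].lo && cp <= name_start_ranges[i].hi )
      return true;
  }

  return false;
}
```

Under these assumptions, Cyrillic capital Che (U+0427), the first letter of the element name in Test/utf8-ru.xml, is accepted as a name start, while a digit or the multiplication sign (U+00D7) is not.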
2 changes: 2 additions & 0 deletions Test/ok/utf8-ru.ok
@@ -0,0 +1,2 @@
+[element('Человек', [язык=мова], ['Borys'])].
+[].
3 changes: 3 additions & 0 deletions Test/utf8-ru.xml
@@ -0,0 +1,3 @@
+<?xml version="1.0" encoding="utf-8"?>
+
+<Человек язык="мова">Borys</Человек>
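
The element and attribute names in this test file are multi-byte UTF-8 sequences, so a parser that accepts them has to decode code points before it can classify name characters. The following is a minimal sketch of that decoding step, not the package's utf8.c. The function name `sketch_utf8_decode` and its signature are assumptions, and the sketch does not reject overlong encodings or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the decoder from the package's utf8.c.
 * Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is empty,
 * truncated, or starts with an invalid lead or continuation byte.
 * Overlong forms and surrogates are NOT rejected in this sketch. */
size_t
sketch_utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{ size_t need, i;
  uint32_t c;

  if ( len == 0 )
    return 0;

  if ( s[0] < 0x80 )                    /* single-byte ASCII */
  { *cp = s[0];
    return 1;
  }

  if ( (s[0] & 0xE0) == 0xC0 )          /* 110xxxxx: 2 bytes */
  { need = 2;
    c = s[0] & 0x1F;
  } else if ( (s[0] & 0xF0) == 0xE0 )   /* 1110xxxx: 3 bytes */
  { need = 3;
    c = s[0] & 0x0F;
  } else if ( (s[0] & 0xF8) == 0xF0 )   /* 11110xxx: 4 bytes */
  { need = 4;
    c = s[0] & 0x07;
  } else
  { return 0;                           /* stray continuation or bad lead */
  }

  if ( len < need )
    return 0;                           /* truncated sequence */

  for (i = 1; i < need; i++)
  { if ( (s[i] & 0xC0) != 0x80 )        /* must be 10xxxxxx */
      return 0;
    c = (c << 6) | (uint32_t)(s[i] & 0x3F);
  }

  *cp = c;
  return need;
}
```

For example, the first letter of the element name, Cyrillic capital Che, is stored as the two bytes 0xD0 0xA7 and decodes to U+0427.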
27 changes: 21 additions & 6 deletions Test/wrtest.pl
@@ -50,9 +50,13 @@
ml_file(Ext),
file_base_name(File, Base),
\+ blocked(Base),
-format(user_error, '~w ... (ISO Latin-1) ...', [Base]),
-fixed_point(File, iso_latin_1),
-format(user_error, ' (UTF-8) ...', []),
+format(user_error, '~w ... ', [Base]),
+( \+ utf8(Base)
+-> format(user_error, ' (ISO Latin-1) ... ', []),
+fixed_point(File, iso_latin_1)
+; true
+),
+format(user_error, ' (UTF-8) ... ', []),
fixed_point(File, utf8),
format(user_error, ' done~n', []),
fail
@@ -63,7 +67,7 @@
ml_file(sgml).
ml_file(html).

-% blocked(+File)
+%% blocked(+File)
%
% List of test-files that are blocked. These are either negative
% tests or tests involving SDATA.
@@ -74,8 +78,19 @@
blocked('cent-nul.xml').
blocked('defent.sgml').

-fixed_point(File) :-
-fixed_point(File, iso_latin_1).
+
+%% utf8(+File)
+%
+% File requires UTF-8. These are files that have UTF-8 characters
+% in element or attribute names.
+
+utf8('utf8-ru.xml').
+
+
+%% fixed_point(+File, +Encoding)
+%
+% Perform write/read round-trip and validate the data has not
+% changed.

fixed_point(File, Encoding) :-
file_name_extension(_, xml, File), !,