New direct XML conversion extension, etc.



Hi,

I was thinking of adding a direct XML conversion tool to debiandoc-sgml by
modifying the HTML conversion tool.  Along the way, I was also thinking of
adding an SGML pretty-print conversion tool that always closes SGML
elements cleanly and explicitly.
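A minimal sketch of the "always close explicitly" idea, assuming the SGML has already been parsed into a simple tree. The `(name, children)` tuple representation and the `pretty_print` name are hypothetical and are not part of debiandoc-sgml. Text-only elements stay on one line so no whitespace is added to their content; mixed content would need the same care in a real tool.

```python
from xml.sax.saxutils import escape

# Hypothetical tree model: an element is (name, children), where each child
# is either another (name, children) tuple or a plain text string.
def pretty_print(node, indent=0, step=2):
    """Serialize a tree, always writing every end tag explicitly."""
    pad = " " * indent
    if isinstance(node, str):
        return pad + escape(node)
    name, children = node
    if all(isinstance(c, str) for c in children):
        # Text-only element: keep it on one line so its content is unchanged.
        return f"{pad}<{name}>{escape(''.join(children))}</{name}>"
    lines = [f"{pad}<{name}>"]
    lines += [pretty_print(c, indent + step, step) for c in children]
    lines.append(f"{pad}</{name}>")
    return "\n".join(lines)

doc = ("chapt", [("heading", ["Intro"]), ("p", ["Some ", "text."])])
print(pretty_print(doc))
```

Even where the DTD allows omitted tags (`o o` for <heading> and <p>), the output names every end tag, which is what makes it easy to turn into XML later.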

Along the way, I noticed some odd things about debiandoc.dtd.

<!ENTITY % xref "ref|manref|email|ftpsite|ftppath|httpsite|httppath|url">
<!ENTITY % emph "em|strong|var|package|prgn|file|tt|qref">
<!ENTITY % list "list|enumlist|taglist">
<!ENTITY % inline "(#pcdata|%emph|%xref|footnote|comment)+">
<!ENTITY % inpara "((%inline)|(%list)|example|include)+">
<!ENTITY % paras "(p+)">
<!ENTITY % sect "heading,(%paras)?">
...
<!ELEMENT abstract - o (%inpara)>
...
<!ELEMENT chapt - o ((%sect),sect*)>
...
<!ELEMENT heading o o (%inline) -(%xref)>
<!ELEMENT p o o (%inpara)>

This explains why <abstract> cannot contain <p>'s: %inpara does not
include p.  (In DocBook, <abstract> contains <para>'s.)
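The parameter-entity chain can be expanded mechanically to confirm this. The sketch below copies the entity definitions from the debiandoc.dtd excerpt into a dictionary and substitutes `%name` references until none remain. The helper names are hypothetical, and this is not a general DTD parser: a real one also has to handle declarations, comments, and other SGML syntax.

```python
import re

# Parameter entities copied from the debiandoc.dtd excerpt.
ENTITIES = {
    "xref": "ref|manref|email|ftpsite|ftppath|httpsite|httppath|url",
    "emph": "em|strong|var|package|prgn|file|tt|qref",
    "list": "list|enumlist|taglist",
    "inline": "(#pcdata|%emph|%xref|footnote|comment)+",
    "inpara": "((%inline)|(%list)|example|include)+",
    "paras": "(p+)",
}

REF = re.compile(r"%(\w+);?")  # a %name reference, with optional ';'

def expand(model):
    """Replace %name references repeatedly until none are left."""
    while REF.search(model):
        model = REF.sub(lambda m: ENTITIES[m.group(1)], model)
    return model

def element_names(model):
    """Set of element names (and #pcdata) allowed by a content model."""
    return set(re.findall(r"#?\w+", expand(model)))

print("p" in element_names("(%inpara)"))  # False: <abstract> has no <p>
print(element_names("(%paras)"))          # {'p'}
```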

I think DocBook defines abstract as containing multiple <para>'s
(simplified), in dbpoolx.mod:
<!ENTITY % para.class
            "formalpara|para|simpara %local.para.class;">
...
<!ELEMENT abstract %ho; (title?, (%para.class;)+)>

In debiandoc-sgml, tags such as <example> and <list> cannot come right
after <chapt><heading>...</heading>, since the content that follows the
heading must be %paras (p+), not %inpara.

In DocBook XML, at least list-like structures can appear directly under
a section:
from dbhierx.mod
<!ENTITY % divcomponent.mix
                "%list.class;           |%admon.class;
                |%linespecific.class;   |%synop.class;
                |%para.class;           |%informal.class;
                |%formal.class;         |%compound.class;
                |%genobj.class;         |%descobj.class;
                |%ndxterm.class;        |beginpage
                %forms.hook;
                %local.divcomponent.mix;">

...
<!ELEMENT simplesect %ho; ((%sect.title.content;), (%divcomponent.mix;)+)
                %ubiq.inclusion;>
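The difference between the two content models can be illustrated by translating each one by hand into a regular expression over the sequence of child element names. These translations are simplified (only a few members of %divcomponent.mix; are listed, and %sect.title.content; is reduced to title), so treat this as an illustration rather than a validator; `is_valid` is a hypothetical helper.

```python
import re

# debiandoc chapt: ((%sect),sect*) with %sect = heading,(%paras)?
# so only <p> (or nothing) may follow <heading> before any <sect>.
DEBIANDOC_CHAPT = r"heading( p)*( sect)*"

# DocBook simplesect, heavily simplified: a title followed by one or more
# members of %divcomponent.mix; (only a handful are listed here).
DOCBOOK_SIMPLESECT = (
    r"title( (para|simpara|formalpara|itemizedlist"
    r"|orderedlist|variablelist|programlisting))+"
)

def is_valid(model, children):
    """Return True if the child element names match the content model."""
    return re.fullmatch(model, " ".join(children)) is not None

print(is_valid(DEBIANDOC_CHAPT, ["heading", "example"]))        # False
print(is_valid(DOCBOOK_SIMPLESECT, ["title", "itemizedlist"]))  # True
```

So a converter from debiandoc to DocBook never meets a list directly after a heading, but a converter going the other way would have to wrap such content in <p> first.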

I do not know how many of these structural differences I have to take
care of, but I tried to address the extra <p> issue for <abstract> in
SGML, and also added a method to force the addition of <p> or <para>.
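A minimal sketch of forcing a paragraph wrapper into <abstract>, working on an XML tree with Python's standard xml.etree.ElementTree. The `wrap_abstract` name and its `para_tag` parameter are hypothetical; `para_tag` only loosely mirrors the choice between <p> and <para>, and this is not how debiandoc-sgml itself is implemented.

```python
import xml.etree.ElementTree as ET

def wrap_abstract(abstract, para_tag="para"):
    """Move the abstract's inline content into one <para_tag> child,
    unless it is already split into paragraphs."""
    if any(child.tag in ("p", "para") for child in abstract):
        return abstract  # already paragraph-structured, leave it alone
    wrapper = ET.Element(para_tag)
    wrapper.text = abstract.text          # leading text moves into the wrapper
    for child in list(abstract):          # copy the list: we mutate abstract
        abstract.remove(child)
        wrapper.append(child)             # child.tail travels with the child
    abstract.text = None
    abstract.append(wrapper)
    return abstract

src = ET.fromstring("<abstract>Uses <prgn>dpkg</prgn> daily.</abstract>")
print(ET.tostring(wrap_abstract(src, "para"), encoding="unicode"))
# <abstract><para>Uses <prgn>dpkg</prgn> daily.</para></abstract>
```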

Since I have a somewhat working version, I will commit it to CVS soon.

This new direct conversion will resolve the issues reported against the
old debiandoc2xml, which converted through the groff interface.  The XML
it generated was not correct and had many artifacts.  It was impossible
to determine where a conversion failure occurred, since the groff->xml
conversion was done by an outside tool.

At this moment, -1 (single-file output) is the only working conversion.
The -P option controls the addition of <p> or <para> tags.  Also, to keep
the *.ent entries, I think we may need to borrow a tool from debiandoc2dbxml.

I think I still need to fix the conversion of appendices, etc.

But this is news to you.

Osamu
-- 
~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ +++++
        Osamu Aoki <osamu@debian.org>  Yokohama Japan, GPG-key: A8061F32
 .''`.  Debian Reference: post-installation user's guide for non-developers
 : :' : http://qref.sf.net and http://people.debian.org/~osamu
 `. `'  "Our Priorities are Our Users and Free Software" --- Social Contract