XML::Edifact - an approach towards XML/EDI as a prototype in perl
  release 0.46 - UNOC MAINT
  Michael Koehne, ( kraehe@copyleft.de )
  Wed Apr 24 21:42:29 CEST 2002

  XML::Edifact is a set of perl scripts for translating EDIFACT into XML.
  Version 0.45 improved UNOC handling. This is a maintenance release
  because UTF8 in Perl is broken again.
  ______________________________________________________________________

  Table of Contents


  1. Introduction
  2. Release Notes:
     2.1 Edi2SGML-0.1: About the beauty of plain text
     2.2 XML-Edifact-0.2: It's hard work to cook up a second version.
     2.3 XML-Edifact-0.3x: About normalisation, namespaces and xml2edi
     2.4 XML-Edifact-0.4x: the portability track.

  3. Installation
  4. Known Bugs
     4.1 Double namespace declarations
     4.2 Stating level in Syntax identifier.
     4.3 Explicit Indication of Nesting
     4.4 XML::Edifact is slow!

  5. Roadmap
  6. Legal stuff
  7. Download


  ______________________________________________________________________

  1.  Introduction

  EDIFACT is often called "the nightmare of the paperless office" when
  you show a programmer the standard draft. Those 2700 pages of horror-
  filled advisory-board English have given many programmers headaches.


  EDIFACT is trying the impossible: a single form for the real world.

  Orders, invoices, freight papers, etc. always look different if they
  come from different companies. EDIFACT tries to fulfill all needs of
  commercial messages, regardless of type and origin. Of course the real
  world is neither simple nor complete.  Nevertheless, it's important
  for the top companies and their suppliers -  you know, those who have
  been in business for years and can pay for a mainframe and a pack of
  gurus.

  XML/EDI is meant to provide a simpler (KISS) format that can be
  translated to and from EDI, to allow smaller companies to avoid
  slashing down forests and retyping into a computer keyboard stupid
  lines printed by other computers.

  This is NOT XML/EDI, it's certainly not KISS. The edifact03.dtd
  reflects the original words of the EDIFACT standard as closely as
  possible, on a segment, composite and element level.

  This DTD simplifies EDI inasmuch as it doesn't distinguish between
  e.g. INVOICE or PRICAT, but only defines a generic message type called
  edifact:message. The benefit is of course that it's possible to
  convert any EDI message into edifact. The drawback is that the dtd is
  really relaxed. Validation of EDIFACT message design can therefore not
  be done by a validating XML parser. Message designers will still need
  knowledge about EDIFACT message design and EDIFACT tools.

  But once the message is designed, it's simpler to read it with XML.

  2.  Release Notes:

  2.1.	Edi2SGML-0.1: About the beauty of plain text

  Standards should be based on standards. EDIFACT is based on ASCII and
  documentation is available from WWW.Premenos.Com as plain text. Well,
  the original contains some PCDOS characters. I took the liberty of
  replacing them with ASCII in this distribution to improve readability.
  I'm not talking about human readability here. A friend at SAP joked
  that plain paper is the only platform-independent format in that case.
  But I dislike retyping them. And plain text is more flexible, as I'm a
  programmer.

  Unlike the 0.1 distribution, later distributions only contain those
  documents the scripts need to parse. Download the 0.1 for a complete
  set, or surf at Premenos.

  Note: Premenos was the old URL; better start surfing now at
  www.unece.org

  2.2.	XML-Edifact-0.2: It's hard work to cook up a second version.

  As usual, second versions claim to be better documented and tested,
  but the truth is that they contain more features. So let's talk about
  features:

  First of all: It looks like a module. "use strict" and the package
  concept are useful things. But it'll take a lot of RTFM for me to
  understand the perl way of doing it. The XML/Edifact.pm doesn't export
  anything, and it's not even necessary to "perl Makefile.PL; make
  install".

  The 0.2 version is not intended to be installed; it's a test case.

  So let's talk about the test case: Run ./bin/make_test.sh from here,
  and everything should be fine. Still, it will take some RTFM for me to
  understand the perl way of regression testing. But the
  ./bin/make_test.sh is the one this version offers ,-)

  I'm now using a tied hash for speeding startup. I've decided to use
  SDBM, as this DBM comes with any perl and a small DBM is better in
  this case.
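
  The tied-hash idea can be sketched as follows. This is only an
  illustration: the file name "segments" and the single NAD entry are
  made up for the example, and the real XML::Edifact tables are laid
  out differently.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                  # ships with core Perl
use File::Temp qw(tempdir);

# Hypothetical location and contents, chosen for this example only.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/segments";

# Build step: fill the DBM once (the slow part, done at install time).
my %table;
tie(%table, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $base: $!";
$table{NAD} = 'name.and.address';
untie %table;

# Later runs tie the existing file instead of parsing text again.
my %lookup;
tie(%lookup, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot tie $base: $!";
my $name = $lookup{NAD};
untie %lookup;

print "NAD => $name\n";
```

  Only the keys that are actually looked up are read from disk, which
  is why startup stays fast even when the tables grow.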

  I've provided a document type definition. And it's now possible to use
  a validating parser like SP from James Clark. You may also notice the
  renaming of Edi2SGML to XML::Edifact. This name change reflects that
  my script is now producing XML and not SGML, and the name should point
  to the place in the CPAN hierarchy where this package belongs.

  2.3.	XML-Edifact-0.3x: About normalisation, namespaces and xml2edi

  You may notice the major change in the DBM design. While the old DBM
  files were modeled closely on the batch directory, this version has
  been partly normalised to improve coding. It's also denormalised for
  some perlish reasons. Unloading this DBM into a relational database
  would be possible with varchars, but the semantics of the 2nd element
  in segments and composite could only be expressed with some weird
  object relational databases like PostgreSQL.

  Also the DTD was changed for namespace reasons. The 0.2 needed to add
  the word literal, where element names clashed with segment names of
  the standard. And it dropped the composite information.  Now
  trsd:party.name means the segment, while tred:party.name points to the
  element.

  This allows parsing the XML message to produce an EDI message without
  a backtracking parser. The event-based parser used for xml2edi is
  quite new, and certainly contains some bugs.	Please dig around in
  your real-life messages, translate them with edi2xml, then back with
  xml2edi, and compare the original with the double translation. I've
  tried for a robust solution, which doesn't croak with codes from an
  unknown namespace, I hope.

  Version 0.30 and 0.31 used edicooked:message as namespace; versions
  0.32 and up will use edifact:message for the main namespace. The
  technical reason is quite simple. The namespace prefix of a message
  does not mean anything. It's only a shorthand for the provided URI in
  the xmlns attribute. So any distinct XML message can claim to be in
  the edifact: namespace, if the URI is distinct. So if other projects
  start to be implemented, they can claim to be in the edifact:
  namespace by the same right.
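
  To illustrate why the prefix itself carries no meaning, here is a
  small sketch that expands a prefixed element name to its URI. The
  example.org URIs and the helper name expanded_name are assumptions
  made for this example; real code should use a namespace-aware XML
  parser rather than regular expressions.

```perl
use strict;
use warnings;

# Expand "prefix:local" in a start tag to "{uri}local", using the
# xmlns:prefix="uri" declarations found in that same tag.
sub expanded_name {
    my ($start_tag) = @_;
    my %uri_for;
    while ($start_tag =~ /\bxmlns:([\w.-]+)\s*=\s*"([^"]*)"/g) {
        $uri_for{$1} = $2;
    }
    my ($prefix, $local) = $start_tag =~ /^<([\w.-]+):([\w.-]+)/
        or return;
    return unless exists $uri_for{$prefix};
    return '{' . $uri_for{$prefix} . '}' . $local;
}

# Same prefix, different URIs: two different element types.
my $one = expanded_name('<edifact:message xmlns:edifact="http://example.org/one">');
my $two = expanded_name('<edifact:message xmlns:edifact="http://example.org/two">');

# Different prefix, same URI: the same element type.
my $alias = expanded_name('<edi:message xmlns:edi="http://example.org/one">');

print "one and two differ\n"      if $one ne $two;
print "one and alias are equal\n" if $one eq $alias;
```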

  Version 0.33 first of all solves a bug which showed up with xml2edi
  and a TeleOrdering message translated by edi2xml. I just forgot to
  encode the less-than sign and the ampersand when they occurred in the
  translation of a code list entry. So NAD+OB+0091987:160:16' will now
  be translated using Dun & Bradstreet, which is right.
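
  A minimal sketch of the kind of escaping involved. The function name
  escape_xml_text is made up for this example and is not taken from
  the module's source.

```perl
use strict;
use warnings;

# Escape the two characters that must never appear literally in XML
# character data. The ampersand has to be handled first, otherwise the
# ampersand of a freshly produced "&lt;" would be escaped again.
sub escape_xml_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    return $text;
}

print escape_xml_text('Dun & Bradstreet'), "\n";
```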

  There are two other major improvements. Version 005.60 contains a
  profiler, and finding the hot spots and optimising the SDBM by further
  denormalisation improved performance of edi2xml by a factor of 12. I
  hope nobody has used the SDBM internals so far. The last major
  improvement is that I'm getting familiar with ExtUtils::MakeMaker,
  File::Spec and friends. Version 0.33 is the first that installed - at
  least on my Linux box :-)

  Version 0.34 introduced coding of UN/EDIFACT code list extensions by
  XML-Edifact namespace migration.

  Version 0.34 fixed a bug concerning the release indicator. As a minor
  improvement, the edi2xml and xml2edi scripts now have pod
  documentation.

  Version 0.35 was a bug fix, thanks to Detlef Lammermann from Dr.
  Materna GmbH, who found that ??' was misinterpreted.
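
  The tricky case is a release character that is itself released,
  directly followed by a real segment terminator. The sketch below
  shows one way to split segments so that ??' ends the segment while
  ?' does not. It assumes the default release character ? and
  terminator ' and is not the module's actual parser; split_segments
  is a name invented for this example.

```perl
use strict;
use warnings;

# Split an EDIFACT interchange into segments, honouring the release
# character. Escaped pairs are kept as they are in the output.
sub split_segments {
    my ($text, $release, $terminator) = @_;
    $release    = '?' unless defined $release;
    $terminator = "'" unless defined $terminator;

    my @segments;
    my $current = '';
    my @chars   = split //, $text;
    while (@chars) {
        my $c = shift @chars;
        if ($c eq $release && @chars) {
            # The release character protects exactly one following character.
            $current .= $c . shift @chars;
        }
        elsif ($c eq $terminator) {
            push @segments, $current;
            $current = '';
        }
        else {
            $current .= $c;
        }
    }
    push @segments, $current if length $current;
    return @segments;
}

# "??" is a literal question mark, so the quote after it ends the segment.
my @two = split_segments("FTX+AAI+++REALLY??'NAD+OB+0091987:160:16'");

# "?'" is a literal quote, so this is a single segment.
my @one = split_segments("FTX+AAI+++IT?'S FINE'");

print scalar(@two), " segments, then ", scalar(@one), " segment\n";
```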

  2.4.	XML-Edifact-0.4x: the portability track.

  The intention is to have a version running under as many operating
  systems as possible. Bug fixes may still merge into this version, but
  new features will be implemented in the 0.50 track.

  Version 0.40 started with a minor bugfix (thanks to Werner F.C. Bruns)
  and questions for a W32 port at a DIN meeting in Frankfurt. John
  Cope made the first PPM/PPD that was known to run on W32. But as I
  don't have any W32 system, I was unable to test it.

  Version 0.41 was the first version known to build and to pass its
  regression test under Windows NT, thanks to Arend R. Braun. The only
  change was in Makefile.PL.

  Version 0.42 requires Perl 5.6, and implements interpretation of the
  Stating Level. Now UNOC (Latin1) is translated to UTF8.


  Version 0.43 improved in grammar and spelling - thanks to Julian
  Olson.

  Version 0.44 improved in memory consumption - thanks to Carlos De
  Matos, who confronted me with DELJIT messages of megabyte size.

  Version 0.45 improved UNOC handling. Perl 5.6.1 dropped the 'tr'
  function to convert between ISO-8859-1 and UTF8, and introduced a new
  way. Thanks to Jarkko Hietaniemi for his regexp to produce a version
  compatible with Perl 5.6.0 and up.
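
  For readers curious how such a conversion can be done with a regular
  expression alone, here is a byte-level sketch. It is not necessarily
  the regexp used in the module; latin1_to_utf8 is a name invented for
  this example. On Perl 5.8 and later the Encode module is the usual
  route.

```perl
use strict;
use warnings;

# Every Latin1 byte from 0x80 to 0xFF becomes a two-byte UTF-8
# sequence: 110000xx 10xxxxxx. ASCII bytes are left untouched.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/eg;
    return $bytes;
}

my $utf8 = latin1_to_utf8("K\xF6hne");   # o-umlaut is 0xF6 in Latin1
printf "%d bytes\n", length $utf8;
```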


  3.  Installation

  I've included my modified documents, so others will be able to rebuild
  the DBM files. You may need a Unix-like system because of newline
  conventions.



	   $ perl Makefile.PL

	   I know I should check for those 99 possible places,
	   but I prefer to ask :-)

	   URL for public documents [http://www.xml-edifact.org]
	   Directory on this system [/tmp/xml-edifact]

	   Writing Makefile for XML::Edifact

	   $ make



  perl Makefile.PL will first ask two questions. The reason is that
  XML::Edifact wants to install its document type definition on a web
  server to allow validating XML parsers to fetch the DTD.

  Do not change this setting the first time, as changes cause
  XML::Edifact to fail its regression test.  You may change those
  decisions later by reperling the Makefile.PL, or by editing the
  XML::Edifact::Config module in your SITE_PERL.

  Make will take a while and then you may hope to have a working
  database. This database covers the 96b version of the UN/EDIFACT batch
  directory and will be installed as XML::Edifact::d96b later.



	   $ make test



  The regression test will translate any .edi file found in the examples
  directory to xml and translate the xml back to EDIFACT.  The result
  should not change.



	   $ make install



  This will install the XML::Edifact module, the D96B batch directory,
  various files for the URL and two scripts: edi2xml and xml2edi

  You can now try your own UN/EDIFACT files. I really want to know what
  your EDI messages look like, do they break anything, what about your
  code list extension, ... ?

  Testing different real examples should show some bugs I haven't
  thought of. Think about the O'Reilly invoice or the Dubbel:Test and
  you should get the idea. I've tried to implement the UNA correctly,
  but this may need some additional debugging. Take a look at the
  difference between the edi.tst files from Frankfurt and the Springer
  message. The last one uses newline as the 9th character in UNA, so
  it's nearly human-readable.
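
  The UNA service string advice is the literal "UNA" followed by six
  delimiter characters, which makes the segment terminator the 9th
  character. A minimal sketch of reading it, assuming the commonly
  used defaults :+.? ' when no UNA is present. The function name
  parse_una and the hash keys are made up for this example.

```perl
use strict;
use warnings;

# Return a hash reference with the six service characters of a message.
sub parse_una {
    my ($message) = @_;
    my %delim = (
        component  => ':',
        element    => '+',
        decimal    => '.',
        release    => '?',
        reserved   => ' ',
        terminator => "'",
    );
    # /s lets "." match a newline, which some senders use as terminator.
    if ($message =~ /\AUNA(.)(.)(.)(.)(.)(.)/s) {
        @delim{qw(component element decimal release reserved terminator)}
            = ($1, $2, $3, $4, $5, $6);
    }
    return \%delim;
}

my $newline_style = parse_una("UNA:+.? \nUNB+UNOC:3\n");
my $default_style = parse_una("UNB+UNOA:1'");

print "newline terminated\n" if $newline_style->{terminator} eq "\n";
```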

  One last word - I hope this complex installation will work on most
  Unix look-alikes, but I'm quite sure that it'll break on Windows and
  Mac. If you have such a system, and have problems during installation,
  drop me a mail. You are granted my help, as I need your help to make
  the installation portable across different platforms.

  4.  Known Bugs

  4.1.	Double namespace declarations

  Namespace declaration was redefined in January 1999. XML::Edifact 0.30
  produced both the old and the new declarations. XML::Edifact 0.31
  dropped the deprecated declarations! If you have an old browser, you
  may have to download XML::Edifact 0.30 and edit the current
  XML::Edifact.  Search for HERE_ and adapt the headers to your
  browser's preferences.

  4.2.	Stating level in Syntax identifier.

  The stating level in EDIFACT speak is called charset encoding in XML
  speak, and it's of course important if you think about non US/UK
  products.  Currently only UNOA, UNOB and UNOC are translated
  correctly. Character encodings other than Latin1 are not yet
  supported.

  4.3.	Explicit Indication of Nesting

  This has not been coded yet, as no example messages are available to
  me.

  4.4.	XML::Edifact is slow!

  The 0.50 will be times faster ;-)

  5.  Roadmap

  I'm using even and odd numbering to distinguish between stable and
  experimental versions. Well, 0.2 was not as stable as an even number
  suggests. And I hope this 0.3x is stable enough, as it's often said
  that a third version will be the first useful one.

  Both 0.4x track and 0.5x track are active currently. The 0.35 was
  quite stable, and there is a need for portability, while the version
  under development is far from being usable.

  I had to realise that the roadmap is far too large, so I had to drop
  the steps 0.7x to 0.9x. The functionality will become unbundled into
  other CPAN modules if necessary.

     0.4x
	This version focuses on portability of the EdiCooked style.
	While Perl ensures portability across the Unixes, MacOS and
	Win32 will cause some problems. The 0.4 version will also be the
	first one intended to be installed. As installation also means
	configuration of non Perlish paths, e.g. for webserver,
	mime.types, mailcap, dtds and databases, XML::Config.pm will be
	discussed in the perlxml list.

     0.5x
	This is the unstable version track.

	XML::Edifact now provides PerlSAX objects as drivers and
	handlers to UN/EDIFACT, making usage more flexible.

     0.6x
	Stabilisation by discussion and consensus about features
	introduced with 0.5.

     1.0
	I hope that a consensus has been found in this direction, so the
	DTDs won't change in further releases. Those versions may focus
	on using XML::Edifact in real life applications. I can imagine
	an SQL interface, a Cobol interface, a message designer, a
	DOM/CORBA wrapper, and much more.

	Once I think XML::Edifact is complete, I have to think about
	speed. Perl is a perfect language for prototyping, but profiling
	and using a low level language like C for hot spots will be
	necessary to handle large batches.

  6.  Legal stuff

  Programs provided with this copy called XML-Edifact-0.32.tgz may be
  used, distributed and modified under terms of the GNU General Public
  License.

  Files in the ./examples directory are from various sources and free of
  claims as far as I know.

  Files in the ./un_edifact_d96b directory are based on EDI batch
  directories and are therefore copyrighted by the United Nations.  See
  un_edifact_d96b/LICENAGR.TXT.

  Files that are produced during the bootstrap process and placed in
  XML::Edifact::d96b are based on the original UN/EDIFACT standard and
  therefore not covered by GPL, but likely copyrighted by the United
  Nations. The same applies to the text tables produced during
  Bootstrap.PL.

  Besides the GPLed Edition, a Custom Edition exists, if you dislike
  GPL. Drop me an eMail and ask for price and conditions.

  7.  Download

  I just got a message from PAUSE that I can upload it to :



	   $CPAN/authors/id/K/KR/KRAEHE



  XML::Edifact requires XML::Parser, so to download and install, type:



      $ perl -MCPAN -e shell
      cpan> install XML::Parser
      cpan> install XML::Edifact



  or ftp directly at:



	   ftp://ftp.cpan.org/pub/perl/CPAN/modules/by-module/XML/XML-Parser-*.tar.gz
	   ftp://ftp.cpan.org/pub/perl/CPAN/modules/by-module/XML/XML-Edifact-*.tar.gz



  The canonical source of the XML::Edifact project is now:



	   http://www.xml-edifact.org/



  This site contains various example files, research papers, a complete
  set of UN/EDIFACT batch directories and, most important, current
  versions from the unstable track.