Introduction
Documents, electronic or otherwise, are now the preferred medium for disseminating and distributing information. PostScript from Adobe Systems was the language of most printers and was used for printing paper documents. A PostScript print file can be transformed into a PDF (Portable Document Format) file with Adobe's Distiller and viewed with Adobe's Acrobat. However, a PDF file cannot readily be edited. The language of the Internet is HTML. It is far better suited than PDF to document distribution, but it is not meant for document transformation: HTML is concerned with rendering a document rather than with describing its content.
Documents can be simple, with a few component elements, or very complex. It is therefore desirable to have a framework for creating new document types as demanded by the information to be stored. The information viewer or browser must also be customizable on the fly whenever information of a previously unknown document type arrives. XML is the language of choice for describing documents and document types. XML is a simplified form of SGML that imposes some discipline on markup nesting and on the closing of elements. An XML document need not have an explicit Document Type Definition (DTD). XML has no implicit exclusion or inclusion of document elements, which SGML allows. XML documents can be handled in an OS-independent way with client-side and server-side scripts and Java programs. SGML and its derivatives such as XML and HTML are structuring techniques that are available today and are increasingly understood by publishing software such as Internet browsers and word processors. Some of this software also provides application programming interfaces for manipulating SGML documents and presenting them to a worldwide audience.
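The nesting and closing discipline mentioned above can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree parser, which checks well-formedness without consulting any DTD. The sample element names (memo, to, body) are invented for illustration and do not come from the paper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML (no DTD is consulted)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every element is explicitly closed and properly nested.
good = "<memo><to>Reader</to><body>Hello</body></memo>"
# An SGML DTD that permits end-tag omission could accept the unclosed <to>;
# XML never does.
bad_unclosed = "<memo><to>Reader<body>Hello</body></memo>"
# Overlapping elements violate the nesting discipline.
bad_overlap = "<memo><to>Reader<body></to>Hello</body></memo>"

for sample in (good, bad_unclosed, bad_overlap):
    print(is_well_formed(sample))
```

Only the first sample is accepted. The check needs no document type definition, which is the sense in which an XML document can stand on its own.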
Creating a document involves three steps:
1. Define the correspondence between a set of ordinal numbers and character glyphs or appearance.
2. Transform the data to incorporate its latent structure and make it a document.
3. Present the document in a publishing medium.
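The three steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the tool set described in this paper: the legacy bytes are assumed to use EBCDIC code page 037 (Python codec "cp037"), the latent structure is assumed to be "first line is a title, remaining lines are paragraphs", and the element names doc, title and para are hypothetical.

```python
import xml.etree.ElementTree as ET


def decode_legacy(raw: bytes) -> str:
    """Step 1: map ordinal byte values to characters.

    Assumption: the source uses EBCDIC code page 037.
    """
    return raw.decode("cp037")


def to_document(text: str) -> ET.Element:
    """Step 2: make the latent structure explicit.

    Assumption: the first non-empty line is a title, the rest are paragraphs.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    doc = ET.Element("doc")
    ET.SubElement(doc, "title").text = lines[0]
    for ln in lines[1:]:
        ET.SubElement(doc, "para").text = ln
    return doc


def to_html(doc: ET.Element) -> str:
    """Step 3: present the document in a publishing medium (HTML here)."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for child in doc:
        tag = "h1" if child.tag == "title" else "p"
        ET.SubElement(body, tag).text = child.text
    return ET.tostring(html, encoding="unicode")


raw = "Conversion Report\nAll pages converted.".encode("cp037")
print(to_html(to_document(decode_legacy(raw))))
```

Keeping the three steps as separate functions mirrors the separation in the list: the character table, the structure, and the presentation can each be replaced without touching the other two.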
There are several reasons for converting legacy documents to XML, HTML, PDF, Word or FrameMaker format. Once converted, documents can easily be transported and distributed over the Internet or on CD-ROM. The conversion should be quick, economical and error free. The converted documents should either retain their original appearance or be given a new look. When the documents to be converted run to many hundreds of thousands of pages, the conversion should also be automatic and inherently error free.
An SGML document is an intermediate, time-independent representation of any source document. Its format can be specified by a style sheet in DSSSL, or in any text processor that can read and process SGML documents. An SGML document has a structure definition called a DTD. The structure latent in the legacy documents must be identified and formalized as a DTD, and a formatting style must be specified for each component element in the DTD. These styles then have to be implemented in the text processing environment, such as Word or FrameMaker, in which the document is formatted. It may also be necessary to transform the document into an Internet document format such as HTML or XML.
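To illustrate what formalizing structure in terms of a DTD means, here is a deliberately simplified sketch. The table below is not DTD syntax: it records only which child elements each element may contain, whereas a real DTD also expresses order and repetition. All element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> set of allowed child elements.
# A real DTD also constrains order and repetition, which this sketch ignores.
CONTENT_MODEL = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para", "list"},
    "list": {"item"},
    "title": set(),
    "para": set(),
    "item": set(),
}


def structure_errors(elem, path=""):
    """Return messages for elements that CONTENT_MODEL does not permit."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in CONTENT_MODEL:
        return [f"{here}: undeclared element"]
    errors = []
    for child in elem:
        if child.tag not in CONTENT_MODEL[elem.tag]:
            errors.append(f"{here}: <{child.tag}> not allowed here")
        else:
            errors.extend(structure_errors(child, here))
    return errors


doc = ET.fromstring(
    "<book><title>T</title>"
    "<chapter><para>p</para><item>x</item></chapter></book>"
)
print(structure_errors(doc))
```

The stray item directly inside chapter is reported, because the model allows item only inside list. Errors of this kind are what drive the revision of either the DTD or the tagged document.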
SGML document
An SGML document is a browser-independent representation. To produce one, the latent structure of the source document has to be found, and the SGML elements have to be given formatting styles. Formatting style is the less critical of the two. Different element tags can be given different formats, or can override some formatting properties and inherit the rest from their context. SGML can itself be used for defining formatting styles, as in XML. SGML elements do have some associated meaning, for example Glossary. Styles can be predefined for SGML elements and changed to suit the taste of the composer or publisher. Greater effort is required to express the document structure as an SGML DTD. The IBM Starter Set GML has a simple DTD, and any extension of GML is an extension of this DTD. The following iterative steps are taken to arrive at a DTD for an IBM-based legacy document:
1. Filter the document and replace its tags with SGML element tags from the IBM-supplied GDOC DTD.
2. Parse the converted document against the current DTD.
3. If the parser reports errors, modify the DTD or the tagged document and repeat from step 2.
4. Otherwise, the SGML document has been produced: the tagged document and its corresponding DTD.
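The iterative steps above can be sketched as follows. This is not the preprocessor described in this paper. It assumes Starter Set style markup in which a tag is written as a colon, a name and a period (for example :p.) and an end tag prefixes the name with e (for example :eul.). The tag table, the gdoc wrapper element and the rule used to resolve unknown tags are hypothetical, and an XML well-formedness check stands in for parsing against the real DTD.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

GML_TAG = re.compile(r":([a-z][a-z0-9]*)\.")

# Hypothetical table: GML tag name -> (element name, end tag is implied).
TAG_MAP = {
    "h1": ("h1", True),
    "p": ("p", True),
    "ul": ("ul", False),
    "li": ("li", True),
}


def convert(source, tag_map):
    """Replace GML tags with element tags; return (tagged_text, unknown_names)."""
    out, unknown = [], []
    pending = None  # element still waiting for its implied end tag
    pos = 0
    for m in GML_TAG.finditer(source):
        name = m.group(1)
        is_start = name in tag_map
        is_end = not is_start and name.startswith("e") and name[1:] in tag_map
        if not (is_start or is_end):
            if name not in unknown:
                unknown.append(name)
            continue  # leave unrecognised markup in the text for now
        out.append(escape(source[pos:m.start()]))
        pos = m.end()
        if pending:
            out.append(f"</{pending}>")
            pending = None
        if is_start:
            element, implied = tag_map[name]
            out.append(f"<{element}>")
            pending = element if implied else None
        else:
            out.append(f"</{tag_map[name[1:]][0]}>")
    out.append(escape(source[pos:]))
    if pending:
        out.append(f"</{pending}>")
    return "<gdoc>" + "".join(out) + "</gdoc>", unknown


def convert_until_clean(source, tag_map, max_rounds=5):
    """Repeat filter and parse until no unknown tags remain."""
    tag_map = dict(tag_map)
    for _ in range(max_rounds):
        tagged, unknown = convert(source, tag_map)
        if not unknown:
            ET.fromstring(tagged)  # raises ParseError if nesting is broken
            return tagged, tag_map
        # Stand-in for the manual step of extending the DTD: assume each new
        # tag is a paragraph-like element whose end tag is implied.
        for name in unknown:
            tag_map[name] = (name, True)
    raise ValueError("tags still unresolved: " + ", ".join(unknown))


SAMPLE = ":h1.Report\n:p.Intro\n:ul.\n:li.one\n:li.two\n:eul.\n:note.Check totals\n"
tagged, final_map = convert_until_clean(SAMPLE, TAG_MAP)
print(tagged)
```

In the sample, the first pass reports the note tag as unknown, the table is extended, and the second pass produces a document that parses. In practice the decision on how to model a new tag is made by a person, not by a default rule as here.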
A template and the corresponding Element Definition Document (EDD) of Adobe FM+SGML for the DTD are part of the tool set. The styles for any new elements in the DTD are incorporated into the corresponding EDD, which can be further customized with the user's preferred styles for document elements. An FM+SGML template is then generated and used for rendering the SGML document. By carefully choosing the initial documents to be converted, it is possible to cover all the document elements that occur in an installation's set of documents. A single preprocessor, DTD and template can thus be generated for an installation and used to convert all its documents to FM+SGML.
Some document elements in an SGML document may need further transformation for correct rendering, and a Perl script can be developed for this purpose. HTML document elements can be defined for SGML elements in FM+SGML, so SGML documents can then be converted to HTML or XML documents with style sheets for rendering. The index, table of contents and similar components can be generated in FM+SGML using special templates. It is also possible to transform an SGML document into an XML document, and to render it using XSL, with the help of the Jade and SP software developed by James Clark.
The tool set consists of a GML interpreter, a DTD, an EDD, a template, filters, and an EBCDIC to Windows ANSI character table. Converting a 1000-page legacy document to an SGML document takes about 2 minutes of turnaround time on an IBM mainframe under VM/CMS. Converting the same SGML document to an FM+SGML document takes about 15 minutes. Once the application is fully configured and customized for an installation, the conversion of 1000 pages is expected to take one hour.
The SCRIPT text processor on the IBM mainframe is intended for print media. Footnote, cross-reference and index elements created with SCRIPT and GML can occur freely anywhere in legacy documents. The SCRIPT processor picks them up at text processing time and replaces them with page numbers and the like. The transformed document, however, is intended not only for paper but also for on-screen viewing from CD-ROM or on the Web, where a reference must become a hyperlink to the reference point. XML does not permit a document element to be inserted just anywhere, so the preprocessor converts the legacy document to SGML. Once the document has been converted to a FrameMaker structured document, FM+SGML transforms it to XML or HTML in a straightforward manner. WebWorks Professional Edition can be used to convert the FM+SGML document, with the same or a changed appearance, for browsing in Internet Explorer or Netscape. The documents can be broken up into smaller files for fast transmission on the Web, and the quality of their appearance can be enhanced as the user requires. Documents are converted to HTML or XML with Cascading Style Sheets.
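The change from page-number references to hyperlinks can be sketched as follows. This is an illustration only, not the FM+SGML or WebWorks processing described above. The element names (doc, sec, title, para, xref), the id and refid attributes, and the element-to-HTML table are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from structured-document elements to HTML elements.
HTML_TAG = {"doc": "body", "sec": "div", "title": "h2", "para": "p"}


def link_cross_references(doc):
    """Rewrite the tree in place as HTML, turning each xref into a hyperlink."""
    # Collect link text for every section that can be a reference target.
    titles = {
        sec.get("id"): sec.findtext("title", default=sec.get("id"))
        for sec in doc.iter("sec")
        if sec.get("id")
    }
    for elem in doc.iter():
        if elem.tag == "xref":
            target = elem.attrib.pop("refid")
            if target not in titles:
                raise KeyError(f"dangling cross-reference: {target}")
            # A print processor would substitute a page number here;
            # for screen media the reference becomes a link to the target.
            elem.tag = "a"
            elem.set("href", f"#{target}")
            elem.text = titles[target]
        else:
            elem.tag = HTML_TAG.get(elem.tag, elem.tag)
    return doc


src = (
    '<doc><sec id="intro"><title>Introduction</title>'
    '<para>Details are in <xref refid="conv"/>.</para></sec>'
    '<sec id="conv"><title>Conversion</title><para>Steps.</para></sec></doc>'
)
html = ET.tostring(link_cross_references(ET.fromstring(src)), encoding="unicode")
print(html)
```

The section id is carried over to the HTML element so that the generated link has an anchor to jump to, and a reference to a missing target is reported as an error instead of producing a dead link.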
[Screenshots not reproduced: SGML Converter; Report on Conversion; Converted Books; Table Snap; Flow Snaps; Code Snap; Chapter Snap; TOC Snap; Index Snap]
Back matter
Reference
1. Adobe FrameMaker+SGML 6.0 Developer's Guide, Online manual
2. WebWorks Publisher's Professional Edition User Guide
3. GML Starter set User's Guide, IBM Document No. SH20-9186-07
4. SCRIPT/VS User's Guide, IBM Document No. S5444-3191-01
5. James Clark's Home page; http://www.jclark.com/
6. ISO/IEC 10179:1996 Information technology -- Text and office systems -- Document Style Semantics and Specification Language (DSSSL), dated April 1, 1996, © 1996 ISO/IEC
7. ArbourText, Inc., http://www.arbourtext.com/sgmlxept.html, SGML Exceptions and XML
8. Clark, James, http://www.w3.org/TR/NOTE-sgml-xml.html, Comparison of SGML and XML
9. World Wide Web Consortium (W3C), http://www.w3.org/TR/1998/REC-xml-19980210, Extensible Markup Language (XML) 1.0: W3C Recommendation 10-February-1998
This paper was submitted to XML 2000 Conference in Washington.
http://www.linkedin.com/in/kankanroy