Help needed in extracting data from MsWord documents

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

jzparad
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Test_Read_XML..Source_File: Unable to resolve prefix 'office'.
pattern = '/office:document-content/office:body/table:table'(Unknown URI, 12, 104)
Remaining tokens: ('/')
This looks like an XPath expression for a document structured something like this:

Code: Select all

<office:document-content>
   <office:body>
      <table:table>
         some data here
      </table:table>
   </office:body>
</office:document-content>
If so, you seem to be missing the namespace declarations for office and table.
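To see why the prefix cannot be resolved, here is a minimal sketch using Python's standard xml.etree.ElementTree module as a stand-in (it is not what DataStage uses internally). It assumes the table prefix is bound to http://openoffice.org/2000/table, as in the OpenOffice declarations posted later in this thread. The point it illustrates: declarations inside the document do not travel with a path expression. The expression needs its own prefix-to-URI map, and without one the lookup fails in much the same way as the DataStage error above.

```python
import xml.etree.ElementTree as ET

# Cut-down document modelled on the snippet above. The namespace
# declarations are present in the XML itself.
DOC = """<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:table="http://openoffice.org/2000/table">
  <office:body>
    <table:table table:name="Table1">some data here</table:table>
  </office:body>
</office:document-content>"""

root = ET.fromstring(DOC)

# A prefixed path with no prefix map: ElementTree raises SyntaxError
# because 'office' is not bound to any URI for this query.
try:
    root.find("office:body/table:table")
except SyntaxError as exc:
    print("unresolved:", exc)

# The same path with the prefixes bound to the document's URIs.
# ElementTree paths are relative to the element, so the root step
# (office:document-content) is left out.
NS = {
    "office": "http://openoffice.org/2000/office",
    "table": "http://openoffice.org/2000/table",
}
table = root.find("office:body/table:table", NS)
print(table.get("{http://openoffice.org/2000/table}name"))  # prints Table1
```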

See the example below, taken from the W3C's Namespaces in XML specification.

Code: Select all

<?xml version="1.1"?>
<!-- both namespace prefixes are available throughout -->
<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
    <bk:title>Cheaper by the Dozen</bk:title>
    <isbn:number>1568491379</isbn:number>
</bk:book>
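The W3C example shows that a prefix is only a local alias for a URI. A small Python sketch (standard library ElementTree; the XML 1.1 declaration and the comment are left out because Python's expat-based parser is an XML 1.0 parser) shows that a query can use different prefixes, here the made-up b and i, and still match as long as they are bound to the same URIs.

```python
import xml.etree.ElementTree as ET

# The W3C example, minus the XML declaration and comment.
BOOK = """<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
    <bk:title>Cheaper by the Dozen</bk:title>
    <isbn:number>1568491379</isbn:number>
</bk:book>"""

root = ET.fromstring(BOOK)

# ElementTree stores names as {namespace-URI}local-name; the prefix
# used in the document is not kept.
print(root.tag)  # prints {urn:loc.gov:books}book

# The query uses its own prefixes (b, i); they match because they are
# bound to the same URIs as bk and isbn in the document.
ns = {"b": "urn:loc.gov:books", "i": "urn:ISBN:0-395-36341-6"}
title = root.find("b:title", ns).text
number = root.find("i:number", ns).text
print(title, number)  # prints Cheaper by the Dozen 1568491379
```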
Jim Paradies
shepli
Participant
Posts: 79
Joined: Fri Dec 17, 2004 9:56 am

Post by shepli »

My generated XML does have a structure like this:
<office:document-content xmlns:office="http://openoffice.org/2000/office"
<office:body>
some data
<table:table table:name="Table1" table:style-name="Table1">
some data
</table:table>
</office:body>
</office:document-content>
jzparad

Post by jzparad »

<office:document-content xmlns:office="http://openoffice.org/2000/office"
xmlns:table="http://openoffice.org/2000/table">
<office:body>
some data
<table:table table:name="Table1" table:style-name="Table1">
some data
</table:table>
</office:body>
</office:document-content>
Try adding the xmlns:table declaration (the second line above) and see what happens.
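Declaring the prefix is only half of it: the URI must also be the one the document uses. In the full declaration list posted later in this thread, table is bound to http://openoffice.org/2000/table. The Python sketch below (standard library ElementTree standing in for the DataStage XPath engine; find_table is a hypothetical helper) shows that binding table to the office URI instead is accepted without complaint but matches nothing.

```python
import xml.etree.ElementTree as ET

OFFICE = "http://openoffice.org/2000/office"
TABLE = "http://openoffice.org/2000/table"

# Cut-down document: table is bound to its own URI, as in the full
# declaration list OpenOffice writes.
DOC = f"""<office:document-content xmlns:office="{OFFICE}" xmlns:table="{TABLE}">
  <office:body>
    <table:table table:name="Table1">some data</table:table>
  </office:body>
</office:document-content>"""

root = ET.fromstring(DOC)

def find_table(table_uri):
    """Look up office:body/table:table with 'table' bound to table_uri."""
    ns = {"office": OFFICE, "table": table_uri}
    return root.find("office:body/table:table", ns)

# Binding the prefix to the wrong URI raises no error but finds nothing.
print(find_table(OFFICE))             # prints None
print(find_table(TABLE) is not None)  # prints True
```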
Jim Paradies
shepli

Post by shepli »

Hi Jim,
I do have all of the declarations below in my generated XML. I did not list them all because I only wanted to show you the layout of the XML and didn't want the post to be too long. Sorry for the confusion.
Sheping

<office:document-content xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table"
    xmlns:draw="http://openoffice.org/2000/drawing"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:meta="http://openoffice.org/2000/meta"
    xmlns:number="http://openoffice.org/2000/datastyle"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:chart="http://openoffice.org/2000/chart"
    xmlns:dr3d="http://openoffice.org/2000/dr3d"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    xmlns:form="http://openoffice.org/2000/form"
    xmlns:script="http://openoffice.org/2000/script"
    xmlns:ooo="http://openoffice.org/2004/office"
    xmlns:ooow="http://openoffice.org/2004/writer"
    xmlns:oooc="http://openoffice.org/2004/calc"
    xmlns:dom="http://www.w3.org/2001/xml-events"
    xmlns:xforms="http://www.w3.org/2002/xforms"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    office:version="1.0" office:class="text">
<office:script />
.....
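With this many declarations it is easy to miss one when copying them by hand. As a sketch, assuming Python's standard library is available wherever the file lives, the declarations can be read straight out of the document with ElementTree's iterparse and its "start-ns" events, then formatted as xmlns attributes. The helper names namespace_declarations and as_xmlns_attributes are hypothetical, and the sample is a cut-down version of the root element above.

```python
import io
import xml.etree.ElementTree as ET

def namespace_declarations(xml_bytes):
    """Return {prefix: uri} for the xmlns declarations in the document.

    The first declaration wins if a prefix is re-bound deeper in the tree.
    """
    decls = {}
    # With only "start-ns" requested, each item is (event, (prefix, uri)).
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        decls.setdefault(prefix, uri)
    return decls

def as_xmlns_attributes(decls):
    """Format the map as attributes ready to paste into a start tag."""
    return " ".join(
        f'xmlns:{p}="{u}"' if p else f'xmlns="{u}"' for p, u in decls.items()
    )

SAMPLE = b"""<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table"
    office:version="1.0" office:class="text">
  <office:body/>
</office:document-content>"""

decls = namespace_declarations(SAMPLE)
print(as_xmlns_attributes(decls))
```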
jzparad

Post by jzparad »

Shepli,

I've been able to replicate the error you received. It seems that the XSLT that DataStage generates does not include the namespace declarations from the XML document. Here's what I did to get it to run without error.

1. Run the job with tracing (Subroutine calls) set on the XML Input stage.
2. Copy the generated XSLT from the job log. It is roughly the fourth-to-last entry and starts with "<?xml version="1.0" encoding="UTF-8"?>".
3. On the Output tab of the XML Input stage, check the "Use custom stylesheet" box.
4. Paste the generated XSLT in as the custom stylesheet, excluding everything before "<?xml version".
5. Add the missing namespace declarations (for example xmlns:office and xmlns:table, with the same URIs as in your source document) to the root element of the XSLT.

This should compile and run without error.
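Steps 4 and 5 can also be scripted. The sketch below is a hypothetical Python helper, not part of DataStage. It assumes the copied log text contains the XSLT starting at "<?xml", that the stylesheet's root element is xsl:stylesheet or xsl:transform, and that no attribute value in that start tag contains a ">". The log prefix in the sample is invented. The declarations are added as text rather than through an XML library because the prefixes appear only inside XPath attribute values, and serializers such as ElementTree drop declarations they consider unused.

```python
import re

# Assumed root element of the generated XSLT (hypothetical; check yours).
ROOT_TAG = re.compile(r"<xsl:(?:stylesheet|transform)\b[^>]*>")

def prepare_stylesheet(log_text, namespaces):
    """Cut the XSLT out of a log entry and declare the given prefixes on its root."""
    start = log_text.find("<?xml")
    if start == -1:
        raise ValueError("no XML declaration found in log text")
    xslt = log_text[start:]  # step 4: drop everything before "<?xml"

    match = ROOT_TAG.search(xslt)
    if match is None:
        raise ValueError("no xsl:stylesheet/xsl:transform start tag found")
    tag = match.group(0)

    # Step 5: add only the prefixes the root tag does not already declare.
    extra = "".join(
        f' xmlns:{prefix}="{uri}"'
        for prefix, uri in namespaces.items()
        if f"xmlns:{prefix}=" not in tag
    )
    new_tag = tag[:-1] + extra + ">"  # insert before the closing '>'
    return xslt[: match.start()] + new_tag + xslt[match.end():]

LOG = '''Trace: XSLT follows <?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/office:document-content/office:body/table:table">
    <row><xsl:value-of select="@table:name"/></row>
  </xsl:template>
</xsl:stylesheet>'''

NS = {
    "office": "http://openoffice.org/2000/office",
    "table": "http://openoffice.org/2000/table",
}
print(prepare_stylesheet(LOG, NS))
```

Running it a second time on its own output leaves the text unchanged, since already-declared prefixes are skipped.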
Jim Paradies
shepli

Post by shepli »

Jim,

Thanks again for your help. I have been busy with a project deadline and have not had time to work on this issue lately.

I was able to set the "Transformation error mappings" for fatal, error, and warning to "trace" and to set the output to "Use custom stylesheet". However, I cannot find where the generated XSLT is, or rather, where the log is. Can you give me some guidance?
jzparad

Post by jzparad »

Shepli,

1. Run the job from Director
2. In the job run dialog, select the Tracing tab
3. In the Tracing tab, select the name of the XML stage
4. Under trace level, select Subroutine calls (or all of the boxes if you like)
5. Now hit the Run button and wait for the job to finish
6. Now open the job log (still in Director)
7. The fourth-to-last entry should contain something like "...<?xml version=..."
Jim Paradies