2 Replies Latest reply: Mar 22, 2017 9:00 AM by Dirk Fischer

    Check XML structure for errors before importing

    Dirk Fischer

      Hi folks,

       

      I am struggling to import a large number of XML files and would be glad to get some input on how to handle this problem.

       

      I use the following code to import each XML file and extract the data.

      vPathToXmlFile contains the path to the file to be imported.

        RAW_DATA:

        Load

          Concat( RAW_XML, Chr(13) & Chr(10), %REC_ID ) As RAW_XML

          ;

        Load

          [@1:n] As RAW_XML

          ,RecNo() As %REC_ID

        From [$(vPathToXmlFile)]

        (Fix, UTF8);

       

        IPP:

        Load Distinct

          '$(vPathToXmlFile)' As %IMPORT_FILE

          ,Text( '$(vIdMachine)' ) As %ID_MACHINE

          ,[ParentSectionType] As REP_TYPE

          ,[ParentSectionNumber] As IPP_NO

          ,[MainSectionNumber] As IPP_PART_NO

          ,Timestamp#([MainSectionOpenedTime], 'YYYY-MM-DD hh:mm:ss') As START

          ,Timestamp#([MainSectionClosedTime], 'YYYY-MM-DD hh:mm:ss') As END

          ,FileTime( '$(vPathToXmlFile)' ) As FILE_TIME

          ,FileSize( '$(vPathToXmlFile)' ) As FILE_SIZE

        From_Field(RAW_DATA, RAW_XML)

        (XmlSimple, table is [IppNormalBalanceReport/HeaderDataForReport]);

       

      This works fine as long as the file is not corrupt. Unfortunately, some files have a corrupt XML structure, and then the application stops loading. So I need either to check the structure before loading or to find a different way to handle the error.
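
      To illustrate the error-handling option: a minimal sketch, assuming the XML parse failure is one of the errors that ErrorMode can suppress (I have not verified this for every kind of corruption). ErrorMode, ScriptErrorCount, Alt and Trace are standard script features; the variable vErrBefore is a hypothetical name, and the field list is shortened for brevity.

```qlikview
// Remember how many statements have failed so far in this reload.
Let vErrBefore = Alt( ScriptErrorCount, 0 );

// 0 = do not stop on a failing statement, continue with the next one.
Set ErrorMode = 0;

IPP:
Load Distinct
  '$(vPathToXmlFile)' As %IMPORT_FILE
  ,[ParentSectionType] As REP_TYPE
  ,[ParentSectionNumber] As IPP_NO
From_Field(RAW_DATA, RAW_XML)
(XmlSimple, table is [IppNormalBalanceReport/HeaderDataForReport]);

// Back to the default: stop and show a dialog on errors.
Set ErrorMode = 1;

// If the error counter went up, the load above failed for this file.
If Alt( ScriptErrorCount, 0 ) > $(vErrBefore) Then
  Trace Skipped file with load error: $(vPathToXmlFile);
End If
```

      Comparing the counter before and after the load avoids relying on ScriptError, which only describes the last executed statement.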

       

      The drawback of the text import is that you get only one row out of the file, so I can't check the number of rows in the file (which would be a weak check, but better than nothing).
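
      One way around this could be to count the lines in the same aggregating load that builds RAW_XML, and to test whether the text contains the closing root tag. This is only a sketch: the field RAW_LINE_COUNT and the variables vLineCount and vHasClosingRoot are hypothetical names, and the closing tag is an assumption derived from the root element in the table path. Neither check proves that the XML is well-formed.

```qlikview
RAW_DATA:
Load
  Concat( RAW_XML, Chr(13) & Chr(10), %REC_ID ) As RAW_XML
  ,Count( %REC_ID ) As RAW_LINE_COUNT   // number of text lines read
  ;
Load
  [@1:n] As RAW_XML
  ,RecNo() As %REC_ID
From [$(vPathToXmlFile)]
(Fix, UTF8);

// Number of lines in the file; Alt() yields 0 if nothing was loaded.
Let vLineCount = Alt( Peek( 'RAW_LINE_COUNT', 0, 'RAW_DATA' ), 0 );

// -1 (true) if the assumed closing root tag occurs in the text, else 0.
Let vHasClosingRoot = Alt( Index( Peek( 'RAW_XML', 0, 'RAW_DATA' ), '</IppNormalBalanceReport>' ), 0 ) > 0;

If $(vLineCount) = 0 or not $(vHasClosingRoot) Then
  Trace File looks truncated or empty: $(vPathToXmlFile);
End If
```

      A truncated file would typically fail the closing-tag test, while a file with a broken element in the middle would still pass it.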

       

      Does anybody have a suggestion for how I could solve this problem?

       

      Regards,

       

      Dirk