to extract data from a field containing XML structured data. Let's say this is not an issue and I managed to successfully code that (at least on some PoC data).
I'm struggling to connect records in the result of this load with the record in the source table.
As an example - let's say we have a table with Invoices, which contains Invoice Items stored in a single field as XML. LOAD ... From_Field generates the table with all of the Invoice Items, but unless there is a reference to Invoice ID included in XML, I'm not able to link Invoice Items to the specific Invoice.
Also, if the structure of embedded XML is slightly more complex (and therefore I need to load the data using multiple LOAD statements), I have difficulties connecting these tables (autogenerated keys from XML loads reset to 0 with each record). I found out that FileName() function helps, but I'm not really sure if this is the right solution.
You need to use the autogenerated keys, or a key that you can generate, to link things together if they don't have a shared value field. Adding FileName() -- or some other uniqueness -- is the correct way to address the resetting key issue.
One useful technique is to add some xml element as required -- for example a root element -- that allows the xmlsimple format to treat things as a complete document. For example see here
Thanks for this link, I didn't come across this before. It looks quite similar to what I already found out and made work. However, I don't think that proposed solution will be in fact reliable, because it seems, that From_Field cycles through FieldValues and not records. I have tested this and:
When there are duplicate values in From_Field field, there are fewer records in the new table (original table has 3 records with 2 distinct values of Xml field, new table has only 2 records).
I tried to load the source table from point 1., then inverted the record order (resident load with order by and drop original table) and then loaded From_Field to a final table... and... the resulting table was in original order.
Null values in Xml field throw errors
FieldValue('Xml', Num#(FileBaseName(), '<0>')+1) as Xml,
appears to return the original full Xml value.
However, this is in fact a hypothesis based on some tests and observation, but I didn't want to rely on it...
In addition, your solution from the other thread with RecNo() IMHO won't work in all cases, because:
of the inverted order and Null tests I mentioned above
in case XML contains multiple (variable number of) records
The only feasible solution for enabling the relationship to source record I was thinking about is to inject some ID (this can be your RecNo() value, Hash or AutoNumber([Xml])) to the Xml field - either in source, or using some string manipulations in load script: