Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I want to use
LOAD ... From_Field (...)
to extract data from a field containing XML structured data. Let's say this is not an issue and I managed to successfully code that (at least on some PoC data).
I'm struggling to connect records in the result of this load with the record in the source table.
As an example - let's say we have a table with Invoices, which contains Invoice Items stored in a single field as XML. LOAD ... From_Field generates the table with all of the Invoice Items, but unless there is a reference to Invoice ID included in XML, I'm not able to link Invoice Items to the specific Invoice.
Also, if the structure of embedded XML is slightly more complex (and therefore I need to load the data using multiple LOAD statements), I have difficulties connecting these tables (autogenerated keys from XML loads reset to 0 with each record). I found out that FileName() function helps, but I'm not really sure if this is the right solution.
Any ideas?
Many thanks.
You need to use the autogenerated keys, or a key that you can generate, to link things together if they don't have a shared value field. Adding FileName() -- or some other uniqueness -- is the correct way to address the resetting key issue.
One useful technique is to add some xml element as required -- for example a root element -- that allows the xmlsimple format to treat things as a complete document. For example see here
-Rob
Hi @rwunderlich ,
thanks for helping out.
@rwunderlich wrote:
Adding FileName() -- or some other uniqueness -- is the correct way to address the resetting key issue.
Any ideas for the "or some other uniqueness"? Of course, FileName() is a solution, I'm just not sure if that's the best solution.
@rwunderlich wrote:
One useful technique is to add some xml element as required -- for example a root element -- that allows the xmlsimple format to treat things as a complete document. For example see here
Thanks for this link, I didn't come across this before. It looks quite similar to what I already found out and made work. However, I don't think that proposed solution will be in fact reliable, because it seems, that From_Field cycles through FieldValues and not records. I have tested this and:
FieldValue('Xml', Num#(FileBaseName(), '<0>')+1) as Xml,
appears to return the original full Xml value.
However, this is in fact a hypothesis based on some tests and observation, but I didn't want to rely on it...
In addition, your solution from the other thread with RecNo() IMHO won't work in all cases, because:
The only feasible solution for enabling the relationship to source record I was thinking about is to inject some ID (this can be your RecNo() value, Hash or AutoNumber([Xml])) to the Xml field - either in source, or using some string manipulations in load script:
Replace([Xml], '<RootElement>', '<RootElement><SourceId>'&[SourceId]&'</SourceId>')
but to me this looks a bit "dirty". So I was wondering if someone knows a better option.
Have you ever found a solution to this ?
I have the following example that works well, but...
invoice_records:
NoConcatenate Load *
Inline [
invoiceID, product_names_csv
uuid-1, 'shampoo, burger,"bread sticks,(12)", eggs'
uuid-2, '"bread sticks,(24)", tuna,,'
uuid-3,',eggs,eggs, soap'
];
invoice_records2:
NoConcatenate Load
RecNo() as rowid, *
Resident invoice_records;
Drop table invoice_records;
result_table:
Load *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);
result_table2:
NoConcatenate Load RecNo() as rowid, *
Resident result_table;
Drop Table result_table;
final_result:
NoConcatenate Load rowid, invoiceID
Resident invoice_records2;
Join Load *
Resident result_table2;
Drop Tables invoice_records2, result_table2;
as soon as I use input csv data that has different column length,
example :
invoice_records:
NoConcatenate Load *
Inline [
invoiceID, product_names_csv
uuid-1, 'shampoo, burger,"bread sticks,(12)", eggs'
uuid-2, '"bread sticks,(24)", tuna'
uuid-3,',eggs,eggs,eggs, soap'
];
Qlik does nonsense and generates new tables on the fly ([result_table-2], [result_table-3]) during the Load From_Field and I can't use RecNo() or find a solution to this mess.
Anybody has a solution to that ? ...apart from editing each csv, to add commas and make every csvs in product name csv have the same number of elements in each row 🤯
Hi,
If I understood your point, then the reason behind "new tables on the fly" is that FROM_FIELD apparently executes separate load for every value of product_names_csv field. As soon as this load returns table with known structure, it performs implicit concatenate. Otherwise it creates a new table.
It has the same effect as putting all those CSVs into separate files, storing them into a folder and executing
LOAD *
FROM [lib://myconnection/*.csv] (txt, delimiter is ',', ...);
Spoiler alert: it will behave exactly the same.
Solution is - enforce the load result has always the same structure or make sure those tables concatenate some other way...
Now, the solutions:
BR,
Martin
You can force Concatenation like this:
result_table:
Load 1 as DummyField AutoGenerate 0;
Concatenate(result_table)
Load *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);
Drop Field DummyField;
-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com
(don't forget to iterate backwards = FOR cycle with STEP -1, as deleting table #1 reindexes the rest of tables - table #2 becomes #1, etc.)
🤔 Thank you, that's a good thing to know about the Qlik engine.
Solution 3 for me is not a solution, because if you have a csv of 4 elements in rowa 1 and 3 but a csv of 5 elements in row 2, then when you concatenate all your tables rows won't be ordered the same as in initial table (invoice_records) so there's no way you can join back using RecNo().
invoice_records would first require to be sorted sorted by csv cardinality which cannot be done easily/efficiently.
Solution 1 is not possible as I do not know in advance the structure of the csv field.
but...
I guess that if I fix a maximum cardinality of possible elements in the csv field I can do something like
result_table:
Load
if(IsNum(FieldValueCount('@1')), Null(), @1) as @1,
if(IsNum(FieldValueCount('@2')), Null(), @2) as @2,
...
if(IsNum(FieldValueCount('@n-Max')), Null(), @n-Max) as @n-Max,
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);
I'll try that.
Solution 2 looks promising but I have to experiment the Concactenate(TableName) in front of the Load from_field.
RESolution 3 - in any case it's not a good idea to rely on row ordering only. I'd just advice you to revisit my discussion with Rob (above) on identifying the load number/referring to the source field value index by FileName() function. Once you call that function (for instance LOAD FileName(), * FROM_FIELD ...), you don't have to be concerned about the absolute row ordering in the table. 😉 This way you know to which field value the parsed record(s) is(are) related.
RE Solution 1 - then there is something wrong in your design. 😉
How do you want to analyze the content of attached CSV when you don't know the structure? You either have to unpivot via crosstable, or have a complete mess in your fieldset (in a way that you can't tell which fields are present in your results and which are not). Former is OKay, latter will eventually cause you a lot of headaches.
RE Solution 2 - see Rob's reply above. No experiments needed anymore.
> Re Solution 3
interesting (I've missed that) !
> You either have to unpivot via crosstable, or...
Yes, the whole goal is to pivot afterward.
> Re Solution 2
That's just perfect.
I finally have my solution as:
invoice_records:
NoConcatenate Load *
Inline [
invoiceID, product_names_csv
uuid-1, 'shampoo, burger,"bread sticks,(12)", eggs'
uuid-2, '"bread sticks,(24)", tuna'
uuid-3,',eggs,eggs,eggs, soap'
uuid-4,', , cheese, banana, orange, potato'
];
invoice_records2:
NoConcatenate Load
RecNo() as rowid, *
Resident invoice_records;
Drop table invoice_records;
empty_result_table:
Load 1 as DummyField AutoGenerate 0;
Concatenate(empty_result_table)
Load *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);
result_table2:
NoConcatenate Load RecNo() as rowid, *
Resident empty_result_table;
Drop Table empty_result_table;
final_result:
NoConcatenate Load rowid, invoiceID
Resident invoice_records2;
Join Load *
Resident result_table2;
Drop Field rowid From final_result;
Drop Tables invoice_records2, result_table2;
transposed_result_tmp:
CrossTable (column_to_drop,product)
Load * Resident final_result;
Drop Field column_to_drop From transposed_result_tmp;
final_transposed_result:
NoConcatenate Load * Resident transposed_result_tmp
Where not IsNull(EmptyIsNull(product));
Drop Table transposed_result_tmp;
Thanks for the help.
Have you tried CrossTable-ing right away? Quite probably it'll fail (crosstable only first table), but it's worth a try... 😉