mk_kmx
Partner - Contributor III

Identify source field value in FROM_FIELD load

Hi all,

 

I want to use

LOAD ... From_Field (...)

to extract data from a field containing XML structured data. Let's say this is not an issue and I managed to successfully code that (at least on some PoC data).

I'm struggling to connect records in the result of this load with the record in the source table.

 

As an example - let's say we have a table with Invoices, which contains Invoice Items stored in a single field as XML. LOAD ... From_Field generates the table with all of the Invoice Items, but unless there is a reference to Invoice ID included in XML, I'm not able to link Invoice Items to the specific Invoice.

 

Also, if the structure of embedded XML is slightly more complex (and therefore I need to load the data using multiple LOAD statements), I have difficulties connecting these tables (autogenerated keys from XML loads reset to 0 with each record). I found out that FileName() function helps, but I'm not really sure if this is the right solution.

 

Any ideas?

 

Many thanks.

12 Replies
rwunderlich
Partner Ambassador/MVP

You need to use the autogenerated keys, or a key you can generate yourself, to link things together when they don't share a value field. Adding FileName() -- or some other source of uniqueness -- is the correct way to address the resetting-key issue.

One useful technique is to add whatever XML element is required -- for example a root element -- so that the XmlSimple format can treat the value as one complete document. For an example, see here:

https://community.qlik.com/t5/QlikView-App-Dev/Need-assistance-for-importing-data-with-XML/m-p/85523...
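A minimal sketch of the root-element idea (all table, field, and element names here are illustrative assumptions, not taken from this thread):

```qlik
// Wrap each XML field value in a synthetic root element so that the
// XmlSimple format can treat it as one complete, well-formed document.
InvoicesWrapped:
LOAD
    InvoiceID,
    '<doc>' & ItemsXml & '</doc>' as ItemsXml
RESIDENT Invoices;
```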

-Rob

mk_kmx
Partner - Contributor III
Author

Hi @rwunderlich ,

 

thanks for helping out.

 


@rwunderlich wrote:

Adding FileName() -- or some other uniqueness --  is the correct way to address the resetting key issue. 

Any ideas for the "or some other uniqueness"? Of course, FileName() is a solution, I'm just not sure if that's the best solution.

 


@rwunderlich wrote:

One useful technique is to add some xml element as required -- for example a root element -- that allows the xmlsimple format to treat things as a complete document. For example see here 

https://community.qlik.com/t5/QlikView-App-Dev/Need-assistance-for-importing-data-with-XML/m-p/85523...


Thanks for this link, I hadn't come across it before. It looks quite similar to what I had already found and made work. However, I don't think the proposed solution is in fact reliable, because From_Field seems to cycle through field values rather than records. I have tested this and found:

  1. When there are duplicate values in the From_Field field, the new table has fewer records (an original table with 3 records but only 2 distinct values in the Xml field yields a new table with only 2 records).
  2. I loaded the source table from point 1, inverted the record order (resident load with ORDER BY, then dropped the original table) and then ran the From_Field load into a final table... and the resulting table came out in the original order.
  3. Null values in the Xml field throw errors.
  4. FieldValue('Xml', Num#(FileBaseName(), '<0>')+1) as Xml appears to return the original full Xml value.

However, this is just a hypothesis based on a few tests and observations, and I don't want to rely on it...

 

In addition, your solution from the other thread with RecNo() IMHO won't work in all cases, because:

  1. of the inverted-order and Null behaviour I mentioned above;
  2. the XML may contain multiple (a variable number of) records.

 

The only feasible way I could think of to enable the relationship back to the source record is to inject some ID (this can be your RecNo() value, a Hash, or AutoNumber([Xml])) into the Xml field - either in the source, or with some string manipulation in the load script:

 

Replace([Xml], '<RootElement>', '<RootElement><SourceId>'&[SourceId]&'</SourceId>')

 

but to me this looks a bit "dirty". So I was wondering if someone knows a better option.
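A sketch of that injection approach (table, field, and element names are illustrative, and the <RootElement> shape of the XML is an assumption):

```qlik
// Inject a SourceId element into each XML document so that the parsed
// records carry a key linking them back to the source row.
InvoicesPrepared:
LOAD
    SourceId,
    Replace(Xml, '<RootElement>',
        '<RootElement><SourceId>' & SourceId & '</SourceId>') as Xml;
// Preceding load: generate the key from the source table.
LOAD
    RecNo() as SourceId,
    Xml
RESIDENT Invoices;
```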

rdugg
Contributor III

Did you ever find a solution to this?

I have the following example that works well, but...

invoice_records:
NoConcatenate Load *
Inline [
invoiceID, product_names_csv
uuid-1, 'shampoo, burger,"bread sticks,(12)", eggs'
uuid-2, '"bread sticks,(24)", tuna,,'
uuid-3,',eggs,eggs, soap'
];

invoice_records2:
NoConcatenate Load 
  RecNo() as rowid, *
Resident invoice_records;
Drop table invoice_records;



result_table:

Load *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);


result_table2:
NoConcatenate Load RecNo() as rowid, *
Resident result_table;
Drop Table result_table;


final_result:
NoConcatenate Load rowid, invoiceID
 Resident invoice_records2;
Join Load * 
 Resident result_table2;
 
Drop Tables invoice_records2, result_table2; 

As soon as I use input CSV data where the rows have different numbers of elements, for example:

invoice_records:
NoConcatenate Load *
Inline [
invoiceID, product_names_csv
uuid-1, 'shampoo, burger,"bread sticks,(12)", eggs'
uuid-2, '"bread sticks,(24)", tuna'
uuid-3,',eggs,eggs,eggs, soap'
];

Qlik generates new tables on the fly ([result_table-2], [result_table-3]) during the FROM_FIELD load, and I can't use RecNo() or find a way out of this mess.

 

Does anybody have a solution to this? ...apart from editing each CSV to pad it with commas so that every row of product_names_csv has the same number of elements 🤯

mk_kmx
Partner - Contributor III
Author

Hi,

 

If I understood your point correctly, the reason behind the "new tables on the fly" is that FROM_FIELD apparently executes a separate load for every value of the product_names_csv field. When a load returns a table with a known structure, it performs an implicit concatenate; otherwise it creates a new table.

 

It has the same effect as putting all those CSVs into separate files, storing them in a folder and executing

LOAD *
FROM [lib://myconnection/*.csv] (txt, delimiter is ',', ...);

 

Spoiler alert: it will behave exactly the same.

 

The solution is to enforce that the load result always has the same structure, or to make sure those tables get concatenated some other way...

Now, the solutions:

  1. The simplest solution is to avoid LOAD * and list the resulting fields instead. You need to enforce the existence of the fields you need anyway, and you usually don't need anything else.
  2. Set up an empty target table first (again, set up all mandatory fields here), give it a label (result_table in your case) and execute the LOAD ... FROM_FIELD with a Concatenate(result_table) prefix. Point #1 is still valid here, though. (I hope this works - I haven't tested it.)
  3. An alternative (not preferred, but safe enough) solution would be to let Qlik create those tables (all will have the result_table prefix in their names) and afterwards iterate through all tables in the data model, concatenate the tables with that prefix into a single final table, and drop them (don't forget to iterate backwards, i.e. a FOR cycle with STEP -1, as deleting table #1 reindexes the rest - table #2 becomes #1, etc.)
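The backward iteration in point #3 could be sketched like this (the table-name pattern matches Qlik's auto-generated names; treat this as an untested sketch):

```qlik
// Collect all auto-generated result_table* tables into one final table.
final_table:
LOAD 1 as DummyField AutoGenerate 0;

FOR i = NoOfTables() - 1 TO 0 STEP -1    // backwards: dropping a table reindexes the rest
    LET vTable = TableName($(i));
    IF WildMatch('$(vTable)', 'result_table*') THEN
        Concatenate(final_table) LOAD * RESIDENT [$(vTable)];
        DROP TABLE [$(vTable)];
    END IF
NEXT i

DROP FIELD DummyField;
```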

BR,

Martin

 

rwunderlich
Partner Ambassador/MVP

You can force Concatenation like this:

result_table:
Load 1 as DummyField AutoGenerate 0;

Concatenate(result_table)
Load *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);

Drop Field DummyField;

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

rdugg
Contributor III


(don't forget to iterate backwards = FOR cycle with STEP -1, as deleting table #1 reindexes the rest of tables - table #2 becomes #1, etc.)

 🤔 Thank you, that's a good thing to know about the Qlik engine.

 

Solution 3 is not a solution for me: if rows 1 and 3 contain a CSV of 4 elements but row 2 contains a CSV of 5 elements, then after concatenating all the tables the rows won't be ordered the same as in the initial table (invoice_records), so there's no way to join back using RecNo().

invoice_records would first need to be sorted by CSV cardinality, which cannot be done easily or efficiently.

 

Solution 1 is not possible, as I don't know the structure of the CSV field in advance.

but...

I guess that if I fix a maximum cardinality of possible elements in the csv field I can do something like

 

Spoiler
result_table:

Load 
    if(IsNum(FieldValueCount('@1')), Null(), @1) as @1,
    if(IsNum(FieldValueCount('@2')), Null(), @2) as @2,
    ...
    if(IsNum(FieldValueCount('@n-Max')), Null(), [@n-Max]) as [@n-Max]
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);

I'll try that.

 

Solution 2 looks promising, but I'll have to experiment with the Concatenate(TableName) prefix in front of the FROM_FIELD load.


mk_kmx
Partner - Contributor III
Author

Re Solution 3 - in any case, it's not a good idea to rely on row ordering alone. I'd advise you to revisit my discussion with Rob (above) on identifying the load number / referring to the source field value index via the FileName() function. Once you call that function (for instance LOAD FileName(), * FROM_FIELD ...), you don't have to be concerned about the absolute row ordering in the table. 😉 That way you know which field value the parsed record(s) relate to.
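Based on the observations earlier in this thread (FileName()'s behaviour inside FROM_FIELD is a tested hypothesis here, not documented fact), the link key could look like:

```qlik
// During a FROM_FIELD load, FileName() identifies which field value is
// being parsed, so it can serve as a key back to the source record.
result_table:
LOAD
    FileName() as source_key,
    *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);
```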

 

Re Solution 1 - then there is something wrong in your design. 😉

How do you want to analyze the content of the attached CSV when you don't know its structure? You either have to unpivot via CrossTable, or accept a complete mess in your field set (where you can't tell which fields are present in your results and which are not). The former is okay; the latter will eventually cause you a lot of headaches.

 

Re Solution 2 - see Rob's reply above. No experiments needed anymore.

 

rdugg
Contributor III

> Re Solution 3

interesting (I've missed that) !

 

> You either have to unpivot via crosstable, or...

Yes, the whole goal is to pivot afterward.

 

> Re Solution 2

That's just perfect.

 

I finally have my solution as:

Spoiler
invoice_records:
NoConcatenate Load *
Inline [
invoiceID, product_names_csv
uuid-1, 'shampoo, burger,"bread sticks,(12)", eggs'
uuid-2, '"bread sticks,(24)", tuna'
uuid-3,',eggs,eggs,eggs, soap'
uuid-4,', , cheese, banana, orange, potato'
];

invoice_records2:
NoConcatenate Load
RecNo() as rowid, *
Resident invoice_records;
Drop table invoice_records;

empty_result_table:
Load 1 as DummyField AutoGenerate 0;

Concatenate(empty_result_table)
Load *
FROM_FIELD (invoice_records2, product_names_csv)
(txt, utf8, no labels, delimiter is ',', msq);


result_table2:
NoConcatenate Load RecNo() as rowid, *
Resident empty_result_table;
Drop Table empty_result_table;


final_result:
NoConcatenate Load rowid, invoiceID
Resident invoice_records2;
Join Load *
Resident result_table2;

Drop Field rowid From final_result;
Drop Tables invoice_records2, result_table2;


transposed_result_tmp:
CrossTable (column_to_drop,product)
Load * Resident final_result;
Drop Field column_to_drop From transposed_result_tmp;

final_transposed_result:
NoConcatenate Load * Resident transposed_result_tmp
Where not IsNull(EmptyIsNull(product));
Drop Table transposed_result_tmp;

Thanks for the help.

mk_kmx
Partner - Contributor III
Author

Have you tried CrossTable-ing right away? It will quite probably fail (CrossTable would apply only to the first generated table), but it's worth a try... 😉