Hi all,
I've got a problem with an inefficient and non-future-proof solution so far. What I want is to scrape data from an API that only allows 10,000 records to be retrieved per request. My sad solution so far is creating multiple tables, each skipping a certain set of records. Compare the FROM statement between raw_CAO_CBS1 and raw_CAO_CBS2, for example:
raw_CAO_CBS1:
LOAD
    [content/properties/ID] as ID_1,
    [content/properties/CaoSectoren] as CaoSectoren_1,
    [content/properties/Perioden] as Perioden_1,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_4] as CaoLonenPerUurInclBijzBeloningen_4_1,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_11] as CaoLonenPerUurInclBijzBeloningen_11_1,
    [content/properties/PercentageAfgeslotenCaoS_15] as PercentageAfgeslotenCaoS_15_1
FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet]
(XmlSimple, Table is [feed/entry])
Where [content/properties/Perioden] = '2019MM01';

raw_CAO_CBS2:
LOAD
    [content/properties/ID] as ID_2,
    [content/properties/CaoSectoren] as CaoSectoren_2,
    [content/properties/Perioden] as Perioden_2,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_4] as CaoLonenPerUurInclBijzBeloningen_4_2,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_11] as CaoLonenPerUurInclBijzBeloningen_11_2,
    [content/properties/PercentageAfgeslotenCaoS_15] as PercentageAfgeslotenCaoS_15_2
FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=10000]
(XmlSimple, Table is [feed/entry])
Where [content/properties/Perioden] = '2019MM01';
I repeat this process by adding tables, renaming the fields so that no synthetic key table is formed, and adjusting the end of the FROM statement each time from ?$skip=10000 to ?$skip=20000, 30000 and so on.
Question: is there an elegant way to avoid creating all these separate tables, so that in the future, when new data is added, I don't have to create new tables (for better maintainability)?
Thanks in advance 🙂
You don't need to (and shouldn't) create a separate table for each load; instead, load all records into a single table, for example with a loop like this one:
for i = 0 to 200
    let vSkip = $(i) * 10000;

    table:
    load YourFields, $(i) as Counter
    from [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=$(vSkip)] ...;

    if mod(noofrows('table') / 10000) > 0 then
        exit for;
    end if
next
- Marcus
Nice and elegant solution. For the first line, i = 0 to 200, I reckon that 200 is just an arbitrary number you picked?
Second, for relatively low-skilled QlikView users like me: the Mod function takes two integer arguments separated by a comma, so the check should be:
if Mod(NoOfRows('Your_Table'),10000) > 0
For the rest, my greatest thanks!
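
Putting the loop and the corrected Mod() check together, here is a minimal consolidated sketch, using the CBS field names from the opening post; the table names raw_CAO_CBS and CAO_CBS_2019MM01 are illustrative, not from the thread. One assumption worth flagging: the Where clause of a web-file LOAD is evaluated by Qlik after each page has been fetched, so filtering on Perioden inside the loop could make the 10,000-row check unreliable; this sketch therefore loads all periods and applies the period filter in a resident load afterwards.

For i = 0 to 200    // upper bound only needs to exceed total records / 10000
    Let vSkip = $(i) * 10000;

    raw_CAO_CBS:    // identical field names, so every page auto-concatenates into one table
    LOAD
        [content/properties/ID] as ID,
        [content/properties/CaoSectoren] as CaoSectoren,
        [content/properties/Perioden] as Perioden,
        [content/properties/CaoLonenPerUurInclBijzBeloningen_4] as CaoLonenPerUurInclBijzBeloningen_4,
        [content/properties/CaoLonenPerUurInclBijzBeloningen_11] as CaoLonenPerUurInclBijzBeloningen_11,
        [content/properties/PercentageAfgeslotenCaoS_15] as PercentageAfgeslotenCaoS_15,
        $(i) as Counter
    FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=$(vSkip)]
    (XmlSimple, Table is [feed/entry]);

    // A page with fewer than 10,000 rows is the last one, so stop here.
    If Mod(NoOfRows('raw_CAO_CBS'), 10000) > 0 Then
        Exit For;
    End If
Next

// Apply the period filter afterwards, so it cannot distort the page row counts above.
CAO_CBS_2019MM01:
NoConcatenate LOAD * Resident raw_CAO_CBS
Where Perioden = '2019MM01';
DROP TABLE raw_CAO_CBS;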
Yes, it was just arbitrary. If you know the approximate number of records you could use an exact value; if not, you need to make sure that this number * 10000 is definitely larger than the total number of records (and keep the extra exit for statement so you don't run into an error, whereby using ErrorMode might be an alternative).
- Marcus
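
For completeness, a sketch of the ErrorMode alternative Marcus mentions, under the assumption that a request past the end of the data either fails or returns an empty page. With ErrorMode set to 0, a failing load no longer aborts the script, so the loop can simply stop as soon as an iteration adds no new rows; the variable vRowsBefore and the shortened field list are illustrative only.

Set ErrorMode = 0;    // a failing request no longer aborts the whole script

For i = 0 to 200
    Let vSkip = $(i) * 10000;
    Let vRowsBefore = Alt(NoOfRows('raw_CAO_CBS'), 0);    // 0 before the table exists

    raw_CAO_CBS:
    LOAD
        [content/properties/ID] as ID,
        [content/properties/CaoSectoren] as CaoSectoren,
        [content/properties/Perioden] as Perioden    // further fields as in the sketch above
    FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=$(vSkip)]
    (XmlSimple, Table is [feed/entry]);

    // Stop when the iteration added no rows: the page was empty or the request failed.
    If Alt(NoOfRows('raw_CAO_CBS'), 0) = $(vRowsBefore) Then
        Exit For;
    End If
Next

Set ErrorMode = 1;    // restore the default error handling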