Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

klikgevoel
New Contributor III

Scrape webpage in one blow

Hi all,

I've got a problem: my current solution is inefficient and not future-proof. I want to scrape data from an API that returns at most 10,000 records per request. My sad workaround so far is creating multiple tables, each skipping a different set of records. Compare the FROM statements of raw_CAO_CBS1 and raw_CAO_CBS2, for example.

raw_CAO_CBS1:
LOAD
    [content/properties/ID] as ID_1,
    [content/properties/CaoSectoren] as CaoSectoren_1,
    [content/properties/Perioden] as Perioden_1,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_4] as CaoLonenPerUurInclBijzBeloningen_4_1,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_11] as CaoLonenPerUurInclBijzBeloningen_11_1,
    [content/properties/PercentageAfgeslotenCaoS_15] as PercentageAfgeslotenCaoS_15_1
FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet] (XmlSimple, Table is [feed/entry])
Where [content/properties/Perioden] = '2019MM01';

raw_CAO_CBS2:
LOAD
    [content/properties/ID] as ID_2,
    [content/properties/CaoSectoren] as CaoSectoren_2,
    [content/properties/Perioden] as Perioden_2,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_4] as CaoLonenPerUurInclBijzBeloningen_4_2,
    [content/properties/CaoLonenPerUurInclBijzBeloningen_11] as CaoLonenPerUurInclBijzBeloningen_11_2,
    [content/properties/PercentageAfgeslotenCaoS_15] as PercentageAfgeslotenCaoS_15_2
FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=10000] (XmlSimple, Table is [feed/entry])
Where [content/properties/Perioden] = '2019MM01';

I repeat this process by adding tables, renaming the fields so no synthetic key is formed, and bumping the $skip parameter at the end of the FROM statement each time: ?$skip=10000, then ?$skip=20000, 30000 and so on.

 

Question: is there an elegant way to avoid creating all these separate tables, so that when new data is added in the future I don't have to add more tables (better maintainability)?

 

Thanks in advance 🙂


3 Replies
MVP & Luminary

Re: Scrape webpage in one blow

You don't need to (and shouldn't) create a separate table for each load; instead, load all records into a single table, for example with a loop like this one:

for i = 0 to 200
   let vSkip = $(i) * 10000;
   table:
      load YourFields, $(i) as Counter
      from [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=$(vSkip)] ...;

   if mod(noofrows('table') / 10000) > 0 then
      exit for;
   end if

next

- Marcus

klikgevoel
New Contributor III

Re: Scrape webpage in one blow

Nice and elegant solution. For the first line, i = 0 to 200, I reckon 200 is just an arbitrary number you chose?

Second, for a relatively low-skilled QlikView user like me: Mod() takes two integer arguments, so the check should read:

if Mod(NoOfRows('Your_Table'),10000) > 0

Other than that, many thanks!

MVP & Luminary

Re: Scrape webpage in one blow

Yes, it was just arbitrary. If you know the approximate number of records, you could use an exact upper bound; if not, you need to make sure that the loop count * 10000 is definitely larger than the number of records (and, to avoid running into an error, keep the extra exit for statement - whereby using ERRORMODE might be an alternative).

- Marcus
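
Putting the pieces of this thread together, a minimal sketch of the full paging script could look like the block below. It is only a sketch under the thread's assumptions: a page size of 10,000 records, an arbitrary upper bound of 200 pages, and the field list from the opening post. The Perioden filter is deliberately left out of the paged load, because filtering inside it would make the row-count check unreliable; it can be applied afterwards, e.g. with a resident load.

// Sketch: page through the OData feed in blocks of 10,000 rows.
// Because every iteration loads exactly the same field names, QlikView
// auto-concatenates all pages into the single table raw_CAO_CBS.
for i = 0 to 200                      // arbitrary upper bound, as discussed above
   let vSkip = $(i) * 10000;

   raw_CAO_CBS:
   LOAD
       [content/properties/ID] as ID,
       [content/properties/CaoSectoren] as CaoSectoren,
       [content/properties/Perioden] as Perioden,
       [content/properties/CaoLonenPerUurInclBijzBeloningen_4] as CaoLonenPerUurInclBijzBeloningen_4,
       [content/properties/CaoLonenPerUurInclBijzBeloningen_11] as CaoLonenPerUurInclBijzBeloningen_11,
       [content/properties/PercentageAfgeslotenCaoS_15] as PercentageAfgeslotenCaoS_15
   FROM [https://opendata.cbs.nl/ODataFeed/odata/82838NED/TypedDataSet?$skip=$(vSkip)]
   (XmlSimple, Table is [feed/entry]);

   // A partial page (fewer than 10,000 new rows) means the feed is exhausted.
   // If the total is an exact multiple of 10,000, the next request comes back
   // empty; Set ErrorMode = 0 before the loop would keep that from aborting
   // the script, as mentioned above.
   if Mod(NoOfRows('raw_CAO_CBS'), 10000) > 0 then
      exit for;
   end if
next

Because the field names no longer carry the _1/_2 suffixes, there is nothing to rename and no synthetic keys to worry about when new periods are added to the feed.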