pedrohenriqueperna
Creator III

Better way to load .json files?

Hi,

I'm trying to join 1.2 million .json files (I know) into a single table.

I downloaded the files from an open API for zip codes, and I'm planning to use them to validate a file with roughly 1 million rows. I could (and will) use the API to validate future customer registrations, but I need to validate past data, and I can't make 1.2 million calls or I'll get blocked.

Given that, I found some material on loading .json files from Google Drive, so that Qlik can read each file with the (json) format specifier. It does work, and I managed to loop through each file in the folder, but it takes far too long: by my calculations it would take 29 days to complete, which is just madness.

I'm pretty sure there must be a smarter way to do this. Can anyone advise me?

// Seed table defining the expected columns
ZipCodes:
LOAD * Inline [
a, b, c, d, e
];

// Loop through the Google Drive folder and concatenate every .json file
For Each file in FileList('lib://PP:Google Drive/[...]')
    If WildMatch('$(file)', '*.json') Then
        Concatenate (ZipCodes)
        LOAD *
        From [$(file)] (json);
    End If
Next file

1 Solution

Accepted Solutions
pedrohenriqueperna
Creator III
Author

After some research I managed to do it in Python, by merging the JSON files and saving the result as a CSV. If you're facing the same problem, I advise you to do the same:

import json
import os
import time

import unicodecsv as csv

start_time = time.time()
directory = 'C:/Users/x/y/z'
json_files = os.listdir(directory)

merged_json = []  # accumulates the parsed contents of every .json file
fields = ['cep', 'logradouro', 'complemento', 'bairro', 'localidade', 'uf', 'ibge']  # header row, also used as dict keys

total = len(json_files)

for i, file in enumerate(json_files):  # loop over every file in the folder
    with open(os.path.join(directory, file), encoding='utf-8') as f:  # utf-8 preserves special characters
        json_arr = json.load(f)  # each file parses to a dict, so append it instead of overwriting
        merged_json.append(json_arr)
        print(f'index: {i}/', '{:,}'.format(total).replace(',', '.'), sep='')  # progress, with '.' as thousands separator

with open('base_ceps.csv', 'wb') as csv_file:  # write the merged records as a |-delimited CSV
    writer = csv.writer(csv_file, encoding='utf-8', delimiter='|')
    writer.writerow(fields)

    for row in merged_json:  # write each record field by field, in header order
        writer.writerow(row[field] for field in fields)

print("runtime: %.2f seconds" % (time.time() - start_time))
       

