pedrohenriqueperna
Creator III

Better way to load .json files?

Hi,

I'm trying to join 1.2 million .json files (I know) into a single table.

I downloaded the files from an open API for zip codes, and I'm planning to use them to validate a file with roughly 1 million rows. I could (and will) use the API to validate future customer registrations, but I need to validate past data, and I can't make 1.2 million calls or I'll get blocked.

Given that, I found some material on loading .json files from Google Drive, so that Qlik can read each file with the (json) format specifier. It does work, and I managed to loop through each file in the folder, but it takes far too long: by my calculations it would take 29 days to complete, which is just madness.

I'm pretty sure there must be a smarter way to do this. Can anyone advise me?

// Seed table defining the expected columns
ZipCodes:
LOAD * Inline [
a, b, c, d, e
];

// Loop through the Google Drive folder and concatenate every .json file
For Each file in FileList('lib://PP:Google Drive/[...]')
    If WildMatch('$(file)', '*.json') Then
        Concatenate (ZipCodes)
        LOAD *
        From [$(file)] (json);
    End If
Next file

1 Solution

Accepted Solutions
pedrohenriqueperna
Creator III
Author

After some research I managed to do it in Python, by merging the JSON files and saving the result as a CSV. If you're facing the same problem, I advise you to do the same:

import json
import os
import time

import unicodecsv as csv

start_time = time.time()
directory = 'C:/Users/x/y/z'
json_files = os.listdir(directory)

merged_json = []  # accumulates the parsed contents of every .json file
fields = ['cep', 'logradouro', 'complemento', 'bairro', 'localidade', 'uf', 'ibge']  # header row, also used as dict keys

total = len(json_files)

for i, file in enumerate(json_files):  # loop over every file in the folder
    with open(os.path.join(directory, file), encoding='utf-8') as f:  # utf-8 preserves special characters
        json_arr = json.load(f)  # each file parses to a dict, so append it instead of overwriting
        merged_json.append(json_arr)
        print(f'index: {i}/', '{:,}'.format(total).replace(',', '.'), sep='')  # progress, with '.' as thousands separator

with open('base_ceps.csv', 'wb') as csv_file:  # write the merged records as a |-delimited CSV
    writer = csv.writer(csv_file, encoding='utf-8', delimiter='|')
    writer.writerow(fields)

    for row in merged_json:  # write each record field by field, in header order
        writer.writerow(row[field] for field in fields)

print("runtime: %.2f seconds" % (time.time() - start_time))
       

