Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I'm trying to load all csv files from a folder and after that I want to use NoConcatenate.
I need this to do the JOIN after load another csv file from a folder.
The steps that I want to do:
1. Load csv files in a table. After loading I want to use NoCocatenate. If I use it before the script will load each separably. So, I need to load all files and after that I want to add NoConcatenate line?
2. Load other csv files. Also here I want to load all files and then use NoConcatenate line to avoid concatenation with the first one
3. After that I want to do a join of these 2 tables.
Any help? I tried also to load the table again using NoConcatenate and dropping the old one but the problem is still not resolved.
this is the script that I used to load all files from the first folder:
// First folder
for each file in filelist('lib://IMPORT_CSV\*.csv')
[join_ImportError_files]:
load
"timestamp",
timestamp as ts_import_join,
"day" as day_join,
"time" as time_join,
"hour" as hour_join,
requestor_lab,
production_lab,
last_timestamp,
id_log,
request_number as request_number_join,
request_number as request_number_import,
sample_number as sample_number_join,
universal_test_id
from [$(file)]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
next file;
[join_ImportError]:
NoConcatenate
LOAD
"timestamp",
ts_import_join,
day_join,
time_join,
hour_join,
requestor_lab,
production_lab,
last_timestamp,
id_log,
request_number_join,
request_number_import,
sample_number_join,
universal_test_id
Resident join_ImportError_files;
DROP TABLE [join_ImportError_files]
[join_ImportError_filter_unique]:
inner keep
LOAD
request_number_join,
MAX(timestamp) as ts_import_join,
MAX(day_join) as day_join,
MAX(time_join) as time_join,
MAX(hour_join) as hour_join,
sample_number_join
resident join_ImportError
Group By request_number_join, sample_number_join;
drop table join_ImportError_filter_unique;
// The second folder (DB)
for each file in filelist('lib://DB_CSV\*.csv')
[join_DB_SnapWroklistSample_files]:
// NoConcatenate
LOAD
SAMPLECODE as sample_number_join,
SAMPLECODE2,
REQUESTCODE as request_number_join,
REQUESTCODE as request_number_db,
TIMEVALIDATED,
TIMERESULT,
DATETIMEREQ,
DATETIMEREQ_TS,
REQ_DATE,
REQ_HOUR,
TIMESAMPLEIN,
TIMESAMPLEIN_TS as ts_db_join,
SAMPLEIN_DATE,
SAMPLEIN_HOUR
from [$(file)]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
[join_DB_SnapWroklistSample]:
NoConcatenate
LOAD
sample_number_join,
SAMPLECODE2,
request_number_join,
request_number_db,
TIMEVALIDATED,
TIMERESULT,
DATETIMEREQ,
DATETIMEREQ_TS,
REQ_DATE,
REQ_HOUR,
TIMESAMPLEIN,
ts_db_join,
SAMPLEIN_DATE,
SAMPLEIN_HOUR
Resident join_DB_SnapWroklistSample_files;
DROP TABLE [join_DB_SnapWroklistSample_files];
[join_SnapWroklistSample_filter_unique]:
inner keep
LOAD
sample_number_join,
request_number_join,
MIN(ts_db_join) as ts_db_join
resident join_DB_SnapWroklistSample
Group By request_number_join, sample_number_join;
drop table join_SnapWroklistSample_filter_unique;
[join_DB_Import]:
NoConcatenate
Load request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join
resident [join_ImportError];
join
Load request_number_join, sample_number_join, ts_db_join
resident [join_DB_SnapWroklistSample];
[join_DB_Import2]: //add_diff_field
NoConcatenate
LOAD request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join,
ts_db_join , Num#(Interval(ts_db_join-ts_import_join,'S'),'0.0') as diff3
RESIDENT [join_DB_Import];
DROP TABLE [join_ImportError];
DROP TABLE [join_DB_SnapWroklistSample];
DROP TABLE [join_DB_Import];
I resolved the problem. This is the solution.
The steps are the same :
1. Load all csv files in a table (without NoConcatenate: we can not use it because if we use it, the script will generate one table for each file). After that load again the table using NoConcatenate. Then, drop the first table (withour NoConcatenate).
2. Load other csv files with the same procedure of step 1.
3. After that I want to do a join of these 2 tables.
// Load all files from the first folder (Import):
for each file in filelist('lib://IMPORT_CSV\*.csv')
[join_ImportError_files]:
load
"timestamp",
timestamp as ts_import_join,
"day" as day_join,
"time" as time_join,
"hour" as hour_join,
requestor_lab,
production_lab,
last_timestamp,
id_log,
request_number as request_number_join,
request_number as request_number_import,
sample_number as sample_number_join,
universal_test_id
from [$(file)]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
next file;
// Load again the table with NoConcatenate
[join_ImportError]:
NoConcatenate
LOAD
"timestamp",
ts_import_join,
day_join,
time_join,
hour_join,
requestor_lab,
production_lab,
last_timestamp,
id_log,
request_number_join,
request_number_import,
sample_number_join,
universal_test_id
Resident join_ImportError_files;
// Drop the first table with NoConcatenate
DROP TABLE [join_ImportError_files];
// I use this to group by -> u can skip this step
[join_ImportError_filter_unique]:
inner keep
LOAD
request_number_join,
MAX(timestamp) as ts_import_join,
MAX(day_join) as day_join,
MAX(time_join) as time_join,
MAX(hour_join) as hour_join,
sample_number_join
resident join_ImportError
Group By request_number_join, sample_number_join;
drop table join_ImportError_filter_unique;
// Do the same steps for the second csv files
for each file2 in filelist('lib://DB_CSV\*.csv')
[join_DB_SnapWroklistSample_files]:
LOAD
SAMPLECODE as sample_number_join,
SAMPLECODE2,
REQUESTCODE as request_number_join,
REQUESTCODE as request_number_db,
TIMEVALIDATED,
TIMERESULT,
DATETIMEREQ,
DATETIMEREQ_TS,
REQ_DATE,
REQ_HOUR,
TIMESAMPLEIN,
TIMESAMPLEIN_TS as ts_db_join,
SAMPLEIN_DATE,
SAMPLEIN_HOUR
from [$(file2)]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
next file2;
[join_DB_SnapWroklistSample]:
NoConcatenate
LOAD
sample_number_join,
SAMPLECODE2,
request_number_join,
request_number_db,
TIMEVALIDATED,
TIMERESULT,
DATETIMEREQ,
DATETIMEREQ_TS,
REQ_DATE,
REQ_HOUR,
TIMESAMPLEIN,
ts_db_join,
SAMPLEIN_DATE,
SAMPLEIN_HOUR
Resident join_DB_SnapWroklistSample_files;
DROP TABLE [join_DB_SnapWroklistSample_files];
[join_SnapWroklistSample_filter_unique]:
inner keep
LOAD
sample_number_join,
request_number_join,
MIN(ts_db_join) as ts_db_join
resident join_DB_SnapWroklistSample
Group By request_number_join, sample_number_join;
drop table join_SnapWroklistSample_filter_unique;
// Here we do the join
[join_DB_Import]:
NoConcatenate
Load request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join
resident [join_ImportError];
join
Load request_number_join, sample_number_join, ts_db_join
resident [join_DB_SnapWroklistSample];
// I use this to do the difference between 2 timestamp -> you can skip
[join_DB_Import2]: //add_diff_field
NoConcatenate
LOAD request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join,
ts_db_join , Num#(Interval(ts_db_join-ts_import_join,'S'),'0.0') as diff3
RESIDENT [join_DB_Import];
DROP TABLE [join_ImportError];
DROP TABLE [join_DB_SnapWroklistSample];
DROP TABLE [join_DB_Import];
After run this script, how is your data model looks like?
I resolved the problem. This is the solution.
The steps are the same :
1. Load all csv files in a table (without NoConcatenate: we can not use it because if we use it, the script will generate one table for each file). After that load again the table using NoConcatenate. Then, drop the first table (withour NoConcatenate).
2. Load other csv files with the same procedure of step 1.
3. After that I want to do a join of these 2 tables.
// Load all files from the first folder (Import):
for each file in filelist('lib://IMPORT_CSV\*.csv')
[join_ImportError_files]:
load
"timestamp",
timestamp as ts_import_join,
"day" as day_join,
"time" as time_join,
"hour" as hour_join,
requestor_lab,
production_lab,
last_timestamp,
id_log,
request_number as request_number_join,
request_number as request_number_import,
sample_number as sample_number_join,
universal_test_id
from [$(file)]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
next file;
// Load again the table with NoConcatenate
[join_ImportError]:
NoConcatenate
LOAD
"timestamp",
ts_import_join,
day_join,
time_join,
hour_join,
requestor_lab,
production_lab,
last_timestamp,
id_log,
request_number_join,
request_number_import,
sample_number_join,
universal_test_id
Resident join_ImportError_files;
// Drop the first table with NoConcatenate
DROP TABLE [join_ImportError_files];
// I use this to group by -> u can skip this step
[join_ImportError_filter_unique]:
inner keep
LOAD
request_number_join,
MAX(timestamp) as ts_import_join,
MAX(day_join) as day_join,
MAX(time_join) as time_join,
MAX(hour_join) as hour_join,
sample_number_join
resident join_ImportError
Group By request_number_join, sample_number_join;
drop table join_ImportError_filter_unique;
// Do the same steps for the second csv files
for each file2 in filelist('lib://DB_CSV\*.csv')
[join_DB_SnapWroklistSample_files]:
LOAD
SAMPLECODE as sample_number_join,
SAMPLECODE2,
REQUESTCODE as request_number_join,
REQUESTCODE as request_number_db,
TIMEVALIDATED,
TIMERESULT,
DATETIMEREQ,
DATETIMEREQ_TS,
REQ_DATE,
REQ_HOUR,
TIMESAMPLEIN,
TIMESAMPLEIN_TS as ts_db_join,
SAMPLEIN_DATE,
SAMPLEIN_HOUR
from [$(file2)]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
next file2;
[join_DB_SnapWroklistSample]:
NoConcatenate
LOAD
sample_number_join,
SAMPLECODE2,
request_number_join,
request_number_db,
TIMEVALIDATED,
TIMERESULT,
DATETIMEREQ,
DATETIMEREQ_TS,
REQ_DATE,
REQ_HOUR,
TIMESAMPLEIN,
ts_db_join,
SAMPLEIN_DATE,
SAMPLEIN_HOUR
Resident join_DB_SnapWroklistSample_files;
DROP TABLE [join_DB_SnapWroklistSample_files];
[join_SnapWroklistSample_filter_unique]:
inner keep
LOAD
sample_number_join,
request_number_join,
MIN(ts_db_join) as ts_db_join
resident join_DB_SnapWroklistSample
Group By request_number_join, sample_number_join;
drop table join_SnapWroklistSample_filter_unique;
// Here we do the join
[join_DB_Import]:
NoConcatenate
Load request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join
resident [join_ImportError];
join
Load request_number_join, sample_number_join, ts_db_join
resident [join_DB_SnapWroklistSample];
// I use this to do the difference between 2 timestamp -> you can skip
[join_DB_Import2]: //add_diff_field
NoConcatenate
LOAD request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join,
ts_db_join , Num#(Interval(ts_db_join-ts_import_join,'S'),'0.0') as diff3
RESIDENT [join_DB_Import];
DROP TABLE [join_ImportError];
DROP TABLE [join_DB_SnapWroklistSample];
DROP TABLE [join_DB_Import];