Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I've set up a File delimited Metadata, and I want to be able to automatically filter out any columns that have certain strings in the name, specifically "schedule" or "hist." And I cannot, for the life of me, figure out how to do this.
Its likely that the columns in this metadata will change, and I don't want to have to manually go in and remove these columns any time the metadata is updated.
I tried to use tFilterColumns to do this (since tFilterRow works for row filtering) but it just seems like a simpler version of tMap to me.
Hello,
Could you please be more precise
Maybe I did not understand your problem well but... try this :
You can use a tFilterRow with advanced condition.
The condition would be :
MyField.indexOf("MyString") != -1
If this condition is true, your string contains searched text
Please ask if you want a more detailed solution
ok, the delimited file I'm importing has a bunch of columns in it, some of these columns contain a lot of schedule or historical data and we don't want to import those columns daily. so I wanted to know if there was a way to filter out columns, by setting a filter on their names, like you can filter out rows with tFilterRow (pic #3)
Ok, so you can see from the attached picture (pic #1) that the file has a lot of columns (thats only a couple of the several dozen columns). Some of these columns have column names that include the words schedule or hist (like the two circled in red). I want to be able to set a filter that says to not import those columns, with those words in the column name, into the database
And I want to be able to set up a filter rule to do this, like you can set up rules to filter out rows in tFilterRow, but for columns
You can use a tFilterRow with advanced condition.
The condition would be :
MyField.indexOf("MyString") != -1
If this condition is true, your string contains searched text
I don't think thats quite going to work... because its not the rows I want to filter out but some of the actual columns
Have you tried the tFilterColumns ?
It matches and input flow against a destination and only copies over columns having the same name.
Yea I looked at tFilterColumns, but thats still manual. And this file will be changing.
Basically, to explain the file, we request a list of columns/information from Bloomberg nightly.
This list of columns is subject to change based on any requirements changes of other departments in our company, if they need different/more information. Because these column changes come from outside our group, we cannot predict how often we will be asked to change the column list.
Also, the program that sends out the requests to Bloomberg wouldn't just append the new column to the end of the list of columns already being requested. The new column will go into the list, of requested columns, based on whatever sorting mechanism that program uses.
So, not only will the columns be subject to change but their order will be as well - which would make it very inconvenient to have to manually filter out the columns we don't want every time the file is changed.
And, to compound this problem, we have 10 request files (each with different lists of columns) we send/get to/from Bloomberg every night. So, if I was only going to have to manually remove these columns for one file, that wouldn't be fun - but having to do it for 10 files is just not productive.
From the job flow you posted above - your tFilterRow is conducting a basic validity check (CUSIP?).
Then that output will either go to rejects or the good DB.
Will the schema of the destination table (tMSSqlOutput2) always be the same?
Or will it vary based upon the incoming text file?
based upon the incoming text file
Sorry if I'm being dense here - but Talend cannot handle dynamic / run-time schema changes in the Database Input/Output components.
So the only way this job you posted MIGHT work would be if the destination table in your tMSSqlOutput2 always had the exact same schema.
In other words, if you have x input files (let's say 4) with a common set of columns among them and that common set of columns would be output to ONE MSSQL table. Something like...
INCOMING FILE 1
============
CUSIP | HIGH | LOW | CLOSE | VOL | HIST_PRICE | HIST_VOL
INCOMING FILE 2
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL |
INCOMING FILE 3
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | SCHEDULE_G
INCOMING FILE 4
===========
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | COMPANY_NAME | SCHEDULE_G | HIST_VOL
OUTPUT DATABASE TABLE
=================
CUSIP | HIGH | LOW | CLOSE | VOL |
In such a case, you could define a tFilterColumns with a schema matching that of the Output Database Table and only those columns having matching names would be imported into the Database table.
Talend cannot handle dynamic / run-time schema changes in the Database Input/Output components.
I guess that answers my ?
So the only way this job you posted MIGHT work would be if the destination table in your tMSSqlOutput2 always had the exact same schema.
In other words, if you have x input files (let's say 4) with a common set of columns among them and that common set of columns would be output to ONE MSSQL table. Something like...
INCOMING FILE 1
============
CUSIP | HIGH | LOW | CLOSE | VOL | HIST_PRICE | HIST_VOL
INCOMING FILE 2
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL |
INCOMING FILE 3
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | SCHEDULE_G
INCOMING FILE 4
===========
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | COMPANY_NAME | SCHEDULE_G | HIST_VOL
OUTPUT DATABASE TABLE
=================
CUSIP | HIGH | LOW | CLOSE | VOL |
In such a case, you could define a tFilterColumns with a schema matching that of the Output Database Table and only those columns having matching names would be imported into the Database table.
I guess that answers my ?
No I want those 10 different files to go to 10 different tables, and I want each of those tables to be in the same format as the file it is based off of (just without any columns that have HIST or SCHEDULE in the name, without having to manually remove those columns)