How to union 2 data sources with different schemas

    Suppose you need to union 2 data files together into 1 file, but the input files have different schemas. 

     

    simple1.txt has:

    FirstName,LastName,Address1,City,State,PostalCode,Notes

    Mike,Tarallo,123 myplace,mycity,mystate,11211,These are some notes

    John,Smith,345 his place,hiscity,histatte,23233,More notes

     

    simple2.txt has:

    FirstName,LastName,Address1,Address2,City,State,PostalCode

    Sam,Johson,456 a street, po box 1090,boston,MA,78945

    Wayne,Eckerson,659 main street,apt 1a,orlando,FL,90812

     

    Take a look at the attached project(s), which you can import into Expressor.  Do not unzip - just use the Import Projects... wizard that is under the Studio tab - or right click on an existing workspace top level node in the explorer tree and select Import Projects.

     

    You will have to modify your file connection artifact FileConnection1 which will can point to the appropriate location where you saved the sample input files (.txt) to and where the output file will be written to.

     

    union_data.zip.- simply unions the files and just leaves ",," (NULLS) - for the missing values

     

    The union output is:

    FirstName,LastName,Address1,Address2,City,State,PostalCode,Notes

    Mike,Tarallo,123 myplace,,mycity,mystate,11211,These are some notes

    John,Smith,345 his place,,hiscity,histatte,23233,More notes

    Sam,Johson,456 a street, po box 1090,boston,MA,78945,

    Wayne,Eckerson,659 main street,apt 1a,orlando,FL,90812,

     

    If you want to explicitly set a default value for the missing data you can use a transform operator to do so - look in project union_data_explicit_set_value.zip

    Here we use the Transform operator to insert default values:

    trans1.PNG

     

    The union output becomes:

    FirstName,LastName,Address1,Address2,City,State,PostalCode,Notes

    Sam,Johson,456 a street, po box 1090,boston,MA,78945,No Notes Today

    Wayne,Eckerson,659 main street,apt 1a,orlando,FL,90812,No Notes Today

    Mike,Tarallo,123 myplace,No Address2,mycity,mystate,11211,These are some notes

    John,Smith,345 his place,No Address2,hiscity,histatte,23233,More notes

     

    Finally, a more sophisticated way is to take advantage of the semantic type constraint properties (you'll need to understand composite types a bit more but it is very powerful).  Look in union_data_explicit_set_value_using_composite_constraint.zip

     

    This solution uses composite type constraints and NO Transformation Rules in the Dataflow!

     

    If you open the schemas for both the Read File 1 and Read File 2 operators - on the right you will notice that I manually added the missing attributes BUT did not map them to anything. This allows the structure of the outgoing record layout (we call a Composite Type) to then conform when using the Funnel Operator - which requires the record layout to be the same. Then the Transform Operator does just that, transforms the data in a number of ways.

     

    Then finally the Write File Operator uses a 3rd schema to define the delimited file output that will be written.

     

    Read File 1 Schema

     

    schema1.JPG

     

    Read File 2 Schema

    schema2.JPG

     

    Output Write File Schema

    outputschema.JPG