Processing Heterogeneous Data Records

    Here is a sample of a file where only certain rows of data are to be kept:

     

    GLID|863|Laguna Beach|20120511|20120511|20120512|133019|1.00|1.00| 3700|V4.9
    SUM|1|DINING RM|2683.70|-112.00|493.09|-22.00|3|0.00|0|208.08|0.00|0.00|47|71|37|0|0.00|0. 00|DINING RM|1
    SUM|2|BAR|1641.00|-177.50|204.63|-6.00|1|0.00|0|127.28|0.00|0.00|67|27|46|0|0.00|0.0 0|BAR|2
    DSC|1|101|REST MGR|-24.00|2|REST MGR|101
    DSC|1|301|BDAY|-39.00|3|BDAY|301
    DSC|1|302|VIP GUEST|-19.00|2|VIP GUEST|302
    DSC|1|303|APP CARD|-7.00|1|APP CARD|303
    DSC|1|402|FLAVOR COMP|-23.00|2|FLAVOR COMP|402
    DSC|2|101|REST MGR|-14.00|1|REST MGR|101
    DSC|2|104|REST STAFF|-62.00|8|REST STAFF|104
    DSC|2|105|RETAIL 100%|-49.00|3|RETAIL 100%|105
    DSC|2|106|RETAIL 50%|-4.50|1|RETAIL 50%|106
    DSC|2|302|VIP GUEST|-48.00|3|VIP GUEST|302
    DSC|2|404|ERROR BOH|0.00|0|ERROR BOH|404
    SVC|1|1|CHG TIP|453.46|33|CHG TIP|1
    SVC|1|6|% GRATUITY|39.63|1|% GRATUITY|6
    SVC|2|1|CHG TIP|201.57|30|CHG TIP|1
    SVC|2|6|% GRATUITY|3.06|1|% GRATUITY|6
    TND|1|101|CASH|1|351.43|15|CASH|101
    TND|1|201|VISA|1|1584.23|19|VISA|201
    TND|1|202|MASTERCARD|1|618.68|5|MASTERCARD|202
    TND|1|203|AMEX|1|830.53|10|AMEX|203
    TND|1|503|TIPS PAID|2|493.09|34|TIPS PAID|503
    TND|2|101|CASH|1|642.91|36|CASH|101
    TND|2|201|VISA|1|607.88|19|VISA|201
    TND|2|202|MASTERCARD|1|203.18|4|MASTERCARD|202
    TND|2|203|AMEX|1|396.03|8|AMEX|203
    TND|2|204|DISCOVER|1|122.91|1|DISCOVER|204
    CHDR|2|1|4709|20120511000000|20120511000000|201205 11161836|20120511165621|1201|0| |8.08|0.58|0.00|0.00|0.00|0.00|0.00|0.00||0|0.00|0 .00|7.50
    CHDR|2|1|4710|20120511000000|20120511000000|201205 11162006|20120511171940|2034|0| |23.94|1.44|0.00|0.00|4.00|0.00|0.00|0.00||0|4.00| 0.00|18.50
    CHDR|2|1|4711|20120511000000|20120511000000|201205 11162300|20120511162324|9002|0|BRIT|0.00|0.00|0.00 |0.00|0.00|0.00|0.00|0.00||0|0.00|-14.00|0.00
    CHDR|2|1|4762|20120511000000|20120511000000|201205 11231005|20120511231420|1201|0|COMP|0.00|0.00|0.00 |0.00|0.00|0.00|0.00|0.00||0|0.00|-6.00|0.00
    CDTL|2|0|4743|20120511000000|20120511205534|11|0|0 |0|0|0|0.00|0|0.00|1201|XXXXXXXXXXXX6103||0|0
    CDTL|2|1|4743|20120511000000|20120511205534|4|0|20 1|0|0|1|8.47|1|8.47|1201|XXXXXXXXXXXX6103||0|201
    CDTL|2|1|4743|20120511000000|20120511205534|3|0|1| 0|0|1|2.00|1|2.00|1201|||0|1
    CDTL|2|1|4743|20120511000000|20120511205534|4|0|50 3|0|0|1|2.00|1|2.00|1201|XXXXXXXXXXXX6103||1|503
    We only need the rows that start with the CHDR and CDTL and they need to be sent to 2 separate tables. However it is heterogeneous data.  Can this be done with Expressor?

     

    Basics:

    Expressor can easily process files with heterogeneous records.  Use Expressor to first read each line as a single variable length string and then use a filter operator and string pattern matching functions to examine the characters before the first pipe separator character. If these characters are CHDR or CDTL you would emit the lines from separate output ports. All other lines would be dropped.

    Next you could pass each group of lines (that is, CHDR or CDTL) to a Transform operator where you can parse the string into the respective columns for the target table structure.  Next, use Write Table to load the target tables.

     

    Additional Details:

    In the Delimited File schema that you use with the Read File operator, use CR+LF as the record delimiter.  For the field delimiter, select Tab as it's very unlikely a line includes a Tab.

    Now in the Filter operator you want to use the string pattern matching function string.match. Look this function up in the product documentation, as well as the documentation that discusses patterns.  In the rule in the Filter operator, you will drag the line into the rule.

    Then your conditional statement will look something like.

     

    string.match(input.line,"^(CHDR).+") or string.match(input.line,"^(CDTL).+")