Reading Fixed Width Records

    In order to read file data into anexpressor data integration application, you must create a schema file that describes the structure of each record.  expressor Studio  has wizards that you use to create schemas for delimited files or  database tables.  A wizard for files with fixed width fields is not yet  included in the product.  However, since data within files is read into expressor Studio as strings, it is a simple matter to use the substring function to parse each fixed width record into individual fields.


    Let's assume that your application needs to read a file where each record includes four fixed width fields of the following sizes: 2 characters, 10 characters, 15 characters, and 30 characters.  Each 57 character record ends with a new line terminator.  You can easily write a delimited file schema that describes this record format.  The schema contains a single field, perhaps named "line," the record delimiter is the new line (or carriage return/new line), and, since the record contains only a single field, any character or character combination not in the actual data is an acceptable field delimiter.  The corresponding composite type contains a single attribute (also named "line") of type string.


    When your application reads this file, each incoming record will be a 57 character string.  Immediately following the Read File operator, place a  Transform operator.  Define, in the composite type describing the output record, four attributes that correspond to the four fixed width fields in the incoming record.  If appropriate, assign non-string types, such as integer, decimal, or datetime to these attributes.  For the purposes of this discussion, let's assume that the first attribute should be handled as an integer, the second and third attributes are strings, and the fourth attribute is a datetime.


    Now, within the Transform Editor, map the attribute in the incoming record to all four attributes in the outgoing record.  For each mapping, use the string.substring function to extract the desired characters from the single incoming string and then change the type of the data if necessary.  For example, the expression used to initialize the first string output attribute would be string.substring(input.line,3,12).  Note the pattern used to set the starting and ending characters of the  substring.  The starting character is the ending character of the  previous field plus one and the ending character is the ending character of the previous field plus the width of the next field.


    Initializing the fourth output attribute is a little more involved as you must use the string.datetime function to convert the extracted string into a datetime value.  And since the data in each fixed width field may not actually require the full field width, it's a good practice to use the string.trim function to trim trailing space characters from each substring.


    If you choose to use an expression rule, you can perform all four extractions in a single rule.


    Alternatively, you may choose to use a function rule.


    The preceding approach is fine if the number of fields in the incoming record is relatively small.  But how would you handle a record with many fields?  In this case, it might be too tedious to write an assignment statement for each field, so you need to use some sort of a loop to handle the processing.  While there are many ways to approach this objective, the following screen shot illustrates the basic logic.


    Beginning on the first line, you define a numerically indexed Datascript table where each element's value is another numerically indexed table with two elements.  The first element in this nested table is the name of an output attribute while the second element is the length of the field.  Note that the elements within the table fields are in the same order as the fields in the input record.


    Then within the transform function, parsing of the input record is performed within the ipairs iterator function.  As each element of the table fields is retrieved the code extracts the name of the output attribute and the corresponding characters from the input string.  If necessary, the extracted value is then converted into a different data type such as integer or datetime.  Each value parsed from the input is then used to initialize an output attribute and when the ipairs loop completes, the output record is emitted.