QlikView Expressor: How to Process Rejected Records

    When records produce errors because they violate constraints set on Composite Type attributes or other reasons, the operator that encounters the error can handle them by skipping them, aborting the dataflow, or rejecting the offending records. In some cases, it is sufficient to simply send rejected records to a Write File operator and examine the records in the output file. If the intent is, however, to correct or otherwise use those records, examining each error and changing the data could be very cumbersome. The more efficient approach would be to reprocess the records as they come out the reject port.

    Records rejected by input operators such as Read File, Read Table, and Read Excel are structured into the following fields:

    RejectType
    RecordNumber
    RecordData
    RejectReason
    RejectMessage

    The record data as it was constituted before being rejected is contained in the RecordData field. To process that data, it must first be reconstructed from the rejected record format. Several factors affect the reconstruction. The order of the record data fields can be different from the order represented in the original Schema, and some of the records emitted from the reject port do not contain record data. For example, RejectType 1 errors are constraint violations, but before they are emitted, a RejectType 4 record is emitted. The RejectType 4 record contains the record data field order for the subsequent RejectType 1 errors in its RecordData field. The RejectTypes are fully explained in the Using the Reject Port section of the Read Custom operator topic in the product documentation.

    Note: All non-input operators that have a reject port emit rejected records with the existing attributes of the record, that is, they do not restructure the records the way input operators do. Reprocessing records rejected by non-input operators do not have to be reconstructed the way records rejected by input operators do.

    The Aggregate operator is used to reconstruct rejected records. To illustrate the logic used to reconstruct rejected data, we will use an Excel file containing a list of US presidents. The following table shows the header and first few rows in the spreadsheet. The place field in a Number in Excel. The text in several of those Number cells will generate errors that will be used in the reconstruction example.

     

    placelnamefnameparty
    1WashingtonGeorge
    twoAdamsJohnFederalist
    3JeffersonThomasDemocratic-Republican
    4MadisonJamesDemocratic-Republican
    fiveMonroeJamesDemocratic-Republican
    6AdamsJohn QDemocratic-Republican
    qwertyJacksonAndrewDemocratic
    8Van BurenMartinDemocratic

     

    The following dataflow uses an Aggregate operator to count the number of Democratic presidents and the number of Republican presidents in United States history.

    622d1336163114-bprejectdfwithaggregate2.png

    The Read Excel operator uses Semantic Type constraints to separate Democratic and

    Republican party presidents from all the others, such as Whig and Democratic-Republican. The following screen shows the constraint set on Allowed values. The Corrective Action is <Escalate> so that errors are passed to the operator for error handling. (The Maximum length setting of 32676 is taken from the Schema field, which was read from the setting in the spreadsheet.)

    623d1336163224-bpattributeconstraint.png

    This constraint setting will cause all the records whose "party" value is not Democratic or Republican to be rejected.

    To illustrate the rejected records that will be sent out the Read Excel operator's reject port, we will use the following dataflow, which is like the one above but with the Write File  operator attached directly to the reject port. This allows us to see the format of the rejected records.

    624d1336163355-bprejectwritefile2.png

    The CSV file produced by the Write File 3 operator is:

    ___________________________________________________________________________________________________

    RejectType,RecordNumber,RecordData,RejectReason,RejectMessage
    4,0,"fname,lname,party,place",0|header,header
    1,1,"George,Washington,,1","1|TRANSLITERATE-1021,party,",The value '' for attribute
    'party' is not a member of the allowed list
    3,2,"lname,fname,party,place",ExcelRejectHeader1,Header
    3,2,"Adams,John,Federalist,two",ExcelRejectHeader1,"Error while converting value
    from Excel type 'string' to Expressor Studio type 'double'; resultCode: 1, field:
    place, value: two, message: Unable to convert value from Excel type string to expressor
    Studio type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8"
    1,3,"Thomas,Jefferson,Democratic-Republican,3","1|TRANSLITERATE-1021,
    party,Democratic-Republican",The value 'Democratic-Republican' for attribute
    'party' is not a member of the allowed list
    1,4,"James,Madison,Democratic-Republican,4","1|TRANSLITERATE-1021,
    party,Democratic-Republican",The value 'Democratic-Republican' for attribute
    'party' is not a member of the allowed list
    3,5,"Monroe,James,Democratic-Republican,five",ExcelRejectHeader1,"Error while
    converting value from Excel type 'string' to Expressor Studio type 'double'; resultCode:
    1,field: place, value: five, message: Unable to convert value from Excel type string
    to expressor Studio type double: value does not correspond to pattern: 123 or -1.0
    or -45.67e-8"
    1,6,"John Q,Adams,Democratic-Republican,6","1|TRANSLITERATE-1021,
    party,Democratic-Republican",The value 'Democratic-Republican' for attribute
    'party' is not a member of the allowed list
    3,7,"Jackson,Andrew,Democratic,qwerty",ExcelRejectHeader1,"Error while converting
    value from Excel type 'string' to Expressor Studio type 'double'; resultCode: 1,
    field: place, value: qwerty, message: Unable to convert value from Excel type string
    to expressor Studio type double: value does not correspond to pattern: 123 or -1.0
    or -45.67e-8"
    1,9,"William H,Harrison,Whig,9","1|TRANSLITERATE-1021,party,Whig",The value 'Whig'
    for attribute 'party' is not a member of the allowed list
    1,10,"John,Tyler,Whig,10","1|TRANSLITERATE-1021,party,Whig",The value 'Whig' for
    attribute 'party' is not a member of the allowed list
    1,12,"Zachary,Taylor,Whig,12","1|TRANSLITERATE-1021,party,Whig",The value 'Whig'
    for attribute 'party' is not a member of the allowed list
    1,13,"Millard,Fillmore,Whig,13","1|TRANSLITERATE-1021,party,Whig",The value 'Whig'
    for attribute 'party' is not a member of the allowed list

    ___________________________________________________________________________________________________

     

    The first row is the standard header row for a rejected records file. The row following the header contains a RejectType 4 error, which is not erroneous data but instead is the header for all the RejectType 1 errors that follow. Notice that the order of field names in the RecordData field in the RejectType 4 row ("fname, lname, party, place") is not the same as in the Excel file, which is "place, lname, fname, party." The Schema and mapped Composite Type show the same sequence of field names:

    625d1336163837-bp-presidentsschema.png

    The first record with RejectType 1, which is a constraint violation, is for George Washington. The RecordData field displays the data fields in the order indicated by the RejectType 4 header--"George,Washington, ,1". Knowing this field order is critical to reconstructing the record data, as we will see when writing an Aggregate Rule.

    The second rejected record is John Adams. It is rejected not because John Adams was neither a Democratic nor Republican party member but because the record first produces an error when the place column is read and does not contain numeric data. Its RejectType is 3. Note, however, that before the John Adams record is listed in the reject file, another row is inserted as RejectType 3.

    3,2,"lname,fname,party,place",ExcelRejectHeader1,Header

    This row serves as a header for RejectType 3 errors, and its RecordData field shows the sequence in which data fields in those records will be displayed--"lname, fname, party, place." It is different from the Schema and from the sequence used for RejectType 1 errors. The fact that this field order is different must also be factored in to reconstructing the record data when writing an Aggregate Rule.

    Returning to the first dataflow shown above, we will now configure the Aggregate operator to reconstruct the original records before they were rejected in the Read Excel operator.

    First, the Change Function Rule must be enabled. The Change function checks to find the last record in a group. It is required in this instance of the Aggregate operator because Function Rule iterates through the input records before emitting its output. The Change Function Rule is enabled in the Rules Editor, and the Use Change function property in the Aggregate 1 property panel must be checked as well. When the Change Rule is used, an Aggregate key is not required. The basic logic of the Change Function Rule, shown here, does not need to be changed.

    ____________________________________

    function change(input, previous) 
    re
    turn false 
    end

    ________________________________

    The logic of the aggregation is contained in one Function Rule, labeled Rule_1.

    656d1336662901-bpaggregaterule2.png

    Each input record attribute is mapped to the Rule_1 input. The output attributes come from the Composite Type in the downstream Write File 2 operator. Those attributes are the four from the original records plus an attribute to capture the RejectMessage content so that it can be kept with the record.

    The aggregate and result functions use a function from an expressor library, so that library must be included with a require statement at the beginning of the rule. The rule also constructs three variables as tables (rows={}, header1={}, and header3={}).

    The aggregate function tests the RejectType of each record to determine if it is a header for a RejectType 1 or 3 error and creates the appropriate header by using the expressor library function csv.linetoCSV to convert the data in the RecordData field to a CSV header for the ensuing records with either RejectType 1 or 3 errors. If a record's RejectType is neither 4 nor 3 marked as a header, then it is added to the "rows" table. The aggregate function processes all the input records before passing them to the result function. The reason the aggregate function processes all records is there is only one group, and as a result, the change function continues to return "false" until the last incoming record is processed. The aggregate function also indexes the records it processes and passes the index value as the count to the result function.

    At the conclusion of the aggregate function, content of the three tables is:

    header1 { fname,lname,party,place }

    header3 { lname,fname,party,place }

    row { "George,Washington,,1",The value '' for attribute 'party' is not a member of
    the allowed list,1
    "Adams,John,Federalist,two","Error while converting value from
    Excel type 'string' to Expressor Studio type 'double'; resultCode: 1, field: place,
    value: two, message: Unable to convert value from Excel type string to expressor
    Studio type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8",3
    "Thomas,Jefferson,Democratic-Republican,3",The value 'Democratic-Republican' for
    attribute 'party' is not a member of the allowed list,1
    "James,Madison,Democratic-Republican,4",The value
    'Democratic-Republican' for attribute 'party' is not a member of the allowed list,1
    "Monroe,James,Democratic-Republican,five","Error while converting value from
    Excel type 'string' to Expressor Studio type 'double'; resultCode: 1, field: place,
    value: five, message: Unable to convert value from Excel type string to expressor
    Studio type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8",3
    "John Q,Adams,Democratic-Republican,6",The value 'Democratic-Republican' for
    attribute 'party' is not a member of the allowed list,1
    "Jackson,Andrew,Democratic,qwerty","Error while converting value from Excel type
    'string' to Expressor Studio type 'double'; resultCode: 1, field: place, value: qwerty,
    message: Unable to convert value from Excel type string to expressor Studio type
    double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8",3
    "
    William H,Harrison,Whig,9",The value 'Whig' for attribute 'party' is not a member of the
    allowed list,1
    "John,Tyler,Whig,10",The value 'Whig' for attribute 'party' is not a member of the
    allowed list,1
    "Zachary,Taylor,Whig,12",The value 'Whig' for attribute 'party' is not a member of the
    allowed list,1
    "Millard,Fillmore,Whig,13",The value 'Whig' for attribute 'party' is not a member of the
    allowed list,1 }

     

    Once the records have been aggregated in the three tables, the result function takes each record arrayed in the "rows" table, selects the first element in the numerically indexed array, and parses it into the "line" variable with the csv.linetoCSV function. Then the "rows" record's third [3] field is evaluated to determine if RejectType 1 or 3. In both cases, the ipairs function is then used to iterate through the relevant header array ("header1" or "header3"). The header arrays are numerically indexed, so the ipairs count variable, "k," records the current position in header array. The value at the [k] location in header file is placed in the output array as the key label for the value at the corresponding place in the line array (output[v]). In this example, "header1" is arrayed as {1=fname,2=lname,3=party,4=place}, so output[v]=line[k] becomes output[fname]=line[1], output[lname]=line[2], output[party]=line[3], and output[place]=line[4].

    The datascript code for the result function is:

    _______________________________________________________________________________

    function result(input, count)

    cnt = 0

    return function()    

    output = {}

    if cnt < #rows then

    cnt = cnt + 1      

    success,line = expressor.csv.lineToCSV(rows[cnt][1])

    if rows[cnt][3]==1 then

    for k,v in ipairs(header1) do          

    output[v] = line[k]         

    end  --end for       

    output.Error_Message = rows[cnt][2]       

    elseif rows[cnt][3]==3 then

    for k,v in ipairs(header3) do

    output[v] = line[k]         

    end  --end for       

    output.Error_Message = rows[cnt][2]       

    else       

    end     

    return output

    else

    return nil

    end  -- end if  

       end  -- end return function

    end  --end result function

    _________________________________________________________________________________

    Notice that after the output array's keys and values are assembled, the field Error_Messages is added to the output. The content of that field comes from the second field in the rows table, which is the RejectReason. At that point, the output array looks like the following, assuming the first rows record had just been read:

    output { fname="George",lname="Washington",party="",place=1,Error_Message="The value '' for attribute 'party' is not a member of the allowed list" }

    The output array is a dictionary or associative array, so there is no guarantee as to the order in which the keys will be stored. But it is guaranteed that output parameters of the Function Rule will go to the matching attributes in the Aggregate 1 operator's output.

    It is important to understand how the return function within the result function works. Note that the syntax is "result function," not "function result." The return function has iteration built in, and that is why Rule_1 must have Iteration turned on (see Use an Iterative Function in the Writing Rules in Transformation Operators topic in the product documentation). The return function iterates until it returns "nil," which is the signal used to end an iterative function. In each iteration before "nil," the return function sends the current record in the output array to the Aggregate operator's output attributes, and from there, each record is propagated downstream.

    Note: The looping structures for and while cannot be used in the return function because the return function must control the iteration. Using a for or while structure would cause an infinite loop.

    When the result function has finished processing all records in the rows table, the rejected records will be uniformly organized, regardless of RejectType. In the example, the attributes from the Aggregate 1 operator are propagated to a Write File operator so the results can be viewed. In most cases, further processing would be performed after the Aggregate operator, but in this case, we simply want to see the results of the processing in the Aggregate 1 operator. The Write File 2 operator uses a Schema that maps to the attributes propagated from Aggregate 1:

    627d1336165551-bppresidentsrejectschema.png

    The Error_Message attribute is mapped to the RejectReason field in the Delimited Schema, and the fields are in the same order they were in when read by the Schema used by the Read Excel operator. That order is reflected in the output file to which the Write File 2 sends the data:

    _________________________________________________________________________________________________

    place,lname,fname,party,RejectReason
    1,Washington,George,,The value '' for attribute 'party' is not a member of the
    allowed list
    two,Adams,John,Federalist,"Error while converting value from Excel type 'string'
    to Expressor Studio type 'double'; resultCode: 1, field: place, value: two, message:
    Unable to convert value from Excel type string to expressor Studio type double: value
    does not correspond to pattern: 123 or -1.0 or -45.67e-8"
    3,Jefferson,Thomas,Democratic-Republican,The value 'Democratic-Republican' for
    attribute party' is not a member of the allowed list
    4,Madison,James,Democratic-Republican,The value 'Democratic-Republican' for
    attribute 'party' is not a member of the allowed list
    five,Monroe,James,Democratic-Republican,"Error while converting value from Excel
    type 'string' to Expressor Studio type 'double'; resultCode: 1, field: place, value:
    five, message: Unable to convert value from Excel type string to expressor Studio
    type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8"
    6,Adams,John Q,Democratic-Republican,The value 'Democratic-Republican' for attribute
    'party' is not a member of the allowed list
    qwerty,Jackson,Andrew,Democratic,"Error while converting value from Excel type
    'string' to Expressor Studio type 'double'; resultCode: 1, field: place, value: qwerty,
    message: Unable to convert value from Excel type string to expressor Studio type
    double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8"
    9,Harrison,William H,Whig,The value 'Whig' for attribute 'party' is not a member
    of the allowed list
    10,Tyler,John,Whig,The value 'Whig' for attribute 'party' is not a member of the
    allowed list
    12,Taylor,Zachary,Whig,The value 'Whig' for attribute 'party' is not a member of
    the allowed list
    13,Fillmore,Millard,Whig,The value 'Whig' for attribute 'party' is not a member of
    the allowed list

    __________________________________________________________________________________________________