Qlik Community

QlikView Documents

Documents for QlikView related information.

QlikView Expressor: How to Process Rejected Records

jsw
New Contributor II

QlikView Expressor: How to Process Rejected Records

When records produce errors because they violate constraints set on Composite Type attributes or other reasons, the operator that encounters the error can handle them by skipping them, aborting the dataflow, or rejecting the offending records. In some cases, it is sufficient to simply send rejected records to a Write File operator and examine the records in the output file. If the intent is, however, to correct or otherwise use those records, examining each error and changing the data could be very cumbersome. The more efficient approach would be to reprocess the records as they come out the reject port.

Records rejected by input operators such as Read File, Read Table, and Read Excel are structured into the following fields:

RejectType
RecordNumber
RecordData
RejectReason
RejectMessage

The record data as it was constituted before being rejected is contained in the RecordData field. To process that data, it must first be reconstructed from the rejected record format. Several factors affect the reconstruction. The order of the record data fields can be different from the order represented in the original Schema, and some of the records emitted from the reject port do not contain record data. For example, RejectType 1 errors are constraint violations, but before they are emitted, a RejectType 4 record is emitted. The RejectType 4 record contains the record data field order for the subsequent RejectType 1 errors in its RecordData field. The RejectTypes are fully explained in the Using the Reject Port section of the Read Custom operator topic in the product documentation.

Note: All non-input operators that have a reject port emit rejected records with the existing attributes of the record, that is, they do not restructure the records the way input operators do. Reprocessing records rejected by non-input operators do not have to be reconstructed the way records rejected by input operators do.

The Aggregate operator is used to reconstruct rejected records. To illustrate the logic used to reconstruct rejected data, we will use an Excel file containing a list of US presidents. The following table shows the header and first few rows in the spreadsheet. The place field in a Number in Excel. The text in several of those Number cells will generate errors that will be used in the reconstruction example.

placelnamefnameparty
1WashingtonGeorge
twoAdamsJohnFederalist
3JeffersonThomasDemocratic-Republican
4MadisonJamesDemocratic-Republican
fiveMonroeJamesDemocratic-Republican
6AdamsJohn QDemocratic-Republican
qwertyJacksonAndrewDemocratic
8Van BurenMartinDemocratic

The following dataflow uses an Aggregate operator to count the number of Democratic presidents and the number of Republican presidents in United States history.

622d1336163114-bprejectdfwithaggregate2.png

The Read Excel operator uses Semantic Type constraints to separate Democratic and

Republican party presidents from all the others, such as Whig and Democratic-Republican. The following screen shows the constraint set on Allowed values. The Corrective Action is <Escalate> so that errors are passed to the operator for error handling. (The Maximum length setting of 32676 is taken from the Schema field, which was read from the setting in the spreadsheet.)

623d1336163224-bpattributeconstraint.png

This constraint setting will cause all the records whose "party" value is not Democratic or Republican to be rejected.

To illustrate the rejected records that will be sent out the Read Excel operator's reject port, we will use the following dataflow, which is like the one above but with the Write File  operator attached directly to the reject port. This allows us to see the format of the rejected records.

624d1336163355-bprejectwritefile2.png

The CSV file produced by the Write File 3 operator is:

___________________________________________________________________________________________________

RejectType,RecordNumber,RecordData,RejectReason,RejectMessage
4,0,"fname,lname,party,place",0|header,header
1,1,"George,Washington,,1","1|TRANSLITERATE-1021,party,",The value '' for attribute
'party' is not a member of the allowed list
3,2,"lname,fname,party,place",ExcelRejectHeader1,Header
3,2,"Adams,John,Federalist,two",ExcelRejectHeader1,"Error while converting value
from Excel type 'string' to Expressor Studio type 'double'; resultCode: 1, field:
place, value: two, message: Unable to convert value from Excel type string to expressor
Studio type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8"
1,3,"Thomas,Jefferson,Democratic-Republican,3","1|TRANSLITERATE-1021,
party,Democratic-Republican",The value 'Democratic-Republican' for attribute
'party' is not a member of the allowed list
1,4,"James,Madison,Democratic-Republican,4","1|TRANSLITERATE-1021,
party,Democratic-Republican",The value 'Democratic-Republican' for attribute
'party' is not a member of the allowed list
3,5,"Monroe,James,Democratic-Republican,five",ExcelRejectHeader1,"Error while
converting value from Excel type 'string' to Expressor Studio type 'double'; resultCode:
1,field: place, value: five, message: Unable to convert value from Excel type string
to expressor Studio type double: value does not correspond to pattern: 123 or -1.0
or -45.67e-8"
1,6,"John Q,Adams,Democratic-Republican,6","1|TRANSLITERATE-1021,
party,Democratic-Republican",The value 'Democratic-Republican' for attribute
'party' is not a member of the allowed list
3,7,"Jackson,Andrew,Democratic,qwerty",ExcelRejectHeader1,"Error while converting
value from Excel type 'string' to Expressor Studio type 'double'; resultCode: 1,
field: place, value: qwerty, message: Unable to convert value from Excel type string
to expressor Studio type double: value does not correspond to pattern: 123 or -1.0
or -45.67e-8"
1,9,"William H,Harrison,Whig,9","1|TRANSLITERATE-1021,party,Whig",The value 'Whig'
for attribute 'party' is not a member of the allowed list
1,10,"John,Tyler,Whig,10","1|TRANSLITERATE-1021,party,Whig",The value 'Whig' for
attribute 'party' is not a member of the allowed list
1,12,"Zachary,Taylor,Whig,12","1|TRANSLITERATE-1021,party,Whig",The value 'Whig'
for attribute 'party' is not a member of the allowed list
1,13,"Millard,Fillmore,Whig,13","1|TRANSLITERATE-1021,party,Whig",The value 'Whig'
for attribute 'party' is not a member of the allowed list

___________________________________________________________________________________________________

The first row is the standard header row for a rejected records file. The row following the header contains a RejectType 4 error, which is not erroneous data but instead is the header for all the RejectType 1 errors that follow. Notice that the order of field names in the RecordData field in the RejectType 4 row ("fname, lname, party, place") is not the same as in the Excel file, which is "place, lname, fname, party." The Schema and mapped Composite Type show the same sequence of field names:

625d1336163837-bp-presidentsschema.png

The first record with RejectType 1, which is a constraint violation, is for George Washington. The RecordData field displays the data fields in the order indicated by the RejectType 4 header--"George,Washington, ,1". Knowing this field order is critical to reconstructing the record data, as we will see when writing an Aggregate Rule.

The second rejected record is John Adams. It is rejected not because John Adams was neither a Democratic nor Republican party member but because the record first produces an error when the place column is read and does not contain numeric data. Its RejectType is 3. Note, however, that before the John Adams record is listed in the reject file, another row is inserted as RejectType 3.

3,2,"lname,fname,party,place",ExcelRejectHeader1,Header

This row serves as a header for RejectType 3 errors, and its RecordData field shows the sequence in which data fields in those records will be displayed--"lname, fname, party, place." It is different from the Schema and from the sequence used for RejectType 1 errors. The fact that this field order is different must also be factored in to reconstructing the record data when writing an Aggregate Rule.

Returning to the first dataflow shown above, we will now configure the Aggregate operator to reconstruct the original records before they were rejected in the Read Excel operator.

First, the Change Function Rule must be enabled. The Change function checks to find the last record in a group. It is required in this instance of the Aggregate operator because Function Rule iterates through the input records before emitting its output. The Change Function Rule is enabled in the Rules Editor, and the Use Change function property in the Aggregate 1 property panel must be checked as well. When the Change Rule is used, an Aggregate key is not required. The basic logic of the Change Function Rule, shown here, does not need to be changed.

____________________________________

function change(input, previous) 
re
turn false 
end

________________________________

The logic of the aggregation is contained in one Function Rule, labeled Rule_1.

656d1336662901-bpaggregaterule2.png

Each input record attribute is mapped to the Rule_1 input. The output attributes come from the Composite Type in the downstream Write File 2 operator. Those attributes are the four from the original records plus an attribute to capture the RejectMessage content so that it can be kept with the record.

The aggregate and result functions use a function from an expressor library, so that library must be included with a require statement at the beginning of the rule. The rule also constructs three variables as tables (rows={}, header1={}, and header3={}).

The aggregate function tests the RejectType of each record to determine if it is a header for a RejectType 1 or 3 error and creates the appropriate header by using the expressor library function csv.linetoCSV to convert the data in the RecordData field to a CSV header for the ensuing records with either RejectType 1 or 3 errors. If a record's RejectType is neither 4 nor 3 marked as a header, then it is added to the "rows" table. The aggregate function processes all the input records before passing them to the result function. The reason the aggregate function processes all records is there is only one group, and as a result, the change function continues to return "false" until the last incoming record is processed. The aggregate function also indexes the records it processes and passes the index value as the count to the result function.

At the conclusion of the aggregate function, content of the three tables is:

header1 { fname,lname,party,place }

header3 { lname,fname,party,place }

row { "George,Washington,,1",The value '' for attribute 'party' is not a member of
the allowed list,1
"Adams,John,Federalist,two","Error while converting value from
Excel type 'string' to Expressor Studio type 'double'; resultCode: 1, field: place,
value: two, message: Unable to convert value from Excel type string to expressor
Studio type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8",3
"Thomas,Jefferson,Democratic-Republican,3",The value 'Democratic-Republican' for
attribute 'party' is not a member of the allowed list,1
"James,Madison,Democratic-Republican,4",The value
'Democratic-Republican' for attribute 'party' is not a member of the allowed list,1
"Monroe,James,Democratic-Republican,five","Error while converting value from
Excel type 'string' to Expressor Studio type 'double'; resultCode: 1, field: place,
value: five, message: Unable to convert value from Excel type string to expressor
Studio type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8",3
"John Q,Adams,Democratic-Republican,6",The value 'Democratic-Republican' for
attribute 'party' is not a member of the allowed list,1
"Jackson,Andrew,Democratic,qwerty","Error while converting value from Excel type
'string' to Expressor Studio type 'double'; resultCode: 1, field: place, value: qwerty,
message: Unable to convert value from Excel type string to expressor Studio type
double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8",3
"
William H,Harrison,Whig,9",The value 'Whig' for attribute 'party' is not a member of the
allowed list,1
"John,Tyler,Whig,10",The value 'Whig' for attribute 'party' is not a member of the
allowed list,1
"Zachary,Taylor,Whig,12",The value 'Whig' for attribute 'party' is not a member of the
allowed list,1
"Millard,Fillmore,Whig,13",The value 'Whig' for attribute 'party' is not a member of the
allowed list,1 }

Once the records have been aggregated in the three tables, the result function takes each record arrayed in the "rows" table, selects the first element in the numerically indexed array, and parses it into the "line" variable with the csv.linetoCSV function. Then the "rows" record's third [3] field is evaluated to determine if RejectType 1 or 3. In both cases, the ipairs function is then used to iterate through the relevant header array ("header1" or "header3"). The header arrays are numerically indexed, so the ipairs count variable, "k," records the current position in header array. The value at the location in header file is placed in the output array as the key label for the value at the corresponding place in the line array (output). In this example, "header1" is arrayed as {1=fname,2=lname,3=party,4=place}, so output=line becomes output[fname]=line[1], output[lname]=line[2], output[party]=line[3], and output[place]=line[4].

The datascript code for the result function is:

_______________________________________________________________________________

function result(input, count)

cnt = 0

return function()    

output = {}

if cnt < #rows then

cnt = cnt + 1      

success,line = expressor.csv.lineToCSV(rows[cnt][1])

if rows[cnt][3]==1 then

for k,v in ipairs(header1) do          

output = line         

end  --end for       

output.Error_Message = rows[cnt][2]       

elseif rows[cnt][3]==3 then

for k,v in ipairs(header3) do

output = line         

end  --end for       

output.Error_Message = rows[cnt][2]       

else       

end     

return output

else

return nil

end  -- end if  

   end  -- end return function

end  --end result function

_________________________________________________________________________________

Notice that after the output array's keys and values are assembled, the field Error_Messages is added to the output. The content of that field comes from the second field in the rows table, which is the RejectReason. At that point, the output array looks like the following, assuming the first rows record had just been read:

output { fname="George",lname="Washington",party="",place=1,Error_Message="The value '' for attribute 'party' is not a member of the allowed list" }

The output array is a dictionary or associative array, so there is no guarantee as to the order in which the keys will be stored. But it is guaranteed that output parameters of the Function Rule will go to the matching attributes in the Aggregate 1 operator's output.

It is important to understand how the return function within the result function works. Note that the syntax is "result function," not "function result." The return function has iteration built in, and that is why Rule_1 must have Iteration turned on (see Use an Iterative Function in the Writing Rules in Transformation Operators topic in the product documentation). The return function iterates until it returns "nil," which is the signal used to end an iterative function. In each iteration before "nil," the return function sends the current record in the output array to the Aggregate operator's output attributes, and from there, each record is propagated downstream.

Note: The looping structures for and while cannot be used in the return function because the return function must control the iteration. Using a for or while structure would cause an infinite loop.

When the result function has finished processing all records in the rows table, the rejected records will be uniformly organized, regardless of RejectType. In the example, the attributes from the Aggregate 1 operator are propagated to a Write File operator so the results can be viewed. In most cases, further processing would be performed after the Aggregate operator, but in this case, we simply want to see the results of the processing in the Aggregate 1 operator. The Write File 2 operator uses a Schema that maps to the attributes propagated from Aggregate 1:

627d1336165551-bppresidentsrejectschema.png

The Error_Message attribute is mapped to the RejectReason field in the Delimited Schema, and the fields are in the same order they were in when read by the Schema used by the Read Excel operator. That order is reflected in the output file to which the Write File 2 sends the data:

_________________________________________________________________________________________________

place,lname,fname,party,RejectReason
1,Washington,George,,The value '' for attribute 'party' is not a member of the
allowed list
two,Adams,John,Federalist,"Error while converting value from Excel type 'string'
to Expressor Studio type 'double'; resultCode: 1, field: place, value: two, message:
Unable to convert value from Excel type string to expressor Studio type double: value
does not correspond to pattern: 123 or -1.0 or -45.67e-8"
3,Jefferson,Thomas,Democratic-Republican,The value 'Democratic-Republican' for
attribute party' is not a member of the allowed list
4,Madison,James,Democratic-Republican,The value 'Democratic-Republican' for
attribute 'party' is not a member of the allowed list
five,Monroe,James,Democratic-Republican,"Error while converting value from Excel
type 'string' to Expressor Studio type 'double'; resultCode: 1, field: place, value:
five, message: Unable to convert value from Excel type string to expressor Studio
type double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8"
6,Adams,John Q,Democratic-Republican,The value 'Democratic-Republican' for attribute
'party' is not a member of the allowed list
qwerty,Jackson,Andrew,Democratic,"Error while converting value from Excel type
'string' to Expressor Studio type 'double'; resultCode: 1, field: place, value: qwerty,
message: Unable to convert value from Excel type string to expressor Studio type
double: value does not correspond to pattern: 123 or -1.0 or -45.67e-8"
9,Harrison,William H,Whig,The value 'Whig' for attribute 'party' is not a member
of the allowed list
10,Tyler,John,Whig,The value 'Whig' for attribute 'party' is not a member of the
allowed list
12,Taylor,Zachary,Whig,The value 'Whig' for attribute 'party' is not a member of
the allowed list
13,Fillmore,Millard,Whig,The value 'Whig' for attribute 'party' is not a member of
the allowed list

__________________________________________________________________________________________________


Version history
Revision #:
1 of 1
Last update:
‎2012-08-10 09:09 AM
Updated by:
jsw