Originally published on 09-22-2011 02:55 PM
The Pivot Row operator converts a single record into multiple records, each containing selected fields from the original record. In other words, it performs a one-to-many transformation. Using this operator requires no coding; all configuration is performed through a graphical interface.
Let's suppose you have a file with monthly corporate sales data. The first line in your file is a header and the following lines are sales data for selected companies, as illustrated in the following fragment.
You want to pivot this data so that each row of an output file contains only the data for a single month, in a single year, for a specific company.
This is the type of operation the Pivot Row operator performs.
Let's first implement this basic example. The dataflow includes only three operators: Read File, Pivot Row, Write File.
Each incoming record (a row from the source file) contains 14 fields, while each outgoing record contains four. Although the Read File operator is configured to drop the header row, the default schema for the input file defines, for each field, an attribute whose name matches that field's header. As with all operators in the Transformers grouping, you work with these attribute names within the operator.
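To make the one-to-many transformation concrete, here is a minimal Python sketch of what the pivot does to a single record. It is not how the operator is implemented, and the attribute names (Company, Year, Jan through Dec, Month, Sales) and the sample values are assumptions made for illustration; your schema will use the names taken from your own header row.

```python
# Hypothetical attribute names; the real ones come from the file's header row.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pivot_row(record):
    """Turn one 14-field record into twelve 4-field records."""
    return [
        {
            "Company": record["Company"],  # carried through unchanged
            "Year": record["Year"],        # carried through unchanged
            "Month": month,                # label of the pivoted attribute
            "Sales": record[month],        # value of the pivoted attribute
        }
        for month in MONTHS
    ]


# Made-up sample record: 2 fixed fields plus 12 monthly values.
record = {"Company": "Acme", "Year": 2010}
record.update({month: i * 100 for i, month in enumerate(MONTHS, start=1)})

rows = pivot_row(record)
for row in rows[:3]:
    print(row)
```

Each input record yields twelve output records, one per month, with the company and year repeated on every row.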
Configuring the Pivot Row operator involves four tabs. You may find it easier to configure the upstream and downstream operators first, but that is not always possible, so this example is developed left to right, in dataflow order.
As a second example, let's use a more involved incoming record, one that includes both revenue and expense data. In this use case, you want to create two pivots: one over the monthly revenue values and one over the monthly expense values.
In this example, you will set up one pivot for the revenues and a second pivot for the expenses. Each pivot must include the same number of attributes. The following screenshots show the tabs in the Pivot Row wizard.
Note how in Tab 3 the two pivots are configured. One deals only with the data related to revenue and the other with the data related to expenses. As shown in Tab 4, each output record will include six fields: Company, Year, MonthlyRevenue, Revenue, MonthlyExpense, and Expense.
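The two-pivot configuration can be sketched in Python as follows. This is only an illustration of the logic, not the operator's implementation. The six output field names come from the example above, while the input attribute names (JanRevenue, JanExpense, and so on) and the sample values are hypothetical, as is the assumption that MonthlyRevenue and MonthlyExpense carry the label of the source attribute.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical input attribute names, one list per pivot.
REVENUE_ATTRS = [f"{m}Revenue" for m in MONTHS]
EXPENSE_ATTRS = [f"{m}Expense" for m in MONTHS]


def pivot_two(record, revenue_attrs, expense_attrs):
    """Pivot revenue and expense attributes in parallel, one output row per pair."""
    # Both pivots must list the same number of attributes,
    # because the i-th entry of each pivot lands on the same output row.
    if len(revenue_attrs) != len(expense_attrs):
        raise ValueError("each pivot must include the same number of attributes")
    return [
        {
            "Company": record["Company"],
            "Year": record["Year"],
            "MonthlyRevenue": rev,    # assumed: label of the revenue attribute
            "Revenue": record[rev],
            "MonthlyExpense": exp,    # assumed: label of the expense attribute
            "Expense": record[exp],
        }
        for rev, exp in zip(revenue_attrs, expense_attrs)
    ]


# Made-up sample record with 12 revenue and 12 expense values.
record = {"Company": "Acme", "Year": 2010}
for i, m in enumerate(MONTHS, start=1):
    record[f"{m}Revenue"] = i * 1000
    record[f"{m}Expense"] = i * 600

rows = pivot_two(record, REVENUE_ATTRS, EXPENSE_ATTRS)
print(rows[0])
```

Pairing the two attribute lists position by position is why both pivots need the same length: a shorter list would leave some output rows with no value for one of the pivots.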
The output file includes the following content.