Originally published on 09-22-2011 01:55 PM
The Pivot Column operator converts multiple records into a single record that contains selected fields from each of the multiple records. That is, this operator performs a many-to-one transformation. Using this operator requires no coding; all configuration is performed through a graphical interface.
Let's suppose you have a file with monthly corporate sales data. The first line is a header and the following lines are the sales data by month, by year, for selected companies.
You want to pivot this data so that each row of an output file contains all of the data for a single year for a specific company.
This is the type of operation the Pivot Column operator performs.
Let's first implement this basic example. The dataflow includes only three operators: Read File, Pivot Column, Write File.
Each incoming record, that is, row from the source file, contains four fields while each outgoing record will contain 14 fields. Although the Read File operator is configured to drop the header row, the default schema for the input file defines an attribute whose name is the same as the field header. As with all operators in the Transformers grouping, you will work with the attribute names within the operator.
To configure the Pivot Column operator, there are four tabs that require attention.
As a second example, let's use a more involved incoming record - a record that includes both revenue and expense data. In this use case, you want to create multiple pivots, one for each month's expense and one for each month's revenue and then combine into a single record that includes all revenue and expense data for a single year of a specific company.
In this example, you will set up a pivot for the revenues and a second pivot for the expenses. Each pivot must include the same number of attributes. The following screen shots show the tabs in the Pivot Column wizard.
Note how in Tab 3 the two pivots are configured. One deals only with the data related to revenue and the other with the data related to expenses. As shown in Tab 4, each output record will include 26 fields: Company, Year, and 12 fields for monthly revenue and 12 fields for monthly expenses.
The outut file includes the following content.