The Pivot Column operator converts multiple records into a single record that contains selected fields from each of the multiple records. That is, this operator performs a many-to-one transformation. Using this operator requires no coding; all configuration is performed through a graphical interface.
Let's suppose you have a file with monthly corporate sales data. The first line is a header and the following lines are the sales data by month, by year, for selected companies.
You want to pivot this data so that each row of an output file contains all of the data for a single year for a specific company.
This is the type of operation the Pivot Column operator performs.
Let's first implement this basic example. The dataflow includes only three operators: Read File, Pivot Column, Write File.
Each incoming record, that is, each row from the source file, contains four fields, while each outgoing record will contain 14 fields: Company, Year, and one field for each of the 12 months. Although the Read File operator is configured to drop the header row, the default schema for the input file defines an attribute whose name is the same as the field header. As with all operators in the Transformers grouping, you will work with the attribute names within the operator.
To configure the Pivot Column operator, you work through the four tabs of the Pivot Column wizard.
- Place the Read File operator onto the dataflow and set its properties.
- Place the Pivot Column operator onto the dataflow and connect to the Read File operator.
- Select the Pivot Column operator and click the Edit Pivot link in its Properties panel or the Edit Pivot button in the Operators grouping of the ribbon bar. This opens the Pivot Column wizard.
- Tab 1, Build Output, is where you describe the structure of the record emitted by the operator. Notice that the Input Attributes panel lists all of the attributes in the incoming record.
- In this example, the emitted record will include the attributes Company, Year, and 12 attributes, each representing a single month's sales. To add Company and Year to the listing of output attributes, select these attributes in the Input Attributes listing and click the Add button that is between the Input Attributes and Output Attributes panels. The attribute names are transferred to the Output Attributes panel.
- Now you need to manually add an attribute for each month's revenue. Click the Add button that is above the Output Attributes panel to open the Add Attributes window. Create the required 12 attributes. In this example, all attribute types are strings, but in an actual use case you will probably have changed the input attributes that represent revenue to decimal types.
- Note the right-facing arrow before Company and Year and the diamond before the months. These icons indicate that Company and Year will be directly initialized with values from the incoming record and that the months will be initialized with values selected by the pivot operation.
- Click Next or click on Tab 2 to move to the next tab, Specify Transfers. Notice the arrow between the input attributes Company and Year and the identically named output attributes. The operator makes this assignment automatically because the attribute names are identical. However, you could make an alternative assignment by clicking on the small triangle and selecting a different input attribute from the drop-down list.
- Since none of the remaining input attributes are assigned directly to output attributes, move to Tab 3.
- On Tab 3, Specify Pivots, you define the pivot operation. Start by clicking the Add Pivot button, which places a pivot descriptor into the tab.
- Next, click Select..., and in the Select Attributes window, select which data from the incoming record you want emitted. In this example, you want to emit data for each month, so select all the attributes. Then click Select.
- The Output attributes listing now includes the name of the pivot attribute. In the Pivot from drop down under the Input attributes label, select RevenueMonth, the attribute in the incoming record that will identify the month.
- Then in the Value from drop down, select Revenue, the attribute in the incoming record that contains the monthly revenue value.
- Move to the final tab, Edit Output, which presents a graphical representation of the pivot operation. Use this information to confirm that the output is what you want.
- If desired, you can also change the names of the values under the pivot column (RevenueMonth). Click the appropriate cell and edit the name.
- Click OK to complete the process and close the Pivot Column wizard.
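The operator itself requires no coding, but the transformation configured in the steps above can be sketched in a few lines of Python to make the many-to-one behavior concrete. This is only an illustrative sketch, not how the operator is implemented. The attribute names Company, Year, RevenueMonth, and Revenue come from the example; the month labels, the sample rows, the function name `pivot_revenue`, and the choice to leave a missing month as an empty string are all assumptions made for illustration.

```python
# Month labels are an assumption; the real file may label months differently.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pivot_revenue(rows):
    """Collapse one-row-per-month records into one record per (Company, Year)."""
    pivoted = {}
    for row in rows:
        key = (row["Company"], row["Year"])
        if key not in pivoted:
            # Company and Year are transferred directly (Tab 2, Specify Transfers).
            record = {"Company": row["Company"], "Year": row["Year"]}
            # One output attribute per month; empty until a value arrives (assumption).
            record.update({month: "" for month in MONTHS})
            pivoted[key] = record
        # "Pivot from" RevenueMonth picks the column, "Value from" Revenue fills it.
        pivoted[key][row["RevenueMonth"]] = row["Revenue"]
    return list(pivoted.values())


# Hypothetical sample rows; all values are strings, as in the example.
sample_rows = [
    {"Company": "Acme", "Year": "2010", "RevenueMonth": "Jan", "Revenue": "100"},
    {"Company": "Acme", "Year": "2010", "RevenueMonth": "Feb", "Revenue": "120"},
    {"Company": "Zenith", "Year": "2010", "RevenueMonth": "Jan", "Revenue": "90"},
]
single_pivot_result = pivot_revenue(sample_rows)
```

With these sample rows, the three incoming records collapse into two outgoing records, one per company and year, each carrying 14 fields.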
As a second example, let's use a more involved incoming record: one that includes both revenue and expense data. In this use case, you want to create multiple pivots, one for the monthly revenues and one for the monthly expenses, and then combine the results into a single record that includes all revenue and expense data for a single year of a specific company.
In this example, you will set up a pivot for the revenues and a second pivot for the expenses. Each pivot must include the same number of attributes. The following screen shots show the tabs in the Pivot Column wizard.
Note how in Tab 3 the two pivots are configured. One deals only with the data related to revenue and the other with the data related to expenses. As shown in Tab 4, each output record will include 26 fields: Company, Year, and 12 fields for monthly revenue and 12 fields for monthly expenses.
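As a rough illustration of what two pivots over the same input produce, the following Python sketch builds a 26-field record per company and year. It is a sketch under stated assumptions rather than a description of the operator's internals: the input attribute names Month and Expense, the output prefixes `Rev` and `Exp`, the function name, and the sample rows are hypothetical, since the actual names appear only in the screen shots.

```python
# Month labels are an assumption; the real file may label months differently.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pivot_revenue_and_expense(rows):
    """Apply two pivots (revenue and expense) keyed on the same month attribute."""
    pivoted = {}
    for row in rows:
        key = (row["Company"], row["Year"])
        if key not in pivoted:
            record = {"Company": row["Company"], "Year": row["Year"]}
            # Each pivot contributes the same number of attributes: 12.
            record.update({"Rev" + month: "" for month in MONTHS})
            record.update({"Exp" + month: "" for month in MONTHS})
            pivoted[key] = record
        # First pivot: revenue values; second pivot: expense values.
        pivoted[key]["Rev" + row["Month"]] = row["Revenue"]
        pivoted[key]["Exp" + row["Month"]] = row["Expense"]
    return list(pivoted.values())


# Hypothetical sample rows with assumed attribute names.
two_pivot_rows = [
    {"Company": "Acme", "Year": "2010", "Month": "Jan", "Revenue": "100", "Expense": "60"},
    {"Company": "Acme", "Year": "2010", "Month": "Feb", "Revenue": "120", "Expense": "75"},
]
two_pivot_result = pivot_revenue_and_expense(two_pivot_rows)
```

Here both sample rows fold into a single record with 26 fields: Company, Year, 12 revenue fields, and 12 expense fields.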
The output file includes the following content.