Originally published on 07-21-2011 08:47 AM
The mappings within a Schema artifact have two functions:
- They associate each field in the external data resource with its corresponding attribute in the composite type that represents this data within an expressor dataflow.
- They describe any data transformations that may be required to move data into or out of the expressor dataflow. Mapping formats supply the directives that control these transformations.
When you create a Schema artifact, the schema wizard creates default mappings and formatting directives that are often suitable as they stand. In other cases you may want to alter the Schema, perhaps by selecting a different composite type or by changing the data type of an attribute, and you may then need to modify the mapping format. This document summarizes the rules that govern how you configure a mapping format.
- Mapping formats are characteristics of Schema artifacts and are therefore only relevant to those operators involved in moving data into or out of a dataflow, i.e., the input and output operators.
- A mapping format can be applied only when at least one endpoint of the mapping is a string type, that is, when the field in the external data, the attribute in the semantic type corresponding to that field, or both are strings.
- Think of a Schema artifact as having an upstream and downstream orientation.
- For the input and output operators, upstream represents the data received by the operator and downstream represents the data emitted by the operator.
- For input operators, upstream data is the external data read by the operator and downstream data are the attributes emitted by the operator. The mapping format is applied to the data as it moves left-to-right across the Schema editor.
- For output operators, upstream data are the attributes received by the operator and downstream data is the data written by the operator to an external resource. The mapping format is applied to the data as it moves right-to-left across the Schema editor.
- The format always describes how upstream data will be transformed into the downstream data.
- If one endpoint of the mapping is a non-string type, the format is a description of how the string type represents the non-string type. For example:
- With a datetime endpoint, the format describes how the characters in the string correspond to datetime components such as century, year, month, day, hour, minute, and second.
- With a decimal endpoint, the format describes how the characters in the string correspond to a number and which characters (currency, grouping) should be removed from the string when it is converted to a number, or added when the number is converted to a string.
- If both endpoints of the mapping are string types, the format describes the modifications to the upstream data that must be applied before emitting the downstream data.
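The rules above can be illustrated with a small sketch. It is written in Python rather than in expressor's own mapping format syntax, so the directive string `DATE_FORMAT` and the function names `read_field`, `write_field`, and `trim_and_upper` are hypothetical stand-ins chosen for illustration only. The point is that a single format describes the string representation of a non-string value and is applied in whichever direction the data is moving, while a string-to-string format describes a modification of the upstream value.

```python
from datetime import datetime

# Hypothetical stand-in for a mapping format. These are Python strptime
# directives, not expressor directives; only the idea carries over.
DATE_FORMAT = "%m-%d-%Y %H:%M"


def read_field(text: str, fmt: str = DATE_FORMAT) -> datetime:
    """Input operator direction: external string field -> datetime attribute."""
    return datetime.strptime(text, fmt)


def write_field(value: datetime, fmt: str = DATE_FORMAT) -> str:
    """Output operator direction: datetime attribute -> external string field."""
    return value.strftime(fmt)


def trim_and_upper(text: str) -> str:
    """String-to-string case: the format modifies the upstream value."""
    return text.strip().upper()


stamp = read_field("07-21-2011 08:47")
print(stamp.isoformat())           # 2011-07-21T08:47:00
print(write_field(stamp))          # 07-21-2011 08:47
print(trim_and_upper("  widget "))  # WIDGET
```

In both `read_field` and `write_field` the format describes the string side of the mapping; what changes is which side is upstream.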
If you are using the same Schema artifact for both input and output operators, you may find it useful to create two mappings, one for when the Schema is used by an input operator and the second for when the Schema is used by an output operator. This lets you read data that has minimal formatting, such as database table content, and write it to a file with enriched formatting such as a currency sign or numeric grouping.
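As a rough illustration of the two-mapping idea, the Python sketch below pairs a minimal "input" conversion with an enriched "output" conversion for the same decimal attribute. The function names `read_amount` and `write_amount` and the default `$` currency sign are assumptions made for this example; they are not part of expressor, whose mappings are configured in the Schema editor rather than in code.

```python
from decimal import Decimal


def read_amount(text: str) -> Decimal:
    """Input mapping (hypothetical): plain digits, no currency or grouping."""
    return Decimal(text)


def write_amount(value: Decimal, currency: str = "$") -> str:
    """Output mapping (hypothetical): add a currency sign and grouping."""
    # "," inserts thousands separators; ".2f" fixes two decimal places.
    return f"{currency}{value:,.2f}"


amount = read_amount("1234567.5")   # e.g. a value read from a database table
print(write_amount(amount))         # $1,234,567.50
```

Keeping the two conversions separate means the input side never has to strip characters that the source does not contain, and the output side alone decides how the value is presented.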