Originally published on 09-22-2011 02:55 PM
New in expressor 3.4 are lookup tables, which are implemented by a light-weight relational database management system integrated into the expressor Data Integration Engine. The database stores the lookup table content in a file and depending on the file system location of this file, the scope of the lookup table can be limited to a single project (or library) or to all projects (and libraries) in a workspace, or to all projects (and libraries) deployed onto the computer. In fact, if the file system location of the lookup file is mapped to a drive, the lookup table's contents can be accessed by expressor data integration applications running on networked computers. In developing the examples described in this tutorial, the lookup table files will be stored in a directory (C:\lookups) accessible to all expressor applications.
A lookup table first needs to be initialized with data using the Write Lookup Table operator. The data used to initialize a lookup table can come directly from a resource, such as a file or database table, or can be data produced by a data integration application. In keeping with the expressor composite type usage paradigm, you can either manually define the columns in a lookup table or or define the table by referencing an existing composite type.
The first step is to create the lookup table. Lookup tables are expressor artifacts similar to schema or connection artifacts. Under every project or library directory, look for the subdirectory Lookup Tables. This is where lookup table artifacts are stored. To create a new lookup table artifact,
- Start the lookup table wizard:
- In the Home tab of the ribbon bar, click New Lookup Table
- In the Explorer panel, right-click on a project or library directory and select New Lookup Table... from the popup menu
- In the Explorer panel, right-click on the Lookup Tables subdirectory and select New... from the popup menu
- In the New Lookup Table window, give the lookup table a distinctive and meaningful name (for example, CityStation), then click Create. The wizard opens.
- In this example, the lookup table will store the call signs for television stations in various cities. Therefore, the lookup table requires two columns that are identified by the attributes city, call_sign.
- To manually add an attribute to the table's description, click Add in the Type Attributes grouping in the Lookup Table > Edit tab of the ribbon bar. This opens the Add Attribute window, where you specify the name, type, and constraints on an attribute.
- Alternatively, if you have, or would like to define, a composite type that describes the table's structure, you can simply assign the type to the table.
- Once the table's columns have been defined, you need to associate one or more columns with a named key, which will be used to select records from the lookup table.
- Click Add in the Keys grouping in the Lookup Table > Edit tab of the ribbon bar and in the Add Key window, enter a name and indicate whether the key values will be unique or not. A lookup table may have only one unique key.
- Click OK and your key is listed.
- You now associate one or more columns with the key. In this example, only the call_sign column contributes to the key.
- Create a second, non-unique, key named City that is based on the attribute city.
- The final step is to specify the file system location where the file containing the lookup table's data will be written.
- At the bottom of the wizard's panel, click Assign and select either an existing File Connection or define a new connection.
- In this example, select a connection that points to the directory C:\lookups.
- If you do not assign a connection, the lookup table file will be stored in a location within your project and its scope will be limited to a single dataflow.
- Save and close the wizard.
Now that the lookup table has been defined, it will appear under the Lookup Tables subdirectory and you are ready to populate the table with data. This example populates the lookup table with data extracted from a database table that contains a collection of cities and their television station call signs.
mysql> select * from city_station;
| city | call_sign |
| Atlanta | WAGA
| Atlanta | WATL
| Atlanta | WPBA
| Atlanta | WXIA
| Boston | WBZ
| Boston | WCVB
| Boston | WGBH
| Boston | WHDH
| Boston | WLVI
| Dallas | KDAF
| Dallas | KERA
| Dallas | KTVT
| Dallas | KXAS
| Dallas | KXTX
| Dallas | WFAA
| Knoxville | WATE
| Missoula | KECI
| New York | WABC
| New York | WCBS
| New York | WNBC
| New York | WNYW
| New York | WNET
| New York | WPIX
| New York | WWOR
| Philadelphia | KYW
| Philadelphia | WCAU
| Philadelphia | WHYY
| Philadelphia | WPHL
| Philadelphia | WPVI
| Philadelphia | WTXF
| Saginaw | WNEM
31 rows in set (0.08 sec)
A simple dataflow - Read Table, Write Lookup Table - can be used to transfer the data from the database to the lookup table. Since all the details of the structure and file system location of the lookup table are embedded in the Lookup Table artifact, you only need to set the Lookup Table property of the Write Lookup Table operator and indicate whether you want to truncate, or append to, the contents of the table when the dataflow runs.
Now that the lookup table is loaded with data, how do you use it?
Applications access the data in a lookup table through use of a lookup rule, which can only be used within a Transform Operator. In this example, the lookup table has both unique and non-unique keys, so you can find the city in which a specific television station is located or all the televisions stations in a given city. When you use the unique key, zero or one record may be returned from the lookup table; when you use a non-unique key, zero or more records may be returned from the lookup table.
Let's create a small application that reads a list of station call signs and extracts from the lookup table the city in which the station is located. The dataflow would include a Read File operator, a Transform operator in which a lookup rule is used to access the lookup table, and a Write File operator to output the records.
- Place the Read File operator onto the dataflow and configure.
- Add a Transform operator and connect to the Read File operator.
- Open the Transform operator's Rules Editor. Note that the input panel shows the attribute representing the call sign and that an identically named attribute in included in the output panel. The right-facing arrow next to the output attribute indicates that the call sign will be automatically transferred from the input to the output.
- On the Home tab in the ribbon bar, click New Rule and select Lookup Expression Rule from the drop down menu.
- From the Lookup drop down control, select the desired lookup table.
- Once a table selection has been made, the Key drop down control populates with the table's keys. Select the unique CallSign key.
- As soon as you select the key, the input and output parameter panels are populated. As configured in the above figure, for each incoming call sign, one city will be extracted from the lookup table and the rule output parameter city will be initialized with this value.
- To complete set up of this rule, you will first need to add an attribute to the output panel for the city.
- Now, using the mouse, drag a connection from the input attribute call_sign to the input rule paramter call_sign and from the output rule parameter city to the output attribute city. There is no need to map the rule output parameter call_sign to the output attribute call_sign; this output attribute will be automatically initialized from the input attribute through expressor's attribute propagation functionality.
- Close the rules editor.
- Finally, add a Write File operator to the dataflow and connect to the Transform operator. Use the New Delimited Schema from Upstream Output... option to create the schema for this operator.
Now, let's run some data through the application.
- The input file includes a listing of station call signs, each on a separate line.
- The output file contains a single row for each incoming call sign, which includes the city and call sign.
If the input file contains the call signs
The output file will contain
But what happens if the input file includes a call sign, perhaps ZZZZ, for which there isn't a table entry? The lookup rule gives you control over the outcome.
- Open the Transform operator's Rule Editor and focus on the upper right-hand corner of the Lookup Expression Rule.
- The rule output paramter city could be initialized with nil (the default action).
- An error could be raised and passed to the Transform operator for a response
- The Transform operator could respond by aborting the dataflow, skipping this one record, rejecting this one record, skipping this record and all following records, or rejecting this one record and all following records.
- The rule could generate a record in which the rule output parameter city is initialized with whatever value you choose.
So much for using a unique key with the lookup. What happens when a non-unique key is used? In this case, more than one record could be returned from the lookup table, for example, New York has seven television stations. The lookup rule can handle this situation as well.
The circled symbol in the upper right-hand corner of the rule indicates that multiple matches may be found in the lookup table. In this situation, the Transform operator will emit multiple records, for example, seven records would be emitted for New York.
If you ever need to read the entire contents of a lookup table, use the Read Lookup Table operator. Just like the Write Lookup Table operator, the only information you need to configure this operator is the lookup table you want to read.
For those situations in which you need absolute control over the contents of the lookup table, you may use a Lookup Function Rule. When you use a function rule, you may write code that can directly manipulate the contents of the table, performing reads, updates, and deletes as well as inserts. Use of Lookup Function Rules and the lookup application programming interface will be discussed in another tutorial.