QlikView Expressor: Managing Dataflow Parameters

    Originally published on 10-11-2011 12:20 PM
    Modified on 7-23-2012 8:09 PM, 8-9-2012 4:36 PM

     

    Parameters are the settings and values you specify in the Properties panel when you are configuring an operator within expressor Studio. Parameters include settings explicitly entered into the Properties panel as well as settings specified within other artifacts, for example, the path to a file system location encapsulated within a File Connection, the connection details in a Database Connection, and the database, database schema and table name in a Table Schema artifact.  When you develop an application, you most likely specify parameters unique to your development environment, but once you deploy your project you may need to alter some of these settings, for example, shift to a production database or use a different file system location.  In Expressor 3.4.1  three approaches to managing parameters when a dataflow runs, collectively referred to as a Substitution Approach, were introduced.

    1. Use the Write Parameters operator to alter one or more parameters during execution of a dataflow.
    2. Use command line arguments to the etask utility to alter one or more parameters during execution of a dataflow.
    3. Use the eflowsubst command line utility to prepare a substitution file that may be used to alter any parameter within a dataflow when run using the etask command line argument.

    In Expressor 3.8 a second approach, the Configuration Parameters Approach as well as additional parameter replacement command line arguments were introduced.  While either or both approaches may be used within a dataflow, the Configuration Parameters Approach is easier to use and provides more options than the Substitution Approach.

    Since there are two approaches to managing dataflow parameters, this document will employ the term 'substitution parameter' to signify a parameter set through the Substitution Approach and 'configuration parameter' to reference a parameter set through the Configuration Parameters Approach.

    There is a subtle but significant difference between substitution parameters and configuration parameters.  Substitution parameters are specific to a single property in a single operator.  The file attached to this document lists the operator properties that may be set through a substitution parameter.  Configuration parameters specify a value that may be referenced in multiple operators, file connections, database connections, and database table schemas.

    Identifying Parameters

    Since a dataflow may have multiple steps and multiple instances of the same operator, in order to uniquely identify which substitution parameter you want to set, you use a compound name comprised on the step name, operator name, and parameter name.

    "step_name@operator_name@parameter_name"

    The step_name is the name of a step within a dataflow, for example Step_1 while operator_name is the value of the Name property of an operator.  Then select the parameter_name from the table referenced in this article.  Although the quotation marks are not a required part of the syntax, it's a good practice to include them as they allow you to have space characters in the step and operator names.

    Configuration parameters are not specific to a single operator.  They are simply identified by name and this name may be referenced in any parameterizable property.  It is up to the developer to ensure that the value assigned to a configuration parameter is appropriate to its usage.

    Write Parameters Operator Approach

    To use this Substitution Approach, your dataflow must include a step that uses the Write Parameters operator to create a listing of parameters and their values.  These parameter settings will exist only during execution of this dataflow.  The schema corresponding to the structure of the parameter listing has only two attributes: name and value.  You use a Read File, Read Table, or SQL Query operator to provide content to the Write Parameters operator, although any other  operators can be positioned upstream of the Write Parameters operator if they are needed to properly isolate and format the parameter data.

    For example, consider a CSV file containing the following content, which identifies various parameters and the values to be assigned during the execution of the dataflow.

    "Step_2@Write Output File@fileName",presidents_out.txt
    "Step_2@Write Output File@quotes",noquotes
    "Step_2@Write Output File@includeHeader",false
    "Step_2@Write Output File@connection.FileConnection.Path",C:\data

    Four parameters specific to a Write File operator named Write Output File, which is on a step named Step_2, are being specified.  To read this file, you would configure a Read File operator using a schema that includes the two comma delimited string attributes name and value, and connect the Read File operator directly to the Write Parameters operator.  This two operator flow must be included as a step in the dataflow (for example, Step_1) that executes before the step named Step_2.  When the dataflow runs, these property values will replace the values specified during the design of the dataflow.  Note that the names of the dataflow step do not matter; what does matter is the order in which the steps are executed.

    Substitution File Approach

    To use this Substitution Approach, you first use the eflowsubst command line utility to create an XML file that includes the parameters for all operators in a dataflow then you edit this file changing whichever parameters are necessary.

    1. If you are using the Expressor Standard or Enterprise Edition, use the eproject command line utility to check out your project to the computer on which the expressor Data Integration Engine is installed.
    2. If you are using the Expressor Desktop Edition, find the directory containing your workspace and the dpp subdirectory under your project directory, which is under the Metadata subdirectory.
    3. Use the Start > All Programs > expressor > expressor3 > expressor command prompt menu item to open a command window.
    4. With the Standard or Enterprise Edition, move to the directory that contains the three subdirectories of the checked out project (external, modules, Project_name...Version_number...Dataflow_name). With the Desktop Edition, move to the directory that has the same name as your deployment package (this directory will contain the project's three subdirectories).
    5. Issue the eflowsubst command:
      eflowsubst -x Dataflow_name -p Project_name -V version_number
      .
      • This will create a substitution file named Project_name...Version_number...Datflow_name.rpx-sub.xml in the directory Project_name...Version_number...Dataflow_name.

    6. Using a text editor (Wordpad or Notepad++) or an XML editor, open the file Project_name...Version_number...Datflow_name.rpx-sub.xml, find the entry for the parameter(s) you want to set, and enter a new value within the <value></value> tag(s).
    7. To run the dataflow, use the etask command line utility adding the -S command line flag.  Since the substitution file is in the Project_name...Version_number...Dataflow_name directory, you only need to follow this flag with the name of the substitution file.

    The easiest way to find a parameter entry is to search the XML substitution file using the name of the parameter.  For example, to change the name of the file written by an operator named Write Output File, on Step_1 of a dataflow, search for Step_1@Write Output File@fileName.  The information within the <ParameterizedProperty></ParameterizedProperty> tags tells you everything you need to know in order to override the parameter including the current value and a listing of acceptable values if appropriate.

    If desired, you may rename the substitution file, which is appropriate if you need to create multiple substitution files to provide for the ability to run a dataflow using different parameter settings.  The substitution file must, however, remain in the directory Project_name...Version_number...Dataflow_name

    Configuration Parameter Approach

    This is the newer and preferred approach to managing parameters.  Parameter values are stored in a configuration artifact that you create from within Studio.  Each entry consists of a name, a parameter type (e.g., string, integer, encrypted string), and a value.  The process of creating a configuration artifact is described in the documentation topic Configurations and will not be repeated here.  When configuring an operator, file connection artifact, database connection artifact, or database table schema artifact, you may specify a parameter name rather than a literal value for many properties.  When the dataflow runs, the parameter names are resolved to the values included in the configuration artifact.

    Unlike the Substitution File Approach, the Configuration Parameter Approach may be employed when running a dataflow from within Studio or with the etask command line utility.  When you run from within Studio, you will be required to select the configuration artifact that will be used to resolve the parameter names.  When you execute etask, you will use the --configuration command line argument to specify the name of the configuration artifact.  Notice that you use a fully qualified name that references the project and project version for the configuration file artifact.

    etask -x Dataflow1 -p Project1 -V 0
      --configuration Project1.0.ConfigurationFileName.xml

    Command Line Arguments Approach

    To use this approach with substitution parameters, you use the -P flag to the etask command line utility to pass the parameters and their values to the dataflow.  If you need to pass multiple parameters to the dataflow, use a space character as the delimiter between entries.

    etask -x Dataflow1 -p Project1 -V 0
      -P "Step_1@Write File@fileName"=output_data.txt
         "Step_1@Write File@quotes"=noquotes
         "Step_1@Write File@includeHeader"=false

    To use this approach with configuration parameters, you use the --parameter flag to the etask command line utility to pass the parameters and their values to the dataflow.  If you need to pass multiple parameters to the dataflow, you must enter the flag for each parameter.

    etask -x Dataflow1 -p Project1 -V 0
      --parameter fileName=output_data.txt
      --parameter quotes=false
      --parameter header=false

    Precedence

    You are free to combine these approaches but when you do you must take precedence into consideration.  The command line arguments approach has the highest precedence and a parameter set through this approach will be preferentially used over identically named parameters set using the other approaches.  Substitution parameters set through the -P command line argument have higher precedence than configuration parameters set through the --parameter command line argument.  The Write Parameters operator approach has the next highest precedence, followed by the Substitution File Approach, the Configuration Parameter approach, and property settings entered directly into an operator's Properties panel have the lowest precedence.

    -P substitution parameters > --parameter configuration parameters > write parameters > substitution file > configuration file > property settings