A look at the Data Flow Processors Sort, Remove, and Select Fields, and how they can be used to prepare your data.
If you have not had the chance, please read Jennell Yorkman’s post on Data Flow Processors from last week, as this post is a continuation of it. In her entry, she covers what data processors are and how to access them, and takes a deeper look at the Filter, Join and Unpivot processors. Her blog can be found here.
In this blog post, we’re going to talk about three more processors: Sort, Remove and Select Fields.
The 'Sort' processor is a simple processor that lets you sort the numeric or textual data in a field in ascending or descending order. Here in our college football dataset, we have three fields: Index, Year and lower_school (short for lower case school). Our Year field is sorted in ascending order, with the oldest data first, but what if we wanted the most recent data first instead?
That is where the Sort processor comes in. We connect this dataset to a Sort processor, select the field we want to sort (in this case Year) and the direction we want it sorted (in our case Descending). With just a few clicks, our most recent data is now at the top, ready to be used.
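For readers who think in code, the same idea can be sketched in plain Python. This is only an illustration of what a descending sort does to the rows, not how the Sort processor is implemented. The sample values and the helper name `sort_rows` are made up for the example; only the field names come from the dataset above.

```python
# Made-up sample rows; only the field names come from the dataset in this post.
rows = [
    {"Index": 0, "Year": 2015, "lower_school": "school_a"},
    {"Index": 1, "Year": 2016, "lower_school": "school_b"},
    {"Index": 2, "Year": 2017, "lower_school": "school_c"},
]


def sort_rows(rows, field, descending=False):
    """Return a new list of rows ordered by `field` (hypothetical helper)."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)


# Equivalent of picking the Year field and the Descending direction.
sorted_rows = sort_rows(rows, "Year", descending=True)
print([row["Year"] for row in sorted_rows])  # [2017, 2016, 2015]
```

The original list is left untouched, much like the source dataset stays as it is while the processor produces a sorted output.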
The next processor we’ll take a look at is the 'Remove' processor. The Remove processor simply allows a user to remove unwanted fields from a dataset. Continuing in my College Football app, I have a dataset that contains three fields: Index, Year and lower_school. If I wanted to remove the Index field, I would use the Remove processor.
Here I have loaded my dataset and attached it to a Remove processor. The fields within this dataset will appear. I simply select the ‘Index’ field and hit Apply.
Just that easily, Index is removed.
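As a rough Python analogy (a sketch of the concept, not the processor's actual implementation), removing a field means dropping one key from every row. The sample values and the helper name `remove_fields` are hypothetical.

```python
# Made-up sample rows; only the field names come from the dataset in this post.
rows = [
    {"Index": 0, "Year": 2015, "lower_school": "school_a"},
    {"Index": 1, "Year": 2016, "lower_school": "school_b"},
]


def remove_fields(rows, fields_to_remove):
    """Return new rows without the listed fields (hypothetical helper)."""
    drop = set(fields_to_remove)
    return [
        {name: value for name, value in row.items() if name not in drop}
        for row in rows
    ]


# Equivalent of selecting 'Index' in the Remove processor and hitting Apply.
without_index = remove_fields(rows, ["Index"])
print(list(without_index[0].keys()))  # ['Year', 'lower_school']
```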
Now we can look at preparing this dataset the opposite way. Going back to our Index, Year and lower_school dataset, what if, instead of choosing the field to remove, we wanted to choose the fields to keep, Year and lower_school? We could simply use the ‘Select Fields’ processor.
Just like with the Remove processor, the fields within the dataset will appear once it is connected to the Select Fields processor. In this example, we’d select our Year and lower_school fields, because those are the fields we are going to keep. While this is a small-scale example, imagine a dataset with 1,000 fields when you only want data from three or four of them. Keeping only those fields would give your application a much cleaner and faster load.
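The keep-instead-of-remove idea can also be sketched in a few lines of Python. Again, this is only an illustration under made-up sample data, and `select_fields` is a hypothetical helper name, not part of the product.

```python
# Made-up sample rows; only the field names come from the dataset in this post.
rows = [
    {"Index": 0, "Year": 2015, "lower_school": "school_a"},
    {"Index": 1, "Year": 2016, "lower_school": "school_b"},
]


def select_fields(rows, fields_to_keep):
    """Return new rows containing only the listed fields (hypothetical helper)."""
    return [{name: row[name] for name in fields_to_keep} for row in rows]


# Equivalent of ticking Year and lower_school in the Select Fields processor.
selected = select_fields(rows, ["Year", "lower_school"])
print(selected[0])  # {'Year': 2015, 'lower_school': 'school_a'}
```

With only three fields, removing one and keeping two give the same result. The difference shows up on wide datasets, where listing the handful of fields to keep is far shorter than listing everything to drop.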
Thank you for taking the time to read this blog entry! How do you think these data processors can help you prepare your data? Are there any data processors you would like to see us cover next? Drop some ideas in the comments below!