Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Decomposing a Field

I have a field called Lot_Code that I need to decompose into individual fields so that users can more easily search/select. I'd like a little help if determining the best way to this.

The data typically looks like this:    001JAN211701 or 001DEC011602 or 002FEB201701x

I'd like to break that into five separate fields.

1. Location:  First three digits are a production location code.

2. Month: alpha characters in position 4-6 are production month.

3. Date: Two digits in position 7 & 8 are the production date.

4. Year: Two digits in position 9 & 10 are production year.

5. Line: Any digits or characters starting at position 11 to the end of the field are the production line.

To complicate this, the data is really dirty. I only want to perform this process on rows that contain a legitimate Lot_Code in the format above. Bad data, those that don't follow this format, are to be ignored since decomposing that data would either produce errors or just wouldn't result in data that makes sense.

4 Replies
sunny_talwar

May be something along these lines

Table:

LOAD *,

  Date(MakeDate(Year, Num(Month), Day)) as Date;

LOAD *,

  Left(Field, 3) as Location,

  Month(Date#(Mid(Field, 4, 3), 'MMM')) as Month,

  Mid(Field, 7, 2) as Day,

  Year(Date#(Mid(Field, 9, 2), 'YY')) as Year,

  Mid(Field, 11) as Line;

LOAD * Inline [

Field

001JAN211701

001DEC011602

002FEB201701x

];

When you say bad data, by the means of example, how would you differentiate between bad and good data?

Anonymous
Not applicable
Author

That's pretty close. I don't think the Month and Year fields need to be actual month or years, so this is fine:

Left(Lot_Code, 3)as Location,
Mid(Lot_Code, 4, 3)as Month,
Mid(Lot_Code, 7, 2)as Day,
Mid(Lot_Code, 9, 2)as Year,
Mid(Lot_Code, 11)as Line


The bad data I was referring to are rows where Lot_Code doesn't follow the standard format. It might be null, two characters, four characters, a mix of text and numbers with dashes, etc etc. Basically, anything could be in the field but I'm only interested in the values that follow the proper format. So it seems to me I need some sort of IF test (yes, even though that will affect performance) so that the new fields only have legitimate values.

johnw
Champion III
Champion III

Maybe load your legitimate location codes, dates, and lines into temporary tables before the main load, then something like this:

if(exists([Legitimate Location],left(Lot_Code,3)),left(Lot_Code,3),'Bad') as Location,
if(exists([Legitimate Date],date#(mid(Lot_Code,4,7),'MMMDDYY')),mid(Lot_Code,4,3),'Bad') as Month,
if(exists([Legitimate Date],date#(mid(Lot_Code,4,7),'MMMDDYY')),mid(Lot_Code,7,2),'Bad') as Month,
if(exists([Legitimate Date],date#(mid(Lot_Code,4,7),'MMMDDYY')),mid(Lot_Code,9,2),'Bad') as Year,
if(exists([Legitimate Line],mid(Lot_Code,11)),mid(Lot_Code,11),'Bad') as Line

Then if you don't need them, you can drop the Legitimate tables.

ahaahaaha
Partner - Master
Partner - Master

Hi Brian,

As variant. You have five control groups.


1) 
Left(Field, 3) as Location,

2)  Month(Date#(Mid(Field, 4, 3), 'MMM')) as Month,

3)  Mid(Field, 7, 2) as Day,

4)  Year(Date#(Mid(Field, 9, 2), 'YY')) as Year,

5)  Mid(Field, 11) as Line


By the integer division of the 1st group into groups of groups 3 and 4, using functions Mod(), we verify that these are numbers. Next, we check the months and the length of the whole value to exclude the wrong format. As a result, the above code may look like this.


LOAD *,Table:

  Date(MakeDate(Year, Num(Month), Day)) as Date;

LOAD *,

  Left(Field, 3) as Location,

  Month(Date#(Mid(Field, 4, 3), 'MMM')) as Month,

  Mid(Field, 7, 2) as Day,

  Year(Date#(Mid(Field, 9, 2), 'YY')) as Year,

  Mid(Field, 11) as Line

Where Len(Field)=12          //field length is 12 characters - indirect control of the 5th group

  And Mod(Left(Field, 3), Num(Mid(Field, 7, 2)&Year(Date#(Mid(Field, 9, 2), 'YY'))))>0 //1, 3, 4 group

                                                                                                                            //is number

  And Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'jan'       //months content control (2th qroup)

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'feb'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'mar'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'apr'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'may' 

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'jun'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'jul'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'aug'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'sep'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'oct'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'nov'

  Or Lower(Month(Date#(Mid(Field, 4, 3), 'MMM'))) = 'dec';

LOAD * Inline [

Field

001JAN211701

001DEC011602

002FEB201701x

00 JAN211701

];

Regards,

Andrey