Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi! Please let me know the formula to get the string pattern of each field .
What you exactly want?
Can you provide some sample data ?
It's part of the data profiling project Im working on. I want to determine the typical string pattern of each column. So that after getting the typical pattern, I can compare and check each value as valid or not dependending on the pattern defined for that column.
Ex. Product_detail_code, Modified_date, etc..
PURCHASE_DATE | PRODUCT_DETAIL_CODE |
14-Nov-07 | Others |
Dec 2010 | 10128738 |
26-Feb-08 | 11197529 |
14-Dec-07 | 11438953 |
18-Nov-05 | 11425786 |
11-Sep-13 | 11208766 |
30-Aug-13 | 11209023 |
2-Sep-13 | 11208856 |
23-Oct-13 | 11208959 |
10-Sep-13 | 11209074 |
12-Feb-13 | 11208907 |
8-Nov-13 | 11198348 |
Nov 2010 | 43005270 |
9-Sep-13 | 11170032 |
Something Like Below?
Load
RowNo() as NO,
PRODUCT_DETAIL_CODE,
PURCHASE_DATE as Original_PURCHASE_DATE,
Date(Alt(
Date#(PURCHASE_DATE,'DD-MMM-YY'),
Date#(PURCHASE_DATE,'MMM YYYY'),
Date#(PURCHASE_DATE,'YYYYMMDD'),
Date#(PURCHASE_DATE,'D-MM-YY'),
),'DD-MM-YYYY') as Modified_PURCHASE_DATE
Inline
[
PURCHASE_DATE, PRODUCT_DETAIL_CODE
14-Nov-07, Others
Dec 2010, 10128738
26-Feb-08, 11197529
14-Dec-07, 11438953
18-Nov-05, 11425786
11-Sep-13, 11208766
30-Aug-13, 11209023
2-Sep-13, 11208856
23-Oct-13, 11208959
10-Sep-13, 11209074
12-Feb-13, 11208907
8-Nov-13, 11198348
Nov 2010, 43005270
9-Sep-13, 11170032
];
Thanks Manish for quick response but thats not what I want. I want to determine the typical pattern of each field
Step 1 : Determine the typical pattern of the column. What is that formula to get the pattern
Output
formulaXYZ(PURCHASE_DATE ) ==> "dd-mm-yy"
formulaXYZ(Product_detail_code) ==> "NNNNNNNN" where n = 0 to 9 , 8 digits
Step 2: Validate each values in that column follows the right pattern
Purchase_date Remarks
Dec 2010 ==> Invalid Format
14-Nov-07 ==> Valid
Product_Detail_Code
Others ==> Invalid
10128738 ==> Valid (8 digits)
Got it...
Sorry but I dont have idea.. May be someone can suggest...
Hi,
Hope it helps you,
LOAD *,if(IsNum(PRODUCT_DETAIL_CODE),'Valid','InValid') as ProdRemark,
If(Alt(Date#(PURCHASE_DATE,'DD-MMM-YY'),0)=0,'InValid','Valid') as PurDateRemark;
LOAD * INLINE [
PURCHASE_DATE, PRODUCT_DETAIL_CODE
14-Nov-07, Others
Dec 2010, 10128738
26-Feb-08, 11197529
14-Dec-07, 11438953
18-Nov-05, 11425786
11-Sep-13, 11208766
30-Aug-13, 11209023
2-Sep-13, 11208856
23-Oct-13, 11208959
10-Sep-13, 11209074
12-Feb-13, 11208907
8-Nov-13, 11198348
Nov 2010, 43005270
9-Sep-13, 11170032
];
If you want to check for 8 digits, use this calc if(IsNum(PRODUCT_DETAIL_CODE) and Len(PRODUCT_DETAIL_CODE)<=8,'Valid','InValid') as ProdRemark
Thanks,
Jagan
Thanks Jagan. Any suggestions how to get the typical pattern each column?
Rhodora - What do you mean by typical pattern each column ?
Hi Rhodora,
you are asking for a very complex function that most likely does not exist "off the shelve" in QlikView.
If you realy intend to implement such a thing (and you would surely be welcome to post the result) then using regular expressions might be the most promising approach.
Here are some links on using RegEx in QlikView:
hope this helps
regards
Marco