Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Split fields from fields in a text line

Hi,

I have the following Problem.

Normally, the field names an data in a file look like that:

ID:          1

Date:      20.04.2018

Item1:     Content1

Item2:     Content2

ID:          2

Date:      21.04.2018

Item1:     Content1

Item2:     Content2

...

But in many cases it looks like that:

ID:          1

Date:      20.04.2018

Item1:     Content1     Item2:     Content2

ID:          2

Date:      21.04.2018

Item1:     Content3     Item2:     Content4

The result should be:

IDDateItem1Item2
120.04.2018Content1Content2
221.04.2018Content3Content4

Does anyone has an idea, how to solve the problem?

Best regards

Michael

16 Replies
sunny_talwar

Did you look at the response here?

Re: Data Transformation

Anonymous
Not applicable
Author

Yes, I've looked at it and the problem is solved. But this is an additional Problem from a different file with that slightly different structure of data.

sunny_talwar

I don't see much difference from the previous thread.... is this data coming from Excel file?

Anonymous
Not applicable
Author

No, its an exctract from a PDF. The difference is, that you have some of the fieldnames and the content of that fields are comming from one text line, whereas in the the former file each fieldname/content has its separate line.

sunny_talwar

Oh, so you want to know how to split them in two columns... try SubField for that

Table:

LOAD *,

PurgeChar(Field, ':') as NewField,

If(Field = 'ID', Data, Peek('RowNum')) as RowNum;

LOAD Trim(SubField(F1, ':', 1)) as Field,

Trim(SubField(F1, ':', 2)) as Data;

LOAD * INLINE [

F1

    ID:          1

    Date:      20.04.2018

    Item1:    Content1

    Item2:    Content2

    ID:          2

    Date:      21.04.2018

    Item1:    Content1

    Item2:    Content2

];


FinalTable:

LOAD Distinct RowNum

Resident Table;


FOR i = 1 to FieldValueCount('NewField')


LET vField = FieldValue('NewField', $(i));


Left Join (FinalTable)

LOAD RowNum,

Data as [$(vField)]

Resident Table

Where NewField = '$(vField)';


NEXT


DROP Table Table;

Capture.PNG

Anonymous
Not applicable
Author

Thank you, but the structure of the input data is a little bit different.

It's not:

    ID:          1

    Date:      20.04.2018

    Item1:    Content1

    Item2:    Content2

    ID:          2

    Date:      21.04.2018

    Item1:    Content1

    Item2:    Content2

but

    ID:          1

    Date:      20.04.2018

    Item1:    Content1 Item2:    Content2

    ID:          2

    Date:      21.04.2018

    Item1:    Content1 Item2:    Content2


The information of two fields is put in one input line.

    Item1:    Content1 Item2:    Content2

So QV is interpreting the line as:

Fieldname: Item1

Content: Content1 Item2:    Content2


So the second fieldname/content is "hidden" in the content of the first field.

sunny_talwar

oh okay, I got it now... sorry for the confusion... checkin

sunny_talwar

May be like this

Table:

LOAD *,

PurgeChar(Field, ':') as NewField,

If(Field = 'ID', Data, Peek('RowNum')) as RowNum;

LOAD Trim(SubField(F1, ':', 1)) as Field,

Trim(SubField(F1, ':', 2)) as Data

Where Len(Trim(SubField(F1, ':', 2))) > 0;

LOAD If(Index(F1, 'Item'), 'Item' & SubField(F1, 'Item'), F1) as F1;

LOAD * INLINE [

    F1

    ID:          1

    Date:      20.04.2018

    Item1:    Content1 Item2:    Content2

    ID:          2

    Date:      21.04.2018

    Item1:    Content1 Item2:    Content2

];


FinalTable:

LOAD Distinct RowNum

Resident Table;


FOR i = 1 to FieldValueCount('NewField')


LET vField = FieldValue('NewField', $(i));


Left Join (FinalTable)

LOAD RowNum,

Data as [$(vField)]

Resident Table

Where NewField = '$(vField)';


NEXT


DROP Table Table;

Anonymous
Not applicable
Author

That looks great, but what to do if the fieldnames are not nearly identical (item1, item2) but e.g. "item" and "color"