Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi
I have an excel file that has a few columns that are filled with this kind of data:
Column K: "green", "red", "yellow" [the anwers are different per row, sometimes three, sometimes none at all]
Column L: "car", "bike", "motor", "boat" (follows the same as Column K.
So.
First thing I did: split the answers. Then tried to remove the double quotes.
It works for processing one column...
q1:
LOAD
Answers1,
SubField (Answers1, ', ') AS Answers1_list,
Answers2,
Answers3,
Answers4
FROM
(ooxml, embedded labels, table is Stats);
q2:
LOAD
PurgeChar(Answers1_list,'"') AS Answers1_list_cleaned
Resident q1;
But... Now I need the same trick for Answers2 (and there are a few columns with answers more to process).
What would be the best way forward? I get wrong results in Answers1_list_cleaned whenever I split the answers in Answers2
So, I hoped that this would work, but to no joy;
LOAD
[Answers1],
PurgeChar(Answer1,'"') AS Answers_question1_clean
SubField (Answers_question1_clean, ', ') AS Answers1_cleaned_list,
[Answers question2],
PurgeChar(Answer2,'"') AS Answers_question2_clean
SubField (Answers_question2_clean, ', ') AS Answers2_cleaned_list,
[Answers question3],
PurgeChar(Answer3,'"') AS Answers_question2_clean
PurgeChar(Answer2,'"') AS Answers_question2_clean
SubField (Answers_question3_clean, ', ') AS Answers2_cleaned_list,
FROM
(ooxml, embedded labels, table is Stats);
How about this:
LOAD *,
SubField(PurgeChar([Answers question3], '"'), ', ') AS Answers3_cleaned_list;
LOAD *,
SubField(PurgeChar([Answers question2], '"'), ', ') AS Answers2_cleaned_list;
LOAD [Answers1],
SubField(PurgeChar(Answer1, '"'), ', ') AS Answers1_cleaned_list,
[Answers question2],
[Answers question3]
FROM
(ooxml, embedded labels, table is Stats);
Give this a try with few rows of data and see if this works
How about this:
LOAD
[Answers1],
SubField(PurgeChar(Answer1, '"'), ', ') AS Answers1_cleaned_list,
[Answers question2],
SubField(PurgeChar([Answers question2], '"'), ', ') AS Answers2_cleaned_list,
[Answers question3],
SubField(PurgeChar([Answers question3], '"'), ', ') AS Answers3_cleaned_list
FROM
(ooxml, embedded labels, table is Stats);
Perhaps like this:
Temp:
CrossTable(AnswerNo,Value)
LOAD
0 as Dummy,
*
FROM
(ooxml, embedded labels, table is Stats)
;
Result:
LOAD
AnswerNo,
PurgeChar(SubField(Answer,', '),'"') as AnswerValue
RESIDENT
Temp
;
DROP TABLE Temp;
Hi Sunny,
The quickest response as usual. Well done.
I have tried to implement your solution, but it will explode the results. In my testfile I have 543 rows.
When I only process Answers1, it will correctly show the numbers.
When I enable (uncomment) Answer2 and re-run the script the basic data (man, woman, unknown) will immediately go up from (381 ; 180 ; 70) to (1081 ; 379 ; 105)
How about this:
LOAD *,
SubField(PurgeChar([Answers question3], '"'), ', ') AS Answers3_cleaned_list;
LOAD *,
SubField(PurgeChar([Answers question2], '"'), ', ') AS Answers2_cleaned_list;
LOAD [Answers1],
SubField(PurgeChar(Answer1, '"'), ', ') AS Answers1_cleaned_list,
[Answers question2],
[Answers question3]
FROM
(ooxml, embedded labels, table is Stats);
Give this a try with few rows of data and see if this works
You gave me the ingredients to get it working correctly. Process the answers each in its own LOAD command and finish the remainder of the fields in the last LOAD!
Thanks SUNNY T.
Again.
LOAD
[Question 1],
SubField(PurgeChar([Question 1], '"'), ', ') AS Answers1_cleaned_list
FROM
(ooxml, embedded labels, table is Stats);
LOAD
[Question 2],
SubField(PurgeChar([Question 2], '"'), ', ') AS Answers2_cleaned_list
FROM
(ooxml, embedded labels, table is Stats);
Good afternoon, Gysbert,
I have tried to get that solution to work too, but it did not give me too much to work with. It looks too complicated for my level of knowledge. 🙂 (Boost for you).
Appreciate your effort and perhaps it will benefit another visitor of the community one day.
Alex