Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I have a field that contains random text and I want to split the text into single words.
The text can be separated by spaces, commas, periods, etc.
I tried using SubField with a space separator but I only split the first word.
In addition, I was unable to separate more than one type of separator.
For example, I have the following sentence:
how are-you.doing today
The result I would like to have is:
how
are
you
doing
today
Thank you!
what about split a field on the model itself?
What do you want to do with the split words? Show them in a textbox? or a listbox? Is it only one selection you want to process or all of them?
-Rob
In the end I want to count the amount of words that appear in the sentence and how many times each word appeared.
It is important for me to do this in the model, because the counting should be done after the user performs several filters in the model
You can break out the words in the script and leave them linked to the sentences. Then the words will reflect selections.
CharMap:
Mapping LOAD *, ' '
Inline [
char
.
,
-
] (delimiter is '\t');
Sentences:
LOAD *, RecNo() as RecId
Inline [
Sentence
how are-you.doing today
I am fine
];
Words:
LOAD
RecId,
SubField(MapSubString('CharMap', Sentence), ' ') as Word
Resident Sentences;
-Rob
Hi,
another method to define non-letter characters for the word separation might be (included in the already proposed SubField/MapSubString solution😞
mapNonLetterToSpace:
Mapping
LOAD Chr(RecNo()), ' '
AutoGenerate 65535
Where Upper(Chr(RecNo()))=Lower(Chr(RecNo()));
tabTextLines:
LOAD RecNo() as ID, *
Inline '
TextLine
"SubField - script and chart function"
"Subfield() is used to extract substring components from a parent string field, where the original record fields consist of two or more parts separated by a delimiter."
"The Subfield() function can be used, for example, to extract first name and surname from a list of records consisting of full names, the component parts of a path name, or for extracting data from comma-separated tables."
"If you use the Subfield() function in a LOAD statement with the optional field_no parameter left out, one full record will be generated for each substring. If several fields are loaded using Subfield() the Cartesian products of all combinations are created."
"Syntax:"
"SubField(text, delimiter[, field_no ])"
"Return data type: string"
"Arguments:"
"Argument Description"
"text The original string. This can be a hard-coded text, a variable, a dollar-sign expansion, or another expression."
"delimiter A character within the input text that divides the string into component parts."
"field_no The optional third argument is an integer that specifies which of the substrings of the parent string text is to be returned. A negative value causes the substring to be extracted from the right-hand side of the string. That is, the string search is from right to left, instead of left to right, if field_no is a positive value."
';
tabWords:
LOAD Distinct * Where Len(Word);
LOAD ID,
SubField(MapSubString('mapNonLetterToSpace',TextLine),' ') as Word
Resident tabTextLines;
hope this helps
regards
Marco
one example of calculating word frequencies and counting words per text line in the front end might be:
mapNonLetterToSpace:
Mapping
LOAD Chr(RecNo()), ' '
AutoGenerate 65535
Where Upper(Chr(RecNo()))=Lower(Chr(RecNo()));
tabTextLines:
LOAD RecNo() as LineID, *
Inline '
TextLine
"SubField - script and chart function"
"Subfield() is used to extract substring components from a parent string field, where the original record fields consist of two or more parts separated by a delimiter."
"The Subfield() function can be used, for example, to extract first name and surname from a list of records consisting of full names, the component parts of a path name, or for extracting data from comma-separated tables."
"If you use the Subfield() function in a LOAD statement with the optional field_no parameter left out, one full record will be generated for each substring. If several fields are loaded using Subfield() the Cartesian products of all combinations are created."
"Syntax:"
"SubField(text, delimiter[, field_no ])"
"Return data type: string"
"Arguments:"
"Argument Description"
"text The original string. This can be a hard-coded text, a variable, a dollar-sign expansion, or another expression."
"delimiter A character within the input text that divides the string into component parts."
"field_no The optional third argument is an integer that specifies which of the substrings of the parent string text is to be returned. A negative value causes the substring to be extracted from the right-hand side of the string. That is, the string search is from right to left, instead of left to right, if field_no is a positive value."
';
tabWords:
LOAD
LineID,
Word,
AutoNumber(IterNo,'WordNo'&LineID) as WordNo,
AutoNumber(Hash128(LineID,IterNo),'WordID') as WordID,
Upper(Word) as WORD
Where Len(Word);
LOAD LineID,
IterNo() as IterNo,
SubField(MapLine,' ',IterNo()) as Word
While IterNo()<=SubStringCount(MapLine,' ')+1;
LOAD LineID,
MapSubString('mapNonLetterToSpace',TextLine) as MapLine
Resident tabTextLines;
hope this helps
regards
Marco
Thank you very much