Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
data_RN
Creator
Creator

Deleting text between characters

Hello All,

I having been searching all over trying to find the right formula to purge text between specific characters and I cannot find the right way to do it 😕

Essentially I am displaying a text field that for some reason is pulling in a bunch of extra text between <> and I just want to purge it.

I can identify the first group that I want to remove (see the second column below), but I can't figure out how to purge it.. Note, there is an extra <> at the end of the string that I also want to remove.  Can I do this within the app or is it something I should remove while loading the file?  If so, how would I do that?

Thanks as always!

 

[Program Description]TextBetween([Program Description],'<','>')
<div class="ExternalClassC09E7B5015894C68B16FD07CFDE66515">​Goal of program is to transform the current stuff into new stuff.</div.>div class="ExternalClassC09E7B5015894C68B16FD07CFDE66515"

 

16 Replies
QlikTom
Employee
Employee

Glad to hear it! Good luck on your Qlik journey. 

data_RN
Creator
Creator
Author

Hi Tom,

Our messages crossed.  Both suggestions leave me with extraneous characters. 🤔 I will play with the TextBetween and see if I can come up with a combination that works.

Thanks for your help!

 

QlikTom
Employee
Employee

The remaining characters are HTML Entities (here is an example)
https://www.w3.org/MarkUp/html-spec/html-spec_13.html

You likely don't want to remove them, but rather translate them to what they represent (a colon, or a quotation mark, etc). With this method, you can replace them with whatever you like however. 

Since creating a table of all possible html entities would likely be overkill, just add ones you find to an inline load such as the code below.  An inline load just involves you typing the data in. Just press enter to create a new row. fields are separated by a comma.

 

[query-2]:
LOAD
    "Program Title",
    "Planisware Code",
    "Program Description",
    "Value Proposition",
    "Program Status",
    "Initial Bus Case Complete?"
FROM [lib://Desktop/Book1.xlsx]
(ooxml, embedded labels, table is [query-2]);



//load all possible HTML tags that exist in field, set replacement as ''
TagMap_Temp:
LOAD DISTINCT 
	'<' & TextBetween('<' & SubField("Program Description", '<'),'<','>') & '>' as TagMatch
    , '' as Replacement
Resident [query-2];
//concatenate hand type html entities into temp table (This would happen automatically without concatinate syntax)
Concatenate (TagMap_Temp)
LOAD 
	* 
    INLINE [
TagMatch,	Replacement
'&#58;',	':'
];

//create a mapping table of all potential tags, and their replacements from the temp table
TagMap:
MAPPING LOAD 
	* 
Resident TagMap_Temp;

//drop TagMap_Temp, as it is no longer needed
Drop Table TagMap_Temp;


//NoConcatenate prevents automatic contatination with source table
NoConcatenate
//use MapSubString Funciton to remove all possible tags
FinalResult:
load 
    "Program Title",
    "Planisware Code",
	MapSubString('TagMap',"Program Description") as "Program Description",
    "Value Proposition",
    "Program Status",
    "Initial Bus Case Complete?"
resident [query-2];

drop table [query-2];

 

2020-03-17_10-50-49.png

 

data_RN
Creator
Creator
Author

This is becoming more complicated than I thought 😕  With Tom's suggestion, I can get it to resolve in a localized file, but our source is coming from a Sharepoint List file. The load fix doesn't seem to work with our URL connection. At least from what I can see.

Is it possible add logic to what Sunny has to fix directly in the Dimension's formula?

QlikTom
Employee
Employee

Can you clarify what you mean by URL Connection? WebFile? REST? 

The code I provided should be data source agnostic, but keep in mind, it is only an example. 

The general shape of the solution is:

  1. load table/field from any source into memory
  2. create a mapping table of all HTML tags/entities (may take a couple steps as seen in my example) from the RESIDENT table.
  3. Load your RESIDENT table again (don't forget the NoConcatenate prefix)  use the MapSubString() function find an replace the elements in your string.
  4. *Make sure to drop any temp/helper tables you don't need. 

 

sunny_talwar

Not sure what you logic is to remove the stuff at the end of Program 1, but assuming you want to only keep upper and lower case letters, you can try this

[query-2]:
LOAD [Program Title],
	[Planisware Code],
	KeepChar(TextBetween([Program Description] & '@>@', '>', '@>@'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ') as [Program Description],
    [Program Description] as Temp,
	[Value Proposition],
	[Program Status],
	[Initial Bus Case Complete?]
Where Len(Trim(TextBetween([Program Description] & '@>@', '>', '@>@'))) > 0;
LOAD
	[Program Title],
	[Planisware Code],
	SubField([Program Description], '<') as [Program Description],
    [Program Description] as Temp,
	[Value Proposition],
	[Program Status],
	[Initial Bus Case Complete?]
 FROM [lib://Desktop/Book1.xlsx]
(ooxml, embedded labels, table is [query-2]);
data_RN
Creator
Creator
Author

Hi Tom,

I am connecting via a URL connection to an Office 360,  Sharepoint List. Something I have not done before.  All my previous efforts have been on local files.

So, probably not to your surprise, my lack of knowledge was indeed hindering me 🙂  I had edited your script wrong, but with a little trial and error, I managed to connect it! And I learned a bit more about the whole mapping and reload process works.  This has been a crash course exercise, so thanks to both you and Sunny for your guidance!

Now onto some report building...thank you!