Solved: Re: Deleting text between characters - Page 2 - Qlik Community

data_RN · ‎2020-03-15

Hello All,

I having been searching all over trying to find the right formula to purge text between specific characters and I cannot find the right way to do it 😕

Essentially I am displaying a text field that for some reason is pulling in a bunch of extra text between <> and I just want to purge it.

I can identify the first group that I want to remove (see the second column below), but I can't figure out how to purge it.. Note, there is an extra <> at the end of the string that I also want to remove. Can I do this within the app or is it something I should remove while loading the file? If so, how would I do that?

Thanks as always!

[Program Description]	TextBetween([Program Description],'<','>')
<div class="ExternalClassC09E7B5015894C68B16FD07CFDE66515">Goal of program is to transform the current stuff into new stuff.</div.>	div class="ExternalClassC09E7B5015894C68B16FD07CFDE66515"

QlikTom · ‎2020-03-17

Glad to hear it! Good luck on your Qlik journey.

data_RN · ‎2020-03-17

Hi Tom,

Our messages crossed. Both suggestions leave me with extraneous characters. 🤔 I will play with the TextBetween and see if I can come up with a combination that works.

Thanks for your help!

QlikTom · ‎2020-03-17

The remaining characters are HTML Entities (here is an example)
https://www.w3.org/MarkUp/html-spec/html-spec_13.html

You likely don't want to remove them, but rather translate them to what they represent (a colon, or a quotation mark, etc). With this method, you can replace them with whatever you like however.

Since creating a table of all possible html entities would likely be overkill, just add ones you find to an inline load such as the code below. An inline load just involves you typing the data in. Just press enter to create a new row. fields are separated by a comma.

[query-2]:
LOAD
    "Program Title",
    "Planisware Code",
    "Program Description",
    "Value Proposition",
    "Program Status",
    "Initial Bus Case Complete?"
FROM [lib://Desktop/Book1.xlsx]
(ooxml, embedded labels, table is [query-2]);



//load all possible HTML tags that exist in field, set replacement as ''
TagMap_Temp:
LOAD DISTINCT 
	'<' & TextBetween('<' & SubField("Program Description", '<'),'<','>') & '>' as TagMatch
    , '' as Replacement
Resident [query-2];
//concatenate hand type html entities into temp table (This would happen automatically without concatinate syntax)
Concatenate (TagMap_Temp)
LOAD 
	* 
    INLINE [
TagMatch,	Replacement
'&#58;',	':'
];

//create a mapping table of all potential tags, and their replacements from the temp table
TagMap:
MAPPING LOAD 
	* 
Resident TagMap_Temp;

//drop TagMap_Temp, as it is no longer needed
Drop Table TagMap_Temp;


//NoConcatenate prevents automatic contatination with source table
NoConcatenate
//use MapSubString Funciton to remove all possible tags
FinalResult:
load 
    "Program Title",
    "Planisware Code",
	MapSubString('TagMap',"Program Description") as "Program Description",
    "Value Proposition",
    "Program Status",
    "Initial Bus Case Complete?"
resident [query-2];

drop table [query-2];

data_RN · ‎2020-03-17

This is becoming more complicated than I thought 😕 With Tom's suggestion, I can get it to resolve in a localized file, but our source is coming from a Sharepoint List file. The load fix doesn't seem to work with our URL connection. At least from what I can see.

Is it possible add logic to what Sunny has to fix directly in the Dimension's formula?

QlikTom · ‎2020-03-18

Can you clarify what you mean by URL Connection? WebFile? REST?

The code I provided should be data source agnostic, but keep in mind, it is only an example.

The general shape of the solution is:

load table/field from any source into memory
create a mapping table of all HTML tags/entities (may take a couple steps as seen in my example) from the RESIDENT table.
Load your RESIDENT table again (don't forget the NoConcatenate prefix) use the MapSubString() function find an replace the elements in your string.
*Make sure to drop any temp/helper tables you don't need.

sunny_talwar · ‎2020-03-18

Not sure what you logic is to remove the stuff at the end of Program 1, but assuming you want to only keep upper and lower case letters, you can try this

[query-2]:
LOAD [Program Title],
	[Planisware Code],
	KeepChar(TextBetween([Program Description] & '@>@', '>', '@>@'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ') as [Program Description],
    [Program Description] as Temp,
	[Value Proposition],
	[Program Status],
	[Initial Bus Case Complete?]
Where Len(Trim(TextBetween([Program Description] & '@>@', '>', '@>@'))) > 0;
LOAD
	[Program Title],
	[Planisware Code],
	SubField([Program Description], '<') as [Program Description],
    [Program Description] as Temp,
	[Value Proposition],
	[Program Status],
	[Initial Bus Case Complete?]
 FROM [lib://Desktop/Book1.xlsx]
(ooxml, embedded labels, table is [query-2]);

data_RN · ‎2020-03-18

Hi Tom,

I am connecting via a URL connection to an Office 360, Sharepoint List. Something I have not done before. All my previous efforts have been on local files.

So, probably not to your surprise, my lack of knowledge was indeed hindering me 🙂 I had edited your script wrong, but with a little trial and error, I managed to connect it! And I learned a bit more about the whole mapping and reload process works. This has been a crash course exercise, so thanks to both you and Sunny for your guidance!

Now onto some report building...thank you!

Deleting text between characters

purgechar()

Qlik Sense

Qlik Sense Functions

qliksense business

textbetween