Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
LRAngela
Contributor III
Contributor III

How to split a field based on a fixed number of bytes

hi all,

i want to split a field to some fields,E.g: 

text

1如23 split to       

 1   如     23   

i want use mid()to run it but i found that  it failed。

mid(text,2,2)  it shows that       如2  .  it's wrong.it should be 如.

so,i want to split a field based on a fixed number of bytes,can you help me,thanks!

 

 

Labels (2)
1 Solution

Accepted Solutions
cwolf
Creator III
Creator III

If you ar loading from text file you can split the line in the load statement. For example:

LOAD
[@1:455] as Field1,
[@456:495] as "Transactor CertNo",
[@496:795] as Address,
[@796:n] as Rest
FROM [.\file.txt](fix, utf8, embedded labels); // file with headerline

or

FROM [.\file.txt](fix, utf8, no labels); // file without headerline


@startpos:endpos => startpos and endpos are byte positions!


Without a headerline you have to consider the encoding of the file. If it is a utf8 file with signature then the first line has a prefix of 3 bytes. In this case you have to move the first line for 3 Bytes.

View solution in original post

10 Replies
MayilVahanan

Hi

In ur sample, try like Mid('1如23',2,1) <-- it gives expected result.

However, if you want dynamic, if always first character is number, you can split the in-between letters, then try like below

=mid('1如23', 2 , FindOneOf(mid('1如23',2), 1234567890)-1)

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
LRAngela
Contributor III
Contributor III
Author

Thank you for your help, but mine is more complicated.

The first character is not sure, and the number of Chinese characters in the middle is not sure, but the start byte position and end position of Chinese are certain. So I want a function that can be intercepted according to the byte, is there a way.

eg:  the field is test , start byte position is 2   end position is  10

test

1如果还有  3456

a如果            7890

c如                  a

I have tried many methods but can't solve it, because this position is calculated by the number of bytes, not the number of characters, and mid() is intercepted by characters, so it can't be solved.

thanks  a lot

MayilVahanan

Hi

If you want to get only the Chinese characters, try with Purgechar() functionality like below

LOAD *, PurgeChar(test, '0123456789abcdefghijklmnopqrstuvwxyz') as chinesetest INLINE [
test

1如果还有 3456

a如果 7890

c如 a
];

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
LRAngela
Contributor III
Contributor III
Author

It is not just Chinese characters, the part to be intercepted is actually the address, there may be numbers, etc., so it needs to be intercepted according to the byte position

MayilVahanan

Hi

Can you say the expected output for below?

 

test

1如果还有  3456

a如果            7890

c如                  a

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
marcus_sommer

I think I would check and categorize each char and would be starting by:

load *, if(CharID >= n1 and CharID <= n2, 'Number/AlphaNumeric/Chinese/Others', ... as CharCategory;
load *, len(Text) as Len, ord(Char) as CharID;
load Text, mid(Text, iterno(), 1) as Char, recno() as RecNo, iterno() as IterNo
from Source while len(Text) <= iterno();

Instead of a nested if-loop for the categorizing you may use a mapping. In further steps you may read the first/last char from each category, count their amount and so on to classify their content and with this information you may concat() the Char's in the end again.

Especially if there is a high variety of different content you will need a systematically approach with rather less (nested) if-loops to cut and pick here and there the relevant stuff. Further depending on the data-quality you may not be able to resolve all data - **bleep** in = **bleep** out ...

- Marcus 

 

LRAngela
Contributor III
Contributor III
Author

Generally speaking, I have a text file, in which each line of data is a combination of multiple fields, in which the number of bytes is 200 to 300 for the field address, and if the length is less than 100, it will be represented by a space. I just want to intercept the adress by the number of bytes, but i dont konw the function that can help me

solve this question.

Currently my script looks like this :Mid(test, 456,40) as "Transactor CertNo",   Mid(test, 496,300) as Address。

This address is misplaced, because there are other characters in the front that occupy two bytes, so mid() cannot be used, mid() is intercepted according to the character。("Transactor CertNo" is an empty data field)

LRAngela_3-1668566283981.png

 

 

LRAngela
Contributor III
Contributor III
Author

I can't understand what you mean very well, can you give an example or explain it in detail?

thanks

cwolf
Creator III
Creator III

If you ar loading from text file you can split the line in the load statement. For example:

LOAD
[@1:455] as Field1,
[@456:495] as "Transactor CertNo",
[@496:795] as Address,
[@796:n] as Rest
FROM [.\file.txt](fix, utf8, embedded labels); // file with headerline

or

FROM [.\file.txt](fix, utf8, no labels); // file without headerline


@startpos:endpos => startpos and endpos are byte positions!


Without a headerline you have to consider the encoding of the file. If it is a utf8 file with signature then the first line has a prefix of 3 bytes. In this case you have to move the first line for 3 Bytes.