Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
hi all,
i want to split a field to some fields,E.g:
text
1如23 split to
1 如 23
i want use mid()to run it but i found that it failed。
mid(text,2,2) it shows that 如2 . it's wrong.it should be 如.
so,i want to split a field based on a fixed number of bytes,can you help me,thanks!
If you ar loading from text file you can split the line in the load statement. For example:
LOAD
[@1:455] as Field1,
[@456:495] as "Transactor CertNo",
[@496:795] as Address,
[@796:n] as Rest
FROM [.\file.txt](fix, utf8, embedded labels); // file with headerline
or
FROM [.\file.txt](fix, utf8, no labels); // file without headerline
@startpos:endpos => startpos and endpos are byte positions!
Without a headerline you have to consider the encoding of the file. If it is a utf8 file with signature then the first line has a prefix of 3 bytes. In this case you have to move the first line for 3 Bytes.
Hi
In ur sample, try like Mid('1如23',2,1) <-- it gives expected result.
However, if you want dynamic, if always first character is number, you can split the in-between letters, then try like below
=mid('1如23', 2 , FindOneOf(mid('1如23',2), 1234567890)-1)
Thank you for your help, but mine is more complicated.
The first character is not sure, and the number of Chinese characters in the middle is not sure, but the start byte position and end position of Chinese are certain. So I want a function that can be intercepted according to the byte, is there a way.
eg: the field is test , start byte position is 2 end position is 10
test
1如果还有 3456
a如果 7890
c如 a
I have tried many methods but can't solve it, because this position is calculated by the number of bytes, not the number of characters, and mid() is intercepted by characters, so it can't be solved.
thanks a lot
Hi
If you want to get only the Chinese characters, try with Purgechar() functionality like below
LOAD *, PurgeChar(test, '0123456789abcdefghijklmnopqrstuvwxyz') as chinesetest INLINE [
test
1如果还有 3456
a如果 7890
c如 a
];
It is not just Chinese characters, the part to be intercepted is actually the address, there may be numbers, etc., so it needs to be intercepted according to the byte position
Hi
Can you say the expected output for below?
test
1如果还有 3456
a如果 7890
c如 a
I think I would check and categorize each char and would be starting by:
load *, if(CharID >= n1 and CharID <= n2, 'Number/AlphaNumeric/Chinese/Others', ... as CharCategory;
load *, len(Text) as Len, ord(Char) as CharID;
load Text, mid(Text, iterno(), 1) as Char, recno() as RecNo, iterno() as IterNo
from Source while len(Text) <= iterno();
Instead of a nested if-loop for the categorizing you may use a mapping. In further steps you may read the first/last char from each category, count their amount and so on to classify their content and with this information you may concat() the Char's in the end again.
Especially if there is a high variety of different content you will need a systematically approach with rather less (nested) if-loops to cut and pick here and there the relevant stuff. Further depending on the data-quality you may not be able to resolve all data - **bleep** in = **bleep** out ...
- Marcus
Generally speaking, I have a text file, in which each line of data is a combination of multiple fields, in which the number of bytes is 200 to 300 for the field address, and if the length is less than 100, it will be represented by a space. I just want to intercept the adress by the number of bytes, but i dont konw the function that can help me
solve this question.
Currently my script looks like this :Mid(test, 456,40) as "Transactor CertNo", Mid(test, 496,300) as Address。
This address is misplaced, because there are other characters in the front that occupy two bytes, so mid() cannot be used, mid() is intercepted according to the character。("Transactor CertNo" is an empty data field)
I can't understand what you mean very well, can you give an example or explain it in detail?
thanks
If you ar loading from text file you can split the line in the load statement. For example:
LOAD
[@1:455] as Field1,
[@456:495] as "Transactor CertNo",
[@496:795] as Address,
[@796:n] as Rest
FROM [.\file.txt](fix, utf8, embedded labels); // file with headerline
or
FROM [.\file.txt](fix, utf8, no labels); // file without headerline
@startpos:endpos => startpos and endpos are byte positions!
Without a headerline you have to consider the encoding of the file. If it is a utf8 file with signature then the first line has a prefix of 3 bytes. In this case you have to move the first line for 3 Bytes.