I'm having real trouble with an XML file and am leaning towards writing some XSLT to convert it to CSV (or something similar) for the import.
But before that I just want to make sure that there really is no way to do this directly in QV 😃
Consider the following example:
<category id="15" desc="Rent_a_car_facility">
<name language="arabic">????? ??????</name>
<name language="czech">Půjčovna aut</name>
<name language="danish">Biludlejningssted</name>
<name language="german">Autoverleih</name>
<name language="english">Car rental</name>
</category>
And I have about a thousand of these.
I want to extract the following:
- category id (15 above)
- "Car rental" (i.e. the categoryDescription)
Everything else is irrelevant and can be ignored.
I've tried various concatenations and loads on top of loads, but I think I'm missing something. Any tips or tricks?
Martin Bagge wrote:
I'm having real trouble with an XML file and am leaning towards writing some XSLT to convert it to CSV (or something similar) for the import.
But before that I just want to make sure that there really is no way to do this directly in QV 😃
Consider the following example:
<category id="15" desc="Rent_a_car_facility">
<name language="arabic">????? ??????</name>
<name language="czech">Půjčovna aut</name>
<name language="danish">Biludlejningssted</name>
<name language="german">Autoverleih</name>
<name language="english">Car rental</name>
</category>
And I have about a thousand of these.
I want to extract the following:
- category id (15 above)
- "Car rental" (i.e. the categoryDescription)
Everything else is irrelevant and can be ignored.
I've tried various concatenations and loads on top of loads, but I think I'm missing something. Any tips or tricks?
Hi,
Please have a look at the attached example.
Best regards
Stefan
Well.
That was far from the solution. I want the text from <name></name>, and more specifically only the name element whose language is English. However, that restriction can be left out as long as I get the connection from the id to the text in <name>. (I've started writing a Perl script to transform the XML to a CSV for now; I have to be done tomorrow.)
And here is the Perl solution for anyone who wants to see it.
#!/usr/bin/perl
#
# Converts the POI category tree to CSV
#
use strict;
use warnings;
use XML::Simple;

# Language to use in the description field.
# Please note that special characters (Russian, Swedish, Arabic and so on) have to go through a Unicode converter.
my $language = "english";
# Path to the POI category tree.
my $filename = "statData/poi_category_tree.xml";

# Below this point you really should not need to edit anything 😃

my $xml  = XML::Simple->new;
my $data = $xml->XMLin($filename);

foreach my $group (keys %{ $data->{categories} }) {
    # Each key is expected to hold a list of categories; skip anything else.
    my $list = $data->{categories}{$group};
    next unless ref($list) eq 'ARRAY';
    foreach my $category (@$list) {
        # The <name> elements are the translations of this category.
        my $names = $category->{name};
        next unless ref($names) eq 'ARRAY';
        foreach my $name (@$names) {
            # Print "id;name" only for the translation matching $language.
            next unless $name->{language} eq lc($language);
            print $category->{id} . ";" . $name->{content} . "\n";
        }
    }
}
And for the curious ones:
#!/usr/bin/perl
#
# Converts the POI category tree to CSV
#
# The final product of this script is a CSV file in the following format:
# ^categoryId;categoryDescription$
#
use strict;
use warnings;
use XML::Simple;

# Language to use in the description field.
# Please note that special characters (Russian, Swedish, Arabic and so on) have to go through a Unicode converter.
my $language = "english";
# Path to the POI category tree.
my $filename = "statData/poi_category_tree.xml";

# Below this point you really should not need to edit anything 😃

my $xml  = XML::Simple->new;
my $data = $xml->XMLin($filename);

foreach my $group (keys %{ $data->{categories} }) {
    # Each key is expected to hold a list of categories; skip anything else.
    my $list = $data->{categories}{$group};
    next unless ref($list) eq 'ARRAY';
    foreach my $category (@$list) {
        # The <name> elements are the translations of this category.
        my $names = $category->{name};
        next unless ref($names) eq 'ARRAY';
        foreach my $name (@$names) {
            # Print "id;name" only for the translation matching $language.
            next unless $name->{language} eq lc($language);
            print $category->{id} . ";" . $name->{content} . "\n";
        }
    }
}
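A possible variant of the script above, as a minimal sketch rather than a drop-in replacement. It passes the ForceArray and KeyAttr options to XMLin so that <category> and <name> always come back as lists (even when there is only one) and the categories are not folded into a hash keyed by id. The <categories> root element, the sample data and the category_rows helper are assumptions made for illustration; the real file layout is not shown in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Return [id, name] pairs for every category that has a <name> in $language.
# category_rows is a hypothetical helper, not part of the original script.
sub category_rows {
    my ($xml_text, $language) = @_;
    my $data = XMLin(
        $xml_text,
        ForceArray => [ 'category', 'name' ],  # always array refs, even for a single element
        KeyAttr    => [],                      # keep categories as a list, not a hash keyed by id
    );
    my @rows;
    for my $category (@{ $data->{category} || [] }) {
        for my $name (@{ $category->{name} || [] }) {
            next unless lc($name->{language} // '') eq lc($language);
            push @rows, [ $category->{id}, $name->{content} ];
        }
    }
    return @rows;
}

# Assumed sample input: the <categories> wrapper is a guess at the file layout.
my $sample = <<'XML';
<categories>
  <category id="15" desc="Rent_a_car_facility">
    <name language="danish">Biludlejningssted</name>
    <name language="german">Autoverleih</name>
    <name language="english">Car rental</name>
  </category>
</categories>
XML

# One "categoryId;categoryDescription" line per matching category.
print join(';', @$_), "\n" for category_rows($sample, 'english');
```

For the sample above this prints 15;Car rental. Categories without a name in the chosen language are skipped instead of producing a half-written line.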