Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
lgati
Contributor II
Contributor II

Consuming REST API and extracting dynamic nested JSON fields

I am rather new to Talend, however I have a software developer background.

I have a project where I need to query Zendesk’s Rest API and store the information in AWS Redshift. So far I have pulled about 4 APIs into Redshift without much problem.

However, the object Ticket contains nested JSON elements and some elements (custom fields) are in a dynamic array of JSON objects, such as:

{
    "ticket": {
        "url": "https://{DOMAIN}.zendesk.com/api/v2/tickets/60335.json",
        "id": 99966685,
        "external_id": null,
        "via": {
            "channel": "web",
            "source": {
                "from": {},
                "to": {},
                "rel": null
            }
        },
        "created_at": "2019-02-26T18:05:35Z",
        "updated_at": "2019-03-06T18:02:13Z",
        "type": "incident",
        "subject": "Extract request",
        "raw_subject": "VExtract request",
        "description": "Long Text Here",
        "priority": "normal",
        "status": "pending",
        "recipient": null,
        "requester_id": 370109203834,
        "submitter_id": 370109203834,
        "assignee_id": 497118067,
        "organization_id": 21682736,
        "group_id": 20168737,
        "collaborator_ids": [
            497118067,
            540244256,
            13249116068
        ],
        "follower_ids": [
            497118067,
            540244256,
            13249116068
        ],
        "email_cc_ids": [],
        "forum_topic_id": null,
        "problem_id": null,
        "has_incidents": false,
        "is_public": true,
        "due_at": null,
        "tags": [
            "3___medium",
            "custom_extract",
            "custom_api",
            "extracts",
            "normal",
            "ts2",
            "wex"
        ],
        "custom_fields": [
            {
                "id": 22827487,
                "value": "custom_extract"
            },
            {
                "id": 22798946,
                "value": "extracts"
            },
            {
                "id": 21670316,
                "value": "3___medium"
            },
            ...
            {
                "id": 30046718,
                "value": null
            }
        ],
        "satisfaction_rating": {
            "score": "unoffered"
        },
        "sharing_agreement_ids": [],
        "fields": [
            {
                "id": 22827487,
                "value": "custom_extract"
            },
            {
                "id": 22798946,
                "value": "extracts"
            },
            ...
            {
                "id": 30046718,
                "value": null
            }
        ],
        "followup_ids": [],
        "ticket_form_id": 12345,
        "brand_id": 12345,
        "satisfaction_probability": null,
        "allow_channelback": false,
        "allow_attachments": true
    }
}

When I query the Ticket API I get a list of Tickets, so I already use tExtractJSONFields, using JsonPath, with a Loop JsonPath query "$.tickets[*]".

 

So my question is how to parse the list of JSON documents inside the element "custom_fields", so from this string.

[{"id":22827487,"value":"feature"},{"id":22798946,"value":"ap"},{"id":30046718,"value":"test"}]

to a row based approach, I also have to keep in mind, that these list is dynamic, meaning more custom fields could be later added.

id, value
22827487, "feature"
22798946,ap"
30046718,"test"

 

 

Labels (5)
3 Replies
vapukov
Master II
Master II

Hi,

 

JSONPath do not support parent node, so if you need just id, value - you could use it

0683p000009M3Mm.png

 

result will be:

.--------+--------------.
|       tLogRow_2       |
|=-------+-------------=|
|id      |value         |
|=-------+-------------=|
|22827487|custom_extract|
|22798946|extracts      |
|21670316|3___medium    |
|30046718|null          |
'--------+--------------'

but if you need parent id as well (for join the tables), better to use XPath

0683p000009M3Mr.png

 

 

and result will be:

.---------+--------+--------------.
|            tLogRow_1            |
|=--------+--------+-------------=|
|ticket_id|id      |value         |
|=--------+--------+-------------=|
|99966685 |22827487|custom_extract|
|99966685 |22798946|extracts      |
|99966685 |21670316|3___medium    |
|99966685 |30046718|              |
'---------+--------+--------------'

 

 

P.S.

it is often not possible extract JSON/XML in flat structure with single component, so you need use more than 1 and join them back if need

lgati
Contributor II
Contributor II
Author

@vapukovhow do I go about connecting many tExtractJson components to each other? or should I put a tMap in front of them to pass certain values to each of them?

Also, should I use tUnite, to join all these tExtractJsons?

Any help is appreciated.

vapukov
Master II
Master II

store extracted data - to memory hash or fils, and join them with tMap in next subjob

 

it in case if you want to store all in the same flat table (where you will have a lot of duplicated information)

or you could store all in normalized form, with foreign keys relations