Re: Consuming REST API and extracting dynamic nest... - Qlik Community

lgati · ‎2019-03-11

I am rather new to Talend, however I have a software developer background.

I have a project where I need to query Zendesk’s Rest API and store the information in AWS Redshift. So far I have pulled about 4 APIs into Redshift without much problem.

However, the object Ticket contains nested JSON elements and some elements (custom fields) are in a dynamic array of JSON objects, such as:

{
    "ticket": {
        "url": "https://{DOMAIN}.zendesk.com/api/v2/tickets/60335.json",
        "id": 99966685,
        "external_id": null,
        "via": {
            "channel": "web",
            "source": {
                "from": {},
                "to": {},
                "rel": null
            }
        },
        "created_at": "2019-02-26T18:05:35Z",
        "updated_at": "2019-03-06T18:02:13Z",
        "type": "incident",
        "subject": "Extract request",
        "raw_subject": "VExtract request",
        "description": "Long Text Here",
        "priority": "normal",
        "status": "pending",
        "recipient": null,
        "requester_id": 370109203834,
        "submitter_id": 370109203834,
        "assignee_id": 497118067,
        "organization_id": 21682736,
        "group_id": 20168737,
        "collaborator_ids": [
            497118067,
            540244256,
            13249116068
        ],
        "follower_ids": [
            497118067,
            540244256,
            13249116068
        ],
        "email_cc_ids": [],
        "forum_topic_id": null,
        "problem_id": null,
        "has_incidents": false,
        "is_public": true,
        "due_at": null,
        "tags": [
            "3___medium",
            "custom_extract",
            "custom_api",
            "extracts",
            "normal",
            "ts2",
            "wex"
        ],
        "custom_fields": [
            {
                "id": 22827487,
                "value": "custom_extract"
            },
            {
                "id": 22798946,
                "value": "extracts"
            },
            {
                "id": 21670316,
                "value": "3___medium"
            },
            ...
            {
                "id": 30046718,
                "value": null
            }
        ],
        "satisfaction_rating": {
            "score": "unoffered"
        },
        "sharing_agreement_ids": [],
        "fields": [
            {
                "id": 22827487,
                "value": "custom_extract"
            },
            {
                "id": 22798946,
                "value": "extracts"
            },
            ...
            {
                "id": 30046718,
                "value": null
            }
        ],
        "followup_ids": [],
        "ticket_form_id": 12345,
        "brand_id": 12345,
        "satisfaction_probability": null,
        "allow_channelback": false,
        "allow_attachments": true
    }
}

When I query the Ticket API I get a list of Tickets, so I already use tExtractJSONFields, using JsonPath, with a Loop JsonPath query "$.tickets[*]".

So my question is how to parse the list of JSON documents inside the element "custom_fields", so from this string.

[{"id":22827487,"value":"feature"},{"id":22798946,"value":"ap"},{"id":30046718,"value":"test"}]

to a row based approach, I also have to keep in mind, that these list is dynamic, meaning more custom fields could be later added.

id, value
22827487, "feature"
22798946,ap"
30046718,"test"

vapukov · ‎2019-03-11

Hi,

JSONPath do not support parent node, so if you need just id, value - you could use it

result will be:

.--------+--------------.
|       tLogRow_2       |
|=-------+-------------=|
|id      |value         |
|=-------+-------------=|
|22827487|custom_extract|
|22798946|extracts      |
|21670316|3___medium    |
|30046718|null          |
'--------+--------------'

but if you need parent id as well (for join the tables), better to use XPath

and result will be:

.---------+--------+--------------.
|            tLogRow_1            |
|=--------+--------+-------------=|
|ticket_id|id      |value         |
|=--------+--------+-------------=|
|99966685 |22827487|custom_extract|
|99966685 |22798946|extracts      |
|99966685 |21670316|3___medium    |
|99966685 |30046718|              |
'---------+--------+--------------'

P.S.

it is often not possible extract JSON/XML in flat structure with single component, so you need use more than 1 and join them back if need

lgati · ‎2019-03-11

@vapukovhow do I go about connecting many tExtractJson components to each other? or should I put a tMap in front of them to pass certain values to each of them?

Also, should I use tUnite, to join all these tExtractJsons?

Any help is appreciated.

vapukov · ‎2019-03-11

store extracted data - to memory hash or fils, and join them with tMap in next subjob

it in case if you want to store all in the same flat table (where you will have a lot of duplicated information)

or you could store all in normalized form, with foreign keys relations

Consuming REST API and extracting dynamic nested JSON fields

AWS

JSON

REST

Talend Data Integration

v7.x