Dear Community,
I am trying to load a very large table in parallel using segmentation on the attribute with the data type TIMESTAMP.
The table contains 20 years of data.
The DB settings for time look like this:
nls_time_format HH.MI.SSXFF AM
nls_timestamp_format DD-MON-RR HH.MI.SSXFF AM
nls_time_tz_format HH.MI.SSXFF AM TZR
nls_timestamp_tz_format DD-MON-RR HH.MI.SSXFF AM TZR
The segment boundaries are set to:
Segment | Boundaries (CREATED)
1 "CREATED >=to_date('01.01.2003 00:00:00','dd.mm.yyyy hh24:mi:ss') and CREATED < to_date('01.01.2008 00:00:00','dd.mm.yyyy hh24:mi:ss')"
2 "CREATED >=to_date('01.01.2008 00:00:00','dd.mm.yyyy hh24:mi:ss') and CREATED < to_date('01.01.2013 00:00:00','dd.mm.yyyy hh24:mi:ss')"
3 "CREATED >=to_date('01.01.2013 00:00:00','dd.mm.yyyy hh24:mi:ss') and CREATED < to_date('01.01.2018 00:00:00','dd.mm.yyyy hh24:mi:ss')"
4 "CREATED >=to_date('01.01.2018 00:00:00','dd.mm.yyyy hh24:mi:ss')"
The replication immediately breaks with the error:
Failed to init unloading table 'SYXXX'.'BIGTABLE'
ORA-01841: (full) year must be between -4713 and +9999, and not be 0
Is the formatting of the TIMESTAMP wrong? If so, what is the correct format?
Please help!
Many thanks in advance and
Best Regards,
Helene
From my experience, one thing to watch out for that has caught me previously:
If the data is static or only being added to, you should be fine, but if the records are updatable, and the timestamp is reset during that update, it is possible for your parallel 'full copy' to pick up the same record twice (or to miss it entirely).
I had 8 parallel segments, but they did not begin simultaneously. During the period between the first segment starting and the last segment starting, a record was updated and it appeared in both segments (luckily causing an index failure at the destination).
In your scenario it would be the equivalent of the timestamp updating from 2020 to 2022 in the period you started the copy.
It makes sense to quiesce the database if at all possible.
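To make that concrete, here is a hypothetical sequence (schema and table names borrowed from the original post; the ID column and key value 42 are invented for illustration) in which a row hops segments mid-copy:

-- Suppose segment 3 (2013-2018) has already been unloaded,
-- while segment 4 (>= 2018) has not started yet:
UPDATE SYXXX.BIGTABLE
   SET CREATED = SYSTIMESTAMP   -- the row now falls into segment 4's range
 WHERE ID = 42;                 -- ID 42 is an invented example key
COMMIT;
-- When segment 4 unloads, it picks the row up a second time, so the
-- target receives a duplicate; an update moving the timestamp the other
-- way would cause the row to be missed entirely.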
Hello @HeleneExner,
I would recommend opening a case for this issue. Please set SOURCE_UNLOAD and TARGET_LOAD to TRACE, run the task until failure, then upload the logs and diagnostic package to the case for investigation.
Thank you,
Dinesh
Hi @HeleneExner
The error is from the Oracle DB and not from Replicate. Try querying the table from the DB using the same filter condition; it should fail with the same error.
Please check the value of NLS_DATE_FORMAT in the database and use the same format for the conversion. Try the filter condition in the database first and, once it succeeds, apply it through Replicate.
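For example, a quick check along these lines (a sketch; the schema, table, and filter are taken verbatim from the post) should reproduce the error if the predicate itself is at fault:

-- Check the session's format settings first:
SELECT parameter, value
  FROM nls_session_parameters
 WHERE parameter IN ('NLS_DATE_FORMAT', 'NLS_TIMESTAMP_FORMAT');

-- Then run one segment's filter directly against the table:
SELECT COUNT(*)
  FROM SYXXX.BIGTABLE
 WHERE CREATED >= to_date('01.01.2003 00:00:00','dd.mm.yyyy hh24:mi:ss')
   AND CREATED <  to_date('01.01.2008 00:00:00','dd.mm.yyyy hh24:mi:ss');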
Thank you,
Well, the error comes from Oracle, so that's where the focus should be. When you ask for LOGGING VERBOSE (Edit: TRACE is enough to show the Oracle SQL used) on SOURCE_UNLOAD, does it give you a full Oracle statement to try in SQL*Plus or SQL Developer?
Testing the to_date as provided in SQL*Plus and/or SQL Developer works, but note the documentation shows an UPPERCASE format string.
Best I know, the format string is mostly case-insensitive, but the Oracle documentation shows UPPERCASE, so why not use that? Admittedly, in many places there is an automatic conversion to uppercase, but why not follow the suggested syntax? Try again with an uppercase date format string and let us know if it happens to make a difference. But mostly, try to capture the full query going to Oracle.
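For instance, a one-line sanity test in SQL*Plus (a sketch reusing one of the boundary values from the post, with the uppercase mask):

SELECT TO_DATE('01.01.2018 00:00:00', 'DD.MM.YYYY HH24:MI:SS') FROM dual;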
Hein.
Helene, can you provide some more details? Specifically, a JSON snippet, more complete LOG output, and the 'CREATED' datatype on the source?
I just did a very coarse test with ranges on a TIMESTAMP column TS. The simplest form appears to work just fine.
This results in JSON looking like:
"source_table_settings": {
"unload_segments": {
"segments_type": "RANGES",
"ranges": {
"column_names": ["TS"],
"split_points": [{
"values": ["2020-01-01"]
}, {
"values": ["2021-01-01"]
}, {
"values": ["2022-01-01"]
}]
},
With LOGGING set to TRACE for SOURCE_UNLOAD, we can see the 4 streams.
All 4 start with:
Select statement for UNLOAD is 'SELECT "ID","TS",<columnlist> FROM "ATT_USER"."TEST"
The 4 WHERE clauses predictably look like:
WHERE ( ("TS" <= '2020-01-01') )
WHERE (NOT (("TS" <= '2020-01-01') )) AND ( ("TS" <= '2021-01-01') )
WHERE (NOT (("TS" <= '2021-01-01') )) AND ( ("TS" <= '2022-01-01') )
WHERE (NOT (("TS" <= '2022-01-01') ))
Those all return the appropriate row counts to be loaded. No errors.
Now, I used a TIMESTAMP for which the defaults work: NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH:MI:SS.FF'.
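If the defaults on your system differ, a session-level check or override before testing (a sketch; the format string is the one quoted above) keeps the comparison honest:

-- Check the effective session setting:
SELECT value
  FROM nls_session_parameters
 WHERE parameter = 'NLS_TIMESTAMP_FORMAT';

-- Or set it explicitly for the test session:
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH:MI:SS.FF';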