Solved: Proper syntax for parallel load segments from Orac... - Qlik Community

rurquhart · ‎2022-05-17

Hi folks,

New to Qlik Replicate and trying to understand proper syntax to use the parallel load (full load) feature by data segment for a date column.

Source database is Oracle and a table we're trying to fully load into Sql server 2019 has over 83 million rows. It has over 1300 partitions by date and the full load by partitions will take just over 14 hours. (Throughput records is 1635 and Throughput volume is 1605.)

I wanted to test if manually defining the date segments via "use data ranges" and having fewer plus more evenly distributed segments, still by date, would offer better performance; however, I keep getting some sort of error.

For the segment boundaries, the instructions indicate to specify the "upper limit of the range":

Segment 1: '2019-01-19 00:00:00'

Segment 2: '2019-07-01 00:00:00'

Segment 3: '2020-01-01 00:00:00'

...

Segment 7: '2025-01-01 00:00:00'

This attempt resulted in:

"Failed to init unloading table '<database>'.'<table>'
ORA-01841: (full) year must be between -4713 and +9999, and not be 0 "

&

"Stream component 'st_1_<task>' terminated
Stream component failed at subtask 1, component st_1_<task>
Error executing source loop"

Any thoughts or advice?

Thanks

Alan_Wang · ‎2022-05-17

Hi @rurquhart

Please post the data type for that column being segmented. It should work without the single quotes around the timestamp.

If the issue is solved please mark the answer with Accept as Solution.

View solution in original post

Steve_Nguyen · ‎2022-05-17

@rgurquhart

1. could you check with your source on this table , what is the column that you use for the segment, what is the datatype ?

2. could you verify the datetime from that column, exactly ?

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!

Alan_Wang · ‎2022-05-17

Hi @rurquhart

Please post the data type for that column being segmented. It should work without the single quotes around the timestamp.

If the issue is solved please mark the answer with Accept as Solution.

rurquhart · ‎2022-05-17

Hi Steve,

Source column 'abc' has data type defined as a DATE. It's also partitioned by this column 'abc' in the source database.

Here's one snippet of the source database ddl:

NOCOMPRESS
NOPARALLEL
PARTITION BY RANGE (abc)
(
PARTITION P_20130327 VALUES LESS THAN (TO_DATE(' 2013-03-28 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
LOGGING
TABLESPACE DW_LARGE_DATA01
PCTFREE 10
INITRANS 1
STORAGE

In the Qlik app, I used column abc as the segment column.

shashi_holla · ‎2022-05-17

Looks like the issue with date format, we can use function similar to below and convert date to DD-MM-YYYY format and then test.

SELECT to_char(to_date('2019-01-19 00:00:00','RRRR-MM-DD HH24:MI:SS'), 'DD-MM-RRRR HH24:MI:SS') FROM dual

rurquhart · ‎2022-05-25

Thanks, everyone, for your suggestions. @Alan_Wang This recommendation helped get past the syntax error I would normally receive; however, I'm now getting duplicate rows loaded based only on the segment values listed as such above.

Reminder:

Segment 1: '2019-01-19 00:00:00'

Segment 2: '2019-07-01 00:00:00'

Segment 3: '2020-01-01 00:00:00'

...

Segment 7: '2025-01-01 00:00:00'

Does this section allow for logic to be applied? Example, something along these lines:

Segment Boundaries abc
1 <'2019-01-19 00:00:00'
2 abc<'2019-07-01 00:00:00' AND abc > '2018-12-31 23:59:59'
3 abc<'2020-01-01 00:00:00' AND abc > '2019-06-30 23:59:59'
4 abc<'2020-04-01 00:00:00' AND abc>'2019-12-31 23:59:59'
5 abc<'2020-07-01 00:00:00' AND abc>'2020-03-31 23:59:59'
6 abc>'2020-03-31-23:59:59'
7 !=0

Thanks for any further guidance/suggestions.

Steve_Nguyen · ‎2022-05-25

@rurquhart

the logic is some what correct, below is from my task , so that you see the logic

you can enable source_unload to trace and see the how the segment is done.

example in my task :

Line 2232: 00007976: 2022-05-25T14:56:43:735596 [SOURCE_UNLOAD ]T: Order by '', Resume Where '', Transformation Where ' ( ([MyDate1] <= '2013-09-10 10:01:04') ) ' (endpointshell.c:8433)
Line 4511: 00007956: 2022-05-25T14:56:43:860593 [SOURCE_UNLOAD ]T: Order by '', Resume Where '', Transformation Where ' (NOT (([MyDate1] <= '2013-09-10 10:01:04') )) AND ( ([MyDate1] <= '2013-09-10 10:01:05') ) ' (endpointshell.c:8433)
Line 15208: 00003040: 2022-05-25T14:56:44:39594 [SOURCE_UNLOAD ]T: Order by '', Resume Where '', Transformation Where ' (NOT (([MyDate1] <= '2013-09-10 10:01:05') )) AND ( ([MyDate1] <= '2014-09-10 10:01:06') ) ' (endpointshell.c:8433)
Line 349199: 00003248: 2022-05-25T14:56:47:276590 [SOURCE_UNLOAD ]T: Order by '', Resume Where '', Transformation Where ' (NOT (([MyDate1] <= '2014-09-10 10:01:06') )) ' (endpointshell.c:8433)

---

for duplicate, it could happen if the date is not the whole PK, Segment normally work with PK, see user guide :

https://help.qlik.com/en-US/replicate/May2022/Content/Global_Common/Content/SharedEMReplicate/Custom...

-For all endpoints, the Unique Index column is automatically selected. Select which additional columns whose data you wish to use to delineate the ranges and then click OK.

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!

Proper syntax for parallel load segments from Oracle to Sql server

Configuration

Performance