Please try to avoid posting multiple times the same request.
This makes it hard to follow a discussion.
If you want to replace the NULL after a JOIN, you would need to perform a subsequent RESIDENT LOAD in which you check for NULL and replace it with something meaningful:
if(Len(Trim(FIELD)), FIELD, 'Formerly known as NULL or empty') as FIELD
Or don't JOIN, use ApplyMap() instead:
I think he is asking if it's a good practice to replace NULL values with a "selectable" <NULL> String, but he is unsure if that will have any impact later in his application.
My opinion is that it all comes down to your own particular requirements. Do selectable nulls make sense o not in the context of your application?
Thank you, that's right. My question was:
"Does it generally make sense to replace all NULLs with a placeholder, say "unknown", "w/o" or "<NULL>" etc.?
I read all I could find in the community, be it "NULL and Nothing" from HIC orothers. I think that it might lead to following issues:
1) Aggregations like avg() will lead to wrong results because e.g ."0" instad of "NULL" is counted as a valid denominator.
2) When doing JOINS, rows might get duplicated, again distorting mathematical operations
3) Sometimes there are reasons for NULL which is not the same as a "missing value". In that case, interpretations may be effected as well.
So all in all, it really depends on the particular case and you have to ask questions like "What does this NULL mean?", "Why is it there"? and "Does it make sense to replace it e.g. with <NULL> as a workaround or is this the result of bad data quality in the source systems".
Please correct me or add something, but I think that is everything general one can say about it.
I think you are right that there could be no clear recommendation to replace NULL in a single field and/or all fields - it will be always depend on the particular requirements of an application if it's useful or not.
In my experience it's often useful to replace NULL within fields which are used as dimensions but not in fields which are measured within the expressions (whereby you could of course react on any replacing-value within the expressions).