The best way to get better quality data is to perform cleanups in the source system. Always.
Otherwise you'll build BI solutions full of exception handling code. Which is not a pretty sight and may rapidly become a nightmare to maintain.
However, sometimes you're stuck with data that cannot be changed anymore (general ledgers, transaction tables, historical exchange rates and such), especially in ERP systems. IMHO such situations require ad-hoc solutions, tailored to keep the corrective actions efficient and almost invisible.
Also, some code you add to your scripts will not necessarily correct the data, but will bring bad data to the attention of the people responsible. When they correct their mistakes, the code will automatically become "inactive" (because of no more bad values).
- Impossible dates end up somehow in transactions, causing your calendar to extend into 3014 or something: make an inventory of these dates and if there aren't too many, filter them out using a mapping table.
- Many table fields should refer to master data, but sometimes have a NULL or empty value: always capture these by putting an out-of-range value (e.g. -1) in these fields and by creating corresponding master data records.
- Some product codes are used in orders, but there aren't any corresponding definitions in the Products table: add artificial product definitions to the Products table and give them a name like 'UNDEFINED (ProductID)' so that end-users know that something was missing here.