asifraaja
Contributor

Good design approach for a fact table with 1000+ columns

Hi,

Background:

Currently I have a very wide fact table in an Oracle database with almost 800 columns and nearly 100 million rows. It stores events, and each event has multiple attributes, with each attribute corresponding to a column. Events and attributes are onboarded from the product side through automated steps, so on the dev side we have no control over it.

On the product team's side, we have created a precomputed materialized view for each event for faster retrieval. So from one huge fact table, we have derived multiple smaller MVs, one per event.

Problem:

An Oracle table can have at most 1000 columns. As the product team onboards new events and attributes, this limit will be breached, and I want to understand how I should scale.

A few approaches:

Approach 1: Add another fact table to support the next 1000 attributes. The problem is that I would need to keep adding fact tables each time the column limit is breached, and I would also have to track which attribute or event lives in which fact table. I also suspect there will be scenarios where attributes from multiple fact tables need to be joined for some computation.

Approach 2: Use an entity-relation model to group similar attributes together into separate tables linked by foreign keys. This keeps a single fact table with foreign-key relations to multiple attribute-group tables. But aggregation based on an attribute becomes more complicated, since the attribute is now one level deep, and I end up with multiple narrow tables with millions of rows instead of a single wide table.

Approach 3: Redesign the schema so that each event gets its own fact table. But common attributes would then be part of every event table, which means redundant storage, and correlating multiple events together would be complex.

 

Please help by suggesting a better approach to this problem.

1 Reply
David_Friend
Support

@asifraaja I've noticed that no one has responded to you yet. I plugged your question into ChatGPT, and it gave some suggestions similar to what you detailed above, but I suggest you do the same and compare its answers to your ideas; you might find a better approach.

---

Scaling a database schema and managing very wide tables with a large number of columns can be challenging, especially when you hit system limits like the 1000-column restriction in Oracle. Here are some strategies to consider for scaling and optimizing your Oracle database:

### 1. **Column Segmentation**

**Partitioning Tables:**
- **Vertical Partitioning:** Split your wide fact table into multiple smaller tables where each table contains a subset of the columns. For instance, you could group related attributes together. This way, each partitioned table has fewer columns, and you can avoid hitting the 1000-column limit.
- **Horizontal Partitioning:** Consider partitioning the data rows based on certain criteria like date ranges, event types, or geographical regions. This can help in managing and querying subsets of data more efficiently. Both ideas are sketched below.
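
For illustration, here is a minimal Oracle DDL sketch of both ideas. All table and column names (`fact_events_base`, `fact_events_ext1`, `attr_001`, and so on) are invented placeholders, not the poster's actual schema, and the same hypothetical names are reused in the sketches further down.

```sql
-- Vertical split: keys plus the first attribute group live in the base table,
-- which is also range-partitioned (horizontally) by event time.
CREATE TABLE fact_events_base (
    event_id    NUMBER         PRIMARY KEY,
    event_type  VARCHAR2(100)  NOT NULL,
    event_time  TIMESTAMP      NOT NULL,
    attr_001    VARCHAR2(200),
    attr_002    NUMBER
    -- ... keep each table well under the 1000-column limit
)
PARTITION BY RANGE (event_time) (
    PARTITION p2023 VALUES LESS THAN (TIMESTAMP '2024-01-01 00:00:00'),
    PARTITION p2024 VALUES LESS THAN (TIMESTAMP '2025-01-01 00:00:00')
);

-- Extension table holding the next attribute group, 1:1 with the base table.
CREATE TABLE fact_events_ext1 (
    event_id  NUMBER PRIMARY KEY REFERENCES fact_events_base (event_id),
    attr_801  VARCHAR2(200),
    attr_802  NUMBER
);

-- A view can re-present the split tables as one logical wide table.
CREATE OR REPLACE VIEW v_fact_events AS
SELECT b.event_id, b.event_type, b.event_time,
       b.attr_001, b.attr_002,
       e.attr_801, e.attr_802
FROM   fact_events_base b
LEFT JOIN fact_events_ext1 e ON e.event_id = b.event_id;
```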

**Materialized Views:**
- Continue to use materialized views, but ensure that the views are designed to aggregate and summarize data in a way that reduces the number of columns required. You might also need to adjust which columns are included in the materialized views based on frequently accessed attributes.
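
A per-event MV over the hypothetical base table might look like the following sketch (the event type and refresh strategy are assumptions; fast refresh would additionally require a materialized view log):

```sql
-- Precompute a narrow projection for one event type, refreshed on demand.
CREATE MATERIALIZED VIEW mv_login_events
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT event_id, event_time, attr_001, attr_002
FROM   fact_events_base
WHERE  event_type = 'LOGIN';
```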

### 2. **Normalization**

**Schema Normalization:**
- **Attribute Normalization:** Consider normalizing your schema by moving some attributes into separate tables. For example, if there are attributes that are not always used together, store them in different tables. This helps in reducing the number of columns in each table and can also improve query performance by reducing the amount of data scanned.
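
As a sketch of this, one cohesive attribute group (an invented geo group here) moves into its own table keyed by the event, and aggregation reaches it through a join:

```sql
-- Hypothetical attribute-group table, 1:1 with the fact table.
CREATE TABLE event_geo_attrs (
    event_id  NUMBER PRIMARY KEY REFERENCES fact_events_base (event_id),
    country   VARCHAR2(2),
    region    VARCHAR2(100),
    city      VARCHAR2(100)
);

-- Aggregating on a normalized attribute is now one join deep.
SELECT g.country, COUNT(*) AS event_count
FROM   fact_events_base b
JOIN   event_geo_attrs g ON g.event_id = b.event_id
GROUP  BY g.country;
```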

**Entity-Attribute-Value (EAV) Model:**
- If the attributes vary greatly between different events, you might want to explore an EAV model. In this model, you have a table with columns like `event_id`, `attribute_name`, and `attribute_value`. This can handle a large number of attributes without hitting the column limit, though it can introduce complexity in querying and performance.
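
A minimal EAV sketch using the column names suggested above (values are stored as strings for simplicity; a real design would also need to handle data types):

```sql
-- One row per (event, attribute) pair; new attributes need no DDL change.
CREATE TABLE event_attribute_values (
    event_id        NUMBER         NOT NULL REFERENCES fact_events_base (event_id),
    attribute_name  VARCHAR2(128)  NOT NULL,
    attribute_value VARCHAR2(4000),
    CONSTRAINT pk_eav PRIMARY KEY (event_id, attribute_name)
);

-- Querying means pivoting rows back into columns, which is the usual cost.
SELECT event_id,
       MAX(CASE WHEN attribute_name = 'browser' THEN attribute_value END) AS browser,
       MAX(CASE WHEN attribute_name = 'country' THEN attribute_value END) AS country
FROM   event_attribute_values
GROUP  BY event_id;
```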

### 3. **Use External Tables and Data Warehousing Solutions**

**External Tables:**
- For less frequently accessed or archival data, consider Oracle's external tables feature, which exposes flat files outside the database as read-only tables, or another storage solution. This keeps your main tables within the column limit while still allowing access to the larger dataset.
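
As a sketch, an external table over an archived flat file could look like this; `data_dir` is an assumed Oracle DIRECTORY object, and the file layout is invented:

```sql
-- Read-only table over a CSV file that lives outside the database.
CREATE TABLE archived_events_ext (
    event_id   NUMBER,
    event_time VARCHAR2(30),
    payload    VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
    )
    LOCATION ('archived_events.csv')
);
```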

**Data Warehousing Solutions:**
- For very large datasets and complex queries, consider using dedicated data warehousing solutions like Oracle Exadata, Oracle Autonomous Data Warehouse, or other cloud-based solutions that can handle large-scale data more efficiently. These systems often come with optimizations for handling wide tables and large volumes of data.

### 4. **Optimize Queries and Performance**

**Indexing:**
- Ensure proper indexing on columns that are frequently queried or used in joins. Proper indexing can help improve query performance and reduce the load on your fact table.
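
For example, on the hypothetical tables above (the column choices are illustrative; the right indexes depend on the actual query workload):

```sql
-- Composite index for a common filter-then-sort pattern on the fact table.
CREATE INDEX ix_events_type_time ON fact_events_base (event_type, event_time);

-- For low-cardinality columns in a warehouse-style workload, a bitmap index
-- can fit (Enterprise Edition feature; must be LOCAL on a partitioned table).
CREATE BITMAP INDEX bx_events_type ON fact_events_base (event_type) LOCAL;
```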

**Compression:**
- Use Oracle's compression features to reduce the storage footprint of your tables. This can help manage large volumes of data and improve performance.
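
A sketch of enabling basic compression on the hypothetical partitioned table above (basic compression suits bulk or direct-path loads; compressing actively updated rows needs the separately licensed Advanced Compression option):

```sql
-- Rebuild one partition with basic compression (repeat per partition).
ALTER TABLE fact_events_base MOVE PARTITION p2024 COMPRESS;

-- Moving a partition invalidates global indexes; rebuild them, compressed.
ALTER INDEX ix_events_type_time REBUILD COMPRESS;
```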

### 5. **Review Database Design and Architecture**

**Database Design Review:**
- Conduct a thorough review of your database design and schema. Engage with database architects or consultants who can provide insights into optimizing your schema and scaling your database effectively.

**Event and Attribute Management:**
- Collaborate with the product team to review the process of onboarding new events and attributes. Understanding the business needs and how attributes are used can help in designing a more efficient schema.

By applying these strategies, you can manage and scale your database effectively while adhering to the column limit constraints.