🐉 Data Dragons Quest Academy

Progress: /6

Realm Map

Choose your quest. Each path teaches a key dimensional modelling concept and includes a visit from the Shapeshifter.

⚔️ Quest 1: Grain Declaration Trial

Thornwood Trading Company needs to know their total sales for January in the North region. Below is the messy dataset they've handed you. Try to calculate the sales manually — you may be surprised by the result!

Order ID
Customer ID
Date
Region
Status

Rows highlighted in red indicate duplicate orders that will artificially inflate your totals.

To get this right, you need to decide the proper grain for the Sales fact table. Which grain makes sense?

Here's what happens with each grain choice:

  • Order grain: loses product level detail and returns total order value only.
  • Line item grain: correct – the atomic level, can roll up to orders, and analyze by product.
  • Shipment grain: duplicates order value when orders ship in multiple packages.

Line item grain is the right choice. By modelling each line item separately, you retain detail and avoid double counting shipments. You can still get order totals by summing line items.

The CEO now asks: "Who is our biggest customer?"

Here are the customer records:

Customer ID
Name
City
State

Before you can answer, you must decide whether C100, C101 and C102 are the same customer. This is the Shapeshifter's challenge: entity resolution.

Are these three rows the same real company?

Practice: The CFO wants to track shipping costs. What should be the grain of the Shipping fact table?

⏳ Quest 2: Slowly Changing Enigma

This quest explores Type 1 and Type 2 slowly changing dimensions. For brevity, this implementation summarises the key concepts; feel free to expand it further.

Customer History

Effective Date
Name
City
State
Tier
Account Manager
Contract

Question

How much revenue did Sarah Miller generate in 2023?

🏰 Quest 3: Conformed Dimension Summit

In a full version of the app you would build a conformed Region and Customer dimension here. Use the Grimoire to review entity resolution techniques.

🔥 Quest 4: Forge of Transformation

The Forge of Transformation guides you through cleansing messy source data and resolving fuzzy duplicates. Due to space, the interactive sub‑quest has been summarised; check the Grimoire for details.

⭐ Quest 5: Star Schema Smithy

This quest would normally include a drag‑and‑drop schema builder for the Starfall University case. Here, consider which dimensions you would need: Student, Course, Instructor, Date, and perhaps a Master Student identifier.

🌉 Quest 6: Bridge Table Trial

Here you would build a bridge table to correctly allocate sales across multiple authors. Beware of duplicate authors – the Shapeshifter lurks here too.

🔮 Shapeshifter's Grimoire

Entity resolution is the art of determining when two records refer to the same real‑world entity. Below are key techniques and concepts. Use this knowledge throughout your quests.

Techniques

  • Exact matching: compare natural keys directly (customer_id).
  • Normalized matching: upper‑case, trim whitespace, remove punctuation before comparing.
  • Fuzzy string matching: algorithms like Levenshtein distance or Jaro‑Winkler for spelling differences.
  • Phonetic matching: algorithms like Soundex or Metaphone for names that sound alike.
  • Domain knowledge: email domains, addresses, phone numbers help confirm matches.

Concepts

  • Match confidence scores: quantify the probability that two records refer to the same entity.
  • Threshold tuning: adjust match thresholds to balance false positives and false negatives.
  • Survivorship rules: determine which source attribute values become the “golden record.”
  • Golden record: the canonical representation of an entity after resolution.
  • Ongoing process: entity resolution is not one‑and‑done. New data may merge or split existing entities.

📚 Lexicon

A glossary of terms used in dimensional modelling and entity resolution.

Grain
The level of detail represented by a row in a fact table.
Fact Table
A table containing measurable, numeric data about a business process.
Dimension Table
A table containing descriptive attributes that give context to facts.
Slowly Changing Dimension (SCD)
A technique for tracking how dimension attributes change over time.
Conformed Dimension
A dimension that is shared across multiple fact tables to enable cross‑process analysis.
Bridge Table
A table used to resolve many‑to‑many relationships between a fact and a dimension.
Entity Resolution
The process of determining when two or more records refer to the same real‑world entity.

🏆 Achievements

Your badges for accomplishments in the academy.

  • 🏅 Grain Master: complete Quest 1.
  • 🏅 Keeper of History: complete Quest 2.
  • 🏅 Realm Unifier: complete Quest 3.
  • 🏅 Master Smith: complete Quest 4.
  • 🏅 Star Forger: complete Quest 5.
  • 🏅 Bridge Builder: complete Quest 6.
  • 🏆 Dragon Tamer: complete all quests.

⚙️ Settings

Adjust your preferences and manage your progress.