Thursday May 28, 2026 10:30 - 11:00 CEST
Limited Capacity seats available
Decathlon's Data Lake is organized into progressive layers that transform data in increasingly complex steps to power reporting, visualization (e.g., interactive dashboards), and ultimately advanced Machine Learning & AI (e.g., product recommendation, demand forecasting, dynamic pricing, …). To achieve this, we build and maintain complex, distributed pipelines written in SQL, and we rely on Apache Spark's engine to process Big Data at scale on multi-node clusters. However, complexity comes at a cost: as we stack more and more data transformations, manually tracing the exact origin of a specific data item becomes unmanageable, creating a critical need for an automated solution. To address this, we recently recruited a Data Engineer intern and partnered with academic experts from ENS - PSL and Université Grenoble Alpes to prototype a (Fine-Grained) Data Provenance tool compatible with Apache Spark.
The ability to track the provenance/lineage of granular data portions is critical for:
- Trust & Reliability: guaranteeing the accuracy of results for data consumers.
- Root Cause Analysis: diagnosing anomalies (e.g., aberrant turnover figures) to pinpoint the exact source of a problem.
- Impact Analysis: predicting how data updates will propagate through our versioned datasets.
- GDPR compliance: ensuring that sensitive data (PII) does not unintentionally "leak" into refined datasets.
- Testing: extracting representative subsets of data for lightweight integration tests and prototyping.
In this talk, we will introduce a few concepts of data provenance, show where we currently stand, and outline what we plan to build in the future.
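As a rough illustration of what fine-grained provenance means for the root cause analysis use case, here is a minimal sketch in plain Python. It is not the Spark-compatible prototype described in this talk; the `sales` table, its row ids, and the helper names `turnover_by_store` and `trace_back` are all hypothetical. The idea is simply that each output row of an aggregation remembers which input rows contributed to it, so an aberrant figure can be traced back to its sources.

```python
from collections import defaultdict

# Hypothetical source table: each row carries a unique id used for lineage.
sales = [
    {"id": "sales:1", "store": "Lille", "amount": 120.0},
    {"id": "sales:2", "store": "Lille", "amount": 80.0},
    {"id": "sales:3", "store": "Paris", "amount": -5000.0},  # aberrant value
    {"id": "sales:4", "store": "Paris", "amount": 200.0},
]


def turnover_by_store(rows):
    """Sum amounts per store and record which input rows fed each output row."""
    totals = defaultdict(float)
    lineage = defaultdict(set)
    for row in rows:
        totals[row["store"]] += row["amount"]
        lineage[row["store"]].add(row["id"])
    return [
        {
            "store": store,
            "turnover": totals[store],
            "provenance": sorted(lineage[store]),
        }
        for store in sorted(totals)
    ]


def trace_back(output_row, source_rows):
    """Return the source rows that contributed to one output row."""
    wanted = set(output_row["provenance"])
    return [row for row in source_rows if row["id"] in wanted]


report = turnover_by_store(sales)

# Root cause analysis: find an aberrant turnover figure and list its inputs.
suspicious = [row for row in report if row["turnover"] < 0]
culprits = trace_back(suspicious[0], sales)
```

In this sketch, the negative turnover for one store can be narrowed down to the handful of input rows listed in its `provenance` field, instead of re-reading the whole source table. A real pipeline would need to propagate such annotations through joins, filters, and multiple layers, which is where an automated tool becomes necessary.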
Speakers
Ronan Fruit

Decathlon

🤖 DATA/AI ARENA 135 Rue Sadi Carnot, Ronchin, France
