Student: Johannes Schrott (2024)
Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß
Content
A phrase often used to describe data quality is “fitness for use”, which highlights that the quality of data depends on the context in which it is viewed. For context-dependent computation of data quality, there exist many data quality metrics that can be used to measure different data quality dimensions (i.e., aspects of data quality). These metrics can target different granularity levels of data (e.g., value level, attribute level, table level). To obtain a higher-level view of the data quality measurements of fine-grained data, these measurements need to be aggregated.
Unfortunately, existing methods for aggregating data quality measurements, such as the arithmetic mean or the minimum function, lack the context-dependent aspect mentioned above.
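As a hypothetical illustration of the two existing aggregation functions named above, consider value-level quality scores in [0, 1] for the attributes of a single record. The attribute names and scores are invented for this sketch. The arithmetic mean and the minimum yield very different record-level results, and neither considers whether the low-scoring value actually matters for the intended use.

```python
from statistics import mean

# Hypothetical value-level quality scores (0.0 = worst, 1.0 = best) for one record.
record_scores = {"name": 1.0, "email": 0.0, "city": 1.0, "zip": 1.0}

def aggregate_mean(scores):
    """Record-level score as the arithmetic mean of the value-level scores."""
    return mean(scores.values())

def aggregate_min(scores):
    """Record-level score as the worst value-level score ("weakest link")."""
    return min(scores.values())

print(aggregate_mean(record_scores))  # 0.75
print(aggregate_min(record_scores))   # 0.0
```

The mean hides the missing e-mail value, while the minimum declares the whole record unusable, regardless of whether the e-mail address is relevant in the current context.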
Thus, the main contribution of this master's thesis is a new constraint-based data quality measurement and aggregation approach that incorporates domain knowledge. The approach allows data quality to be measured at different levels of data granularity and enables a context-dependent aggregation of fine-grained data quality measurements into coarser-grained results (e.g., aggregating from the value level to the record level). The aggregation functions to be applied are selected based on whether the defined constraints are fulfilled.
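The following Python sketch shows one possible reading of constraint-based aggregation. It assumes that constraints are predicates over value-level scores and that a simple rule chooses between the mean and the minimum. The `Constraint` class, the `aggregate_record` function, the thresholds, and the two usage contexts are hypothetical and do not reproduce the thesis's actual constraint definitions or selection logic. The point is that the same record receives different record-level scores depending on the domain knowledge encoded for each context.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]

@dataclass(frozen=True)
class Constraint:
    """Hypothetical domain-knowledge constraint on value-level quality scores."""
    description: str
    check: Callable[[Scores], bool]  # returns True if the constraint is fulfilled

def aggregate_record(scores: Scores, constraints: List[Constraint]) -> Tuple[str, float]:
    """Aggregate value-level scores to a record-level score.

    Assumed selection rule (illustration only): if every constraint is
    fulfilled, the arithmetic mean is used; otherwise the minimum is used,
    so a violated constraint dominates the result.
    """
    if all(c.check(scores) for c in constraints):
        return "mean", mean(scores.values())
    return "min", min(scores.values())

# Hypothetical context 1 (e.g. an e-mail campaign): the e-mail value must be of high quality.
campaign = [Constraint("email score >= 0.9", lambda s: s.get("email", 0.0) >= 0.9)]
# Hypothetical context 2 (e.g. postal delivery): the zip code is what matters.
postal = [Constraint("zip score >= 0.9", lambda s: s.get("zip", 0.0) >= 0.9)]

record = {"name": 1.0, "email": 0.0, "city": 1.0, "zip": 1.0}

print(aggregate_record(record, campaign))  # ('min', 0.0)
print(aggregate_record(record, postal))    # ('mean', 0.75)
```

Falling back to the minimum when a constraint is violated makes the critical value decide the record-level result, while in a context where all constraints hold the mean still reflects the remaining attributes.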
To evaluate the new approach, a proof-of-concept implementation is being developed and applied to different datasets in order to demonstrate its advantages over existing aggregation techniques.