Visualization and Management of Data Quality Measurements
Student: Sheny Illescas Martinez (2020)
Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß
Co-Supervisor: DI Lisa Ehrlinger, BSc
Motivation and Challenges
Data quality measurement is an essential task in enterprises and organizations: it allows them to continuously monitor and ensure the quality of query results and of the decisions based on these results. In larger organizations, data is usually stored in several heterogeneous information systems. Data quality measurement can therefore be carried out either on a single information system or on an integrated information system.
In a current project at our institute, we developed a system that analyzes different information sources (e.g., relational databases, CSV files, ontologies, …) and calculates data quality metrics on different aggregation levels, e.g., attribute level, concept level, database level, or for a complete integrated system. The results of these calculations are currently attached as annotations to the analyzed elements and are available in a tree structure that grows with the complexity of the observed system. Additional information is stored in the form of lists, e.g., specific tuples that violate a quality constraint.
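To make the described result structure more concrete, the following is a minimal sketch of how one node of such a tree could be modeled in Java. It is not the data model of the existing tool: the class name QualityNode, the Level enum, the metric name "completeness", the value range 0..1, and the sample identifiers are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node of the measurement-result tree; all names are illustrative. */
final class QualityNode {
    /** Aggregation levels mentioned in the text, from coarse to fine. */
    enum Level { SYSTEM, DATABASE, CONCEPT, ATTRIBUTE }

    final String name;
    final Level level;
    // Metric name -> calculated value (assumed here to lie in 0..1).
    final Map<String, Double> metrics = new LinkedHashMap<>();
    // List-style additional information, e.g. identifiers of violating tuples.
    final List<String> violations = new ArrayList<>();
    final List<QualityNode> children = new ArrayList<>();

    QualityNode(String name, Level level) {
        this.name = name;
        this.level = level;
    }

    /** Appends a child node and returns this node to allow chaining. */
    QualityNode add(QualityNode child) {
        children.add(child);
        return this;
    }

    /** Number of nodes in this sub-tree, including this node. */
    int size() {
        int n = 1;
        for (QualityNode c : children) {
            n += c.size();
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the actual project.
        QualityNode email = new QualityNode("customer.email", Level.ATTRIBUTE);
        email.metrics.put("completeness", 0.9);
        email.violations.add("tuple#17");

        QualityNode customer = new QualityNode("customer", Level.CONCEPT).add(email);
        QualityNode db = new QualityNode("crm_db", Level.DATABASE).add(customer);

        System.out.println(db.name + " contains " + db.size() + " nodes");
    }
}
```

Keeping metrics and violation lists on the node itself mirrors the annotation approach described above; a real implementation might store them separately or load them lazily for large systems.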
To establish data quality measurement with this tool in practice, a clearly arranged and flexibly manageable user interface is required.
Objective
In the course of this master's thesis, a concept should be developed that enables a user to browse an arbitrarily complex tree structure of calculated measurement results. Fine-grained information should be hidden in the general overview but exposed when a sub-tree is selected. In addition to pure browsing, the user should be able to manage data quality rules (i.e., constraints) and trigger new quality calculations via the interface. The data quality measurement tool is currently implemented in Java and should be extended by a user interface (preferably a web interface) that communicates interactively with the backend system.
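The idea of hiding fine-grained information in the overview and exposing it on selection can be illustrated with a depth-limited rendering of the tree. The sketch below is only one possible approach under stated assumptions: the class TreeOverview, its nested Node type, the "[+n hidden]" marker, and the sample labels are hypothetical and not part of the existing tool. Selecting a sub-tree is modeled simply as rendering again with the selected node as root.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical depth-limited text rendering of a result tree. */
final class TreeOverview {
    /** Minimal stand-in for a measurement-result node (illustrative only). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Renders the tree down to maxDepth; deeper sub-trees are collapsed. */
    static String render(Node root, int maxDepth) {
        StringBuilder out = new StringBuilder();
        render(root, 0, maxDepth, out);
        return out.toString();
    }

    private static void render(Node node, int depth, int maxDepth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(node.label);
        // At the depth limit, summarize the hidden descendants instead of listing them.
        if (depth == maxDepth && !node.children.isEmpty()) {
            out.append(" [+").append(countDescendants(node)).append(" hidden]");
        }
        out.append('\n');
        if (depth < maxDepth) {
            for (Node child : node.children) {
                render(child, depth + 1, maxDepth, out);
            }
        }
    }

    private static int countDescendants(Node node) {
        int n = 0;
        for (Node c : node.children) {
            n += 1 + countDescendants(c);
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Node customer = new Node("customer").add(new Node("email")).add(new Node("zip"));
        Node db = new Node("crm_db").add(customer);

        System.out.print(render(db, 1));       // overview: attributes are collapsed
        System.out.print(render(customer, 1)); // selected sub-tree: attributes are shown
    }
}
```

In a web interface, the same logic would more likely run on demand, with the backend delivering only the children of the node the user expands rather than the complete tree.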
The major requirements of this work are a clear presentation and intuitive use, also for non-technical users, and support for the analysis of complex integrated information systems. In addition, an appealing design would be desirable.