1.1 Glossary

This document uses the following terms:

composite domain: A structure that is composed of a set of domains that share the same subject area. Examples of composite domains include Name, which is composed of first name, middle name, and family name, and Address, which is composed of street, city, state, postal code, and country.
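
The relationship between a composite domain and its member domains can be sketched as a small data structure. This is only an illustration: the class names `Domain` and `CompositeDomain` and the helper `make_composite` are hypothetical and are not part of DQS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Domain:
    """Hypothetical stand-in for a single domain, such as "city"."""
    name: str


@dataclass(frozen=True)
class CompositeDomain:
    """Hypothetical stand-in for a composite domain: an ordered set of domains."""
    name: str
    domains: Tuple[Domain, ...]

    def domain_names(self) -> List[str]:
        # Return the names of the member domains in their declared order.
        return [d.name for d in self.domains]


def make_composite(name: str, *parts: str) -> CompositeDomain:
    # Build a composite domain from the names of its member domains.
    return CompositeDomain(name, tuple(Domain(p) for p in parts))


# The two examples given in the definition above.
NAME = make_composite("Name", "first name", "middle name", "family name")
ADDRESS = make_composite("Address", "street", "city", "state", "postal code", "country")
```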

data cleansing: An information scrap-and-rework process that corrects data errors in a collection of data to bring the quality of the data to an acceptable level to meet the information customers' needs. Data cleansing is the act of detecting and correcting or removing corrupt or inaccurate records from a recordset, table, or database.
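
As a minimal sketch of "detecting and correcting or removing" records, the following function corrects known misspellings of a value and sets aside records that still fail validation. The field name `state`, the correction table, and the list of valid values are assumptions made for illustration; they do not describe how DQS performs cleansing.

```python
# Hypothetical correction table and list of valid values.
CORRECTIONS = {"Calif.": "CA", "Wash.": "WA"}
VALID_STATES = {"CA", "WA"}


def cleanse(records):
    """Return (cleaned, rejected) lists for a list of record dictionaries."""
    cleaned, rejected = [], []
    for record in records:
        # Normalize whitespace, then apply a known correction if one exists.
        state = (record.get("state") or "").strip()
        state = CORRECTIONS.get(state, state)
        if state in VALID_STATES:
            # Corrected or already-valid record: keep it with the clean value.
            cleaned.append({**record, "state": state})
        else:
            # Record that cannot be corrected: remove it from the output.
            rejected.append(record)
    return cleaned, rejected
```

For example, `cleanse([{"id": 1, "state": "Calif."}])` keeps the record and stores its state as "CA".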

data matching: A way to compare data so that similar, but slightly different, records can be aligned. Data matching can use "fuzzy logic" to find duplicates in the data. For example, data matching often recognizes that "Bob" and "Robert" might be the same individual. Data matching might also be able to detect household connections, such as a husband and wife at the same address.
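
The idea of aligning similar but slightly different values can be sketched with approximate string comparison. The example below uses `difflib.SequenceMatcher` from the Python standard library together with a small nickname table. The table, the function names, and the 0.85 threshold are assumptions chosen for illustration, and this is not the matching algorithm that DQS uses.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table that maps informal names to a canonical form.
NICKNAMES = {"bob": "robert", "bill": "william"}


def canonical_name(name: str) -> str:
    # Lowercase and trim the name, then replace a known nickname.
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def similarity(a: str, b: str) -> float:
    # Ratio in the range [0, 1]; 1.0 means the canonical forms are identical.
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    # Treat two names as the same individual when they are similar enough.
    return similarity(a, b) >= threshold
```

With these assumptions, `is_probable_match("Bob", "Robert")` is true because both names share the canonical form "robert", while `is_probable_match("Bob", "Alice")` is false.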

data quality (DQ): The degree to which the data is suitable for use in the required business processes. The quality of data can be defined, measured, and managed through various data quality metrics, such as completeness, conformity, consistency, accuracy, and duplication. Data quality is achieved through people, technology, and processes.
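
Two of the metrics named above, completeness and duplication, can be sketched as simple ratios over a list of records. The formulas and the function names here are assumptions made for illustration; this glossary does not define how any metric is calculated.

```python
def completeness(records, field):
    """Fraction of records in which `field` has a non-empty value."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplication(records, key_fields):
    """Fraction of records that repeat the key of an earlier record."""
    if not records:
        return 0.0
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)
```

Under these definitions, a table of four customers in which one email address is missing and one row repeats an earlier name has a completeness of 0.75 for the email field and a duplication of 0.25 for the name field.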

Data Quality Knowledge Base (DQKB): A repository of metadata that is created by the user or by the Data Quality Services (DQS) platform to improve the quality of data. A DQKB stores all the knowledge that is related to a specific type of data source. For example, one DQKB can handle information on an organization's customer database, while another DQKB handles an employee database.

data quality project (DQ project): A means of using a DQKB to improve the quality of source data by performing data cleansing and/or data matching activities and then exporting the resultant data to a Microsoft SQL Server database or a comma-separated value (.csv) file.

Data Quality Services (DQS): A knowledge-driven solution for creating and maintaining a DQKB that is used to perform various data quality operations, such as data cleansing and data matching.

data steward: A business user, information worker, or IT professional who improves the quality of data and manages the organization's data quality processes and tasks. The data steward is responsible for improving the reusability, accessibility, and quality of the organization's data assets. The data steward's responsibilities include approving business naming standards, developing consistent data definitions, determining data aliases and derivations, documenting the business rules of the corporation, and monitoring the quality of the data.

domain: A representation of the semantics of a type of data. Example domains include email address, gender, and state.

domain value: A term that the user has approved as valid for a domain. This term is a word or compound word that is used in a specific context.

knowledge management: The conscious and systematic facilitation of knowledge creation or development, diffusion or transfer, safeguarding, and use at the individual, team, and organizational level.

matching policy: The matching rules that are used to perform data deduplication. The matching-policy process enables matching rules to be created and fine-tuned based on matching results and profiling data. The process also adds the policy to the knowledge base.

term-based relations: A correction to a term that is part of a value in a DQS domain. A term-based relation enables multiple values that are identical except for the spelling of a term that they share to be treated as exact synonyms. For example, a term-based relation might change the term "Inc." to "Incorporated" for every occurrence of the term "Inc." in the domain. In this example, instances of "Contoso, Inc." are changed to "Contoso, Incorporated", and the two values are considered to be exact synonyms.
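
The "Inc." to "Incorporated" example can be sketched as a token-level substitution. The table `TERM_RELATIONS` and the function `apply_term_relations` are hypothetical names, and splitting values on whitespace is a simplifying assumption rather than a description of how DQS identifies terms.

```python
# Hypothetical table of term-based relations: term -> corrected term.
TERM_RELATIONS = {"Inc.": "Incorporated"}


def apply_term_relations(value, relations=TERM_RELATIONS):
    """Replace every whole term in `value` that has a term-based relation."""
    # Splitting on whitespace means only whole terms are replaced, so a value
    # such as "Incline" is left unchanged. Runs of whitespace collapse to one space.
    return " ".join(relations.get(term, term) for term in value.split())
```

Applied to "Contoso, Inc.", the function returns "Contoso, Incorporated", so both spellings resolve to the same value.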

XML: The Extensible Markup Language, as described in [XML1.0].

XML schema definition (XSD): The World Wide Web Consortium (W3C) standard language that is used in defining XML schemas. Schemas are useful for enforcing structure and constraining the types of data that can be used validly within other XML documents. XML schema definition refers to the fully specified and currently recommended standard for use in authoring XML schemas.