Semiconductor manufacturing is a complex and sophisticated process, and the data it generates are generally huge. To perform knowledge discovery on these data sets, the data must be reduced in dimension by selecting only the fields that are of value to a particular line of research. Most research is geared towards data mining, and less importance is generally given to the stages that precede it, namely problem definition, selection, addition, preprocessing, and data cleaning and transformation. This is undesirable because ad-hoc approaches to standardizing the data during these initial stages tend to be inaccurate, and any inaccuracy will affect the integrity of the data mining performed in later stages. This paper proposes a structured data standardization framework that reduces huge, high-dimensional semiconductor data to a smaller set of values in order to perform knowledge discovery. The framework was applied to two devices as a case study, and the resulting processed data were successfully used for yield mining and defect clustering.