Data cleaning is the process of detecting and correcting corrupt or inaccurate records in a record set, table, or database. The process includes data auditing, workflow specification, workflow execution, post-processing, and controlling. Popular methods include parsing, data transformation, duplicate elimination, and statistical methods.
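As a minimal sketch of the parsing, transformation, and duplicate-elimination steps above (the field names, sample records, and the `clean` function are illustrative assumptions, not a standard API):

```python
def clean(rows):
    """Parse, normalize, and deduplicate a list of record dicts."""
    seen = set()
    cleaned = []
    for row in rows:
        # Parsing/transformation: trim whitespace, lowercase the email field.
        name = row["name"].strip()
        email = row["email"].strip().lower()
        key = (name, email)
        # Duplicate elimination: keep only the first record with each key.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "email": email})
    return cleaned

records = [
    {"name": "Ada ", "email": "ADA@example.com"},
    {"name": "Ada", "email": "ada@example.com"},  # duplicate after normalization
    {"name": "Bob", "email": "bob@example.com"},
]
print(clean(records))  # two records survive: the normalized Ada row and Bob
```

Note that duplicates only become detectable after normalization; deduplicating before trimming and lowercasing would miss the first two records matching.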
By analyzing the data with summary statistics (mean, standard deviation, range) and clustering algorithms, we can find values that are unexpected and therefore potentially erroneous. Common screening rules flag a case with a standardized residual greater than about 3 in absolute value, a hat (leverage) element greater than 3p/n (where p = k + 1 for k predictors and n observations), a Cook's distance greater than 1, or an unusually large Mahalanobis distance. We can also inspect the data graphically with outlier-oriented plots such as a run-sequence plot, a scatter plot, a histogram, and a box plot.
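Two of the numeric screens above can be sketched in pure Python (the data and thresholds here are illustrative assumptions): flagging values whose z-score exceeds 3 in absolute value, and flagging leverage (hat diagonal) values above 3p/n for a one-predictor regression, where p = k + 1 = 2.

```python
from statistics import mean, stdev

def z_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs((v - m) / s) > threshold]

def high_leverage(x, p=2):
    """Indices with hat diagonal above 3p/n for simple linear regression.

    For one predictor, h_i = 1/n + (x_i - xbar)^2 / Sxx.
    """
    n = len(x)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    cutoff = 3 * p / n
    return [i for i, xi in enumerate(x)
            if (1 / n + (xi - xbar) ** 2 / sxx) > cutoff]

# Hypothetical sample: 19 typical measurements plus one extreme value.
data = [10.0] * 19 + [45.0]
print(z_outliers(data))     # → [19]
print(high_leverage(data))  # → [19]
```

One caveat worth noting: with a single extreme value the z-score method only works for moderately large samples, since the outlier inflates the standard deviation used to judge it; that is one reason to cross-check with leverage, Cook's distance, or Mahalanobis distance rather than relying on any single rule.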