Tuesday 9 January 2007

Finding invalid values

Data might contain errors, which can be caught by simple cleaning routines. Invalid data values can include anything from simple out-of-range values to complex combinations of values that should not occur.

This sample contains a variable, quantity, for which only integer values are valid. The command syntax below excludes the invalid (non-integer) values from the analysis.

[Syntax]
DATA LIST FREE /quantity.
BEGIN DATA
1 1.1 2 5 8.01 2.31 4.11 5.85
END DATA.
COMPUTE filtervar=(MOD(quantity,1)=0).
FILTER by filtervar.
SUMMARIZE
/TABLES=quantity
/FORMAT=LIST CASENUM NOTOTAL
/CELLS=COUNT.
FILTER OFF.
[/Syntax]

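The COMPUTE command creates a new variable, filtervar, using the MOD function. MOD(quantity,1) returns the remainder when quantity is divided by 1. If that remainder is 0, the expression is true and filtervar is set to 1, which is the case for integer values. For non-integer values the expression is false and filtervar is set to 0, so FILTER excludes those cases.
*Note: this solution filters out the entire case, including valid values for other variables in the data file!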
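
The side effect noted above becomes visible once the file has a second variable. The following is only a sketch: price is a hypothetical variable added for illustration and is not part of the original sample.

[Syntax]
* price is a hypothetical second variable; every price value is valid.
DATA LIST FREE /quantity price.
BEGIN DATA
1 10 1.1 20 2 30 8.01 40
END DATA.
* Flag cases where quantity is an integer, then filter on that flag.
COMPUTE filtervar=(MOD(quantity,1)=0).
FILTER BY filtervar.
* price is now summarized only for the cases that passed the filter.
DESCRIPTIVES VARIABLES=price
/STATISTICS=MEAN.
FILTER OFF.
[/Syntax]

With the filter on, only the two cases with an integer quantity (1 and 2) should remain, so price is summarized over 2 cases even though all 4 price values are valid.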


A slightly better solution is to assign invalid values to a user-missing category, which identifies values that should be excluded or treated in a special manner for that variable only.

[Syntax]
DATA LIST FREE /quantity.
BEGIN DATA
1 1.1 2 5 8.01 2.31 4.11 5.85
END DATA.
* Recode every value with a nonzero remainder to the code -9.
IF (MOD(quantity,1) NE 0) quantity = (-9).
MISSING VALUES quantity (-9).
VALUE LABELS quantity -9 "Non-integer values".
SUMMARIZE
/TABLES=quantity
/FORMAT=LIST CASENUM NOTOTAL
/CELLS=COUNT.
[/Syntax]
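
To see why the user-missing approach is gentler than filtering, it can help to add a second variable. This is a sketch under an assumption: price is a hypothetical variable introduced here for illustration, not part of the original sample.

[Syntax]
* price is a hypothetical second variable; every price value is valid.
DATA LIST FREE /quantity price.
BEGIN DATA
1 10 1.1 20 2 30 8.01 40
END DATA.
* Recode non-integer quantities to -9 and declare -9 as user-missing.
IF (MOD(quantity,1) NE 0) quantity = (-9).
MISSING VALUES quantity (-9).
VALUE LABELS quantity -9 "Non-integer values".
* Missing values are handled per variable, so price keeps all of its cases.
DESCRIPTIVES VARIABLES=quantity price
/STATISTICS=MEAN.
[/Syntax]

Here quantity should be summarized over its 2 valid cases, while price is still summarized over all 4 cases, because the missing code applies only to quantity.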
