Missing data can be very annoying to a programmer. In fact, it is so annoying that we often write separate programs to clean up data and eliminate unpleasant conditions so that the main program doesn't have to deal with them. Here, I'll show some examples of the kinds of problems we see.
Let's take a common data format, a tab-delimited file. A simplistic Perl program to read such a file might be:
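As a minimal sketch of that kind of reader (not the original listing; the three-field layout, the sample records, and the `parse_tab_line` helper are all assumptions for illustration), the key detail is the `-1` limit passed to `split`, without which Perl silently drops trailing empty fields:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one tab-delimited record into fields.
# The -1 limit keeps trailing empty fields, which split drops by default.
sub parse_tab_line {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

# Hypothetical sample records; the second and third have missing fields.
my @records = ("fred\t42\tboston\n", "mary\t\tdenver\n", "tom\t17\t\n");

for my $record (@records) {
    my ($name, $age, $city) = parse_tab_line($record);
    # Report empty fields explicitly instead of letting columns shift.
    printf "%-6s %-6s %s\n",
        map { length($_) ? $_ : '(none)' } ($name, $age, $city);
}
```

Because every tab produces a field boundary, an empty value stays in its own position and the fields after it do not slide left.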
The Perl script works, but the shell script doesn't. Here's the output if the input file looks like this:
but the shell script messes up:
But things can be worse. For example, if we are processing what was once a report format, we may have no delimiters, just empty space. We might see something like this:
Which will produce:
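For column-aligned input of this kind, where a blank column leaves no delimiter behind, one hedged option is to cut each line by position with Perl's `unpack` rather than splitting on whitespace. The column widths (10, 5, 10), the sample lines, and the `parse_fixed_line` helper below are assumptions, not taken from the original report:

```perl
use strict;
use warnings;

# Assumed layout: name in columns 1-10, age in 11-15, city in 16-25.
# "A" extracts a space-padded field and strips its trailing spaces.
my $template = 'A10 A5 A10';

sub parse_fixed_line {
    my ($line) = @_;
    chomp $line;
    # Pad short lines so every record covers the full assumed width.
    $line = sprintf '%-25s', $line;
    return unpack $template, $line;
}

# Hypothetical report lines.
my @lines = (
    "fred      42   boston\n",
    "mary           denver\n",   # age column is blank
    "tom       17\n",            # line ends before the city column
);

for my $line (@lines) {
    my ($name, $age, $city) = parse_fixed_line($line);
    print join('|', $name, $age, $city), "\n";
}
```

A blank column comes back as an empty string in its own position, so the program can test for it directly instead of guessing which value went missing.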
Text::ParseWords module:
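As a hedged illustration of what the module offers (the comma-delimited input and the `parse_csv_line` helper are assumptions, not the author's data), its `parse_line` function splits on a delimiter while honoring quotes, and an empty field between two delimiters comes back as an empty string:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split a comma-delimited record; quoted fields may contain commas.
# The second argument (0) tells parse_line to strip the quotes.
sub parse_csv_line {
    my ($line) = @_;
    chomp $line;
    return parse_line(',', 0, $line);
}

# Hypothetical record: the third field is missing.
my @fields = parse_csv_line(qq{fred,"Boston, MA",,42\n});

for my $i (0 .. $#fields) {
    my $value = (defined($fields[$i]) && length($fields[$i]))
        ? $fields[$i]
        : '(none)';
    print "field $i: $value\n";
}
```

The `defined` check is deliberate: it guards against fields that come back undefined rather than empty, so both cases are reported the same way.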
means that I have 1 and 3 on line 1, only 2 on line 2, and only 3 on line 3. It's actually much worse than this: there are other fields, some of which are always present and some of which are not, and it is quite a challenge to normalize this stuff to be able to massage the data. The way to handle it is to split on / / and then determine what we got. So it's something like this:
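A minimal sketch of that split-and-inspect idea follows. It assumes, purely for illustration, that each value occupies its own slot and that slots are separated by exactly one space, so an absent value leaves an empty string behind. The `present_fields` helper and the sample lines are hypothetical, and real data with more fields would need more checks:

```perl
use strict;
use warnings;

# Split on a single literal space (the regex / /, not the string ' ')
# so an absent value shows up as an empty string and the remaining
# values keep their positions.
sub present_fields {
    my ($line) = @_;
    chomp $line;
    my @slots = split / /, $line, -1;
    my %found;
    for my $i (0 .. $#slots) {
        # Slot numbers are 1-based to match the "1, 2, 3" wording above.
        $found{ $i + 1 } = $slots[$i] if length $slots[$i];
    }
    return %found;
}

# Hypothetical lines: values in slots 1 and 3, only slot 2, only slot 3.
for my $line ("a  c\n", " b\n", "  c\n") {
    my %found = present_fields($line);
    print "present: ",
        join(', ', map { "$_=$found{$_}" } sort { $a <=> $b } keys %found),
        "\n";
}
```

Note that `split ' '` (a quoted single space) behaves differently: it collapses runs of whitespace and discards leading empty fields, which would lose exactly the positional information this approach depends on.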
http://www.aplawrence.com
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com
Handling Missing Data in Inputs