Tab Separated File Returns Wrong Number of Fields

So we had a CSV/TSV file with tabs separating each field but one of the fields was a description field that contained spaces. When parsing this TSV file some programs interpreted the spaces as field delimiters (TSV might be a bad idea!).

Input file similar to:

NULL NULL 7.0 Huayang Time Warner Cable 4430

The ‘Time Warner Cable’ field is all one field.

For example:

cat some_stats.csv | awk '{print NF }'

So with no specific delimiter set awk tells us that the number of fields (NF variable in awk) is 68 – thus treating both tabs and spaces as delimiters.
If we specifically tell it to use tabs as the delimiter:

cat some_stats.csv | awk -F"\t" '{print NF }'

It tells us there are 66 fields.

One of the ways around this is to convert the tab separated file, changing the tabs to something else e.g.

cat some_stats.csv | tr ' ' '!' | less

This gives us the correct number of fields (66 in this case) and preserves the spaces in the field that needs them.

 

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>