Tab Separated File Returns Wrong Number of Fields

So we had a CSV/TSV file with tabs separating each field but one of the fields was a description field that contained spaces. When parsing this TSV file some programs interpreted the spaces as field delimiters (TSV might be a bad idea!).

Input file similar to:

NULL NULL 7.0 Huayang Time Warner Cable 4430

The ‘Time Warner Cable’ field is all one field.

For example:

cat some_stats.csv | awk '{print NF }'

So with no specific delimiter set awk tells us that the number of fields (NF variable in awk) is 68 – thus treating both tabs and spaces as delimiters.
If we specifically tell it to use tabs as the delimiter:

cat some_stats.csv | awk -F"\t" '{print NF }'

It tells us there are 66 fields.

One of the ways around this is to convert the tab separated file, changing the tabs to something else e.g.

cat some_stats.csv | tr ' ' '!' | less

This gives us the correct number of fields (66 in this case) and preserves the spaces in the field that needs them.

Tab Separated File Returns Wrong Number of Fields

Leave a Reply

Popular

Comments

Archives