Tab Separated File Returns Wrong Number of Fields

So we had a CSV/TSV file with tabs separating each field but one of the fields was a description field that contained spaces. When parsing this TSV file some programs interpreted the spaces as field delimiters (TSV might be a bad idea!).

Input file similar to:

The ‘Time Warner Cable’ field is all one field.

For example:

So with no specific delimiter set awk tells us that the number of fields (NF variable in awk) is 68 – thus treating both tabs and spaces as delimiters.
If we specifically tell it to use tabs as the delimiter:

It tells us there are 66 fields.

One of the ways around this is to convert the tab separated file, changing the tabs to something else e.g.

This gives us the correct number of fields (66 in this case) and preserves the spaces in the field that needs them.


Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">