smbclient: Special Characters and UTF-8

I have a PHP web application that communicates with Windows shares via the smbclient binary. My problem was that some characters (particularly accented characters such as é) were not being shown on the PHP page and were even causing lines of output to be omitted. After ensuring the PHP page was outputting UTF-8, I moved on to checking the output at the command line and comparing various shell_exec commands. This narrowed the problem to the smbclient command rather than the shell_exec call.
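One way to run that kind of command-line check is to test whether the bytes coming out of a command are valid UTF-8 at all. The snippet below is only a sketch, assuming iconv is installed. The helper name is_valid_utf8, the share path and the credentials file are hypothetical and not from my setup:

```shell
# Succeeds if stdin is valid UTF-8; iconv exits non-zero on an invalid byte sequence.
# is_valid_utf8 is a hypothetical helper name.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Against a real share (hypothetical server, share and auth file; substitute your own):
# smbclient //server/share -A /path/to/authfile -c 'ls' | is_valid_utf8 || echo "not valid UTF-8"

# Self-contained demonstration with the word "héllo":
printf 'h\303\251llo\n' | is_valid_utf8 && echo "UTF-8 bytes: valid"
printf 'h\351llo\n' | is_valid_utf8 || echo "single-byte (Latin-1 style) bytes: not valid UTF-8"
```

If the smbclient output fails this check, the problem is in the bytes themselves and not in how the PHP page renders them.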

So I started to investigate the charset settings in Samba. Running the following command showed me the default charset settings for Samba:

testparm -v | grep "charset"

dos charset = CP850
unix charset = UTF-8
display charset = LOCALE
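If you want to pull out a single value in a script instead of reading the grep output by eye, something like the following could work. It is a minimal sketch that assumes the "name = value" layout shown above, and get_param is a hypothetical helper name:

```shell
# Print the value of one parameter from `testparm -v` style output on stdin.
# get_param is a hypothetical helper; it assumes "name = value" lines.
get_param() {
    awk -F' *= *' -v key="$1" '
        { name = $1; sub(/^[ \t]+/, "", name) }   # strip leading indentation
        name == key { print $2; exit }
    '
}

# Against the live configuration (-s skips the interactive prompt):
# testparm -s -v 2>/dev/null | get_param "display charset"

# Self-contained demonstration using the values listed above:
printf '\tdos charset = CP850\n\tunix charset = UTF-8\n\tdisplay charset = LOCALE\n' \
    | get_param "display charset"
```

The demonstration line prints LOCALE, matching the default shown above.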

This was giving me my unwanted output. In the screenshot below, note the ‘h  h  jonny’ line (and compare it with the output in the next image): it should actually be three lines, but each héllo is cut off at the é:

[Screenshot: Selection_178]

There is much more detail on the charsets used in Samba/Windows here, but I found that editing the smb.conf file and manually setting the display charset to UTF-8 solved my problem. This was despite my Linux locale being en_US.UTF-8:

vi /etc/samba/smb.conf

dos charset = CP850
unix charset = UTF-8
display charset = UTF-8

[Screenshot: Selection_179]

And the web version:

[Screenshot: Selection_180]

