Real Software Forums
http://forums.realsoftware.com/

Convert a COMMA delimited file to a TAB delimited file
http://forums.realsoftware.com/viewtopic.php?f=21&t=43951

Author:  DaveS [ Thu May 10, 2012 3:30 pm ]
Post subject:  Convert a COMMA delimited file to a TAB delimited file

and HONOR Double quotes at the same time!

It determines whether the file is comma or tab delimited, converts it to TAB if it is comma delimited, and otherwise leaves it alone.

You need to supply INP_F and OUT_F as FolderItems.

Dim t As TextInputStream
Dim i As Integer
Dim j As Integer
Dim tb As Integer ' number of TABs seen
Dim cm As Integer ' number of commas seen
Dim s As String
Dim list(-1) As String ' the lines of the file
Dim temp(-1) As String ' the fields of one line
tb=0
cm=0
t=TextInputStream.Open(inp_f)
s=t.ReadAll
t.Close
s=ReplaceLineEndings(s,EndOfLine.UNIX)
list=Split(s,EndOfLine.UNIX)
// remove blank lines and determine delimiter
If list.Ubound>=0 Then ' >=0 so a one-line file is checked too
  For i=list.Ubound DownTo 0
    s=Trim(list(i))
    If s="" Then
      list.Remove i
    Else
      tb=tb+CountFields(list(i),ChrB(9))-1
      cm=cm+CountFields(list(i),",")-1
    End If
  Next i
End If
//
// If commas outnumber TABs then it must be a comma delimited file
//
If cm>tb Then ' file is COMMA delimited! change it to TAB (watch out for ")
  For i=0 To list.Ubound
    s=list(i)
    If InStr(s,ChrB(34))=0 Then ' no " so do it fast
      s=ReplaceAll(s,",",ChrB(9))
    Else
      temp=Split(s,",")
      ' a piece with an odd number of " was cut inside a quoted field: glue the next piece back on
      j=0
      While j<=temp.Ubound
        If (CountFields(temp(j),ChrB(34)) Mod 2)=0 And j<temp.Ubound Then
          temp(j)=temp(j)+","+temp(j+1)
          temp.Remove j+1
        Else
          j=j+1
        End If
      Wend
      ' drop the enclosing quotes of each quoted field
      For j=0 To temp.Ubound
        s=Trim(temp(j))
        If Len(s)>=2 And Left(s,1)=ChrB(34) And Right(s,1)=ChrB(34) Then
          temp(j)=Mid(s,2,Len(s)-2)
        End If
      Next j
      s=Join(temp,ChrB(9))
      s=ReplaceAll(s,ChrB(34)+ChrB(34),"'") ' a doubled "" (escaped quote) becomes '
    End If
    list(i)=s
  Next i
End If
//
// Write the File back out
//
Dim xxx As TextOutputStream
xxx=TextOutputStream.Create(out_f)
s=Join(list,EndOfLine.UNIX)
xxx.Write s
xxx.Close
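
The delimiter check at the top of the code is easy to try on its own. Below is a minimal sketch in Python (not REALbasic, purely as an illustration; the name guess_delimiter is made up here). It applies the same rule: count tabs and commas on the non-blank lines and treat the file as comma delimited only when commas outnumber tabs.

```python
def guess_delimiter(text: str) -> str:
    """Return "," or a tab, whichever the non-blank lines suggest."""
    tabs = 0
    commas = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        tabs += line.count("\t")
        commas += line.count(",")
    # Same rule as the post: commas must strictly outnumber tabs.
    return "," if commas > tabs else "\t"


comma_text = "a,b,c\n\n1,2,3\n"
tab_text = "a\tb\n1\t2,5\n"
```

Like the original, this also counts commas that sit inside quoted values, so it is a heuristic and not a guarantee.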

Author:  NaNdummy [ Fri May 11, 2012 10:07 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

Why not use replaceall?

Author:  DaveS [ Fri May 11, 2012 10:12 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

because THIS is a valid comma delimited string


1234 , "Jones, Jim", Fred, " 1,2,3,4,5 "

a replace all would be wrong in this case as it would result in

1234 -> "Jones -> Jim" -> Fred -> " 1 -> 2 -> 3 -> 4 -> 5 "

where the correct output would be

1234 -> Jones,Jim -> Fred -> 1,2,3,4,5


and if you look close.. it DOES use a simple replaceall if there are no DOUBLE QUOTES in the string
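
To see the difference concretely, here is a small Python sketch (an illustration only, using Python's standard csv module and not the REALbasic code above; csv_line_to_tab is a hypothetical helper name). It runs the example line through a blind replace and through a quote-aware parser. It assumes that spaces after a delimiter and around a value can be dropped, which is how the expected output above reads.

```python
import csv


def csv_line_to_tab(line: str) -> str:
    """Convert one comma-delimited line to tab-delimited, honoring double quotes."""
    # skipinitialspace lets a quote that follows ", " still open a quoted value.
    fields = next(csv.reader([line], skipinitialspace=True))
    # Assumption: surrounding whitespace is not significant, so strip it.
    return "\t".join(field.strip() for field in fields)


line = '1234 , "Jones, Jim", Fred, " 1,2,3,4,5 "'

naive = line.replace(",", "\t")   # 8 tabs: every comma becomes a delimiter
aware = csv_line_to_tab(line)     # '1234\tJones, Jim\tFred\t1,2,3,4,5'
```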

Author:  NaNdummy [ Fri May 11, 2012 10:35 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

Oops, sorry.

Author:  Bob Coleman [ Fri May 11, 2012 10:49 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

Don't feel bad. The whole point of these forums is to share knowledge and learn. :)

Author:  eduo [ Wed May 30, 2012 7:49 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

DaveS wrote:
because THIS is a valid comma delimited string


1234 , "Jones, Jim", Fred, " 1,2,3,4,5 "

a replace all would be wrong in this case as it would result in

1234 -> "Jones -> Jim" -> Fred -> " 1 -> 2 -> 3 -> 4 -> 5 "

where the correct output would be

1234 -> Jones,Jim -> Fred -> 1,2,3,4,5


and if you look close.. it DOES use a simple replaceall if there are no DOUBLE QUOTES in the string



One comment.

This would be a valid record as well:

1234,"1234",\"1234,"12,34",1234\"

Contents translate to:
1234
1234
"1234
12,34
1234"

When you find a quote, double quote or a comma you have to backpedal one position to see if it's escaped. If it is then it's a plain character and neither a delimiter nor an enclosure. Likewise a backslash is an escape character and should be translated as a backslash only if doubled.

Obviously, this only applies if you want to support escaping, use backslash as the escape character, and escape only certain characters, taking the backslash literally everywhere else.

Author:  ktekinay [ Wed May 30, 2012 8:14 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

eduo wrote:
When you find a quote, double quote or a comma you have to backpedal one position to see if it's escaped. If it is then it's a plain character and neither a delimiter nor an enclosure. Likewise a backslash is an escape character and should be translated as a backslash only if doubled.

Obviously, this only applies if you want to support escaping, use backslash as the escape character, and escape only certain characters, taking the backslash literally everywhere else.

Backpedalling would be insufficient. Suppose you had this string:

something,else\\,entirely

This is three values, the middle of which is "else\", but if you backpedal, your code would think it was two values, the second being "else\,entirely".

A better solution is to split all the characters into an array, then evaluate them each in order, skipping the ones that are appropriate to skip. You could even account for EndOfLine chars between quotes that way.
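
A short Python sketch of the difference (Python only for illustration; both function names are hypothetical). The first function looks back one position, the second walks forward and consumes each backslash together with the character it escapes. Quotes are left out to keep the point about escapes visible.

```python
def split_backpedal(s: str) -> list[str]:
    """Naive: a comma is literal if the single character before it is a backslash."""
    fields, current = [], []
    for i, ch in enumerate(s):
        if ch == "," and not (i > 0 and s[i - 1] == "\\"):
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def split_forward(s: str) -> list[str]:
    """Walk forward; a backslash takes the next character literally."""
    fields, current = [], []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s):
            current.append(s[i + 1])  # escaped character, kept as plain text
            i += 2
        elif ch == ",":
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


sample = r"something,else\\,entirely"
wrong = split_backpedal(sample)   # 2 values: the comma after \\ is missed
right = split_forward(sample)     # 3 values: 'something', 'else\', 'entirely'
```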

Author:  DaveS [ Wed May 30, 2012 8:31 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

Common convention is to use double double quotes to indicate a literal double quote... -OR- to use \"

However... it is also common that \" is always the sequence to escape a double quote, and that \\ escapes a literal \
with the \\ taking precedence over \"

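So "test\",test" becomes test",test (one value: the quote is escaped, so the comma is still inside the quotes)
and "test\\",test" becomes test\ -> test (two values: the \\ is a literal \, so the quote after it closes the value)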


neither situation is covered by the code I posted.


NOTE the use of the word "common". There ARE NO PUBLISHED "STANDARDS" for CSV... just guidelines, and it is up to each implementation to decide how or if it will handle certain situations.

To avoid these situations..... start with a TAB DELIMITED FILE
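
For anyone who wants to compare the two conventions, Python's standard csv module can be switched between them. This is a sketch for illustration only (Python, not REALbasic; the two function names are made up). What it does with an unmatched trailing quote is that module's own choice, which is one more example of implementations deciding for themselves.

```python
import csv


def parse_doubled(line):
    """Doubled quotes ("") stand for one literal quote (the csv module default)."""
    return next(csv.reader([line]))


def parse_backslash(line):
    """Backslash escapes the next character; "" has no special meaning."""
    return next(csv.reader([line], escapechar="\\", doublequote=False))


print(parse_doubled('"say ""hi""",x'))      # ['say "hi"', 'x']
print(parse_backslash(r'"test\",test"'))    # ['test",test']
# Here \\ is a literal backslash, so the quote closes the value and the
# comma splits; this module keeps the unmatched final quote as plain text.
print(parse_backslash(r'"test\\",test"'))   # ['test\\', 'test"']
```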

Author:  ktekinay [ Wed May 30, 2012 8:43 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

DaveS wrote:
To avoid these situations..... start with a TAB DELIMITED FILE

Best advice of the day. :-)

Author:  ktekinay [ Wed May 30, 2012 10:51 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

Here is another way to approach this. It doesn't have the cool feature of figuring out whether the string should be converted at all, but this preserves the encoding of the original string and should be pretty fast. Note that I use StrComp because it is faster than "=".

Function CSVToTab(s As String) As String
  // Converts a comma-delimited string to tab-delimited.
  // Assumes that "\" is an escape character and quotes should
  // be ignored unless escaped.
  // Values between quotes are taken in their entirety.

  dim enc as TextEncoding = s.Encoding

  dim tab as string = enc.Chr( 9 )
  dim quote as string = """"
  quote = quote.ConvertEncoding( enc )
  dim comma as string = ","
  comma = comma.ConvertEncoding( enc )
  dim backslash as string = "\"
  backslash = backslash.ConvertEncoding( enc )

  dim chars() as string = s.Split( "" )
  dim newChars() as string

  dim inQuote as boolean
  dim lastCharIndex as integer = chars.Ubound
  dim i as integer
  while i <= lastCharIndex
    dim thisChar as string = chars( i )
    dim nextChar as string
    if i < lastCharIndex then nextChar = chars( i + 1 )

    select case true
    case StrComp( thisChar, quote, 0 ) = 0
      inQuote = not inQuote
      i = i + 1

    case not inQuote and StrComp( thisChar, comma, 0 ) = 0
      newChars.Append tab
      i = i + 1

    case StrComp( thisChar, backslash, 0 ) = 0
      newChars.Append nextChar
      i = i + 2

    else
      newChars.Append thisChar
      i = i + 1

    end select

  wend

  dim r as string = join( newChars, "" ).ConvertEncoding( enc )
  return r

End Function

Author:  Nanoswitch [ Tue Mar 12, 2013 7:24 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

I like your code a lot, as it is fast and I need to process a lot of big files from all across Europe. Because not all Windows localisations use the same delimiter for CSV files coming from Excel, I adapted the function a little so I can pass the appropriate delimiter from outside.

Thanks for sharing this function; it works like a champ.

Function CSVToTab(s As String,delimiter As String) As String
  // Converts a separator-delimited string to tab-delimited.
  // Assumes that "\" is an escape character and quotes should
  // be ignored unless escaped.
  // Values between quotes are taken in their entirety.
  dim enc as TextEncoding = s.Encoding
  dim tab as string = enc.Chr( 9 )
  dim quote as string = """"
  quote = quote.ConvertEncoding( enc )
  dim Separator as string = delimiter
  Separator = Separator.ConvertEncoding( enc )
  dim backslash as string = "\"
  backslash = backslash.ConvertEncoding( enc )
  dim chars() as string = s.Split( "" )
  dim newChars() as string
  dim inQuote as boolean
  dim lastCharIndex as integer = chars.Ubound
  dim i as integer
  while i <= lastCharIndex
    dim thisChar as string = chars( i )
    dim nextChar as string
    if i < lastCharIndex then nextChar = chars( i + 1 )
    select case true
    case StrComp( thisChar, quote, 0 ) = 0
      inQuote = not inQuote
      i = i + 1
    case not inQuote and StrComp( thisChar, Separator, 0 ) = 0
      newChars.Append tab
      i = i + 1
    case StrComp( thisChar, backslash, 0 ) = 0
      newChars.Append nextChar
      i = i + 2
    else
      newChars.Append thisChar
      i = i + 1
    end select
  wend
  dim r as string = join( newChars, "" ).ConvertEncoding( enc )
  return r
End Function
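
For readers who want to check what this scan does without a REALbasic project at hand, here is a rough Python rendering of the same loop (an illustration only; csv_to_tab is a name chosen here, and the encoding handling is left out because Python strings do not need it). Since characters are compared one at a time, the delimiter has to be a single character.

```python
def csv_to_tab(s: str, delimiter: str = ",") -> str:
    """Replace delimiters outside quotes with tabs; drop quotes; honor backslash escapes."""
    out = []
    in_quote = False
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == '"':
            in_quote = not in_quote  # quotes only toggle state, they are not copied
            i += 1
        elif not in_quote and ch == delimiter:
            out.append("\t")
            i += 1
        elif ch == "\\":
            # Copy the escaped character as plain text (nothing if the line ends here).
            out.append(s[i + 1] if i + 1 < len(s) else "")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Semicolon-delimited line, as some localised Excel versions export it.
semi = csv_to_tab('1,5;"a;b";c', ";")
# The record from earlier in the thread with escaped quotes.
escaped = csv_to_tab(r'1234,"1234",\"1234,"12,34",1234\"')
```

The second call yields the five values listed earlier in the thread: 1234, 1234, "1234, 12,34 and 1234".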

Author:  npalardy [ Tue Mar 12, 2013 10:11 am ]
Post subject:  Re: Convert a COMMA delimited file to a TAB delimited file

http://great-white-software.com/CSVParser.zip

All times are UTC - 5 hours