Real Software Forums
http://forums.realsoftware.com/

Sorting of Portuguese Language
http://forums.realsoftware.com/viewtopic.php?f=1&t=48029
Page 1 of 1

Author:  bg1fpx [ Fri May 31, 2013 9:36 am ]
Post subject:  Sorting of Portuguese Language

"ábaco" (first letter is á, not a) is a Portuguese word which means "abacus".

If I use the array.Sort command to sort a Portuguese vocabulary list, "ábaco" is placed after "zoo", which is wrong.

Please tell me how to sort alphabetically. Thanks.

Author:  JeremK [ Mon Jun 03, 2013 9:04 am ]
Post subject:  Re: Sorting of Portuguese Language

Hi,

In my apps I use the following function:

Sub SortAccentuatedArray(ByRef Data() As String)
  // accents and correct must stay aligned:
  // character N of accents maps to character N of correct.
  Dim accents As String = "àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ"
  Dim correct As String = "aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY"
  Dim temp() As String

  Dim i, j As Integer
  Dim uData As Integer = UBound(Data)
  Dim str As String
  Dim uLen As Integer
  Dim Pos As Integer

  For i = 0 To uData
    str = Data(i)
    uLen = Len(str)

    // Mid is 1-based, so scan characters 1..uLen.
    For j = 1 To uLen
      // Only non-ASCII characters can be in the accent table.
      If Asc(str.Mid(j, 1)) > 127 Then
        Pos = accents.InStr(str.Mid(j, 1))
        If Pos > 0 Then
          //Replace the accentuated char.
          //Using Replace is faster than using ReplaceAll
          str = Replace(str, accents.Mid(Pos, 1), correct.Mid(Pos, 1))
        End If
      End If
    Next

    // temp holds the accent-free sort key for Data(i).
    temp.Append str
  Next

  // Sort the keys and reorder Data in parallel.
  temp.SortWith(Data)
End Sub


You can test it by adding the method to a Window,
then putting the following code in the window's Open event:

dim a() As String

a = Array("Zoo", "ère", "arbre", "ábaco")

SortAccentuatedArray(a)

MsgBox(Join(a, EndOfLine))


[Edit]: Updated the code to improve performance by 30% on an array of 89,000 entries.
The regular Array.Sort function takes ~780 ms,
and the SortAccentuatedArray function takes ~2,800 ms.

Author:  silverpie [ Mon Jun 03, 2013 11:21 am ]
Post subject:  Re: Sorting of Portuguese Language

That looks like it might fold the case as well as the accents. Not a problem really for sorting, but it could cause issues if you reuse the stripping code for something else. You could avoid that issue by using the bytewise string functions (MidB, etc.), as long as you are working with known and matching encodings.
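
A minimal sketch of a case-preserving lookup, for anyone reusing the stripping code outside of sorting. It assumes, as suggested above, that InStr matches without regard to case, so "Á" could be found at the position of "á" and end up replaced with a lowercase "a". FindExact is a hypothetical helper, not part of the posted function; it compares one character at a time with StrComp in binary mode (mode 0) and assumes both strings use the same encoding.

```xojo
Function FindExact(table As String, ch As String) As Integer
  // Returns the 1-based character position of ch in table, or 0 if absent.
  // StrComp with mode 0 compares the raw bytes, so "Á" and "á" are different.
  Dim n As Integer = Len(table)
  Dim k As Integer

  For k = 1 To n
    If StrComp(table.Mid(k, 1), ch, 0) = 0 Then
      Return k
    End If
  Next

  Return 0
End Function
```

Because it still returns a character position, the result can index the accents and correct tables with Mid in the same way as before. The Replace call is a separate matter, since it may also match without regard to case.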

Author:  JeremK [ Mon Jun 03, 2013 11:46 am ]
Post subject:  Re: Sorting of Portuguese Language

Could you please elaborate, silverpie?

I'm sorry I don't understand what you mean by "it might fold the case as well as the accents".

The function I wrote doesn't modify any of the strings in the passed array.
It only creates a new array with the accents stripped and does the sorting from there.

Author:  timhare [ Mon Jun 03, 2013 11:55 am ]
Post subject:  Re: Sorting of Portuguese Language

The bytewise string functions may not play well with UTF8 (multibyte) data.
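
A short sketch of why that is, assuming the string is UTF-8 and "á" is stored as a single precomposed character. UTF-8 encodes "á" as two bytes, so character positions and byte positions drift apart after the first accented letter. The variable names are only illustrative.

```xojo
Dim s As String = "ábaco"

// Len counts characters, LenB counts bytes.
// With multibyte characters present, nBytes is larger than nChars.
Dim nChars As Integer = Len(s)
Dim nBytes As Integer = LenB(s)

// MidB works on bytes: this returns only the first of the two
// bytes that make up "á", which is not a complete character.
Dim firstByte As String = MidB(s, 1, 1)

MsgBox(Str(nChars) + " characters, " + Str(nBytes) + " bytes")
```

For the same reason, a byte position returned by InStrB cannot be used directly as a character index into lookup tables such as accents and correct.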

All times are UTC - 5 hours