Real Software Forums

The forum for Real Studio and other Real Software products.
 Post subject: Re: About instr() performances
Posted: Thu Feb 21, 2013 6:56 am
We can help with a regular expression, but keep in mind that a regex will be slower, not faster, than InStr. Your best bet here is probably CountFields:
#pragma DisableBackgroundTasks

dim f As FolderItem
dim t as TextInputStream
dim count as integer
dim length As integer
dim start, stop as double
dim temp As string

f = GetFolderItem("test.txt")

if f.Exists then
  t = TextInputStream.Open(f)

  if t <> nil then
    temp = t.ReadAll()
    length = len(temp)
    t.Close

    ' Time only the search itself.
    start = microseconds
    ' CountFields returns one more field than there are delimiters.
    count = temp.CountFields( "TRGL" ) - 1
    stop = microseconds

    msgbox str(count) + " items found" + EndOfLine + _
      "Elapsed time: " + str((stop - start)/1000000) + " seconds"
  end if
end if

This assumes you want to count any variation of "TRGL", like "Trgl", "trgl", "TRgl", etc.

By the way, I think your code was counting the time to divide the microseconds, so I changed it.
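
If only the exact, case-sensitive token should be counted, a byte-wise loop with InStrB is one option. The following is a minimal sketch, not code from this thread: CountExactB is a hypothetical helper name, and it assumes source and find share the same encoding, since InStrB compares raw bytes.

```xojo
' Hypothetical helper (not from the thread): counts non-overlapping,
' byte-exact (and therefore case-sensitive) occurrences of find in source.
Function CountExactB(source As String, find As String) As Integer
  Dim count As Integer
  Dim findLen As Integer = LenB(find)
  If findLen = 0 Then Return 0

  Dim pos As Integer = InStrB(1, source, find)
  While pos > 0
    count = count + 1
    ' Resume just past this match; positions are 1-based byte offsets.
    pos = InStrB(pos + findLen, source, find)
  Wend

  Return count
End Function
```

With the variables from the code above, the call would be count = CountExactB(temp, "TRGL").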

_________________
Kem Tekinay
MacTechnologies Consulting
http://www.mactechnologies.com/

Need to develop, test, and refine regular expressions? Try RegExRX.


Posted: Thu Feb 21, 2013 7:39 am
ktekinay wrote:
We can help with a regular expression, but keep in mind that a regex will be slower, not faster, than InStr.

I disagree - at least in theory, regex should use the advanced algos (forgot the name) that allow it to skip bytes if the search term is longer. Worth a test.

_________________
User of RB since first version. Provider of many free and outdated plugins.
Code for sharing: http://www.tempel.org/RB/Resources
Arbed, a unique tool for editing projects: http://www.tempel.org/Arbed
Zip compression classes: http://www.tempel.org/RB/ZipPackage


Posted: Thu Feb 21, 2013 8:22 am
ktekinay wrote:
We can help with a regular expression, but keep in mind that a regex will be slower, not faster, than InStr. Your best bet here is probably CountFields:


CountFields is a good suggestion. RegEx is faster than InStr, though the time varies greatly depending on the encoding. Both vary depending on the OS, which surprised me. I tested again and realized that earlier I had left the ASCII encoding in place when I tested RegEx.

Testing with RS2012r2.1 on a MacBook 2.7 GHz i7...

ASCII
InStr: 39s
InStrB: 0.003s
RegEx: 0.027s
CountFields: 0.054s

UTF8
InStr: 39s
InStrB: 0.003s
RegEx: 8.93s
CountFields: 0.055s

Testing in a Windows 7 VM on the same machine produced two surprises...
InStr (ASCII and UTF8): 19s
RegEx (UTF8): 36.4s

The other times were comparable.

I wonder what's going on with RegEx and UTF8, especially on Windows. I've worked with the PCRE Library directly in the past and the UTF8 results should be nearly identical to the ASCII results. Testing the compiled app that uses PCRE I can't provide a Microsecond time, but the count is returned instantly.

I also wonder why InStr performance is 2x faster on Windows.

_________________
Daniel L. Taylor
Custom Controls for Real Studio WE!
Visit: http://www.webcustomcontrols.com/


Posted: Thu Feb 21, 2013 1:51 pm
tempel wrote:
I disagree - at least in theory, regex should use the advanced algos (forgot the name) that allow it to skip bytes if the search term is longer. Worth a test.

As you might imagine, I've tested regular expressions extensively, and will again for my presentation at Real World.

I didn't mean to imply that every implementation of regular expressions is slower than InStr, just that Real's implementation is, and that it is slower than other implementations of PCRE too. Of course we're talking about an equivalent search, i.e., matching single tokens with no wildcards or repeaters in a case-insensitive search. If anyone can find an example where such a search is faster with RegEx than InStr, I'd like to know.

Where it might be faster is in a loop like the OP used. In that case, InStr gets a starting position in characters whereas RegEx, like InStrB, uses bytes, but that's why Split or CountFields is a better choice for that purpose.
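
For comparison, a RegEx counting loop might look like the sketch below. It is an illustration, not the code anyone in this thread timed. It assumes temp holds the file contents as in the earlier CountFields example, and that the token contains no regex metacharacters (otherwise it would need escaping).

```xojo
Dim rx As New RegEx
rx.SearchPattern = "TRGL"
rx.Options.CaseSensitive = False ' also match "Trgl", "trgl", etc.

Dim count As Integer
Dim match As RegExMatch = rx.Search(temp)
While match <> Nil
  count = count + 1
  ' Search with no arguments continues after the previous match,
  ' so the engine keeps its own position and no offset is passed in.
  match = rx.Search
Wend

MsgBox Str(count) + " items found"
```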

_________________
Kem Tekinay

Posted: Thu Feb 21, 2013 4:10 pm
ktekinay wrote:
I didn't mean to imply that every implementation of regular expressions is slower than InStr, just that Real's implementation is, and that it is slower than other implementations of PCRE too. Of course we're talking about an equivalent search, i.e., matching single tokens with no wildcards or repeaters in a case-insensitive search. If anyone can find an example where such a search is faster with RegEx than InStr, I'd like to know.

Where it might be faster is in a loop like the OP used. In that case, InStr gets a starting position in characters whereas RegEx, like InStrB, uses bytes, but that's why Split or CountFields is a better choice for that purpose.

Well, in Taylor's testing, just above your post, RegEx is more than a thousand times faster than InStr on the ASCII data (0.027s vs. 39s). Might be faster indeed.


Posted: Thu Feb 21, 2013 4:24 pm
Yeah, I should have used "most likely" instead of "might". I'm very sleepy today. :-)

_________________
Kem Tekinay

Posted: Thu Feb 21, 2013 5:13 pm
Out of curiosity I tried adding a unique string to the end of the 1.2 MB file, and then timing InStr, InStrB, and RegEx in finding it once. All times in Microseconds:

ASCII
InStr: 52,551
InStrB: 3,860
RegEx: 1,802

UTF8
InStr: 55,636
InStrB: 2,295
RegEx: 2,492

I didn't average runs. I did note that runs could vary by +/- 500 Microseconds.

Strange...I would have assumed, as Kem did, that InStr would be faster in a non-looping test starting from the first character. I also think it's weird that for a single pass encoding doesn't matter in RegEx. But for the looping code, it matters a lot.
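
Since single runs vary this much, averaging several runs gives a steadier number. A rough sketch, under the assumption that temp holds the file contents and marker holds the appended unique string (both names are placeholders, and the run count is arbitrary):

```xojo
#pragma DisableBackgroundTasks

Dim runs As Integer = 20 ' arbitrary run count, chosen for illustration
Dim total As Double
Dim pos As Integer

For i As Integer = 1 To runs
  Dim started As Double = Microseconds
  pos = InStrB(temp, marker) ' swap in InStr or a RegEx search to compare
  total = total + (Microseconds - started)
Next

MsgBox "InStrB average: " + Format(total / runs, "#,##0") + _
  " microseconds over " + Str(runs) + " runs, Pos: " + Str(pos)
```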

_________________
Daniel L. Taylor

Posted: Thu Feb 21, 2013 5:28 pm
Can you post a link to your test project, or at least the code? I'd like to take a look.

_________________
Kem Tekinay

Posted: Thu Feb 21, 2013 6:06 pm
ktekinay wrote:
Can you post a link to your test project, or at least the code? I'd like to take a look.


Absolutely: http://webcustomcontrols.com/freeware/instr_vs.zip

I included the data file with the appended text for the Single search.

I should note, after a few more runs, that in the single search the variance is larger than I thought at first. InStrB is just as likely to be faster than RegEx as not. The values are small in any case (1,500-5,000 Microseconds).

_________________
Daniel L. Taylor

Posted: Thu Feb 21, 2013 6:25 pm
Wow, I can't tell you how many times I've tested this with exactly the opposite results, but I'm able to reproduce your findings here too. RegEx is faster even for a straight token search, and I even wrote my own, different, test before you posted the link to yours.

I wonder if I'm misremembering what I tested, or if something changed?

No matter, I stand corrected, and it's reinforced one of the principles I plan to talk about: Never make assumptions, just test, test, test. :-)

_________________
Kem Tekinay

Posted: Thu Feb 21, 2013 6:43 pm
ktekinay wrote:
Wow, I can't tell you how many times I've tested this with exactly the opposite results, but I'm able to reproduce your findings here too. RegEx is faster even for a straight token search, and I even wrote my own, different, test before you posted the link to yours.

I wonder if I'm misremembering what I tested, or if something changed?


Something probably changed in a recent RS update. Like I said above, I would have assumed as you did, and I would swear that I've tested the two in the past and went with InStr because it tested faster at the time.

_________________
Daniel L. Taylor

Posted: Thu Feb 21, 2013 6:48 pm
One thing: The longer the search string, the faster InStrB gets, whereas InStr and RegEx stay roughly the same. I added this marker to the end of a 1 MB string:

"now is the time for all good men to come to the aid of their monkeys"

My results:

InStr: 46,116 microsecs, Pos: 1048577
InStrB: 112 microsecs, Pos: 1048577
RX: 3,510 microsecs, Pos: 1048576

When the marker is simply "abc":

InStr: 45,207 microsecs, Pos: 1048577
InStrB: 2,077 microsecs, Pos: 1048577
RX: 3,419 microsecs, Pos: 1048576

This makes sense since InStrB is doing straight byte-matching without regard to encoding. (This is all with UTF8-encoded strings.)
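
The one-position difference between the InStr/InStrB results and the RX result is consistent with InStrB reporting a 1-based byte position while RegExMatch.SubExpressionStartB reports a 0-based byte offset. The sketch below is an assumption about how such a test could be built, not the project used for the numbers above:

```xojo
' Build exactly 1,048,576 bytes of filler by doubling "x" 20 times (2^20).
Dim source As String = "x"
For i As Integer = 1 To 20
  source = source + source
Next
source = source + "abc" ' the short marker from the post

' InStrB reports a 1-based byte position.
Dim posB As Integer = InStrB(source, "abc")

' SubExpressionStartB reports a 0-based byte offset.
Dim rx As New RegEx
rx.SearchPattern = "abc"
Dim match As RegExMatch = rx.Search(source)
Dim posRX As Integer = -1
If match <> Nil Then posRX = match.SubExpressionStartB(0)

MsgBox "InStrB: " + Str(posB) + EndOfLine + "RX: " + Str(posRX)
```

Adding 1 to the RegEx offset makes the two directly comparable.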

_________________
Kem Tekinay

Posted: Thu Feb 21, 2013 9:02 pm
ktekinay wrote:
One thing: The longer the search string, the faster InStrB gets, whereas InStr and RegEx stay roughly the same.
...
This makes sense since InStrB is doing straight byte-matching without regard to encoding. (This is all with UTF8-encoded strings.)


Good catch. The longer the find string, the fewer the loops to find it because InStrB can jump the length of the find string on each character mismatch. InStr and RegEx have to scan all the characters.
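
Nothing in this thread confirms which algorithm InStrB uses internally. As an illustration of the skipping idea only, here is a sketch of Boyer-Moore-Horspool, one well-known search that can jump by the length of the find string on a mismatch. HorspoolInStrB is a hypothetical helper name, and the function assumes both strings use the same encoding because it compares raw bytes.

```xojo
' Hypothetical helper: Boyer-Moore-Horspool search over raw bytes.
' Returns the 1-based byte position of find in source, or 0 if absent.
Function HorspoolInStrB(source As String, find As String) As Integer
  Dim s As MemoryBlock = source
  Dim f As MemoryBlock = find
  Dim n As Integer = LenB(source)
  Dim m As Integer = LenB(find)
  If m = 0 Or m > n Then Return 0

  ' Shift table: how far the window may move, keyed by the source byte
  ' that is aligned with the last byte of the find string.
  Dim shift(255) As Integer
  For i As Integer = 0 To 255
    shift(i) = m ' bytes absent from find allow a full-length jump
  Next
  For i As Integer = 0 To m - 2
    shift(f.Byte(i)) = m - 1 - i
  Next

  Dim pos As Integer = 0 ' 0-based start of the current window
  While pos <= n - m
    ' Compare the window against find from right to left.
    Dim j As Integer = m - 1
    While j >= 0
      If s.Byte(pos + j) <> f.Byte(j) Then Exit While
      j = j - 1
    Wend
    If j < 0 Then Return pos + 1 ' every byte matched

    pos = pos + shift(s.Byte(pos + m - 1))
  Wend

  Return 0
End Function
```

The longer the find string, the larger the typical jump, which fits the trend in the timings above.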

_________________
Daniel L. Taylor

Posted: Fri Feb 22, 2013 12:01 pm
See topic Building a faster InStr.

_________________
Kem Tekinay