I'm carrying out a search and replace function, but the program execution is slow for the amount of data that I want to crunch. Please have a look and advise if there is a better way to do things. This app is to be used on the desktop only. Ultimately the data will be a few MB.
I thought the writes to the screen - ProgressBar and % complete - may have been slowing it, but commenting them out makes no difference. (Pity, because I would have used a 1 second timer to do the screen updates instead.)
I've removed 63 of the 64 'IF' queries but that makes no difference either. (I would have been able to easily chop the 'IFs' in two but there is no point either.)
Is it because, within my For...Next loop, it takes a while to find the next row of the table?
Don't get me wrong, if you do a manual search and replace in Notepad - I could probably do it faster on paper(!), but if you load the file into MS Excel it is almost instantaneous.
The benefit of using a Table control is that I can load the file as is, with <TAB> separation, and save it with <,> separation easily.
EDIT: 'IF' replaced with 'SELECT'
Last edited by Zenerdiode : 09-19-2008 at 09:54 PM.
I've removed 63 of the 64 'IF' queries but that makes no difference either.
You must have removed different IFs. I got a dramatic speedup by removing most of the "If Table1.Cell(Table1.ColName(4),i)= ...." statements.
You are losing performance on all the IFs that fail against the one that succeeds. I would try using Select ... Case instead. I don't know how the IDE/legacy compiler handles it but the optimised compiler emitted code uses a single index into a jump table to select the target code and so should be many times faster than all those If ... Else If ... statements.
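To illustrate replacing a chain of comparisons with a single lookup, here is a minimal sketch in Python rather than Basic4ppc. It uses a dictionary, which is a hash lookup and not the compiler's jump table, so it only shows the general principle of selecting the target in one step. The names `OPERANDS` and `resolve` are hypothetical; the two codes and their replacements are copied from the code posted further down this thread.

```python
# Hypothetical lookup table: code -> (new name, state when "OPEN ", state otherwise).
OPERANDS = {
    "RCK1SLT02DI0001 ": ("SBN TR", "Down", "Up"),
    "RCK1SLT02DI0005 ": ("(CON) SR", "Up", "Down"),
}

def resolve(code, state):
    """Return the replacement (name, state) pair for one record."""
    entry = OPERANDS.get(code)
    if entry is None:
        # Unknown codes are flagged and the state is left untouched.
        return code + " <Unknown Operand>", state
    name, when_open, otherwise = entry
    return name, (when_open if state == "OPEN " else otherwise)
```

The cost of one lookup does not grow with the number of entries, whereas a chain of failed comparisons grows with the position of the matching one.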
DoEvents takes time as well. It might be worth trying something like "If (i mod 10) = 0 Then DoEvents" to call it only on every 10th pass round the loop.
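The same throttling idea, sketched in Python rather than Basic4ppc: the `refresh` callback stands in for DoEvents or a progress update, and both `process` and `refresh` are hypothetical names used only for this illustration.

```python
def process(records, refresh, every=10):
    """Handle each record, calling refresh() only on every `every`-th pass."""
    for i, record in enumerate(records):
        # ... per-record work would go here ...
        if i % every == 0:
            refresh(i)

# Record which iterations triggered a refresh for 35 dummy records.
calls = []
process(range(35), calls.append)
```

With 35 records and `every=10`, the callback runs 4 times instead of 35.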
Thanks agraham. I've used the Select...Case and it has had a *marginal* improvement.
There are 3,500 records of 60 bytes each in the sample file and it's taking 24 seconds to do the search/replace function. That's only about 150 records per second. That just sounds incredibly slow - I'm using a 2.0GHz Core Duo and would have thought even 150,000 per second is slow.
This task is not suited for a table control.
Each time you update a field the table needs to update its internal indexing.
I've updated the code to read the input file one line at a time, update this line and save it to the output file.
Takes me 0.06 seconds (60,000 records per second):
* Several select cases are removed from the following code because this post was too long (more than 10000 characters).
Code:
Sub Globals
    'Declare the global variables here.
    Dim pars(0)
End Sub

Sub App_Start
    Form1.Show
End Sub

Sub Button1_Click
    OpenDialog1.Filter = "CDS Files|*.txt"
    If OpenDialog1.Show <> cCancel Then
        OrigFileName=OpenDialog1.File
        TextBox1.Text=OrigFileName
        If TextBox2.Text="" Then
            TextBox2.Text=StrInsert(OrigFileName,(StrLength(OrigFileName)-4)," [Resolved]")
            TextBox2.Text=SubString(TextBox2.Text,0,(StrLength(TextBox2.Text)-3))&"csv"
        End If
    End If
End Sub

Sub Button2_Click
    SaveDialog1.Filter = "CSV Files|*.csv"
    If SaveDialog1.Show <> cCancel Then
        TextBox2.Text=SaveDialog1.File
    End If
End Sub

Sub Button3_Click
    If TextBox1.Text="" OR TextBox2.Text="" Then
        Return
    End If
    Label4.Text="Decoding..."
    Label4.Visible=True
    Label6.Text=""
    Label6.Visible=True
    DoEvents
    Table1.LoadCSV(TextBox1.Text,Chr(9),False,True)
    Msgbox(Table1.RowCount)
    t=Now
    FileOpen(IN,textbox1.Text,cRead)
    FileOpen(OUT,textbox2.Text,cWrite)
    line = FileRead(IN)
    Do While line <> EOF
        pars() = StrSplit(line,cTab)
        Select pars(4)
            Case "RCK1SLT02DI0001 "
                If pars(5)="OPEN " Then
                    pars(5)="Down"
                Else
                    pars(5)="Up"
                End If
                pars(4)="SBN TR"
            Case "RCK1SLT02DI0002 "
                If pars(5)="OPEN " Then
                    pars(5)="Down"
                Else
                    pars(5)="Up"
                End If
                pars(4)="SBM TR"
            Case "RCK1SLT02DI0003 "
                If pars(5)="OPEN " Then
                    pars(5)="Down"
                Else
                    pars(5)="Up"
                End If
                pars(4)="(TEST) R"
            Case "RCK1SLT02DI0004 "
                If pars(5)="OPEN " Then
                    pars(5)="Down"
                Else
                    pars(5)="Up"
                End If
                pars(4)="RECPR"
            Case "RCK1SLT02DI0005 "
                If pars(5)="OPEN " Then
                    pars(5)="Up"
                Else
                    pars(5)="Down"
                End If
                pars(4)="(CON) SR"
            Case "RCK1SLT02DI0006 "
                If pars(5)="OPEN " Then
                    pars(5)="Down"
                Else
                    pars(5)="Up"
                End If
                pars(4)="HJPR"
            Case "RCK1SLT02DI0007 "
                If pars(5)="OPEN " Then
                    pars(5)="Down"
                Else
                    pars(5)="Up"
                End If
                pars(4)="(UP) KR"
            Case Else
                pars(4)=pars(4)&" <Unknown Operand>"
        End Select
        lineout = pars(0) & "," & pars(1) & "," & pars(2) & "," & pars(3) & "," & pars(4) & "," & pars(5)
        FileWrite(OUT,lineout)
        line = FileRead(IN)
    Loop
    FileClose(IN)
    FileClose(OUT)
    Label4.Text="Done"
    Msgbox("Done in "&Format((Now-t)/10000000,"F2")&" seconds.","Decode",cMsgboxOK,cMsgboxExclamation)
    TextBox1.Text=""
    TextBox2.Text=""
    Label4.Visible=False
    Label6.Visible=False
End Sub
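For readers who don't use Basic4ppc, the line-at-a-time approach can be sketched in Python as follows. This is only an illustration of the streaming pattern (read one line, split on tabs, rewrite, write it out with commas), not a port of the code above; `convert` is a hypothetical name and the in-memory streams stand in for the input and output files.

```python
import io

def convert(src, dst):
    """Copy tab-separated lines from src to dst as comma-separated lines."""
    count = 0
    for line in src:
        fields = line.rstrip("\n").split("\t")
        # ... per-record search and replace on fields would go here ...
        dst.write(",".join(fields) + "\n")
        count += 1
    return count

# Small in-memory demonstration; real use would pass open file objects.
source = io.StringIO("a\tb\tc\nd\te\tf\n")
target = io.StringIO()
converted = convert(source, target)
```

Each record is handled once and then discarded, so no table or index has to be kept up to date while the fields change.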
It certainly isn't. Assignment to a cell is astonishingly expensive!
Quote:
Each time you update a field the table needs to update its internal indexing.
And then some! I've looked at what's going on inside the CLR with Reflector and each assignment builds a new DataView from the underlying data, which is expensive in itself, and then goes on to index into that to make the changes. I can't find where the changes are reflected back to the underlying table but that's got to be expensive as well. I'm straying out of my field of expertise here but I guess that a lot of this expense is to maintain data integrity in multi-user situations where the tables are tied to a multi-user database and might possibly be subject to simultaneous updates.
Erel & Andrew, many thanks to both of you for your efforts with this. The difference in using the file directly without using the table is phenomenal. I just used the Table control through my inexperience as it produced the results I wanted; albeit at colossal expense of processing cycles. I also hadn't used the Select...Case methods before, again just going with what I knew from other languages. (Hence all of those IFs...)