How to read large csv files?

Yafuhenk

Active Member
Licensed User
Longtime User
Hi,

When reading csv files with StringUtils.Loadcsv I don't have any problem for files smaller than ± 2 MB. Larger files let the program crash.

I read on this forum that one should use Textreader. I did but still with larger files still let the program crash.
Does anybody knows how i can solve this?

This is de code I used to read the csv file:
B4X:
   Dim Reader As TextReader
      Reader.Initialize(File.OpenInput(File.DirRootExternal, "test4.csv"))
      Dim line As String
      line = Reader.ReadLine
      List1.Initialize
   Do While line <> Null
       Dim LineArray() As String
      LineArray = Regex.Split(";", line)
      List1.Add(LineArray)
        line = Reader.ReadLine
      Loop

Thanks for any feedback

Henk
 

Cableguy

Expert
Licensed User
Longtime User
Why not use a simple sql database?
A .csv file with 2mb carries a LOT of info...
In that case, a database would be easier to work with, and pehaps even faster to process...
 
Upvote 0

Yafuhenk

Active Member
Licensed User
Longtime User
Hi Thanks for your reply.

I want to import the csv into a database.
If you know an easy way to do it then please tell me.

This is the way I did it now

B4X:
Sub btnImport_Click
   Dim i As Int
   Dim FileDialog1 As FileDialog
   FileDialog1.FilePath = File.DirRootExternal
   If FileDialog1.Show("Choose csv file", "Ok", "Cancel", "", Null) = DialogResponse.POSITIVE Then
   'read csv file
      'read csv
      List1.Initialize
      List1 = StringUtils1.LoadCSV(FileDialog1.FilePath, FileDialog1.ChosenName, "";"") 
   Else
      Return False
   End If
   'Without DoEvents the progressbar doesn´t work
   DoEvents
   pgbMain.Progress = 0
   pgbMain.Visible = True
'   Dim Reader As TextReader
'      Reader.Initialize(File.OpenInput(File.DirRootExternal, "test4.csv"))
'      Dim line As String
'      line = Reader.ReadLine
'      List1.Initialize
'   Do While line <> Null
'       Dim LineArray() As String
'      LineArray = Regex.Split(";", line)
'      List1.Add(LineArray)
'        line = Reader.ReadLine
'      Loop
'      Reader.Close
   SQL1.Initialize(File.DirRootExternal, "test1.db", False)
   SQL1.ExecNonQuery("DROP TABLE IF EXISTS table1")
   SQL1.ExecNonQuery("CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT, col6 TEXT, col7 TEXT, col8 TEXT, col9 TEXT, col10 TEXT, col11 TEXT, col12 TEXT, col13 REAL, col14 TEXT)")
   SQL1.BeginTransaction
   Try
      For i = 0 To List1.Size-1
         Dim ColumnContent() As String
         Dim Amount As String      
         Dim Value As Double
         ColumnContent = List1.Get(i)
         'Delete first row with column text and rows with totals
         If ColumnContent(10) = "Material" OR ColumnContent(10) = "Result" OR ColumnContent(10) = "#" Then
         Else
            'Convert SAP-BI layout -> 12.345,67 EUR to useable layout -> 12345.67 
            Amount = ColumnContent(12)
            Amount = Amount.SubString2(0, Amount.Length - 4)   ' removes EUR
            Amount = Amount.Replace(".", "")                   ' removes the '.'
            Amount = Amount.Replace(",", ".")                  ' replaces ',' by '.'
            Value = Amount
            SQL1.ExecNonQuery2("INSERT INTO table1 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)", Array As Object (ColumnContent(0), ColumnContent(1), ColumnContent(2), ColumnContent(3), ColumnContent(4), ColumnContent(5), ColumnContent(6), ColumnContent(7), ColumnContent(8), ColumnContent(9), ColumnContent(10), ColumnContent(11), Amount, ColumnContent(13)))
         End If
         DoEvents
         pgbMain.Progress = i * 100 / List1.Size-1
      Next
      DoEvents
      pgbMain.Progress = 100
      SQL1.TransactionSuccessful
      pgbMain.Visible = False
      ToastMessageShow("Data imported! ",True)
   Catch
      Log(LastException.Message)
   End Try
   SQL1.EndTransaction
End Sub
 
Upvote 0

Cableguy

Expert
Licensed User
Longtime User
Go throught steps, try to first read the entire file (ReadToEnd)...
That will read the contents of the file into a String, if that works with crashing the app, the just split the resulting string, firstly by the CR&LF, then by the ","...
Its a bit more work, but should give you an idea of what is happening wrong with your actual code...
 
Upvote 0

Yafuhenk

Active Member
Licensed User
Longtime User
Hi

File is ± 7 MB
Program crashes:
B4X:
Dim Reader As TextReader
   Dim S As String
      Reader.Initialize(File.OpenInput(File.DirRootExternal, "test4.csv"))
      S = Reader.ReadAll
   Reader.Close
   ToastMessageShow("Data imported!",True)

Program doesn't crash:
B4X:
   Dim Reader As TextReader
   Dim List1 As List
      Reader.Initialize(File.OpenInput(File.DirRootExternal, "test4.csv"))
      List1 = Reader.ReadList
   Reader.Close
   ToastMessageShow("Data imported!",True)
but is does when I use List1.Get(i)

Any idea?
 
Upvote 0

Cableguy

Expert
Licensed User
Longtime User
I Have PM you...

But 7Mb is a lot!!!! you may be having memory leackage issues...
 
Last edited:
Upvote 0

Yafuhenk

Active Member
Licensed User
Longtime User
Finally, the issue is solved!
This code can read a large csv (7 MB)file without crashing on my tablet with 256 mb memory.

B4X:
Sub btnImport_Click
   Dim i, j As Int
   Dim FileDialog1 As FileDialog
   ProgressDialogShow("This process can take several minutes. Please wait!")
   FileDialog1.FilePath = File.DirRootExternal
   If FileDialog1.Show("Choose csv file", "Ok", "Cancel", "", Null) = DialogResponse.POSITIVE Then
      'set db
      SQL1.Initialize(File.DirRootExternal, "test1.db", True)
      SQL1.ExecNonQuery("DROP TABLE IF EXISTS table1")
      SQL1.ExecNonQuery("CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT, col6 TEXT, col7 TEXT, col8 TEXT, col9 TEXT, col10 TEXT, col11 TEXT, col12 TEXT, col13 REAL, col14 REAL)")
      SQL1.BeginTransaction
      'read csv File
      Dim Reader As TextReader
      Reader.Initialize(File.OpenInput(FileDialog1.FilePath, FileDialog1.ChosenName))
      Try
         Dim line As String
         Reader.ReadLine
         'Don´t do any thing with this first line since it contains the title
         line = Reader.ReadLine
         Do While line <> Null 
            j = j + 1
            line = Reader.ReadLine
            'Check if line contains a Result or Non-Defined ->(#). If so, do nothing
            If line.Contains("Result") OR line.Contains("#") Then
            Else
               Dim ColumnContent() As String
               ColumnContent = Regex.Split(";", line)
               For i = 0 To 13
                  ColumnContent(i) = ColumnContent(i).Replace(QUOTE, "")
                  'Column 12 contains the turnover figures
                  'Convert layout -> 12.345,67 EUR to useable layout -> 12345.67 
                  If i = 12 Then
                     Try
                        ColumnContent(i) = ColumnContent(i).SubString2(0, ColumnContent(i).Length - 4) '-4 for EUR & Space
                        ColumnContent(i) = ColumnContent(i).Replace(".", "")
                        ColumnContent(i) = ColumnContent(i).Replace(",", ".")
                     Catch
                        Log(i & " " & j & " " & LastException.Message)
                     End Try
                  End If
                  'Column 13 contains the quantity figures
                  'Convert layout -> 1.234,5 KG to useable layout -> 1234.5 
                  If i = 13 Then
                     Try
                        ColumnContent(i) = ColumnContent(i).SubString2(0, ColumnContent(i).Length - 2) '-3 for KG or PC and Space
                        ColumnContent(i) = ColumnContent(i).Replace(".", "")
                        ColumnContent(i) = ColumnContent(i).Replace(",", ".")
                     Catch
                        Log(i & " " & j & " " & LastException.Message)
                     End Try
                  End If
               Next
               SQL1.ExecNonQuery2("INSERT INTO table1 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)", Array As Object (ColumnContent(0), ColumnContent(1), ColumnContent(2), ColumnContent(3), ColumnContent(4), ColumnContent(5), ColumnContent(6), ColumnContent(7), ColumnContent(8), ColumnContent(9), ColumnContent(10), ColumnContent(11), ColumnContent(12), ColumnContent(13)))
               DoEvents
            End If
         Loop
      Catch
         Log(LastException.Message)
      End Try
      Reader.Close
      SQL1.TransactionSuccessful
      SQL1.EndTransaction
      ProgressDialogHide
      Msgbox("csv file imported!", "Message")
      Msgbox("Ready for use!", "Message")
   Else
      Return True
   End If
End Sub
 
Upvote 0

Mahares

Expert
Licensed User
Longtime User
In the IDE on the bottom right scroll right or left to display 'Libs', you will need to put a check mark next to the 'Dialogs' in the shown referenced libraries above. Then you will be able to use the filedialog. You may need to refresh the IDE or exit and get back in. Make sure you dim:
Dim FileDialog1 As FileDialog in globals.
 
Upvote 0

paskuale

Member
In the IDE on the bottom right scroll right or left to display 'Libs', you will need to put a check mark next to the 'Dialogs' in the shown referenced libraries above. Then you will be able to use the filedialog. You may need to refresh the IDE or exit and get back in. Make sure you dim:
Dim FileDialog1 As FileDialog in globals.

tnx Mahares Dialog it's ok, now can't find my csv file,

B4X:
FileDialog1.FilePath = File.DirRootExternal

....csv file lies in the file list B4a :BangHead: how can find it?
 
Upvote 0

Mahares

Expert
Licensed User
Longtime User
Did you try one of these two:
FileDialog1.FilePath = File.DirAssets
or FileDialog1.FilePath = File.DirRootInternal
 
Upvote 0

paskuale

Member
Did you try one of these two:
FileDialog1.FilePath = File.DirAssets
or FileDialog1.FilePath = File.DirRootInternal

...with dirRootInternal:
B4X:
Compiling code.                         Error
Error compiling program.
Error description: Unknown member: dirrootinternal
Occurred on line: 78
    FileDialog1.FilePath = File.DirRootInternal
Word: dirrootinternal

with DirAssets:
B4X:
java.lang.RuntimeException: Path 'AssetsDir'does not exist
 
Upvote 0

Kiffi

Well-Known Member
Licensed User
Longtime User
with DirAssets:
B4X:
java.lang.RuntimeException: Path 'AssetsDir'does not exist

2012-03-07_195944zckxd.png


:D
 
Upvote 0

paskuale

Member

here the code:

B4X:
 Dim i, j As Int
    Dim FileDialog1 As FileDialog
    ProgressDialogShow("This process can take several minutes. Please wait!")
    FileDialog1.FilePath = File.DirAssets
    If FileDialog1.Show("Choose csv file", "Ok", "Cancel", "", Null) = DialogResponse.POSITIVE Then
        'set db
        SQL1.Initialize(File.DirAssets, "test1.db", True)
.
.
.
.

why java call it 'AssetDir' ?!
 
Upvote 0

paskuale

Member
with log function:

B4X:
Log("File.DirAssets="&File.DirAssets)
Log("File.DirDefaultExternal="&File.DirDefaultExternal)
Log("File.DirInternal="&File.DirInternal)
Log("File.DirInternalCache="&File.DirInternalCache)
Log("File.DirRootExternal="&File.DirRootExternal)

return....

File.DirAssets=AssetsDir
File.DirDefaultExternal=/mnt/sdcard/Android/data/b4a.pabacus/files
File.DirInternal=/data/data/b4a.pabacus/files
File.DirInternalCache=/data/data/b4a.pabacus/cache
File.DirRootExternal=/mnt/sdcard


I try DirInternal but I can select only 2 files test.db;test.journal... where is my csv file :( ?
 
Upvote 0

Mahares

Expert
Licensed User
Longtime User
Some forum guru may ask you to zip your project and attach it to a post along with the csv file if you can. Sometimes, you get more relevant answers when they experience what you are experiencing. This is how to zip your project:
1. Open your project in IDE.
2. Click 'File', then click ''Export as zip'.
3. Give it a name and attach it to your post, along with the csv file.
 
Upvote 0

klaus

Expert
Licensed User
Longtime User
Attached a modified version.
The Files folder in the project is DirAssets, but you cannot use a SQLite database in that folder nor can you access this folder with the Dialog library.
You must copy the csv file to DirDefaultExternal with the code below:
B4X:
If File.Exists(File.DirDefaultExternal, "00palmcclienti.csv") = False Then
  File.Copy(File.DirAssets, "00palmclienti.csv", File.DirDefaultExternal, "00palmclienti.csv")
End If
Then you can access the csv file with the FileDialog and load it.

Then I saw that you try to seperate the colums with ";" but in your file the seperator character is a ","!
I changed this line:
B4X:
' ColumnContent = Regex.Split(";", line)
ColumnContent = Regex.Split(",", line)
This shows that the splitting does work.
But in column 12 you have a decimal number with a "," as decimal character so this is also splitted.
Therefore you MUST save your csv file with a ";" seperator character, and set this one back in the splitting line !
In the conversion for column 12 you are removing 4 characters for " EUR" but there is no " EUR" in the file. The same for "KG" in column 13.

I didn't check anything else.

Best regards.
 
Upvote 0

Mahares

Expert
Licensed User
Longtime User
I observed everything Klaus observed. I imported the text file after the changes I made and it worked very well. It populated the table. Klaus beat me to it. I thought he would be asleep in Switzerland. But here is my revised project attached. I am sure it took me a long longer than it did Klaus.
 
Upvote 0
Top