Category Archives: I/O hints

Write large files in Java

One thing many people forget is that building a string by repeated concatenation takes quadratic time, O(n^2).
Strings in Java are immutable, so every time characters are appended, a new string is created and both the original string and the appended characters are copied into it.

In some programs, people need to write a large array of data to a file and, to keep things simple, they collect all the data in one String variable and then write that to the file. The problem is that you have to loop through the array and append each element to the string, which takes significantly more time than opening a BufferedWriter and writing each element to it directly.
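As a minimal sketch of the BufferedWriter approach, the following writes each array element straight to the file instead of building one big String first. The names LargeFileWriter, writeAll and output.txt are made up for illustration, and the try-with-resources block assumes Java 7 or later.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LargeFileWriter {

    // Writes one element per line directly to the file; no large String is built.
    static void writeAll(String[] data, Path target) throws IOException {
        // try-with-resources (Java 7+) flushes and closes the writer when the block ends.
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String element : data) {
                writer.write(element);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data, generated only for the demonstration.
        String[] data = new String[100000];
        for (int i = 0; i < data.length; i++) {
            data[i] = "element" + i;
        }
        writeAll(data, Paths.get("output.txt")); // hypothetical file name
    }
}
```

If the data really has to be assembled in memory first, a StringBuilder avoids the repeated copying as well, but writing directly keeps memory use low for large arrays.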

Java: read table with empty elements.

Suppose you have a table, delimited by tabs or commas, that looks like this:

element1,element2,element3,element4

element21,,,element24

If you read each line and do
split(",");

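The empty elements in the middle of the 2nd line are kept, so that line still gives a 4 element String[]. But Java drops trailing empty strings, so a line that ends with empty elements, such as element21,element22,, gives only a 2 element String[].

The solution to this is to do split(",",-1); a negative limit tells Java to keep the trailing empty strings.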
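A small sketch comparing the two calls on a line whose last fields are empty. The sample line and the names SplitDemo and splitKeepingEmpty are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {

    // Splits a comma-delimited line and keeps empty fields, including trailing ones.
    static String[] splitKeepingEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "element21,element22,,"; // sample line ending in two empty fields

        // Default split: trailing empty strings are removed.
        System.out.println(Arrays.toString(line.split(",")));
        // prints [element21, element22]

        // Negative limit: every field is kept.
        System.out.println(Arrays.toString(splitKeepingEmpty(line)));
        // prints [element21, element22, , ]
    }
}
```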

Using I/O, When to close

In Java, a class such as BufferedReader opens an I/O stream to read from the source. One important point is that the stream remains open until you close it. Even after the method that created the instance returns and the variable goes out of scope, the stream is still there and has not been closed for you.

A stream that is never closed is still released when the program exits. Therefore, if the program only uses a few streams, the problem goes unnoticed when running it.

BUT when the reader sits inside a loop that reads thousands of files, you will eventually get some kind of "too many open files" exception, meaning that the number of open streams has exceeded the limit. This limit is configured at the OS level; on Linux it is the per-process open file limit (see ulimit -n), which I took to be about 10000. Theoretically, if you have

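while (true) {
    // Opens a new reader on every pass and never closes it ("input.txt" is a placeholder).
    BufferedReader br = new BufferedReader(new FileReader("input.txt"));
    br.read();
}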

You will get that error once the number of open streams reaches the limit, after roughly 10000 iterations.

Now it occurs to me that if going out of scope does nothing to close the stream, then pointing the variable at a different reader should not close it either.

Which means:

BufferedReader br = new BufferedReader(someReader);

br = new BufferedReader(anotherReader); // the first reader is still open

When the reference br is redirected to another instance of BufferedReader, the original one is still not closed.
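A minimal sketch of closing the old reader before reusing the variable. The class and method names are made up for illustration, and StringReader stands in for a file so the example runs without any input files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReassignDemo {

    // Reads the first line of each text, closing the old reader before br is reused.
    static String[] firstLines(String first, String second) throws IOException {
        String[] result = new String[2];

        BufferedReader br = new BufferedReader(new StringReader(first));
        try {
            result[0] = br.readLine();
        } finally {
            br.close(); // close before the variable is pointed at another reader
        }

        br = new BufferedReader(new StringReader(second));
        try {
            result[1] = br.readLine();
        } finally {
            br.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String[] lines = firstLines("a\nb", "c\nd");
        System.out.println(lines[0] + " " + lines[1]);
        // prints a c
    }
}
```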

To verify this, I ran a while(true) loop for both of the cases mentioned above (on Ubuntu 11.10). The program throws a FileNotFoundException after 12000~14000 open streams, but does not throw any exception if the BufferedReader is properly closed.

This experiment is just a reminder: most of us remember to close a stream after reading from it, but do not realize that we also need to close br before reassigning the variable to something else.

P.S. BufferedReader is just an example; other streams and readers, such as PrintStream and DataInputStream, also need to be closed the same way.
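One way to make the close hard to forget is the try-with-resources statement, which is not what the experiment above used and assumes Java 7 or later. The sketch below opens a reader 20000 times in a loop, and each one is closed before the next is opened. The class name, method name and temporary file are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

class CloseInLoopDemo {

    // Counts files whose first line is non-empty; each reader is closed before the next opens.
    static int countNonEmptyFirstLines(List<Path> files) throws IOException {
        int count = 0;
        for (Path file : files) {
            try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line = br.readLine();
                if (line != null && !line.isEmpty()) {
                    count++;
                }
            } // br is closed here, even if readLine() throws
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".txt"); // throwaway input file
        Files.write(sample, "hello\n".getBytes(StandardCharsets.UTF_8));

        // Open the same file 20000 times; open streams do not pile up.
        List<Path> files = Collections.nCopies(20000, sample);
        System.out.println(countNonEmptyFirstLines(files));
        // prints 20000

        Files.delete(sample);
    }
}
```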