Writing large files in Java

One thing many people forget is that building a string by repeated concatenation takes quadratic time, O(n^2).
Strings are immutable, so every time you append some characters a new string is created, and both the original string and the appended characters are copied into it.

In some programs, people need to write a large array of data to a file, and to keep things simple they collect all the data in a single String variable and then write it out in one go. The problem is that you have to loop through the array and append each element to the string, which takes significantly more time than opening a BufferedWriter and writing each element to it as you go.
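As a rough illustration of the BufferedWriter approach, here is a minimal sketch. The class name LargeFileWriter, the helper writeLines, the one-element-per-line format, and the temp file are all assumptions made for this example, not part of any particular program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LargeFileWriter {
    // Hypothetical helper: streams each element to the file, one per line,
    // instead of first collecting everything in one giant String.
    static void writeLines(Path target, int[] data) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int value : data) {
                writer.write(Integer.toString(value));
                writer.newLine();
            }
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // A temp file is used here only so the sketch runs anywhere.
        Path target = Files.createTempFile("large-output", ".txt");
        writeLines(target, data);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

If you really do need the whole text in memory first, a StringBuilder avoids the repeated copying, but for a large file streaming to the writer is usually simpler.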

Clash of Clans: How is this game a life-sucking app?

After playing this game for a while, I started to wonder: how long would it take me to reach the ultimate, fully upgraded base?

So I pulled the CSV file out of the game and did a few spreadsheet calculations.

It turns out it will take over a year to upgrade everything (except the walls), assuming you have 4 builders and they are always building something. That is OK, but think about how much gold you have to rob per day to keep all your builders busy.

Here is the link to the spreadsheet.


There are 7 sheets in the Google doc. Don't worry about the first 6. Just go to the "Summary" sheet and enter the current level of each of your buildings, troops, and heroes. Then, on the right side of the sheet, you will see a summary of your status.

Whether to be encouraged or discouraged is totally up to you. lol

Malloc and free. C programming tips

In C programming, we usually have to define a lot of custom structs. They work like classes, but without the methods.

Usually, to define a struct, we write the following:

struct xyz {
    int a;
    int b;
};

Then, when you use it, you have to write

struct xyz * apointer= malloc(sizeof(struct xyz));

To simplify the code, we use

typedef struct xyz * xyz;

This way, we don't have to type "struct xyz *" every time, and can simply write "xyz".

Problems usually appear when you try to free the memory. After you are done with the struct, you free the memory that "apointer" points to with free(apointer).

Many times people get an "invalid size" or "invalid address" error when they try to free the memory, and get very confused. I have found that this very often comes down to one particular mistake, so I point it out here to remind myself:
xyz curPointer=malloc(sizeof(xyz));

This means you only malloc enough space for an xyz pointer, which is only 8 bytes (or 4 bytes on a 32-bit system). You should use
xyz curPointer=malloc(sizeof(struct xyz));

This is very important!
If you did not malloc enough space for the struct and you write more bytes than you allocated, you will overwrite the memory adjacent to it. (C does not check whether a write goes past the end of the malloc'ed space.) When you later try to free the neighboring memory block, the allocator can no longer tell what to free, because its bookkeeping for that block was overwritten.

Suppose you have blocks A and B adjacent to each other. A is 8 bytes, and B is also 8 bytes.

Then you write 9 bytes to A and try to free B. You'll get the error I was talking about.

What is Random

A big concept in statistics is randomness. Rolling a fair die, flipping a quarter, picking a sample from a population, sampling from some distribution: all of these involve some randomness.

Very often you see a question like this: what is the probability of getting heads when you flip a fair (50/50) quarter? This is based on the assumption that when you flip the quarter, it lands randomly.

However, this concept is tricky to think about. Why would there be randomness at all? If you think really deeply (not saying I can think deeper than you), every random behavior can be explained as deterministic behavior, and can be monitored and predicted, if the computing power is good enough and the measurements are accurate enough.

Take flipping a quarter. Many factors determine whether the quarter shows heads or tails after it lands, such as air flow speed and direction, throwing angle, landing angle, landing material, and so on. All of these can affect the result.

However, can we actually measure the air flow, the throwing angle, and all these other factors? That is quite difficult. Let's imagine that air flow direction is the only factor that affects the result, and that the air flows either left or right (which obviously is not the case). Then if we flip the quarter 100 times and get 50 heads, it does not mean the coin itself is fair. It can also be explained as the coin being biased, with the air flow direction somehow compensating for the bias.

Therefore, I believe that in a reduced-factor scenario, where there is no air and the quarter is dropped in exactly the same way many times, the quarter should show the same side every time.

I believe it is very important to realize that everything in the world is in fact deterministic; we use probabilistic models because there are so many factors that we have no idea how to capture. And remember "air flow direction" when you make a statement about a quarter's fairness.

Tree and Tree node implementation

A tree is a very useful structure. Many people like to use

class TreeNode<E> {
    E nodeData;
    TreeNode<E> child1;
    TreeNode<E> child2;
}

to represent nodes in a tree. I find it complicated: any time you get a child or a parent, you have to unwrap the TreeNode to reach the data. I was hoping to create a way for tree nodes to be any class that implements a TreeNode interface. The code I have in mind for using a node would be as easy as
class employee implements TreeNode { /* employee fields and methods */ }
Then we can use
employee e1= new employee();
employee e2= new employee();

But this brings up a problem. Tree node functions such as getChildren, getParent, and getDescendants should be inherited, instead of being re-declared in every class that implements TreeNode.
So a better way is to use an abstract class instead of an interface.
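public abstract class Treenode<T extends Treenode<T>> {
    T next;
    @SuppressWarnings("unchecked")
    T myself = (T) this; // unchecked cast to the subclass type
    T parent;

    public void addChild(T node) {
        // body assumed (it was missing in the snippet): link child and parent
        next = node;
        node.parent = myself;
    }

    public T getChild() {
        return next;
    }

    public T getParent() {
        return parent;
    }
}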

The logic here is that any class that extends this Treenode class automatically gets these functions without reprogramming them. The ugly part of this snippet is the (T) this cast used to initialize myself. Here "this" is considered to be any Treenode, while myself is considered to be the subclass that extends it. This has to be a cast, but I can't see any reason such a cast could fail. Unless
class class1 extends Treenode{};
class class2 extends Treenode{};

Then for some reason you want

class1 c=new class1();
c.addChild(new class2());

This should not happen if you follow strongly typed programming practice. Period.
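To show how a subclass picks up the tree functions for free, here is a minimal self-contained sketch. The Employee class and its name field are hypothetical, and the Treenode shown is a trimmed-down variant of the abstract class above, written so the example compiles on its own.

```java
// Trimmed-down variant of the abstract class, for illustration only.
abstract class Treenode<T extends Treenode<T>> {
    T next;
    T parent;

    @SuppressWarnings("unchecked")
    public void addChild(T node) {
        next = node;
        node.parent = (T) this; // same unchecked cast as "myself"
    }

    public T getChild() {
        return next;
    }

    public T getParent() {
        return parent;
    }
}

// Hypothetical subclass: it declares no tree methods of its own.
class Employee extends Treenode<Employee> {
    final String name;

    Employee(String name) {
        this.name = name;
    }
}

class TreenodeDemo {
    public static void main(String[] args) {
        Employee boss = new Employee("boss");
        Employee report = new Employee("report");
        boss.addChild(report);

        // No casts needed: getChild() and getParent() return Employee.
        System.out.println(boss.getChild().name);    // prints "report"
        System.out.println(report.getParent().name); // prints "boss"
    }
}
```

Because Employee binds T to itself, the compiler rejects boss.addChild(...) with any other Treenode subclass, which is exactly the mixing the raw-type example gets wrong.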

Windows Service Start up

Visual Studio 2010 has a very nice template for creating a Windows service, so creating one is fairly simple.

Click here to see tutorial on how to create and install a windows service.

Because a Windows service cannot run inside Visual Studio and has to run under the Windows service manager, the service cannot be debugged directly in Visual Studio. The alternative is to attach a debugger to the running service.

Click here to see tutorial on how to attach a debugger on the service.

A few things to note: the service does not show up in the Processes dialog box if it is not started, which means you cannot attach your debugger until you have started the service. When the service starts, it executes the OnStart() method. Once it has started executing that method, you can see it in the Processes dialog box and attach the debugger.

But what if you want to debug the OnStart() method?

You can use System.Threading and call the static method Thread.Sleep(x), with x in milliseconds. Make the service wait for 10 seconds at the start of OnStart(), and attach the debugger during those 10 seconds. But here comes another catch: the service manager only gives OnStart() around 30-40 seconds to execute. If OnStart() takes too long, an error message appears saying your service is not responding. So the longer you wait, the less time you leave OnStart() to execute.


This method is NOT executed right after OnStart(). It is only called when the service is resumed after being paused.

A general structure for a service should be:

OnStart: use threading; spawn a thread to start any long-running work. Do not put the work itself here, because OnStart has very limited time to return!

OnStop: interrupt the thread, disable any threads or timers, and dispose of them.

Pre-phase What is statistics

In my opinion, statistics is ALL about estimation: estimating the probability of some event out of the universe of all events.

Test of significance

When people first learn about statistics, it is probably in Stats 101, where the professor tells them how to do a t-test or a chi-square test to decide whether a certain result is significant or not. Well, this is an estimation too. Loosely speaking, these tests estimate the probability of making a mistake when you call the result significant. For example, if you do a t-test on two samples and the p-value is 0.01, then if there were really no difference between the two populations, a difference at least this large would show up only about 1% of the time. So calling the difference significant is a fairly safe call.

Then why is there a whole area of statistics if the only goal of statistics is to estimate?

It's because there are so many models, each with its own strengths when estimating a probability. There are parametric and non-parametric models for estimating a probability distribution over a continuous or discrete domain. There are also graphical models and multivariate models for when your dataset has more than one variable and you want to estimate conditional probabilities.

When you are estimating something, there are also many measures of how good the estimator is. There is always a trade-off between the properties of an estimator: if your estimator is unbiased, it will often have high variance.

There are many questions to ask when you want to estimate something.

Would you like an estimator that is generally good but can occasionally make a very bad mistake, or an estimator that is not as good but is guaranteed never to make a very bad mistake?

Would you like an estimator that is unbiased as the sample size goes to infinity but has high variance, or an estimator that is a little biased but has very small variance?


So before you get into the field of statistics, these questions are definitely important to keep in mind, and when you use statistics to solve problems in research, you will always have to state how and why you chose your estimator.


Matlab: Double precision problem

In Matlab, when you compare two numbers, the comparison does not always give you the answer you expected. When you compare two integers,


will give you 1.

But when you compare doubles, sometimes it doesn't work. In simple cases, if you do


will still give you 1. But if you save a into a file and use textread(filename) to read the value back, the value may still look like 0.001, but if you do


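it might give you 0, because a was read from a file and is stored in some slightly different form. This might look like a bug, but it is more likely a rounding difference: 0.001 has no exact binary floating-point representation, so the value parsed from the file can differ from the literal in the last bits. Some people fix it by doing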


which basically means that if a and 0.001 are very close, they are considered equal.
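The same tolerance idea applies to doubles in any language. Here is a minimal sketch in Java rather than Matlab; the helper name approxEquals and the tolerance 1e-9 are assumptions chosen for illustration, and the right tolerance depends on the scale of your data.

```java
class ApproxEquals {
    // Hypothetical helper: treats two doubles as equal when they differ
    // by no more than the given absolute tolerance.
    static boolean approxEquals(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double a = 0.1 + 0.2; // not exactly 0.3 in binary floating point

        System.out.println(a == 0.3);                   // prints "false"
        System.out.println(approxEquals(a, 0.3, 1e-9)); // prints "true"
    }
}
```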

I personally have a quicker fix.


For some reason, after any operation on the variable, the value no longer has anything weird going on inside.