What is Random

A big concept in statistics is randomness. Rolling a fair die, tossing a quarter, picking a sample from a population, sampling from some distribution: these all involve some randomness.

Very often, you see a question like this: what's the probability of getting heads when you toss a fair (50/50) quarter? The question rests on the assumption that the quarter lands randomly.

However, this concept is tricky to think about. Why would there be randomness at all? If you think really deeply (not saying I can think more deeply than you), every random behavior can be explained as deterministic behavior that could be monitored and predicted, if the computing power were good enough and the measurements accurate enough.

Take throwing a quarter, for example. Many factors are involved in whether the quarter shows heads or tails after it lands, such as air flow speed and direction, landing angle, throwing angle, the landing surface, and so on. These can all affect the result.

However, can we actually measure the air flow, the throwing angle, and all these other factors? That is quite difficult. Let's imagine that air flow direction is the only factor affecting the result, and that the air blows either left or right (which obviously is not the case). Suppose we throw the quarter 100 times and get 50 heads. That does not mean the coin itself is fair; it could also be that the coin is biased but the air flow direction somehow compensated for the bias.
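The compensation argument can be made concrete with a small simulation. This is only a sketch under made-up assumptions: the still-air bias of 0.6 toward heads and the size of each airflow "push" (`AIRFLOW_SHIFT`) are hypothetical numbers picked so that the two effects cancel when the air blows left or right equally often.

```python
import random

# Hypothetical numbers, chosen only so that the airflow cancels the bias.
STILL_AIR_P_HEADS = 0.6                        # biased coin with no airflow
AIRFLOW_SHIFT = {"left": -0.3, "right": 0.1}   # change in P(heads) per direction


def toss(rng, airflow=None):
    """Return True for heads; airflow is 'left', 'right', or None (still air)."""
    p_heads = STILL_AIR_P_HEADS + AIRFLOW_SHIFT.get(airflow, 0.0)
    return rng.random() < p_heads


def heads_rate(n, with_airflow, seed=0):
    """Fraction of heads in n tosses, with airflow picked 50/50 or switched off."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n):
        airflow = rng.choice(["left", "right"]) if with_airflow else None
        heads += toss(rng, airflow)
    return heads / n


print("with airflow:", heads_rate(100_000, with_airflow=True))
print("still air:   ", heads_rate(100_000, with_airflow=False))
```

With airflow the heads rate should come out near one half, while the same coin in still air comes out near 0.6. Looking only at the first number, you could not tell that the coin is biased.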

Therefore, I believe that in a reduced-factor scenario, where there is no air and the quarter is dropped in exactly the same way many times, the quarter should show the same side every time.
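Here is a toy version of that thought experiment. It is a sketch, not real coin physics: it assumes the coin starts heads-up, spins at a constant rate, never bounces, and is caught after a fixed time in the air. The function names and the nominal values (38 revolutions per second, 0.5 seconds) are made up for illustration.

```python
import math
import random


def lands_heads(spin_rate, air_time):
    """Toy deterministic toss: the coin starts heads-up and spins until caught.

    spin_rate is in revolutions per second, air_time in seconds.
    Heads is showing whenever the total rotation angle has a positive cosine.
    """
    angle = 2 * math.pi * spin_rate * air_time
    return math.cos(angle) > 0


def heads_fraction(n, spin_sd, time_sd, seed=0):
    """Fraction of heads over n tosses around a nominal 38 rev/s and 0.5 s."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n):
        spin_rate = rng.gauss(38.0, spin_sd)
        air_time = rng.gauss(0.5, time_sd)
        heads += lands_heads(spin_rate, air_time)
    return heads / n


# Identical drops: no variation at all, so the outcome never changes.
print("identical drops:", heads_fraction(10_000, spin_sd=0.0, time_sd=0.0))
# Slightly sloppy throws: small unmeasured variation, outcome looks like chance.
print("sloppy throws:  ", heads_fraction(10_000, spin_sd=2.0, time_sd=0.02))
```

The rule inside `lands_heads` contains no randomness at all. With identical inputs every toss gives the same side; once the inputs vary by a few percent, the heads fraction should sit near one half, which is exactly the "randomness" we observe in practice.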

I believe it is very important to realize that everything in the world is in fact deterministic; we use probabilistic models because there are so many factors that we have no idea how to capture. And remember "air flow direction" when you make a statement about a quarter's fairness.

Pre-phase: What is statistics

In my opinion, statistics is ALL about estimation: estimating the probability of some events out of the universe of all events.

Test of significance

When people first learn about statistics, they probably learn it in Stats 101, where the professor shows them how to do a t-test or a chi-square test to decide whether a result is significant. Well, this is estimation too. These tests estimate how likely you would be to see data at least as extreme as yours if there were in fact no real effect. For example, if you run a t-test on two samples and the p-value is 0.01, that says: if the two samples really came from populations with the same mean, a difference this large or larger would show up only about 1% of the time. It is tempting to read this as "the probability that I am wrong is 0.01", but that is not quite what the p-value measures. Still, a p-value this small is usually taken as strong evidence that the difference is real.
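To make the meaning of a p-value concrete, here is a minimal sketch. It uses a permutation test rather than a t-test (swapped in because it needs only the standard library), and the two samples are made-up numbers. The p-value it reports is the fraction of random relabelings of the data that produce a difference in means at least as large as the observed one, that is, how often such a difference would appear if the group labels did not matter.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Pretend the labels are meaningless: reshuffle and split again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Made-up measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.7, 4.1, 4.9]

print("p-value:", permutation_p_value(group_a, group_b))
```

A small value means the observed gap would be rare if the two groups were really interchangeable. It is an estimate too: it depends on `n_shuffles` and on the seed.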

Then why is there a whole field of statistics if its only goal is to estimate?

It's because there are so many models, each with its own strengths when estimating a probability. There are parametric and non-parametric models for estimating a probability distribution over a continuous or discrete domain. There are also graphical models and multivariate models, for when your dataset has more than one variable and you want to estimate conditional probabilities.

When you are estimating something, there are also many measures of how good the estimator is. There are always trade-offs between the properties of an estimator: for example, insisting on an unbiased estimator often means accepting a higher variance than a slightly biased one would have.

There are many questions to ask when you want to estimate something.

Would you like an estimator that is generally good but can occasionally make a very bad mistake, or one that is not as good on average but is guaranteed not to make a very bad mistake?

Would you like an estimator that becomes unbiased as the sample size goes to infinity but has high variance, or one that is a little biased but has very small variance?

etc.
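The bias versus variance question can be tried out on a classic case: estimating the variance of normally distributed data. The sketch below compares dividing the sum of squared deviations by n-1 (unbiased) with dividing by n+1 (biased, but with smaller variance). The sample size, true variance, and number of trials are arbitrary choices for illustration.

```python
import random


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def compare_variance_estimators(n=10, trials=20_000, true_var=4.0, seed=0):
    """Simulate both estimators and report bias, variance, and MSE for each."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5
    estimates = {"divide by n-1": [], "divide by n+1": []}
    for _ in range(trials):
        ss = sum_sq_dev([rng.gauss(0.0, sigma) for _ in range(n)])
        estimates["divide by n-1"].append(ss / (n - 1))
        estimates["divide by n+1"].append(ss / (n + 1))
    report = {}
    for name, values in estimates.items():
        avg = sum(values) / trials
        variance = sum((v - avg) ** 2 for v in values) / trials
        mse = sum((v - true_var) ** 2 for v in values) / trials
        report[name] = {"bias": avg - true_var, "variance": variance, "mse": mse}
    return report


for name, stats in compare_variance_estimators().items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this setup the n-1 version should show a bias near zero, while the n+1 version systematically underestimates but has a smaller variance and a smaller mean squared error. Which one is "better" depends on which property you care about, which is the point of the questions above.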

So before you get into the field of statistics, these questions are definitely important to keep in mind. When you use statistics to solve problems in research, you'll always have to state how and why you chose a particular estimator.