# Pre-phase: What is statistics

In my opinion, statistics is ALL about estimation: estimating the probability of some events out of the universe of all events.

Test of significance

When people first learn about statistics, they probably learn it in stats 101, where the professor teaches them how to do a t-test or a chi-square test to decide whether a certain judgement is significant or not. Well, this is an estimation too. Loosely speaking, these tests estimate the probability of you making a mistake by saying the judgement is significant. For example, if you do a t-test on two samples and the p-value is 0.01, that is saying that if there were really no difference between the two samples, you would see a difference at least this large only 1% of the time. So if you say the two samples are significantly different, you are very likely correct.

Then why is there a whole area of statistics if the only goal of statistics is to estimate?

It's because there are so many models, and each has its own strengths when estimating a probability. There are parametric and non-parametric models to estimate a probability distribution over a continuous or discrete interval. There are also graphical models and multivariate models for when your dataset has more than one variable and you want to estimate conditional probabilities.

When you are estimating something, there are also many measurements of how good the estimator is. There is always a trade-off between the properties of an estimator: if your estimator is unbiased, it's probably going to have higher variance than a slightly biased one.
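To make the trade-off concrete, here is a minimal simulation sketch in Java. It compares two estimators of the variance of a standard normal sample: dividing by n-1 (unbiased) and dividing by n (slightly biased). The class name, method name, sample size and seed are my own choices for illustration.

```java
import java.util.Random;

public class VarianceEstimators {
    /**
     * Simulates many samples of size n from N(0,1) and returns
     * {average estimate, mean squared error} of the variance estimator.
     */
    static double[] simulate(int n, int trials, boolean unbiased, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        double squaredError = 0.0;
        for (int t = 0; t < trials; t++) {
            double[] x = new double[n];
            double mean = 0.0;
            for (int i = 0; i < n; i++) {
                x[i] = rng.nextGaussian();
                mean += x[i];
            }
            mean /= n;
            double ss = 0.0;
            for (double v : x) {
                ss += (v - mean) * (v - mean);
            }
            double estimate = ss / (unbiased ? n - 1 : n);
            sum += estimate;
            // The true variance of N(0,1) is 1.
            squaredError += (estimate - 1.0) * (estimate - 1.0);
        }
        return new double[] { sum / trials, squaredError / trials };
    }

    public static void main(String[] args) {
        double[] u = simulate(5, 20000, true, 42L);
        double[] b = simulate(5, 20000, false, 42L);
        System.out.println("unbiased: mean=" + u[0] + " mse=" + u[1]);
        System.out.println("biased:   mean=" + b[0] + " mse=" + b[1]);
    }
}
```

With small samples, the n-1 version should average close to the true value, while the n version averages lower but has a smaller mean squared error. Which one you prefer is exactly the kind of question this post is about.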

There are many questions to ask when you want to estimate something.

Would you like an estimator that is generally good but can make a very bad mistake, or would you like an estimator that is not as good but is guaranteed not to make a very bad mistake?

Would you like an estimator that is unbiased as the sample size goes to infinity but has high variance, or would you like an estimator that is a little biased but has very small variance?

etc.

So before you get into the field of statistics, these questions are definitely important to keep in mind. When you use statistics to solve problems in research, you'll always have to state how and why you chose your estimator.

# Matlab: Double precision problem

In Matlab, when you compare two numbers, you don't always get the answer you expected. When you compare two integers,

`a=1; b=1; a==b`

will give you 1.

But when you compare doubles, sometimes it doesn't work. In simple cases, if you do

`a=0.001; b=0.001; a==b`

it will still give you 1. But if you save a into a file and use `textread(filename)` to get the value back, the value may still look like 0.001, but if you do

` a==0.001`

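it might give you 0. This is most likely not a bug: a double stores a binary approximation of 0.001, and the value that was written to the file and parsed back can differ from the literal 0.001 in the last few bits, even though both print as 0.001. Some people fix it by doing

`abs(a-0.001)<0.000001`

which basically means that if a and 0.001 are very close, they are treated as equal.

I personally have a quicker fix.

`a+1==0.001+1`

Adding 1 to both sides rounds away the lowest bits of each value, so the tiny difference disappears. It is effectively a tolerance comparison with a tolerance you don't control, so the explicit `abs` version is the more predictable one.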
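The same thing happens in any language that uses IEEE doubles, not just Matlab. Here is a minimal sketch of the tolerance comparison in Java (the language used in the other posts here). The class name, the helper name `nearlyEqual` and the tolerance value are just for illustration.

```java
public class NearlyEqual {
    /** Returns true when a and b differ by less than the given tolerance. */
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) < tolerance;
    }

    public static void main(String[] args) {
        double a = 0.1 + 0.2; // not exactly 0.3 in binary floating point
        System.out.println(a == 0.3);                   // exact comparison
        System.out.println(nearlyEqual(a, 0.3, 1e-6));  // tolerance comparison
    }
}
```

The exact comparison fails here even though both values print almost the same, while the tolerance comparison treats them as equal.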

# Java: read table with empty elements.

Suppose you have a table, tab or "," delimited, that looks like this:

element1,element2,element3,element4

element21,,,element24

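If you read each line and do
`split(",");`

the second line above still gives a 4 element String[], because empty strings in the middle are kept. The problem shows up when the empty elements are at the end of the line: `element21,element22,,` gives a 2 element String[], because Java drops trailing empty strings.

The solution to this is to do `split(",",-1);`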
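A minimal sketch of how `String.split` treats empty elements with and without the `-1` limit. The class name and the sample rows are made up for illustration.

```java
public class SplitDemo {
    /** Number of fields when splitting with the default limit. */
    static int fieldsDefault(String line) {
        return line.split(",").length;
    }

    /** Number of fields when splitting with limit -1, which keeps trailing empty strings. */
    static int fieldsKeepEmpty(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String middleEmpty = "element21,,,element24";
        String trailingEmpty = "element21,element22,,";
        System.out.println(fieldsDefault(middleEmpty));     // empty strings in the middle are kept
        System.out.println(fieldsDefault(trailingEmpty));   // trailing empty strings are dropped
        System.out.println(fieldsKeepEmpty(trailingEmpty)); // all columns are kept
    }
}
```

When you read a table where every row should have the same number of columns, always passing `-1` is the safer habit.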

# Deleting directory

The Java language is safe but weird. A lot of the time it doesn't do what you think it does.
For example, if you do

`File f = new File(path)` and then do `f.delete()`.

It won't be deleted if it's a directory that is not empty. It doesn't even throw an exception, it just returns false. So you have to recursively delete all files and folders inside before you delete it.
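A minimal sketch of the recursive delete using only `java.io.File`. The class and method names are my own, and the demo folder under the temp directory is just an example.

```java
import java.io.File;

public class DeleteDir {
    /** Deletes f and everything under it. Returns false as soon as one delete fails. */
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                if (!deleteRecursively(child)) {
                    return false;
                }
            }
        }
        return f.delete();
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"), "deldemo_" + System.nanoTime());
        new File(root, "sub/inner").mkdirs();        // make the directory non-empty
        System.out.println(root.delete());           // plain delete fails on a non-empty directory
        System.out.println(deleteRecursively(root)); // recursive delete removes everything
    }
}
```

Be careful with symbolic links: this sketch does not treat them specially, so check what `listFiles()` returns on your system before pointing it at anything important.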

# SiriProxy ported in Java. NO RUBY,NO LINUX, EASIEST SETUP in any OS

Siri has been out for a while now. If you are looking at this post, I guess you have already looked at siriproxy, spire, and all the other stuff on how to set up a proxy on a Linux system.

To do that, you have to install a Linux system. Hardcore developers are probably already using Linux; Windows and Mac users... not so much.

Then you have to install all this stuff, which takes like an hour, and use that RUBY program that you are probably not familiar with.

Then, to keep the server running, you eventually have to keep a LINUX computer running, probably in a VirtualBox.

To me, that's a lot of work. It would be MUCH MUCH easier if you could just run a Windows service on a Windows computer, or a Java program in the background on a Mac.

So here, I have ported siriproxy to Java using socket programming.

How does the java program work

The way it works is simple. You run the Java program either from the command line, or you open the source code in Eclipse, NetBeans or any other IDE and run servertest.java.

A few things to note:

1. The server does require an SSL secure connection, which means you still need a private key and public certificate pair. The ones you'd use with regular siriproxy work with this server too. Just go into servertest.java and change the filename to your server certificate.

2. The server requires additional libraries: plist for Java and SSL server for Java.

The program requires permission to create files in the current folder (the folder the program is in). It will create a validation.dat file, which contains the iPhone 4S validation information, when an iPhone 4S connects. After that, an iPhone 4 with spire that connects to the server should also work.

How to connect to the server.

To connect to the server:

If your computer is in a local network, which means the computer is connected to a router and then to the internet, you just need to set up port forwarding on the router to forward port 443 to your computer's local IP address.

If your computer is not in a local network, which means it's directly connected to the internet, you don't have to do anything.

Setup Iphone 4 to connect to the computer

1. Jailbreak your iPhone and install spire and iFile.

2. In iFile, browse to /etc/hosts and add a line "xxx.xxx.xxx.xxx guzzoni.apple.com", where xxx.xxx.xxx.xxx is the IP address of the computer running the server.

On an iPhone 4S, it should be the same.

After this step, Siri should connect to the IP address you entered whenever you activate it.

Feel free to leave comments if it's confusing. If this post helped you in some way, please like it and share it with others. Thanks

# Siri binary data, how to process

Apple's new personal assistant Siri was cracked by Applidium a few months ago, and they explained the data format in detail.

Just out of personal interest, I tried to make a Java version of siriproxy based on the information they provided. It turns out there are many more details to it.

How iPhone sends voice packages
First of all, Siri converts the voice data into the Speex codec format. Speex is a publicly available library, and there are Java and C libraries that deal with it. People who are interested in collecting the data and doing dictation on their own can simply write some code to build voice recognition models on a Java proxy (it works the same as siriproxy).

Then the Speex data packets are packaged into a CFPropertyList. There are also C libraries that read the binary data and convert it into a property list. I haven't found a Java library, but if it comes to that, I'll just write one myself.

Then this binary property list is compressed using zlib and sent to the guzzoni.apple.com server.

When you write code to build a proxy that intercepts the packages between the iPhone and the Apple server, you'll find that the iPhone first sends a few lines of headers of an "ACE" request, not an HTTP request!
Then it starts to send a bunch of binary packages of various lengths.

How to read the data iPhone sent out
Before the binary data there are the 4 or 5 lines of ACE header, which look like this

ACE /ace HTTP/1.0
Host: 192.168.0.1
User-Agent: Assistant(iPhone/iPhone3,1; iPhone OS/5.0.1/9A405) Ace/1.0
Content-Length: 2000000000
X-Ace-Host: xxxxxxxxxxxxxxxxxxxxxxx

Then there is an empty line, which means there are two bytes, carriage return and line feed ("\r\n"), immediately after the X-Ace-Host: xxxxxx... line.

Next, unzip the binary data using zlib. One important thing to know is that the zlib header of the first package applies to all follow-up packages. That means, in Java, if you write something like this

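// pseudocode: decompressor and packages are placeholders
for (byte[] line : packages) {
decompressor d = new decompressor(); // a fresh decompressor per package loses the zlib state
d.unzip(line);
d.end();
}

it will most likely not work. You have to treat all packages as one stream when you unzip them:

decompressor d = new decompressor(); // one decompressor for the whole connection
for (byte[] line : packages) {
d.unzip(line);
}
d.end();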

Then the binary data starts. It starts with a 4-byte ACE header, which should always look like 0xAACCEE; the 4th byte doesn't really mean anything. So to unzip the binary data, start from the 5th byte. In Java socket programming, an InflaterInputStream is a handy thing to use. You wrap the iPhone's InputStream in an InflaterInputStream, and anything it spits out is unzipped data.
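A minimal sketch of that step, assuming the HTTP-like header lines and the empty line have already been consumed from the stream. `AceBodyReader`, `openBody` and `roundTrip` are names I made up; `roundTrip` only fakes the wire data so the example can run without an iPhone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class AceBodyReader {
    /** Skips the 4-byte 0xAACCEE.. marker and returns a stream of unzipped bytes. */
    static InputStream openBody(InputStream in) throws IOException {
        byte[] marker = new byte[4];
        new DataInputStream(in).readFully(marker);
        if ((marker[0] & 0xFF) != 0xAA || (marker[1] & 0xFF) != 0xCC || (marker[2] & 0xFF) != 0xEE) {
            throw new IOException("stream does not start with 0xAACCEE");
        }
        // One InflaterInputStream for the whole connection keeps the zlib state
        // across all packages.
        return new InflaterInputStream(in);
    }

    /** Demo helper: builds marker + zlib data in memory and reads it back. */
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            // The 4th marker byte is arbitrary in this demo.
            wire.write(new byte[] { (byte) 0xAA, (byte) 0xCC, (byte) 0xEE, 0x02 });
            DeflaterOutputStream zip = new DeflaterOutputStream(wire);
            zip.write(text.getBytes("UTF-8"));
            zip.finish();

            InputStream body = openBody(new ByteArrayInputStream(wire.toByteArray()));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello siri"));
    }
}
```

In a real proxy you would pass the socket's InputStream to `openBody` instead of the in-memory one.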

There are two types of data in the unzipped data. One is a "ping" package. It doesn't carry any content; it's just a small package (usually a few bytes) sent to Apple to keep the connection alive. The 5-byte header for this is 0x030000xxxx (in hex). In Applidium's description, the xxxx of a ping is the ping sequence number.

The other type is a property list. The list contains the iPhone 4S identification key, the voice packages, etc. The 5-byte header for this is 0x020000xxxx, where xxxx is the length of the package that follows the header.

One very important thing to know about the xxxx is that it's only 2 bytes, but it's very easy to get confused when calculating the length. Because the packages come one after another, if you get the length of the 1st package wrong, you won't find the 2nd package.

The byte order (which digit is the most significant and which is the least significant) plays a big part here. You read the hex dump from left to right, but how the bytes map to a number depends on the byte order your code assumes.
For example, if you see xxxx as 0x00EC, read as big-endian it means E(14x16)+C(12) = 236, but read as little-endian the same two bytes mean 0xEC00 = 60416. It's very important to check which way your code is reading.

If you successfully convert the binary data into property list, you can extract the iphone 4S certifications and save for your iphone 4 just like siriproxy.

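Likewise, if you see 0x0123, it is 1x256+2x16+3 = 291 when read as big-endian, and 0x2301 = 8961 when read as little-endian. Of course, in many systems a simple conversion is enough, but just make sure you know that this might cause problems. In Java, `ByteBuffer.wrap(bytes).getShort()` reads two bytes as big-endian by default, and `.order(ByteOrder.LITTLE_ENDIAN)` switches the byte order, which works nicely to get the length of the packages. Mask the result with 0xFFFF so that lengths above 32767 don't come out negative.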
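A minimal sketch of reading the 2-byte length in both byte orders with `ByteBuffer`, so you can compare the result against a package whose length you already know. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PacketLength {
    /** Interprets two bytes as an unsigned 16-bit length in the given byte order. */
    static int length(byte first, byte second, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] { first, second }).order(order);
        return buf.getShort() & 0xFFFF; // mask so the value is treated as unsigned
    }

    public static void main(String[] args) {
        byte first = 0x00;
        byte second = (byte) 0xEC;
        System.out.println(length(first, second, ByteOrder.BIG_ENDIAN));    // 0x00EC
        System.out.println(length(first, second, ByteOrder.LITTLE_ENDIAN)); // 0xEC00
    }
}
```

If the length you compute doesn't land exactly on the start of the next package header, the byte order is the first thing to check.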

# Using I/O, When to close

In Java, when you use classes such as BufferedReader, an I/O stream is created to read from the source. One important thing about this is that the stream always remains open unless you close it. Even after you leave the scope of the method in which you created the instance, the stream is still there and not disposed.

If not closed, the stream will still be forced to close when the program exits. Therefore, if the program only uses a few streams, the problem is not noticeable when running it.

BUT when you have a loop around the reader to read thousands of files, you will get some kind of "too many open files" exception, meaning that the number of open streams has exceeded the limit. This limit is configured by the OS; on Linux you can check it with `ulimit -n`. Theoretically, if you have

while(true){
BufferedReader br = new BufferedReader(new FileReader(file)); // opened but never closed
}

you will get that error once the number of open streams exceeds the limit.

Now it occurs to me that if going out of scope doesn't close the stream, then pointing the variable to a different reader shouldn't close it either.

Which means:

When the reference br is redirected to another instance of BufferedReader, the original one is still not closed.

To verify this, I ran a while(true) loop for both of the cases mentioned above (on Ubuntu 11.10). The program throws a FileNotFoundException after 12000~14000 open streams, but does not throw any exception if the BufferedReader is properly closed.

This experiment is just a reminder that most of us remember to close a stream after reading, but don't realize that we also need to close br before reassigning the variable to something else.

P.S. BufferedReader is just an example. Other streams and readers, such as PrintStream and DataInputStream, also need to be closed the same way.
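One way to make sure the reader is always closed is try-with-resources, which needs Java 7 or later (on older versions, call `close()` in a `finally` block instead). This is a minimal sketch; the class name, the `firstLine` helper and the file names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadManyFiles {
    /** Reads the first line of a file and always closes the reader afterwards. */
    static String firstLine(String path) throws IOException {
        // The reader is closed automatically when the try block exits,
        // whether it exits normally or with an exception.
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each file passed on the command line is opened and closed in turn,
        // so the number of open streams never grows.
        for (String path : args) {
            System.out.println(firstLine(path));
        }
    }
}
```

Because every call opens and closes its own reader, looping over thousands of files this way should not run into the open-file limit.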

# Linux bash trick/bugs

This is simply a post of some things that are easily overlooked when programming in Linux. I have had this idea for a while, and now I'm starting to collect them.

1. When you have a shell file that runs some commands like this

java -jar test.jar -para1

java -jar test.jar -para2

and the file was saved with Windows-style line endings, the shell actually passes the CR (carriage return) at the end of the first line into the Java parameters. This normally does not cause problems, but if, just if, test.jar tries to create a folder named para1, then in Linux it will create a folder named "para1^M" and you won't be able to access it by typing

cd para1

because of that ^M symbol. So in your Java code, you have to do a replace("\r","") and replace("\n","") before you create the folder.
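A minimal sketch of that cleanup on the Java side. The class and method names are made up; the idea is just to strip any carriage return or line feed from an argument before using it as a folder name.

```java
public class CleanArgs {
    /** Removes carriage returns and line feeds that a shell script may have passed along. */
    static String clean(String arg) {
        return arg.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        for (String arg : args) {
            // Print the cleaned argument between brackets so stray characters are visible.
            System.out.println("[" + clean(arg) + "]");
        }
    }
}
```

Converting the script itself to Unix line endings also removes the problem at the source, but cleaning the arguments protects you when someone else edits the script later.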

# How to setup Siriproxy and Spire with VPN access

I have done a lot of research this past week on how to set up siriproxy and spire with only VPN access. I have seen many posts that describe everything step by step in real detail. But I am the kind of person who wants to know some fundamentals before just pasting in commands and seeing if it runs.

Pros:

Siri on the go. As long as you have internet, you can use siri

Cons:

Adding one more component to the whole system sometimes breaks it

Requirements before you start

A public IP. Either your computer is directly connected to the modem, or your router can forward the GRE protocol and the VPN ports.

An iPhone 4S to get the key.

From what I learned, whenever you talk to Siri, the iPhone 4S sends voice packages to guzzoni.apple.com, which translates them into text, does all the AI stuff and figures out what Siri should say back.

A siriproxy is a proxy server that forwards your iPhone's voice packages to guzzoni, and when guzzoni sends back the converted text and the "siri command", siriproxy forwards them back to the iPhone.

The importance of the proxy for the iPhone 4S is this: sometimes the guzzoni server doesn't understand what you are saying, and Siri will be like "sorry, I don't understand". If you ask what movies are on tonight and such, Siri will not be able to answer you. But siriproxy can hijack the command and text before they are sent to your phone, read the text, and execute a program if the text matches some command. These are siriproxy plugins.

The importance of siriproxy for the iPhone 4 is that it grabs the voice packages from spire before they are sent to guzzoni and adds some identification keys to the packages to "fake" an iPhone 4S. So guzzoni won't reject the voice packages.

I'll cover the thought process, then give details.

Thought process is:

1. Set up a siriproxy server on some IP address. Let's call it the proxy IP.

2. Then somehow get your iPhone's network traffic under control. Basically, you want everything your phone sends to guzzoni.apple.com to go to the proxy IP instead. This can be done using dnsmasq.

Here I need to cover a basic idea about DNS.
A DNS server is a server that gives you an IP address if you give it a domain name. An IP address is the address that uniquely identifies a computer or a local network. For example, if you type google.com in the browser address bar, your browser first sends "google.com" to a DNS server, and the DNS server gives back the IP address. Then your browser can find the website and display it to you.

dnsmasq is a piece of software on Linux that can "fake" a DNS server: it checks the name you ask for before passing the request on to the real DNS server. It almost works as a DNS server (for our convenience, let's say it is one). If you install it, set it up and don't do anything else to it, it forwards every request to the real DNS server and replies back. The idea is to hijack the request for "guzzoni.apple.com". Because this is the Apple server, and you want the voice packages to go to siriproxy instead, dnsmasq is used to hijack the name "guzzoni.apple.com" and tell your phone the proxy IP. Then the voice packages will be sent to the proxy IP, and siriproxy successfully catches them.

How to setup VPN server and use dnsmasq as DNS server

But to use dnsmasq, you first need to make your phone think that your dnsmasq is the DNS server. Most of the time, the DNS server is provided to your phone or computer by the internet provider, and it's set up on your phone automatically. For example, if your phone is connected to a wireless network, the router tells it what the DNS server is. Same idea: if your phone is connected to a VPN, the VPN server also tells it the DNS server. Changing it for wireless is easy, and there are many videos on how to do it. Here we'll use a VPN to set the DNS server to dnsmasq.

Summary of Introduction.

original iphone 4S process

The process of new set up

NOW THE TIME FOR DETAIL.

First you need to install Ubuntu and upgrade it all the way to version 11.10. Update by clicking the top right corner and selecting the available updates. It IS important to update to the current version, because some of the older libraries were not suitable for the Ruby version, etc.

A Setup VPN

1. sudo apt-get install pptpd

Install the PPTP VPN server.

2. sudo nano /etc/pptpd.conf

Edit the PPTP server config: scroll down and delete the "#" before localip and remoteip. It should look like

localip 192.168.0.1
remoteip 192.168.0.234-238,192.168.0.245

Remember the localip.

3. sudo nano /etc/ppp/pptpd-options

Edit the line with "ms-dns x.x.x.x" to set ms-dns to the localip above. This sets the DNS server that the VPN server hands out to its clients, so your iPhone connected to the VPN will use your dnsmasq as the DNS server.

4. sudo nano /etc/ppp/chap-secrets

This file holds the username and password that your phone will use to log in to the VPN.

5. ifconfig

Here you should find eth0 or wlan0 as the network device that connects you to the internet. Others such as "lo" are just local, don't worry about them. Remember which device connects to the internet.

6. sudo nano /etc/rc.local

Set up IP masquerading. IP masquerading is the process of routing traffic between your iPhone on the VPN and the internet. If it's not set up, your phone cannot connect to the internet through the VPN server. Add the following lines before the "exit 0":

`# PPTP IP forwarding`

`iptables -t nat -A POSTROUTING -o XXX -j MASQUERADE`

"XXX" represents the device you found above that connects you to the internet.

7. sudo nano /etc/sysctl.conf

Find the line

net.ipv4.ip_forward=1

If it's commented out with a "#" in front of it, remove the "#".

8. Now reboot. After the reboot, the VPN server is set up.

B Setup dnsmasq

1. sudo apt-get install dnsmasq

2. sudo nano /etc/dnsmasq.conf

scroll down to find

<localip> is the localip that you were asked to remember when you setup vpn. In my case it looks like this:

Now your DNS should redirect all requests for guzzoni.apple.com to your local IP.

3. Reboot, and let's test the VPN and DNS server before continuing.

C. TEST servers before going further

After the reboot, you want to use the VPN from your phone,

(Note: the description is required but it can be anything. If you don't know your public IP, just type "what's my ip" in Google.)

It should connect, and if you open the browser on the iPhone, it should also get online.

Now, to test whether dnsmasq is actually redirecting guzzoni.apple.com to your local machine, you'll need another machine connected to the VPN server (same setup as your iPhone). In a terminal, do

nslookup guzzoni.apple.com

If it gives your localip, then it's good news.

D. Setup Siriproxy

Follow exactly what he did in both steps.

(In step 1, command 2) He changed dnsmasq for the local wireless network, but we are using a VPN, so do NOT change it. Keep what you did in my tutorial.

When he set up the iPhone 4S, he changed the DNS server in the wifi settings. Again, you are using a VPN, so do not change it. If you connect to the VPN, the VPN server should redirect the request for you.

This is my first blog, so please be nice, and post your problems if it didn't work for you.

This post is meant for someone with little or no knowledge of what's going on when setting up siriproxy, because if it doesn't work they will have no way of debugging it. I explain the fundamentals based on what I know (not that I am an expert, or that I did something high-tech or participated in the jailbreak).

The whole process is cut into a few parts so that you can determine exactly which part is wrong when you try it.

Acknowledgement

Randy's Tech