Saturday, July 23, 2022

Java

What are Inner Classes?

    An inner class is a class nested inside some outer class. You can define an (inner) class inside an (outer) class. The inner class has access to everything in the outer class, including its private members. It is instantiated through an instance of the outer class, like "OuterClass.InnerClass myInnerClassObject = myOuterClassObject.new InnerClass()".
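
    For instance, here is a minimal sketch (Outer and Inner are made-up names):

        class Outer {
            private int x = 10;                        // private, yet visible to the inner class
            class Inner {
                int readX() { return x; }              // reads the enclosing instance's field
            }
        }

        Outer outer = new Outer();
        Outer.Inner inner = outer.new Inner();         // an inner (non-static) class needs an enclosing instance
        System.out.println(inner.readX());             // prints 10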

What are Abstract Classes?
    Abstract classes are classes that cannot be instantiated and whose abstract methods have no body or definition. The point is that subclasses will provide these. An analog from math: an n-dimensional hypercube is really an abstract class of squares/cubes; it has a theoretical structure with sides, faces, a volume etc., but that is all it is, a theoretical structure. The subclass of n-dimensional hypercubes where n=3 is a "concrete" class which we can provide definitions to and instantiate in real life: V=s^3, SA=6s^2, etc. (Of course we can assign such formulae to n-D cubes in actual math, but I am pretending that we don't for the sake of illustration.)
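
    A minimal sketch along the lines of the cube analogy (Shape and Cube are placeholder names):

        abstract class Shape {
            abstract double volume();                   // no body; a subclass must supply it
            void describe() {                           // abstract classes may still have concrete methods
                System.out.println("Volume: " + volume());
            }
        }

        class Cube extends Shape {
            private final double s;
            Cube(double s) { this.s = s; }
            @Override double volume() { return s * s * s; }   // the concrete V = s^3
        }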

What are Interfaces?
    Interfaces are like abstract classes, EXCEPT (as per the Java docs):
        all fields are automatically public, static, and final, and all methods that you declare or define (as default methods) are public;
        you can extend only one class, whether or not it is abstract, whereas you can implement any number of interfaces.
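
    A small illustration of both points (Swimmer, Flyer, and Duck are made-up names):

        interface Swimmer {
            int MAX_DEPTH = 100;                        // implicitly public static final
            void swim();                                // implicitly public abstract
            default void floatUp() {                    // default methods are public as well
                System.out.println("floating up");
            }
        }

        interface Flyer {
            void fly();
        }

        class Duck implements Swimmer, Flyer {          // at most one superclass, any number of interfaces
            public void swim() { System.out.println("paddling"); }
            public void fly()  { System.out.println("flapping"); }
        }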

What are Lambda Expressions?
    Suppose you have an interface with only one (abstract) method. This is also called a functional interface. Normally, to use this method you would have to make a class implementing the interface, make the method concrete by defining it, instantiate an object from this class, then call the method on the new object. A shorter way to do this is with lambda expressions. Suppose you have an interface I with a single method M. Then you can create an object i of type I, whose method M has definition D, via a lambda expression like so:
    I i = (args) -> { D };
    i.M(args);
It's as simple as that! So the functional interface defines the structure (the signature) of the method, and the lambda expression provides the details of its implementation.
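
A concrete sketch (Adder is a made-up functional interface):

    @FunctionalInterface
    interface Adder {
        int add(int a, int b);              // the single abstract method
    }

    Adder adder = (a, b) -> a + b;          // the lambda supplies the method body
    System.out.println(adder.add(2, 3));    // prints 5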

What are enums?
    An enum is a type whose values are a fixed set of named constants. It is defined like so (by convention the type name is capitalized and the constants are uppercase):
        public enum MyKids {BILL, MIKE, KATHY}
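
    A short usage sketch (reusing the MyKids enum above):

        for (MyKids kid : MyKids.values()) {            // values() lists the constants in declaration order
            System.out.println(kid + " is at position " + kid.ordinal());
        }
        MyKids favorite = MyKids.KATHY;                 // constants are referenced like static fields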

What are initialization blocks? What is the initialization order?
    Initialization blocks are {} blocks which initialize variables that were previously declared. They can be static or non-static. Static blocks run once, when the class is first loaded; non-static (instance) blocks run before the constructor body every time an object is created. These blocks can contain code other than initializations, such as sysouts.
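
    A small demo of the order (InitDemo is a placeholder name):

        class InitDemo {
            static { System.out.println("1. static block (once, at class load)"); }
            { System.out.println("2. instance block (before every constructor)"); }
            InitDemo() { System.out.println("3. constructor body"); }

            public static void main(String[] args) {
                new InitDemo();     // prints 1, 2, 3
                new InitDemo();     // prints 2, 3 (the static block does not run again)
            }
        }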

Friday, November 8, 2019

Making Sense of Statistics: Sampling Distributions

Purely speaking, a statistic is a number computed from a subset of numbers (called a sample in statistics) drawn from a larger set of numbers (called the population), whereas a parameter is a number calculated from the population itself, e.g. the population mean or variance.

Thus, here is an alternative formulation of a statistic. Suppose there is a vector function f : R^k -> R. Our population can always be assumed to have n numbers, and our sample to have k numbers with k < n. Let Xn be a vector of n numbers (representing a population) and Xk be a subvector of k of those numbers (representing a sample). Then f(Xk) is a statistic. For example a common f is (x1 + x2 + x3 + ... + xk)/k, i.e. the sample mean.

Mathematics is notorious for abstracting concepts. Even though numbers are already abstract, and calculating a statistic (a number) from data (more numbers) is a further abstraction, a sampling distribution abstracts this abstraction. 

Suppose again we have our n numbers. Combinatorics tells us there exist nCk subsets of k numbers from n. f maps each of these k-groups to some real number. Now, almost certainly, f is not injective, meaning there exist Xki and Xkj such that f(Xki) = f(Xkj) (these arguments are the ith and jth Xk vectors respectively). Meaning, the cardinality of the image of f is smaller than its domain of k-groups. This means that when one selects an Xk vector, f(Xk) is more likely to be some values than others. Therefore, f(Xk) has a distribution, and when sampling for a statistic, we are really picking that statistic out of a hat according to that distribution of f(Xk). It is the exact same process as rolling sums from two dice; here Xn = <1,2,3,4,5,6>, k=2, Xk is any pair of faces from Xn (one per die, so repeats are allowed), and f(Xk) = x1 + x2. In this case, the statistic is the sum of the sampled numbers.

-----------------------------------------------------------------------------------------------------
Aside: Law of Large Numbers
This distribution is well known, and the probability of getting a sum of x is (13-x)/36 for x>=7 and (x-1)/36 for x<=7. In the same way, a distribution exists for any other statistic one can conceive, although it may not be easy to mathematically describe, as for more complicated f's, a closed form analytic solution for the distribution may not exist. 

However the law of large numbers makes this no problem. If we progressively take k-groups from n and calculate an f for each, these f's will incrementally distribute themselves according to their true theoretical distribution! (Remember, this is no different than drawing sums from dice!) That is to say, our observed frequency distribution will approach the statistic's theoretical distribution. A small simulation sketch follows right after this aside.
--------------------------------------------------------------------------------------------------------------------------
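
To see this concretely, here is a minimal Java sketch (class and variable names are made up) that repeatedly draws the two-dice statistic and compares the observed frequencies with the (13-x)/36 and (x-1)/36 probabilities above:

    import java.util.Random;

    public class DiceSampling {
        public static void main(String[] args) {
            Random rng = new Random();
            int trials = 1_000_000;
            int[] counts = new int[13];                    // sums run from 2 to 12
            for (int t = 0; t < trials; t++) {
                int sum = (rng.nextInt(6) + 1) + (rng.nextInt(6) + 1);
                counts[sum]++;                             // one draw of the statistic f(Xk) = x1 + x2
            }
            for (int x = 2; x <= 12; x++) {
                double theoretical = (x <= 7 ? x - 1 : 13 - x) / 36.0;
                System.out.printf("sum %2d: observed %.4f, theoretical %.4f%n",
                        x, counts[x] / (double) trials, theoretical);
            }
        }
    }

As the number of trials grows, the observed column settles onto the theoretical one, which is the law of large numbers at work.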

Wednesday, October 23, 2019

A Simple Method to Count an Integer's Factors from its Prime Factorization

Suppose we have an integer x whose prime factorization is a^n * b^m * c^l * d^h, where a, b, c, d are distinct primes and n, m, l, h are positive integers.

Then a, a^2, a^3,...a^n each must be factors of x as they clearly divide x's prime factorization (PF), yielding n distinct factors.

Similarly this is true for b, b^2, b^3,...b^m, yielding m factors.
And so and so forth for prime factors c and d; they each yield a number of factors equal to their respective exponents (l and h).
Thus far we have n + m + l + h factors.

We also have each combination of two primes from the PF; here is one for a and b:
a*b,     a*b^2,     a*b^3,   ...  a*b^m
a^2*b,   a^2*b^2,   a^2*b^3, ...  a^2*b^m
a^3*b,   a^3*b^2,   a^3*b^3, ...  a^3*b^m
...
a^n*b,   a^n*b^2,   a^n*b^3, ...  a^n*b^m

Each element in this table divides the PF of x (and therefore x). This is an mxn table and so contains mn elements to add to our total number of factors.
Similar tables can be constructed for each other pair of prime factors, and each table would have a number of elements equivalent to the product of the exponents of that pair.

Here is the table for b and c:
b*c,     b*c^2,     b*c^3,   ...  b*c^l
b^2*c,   b^2*c^2,   b^2*c^3, ...  b^2*c^l
b^3*c,   b^3*c^2,   b^3*c^3, ...  b^3*c^l
...
b^m*c,   b^m*c^2,   b^m*c^3, ...  b^m*c^l
Which yields ml elements.


In the end we will take all possible pairwise products of exponents! Yielding nm + nl + nh + ml + mh + lh.

For three factors, instead of a two-dimensional table as shown above, you will get a three-dimensional cube, with side lengths equal to the exponents of the three prime factors, and therefore a number of elements equal to the product of those exponents! And we will iterate for each possible cube, i.e. each possible triplet of prime factors, as shown in the uploaded picture. Thus we take each triplet product: nml + mlh + nmh + nlh.

We repeat this process for four factors, then five, and so on.
For four factors this gives a four-dimensional hypercube, and we count the elements in each possible 4D hypercube. With only four primes in the PF there is just one such hypercube, with cardinality nmlh.

With each new prime factor in the PF we would keep repeating this process.

Thus in total we have 
n + m + l + h  +
nm + nl + nh + ml + mh + lh +
nml + mlh + nmh +nlh +
nmlh
factors in x.

I am not quite sure how to write that out as a mathematical statement. This is the process in English: note the exponent of each prime factor of x. Add every possible singleton, pairwise product, triplet product, quadruplet product, and so on, for as many exponents as there are. This sum is the number of factors of x (every factor except the trivial factor 1). Quite a handy shortcut for doing GRE factorization problems!
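
One compact way to write it out: multiplying out each exponent plus one reproduces exactly the sum above, plus 1 for the trivial factor,

    (n+1)(m+1)(l+1)(h+1) = 1 + (n + m + l + h) + (nm + nl + nh + ml + mh + lh) + (nml + nmh + nlh + mlh) + nmlh

so the count above equals (n+1)(m+1)(l+1)(h+1) - 1, which agrees with the familiar "add one to each exponent and multiply" rule for the total number of divisors once the factor 1 is included.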


Tuesday, October 22, 2019

Exploring Confidence Intervals in Statistics: Confusions and Comprehensions

They say the best way to learn something is to explain it oneself (according to Feynman at least). I find this is also the best way to surface one's confusions about a topic: when I possess a confusion, I am oftentimes unable to precisely identify its source, and historically this inability to identify a confusion goes hand in hand with an inability to articulate it. I also find that articulating a confusion poises me to solve it, as if the mere articulation of a quandary initiates some process in my brain which resolves it. This is a baffling but reassuring phenomenon which I keep in mind when encountering new problems.

So let me attempt to apply this phenomenon to the study of confidence intervals in statistics.

The concept of a confidence interval is that it quantifies how close your experimental value is likely to be to the true value of a population parameter. For instance, how close your sample proportion p̂ is to the population proportion p. To measure this we pick a confidence level we want, usually 90%, 95%, or 99%, and from there determine the range of values (the confidence interval) in which p could plausibly lie.

p̂ is not a binomial random variable, as it denotes the proportion of successes, not the number of successes: p̂ = B(n,p)/n. The mean is then E(B(n,p)/n) = np/n = p and the standard deviation is SD(B(n,p)/n) = SD(B(n,p))/n = sqrt(npq)/n = sqrt(pq/n).

p̂ is still discrete, and still takes the shape of B(n,p), just rescaled onto the interval from 0 to 1. It appears that it approximates a normal distribution as long as p is near .5 and n is large. I have made various plots in R of binomial distributions with varying p's and n's and found this to be the case.
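
The original plots were made in R; here is a rough Java analogue of the same check (a minimal sketch with made-up names) that simulates many values of p̂ and bins them so the bell shape is visible when n is large and p is near .5:

    import java.util.Random;

    public class PHatSimulation {
        public static void main(String[] args) {
            Random rng = new Random();
            int n = 200;            // sample size
            double p = 0.5;         // true population proportion
            int reps = 100_000;     // number of simulated samples
            int[] bins = new int[21];                       // bin p-hat into widths of 0.05
            for (int r = 0; r < reps; r++) {
                int successes = 0;
                for (int i = 0; i < n; i++) {
                    if (rng.nextDouble() < p) successes++;  // one Bernoulli trial
                }
                double pHat = successes / (double) n;       // the statistic p-hat = B(n,p)/n
                bins[(int) Math.min(20, pHat * 20)]++;
            }
            for (int b = 0; b < bins.length; b++) {
                System.out.printf("[%.2f, %.2f): %d%n", b / 20.0, (b + 1) / 20.0, bins[b]);
            }
        }
    }

Re-running it with a small n or a p far from .5 shows the skew that the normal approximation misses.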


In fact, the central limit theorem (CLT) states that the distribution of a sum of many independent, identically distributed random variables approaches a normal distribution. That's why we can use a normal approximation for p̂: p̂ is just a scaled B(n,p), and every binomial R.V. is a sum of independent Bernoulli R.V.'s (the independence of the trials is part of the binomial setup). Ergo we can use the CLT here. Experimentally this proves to be the case too.

(My God. That is profound. That is why the distance covered by a random walk and the sum of n dice rolls are both approximately normally distributed... both are sums of independent random variables!)


Ok fine, but what then are the mean and standard deviation of this normal distribution that it approaches? Perhaps that is too in-depth a discussion to settle in general. For p̂ at least, I can apply the expectation and variance functions E(X) and Var(X) to p̂ to obtain the respective mean and variance.

Here is the problem: SD(p̂) = σp̂ = sqrt(pq/n). But we don't ever know p (or consequently q). So we need to make the estimate SE(p̂) = sqrt(p̂(1-p̂)/n). But how well does this approximate σp̂? That is never addressed in any of the resources I have consulted, and I don't know how to evaluate the accuracy of this estimate.
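
One crude, empirical way to poke at this question (a hedged sketch, not a real answer): simulate many samples at a known p, compute sqrt(p̂(1-p̂)/n) for each, and see how far the estimates typically land from the true sqrt(pq/n):

    import java.util.Random;

    public class SEEstimateCheck {
        public static void main(String[] args) {
            Random rng = new Random();
            int n = 100;
            double p = 0.3;
            double trueSE = Math.sqrt(p * (1 - p) / n);     // sqrt(pq/n), known because we chose p
            int reps = 50_000;
            double sumAbsError = 0;
            for (int r = 0; r < reps; r++) {
                int successes = 0;
                for (int i = 0; i < n; i++) {
                    if (rng.nextDouble() < p) successes++;
                }
                double pHat = successes / (double) n;
                double estSE = Math.sqrt(pHat * (1 - pHat) / n);   // the plug-in estimate
                sumAbsError += Math.abs(estSE - trueSE);
            }
            System.out.printf("true SE = %.5f, mean |estimate - true| = %.5f%n",
                    trueSE, sumAbsError / reps);
        }
    }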

So in statistics, here is my conception of the idea of a confidence interval.

Okay so suppose we have a psychic who says she can predict the future / read your mind / etc.
To assess her claims we may observe how many times the future actually matches her predictions.
The concept really is to see if her correctness rate is significantly greater than the chance rate, i.e., greater than the correctness rate of someone who is guessing at each prediction.

We will test our psychic on how often she can read our mind for a number we pick in our head from 1 to 5, for example. Someone guessing, over many trials,