# Introduction to Probability Theory and Mathematical Statistics

"We have seen that probability theory is, at bottom, common sense reduced to calculation. It enables our reasoning minds to evaluate precisely the intuitions and insights that we often cannot explain clearly. It is remarkable that probability theory, a science that began with the study of games of chance, should long ago have become the most important part of human knowledge... The most important questions of life are, for the most part, really only problems of probability." Pierre-Simon, Marquis de Laplace, French mathematician and astronomer

"Probability Theory and Mathematical Statistics" is a standard course in science and engineering programs, yet after taking it many students seem to have learned only some formulas. They have no intuitive grasp of statistical concepts, no understanding of the theoretical basis of the methods (which determines when a method applies and where its limits lie), and no clear sense of the difference between subjective and objective probability. Mathematical statistics and probability theory are two closely related sister disciplines. Broadly speaking, probability theory is the foundation of mathematical statistics, and mathematical statistics is an important application of probability theory.

Mathematical statistics is a highly applied subject that covers methods, applications, and theoretical foundations. In the West, the term "mathematical statistics" refers specifically to the mathematical theory underlying statistical methods. In China it has a broader meaning that includes methods, applications, and theoretical foundations, which in the West is simply called "statistics". Because China also has a statistics discipline that is classified as a social science, the two terms sometimes have to be kept apart, as in the course title "Probability Theory and Mathematical Statistics" rather than "Probability Theory and Statistics".

## Preface

I like to understand why things are the way they are, for example why the load factor of Java's HashMap is 0.75. "It is a good trade-off" does not satisfy me; my question is why 0.75 is just right. Then I saw on Zhihu that other high-level languages have data structures similar to HashMap with different load factors: reportedly 0.65 in Go, 0.8 in Dart, and 0.762 in Python.

So why does Java's HashMap use 0.75? I found the answer in the comments in the HashMap source code:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution ( en.wikipedia.org/wiki/Poisso ) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k)/factorial(k)).

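In other words, the comment concerns the probability that the number of entries in one bucket reaches TREEIFY_THRESHOLD. With ideally distributed hash codes and the default load factor of 0.75, the number of entries $k$ in a bucket follows a Poisson distribution with parameter about 0.5:

$$P(X=k)=\frac{0.5^k}{k!}e^{-0.5},\quad k=0,1,2,\dots$$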
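The Poisson formula from the HashMap comment is easy to evaluate directly. The short Python sketch below (the function name `poisson_pmf` is my own, not from the JDK) prints the expected fraction of buckets holding $k$ entries for $k = 0$ to $8$ with $\lambda = 0.5$.

```python
import math

def poisson_pmf(k: int, lam: float = 0.5) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected share of buckets holding k entries when lam = 0.5 (load factor 0.75).
for k in range(9):
    print(f"{k}: {poisson_pmf(k):.8f}")
```

The printed values match the table in the full JDK comment, from about 0.60653066 for $k = 0$ down to about 0.00000006 for $k = 8$.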

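Evaluating this for $k = 0$ through $8$ shows that, at a load factor of 0.75, the probability that a bucket holds 8 entries (the default TREEIFY_THRESHOLD) is only about 0.00000006.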


It doesn't sound like something a medical student would say, as if more bacteria simply meant dirtier. Most of the bacteria in a toilet belong to groups such as Actinobacteria and Bacteroides; these bacteria are very harmful to the human body, and some can even get into the air through the airflow when the toilet is flushed. Although human saliva also contains many bacteria, most of them are harmless. My point is that whether something is dirty is not just a matter of how many bacteria there are, but also of what kind they are. The teacher only meant that people should follow the light of reason in their hearts, not that everything the teacher says is right. Everyone understands this in principle, yet everyone is superstitious about authority, and I dislike that attitude. Defining dirtiness by the number of bacteria does not hold up. Salmonella has two lethal forms, Salmonella enterica and Salmonella typhi. Salmonella typhi causes typhoid fever, which kills 216,000 people every year and is spread through infected feces and urine. Each person carries about 500 to 1,000 different kinds of bacteria, about 100 trillion cells in an adult's body, roughly 10 times the number of the person's own cells, and even a daily bath cannot wash them away. By your reasoning, a person who pays attention to cleanliness would be dirtier than a petri dish of Salmonella. (If you don't believe me, look up the number of bacteria on the human body: each of us carries more bacteria than a toilet, so every one of us would be dirty in a way that can never be washed off, and even a corpse would be dirtier than a toilet.) "Dirty" is a perceptual judgment. Your teacher has at least a graduate degree; surely they understand this? I have always thought this claim is a rumor, not knowledge. And are you really from a science and engineering background? I seriously doubt it! If I have offended you, I hope you will forgive me.

Another example: in HashMap, the length of the bucket array `table` must be a power of two (a composite number, apart from the trivial sizes 1 and 2). This is an unconventional design; the conventional choice is to make the number of buckets a prime. Relatively speaking, a prime table size tends to cause fewer collisions than a composite one. Notice that this, too, is a statement about probability. So what is probability?
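To see why prime table sizes are conventionally preferred, here is a small Python experiment. It assumes the simplest bucket rule, `key % table_size`, and hypothetical keys that are all multiples of 8. Java's HashMap actually computes the index as `(n - 1) & hash` after XOR-ing the high 16 bits of `hashCode()` into the low bits; this sketch ignores that mixing step.

```python
def buckets_used(keys, table_size):
    """Number of distinct buckets hit when bucket = key % table_size."""
    return len({k % table_size for k in keys})

# Hypothetical keys that share a common factor: multiples of 8.
keys = [8 * i for i in range(100)]

print(buckets_used(keys, 16))  # power of two: only buckets 0 and 8 are used -> 2
print(buckets_used(keys, 17))  # prime: the keys reach every bucket -> 17
```

With a power-of-two size, the keys' common factor lines up with the modulus and they pile into 2 of the 16 buckets; with the prime 17 they spread over every bucket. This is one reason prime sizes are the conventional choice, and why a power-of-two design relies on good hash mixing.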

## What is probability?

### Subjective probability

The word "probability" comes up very often in everyday life. People use it to express how likely they think something is to happen. For example, here is how the novel "Battle Through the Heavens" (Doupo Cangqiong) describes the moment Xiao Yan swallows the Qinglian Core Flame:

"By rough estimate, without auxiliary items such as the Blood Lotus Pill, the success rate of swallowing a Strange Fire is basically below 1%; with them, it may rise to about 10%."

We can find many judgments like this in everyday life. When the sky is overcast, people say roughly "I'm 90% sure it will rain today"; others say there is a 90% chance that Shakespeare really wrote "Hamlet", or an 80% chance that Oswald alone assassinated President Kennedy.

But is this probability in the mathematical sense? No. Simply put, it measures how confident people are in what they say. "Below 1%" means that Xiao Yan is not very confident he can swallow the Strange Fire successfully; being "90% sure" that it will rain today means we are quite sure it will. Probability as a measure of an individual's degree of belief is usually called subjective probability. Unlike the probability in a coin-tossing experiment, it is not backed by any observed frequency; it only measures degree of confidence, so the difference between the numbers different people give does not mean much. If, on today's overcast day, you say there is an 85% chance of rain and I say 90%, the only conclusion is that I am somewhat more convinced than you that it will rain.

Objective probability, the probability of mathematics, is tied to frequency. We always specify the sample space first. Although we cannot know the result of an experiment in advance, we assume that all possible results are known; the set of all possible results is called the sample space, and any subset of the sample space is called an event. In other words, an event is a set made up of some of the possible results of an experiment. If the result of the experiment is contained in E, we say that event E has occurred.

For the familiar coin-tossing and dice-rolling experiments, the sample space is easy to write down. The sample space for tossing a coin is:

• Heads up (the side showing the denomination)
• Tails up (the other side)

Now, what is the probability of heads? Some students will answer one-half, that is 50%, without hesitation. But the answer 50% assumes the coin is fair, meaning heads and tails are equally likely. Once they have listed the sample space, people often compute the probability of an event by dividing the number of outcomes in the event by the total number of outcomes in the sample space. This reminds me of a discussion with my college classmates. When we started the course "Probability Theory and Mathematical Statistics", my classmates came up with the following conclusions:

• The probability that I fail this course is 50% (because the sample space has only two outcomes: failing and not failing)
• The probability that I become class monitor is also 50% (because the sample space has only two outcomes: becoming monitor and not becoming monitor)

The reason for this kind of inference is that we have done too many exercises. Many probability exercises state up front that the outcomes are equally likely, and after enough of them, whether facing a math problem or the real world, we assume equal likelihood by default. The wrong conclusions above silently treat "failing the course" and "becoming class monitor" as tosses of a coin of uniform material whose heads and tails each have probability one-half, and then conclude that in any sample space with only two outcomes, each outcome has probability 50%. Probability exercises just happen to be set up so that the number of outcomes in an event divided by the total number of outcomes equals the probability of that event (an equally likely sample space). For example, a die has six faces; we usually assume the die is made of uniform material, which implies that every face is equally likely to appear, and therefore each face has probability one-sixth. In probability theory this model is called the equally likely sample space.

The error lies in assuming equal likelihood at the start. Take the second claim alone: from the sample space we have no way of knowing the probability of the event "being elected class monitor". My own estimate is that my classmate's chance of being elected is 0.0000001%. Strictly speaking, that number is just another assumption on my part; all I am saying is that I believe my college classmate could not possibly be elected class monitor.

Subjective probability can be understood as a state of mind or an inclination. It basically has two sources. The first is a person's experience and knowledge. When I was young, my family was drying wheat on the road. One evening the sky was overcast, I decided it would rain, and I told my father to bring the wheat in that evening. But an elderly man from our village told my father there was no need: the weather that day was good, the wind was right, and it would not rain. My father took his advice, and although the sky stayed overcast, it did not rain. In this sense subjective probability also has an objective basis; it is not the same as idle talk. The second source is what is at stake for the person. Take today's weather in Shanghai: it rained yesterday, but when I went out for breakfast I still did not bring an umbrella, because even if it rained I would not mind; carrying one just felt like a hassle.

The defining feature of subjective probability is that it does not rest on a solid objective basis that everyone accepts, so it might seem that science, whose task is to seek the truth, should reject it. It is not obvious how to think about subjective probability, but we should not dismiss it entirely. First, the concept is rooted in everyday life: we estimate the likelihood of all kinds of things almost all the time, and different people seldom agree on an "objective" basis for those estimates. Second, it can reflect the inclination of the person making the judgment, which has social significance. For example, if you ask "what is the probability that the economy will be better in three years?", people of different economic circumstances, social status, and even political leanings will give different estimates. Any single estimate may not rest on much reasoning, but taken together they reflect the general public's confidence in the long-term development of society, which is very useful information for sociologists and even policymakers. Third, in decisions involving gains and losses (economic or otherwise), people in different positions and with different information should weigh the likelihood of an event against their own circumstances and the consequences. A decision that suits one person, for example because its risk is small for them, may not suit another person, for whom the same decision carries more risk. So the concept of subjective probability also has a practical basis. In fact, many decisions inevitably involve personal judgment, and that is subjective probability.

### Objective probability

#### Experiments and events

The subjective probability discussed above is an attitude or inclination about whether some event will occur, so let us now examine the term "event". What is an event? In everyday usage it usually means something that has already happened, such as a particular air crash or the Japanese attack on Pearl Harbor in 1941. In probability theory that is not the case. An event is not something that has happened but a "statement" about some situation (or situations). It may or may not happen, and whether it happens is known only after the relevant "experiment" has produced its result.

For example, take "it will not rain before 6 p.m. today". We are not saying this has already happened; whether it happens depends on the result of an experiment, namely observing the weather until 6 p.m. today.

Extending this, the general meaning of "event" in probability theory is as follows. (1) There is a clearly defined experiment. The word "experiment" suggests something done actively by people, but in the example above people are only passive observers who record the weather without interfering with it. Such a situation is usually called an "observation". The distinction sometimes matters in statistics, but not for the present discussion, so you can read "experiment" as including observation. (2) All possible results of the experiment are specified before the experiment. In the example above, the experiment has only two possible results: it does not rain before 6 p.m. today, written $A$, or it does rain before 6 p.m. today, written $\overline{A}$. The set of all results can therefore be written as $\{A, \overline{A}\}$. Even before carrying out the experiment, that is, before 6 p.m., we know the result will be either $A$ or $\overline{A}$.

In many cases, however, we cannot list all possible results of an experiment exactly, but we know they do not exceed a certain range, and that range can then be taken as the set of all possible results. For example, if we are interested not only in whether it rains before 6 p.m. but in recording the rainfall before 6 p.m. (in millimeters), the result is a non-negative real number $x$. We cannot pin down the exact range of possible values of $x$, but we can take it to be $[0, \infty)$, which is sure to contain every possible result, even though some values, such as $x > 10000$, can never actually occur. We can even take the range to be $(-\infty, \infty)$. This involves some mathematical abstraction, but it brings great convenience, as will become clearer later.

(3) There is a clear statement that picks out a certain part of the set of all possible results of the experiment; that part is called an event. In the rain example above, $A$ is one part of the set of all possible results $\{A, \overline{A}\}$. In the dice example, we can define many events, such as:

• E1 = {roll an even number} = {2,4,6}
• E2 = {roll a prime number} = {2,3,5}
• E3 = {roll a multiple of 3} = {3,6}

and so on. Each of them clearly defines a part of the set of all possible results $\{1, 2, \dots, 6\}$.

If we now carry out the experiment, that is, roll the die once, then when the result is 2, 4, or 6 we say that event E1 "has occurred"; otherwise we say that E1 "has not occurred". So we can also say: an event is a proposition about the result of an experiment, and whether it is true depends on that result.

In probability theory, a single result of an experiment is sometimes called an "elementary event". Events are formed by combining one or more elementary events, and an elementary event is itself an event. In the dice example there are 6 elementary events, 1, 2, ..., 6, and event E2 consists of the three elementary events 2, 3, and 5.

Imagine you are about to roll a die and will win a prize if the result is prime. Before the roll you would think: whether I win depends on chance. That is why events in probability theory are often called "random events" or "chance events": "random" simply means that whether the event occurs in a given experiment depends on chance. The extreme cases are the "certain event" (one that must happen in the experiment, such as a die roll showing no more than 6 points) and the "impossible event" (one that cannot happen in the experiment). Chance plays no role in these two cases, but for convenience of calculation we treat them as special cases of random events.
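The idea that an event is a subset of the sample space, and that it "occurs" when the observed result falls inside it, maps directly onto sets. A minimal Python sketch using the dice events E1 to E3 (the helper name `occurred` is illustrative, not standard terminology):

```python
# Sample space for one roll of a die.
OMEGA = frozenset(range(1, 7))

E1 = frozenset({2, 4, 6})  # roll an even number
E2 = frozenset({2, 3, 5})  # roll a prime number
E3 = frozenset({3, 6})     # roll a multiple of 3

def occurred(event, outcome):
    """An event 'occurs' exactly when the observed outcome belongs to it."""
    return outcome in event

roll = 3
print({name: occurred(ev, roll) for name, ev in [("E1", E1), ("E2", E2), ("E3", E3)]})
# {'E1': False, 'E2': True, 'E3': True}

# The certain event is the whole sample space; the impossible event is empty.
print(occurred(OMEGA, roll), occurred(frozenset(), roll))  # True False
```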

#### Classical definition of probability

Continuing from the above, suppose an experiment has a finite number of possible results $e_1, e_2, \dots, e_N$, and suppose that, after analyzing the conditions of the experiment and how it is carried out, we can find no reason to believe that any result $e_i$ is favored over any other result $e_k$ (that is, more prone to occur). Then we have to regard all the results $e_1, \dots, e_N$ as having the same chance, $1/N$, of appearing in the experiment. Such results are called "equally likely".

Take rolling a die: if the die is made of perfectly uniform material, it is a perfect cube, and it is thrown from a sufficient height above the ground, most people will agree that every face is equally likely to come up. Of course, in real life this can only hold approximately; even the pips carved into the faces slightly affect the die's symmetry.

On the basis of "equal likelihood", the classical definition of probability follows naturally: suppose an experiment has $N$ equally likely outcomes and event $E$ contains exactly $M$ of them. Then the probability of $E$, written $P(E)$, is defined as:

$$P(E) = \frac{M}{N}$$
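The classical definition is computed by counting. This Python sketch (the helper `classical_probability` is hypothetical) assumes the outcomes are equally likely, which is exactly the assumption the definition depends on, and uses `fractions.Fraction` to keep the ratios exact.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = M / N; valid only when all N outcomes are equally likely."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

omega = set(range(1, 7))
print(classical_probability({2, 4, 6}, omega))  # 1/2 (even number)
print(classical_probability({2, 3, 5}, omega))  # 1/2 (prime number)
print(classical_probability({3, 6}, omega))     # 1/3 (multiple of 3)
```

Nothing in the code checks equal likelihood; applying it to "fail / not fail" would reproduce the classmates' 50% mistake.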

The limitation of the classical definition is obvious: it applies only when the experiment has finitely many results and they are equally likely. In some cases the idea can be extended a little to experiments with infinitely many results; this is the so-called "geometric probability".

#### Statistical definition of probability

From a practical point of view, the statistical definition of probability is simply a way of estimating the probability of an event by experiment. Take rolling a die again. If the die is not a cube of uniform material, its faces need not be equally likely, and the probability of the event E1 = {roll a 6} cannot be determined by theoretical considerations alone. But we can experiment: roll the die a large number of times, say $n$ times, and if a 6 comes up $m_1$ times, then $m_1/n$ is called the "frequency" of E1 in these $n$ rolls (each roll counts as one trial). The essence of the statistical definition is to take this frequency $m_1/n$ as an estimate of the probability $P(E_1)$. Its intuitive background is simple: the probability of an event should be reflected by how often it occurs in many repeated trials.

The general case is no different: in the description above, replace "rolling a die" with a general experiment and the event "roll a 6" with a general event $E$. The key requirement is that the experiment can be repeated many times under the same conditions, so that we can observe the frequency of the event.
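A simulation shows the statistical definition at work. The sketch below assumes a hypothetical loaded die whose face 6 has weight 3 and every other face weight 1, so the true $P(E_1)$ is $3/8 = 0.375$; it rolls the die $n$ times with Python's `random.choices` and reports the frequency $m_1/n$.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]
WEIGHTS = [1, 1, 1, 1, 1, 3]  # hypothetical loaded die: P(6) = 3/8

def frequency_of_six(n, seed=42):
    """Roll the loaded die n times and return m1 / n, the frequency of a 6."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rolls = rng.choices(FACES, weights=WEIGHTS, k=n)
    m1 = rolls.count(6)
    return m1 / n

for n in (100, 1_000, 100_000):
    print(n, frequency_of_six(n))
```

As $n$ grows the frequency tends to settle closer to 0.375, but for any finite $n$ it is still only an estimate of the probability.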

The weakness of this definition is that a frequency is only an estimate of a probability, not the probability itself. Formally, one might try to get around this difficulty with the following statement:

The probability of event $E$ is the number $p$ with the following property: when the experiment is repeated, the frequency of $E$ fluctuates around $p$, and the fluctuations become smaller and smaller as the number of repetitions grows; or, more simply, the probability is the limit of the frequency as the number of trials increases without bound. This definition must answer a question: how do we prove that a $p$ with this property exists, or is its existence merely a hypothesis? Proponents of defining probability by frequency usually answer that the convergence of $n(E)/n$ to some constant, where $n(E)$ is the number of times $E$ occurs in $n$ trials, is a hypothesis, or an axiom of the whole system. But this assumption is rather complicated: although in practice we do need the frequency to have a limit, that is not the most basic and simple assumption, and not everyone accepts it. Isn't it more reasonable to start from simpler, more self-evident axioms and then prove that the frequency converges to a constant in a certain sense? That is the axiomatic approach of modern probability theory, established by the Soviet mathematician Andrey Kolmogorov.

The importance of the statistical definition is not that it provides a way to define probability. It actually does not, because no event's probability can ever be determined exactly from it. Its importance lies in two points. First, it provides a way to estimate probability, as mentioned above, and it has many applications: in a population sample survey, the illiteracy rate of the whole population is estimated from a small sample of people; in industrial production, the defect rate of products is estimated from a sample of them. Second, it provides a criterion for checking whether a theory is correct. Suppose the probability $p$ of some event $A$ has been calculated from some theory or assumption, and we are not sure whether that theory agrees with reality. We then turn to experiment: carry out many repeated trials and observe the frequency $m/n$ of $A$. If $m/n$ is close to $p$, the result is considered to support the theory; if it is far from $p$, we conclude that the theory may be wrong. Problems of this kind belong to an important branch of mathematical statistics: hypothesis testing.
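The "is $m/n$ close to $p$?" check can be made concrete. One common rough criterion, which the text above does not specify, uses the normal approximation to the binomial distribution: compute how many standard deviations the observed count $m$ lies from $np$, and treat a distance above about 1.96 as evidence against the theory at roughly the 5% level. The counts below are hypothetical.

```python
import math

def z_score(m, n, p):
    """Standardized distance between the observed count m and the expected count n*p."""
    return (m - n * p) / math.sqrt(n * p * (1 - p))

def supports_theory(m, n, p, z_crit=1.96):
    """Rough check: True if m/n lies within z_crit standard deviations of p."""
    return abs(z_score(m, n, p)) <= z_crit

# Hypothetical data: the theory says the die is fair, so P(six) = 1/6.
print(supports_theory(m=1010, n=6000, p=1/6))  # True: z is about 0.35
print(supports_theory(m=1100, n=6000, p=1/6))  # False: z is about 3.5
```

A proper hypothesis test also fixes the significance level in advance and checks that $n$ is large enough for the normal approximation; those details belong to hypothesis testing proper.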

## Mathematical expectation and average time complexity

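Let us start with a classic gambling problem. Two players, A and B, each put up a stake of 500, for a total of 1000, and agree that whoever first wins three games takes the whole 1000. After three games A has won two and B one, and the match has to be stopped. How should the 1000 be divided? One suggestion is to split it in proportion to the games won, giving A 2/3 and B 1/3. Is that fair?

A more reasonable approach looks at what would happen if play continued. Each game is equally likely to be won by either player. B can still take the 1000 only by winning both of the next two games, which happens with probability $\frac{1}{2}\times\frac{1}{2}=\frac{1}{4}$; otherwise A takes the 1000. So the 1000 should be split 3:1, with A getting 750 and B getting 250:

$$1000\times\frac{3}{4} + 0\times\frac{1}{4} = 750, \qquad 1000\times\frac{1}{4} + 0\times\frac{3}{4} = 250$$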

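Let $X$ be the amount A would end up with if play continued. $X$ is a random variable: it takes the value 1000 with probability $\frac{3}{4}$ and 0 with probability $\frac{1}{4}$. The 750 computed above is what A can "expect" to receive, and it is called the mathematical expectation of $X$:

$$E(X) = 1000\times\frac{3}{4} + 0\times\frac{1}{4} = 750$$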

This is the origin of the term "mathematical expectation" ("expectation" for short). The term comes from gambling; it is not very intuitive and arguably not a very apt name, but it has a long history in probability theory, is universally recognized, and is firmly established. Its other name is the mean, which is easier to understand and very widely used, as explained below. Is this explanation a bit abstract? We need abstraction, but we also need intuition. Looking closely at the formula above, doesn't it feel like a weighted average? The weights are the probabilities, and the quantities being averaged are the possible values of $X$.

Another interpretation of expectation comes from the frequency interpretation of probability. This interpretation (we will discuss the strong law of large numbers later) holds that if infinitely many independent repetitions of an experiment are carried out, then the proportion of them in which any event $E$ occurs is $P(E)$. Suppose the random variable $X$ takes the possible values $x_1, x_2, \dots, x_n$ with corresponding probabilities $p(x_1), p(x_2), \dots, p(x_n)$, and interpret $X$ as our winnings in a game of chance: in each game we win $x_i$ units with probability $p(x_i)$, $i = 1, 2, \dots, n$. By the frequency interpretation, if we keep playing, the proportion of games in which we win $x_i$ is $p(x_i)$. Since this holds for every $i$, our average winnings per game are:
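The 3/4 in the division problem can be checked by brute-force enumeration. This Python sketch assumes that each game is a fair 50/50 contest and that A needs one more win while B needs two; it uses the classic device of imagining that the maximum number of remaining games is always played, so that every sequence of outcomes is equally likely. The function name `prob_a_wins` is illustrative.

```python
from fractions import Fraction
from itertools import product

def prob_a_wins(a_needs, b_needs):
    """Probability that A takes the pot when A still needs `a_needs` wins,
    B needs `b_needs`, and each game is a fair 50/50 contest.

    Imagine exactly a_needs + b_needs - 1 further games are played; exactly one
    player reaches their target, and A wins iff A wins at least a_needs of them.
    """
    games = a_needs + b_needs - 1
    outcomes = list(product("AB", repeat=games))  # all equally likely sequences
    a_wins = sum(1 for seq in outcomes if seq.count("A") >= a_needs)
    return Fraction(a_wins, len(outcomes))

p = prob_a_wins(a_needs=1, b_needs=2)
stake = 1000
print(p)                           # 3/4
print(stake * p, stake * (1 - p))  # 750 250
```

The expected payouts come out as 750 and 250, matching the 3:1 split.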

$$E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$$

where $E$ stands for expectation.
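The formula and its frequency interpretation can be compared side by side. This sketch computes $E(X)=\sum x_i p(x_i)$ exactly for A's payoff in the division problem (1000 with probability 3/4, 0 with probability 1/4) and then averages the payoff over many simulated games; the helper names `expectation` and `average_winnings` are hypothetical.

```python
import random

def expectation(values, probs):
    """E(X) = sum of x_i * p(x_i) for a discrete random variable."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

def average_winnings(values, probs, games, seed=0):
    """Frequency interpretation: the average payoff over many simulated games."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=probs, k=games)
    return sum(draws) / games

values, probs = [1000, 0], [0.75, 0.25]
print(expectation(values, probs))                # 750.0
print(average_winnings(values, probs, 100_000))  # close to 750
```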

Roughly speaking, we can think of the expectation, which we will also call the mean, as the average winnings per game. If the average winnings are lower than your stake, you will very likely lose money in the long run, because the game itself is unfair to you. Now let us return to average time complexity in algorithms with a simple problem:

For an ordered list of numbers, what is the average time complexity of sequential search? This is an application of mathematical expectation, namely finding an average. Take a simple example: {1,2,3,4,5}. Searching for 1 takes one comparison, since it is in the first position; searching for 2 takes two comparisons, since it is in the second position; in general, finding the $i$-th element takes $i$ comparisons. When we search, we do not know in advance which number we will be looking for, which is where the mean comes in. Assuming every number in the list is equally likely to be the target, the expected number of comparisons is

$$\sum_{i=1}^{n} \frac{1}{n}\times i = \frac{1}{n}\times\frac{n(n+1)}{2} = \frac{n+1}{2}$$

Here $\frac{n(n+1)}{2}$ is the total number of comparisons needed to find each of the $n$ numbers once, and the expectation is the average number of comparisons per search. This is the idea behind the average-case complexity of an algorithm. Roughly speaking, time complexity describes how the running time grows as $n$ grows. For sequential search, the average number of comparisons $\frac{n+1}{2}$ is a linear function of $n$, so we say the average complexity of sequential search grows linearly, that is, it is $O(n)$.
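Counting comparisons directly confirms the $(n+1)/2$ result. A minimal Python sketch, assuming a successful search in which every element is equally likely to be the target:

```python
def sequential_search(items, target):
    """Return (index, comparisons) for a left-to-right scan; index is -1 if absent."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons

def average_comparisons(items):
    """Average comparisons over successful searches, each target equally likely."""
    total = sum(sequential_search(items, x)[1] for x in items)
    return total / len(items)

data = [1, 2, 3, 4, 5]
print(average_comparisons(data))  # 3.0, which equals (5 + 1) / 2
```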

## Conclusion

This article opens my series on probability theory and mathematical statistics, vaccine science, and data structures and algorithm analysis. I have not introduced many formulas; I tried to present probability and mathematical statistics intuitively. We must be able to think abstractly, but we must also be able to turn abstractions back into intuition. Roughly speaking, probability theory analyzes or predicts results when the causes are known, while mathematical statistics infers the causes when the results are known. I originally meant to call this article study notes, but as I wrote it, it stopped feeling like study notes; it mixes material from "A Basic Course in Probability Theory" and "Probability Theory and Mathematical Statistics". Since my study notes on "Probability Theory and Mathematical Statistics" will be published later, this article serves as a bridge between what came before and what comes next, so let's call it an introduction.