
Ultimate Machine Learning Algorithm - Book Review

  • Writer: BedRock
  • Aug 7, 2022
  • 9 min read

Updated: Apr 3, 2024

Traditional algorithms take input data and output results. Machine learning algorithms, on the other hand, take input data and desired outputs and output an algorithm. If the industrial revolution liberated humans from physical labor, then the AI revolution is liberating humans from mental labor.


This article is from BEDROCK member Jimmy and is based on his discussions with the team.


Machine learning algorithms can be mainly divided into five theories, each with its own advantages, disadvantages, and scope of application. The Master Algorithm is an ideal ultimate algorithm that can incorporate all five algorithms:

Symbolists

The Symbolists represent knowledge as collections of symbols and use induction and deduction to acquire new knowledge, drawing mainly on philosophy, psychology, and logic. However, this history-based approach has a serious problem, known as Russell's turkey: a scientifically minded turkey notices that breakfast arrives every morning at nine o'clock. Being careful not to mistake an accidental occurrence for a general rule, it keeps observing, finds that food arrives every day regardless of weather, date, and other seemingly irrelevant factors, and declares this a universal truth. Then on Halloween morning, nine o'clock arrives but no breakfast does; instead, the turkey is caught and killed by its owner. This illustrates the limits of reasoning purely from history. In addition, there are many concepts that humans have not yet discovered or defined, which further limits the Symbolists' approach.
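
As a toy illustration of the symbolist style, here is a minimal sketch (not from the book) that stores knowledge as if-then rules over symbols and derives new facts by forward chaining; the facts and rules are invented for illustration.

```python
# A minimal sketch of symbolist-style deduction: knowledge is a set of
# if-then rules over symbols, and new facts are derived by forward chaining.
# The facts and rules below are made up for illustration.

facts = {"has_feathers", "lays_eggs"}
rules = [
    ({"has_feathers", "lays_eggs"}, "is_bird"),  # IF feathers AND eggs THEN bird
    ({"is_bird"}, "can_fly"),                    # an induced (and fallible) rule
]

changed = True
while changed:                        # keep applying rules until nothing new appears
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # contains 'is_bird' and 'can_fly' in addition to the two observations
```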

BR Research:

Historical judgments are especially inappropriate when it comes to paradigm shifts; using past experience to evaluate a new paradigm leads to serious errors. For example, Shanghai's precise epidemic prevention-and-control strategy had been highly successful in the past. This time, however, the number of infected people was so large that it was impossible to trace all close contacts, and even some infected individuals could not be identified, so control of the epidemic was gradually lost. This is a paradigm shift from "tracing capacity exceeds the number of close contacts" to "tracing capacity falls short of the number of close contacts": a quantitative change producing a qualitative one.

The essence of the turkey problem is a philosophical issue, also known as Hume's problem of induction: all human knowledge and science comes from information about the past (no one has information from the future). Scientific theories are therefore falsifiable but not yet falsified, and the main job of scientists is to revise past theories in light of new facts and experiments. For individuals, continually revising one's own concepts is difficult and energy-consuming, but it is necessary for every investor. From this perspective, it is understandable why unfalsifiable theories (such as various religions) attract so many believers.


Connectionists

Connectionists propose neural networks modeled on studies of how the brain works. As with the interconnected neurons in the brain, the focus is on the strength of the connections between nodes rather than on what any individual node means. Learning happens as gradient descent optimizes and adjusts those connection strengths. However, gradient descent is essentially a hill-climbing algorithm, so it can get stuck at a local optimum, like a small hill with no higher point within a small step in any direction. And although neural networks have many parameters and therefore a high degree of freedom, their biggest problem is that they are hard to understand.
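
A minimal sketch of the local-optimum problem, using a toy one-dimensional function of my own choosing rather than a real neural network: started near the shallow basin, the small gradient steps settle into the nearby local minimum instead of the global one.

```python
# Gradient descent on f(x) = x^4 - 3x^2 + x, a toy function with two minima.
# Starting on the right-hand slope, the small steps converge to the local
# (not global) minimum -- the hill-climbing problem described above.
# The function and step size are illustrative choices.

def grad(x):                 # derivative of f(x) = x^4 - 3x^2 + x
    return 4 * x**3 - 6 * x + 1

x, lr = 1.5, 0.01            # initial point and a small learning rate
for _ in range(1000):
    x -= lr * grad(x)        # move a small step against the gradient

print(round(x, 3))           # ~1.131: stuck in the local minimum;
                             # the global minimum is near x ~ -1.30
```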

In computer science, P problems are those that can be solved quickly (in polynomial time), such as sorting a finite sequence in ascending order. NP problems are those for which a proposed answer can be verified quickly, even though finding one may be hard; the famous traveling salesman problem is an example: is there a route from city A to city B passing through N cities with total length less than L? Although NP problems are hard to solve exactly, the human brain can solve them approximately, which is an advantage of neural networks.
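
A small sketch of that asymmetry, using a made-up four-city distance matrix: checking whether a proposed tour beats a length bound takes one pass over the tour, while finding the best tour by brute force means examining every permutation.

```python
# Verifying a traveling-salesman answer is fast; finding the best tour by
# brute force is factorial in the number of cities. Distances are invented.

from itertools import permutations

dist = [                       # symmetric distances between 4 cities (illustrative)
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]

def tour_length(tour):
    """Total length of a closed tour visiting each city once."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def verify(tour, L):           # quick check: polynomial in the number of cities
    return tour_length(tour) < L

best = min(permutations(range(4)), key=tour_length)   # brute force over all orders
print(verify(best, 25), tour_length(best))            # True 23
```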

BR Research:

Neural networks are currently the most popular theory, whereas the symbolist school dominated in the last century. Because of their large number of parameters, neural networks demand a great deal of computation, and only the rapid progress of computing power in the 21st century has made their development and application possible. From today's perspective, the brain is already the most powerful tool for learning and thinking, the product of natural selection, so the ceiling for this direction is at least the level of the human brain, with much room still to grow. However, the scale of today's artificial neural networks is still far from that of the human brain, mainly because computers consume far more power than the brain does; there remains plenty of room for improvement in computing power, power efficiency, and other respects.

The biggest problem with neural networks is still the difficulty in understanding them, making targeted improvements to them nearly impossible. If neural networks can be combined with the symbolist school, there may be hope to solve this problem. Additionally, many computer scientists are working towards making neural networks interpretable. If successful, it may help solve the problem of understanding and interpreting the results of neural networks in quantitative finance.


Evolutionaries

Evolutionaries (evolutionary learning) borrow from the algorithm of natural evolution. They first generate many samples, randomly or by hand, then score each one with a fitness function and keep those with higher scores. The surviving samples undergo gene crossover and random mutation to create the next generation, and the process repeats. Because mutations are random and directionless, they can escape local optima; but waiting for a beneficial mutation takes a long time, so evolutionary algorithms often alternate between bursts of fast evolution and long stretches settled at a local optimum. In machine learning, crossover plays a much smaller role than mutation, since it mainly pulls individuals toward the population mean. That matters when there is no explicit goal, as in natural selection in biology, but it is not very useful in machine learning, where the goal is clear.
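
A minimal sketch of that loop on a standard toy problem of my own choosing (maximize the number of 1s in a bit string), showing the score-select-crossover-mutate cycle described above.

```python
# A toy evolutionary algorithm: score samples with a fitness function, keep
# the best half, recombine them (crossover), and apply random mutation.
# The goal -- maximize the number of 1s in a bit string -- is illustrative.

import random

LENGTH, POP, GENERATIONS = 20, 30, 50

def fitness(bits):                          # score a sample
    return sum(bits)

def crossover(a, b):                        # single-point gene crossover
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.05):                # random, directionless mutation
    return [1 - b if random.random() < rate else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]         # keep the higher-scoring half
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(POP)]

print(max(fitness(p) for p in population))  # typically close to 20
```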

BR Research:

Efficiency and capability pull in opposite directions. Gradient descent is efficient, moving a small step in the steepest direction each time, but it easily gets stuck at a local optimum. Random mutation is far less efficient, since the chance that a mutation points in the best direction is tiny, but it has more freedom because it keeps trying ways to break out of the current pattern. It is like putting bees and flies in a glass bottle with the bottom facing a light source: the bees fly only toward the light and never get out, while the flies bump around at random and are more likely to escape.


Bayesians

Bayesians (the Bayesian school) use probability theory as the basis for machine learning and view learning as inferring, from data, the probability that various hypotheses are true. For example, Laplace's rule of succession says that after the sun has risen n days in a row, the probability of the hypothesis "the sun rises every day" is (n+1)/(n+2); the larger n is, the closer this probability gets to 100%. Bayesian methods generally assume that events occur more or less independently of one another (for example, because they happen far apart). However, the causal relationships between events can be far more complex than probability alone suggests. For example, suppose A: "if the sprinkler is turned on, the lawn will be wet" holds with probability 80%, and B: "if the lawn is wet, it has rained" holds with probability 60%. Chaining them, probability theory gives "if the sprinkler is turned on, it has rained" a probability of 48%, when in reality the connection between those two events is far weaker. Probability theory can also lead to double counting: when the news "aliens discovered" is reprinted by many media outlets, the probability that the event actually happened has not increased.
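
A small sketch of the rule of succession quoted above, showing how the estimate (n+1)/(n+2) approaches 1 as the number of observed sunrises grows; the loop and print formatting are my own.

```python
# Laplace's rule of succession: after the sun has risen n times in a row
# (and never failed to rise), the estimated probability that it rises
# tomorrow is (n + 1) / (n + 2).

def rule_of_succession(successes, trials):
    """Laplace's estimate after `successes` out of `trials` observations."""
    return (successes + 1) / (trials + 2)

for n in (1, 10, 100, 10_000):
    p = rule_of_succession(n, n)            # the sun rose on every observed day
    print(f"after {n:>6} sunrises: P(rises tomorrow) = {p:.4f}")
# after      1 sunrises: P(rises tomorrow) = 0.6667
# after     10 sunrises: P(rises tomorrow) = 0.9167
# after    100 sunrises: P(rises tomorrow) = 0.9902
# after  10000 sunrises: P(rises tomorrow) = 0.9999
```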

BR Research:

The core of Bayesian thinking is to keep updating the probability that a theory is correct as new information arrives. In investing, this shows up as tracking a company's performance: if results consistently meet or exceed expectations, confidence in the fundamentals grows. But if the wrong thing is being observed, the probability cannot be updated correctly. Before the subprime crisis, for example, most institutions tracked housing prices, reasoning that real estate was a fully traded market, that prices faithfully reflected supply and demand, and that as long as prices did not fall, mortgage loans carried little risk. The fact that prices kept rising only strengthened that belief. Michael Burry, of "The Big Short" fame, instead tracked mortgage default rates, found they had been rising the whole time, and took the opposite side of the market, a judgment that ultimately proved correct.

Analogizers

Analogizers learn by comparison, based on the similarity between data points. Their representative algorithm, k-nearest neighbors, is the simplest and fastest learning algorithm and is often used as a classifier. A single linear fit over all data points is inaccurate, but a linear fit within a local neighborhood works well, and the local fits from different regions can be combined into an overall non-linear fit. However, k-nearest neighbors is vulnerable to interference from irrelevant information. Giving some features lower weight requires someone to decide what counts as irrelevant; giving every feature equal weight means that, with enough irrelevant information, k-nearest neighbors is little better than guessing. In addition, when there are many dimensions, similarity itself becomes hard to define. In three dimensions, if the pulp of an orange extends to 90% of its radius, the pulp makes up 0.9^3 = 72.9% of the volume; in a 100-dimensional space, the fraction would be 0.9^100, roughly 0.003% (about 3 in 100,000). To cope with high dimensionality, the support vector machine assigns a weight to each feature, which is roughly equivalent to running k-nearest neighbors on only the few most important features.
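
A minimal sketch of the nearest-neighbor idea with made-up 2-D points, followed by the orange-pulp calculation from the paragraph above; every feature is weighted equally here, which is exactly the weakness discussed when irrelevant features creep in.

```python
# k-nearest neighbors: classify a new point by the majority label of its
# k closest training points. The points and labels are invented.

from collections import Counter
import math

train = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((3.0, 3.2), "B"), ((3.3, 2.9), "B")]

def knn_predict(x, k=3):
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]  # nearest first
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))   # 'A'

# The orange-pulp example: the "inner" fraction shrinks geometrically with
# dimension d, so similarity gets harder to define in high dimensions.
for d in (3, 100):
    print(d, 0.9 ** d)           # 3 -> 0.729, 100 -> ~2.66e-05
```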

Unsupervised learning: the k-means algorithm groups each sample with its nearest neighbors into a cluster and labels the clusters automatically, but this only works when the clusters are clearly separated, like groups of points on a plane that sit far apart from one another; otherwise they blur together. Reinforcement learning works like rapid evolution: it keeps trying actions, evaluating the outcomes, and keeping whatever works better.
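
A minimal k-means sketch under the assumptions above, using made-up 1-D points and two clusters: assign each point to its nearest center, recompute each center as the mean of its cluster, and repeat.

```python
# k-means on well-separated toy data: assignment step, then update step.
# The points, initial centers, and k = 2 are illustrative choices.

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers = [0.0, 10.0]                         # arbitrary initial guesses

for _ in range(10):                           # a few iterations suffice here
    clusters = [[], []]
    for p in points:                          # assignment: nearest center
        idx = min(range(2), key=lambda i: abs(p - centers[i]))
        clusters[idx].append(p)
    centers = [sum(c) / len(c) if c else centers[i]   # update: cluster mean
               for i, c in enumerate(clusters)]

print(centers)                                # ~[1.0, 8.07]
```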

The master algorithm is composed of three parts: optimization, which improves the program and searches for the best solution; evaluation, which scores how good or bad the current results are; and representation, which describes the output. A research group at Stanford has already built a cell model from a metabolic network rather than from genes and proteins. Although this is still far from the ultimate algorithm, the path is clear and scientists have already set out on the journey.




The Role of Historical Experience in Investment

There are two ways for humans to acquire knowledge: induction based on facts, such as the sun rising in the east and setting in the west, and swans being white; and deduction based on hypotheses, such as Einstein's assumption that the speed of light is constant, which led to the development of the theory of relativity. Of course, both of these methods require verification through repeatable experiments. These methods are often used together, such as inductive reasoning based on a large number of facts to derive several mathematical axioms, and then using deductive reasoning to build the Euclidean geometry system.

In theory, if enough information and facts are collected, all knowledge can be obtained through induction, which is the method used by the Master Algorithm. However, this method is not applicable to humans because the information they collect cannot be transmitted or shared, and can only be stored in a distributed manner in each person's brain. Moreover, humans do not have the ability to extract knowledge from massive data. Computers, on the other hand, can easily store data collected from various sensors and measuring instruments in a database. Therefore, for us at present, traditional methods are still needed to acquire knowledge.

Since past experience cannot predict what has never happened (the turkey problem), how much accuracy can we expect, in the future, from knowledge derived from past history?

Laplace's rule of succession offers some guidance: the more times something, or a law, has occurred, the more its correctness can be trusted. If the turkey scientist could observe for two or three years, its theory would eventually be corrected. The more fundamental a rule is, the more often it recurs (biological laws, for example); the more macroscopic it is, the less often it recurs (economic crises, for example). From this perspective, the importance of fundamental rules for investing is self-evident.

BR Research:

The research methods of the Symbolist and Bayesian schools resemble how humans think: summarize rules from historical experience and keep checking those rules against new facts. In practice, however, it is important to distinguish a turkey problem from a sun-rises-as-usual problem. Extending the time range of the historical data can mitigate the turkey problem to some extent, but genuinely new things rarely have a long data history, as when smartphones first appeared, and it is hard to know whether the chosen time range is long enough or the chosen data broad enough. During a paradigm shift, therefore, the hypothesis-reasoning-verification approach is still needed.



