2021-Jan-18
The Three Character Classic is a 13th century Chinese text with three
characters per line which is traditionally read by children.
Below is an excerpt from the 1812
translation
by Robert Morrison, Presbyterian missionary and author of the first
Chinese-English dictionary.
Chung-ni [another name for Confucius] once called a boy of ten years of age
his instructor; for, of old, even perfect and wise men learned diligently.
Chao, when he held the office of Chung-ling, read Sun-yu. Though filling so
high a situation, he yet learned diligently – so much so, that he never laid
the book out of his hand.
In the time of the emperor Sung, Lu-wen-shu was constantly looking over the
books engraven on leaves.
Wu-yao made leaves of the reed bamboo, by paring it thin. Though he did not
possess books [as we do], he exerted himself in the pursuit of knowledge.
Sun-king suspended his head by its hair to the beam of his house, to prevent
his sleeping over his books.
Su-tsin pricked his thigh with an awl, to prevent his sleeping.
Those persons, though not taught, of themselves rigorously pursued their
studies.
Che-yin, when a boy, being poor, read his book by the light of a glow-worm
which he confined. And Sun-kang, in winter, read his book by the light reflected
from snow. Though their families were poor they studied incessantly.
Chu-mai-chin, though he subsisted by carrying fire-wood round the town to sell,
yet carefully read his book. At last he became capable of, and filled a public
office.
Li-mie, while watching his cattle in the field, always had his book at hand,
suspended to the horn of a cow. These two persons, though their bodies were
wearied by labor yet studied hard.
Su-lao-tsiuen, at the age of twenty-seven years began to exert himself, and
read a great many books. He, when at that age, repented of his delay: you,
a little boy, should early consider.
Leang-hao, at the age of eighty-two, was permitted to answer the emperor in
his palace, and was placed at the head of all the literati. In the evening of
life his wishes were fulfilled, and all spoke of his extraordinary learning.
You, a little boy, ought to determine to pursue your studies.
Yung, at eight hears of age could recite the Odes. Li-pi, at seven years of age
could play chess. These clever and studious boys were called by everyone
wonderful. You, youths, ought to imitate them.
Tsai-wen-ki could play a stringed instrument. Sie-tao-wen could sing well.
These ladies were clever. You, who are a gentleman, ought at an early time of
life, to perfect that which is suitable.
Chin-tung, a remarkable lad, was raised by the emperor to fill the office of
Ching-tsi. He, though a youth, was made a public officer. Do you, youths,
exert yourselves to learn, and you may arrive at the same. Let all who make
learning their pursuit be as those persons whom we have mentioned.
It is natural for a dog to watch at night, and for a cock to crow in the
morning; if anyone does not learn, how can he be called a man?
2020-Oct-31
In Bayesian inference, we can factor approximate computation (e.g.
linearization) into the actual
posterior probabilities.
Suppose we have a pmf
\(f(x) = P(X=x)\) which is hard to compute.
If we approximate \(f\) by \(\tilde{f}\) then
\[\begin{align*}
P\left(X = a \,|\, \text{we only compute } \tilde{f}\right)
&= \sum_x x P \left(f(a)=x \,|\, a, \tilde{f}(a) \right)\\
&= E\left(f(a) \,|\, a, \tilde{f}(a) \right)
\end{align*}\]
What is \(P\left(f(a)=x \,|\, a, \tilde{f}(a)\right)\)?
Well, if \(f\) is hard to compute then we probably can’t gather much data, so
there are various options to produce a subjective belief:
- average-case analysis of \(\tilde{f}\) with an uninformed prior, e.g.
probabilistic numerics
- reference classes of “similar” cases
- uniform distribution across worst-case bounds
- past empirical experience
- etc.
Note that if the mean of the pmf \(P(f(a)=\cdot \,|\, a, \tilde{f}(a))\)
is \(f(a)\) then \(P(X = a \,|\, \text{we only compute } \tilde{f}) = P(X=a)\).
So accounting for uncertainty due to approximation is equivalent to
“de-biasing” it.
Example:
Suppose \(f\) has a single atom and our approximation \(\tilde{f}\) is
modeled as \(f\) shifted by some unknown amount:
\(\tilde{f}(x) = f(x + Y - 5)\), where
\(Y \sim {\rm B{\small IN}}(10, 1/2)\).
If \(\tilde{f}(0) = 1\), then
\[\begin{align*}
P(X=0 \,|\, \text{we only compute } \tilde{f})
&= P(f(0) = 1 \,|\, \tilde{f}(0) = 1) \\
&\approxeq P(\tilde{f}(0) = 1 \,|\, f(0) = 1) \\
&=\binom{10}{5} 2^{-10} \doteq 0.246.
\end{align*}\]
(The approximate equality holds if, say, we assume the location of the atom is
a priori uniformly distributed on a large integer interval.)
Note that this is not completely new. E.g. when inferring how likely it is
that software is bug-free based on a finite set of
tests,
we are putting probability distributions on mathematically determined
statements, assuming the software is deterministic.
Inference is approximated for computational reasons in many places such as
linearization as mentioned already, clustering by compression using a zip
algorithm (instead of computing Kolmogorov complexity), PASS-GLM, MCMC
sampling, numerical methods, approximation algorithms, probabilistic
data structures, et cetera.
Is this ultimately rigorous in a decision theoretic sense? I don’t think so,
but
what is rigorous
can easily be mathematically intractable.
So whatever, it’s a heuristic.
2020-May-21
Comments on social sites have to be sorted somehow.
How do big platforms do it – is it some complicated mix of
recommender systems,
learning-to-rank algorithms,
Markov decision processes,
neural networks, and
learning automata?
Well, maybe in some cases
but often it’s just a simple formula.
In this article we put the formulas used by Hacker News, YouTube, and
Reddit, along with a few alternatives, to the test, using virtual comment
section simulations.
Spoiler alert: YouTube does not do well.
The simulation model
240 visitors arrive at equally spaced increments over a 24 hour period.
Each visitor is randomly assigned as a commenter (10%) or a voter (90%).
Commenters leave a single comment, which gets a randomly assigned quality
category: great (10%), mediocre (80%), or stinker (10%).
Great comments have a high probability of receiving upvotes and a low
probability of receiving downvotes;
stinkers are the reverse;
and mediocre comments have a low probability of receiving any votes.
Voters, on the other hand, see the top-ranked comment and vote according
to its probabilities.
At this point they stop reading or keep going based on a probability that
depends on the vote they just gave (0% for upvotes, 50% for downvotes, 15% for
non-votes).
If they don’t leave, they see the next-ranked comment and the process continues
until they finally do leave or they read all the comments.
When the simulation concludes, we log the average number of upvotes per
visitor which we use as our utility function.
See Python source code
for full details.
Of course this is not a perfect model of every comment section.
These parameter values will not always be accurate, although I did play around
with e.g. the commenter/voter ratio
and I got basically the same final conclusions.
Realistically the rate of visitors may vary over time.
A voter’s probability of leaving after a certain comment conditional on the
most recent (non-)vote may also depend on how many comments they’ve already
read.
Comment threads are not represented here.
Vote probabilities may change over time.
Et cetera, et cetera.
Here we use the following symbols
- Number of upvotes received so far: \(n_{+}\)
- Number of downvotes received so far: \(n_{-}\)
- Age of comment, in hours: \(h\)
All ranking methods in our analysis rank comments by scoring each comment
and sorting in descending order.
The scores are determined by the formulas below.
Starting with the basics, we have the ratio
\((n_{+} - n_{-})/(n_{+} + n_{-})\)
and the difference \(n_{+} - n_{-}\), a.k.a. the number of net upvotes.
We don’t expect these to be optimal but they’re useful baselines.
Another version of the ratio is
\(n_{+}/(n_{+} + n_{-})\) which performs similarly.
For testing purposes, we have the random ranking which is, well, just
random, and the upvote probability ranking which ranks according to the true
upvote probability.
Reddit’s algorithm, detailed here,
is a frequentist method for estimating the true voting probabilities
based on \(n_{+}\) and \(n_{-}\).
The Bayesian
version of this is what we’ll call the Bayesian average: the same as
ratio but we imagine that a few extra “phantom” votes have been cast, say 3
downvotes and 3 upvotes.
Hacker News roughly
uses the formula \((n_{+} - n_{-}) / (h+2)^{1.8}\),
which is like ratio, if we interpret the denominator \((h+2)^{1.8}\)
as an estimate of the number votes cast.
In fact, this denominator is probably more naturally thought of as an
estimate of the number of votes cast including implicit non-votes.
Non-votes (with a value of 0) would not impact the numerator.
To get a sense of how the simulations look, here are the comments as presented
to the 240th visitor from one run using the Hacker News scoring formula:
\(h\) |
Upvote probability |
Downvote probability |
\(n_{+}\) |
\(n_{-}\) |
HN score |
7.9 |
0.671 |
0.324 |
47 |
5 |
0.657 |
14.2 |
0.671 |
0.076 |
82 |
3 |
0.515 |
21.9 |
0.496 |
0.14 |
110 |
10 |
0.324 |
23.3 |
0.434 |
0.051 |
72 |
12 |
0.174 |
8.9 |
0.162 |
0.03 |
8 |
0 |
0.094 |
14.1 |
0.112 |
0.054 |
12 |
3 |
0.060 |
10.9 |
0.184 |
0.058 |
6 |
0 |
0.059 |
5.1 |
0.151 |
0.008 |
2 |
0 |
0.058 |
12.9 |
0.226 |
0.049 |
6 |
0 |
0.046 |
15.0 |
0.114 |
0.061 |
10 |
6 |
0.024 |
7.3 |
0.021 |
0.009 |
1 |
0 |
0.017 |
13.4 |
0.071 |
0.008 |
1 |
1 |
0.0 |
5.2 |
0.489 |
0.038 |
0 |
0 |
0.0 |
3.6 |
0.151 |
0.041 |
1 |
0 |
0.0 |
1.0 |
0.579 |
0.087 |
0 |
0 |
0.0 |
0.7 |
0.047 |
0.024 |
0 |
0 |
0.0 |
21.7 |
0.158 |
0.222 |
19 |
20 |
-0.003 |
20.7 |
0.048 |
0.017 |
1 |
3 |
-0.007 |
10.4 |
0.055 |
0.044 |
1 |
2 |
-0.010 |
11.3 |
0.041 |
0.027 |
0 |
2 |
-0.018 |
19.5 |
0.104 |
0.166 |
5 |
10 |
-0.019 |
5.4 |
0.045 |
0.604 |
1 |
3 |
-0.054 |
YouTube also uses a formula that
involves the age of the comment.
Their system additionally factors in the user’s lifetime ratio, which
for our tests we set to 0 as if all users are new.
Lastly, let’s consider how we might modify the Bayesian average to take
time into account.
To make new comments more visible we’ll make the phantom votes all upvotes
at first, then asymptotically reduce them to non-votes.
We’ll also switch to a denominator similar to the Hacker News formula’s in
order to estimate non-votes.
This yields the modified Bayes formula
\[\frac{n_{+} - n_{-} + n_p / (h+1)}{n_p + h},\]
where \(n_p\) is the number of phantom votes.
We use the value \(n_p=7\) in the simulations.
Ranking the rankings
I did enough simulation runs (1000-20000) with each formula
to be pretty confident about how they compare.
Without further ado, voila:
Ranking algorithm |
Average number of upvotes per visitor |
Upvote probability |
0.978 |
Modified Bayes |
0.916 |
Hacker News |
0.899 |
Bayesian average |
0.878 |
Difference |
0.848 |
Reddit |
0.836 |
Ratio |
0.813 |
YouTube |
0.644 |
Random |
0.607 |
So YouTube is marginally better than random, Reddit is worse than the simple difference, and
Hacker News is the only one of the three better than Bayesian average.
Disappointing but also plausible. How generalizable are the results?
As always, more work required…
2020-Apr-1
What would a market for mathematics look like?
Formal verification might allow an elegant mechanism:
Someone posts a proposition in a formal language like Coq and the first
to submit a proof that passes verification wins the bounty.
Everything can be automated and maybe even trustless.
This has been tried, at proofmarket.org, which was shut
down
due to consistency bugs in the verifier.
Even without bugs, proof assistants are still difficult to use;
mathematician Thomas Hales says
“It is very hard to learn to use Lean
proficiently. Are you a graduate student at Stanford, CMU, or Pitt writing a
thesis on Lean? Are you a student at Imperial being guided by Kevin Buzzard?
If not, Lean might not be for you.”
If we stick to natural language to avoid the learning curve, things get messy.
How does the market decide what a complete proof is, which proof is first, and
who did it? Perhaps the only tenable solution is to leave these decisions
up to the individuals who post the bounties. How would we
know that bounties would ever get paid? Stack Exchange forces bounties to be
put in escrow and if they’re not awarded to someone there’s no refund. Another
option is to rely on reputation by using certified identities (e.g. users’
email addresses are verified and
public
so they can be checked against personal webpages).
Something along these lines might be doable (and if someone wants to build it
I’ll donate the domain proofbounty.com) but what’s the use case?
Monetary rewards for mathematical problems are rare
and mathematicians generally already earn a salary, so the interest would
likely be modest.
Students (anywhere in the world) are plausible suppliers though, perhaps
even high school students,
while consumers could be anyone with a research grant usable for paying
“research assistants”, or industry and non-profit research groups.
A market that brings these two sides together could be of some value.
Paid question answering has been tried before, e.g.
Google Answers which
wasn’t very popular.
Did it fail due to lack of network effects,
lack of innovative mechanisms,
or an essential flaw in the concept? I don’t know.
Bounties on
GitHub issues seem to be a bit more successful.
In addition to bounties, there could be a prediction market.
The time of resolution may have to be indefinite, though, since
resolving “proposition X will be publicly proved by date Y” would in general
require determining the nonexistence of a public proof, which is at least
somewhat error-prone.
However, prediction markets are basically illegal so it’s a moot point.
2020-Mar-24
James I’s 1597 book Daemonologie,
“a philosophical dissertation on contemporary necromancy … touches on topics
such as werewolves and vampires”.
96.5% of 19-year-old males in Seoul have myopia.
List of Scottish Canadians.
Free
ebook of
classic novel plot summaries.
“Kime”: complex-valued time.
2020-Feb-17
Robin Hanson says
“In few months, China is likely to be a basket case, having crashed their
economy in failed attempt to stop COVID-19 spreading.” Quantifying the
forecast, he says China’s economy (or growth?) will be “a factor of two to ten
down” and seems to expect dramatic results in 6 months.