November 2022 updates

2022-Nov-18

2022-Jan-3

On the occasion of Nima Hoda’s wedding.

Is there for honest Poverty
That hings his head, an’ a’ that;
The coward slave - we pass him by,
We dare be poor for a’ that!
For a’ that, an’ a’ that.
Our toils obscure an’ a’ that,
The rank is but the guinea’s stamp,
The Man’s the gowd for a’ that.

What though on hamely fare we dine,
Wear hodden grey, an’ a that;
Gie fools their silks, and knaves their wine;
A Man’s a Man for a’ that:
For a’ that, and a’ that,
Their tinsel show, an’ a’ that;
The honest man, tho’ e’er sae poor,
Is king o’ men for a’ that.

Ye see yon birkie, ca’d a lord,
Wha struts, an’ stares, an’ a’ that;
Tho’ hundreds worship at his word,
He’s but a coof for a’ that:
For a’ that, an’ a’ that,
His ribband, star, an’ a’ that:
The man o’ independent mind
He looks an’ laughs at a’ that.

A prince can mak a belted knight,
A marquis, duke, an’ a’ that;
But an honest man’s abon his might,
Gude faith, he maunna fa’ that!
For a’ that, an’ a’ that,
Their dignities an’ a’ that;
The pith o’ sense, an’ pride o’ worth,
Are higher rank than a’ that.

Then let us pray that come it may,
(As come it will for a’ that,)
That Sense and Worth, o’er a’ the earth,
Shall bear the gree, an’ a’ that.
For a’ that, an’ a’ that,
It’s coming yet for a’ that,
That Man to Man, the world o’er,
Shall brothers be for a’ that.

Modern English:

Should honest poor hang their heads?
We pass by the coward ashamed of his poverty
We dare be poor despite all that!
Despite all that, and all that,
Our humble work, and all that,
Aristocratic rank is but the form that gold is cast into,
The man himself is the gold, despite all that.

So what if we dine on homely fare,
Wear rough grey tweed, and all that?
Give fools their silks, and knaves their wine -
A man is a man despite all that.
Despite all that, and all that,
Their ostentation, and all that,
The honest man, though ever so poor,
Is king of men despite all that.

You see that person called a “lord”,
Who struts, and postures, and all that?
Though hundreds worship at his word,
He is but a fool for all that.
Despite all that, and all that,
His regalia, and all that,
The man of independent mind,
He looks and laughs at all that.

A prince can bestow the title of knight,
Or marquis, duke, and all that!
But an honest man is above all of these -
Good faith, he must not fault that
Despite all that, and all that,
Their titles, and all that,
Strength of sense and pride of merit
Are higher rank than all that.

Then let us pray that it may come
(And it will come despite all that)
That sense and merit over all the earth
Will prevail and all that!
Despite all that, and all that,
It is coming yet despite all that,
That man to man the world over
Will be brothers despite all that.

Short musings

2021-Jun-20

When I was 17 my lowest grade was in math and I thought I wasn’t good at it. One year later I was obsessed with it. Things can change.

Robin Hanson says that academia views impractical research as more prestigious. Yes, pure mathematics and theoretical physics are impractical and prestigious but ceteris paribus a research finding plus an application is more prestigious than just a research finding.

There’s a meta-contrarian idea that the mechanisms of academia exclude some really good science that’s just too unconventional. This is not true to the extent claimed.

Computer algebra is useful but discovering new algorithms to automate mathematical work is hard.

As Robin Hanson and Steve Levitt say, life is long. There’s lots of time to do lots of different things.

Re: Where are All the Successful Rationalists?, rationality is an important scientific concept in AI, finance, and statistics; its value as a self-help technique is not so clear.

Juergen Schmidhuber is right and Tyler Cowen is wrong: China will surpass the US in dominance this century.

Geoffrey Miller and Robin Hanson have different views on what people are signaling when they engage in politics: Miller says personal traits and Hanson says tribal loyalty. Presumably it’s some of each but I find Miller more convincing.

Robin Hanson says meditation is about signaling who’s a better meditator. This is an example of meta-contrarianism at one too many levels of meta.

Here Robin Hanson proposes a much more efficient method of small claims resolution. The Enlightenment was about such ideas: approaching economic problems rationally where previously no one realized there was a problem.

The rapid decision-making abilities of basketball and soccer players impress me as much as the physical.

“Up to 40%” of travelers from developed to developing countries get travelers’ diarrhea; “in the normal population 1% to 2% of persons per year will develop irritable bowel syndrome (IBS), and 5% to 6% of travelers after traveler’s diarrhea will develop IBS”; and “the prevalence of depression and anxiety in IBS patients is 37.1 and 31.4% respectively”.

The Princeton Companion to Mathematics says “algebraists like to work with exact formulas and analysts use estimates. Or, to put it even more succinctly, algebraists like equalities and analysts like inequalities”. In computer science, algebraists like programming languages and analysts like algorithms and complexity. Or, to put it even more succinctly, algebraists like lambda calculus and analysts like Turing machines.

The Confucian virtue of learning

2021-Jan-18

The Three Character Classic is a 13th century Chinese text with three characters per line which is traditionally read by children. Below is an excerpt from the 1812 translation by Robert Morrison, Presbyterian missionary and author of the first Chinese-English dictionary.

Chung-ni [another name for Confucius] once called a boy of ten years of age his instructor; for, of old, even perfect and wise men learned diligently.

Chao, when he held the office of Chung-ling, read Sun-yu. Though filling so high a situation, he yet learned diligently – so much so, that he never laid the book out of his hand.

In the time of the emperor Sung, Lu-wen-shu was constantly looking over the books engraven on leaves.

Wu-yao made leaves of the reed bamboo, by paring it thin. Though he did not possess books [as we do], he exerted himself in the pursuit of knowledge.

Sun-king suspended his head by its hair to the beam of his house, to prevent his sleeping over his books.

Su-tsin pricked his thigh with an awl, to prevent his sleeping.

Those persons, though not taught, of themselves rigorously pursued their studies.

Che-yin, when a boy, being poor, read his book by the light of a glow-worm which he confined. And Sun-kang, in winter, read his book by the light reflected from snow. Though their families were poor they studied incessantly.

Chu-mai-chin, though he subsisted by carrying fire-wood round the town to sell, yet carefully read his book. At last he became capable of, and filled a public office.

Li-mie, while watching his cattle in the field, always had his book at hand, suspended to the horn of a cow. These two persons, though their bodies were wearied by labor yet studied hard.

Su-lao-tsiuen, at the age of twenty-seven years began to exert himself, and read a great many books. He, when at that age, repented of his delay: you, a little boy, should early consider.

Leang-hao, at the age of eighty-two, was permitted to answer the emperor in his palace, and was placed at the head of all the literati. In the evening of life his wishes were fulfilled, and all spoke of his extraordinary learning. You, a little boy, ought to determine to pursue your studies.

Yung, at eight hears of age could recite the Odes. Li-pi, at seven years of age could play chess. These clever and studious boys were called by everyone wonderful. You, youths, ought to imitate them.

Tsai-wen-ki could play a stringed instrument. Sie-tao-wen could sing well. These ladies were clever. You, who are a gentleman, ought at an early time of life, to perfect that which is suitable.

Chin-tung, a remarkable lad, was raised by the emperor to fill the office of Ching-tsi. He, though a youth, was made a public officer. Do you, youths, exert yourselves to learn, and you may arrive at the same. Let all who make learning their pursuit be as those persons whom we have mentioned.

It is natural for a dog to watch at night, and for a cock to crow in the morning; if anyone does not learn, how can he be called a man?

(Above: In the Temple of Literature in Hanoi.)

Uncertainty due to computational approximation in Bayesian inference

2020-Oct-31

In Bayesian inference, we can factor approximate computation (e.g. linearization) into the actual posterior probabilities.

Suppose we have a pmf \(f(x) = P(X=x)\) which is hard to compute. If we approximate \(f\) by \(\tilde{f}\) then

\[\begin{align*} P\left(X = a \,|\, \text{we only compute } \tilde{f}\right) &= \sum_x x P \left(f(a)=x \,|\, a, \tilde{f}(a) \right)\\ &= E\left(f(a) \,|\, a, \tilde{f}(a) \right) \end{align*}\]

What is \(P\left(f(a)=x \,|\, a, \tilde{f}(a)\right)\)? Well, if \(f\) is hard to compute then we probably can’t gather much data, so there are various options to produce a subjective belief:

average-case analysis of \(\tilde{f}\) with an uninformed prior, e.g. probabilistic numerics
reference classes of “similar” cases
uniform distribution across worst-case bounds
past empirical experience
etc.

Note that if the mean of the pmf \(P(f(a)=\cdot \,|\, a, \tilde{f}(a))\) is \(f(a)\) then \(P(X = a \,|\, \text{we only compute } \tilde{f}) = P(X=a)\). So accounting for uncertainty due to approximation is equivalent to “de-biasing” it.

Example: Suppose \(f\) has a single atom and our approximation \(\tilde{f}\) is modeled as \(f\) shifted by some unknown amount: \(\tilde{f}(x) = f(x + Y - 5)\), where \(Y \sim {\rm B{\small IN}}(10, 1/2)\). If \(\tilde{f}(0) = 1\), then

\[\begin{align*} P(X=0 \,|\, \text{we only compute } \tilde{f}) &= P(f(0) = 1 \,|\, \tilde{f}(0) = 1) \\ &\approxeq P(\tilde{f}(0) = 1 \,|\, f(0) = 1) \\ &=\binom{10}{5} 2^{-10} \doteq 0.246. \end{align*}\]

(The approximate equality holds if, say, we assume the location of the atom is a priori uniformly distributed on a large integer interval.)

Note that this is not completely new. E.g. when inferring how likely it is that software is bug-free based on a finite set of tests, we are putting probability distributions on mathematically determined statements, assuming the software is deterministic.

Inference is approximated for computational reasons in many places such as linearization as mentioned already, clustering by compression using a zip algorithm (instead of computing Kolmogorov complexity), PASS-GLM, MCMC sampling, numerical methods, approximation algorithms, probabilistic data structures, et cetera.

Is this ultimately rigorous in a decision theoretic sense? I don’t think so, but what is rigorous can easily be mathematically intractable. So whatever, it’s a heuristic.

Comment ranking algorithms: Hacker News vs. YouTube vs. Reddit

2020-May-21

Comments on social sites have to be sorted somehow. How do big platforms do it – is it some complicated mix of recommender systems, learning-to-rank algorithms, Markov decision processes, neural networks, and learning automata? Well, maybe in some cases but often it’s just a simple formula. In this article we put the formulas used by Hacker News, YouTube, and Reddit, along with a few alternatives, to the test, using virtual comment section simulations. Spoiler alert: YouTube does not do well.

The simulation model

240 visitors arrive at equally spaced increments over a 24 hour period. Each visitor is randomly assigned as a commenter (10%) or a voter (90%). Commenters leave a single comment, which gets a randomly assigned quality category: great (10%), mediocre (80%), or stinker (10%). Great comments have a high probability of receiving upvotes and a low probability of receiving downvotes; stinkers are the reverse; and mediocre comments have a low probability of receiving any votes. Voters, on the other hand, see the top-ranked comment and vote according to its probabilities. At this point they stop reading or keep going based on a probability that depends on the vote they just gave (0% for upvotes, 50% for downvotes, 15% for non-votes). If they don’t leave, they see the next-ranked comment and the process continues until they finally do leave or they read all the comments. When the simulation concludes, we log the average number of upvotes per visitor which we use as our utility function.

See Python source code for full details.

Of course this is not a perfect model of every comment section. These parameter values will not always be accurate, although I did play around with e.g. the commenter/voter ratio and I got basically the same final conclusions. Realistically the rate of visitors may vary over time. A voter’s probability of leaving after a certain comment conditional on the most recent (non-)vote may also depend on how many comments they’ve already read. Comment threads are not represented here. Vote probabilities may change over time. Et cetera, et cetera.

The ranking formulas

Here we use the following symbols

Number of upvotes received so far: \(n_{+}\)
Number of downvotes received so far: \(n_{-}\)
Age of comment, in hours: \(h\)

All ranking methods in our analysis rank comments by scoring each comment and sorting in descending order. The scores are determined by the formulas below.

Starting with the basics, we have the ratio \((n_{+} - n_{-})/(n_{+} + n_{-})\) and the difference \(n_{+} - n_{-}\), a.k.a. the number of net upvotes. We don’t expect these to be optimal but they’re useful baselines. Another version of the ratio is \(n_{+}/(n_{+} + n_{-})\) which performs similarly.

For testing purposes, we have the random ranking which is, well, just random, and the upvote probability ranking which ranks according to the true upvote probability.

Reddit’s algorithm, detailed here, is a frequentist method for estimating the true voting probabilities based on \(n_{+}\) and \(n_{-}\). The Bayesian version of this is what we’ll call the Bayesian average: the same as ratio but we imagine that a few extra “phantom” votes have been cast, say 3 downvotes and 3 upvotes.

Hacker News roughly uses the formula \((n_{+} - n_{-}) / (h+2)^{1.8}\), which is like ratio, if we interpret the denominator \((h+2)^{1.8}\) as an estimate of the number votes cast. In fact, this denominator is probably more naturally thought of as an estimate of the number of votes cast including implicit non-votes. Non-votes (with a value of 0) would not impact the numerator.

To get a sense of how the simulations look, here are the comments as presented to the 240th visitor from one run using the Hacker News scoring formula:

\(h\)	Upvote probability	Downvote probability	\(n_{+}\)	\(n_{-}\)	HN score
7.9	0.671	0.324	47	5	0.657
14.2	0.671	0.076	82	3	0.515
21.9	0.496	0.14	110	10	0.324
23.3	0.434	0.051	72	12	0.174
8.9	0.162	0.03	8	0	0.094
14.1	0.112	0.054	12	3	0.060
10.9	0.184	0.058	6	0	0.059
5.1	0.151	0.008	2	0	0.058
12.9	0.226	0.049	6	0	0.046
15.0	0.114	0.061	10	6	0.024
7.3	0.021	0.009	1	0	0.017
13.4	0.071	0.008	1	1	0.0
5.2	0.489	0.038	0	0	0.0
3.6	0.151	0.041	1	0	0.0
1.0	0.579	0.087	0	0	0.0
0.7	0.047	0.024	0	0	0.0
21.7	0.158	0.222	19	20	-0.003
20.7	0.048	0.017	1	3	-0.007
10.4	0.055	0.044	1	2	-0.010
11.3	0.041	0.027	0	2	-0.018
19.5	0.104	0.166	5	10	-0.019
5.4	0.045	0.604	1	3	-0.054

YouTube also uses a formula that involves the age of the comment. Their system additionally factors in the user’s lifetime ratio, which for our tests we set to 0 as if all users are new.

Lastly, let’s consider how we might modify the Bayesian average to take time into account. To make new comments more visible we’ll make the phantom votes all upvotes at first, then asymptotically reduce them to non-votes. We’ll also switch to a denominator similar to the Hacker News formula’s in order to estimate non-votes. This yields the modified Bayes formula

\[\frac{n_{+} - n_{-} + n_p / (h+1)}{n_p + h},\]

where \(n_p\) is the number of phantom votes. We use the value \(n_p=7\) in the simulations.

Ranking the rankings

I did enough simulation runs (1000-20000) with each formula to be pretty confident about how they compare. Without further ado, voila:

Ranking algorithm	Average number of upvotes per visitor
Upvote probability	0.978
Modified Bayes	0.916
Hacker News	0.899
Bayesian average	0.878
Difference	0.848
Reddit	0.836
Ratio	0.813
YouTube	0.644
Random	0.607

So YouTube is marginally better than random, Reddit is worse than the simple difference, and Hacker News is the only one of the three better than Bayesian average. Disappointing but also plausible. How generalizable are the results? As always, more work required…

Andrew's Blog All posts Feed About