Andrew's Blog

Is there for honest Poverty
That hings his head, an’ a’ that;
The coward slave - we pass him by,
We dare be poor for a’ that!
For a’ that, an’ a’ that.
Our toils obscure an’ a’ that,
The rank is but the guinea’s stamp,
The Man’s the gowd for a’ that.

What though on hamely fare we dine,
Wear hodden grey, an’ a’ that;
Gie fools their silks, and knaves their wine;
A Man’s a Man for a’ that:
For a’ that, and a’ that,
Their tinsel show, an’ a’ that;
The honest man, tho’ e’er sae poor,
Is king o’ men for a’ that.

Ye see yon birkie, ca’d a lord,
Wha struts, an’ stares, an’ a’ that;
Tho’ hundreds worship at his word,
He’s but a coof for a’ that:
For a’ that, an’ a’ that,
His ribband, star, an’ a’ that:
The man o’ independent mind
He looks an’ laughs at a’ that.

A prince can mak a belted knight,
A marquis, duke, an’ a’ that;
But an honest man’s aboon his might,
Gude faith, he maunna fa’ that!
For a’ that, an’ a’ that,
Their dignities an’ a’ that;
The pith o’ sense, an’ pride o’ worth,
Are higher rank than a’ that.

Then let us pray that come it may,
(As come it will for a’ that,)
That Sense and Worth, o’er a’ the earth,
Shall bear the gree, an’ a’ that.
For a’ that, an’ a’ that,
It’s coming yet for a’ that,
That Man to Man, the world o’er,
Shall brothers be for a’ that.

Modern English:

Should honest poor hang their heads?
We pass by the coward ashamed of his poverty
We dare be poor despite all that!
Despite all that, and all that,
Our humble work, and all that,
Aristocratic rank is but the form that gold is cast into,
The man himself is the gold, despite all that.

So what if we dine on homely fare,
Wear rough grey tweed, and all that?
Give fools their silks, and knaves their wine -
A man is a man despite all that.
Despite all that, and all that,
Their ostentation, and all that,
The honest man, though ever so poor,
Is king of men despite all that.

You see that person called a “lord”,
Who struts, and postures, and all that?
Though hundreds worship at his word,
He is but a fool for all that.
Despite all that, and all that,
His regalia, and all that,
The man of independent mind,
He looks and laughs at all that.

A prince can bestow the title of knight,
Or marquis, duke, and all that!
But an honest man is above all of these -
Good faith, he must not fault that!
Despite all that, and all that,
Their titles, and all that,
Strength of sense and pride of merit
Are higher rank than all that.

Then let us pray that it may come
(And it will come despite all that)
That sense and merit over all the earth
Will prevail and all that!
Despite all that, and all that,
It is coming yet despite all that,
That man to man the world over
Will be brothers despite all that.

Short musings

2021-06-20T00:00:00+00:00

When I was 17 my lowest grade was in math and I thought I wasn’t good at it. One year later I was obsessed with it. Things can change.

Robin Hanson says that academia views impractical research as more prestigious. Yes, pure mathematics and theoretical physics are impractical and prestigious but ceteris paribus a research finding plus an application is more prestigious than just a research finding.

There’s a meta-contrarian idea that the mechanisms of academia exclude some really good science that’s just too unconventional. This is not true to the extent claimed.

Computer algebra is useful but discovering new algorithms to automate mathematical work is hard.

As Robin Hanson and Steve Levitt say, life is long. There’s lots of time to do lots of different things.

Re: Where are All the Successful Rationalists?, rationality is an important scientific concept in AI, finance, and statistics; its value as a self-help technique is not so clear.

Juergen Schmidhuber is right and Tyler Cowen is wrong: China will surpass the US in dominance this century.

Geoffrey Miller and Robin Hanson have different views on what people are signaling when they engage in politics: Miller says personal traits and Hanson says tribal loyalty. Presumably it’s some of each but I find Miller more convincing.

Robin Hanson says meditation is about signaling who’s a better meditator. This is an example of meta-contrarianism at one too many levels of meta.

Here Robin Hanson proposes a much more efficient method of small claims resolution. The Enlightenment was about such ideas: approaching economic problems rationally where previously no one realized there was a problem.

The rapid decision-making abilities of basketball and soccer players impress me as much as their physical abilities.

“Up to 40%” of travelers from developed to developing countries get travelers’ diarrhea; “in the normal population 1% to 2% of persons per year will develop irritable bowel syndrome (IBS), and 5% to 6% of travelers after traveler’s diarrhea will develop IBS”; and “the prevalence of depression and anxiety in IBS patients is 37.1 and 31.4% respectively”.

The Princeton Companion to Mathematics says “algebraists like to work with exact formulas and analysts use estimates. Or, to put it even more succinctly, algebraists like equalities and analysts like inequalities”. In computer science, algebraists like programming languages and analysts like algorithms and complexity. Or, to put it even more succinctly, algebraists like lambda calculus and analysts like Turing machines.

The Confucian virtue of learning

2021-01-18T00:00:00+00:00

The Three Character Classic is a 13th century Chinese text with three characters per line which is traditionally read by children. Below is an excerpt from the 1812 translation by Robert Morrison, Presbyterian missionary and author of the first Chinese-English dictionary.

Chung-ni [another name for Confucius] once called a boy of ten years of age his instructor; for, of old, even perfect and wise men learned diligently.

Chao, when he held the office of Chung-ling, read Sun-yu. Though filling so high a situation, he yet learned diligently – so much so, that he never laid the book out of his hand.

In the time of the emperor Sung, Lu-wen-shu was constantly looking over the books engraven on leaves.

Wu-yao made leaves of the reed bamboo, by paring it thin. Though he did not possess books [as we do], he exerted himself in the pursuit of knowledge.

Sun-king suspended his head by its hair to the beam of his house, to prevent his sleeping over his books.

Su-tsin pricked his thigh with an awl, to prevent his sleeping.

Those persons, though not taught, of themselves rigorously pursued their studies.

Che-yin, when a boy, being poor, read his book by the light of a glow-worm which he confined. And Sun-kang, in winter, read his book by the light reflected from snow. Though their families were poor they studied incessantly.

Chu-mai-chin, though he subsisted by carrying fire-wood round the town to sell, yet carefully read his book. At last he became capable of, and filled a public office.

Li-mie, while watching his cattle in the field, always had his book at hand, suspended to the horn of a cow. These two persons, though their bodies were wearied by labor yet studied hard.

Su-lao-tsiuen, at the age of twenty-seven years began to exert himself, and read a great many books. He, when at that age, repented of his delay: you, a little boy, should early consider.

Leang-hao, at the age of eighty-two, was permitted to answer the emperor in his palace, and was placed at the head of all the literati. In the evening of life his wishes were fulfilled, and all spoke of his extraordinary learning. You, a little boy, ought to determine to pursue your studies.

Yung, at eight years of age could recite the Odes. Li-pi, at seven years of age could play chess. These clever and studious boys were called by everyone wonderful. You, youths, ought to imitate them.

Tsai-wen-ki could play a stringed instrument. Sie-tao-wen could sing well. These ladies were clever. You, who are a gentleman, ought at an early time of life, to perfect that which is suitable.

Chin-tung, a remarkable lad, was raised by the emperor to fill the office of Ching-tsi. He, though a youth, was made a public officer. Do you, youths, exert yourselves to learn, and you may arrive at the same. Let all who make learning their pursuit be as those persons whom we have mentioned.

It is natural for a dog to watch at night, and for a cock to crow in the morning; if anyone does not learn, how can he be called a man?

(Above: In the Temple of Literature in Hanoi.)

Uncertainty due to computational approximation in Bayesian inference

2020-10-31T00:00:00+00:00

In Bayesian inference, we can factor approximate computation (e.g. linearization) into the actual posterior probabilities.

Suppose we have a pmf $f(x) = P(X=x)$ which is hard to compute. If we approximate $f$ by $\tilde{f}$ then

\[\begin{align*} P\left(X = a \,|\, \text{we only compute } \tilde{f}\right) &= \sum_x x P \left(f(a)=x \,|\, a, \tilde{f}(a) \right)\\ &= E\left(f(a) \,|\, a, \tilde{f}(a) \right) \end{align*}\]

What is $P\left(f(a)=x \,|\, a, \tilde{f}(a)\right)$? Well, if $f$ is hard to compute then we probably can’t gather much data, so there are various options to produce a subjective belief:

average-case analysis of $\tilde{f}$ with an uninformed prior, e.g. probabilistic numerics
reference classes of “similar” cases
uniform distribution across worst-case bounds
past empirical experience
etc.

Note that if the mean of the pmf $P(f(a)=\cdot \,|\, a, \tilde{f}(a))$ is $f(a)$ then $P(X = a \,|\, \text{we only compute } \tilde{f}) = P(X=a)$. So accounting for uncertainty due to approximation is equivalent to “de-biasing” it.

Example: Suppose $f$ has a single atom and our approximation $\tilde{f}$ is modeled as $f$ shifted by some unknown amount: $\tilde{f}(x) = f(x + Y - 5)$, where $Y \sim {\rm B{\small IN}}(10, 1/2)$. If $\tilde{f}(0) = 1$, then

\[\begin{align*} P(X=0 \,|\, \text{we only compute } \tilde{f}) &= P(f(0) = 1 \,|\, \tilde{f}(0) = 1) \\ &\approxeq P(\tilde{f}(0) = 1 \,|\, f(0) = 1) \\ &=\binom{10}{5} 2^{-10} \doteq 0.246. \end{align*}\]

(The approximate equality holds if, say, we assume the location of the atom is a priori uniformly distributed on a large integer interval.)
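The figure above is just the probability that the binomial shift is zero, which is quick to check numerically. A minimal sketch using only the standard library; the function name `shift_probability` is made up for this illustration:

```python
from math import comb


def shift_probability(k: int, n: int = 10, p: float = 0.5) -> float:
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# tilde_f(0) = f(Y - 5), so the atom of f sits at 0 exactly when Y = 5.
prob_no_shift = shift_probability(5)
print(round(prob_no_shift, 3))  # 0.246
```

Other shift models only change the distribution of $Y$; the uniform-prior argument stays the same.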

Note that this is not completely new. E.g. when inferring how likely it is that software is bug-free based on a finite set of tests, we are putting probability distributions on mathematically determined statements, assuming the software is deterministic.

Inference is approximated for computational reasons in many places such as linearization as mentioned already, clustering by compression using a zip algorithm (instead of computing Kolmogorov complexity), PASS-GLM, MCMC sampling, numerical methods, approximation algorithms, probabilistic data structures, et cetera.

Is this ultimately rigorous in a decision theoretic sense? I don’t think so, but what is rigorous can easily be mathematically intractable. So whatever, it’s a heuristic.

Comment ranking algorithms: Hacker News vs. YouTube vs. Reddit

2020-05-21T00:00:00+00:00

Comments on social sites have to be sorted somehow. How do big platforms do it – is it some complicated mix of recommender systems, learning-to-rank algorithms, Markov decision processes, neural networks, and learning automata? Well, maybe in some cases but often it’s just a simple formula. In this article we put the formulas used by Hacker News, YouTube, and Reddit, along with a few alternatives, to the test, using virtual comment section simulations. Spoiler alert: YouTube does not do well.

The simulation model

240 visitors arrive at equally spaced increments over a 24 hour period. Each visitor is randomly assigned as a commenter (10%) or a voter (90%). Commenters leave a single comment, which gets a randomly assigned quality category: great (10%), mediocre (80%), or stinker (10%). Great comments have a high probability of receiving upvotes and a low probability of receiving downvotes; stinkers are the reverse; and mediocre comments have a low probability of receiving any votes. Voters, on the other hand, see the top-ranked comment and vote according to its probabilities. At this point they stop reading or keep going based on a probability that depends on the vote they just gave (0% for upvotes, 50% for downvotes, 15% for non-votes). If they don’t leave, they see the next-ranked comment and the process continues until they finally do leave or they read all the comments. When the simulation concludes, we log the average number of upvotes per visitor which we use as our utility function.

See Python source code for full details.

Of course this is not a perfect model of every comment section. These parameter values will not always be accurate, although I did play around with e.g. the commenter/voter ratio and I got basically the same final conclusions. Realistically the rate of visitors may vary over time. A voter’s probability of leaving after a certain comment conditional on the most recent (non-)vote may also depend on how many comments they’ve already read. Comment threads are not represented here. Vote probabilities may change over time. Et cetera, et cetera.
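To make the voter loop concrete, here is a rough sketch of a single voter's pass down the ranked list. It is an illustration and not the linked source code: the class and function names are hypothetical, and it assumes the percentages in the model description (0%, 50%, 15%) are probabilities of leaving after an upvote, a downvote, and a non-vote respectively.

```python
import random


class Comment:
    """A comment with fixed true vote probabilities and running vote counts."""

    def __init__(self, p_up: float, p_down: float):
        self.p_up = p_up      # true probability that a reader upvotes
        self.p_down = p_down  # true probability that a reader downvotes
        self.ups = 0
        self.downs = 0


# Assumed reading of the model: probability of leaving after each kind of vote.
LEAVE_PROB = {"up": 0.0, "down": 0.5, "none": 0.15}


def voter_session(ranked, rng, leave_prob=LEAVE_PROB):
    """Walk one voter down an already-ranked list; return the upvotes cast."""
    upvotes = 0
    for comment in ranked:
        u = rng.random()
        if u < comment.p_up:
            comment.ups += 1
            upvotes += 1
            vote = "up"
        elif u < comment.p_up + comment.p_down:
            comment.downs += 1
            vote = "down"
        else:
            vote = "none"
        # The voter stops reading with a probability tied to the vote just cast.
        if rng.random() < leave_prob[vote]:
            break
    return upvotes


rng = random.Random(0)  # seeded so that runs are reproducible
thread = [Comment(0.6, 0.1), Comment(0.1, 0.05), Comment(0.05, 0.6)]
print(voter_session(thread, rng))
```

Averaging the return value over all voters gives the utility used in the results: upvotes per visitor.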

The ranking formulas

Here we use the following symbols:

Number of upvotes received so far: $n_{+}$
Number of downvotes received so far: $n_{-}$
Age of comment, in hours: $h$

All ranking methods in our analysis rank comments by scoring each comment and sorting in descending order. The scores are determined by the formulas below.

Starting with the basics, we have the ratio $(n_{+} - n_{-})/(n_{+} + n_{-})$ and the difference $n_{+} - n_{-}$, a.k.a. the number of net upvotes. We don’t expect these to be optimal but they’re useful baselines. Another version of the ratio is $n_{+}/(n_{+} + n_{-})$ which performs similarly.

For testing purposes, we have the random ranking which is, well, just random, and the upvote probability ranking which ranks according to the true upvote probability.

Reddit’s algorithm, detailed here, is a frequentist method for estimating the true voting probabilities based on $n_{+}$ and $n_{-}$. The Bayesian version of this is what we’ll call the Bayesian average: the same as ratio but we imagine that a few extra “phantom” votes have been cast, say 3 downvotes and 3 upvotes.

Hacker News roughly uses the formula $(n_{+} - n_{-}) / (h+2)^{1.8}$, which is like ratio, if we interpret the denominator $(h+2)^{1.8}$ as an estimate of the number of votes cast. In fact, this denominator is probably more naturally thought of as an estimate of the number of votes cast including implicit non-votes. Non-votes (with a value of 0) would not impact the numerator.
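The formulas so far are short enough to write out directly. The sketch below transcribes them under a few assumptions: the function names are invented here, an unvoted comment gets a ratio of 0 (the text does not say how that case is handled), and the phantom votes default to the 3 upvotes and 3 downvotes mentioned above.

```python
def ratio(ups: int, downs: int) -> float:
    """(n+ - n-) / (n+ + n-); returns 0 for unvoted comments (our choice)."""
    total = ups + downs
    return (ups - downs) / total if total else 0.0


def difference(ups: int, downs: int) -> int:
    """Net upvotes, n+ - n-."""
    return ups - downs


def bayesian_average(ups: int, downs: int,
                     phantom_ups: int = 3, phantom_downs: int = 3) -> float:
    """Ratio computed after adding a few imaginary votes of each kind."""
    return ratio(ups + phantom_ups, downs + phantom_downs)


def hacker_news(ups: int, downs: int, age_hours: float) -> float:
    """(n+ - n-) / (h + 2)^1.8, the rough Hacker News formula."""
    return (ups - downs) / (age_hours + 2) ** 1.8


comments = [
    {"ups": 1, "downs": 0, "age_hours": 2.0},
    {"ups": 30, "downs": 5, "age_hours": 10.0},
]
ranked = sorted(
    comments,
    key=lambda c: hacker_news(c["ups"], c["downs"], c["age_hours"]),
    reverse=True,
)
print([c["ups"] for c in ranked])  # [30, 1]
```

Note how a single upvote gives a plain ratio of 1 but a Bayesian average of only 1/7, which is the point of the phantom votes.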

To get a sense of how the simulations look, here are the comments as presented to the 240th visitor from one run using the Hacker News scoring formula:

$h$	Upvote probability	Downvote probability	$n_{+}$	$n_{-}$	HN score
7.9	0.671	0.324	47	5	0.657
14.2	0.671	0.076	82	3	0.515
21.9	0.496	0.14	110	10	0.324
23.3	0.434	0.051	72	12	0.174
8.9	0.162	0.03	8	0	0.094
14.1	0.112	0.054	12	3	0.060
10.9	0.184	0.058	6	0	0.059
5.1	0.151	0.008	2	0	0.058
12.9	0.226	0.049	6	0	0.046
15.0	0.114	0.061	10	6	0.024
7.3	0.021	0.009	1	0	0.017
13.4	0.071	0.008	1	1	0.0
5.2	0.489	0.038	0	0	0.0
3.6	0.151	0.041	1	0	0.0
1.0	0.579	0.087	0	0	0.0
0.7	0.047	0.024	0	0	0.0
21.7	0.158	0.222	19	20	-0.003
20.7	0.048	0.017	1	3	-0.007
10.4	0.055	0.044	1	2	-0.010
11.3	0.041	0.027	0	2	-0.018
19.5	0.104	0.166	5	10	-0.019
5.4	0.045	0.604	1	3	-0.054

YouTube also uses a formula that involves the age of the comment. Their system additionally factors in the user’s lifetime ratio, which for our tests we set to 0 as if all users are new.

Lastly, let’s consider how we might modify the Bayesian average to take time into account. To make new comments more visible we’ll make the phantom votes all upvotes at first, then asymptotically reduce them to non-votes. We’ll also switch to a denominator similar to the Hacker News formula’s in order to estimate non-votes. This yields the modified Bayes formula

\[\frac{n_{+} - n_{-} + n_p / (h+1)}{n_p + h},\]

where $n_p$ is the number of phantom votes. We use the value $n_p=7$ in the simulations.
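As a quick sanity check on the formula, a direct transcription (the function name is ours) shows the intended behaviour: a brand-new comment with no votes scores 1, and that head start fades as the comment ages.

```python
def modified_bayes(ups: int, downs: int, age_hours: float,
                   n_phantom: float = 7) -> float:
    """(n+ - n- + n_p / (h + 1)) / (n_p + h)."""
    return (ups - downs + n_phantom / (age_hours + 1)) / (n_phantom + age_hours)


# An unvoted comment at a few different ages.
for h in (0, 1, 6, 24):
    print(h, round(modified_bayes(0, 0, h), 3))
```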

Ranking the rankings

I did enough simulation runs (1000-20000) with each formula to be pretty confident about how they compare. Without further ado, voilà:

Ranking algorithm	Average number of upvotes per visitor
Upvote probability	0.978
Modified Bayes	0.916
Hacker News	0.899
Bayesian average	0.878
Difference	0.848
Reddit	0.836
Ratio	0.813
YouTube	0.644
Random	0.607

So YouTube is marginally better than random, Reddit is worse than the simple difference, and Hacker News is the only one of the three better than Bayesian average. Disappointing but also plausible. How generalizable are the results? As always, more work required…

Mathematics as a service

2020-04-01T00:00:00+00:00

What would a market for mathematics look like?

Formal verification might allow an elegant mechanism: Someone posts a proposition in a formal language like Coq and the first to submit a proof that passes verification wins the bounty. Everything can be automated and maybe even trustless. This has been tried, at proofmarket.org, which was shut down due to consistency bugs in the verifier. Even without bugs, proof assistants are still difficult to use; mathematician Thomas Hales says “It is very hard to learn to use Lean proficiently. Are you a graduate student at Stanford, CMU, or Pitt writing a thesis on Lean? Are you a student at Imperial being guided by Kevin Buzzard? If not, Lean might not be for you.”

If we stick to natural language to avoid the learning curve, things get messy. How does the market decide what a complete proof is, which proof is first, and who did it? Perhaps the only tenable solution is to leave these decisions up to the individuals who post the bounties. How would we know that bounties would ever get paid? Stack Exchange forces bounties to be put in escrow and if they’re not awarded to someone there’s no refund. Another option is to rely on reputation by using certified identities (e.g. users’ email addresses are verified and public so they can be checked against personal webpages).

Something along these lines might be doable (name: proofbounty.io?) but what’s the use case? Monetary rewards for mathematical problems are rare and mathematicians generally already earn a salary, so the interest would likely be modest. Students (anywhere in the world) are plausible suppliers though, perhaps even high school students, while consumers could be anyone with a research grant usable for paying “research assistants”, or industry and non-profit research groups. A market that brings these two sides together could be of some value.

Paid question answering has been tried before, e.g. Google Answers, which wasn’t very popular. Did it fail due to lack of network effects, lack of innovative mechanisms, or an essential flaw in the concept? I don’t know. Bounties on GitHub issues seem to be a bit more successful.

In addition to bounties, there could be a prediction market. The time of resolution may have to be indefinite, though, since resolving “proposition X will be publicly proved by date Y” would in general require determining the nonexistence of a public proof, which is at least somewhat error-prone. However, prediction markets are basically illegal so it’s a moot point.

March 2020 links

2020-03-24T00:00:00+00:00

James I’s 1597 book Daemonologie, “a philosophical dissertation on contemporary necromancy … touches on topics such as werewolves and vampires”.

96.5% of 19-year-old males in Seoul have myopia.

List of Scottish Canadians.

Free ebook of classic novel plot summaries.

“Kime”: complex-valued time.

Robin Hanson predicts China virus disaster

2020-02-17T00:00:00+00:00

Robin Hanson says “In few months, China is likely to be a basket case, having crashed their economy in failed attempt to stop COVID-19 spreading.” Quantifying the forecast, he says China’s economy (or growth?) will be “a factor of two to ten down” and seems to expect dramatic results in 6 months.

Ranking cities by weather

2020-02-12T00:00:00+00:00

Let’s analyze data from https://darksky.net from the last 10 years to compare weather (technically “climate”) in a selection of North American cities.

If we define a “nice day” as one where

there are at least 10 hours of daylight,
the high apparent temperature is at least 0°C and at most 30°C,
the cloud cover is at most 70%, and
the UV index is at most moderate (unfortunately I used UV index at a single point in time during the day and didn’t adjust for time zones),

we get:

City	Probability of nice day
San Diego	0.27
Los Angeles	0.23
San Francisco	0.22
Raleigh	0.22
Austin	0.2
Vancouver	0.19
New York	0.19
Cambridge	0.19
Chicago	0.16
Ottawa	0.16
Toronto	0.15
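For readers who want to try other thresholds, the “nice day” definition can be written as a predicate over daily records. This is a sketch, not the code behind the table: the field names are hypothetical (they are not claimed to match the Dark Sky API), cloud cover is assumed to be a fraction between 0 and 1, and “at most moderate” is taken to mean a UV index of 5 or less, following the WHO scale where 3 to 5 is moderate.

```python
from typing import NamedTuple


class DayRecord(NamedTuple):
    daylight_hours: float
    apparent_high_c: float  # high apparent temperature, in Celsius
    cloud_cover: float      # assumed to be a fraction in [0, 1]
    uv_index: float


UV_MODERATE_MAX = 5  # assumption: WHO scale, 3 to 5 is "moderate"


def is_nice_day(day: DayRecord) -> bool:
    """All four conditions of the 'nice day' definition must hold."""
    return (
        day.daylight_hours >= 10
        and 0 <= day.apparent_high_c <= 30
        and day.cloud_cover <= 0.70
        and day.uv_index <= UV_MODERATE_MAX
    )


def probability(days, predicate) -> float:
    """Fraction of days satisfying the predicate (0 if there are no days)."""
    days = list(days)
    return sum(predicate(d) for d in days) / len(days) if days else 0.0


sample = [
    DayRecord(12, 20, 0.3, 4),  # nice
    DayRecord(9, 20, 0.3, 4),   # too little daylight
    DayRecord(12, 35, 0.3, 4),  # too hot
    DayRecord(12, 20, 0.9, 4),  # too cloudy
    DayRecord(12, 20, 0.3, 8),  # UV too high
]
print(probability(sample, is_nice_day))  # 0.2
```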

What are the nicest months to visit Toronto?

Month	Average number of nice days in Toronto
January	0
February	2.9
March	9.0
April	4.7
May	1.2
June	0.4
July	0.5
August	4.0
September	12.1
October	15.8
November	2.4
December	0
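A monthly table like this one can be produced by counting qualifying days per calendar month and dividing by the number of years of data. The sketch below shows one way to do that; it is an assumption about the method, not a record of how the numbers above were computed, and the function name is made up.

```python
from collections import Counter
from datetime import date


def average_days_per_month(qualifying_dates, n_years: int) -> dict:
    """Average count of qualifying days in each calendar month.

    Assumes every calendar month appears n_years times in the data.
    """
    counts = Counter(d.month for d in qualifying_dates)
    return {month: counts.get(month, 0) / n_years for month in range(1, 13)}


# Toy input: three qualifying October days spread over two years.
dates = [date(2018, 10, 1), date(2018, 10, 2), date(2019, 10, 5)]
averages = average_days_per_month(dates, n_years=2)
print(averages[10])  # 1.5
```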

If we define a “sunny day” as one where

there are at least 10 hours of daylight,
the high apparent temperature is at least 15°C, and
the cloud cover is at most 50%,

we get:

City	Probability of sunny day
Los Angeles	0.69
Austin	0.56
San Francisco	0.49
Raleigh	0.46
San Diego	0.45
New York	0.33
Cambridge	0.32
Chicago	0.26
Toronto	0.23
Vancouver	0.2
Ottawa	0.18

What are the sunniest months to visit Toronto?

Month	Average number of sunny days in Toronto
January	0
February	0
March	0.7
April	2.6
May	10.0
June	12.5
July	17.9
August	17.8
September	15.1
October	6.1
November	0.4
December	0

Lastly, if we define a “warm day” as one where

the high apparent temperature is at least 15°C and at most 25°C and
the UV index is at most high,

we get:

City	Probability of warm day
San Diego	0.5
San Francisco	0.45
Los Angeles	0.37
Vancouver	0.33
Raleigh	0.28
New York	0.25
Austin	0.25
Ottawa	0.23
Toronto	0.23
Cambridge	0.22
Chicago	0.21

What are the warmest months to visit Toronto?

Month	Average number of warm days in Toronto
January	0
February	0.3
March	1.8
April	6.9
May	11.7
June	11.5
July	4.8
August	10.2
September	19.9
October	13.7
November	2.1
December	0.1

Q&A with William Saunders: Preventing AI catastrophes

2020-02-11T00:00:00+00:00

William Saunders was a fellow Fellow at MIRI in 2016 and now researches AI safety at Ought. Below we go over his 2017 paper “Trial without Error: Towards Safe Reinforcement Learning via Human Intervention”.

Q: Say we’re training an autonomous car by running a bunch of practice trips and letting the model learn from experience. For example, to teach safe driving we might input a reward if it makes a trip without running anyone over and input a penalty otherwise. What’s the flaw in this approach, and how serious is this issue in AI systems present and future?

Two big flaws, if we use traditional model-free reinforcement learning algorithms (Deep Q learning, policy gradient):

The RL agent won’t learn to avoid running over the human until it actually runs over the human and receives the penalty a large number of times.

The RL agent will suffer “The Sisyphean Curse of RL”. Once it learns to avoid running over humans, it will keep having new experiences where it doesn’t run over humans. Eventually, it will forget that running over humans is bad, and will occasionally need to run over humans a few times and get penalized in order to remember. This will repeat as long as the agent is being trained.

So, the training process can lead to an arbitrary number of humans being run over. (In practice, of course, you’d stop after the first one, if not sooner.)

Q: Your proposal, called Human Intervention Reinforcement Learning (HIRL), involves using humans to prevent unwitting AIs from taking dangerous actions. How does it work?

A human watches the training process. Whenever the RL agent is about to do something catastrophic, the human intervenes, changing the RL agent’s action to avoid the catastrophe and giving the RL agent a penalty.

We record all instances when the human intervenes, and train a supervised learning algorithm (“the blocker”) to predict when the human intervenes.

When the blocker is able to predict when the human intervenes, we replace the human with the blocker and continue training. Now the blocker is called for every new action the agent takes, and decides whether it should intervene and penalize the agent.

Eventually, the RL agent should learn a policy that performs well on the task and avoids proposing the blocked actions, which should then be safe for deployment.
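A toy sketch of this loop, not taken from the paper: every name below is hypothetical, and the `Blocker` here simply memorises blocked (state, action) pairs, standing in for the supervised learner described above, which would need to generalise to unseen situations.

```python
class Blocker:
    """Toy stand-in for the learned blocker: memorises blocked pairs."""

    def __init__(self):
        self.blocked = set()

    def fit(self, interventions):
        # interventions: list of (state, action, was_blocked) from the human phase
        self.blocked = {(s, a) for s, a, was_blocked in interventions if was_blocked}

    def should_block(self, state, action) -> bool:
        return (state, action) in self.blocked


def overseen_step(state, proposed_action, overseer, safe_action,
                  penalty=-1.0, log=None):
    """Apply oversight to one proposed action.

    Returns (action_to_execute, extra_reward). The overseer is any callable
    (state, action) -> bool: the human at first, the blocker later.
    """
    blocked = overseer(state, proposed_action)
    if log is not None:
        log.append((state, proposed_action, blocked))
    if blocked:
        return safe_action, penalty
    return proposed_action, 0.0


# Phase 1: a (simulated) human oversees and every decision is logged.
def human(state, action):
    return state == "pedestrian_ahead" and action == "accelerate"


log = []
overseen_step("pedestrian_ahead", "accelerate", human, "brake", log=log)
overseen_step("clear_road", "accelerate", human, "brake", log=log)

# Phase 2: train the blocker on the log, then let it replace the human.
blocker = Blocker()
blocker.fit(log)
print(overseen_step("pedestrian_ahead", "accelerate", blocker.should_block, "brake"))
```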

Q: What’s a practical example where HIRL might be useful?

One example might be for a chatbot that occasionally proposes an offensive reply in a conversation (e.g. Microsoft Tay). A human could review statements proposed by the chatbot and block offensive ones from being sent to end users.

Q: Is there a use case for HIRL in simulated learning environments?

In simulated environments, one can simply allow the catastrophic action to happen and intervene after the fact. But depending on the simulation, it might be more efficient for learning if catastrophic actions are blocked (if they would end the simulation early, or cause the simulation to run for a long time in a failed state).

Q: In what situations would human intervention be too slow or expensive?

Even for self-driving cars, it can be difficult for a safety driver to detect when something is going wrong and intervene in time. Other robotics tasks might be similar. In many domains, it might not be possible to fully hand things over to the blocker. If the agent doesn’t try some kinds of actions or encounter some kinds of situations until later in the training process, you either need to have the human watch the whole time, or be able to detect when new situations occur and bring the human back in.

Q: How does the applicability of HIRL change (if at all) if the human is part of the environment?

HIRL could still apply if the intervening human is part of the environment, as long as the human supervisor is able to block any catastrophic action that harms or manipulates the human supervisor, or the human supervisor’s communication channel.

Q: Theoretically the idea here is to extract, with an accuracy/cost tradeoff, a human’s beliefs and/or preferences so an AI can make use of them. At a high level, how big a role do you think direct human intervention will play in this process on the road to superintelligent AI?

Ideally, you would want techniques that don’t require the human to be watching and able to effectively intervene; it would be better if the blocker could be trained prior to training or if the AI could detect when it was in a novel situation and only ask for feedback then. I think HIRL does illustrate how in many situations it’s easier to check whether an action is safe than to specify the optimal action to perform, and this principle might end up being used in other techniques as well.

Whence the English names of countries

2020-01-24T00:00:00+00:00

Some exonyms:

India (Hindustan). Same origin: Sanskrit síndhu (“river”), as in Indus River.
Japan (Nihon). Same origin, the English comes via Chinese and Malay.
China (Zhongguo). Via Sanskrit possibly from Qin, the westernmost ancient Chinese state.
Korea (Hanguk). From Goryeo, a Korean kingdom from 918-1392.
Germany (Deutschland). From Latin name Germānī used for tribes east of the Rhine.
Finland (Suomi). From Old Norse word for Finland, Finnland.
Greece (Hellas). Via Latin from Graecus, a son of either Zeus or someone named Thessalus.
Hungary (Magyarország). Via Latin from Turkish name Onoğurs (“ten tribes”) used for Turkic tribes to the north of Turkey (not where Hungary is).

Nov 2019 links

2019-11-06T00:00:00+00:00

Tall buildings by city: look out for Toronto. The current top cities in North America for skyscrapers are unambiguously New York, Chicago, and Toronto in that order. However, if we count proposed buildings and buildings under construction, Chicago has 18 at least 150m tall (the dataset is only complete for buildings at least 150m) and Toronto has 90.

C. Hitchens sometimes just made stuff up.

Web LaTeX chat.

How to mirror YouTube videos (Korean).

How to make the weird Australian O sound.

Ascending auction bidder strategy

2019-11-05T00:00:00+00:00

Ascending auctions are a common mechanism for selling a set of products. The basics are covered in this video:

The exact rules of an ascending auction depend on the auctioneer and may include complexities such as:

Activity rules, where bidders can never bid on more products than in previous rounds
Anonymous bidding, where information on who bid on what is hidden
Whether bid prices in each round are fixed by the auctioneer or chosen by bidders

Below we review some of the main ideas of optimal bidding strategy, give practice scenarios, and provide pointers to the relevant literature.

Terminology

VCG = Vickrey-Clarke-Groves (sealed bid, Vickrey prices)

SAA = simultaneous (multiple products at once) ascending auction (same as SMRA)

SMRA = simultaneous multiple round ascending (same as SAA)

CCA = Combinatorial clock auction (not the same as SAA/SMRA)

Schelling point = a way for independent parties to intentionally coordinate on one choice among many

Value bidding = selecting a package to maximize value of package minus cost

Notation

Products = $\{1, 2, 3, \ldots, n \}$

Quantities = $\{q_1, \ldots, q_n\}$

Bidders = $\{1, 2, 3, \ldots, m \}$

Package: $x = x(1), \ldots, x(n)$, where $x(i)$ is the quantity of product $i$

Valuation of bidder $i$ of package $x$: $v_i(x)$
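With this notation, value bidding can be sketched in a few lines. The sketch assumes linear prices, so a package costs the sum of price times quantity over products, and represents a valuation as a dictionary from package tuples to values with missing packages worth 0. The function names are made up for this illustration.

```python
from itertools import product as cartesian


def package_cost(package, prices):
    """Cost of a package under linear per-unit prices (an assumption)."""
    return sum(q * p for q, p in zip(package, prices))


def value_bid(valuation, prices, supply):
    """Return the package maximising value minus cost.

    valuation: dict mapping package tuples to values (missing means 0)
    prices:    current price per unit of each product
    supply:    available quantity q_i of each product
    Ties go to the package enumerated first, i.e. the smaller one.
    """
    candidates = cartesian(*(range(q + 1) for q in supply))
    return max(
        candidates,
        key=lambda x: valuation.get(x, 0) - package_cost(x, prices),
    )


# One product with two units; one unit is worth 9 and two units are worth 10.
v = {(1,): 9, (2,): 10}
print(value_bid(v, prices=(0.5,), supply=(2,)))  # (2,)
print(value_bid(v, prices=(3,), supply=(2,)))    # (1,)
```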

Nutshell

Generally SMRA auctions have a cooperative phase and then a competitive phase. In the cooperative phase, bidders reduce demand (relative to value bidding) in order to allocate products without bidding prices up. Bidders must agree on this allocation without communicating. Typically this implicit allocation is chosen because it’s fair, natural, symmetric, or otherwise “makes sense” given the context of the auction.

Keys to the game

(More details below.)

- Demand reduction negotiation
  - Bidders try to indirectly find an agreeable allocation
  - Selecting quantities: Schelling points based on available info
    - Auction-based, industry-based info
    - E.g. split product units 50/50 if two bidders are expected to be interested
  - Negotiation by sending signals within the auction
    - Presumably cheap talk in this context, but it happens
    - Much noise, little signal in auctions where bids are constrained or hidden
  - If there is an activity rule, once you’ve submitted low demand, there is no way to increase without decreasing somewhere else
- Competition
  - Basically value bidding
  - Usually happens if negotiation fails
- Complementarity/exposure: value bidding fails and “cooperation” is inefficient and less likely.
  - Bids for a quantity $q$ can turn into bids for quantities $< q$ so be careful how high you bid if there are complementarities.
  - See literature review below for more discussion.
- Price raising
  - Only do in lots where you’re not going to win anything
  - Start early (to maintain activity)

Demand reduction

Value bidding is no longer a dominant strategy, as it is in VCG/CCA.

Say there is a single product and Bidder 1 bids on quantity 1 at price=$1,2,\ldots,10$. Assume $v_2(1)=9, v_2(2)=10$.

Bidder 2 (B2), strategy 1: bid on $q=2$ for $p=1,\ldots,5$, then bid on $q=1$ for $p=6$. B2 strategy 2: bid on $q=1$ for $p=1$.

B2 results:

CCA, strategy 1: wins $q=1$ @ $p=0$
CCA, strategy 2: wins $q=1$ @ $p=0$
SMRA, strategy 1: wins $q=1$ @ $p=6$
SMRA, strategy 2: wins $q=1$ @ $p=1$

Thus reducing demand (strategy 2) pays in the SMRA format where it didn’t in the CCA. When both bidders reduce demand, it’s called “cooperation” aka “tacit collusion”. See the literature review below for more examples.
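The SMRA outcomes above can be replayed with a stripped-down ascending clock. This is a simplification for illustration only: it assumes a supply of 2 units (implied by the example but not stated), price steps of 1, and no activity rule or standing high bids. All names are hypothetical.

```python
def run_clock_auction(strategies, supply, start_price=1, max_price=100):
    """Raise the price by 1 per round until total demand fits within supply.

    strategies: dict mapping bidder name to a function price -> quantity.
    Returns (clearing_price, demand_at_that_price).
    """
    price = start_price
    while price <= max_price:
        demand = {name: bid(price) for name, bid in strategies.items()}
        if sum(demand.values()) <= supply:
            return price, demand
        price += 1
    raise RuntimeError("no clearing price found")


def bidder_1(price):
    # Bids on one unit at prices 1 through 10.
    return 1 if price <= 10 else 0


def b2_strategy_1(price):
    # Two units up to price 5, then one unit.
    return 2 if price <= 5 else 1


def b2_strategy_2(price):
    # Reduced demand: one unit from the start.
    return 1


for label, b2 in (("strategy 1", b2_strategy_1), ("strategy 2", b2_strategy_2)):
    price, demand = run_clock_auction({"B1": bidder_1, "B2": b2}, supply=2)
    surplus = 9 * demand["B2"] - price * demand["B2"]  # v_2(1) = 9
    print(label, "price:", price, "B2 surplus:", surplus)
```

Under these assumptions B2 pays 6 with strategy 1 and 1 with strategy 2 for the same unit, so reducing demand raises B2's surplus from 3 to 8.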

However, with the activity rule, there can be a risk to reducing too much at the beginning if there is uncertainty about the cooperative outcome, so a somewhat gradual reduction may be wise.

Price raising

Raising prices for other bidders is a realistic motive. In the SMRA format it’s relatively simple because raising auction price is the same as raising price paid. You don’t have to work backwards from Vickrey price calculations to see what action would cause an increase in price. Instead, you simply have to create excess demand on one or more products where there otherwise would not be. But, it’s risky because your bids might end up being winning bids.

The ideal scenario is as follows: Two rivals of yours neatly split supply 50-50, and price doesn’t increase. Then you come in and place a bid for $q=1$ (no point using higher $q$ unless you need the activity) for a few rounds and then get out before they decrease their bids.

So this can work for disrupting demand reduction, but only for products you don’t actually want to win (or you’d be raising your own price too).

Demystifying strategies through experimentation

Try the following mini scenarios one or multiple times to better understand tactics.

People are assigned to bidders
Bidders’ valuations may be random (independently among bidders)
- Other bidders know the possible valuations but not which one was selected
People gain points according to their valuation, lose points to pay for won products
Possible bonus points for raising rivals’ prices
Goal is to maximize points, not to get more points than opponent
In an actual auction, the other bidders may not be rational

After gaining familiarity with the mini scenarios, full scale mock auctions may also be helpful.

Scenario: Dealing with uncertainty

1 product, $q_1=2$, 2 bidders

\[v_1(1) = 2, v_1(2) = 3\]

With probability $1/2$: $v_2(1) = 1, v_2(2) = 2$
With probability $1/2$: $v_2(1) = 0, v_2(2) = 2$

Scenario: Are they price raising?

2 products, $q_1=q_2=2$, 2 bidders

$v_1(x, 1) = 2x + 2$, $v_1(x, 2) = 2x + 3$

With probability $1/3$:

$v_2(1, y) = 2$, $v_2(2, y) = 3$

Bonus points for bidder 2 only if its score is positive: price paid by bidder 1 for product 2

With probability $2/3$:

\[v_2(x, 1) = v_2(x, 2) = x + 2\]

Scenario: Cooperating without an obvious Schelling point

1 product, $q_1=3$, 2 bidders

\[v_1(1)=6, v_1(2)=10, v_1(3)=12\] \[v_2(1)=6, v_2(2)=10, v_2(3)=12\]

Scenario: Cooperating with uncertainty 1

1 product, $q_1=1$, 2 bidders

With probability $1/2$: $v_1(1) = 3$
With probability $1/2$: $v_1(1) = 5$;
With probability $1/2$: $v_2(1) = 4$
With probability $1/2$: $v_2(1) = 6$

Scenario: Classic intra-product exposure

1 product, $q_1=3$, 3 bidders

$v_1(3) = 10$, otherwise $v_1(x)=0$

With probability $1/2$: $v_2(1) = v_2(2) = v_2(3) = 5$
With probability $1/2$: $v_2(1) = v_2(2) = v_2(3) = 1$;
With probability $1/2$: $v_3(1) = v_3(2) = v_3(3) = 4$
With probability $1/2$: $v_3(1) = v_3(2) = v_3(3) = 0$

Scenario: Cooperating with uncertainty 2

2 products, $q_1=q_2=3$, 2 bidders

$v_1(1, y) = 3 + \sqrt{2y}$, $v_1(2, y) = 5 + \sqrt{2y}$, $v_1(3, y) = 5 + \sqrt{2y}$

With probability $1/2$: $v_2(1, y) = v_2(2, y) = v_2(3, y) = 2 + \sqrt{y}$
With probability $1/2$: $v_2(1, y) = 4 + \sqrt{y}, v_2(2, y) = 7 + \sqrt{y}, v_2(3, y) = 8 + \sqrt{y}$

Scenario: Universal intra-product complementarity

1 product, $q_1=3$, 2 bidders

\[v_1(1) = 2, v_1(2) = 5, v_1(3) = 9\] \[v_2(1) = 1, v_2(2) = 4, v_2(3) = 8\]

Scenario: Finding a Schelling point

2 products, $q_1=q_2=3$, 2 bidders

\[v_1(x,y) = v_2(x,y) = \sqrt{x} + \sqrt{y}\]

Scenario: Inter-product cooperation 1

2 products, $q_1=q_2=1$, 2 bidders

With probability 1/2: $v_1(x,y) = \sqrt{5x + 3y}$
With probability 1/2: $v_1(x,y) = \sqrt{3x + 5y}$;
With probability 1/2: $v_2(x,y) = \sqrt{5x + 3y}$
With probability 1/2: $v_2(x,y) = \sqrt{3x + 5y}$

Scenario: Inter-product cooperation 2

4 products, $q_1=q_2=q_3=q_4=1$, 2 bidders

With probability 1/2: $v_1(x_1, x_2, x_3, x_4) = \sqrt{5(x_1+x_2) + 3(x_3+x_4)}$
With probability 1/2: $v_1(x_1, x_2, x_3, x_4) = \sqrt{3(x_1+x_2) + 5(x_3+x_4)}$;
With probability 1/2: $v_2(x_1, x_2, x_3, x_4) = \sqrt{5(x_1+x_2) + 3(x_3+x_4)}$
With probability 1/2: $v_2(x_1, x_2, x_3, x_4) = \sqrt{3(x_1+x_2) + 5(x_3+x_4)}$

Guide to the literature: Theoretical

Brusco, Sandro, and Giuseppe Lopomo. 2002. “Collusion via Signaling in Simultaneous Ascending Bid Auctions with Heterogeneous Objects, with and Without Complementarities.” The Review of Economic Studies 69 (2): 407–36.

Synopsis: Increasing the ratio of bidders to products decreases cooperation. Complementarities among products decrease cooperation. The optimal strategy is to attempt to cooperate and to value bid if that fails. EP is not part of the model.

Grimm, Veronika, Frank Riedel, and Elmar Wolfstetter. 2003. “Low Price Equilibrium in Multi-Unit Auctions: The GSM Spectrum Auction in Germany.” International Journal of Industrial Organization 21 (10): 1557–69.

Synopsis: In German 1999 auction, products were split 50-50 between two major players at relatively low prices. A simple game is defined. Assume there are $m$ bidders, and $n=mk$ products each with quantity $1$, bidders have equal valuations with strictly decreasing marginal values. The optimal strategy is to bid on $k$ products each. If someone competes with you for your $k$, value bid.

Brusco, Sandro, and Giuseppe Lopomo. 2009. “Simultaneous Ascending Auctions with Complementarities and Known Budget Constraints.” Economic Theory 38 (1): 105–24.

Synopsis: Analysis of exposure problem. For example: Big bidder has extremely complementary (convex, e.g. $x^2$) values. A number of small bidders have extremely supplementary (concave, e.g. $\sqrt{x}$) values. Due to lack of package bids, big bidder may decide to not bid at all. However, in spectrum auctions I’m not sure if this is a big factor. (Not an issue in CCA/VCG.)

Goeree, Jacob K, and Yuanchuan Lien. 2014. “An Equilibrium Analysis of the Simultaneous Ascending Auction.” Journal of Economic Theory 153: 506–33.

Synopsis: Analysis of exposure problem.

Janssen, Maarten, and Vladimir Karamychev. 2017. “Raising Rivals’ Cost in Multi-Unit Auctions.” International Journal of Industrial Organization 50: 473–90.

Synopsis: Discussion of when raising prices is optimal or suboptimal, when bidders have an interest in doing so.

Guide to the literature: Empirical

Synopsis: See above.

Kwasnica, Anthony M, and Katerina Sherstyuk. 2007. “Collusion and Equilibrium Selection in Auctions.” The Economic Journal 117 (516): 120–45.

Synopsis: Lab experiments were conducted on spontaneous cooperation in auctions. Results (p15): Players cooperate more if they get to play the game many times. As the number of bidders per product increases, cooperation decreases. With complementary products, there was little cooperation.

Cramton, Peter. 2010. “Simultaneous Ascending Auctions.” Wiley Encyclopedia of Operations Research and Management Science.

Synopsis (Sec. 5): Discusses auctions from around 2000 where bidders signaled and coordinated to reduce demand.

Bichler, Martin, Vitali Gretschko, and Maarten Janssen. 2017. “Bargaining in Spectrum Auctions: A Review of the German Auction in 2015.” Telecommunications Policy 41 (5-6): 325–40.

Synopsis: Analysis of German auction in 2015 which featured cooperation, competition, and signaling. The auction had high transparency and a great range of actions (submitting bids higher than clock price). E.g. TEF bids on product A that VOD was bidding on to send message that VOD should reduce demand in product B where TEF and VOD are negotiating demand reduction.

Cramton, Peter, and Axel Ockenfels. 2017. “The German 4G Spectrum Auction: Design and Behaviour.” Oxford University Press Oxford, UK.

Synopsis: Analysis of German auction in 2010 which was competitive due to lack of (or too many) Schelling points. Specifically there were different ways to divide up the blocks that might have made sense depending on factors such as future mergers or network sharing agreements and bidders worked towards conflicting outcomes.

Belief aggregation with computational constraints

2019-09-30T00:00:00+00:00

Imagine a risk-neutral set of traders, each with a common prior which is updated with some private information. The traders buy and sell contingent claims until prices reach an equilibrium. The resultant prices are the conditional expectations of the terminal payoffs under a probability measure $\mathbb{P}$. Is $\mathbb{P}$ equal to the posterior obtained by updating the common prior with the combined private information? At least under certain conditions, yes.

Great, so maybe there should be an efficient distributed algorithm to do Bayesian inference by splitting up the dataset, doing inference on each worker, and then aggregating the results? Well, presumably yes – if workers have an exact representation of their posteriors. But if the workers obtain their posteriors approximately by MCMC sampling, the answer so far is no. Distributed Bayesian consensus methods exist that use heuristics such as weighted averaging but they “lack rigorous justification and provide no guarantees on the quality of inference”.
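To make the "exact representation" case concrete, here is a minimal sketch for a conjugate Beta-Bernoulli model, which is an assumption chosen for illustration and not a model named above. Each worker returns an exact Beta posterior, and the aggregate divides out the prior that every worker counted once. The function names are hypothetical.

```python
def beta_posterior(prior, data):
    """Exact Beta posterior (a, b) after observing a list of 0/1 outcomes."""
    a, b = prior
    successes = sum(data)
    return (a + successes, b + len(data) - successes)

def aggregate(prior, posteriors):
    """Combine worker posteriors, removing the k-1 extra copies of the prior."""
    a0, b0 = prior
    k = len(posteriors)
    a = sum(a_i for a_i, _ in posteriors) - (k - 1) * a0
    b = sum(b_i for _, b_i in posteriors) - (k - 1) * b0
    return (a, b)

prior = (2, 2)
shards = [[1, 0, 1], [1, 1], [0, 0, 0, 1]]  # data split across three workers
workers = [beta_posterior(prior, shard) for shard in shards]
combined = aggregate(prior, workers)
full = beta_posterior(prior, [x for shard in shards for x in shard])
print(combined, full)
```

The aggregation is exact here only because the posterior is summarized by two numbers. With MCMC samples in place of `(a, b)` there is no such closed-form combination step, which is the difficulty raised above.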

So, do precise distributed Bayesian inference methods exist? If yes, we unlock a new world of Bayesian big data. If no, what is the character of belief aggregation in markets with computationally bounded Bayesian traders?

North American population growth

2019-08-19T00:00:00+00:00

Which places in North America are growing and which aren’t? City populations are poorly defined and hard to compare but the boundaries of states and provinces are more objective. Here is the growth in population since 1970/71 in states/provinces with over 4 million people.

Province	1971 Pop.	2016 Pop.	Growth
Alberta	1,627,874	4,067,175	150%
British Columbia	2,184,621	4,648,055	113%
Ontario	7,703,106	13,448,494	75%
Quebec	6,027,764	8,164,361	35%

State	1970 Pop.	2016 Pop.	Growth
Arizona	1,770,900	6,931,071	291%
Florida	6,789,443	20,612,439	204%
Colorado	2,207,259	5,540,545	151%
Texas	11,196,730	27,862,596	149%
Georgia	4,589,575	10,310,371	125%
Washington	3,409,169	7,288,000	114%
North Carolina	5,082,059	10,146,788	100%
California	19,953,134	39,250,017	97%
Oregon	2,091,385	4,093,465	96%
South Carolina	2,590,516	4,961,119	92%
Virginia	4,648,494	8,411,808	81%
Tennessee	3,923,687	6,651,194	70%
Maryland	3,922,399	6,016,447	53%
Minnesota	3,804,971	5,519,952	45%
Alabama	3,444,165	4,863,300	41%
Kentucky	3,218,706	4,436,974	38%
Wisconsin	4,417,731	5,778,708	31%
Missouri	4,676,501	6,093,000	30%
Louisiana	3,641,306	4,681,666	29%
Indiana	5,193,669	6,633,053	28%
New Jersey	7,168,169	8,944,469	25%
Massachusetts	5,689,170	6,811,779	20%
Illinois	11,113,976	12,801,539	15%
Michigan	8,875,083	9,928,300	12%
Ohio	10,652,017	11,614,373	9%
Pennsylvania	11,793,909	12,784,227	8%
New York	18,236,962	19,745,289	8%

Among the states with population over 10 million, there is a clear clustering with Florida, Texas, Georgia, North Carolina, and California (all in the south or west coast) at the top, and Illinois, Ohio, Pennsylvania, and New York (all in the midwest or northeast) at the bottom.

Counting solutions to equations

2019-04-08T00:00:00+00:00

There are $\binom{n-1}{m-1}$ integer compositions of $n$ with $m$ parts. Complex polynomials of degree $n$ have $n$ zeros, counting multiplicity. Where else do we count solutions to equations? Our criteria are that the equations must be parametrized, and that for each parameter value there is a finite solution set. Some examples are given.
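The composition count is easy to sanity-check by brute force for small parameters. The sketch below enumerates all $m$-tuples of positive integers and compares the tally with the binomial coefficient; `count_compositions` is a name introduced here.

```python
from itertools import product
from math import comb

def count_compositions(n, m):
    """Count ordered m-tuples of positive integers that sum to n."""
    return sum(1 for parts in product(range(1, n + 1), repeat=m)
               if sum(parts) == n)

for n in range(1, 8):
    for m in range(1, n + 1):
        assert count_compositions(n, m) == comb(n - 1, m - 1)
print(count_compositions(5, 3), comb(4, 2))
```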

Integer solutions in a ball:

Theorem: If $f$ is a polynomial over $\mathbb{Z}$ in $n$ variables, let

\[N(f, B) = |\{\mathbf{x} \in \mathbb{Z}^n: f(x_1, \ldots, x_n) = 0, \max_i |x_i| \leq B \}|.\]

If $f$ is a non-singular homogeneous polynomial over $\mathbb{Z}$ of degree $d$ in $n > (d-1)2^d$ variables, then $N(f, B) \sim c_f B^{n-d}, B \to \infty$, under some technical conditions.

Diophantine equations:

Theorem: Say $f$ is a polynomial of degree $d$ over $\mathbb{Z}_p$ where $GCD(p,d) = 1$. If $N(f)$ is the number of solutions $\mathbf{x} \in \mathbb{Z}_p^n$ to $f(\mathbf{x}) = 0$, then $N(f) = p^{n-1} + O(p^{n/2}), p \to \infty$, assuming a non-singularity condition.
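For small primes the count $N(f)$ can be computed directly. The sketch below uses the example $f(x,y) = x^2 + y^2 - 1$ with $n = 2$, chosen here for illustration and not taken from the theorem's source, so the main term $p^{n-1}$ is just $p$. The helper `count_zeros` is hypothetical.

```python
from itertools import product

def count_zeros(f, p, n):
    """Number of points x in (Z/pZ)^n with f(x) = 0 mod p, by exhaustive search."""
    return sum(1 for x in product(range(p), repeat=n) if f(*x) % p == 0)

def circle(x, y):
    return x * x + y * y - 1

for p in (5, 7, 11, 13):
    print(p, count_zeros(circle, p, 2))
```

Exhaustive search costs $p^n$ evaluations, so this is only useful for eyeballing how close the count stays to $p^{n-1}$ on small cases.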

Non-negative integer solutions to linear equations:

E.g. $\{ (x,y,z) : 3x + 5y + 17z \leq \lambda, x \geq 0, y \geq 0, z \geq 0 \}$.

Theorem: Let $\Delta(\lambda) = \{ \mathbf{x} \in \mathbb{Z}^n: M \mathbf{x} \leq \lambda\mathbf{b} \}$. Then $|\Delta(\lambda)|$ is a polynomial in $\lambda$ of degree $n$.

Note that wlog $\mathbf{b}$ takes possible values $-1,0,1$. If not, for each nonzero $b_i$ multiply $[M]_{i,*}$ by $\textrm{lcm}(\mathbf{b})/|b_i|$, replace $b_i$ by its sign, and set $\lambda' = \lambda \cdot \textrm{lcm}(\mathbf{b})$, where the lcm is taken over the nonzero $|b_i|$.

If $\mathbf{b}$ takes possible values $-1,0,1$, we may take the difference $\Delta(\lambda) \setminus \Delta(\lambda -1)$ to get solutions to an equality.
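As a concrete check of the example $3x + 5y + 17z \leq \lambda$, the sketch below counts non-negative integer solutions directly and then takes the difference of consecutive counts to get the number of solutions to the equality. The function name `count_solutions` is introduced here for illustration.

```python
def count_solutions(bound, coeffs=(3, 5, 17)):
    """Count (x, y, z) >= 0 with a*x + b*y + c*z <= bound."""
    a, b, c = coeffs
    count = 0
    for z in range(bound // c + 1):
        for y in range((bound - c * z) // b + 1):
            # For fixed y and z, x ranges over 0..floor(remaining / a).
            count += (bound - c * z - b * y) // a + 1
    return count

for lam in range(0, 21):
    inequality = count_solutions(lam)
    equality = inequality - count_solutions(lam - 1)
    print(lam, inequality, equality)
```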

(Above: Example solution sets for different values of $\lambda$.)

Locally restricted words over finite groups:

Theorem: If $G$ is a finite group and $x_1, \ldots, x_m \in G$, let $N(m, a)$ be the number of solutions to $x_1 \cdots x_m = a$ such that $(x_1, \ldots, x_m)$ satisfies a local restriction. Then under some conditions, as $m \to \infty$ we have $N(m, a) \sim N(m, e)$, where $e \in G$ is the identity element.

Quote: Bertrand Russell's anecdote

2019-02-05T00:00:00+00:00

To return to my grandmother’s family … the … sister, Lady Charlotte Portal was … apt to express herself unfortunately. On one occasion when she had to order a cab for three people, she thought a hansom would be too small and a four-wheeler too large, so she told the footman to fetch a three-wheeled cab. On another occasion, the footman, whose name was George, was seeing her off at the station when she was on her way to the Continent. Thinking that she might have to write to him about some household matter she suddenly remembered that she did not know his surname. Just after the train had started she put her head out of the window and called out, ‘George, George, what’s your name?’ ‘George, My Lady’, came the answer. By that time he was out of earshot.

-Bertrand Russell, “Autobiography”

Advice for students

2019-01-16T00:00:00+00:00

Be challenged. Seek material at the level and pace appropriate for you. Learn with people who aren’t all dumber than you.
If you’re motivated to learn or build something, do it. If you’re not motivated, don’t force yourself.
Don’t be afraid of the unknown. Just because a topic is advanced or is in an unfamiliar field doesn’t mean it’s difficult to learn; go for it.
Some things can’t be learned from textbooks because the textbooks don’t exist, e.g. decision theory.
Some concepts take a while to really absorb, possibly years. In the words of John von Neumann, “Young man, in mathematics you don’t understand things. You just get used to them.”
There’s nothing wrong with funding your studies by borrowing against future earnings as long as you’re not overpaying for your education. Internships are great too.
Don’t do undergrad if you don’t need to. Subjects like mathematics, computer science, and economics can be learned conveniently and effectively using books, videos, and other resources from the internet. If you need an academic credential, write a paper with a professor in your city and get it published. The mentorship will be valuable and with just publications and reference letters applying to grad school is an option (see e.g. link).
If you formally take a course, ideally learn the material by yourself beforehand (paradoxical as that sounds).
Don’t go to a bad grad school. At the graduate level, low-status schools have poor funding, low research activity, and few students.
Get a thesis supervisor who is unambiguously an expert in the field.
Attend economics seminars because they’re a blast.

Qs

2018-07-16T00:00:00+00:00

(These are some questions which may well be already addressed in the literature; I haven’t checked.)

Should someone start selling insurance against online mob victimization and other “life/career ruining” reputation attacks?

How should a country transition to futarchy, if it starts with a highly corrupt government?

Should children be able to sue parents if they divorce, give birth out of wedlock, etc?

People have a tendency, perhaps intentional, to claim that policies themselves, rather than the policies’ apparent goals, are axiomatic good things. Is this a vulnerability in futarchy, since then they would be voted on as values rather than bet on as beliefs?

Why is betting psychologically safe in finance (that is, trading) but dangerous in casinos? As binary options are prone to fraud according to Wikipedia, what does that say about the viability of prediction markets?

How should we apply and evaluate informal models, e.g. the broad theories of Carl Jung and Stephen Wolfram? Does abstract/vague/informal/subjective/qualitative imply unfalsifiable or is there a way of formalizing the informal?

If the health system is inefficient, how much of the problem is caused by doctors (and their guilds)? Similarly for legal system and lawyers.

Why are “theories” helpful in science but “ideologies” unhelpful in politics?

Was Julian Simon right? What is the real relationship between population and GDP in the short and long run?

People from Bertrand Russell to Tyler Cowen and Peter Thiel have remarked on increasing societal risk/change aversion and the concomitant stagnation – what is best to reboot “dynamism”: Move to a different country? Reform the current one? Seasteading? Form more-dynamic enclave within the country? Work in digital/crypto realm? Is this a farmer-forager issue? Can foragers be dynamic?

If certain mental traits are Zahavian signals that indicate computational resources, does this, pace Miller, predict asymmetry in cognitive abilities between the sex that signals and the sex that chooses, assuming $P \neq NP$?

In Japan, finding a defendant not guilty is culturally frowned upon, and the consequences of this are predictably bizarre; see link1, link2. Is there a hidden rationality here?

If tastes in physical beauty change a lot, to what extent can they be explained by sexual selection?

Why are criminals so disliked?

In schools, why do teachers but not students get comfortable office chairs?

Why does Microsoft employ developers in Seattle when they could get them for half price in Vancouver?

What’s the modern appeal of live music? Is there a more interesting way that musicians can perform live than playing rehearsed songs?

Why do people get temporary disabilities (diseases) but not temporary superpowers?

Quote: Knuth on Dijkstra

2018-06-20T00:00:00+00:00

One of the pleasures I’ve had over the years is to play four-hands piano music with Edsger. … When we’re playing a Haydn waltz the thing I had to get used to was that Edsger doesn’t count one-two-three, one-two-three it’s always zero-one-two, zero-one-two.

-Don Knuth, 2000

Link: Generated poetry

2018-05-28T00:00:00+00:00

https://www.reddit.com/user/haikubot-1911/comments/?sort=top

Procedure: Search for groups of sentences that form haikus. Syllable counting can be done with pyphen. Format the haiku-valid text into three lines and do capitalization. Post the poems and measure quality with votes.
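A rough sketch of the formatting step follows. To stay dependency-free it swaps pyphen for a crude vowel-group heuristic, which miscounts words with a silent "e" and many others, so treat `count_syllables` as a placeholder for a pyphen-based counter. Both function names are hypothetical, and the posting and voting steps are not shown.

```python
import re

def count_syllables(word):
    """Crude stand-in for a pyphen-based count: one syllable per vowel group."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def to_haiku(text, pattern=(5, 7, 5)):
    """Return the text as three capitalized lines, or None if it does not fit."""
    words = re.findall(r"[A-Za-z']+", text)
    lines, current, syllables = [], [], 0
    for word in words:
        if len(lines) == len(pattern):
            return None  # words left over after the last line
        current.append(word)
        syllables += count_syllables(word)
        if syllables > pattern[len(lines)]:
            return None  # a word straddles a line break
        if syllables == pattern[len(lines)]:
            line = " ".join(current)
            lines.append(line[0].upper() + line[1:])
            current, syllables = [], 0
    if len(lines) != len(pattern) or current:
        return None
    return "\n".join(lines)

print(to_haiku("An old silent pond. A frog jumps into the pond. Splash and all is still."))
```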

Three excellent lecture videos

2018-05-25T00:00:00+00:00

1. Aakar Patel - English and its influence on our national priorities (2016). This examines the peculiar phenomenon in India where the popular media, due to the distribution of languages spoken and very high advertising revenues, ends up being skewed towards elites and their issues, systematically biasing the propagation of news information.

2. David Starkey - When I hear the word ‘art’, I reach for my gun (2017). Starkey gives a sweeping account of the history of art, in a broad sense, and illuminates the phenomena of Dadaism and modern art.

3. Edsger Dijkstra @ Joint International Seminar on the Teaching of Computing Science (1992). Dijkstra presents an algorithmic problem and walks through a solution based on ideas in A Discipline of Programming, where formal semantics are used to guide the search for an algorithm.

Quote: Maud Ray Kent

2018-04-07T00:00:00+00:00

Paraphrasing from Macrae’s biography of J. von Neumann:

In 1940 the Germans captured Denmark plus Niels Bohr. A cable arrived in Britain from Otto Frisch’s aunt in Sweden saying, “Met Niels and [wife] Margarethe recently. Both well but unhappy about events. Please inform Cockcroft [British nuclear scientist] and Maud Ray Kent”.

Who or what was “Maud Ray Kent”? The British decided it was an anagram for “radyum taken” and thus gave warning that the Germans were moving fast to develop an A-bomb. Perhaps that was the reason the Nazis had occupied Bohr’s Copenhagen, and were capturing Norway and its heavy water too? Someone suggested that Maud might actually stand for “Military Application: Uranium Disintegration”. The British nuclear program commenced under the name “MAUD Committee”, as a tribute to the brilliant anagram.

Down in her home in Kent, Miss Maud Ray, the English governess to Bohr’s children, remained uncontacted because nobody had heard of her.

Classic probability puzzles and their solutions

2018-01-17T00:00:00+00:00

Envelope paradox

Problem: You are given two blank envelopes which each contain money. One envelope contains twice as much as the other. You may choose an envelope and keep the money it contains. After choosing, you have the option to switch for the other envelope. Should you switch?

Pitfall: Clearly there is no reason to switch (or not to switch) since the envelopes are blank and at no point do you learn anything new about their contents. However, the following argument seems to show you actually should switch. Let $X$ be the amount of money in the envelope chosen originally. The other envelope contains an amount of either $2X$ or $X/2$, each with probability $1/2$. Thus the expected value of the other envelope is

\[\frac{1}{2} (2X) + \frac{1}{2} (X/2) = \frac{5}{4} X > X.\]

…so you should switch?

Solution: Whenever things get tricky, it’s best to be as formal and methodical as possible. What do we actually mean by $X$? We’re taking $X$ to be the amount of money in the envelope we choose originally. That means $X$ is a random variable whose value depends on the random choice of the original envelope. It’s true that $X$ is equally likely to be either the smaller or larger amount. Let these be $x$ and $2x$. Then we need to find the expected value of the other envelope. Let the value of the other envelope be $X'$. Implicitly we are finding the expected value of $X'$ by conditioning on the value of $X$. There are two possibilities, and in each case we know what we get:

\[\begin{align*} \mathbb{E}(X') &= \mathbb{P}(X' > X)\mathbb{E}(X' | X' > X) + \mathbb{P}(X' < X)\mathbb{E}(X' | X' < X) \\ &= \frac{1}{2}\mathbb{E}(X' | X' > X) + \frac{1}{2}\mathbb{E}(X' | X' < X) \end{align*}\]

Note that depending on whether $X' > X$ or $X' < X$ the value of $X$ is different. If $X' > X$, then $X = x$ and $X' = 2X = 2x$, and if $X' < X$, then $X = 2x$ and $X' = X/2 = x$. So,

\[\begin{align*} \frac{1}{2}\mathbb{E}(X' | X' > X) + \frac{1}{2}\mathbb{E}(X' | X' < X) &= \frac{1}{2}\mathbb{E}(X' | X'=2X) \\ &\qquad {} + \frac{1}{2}\mathbb{E}(X' | X'=X/2) \\ &= \frac{1}{2}2x + \frac{1}{2}x \\ &= \frac{3}{2}x \\ &= \mathbb{E}(X). \end{align*}\]

Comparing with the pitfall solution, the key difference is that we cannot use the same symbol $X$ in two cases if our assumption about the value of $X$ is different in each case. Tricky!
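The formal argument can be mirrored by enumerating the two equally likely worlds explicitly. In this sketch the smaller amount `x` is fixed, and the function name `expected_values` is made up for illustration.

```python
from fractions import Fraction

def expected_values(x):
    """Expected amount if you keep vs. switch, given smaller amount x."""
    half = Fraction(1, 2)
    # Two equally likely worlds: (chosen, other) is (x, 2x) or (2x, x).
    worlds = [(x, 2 * x), (2 * x, x)]
    e_keep = sum(half * chosen for chosen, _ in worlds)
    e_switch = sum(half * other for _, other in worlds)
    return e_keep, e_switch

print(expected_values(10))
```

Both expectations equal $\frac{3}{2}x$, because the chosen amount differs between the two worlds and is never held fixed while the other envelope varies.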

Monty Hall problem

Problem: In the TV game show Let’s Make a Deal, you get to win a prize by opening one of three doors. Behind one door is a car and behind the others are goats. You pick a door, say #1, and without opening #1 the host Monty Hall intentionally shows you that a goat is behind another door, say #3, and gives you the chance to change to #2. Should you change doors?

Pitfall: You have the choice between two doors, one with a car, the other with a goat. Reasoning by symmetry, the probability of each configuration is $1/2$, so there is no point switching.

Solution: It’s true that we know there are two possible configurations at this point, but that’s not all we know. We also know that #2 was not a door chosen by Monty as a door with a goat. This gives us an extra clue that #2 might have the car.

Again, the best way to deal with tricky problems is being as rigorous as possible. Let’s explicitly set up a probability model as follows. Let $C \in \{1,2,3\}$ be the random variable for the door with the car. Let $S_3$ be the event that Monty chooses #3 to show a goat. Then we can write down the following conditional probabilities.

\[\begin{align*} \mathbb{P}(S_3|C=1)&=\frac{1}{2} \\ \mathbb{P}(S_3|C=2)&=1 \\ \mathbb{P}(S_3|C=3)&=0. \end{align*}\]

These are all we need to compute $\mathbb{P}(C=2|S_3)$, which is the probability of winning if we switch. Using Bayes’s rule, we have

\[\begin{align*} \mathbb{P}(C=2|S_3) &= \frac{\mathbb{P}(S_3|C=2)\mathbb{P}(C=2)}{\mathbb{P}(S_3)} \\ &=\frac{\mathbb{P}(S_3|C=2)\mathbb{P}(C=2)}{ \sum_{i=1}^3 \mathbb{P}(S_3|C=i)\mathbb{P}(C=i)} \\ &=\frac{\mathbb{P}(S_3|C=2)}{ \mathbb{P}(S_3|C=1)+\mathbb{P}(S_3|C=2)+\mathbb{P}(S_3|C=3)} \\ &=\frac{1}{\frac12+1+0} =\frac23 > \frac{1}{2}. \end{align*}\]
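The same posterior can be obtained by enumerating the joint distribution with exact fractions. The sketch assumes, as in the model above, that the car is placed uniformly at random and that Monty picks uniformly among the goat doors other than your pick; the function name is hypothetical.

```python
from fractions import Fraction

def posterior_car_given_open(pick=1, opened=3):
    """Posterior over the car's door given your pick and the door Monty opened."""
    joint = {}
    for car in (1, 2, 3):
        # Monty may only open a door that is neither your pick nor the car.
        options = [d for d in (1, 2, 3) if d != pick and d != car]
        if opened in options:
            joint[car] = Fraction(1, 3) * Fraction(1, len(options))
    total = sum(joint.values())
    return {car: pr / total for car, pr in joint.items()}

print(posterior_car_given_open())
```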

Feminist bank teller question

Problem: Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

Linda is a bank teller.
Linda is a bank teller and is active in the feminist movement.

Pitfall: The description makes it very plausible that Linda is active in the feminist movement, therefore #2 is more likely than #1.

Solution: This is an example of the conjunction fallacy. Let $B$ be the event Linda is a bank teller, and let $F$ be the event that Linda is active in the feminist movement. Then we can immediately say $\mathbb{P}(B) \geq \mathbb{P}(F \cap B)$ since $F \cap B \subseteq B$. And so #2 cannot be more probable than #1.

In general, if more details are added, we cannot become more confident in a claim. This goes against a bias we have to consider situations more plausible if they are specific and vivid. For more on cognitive biases and heuristics, see Thinking, Fast and Slow, but also see the mistakes in that book caused by cognitive biases and heuristics.

Base rate neglect

Problem: A certain disease affects 1 in 1000 people. A medical diagnostic test for the disease has 95% accuracy, i.e. 95% of the time it gives the correct diagnosis (whether you’re diseased or not). Suppose you take the test and it reads positive, what is the probability that you have the disease?

Pitfall: Since the diagnostic has 95% accuracy, then in my case I can conclude that the probability I have the disease is 95%.

Solution: Let $D$ be the event I have the disease, and let $P$ be the event I receive a positive diagnosis. We seek $\mathbb{P}(D|P)$. Bayes’s rule gives

\[\begin{align*} \mathbb{P}(D|P) &= \frac{\mathbb{P}(P|D)\mathbb{P}(D)}{\mathbb{P}(P)} \\ &= \frac{\mathbb{P}(P|D)\mathbb{P}(D)}{\mathbb{P}(P|D)\mathbb{P}(D) + \mathbb{P}(P|\neg D)\mathbb{P}(\neg D)} \\ &= \frac{(0.95)(0.001)}{(0.95)(0.001) + (0.05)(0.999)} \\ &= 0.019 = 1.9\%. \end{align*}\]

So the probability I have the disease is actually very small, even though I got a positive test. This is another case of not using all the information we have. The 95% accuracy needs to be combined with the very low probability that anyone has the disease. Problems of this type have been given to doctors and medical students who often fail to find the solution.
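The calculation is short enough to wrap in a function and rerun with other prevalences. The sketch reads "95% accuracy" as both a 95% true positive rate and a 95% true negative rate, matching the problem statement; the parameter names are introduced here.

```python
def posterior_disease(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes's rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.001, 0.01, 0.1):
    print(prevalence, round(posterior_disease(prevalence, 0.95, 0.95), 3))
```

Raising the prevalence raises the posterior quickly, which shows how much of the answer is driven by the base rate and how little by the test's accuracy alone.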

Birthday paradox

Problem: In a group of 23 people, what is the probability that at least 2 of them have the same birthday?

Pitfall: The probability must be small since there are only 23 people and 365 possible birthdays.

Solution: Assume birthdays are independent and uniform over 365 days. Let $N$ be the event no 2 people have the same birthday. The total number of assignments from people to birthdays is $365^{23}$. The total number of assignments from people to birthdays with no repeats is $365 \cdot 364 \cdot \,\cdots\, \cdot 343$ since there are 365 possibilities for the first person’s birthday, which leaves 364 possibilities for the next, and so on. So

\[\mathbb{P}(N) = \frac{365 \cdot 364 \cdot\, \cdots\, \cdot 343}{365^{23}} = 0.492703.\]

Then the probability that at least 2 people share a birthday is $1-0.492703=0.507297$. So the probability isn’t small; in fact it’s greater than $1/2$! It’s true that any given pair of people are unlikely to share a birthday, but as the size of the group grows, the probability that no pair shares a birthday becomes small. There are $\binom{23}{2} = 253$ different pairs of people in a group of 23.
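The product is simple to evaluate numerically, which also shows that 23 is the smallest group size where a shared birthday is more likely than not. This sketch assumes 365 equally likely birthdays; the function name is made up.

```python
def prob_no_shared_birthday(people, days=365):
    """Probability that all birthdays are distinct, assuming uniform birthdays."""
    p = 1.0
    for i in range(people):
        p *= (days - i) / days
    return p

for people in (10, 22, 23, 30, 50):
    print(people, round(1 - prob_no_shared_birthday(people), 6))
```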

What's the probability of the Riemann Hypothesis?

2017-10-10T00:00:00+00:00

Usually when we talk about probabilities, we have certain given information, which takes the form of a $\sigma$-algebra of possible events, and there is also a probability function that assigns values to each event. The rationality of a probability function is judged based on the relationships between events. For example if $A \subseteq B$ then we must have $P(A) \leq P(B)$. But as long as these relationships are satisfied (giving a proper probability measure), the probabilities could be anything. As such, we do not judge subjective probabilities based on whether they’re actually accurate or not, just whether they are consistent with each other.

Now, imagine if information isn’t the limiting factor in our uncertainty, but rather it’s our lack of mathematical knowledge. A statement like the Riemann Hypothesis (RH) is unknown even though it is entirely determined by the axiom system we use, leaving aside issues of completeness. Here there’s no given $\sigma$-algebra and in fact the relationships between RH and other statements may themselves be difficult to determine.

A more realistic view is that we have limited computational resources, we want to solve an intractable problem, and we’ll settle for the best approximation we can get. Thus a probability function is seen as a kind of approximation algorithm. With this algorithmic language, however, we aren’t able to give a very good answer for single propositions like RH. If RH is the entire set of inputs, the optimal approximation is the exact truth value, because it takes a trivial amount of computational resources to output the constant 1 or 0. If the set of inputs is infinite, then the particular input corresponding to RH makes no difference in an asymptotic analysis. For more elaboration on this theme, see this paper.

In traditional Bayesianism there is a seemingly ineradicable source of subjectivity from the choice of prefix Turing machine used to define Solomonoff’s prior. Any one input can be assigned a wide range of probabilities. Perhaps we are left with an analogous but different kind of subjectivity for mathematical probabilities.

(Above: Andrew Critch thinking about this in Berkeley.)

The Fundamental Theorem of Asset Pricing is Bayesianism

2017-08-17T00:00:00+00:00

If uncertainties encode bet preferences as represented by probabilities, Bayesianism is a collection of Dutch book arguments proving that probabilities must be consistent with each other (defining a probability measure) to be rational. Weisberg has an excellent paper that explains the details. On the other hand, the Fundamental Theorem of Asset Pricing proves that for prices to be arbitrage-free, they must be conditional expectations. Details on the relevant results are found in “The Mathematics of Arbitrage”, by Delbaen and Schachermayer. Having a consistent probability function has been shown to be equivalent to minimizing a proper scoring rule. And conditional expectations have been shown to minimize Bregman divergences. Et cetera. The correspondence between these theories is alluded to by Nau.

Data from Canadian 700MHz and 2500MHz spectrum auctions

2017-06-28T00:00:00+00:00

The 700MHz (2014) and 2500MHz (2015) spectrum auctions generated revenues of 5,270,636,002 CAD from 302 licenses and 755,371,001 CAD from 97 licenses. Both auctions used a combinatorial clock auction (CCA) format involving an ascending clock phase followed by a sealed-bid supplementary stage where bids could be made on packages of products. Final prices were determined using Vickrey pricing with a core-adjustment. An activity rule was used which required bidders to make bids or lose eligibility to bid in later clock rounds, along with a revealed preference rule which allows the eligibility limit to be exceeded as long as consistency checks are satisfied. For full details on the auction formats see the official documentation (700MHz rules, 700MHz additional details, 2500MHz rules); and the record of bids placed is here for 700MHz and here for 2500MHz.

Bid consistency

The revealed preference rule prevents some inconsistent behavior but not all. By “truthful”, we mean bids that are true indications of subjective value, and by “consistent” we mean bids that are indications of some fixed set of valuations, possibly not the bidder’s actual valuations.

The following table gives the values of Afriat’s critical cost efficiency index (CCEI) for the 700MHz auction. Recall that for a CCEI value $x$, if $x < 1$ there is at least some intransitivity in preferences (i.e. inconsistent bidding) and $1-x$ can be interpreted as the fraction of expenditure wasted making inefficient choices (see this by S. Kariv for more).

Bidder	CCEI (clock rounds)	CCEI (clock and supp. rounds)
Bell	0.930	0.417
Bragg	0.880	0.420
Feenix	1	1
MTS	0.996	0.627
Novus	1	1
Rogers	0.998	0.742
SaskTel	1	1
TBayTel	1	1
Telus	0.970	0.488
Videotron	0.879	0.560
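
For readers who want to see how an index like this is computed, here is a rough Python sketch of the CCEI in the textbook consumer-choice setting: each observation is a price vector and a chosen bundle, and we bisect for the largest efficiency level at which the data pass GARP. The function names and the toy data are assumptions for illustration; this is not the code used to produce the table above, and it does not reproduce the extension to supplementary bids.

```python
def satisfies_garp(prices, bundles, e):
    """Check GARP after scaling each observed expenditure by e in [0, 1]."""
    n = len(bundles)
    # cost[t][s] = cost of bundle s at the prices of observation t
    cost = [[sum(p * x for p, x in zip(prices[t], bundles[s]))
             for s in range(n)] for t in range(n)]
    # direct revealed preference at efficiency level e
    R = [[e * cost[t][t] >= cost[t][s] for s in range(n)] for t in range(n)]
    # transitive closure (Warshall's algorithm)
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = True
    # violation: t revealed preferred to s, yet s strictly preferred to t
    for t in range(n):
        for s in range(n):
            if R[t][s] and e * cost[s][s] > cost[s][t]:
                return False
    return True

def ccei(prices, bundles, tol=1e-9):
    """Largest e (up to tol) at which the data satisfy GARP."""
    if satisfies_garp(prices, bundles, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies_garp(prices, bundles, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Toy data: each chosen bundle costs 8, while the other bundle cost only 4.
prices = [(1, 2), (2, 1)]
bundles = [(0, 4), (4, 0)]
print(ccei(prices, bundles))
```

In the toy data each choice passes up a bundle that cost half as much, so half of the expenditure has to be written off before the choices look consistent.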

Kroemer et al. conclude for the 700MHz auction, “the numbers suggest that bidders deviated substantially from straightforward bidding” in the clock rounds. But “it is not unreasonable to believe that bidders tried to bid up to their true valuation in the supplementary stage” because of higher bid amounts compared to the clock rounds.

The next table gives CCEI values for the 2500MHz auction. We extend the definition of CCEI to apply to supplementary bids as in Kroemer’s paper.

Bidder	CCEI (clock rounds)	CCEI (clock and supp. rounds)
Bell	0.913	0.712
Bragg	0.920	0.530
Corridor	1	1
MTS	1	1
Rogers	1	1
SSi Micro	1	1
TBayTel	1	1
Telus	0.997	0.996
Videotron	1	1
WIND	1	1
Xplornet	1	0.578

Kroemer et al. (Sec. 5.2) also point out that the total number of bids submitted in the 700MHz auction was much smaller than the number of possible bids, which probably indicates untruthful bidding since an omitted package must have valuation less than or equal to its (low) opening price. The same observation holds for the 2500MHz auction. More exactly, the auction formats enforced a limit on the number of packages bidders were allowed to submit, which was in the hundreds, and bidders generally did not reach the limit.

Ideally, we would determine whether the bids made are consistent with a non-truthful strategy incorporating gaming and/or coordination. The papers Janssen and Karamychev - “Raising Rivals’ Cost in Multi-unit Auctions” and Janssen and Kasberger - “On the Clock of the Combinatorial Auction” derive Bayesian Nash equilibria under gaming preferences and conclude that GARP is not violated in equilibrium gaming strategies. We note that the assumptions in these game models do not include all features of the CCAs under consideration e.g. discrete products, many bidders, public aggregate excess demand, revealed preference rule, initial EP limit, supplementary package limit, 50% mid-auction deposits.

Bids, budgets, and final prices

Bidders may have a notion of a budget – the maximum they are willing to spend. But how should this correspond to the maximum they should bid? In general, bidders may end up paying the exact amount of their highest bid, but looking at the data we see bid prices and final prices can be very different in practice. The following tables show figures from both auctions that illustrate this difference. All prices are given in CAD.

700MHz auction: highest bid placed. Average ratio: 0.192. Max ratio: 0.766.

Bidder	Max bid ($M$)	Allocation stage final price ($p$)	Ratio ($p/M$)	Final clock bid
Bell	3,999,999,000	565,705,517	0.141	1,366,867,000
Bragg	141,894,000	20,298,000	0.143	38,814,000
Feenix	60,650,000	284,000	0.005	346,000
MTS	73,067,000	8,772,072	0.120	10,853,000
Novus	112,359,000	0	0	0
Rogers	4,299,949,000	3,291,738,000	0.766	3,931,268,000
SaskTel	75,000,000	7,556,929	0.101	11,927,000
TBayTel	7,683,000	0	0	0
Telus	3,750,000,000	1,142,953,484	0.305	1,313,035,000
Videotron	677,524,000	233,328,000	0.344	468,530,000
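
As a quick check on the summary figures, the ratios in the table above can be recomputed from the two price columns. This is a small Python sketch; the dictionary simply copies the max bid and final price columns, and the variable names are my own.

```python
# (max bid M, allocation stage final price p) in CAD, copied from the table
bids_700mhz = {
    'Bell':      (3_999_999_000,   565_705_517),
    'Bragg':     (  141_894_000,    20_298_000),
    'Feenix':    (   60_650_000,       284_000),
    'MTS':       (   73_067_000,     8_772_072),
    'Novus':     (  112_359_000,             0),
    'Rogers':    (4_299_949_000, 3_291_738_000),
    'SaskTel':   (   75_000_000,     7_556_929),
    'TBayTel':   (    7_683_000,             0),
    'Telus':     (3_750_000_000, 1_142_953_484),
    'Videotron': (  677_524_000,   233_328_000),
}

# ratio p / M for each bidder, equally weighted in the average
ratios = {bidder: p / m for bidder, (m, p) in bids_700mhz.items()}
average_ratio = sum(ratios.values()) / len(ratios)
max_ratio = max(ratios.values())

print(round(average_ratio, 3), round(max_ratio, 3))  # 0.192 0.766
```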

700MHz auction: (highest) bid placed on package eventually won. Average ratio: 0.447. Max ratio: 0.766.

Bidder	Max bid on won package ($W$)	Allocation stage final price ($p$)	Ratio ($p/W$)	Allocation stage Vickrey price
Bell	2,583,868,000	565,705,517	0.219	565,705,000
Bragg	51,000,000	20,298,000	0.398	20,298,000
Feenix	425,000	284,000	0.668	284,000
MTS	40,000,000	8,772,072	0.219	3,198,000
Novus	N/A	0	N/A	0
Rogers	4,299,949,000	3,291,738,000	0.766	3,291,738,000
SaskTel	62,400,000	7,556,929	0.121	2,755,000
TBayTel	N/A	0	N/A	0
Telus	1,607,300,000	1,142,953,484	0.711	1,142,953,000
Videotron	490,000,000	233,328,000	0.476	233,328,000

2500MHz auction: highest bid placed. Average ratio: 0.135. Max ratio: 0.277.

Bidder	Max bid ($M$)	Allocation stage final price ($p$)	Ratio ($p/M$)	Final clock bid
Bell	542,746,000	28,730,000	0.053	76,214,000
Bragg	35,935,000	4,821,021	0.134	12,091,000
Corridor	9,300,000	2,299,000	0.247	N/A
MTS	13,609,000	2,242,000	0.165	2,609,000
Rogers	304,109,000	24,049,546	0.079	52,343,000
SSi Micro	851,000	0	0	0
TBayTel	12,001,000	1,731,000	0.144	1,731,000
Telus	1,771,723,000	478,819,000	0.270	1,038,472,000
Videotron	749,128,000	66,552,980	0.089	231,851,000
WIND	22,609,000	0	0	0
Xplornet	91,974,000	25,472,454	0.277	57,839,000

2500MHz auction: (highest) bid placed on package eventually won. Average ratio: 0.235. Max ratio: 0.410.

Bidder	Max bid on won package ($W$)	Allocation stage final price ($p$)	Ratio ($p/W$)	Allocation stage Vickrey price
Bell	536,563,000	28,730,000	0.054	28,730,000
Bragg	19,000,000	4,821,021	0.254	3,536,000
Corridor	6,440,000	2,299,000	0.357	2,299,000
MTS	11,000,000	2,242,000	0.204	2,242,000
Rogers	OR	24,049,546	N/A	21,252,000
SSi Micro	N/A	0	N/A	0
TBayTel	12,001,000	1,731,000	0.144	1,731,000
Telus	1,771,723,000	478,819,000	0.270	478,819,000
Videotron	358,477,000	66,552,980	0.186	61,092,000
WIND	N/A	0	N/A	0
Xplornet	62,200,000	25,472,454	0.410	22,917,000

Across both auctions, we see that bidders paid an average of 16% of their maximum bid placed, where each bidder is equally weighted.

Misc. notes

Researchers design approximation algorithms for winner and price determination in auctions (in e.g. this paper) because exact optimization can be intractable as the number of bids grows, at least in the worst case. However, in these recent auction instances, theoretical intractability did not present a problem because the solution was computable in a small amount of time. The 2500MHz allocation stage involved 2,239 bids and a GLPK-powered solver finds the winners and final prices in a couple minutes on a standard computer. Simulations involving well over 30,000 random bids still take a feasible amount of time.
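
To illustrate why the worst case is hard, here is a toy Python sketch of exact winner determination by exhaustive search. It assumes each bidder wins at most one of their submitted packages and that the goal is to maximize the total bid value subject to supply. The function name and data layout are assumptions for illustration; this is not the GLPK formulation mentioned above, and it ignores pricing entirely.

```python
from itertools import product

def winner_determination(bids, supply):
    """bids: {bidder: [(package, price), ...]} with package = {product: qty}.
    supply: {product: qty}. Returns (best total value, {bidder: winning bid})."""
    bidders = list(bids)
    # every bidder either wins nothing (None) or exactly one of their bids
    options = [[None] + list(bids[b]) for b in bidders]
    best_value = 0
    best_choice = {b: None for b in bidders}
    for choice in product(*options):
        used = {}
        for pick in choice:
            if pick is not None:
                package, _price = pick
                for item, qty in package.items():
                    used[item] = used.get(item, 0) + qty
        # skip combinations that allocate more than the available supply
        if any(qty > supply.get(item, 0) for item, qty in used.items()):
            continue
        value = sum(pick[1] for pick in choice if pick is not None)
        if value > best_value:
            best_value, best_choice = value, dict(zip(bidders, choice))
    return best_value, best_choice

supply = {'A': 1, 'B': 1}
bids = {
    'X': [({'A': 1}, 5), ({'A': 1, 'B': 1}, 8)],
    'Y': [({'B': 1}, 4)],
}
value, winners = winner_determination(bids, supply)
print(value, winners)
```

The loop visits every combination of bids, so its running time is the product of the bid counts per bidder; integer programming solvers avoid most of that search on instances like the real ones.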

In the 2500MHz auction, there were 4 pairs of package submissions where the lower-priced package had strictly higher quantities of products. In this case the lower-priced package is superfluous. This table shows the packages, submitted by Bell.

	Price of larger package (CAD)	Price of smaller package (CAD)	Number of products difference
1	535,917,000	536,214,000	3
2	536,628,000	536,645,000	3
3	536,401,000	536,545,000	3
4	536,434,000	536,563,000	3
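
A check like this is easy to automate. The Python sketch below flags any bid whose package is at least as large as another bid's in every product, strictly larger in at least one, and yet priced lower. The function name and the toy bids are hypothetical, and representing a package as a tuple of quantities is an assumption about the data layout.

```python
def superfluous_bids(bids):
    """bids: list of (quantities, price) from one bidder, where quantities
    is a tuple with one entry per product. Returns the indices of bids that
    are larger than some other bid but priced lower."""
    flagged = set()
    for i, (qty_i, price_i) in enumerate(bids):
        for j, (qty_j, price_j) in enumerate(bids):
            covers = all(a >= b for a, b in zip(qty_i, qty_j))
            strictly_larger = any(a > b for a, b in zip(qty_i, qty_j))
            if covers and strictly_larger and price_i < price_j:
                flagged.add(i)
    return sorted(flagged)

# Toy example: bid 0 contains everything in bid 1 but is priced lower.
toy_bids = [((2, 1), 100), ((1, 1), 120), ((0, 1), 50)]
print(superfluous_bids(toy_bids))  # [0]
```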

Acknowledgement: Thanks to Z. Gao for pointers.

Link of the day: Journal of Craptology

2017-06-21T00:00:00+00:00

http://www.anagram.com/jcrap/

The prestigious Journal of Craptology is an open-access publication in the area of cryptology.

U of Toronto study space info

2017-06-09T00:00:00+00:00

(a partial list)

(Above: U of T cannons pointed at Ryerson)

Libraries

Robarts Library

noise	low
power plugs	yes
network access	good
ergonomics	poor
busyness	busy
typical hours	8:30-11

Richard Charles Lee Canada-Hong Kong Library

noise	low
power plugs	yes
network access	good
ergonomics	poor
busyness	modest
typical hours	10-7

Gerstein Science Information Centre

noise	low
power plugs	some
network access	poor
ergonomics	poor
busyness	busy
typical hours	8:30-11

Sandford Fleming Library

noise	medium
power plugs	yes
network access	poor
ergonomics	ok
busyness	busy
typical hours	9-6

Mathematics Library

noise	low
power plugs	some
network access	ok
ergonomics	poor
busyness	busy
typical hours	9-5

EJ Pratt Library

noise	low
power plugs	some
network access	?
ergonomics	poor
busyness	busy
typical hours	8:30-11:45

Architecture Library (Eberhard Zeidler Library)

noise	low
power plugs	yes
network access	?
ergonomics	poor
busyness	busy
typical hours	9-9

Kelly Library

noise	?
power plugs	yes
network access	good
ergonomics	poor
busyness	?
typical hours	8:30-11:30

The Inforum (Faculty of Information)

noise	?
power plugs	yes
network access	good
ergonomics	poor
busyness	?
typical hours	9:30-9:30

East Asian Library

noise	?
power plugs	no
network access	?
ergonomics	poor
busyness	?
typical hours	9-7:30

Centre for Reformation and Renaissance Studies

noise	?
power plugs	no
network access	?
ergonomics	poor
busyness	?
typical hours	9-5

Local Toronto public library branches

Toronto Reference Library

noise	low
power plugs	some
network access	yes
ergonomics	poor
busyness	medium
typical hours	9-8:30

Lillian H. Smith public library

noise	low
power plugs	some
network access	good
ergonomics	poor
busyness	medium
typical hours	9-8:30

Other areas with seating

Tables in Bahen Centre

noise	moderate
power plugs	yes
network access	ok
ergonomics	poor
busyness	busy
typical hours	?

Markov of Chain: Automating Weird Sun tweets

2017-05-27T00:00:00+00:00

Let’s use Python to train a Markov chain generator using all the tweets from a certain list of users, say this one. We’ll use the following libraries.

from functional import seq
import markovify
import re
import tweepy
import unidecode

To use the Twitter API, we need to authenticate ourselves. Register for your personal keys at https://apps.twitter.com/ and then create a config.json file that looks like this

{
  "consumer_key":    "...",
  "consumer_secret": "...",
  "access_key":      "...",
  "access_secret":   "..."
}

Now we can initialize the Twitter API provided by tweepy.

config = seq.json('config.json').dict()
auth = tweepy.OAuthHandler(
    config['consumer_key'], config['consumer_secret'])
auth.set_access_token(config['access_key'], config['access_secret'])
api = tweepy.API(auth)

First we write the following function (based on this gist) which returns the most recent tweets of a given user. The API limits us to roughly the 3,200 most recent tweets per user.

def get_user_tweets(screen_name):
    alltweets = []

    #  200 is the maximum allowed count
    # 'extended' means return full unabridged tweet contents
    new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                                  tweet_mode='extended')

    # keep grabbing tweets until there are no tweets left to grab;
    #   an account with no tweets simply skips the loop
    while len(new_tweets) > 0:
        alltweets.extend(new_tweets)
        print("...{} tweets downloaded so far".format(len(alltweets)))

        # save the id of the oldest tweet less one
        oldest_id = alltweets[-1].id - 1

        # since we're grabbing 200 at a time, we use `max_id` to
        #   ask only for tweets older than the ones we already have
        new_tweets = api.user_timeline(
                screen_name=screen_name, count=200,
                tweet_mode='extended', max_id=oldest_id)

    # put each tweet on a single line
    tweet_texts = [re.sub(r'\s*\n+\s*', ' ', tweet.full_text)
                   for tweet in alltweets]

    return tweet_texts

The other interaction with Twitter we need to perform is to get all users in a list. We’ll write a function that fetches the usernames and calls get_user_tweets on each:

def get_list_tweets(screen_name, list_name):
    '''
    params: `screen_name` is the username of the owner of the list,
    `list_name` is the name of the list found in the URL
    '''

    # get list of all users in list
    user_names = []
    for user in tweepy.Cursor(
            api.list_members,
            screen_name,
            list_name).items():
        user_names.append(user.screen_name)

    # for each user, get their tweets
    list_tweets = []
    for user_name in user_names:
        user_tweets = get_user_tweets(user_name)
        list_tweets += user_tweets
        print('Found {0} tweets from @{1}.'
            .format(len(user_tweets), user_name))
    return list_tweets

Let’s run get_list_tweets and save the output to a file.

tweets = get_list_tweets('Grognor', 'weird-sun-twitter')

with open('data/tweetdump.txt', 'w') as f:
    f.write('\n'.join(tweets))

With all of the raw data saved, we’re done with the Twitter API and we can process the data and auto-generate tweets offline. Assuming the file tweetdump.txt has a set of tweets, one per line, we load them as a list of strings tweets.

tweets = open('data/tweetdump.txt').readlines()

Some processing needs to be done in order to get high quality text from the tweets. The next function process_tweet is called on each one.

def process_tweet(tweet):
    # convert to ASCII
    tweet = unidecode.unidecode(tweet)
    # remove URLs
    tweet = re.sub(r'http\S+', '', tweet)
    # remove mentions
    tweet = re.sub(r'@\S+', '', tweet)

    tweet = tweet.strip()

    # append terminal punctuation if absent
    if len(tweet) > 0:
        last_char = tweet[-1]
        if last_char not in '.!?':
            tweet += '.'

    return tweet

processed_tweets = [ process_tweet(tweet) for tweet in tweets ]

And we remove any tweets that aren’t useful.

def is_excluded(tweet):
    ex = False
    # no RTs
    ex = ex or bool(re.match(r'^RT', tweet))
    # remove whitespace-only tweets
    ex = ex or bool(re.match(r'^\s*$', tweet))
    return ex

good_tweets = [ tweet for tweet in processed_tweets
               if not is_excluded(tweet) ]

We save the fully processed tweets for easy access later.

with open('data/processed_tweets.txt', 'w') as f:
    f.write('\n'.join(good_tweets))

The markovify library lets us train, and generate from, a Markov chain very easily. Just load the training text and set a state size.

text = open('data/processed_tweets.txt').read()

text_model = markovify.Text(text, state_size=3)

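for x in range(5):
    # make_short_sentence returns None if no sentence could be built
    sentence = text_model.make_short_sentence(140)
    if sentence is not None:
        print('* ' + sentence)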
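
For intuition about what the state size means, here is a bare-bones Markov chain text generator in plain Python. It is a sketch of the general technique only, not a description of how markovify is implemented, and the function names, the start/end markers, and the word limit are my own assumptions.

```python
import random
from collections import defaultdict

START, END = '<s>', '</s>'

def build_chain(sentences, state_size=2):
    """Map each tuple of `state_size` consecutive words to the list of
    words that followed it somewhere in the training sentences."""
    chain = defaultdict(list)
    for sentence in sentences:
        padded = [START] * state_size + sentence.split() + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            chain[state].append(padded[i + state_size])
    return chain

def generate(chain, state_size=2, max_words=50, rng=random):
    """Random walk from the start state until END or the word limit."""
    state = (START,) * state_size
    words = []
    while len(words) < max_words:
        next_word = rng.choice(chain[state])
        if next_word == END:
            break
        words.append(next_word)
        state = state[1:] + (next_word,)
    return ' '.join(words)

chain = build_chain(['the cat sat.', 'the dog sat.'], state_size=1)
print(generate(chain, state_size=1))
```

A larger state size makes the output more faithful to the source tweets but leaves fewer places where the walk can branch, so the generator needs more training text before it says anything new.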

Some favorites:

“It is no coincidence we call them gods because we suppose they are trying to convince Robin Hanson.”
“Tell anyone who does not produce Scott Alexander.”
“Weird sun is a costly signal of the ability to remember sources of information, not just the study of complex manifolds.”
“If you read The Hobbit backwards, it’s about a layer of radioactive ash that develops the capacity to become larger.”
“When you read a physical book, you get a dust speck in the eye.”
“We all continuously scream about how the people in it are breaking the awkward silence.”
“People are important, but so are lexicographic preferences.”
“You don’t need an expert Bayesian Epistemologist to ensure it’s not a markov chain.”

Building a shell with JavaScript

2017-05-20T00:00:00+00:00

ShellJS is a JS library that provides functions like cd() and ls() which you can use to write Node scripts instead of bash scripts. That’s great for scripts, but what about an interactive shell? Well, we could just run the Node repl and import ShellJS:

$ node
> require('shelljs/global');
{}
> pwd()
{ [String: '/tmp']
  stdout: '/tmp',
  stderr: null,
  code: 0,
  cat: [Function: bound ],
  exec: [Function: bound ],
  grep: [Function: bound ],
  head: [Function: bound ],
  sed: [Function: bound ],
  sort: [Function: bound ],
  tail: [Function: bound ],
  to: [Function: bound ],
  toEnd: [Function: bound ],
  uniq: [Function: bound ] }

Hmm, that’s a little verbose, and we might want to avoid manually importing ShellJS. We also might want more features than the Node repl offers, such as vi keybindings.

We can get vi keybindings with rlwrap, but then tab completion goes away. The solution is given in this SO answer. First we need to install an rlwrap filter that negotiates tab-completion with a Node repl. The filter file can be found at that link, where it’s called node_complete. Put node_complete in $RLWRAP_FILTERDIR, which should be the folder on your system containing the RlwrapFilter.pm Perl module. For me it’s /usr/share/rlwrap/filters.

Now rlwrap is ready to negotiate tab completion, but the Node repl isn’t. We’ll have to actually write our own Node repl, which is easy because the repl module gives us all the tools we need. We’ll create a file called, say, myrepl.js, the contents of which are also given in the SO answer, only 9 lines. This script starts a repl with a hook to negotiate tab completion with rlwrap. If myrepl.js is in ~/bin, now we can run

$ rlwrap -z node_complete -e '' -c ~/bin/myrepl.js

and have both JS tab completion and rlwrap features, such as vi keybindings if that’s what we’ve configured. Let’s create a file called mysh with the following contents:

#!/usr/bin/env bash
rlwrap -z node_complete -e '' -c ~/bin/myrepl.js

Assuming ~/bin is in our path variable, we can put mysh there and launch our shell anywhere by just running mysh. So far so good but we wanted to automatically import ShellJS. In myrepl.js, add the following:

var shell = require('shelljs');
Object.assign(myrepl.context, shell);

Those two lines add all the ShellJS functions to the JS global object inside the repl. We have:

$ mysh
> pwd()
{ [String: '/tmp']
  stdout: '/tmp',
  stderr: null,
  code: 0,
  cat: [Function: bound ],
  exec: [Function: bound ],
  grep: [Function: bound ],
  head: [Function: bound ],
  sed: [Function: bound ],
  sort: [Function: bound ],
  tail: [Function: bound ],
  to: [Function: bound ],
  toEnd: [Function: bound ],
  uniq: [Function: bound ] }

Progress. Now, how do we clean up this output? The repl module allows us to define a custom writer. This is a function which takes the output of a line of JS and returns a string to represent the output in the repl. What we need to do is intercept objects like the one returned by pwd() above and only show the stderr and stdout properties. Add the following near the beginning of myrepl.js:

var util = require('util');

var myWriter = function(output) {
  var isSS = (
      output &&
      output.hasOwnProperty('stdout') &&
      output.hasOwnProperty('stderr'));
  if (isSS) {
    var stderrPart = output.stderr || '';
    var stdoutPart = output.stdout || '';
    return stderrPart + stdoutPart;
  } else {
    return util.inspect(output, null, null, true);
  }
};

And load this writer by changing

var myrepl = require("repl").start({terminal:false});

to

var myrepl = require("repl").start({
  terminal: false,
  writer: myWriter});

Now we get

$ mysh
> pwd()
/tmp

Much better. However, since the echo function prints its argument to the console and returns an object with it in the stdout property, we get this:

$ mysh
> echo('hi')
hi
hi

I haven’t solved this issue quite yet although I’d be surprised if there isn’t a reasonable solution out there. You can add to mysh and myrepl.js to get more features, such as colors, custom evaluation, custom pretty printing, other pre-loaded libraries, et cetera. The sky is the limit. I added an inspect function which allows us to see the full ShellJS output of a command if we really want it. My complete myrepl.js file is:

#!/usr/bin/env node

var util = require('util');
var colors = require('colors/safe');

var inspect = function(obj) {
  if (obj && typeof obj === 'object') {
    obj['__inspect'] = true;
  }
  return obj;
};

var myWriter = function(output) {
  var isSS = (
      output &&
      output.hasOwnProperty('stdout') &&
      output.hasOwnProperty('stderr') &&
      !output.hasOwnProperty('__inspect'));
  if (isSS) {
    var stderrPart = output.stderr || '';
    var stdoutPart = output.stdout || '';
    return colors.cyan(stderrPart + stdoutPart);
  } else {
    if (output && typeof output === 'object') {
      delete output['__inspect'];
    }
    return util.inspect(output, null, null, true);
  }
};

// terminal:false disables readline (just like env NODE_NO_READLINE=1):
var myrepl = require("repl").start({
  terminal: false,
  prompt: colors.green('% '),
  ignoreUndefined: true,
  useColors: true,
  writer: myWriter});

var shell = require('shelljs');
Object.assign(myrepl.context, shell);
myrepl.context['inspect'] = inspect;

// add REPL command rlwrap_complete(prefix) that prints a simple list
//   of completions of prefix
myrepl.context['rlwrap_complete'] =  function(prefix) {
  myrepl.complete(prefix, function(err,data) {
    for (const x of data[0]) {console.log(x);}
  });
};

So this is basically what we wanted. We have a JS repl with convenient ShellJS commands. We also have vi keybindings, and tab completion for JS and filenames. It’s very rough around the edges, but it was really simple to make. GitHub user streamich built a more advanced form of this, called jssh which adds many features but lacks some too. The bottom line is, if you know JS, you might be surprised at what you can build.

Modeling aesthetics in mathematics

2016-07-05T00:00:00+00:00

What exactly is beautiful math?

[A]bove all, adepts [of mathematics] find therein delights analogous to those given by painting and music. They admire the delicate harmony of numbers and forms; they marvel when a new discovery opens to them an unexpected perspective; and has not the joy they thus feel the esthetic character, even though the senses take no part therein? Only a privileged few are called to enjoy it fully, it is true, but is not this the case for all the noblest arts?

-Henri Poincaré, The Value of Science

One expects a mathematical theorem or a mathematical theory not only to describe and to classify in a simple and elegant way numerous and a priori disparate special cases. One also expects “elegance” in its “architectural”, structural makeup. Ease in stating the problem, great difficulty in getting hold of it and in all attempts at approaching it, then again some very surprising twist by which the approach, or some part of the approach, becomes easy, etc. Also, if the deductions are lengthy or complicated, there should be some simple general principle involved, which “explains” the complications and detours, reduces the apparent arbitrariness to a few simple guiding motivations, etc. These criteria are clearly those of any creative art.

-John von Neumann, The Mathematician

The moral: a good proof is one that makes us wiser.

-Yuri Manin, A Course in Mathematical Logic for Mathematicians

My hypothesis is that generally when people talk about beauty in mathematics they’re talking about things that teach us something useful for proving new facts. For example, proving a difficult but simple theorem is useful because its difficulty means it may imply other previously difficult theorems, and its simplicity means it may show up and be used often. A theorem that establishes a connection between two previously disparate areas of mathematics is considered beautiful, and such a connection allows knowledge from one area to be applied to the other, potentially cracking new problems. An unexpected proof – “an unexpected perspective” or “surprising twist” – offers something new to be learned, something that can then be used for other problems.

Quote of the day: Yuri Gurevich

2016-04-29T00:00:00+00:00

I remember, in a geometry class, my teacher wanted to prove the congruence of two triangles. Let’s take a third triangle, she said, and I asked where do triangles come from. I worried that there may be no more triangles there. Those were hard times in Russia and we were accustomed to shortages. She looked at me for a while and then said: ‘Shut up’.

-Platonism, Constructivism, and Computer Proofs vs. Proofs by Hand

Link of the day: Learn the Greek alphabet

2016-04-03T00:00:00+00:00

A handy flashcards web app for memorizing all the Greek letters

Link of the day: Jon Skeet speaks

2016-03-15T00:00:00+00:00

Jon Skeet on the tricky edge cases that can show up with basic data types and how they model reality. Back to basics: the mess we’ve made of our fundamental data types

Command line Grooveshark post-Grooveshark

2015-08-28T00:00:00+00:00

In this post I’ll share a way to get a Grooveshark-like experience with a Linux command-line application.

Step 1: Build a YouTube playlist with some music you like.

Step 2: Go to your channel, select Playlists, find the one you just made, and click View full playlist. Make sure the privacy setting is either public or unlisted. Copy the playlist ID in the URL (after list=). For example, the playlist ID of one of my playlists is PLmaRvdyzIrIGkRbl7jJdzEYrRcXe4od9G.

Step 3: Install mpv and mps-youtube.

Step 4: Run

mpsyt pl <playlist ID>, dump, \*

You’re streaming the playlist! Press <space> to pause/play, < to play previous track, > to play next track.

To play the playlist shuffled, run this instead:

mpsyt pl <playlist ID>, dump, shuffle \*

I have this alias in my .zshrc file:

alias playlistName="mpsyt pl PLmaRvdyzIrIGkRbl7jJdzEYrRcXe4od9G, dump, shuffle \*"

so I can get music playing with just one command.

If, as in Grooveshark, your YouTube playlists are public, you can open mpsyt and run userpl <YouTube username> to see your YouTube playlists and select one to play.

The good thing is that the whole videos aren’t streamed, just high-quality audio.

Among other features, mpsyt allows you to search for YouTube videos and create local playlists (not connected to a YouTube account), which you can do if you want to avoid the YouTube web interface completely.

Status of the day

2015-04-21T00:00:00+00:00

Wojciech Szpankowski:

From Wojciech Szpankowski’s book, drawn by Philippe Jacquet:

Quote of the day

2015-04-19T00:00:00+00:00

The first intellectual operation in which I arrived at any proficiency, was dissecting a bad argument, and finding in what part the fallacy lay; and though whatever capacity of this sort I attained was due to the fact that it was an intellectual exercise in which I was most perseveringly drilled by my father, yet it is also true that the school logic, and the mental habits acquired in studying it, were among the principal instruments of this drilling. I am persuaded that nothing, in modern education, tends so much, when properly used, to form exact thinkers, who attach a precise meaning to words and propositions, and are not imposed on by vague, loose, or ambiguous terms. The boasted influence of mathematical studies is nothing to it; for in mathematical processes, none of the real difficulties of correct ratiocination occur.

-J.S. Mill, Autobiography

I believe “school logic” is a.k.a. scholastic logic and is something along the lines of “philosophical” logic and what Mill covered in his A System of Logic. Sometimes I found combinatorics problems to require careful thinking in order to avoid plausible-looking mistakes. At Art of Problem Solving, instructors suggest “counting in two ways”, i.e. using two different counting strategies and comparing the results.
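
As a small illustration of counting in two ways, the Python sketch below counts committees with a designated chair drawn from n people. One strategy picks the committee first and then its chair; the other picks the chair first and then any subset of the remaining people. The example problem and the function names are my own choice, not something from Mill or Art of Problem Solving.

```python
from math import comb

def committee_then_chair(n):
    # choose a committee of size k, then one of its k members as chair
    return sum(k * comb(n, k) for k in range(1, n + 1))

def chair_then_committee(n):
    # choose the chair, then any subset of the other n - 1 people
    return n * 2 ** (n - 1)

for n in range(1, 8):
    print(n, committee_then_chair(n), chair_then_committee(n))
```

If the two columns ever disagreed, at least one of the counting arguments would contain a mistake.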

Quote of the day

2014-10-14T00:00:00+00:00

Programming wisdom:

Someone wrote to me once suggesting that JSLint should give a warning when a case falls through into another case. He pointed out that this is a very common source of errors, and it is a difficult error to see in the code. I answered that that was all true, but that the benefit of compactness obtained by falling through more than compensated for the chance of error. The next day, he reported that there was an error in JSLint. … I investigated, and it turned out that I had a case that was falling through. … I no longer use intentional fall throughs.

-Douglas Crockford

Quote of the day

2014-08-18T00:00:00+00:00

This reminds me of a story from the time when Queen Mary and Westfield College, University of London decided to change its name to Queen Mary, University of London. My colleague Wilfrid Hodges was giving a lecture in Germany, and put up his first slide, giving his name and affiliation as “Wilfrid Hodges, Queen Mary, University of London”. Somebody asked, “Is that a joint publication?”

-Peter Cameron

Link of the day

2014-06-29T00:00:00+00:00

An interesting comparison of the cultures of the US and Germany:

http://math-www.uni-paderborn.de/~axel/us-d.html

When I travel internationally, I wish I could absorb this level of detail about a culture, but of course it’s impossible without living there for an extended period of time.

Some resources for learning Mandarin

2014-05-26T00:00:00+00:00

(This is an old list from when I was studying it in around 2009.)

This is enough to learn how to pronounce Chinese people’s names: https://www.youtube.com/watch?v=b9Ayvjy-Dgs
Rocket Chinese Grammar & Culture Series with MegaChinese vocabulary software
eChineseLearning newsletter
Serge Melnyk’s lessons
Rosetta Stone software
Conversion tools:
- Characters -> pinyin: http://translate.google.com
- Characters <-> english: http://translate.google.com
- Pinyin -> sound: http://www.quickmandarin.com/chinesepinyintable/
Desktop translation tools:
- LEC Translate Chinese
- Lingoes
CCTV Learn Chinese Videos
WatchtoLearnChinese.com Videos
http://resources.echineselearning.com/funstuff/funstuff-chinese-20.html Various resources
http://digchinese.com/en/measure-words A guide to measure words
http://www.nciku.com/conversation Conversation texts in chinese, pinyin and english
Wenlin Software for Learning Chinese
A list of language learning web apps: http://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http%3A%2F%2Fwww.cnbeta.com%2Farticles%2F84038.htm
Kids Chinese Podcast
John DeFrancis - Chinese Language: Fact and Fantasy

Quote of the day

2014-05-25T00:00:00+00:00

From Modern Computer Algebra: “We start by using [Newton iteration] to find a custom-Taylored division algorithm that is as fast as multiplication…”

LessWrong gems

2014-05-22T00:00:00+00:00

lukeprog ridicules Will_Newsome’s “post-rationality”

Will_Newsome defends XiXiDu

proposed cover design for book of the sequences

Eliezer Yudkowsky facts

Random LW-parodying Statement Generator

Cards Against Rationality

Bayes: The Exclusive Less Wrong Magazine

LessWrongers’ creed

The end is nigh

Signal to signaling ratio

Overcoming bias guy

Stop mass greenvoting!

(Above: Louie Helm.)

Status of the day

2014-05-21T00:00:00+00:00

AC in pop culture: Pitbull fans must exist but it’s impossible to find an explicit example.

Status of the day

2014-05-18T00:00:00+00:00

Linux tip: Install Helvetica and/or Helvetica Neue on your computer and some websites will look better

Quote of the day

2014-05-15T00:00:00+00:00

From a ratemyprofessors.com review of Mike Newman’s teaching: ‘Clearly LOVES math, makes tons of jokes; ie, someone asks how many questions on the midterm and he replies “well… it’s an integer number”.’

General thoughts on computer security

2014-01-07T00:00:00+00:00

The following is a document I wrote in mid 2008. I haven’t edited it, although I was tempted to since parts of it sound a bit strange now. Rather than improve on it, I’ll just clarify what it was attempting to say: The connections between devices and/or agents have a finite set of security properties which determine the options available to secure the assets involved. These properties can be simply classified, providing a way to systematically explore the space of security configurations. I felt that the ideas in the document were a good summary of some facts about computer security.

The culmination of my independent studies in computer security involved the development of a system of detailed original models and axioms to describe every aspect of the security state of digital systems by generalizing all the disparate case-by-case knowledge I had compiled. I had heard of no similar methods or structure from anywhere else because the vast majority of IT security knowledge is in applied form for various technical domains. Indeed, when work on this system concluded, I no longer needed to continue learning to increase my understanding of computer security. While the concepts used apply to security of all domains, the models are optimized for computerized resources. A summary of the system follows.

The fundamental principles of security are authentication and authorization. An asset is secure to the degree that the owner is willing to make it if these properties are properly configured to function at that degree (or higher) and understood by its owner to function at that degree (or lower). Security is needed when access to the asset is enabled. In the context of digital security, in which the asset is a digital substance (hardware or software), access is generally arbitrary usage, reading, and/or writing.

If one is to determine the optimal expenditure of funds on security, one must first find the optimal system of security, on a per-asset or per-asset-group basis. The asset involved will have one or more points of access (security/trust chains, or simply chains). Each chain will have one or more links, with the distinction between links depending on the granularity of the analysis. The conceptual chain will “connect” the potential user to the asset in question (with no regard for physical circumstances). Each link of the chain involves an authentication/authorization process of which there are three primary orders.

The first order is preventive, and akin to the wall and gate around a medieval castle. This order provides access control at the site of the access in full context of the asset’s execution environment. This is also the order at work in file-access Access Control Lists (ACLs) in an operating system. The second is pre-preventive, and akin to a security check before boarding an airplane. This order is used by an intermediate system before the request arrives at the destination, but strives to serve the same purpose as first order security. It may be cheaper to implement, especially if one specialized device can be used to serve many others. The principal drawback is that it must be aware of the full system state of the destination device (asset) in order to rule on the safety of the incoming request. This is often not the case in practice, but the benefit in economy may outweigh the cost. The other main drawback to second order security procedures is that even if enough information about the asset is known at the time of access to make a ruling, there may not be enough to “sanitize” the incoming request or to provide an effective, safe alternative to the requested access; the procedure may instead simply drop the request, which would be unintended in some circumstances. The third order, also called policy, involves a time delay, the need for evidence, the tracking of identity, the requirement for effective deterrence, a willing judiciary, and a cost-effective punishment system. The costs involved with the use of this method in the digital world dictate that its use tends to be primarily as a backup, although there have been questions about the merits of its existence at all.

Each link in the security/trust chain will either provide access control at one of the three orders, involve trusting an uncontrolled third party to determine the security of the access, or do neither. In practical modeling, links of the last kind would likely be included only to illustrate failures of security.

All security systems in use in digital environments can be described using this method. A web of chains can be used to illustrate trust relationships between end devices and the role they play in the security chain of others. A potential attacker would be presumed to use what would be perceived by him to be the “weakest” chain, or sequence of connections of chains, that leads him to the asset. Note that what is often derided as “security through obscurity” is often quite secure indeed. For example, a file on an Internet-connected Web server that is only meant to be accessible by a specific individual could be protected in a number of ways. Two of these are storing the file in a password-protected folder, or putting the file in a hard-to-guess directory. The latter is considered “security through obscurity”, but requires almost the exact same security procedures to access as the former (except for the traffic patterns created, types and permissions of log files stored, etc., none of which are necessarily risks for a given situation). Ultimately, all chains from the user to the asset need only one strong link, but combinations of the different kinds are used in different domains. Possible examples of the system in practice are numerous; a few follow (with thorough background).
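
The chain model described so far can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `LinkKind`, `Link`, `Chain` and `weakest_chain` are hypothetical, the numeric strength scores are invented for the example, and treating a chain as being as strong as its best link is one reading of “all chains need only one strong link”.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class LinkKind(Enum):
    FIRST_ORDER = "preventive, at the asset"
    SECOND_ORDER = "pre-preventive, at an intermediary"
    THIRD_ORDER = "policy, after the fact"
    TRUST = "trusting an uncontrolled third party"
    NONE = "neither control nor trust"


@dataclass(frozen=True)
class Link:
    name: str
    kind: LinkKind
    strength: float  # hypothetical score from 0.0 (no barrier) to 1.0


@dataclass(frozen=True)
class Chain:
    name: str
    links: Tuple[Link, ...]

    def strength(self) -> float:
        # Assumption: one strong link is enough, so take the best link.
        return max((link.strength for link in self.links), default=0.0)


def weakest_chain(chains: List[Chain]) -> Chain:
    # The attacker is presumed to use the chain he perceives as weakest.
    return min(chains, key=lambda chain: chain.strength())


# Invented scores: the two protections from the text are rated equally.
password_folder = Chain(
    "password-protected folder",
    (Link("password check", LinkKind.FIRST_ORDER, 0.9),),
)
hidden_directory = Chain(
    "hard-to-guess directory",
    (Link("unguessable path", LinkKind.FIRST_ORDER, 0.9),),
)
open_share = Chain(
    "forgotten open share",
    (Link("no check at all", LinkKind.NONE, 0.0),),
)

print(weakest_chain([password_folder, hidden_directory, open_share]).name)
```

With these made-up scores the attacker's choice is the open share, and the password-protected folder and the hard-to-guess directory come out as equally strong chains, which is the point of the “security through obscurity” remark above.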

An example of a simplistic security chain model illustrating the typical protection used against XSS (cross-site scripting) attacks involves three entities: the server hosting the vulnerable Web page, the attacker, and the victim user. In a persistent XSS attack, the chain begins with a second order link between the attacker and the server. The server will inspect the input sent to determine whether or not it will violate the user’s presumed (and usually generalized) security policy. In the case of JavaScript malware, the server checks to make sure none of the input could be executed (in the [likely] context of the eventual Web page it will be rendered in) and either encodes it to prevent this from happening, or cancels the request if the check fails. Note that this process may have to be looped to prevent some encoding bugs, until the result is safe. The user then requests the Web page that the attacker’s input is found in (the reason for the visit may be the result of the function of another chain). Generally, the chain link here will be one of trust between the user and server: the user trusts that the server (uncontrolled third party) will not serve anything dangerous. If dynamic content does get served and attempts to connect to a remote host other than the original server (subject to a few loopholes and exceptions), the connection will be blocked. This is called the Same Origin Policy (SOP). This requires the removal of functionality (and the need for chains of security) for controlling this kind of execution access, which could potentially be legitimate. However, the check involved in comparing the server and the remote host is done using domain names. This causes problems when the SOP (showing second-order characteristics) presumes it knows the way in which requests are sent in the user’s network, and is known to have weaknesses. The user’s understanding of his security state is fundamentally to blame in this case.
Note that the integrity of the network all devices are on and other aspects of the scenario represent the need for additional security chains, but are ignored as perfectly secure in this example.
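
The server-side link in this example can be sketched as follows. This is a minimal illustration and not how any particular server does it: `encode_for_html`, `render_comment` and `strip_script_tags` are hypothetical names, the page fragment is invented, and the tag-stripping filter is included only to show why a single pass may have to be looped. It is not a recommended defence on its own.

```python
import html
import re


def encode_for_html(untrusted: str) -> str:
    # Replace &, <, >, " and ' with entities so the browser treats the
    # input as text rather than markup.
    return html.escape(untrusted, quote=True)


def render_comment(untrusted: str) -> str:
    # Hypothetical page fragment: the encoded input is placed in element
    # content only, which is the context this encoding is suited to.
    return '<p class="comment">' + encode_for_html(untrusted) + "</p>"


# Deliberately naive filter, used only to illustrate the looping remark.
SCRIPT_TAG = re.compile(r"<\s*/?\s*script[^>]*>", re.IGNORECASE)


def strip_script_tags(untrusted: str) -> str:
    # One pass can reassemble a tag from the pieces around a removed
    # match, so repeat until the text stops changing.
    previous = None
    while previous != untrusted:
        previous = untrusted
        untrusted = SCRIPT_TAG.sub("", untrusted)
    return untrusted


print(render_comment("<script>alert(1)</script>"))
# prints <p class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</p>

nested = "<scr<script>ipt>alert(1)</scr</script>ipt>"
print(SCRIPT_TAG.sub("", nested))   # a single pass leaves a working tag
print(strip_script_tags(nested))    # the looped version does not
```

Encoding leaves the attacker's text visible but inert, while the looped filter removes it; which of the two (or cancelling the request) is appropriate depends on the context the input will be rendered in.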

The final link in the most common scenario for persistent XSS defense can be the user’s disallowing of JavaScript altogether on a per-domain basis. Essentially similar to the SOP, this involves overriding the trust in the domain from the previous link, and in effect closes access (removes the need for a security chain altogether, but at the cost of functionality) to JavaScript-enabled Web sites that are not on the domain-allow list. A non-persistent XSS would involve the user trusting or not trusting the attacker’s sent link based on the information available (trust link) or using security of some other kind. Note that links containing JavaScript in the query string or post body may be legitimate. The link here would be one of either possible first order security (if the script affects only client-side functions) or trust (if the script accesses the server, where the user cannot monitor). Note also that the protections described here are as fine-grained as is generally regarded as feasible. A first order security link only on the part of the client, combined with trust links based on the destination domain of remote-connecting scripts, would enable flexible cross-domain access. The feasibility of this has been doubted (by Mark Russinovich for one) but by no means disproved.
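
The two client-side checks discussed above, the origin comparison and the per-domain allow list, can be sketched roughly like this. It is a simplification under my own assumptions: `origin`, `same_origin`, `script_allowed` and `JS_ALLOWED_DOMAINS` are hypothetical names, the domains are placeholders, and real browsers apply more rules and exceptions than this.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

# Hypothetical user-maintained list of domains allowed to run JavaScript.
JS_ALLOWED_DOMAINS = {"example.com"}


def origin(url: str):
    # Reduce a URL to (scheme, host name, port), filling in the default
    # port when the URL does not state one.
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(page_url: str, request_url: str) -> bool:
    # The comparison looks at names only, never at the addresses those
    # names resolve to, so it inherits whatever the name lookup returns.
    return origin(page_url) == origin(request_url)


def script_allowed(page_url: str) -> bool:
    # Per-domain switch: scripts run only on pages from listed domains.
    return origin(page_url)[1] in JS_ALLOWED_DOMAINS


print(same_origin("http://example.com/a", "http://example.com:80/b"))
print(same_origin("http://example.com/", "http://attacker.example/"))
print(script_allowed("http://example.com/page"))
print(script_allowed("http://attacker.example/page"))
```

Both checks key on the domain name, which is why the allow list behaves so much like the SOP and why both depend on the name resolving to the host the user expects.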

Further examples could be included (SQLi, encrypted/steganographed traffic in a workplace).