An approximately monthly collection of random links and commentary, now split into two parts: a math/tech-heavy one (this one) and a less mathematical one. This post covers stuff I’ve read during (approximately) the past July and August.

Garden of Forking Paths

[stats.references, stats.nhst, stats.p_hacking, stats.general, stats.lessons]

Read the classic p-hacking paper by Gelman and Loken (2013).

Important lesson. Summary: in NHST, hypothesis-formulating and analysis decisions made contingent on the data may inadvertently introduce p-hacking-like symptoms into research, even without any deliberate fishing, i.e. the usual significance-testing argument no longer holds.
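
To make the lesson concrete, here is a small simulation sketch of my own (not from the paper, and only one of many possible forking-paths scenarios; assumes NumPy and SciPy): there is no true effect, but the analyst reports whichever of several plausible comparisons looks best after seeing the data.

```python
# Toy simulation of the "garden of forking paths" (not from the paper):
# there is no true effect, but the analyst picks, after seeing the data,
# whichever of several plausible comparisons looks most promising.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims, n_forks = 50, 5000, 5   # sample size, simulations, data-contingent choices
false_positives = 0

for _ in range(n_sims):
    # n_forks different "outcomes" (e.g. subgroups, transformations), all pure noise
    a = rng.normal(size=(n_forks, n))
    b = rng.normal(size=(n_forks, n))
    pvals = [stats.ttest_ind(a[i], b[i]).pvalue for i in range(n_forks)]
    false_positives += min(pvals) < 0.05   # report the most "interesting" comparison

print("nominal rate: 0.05, actual rate: %.3f" % (false_positives / n_sims))
# With 5 independent forks the realized rate is roughly 1 - 0.95**5 ≈ 0.23.
```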

Also note that the title of the paper is a literary reference.

Computational complexity of Rubik’s Cube

[math.recreational, math.general, theorcompsci.complexity, theorcompsci.fun, random.rubiks_cubes]

Blog post by Lance Fortnow. In short: it was recently shown that finding an optimal (shortest) solution to the N×N×N Rubik’s cube is NP-complete. The key word is “shortest”: finding some solution is computationally much less demanding.

How To Visualize Tweets

[software.visualizations]

Note to self. One example.

Beautiful Complex Analysis

[math.complex, math.analysis]

At kettenreihen.wordpress.com/

via Evelyn J. Lamb.

Benford’s law

[math.benfords_law, math.general]

Mark Levi on SIAM News: “Benford’s Law and Accelerated Growth”.

The perhaps unintuitive Benford’s law can be made intuitive by observing that an ‘exponential-like’ series spends ‘more time’ in the parts of the number line where numbers start with a low digit (e.g. 1) than where they start with a high one (e.g. 9).
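
A quick numerical check of that intuition (my own toy sketch, not from Levi’s column): take the leading digits of an exponentially growing sequence and compare their frequencies to Benford’s log10(1 + 1/d).

```python
# Toy check of Benford's law for an exponentially growing sequence.
import math
from collections import Counter

growth, n_terms = 1.07, 100_000
log_g = math.log10(growth)
# leading digit of growth**k, read off the fractional part of k*log10(growth)
leading = Counter(int(10 ** ((k * log_g) % 1)) for k in range(1, n_terms + 1))

for d in range(1, 10):
    print(f"digit {d}: observed {leading[d] / n_terms:.3f}, "
          f"Benford {math.log10(1 + 1 / d):.3f}")
```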

Wikipedia also has nice pictures.

Gaussian Correlation Conjecture

[stats.general, math.probability]

Quanta Magazine

As is almost always the case with these Quanta Magazine articles, this particular branch of mathematics (that the article is about) is outside my area of expertise, but the story is an interesting read.

The paper on arXiv: T. Royen, “A simple proof of the Gaussian correlation conjecture extended to multivariate gamma distributions”, arXiv:1408.1028.

The proof is a bit too involved for me to really say anything about it, but it was fun to look at how the arguments are laid out.
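
For the record, the inequality itself is easy to state: for a centered Gaussian vector X and symmetric convex sets K and L, P(X ∈ K ∩ L) ≥ P(X ∈ K) P(X ∈ L). A quick Monte Carlo sanity check of a 2D special case (my own sketch, nothing to do with Royen’s proof; assumes NumPy):

```python
# Monte Carlo sanity check of the Gaussian correlation inequality in 2D,
# with K and L taken to be symmetric strips (a special case of convex sets).
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])      # correlated centered Gaussian
x = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)

in_K = np.abs(x[:, 0]) <= 1.0                 # K = {|x1| <= 1}
in_L = np.abs(x[:, 1]) <= 0.5                 # L = {|x2| <= 0.5}

lhs = np.mean(in_K & in_L)
rhs = np.mean(in_K) * np.mean(in_L)
print(f"P(K and L) = {lhs:.4f} >= P(K)*P(L) = {rhs:.4f}: {lhs >= rhs}")
```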

P and NP

[theorcompsci.complexity, theorcompsci.PNP, theorcompsci.general]

There has been quite a lot of noise about a claimed P != NP proof. The idea is, of course, fascinating. I haven’t studied enough complexity theory to say anything beyond that, so I’ll defer to the experts.

Scott Aaronson commented quite quickly:

To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

The paper itself is quite a tedious read if you (like me) are not familiar with approximation theory, but thankfully Luca Trevisan has a write-up of what the proof is supposed to be about.

Old fun article on R. J. Lipton’s blog: Is P=NP an Ill Posed Problem?.

As of 2017-08-19: according to the discussion on cstheory.SE, Blum’s paper indeed appears to contain a subtle mistake, which is something that could have been expected.

More detail in R. J. Lipton’s blog.

As of 2017-09-03: Blum has retracted the proof.

More Gerrymandering

[politics.gerrymandering, math.applications, math.general, applied.general]

Jeremy Kun attended a workshop on Gerrymandering; here are his notes. Summary quote:

The speakers were almost unanimous in a few central tenets, each of which I’ll expand on:

  • In order for mathematics to be helpful in fighting partisan gerrymandering, it needs to be in touch with the intricacies of the legal, political, and local aspects of the problem.
  • The “obvious” ideas have all essentially been tried and rejected for various reasons, both mathematical and legal.
  • There is a huge gray area in between what’s easy and what’s known to be hard where mathematics can help, and lots of tentative ideas are on the table.

About 20 min read. Included:

  • Efficiency gap (a small computational sketch follows this list).
  • Coast of England paradox vs simple compactness measures.
  • MCMC might take an infeasibly long time to produce a proper sample. Courts will not like this uncertainty.
  • Ricci curvature. Whatever that is.
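
Since the efficiency gap is the most concrete of these, here is a small sketch of how it is computed (the standard definition: difference of the two parties’ wasted votes divided by total votes cast; the district numbers below are made up):

```python
# Efficiency gap from district-level two-party vote counts.
# A vote is "wasted" if it is cast for the loser, or for the winner in
# excess of the 50% + 1 needed to win. The example districts are made up.
def efficiency_gap(districts):
    wasted_a = wasted_b = total = 0
    for votes_a, votes_b in districts:
        district_total = votes_a + votes_b
        threshold = district_total // 2 + 1
        if votes_a > votes_b:
            wasted_a += votes_a - threshold   # winner's surplus votes
            wasted_b += votes_b               # all loser votes are wasted
        else:
            wasted_b += votes_b - threshold
            wasted_a += votes_a
        total += district_total
    return (wasted_a - wasted_b) / total      # positive = plan favours party B

districts = [(55, 45), (60, 40), (40, 60), (70, 30), (45, 55)]  # made-up data
print(f"efficiency gap: {efficiency_gap(districts):+.2%}")
```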

Previously on gerrymandering here and here.

Christian P.’s PHP-bib-file-thing

[software.php, software.fun, software.stuff, software.latex, software.web]

Christian Lawson-Perfect has a web interface to edit .bib files.

I don’t even. That’s …sort-of… amazing … and amazingly geeky … a thing … exactly the kind of thing that I’ve been planning to create for ages, except of course not in PHP, but never got around to doing it. (I’m terribäd in PHP.)

Mandelbrot in Terminal

[software.fun, math.fun]

Exactly what the title above says.
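
For fun, a minimal ASCII-art version of the same idea (my own quick sketch, not the linked implementation):

```python
# Tiny ASCII Mandelbrot renderer for the terminal (quick sketch).
def mandelbrot_ascii(width=78, height=24, max_iter=30):
    chars = " .:-=+*#%@"                       # darker = slower escape
    for row in range(height):
        line = ""
        for col in range(width):
            # map the character grid onto the complex plane
            c = complex(-2.0 + 3.0 * col / width, -1.2 + 2.4 * row / height)
            z, i = 0j, 0
            while abs(z) <= 2 and i < max_iter:
                z = z * z + c
                i += 1
            line += chars[(i * (len(chars) - 1)) // max_iter]
        print(line)

mandelbrot_ascii()
```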

ML/Stats/Data Science Textbook References

[stats.edu, stats.resources, applied.resources, applied.ML, stats.ML, stats.general, stats.methods, stats.bayesian, applied.stats, applied.datasci]

A recent-ish textbook:

Computer Age Statistical Inference, by Bradley Efron and Trevor Hastie. Free PDF. According to the table of contents, possibly a very useful reference/resource. Hastie’s other ML textbooks have been great.

And while we are listing data science textbooks I’ve been intending to read, here’s another:

Foundations of Data Science (most recent version pdf), by Hopcroft, Blum and Kannan.

Things That Are Normally Distributed

[stats.general, stats.edu, applied.probability]

A reminder from the active maths blogger John D. Cook:

Adult heights follow a Gaussian, a.k.a. normal, distribution [1]. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem.

If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs.

The central limit theorem says that the sum of many independent, additive effects is approximately normally distributed [2]. Genes are more digital than analog, and do not produce independent, additive effects. For example, the effects of dominant and recessive genes act more like max and min than addition. Genes do not appear independently—if you have some genes, you’re more likely to have certain other genes—nor do they act independently—some genes determine how other genes are expressed.

[..] Incidentally, if effects are independent but multiplicative rather than additive, the result may be approximately log-normal rather than normal.
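
The additive-versus-multiplicative point is easy to see in a quick simulation (my own sketch, not Cook’s; assumes NumPy): the same small effects give a roughly symmetric, normal-looking sum but a right-skewed, log-normal-looking product.

```python
# Many small independent effects: added they look normal (CLT), multiplied
# they look log-normal (equivalently, their logs add up). Quick sketch.
import numpy as np

rng = np.random.default_rng(42)
effects = rng.uniform(0.9, 1.1, size=(100_000, 50))   # 50 small positive effects

additive = effects.sum(axis=1)          # ~ normal by the CLT
multiplicative = effects.prod(axis=1)   # ~ log-normal (sum of logs is ~ normal)

for name, x in [("additive", additive), ("multiplicative", multiplicative)]:
    skew = ((x - x.mean()) ** 3).mean() / x.std() ** 3
    print(f"{name:>14}: mean {x.mean():.2f}, skewness {skew:+.2f}")
```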

The HN discussion has some additional points.

Model-Free Statistics?

[stats.general, stats.theory, stats.applications, stats.models]

Noahpinion blogs: ‘“Theory vs. Data” in statistics too’. Summary quotes:

Via Brad DeLong – still my favorite blogger after all these years – I stumbled on this very interesting essay from 2001, by statistician Leo Breiman. Breiman basically says that statisticians should do less modeling and more machine learning.

[..]

So if even OLS and logit are too theoretical and restrictive for Breiman’s tastes, what does he want to do instead? Breiman wants to toss out the idea of a model entirely. Instead of making any assumption about the DGP, he wants to use an algorithm - a set of procedural steps to make predictions from data. As discussant Brad Efron puts it in his comment, Breiman wants “a black box with lots of knobs to twiddle.”

Everybody loves to talk about deep learning. After reading countless DL blog posts and articles (which makes me an interested amateur, not a pro), my impression as a deep-learning dilettante is that most deep learning models are not as model-free as these articles claim. For example, convolutional nets include an important modeling choice – the convolutional structure, inspired by what we know about the capabilities of the V1 visual cortex! Likewise, the choice of activation functions on the nodes of the net, the choice to train the network with e.g. backpropagation, or the choice to use recurrent structures such as LSTMs to handle sequential data … choices everywhere. To my understanding, a large enough network with a single hidden layer would be capable of approximating any reasonable function (the true magical black box!), but training such wide, shallow networks to do anything impressive is nigh impossible in practice. So researchers use deep networks with imposed structure.
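
As a concrete (and hedged) illustration of “imposed structure”, assuming PyTorch: a convolutional layer bakes in locality and weight sharing, while a fully connected layer producing the same output size assumes nothing about spatial structure and pays for it with orders of magnitude more parameters.

```python
# A convolutional layer encodes a modeling assumption -- local,
# translation-invariant features with shared weights -- whereas a fully
# connected layer of the same output size makes no such assumption.
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
dense = nn.Linear(28 * 28, 8 * 28 * 28)   # same output size on a 28x28 input

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print("conv layer parameters: ", n_params(conv))    # 8*1*3*3 + 8    = 80
print("dense layer parameters:", n_params(dense))   # 784*6272 + 6272 ≈ 4.9M
```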

Link to the original essay: http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726 (I have yet to read it.)