Linear Algebra (algebra not included)

Algebra and combinatorics are different viewpoints on how to put structure on a set of things. An algebraist looks at a set and sees a bunch of things she can smash together to produce other things (e.g. numbers can be added, cake ingredients can be mixed, delicious avocados can be bred.) A combinatorialist* looks at a set and sees a bunch of things xe can group together (“combine” if you will) in different ways (e.g. categories can be Venn diagrammed, cake ingredients separated into dry/wet/nonbinary sections, delicious avocados will be eaten too quickly to be combined with anything if your combinatorialist is anything like this humble author.)

Algebraists get extremely pumped about linear algebra. Anytime you see a group of algebraists (as in murder of crows) hanging out, you can be damn sure one of them is just itching to use some linear algebra. And it was just such a group that, long ago, was infiltrated by a collection of combinatorialists who perpetrated the greatest heist of mathematical history: the theft of linear algebra itself.

The story goes like this. Let V be a finite-dimensional vector space, and let’s think about the collection \mathcal{I} of all linearly independent sets of vectors in V. This collection has certain structural properties as a collection of sets: (1) if A \in \mathcal{I} and B \subseteq A, then B \in \mathcal{I}, (2) the empty set is always independent, and (3) if A and B are both in \mathcal{I} and |A| < |B|, then there is some v \in B such that A \cup v \in \mathcal{I}. These properties are all proven in linear algebra using the operations defining a vector space, but they are purely combinatorial properties when \mathcal{I} is thought of as simply a list of ways to group together elements of V (what fancy-pants mathematicians call a set system). So instead of starting with algebraic operations, as an algebraist might, and deriving this structure, our combinatorial heisters decided to take these properties as axioms…

That is, let’s define a matroid to be a pair M = (E,\mathcal{F}) where E is a finite set, and \mathcal{F} is a nonempty collection of subsets of E such that

  1. (hereditary property) if A \in \mathcal{F} and B \subseteq A, then B \in \mathcal{F}; and,
  2. (augmentation property) if A and B are both in \mathcal{F} and |A| < |B|, then there some v \in B \setminus A such that A \cup v \in \mathcal{F}

It turns out that this is enough to recover many of the theorems we expect to be true of a collection of linearly independent sets. For example, suppose A and B are both maximal independent sets (with respect to inclusion) in a matroid M = (E,\mathcal{F}). By augmentation, if |B| > |A| then there is some v \in B \setminus A such that A \cup v is independent. But this contradicts maximality of A. By playing the same game with roles reversed, we can’t have |A| > |B| either. So all the maximal independent sets of M have the same size (and are usually called bases — cause yeah, we stole the lingo too.)

Okay, you say, so this is a fun game, but what’s the point? Stay tuned…

*some combinatorist is getting really pissed off at me right now.

So you’re starting grad school (in pure math)

I have some friends starting grad school this week and, because I’m old and wise now (and also to counter academic survivorship bias) I want to offer some dubious advice to them and other new grad students.

  1. Focus on mathematics and (if this is part of your goal) teaching. This might sound weird, but trust me: your department is going to try to get you to do all sorts of other things, especially if you’re a woman or underrepresented minority. Departmental or school service may seem tempting if you are pursuing an academic career, but I’m highly skeptical of the utility of grad students sitting on department committees or similar types of service, both to the grad student and to the school. If it’s important to you to give back to your school, or, more likely, to force your school to treat future students better, get involved in organizing at the graduate student level (unions, senate, etc.) I say this primarily because you will likely be more effective at that level, but also because you’ll have better opportunities for building community and developing your own leadership/organizational skills.
  2. Build collaborations with other grad students. This is something I mostly failed at, but even if you don’t get papers out of it, working with other grad students will teach you a lot more about collaboration than working with your advisor or other established mathematicians.
  3. Students will like you and buy into your teaching style if you just explain it to them — provided, of course, you’ve got good reasons for teaching the way you do (this is advice for some advanced professors as well….) Learning starts with metacognition, and in order to metacognify (it’s a word) well, your students need you to be transparent in your pedagogical choices.
  4. Don’t trust anyone’s self-estimation of the time they spend working — including your own. Grad students are prone to wild inflation of these numbers — “I was at the office for 12 hours today” usually does not translate, in my experience, to “I worked 12 hours today.” So don’t be intimidated by the dudebro (and it is  often a dudebro) loudly talking about his hundred-hour weeks. Hike your own trail. Be honest with yourself about how much time you spend working and how much time you need to work.
  5. Get off campus. Do other things. You need a break. Yes, you. Yes, right now. Put down the chalk, let your brain mull it over on its own, and climb a fucking mountain or something. Go to a knitting circle. Paint a damn frog.
  6. Learn to give really good talks. It’s easy to skate by in math, where most talks are actively horrible, but people will remember you for giving good talks and, if you get a reputation for it, invite you to give more. It’s like magic!
  7. A lot of mathematicians are very vocal these days about diversity and inclusion. This does not automatically mean you can trust them to have your back. I don’t mean this in a negative way. I’m just saying that a lot of relatively privileged mathematicians are at the point where they are aware of and care about diversity and inclusion but not yet at the point where they really understand just how bad the picture is, and the ways in which they themselves contribute. Be cautious, is all.
  8. Find out what the on-campus therapy is like. Now you have to be careful about this, because at some schools you don’t want to go near campus therapists with a 10 foot pole (you can google the horror stories.) But if they’re good and free, establish care early on. Grad school is notoriously bad for mental health. I joke with my therapist that leaving grad school cured my mental illness. It didn’t, of course, but the difference really is night and day. Even if you don’t have any mental health problems (now…), you’re going to need support. So if there’s good, free, professional support available…..take advantage.
  9. On that note, alcohol is a depressant.
  10. Pick a non-academic career option and start developing it now. Like, don’t invest a ton of time in it, just get started. Make a long-term plan for how you’ll get there. Certain math professors will tell you that it is easy to get an industry job (they never specify an industry) with a PhD in pure math. They are wrong.
  11. The library is a great place to work and study — probably more effective than the usually crowded and distracting graduate offices.
  12. If you want to teach, your school probably has pedagogy seminars or teaching circles or something. Your math professors likely don’t know about them (burn), but they are probably there. Go forth! Learn from areas that invest more time in pedagogical training than ours typically does!
  13. You’ve got this! You’re going to prove something amazing! And spend a lot of time not understanding other people’s mathematics! That’s okay!

How quickly do soccer leagues converge to their final point distribution?

In a typical season of a soccer league, each team participating in the league plays every other team twice, earning three points for each win, one point for each draw, and no points for a loss. At the end of the season, the accumulated point totals are used to determine the league table: a list of teams and their point totals ordered from least points to most.

By normalizing the point totals for a season, we can treat the final table as a probability distribution, the shape of which can be used to understand the final structure of the season. As an example, I’ve plotted the final distributions for the top five European soccer leagues after the 2016-2017 seasons (note that all the code and data involved in this post can be found on my github account). Take a minute to see if you notice any qualitative differences between these seasons.

You can have all sorts of fun with just these distributions (stay tuned for a followup post), but what I’d like to talk about today is the manner in which these distributions are formed over the course of a season. To be precise, I want to think of the league season overall as an iterative computation of the final distribution, in which each game played updates a running point distribution converging to this final distribution on the last day of the season. The animation below (made using matplotlib’s animation package) shows the end of this process for the 2000-2001 season of Germany’s top soccer league. The running distributions are represented by the blue dots approaching the final distribution given by the underlying bar plot (all distributions are ranked to make the visuals clearer, but in general the computations below compare distributions team-to-team not by rank).

As with any computational or convergent process, we might be interested in characterizing how quickly the solution is computed or how quickly the limit is well-approximated. To understand this, we need some way of measuring the distance between the running point distribution after a given number of games and the final point distribution at the end of the season. Enter the Jensen-Shannon divergence, a measure of distance between probability distributions based on cross-entropy.

One reason for using Jensen-Shannon in this case is that early in seasons there are often one or more teams with zero points. Other measures, such as Kullback-Leibler, return infinite values when one of the distributions assigns zero probability to an event and the other does not. In many cases this is a feature, not a bug,  but since I expect more variation in point distribution early in the season, when just a few new points can have significant impacts, I want to be able to measure the change in distance meaningfully even when some teams have no points.

The data I use consists of 110 seasons stretching back to 1995/96 from the top five European soccer leagues (England, Spain, Germany, Italy, and France), obtained from For each season, I computed the Jensen-Shannon divergence between the league table updated after each game and the final league table. The resulting curves, for all 110 seasons, are plotted below.


As you can see, the Jensen-Shannon divergence appears to decrease exponentially. Moreover, the only real variance in the curves seems to happen in the first hundred games (zoomed in on below).


To get a better sense of this exponential decrease, I also plotted the average of these curves and, assuming an exponential model f(x) = ae^{-bx}+c, an optimal least-squares fit of f(x) = .518e^{-.115x} + .004 (standard deviation errors for these parameters being 0.004, 0.001, and  0.0003 respectively).


Given this, one might reasonably expect that the distance from the point distribution after 100 games to the final distribution would be approximately e^{-.115*100} \approx 1 \times 10^{-5}. (By comparison, the JSD between the — sorted! — final distributions for Spain and Italy plotted above is 1 \times 10^{-2}, three orders of magnitude higher.) League seasons are typically 380 games long, so what this means is that by about a quarter of the way through a season, the running distribution and the final distribution are not that far away from each other.

I want to emphasize that this result is not about team rankings. Very small changes to point totals can have significant effects on rankings without changing the distribution much at all. Indeed, a one-point change to a team’s point total, while potentially insignificant probabilistically, could very well affect who wins the league. So the takeaway here is not that the league ranking doesn’t change after a hundred games (thank goodness!), but rather that the distribution of points — in a sense, the overall shape or structure of the league — is not significantly changing after that point.

Let me know if you have any questions or comments!