Minimalist Mathematician

A blog about life as a math grad student

Category: Graph theory

Linearity of expectation

Expectation is linear. This is one of the most surprising facts in probability theory. Expectation is the “average value” of a random variable. In a sense, if the random variable X represents the outcome of an experiment, then the expected value is what we would expect the average outcome to be if we ran the experiment a lot of times.

When we say that expectation is linear, that means that if X and Y are random variables and a is a constant, then E[aX + Y] = aE[X] + E[Y]. This relation follows from the definition of expected value in terms of integrals. Since integrals are linear operators, it follows that expectation is linear.

It is surprising because it is true regardless of the dependence between the two random variables. Most facts in probability theory have dependence conditions on them, but not linearity of expectation. So no matter how interdependent two random variables X and Y may be, the expected value of their sum is the sum of their expected values.
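To see that the dependence really doesn’t matter, here is a quick Python sketch (the setup is my own, just for illustration): Y is completely determined by X, which is as dependent as two variables can be, yet the average of X + Y is still E[X] + E[Y].

```python
import random

random.seed(0)
N = 100_000

# X is a fair die roll and Y = 7 - X, so Y is completely determined by X:
# the two variables are as dependent as they can possibly be.
total_x = total_y = total_sum = 0
for _ in range(N):
    x = random.randint(1, 6)
    y = 7 - x
    total_x += x
    total_y += y
    total_sum += x + y

print(total_x / N)    # close to E[X] = 3.5
print(total_y / N)    # close to E[Y] = 3.5
print(total_sum / N)  # exactly 7.0 = E[X] + E[Y], since X + Y = 7 always
```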

Linearity of expectation is very useful when working with the probabilistic method. There are a number of very elegant proofs of somewhat deep results that use nothing more complicated than linearity of expectation. Here is an example from chapter 2 of The Probabilistic Method.

Theorem. There is a 2-colouring of K_n with at most {n \choose a}2^{1 - {a \choose 2}} monochromatic K_a.

Proof. Take a random 2-colouring of K_n. Let X be the number of monochromatic K_a. Then X = \sum_c X_c, where X_c is the indicator variable of the event that the a-subset c is monochromatic. The expected value of each X_c is 2 \times 2^{-{a \choose 2}} = 2^{1-{a \choose 2}}: there are 2 choices of colour, and each of the {a \choose 2} edges must receive it. There are {n \choose a} possible a-subsets, so by linearity of expectation, E[X] = {n \choose a}2^{1-{a \choose 2}}. Since a random variable cannot always exceed its expectation, there exists some colouring with at most this many monochromatic K_a.
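As a sanity check on the computation, here is a short Python sketch (names my own) that estimates E[X] for n = 8 and a = 3 by sampling random colourings; the formula gives {8 \choose 3} 2^{-2} = 14.

```python
import itertools, math, random

random.seed(1)
n, a = 8, 3
edges = list(itertools.combinations(range(n), 2))
a_subsets = list(itertools.combinations(range(n), a))

# the formula from the proof: E[X] = C(n, a) * 2^(1 - C(a, 2))
expected = math.comb(n, a) * 2 ** (1 - math.comb(a, 2))

def mono_count(colouring):
    # number of a-subsets all of whose edges received the same colour
    return sum(len({colouring[e] for e in itertools.combinations(S, 2)}) == 1
               for S in a_subsets)

trials = 2000
avg = sum(mono_count({e: random.randint(0, 1) for e in edges})
          for _ in range(trials)) / trials
print(expected, avg)  # 14.0 and a Monte Carlo estimate close to it
```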

Mindful practice

Oh, impostor syndrome. I still feel like everyone else in my program is just gliding through the classes, and will ace the prelims in August, whereas I am struggling with most of it. And then I have a sudden breakthrough when I finally figure out a homework question on semidirect products that no one could figure out during the semester, and all the feelings of inadequacy go away. For a few hours, or maybe a few days, only to come back again.

So I’ve been reading a good book on what connects successful people across all disciplines: math, music, sports, business, and anything else you can think of. It’s called “Talent is Overrated” by Geoff Colvin. From the title, I imagine you have some idea what it might say. The author goes through the research on successful people from various disciplines, and discusses some very interesting findings. For example, there is no evidence that what we call “talent” even exists. Mozart started performing at a very young age, but he had already been trained by his father, one of the top music teachers of his day, for years before he started. Those of his works that are still performed today were composed when he was around 20, after 17 years of training.

The same pattern is found for all other successful people. Tiger Woods trained from a very young age, much harder than anyone else, and thus became a golf prodigy. Simply because he had more practice than anyone else that young. Not because he had magic golf playing genes. Simple practice. Benjamin Franklin spent years practicing how to write, and emulating famous writers, before he wrote any of the things he is now famous for.

This is where you object “but I write 3000 words for my blog every week, and I’m not a professional writer”. That objection gets at the heart of the problem: what sets successful people apart from average performers is what is called “conscious practice”. Playing the same piece over and over again on the piano will not help you improve, no matter how many hours you spend with it. Sitting down for an hour or two every day with a specific goal in mind: “today I will practice the transition between chord X and chord Y” will help you improve. Spending hours every day mindlessly playing chess will not make you a chess grand master. Spending hours every day analyzing games played by masters, and your own games, for advantages and weaknesses will help you get better. This is the reason ballerinas spend hours every day at the barre, practicing the same développé they have been doing for 20 years.

The idea is that being mindful and practicing the basic, underlying skills of your trade will make you better at it: much, much better than simply mindlessly trying to dance that pas de deux, or playing chess over and over again. What separates the experts from average performers is their absolute mastery of the basic skills of their trade.

I sure hope that’s conscious practice (courtesy of SMBC).

So, how does this translate to math research? The basic skills needed are a deep knowledge of your particular area of interest, as well as any closely related fields, and a knowledge of the current research. I’m a graph theorist, so for the basic knowledge of my field, I have a few books to work through: currently “Graph Theory” by Reinhard Diestel. Every day, I spend at least 30 minutes working through the book. I don’t focus on exercises: I focus on parts where I can get immediate feedback. I make sure I understand the text, try to predict what the next natural development would be, and try my hand at all of the proofs. I read the theorem or lemma, try to prove it, and then compare my (sometimes only partial) proof to the one in the book. Which one is more elegant? Easier to understand? This gives me immediate feedback on how I’m improving as a mathematician, and how well I’m internalizing the key methods used in my field. To supplement this, I also keep up to date with current research. Every week, I pick a newly submitted paper from arXiv to read. I spend half an hour every night on this paper, with special focus on the proof technique(s) used. After all, proof techniques are the tool kit of a mathematician.

I’ve only been adhering to the principle of mindful practice for a couple of months, but I already feel more confident in my weekly meetings with my adviser, and during the department graph theory seminar, because I have so much more background knowledge. As for how it will work out over the long term, I will keep you updated!

Hall’s Marriage Theorem

Let’s talk matchings! A perfect matching of a graph is a set of independent (pairwise non-adjacent) edges that together cover the entire vertex set of the graph.

Now, let G be a bipartite graph with vertex partitions A and B. The question Hall’s marriage theorem answers is: what conditions on G are necessary and sufficient for G to have a matching covering all of A? First of all, it is obvious that for any subset S of A, we must have that |S| is at most the number of neighbours of the vertices of S; we denote the set of these neighbours by N(S). It turns out that this criterion is also sufficient. I’m going to present my favourite inductive proof of this theorem (all credit to Graph Theory by Reinhard Diestel).


Borrowed from Dinosaur Comics, my favourite way to waste time

Theorem. (Hall, 1935) Let G be a bipartite graph with bipartition (A, B). For any S \subseteq A, denote by N(S) the set of neighbours of the vertices in S. Then G has a matching covering all of A if and only if for each S \subseteq A, we have |S| \leq |N(S)|.

We already know that the condition of the marriage theorem is necessary. So we only need to show that, given a graph that satisfies the marriage condition, we can find a matching covering all of A.

Proof. We apply induction on |A|. For |A| = 1, the assertion is true. Now let |A| \geq 2, and assume that the marriage condition guarantees the existence of a matching of A whenever |A| is smaller.

If |N(S)| \geq |S| + 1 for every non-empty set S \subsetneq A, we pick an edge ab \in G with a \in A, and consider the graph G' = G - \{a, b\}. Then every non-empty set S \subseteq A \setminus \{a\} satisfies:

|N_{G'}(S)| \geq |N_G(S)| - 1 \geq |S|

so by the induction hypothesis G' contains a matching of A \setminus \{a\}. Together with the edge ab, this constitutes a matching of A in G.

Suppose now that A has a non-empty proper subset A' such that |B'| = |A'| for B' := N(A'). By the induction hypothesis, the graph G' = G[A' \cup B'] contains a matching of A'. But G - G' satisfies the marriage condition too: if some S \subseteq A \setminus A' had fewer than |S| neighbours in B \setminus B', then S \cup A' would violate the marriage condition in G. So by the induction hypothesis, G - G' contains a matching of A \setminus A'. Putting the two matchings together, we obtain a matching of A in G. Q.E.D.
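To play with the theorem, here is a small Python sketch (all names my own) that checks the marriage condition by brute force over all subsets of A, and finds a matching of A with the standard augmenting-path algorithm; by Hall’s theorem the two always agree.

```python
import itertools

def marriage_condition(A, adj):
    # adj maps each vertex of A to its set of neighbours in B
    for r in range(1, len(A) + 1):
        for S in itertools.combinations(A, r):
            if len(set().union(*(adj[a] for a in S))) < len(S):
                return False
    return True

def matching_size(A, adj):
    # standard augmenting-path algorithm for bipartite matching
    match = {}  # maps b in B to its current partner in A

    def augment(a, seen):
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                if b not in match or augment(match[b], seen):
                    match[b] = a
                    return True
        return False

    return sum(augment(a, set()) for a in A)

A = ['a1', 'a2', 'a3']
adj = {'a1': {'b1', 'b2'}, 'a2': {'b2'}, 'a3': {'b2', 'b3'}}
print(marriage_condition(A, adj))       # True
print(matching_size(A, adj) == len(A))  # True: a matching covering all of A
```

On a graph violating the condition (say, two vertices of A both adjacent only to the same single vertex of B), `marriage_condition` returns False and `matching_size` comes out smaller than |A|.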

Normal spanning trees and swirl cupcakes

Today I had my last official class of my first year, a topology class. We talked about homology invariants and ate the chocolate vanilla swirl cupcakes I made for the occasion. It’s weird that I don’t have any more classes left. I almost even miss analysis homework. I have plenty of things to keep myself occupied until August though. Tomorrow I have a mandatory research ethics course to attend. 8 hours of listening to someone talk about how we shouldn’t experiment on people without the permission of the ethics review board. I imagine it will be moderately interesting, at best. Well, unless I come up with some experiment like these two mathematicians, courtesy of Spiked Math.


Today I want to share some interesting graph theory. Given a connected graph G, we can find a spanning tree T. This can be proved by simple induction. However, it is also true that we can find a normal spanning tree in any connected graph.

Given a tree T, fix a vertex r, called the root of the tree. Next, we define a partial order on the vertex set of T by x \leq y if x is on the unique path from r to y. This is the partial tree-order associated with r.

We call a rooted tree with root r normal in G if every pair of vertices that are adjacent in G is comparable in the tree-order associated with r.

Theorem. Every connected graph G has a normal spanning tree.

The reason I wanted to share this result with you is that it has a beautiful inductive proof where we induct on the number of edges. Think about it for a few minutes before you scroll down to see.

Proof. For the base case, consider a connected graph on one edge. Clearly, this graph is K_2, which is its own normal spanning tree. Now suppose every connected graph on n edges has a normal spanning tree, and consider a connected graph obtained by adding one more edge. There are three cases for the inductive step: if the new edge introduces a new vertex, then the new normal spanning tree is the old one joined with the new edge. Since no other edges are incident to the new vertex, this is still a normal spanning tree.

In the second case, the new edge joins two vertices that are already comparable in the tree-order, so it does not affect the partial order on the normal spanning tree, and we are done.

The final case is the tricky one: suppose that the new edge connects two vertices that are not comparable in the tree-order. In this case, we create a new spanning tree by adding in the new edge, connecting two branches of the spanning tree, and then deleting from one of the original branches the edge in it closest to the root of the tree. Then we have a spanning tree again, and one can check that it is normal. Q.E.D.
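Incidentally, there is also a direct construction: the spanning tree produced by a depth-first search from any root is always normal. Here is a Python sketch (all names my own) that builds a DFS tree on a small graph and checks normality, i.e. that every edge of the graph joins two tree-comparable vertices.

```python
def dfs_tree(adj, root):
    # parent map of a depth-first search spanning tree rooted at `root`
    parent = {root: None}
    def dfs(v):
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                dfs(w)
    dfs(root)
    return parent

def ancestors(parent, v):
    # all vertices on the tree path from v up to the root
    anc = set()
    while v is not None:
        anc.add(v)
        v = parent[v]
    return anc

def is_normal(adj, parent):
    # every edge of the graph must join two tree-comparable vertices
    return all(v in ancestors(parent, w) or w in ancestors(parent, v)
               for v in adj for w in adj[v])

# a 4-cycle 0-1-2-3-0 with the chord 1-3
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
parent = dfs_tree(adj, 0)
print(is_normal(adj, parent))  # True
```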

Oh, and those cupcakes I mentioned? I used a recipe I found on JoyOfBaking. This is one of the best sites I have ever found for basic recipes, like vanilla cupcakes, brownies, red velvet cake, peanut butter cookies, and so on. I highly recommend it. I separated the frosting into two bowls, melted a handful of dark chocolate chips and whipped them into one of the bowls to make half chocolate and half vanilla frosting. And here’s the end result:

Ribbon graphs of link diagrams

This has been a weird week. Grad school seems to go in waves: first I feel really stupid and like I understand nothing, and then suddenly I’m on top of everything. Up until now, this rollercoaster has been out of sync in all of my classes, but this week I’ve been on top of everything. I wonder how long it will last. I even understand what we’re doing in analysis (intro to functional analysis), though that will probably change when we start with abstract measure theory next week…

Update: It did not change when we did abstract measure theory. Shocking, I know. I found the abstract stuff so much easier to do than the concrete Lebesgue measure. When we did abstract measure theory, I didn’t get confused by all the things I do know about the real numbers and I could focus on just understanding measures on sets, rather than mixing in all the weird things that happen with real numbers. Much more fun.

I have a vague recollection of promising a post on the topic of my conference presentation, so here it goes. First of all, you need to know what a knot is. A normal knot, the kind you tie your shoes with, is not mathematically interesting: it can be tied, so it can (mathematically at least) be untied. Instead, to define a mathematical knot we tie a knot in a piece of string and connect the ends. That’s the intuitive definition. There are some tweaks to make it behave nicely, but you get the general idea. A link is a knot with several components.

The title of this post mentions link diagrams. We can find a link diagram of a link by simply projecting the link “nicely” onto the plane. I’m fudging a lot of the details about knots, because they are not the main topic here. I might do a post later on knots where I explain all the details properly. Here “nicely” should be taken to mean that one can recover the original knot from the projection.

Now that we have a vague idea what a link diagram is, we define a ribbon graph. A ribbon graph G is a collection V(G) of discs called vertices, and a collection E(G) of discs called edges. Each edge disc intersects exactly two vertex discs, along an arc of each, and no two edges intersect. An easier way to think about a ribbon graph is as a graph embedded in a surface: we cut out a thin neighbourhood around the graph to obtain a ribbon graph.

Next, I’m going to tell you how ribbon graphs are connected to link diagrams. We take a link diagram, and eliminate each crossing just like when constructing a knot polynomial, choosing a negative or a positive splicing at each crossing; we can choose whichever of the two we like. When all the crossings have been eliminated, we are left with a number of disjoint circles. These will be the vertices of our ribbon graph. Connect the vertices where there were crossings with edge discs, and you will end up with a ribbon graph. Since we can choose either splicing at each crossing, there are up to 2^{\text{\# crossings}} possible ribbon graphs associated with each link diagram.

The natural questions to ask are then: how are these ribbon graphs connected? What can we say about the ribbon graphs of a given link diagram? If two link diagrams have the same set of ribbon graphs associated with them, how are the link diagrams related?

My undergraduate advisor already answered these questions for links in \mathbb R^3. It turns out that the ribbon graphs arising from the same link diagram are partial duals of each other. For the last question, it turns out that the link diagrams are related by a very simple “flip” move. In my undergraduate thesis, I proved that the same results hold for checkerboard-colourable link diagrams of links in \mathbb RP^3. This is roughly what I presented at the conference a few weeks ago, to a room full of grad students. Here is a link to the paper on arXiv.

It was a very nice conference, by grad students, for grad students. The organizers did a really good job putting it together, and there were a lot of interesting people there. It was all very friendly, in part I think because we are all so early in our careers. There was none of the “let me interrupt to talk about why my research is superior, disguised as a question, for 10 minutes”, which was nice. I also saw quite a few interesting talks, in particular one on the Erdős-Ko-Rado conjecture that I might try to turn into a blog post at a later date. No promises though, since apparently things keep popping up that keep me from blogging for weeks at a time. It should be easier for me to write regularly now that term is over, but no promises on the topics.

Graph embeddings – The Map Colour Theorem

Happy 2015 everyone! Last year was a good one. I graduated with a good undergraduate degree, and passed my first semester in grad school. I’ve spent some time thinking about what area I want to specialise in during the next four and a half years. I’m leaning towards graph theory, and in particular graph embeddings, as my specialisation in grad school. This will be the first post of many on graph embeddings: why they are interesting, and various useful techniques. I’m sure most of you have heard of the map colour theorem before: it states that given a simple planar graph, the vertices can be coloured using at most four colours such that no two adjacent vertices have the same colour.

This result was first conjectured in the 19th century by Francis Guthrie. There were many attempted proofs, but no correct ones until over 100 years later. The final proof was done using a computer. It is based on the idea that if the conjecture is false, then there exists a smallest graph which requires five colours to be coloured. The proof showed that no such counterexample can exist. They managed to show that most graphs can be reduced to smaller ones with the property that if the smaller map can be 4-coloured, so can the original. Using this idea, they reduced the infinitely many planar simple graphs to 1936 cases that were checked by computer. The complete proof was announced in 1976 by Kenneth Appel and Wolfgang Haken at the University of Illinois. It was considered quite controversial at the time, and still is to some, since it was the first proof of a major theorem that was done in large part by computer. Several people have claimed to have found major errors in the proof, but none have stood up. The proof has also been made more efficient by several computer scientists, minimising the chance that there might be a mistake in it. In 2005, Benjamin Werner and Georges Gonthier formalised the proof in the Coq proof assistant (possibly a future post). This means that we don’t have to trust that Appel and Haken’s algorithms are correct any more; we just have to trust Coq.

One of the first things proven in connection with the four colour theorem was the five colour theorem. It was proven by Percy John Heawood in 1890, and is based on Kempe’s incorrect 1879 proof of the four colour theorem. The proof is by contradiction, as follows:

We assume that G is the smallest planar graph which cannot be coloured using only five colours. Using Euler’s formula we can show that either every vertex has degree at most 4, in which case the graph is trivially five-colourable (colour greedily), or G has a vertex, say v, of degree 5. Now consider the graph G', obtained by removing the vertex v; it had five neighbours. Since G' is smaller than G, it can be five-coloured, so these neighbours, v_1, \dots, v_5 (in cyclic order around v), are coloured with at most five different colours. If they are coloured with fewer than five colours, we can use one of the unused ones for v to obtain a five-colouring of G. Hence they all have different colours: 1, 2, \dots, 5. Consider the subgraph G_{13} containing only the vertices coloured 1 or 3. Either v_1 and v_3 are in different components of G_{13}, or they are in the same component. If they are in different components, then we can swap colours 1 and 3 in the component containing v_1, freeing up colour 1 for the vertex v. Hence they are in the same component, so there is a path joining v_1 and v_3 using only colours 1 and 3. Now consider the subgraph G_{24}. By a similar argument, the vertices v_2 and v_4 must be connected by a path coloured only with colours 2 and 4. However, since the vertices were labelled cyclically around v, planarity forces this path to cross the one between v_1 and v_3, which is a contradiction. Hence G can be five-coloured.

A few last words: the four colour theorem can be generalised to graphs on any surface, and this is where graph embeddings come in. Heawood conjectured that the maximum number of colours needed for any graph embedded in a surface of Euler characteristic \chi is given by the formula p = \lfloor \frac{7+\sqrt{49-24\chi}}{2} \rfloor. This was proven true by Ringel and Youngs in 1968, with one exception. I’ll talk more about this in my next math post.
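The Heawood bound is easy to evaluate. A quick Python sketch (function name my own; the integer square root gives the same floor as the real one here):

```python
import math

def heawood_number(chi):
    # floor((7 + sqrt(49 - 24*chi)) / 2), computed in exact integer arithmetic
    return (7 + math.isqrt(49 - 24 * chi)) // 2

print(heawood_number(2))  # sphere: 4, the four colour theorem
print(heawood_number(1))  # projective plane: 6
print(heawood_number(0))  # torus: 7
```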

First Week of Classes

I haven’t been blogging like I should during the past two weeks, and I apologise for that. A couple of days ago, I finally finished writing up an article on some results in graph theory from my undergrad with my advisor, and it was submitted for publication. As invaluable as the experience was, I have to say that redrawing the figures for the article 4 times was not the most thrilling thing I have ever experienced. Having it submitted is a huge weight off my shoulders, and I can now start really focusing on my classes. As you may have figured from the title of the post, I just had my first full week of classes. Technically, Wednesday last week was my first day, but that wasn’t a complete week. After over a week of classes, what can I tell you about grad school?

As in most mathematics graduate programs, the first year consists of basic classes in algebra, analysis and topology. I didn’t manage to test out of algebra, as I thought I might, but that also makes me happy, because it means they have very high standards on what they expect grad students to know. The algebra class is a very standard lecture course, with the added fun of problem sets due every class (3 times a week), not every week. It’s keeping me busy, but it’s not challenging yet. Like I said, I’ve covered a lot of the material before. Analysis, on the other hand, was my main weakness as an undergrad. It’s also a standard lecture course, and I’m finding that it takes a lot of my time, simply because I haven’t quite figured out how to do analysis problems yet. I’m trying to work on one problem every day, either from the problem set or from the book, just to get used to the style and way of thinking. I’m also trying not to complain too much about it while I’m working, so I don’t annoy my office mates to death.

Topology is the most interesting of my classes thus far. It’s taught using a version of the Moore method. We get a set of notes, containing definitions, theorems and problems. Every class, we get to claim the problems we have managed to solve using the notes, and then our lecturer goes through the list alphabetically, letting us solve one problem at the board. Once we finish all the problems in a set of notes, we get a new one. This sounds simple, right? Well, our entire grades also depend on the problems we solve at the board. If you can’t solve any of the problems that are left, you lose your turn. While you are up there, anyone in the room can ask you to prove anything you state on the board, just to make sure you know what you are talking about. It’s brutal, intense and amazing. I love the class. I feel like I learn more since I have to figure it out on my own, and I know I become a better presenter every time I go up there.

In addition to these classes, I also managed to convince one of the professors to let me participate in his research seminar on graph theory and matroid theory. It promises to be very interesting. We have only had one seminar so far, but I have a lot of interesting thoughts to follow up on. The focus of the first half of the semester is graph embeddings, which I studied a little for my senior thesis, so I even have a slight head start on some of the older grad students.

I’ll be back in a few days with some more about the social life in grad school!

Ramsey Theory

I just today received the forms I need from U to apply for my student visa. Now all that remains is to explain repeatedly, on forms and in an interview, that no, I am not a terrorist. I’m also still only halfway through my exam term. My senior thesis is submitted and four out of my eight exams are done. And then there is the oral defense of my senior thesis. But half of it is done. Today I’m going to explain something I came across while revising for my combinatorics exam, namely Ramsey theory.

The basic idea is that any sufficiently large structure will contain some ordered substructure. One example is the Bolzano-Weierstrass theorem in analysis. In a simple form, it states that every bounded sequence in \mathbb{R} has a convergent subsequence. So there is a subset with some type of structure.

For Ramsey theory, we instead look at graphs. We take the complete graph K_n and colour each edge either red or blue. The Ramsey number R(s,t) is the smallest positive integer such that any two-colouring of the edges of K_{R(s,t)} contains a red K_s or a blue K_t. To see how this works, I suggest you try to colour the edges of a K_6 such that there is no monochromatic triangle (K_3).

We are going to prove that R(3,3) = 6. First, see the picture below for a colouring of K_5 with no monochromatic triangle. From this we deduce that R(3,3) > 5.


Source: Wikipedia

Consider a K_6. If we take the vertex labelled 1 and focus on the five edges incident to it, it is easy to see that at least three of them must be of the same colour, say red. We call these edges \{1, x_1\}, \{1, x_2\}, \{1, x_3\}. Now, we look at the edges connecting the vertices x_1, x_2, x_3. There are three of them, and either all of them are blue, in which case we have a monochromatic blue triangle, or at least one of them is red. We may assume that the edge \{x_1, x_2\} is red. Then the triangle 1, x_1, x_2 is monochromatic red. Hence we always have a monochromatic triangle in any colouring of K_6, and so R(3, 3) = 6.

Ramsey’s theorem states that each of these Ramsey numbers exists. Proving it is not difficult, though it can be a bit fiddly. I would recommend looking up a book on Ramsey theory, because it is a very interesting subject. In practice, these numbers are very difficult to find. Only some of the smaller ones have been found exactly, and others have been bounded. Finding a lower bound is as simple as presenting a colouring with no red K_s or blue K_t. Finding upper bounds is much more difficult, since the argument used above often does not work. So one either has to find another general method, or check each colouring of the candidate upper bound. This is a computationally difficult problem and hence takes a lot of time.
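For the smallest case, the exhaustive check is still feasible. Here is a Python sketch (names my own) that verifies R(3,3) = 6 by checking all 2^{10} colourings of K_5 and all 2^{15} colourings of K_6:

```python
import itertools

def has_mono_triangle(n, colouring):
    # colouring maps each edge (i, j) with i < j to colour 0 or 1
    return any(colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
               for a, b, c in itertools.combinations(range(n), 3))

def all_colourings_have_mono_triangle(n):
    edges = list(itertools.combinations(range(n), 2))
    return all(has_mono_triangle(n, dict(zip(edges, bits)))
               for bits in itertools.product((0, 1), repeat=len(edges)))

print(all_colourings_have_mono_triangle(5))  # False: the K_5 colouring above exists
print(all_colourings_have_mono_triangle(6))  # True, so R(3,3) = 6
```

This brute force is exactly why the approach stops working quickly: the number of colourings of K_n grows like 2^{n(n-1)/2}.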

Ramsey numbers can be extended to arbitrarily many colours, and can be shown to exist for any finite number of colours. In this case, finding the precise numbers becomes even more difficult, as they must be larger. We will prove that R(3, 3, 3) \leq 18. To do this, we will use the known value R(3, 6) = 18.

We call the third colour yellow. First, we combine the colours blue and yellow into a single colour, blue-yellow. We then use R(3, 6) = 18 to deduce that any colouring of K_{18} must contain either a red K_3, in which case we are done, or a blue-yellow K_6. In the second case, we use that we have shown R(3, 3) = 6 to deduce that this blue-yellow K_6 must contain a monochromatic blue or yellow triangle. Hence we have shown that R(3, 3, 3) \leq 18.