Quotients and Homomorphisms for Beginners: Part 6, Modern Ideas of Category Theory AKA Quotients Done Right

Some universities have adopted the convention of naming a first course in algebra “Modern Algebra,” as opposed to the traditional (at least for the past few decades) “Abstract Algebra,” or just “Algebra.” I am not a fan of this convention in most cases. The reason is simple: most first courses in algebra fail to mention category theory; students may not even see a single commutative diagram! I am not an algebraist per se, but my work involves a tremendous amount of algebra from representation theory, Galois theory, and much, much more. I cannot get through a lecture or a paper without seeing something like:


(pg. 38, Integral p-adic Hodge Theory by Bhatt, Morrow, and Scholze)

Category theory is the common language of modern algebra, and it has been for a few decades now; indeed, category theory is approaching a level of universal understanding to rival naive set theory as a lingua franca of mathematics as a whole. (Of course, categories include sets – well, classes – so set theory will remain.) I am of the opinion that all undergraduate pure mathematics majors ought to see some category theory in their required introductory algebra course(s). The beginnings of the theory are not difficult to understand, and the framework is far too important, far too beautiful not to know. Some “Modern Algebra” courses do include a discussion of category theory, e.g., MIT’s Math 18.703, and I commend the departments in which that decision was made, for I believe it a wise one. It is fairly standard to introduce category theory by spreading the ideas throughout a first course in graduate algebra, where the ideas naturally arise. (Categories are of obvious use in undergraduate algebra as well, but such courses essentially serve to introduce the language of algebra in broad terms, so matters never get complex enough to find significant simplification via categorical techniques.) Some departments, including my own, I am sad to say, lack a regular offering of basic category theory in and of itself, however, and so the primary exposure a student, undergraduate or graduate, will have to category theory is through the fundamentals of algebra.

In this post, we hope to demonstrate how quotient groups and the first isomorphism theorem fit into the modern picture of algebra as well as why it might be useful to take this view. We begin by introducing the basic ideas of category theory, suitable for anyone with a bit of mathematical maturity as well as some exposure to fundamental mathematical objects, in particular sets, groups, vector spaces, and topological spaces; if you do not know what all of those objects are, you may still stand to gain something from reading this, so I encourage you to try to understand the examples involving objects you know, perhaps doing some searching on the web to understand the objects you do not. You will want to know all of these objects as a serious student of mathematics, anyway, as they are soundly in the class of things every math student ought to know. Without further ado, we define a category, thus embarking on our journey to understand the framework that has quickly become the preferred language of mathematicians the world over and which has been suggested as an alternative foundation for mathematics. We cover somewhat more category theory than is strictly needed.

Definition. A category \mathcal{C} consists of the following data:

  • A class of objects, denoted \mathbf{Ob}(\mathcal{C})
  • A class of arrows between objects, where the set of arrows between objects X, Y \in \mathbf{Ob}(\mathcal{C}) is denoted \mathrm{Hom}_{\mathcal{C}}(X,Y)
  • A sensible operation of composition on arrows, i.e., if X, Y, Z \in \mathbf{Ob}(\mathcal{C}), then there is a map \displaystyle \mathrm{Hom}(Y,Z) \times \mathrm{Hom}(X,Y) \longrightarrow \mathrm{Hom}(X,Z) such that:
    • The composition is associative in the sense that if f: X \to Y \, , \, g: Y \to Z \, , \, h: Z \to W are arrows, then h \circ (g \circ f) = (h \circ g) \circ f, where the notation \circ denotes the obvious composition of arrows as one is accustomed to in the special cases of functions. (Note: just as in the case of groups and other such structures, this makes the expression f_1 \circ f_2 \circ \cdots \circ f_n unambiguous — Prove!)
    • There exists an identity arrow for each object, i.e., if X \in \mathcal{C}, then there exists \mathrm{id}_X \in \mathrm{Hom}_{\mathcal{C}}(X,X) so that for every f: X \to Y in the category we have f \circ \mathrm{id}_X = f; similarly, for every g: Z \to X in \mathcal{C}, we have \mathrm{id}_X \circ g = g.
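To make the axioms concrete, here is a minimal sketch (my own, not from any particular text) of a tiny finite category in Python: the objects are 0, 1, 2, with a unique arrow i \to j whenever i \leq j, so composition is forced and the axioms can be checked by brute force.

```python
from itertools import product

# Objects 0, 1, 2, with a unique arrow i -> j whenever i <= j.
objects = [0, 1, 2]
# Since there is at most one arrow between any two objects, an arrow is
# fully described by its (source, target) pair.
arrows = [(i, j) for i, j in product(objects, repeat=2) if i <= j]

def compose(g, f):
    """Compose g after f; defined only when f's target is g's source."""
    assert f[1] == g[0], "arrows not composable"
    return (f[0], g[1])

identity = {x: (x, x) for x in objects}

# Identity laws: f . id = f and id . f = f.
for f in arrows:
    assert compose(f, identity[f[0]]) == f
    assert compose(identity[f[1]], f) == f

# Associativity: h . (g . f) == (h . g) . f for all composable triples.
for f in arrows:
    for g in arrows:
        if g[0] != f[1]:
            continue
        for h in arrows:
            if h[0] != g[1]:
                continue
            assert compose(h, compose(g, f)) == compose(compose(h, g), f)
```

Nothing here is deep — the point is only that the data of a category is small enough to enumerate when the category is finite.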

This definition takes far more words than it ought to. The idea is really quite simple. It is so simple that it may even be described as an evolution in notation. The eighteenth century point of view of a function was more or less “I know it when I see it,” and then set theory formalized this a bit, and we began writing functions using the now omnipresent notation f: X \to Y (or X \overset{f}{\to} Y). This represented somewhat of a shift toward considering the relations (maps) between sets important, rather than the sets (objects) themselves. Category theory takes this to the extreme by essentially doing away with objects altogether.

Before diving into examples, I feel an obligation to mention some matters of notation as well as one technicality. Some authors write \mathrm{Hom}(X,Y) as \mathrm{Mor}(X,Y), where “Mor” is clearly an abbreviation for “morphism.” It is a common abuse of notation to write X \in \mathcal{C} in place of X \in \mathbf{Ob}(\mathcal{C}), and similarly, some might write f \in \mathcal{C} for f \in \mathrm{Hom}(X,Y) for two objects X, Y \in \mathcal{C}. Frequently, categories are denoted with fancy fonts, especially bold (\LaTeX: \mathbf{}, e.g., \mathbf{C}), calligraphic (\LaTeX: \mathcal{}, e.g., \mathcal{C}), and script (\LaTeX: \mathscr{}, e.g., \mathscr{C}). Usually, authors have enough sense to write composing g after f as g \circ f, but sometimes the order is reversed, or people get lazy and write gf. Finally, the reason we require the collection of objects to be a class rather than a set is a technical one, which can almost always be ignored; a class is a collection of objects that can be defined by a property all its members share, and while every set is a class, some classes (called “proper” classes) are too large to be sets. The idea of a class was rigorously formulated to avoid some paradoxes found in naive set theory around the turn of the twentieth century.

For the sake of concreteness, let us look at a stupid example of a very small category, one which is easily diagrammed. The category \mathbf{3} is represented by the following commutative diagram:


(pg. 8, Category Theory, Second Edition, Oxford Logic Guides, Steve Awodey)

Note that I am borrowing other people’s commutative diagrams principally because I am lazy; I am not particularly proficient at TeX’ing such diagrams, though these simple ones are trivial to do.

The identity arrows are omitted, as per usual, because they are annoying to write. Note that the symbols are essentially arbitrary objects. It should be easy enough to verify the axioms are satisfied. I leave this as an exercise to the reader. Let us look at a less trivial example now.

The category \mathbf{Sets} consists of sets as objects and functions as arrows. This is in some sense the most concrete category we have to work with. Try to play with this idea. Here is a question: can you see why \mathbf{Ob}(\mathbf{Sets}) must be a proper class rather than a set? This can be answered by doing nothing more than citing a famous paradox. Sometimes, we write specific categories in bold, italics, print, or with an underline. One might let sets denote the finite sets in \mathbf{Sets}, which is a convenient notation, as it clearly generalizes to other categories.

The category Vec_k consists of the vector spaces over a field k with arrows between them being linear maps (linear transformations). The category \mathbf{Top} consists of topological spaces as objects. If you know anything about topology, try to guess at what the maps are.

The category \mathbf{Grp} has as its objects groups and as its arrows group homomorphisms. That is, \mathbf{Grp} consists of all groups and all homomorphisms between them. This might sound like a big category, and indeed, it is fairly large. Things are not always so bad, however. What I mean by this is that, for instance, \mathrm{Hom}(X,Y) might contain nothing but the trivial homomorphism. Indeed, the objects can be quite simple as well, e.g., \{e\} \in \mathbf{Grp}. What do you think the subcategory \mathbf{Ab} is?

A beautiful consequence of the formalism of category theory is that we can make concrete the idea that many theorems transfer from one algebraic object to another. For instance, the isomorphism theorems hold not just for vector spaces but for groups as well. This also shows us why abstraction can be useful in some cases. My favorite example here is from analysis, because few texts take this approach, even at the graduate level, but my school’s text (written by two of our own) does. The text Integration and Modern Analysis by Benedetto and Czaja introduces Lebesgue integration first in the special case of \mathbb{R}, then in the somewhat more general case of \mathbb{R}^n, then in full generality for general spaces X. This gets pretty technical, but the gist is this: the authors start concrete (the first chapter is devoted to the classical study of real variables), and the second chapter begins with the aforementioned spaces before generalizing massively. The generality takes some effort at the start, but it helps in the long term. In particular, no longer do theorems need to be proven again for every change of setting, so a lot of time is saved.

Definition. If \mathscr{C} is a category and f: X \to Y is an arrow in \mathscr{C}, then f is an isomorphism if and only if there exists an arrow g: Y \to X in \mathscr{C} such that g \circ f = \mathrm{id}_X and f \circ g = \mathrm{id}_Y.

Naturally, we might also call isomorphisms invertible arrows, and we might call the map g the inverse of f. It is easy to check that inverses are unique, and that if f,g are invertible, then so is their composition. What else might you be able to say about invertibility and composition? If there exists an isomorphism between objects in a category, say X and Y, then we say X and Y are isomorphic, X \cong Y. Try to figure out what the isomorphisms for \mathbf{Sets} and \mathbf{Grp} are, and investigate how many isomorphisms there might be between elements. 
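As a nudge toward the exercise above: in \mathbf{Sets} the isomorphisms are exactly the bijections, and between two n-element sets there are n! of them. A quick count in Python (my own illustration, with arbitrarily chosen sets):

```python
from itertools import permutations

# Two 3-element sets; each bijection X -> Y corresponds to a permutation
# of Y, so there are 3! = 6 isomorphisms in Sets.
X = ["a", "b", "c"]
Y = [1, 2, 3]
bijections = [dict(zip(X, p)) for p in permutations(Y)]

# Each bijection f has an inverse g with g(f(x)) = x and f(g(y)) = y,
# which is precisely the categorical definition of an isomorphism.
f = bijections[0]
g = {v: k for k, v in f.items()}
assert all(g[f[x]] == x for x in X)
assert all(f[g[y]] == y for y in Y)
```

In \mathbf{Grp} the analogous count is smaller, since an isomorphism must also respect the group operation.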

A clear extension of isomorphisms are automorphisms, which are isomorphisms from an object X to itself. We denote the set (indeed, group) of automorphisms of X by \mathrm{Aut}(X), as expected.

Definition. A groupoid is a category in which all arrows are invertible.

Here is an exercise from Ravi Vakil’s The Rising Sea: Foundations of Algebraic Geometry, which I cannot resist including here. I highly recommend Vakil’s notes, by the way, and I am amazed he has yet to formally publish them. The exercise is as follows: realize a group as a groupoid with a single object.

I should note that every example of a category considered here is a so-called concrete category, which is one which can be realized in an obvious way as having objects and maps with additional structure between said objects. Evan Chen’s An Infinitely Large Napkin (currently located in Chapter 22.3 on page 235) provides a good discussion of the important example of an abstract category, that of posets. If you are interested, you can look there. 

It would be remiss of me not to mention some other arrows of sorts in categories. I will not give the topics of functors, natural transformations, dual categories, or isomorphism and equivalence of categories nearly enough attention, but I think I should at least mention them.

Definition. A (covariant) functor \mathcal{F}: \mathscr{C}_1 \to \mathscr{C}_2 is an arrow associating every object X \in \mathscr{C}_1 with an object \mathcal{F}(X) \in \mathscr{C}_2 and associating to every arrow f: X \to Y in \mathscr{C}_1 an arrow \mathcal{F}(f): \mathcal{F}(X) \to \mathcal{F}(Y) in \mathscr{C}_2 so that \mathcal{F}(g \circ f) = \mathcal{F}(g) \circ \mathcal{F}(f) whenever f: X \to Y and g: Y \to Z are morphisms in \mathscr{C}_1. Furthermore, \mathcal{F}(\mathrm{id}_X) = \mathrm{id}_{\mathcal{F}(X)} for all objects X.

Functors are usually assumed to be covariant, but they need not be. Contravariant functors simply reverse the directions of all arrows. Somewhat more precisely, a contravariant functor can be defined as a covariant functor on the dual (or opposite) category, which is the category derived from another by simply reversing all arrows. A stupid example of a functor is the identity functor or the inclusion functor. Functors where some structure is lost, such as the functor Vec_k \to \mathbf{Sets} sending a vector space to its underlying set, are called forgetful. A functor that is injective on Hom-sets is called faithful, whereas one that is surjective on Hom-sets is called full; a functor that is both is called fully faithful. Functors are most frequently written in capital print, calligraphic font, or capital Greek letters.
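The functor laws are easy to spot-check on a toy example. Here is a sketch (my own, not from the post) of the "list" functor on \mathbf{Sets}, which sends a set X to the lists over X and an arrow f: X \to Y to the map applying f elementwise:

```python
# The list functor: on arrows, F(f) maps a list elementwise.
def F(f):
    return lambda xs: [f(x) for x in xs]

f = lambda x: x + 1      # an arrow X -> Y (here X = Y = the integers)
g = lambda y: 2 * y      # an arrow Y -> Z
sample = [1, 2, 3]

# Functoriality: F(g . f) = F(g) . F(f), checked on sample data.
assert F(lambda x: g(f(x)))(sample) == F(g)(F(f)(sample))
# F preserves identities: F(id) = id.
assert F(lambda x: x)(sample) == sample
```

Of course, checking on sample data is not a proof, but it shows exactly what the two functor axioms demand.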

Definition. Let F, G be functors from \mathscr{C} to \mathscr{D}. A natural transformation \eta: F \to G is a collection of arrows associating to every X \in \mathscr{C} an arrow \eta_X: F(X) \to G(X) (the component of \eta at X) such that for every f: X \to Y in \mathscr{C} the corresponding diagram commutes, i.e., \eta_Y \circ F(f) = G(f) \circ \eta_X. (Sometimes people write a curly arrow in the middle of the diagram to denote the direction in which to read off the commutativity.)


(Wikimedia Foundation)

A natural transformation in which each component is invertible is called a natural isomorphism. There is a natural composition law on functors and on natural transformations. The category of functors is denoted \mathrm{Funct}. (Why not have a category \mathrm{Nat}?) This can be taken infinitely further by doing higher category theory with 2-categories and even \infty-categories.

If there is a functor between categories such that there exists another functor in the reverse direction so that their compositions are the identity functors, then the categories are called isomorphic. The inverse here is unique. If there is a functor between categories such that there exists another in the reverse direction so that their compositions are naturally isomorphic to the identity functors, then the functor is an equivalence of categories. Here inverses (often called quasi-inverses) are not unique, and hence we do not usually use the term “inverse” at all.

In his notes, Vakil promises not to present excessively more than is needed (i.e., no topoi). In that vein, I will not present any more general category theory except the essential idea of universal properties, which we will look at through groups (i.e., no Yoneda lemma).

Definition. Let G be a group and H be a normal subgroup of G. The quotient G/H in the categorical sense is a group K alongside a homomorphism k: G \to K such that \mathrm{Ker}(k)=H, which is universal among all homomorphisms of that form in the following sense: if \phi: G \to G' is any homomorphism such that H \subset \mathrm{Ker}(\phi), then a unique homomorphism \tilde{\phi}: K \to G' is induced, which makes the following diagram commute.


(Joseph R. Heavner — I already had this diagram in a document of mine, so there I had no excuse but to reproduce it myself.)

It does require a proof to show that \mathbf{Grp} admits such quotients, but we will not bother here, because it is not too different from things we have already seen. Notice that this absolutely kills the proof of the first isomorphism theorem. Take K = G/H with the canonical projection, and you are done! (OK, it does require a bit more of a detailed argument than this, but that is what exercises are for, right?)

This explains the alternative nomenclature for quotient groups as “factor groups.” Any group homomorphism f: G \to G' factors through the quotient of the domain by the kernel of the map. 
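The factorization is concrete enough to compute. Here is a sketch (my own example) with f: \mathbb{Z}_{12} \to \mathbb{Z}_4 given by reduction mod 4: its kernel is H = \{0, 4, 8\}, and f factors as \tilde{f} \circ \pi through the quotient by H.

```python
# f: Z_12 -> Z_4, reduction mod 4; its kernel is H = {0, 4, 8}.
n, m = 12, 4
f = lambda x: x % m
H = frozenset(x for x in range(n) if f(x) == 0)
assert H == frozenset({0, 4, 8})

# The canonical projection pi: G -> G/H sends x to its coset xH.
pi = lambda x: frozenset((x + h) % n for h in H)
cosets = {pi(x) for x in range(n)}
assert len(cosets) == n // len(H)          # |G/H| = |G| / |H|

# The induced map f_tilde on cosets is well defined: f is constant
# on each coset, so picking any representative gives the same value.
f_tilde = {C: f(min(C)) for C in cosets}
for C in cosets:
    assert all(f(x) == f_tilde[C] for x in C)

# The factorization: f = f_tilde . pi on all of G.
assert all(f(x) == f_tilde[pi(x)] for x in range(n))
```

The last assertion is precisely the commuting triangle of the universal property, written out pointwise.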

I have always found thinking of the first isomorphism theorem in this way to be most revealing. Beyond that, the theorem allows us to prove that a quotient is isomorphic to a group simply by finding a surjective homomorphism whose kernel satisfies the necessary condition; this is almost always how it is done in practice. Universal properties are quite prevalent in mathematics; another example of something with a universal property is the direct sum of vector spaces.

And, with that we bid adieu to the series. I hope you have learned something. In particular, I hope the motivation was helpful, the basic theory is now clear, the problems have allowed for some proficiency in solving problems involving this subject matter with relative ease, and you now have some appreciation for the modern perspective.


Quotients and Homomorphisms for Beginners: Part 5, The Isomorphism Theorems and Their Significance

Preface. This post has been in my drafts for over a month. Eventually, I will finish this eight part series, but expect other posts to be interspersed. In my drafts are also an introduction to Galois theory, a post on the unreasonable effectiveness of mathematics in the natural sciences (this relates to an expository paper I recently wrote), rationalism, Sylow’s theorems, and the Jordan-Hölder program. I also hope to write about my research a bit. We shall see.

In this post we will explore three important theorems regarding isomorphisms and quotient groups. We will introduce these using fairly familiar language and try to gain some intuition for their significance. In a later post we shall view these crucial theorems from a somewhat more abstract and modern point of view, namely that of categories, which one might call the “universal perspective.” For most introductory undergraduate courses, these theorems, sometimes along with the Sylow theorems, which I may write about eventually, mark the grand culmination of all the group theory for the course. Some courses do not even include the second and third theorems; admittedly, these are frequently considered less fundamental than the first.

Before moving on I should note that different texts use different labeling and sometimes do not even use these names (Lang’s Algebra, for instance), or use “Homomorphism” instead and present a more general set of theorems.

Note that \simeq and \cong are equivalent for our purposes. Recently, the author has begun to prefer the latter for isomorphisms in all scenarios, whereas previously that was only the case when \simeq denoted some other equivalence as well.

First Isomorphism Theorem. Let f: G \to G' be a group homomorphism, then \mathrm{Im}(f) \cong G / \mathrm{Ker}(f) via \psi: gH \mapsto f(g), where H = \mathrm{Ker}(f).


We will prove the special case where f is surjective. Modifying the proof to the general case is not difficult and is left as an exercise.

Let H = \mathrm{Ker}(f). Define the canonical map \phi: G/H \to G' by \phi(gH) = f(g). This is well defined, for the image is independent of the coset representative: if gH = g'H, then g^{-1}g' \in H, so f(g) = f(g'). We now prove this map is indeed a homomorphism. In particular, we show \phi(gH \cdot hH) = \phi(gH) \star \phi(hH). By direct computation,

\phi(gH \cdot hH) = \phi( (gh)H ) = f(gh) = f(g) f(h) = \phi(gH) \phi(hH)

where f(gh)=f(g)f(h) is because f is assumed a homomorphism.

Clearly, \phi is onto; showing this explicitly is hardly worth the space at this point. We now need only prove \phi is injective, which will show it is a bijection. We use the well-known fact that a group homomorphism with trivial kernel is injective. (Note that H is the identity in G/H.) Suppose x = gH lies in the kernel of \phi, so that \phi(x) = f(g) = e'. Then g \in H and x = H, meaning the kernel of \phi is trivial, as desired.


I do not want this to be unmotivated or uninteresting, so I will explain why we would even think of this theorem or care about it as well what it means in more plain English.

Whenever we are considering mappings, f_i: G_i \to H_i, we want to consider composition maps. It is natural to see if we can’t go from G_1 \to H_1 \to H_2, for example. There are many legitimate applications for this sort of thing, which you should be comfortable with from elementary algebra. This makes it more obvious why we might want to examine something like the first isomorphism theorem.

The first isomorphism theorem just says that if you want to go from G to H, you can always go through G/\mathrm{ker} (f) in a very natural way. This is a beautiful theorem. It has analogs for most other algebraic structures we care about (e.g., vector spaces), and it is an example of something important categorically, which we will discuss in the next (and final) post.
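A quick sanity check of the counting behind the theorem, in Python (my own example): the sign homomorphism f: S_3 \to \{+1, -1\} has kernel A_3, and the number of cosets of the kernel matches the size of the image.

```python
from itertools import permutations

# The sign of a permutation of {0, 1, 2}, via counting inversions.
def sign(p):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return 1 if inversions % 2 == 0 else -1

G = list(permutations(range(3)))          # S_3, as tuples
ker = [p for p in G if sign(p) == 1]      # the kernel: A_3
assert len(ker) == 3

def mult(p, q):                           # composition: p after q
    return tuple(p[q[i]] for i in range(3))

# Cosets of the kernel, and the image of f.
cosets = {frozenset(mult(g, h) for h in ker) for g in G}
image = {sign(g) for g in G}
# |G / Ker(f)| = |Im(f)|, as the theorem demands.
assert len(cosets) == len(image) == 2
```

Matching sizes is of course weaker than exhibiting the isomorphism, but the coset-to-image correspondence g \cdot A_3 \mapsto \mathrm{sign}(g) is exactly the map \psi from the theorem.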

Second Isomorphism Theorem. Let H < G and let N be normal in G. Then, H \cap N is normal in H and H/(H \cap N) \cong HN/N.

Proof.  Omitted (Exercise)

Third Isomorphism Theorem. Let K \subset H be two normal subgroups of G. Then, G/H \cong (G/K)/(H/K).

Proof. Omitted (Exercise)

At first, I thought I would prove both of these theorems, but I have decided not to for two reasons. First, the proofs are pretty easy from the first isomorphism theorem and make for great practice. Second, these are neither frequently useful nor frequently considered fundamental or deep. The third isomorphism theorem is also very easy to remember (“cancel” the K), and the proof is easy enough to recreate at will.


In the sixth and final post of this series, we will reset the first isomorphism theorem in more modern language: that of category theory. We will see a useful property of the group-theoretic quotient, see how the same idea applies in other areas, and see where the alternative name “factor group” comes from.


Quotients and Homomorphisms for Beginners: Part 4, Homomorphisms, Isomorphisms, Kernels, Normality, and the Theorems of Cayley and Lagrange

This post is admittedly somewhat of a digression, but in order to proceed we should develop some general theory. So, we begin with some interesting types of groups.

Definition. A group G is said to be cyclic if there exists a \in G such that every g \in G can be written as a^n for some integer n. The group is then said to be generated by a, denoted \langle a \rangle.

We also naturally have cyclic subgroups of groups.

Example. The group \mathbb{Z} under addition is cyclic with generators 1 and -1.

Theorem. All cyclic groups are abelian.

Proof. Let G = \langle g \rangle and take a, b \in G, writing a = g^n and b = g^m for some integers m, n. In any group we have the usual exponent properties when the base is shared, so a \cdot b = g^n \cdot g^m = g^{n+m} = g^{m+n} = g^m g^n = b \cdot a. We can use the commutativity of + here because it is not some abstract binary operation but rather the well-known addition of integers. \blacksquare

We will now prove an incredibly important theorem on the order of a group.

Lagrange’s Theorem. For a finite group G, the order of any subgroup H \leq G divides the order of G.

Proof. The left cosets of H in G are equivalence classes which partition G, as we have seen before: we relate x, y \in G by x \sim y \Leftrightarrow \exists h \in H : x = yh. Next we show all cosets have the same magnitude. Define the map \varphi: aH \to bH by \varphi(x) = ba^{-1}x. This is a bijection, which we can exhibit simply by finding the inverse: \varphi^{-1}(y) = ab^{-1}y. Thus each coset has |H| elements, and the number of cosets times the order of H is the order of G. Therefore, the order of H divides the order of G. \blacksquare

This actually gives us a better form of the theorem. Let the index, or the number of (left) cosets of H in G, be denoted [G \, : \, H]. Then, we have shown that |G| = [G \, : \, H] \cdot |H|. In fact, one can generalize further to say that [G \, : \, K] = [G \, : \, H] \cdot [H \, : \, K] where G > H > K.
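Before attempting the exercise below, it may help to see the counting version of Lagrange’s theorem verified on a small example (my own): in G = \mathbb{Z}_{12}, the subgroup H = \{0, 3, 6, 9\} satisfies |G| = [G : H] \cdot |H|.

```python
# G = Z_12 under addition mod 12; H = <3> = {0, 3, 6, 9}.
n = 12
G = set(range(n))
H = {x for x in G if x % 3 == 0}

# The left cosets a + H partition G, and each has |H| elements.
cosets = {frozenset((a + h) % n for h in H) for a in G}
assert all(len(C) == len(H) for C in cosets)
assert set().union(*cosets) == G

# Lagrange: |G| = [G : H] * |H|.
assert len(G) == len(cosets) * len(H)
```

Here [G : H] = 3, and indeed 12 = 3 \cdot 4.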

Exercise. Show that every group of order less than six is abelian. (This is a classic which every student ought to spend some time on.) Hint: Use Lagrange’s theorem and the fact that cyclic groups are abelian.

Now, we move on to perhaps the most important subject in all of algebra and beyond. As one continues one’s study of mathematics, it becomes more and more obvious that that which one ought to care about is not the objects involved, even though they may be incredibly rich and interesting in their own right, but the relations involved: the morphisms between objects, to use some category theoretic lingo. Given two sets A and B, the morphisms we usually care about are the functions f: A \to B. But, if one uses functions to relate groups via their set-structure, it is clearly realized that the group structure may be lost: namely, functions do not preserve identities and inverses, the distinguished features of groups. If we wish to preserve this structure, we need something more: a homomorphism.

Definition. A group homomorphism from G to G' is a map \varphi: G \to G' such that for all a,b \in G

\displaystyle \varphi(a \cdot b) = \varphi(a) \star \varphi(b)

where \cdot is the operation in G and \star is the operation in G'.
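The defining property is easy to check exhaustively for small groups. A minimal sketch (my own example): reduction mod 3 is a homomorphism from (\mathbb{Z}_6, +) to (\mathbb{Z}_3, +).

```python
# phi: Z_6 -> Z_3, x |-> x mod 3. Here "." is addition mod 6 and
# "*" is addition mod 3.
phi = lambda x: x % 3

# Check phi(a . b) = phi(a) * phi(b) for every pair of elements.
for a in range(6):
    for b in range(6):
        assert phi((a + b) % 6) == (phi(a) + phi(b)) % 3
```

This works because 3 divides 6, so reducing mod 6 and then mod 3 is the same as reducing mod 3 directly.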

Exercise. Prove that group homomorphisms do indeed preserve identities and inverses, namely that \varphi(e) = e' (the identity maps to the identity) and \varphi(a^{-1}) = \varphi(a)^{-1}.

Let us get a concrete example or two down before moving on. But first we ask: given two groups, is there always a group homomorphism between them? We shall let the reader ponder this for a moment and in the meantime look at two examples.

Example. Any two cyclic groups of the same order admit a homomorphism between them. For, consider G, H, and let G = \langle a \rangle and H = \langle b \rangle. Given g \in G we have that g = a^p for some p, so we send g \mapsto b^p. This is well-defined, but one should check this fact. It also happens to have the defining property of homomorphisms, which one should also check.

This time, we will work out the details mostly for the sake of explaining why a certain well-known mathematical function is so important (well, one reason).

Example. The reals under addition and the positive reals under multiplication are homomorphic. The map, for any x \in (\mathbb{R}, +), is given by f(x) = e^x. It is well-known that e^x e^y = e^{x+y}, and so we have that this is a group homomorphism. This actually starts to reveal the connections between the study of differential equations and group theory, because the exponential is the function which solves the IVP y' = y, \; y(0) = 1. Phrased loosely, groups with differential structure are called Lie groups after Sophus Lie, and the field is a major area of active research.
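A numeric spot check of this example (my own, and necessarily approximate, since floating-point arithmetic is inexact):

```python
import math
import random

# e^(x+y) = e^x * e^y, up to floating-point rounding error.
random.seed(0)
for _ in range(100):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert math.isclose(math.exp(x + y), math.exp(x) * math.exp(y), rel_tol=1e-9)

# exp also sends the identity 0 of (R, +) to the identity 1 of (R_{>0}, *),
# as any homomorphism must.
assert math.exp(0) == 1.0
```

Note the use of math.isclose rather than equality: the homomorphism property is exact in \mathbb{R} but only approximate in floating point.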

There are all sorts of special homomorphisms. As it turns out, both of the aforementioned examples are isomorphisms, a very, very special kind of homomorphism indeed. Isomorphisms are in some sense much stronger than homomorphisms, and algebraists often think of two objects being isomorphic meaning they are essentially the same, up to relabeling.

Definition. An isomorphism is a homomorphism which is a one-to-one correspondence (a bijection).

For those who have forgotten, bijections are maps which are injective or one-to-one (two elements being distinct implies their evaluations under the map are distinct) and surjective or onto (all points in the codomain are mapped to). We leave it as an exercise to show that the two above homomorphisms are bijective and therefore isomorphisms.

It is worth noting here that an automorphism is an isomorphism from an object to itself, for instance the conjugation map on a group given by f(x) = axa^{-1}.

The author wishes to prove another theorem related to Lagrange’s theorem, but first we need to get past one last definition.

Definition. A permutation group is a group whose elements are permutations of a given set S and whose operation is composition of permutations, which are themselves bijections \rho_i : S \to S. The group of all permutations of a set is called the symmetric group of the set, denoted \mathrm{Sym}(S). Finally, in the case that S = \{1, 2, \ldots, n\} we call the symmetric group the symmetric group on n letters and denote it S_n.

We now prove an important theorem, but before moving on the reader is encouraged to “play” around with permutation groups a bit. For instance, look up notations for permutations, and prove that S_n for n \geq 3 is non-abelian.

Cayley’s Theorem. Every group G is isomorphic to a subgroup of the symmetric group on G.

Proof. Let H = \mathrm{Sym}(G), the group of permutations of the set G. Define f: G \to H by sending a \in G to the permutation \sigma_a defined by \sigma_a(g) = ag for all g. Then, f is an isomorphism onto its image. It is not hard to see that f is injective, so one simply checks that f(ab) = f(a)f(b). Let \tau = f(a), \tau' = f(b), \sigma = f(ab); then \tau(\tau'(g)) = \tau(bg) = a(b(g)) = (ab)g = \sigma(g). \blacksquare
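The embedding in Cayley’s theorem is entirely concrete. Here is a sketch (my own example) realizing \mathbb{Z}_4 inside \mathrm{Sym}(\mathbb{Z}_4) by left translation:

```python
# Z_4 embeds in Sym(Z_4): a |-> the permutation sigma_a(g) = a + g (mod 4).
n = 4
def f(a):
    return tuple((a + g) % n for g in range(n))   # sigma_a, as a tuple

images = {a: f(a) for a in range(n)}

# f is injective: distinct elements give distinct permutations.
assert len(set(images.values())) == n

# f is a homomorphism: sigma_{a+b} = sigma_a . sigma_b.
for a in range(n):
    for b in range(n):
        composed = tuple(images[a][images[b][g]] for g in range(n))
        assert composed == images[(a + b) % n]
```

So \mathbb{Z}_4 is isomorphic to the subgroup \{\sigma_0, \sigma_1, \sigma_2, \sigma_3\} of S_4, exactly as the theorem promises.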

This is clearly fundamental, and so we will leave it at that. Moving on, we shall prove a result on how normal subgroups manifest themselves.

Definition. The kernel of a homomorphism f, denoted \mathrm{ker}(f), is the inverse image of the identity.

Those who have taken linear algebra should be familiar with kernels in the context of linear transformations. The kernel and the image are two fundamental subgroups of group homomorphisms.

Theorem. Let f: G \to H be a group homomorphism, then ker(f) is a normal subgroup of G.

Proof. We leave the proof that the kernel is a subgroup at all to the reader. To check for normality, suppose h \in \mathrm{ker}(f); then f(ghg^{-1}) = f(g)f(h)f(g)^{-1} = f(g)f(g)^{-1} = e. Therefore, ghg^{-1} \in \mathrm{ker}(f). \blacksquare

It turns out that the converse is true as well! This means that every normal subgroup is realized as the kernel of some group homomorphism! In fact, we can always find said homomorphism, or at least one such.

Converse. Let H \triangleleft G, then H=ker(f) for some f:G \to G'.

Proof. Let H \triangleleft G. Define the canonical map \varphi: G \to G/H where \varphi(g) = gH. Given the rule (aH)(bH) = (ab)H, which one can check is well-defined, we have a group homomorphism. Its kernel is precisely H, since gH = H if and only if g \in H. \blacksquare

This is a beautiful proof because it is constructive. Given a normal subgroup, one can find at least one group homomorphism with said subgroup as its kernel.
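Since the construction is explicit, we can carry it out on a small example (my own): for H = \{0, 4\} normal in G = \mathbb{Z}_8, the canonical map \varphi(g) = gH has kernel exactly H.

```python
# G = Z_8, H = {0, 4}; phi(g) = g + H, the coset of g.
n = 8
H = frozenset({0, 4})
phi = lambda g: frozenset((g + h) % n for h in H)

# The identity of G/H is the coset H itself, and the kernel of phi is
# exactly the set of g with gH = H, i.e., H.
identity_coset = phi(0)
kernel = {g for g in range(n) if phi(g) == identity_coset}
assert kernel == set(H)

# phi is a homomorphism under the coset rule (aH)(bH) = (a+b)H:
# elementwise sums of two cosets give exactly the coset of the sum.
for a in range(n):
    for b in range(n):
        elementwise = frozenset((x + y) % n for x in phi(a) for y in phi(b))
        assert phi((a + b) % n) == elementwise
```

The second loop also illustrates why the coset rule is well-defined here: adding the cosets elementwise lands exactly on (a+b)H.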

We have admittedly met the climax of this post, but before we end I would like to prove one more simple but useful theorem.

Theorem. All left cosets aH of a subgroup H in G have the same order.

Proof. Those who have been paying close attention will already have realized this. You should note that if two sets have the same size (cardinality), then by definition we have a bijection between them; this is how we count, although we rarely think about counting in this way. The map h \mapsto ah is a bijection from H onto aH (its inverse is x \mapsto a^{-1}x), so |aH| = |H| = |bH| for all a, b, and all cosets are equal in size. \blacksquare
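The bijection in this proof is easy to see in a small example (my own): in \mathbb{Z}_{10} with H = \{0, 5\}, every translate a + H has exactly |H| elements.

```python
# G = Z_10, H = {0, 5}; every coset a + H has |H| = 2 elements,
# because h |-> a + h is a bijection from H onto a + H.
n = 10
H = {x for x in range(n) if x % 5 == 0}

for a in range(n):
    coset = {(a + h) % n for h in H}
    assert len(coset) == len(H)
```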


Math Isn’t So Black & White

Below is a reproduction of my response to someone claiming mathematics is binary in nature, that answers are always either wrong or right, that there is no ambiguity. For context, the claim was made as a comment on a Vlogbrothers YouTube video. The monologue is very to the point albeit inelegant, but hopefully it will prove enjoyable or educational.
This is not far off, I suppose, but it is not entirely true. Take, for instance, the question: Is there a cardinal number strictly between aleph-0 and the cardinality of the continuum? This is not answerable in the canonical axiomatic structure of mathematics; in math lingo it is said to be independent of ZFC. Moreover, the implications of mathematics can often be disputed. If you had told me that Gromov-Witten invariants would be important for science and presented me with the paper introducing them but nothing else, I probably would have said that they would not be, at least not for a long time. But almost immediately they were put to use in physics!
And while it is true that proof is absolute in that given the axioms as premises, the conclusion (theorem) must follow due to the validity of the proof, many mathematicians and philosophers still argue about what our premises should be. Are our axioms even consistent? What about completeness? Furthermore, what is interesting is very subjective, and this drives mathematics. Few people still study constructions via algebra these days because it is not deemed all that important except for historical reasons perhaps, but there is still plenty to be done in that field if one so wished to pursue it. Also, what if mathematicians cannot follow a peer’s work? This has been an issue lately with Shinichi Mochizuki’s work on deformation theory (“Inter-Universal Teichmüller theory”).
Finally, mathematicians are people, and people make mistakes, and sometimes those mistakes are missed. Nerdighteria’s resident mathematician Daniel Biss is actually best known in the math community for being an incredibly promising, bright, and hard-working student and young mathematician who published several massively influential papers which were later found to be false due to human error. (This is not to say he was a bad mathematician or anything of that sort. I chose him only because he is probably the best known modern example.)
Similar gripes could be made with saying any discipline is so decisive. In a way, math shows us that such decisiveness can never occur.

Math Makes Physics Easy

When freshman undergraduates take introductory physics, perhaps the most common complaint, aside from the incredibly creative “this is hard,” is that the material is too mathematically involved. Classical mechanics, electrodynamics, and the other topics covered in introductory physics almost exclusively involve calculus (differential equations, really), elementary algebra, and some basic geometry and trigonometry. Traditionally, calculus is the mathematics course undergraduates at large struggle with. So, it appears these students think the calculus makes it hard. While certainly students who struggle with calculus will struggle with physics, those who are more mathematically inclined will probably argue quite the opposite: the math is the easy part! Quickly and accurately assessing physical situations irrespective of mathematical abstraction is where the difficulty lies.

Of course, before Newton and others applied mathematics to physics the subject was very limited. But, it is also true that the more math that has been introduced to physics, the easier it has become. The somewhat unfortunate part about freshman physics is that oftentimes there is little mathematical machinery available that would be both tame enough to be teachable and beneficial for solving actual physics problems. Sure, we could all learn Lagrangian and Hamiltonian mechanics from the beginning, but only the former can help make some problems easier, and both are clearly much more mathematically complicated than the traditional approach. This is why no freshman course I have yet seen has even attempted the approach; it is a horrible idea.

Inexperienced students often have the misconception that quantum mechanics or statistical mechanics, because of their now quite popular weirdness, will be the most difficult physics courses they will take. But for many, the mathematically inclined in particular, these courses will actually be quite easy! The same could be said for, say, relativity as well; although, the mathematical background required for general relativity is pretty substantial in and of itself. In much of the well-understood “modern physics” the physical situation is complicated and very weird, but we have enough mathematics at this point to do quite well in solving problems.

Now, quantum mechanics is not a generically easy course, but the point I am trying to make is that the math is easy while the physics is hard. Moreover, even potentially great physicists are quite susceptible to struggling in freshman physics, because it is hard, period.

Quotients and Homomorphisms for Beginners: Part 3, Nomenclature and the Integers Modulo n

The last post culminated in the definition of the quotient group. We saw that this group is constructed by essentially “dividing” the group into cosets and working with those; this is one of the reasons we call it the quotient group. One should note, however, that some authors use the alternative terminology factor group. Lang’s famous Algebra, for example, uses that term (though it is worth noting that Lang often uses odd conventions; for instance, he introduces but never names the “isomorphism theorems”).

There is another reason for the name “quotient,” which is probably where the term comes from in the first place. Before explaining the name, however, we should introduce the quintessential example of a quotient group: the integers modulo n. When working in general algebra, the integers are almost always the example we care about, because algebra was developed in no small part to do number theory. Let us quickly prove a theorem about the group of the integers under addition before proceeding.

Theorem. The subgroups of (\mathbb{Z}, +) are of the form n\mathbb{Z} for n an integer.

Proof. We will only prove that the subsets of that form are indeed subgroups. The proof that these are the only such subgroups will be left as an exercise (Hint: recall the Euclidean algorithm).

First, we show closure under addition. If nx, ny \in n\mathbb{Z}, then nx+ny=n(x+y), which is in n\mathbb{Z}. Clearly, the identity is 0 = n \cdot 0 \in n\mathbb{Z}, and the inverse of nx is -nx=n(-x) \in n\mathbb{Z}. \blacksquare

It is easy to see that the subgroups n\mathbf{Z} are normal, because the subgroups are more than that: they are abelian. Therefore, we have the quotient group \mathbf{Z} / n\mathbf{Z} also known as the integers mod n and sometimes denoted \mathbf{Z}_n (not to be confused with the n-adic numbers). In my opinion, it is actually a little weird to see how this group fits in with quotient groups in general. Understanding the group is easy, but the notation and how we can figure out what it is just based on the general definition of G/H was not instant for me when I was first introduced to the subject. So, I will take a bit of care to show how the two coincide, in case I am not the only person susceptible to this little struggle. Either way, it will take some effort to truly internalize all of this.

Generally, we say \mathbf{Z}/n\mathbf{Z} := \{0, 1, \ldots, n-1 \} with addition mod n, but this is a bit misleading. Let us start from the beginning. The subgroup is n \mathbf{Z}=\{ 0 , \pm n , \pm 2n , \ldots \}, and so the cosets are

0+n\mathbf{Z} , 1+n\mathbf{Z}, \ldots , (n-1)+n\mathbf{Z}

We claim that these are all the cosets, because if k \in \mathbf{Z}, then

k=nq+r \; , \; 0 \leq r < n \implies k+n\mathbf{Z} = r + nq + n\mathbf{Z} = r + n \mathbf{Z}

But, this is already in the list, because of the restriction on r. \blacksquare

So, for simplicity we generally identify coset m + n\mathbf{Z} with m.
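This identification is easy to check by direct computation. Below is a quick sketch (plain Python; the helper name coset is mine, chosen for illustration) that builds a finite window of each coset of 3\mathbf{Z} and confirms that the coset k + 3\mathbf{Z} is determined by the remainder k % 3.

```python
# Sketch: cosets of nZ in Z, shown on a finite window of integers.
# The helper name `coset` is mine, for illustration only.

def coset(m, n, bound=20):
    """The part of the coset m + nZ lying in [-bound, bound]."""
    return {x for x in range(-bound, bound + 1) if (x - m) % n == 0}

n = 3
for k in (7, -4, 12):
    r = k % n  # the unique representative with 0 <= r < n
    assert coset(k, n) == coset(r, n)  # k + nZ and r + nZ are the same coset
    print(f"{k} + {n}Z = {r} + {n}Z")
```

The assertion passing for every k is exactly the content of the division-algorithm argument above: every coset equals r + n\mathbf{Z} for a unique 0 \leq r < n.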

And, so we can compute the Cayley table (multiplication table) for small values of n quite easily. Below is the Cayley table for Z/4Z.

+ | 0  1  2  3
--+-----------
0 | 0  1  2  3
1 | 1  2  3  0
2 | 2  3  0  1
3 | 3  0  1  2
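Tables like this are easy to generate mechanically for any n. Here is a minimal sketch in plain Python (the function name is mine) that prints the addition table of \mathbf{Z}/n\mathbf{Z} using the representatives 0, \ldots, n-1.

```python
# Sketch: generate the Cayley (addition) table of Z/nZ.  The function
# name is mine, chosen for illustration.
def cayley_table(n):
    """Row a of the table holds (a + b) mod n for b = 0, ..., n-1."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]

for row in cayley_table(4):
    print(" ".join(str(entry) for entry in row))
# Every row and column is a permutation of {0, 1, 2, 3}, as in any
# Cayley table (this is the cancellation property of groups).
```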

We can now see where the integers come in the naming of this group. When one divides, for instance, 15 by 5, one obtains 3 because one can regroup 15 objects into 3 groups (in the colloquial sense) of 5 objects each. In the quotient group, we have the same kind of regrouping but with additional structure, namely the group structure of the integers.

Next time, we will digress a bit before continuing with the theory of quotient groups. Eventually, we will make our way to the isomorphism theorems and more.

Contemporary Research in Algebra: A Note

Mathematics students across the country take courses such as “abstract algebra” and “modern algebra.” The content of these courses is incredibly standardized at this point. Generally, students cover group theory with emphasis on finite groups, perhaps “up to” Sylow’s theorems, then they move on to some linear algebra over a general, fixed field k (rather than \mathbf{R} or \mathbf{C}), then they cover some elementary ring and ideal theory, and finally some courses will delve into field theory and Galois theory, culminating with such classic results as the insolubility of the general quintic. As with most mathematics courses, after having taken elementary algebra students may get the feeling that the subject has been mostly exhausted, but this is simply not the case. In fact, even some basic questions in finite group theory remain unresolved.

However, this is not to say that the primary topic of interest is finite group theory, for instance. Truthfully, we do have an excellent understanding of finite groups. This is why, in so many related subjects, we seek to translate hard questions into group-theoretic terms, rendering them simple by comparison. There are many topics which use algebra to study other domains, e.g. algebraic topology, algebraic geometry, and algebraic number theory. In fact, it seems few topics in modern mathematics have gone untouched by algebraic influence.

But, what of “pure algebra,” as dubious as the idea of such a classification may be? Well, one major research area I feel most would say fits this description is the theory of Lie groups (and Lie algebras), which are groups which have additional structure, namely that of a smooth manifold. Many of the groups you are familiar with are Lie groups, for instance GL(2,\mathbf{R}), \mathbf{S}^3, and \mathbf{R}^n. This document will give you some citations of fairly recent developments. And, if you are really interested, it is frankly not difficult to pick up the basics of the subject if you have a decent background in algebra and differential geometry.

A fairly recent development in finite group theory was the classification of finite simple groups. This was a massive collaborative effort spanning decades, completed in 2004, comprising tens of thousands of pages across hundreds of articles by over a hundred authors. While one needs to be an expert (with a lot of time) to understand the classification theorem’s proof, its statement is very simple.

Theorem. Every finite simple group is isomorphic to one of the following: a cyclic group of prime order, an alternating group of degree at least 5, a simple group of Lie type, or one of the 26 sporadic simple groups.

But, this was clearly a monumental task. Are there smaller problems in “classical” group theory which have only recently been solved? Why, yes, there are! There is still some work to be done for finite groups, but frankly the number of mathematicians working in that area has been and will continue to be dropping quite quickly. If we turn to groups of infinite order, however, we can find several interesting, difficult problems. For instance, there have been several recent developments with respect to the so-called “inverse Galois problem,” though none that I know of have been particularly groundbreaking (full disclosure: I am not a group theorist). A good note on the subject was written by Zywina, and the same mathematician proved in 2012 that the projective special linear group PSL_2(\mathbb{F}_p) occurs as a Galois group over \mathbb{Q} for p \geq 5.

But, what of ring, ideal, and module theory? Is this still an area of research? Certainly, commutative algebra and related subjects are now used by many, many mathematicians, and there have been too many developments from too many areas to even begin to discuss any in detail here. Nonetheless, here is a link to a somewhat recent MSRI workshop on the subject, where you may be able to find some guidance.

Even linear algebra, which many see as now becoming more computational and applied, is still an area of current research in pure mathematics. Moreover, derivatives of linear algebra such as representation theory remain very active. In fact, representation theory has played a major role in the famous Langlands program.

And, indeed, Galois theory too is an active area of research, though it is becoming increasingly uncommon for that terminology to be used. A good sampling of what mathematicians care about related to Galois theory can be found at this page for a program held at the University of Pennsylvania in 2006.

So, while it seems everyone’s research involves some algebra these days, from geometry to number theory to combinatorics, there are still plenty of “pure” algebraists out there. Hopefully, this post has given the reader some idea as to what it is these people do.

Note: My work involves a lot of algebra, but I am not an algebraist.

Quotients and Homomorphisms for Beginners: Part 2, Introducing the Quotient Group

Consider the group G= (\mathcal{G}, \cdot) where \mathcal{G} is the set structure of G and \cdot is the group operation. Now, suppose we partition \mathcal{G} into \mathcal{G}_1, \ldots, \mathcal{G}_n. Can we make a group with \mathcal{G}_1, \ldots, \mathcal{G}_n as elements? In other words, can we divide G into parts and make a group out of those parts?

Recall the definition of the (left) coset from the last post. We claim that the cosets of a group represent a partition of the group into disjoint subsets. This shall be proved presently with little more than the introduction of an alternative definition. But, first we must review the basic theory of equivalence classes.

Definition. An equivalence relation is a binary relation \sim on a set X such that for all x \in X, x \sim x (reflexivity); for any x, y \in X, if x \sim y, then y \sim x (symmetry); and for any x, y, z \in X, if x \sim y and y \sim z, then x \sim z (transitivity).

The equivalence class of an element x \in X is defined as [x] := \{ y \in X : x \sim y\}. The set of all equivalence classes in X (for a given relation R) is generally denoted X/R (read: X modulo R).

A more intuitive way of thinking about equivalence relations is to think of them as different ways of partitioning a set, i.e. separating a set into disjoint subsets.

Theorem. An equivalence relation \sim on a set S partitions S.

Proof. Clearly, every x \in S belongs to the class [x]. Suppose x \sim y. If x' \in [x], then x \sim x', so x' \sim x by symmetry, and hence x' \sim y by transitivity; by symmetry again, y \sim x', so x' \in [y]. The same argument applies to any y' \in [y]. Therefore, [x]=[y]. Now suppose instead that x and y are not equivalent under \sim. If z \in [x] \cap [y], then x \sim z and y \sim z; by symmetry, z \sim y, and so x \sim y by transitivity, a contradiction. Thus [x] \cap [y] = \varnothing, and we have shown that distinct equivalence classes are disjoint. To finally prove this is a partition, we must show that the union of all the classes is the original set, which is clear because \displaystyle \bigcup_{x \in S} [x] = S. \blacksquare

It turns out that the converse is also true: every partition of a set is due to an equivalence relation! A formal proof of this fact is left to the reader as an exercise.
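For the computationally inclined, the theorem can be checked on small examples. The sketch below (plain Python; equivalence_classes is an illustrative helper name, not a library function) groups a finite set into the classes of the relation a \sim b \Leftrightarrow a \equiv b \pmod 3 and verifies that the classes are pairwise disjoint and cover the set.

```python
# Sketch: compute the classes of an equivalence relation on a finite set
# and check that they form a partition.  `equivalence_classes` is an
# illustrative helper, not a library function.

def equivalence_classes(X, related):
    """Group the elements of X by the equivalence relation `related`."""
    classes = []
    for x in X:
        for c in classes:
            if related(x, next(iter(c))):  # compare with any representative
                c.add(x)
                break
        else:
            classes.append({x})
    return classes

X = list(range(10))
classes = equivalence_classes(X, lambda a, b: a % 3 == b % 3)

# Pairwise disjoint, and the union recovers X: a partition.
assert all(c1 is c2 or not (c1 & c2) for c1 in classes for c2 in classes)
assert set().union(*classes) == set(X)
print(sorted(sorted(c) for c in classes))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Note that comparing against a single representative of each class suffices precisely because of transitivity and symmetry, which is the heart of the proof above.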

Now, let us define the coset using different language.

Definition. The left cosets of H in G are equivalence classes under the relation on G given by x \sim y \Leftrightarrow x^{-1}y \in H, or equivalently if xh=y for some h \in H.

To ensure the reader is involved, we leave as an exercise the verification that the relations given above are equivalence relations and that they are indeed equivalent to one another. It therefore follows that the cosets of H in G partition \mathcal{G}.

Now, can we use cosets to make a group? Let us try. Suppose we have a group G and a subgroup H; then G/H := \{ gH : g \in G \}. Naturally, the operation should be gH \cdot g' H := (gH)(g'H) := (gg')H. The issue is that this is not, in general, consistent across coset representatives! If aH=a'H and bH=b'H, we cannot guarantee that (ab)H = (a'b')H. For the operation to be well-defined, it must be independent of the choice of representatives.

It turns out that normality of H is a necessary and sufficient condition for the operation to be well-defined. For, if left and right cosets coincide, then

\displaystyle (aH)(bH)=a(Hb)H=a(bH)H=(ab)HH=(ab)H

Associativity is easily verified, the identity is simply H, and the inverse is a^{-1} H.
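The role of normality can also be seen computationally. The following sketch (plain Python, with permutations of \{0,1,2\} stored as tuples; all names are mine) multiplies cosets in S_3 by choosing representatives: for the non-normal subgroup \{e, (0\,1)\} the answer depends on the representatives chosen, while for the normal subgroup A_3 it does not.

```python
# Sketch of coset multiplication in S_3; permutations of {0,1,2} are
# stored as tuples, and the helper names are my own, not from a library.

def compose(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def left_coset(g, H):
    return frozenset(compose(g, h) for h in H)

e = (0, 1, 2)

# Non-normal subgroup H = {e, (0 1)}: multiplying cosets by picking
# representatives gives inconsistent answers, so G/H fails to be a group.
H = {e, (1, 0, 2)}
bad = {left_coset(compose(a, b), H)
       for a in left_coset((1, 2, 0), H)   # representatives of one coset
       for b in left_coset((0, 2, 1), H)}  # representatives of another
print(len(bad))   # 2 distinct answers: not well-defined

# Normal subgroup K = A_3 (the identity and the two 3-cycles): consistent.
K = {e, (1, 2, 0), (2, 0, 1)}
good = {left_coset(compose(a, b), K)
        for a in left_coset((1, 0, 2), K)
        for b in left_coset((0, 2, 1), K)}
print(len(good))  # 1 answer: (aK)(bK) = (ab)K is well-defined
```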

We call this the quotient group, and it is one of the most important constructions in mathematics.

An Accessible Introduction to Sheaves

This has been reproduced from my Personal Blog. It will remain there as well, but since breaking my one blog into three, it is now best suited here.

Below is reproduced a post that will be featured on the blog cozilikethinking at https://cozilikethinking.wordpress.com/ as a part of a series of posts on introductory algebraic geometry. This reproduction is partly for the sake of ensuring there are no \LaTeX errors and partly because I would like to soon begin blogging again, for the first time here. The aforementioned blog, cozilikethinking, is run by Ayush Khaitan, who I have had great pleasure communicating with and who writes exceptionally about a wide variety of mathematical topics, which are typically within the realm of undergraduate interest. Without further ado, below is the post.

Let me just preface this by saying that I look forward to writing, in an accessible way, about the realm of algebraic geometry with Ayush on this blog.

In the study of algebraic geometry, one often hopes to delve into abstract notions, research tools, and topics such as those of cohomology, schemes, orbifolds, and stacks (or maybe you just want to prove that there are 27 lines on a cubic surface). But, all of these things probably seem very far away, because algebraic geometry is a very rich, very technical field. It is also one that was inaccessible to most until recently, despite often being concerned with rather simple ideas and objects. Luckily, today we all need not read the SGA and EGA. While I think it unwise to jump ahead to something as modern as schemes just yet, it should be possible to study one of algebraic geometry’s most important and deceptively intuitive tools: sheaves.

In not-so-technical terms, a sheaf is a device used to track locally defined information attached to the open sets of some (topological) space. Basically, this is a nice tool to organize information. To get a bit more formal, we’ll need to first define what a presheaf is.

Presheaf: A presheaf F on a topological space X is a (contravariant) functor with values in some category \mathbf{C} defined as follows:

  • For each open set U \subset X, there is a corresponding object F(U) in \mathbf{C}
  • For each inclusion of open sets S \subseteq U there exists a corresponding morphism, called the restriction morphism, \mathrm{res}_{S, U}: F(U) \to F(S) in \mathbf{C}
  • \mathrm{res}_{S, U} must satisfy the following:
    • For all U \subset X, \mathrm{res}_{U,U}:F(U) \to F(U) is the identity morphism on F(U)
    • For open sets S \subseteq T \subseteq U, \mathrm{res}_{S,T} \circ \mathrm{res}_{T,U} = \mathrm{res}_{S,U}, i.e. restriction can be done all at once or in steps.

Presheaves are certainly important, but I will stay focused on our goal to understand sheaves instead of dealing with details about presheaves. With that having been said, recall the property of locality from our loose definition of a sheaf, as it is also our first axiom for sheaves, with the other being concatenation or gluing. These two axioms may also be thought of as ensuring existence and uniqueness.

Sheaf: A presheaf satisfying the following:

  • (Locality) If \{U_i\} is an open covering of the open set U, and if x, y \in F(U) are such that \mathrm{res}_{U_i,U}(x)=\mathrm{res}_{U_i,U}(y) for each U_i, then x=y.
  • (Gluing) Suppose \{ U_i \} is an open cover of U, and suppose that for each i a section s_i \in F(U_i) is given such that for each pair U_i, U_k of covering sets \mathrm{res}_{U_i \cap U_k, U_i}(s_i)=\mathrm{res}_{U_i \cap U_k, U_k}(s_k). Then there exists s \in F(U) where \mathrm{res}_{U_i,U}(s)=s_i \; \forall i.

In other words, a presheaf is a sheaf if we can uniquely “glue” compatible local pieces together.

For the sake of brevity and simplicity, we will ignore some technical details henceforth. We will also focus on functions and let the reader work out the details for the same reason.

For our first example, consider topological spaces X and Y with a “rule” \mathscr{F} such that each open U \subset X is associated with

\mathscr{F}(U)=\{f: U \to Y : f \; \mathrm{is \; continuous} \}

This is a sheaf. It is actually a pretty well-known example too. (Hint: In justifying that this is a sheaf, it may be a good strategy to begin by considering restriction maps.)
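If it helps to experiment, here is a toy model of this example (plain Python; all names are mine), taking X to be a two-point discrete space, Y = \{0,1\}, and every function continuous. It checks the gluing axiom directly: compatible sections on a cover glue to a unique section.

```python
# Toy model: X = {0, 1} with the discrete topology, Y = {0, 1}, and every
# function continuous.  F(U) is the set of all functions U -> {0, 1};
# restriction is restriction of functions.  All names are illustrative.

def F(U):
    """All functions U -> {0, 1}, each encoded as a dict."""
    funcs = [{}]
    for p in sorted(U):
        funcs = [{**f, p: v} for f in funcs for v in (0, 1)]
    return funcs

def res(V, f):
    """The restriction morphism F(U) -> F(V) for V a subset of U."""
    return {p: f[p] for p in V}

X = frozenset({0, 1})

# Gluing: sections on the cover {0}, {1} (which agree trivially on the
# empty overlap) glue to a unique section on X.
s0, s1 = {0: 1}, {1: 0}
glued = [f for f in F(X)
         if res({0}, f) == s0 and res({1}, f) == s1]
assert glued == [{0: 1, 1: 0}]  # existence (gluing) and uniqueness (locality)
print(glued)
```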

For our second and final example, we shall consider the sheaf of infinitely differentiable (C^{\infty}, i.e. smooth) functions. The sheaf of infinitely differentiable functions on a differentiable manifold M has two properties:

  • Each open set U is assigned the ring of C^{\infty} functions U \to \mathbb{R}
  • The restriction map is a function restriction

Again, it’s worth playing around with this sheaf a bit.

So, to review: a sheaf is a tool used in algebraic geometry and other fields, and it serves as a sort of data-collection method, organizing locally defined data attached to the open sets of a space so that it can be tracked, compared, and glued together.


  • It’s worth noting that there are tons of different definitions of presheaves and sheaves, but I think the provided one is most intuitive and works well for many instances.
  • An alternate notation for \mathrm{res}_{V,U}(s) is s|_{V}; keep this in mind if you plan to read more on this topic.
  • Be sure to keep functions in mind while you’re developing an understanding of sheaves. Another example to consider is the sheaf of regular functions.

Generalizing the Difference of Squares Formula: Why Don’t We Teach This?

In high school, students are inevitably taught the formula for expressing a difference of squares in terms of linear factors, specifically that

\displaystyle x^2 - y^2 =(x+y)(x-y)

This can be a useful factorization method, and while students do not derive it, they can easily verify it by expansion of the right hand side.

Some students will also learn the formulas for the sum and difference of cubes:

\displaystyle x^3 + y^3 = (x+y)(x^2-xy+y^2) \; \; \& \; \; x^3 - y^3 = (x-y)(x^2+xy+y^2)

Again, verification is simple, even if one does not derive the formula. Moreover, while the formula is longer, memorization is made easy by the mnemonic SOAP, which dictates the signs (“Same Opposite Always Positive”).

Likely even fewer students, presumably in pre-calculus where complex number arithmetic is discussed in more detail than in previous coursework, may even know the sum of squares formula:

\displaystyle x^2 + y^2 = (x+iy)(x-iy)
Once more, verification is a trivial exercise.

But, why teach all these different formulas when they can be combined? Of course, most students do not ask this question, because they do not know that they can be, but in fact, they can.

Theorem. Let x,y \in \mathbf{R} and let n be some natural number, then

x^n-y^n = (x-y)( x^{n-1}+x^{n-2} y + \cdots + xy^{n-2} + y^{n-1})

The best part of this is that its proof is elementary and can be one of the first examples used in the introduction of mathematical induction, which really ought to be taught to all high schoolers at some point.
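Before (or alongside) the induction, one can sanity-check the identity numerically. The sketch below (plain Python; the function name is mine) verifies the factorization exactly, in integer arithmetic, over a small grid of values.

```python
# Sketch: an exact numerical check of the factorization
#   x^n - y^n = (x - y) * (x^(n-1) + x^(n-2) y + ... + y^(n-1)).
# The function name is mine; all arithmetic here is exact integer arithmetic.

def power_difference(x, y, n):
    return (x - y) * sum(x**i * y**(n - 1 - i) for i in range(n))

for n in range(1, 8):
    for x in range(-3, 4):
        for y in range(-3, 4):
            assert power_difference(x, y, n) == x**n - y**n
print("x^n - y^n factorization verified for n = 1, ..., 7")
```

Of course, a finite check is not a proof; that is what the induction below provides.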


Proof. The base case n=1 clearly works. So, suppose the formula holds for n=k; we wish to show it necessarily holds for n=k+1, whence the formula is valid for all n \in \mathbb{N}.

Let us restate our induction hypothesis as follows:

\displaystyle \sum_{i=0}^{k-1} x^i y^{(k-1)-i} = \frac{x^k-y^k}{x-y}

Now, we proceed as follows with the k+1 case:

\displaystyle x^{k+1}-y^{k+1} = x \cdot x^k - y \cdot y^k = x(x^k-y^k) + (x-y)y^k = x(x-y)\sum_{i=0}^{k-1} x^i y^{(k-1)-i} + (x-y)y^k

\displaystyle = (x-y)\left( \sum_{i=0}^{k-1} x^{i+1} y^{(k-1)-i} + y^k \right) = (x-y)\sum_{i=0}^{k} x^i y^{k-i}

which proves the theorem.


The only tricky part here is the alternative form of the induction hypothesis. We do this so the induction problem seems more familiar and additive in nature. After that observation, we proceed just as with most classic cases of induction in elementary algebra and number theory.

And, what do we get from this? Well, we get every form of the difference of powers we could possibly have. Personally, I prefer this to memorizing multiple formulas.