In the past few sections, one of the many things you may have conjectured about sums of squares is that every prime of the form $p=4k+1$ can be represented as the sum of two squares. Combined with Fact 13.1.9, limiting the question to primes should be sufficient to finish analyzing the question for any positive number. (See Theorem 13.5.5 for the final steps putting this all together.)
It turns out that every prime $p=4k+1$ can indeed be written as a sum of two squares, and we will spend most of the remainder of this chapter proving it. At the end of the chapter, we’ll add in Fact 13.1.1 about primes of the form $p=4k+3$ to see exactly which numbers can be so represented.
In keeping with the theme of the unity of mathematics, we do this geometrically rather than algebraically as most texts do, though the core ideas of the two proofs are similar. We roughly follow [E.2.1, Chapter 10.6], but greatly expanded to avoid any direct reference to Hermann Minkowski’s theorem on lattice points in a convex symmetric set. Interestingly, [E.4.16, Theorems 4.3 and 8.3] states only this result and Lagrange’s four square theorem. This is precisely because, although Minkowski’s Theorem provides a general geometric framework for the existence of such points, one still needs information about quadratic residues to produce lattice points to work with in the first place.
First, let’s look at the following plot on the integer lattice. As you can see, I am plotting certain points on the circle $x^2+y^2=p$, for a specific small prime $p$ to begin. I have done some ‘magic’ to turn the square root of $-1$ (mod $p$) into these points. Before I reveal the magic, Figure 13.4.2 (and the interact following it) will help us get ready.
To be precise, I’ve used this square root of $-1$ to create the regularly spaced grid of blue points. You can think of these points as the corners of a tiling of the plane by parallelograms.
Sometimes we call things like the set of blue dots a lattice, though in this text I will usually use the word lattice only for the usual integer lattice of black dots. A general lattice is related to a concept from linear algebra, the vectors generated by a basis, except that instead of being vectors over $\mathbb{R}$ or $\mathbb{C}$, they are over $\mathbb{Z}$.
Here is how I constructed the blue grid. First, assume that $p$ is our prime and pick $f$ to be a square root of $-1$ modulo $p$ (its additive inverse $-f$ works just as well, if you prefer); for convenience we can take $f$ to be a residue between $0$ and $p-1$. Then the blue points are those of the form $(a, fa+pb)$ for all integers $a$ and $b$.
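You can check the construction numerically. The following is a minimal Python sketch, not the Sage code behind the book's interacts, and the function names sqrt_minus_one and blue_points are hypothetical. It finds the least $f$ with $f^2\equiv -1 \pmod p$ by trial. It then lists the blue points in a square window, relying on the fact that an integer point $(x,y)$ has the form $(a, fa+pb)$ exactly when $y\equiv fx \pmod p$.

```python
def sqrt_minus_one(p):
    """Return the least f in 1..p-1 with f*f congruent to -1 mod p, or None."""
    for f in range(1, p):
        if (f * f + 1) % p == 0:
            return f
    return None  # happens, e.g., for primes p = 4k + 3


def blue_points(p, f, bound):
    """Blue points (a, f*a + p*b) with both coordinates in [-bound, bound].

    An integer point (x, y) has this form exactly when y - f*x is divisible by p.
    """
    return [
        (x, y)
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if (y - f * x) % p == 0
    ]


p = 13
f = sqrt_minus_one(p)  # 5, since 5*5 + 1 = 26
pts = blue_points(p, f, 6)
# x^2 + (f x)^2 = x^2 (1 + f^2), so every blue point has x^2 + y^2 divisible by p.
print(all((x * x + y * y) % p == 0 for x, y in pts))  # True
```

The point $(2,-3)$ appears in this list, and $2^2+(-3)^2=13$, which hints at where the argument is heading.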
To prove the theorem that every $p=4k+1$ can be written as a sum of two squares, we need to show that there is a blue dot (somewhere) that is not at the origin and has norm $x^2+y^2$ smaller than $2p$. We will prove this with heavy reference to graphics, but every claim also makes sense algebraically. Sometimes we need help to be able to think about more involved proofs.
We include a variation on the graphic in Figure 13.4.7 to make this visually clear. The bigger circle is the one we care about now. It has equation $x^2+y^2=2p$, so its radius is $\sqrt{2p}$. Suppose we find a blue point inside the disk bounded by that circle, but not at the origin. Then the argument in the proof sketch given for Theorem 13.4.5 shows that this point must lie on the smaller circle $x^2+y^2=p$. Every blue point has $x^2+y^2$ divisible by $p$, and the only positive multiple of $p$ below $2p$ is $p$ itself.
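For small primes you can confirm this divisibility argument by brute force. The Python sketch below uses a hypothetical helper name and takes $f$ as an input supplied by hand. It lists every blue point other than the origin strictly inside the circle $x^2+y^2=2p$, together with its norm. By the argument above, every norm it finds should equal $p$.

```python
import math


def blue_points_in_big_disk(p, f):
    """Non-origin blue points (x, y) with x^2 + y^2 < 2p, paired with their norms."""
    r = math.isqrt(2 * p)  # any point in the disk has |x|, |y| <= r
    found = []
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            norm = x * x + y * y
            # "blue" means y is congruent to f*x modulo p
            if 0 < norm < 2 * p and (y - f * x) % p == 0:
                found.append(((x, y), norm))
    return found


for p, f in [(5, 2), (13, 5), (17, 4), (29, 12)]:
    norms = {norm for _, norm in blue_points_in_big_disk(p, f)}
    print(p, norms)  # each set should be {p}
```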
Strangely enough, the best way to find such a point is to compare areas. We show that the bigger disk is so large that it must contain a blue point in its interior other than the origin. Let’s see how this works.
Next we create a sublattice of the blue dots, which we will color green. (A sublattice is a subset of a lattice that still satisfies the conditions for being a lattice.) To create the green sublattice, take all the blue dots and double their coordinates. Each green dot is still a blue dot, including the origin, since $(2a, 2(fa+pb)) = (2a, f(2a)+p(2b))$. See Figure 13.4.8.
Next, we take a look at certain triangles made by the different colored dots; continue following Figure 13.4.8, or see the interact at the end of this subsection.
Compare the thinnest such triangles one can form, with respect to the vertical axis.
The thinnest triangle made by blue dots has height one. A typical one has vertices at the origin and at the points $(0,p)$ (with $a=0$, $b=1$) and $(1,f)$ (with $a=1$, $b=0$, where $f$ is as above).
The thinnest triangle made by the green dots has height two. It has width $2p$ (from the origin to $(0,2p)$, the previous point doubled), and its apex is the point $(2,2f)$, which is $(1,f)$ doubled.
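Now consider the parallelogram outlined with solid red lines, made of two of these triangles, which runs from the origin to $(2,2f)$ to $(2,2f+2p)$ to $(0,2p)$ and back. (Recall that $f$ is a square root of $-1$ modulo $p$.) Each triangle has area $\frac{1}{2}\cdot 2p\cdot 2 = 2p$, so this quadrilateral has area $4p$. That is smaller than the area $\pi(\sqrt{2p})^2 = 2\pi p$ of the bigger circle.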
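The area comparison is simple arithmetic, but a short sketch makes the numbers concrete. Here the area of the parallelogram is computed as the absolute value of the $2\times 2$ determinant of its edge vectors $(2,2f)$ and $(0,2p)$. Note that $f$ drops out entirely. The function names are hypothetical.

```python
import math


def parallelogram_area(u, v):
    """Area spanned by edge vectors u and v: the absolute 2x2 determinant."""
    return abs(u[0] * v[1] - u[1] * v[0])


def red_vs_disk(p, f):
    """Area of the red parallelogram versus the disk of radius sqrt(2p)."""
    red = parallelogram_area((2, 2 * f), (0, 2 * p))  # 2 * 2p - 2f * 0 = 4p
    disk = math.pi * (2 * p)  # pi * (sqrt(2p))^2
    return red, disk


red, disk = red_vs_disk(13, 5)
print(red, round(disk, 2))  # 52 81.68
```

The inequality $2\pi p > 4p$ holds for every $p$, simply because $\pi > 2$.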
In Figure 13.4.8 we have and . To see this all interactively, evaluate the interact; click triangles_on to see the green dot triangle and parallelogram outlined in red.
The last stage of the proof is very visual. Before we move on, make sure you believe all the claims of this stage, especially the claims about areas. Those are the ones we will analyze more closely to finish the proof of Theorem 13.4.5. Remember always that we are trying to prove that there is a blue point contained inside the disk bounded by the bigger blue circle, but away from the origin.
To finish the proof, we need to find a blue point other than the origin in the interior of the bigger blue circle of radius $\sqrt{2p}$. The gist of the argument splits into two parts.
Every point inside the parallelogram (not just green, blue, or lattice points) “repeats” outside of it in other parallelograms. Because of this, $4p$ is the biggest area a region can have without “repeating” some point. (This parallelogram is often called a fundamental region in more general treatments.)
So, the interior of the circle, having a bigger area, must have two points (not necessarily blue points, just points on the plane) which are “repeated” by translation of this parallelogram.
Secondly, in Claim 13.4.12 we show why the previous claim leads to a proof:
We start with the two points from Claim 13.4.11 in the disk bounded by the circle (points which are not necessarily on any lattice, blue, green, or even black).
Then we use elementary geometry to construct a blue point (namely, one of the form $(a, fa+pb)$). This point is not the origin and lies strictly in the interior of the disk bounded by the circle of radius $\sqrt{2p}$.
Consider the parallelogram with vertices $(0,0)$, $(2,2f)$, $(2,2f+2p)$, and $(0,2p)$, together with its interior (where $f$ is a square root of $-1$ modulo $p$). Any plane region is the union of its intersections with all possible translations of this parallelogram, where each translation rigidly moves the parallelogram so that the origin goes to a green point.
Proof.
We are not going to prove topological facts in this text, nor explore the further depths of lattices. So it suffices to note two things. First, every green point is the image of the origin for exactly one parallelogram that is not just congruent to the original one, but a translate of it. Second, by construction these translates cannot overlap (other than possibly along their edges).
We say that two distinct points in a plane region are “repeated” if they are both rigid translations of the same point in the parallelogram, where the allowed translations are those described in Fact 13.4.9.
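Fact 13.4.9 and this definition translate directly into coordinates. Write a point as $s(2,2f)+t(0,2p)$. Translating by a green point changes $s$ and $t$ by integers, so two points are “repeated” exactly when their values of $s$ and $t$ differ by integers. The Python sketch below uses this to move a point into the parallelogram and to test for repetition. It uses hypothetical function names and exact arithmetic from the standard fractions module. One choice is our own assumption rather than the text's: we treat the parallelogram as half-open, keeping only one copy of each shared edge, which sidesteps the edge overlaps mentioned above.

```python
import math
from fractions import Fraction


def reduce_to_parallelogram(point, p, f):
    """Translate point by a green point into the half-open parallelogram
    {s*(2, 2f) + t*(0, 2p) : 0 <= s < 1, 0 <= t < 1} (half-open is an assumption).

    Returns (representative, green_shift) with point = representative + green_shift.
    """
    x, y = Fraction(point[0]), Fraction(point[1])
    s = x / 2                   # coefficient of the edge (2, 2f)
    t = (y - f * x) / (2 * p)   # coefficient of the edge (0, 2p)
    a, b = math.floor(s), math.floor(t)  # integer parts select the green translation
    shift = (2 * a, 2 * f * a + 2 * p * b)
    return (x - shift[0], y - shift[1]), shift


def repeated(P, Q, p, f):
    """True when P and Q are distinct translates of the same point of the parallelogram."""
    rep_P, _ = reduce_to_parallelogram(P, p, f)
    rep_Q, _ = reduce_to_parallelogram(Q, p, f)
    return P != Q and rep_P == rep_Q


print(repeated((1, 1), (3, 11), 13, 5))  # True: (3, 11) - (1, 1) = (2, 10) is green
print(repeated((1, 1), (2, 1), 13, 5))   # False
```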
We now prove the two remaining claims to finish the proof of Theorem 13.4.5, after which we encourage the reader to explore the large interact in Example 13.4.13 which ends the section.
Consider the circle of radius $\sqrt{2p}$ centered at the origin. The interior of the disk bounded by this circle contains two points “repeated” by shifting the parallelogram.
Proof.
Recall from Fact 13.4.9 that the disk is the union of its intersections with the different translates of the parallelogram.
Suppose that no two points within the disk (not including the boundary circle) are “repeated”. Then every point of the disk is a translation of a different point of the parallelogram. So sending each point in the disk to the corresponding point in the parallelogram gives a one-to-one function from the disk to the parallelogram.
Because each such move is rigid, this function is area-preserving,[8] which means the area of the disk must be less than or equal to that of the parallelogram.
[8] If you looked at this footnote because you want a proof of this, recall that we do not prove topological facts in this text! Next you’ll be wanting a proof of the Jordan curve theorem from first principles. More seriously, we have to draw the line somewhere. In my experience, students would find proving assertions of this kind about as rewarding as proving $1+1=2$ with Russell and Whitehead as a text. Convincing students that proving Fact 1.2.2 is useful is hard enough.
However, at the end of Subsection 13.4.3 we asserted the opposite: the disk has area $2\pi p$, which is bigger than the area $4p$ of the parallelogram. This contradiction gives us our two points.
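You can check a discrete analogue of this pigeonhole argument by counting. This version looks only at integer points rather than at all points of the plane, which the proof itself needs. The green lattice is spanned by $(2,2f)$ and $(0,2p)$. Its index in the integer lattice is $4p$, the area of the parallelogram, so the integer points fall into exactly $4p$ classes up to green translation. Whenever the open disk contains more than $4p$ integer points, two of them must be “repeated”. The helper name below is hypothetical.

```python
import math


def integer_points_in_open_disk(p):
    """Count integer points (x, y) with x^2 + y^2 < 2p."""
    r = math.isqrt(2 * p)
    return sum(
        1
        for x in range(-r, r + 1)
        for y in range(-r, r + 1)
        if x * x + y * y < 2 * p
    )


for p in [5, 13, 17, 29]:
    count = integer_points_in_open_disk(p)
    # more integer points than classes forces two of them into the same class
    print(p, count, 4 * p, count > 4 * p)
```

For $p=5$ and $p=13$ the counts are $29 > 20$ and $81 > 52$.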
Given two points $P$ and $Q$ (in the interior of the circle of radius $\sqrt{2p}$ centered at the origin) which “repeat” from the parallelogram, we can construct a point, not the origin, of the form $(a, fa+pb)$.
Proof.
Given how we defined “repetition”, the line segments from $P$ and $Q$ to the vertex corresponding to the origin in their respective translations of the parallelogram must themselves be rigid translations of each other. Hence the line segment connecting $P$ and $Q$ can be translated to a segment connecting the origin and another green point. Give this point the name $G$.[9]
[9] In fact, as vectors this is of course the point $Q-P$, but we minimize formal use of vectors in this text.
Since $G$ is of the form $(2a, 2(fa+pb))$ by definition, the point halfway between it and the origin (or “$\frac{1}{2}G$”) is a blue point of the form $(a, fa+pb)$. It is clearly not the origin, since $G$ itself is not the origin. It remains to show that this blue point is in the interior of the circle.
To see this, consider the distance between $P$ and $Q$. Both points lie strictly inside a circle of radius $\sqrt{2p}$, so this distance is strictly less than twice the radius, $2\sqrt{2p}$. The distance from the origin to $G$ equals the distance between $P$ and $Q$, so $G$ is less than $2\sqrt{2p}$ units from the origin. The point $\frac{1}{2}G$ is exactly half as far from the origin, so it is less than $\sqrt{2p}$ from the origin. By definition, $\frac{1}{2}G$ is therefore in the interior of the larger circle, as desired.
In Figure 13.4.14 we see how Claims 13.4.11 and 13.4.12 find the blue point in the circle. The black points are $P$ and $Q$. One arrow points from $P$ to $Q$ and the other from the origin to $G$, and the midpoint of the second arrow is indeed blue.
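Here is a Python sketch of the whole search, putting Claims 13.4.11 and 13.4.12 together. It is restricted to integer points, a simplification of the argument in the text, which works with arbitrary points of the plane. It is not the book's interact code, and the function name is hypothetical. Each integer point in the open disk receives a label for its class modulo the green lattice. The first collision of labels gives $P$ and $Q$, then $G=Q-P$, then the blue point $\frac{1}{2}G$. The function returns None if it finds no collision, which cannot happen for primes $p\equiv 1 \pmod 4$.

```python
import math


def two_squares_via_repetition(p, f):
    """Find integer points P != Q in the open disk x^2 + y^2 < 2p that differ by a
    green point, and return (P, Q, G, B) with G = Q - P and B = G / 2 (a blue point).

    Returns None if no such pair exists among integer points.
    """
    r = math.isqrt(2 * p)
    seen = {}  # class label modulo the green lattice -> first point with that label
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            if x * x + y * y >= 2 * p:
                continue  # keep only points strictly inside the bigger circle
            a = x // 2
            # Subtract a*(2, 2f), then reduce the second coordinate mod 2p; two
            # integer points get equal labels exactly when they differ by a green point.
            label = (x - 2 * a, (y - 2 * f * a) % (2 * p))
            if label in seen:
                P, Q = seen[label], (x, y)
                G = (Q[0] - P[0], Q[1] - P[1])  # a nonzero green point, both entries even
                B = (G[0] // 2, G[1] // 2)      # midpoint of the origin and G
                return P, Q, G, B
            seen[label] = (x, y)
    return None


P, Q, G, B = two_squares_via_repetition(13, 5)
print(B, B[0] ** 2 + B[1] ** 2)  # the second number should be 13
```

Keying a dictionary by class labels means the search examines each point once, rather than comparing every pair of points.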
Figure 13.4.14. How to find the lattice point on the circle
The code for the interact in Example 13.4.13 is by far the longest we’ve seen up to this point. It is a brute-force check of all movements of all points in the parallelogram to find two points in the bigger circle. Can you think of ways to make it more efficient?
Why was this so hard? I can think of three reasons.
First, we are trying to prove something about squares by proving something about square roots. It works, but it means there will be many steps.
Secondly, we are not just algebraically proving it exists by solving an equation; we are forced to prove our square root exists with inequalities, which brings another set of complications.
Third, we chose to examine those inequalities geometrically to gain insight, so our proofs must use that insight – worthwhile, but stretching.
Many more theorems of this kind, such as Lagrange’s four square theorem, can be proved using similar techniques, which we are intentionally not stating in their full generality. The names of Minkowski and Blichfeldt are associated with theorems that use various symmetries and the notion of convexity to apply these ideas more generally. Those who have had some physics may have heard of Minkowski before, as his work nearly beat Einstein to the notion of special relativity; his geometric framework for spacetime gave Einstein the necessary apparatus to generalize to curved spacetime and general relativity.