Section 13.4 Primes as Sum of Squares
Remark 13.4.1.
To keep with the theme of the unity of mathematics, we do this geometrically, not algebraically as in most texts, though the core ideas are similar with both proofs. We roughly follow [E.2.1, Chapter 10.6], but expanded greatly to avoid any direct reference to Hermann Minkowski's theorem on lattice points in a convex symmetric set. Interestingly, [E.4.16, Theorems 4.3 and 8.3] only states this and Lagrange's four square theorem, precisely because although Minkowski's Theorem provides a general framework for existence of such points geometrically, one still requires information about quadratic residues to provide lattice points to work on in the first place.
Subsection 13.4.1 A useful plot
First, let's look at the following plot on the integer lattice. As you can see, I am plotting certain points on the circle
Remark 13.4.3.
Sometimes we call things like the set of blue dots a lattice, though in this text I will usually use the word lattice only to refer to the usual integer lattice of the black dots. A general lattice is something related to a concept from linear algebra β vectors generated by a basis, except instead of being vectors over
xxxxxxxxxx
def _(p=(5,[q for q in prime_range(200) if q%4==1])):
f=mod(factorial((p-1)/2),p)
viewsize=ceil(math.sqrt(p))+2
g(x,y)=x^2+y^2
plot1 = implicit_plot(g-p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]]
plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
lattice_pts = [coords for coords in grid_pts if (coords[1]-f*coords[0])%p==0]
plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=20)
show(plot1+plot_grid_pts+plot_lattice_pts, figsize = [5,5], xmin = -viewsize, xmax = viewsize, ymin = -viewsize, ymax = viewsize, aspect_ratio=1)
Definition 13.4.4.
We call the norm of a point
Subsection 13.4.2 Primes which are sums of squares
We are now ready to state our big theorem for the section. (See Fact 14.1.8 for a quite different proof.)Theorem 13.4.5.
Every prime
Proof.
The proof is fairly long. Here is the strategy; the first step will be detailed in Subsection 13.4.3 and Subsection 13.4.4.
Suppose we find some blue dot \((a,af+bp)\) such that
Then we know, modulo \(p\text{,}\) that
so \(p\) in fact divides the norm of the point \((a,af+bp)\text{.}\)
So we have that \(0<a^2+(af+bp)^2<2p\) and that \(p\mid a^2+(af+bp)^2\text{,}\) meaning the only possibility is \(p=a^2+(af+bp)^2\text{,}\) which gives \(p\) explicitly written as a sum of perfect squares.
Example 13.4.6.
For instance, with
Guess and check with
so this point should work, and this does give the correct statement that
Subsection 13.4.3 Visualizing the proof
To prove the theorem that for any
xxxxxxxxxx
def _(p=(5,[q for q in prime_range(200) if q%4==1])):
f=mod(factorial((p-1)/2),p)
viewsize=floor(sqrt(2*p))+2
g(x,y)=x^2+y^2
plot1 = implicit_plot(g-p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
plot2 = implicit_plot(g-2*p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]]
plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
lattice_pts = [coords for coords in grid_pts if (coords[1]-f*coords[0])%p==0]
plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=10)
show(plot1+plot2+plot_grid_pts+plot_lattice_pts, figsize = [5,5], xmin = -viewsize, xmax = viewsize, ymin = -viewsize, ymax = viewsize, aspect_ratio=1)

The thinnest triangle made by blue dots would be of height one. A typical one would have vertices the origin and the points
(with ) and (with where as above).The thinnest triangle made by the green dots has height two. It has width
(from the origin to the previous point doubled); the apex is the point which is doubled.
triangles_on
to see the green dot triangle and parallelogram outlined in red.
xxxxxxxxxx
def _(p=(5,[q for q in prime_range(200) if q%4==1]), triangles_on=False):
f=mod(factorial((p-1)/2),p)
viewsize=2*p
g(x,y)=x^2+y^2
plot1 = implicit_plot(g-p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
plot2 = implicit_plot(g-2*p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
plot3 = line([[0,0], [2*p-2*Integer(f),2], [2*p,0], [2*Integer(f),-2], [0,0]], rgbcolor=(1,0,0))
plot4 = line2d([[0,0], [2*p,0]], rgbcolor=(1,0,0), linestyle='--')
grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]]
plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
lattice_pts = [coords for coords in grid_pts if (coords[1]-f*coords[0])%p==0]
plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=10)
plot_lattice_pts2 = points([[2*coords[0],2*coords[1]] for coords in lattice_pts], rgbcolor = (0,1,0),pointsize=20)
if triangles_on:
show(plot1+plot2+plot3+plot4 + plot_grid_pts + plot_lattice_pts+plot_lattice_pts2, xmin = -viewsize/2, xmax = viewsize, ymin = -viewsize/2, ymax = viewsize/2, aspect_ratio=1)
else:
show(plot1+plot2+plot_grid_pts + plot_lattice_pts + plot_lattice_pts2, xmin = -viewsize/2, xmax = viewsize, ymin = -viewsize/2, ymax = viewsize/2, aspect_ratio=1)
Subsection 13.4.4 Finishing the proof
Let's take stock.We've created circles of various sizes to find points in, and two lattices to examine.
The area of the circle is more than the area (
) of the smallest parallelogram made by green dots.
Because all points inside the parallelogram (not just green, blue, or lattice points) will βrepeatβ outside of it in another parallelogram,
is the biggest area of a region that you can have and not βrepeatβ some point. (This parallelogram is often called a fundamental region in more general treatments.)So, the interior of the circle, having a bigger area, must have two points (not necessarily blue points, just points on the plane) which are βrepeatedβ by translation of this parallelogram.
We start with the two points from Claim 13.4.11 in the disk bounded by the circle (points which are not necessarily on any lattice, blue, green, or even black).
Then we use elementary geometry to construct a blue point (namely, one of the form
) which is strictly in the interior of the disk bounded by the circle of radius In particular, this point is not the origin.
Fact 13.4.9.
Let
Proof.
We are not going to prove topological facts in this text, nor explore the further depths of lattices. So it suffices to note that every green point \((2a,2af+2bp)\) can serve as the leftmost vertex of a unique parallelogram not just congruent to, but translated from, \(L\text{,}\) and that by construction these cannot overlap (other than possibly along their edges).
Definition 13.4.10.
We say that two distinct points
Claim 13.4.11.
Consider the circle of radius
Proof.
Recall from Fact 13.4.9 that the disk is composed of all its intersections with different parallelograms congruent to \(L\text{.}\)
Suppose that there are not two points βrepeatedβ within the disk (not including the boundary circle). Then every point thereof is a translation of a different point of \(L\text{.}\) One can make this a one-to-one function from the disk to \(L\) by sending each point in the disk to the corresponding one in \(L\text{.}\)
Because each such move is rigid, this function is area-preservingβ3β, which means the area of the disk must be less than or equal to that of \(L\text{.}\)
However, at the end of Subsection 13.4.3 we asserted the opposite! So by way of contradiction we have our two points.
Claim 13.4.12.
Given two points
Proof.
Given how we defined βrepetitionβ, we know that the line segments from \(v\) and \(w\) to the leftmost vertex of their respective translations of \(L\) must themselves be rigid translations of each other, hence the line segment connecting \(v\) and \(w\) can be translated to a segment connecting the origin and another green point. Give this point the nameβ4β \(v-w\text{.}\)
Since \(v-w\) is of the form \((2a,2af+2bp)\) by definition, then the point halfway between it and the origin (or β\((v-w)/2\)β) is a blue point of the form \((a,af+bp)\text{,}\) and clearly not the origin since \(v-w\) itself is not the origin. It remains to show that this blue point is in the interior of the circle.
To see this, consider the distance \(d\) between \(v\) and \(w\text{.}\) By definition of a circle, it cannot possibly be further than twice the radius, so \(d\) is strictly less than \(2\sqrt{2p}\text{.}\) But then \(v-w\) cannot be more than \(d\) units from the origin, so the point \((a,af+bp)\text{,}\) being exactly half that distance from the origin, is less than distance \(\sqrt{2p}\) to the origin. By definition \((a,af+bp)\) is in the interior of the larger circle, as desired.
Example 13.4.13.
In Figure 13.4.14 we see the picture of how Claims 13.4.11 and 13.4.12 find the blue point in the circle. The black points are

Sage note 13.4.15. Examining code is good for you.
The next Sage cell makes Figure 13.4.14 interactive. But don't just use it to view the proof for other primes; examine the code itself.
This is by far the longest code we've seen up to this point. It is a brute force check of all movements of all points in the parallelogram to find two points in the bigger circle. Can you think of ways to make it more efficient?
xxxxxxxxxx
def _(p=(5,[q for q in prime_range(200) if q%4==1])):
f=Integer(mod(factorial((p-1)/2),p))
big = math.floor(math.sqrt(2*p))
viewsize=2*p
g(x,y)=x^2+y^2
plot1 = implicit_plot(g-p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
plot2 = implicit_plot(g-2*p, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
plot3 = line([[0,0], [2*p-2*f,2], [2*p,0], [2*f,-2], [0,0]], rgbcolor=(1,0,0))
plot4 = line2d([[0,0],[2*p,0]], rgbcolor=(1,0,0), linestyle='--')
grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]]
plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
lattice_pts = [coords for coords in grid_pts if (coords[1]-f*coords[0])%p==0]
plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=10)
big_lattice_pts = [[2*coords[0],2*coords[1]] for coords in lattice_pts]
plot_lattice_pts2 = points(big_lattice_pts, rgbcolor = (0,1,0),pointsize=20)
w = []
v = []
mw = []
for i in [1..2*p-1]:
for coords in [l for l in big_lattice_pts if l!=[0,0]]:
if (i+coords[0])^2+(coords[1]-1)^2 < 2*p:
for coords2 in [k for k in big_lattice_pts if k!=[0,0] and k!=coords]:
if (i+coords2[0])^2 + (coords2[1]-1)^2 < 2*p:
w = [i+coords[0],coords[1]-1]
v = [i+coords2[0],coords2[1]-1]
vmw = [v[0]-w[0],v[1]-w[1]]
break
if w: break
if w: break
if not v:
for i in [j for j in [f..p+f]]:
for coords in [l for l in big_lattice_pts if l!=[0,0]]:
if (i+coords[0])^2+(coords[1]-1)^2 < 2*p:
for coords2 in [k for k in big_lattice_pts if k!=[0,0] and k!=coords]:
if (i+coords2[0])^2 + (coords2[1]-1)^2 < 2*p:
w = [i+coords[0],coords[1]-1]
v = [i+coords2[0],coords2[1]-1]
vmw = [v[0]-w[0],v[1]-w[1]]
break
if w: break
if w: break
if not v:
for i in [j for j in [p-f..2*p-f]]:
for coords in [l for l in big_lattice_pts if l!=[0,0]]:
if (i+coords[0])^2+(coords[1]+1)^2 < 2*p:
for coords2 in [k for k in big_lattice_pts if k!=[0,0] and k!=coords]:
if (i+coords2[0])^2+(coords2[1]+1)^2 < 2*p:
w = [i+coords[0],coords[1]+1]
v = [i+coords2[0],coords2[1]+1]
vmw = [v[0]-w[0],v[1]-w[1]]
break
if w: break
if w: break
P1=point(v,pointsize=20,rgbcolor=(0,0,0))
P2=point(w,pointsize=20,rgbcolor=(0,0,0))
Z=point(vmw,pointsize=20,rgbcolor=(0,0,0))
plot4 = arrow(w,v,rgbcolor=(0,0,0), thickness=1, linestyle='--', arrowsize=3)
plot5 = arrow((0,0),vmw,rgbcolor=(0,0,0), thickness=1, linestyle='--', arrowsize=3)
plot6 = point((vmw[0]/2,vmw[1]/2),pointsize=30)
show(plot1+plot2+plot3+plot4 + P1+P2+Z+plot4+plot5+plot6 + plot_grid_pts + plot_lattice_pts + plot_lattice_pts2, figsize = [5,5], xmin = -viewsize/2, xmax = viewsize, ymin = -viewsize/2, ymax = viewsize/2, aspect_ratio=1)
First, we are trying to prove something about squares by proving something about square roots. It works, but it means there will be many steps.
Secondly, we are not just algebraically proving it exists by solving an equation; we are forced to prove our square root exists with inequalities, which brings another set of complications.
Third, we chose to examine those inequalities geometrically to gain insight, so our proofs must use that insight β worthwhile, but stretching.
Historical remark 13.4.16. Hermann Minkowski.
Many more theorems of this kind, such as Lagrange's four square theorem, can be proved using similar techniques, which we are intentionally avoiding stating in their full generality. The names of Minkowski and Blichfeldt are associated with theorems using various symmetries and the notion of convexity in order to apply things more generally. Those who have had some physics may have heard of Minkowski before, as his work nearly beat Einstein to the notion of special relativity; his geometric framework for space-time gave Einstein the necessary apparatus to generalize to curved spacetime and general relativity.