set theory, axioms, Zermelo, Fraenkel, Frankel, infinity, Cantor, Frege, Russell, paradox, formal, axiomatic, Russell paradox, axiom, axiomatic set theory, comprehension, axiom of infinity, ZF, ZFC

Back to title page.

## 2. Axiomatic Set Theory

For a general overview and set theory links, see Set Theory by Thomas Jech in Stanford Encyclopedia of Philosophy.

### 2.1. The Origin of Cantor's Set Theory

In the dates and facts of the real history I am following the excellent books by Fyodor Andreevich Medvedev (1923–1993):

F. A. Medvedev. Development of Set Theory in the XIX Century. Nauka Publishers, Moscow, 1965, 350 pp. (in Russian)

F. A. Medvedev. The Early History of the Axiom of Choice. Nauka Publishers, Moscow, 1982, 304 pp. (in Russian)

Online paper "A history of set theory" in the MacTutor History of Mathematics archive.

A. Kanamori. Set Theory from Cantor to Cohen, Bulletin of Symbolic Logic, 1996, N2, pp.1-71 (online text at http://math.bu.edu/people/aki/cancoh.ps).

In XIX century, development process of the most basic mathematical notions led to the intuition of arbitrary infinite sets. Principles of the past mathematical thinking were developed up to their logical limits. Georg Cantor did the last step in this process, and this step was inspired by a specific mathematical problem.

G. Cantor. Ueber die Ausdehnung eines Satzes aus der Theorie der trigonometrischen Reihen. "Mathematische Annalen", 1872, Vol. 5, pp. 123-132 (available at http://www.maths.tcd.ie/pub/HistMath/People/Cantor/Ausdehnung/, see also online comments by Stanley N. Burris, and a photocopy at Goettinger Digitalisierungs-Zentrum)

In this 1872 paper Cantor proved the following theorem: if two Fourier series are known converging to identical limit values at all but a finite number of points of the segment [-pi, pi], then these series (i.e. their respective coefficients) are identical. How far this theorem could be extended?

Cantor started with the simplest kinds of infinite sets of exception points, for example:

{1/n | n ≥ 1}.

This set is infinite, it possess only one the so-called condensation point (namely, x=0). Cantor succeeded in proving that his theorem holds, if the set of exceptions possess only one condensation point. The generalization to any finite number of condensation points was straightforward.

The next step would be considering sets having infinitely many condensation points. The most simple kind of such sets has exactly one "second order" condensation point, i.e. a condensation point for the usual ("first order") condensation points, for example:

{1/m + 1/n | m, n ≥ 1}

Cantor succeeded in proving his theorem for such sets of exceptions, too. The following step would involve the "third order" condensation points etc. In this way Cantor was forced to work with sets of points of rapidly growing complexity. Thus, gradually, the intuitive notion of arbitrary infinite sets of points began to take shape...

To bring some order into this process Cantor introduced the notion of the derivative set: if P is a set of points, then P' denotes the set of all condensation points of P. Further one can define P'' as (P')', P''' - as (P'')' etc.

Exercise 2.1. For any fixed k>=1, let us consider the set

Q(k) = {1/n1 +... + 1/nk | n1,... , nk ≥ 1}.

Prove that the k-th derivative set Qk(k) = {0}.

Cantor successfully generalized his Fourier series uniqueness theorem for exception sets of any finite order (i.e. for sets P having a finite Pk for some k).

In this way, investigating Fourier series, and having built a plentiful collection of complicated infinite sets of points, Cantor came to the intuition of arbitrary infinite sets. At some moment he arrived to the question: have all infinite sets equal "number" of members?

This critical point was reached in the fall of 1873. Cantor's letter to Richard Dedekind from September 29, 1873 contains the surprising one-to-one correspondence between natural and positive rational numbers:

1     2     3     4     5     6     7     8     9   …
1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1,...
---   --------   --------   ------------------
2        3            4                5  …

The sequence starts with the only fraction m/n such that m+n=2, after that follow: two fractions having m+n=3, two fractions having m+n=4 (2/2 is omitted, it equals to 1/1 that was already encountered) etc.

In other words, the (dense!) set of all rational numbers possess the same "number of members" as the (discrete!) set of all natural numbers! As the next step Cantor proposed to try enumerating of all real numbers, i.e. all the points of a straight line. In his reply, Dedekind pointed out that also the set of all algebraic numbers can be enumerated by using natural numbers. Still, he did not succeed in enumerating all the real numbers...

In his next letter to Dedekind (December 7, 1873) Cantor proved that this is not possible: there is no one-to-one correspondence between real numbers and natural numbers. Cantor's proof involves the method that is called now the diagonal method (perhaps, first used for another purpose by P. du Bois-Reymond in 1871).

Namely, assume that we have some segment [a, b] and some "enumerating" sequence of real numbers r1, r2,... , rn,.... Divide [a, b] into three equal parts, and take the part that does not contain the number r1. Denote this part by [a1, b1], divide it again into three equal parts, and take the part that does not contain the number r2, etc. This process produces a sequence of contracting segments:

a1 ≤ a2 ≤ a3 ≤ ... ≤ b3 ≤ b2 ≤ b1.

The only common point of these segments is some real number r that does not belong to the "counting" sequence r1, r2,... , rn,.... Hence, you cannot enumerate all real numbers (even a tiny segment of them!) by using natural numbers.

Thus there are at least two kinds of infinite sets: the so-called countable sets (that can be enumerated by using natural numbers), and some "more strongly infinite" sets that cannot be enumerated. This discovery by Georg Cantor is the most significant event in the history of the mathematical investigation of infinity. Still, at the moment of the discovery, for Cantor himself the following corollary was even more significant: "most" real numbers are transcendent. (Indeed, all the algebraic numbers can be enumerated, but the transcendent ones cannot.) The above Cantor's proof is simple enough to be followed by children (unlike the 1844 construction of particular transcendent numbers by J. Liouville, and the 1873 proof by Ch. Hermite that e is transcendent). A striking example of the power of non-constructive reasoning: sometimes, it is much easier to prove that "most" objects possess a property, than to construct or identify at least one such object!

These first set theoretical results Cantor published in 1874:

G.Cantor. Ueber eine Eigenschaft des Inbegriffes aller reelen algebraischen Zahlen, "J. reine und angew. Math.", 1874, Vol. 77, pp, 258-262 (see also comments by Stanley N. Burris)

Exercise 2.2. Construct one-to-one correspondences between: a) two segments (of different length), b) between the segment [a, b] (i.e. the set {x | a≤x≤b}) and the interval (a, b) (i.e. the set {x |a<x<b}), c) between an interval and the entire set of all real numbers.

Having discovered the existence of at least two radically different types of infinite sets, Cantor went further. In a letter to Dedekind (January 5, 1874) he asks: is a two-dimensional continuum (for example, a rectangle) equivalent to a one-dimensional continuum (for example, a segment)? In other words: does a rectangle contain more points than a straight-line segment? Cantor conjectured "yes", and spent the following three years trying to prove that a rectangle contains more points than a segment. He did not succeed. Only after he decided to explore the "unrealistic" opposite hypothesis (rectangles and segments contain equal numbers of points), he succeeded almost immediately! His proof (first exposed in a letter to Dedekind from July 20, 1877) is simple enough to be followed by children:

A one-to-one correspondence between the rectangle [0, 1)x[0, 1) and the segment [0, 1] can be produced easily from the decimal fractions of coordinates:

(x, y) -> z

x = 0.abcd...

y = 0.ABCD...

z = 0.aAbBcCdD...

Exercise 2.3. Fill in the gaps and complete this proof (published in 1878):

G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. "J. reine und angew. Math.", 1878, Vol. 84, pp. 242-258 (see also comments by Stanley N. Burris and a photocopy at Goettinger Digitalisierungs-Zentrum)

Cantor thought that his simple proof has "destroyed" the notion of dimensionality. Replying to this, Dedekind pointed out that Cantor's correspondence seems to be discontinuous, and conjectured that continuous correspondences between continuums of different dimensionalities would be impossible. Still, G. Peano in 1890 and D. Hilbert in 1891 succeeded in producing continuous (yet not one-to-one!) mappings from a segment onto a rectangle. And only in 1911 L. Brouwer "saved" the notion of dimensionality by proving that Dedekind was right: a continuous one-to-one correspondence between continuums of different dimensionalities is impossible:

L. Brouwer. Beweis der Invarianz der Dimensionenzahl. "Math. Ann.", 1911, Vol. 70, pp. 161-165.

Cantor's remarkable paper of 1878 contains another famous statement - the so-called continuum hypothesis. Working hard with various infinite sets of points Cantor established that all infinite sets he could produce, fall into two categories:

a) the so-called countable sets (i.e. the sets that can be enumerated by using natural numbers),

b) the sets that are equivalent to the entire continuum (i.e. the set of all real numbers).

Cantor was unable to produce sets of "intermediate power", i.e. uncountable sets of points that were not equivalent to the entire continuum. This is why he conjectured that such sets do not exist. This conjecture is known as the continuum hypothesis: each infinite set of points either is countable, or it is equivalent to the entire continuum.

Cantor spent many years trying to prove the continuum hypothesis. For this purpose, he developed further his apparatus of derivative sets Pk. He introduced condensation points of infinite order, i.e. points that are simultaneously condensation points of any finite order, and defined the infinite derivative set Pω ("P omega") as the intersection of P', P'', P''' etc. After this, he introduced the next derivatives:

Pω+1 = (Pω)', Pω+2 = (Pω+1)',...

Pω*2 = intersection of Pω+k for all finite k,

Pω*2+1,... , Pω*3, Pω*3+1,...

Pω*ω, Pω*ω+1,...

Exercise 2.4. Try to produce a set P such that Pω = {0}, such that Pω*2 = {0} etc.

In this way Cantor arrived to a remarkable extension of the natural number system, the so-called transfinite ordinal numbers. These numbers extend the usual finite counting process:

Finite (i.e. natural) numbers are called first class ordinal numbers.

ω (omega, the first transfinite ordinal number) follows after all finite numbers,

ω+1 follows immediately after ω,

ω+2 follows immediately after ω+1,

...

ω*2 is defined as ω+ω, i.e. it follows after all ω+k where k is finite,

ω*3 is defined as ω*2+ω, i.e. it follows after all ω*2+k where k is finite,

...

ω2 is defined as ω*ω, i.e. it follows after all ω*k where k is finite, etc.

...

ωω follows after all ωk where k is finite, etc.

...

ε0 (epsilon 0) follows after all expressions built up of natural numbers and w by using addition, multiplication and exponentiation,

...

Infinite numbers having a countable set of predecessors are called second class ordinal numbers.

ω1 (omega 1) follows after all second-class ordinal numbers, i.e. it is the least third class ordinal number.

...

Cantor proved that the set of all second-class ordinal numbers O2 is the least uncountable set, i.e. if some infinite subset S of O2 is not equivalent to the entire O2, then S is countable. Thus, to prove the continuum hypothesis, it was sufficient to prove that O2 is equivalent to the entire continuum, i.e. "simply" to produce a one-to-one correspondence between the second class ordinal numbers and the real numbers. Still, Cantor and all his numerous followers failed to do this...

Thus, in some sense, the continuum hypothesis represents one of the most beautiful problems in mathematics: it can be explained to children, yet it still remains unsolved for 130 years!

Cantor was already tired of many years of failed attempts to prove the continuum hypothesis, when he received another blow. In 1895 he discovered that his set theory may lead to contradictions...

Still, all these difficulties cannot change the "verdict of the history": Georg Cantor remains one of the most outstanding personalities in the history of mathematics. He succeeded in developing the principles of the past mathematical thinking up to their logical limits.

### 2.2 Formalization of Cantor's Inconsistent Set Theory

First let us formalize the set theory directly as G. Cantor created it. This theory is based on the intuitive concept of "a world of sets" where all sets (finite and infinite) and all their members exist simultaneously and completely. In our axioms we would like to describe the governing laws of this frozen "world of sets".

At the very beginning we must answer the following question: can our "world" consist of sets only? A set can consist of members, which are sets. If the set x1 contains a member x2, which is also a set, if x2 contains x3, x3 contains x4 etc. - then, following along this chain must we not encounter something more tangible than "sets of sets"? If nothing tangible exists, how can sets exist? If nothing exists, then the world is ... an empty set o (i.e. from nothing we obtain something). Having o we can build the set which contains o as a member - the set {o}. Having o and {o} we can build a two-element set {o, {o}} etc.:

x0 = o; x1 = {o}; x2 = {o, {o}}; ...; xn+1 = xn U {xn}; ...

Therefore, even postulating that "nothing exists" we can obtain infinitely many different sets (compare Devlin [1977]).

Now, as the first step, let us define the language of the formal set theory. We will use only one sort of variables: x, y, z, ... - subscripted or not. Intuitively, the range of each variable consists of "all possible sets" (since we have decided that the "world of sets" consists from sets only). We do not introduce constants (like as o to denote the empty set) and function symbols (like as xUy to denote union of sets). Later we will see how to do without these symbols. We introduce two predicate symbols: "in" (intuitively, "x in y" means "x is a member of y"), and "=" (intuitively, "x=y" means "x is the same set as y"). We can combine atomic formulas like as "x in y" and "x=y" by using connector and quantifier symbols, thus obtaining formulas of set theory. For example, the formula Ay ~(y in x) says that x is empty set, and the formula Ey(y in x)&Az(z in x -> z=y)) says that x possess exactly one member.

Exercise 2.4a. Provide formulas expressing the following assertions:

"x possess exactly two members",

"z consists of two members – x and y",

"x is a subset of y (i.e. all the members of x are members of y)",

“y consists of all subsets of x”

"x and y do not intersect (i.e. x and y do not possess common members)",

"x is an infinite set" (hint: try saying something about some of proper subsets of x).

As the second step, as any serious formal theory, set theory adopts the axioms and inference rules of the classical logic. For example, the following formulas in the language of set theory we will use as logical axioms:

x in y -> (u=z -> x in y) - an instance of the axiom schema L1: B->(C->B);

x=x V ~(x=x) - an instance of the axiom schema L11: B V ~B;

x=x -> Ex(x=x) - an instance of the axiom schema L13: F(t) -> ExF(x).

After this, as the third step, we must introduce the specific axioms of set theory.

First, let us introduce the axioms defining the specific meaning of the identity predicate in the set theory: x=y means that the sets x and y have the same members. It may seem obvious, yet it requires special axioms - you cannot derive this meaning from pure logic. If the sets x and y have different definitions, and after some effort we could establish that Az(z in x <-> z in y), then we can conclude that x=y only by using the specific meaning of identity adopted in set theory. Therefore, to define the specific meaning of set identity we must introduce the following axioms:

x=y <-> Az (z in x <-> z in y), --------------- (C1)

x=y -> Az (x in z <-> y in z). ----------------- (C1')

The axiom C1 is called the Extensionality Axiom (it says that in our set theory the identity of sets is treated extensionally, not intensionally, i.e. two different set definitions can lead to identical sets).

Exercise 2.4b. Verify the following properties of set identity: a) x=x (reflexivity), x=y -> y=x (symmetry), x=y & y=z -> x=z (transitivity), d) F(x,x) & x=y -> F(x,y) (substitution of an equal). Here, F is any formula, the designation F(x, x) means that the occurrences of the free variable x are split into two groups. In F(x, y), the occurrences of the second group have been replaced by y (which equals to x). (Hint: use induction by the structure of F.)

Since we adopted the classical logic for our "world of sets", we can prove the formula Ex(x=x), i.e. that at least one set exists in our "world" (it follows from the logical axiom F(x)->ExF(x), hence, x=x -> Ex(x=x)). Still, pure logic does not allow to conclude something about the properties of this set x. To obtain sets having specific properties (for example, the empty set) we must introduce additional axioms.

The main principle of Cantor's set theory says that a set is a "many" of which we can think as of a single "whole":

"Unter einer Mannigfaltigkeit oder Menge verstehe ich naemlich allgemein jedes Viele, welches sich als Eines denken laesst, d.h. jeden Inbegriff bestimmter Elemente, welcher durch ein Gesetz zu einem ganzen verbunden werden kann..." (Cantor [1883]).

Or:

"Unter einer 'Menge' verstehen wir jede Zusammenfassung M von bestimmten wohlunterschiedenen Objecten m unsrer Anschauung oder unseres Denkens (welche die 'Elemente' von M genannt werden) zu einem Ganzen" (Cantor [1895-1897]).

"A set is a collection into a whole of definite distinct objects of our intuition or of our thought. The objects are called the elements (members) of the set.'' (An English translation by Fraenkel, Bar-Hillel, Levy [1973]).

How could we think of many things as of a whole? One of the ways would be to define the combination of properties which allows to separate things that belong to the whole and that do not belong to it. Now, the corresponding notation is used widely in mathematics, for example, c = { x | C(x) }, where C(x) says that "x is a crocodile". Mathematicians say that c is the "set of all crocodiles" and may operate with c as with a single object.

In the modern language we formulate Cantor's principle as the Comprehension Axiom Schema. Let F(y) be a formula of set theory (it may contain additional free variables, let us call them parameters). We may think of F(y) as of an assertion "the set y possess the property F". Hence, according to the Cantor's principle we can introduce the set x of all y's possessing the property F: x= { y | F(y) }. To legalize operations like this one we must adopt the following axiom schema:

ExAy (y in x ↔ F(y)) -------------- (C2[F])

(of course, formula F does not contain x). For each formula F we have a separate comprehension axiom C2[F].

In particular, the axiom C2[~(y=y)] allows to prove the existence of the empty set:

ExAy (y in x ↔ ~(y=y)).

Hence, ExAy ~(y in x), i.e. "there is a set x that is empty".

Let's prove the existence of the set {o}, i.e. of x such that Ay(y un x ↔ y=o). It follows from the axiom C2[y=o], or, more precisely, C2[Az~(z in y)].

Exercise 2.5. Using appropriate comprehension axioms: a) prove the existence of the following sets (o is the empty set): {o, {o}}; {o, {o}, {o, {o}}}; b) prove that the complement of a set, difference of two sets, intersection and union of any collection of sets ("set of sets") is a set.

The axioms C1, C1' and C2[F] (for all formulas F(y) that do not contain x, but may contain other parameters) and the axiom of choice define a formal set theory C which corresponds almost 100% to Cantor's intuitive set theory (of the "pre-paradox" period of 1873-94).

Cantor and the Axiom of Choice

Of course, in 1873-94 Cantor believed in the unrestricted comprehension principle. Still, did Cantor really accept a kind of axiom of choice as a valid principle of set theory? Let us check his two main papers on foundations of set theory:

G.Cantor. Grundlagen einer allgemeinen Mannigfaltigkeitslehre. "Math. Annalen", 1883, Vol.21, pp.545-586

G.Cantor. Beitraege zur Begruendung der transfiniten Mengenlehre. "Math. Annalen", 1895, Vol.46, pp.481-512; 1897, Vol.49, pp.207-246 (see also photocopies at Goettinger Digitalisierungs-Zentrum).

I am using the book by F.A.Medvedev "The early history of the axiom of choice" (Medvedev [1982]) that contains extensive relevant quotes from Cantor's works and Zermelo's comments.

Two main conclusions:

1. Cantor is using ad hoc - as he needs, and tens of times! - the "possibility of arbitrary choices" without any attempt to formulate something like a general axiom of choice. For example, when proving that each infinite set contains a countable subset (a quote from the 1895-97 paper):

"If we have some rule of deleting of elements t1, t2, ..., tn-1 from [an infinite set - K.P.] T, then always there is a possibility to delete one more element tn..."

2. Cantor believed in "validity" of some assertions that are equivalent (or almost equivalent) to the axiom of choice. For example, in the 1883 paper he qualifies the thesis "each well defined set can be well ordered" as a "remarkable generally valid law of thought" and promises to consider it in one of subsequent papers. Still, he never did, and in the 1895-97 paper this thesis does not appear at all. Because of the "smell" of paradoxes?

Today, we can guess, did Cantor's thesis "each well defined set can be well ordered" mean that all sets can be well ordered (this would be equivalent to the axiom of choice, see below)? Or, maybe, Cantor intended (already in 1883?) to distinguish between "well defined" and "ill defined" sets? This would mean that Cantor believed only that "each constructible set can be well ordered" (a theorem proved by K.Goedel in 1938, see Section 2.4 below).

Extending the Exercise 2.5 by introducing the notions of subset, "the set of all subsets of the set x", relations, functions, etc., we could derive from our formal set theory C (i.e. from a very simple set of axioms!) all the common mathematics. A very remarkable fact - 100% of mathematics can be derived from an extremely simple set of axioms!

Note. At the time of Cantor and Frege, the axioms C1, C1' and C2 were regarded as "pure logical" ones. Thus, the reduction of mathematics to these axioms could be interpreted as a reduction of mathematics to logic (the so-called logicism, for details, see Logicism by Wikipedia, and Logicism by R.B.Jones).

Surprisingly,... these axioms are also sufficient... to derive a contradiction!

A very simple way how to do this was invented by Bertrand Russell in 1901, and is now called Russell's paradox - first published in

B. Russell. Principles of Mathematics. Cambridge University Press, Cambridge, 1903.

A detailed reconstruction: How Bertrand Russell discovered the “Russell Paradox” by Paul Elliott.

About the history and significance of Russell' paradox see the online article Russell's Paradox in Stanford Encyclopedia of Philosophy.

In terms of our formal set theory C Russell's paradox can be derived as follows.

Normally, sets are not members of themselves, i.e. normally, ~(y in y), for example, ~(o in o). Still, the axioms of C do not exclude the existence of "abnormal" sets, which are members of themselves. Hence, let us consider the set of all "normal" sets: x = { y | ~(y in y) }. The existence of this x is guarantied by the comprehension axiom C2[~(y in y)]:

ExAy (y in x <-> ~(y in y)).

Now substitute x for y, and you will have a contradiction:

Ex (x in x <-> ~(x in x)).

Hence, unexpectedly, some of the comprehension axioms lead to contradictions. Cantor's set theory C is inconsistent. The unrestricted comprehension axiom scheme cannot serve as a foundation of set theory!

The first paradox in Cantor's set theory was discovered already in 1895 by Cantor himself. In 1897, Cesare Burali-Forti discovered - and published another paradox. These paradoxes were more complicated than Russell's paradox, yet much easier to discover! See

Cantor did not publish about the problem, he only communicated about it with David Hilbert. Hilbert proposed his own simpler version of a paradox, and, after this, around 1900, Hilbert's Goettingen collaborator Ernst Zermelo made one more step - he simplified Hilbert's paradox, thus, in fact, inventing Russell's paradox before Russell! Still, this discovery remained "Goettingen folklore" until Russell's publication in 1903.

For details of the story, see

V. Peckhaus, R. Kahle. Hilbert's Paradox. "Historia Mathematica", 2002, vol.29, N2, pp.157-175.

Note. In the above text the terms "invented" and "discovered" were used as synonyms. Mathematics is the only branch of science for which this is true (see Section 1.2).

Today, 100 years after, perhaps, the discovery of paradoxes in mathematics would not be perceived as a catastrophe. Still, for G. Cantor (1845 - 1918) and G. Frege (1848 - 1925), who started to believe in the unrestricted comprehension schema in 1870's and lived with this absolute belief for more than 20 years, the discovery of paradoxes was - at least for Frege - a kind of a personal tragedy. For Georg Cantor - his set theory, and for Gottlob Frege - his formal system were their main contributions to mathematics.

The solution was found by mathematicians of the next generation.

### 2.3. Zermelo-Fraenkel Axioms

G.Cantor did not think that the paradoxes discovered in his set theory are real contradictions (because of his platonist background: for him, investigating set theory meant exploring "the world of sets"). Instead of revising the foundations of set theory, he started ad hoc separating "finished sets" / "non-finished sets", "transfinite sets" / "absolutely infinite sets", and finally - in 1899 - "consistent multiplicities" / "inconsistent multiplicities" (see Peckhaus, Kahle [2002]). In modern terms, this would mean that Cantor proposed to restrict the comprehension axiom schema to those instances of it that do not lead to contradictions.

Ernst Zermelo proposed a more successful idea. He proposed to restrict the comprehension axiom schema by adopting only of those instances of it, which are really necessary for reconstruction of common mathematics. In 1908 Zermelo published his account:

E. Zermelo. Untersuchungen ueber die Grundlagen der Mengenlehre. "Math. Annalen", 1908, Vol. 65, pp. 261-281 (see English translation in Heijenoort [1967], (see also comments by Stanley N. Burris)

We will consider (a modern equivalent of) Zermelo's axioms some time later (for Zermelo's original system see the above online document). Of course, you will not find the axiom C2[~(y in y)] among them, since such a kind of reasoning is not used in common mathematics.

Sets and Classes

Still, how to handle the situation, when we have some formula F(y), we use it to collect the sets y having the property F, yet trying to treat this collection as a set, we obtain a contradiction? For example, what to do with the Russell's collection R = {y | ~(y in y)}? The empty set o belongs to it: ~(o in o), hence, o in R. Still, if you will consider the collection R as a set and denote it by r, then you will have the Russell's paradox: "r in r" will be equivalent to ~(r in r). How to solve this problem? Let us try applying the well-known method due to King Solomon: let us consider Russell's paradox as the proof that Russell's collection is not a set!

Using this approach we must legalize some collections of sets that are not sets. Let F(y, z1, ..., zn) be a formula in the language of set theory (z1, ..., zn are optional parameters). Then let us say that for any fixed values of z1, ..., zn the formula F defines a class

A = {y | F(y, z1, ..., zn)},

i.e. the class of all y's possessing the property F. In general, different values of z1, ..., zn yield a different class. For example, the class { y | ~(y=z1)} consists of all sets except the set z1, i.e. it depends on the parameter z1. On the other hand, the class

V = {y | y=y}

consists of all sets, and does not depend on parameters. The class V-{x} = {y | ~(y=x)} depends on the parameter x.

Each set is a class: the set x coincides with the class {y | y in x}. In its general (inconsistent!) form, the comprehension axiom schema says that all classes are sets:

ExAy(y in x <-> F(y, z1, ..., zn)).

Still, "in fact", some classes are not sets. For example, the Russell's class R = {y | ~(y in y)} cannot be a set: if R=x, then "y in x" is equivalent to ~(y in y), and by setting y=x we obtain a contradiction. Classes that are not sets we will call proper classes. Now we can say simply that each paradox of set theory "generates" some proper class.

We will denote classes by capital letters: A, B, C, ... (small letters a, b, c, ... are variables of the language of set theory, i.e. they denote sets). This notation must remind to us of the metaphoric character of the following "formulas":

y in A, A=B, A≤B, A∩B, AUB, A−B.

These "formulas" do not belong to the language of set theory, which does not contain "capital" variables. They are used merely as a convenient notation for the following ("proper") formulas:

F(y), Ay(F(y)<->G(y)), Ay(F(y)->G(y)), F(y)&G(y), F(y)VG(y), F(y)&~G(y),

where the formula F defines the class A, and the formula G defines B.

Now we can start formulating the axioms of set theory as Zermelo proposed them. We will use the same formal language of set theory introduced in the previous section, and the same axioms and inference rules of the classical logic. As the first axioms we adopt the same extensionality axiom C1 and the axiom C1'. But, following that, we will adopt only those comprehension axioms that are really used in mathematics for building of "useful" sets.

Separation Axiom Schema

Perhaps, the simplest way to obtain new sets would be separation: having a set x separate those members of x that possess some common property F. For example, in this way the set of all prime numbers is obtained from the set of all natural numbers. In general, the situation could be represented as follows:

|------------|-----------------y in x------------------|---------------------|
|-----------------------------------|-------------F(y)-------------|--------|
|-----------------------------------|------y in z------|---------------------|

The condition F(y) is any formula of set theory (it may contain parameters z1, ..., zn). Using this condition we separate those members of the set x that satisfy the condition F. The members separated make up a new set denoted by z.

To legalize this way of reasoning we must introduce the following Separation Axiom Schema: if F(y, z1, ..., zn) is any formula that contains free variables y, z1, ..., zn, but does not contain x and z, then the following formula is declared to be an axiom:

EzAy(y in z <-> y in x & F(y, z1, ..., zn)). --------------- (C21[F])

Of course, this schema is a part of the general comprehension schema, namely, it is C2[y in x & F(y, z1, ..., zn)].

An alternative, extremely convenient form of the separation schema can be obtained by using the notion of classes: the formula F defines a class A, hence, the axiom C21[F] says that the intersection A∩x (of the class A and the set x) is a set: A∩x = z.

Now we can prove the existence of the empty set, i.e the formula EzAy ~(y in z). Indeed, from the logical axioms we know that some sets exist, let z0 be a set. Then, using the impossible condition ~(y=y) and the axiom C21[~(y=y)] we obtain a set x such that

Ay(y in x <-> y in z0 & ~(y=y)).

hence, Ay ~(y in x). Q.E.D.

On the other hand, the Axiom of Extensionality implies that there is only one empty set. Indeed, if the sets x1, x2 both are empty, i.e. Ay ~(y in x1) and Ay ~(y in x2), hence, Ay(y in x1 <-> y in x2) and (by C1) x1=x2.

Note. Do not underestimate this simple invention - the empty set. You may think safely, that the empty set is not a set. Still, this would make the treatment of, for example, set intersection very uneasy - it would be sometimes a set, and sometimes - empty!

It would be nice to denote the empty set, for example, by o, yet our language of set theory does not contain constants. If we would introduce one, this would not solve the problem completely, because after the first constant we may wish to introduce the second one etc. Therefore, it would be better to do without any constants at all. Let us see how this can be achieved.

If we wish to use the empty set constant o in our reasoning legally, we must provide a method allowing to translate our "o-extended" formulas into the pure language of set theory that does not contain o. This can be done easily. Indeed, if we assert that the empty set o possess some property expressed by a formula F(x), i.e. we use the formula F(o), then this assertion can be expressed also by the formula

Ax(Ay ~(y in x) -> F(x)),

i.e. if x is empty set, then x possess the property F. This formula does not contain the constant o. Hence, we can use the constant o in our reasoning safely. If needed, we can reformulate any such reasoning by using (more complicated) formulas that do not contain o.

The above approach can be generalized. Let us assume that we have proved the existence and uniqueness of the set satisfying some formula T(x). I.e. we have proved the following two formulas:

Ex T(x),

Ax1Ax2 ( T(x1) & T(x2) -> x1=x2 ).

Then we may introduce a constant t denoting the above-mentioned unique set, and use it in our reasoning safely. Any assertion like as F(t) (i.e. "t possess the property F") we can reformulate without using of t: Ax(T(x) -> F(x)).

Still, the formula T could contain parameters, for example, T(x, z1, z2). If we have proved that for each pair of z1 and z2 there is a unique x such that T(x, z1, z2), then T defines some operation. We may wish to introduce a specific symbol # denoting this operation, and use expressions like z1 # z2 in our reasoning. Then x=z1# z2 will be just another record of the formula T(x, z1 , z2). This can be done safely, since for example, the assertion F(z1 # z2) we can reformulate as Ax(T(x, z1, z2) -> F(x)).

The possibility of safe introducing additional constants and operation symbols is widely used in semi-formal reasoning of set theory.

Note. The above explanation somewhat simplifies the problem. For full treatment see E. Mendelson [1997].

Exercise 2.6. a) Use appropriate separation axioms to prove that for each pair of sets x1, x2 the intersection x1x2 and the difference x1−x2 also are sets. Justify the usage of both operation symbols.

b) Prove that if A≤B, and A is a proper class, then B also is a proper class. Hence, the class of all sets V = {y | y=y} is a proper class (indeed, R≤V, where R is Russell's class).

The separation schema allows only obtaining smaller sets from larger ones. This means that without additional axioms we would be able to prove only the existence of the "smallest" set - the empty set. So, let us adopt some "enlarging" axioms, too.

Pairing Axiom

As the first "positive" axiom let us consider the Pairing Axiom. If we have two sets x1 and x2, then we can build a new set containing x1 and x2 as its only members. We will denote this set by {x1, x2} (if x1=x2, then we will write simply {x1}). To make this way of reasoning legal, we must adopt as an axiom the following formula:

Ax1Ax2EzAy ( y in z <-> y=x1 V y=x2). --------- (C22)

Of course, C22 is a comprehension axiom, namely, C2[y=x1 V y=x2]. In terms of classes: the Pairing Axiom says that if x1 and x2 are sets, then the class {x1, x2} also is a set.

The set {x1, x2} represents the so-called unordered pair. How to introduce in our theory the notion of ordered pair, which is important, for example, as a base for the notions of relation and function? We will denote the ordered pair of x1 and x2 by (x1, x2).

In his 1914 paper

N.Wiener. A simplification of the logic of relations. "Proc. of the Cambridge Philos. Soc.", 1914, vol. 17, pp.387-390

Norbert Wiener proposed to define (x1, x2) as a combination of unordered pairs, where the first member is marked by adding the empty set:

(x1, x2) = { {{x1}, o}, {{ x2}}}.

After this, in the paper

K.Kuratowski. Sur la notion d'ordre dans la theorie des ensembles. "Fund. Math.", 1921, Vol. 2, pp.161-171

Kazimierz Kuratowski proposed an even simpler elegant method of deriving (x1, x2) from {x1, x2}:

(x1, x2) = {{x1}, {x1, x2}}.

Intuitively, x1 and x2 have different positions in the right hand side expression, i.e. they seem to be "ordered".

Note. Do not underestimate this simple invention - it greatly simplifies the language and the axioms of set theory! Without it, we would need an additional construct (x, y) in the language of set theory, and, correspondingly, additional axioms to define the properties of this construct.

Exercise 2.7. Justify the definition of (x1, x2) by Wiener and Kuratowski by proving that

(x1, x2) = (y1, y2) -> x1=y1 & x2=y2.

And, particularly, if x1 and x2 are different, then (x1, x2) and (x2, x1) also are different.

Using the notion of ordered pairs we can introduce the notions of Cartesian product, relation and function - simply as specific kinds of classes and sets.

The Cartesian product of classes A, B is defined as follows:

AxB = { (u, v) | u in A & v in B },

or, more precisely:

AxB = { z | EuEv(u in A & v in B & z=(u, v) }.

If A, B are sets, then the Cartesian product AxB also will be a set? Sorry, to legalize this conclusion we must wait for additional axioms.

Relations are defined as classes that consist only of ordered pairs. I.e. the class Q={ y | F(y) } will be called a relation, if and only if

Ay( y in Q -> EuEv y=(u, v) ).

Each formula F1(u, v) having two free variables defines a relation:

Q = { (u, v) | F1(u, v) }.

And conversely, for each relation Q we can build a formula F1(u, v) such that

(u, v) in Q <-> F1(u, v).

Some time later we will prove that some relations are proper classes, for example:

E = { (u, v) | u=v }, C = { (u, v) | u in v }.

If we had only the axioms C21 and C22, we could not build sets that possess more than two members. Therefore, let us consider the next operation for building sets - the union of sets.

The simplest case is the union xUy, yet in general we can merge arbitrary collections of sets. For any class B we can define the union of all members of B:

UB = { y | Ez(y in z & z in B) }.

In other words, the union UB consists of all "members of members" of B. By the way, U{x, y} represents exactly the well-known xUy.

Union Axiom

The next axiom we adopt is the Union Axiom. It says that if x is a set, then Ux also is a set:

AxEu(Ay(y in u <-> Ez(y in z & z in x)). ----------- (C23)

This axiom is a comprehension axiom, namely, C2[Ez(y in z & z in x)].

Exercise 2.8. a) Prove that if B is a proper class, and x is a set, then the difference B−x is a proper class.

b) Show that the axioms C21, C22 and C23 allow proving the existence of any set which can be defined by using a finite number of the empty set symbols o, commas and braces, for example, {o, {o, {o}}, {o, {o, {o}}}}.

If we had only the axioms C21, C22 and C23, we could build only sets that possess a finite number of members.

But now we can prove that, if A and B are non-empty classes, and their Cartesian product AxB is a set, then A and B also are sets. Indeed, let v0 be a member of B, then for each u in A:

(u, v0) in AxB, {{u}, {u, v0}} in AxB, {u} in U(AxB), u in UU(AxB).

Hence, A≤UU(AxB). Since AxB is a set, according to the Union Axiom, UU(AxB) also is a set, hence, by the Separation Schema, so is A. Similar argument proves that B also is a set. (For a proof that, if A and B are sets, then the product AxB also is a set, we must wait for additional axioms, see below.)

Exercise 2.9. Prove that the relations E and C defined above are proper classes.

Natural Numbers in Set Theory

Now we are ready to try declaring some of the simplest finite sets as natural numbers (by definition). Namely, let us declare the empty set o to be zero, the set {o} - to be the number "one", then, {o, {o}}, or {0, 1} will be the number 2, {0, 1, 2} - the number 3, etc. In general, if the set cn is declared to be the number n, then the set cn+1 = cn U {cn} we declare to be the number n+1. It seems that we have thus also the set of all natural numbers

{0, 1, 2, 3, ..., n, n+1, ...}?

As noted above, the axioms we have adopted so far do not allow proving the existence of such an (infinite!) set. Therefore, let us try to define at least the class of all natural numbers. It appears not an easy task! The easy "solutions" by using "infinite formulas" like as

y=c0 V y=c1 V y=c2 V ... V y=cn V ...

cannot be taken seriously, since the language of set theory allows only finite formulas. To obtain a finite formula N(y) expressing En(y=cn), let us follow an early (1923) idea by John von Neumann:

J. von Neumann. Zur Einfuehrung der transfiniten Zahlen. "Acta Szeged", 1923, 1, pp. 199-208

In 1930s, R.M. Robinson, K. Goedel and P. Bernays simplified von Neumann's constructions considerably - see

A.A.Fraenkel, Y.Bar-Hillel, A.Levy. Foundations of Set Theory. Studies in Logic, Vol. 67, Elsevier Science Ltd, 1973, 404 pp. (Russian translation available)

First, let us try defining formally the notion of finite sets. The idea: let us say that some set y is finite, if and only if its members can be ordered in sych a way that each non-empty subset of y contains a minimum and a maximum member.

A relation R (possibly, a class) is called a (partial) ordering of the set y, if and only if

Ab(b in y -> ~R(b, b)) (non-reflexivity);

AbAcAd(b,c,d in y -> (R(b, c) & R(c, d) -> R(b, d)) (transitivity).

Let us call an ordering R a double-well-ordering of the set y, if and only if each non-empty subset of y contains a minimum and a maximum member under R, i.e. if and only if

Az(z≤y & Es(s in z) -> Eu(u in z & Av(v in z -> R(v, u) V v=u)) & Eu(u in z & Av(v in z → R(u, v) V u=v))).

Intuitively, only finite sets (and all of them) can be double-well-ordered (by some relation). Thus, this could serve as a good formal definition of finite sets: y is a finite set, if and only if we can provide a relation R such that R is a double-well-ordering of y.

Exercise 2.9a (for smart students). Prove the following:

a) If a, b are finite sets then axb is a set (and a finite one). Note: for a general proof that “if a, b are sets, then axb is a set” we need additional axioms.

b) For finite sets, prove the Power-Set Axiom (see below): if x is a finite set, then P(x) is a set (and a finite one).

c) For finite sets, prove the Replacement Axiom Schema (see below).

d) For finite sets, prove the Axiom of Choice (see below).

From the above (a) it follows that if R is a double-well-ordering of y, then so is the set r = R∩(yxy). This allows to express the finiteness of the set y by a single formula Finite(y):

Er (r is a double-well-ordering of y),

where the formula under quantifier Er looks now as follows:

Az(z≤y & Es(s in z) -> Eu(u in z & Av(v in z -> (v, u) in r V v=u)) & Eu(u in z & Av(v in z → (u, v) in r V u=v))).

All sets cn are finite sets:

c0=o; c1={o}={c0}; c2={o, {o}}={c0, c1}; c3={c0, c1, c2}; ...

Namely, they are double-well-ordered by the membership relation “in”: c0 in c1 in c2 in c3 in ....! Another remarkable property of cn is as follows: it contains all members of its members, members of members of members etc.

The set y is called transitive if it contains all members of each of its member, i.e. Au(u in y -> u≤y), or

AuAv (u in v & v in y -> u in y).

Thus, all sets cn are transitive and double-well-ordered by the membership relation “in”.

Let's use this combination of two properties to formalize the semi-formal predicate En(y=cn) as a definition of a class N:

N = {y | y is transitive & y is double well ordered by "in"}.

Now, natural numbers will be – by definition – simply members of N! (Does N contain the numbers cn only?)

Exercise 2.10. a) Show that the "standard" natural numbers cn defined above all are members of the class N. I.e. prove the theorem schema "for all n: cn in N". We do not know how to replace this schema (i.e. an infinite sequence of theorems) by one (finite) theorem. This seems impossible.

b) Show that, if x in N and y is the maximum (under "in") member of x, then x-{y} in N, x-{y} in x, and x-{y}=y. Derive from this that for all n: if x in N, and x contains exactly n members, then x=cn.

c) Prove that if B is a non-empty subclass of N, then B contains a minimum member under "in".

d) Prove the "induction principle" for N: if B is a subclass of N such that 0 in B & Ax(x in B -> xU{x} in B), then B=N.

e) (For smart students). Define addition and multiplication for members of N. Try to prove that all the axioms of first order arithmetic PA (see Section 3.1) hold in N. Probably, you will need only the above axioms C1+C1'+C21+C22+C23. Thus, in the set theory C1+C1'+C21+C22+C23 one can fully re-build (in the sense of Section 3.2) first order arithmetic PA (defined in Section 3.1). On the other hand, one can build in C1+C1'+C21+C22+C23 a "model" consisting of all finite sets of natural numbers. (Hint: see Section 3.3 for a method of coding finite sets of natural numbers by using the Chinese Remainder Theorem.). Verify that in this model: a) all axioms of set theory ZFC are true, except the axiom of infinity; b) the axiom "all sets are finite" (i.e. the negation of the axiom of infinity) is true.

Corollary 2.10 (answering a question by Calvin Ostrum, thanks). The following theories are equi-consistent (i.e. they are all consistent, or all inconsistent, simultaneously): PA, C1+C1'+C21+C22+C23, ZF minus axiom of infinity, ZFC plus negation of axiom of infinity.

For you, are the results of the Exercise 2.10 enough to say that "N is the class of all natural numbers"?

Axiom of Infinity

The axioms of set theory we have so far allow proving the existence only of finite sets. Indeed, if you wish, you can verify easily that all these axioms hold when interpreted in the area of sets, which can be defined by using of a finite number of the empty set symbols o, commas and braces, for example, {o, {o, {o}}, {o, {o, {o}}}}. A kind of infinity we have only among classes, for example, N is an infinite class.

Hence, our next step must be adopting of some axiom of infinity.

The approach used below is somewhat non-traditional: I will introduce the axiom of infinity as a comprehension axiom. Namely, if we wish to think of N as of a set, then we must adopt the following Axiom of Infinity:

ExAy(y in x <-> y in N). (C26).

Of course, this is a comprehension axiom, namely, C2[y in N].

Exercise 2.14. Write down the full text of C26 and count the number of characters in it.

The set of all natural numbers is denoted traditionally by ω (omega), instead of the above class letter N.

Note. As you see, C26 is somewhat lengthy when compared to other single Zermelo axioms (not axiom schemas, of course!). Perhaps, for this reason, Zermelo and his followers used shorter forms of the axiom of infinity, for example:

Ex(o in x & Ay(y in x -> yU{y} in x)),

or even shorter (why is it shorter?):

Ex( Ey(y in x) & Ay(y in x -> Ez(z in x & ~(z=y) & y≤z))).

Of course, C26 implies these formulas (simply take ω for x). If you wish, try proving that the converse is true as well (it is!).

A set x is called a countable set, if and only if x is finite, or members of x can be enumerated by using natural numbers, i.e. if there is a 1-1-mapping (possibly, a class) between ω and x.

After adopting of C26, we can prove the existence only of countable sets. To prove the existence of uncountable sets, the power-set axiom C24 must be applied additionally (see below).

Exercise 2.14a (for smart students). Prove the following:

a) If a, b are countable sets then axb is a set (and a countable one). Note: for a general proof that “if a, b are sets, then axb is a set” we still need additional axioms.

b) For countable sets, prove the Replacement Axiom Schema (see below).

Power-Set Axiom

The following rather complicated way of building sets was invented, perhaps, as late as in 1870s - during the attempts to derive the definition of real numbers from the properties of rational numbers. It appeared that speaking about "arbitrary" real numbers involves inevitably speaking about “arbitrary” sets of natural numbers. Let us consider, for example, the definition of real numbers by means of infinite binary fractions. Any such fraction, for example,

0.10101100110000101110...

"generates" some set of natural numbers. The above example generates the set

{1, 3, 5, 6, 9, 10, 15, 17, 18, 19, ...}.

In principle, in this way we can "generate" all the possible sets of natural numbers.

In general, this new operation is defined as follows. If x is a set, let us consider all the possible subsets of x, i.e. all y's such that y≤x, or Az(z in y -> z in x). Let us denote the class of all subsets of x by

P(x) = { y | y≤x}

(P stands for "power-set"). We wish to postulate that if x is a set, then P(x) also is a set. Thus, we adopt the following Power-Set Axiom:

AxEzAy(y in z <-> y≤x). --------------- (C24)

Of course, C24 is a comprehension axiom, namely, C2[y≤x].

Now we can prove that the Cartesian product of two sets is a set. Indeed,

y x z = { (u, v) | u in y & v in z }.

If u in y and v in z, then {u} in P(y) and {u, v} in P(yUz). Hence, (u, v) = {{u}, {u, v}} in PP(yUz), i.e. the class y x z is a part of the set PP(yUz). Q.E.D.

Note. For an alternative proof that does not depend on the Power-Set Axiom (but depends on the Replacement Schema below) see Exercise 2.13(e).

Cantor's Theorem (classical version). For any set x, there is no one-to-one-correspondence between x and P(x) (i.e. between members of x and all subsets of x).

Corollary. P(w) (“the set of all sets of natural numbers”) is an uncountable set. Hence, so is the set of all real numbers.

Cantor's Theorem (refined version, implies the classical version, verify!). If f is a function mapping members of a set x into subsets of x, (i.e. f: x->P(x)), then there is a subset of x that does not belong to range(f).

Proof. By the Separation Axiom C21[~(y in f(y))],

y0 = { y | y in x & ~(y in f(y)) }

is a subset of x. If y0=f(y) for some y in x, then (y in y0) <-> ~(y in f(y)) <-> ~(y in y0). Contradiction. Hence, y0 is a subset of x that does not belong to range(f). Q.E.D.

In this proof, only the Separation Axiom schema C21 and the Pairing Axiom C22 are used, i.e. this proof does not depend on the Power-Set Axiom C24 (as noted by Neil Tennant in Cantor's argument, February 2003, on the FOM List). Cantor's Theorem does not depend on C24, but the “sethood” of P(w) does!

Replacement Axiom Schema

Functions are relations that possess the "mapping" property:

(u, v1) in F & (u, v2) in F -> v1=v2,

or

AuAv1Av2 (F(u, v1) & F(u, v2) -> v1=v2).

If F is a function, then F(u)=v can be used as a convenient record of "(u, v) in F". Some functions are proper classes, for example, the above-mentioned identity function E(x)=x, or, more precisely,

E = {(u, v) | u=v}.

The well-known notions of domain and range of the function F can be defined as follows:

domain(F) = { u | Ev F(u)=v }

range(F) = { v | Eu F(u)=v }.

If F is a proper class, then, in general, domain(F) and range(F) also will be proper classes. For example, domain(E) = range(E) = V.

Exercise 2.11. Prove, for any relation Q, that Q is a set, if and only if domain(Q) and range(Q) both are sets.

The "zero-constant" function, i.e. F0 = {(u, v) | v=o}, has a proper class domain (domain(F0)=V), yet its range is a set (range(F0)={o}). Still, how about the fourth possibility - can a function have a set domain and a proper class range, i.e. can a function map a set onto a proper class? Of course, we do not wish such functions. To prohibit them, we must adopt additional axioms - the so-called Replacement Axiom schema. You will not find such axioms in Zermelo's 1908 paper, Abraham Fraenkel noticed that they are necessary some time later - in 1921:

A. A. Fraenkel. Zu den Grundlagen der Cantor-Zermeloschen Mengenlehre. "Math. Annalen", 1922, Vol. 86, pp. 230-237.

If F is a function, then for any class B we will denote by F"B the F-image of B:

F"B = { v | Eu (u in B & F(u)=v) }.

Exercise 2.12. Prove that if a function f is a set, then for any class B the image f"B is a set.

Still, if the function F is a class, and b is a set, then the image F"b is ... a set? Imagine, you take members of the set b one by one, and replace each member y by F(y). The result is F"b. Of course, we wish F"b to be a set. So, let us adopt the Replacement Axiom Schema:

F is a function -> AbEc F"b=c.

Or, more precisely, let F(u, v, z1, ..., zn) be a formula that does not contain v1, v2, x, y, b, c, then we adopt as the axiom C25[F] the following formula:

AuAv1Av2 (F(u, v1, z1, ..., zn) & F(u, v2, z1, ..., zn) -> v1=v2) -> AbEcAy(y in c <-> Ex(x in b & F(x, y, z1, ..., zn))).

Of course, this is again a comprehension axiom, namely, C2[Ex(x in b & F(x, y))], yet we allow to apply it only after we have proved that for each x there is only one (or none) y such that F(x, y).

Exercise 2.13. a) Prove that if B is a proper class, and b is a set, then no one-to-one function can map B into b (or, equivalently, there is no function F with domain(F)=b and range(F)=B).

b) Derive the Separation Axiom schema C21 from the Replacement Axiom schema C25.

c) (Jan Mycielski) Derive the Pairing Axiom C22 from C21, C24 and C25. (Hint: apply first C21, then twice - C24, and finally - C25).

d) Above, we used the Power-Set axiom C24 to prove that the Cartesian product of two sets is a set. By using the Replacement Axiom Schema C25, this can be proved without C24. Elaborate the following proof put on the FOM list by Harvey Friedman (see "What do you lose if you ditch Powerset?", November 2003):

LEMMA. {x} cross B exists.

Proof: For each y in B, {x} cross {y} exists. Then use Replacement to get the set of all <x,y> such that y in B (x fixed).

THEOREM. A cross B exists.

Proof: For each x in A, {<x,y>: y in B} exists. Use Replacement to get E = {{<x,y>: y in B}: x in A}. A cross B = UE.

The replacement axiom schema completes the list of comprehension axioms, that are necessary for reconstruction of common mathematics, i.e. for building of "useful" sets. Is our list (the axioms from C21 to C26) "complete" in the sense that no "acceptable" comprehension axioms will be discovered in the future? The answer could be "yes" (Church's thesis for set theory?). We will discuss this problem in Section 2.4.

Exercise 2.15. Prove that, in the set ω, the semiformal ("second order") Peano axioms hold (see Section 3.1).

So, it would be nice to stop at this point and finish our list of axioms by adopting an axiom asserting that no other sets exist - except those, which can be built by using the comprehension axioms? I.e. the axiom: "All sets can be built by using the comprehension axioms". Still, how to put this restriction into one (finite!) formula of set theory?

Open problem. How to put the statement "All sets can be built by using the comprehension axioms" into one formula of set theory? Is this possible at all?

While this problem remains unsolved, let us return to the tradition, and discuss the remaining two axioms - the axiom of regularity and the axiom of choice.

Axiom of Regularity

Sometimes called also the "Axiom of Foundation".

It appears that the existence of some "abnormal" kinds of sets is consistent with all the axioms we have adopted so far. For example, if you wish to assert the existence of a set x such that "x in x", you can do this safely: no contradiction with our previously adopted axioms will arise.

Exercise 2.16. Prove that, this is the case, indeed. Build a simple "model" where the "in" predicate is interpreted appropriately (i.e. "abnormally"). Similarly, prove the relative consistency of the conjecture "there are two sets x, y such that (x in y)&(y in x)". Etc.

An idea, allowing to exclude such "abnormal" sets, was first proposed in 1917 by Dmitry Mirimanoff (1861-1945):

D. Mirimanoff. Les antinomies de Russell et de Burali-Forti et le probleme fondamental de la theorie des ensembles. "Enseign. math", 1917, Vol. 19, pp. 37-52.

Mirimanoff introduced the notion of "ordinary sets" (or, as we would call them today, "well-founded sets", or "regular sets"), in which infinite chains "down" the membership relation do not appear, for example:

... in xn in ... in x3 in x2 in x1 in x.

In these terms, the above-mentioned "abnormal" sets (x in x, (x in y)&(y in x), etc.) should be qualified as "non-ordinary".

In 1925, J. von Neumann proposed to exclude such "abnormal", "non-ordinary", "non-regular" sets at all by introducing the following Axiom der Fundierung, now called the Axiom of Regularity:

Ax (~(x=o) -> Ey(y in x & y∩x=o)),

or, with abbreviations excluded,

~Ex( Ey(y in x) & Ay(y in x -> Ez(z in x & z in y))). ----- (C3)

J. von Neumann. Eine Axiomatisierung der Mengenlehre. "Journal fuer reine und angewandte Mathematik", 1925, Vol.154, pp. 219-240.

Exercise 2.17. Verify that, indeed, C3 excludes all the above-mentioned "abnormal" sets. Derive from C3 that ~(x in x) for all x, i.e that V=R, where R is Russell's class. Similarly, derive from C3 that there are no two sets x, y such that (x in y)&(y in x).

The set theory adopting the axiom of extensionality (C1), the axiom C1', the separation axiom schema (C21), the pairing axiom (C22), the union axiom (C23), the axiom of infinity (C26), the power-set axiom (C24), the replacement axiom schema (C25), and the axiom of regularity (C3), is called Zermelo-Fraenkel set theory, and is denoted by ZF.

Zermelo had in his axiom list also the famous axiom of choice.

Axiom of Choice

by Eric Schechter
Axiom of Choice
by Wikipedia
Eric W. Weisstein. "Axiom of Choice." From
MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/AxiomofChoice.html

If all members of a set x are non-empty sets, then we can try to define a choice function f that assigns to each y in x some f(y) in y. Can we hope to define a choice function for each collection x of non-empty sets? At least, we can postulate, that such a function always exists. Thus we obtain the Axiom of Choice (AC):

Ax( Ay(y in x -> y not empty) -> Ef(f is function & domain (f) = x & Ay(y in x -> f(y) in y))).

In 1904, Zermelo used this "principle of arbitrary choice" explicitly to prove that each set can be well-ordered.

E.Zermelo. Beweiss, dass jede Menge wohlgeordnet werden kann. "Math. Annalen", 1904, Vol. 59, pp. 514-516 (see also comments by Stanley N. Burris).

The ordering "<" of some set x is called well-ordering, if and only if each non-empty subset of x contains a minimum member under "<". For example, the set of all natural numbers ω is well ordered by the membership relation “in”. Finite sets can be double-well-ordered, see above.

This provocative paper by Zermelo was by far not the first time in the history when something like the "principle of arbitrary choice" was used in mathematical proofs. (About the way, how this principle was used by the founder of set theory – Georg Cantor, see above.) Still, Zermelo dared to state this principle explicitly and in its most unrestricted form.

Exercise 2.18. a) Prove the converse statement: if the union Ux is well-ordered (by some relation "<"), then there is a choice function for x.

b) Derive from AC that each infinite set contains an infinite countable subset. Which definition of infinite sets do you prefer to use here?

Note that AC is not a comprehension axiom. The choice function f is not defined by some formula F(x, y, z) expressing that f(y)=z. The existence of f is merely postulated. The term "principle of arbitrary choice" underlines this extremely non-constructive nature of AC. I.e. we assume that we are able to make an infinite number of choices without having any guiding rule. (For the long history of hot discussions "around the axiom of choice" – non-measurable sets, the Banach-Tarski Paradox etc. - see Medvedev [1982]).

We can "judge" AC as we please, yet as an axiom of set theory it is absolutely safe: in 1938 Kurt Goedel proved that

"If ZF+AC would be an inconsistent theory, then so would be ZF".

K.Goedel. The consistency of the axiom of choice and of the generalized continuum hypothesis. "Acad. U. S. A.". 1938, Vol. 24, pp.556-557 (see also Section 2.4.1 below).

The set theory ZF+AC is denoted traditionally by ZFC. By using the axioms of ZFC all theorems of Cantor's intuitive set theory can be proved.

Mathematicians may be interested to verify this themselves - just follow an excellent concise book:

T.Jech. Lectures in set theory with particular emphasis on the method of forcing. "Lecture Notes in Mathematics", vol. 217, Springer-Verlag, Berlin - Heidelberg - New York, 1971 (Russian translation available)

You will be inspired to do yourselves 90% of the technical work.

Since the end of XIX century we know that all the other theories of common mathematics can be reformulated in set theory. This completes the first stage of Hilbert's program: convert all existing mathematics into a formal theory (namely, ZFC).

### 2.4. Around the Continuum Problem

Back to title page.

set theory, axioms, Zermelo, Fraenkel, Frankel, infinity, Cantor, Frege, Russell, paradox, formal, axiomatic, Russell paradox, axiom, axiomatic set theory, comprehension, axiom of infinity, ZF, ZFC