

Any comments are welcome - e-mail to Karlis.Podnieks@mii.lu.lv

**INDUCTIVE INFERENCE OF FUNCTIONS BY PROBABILISTIC
STRATEGIES **

**Karlis Podnieks **

Institute of Mathematics and Computer Science

University of Latvia

Translation of papers [Po 75] and [Po 77-2] published in Russian.

For the results (without proofs) in English see also [FBP 91].


**Table of contents**

1. Definitions and results

2. Proofs

3. Proofs of Lemmas 2, 3

4. Proof of Lemma 4

5. Exclusion algorithm

6. Tail reduction proof

References

**Abstract**

The following model of inductive inference is considered. An arbitrary numbering tau = {tau_{0}, tau_{1}, tau_{2}, ...} of total functions N->N is fixed. A "black box" outputs the values f(0), f(1), ..., f(m), ... of some function f from the numbering tau. Processing these values by some algorithm (a strategy) F, we try to identify a tau-index of f (i.e. a number n such that f = tau_{n}). The strategy F outputs hypotheses h_{0}, h_{1}, ..., h_{m}, ... If lim h_{m} = n and tau_{n} = f, we say that F identifies in the limit a tau-index of f. The complexity of identification is measured by the number of mindchanges, i.e. by F^{tau}(f) = card{m | h_{m} <> h_{m+1}}. One can easily verify that for any numbering tau there exists a **deterministic** strategy F such that F^{tau}(tau_{n}) <= n for all n. This estimate is exact (see [Ba 74], [FBP 90]). In the current paper the corresponding exact estimate ln n + o(log n) is proved for **probabilistic** strategies. This result was published without proof in [Po 75]; the proofs appeared (in Russian) in [Po 77-2].

A **probabilistic strategy** is defined by:

a) a probability space (W, B, P), where W is the set of elementary events, B is a Borel field of subsets of W, and P is a probability measure on B;

b) a mapping M: N*->Z, where N* is the set of all finite sequences of natural numbers, and Z is the set of all random variables over the space (W, B, P) taking values in NU{oo} (oo means "undefined").

Thus M associates with each elementary event e in W some deterministic strategy F_{e}. The hypothesis F_{e}(<f(0),...,f(m)>) takes each value n with a fixed probability P{F_{e}(<f(0),...,f(m)>)=n}.

**Computable (recursive) probabilistic strategies** can be defined by means of probabilistic Turing machines, first introduced in [LMS 56]. Let a random Bernoulli generator with the distribution (1/2, 1/2) be fixed. The generator is connected to the deterministic "apparatus" of a Turing machine. As a result, the operation of the machine becomes probabilistic, and we can speak of the probability that the operation satisfies certain conditions.

Consider the following Turing machine M operating with a fixed Bernoulli generator. With input sequence

f(0), f(1), ..., f(m), ...

this machine prints as output an empty, finite or infinite sequence of natural numbers (hypotheses):

h_{0}, h_{1}, ..., h_{m},
...,

where h_{m} depends only on the values f(0), ..., f(m). To each infinite realization of the Bernoulli generator's output (i.e. an infinite sequence of 0's and 1's) there corresponds a completely determined operation of the machine M, i.e. a deterministic strategy.
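(An illustration, not from the original paper.) A machine equipped only with a fair-coin generator can nevertheless output a value with arbitrary prescribed probabilities, which is what the strategies below rely on. A minimal Python sketch of this fact, with illustrative names, using binary subdivision of the unit interval:

```python
import bisect
import random

def sample_with_coin(weights, flip):
    """Draw index n with probability weights[n]/sum(weights), using only
    fair coin flips: the flips build a uniform random point of [0, 1)
    bit by bit, and we stop as soon as the shrinking interval [lo, hi)
    fits inside a single cell of the cumulative distribution."""
    total = float(sum(weights))
    cum, acc = [], 0.0
    for w in weights:
        acc += w / total
        cum.append(acc)
    lo, hi = 0.0, 1.0
    while True:
        mid = (lo + hi) / 2.0
        if flip():                         # one Bernoulli(1/2) bit
            lo = mid
        else:
            hi = mid
        i = bisect.bisect_right(cum, lo)   # cell containing lo
        j = bisect.bisect_left(cum, hi)    # cell containing hi
        if i == j:                         # the draw is now determined
            return i

# Example: values 0, 1, 2 with probabilities 1/2, 1/4, 1/4.
rng = random.Random(0)
print(sample_with_coin([2, 1, 1], lambda: rng.getrandbits(1)))
```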

By P{M, tau, f} we denote the probability that the probabilistic strategy M identifies in the limit a tau-index of the function f. By P{M, f, <=k} we denote the probability that the strategy M makes no more than k mindchanges on the function f.

Let us denote by f^{[m]} the code of <f(0), ..., f(m)>; then the random variable M(<f(0), ..., f(m)>) can be denoted by M(f^{[m]}). By P_{m}(M, f) we denote the probability that M changes its hypothesis at step m, i.e. P{M(f^{[m+1]}) <> M(f^{[m]})}.

First, let us consider a sufficient condition for P{M, tau, f}=1. We will say that the strategy M is tau-**consistent**
on the function f if, for all m,

a) M(f^{[m]}) is always defined (i.e. defined for all
events e in W),

b) if M(f^{[m]})=n for some event e in W, then tau_{n}(j)=f(j)
for all j<=m.

Thus, consistent strategies do not output "explicitly incorrect" hypotheses.
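In code, requirement b) is just the following check (a hypothetical helper; it assumes the numbering is given as a list of Python functions):

```python
def consistent_hypothesis(tau, n, f_values):
    # Requirement b): hypothesis n agrees with every observed value
    # f(0), ..., f(m), where m = len(f_values) - 1.
    return all(tau[n](j) == f_values[j] for j in range(len(f_values)))
```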

**THEOREM 1**. For any enumerated class (U, tau) there is a probabilistic strategy M and a constant C>0 such that: a) M always identifies in the limit a tau-index of each function f in U, and b) M changes its mind on the function tau_{n} no more than ln n + C*sqrt(ln n)*lnln n times with probability ->1 as n->oo. For a computable numbering tau, a computable probabilistic strategy M can be constructed.

**THEOREM 2**. For any countable set FI of probabilistic strategies there exist an enumerated class (U, tau) and a constant C>0 such that: for any strategy M in FI which identifies in the limit a tau-index of each function f in U, there is an increasing sequence {n_{k}} such that M changes its mind on the function tau_{n_k} at least ln n_k - C*sqrt(ln n_k)*lnln n_k times with probability ->1 as k->oo. For the class of all computable probabilistic strategies, a computable numbering tau can be constructed.

The upper bound ln n can be proved by means of a strategy from [BF 72]. Essential difficulties arise, however, not in the construction of the strategy but in its analysis.

The main idea is as follows. Let us suppose that the function f in the "black box" which outputs the values f(0), f(1), f(2), ... is chosen at random from the numbering tau, according to some probability distribution pi = {pi_{n}} (pi_{n} is the probability that f = tau_{n}, and Sum_{n} pi_{n} = 1). Then, having received the values f(0), ..., f(m), let us consider the set E_{m} of all tau-indices "suitable" for such f, i.e.

E_{m} = {n | (Aj<=m) tau_{n}(j) = f(j)}.

It would be natural to output as a hypothesis any tau-index n in E_{m} with probability pi_{n}/s, where s = Sum{pi_{n} | n in E_{m}}.

To put it precisely, we define a probabilistic strategy BF_{tau,pi} as follows. Let tau be any numbering of total functions. Take some probability distribution {pi_{n}}, where pi_{n}>0 for all n, and Sum_{n} pi_{n} = 1.

If the set E_{0} = {n | tau_{n}(0)=f(0)} is empty, then we put BF_{tau,pi}(f^{[0]}) undefined with probability 1. If E_{0} is non-empty, we put BF_{tau,pi}(f^{[0]})=n with probability pi_{n}/s for every n in E_{0}, where s = Sum{pi_{n} | n in E_{0}}.

Let us assume now that the hypotheses BF_{tau,pi}(f^{[j]}) have already been determined for j<m, and BF_{tau,pi}(f^{[m-1]})=p. If p is "undefined", then we set BF_{tau,pi}(f^{[m]}) undefined with probability 1. Otherwise, if tau_{p}(m)=f(m) (i.e. the hypothesis p is correct also for the next argument m), we set BF_{tau,pi}(f^{[m]})=p with probability 1.

Suppose now that tau_{p}(m)<>f(m). Let us take the set of all hypotheses that are still appropriate, i.e.

E_{m} = {n | (Aj<=m) tau_{n}(j) = f(j)}.

If E_{m} is empty, then we put BF_{tau,pi}(f^{[m]}) undefined with probability 1. If E_{m} is non-empty, we put BF_{tau,pi}(f^{[m]})=n with probability pi_{n}/s for every n in E_{m}, where s = Sum{pi_{n} | n in E_{m}}.

Of course, BF_{tau,pi} always identifies in the limit a tau-index of each function f in U.
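The following Python sketch mimics this definition on a finite fragment of a numbering; the list tau, the weights pi, and the name bf_strategy are illustrative assumptions, since the real BF_{tau,pi} works over the whole infinite numbering:

```python
import random

def bf_strategy(tau, pi, f_values, rng=None):
    """Run BF_{tau,pi} on the observed values f(0), ..., f(m).
    tau: finite list of total functions (a fragment of the numbering),
    pi:  positive weights for these indices.
    Returns the hypotheses h_0, ..., h_m (None means "undefined")."""
    rng = rng or random.Random()
    hypotheses = []
    h = None
    for m, value in enumerate(f_values):
        if m > 0 and h is None:
            hypotheses.append(None)      # once undefined, always undefined
            continue
        if h is not None and tau[h](m) == value:
            hypotheses.append(h)         # current hypothesis still correct
            continue
        # m = 0, or the hypothesis was refuted: resample from E_m, the
        # set of indices consistent with all values seen so far.
        E = [n for n in range(len(tau))
             if all(tau[n](j) == f_values[j] for j in range(m + 1))]
        if E:
            s = sum(pi[n] for n in E)
            h = rng.choices(E, weights=[pi[n] / s for n in E])[0]
        else:
            h = None
        hypotheses.append(h)
    return hypotheses
```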

**LEMMA 1**. Let {X_{m}} be a sequence of independent random variables taking values in {0,1} such that: a) Sum_{m} X_{m} is always finite, b) E = Sum_{m} P{X_{m}=1} is finite. Then for any number t>0:

P{ |Sum_{m} X_{m} - E| >= t*sqrt(E) } <= 1/t^{2}.

PROOF. Immediately, by the Chebyshev inequality: E is the expectation of Sum_{m} X_{m}, and the variance Sum_{m} P{X_{m}=1}*(1 - P{X_{m}=1}) does not exceed E.

Lemma 1 allows deriving upper and lower bounds on the number of mindchanges from the corresponding bounds on Sum_{m} P_{m}(M, f).

**LEMMA 2.** For all n,

Sum_{m}{ P_{m}(BF_{tau,pi}, tau_{n}) } <= ln(1/pi_{n}).

**LEMMA 3.** Let a function f of the numbering tau be fixed. Then the following events are independent:

A_{m} = { BF_{tau,pi}(f^{[m]}) <> BF_{tau,pi}(f^{[m+1]}) }, m = 0, 1, 2, ...

It is curious that the events A_{m} (i.e. "at the m-th step BF_{tau,pi} changes its mind") do not display any intuitive indications of independence; nevertheless, they do satisfy the formal independence criterion.

If we take

pi'_{n} = c / (n * (ln n)^{2}),

with the conventions 1/0 = 1 and ln 0 = 1 (c being the normalizing constant), then by Lemma 2 the sum of the probabilities of BF_{tau,pi'} changing its hypothesis on the function tau_{n} is less than ln n + 2lnln n - ln c. Lemma 3 and Lemma 1 (with t = lnln n) allow us to derive from this that, as n->oo, with probability ->1, the number of mindchanges of BF_{tau,pi'} on tau_{n} does not exceed ln n + C*sqrt(ln n)*lnln n, thus proving Theorem 1. For proofs of Lemmas 2, 3 see Section 3.
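As a sanity check of this bound, here is a toy experiment (assuming the bf_strategy sketch above and a simple hypothetical "threshold" numbering tau_n(x) = 1 if x < n, else 0); the printed numbers of mindchanges should grow roughly like ln n:

```python
import math
import random

N = 600
tau = [lambda x, n=n: 1 if x < n else 0 for n in range(N)]  # toy numbering

# pi'_n = c / (n (ln n)^2); weight 1 for n = 0, 1 by the conventions above.
raw = [1.0, 1.0] + [1.0 / (n * math.log(n) ** 2) for n in range(2, N)]
total = sum(raw)
pi = [w / total for w in raw]

rng = random.Random(2)
for n in (50, 150, 500):
    f = [tau[n](x) for x in range(n + 1)]  # enough values to single out tau_n
    hs = bf_strategy(tau, pi, f, rng)
    changes = sum(1 for a, b in zip(hs, hs[1:]) if a != b)
    print(n, changes, round(math.log(n), 1))
```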

It is easy to verify that if the numbering tau is computable,
then the strategy BF_{tau,pi'} can be modified so that it
becomes computable (see Section 3).

The lower bound ln n of Theorem 2 is proved by diagonalization. Let {M_{i}} be some numbering of strategies of the set FI. In the numbering tau to be constructed, for each strategy M_{i} an infinite sequence of blocks B_{ij} is built in, each block consisting of a finite number of functions tau_{n}. If M_{i} identifies (with probability 1) tau-indices of all functions of B_{ij}, then on some function of this block M_{i} will change its mind sufficiently often.

To construct an individual block B_{ij} the following
Lemma 4 will be used.

Let {Z_{j}} be a sequence of independent random variables taking values 0, 1 in such a way that

P{Z_{j} = 1} = 1/j, P{Z_{j} = 0} = 1 - 1/j.

Thus,

P{Z_{2}=1} + ... + P{Z_{n}=1} = 1/2 + ... + 1/n = ln n + O(1).

Let us take t = lnln n in Lemma 1; then for some C>0, as n->oo,

P{ Z_{2}+...+Z_{n} >= ln n - C*sqrt(ln n)*lnln n } -> 1.
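A quick Monte Carlo illustration of this fact (illustrative code, taking C = 1; the printed frequency should be close to 1):

```python
import math
import random

def z_sum(n, rng):
    # Z_j = 1 with probability 1/j, independently, for j = 2, ..., n.
    return sum(1 for j in range(2, n + 1) if rng.random() < 1.0 / j)

rng = random.Random(0)
n, trials = 10_000, 300
bound = math.log(n) - math.sqrt(math.log(n)) * math.log(math.log(n))
hits = sum(z_sum(n, rng) >= bound for _ in range(trials))
print(hits / trials)
```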

Now one can easily understand the significance of

**LEMMA 4.** Let M be a probabilistic strategy, k and n natural numbers with k<n, e>0 a rational number, and gamma a binary string. Then there is a set of n functions s_{1}, ..., s_{n} such that: a) each function s_{j} has gamma as its initial segment, and b) if M identifies with probability 1 s-indices of all functions s_{j}, then on one of these functions M changes its mind >=k times with probability

>= (1-e) * P{ Z_{2}+...+Z_{n} >= k }.

If M is a recursive strategy, then the set s_{1}, ..., s_{n} can be constructed effectively.

Now, to prove Theorem 2, we derive the block B_{ij} from the set of functions of Lemma 4 for

M = M_{i}, n = 2^{j}, k = ]j*ln 2 - sqrt(j)*ln j[, e = 2^{-j}, gamma = 0...01

with <i, j> zeros in gamma, where <i, j> is Cantor's number of the pair (i, j), and ]x[ denotes the least integer >= x.

Let us introduce a special coding of triples (i, j, s) such that 0<=s<2^{j} (see [BF 74]). The code of (i, j, s) is defined as the binary number

1 0...0 a_{t}...a_{0} 1 0...0, ----------(*)

where a_{t}...a_{0} is the binary representation of the number s, the field 0...0 a_{t}...a_{0} between the two 1's consists of exactly j binary digits (s padded with leading zeros), and the trailing block 0...0 consists of exactly i zeros. Clearly,

2^{i+j+1} + 2^{i} <= code(i, j, s) <= 2^{i+j+2} - 2^{i}.
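A small sketch of this coding, together with a check of the bounds (the function name is illustrative):

```python
def code(i: int, j: int, s: int) -> int:
    """The binary number (*): '1', then s written in exactly j binary
    digits, then '1', then i zeros; requires 0 <= s < 2**j."""
    assert 0 <= s < 2 ** j
    return (1 << (i + j + 1)) | (s << (i + 1)) | (1 << i)

# Check the bounds 2^(i+j+1) + 2^i <= code(i,j,s) <= 2^(i+j+2) - 2^i:
for i in range(5):
    for j in range(1, 6):
        for s in range(2 ** j):
            c = code(i, j, s)
            assert (1 << (i + j + 1)) + (1 << i) <= c <= (1 << (i + j + 2)) - (1 << i)
```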

Now the numbering tau can be defined as follows. If n is the code (*) of some triple (i, j, s), 0<=s<2^{j}, then tau_{n} is the s-th function of the block B_{ij}. Otherwise tau_{n} is the identically zero function.

Let a probabilistic strategy M in FI identify with probability 1 tau-indices of all functions of the numbering tau. Then M=M_{i} for some i, and for each j>1 the block B_{ij} contains a function on which M changes its mind at least ]j*ln 2 - sqrt(j)*ln j[ times with probability

>= (1-2^{-j}) * P{ Z_{2}+...+Z_{n} >= ]j*ln 2 - sqrt(j)*ln j[ }, where n = 2^{j}. -------------(**)

Let us denote by n_{j} the (unique)
tau-index of this "bad" function. Obviously,

2^{i+j+1} < n_{j} < 2^{i+j+2}.

Hence, {n_{j}} is an increasing sequence,
and

log_{2} n_{j} - i - 2 < j < log_{2} n_{j} - i - 1.

On the function tau_{n_j} the strategy M changes its mind at least

j*ln 2 - sqrt(j)*ln j > ln n_{j} - C*sqrt(ln n_{j})*lnln n_{j}

times with probability (**), which ->1 as j->oo, thus proving Theorem 2.

For a proof of Lemma 4 see Section 4 and Section 5.
