Lambda counting
Introduction
tex file : [1]
pdf file :[2]
This paper addresses the following question. Having a (theoretical) programming language and a property, what is the probability that a random program satisfies the given property ? In particular, is it the case that almost every random program satisfies the desired property i.e. the probability is 1 ? We consider, in this introduction, that this notion of probability is, at least intuitively, sufficiently clear but, of course, this will have to be made precise.
This kind of question has some known results for Turing machines and cellular automata. Some of them will be given in section ??. We will concentrate here on functional programming languages and, more specifically, on the lambda calculus, the simplest such language. Various properties can be studied. Some concern the structure of a term, some concern its behaviour.
The first question for which it would be desirable to have an answer is the following: give a "simple" equivalent for the number of terms of size n. This question is, at present and as far as we know, unsolved and the usual technics of generating functions does not work. See section ??? for more details. We give here upper and lower bounds for this number. This estimation will be enough for our purpose but the gap between the lower and the upper bound is to big to hope to find an equivalent.
For other questions, some experiments have already been done (see for example ...) which clearly indicates the desired result. For example, they "show" that almost every closed lambda term begins with a but, as far as we know, there was no "proved" result of this form.
This paper proves some non trivial results on the structural form of a lambda term. In particular we show that almost every closed lambda term begins with "many" (the precise meaning of this is given in theorem ??), that they bound "many" occurrences of the corresponding variables (theorem ??) and that, given any fixed closed term, almost no -term has this term as a sub-term (theorem ??).
Our original motivation was to consider the property of being terminating. Is a random term strongly normalizable (SN for short) i.e. every sequence of reduction terminates ? This question is, at present, unsolved and the experiments we have done do not even give an idea of what the result should be. It is known that being SN is an undecidable question and it is thus not easy to count the number of SN terms of big size. It is clear that having as a sub-term is a sufficient (but not necessary) condition for being non SN but as the experiments have shown (and as we have proved) almost no terms contains and this is thus useless to have a result for non SN. On the opposite direction, it is known that, if a term t is not SN, then a pattern "looking like" (the precise meaning will be given in section ??) must appear in t but the experiments seem to show that almost every term possesses this pattern (actually, we have not been able to count for big enough terms to have a clear idea of the probability of possessing the desired pattern). But, again, this experimental result would be of no use to guess the result.
Another question, which is also undecidable, and for which finding the probability seems to be even more difficult, is to get the probability of being weakly normalizable i.e. there is a sequence of reduction that terminates.
Combinatory logic is another programming language related to the
lambda calculus. It is a way of coding lambda terms without using
bound variables. The coding is fair for the questions we are
concerned with, in the sense that there are translations in both
directions which, for example, preserve the property of being SN.
We have also studied this language and, surprisingly, the results
are very different from those for the lambda calculus. For example
we show that, for every fixed term t_0, almost every term possess
t_0 as a sub-term and this of course implies that almost every
term is non SN. Note that the increase of size in the translation
from the lambda calculus to combinatory logic is not known (we
only have a lower bound) and thus results for combinatory logic
cannot be used to get results for the lambda calculus.
The organization of the paper is as follows. Section ?? briefly recalls known results for Turing machines and cellular automata. Section ?? shows that, in combinatory logic, every fixed term appears in almost every term. In section ?? we recall the basic definitions of the lambda calculus and we discuss the various possibilities for counting the size of a term and the probability measure we put on sets of terms. Section ?? gives the combinatorial results we will need in our proofs and, in particular, the lower and upper bounds for the number of terms of size n. It also introduces Catalan and Motzkin numbers and the so-called Lambert function. Our main results, i.e. theorems ??, ?? and ?? appear in section ?? Section ?? gives experimental results for questions for which we have no proof. Finally, section ??? gives open questions and future work. The detailed proofs are given in an appendix.
Known results for Turing machines and cellular automata
In this section I propose to comment on paper [3] and [4].
Guillaume says: I am very skeptical about paper 2 by D'Abramo. I suggest to replace it by the paper by Calude and Stay [5].
Lambert function, Catalan and Motzkin numbers
Catalan numbers
- : Catalan numbers
Usual equivalent: which is obtained using Strirling formula. However, using stirling formula which can be extended as the following double inequality:
Thus, using this and , we have:
Then, we can check that
Thus finally for all .
Motzkin numbers
Let us define the number of unary-binary trees with inner nodes and leafs. We get
Then, by summing we define the number of unary-binary trees with inner nodes and give an equivalent:
Lambert W function
The Lambert function is defined by the equation which has a unique solution in .
For , we have which implies that near . To prove this, it is enough to remark that
This is not precise enough for our purpose. Using one step of the Newton method from , we can find a better upper bound for because is increasing and convex when . This gives for :
Indeed, if we define , we have and therefore, newton's method from gives a point at position:
Finally, we show that for , we have:
Indeed, for , we have , which implies and therefore .
combinatory logic
Basically the paper already written by Marek
tex file : [6]
pdf file :[7]
+ the following
As we will see in section ??, theorem ?? does not holds for the Lambda calculus. This may be surprising since there are translations between these systems which respect many properties (for exemple the one of being terminating). However these translations do not preserve the size.
The translation from combinatory logic to lambda calculus is linear, i.e. there is a constant such that, for all terms, but the translation in the other direction is not linear. As far as we know, there is no known bound on the size of but it is not difficult to find exemples where is of order .
The point is that has to code the binding in some way and this takes place. It will be interesting to compare the size of with the one of using other notion of size than the usual one. See section ?? for some complement.
Generality on lambda calculus
definition
The set of lambda terms (or, simply, terms) is defined by the following grammar
To be able to define the notion of a random term we
have to define a distribution law on . There are many
possibilities for that. We choose here the simplest one. Note that this is the one for which, at least at present, we are
able to prove some results. It is based on densities. For that we
first have to define the size of a term.
The usual definition is the following.
definition
The size (denoted as ) of a term is defined by the following rules.
- if is a variable.
-
-
In the rest of the paper we will use another definition (denoted
as ) which is similar but gives simpler computations.
We believe (but we have not yet checked the details) that, with
we would have similar results. The
computation, with , of the upper and lower
bounds of the number of terms of size will be done
in section ??
definition
The size (denoted as or, more simply ) of a term is defined by the following rules.
- if is a variable.
-
-
These definitions of the size are, for the implementation point of
view, not realistic because, in case a term has a lot of distinct
variables, it is not realistic to use a single bit to code them.
The usual way to implement this coding is to replace the names of
variables by their so called de Bruijn indices: a variable is
replaced by the number of that occur, on the
path from the variable to the root, between the variable and the
that binds it. Note that, in this case,
different occurrences of the same variable may be represented by
different indices.
Choosing the way we code these de Bruijn indices gives different other ways of defining the size of a term. This can be done in the following ways
- Use unary notation, i.e. the size of the index simply is itself
- Use optimal binary notation, i.e. the size of the index is i.e. the logarithm of in base 2.
- Use uniform binary notation, i.e. the size of an index is the logarithm, in base 2, of the number of leaves of the term.
Remark
See section ?? for a discusion about these different size.
definition
Let be an integer. We denote by the set of terms of size n.
definition
Let A be a set of terms.
1) We denote by the cardinality of A.
2) We denote by the limit, for n going to , of .
Remark
Note that d is not exactly a measure since is undefined if the previous limit does not exist
definition
Let P be a property of terms. We will say that almost every term satisfies P (this will be also stated as P holds a.e.) if
generating functions
this does not work (by now) because radius of convergence 0
no known results for the number of terms of size n (denoted )
our results
(the proof of result of section k needs the result of section (k-1))
Upper and lower bounds for
For the lower bound, we will first count the number of lambda-terms of size starting with lambdas and having no other lambda below. This means that the lower part of the term is a binary tree with inner nodes with possibility for each leaf. Therefore we have:
And therefore, for , using our lower bound for and , we get:
Now, for fixed, we define (so ) and look for the maximum of this function. We have . Thus, is equivalent to . The Lambert function begin increasing this means that is equivalent to . Therefore, reaches a maximum for .
This means that reaches its maximum for fixed when is near to which is likely not to be an integer. However, there are at least integer between and . Indeed, using our inequalities on Lambert W function, we have:
Thus, we get the following lowerbound for :
To simplify, using the fact that and taking large enough, we have the following lowerbound:
We now compute an upper bound for the number of lambda-terms of size with exactly lambdas (that is with leaves using the Motzkin numbers and allowing any lambda to bind any variable (regardless of the real scope):
If we sum this for all possible and get an upper bound of using Lambert function as for the lower bound, we get the following upper bound for :
The ration between our upper bound and lower bound is equivalent to (NEEDS FURTHER CHECKING):
upper and lower bounds for number of lambdas in a term of size n
Let Ls be defined as follows: lambda term of size n belongs to Ls iff it contains more than lambdas.
Claim 1. Density of Ls equals 0.
Proof: straighforward using lower and upper bounds.
denote the number of elements of Ls of size n. It is clearly smaller then Dividing by the lower bound , we get which tends to 0 when
Corollary. Random lambda term of size n contains no more then lambdas.
Once we know this we can improve the upper bound.
Let Ll be . In a term from Ll the proportion between unary and binary nodes is smaller then which is far from typical proportion in unary-binary trees which tends to some nonzero constant. Rough estimation gives
where C(n) corresponds to the binary structure, to the possible distribtions of lambdas in binary structure and to the possibilities of bindings (one can optimal number of unary nodes involving Lambert's function to get rid of epsilon here). The crucial observation now is that is subexponential.
Comparing this upper bound with we would obtain that the asymptotic ratio between upper and lower bound is subexponential.
I think it has some serious consequences. In particular it seems that in a random lambda term almost all lambdas lie on one path.
Proposition. \label{prop:typical_depth} There exists a function such that the set of terms t having a -depth greater than has density 1.
Lambdas in head position
Theorem. \label{th:starting_lambdas} Let . The set of terms t starting with less than lambdas has density 0.
Proof. By proposition \ref{prop:typical_depth}, for some function f such that , we can restrict to the set of terms t having -depth at least (because has density 1). To prove the theorem it is sufficient to exhibit a one-to-one function from into whose image set has density 0. We now define by describing where t is an arbitrary term of size n belonging to . By hypothesis, the term t starts with a chain of p -nodes, where . So where A is a term starting by an application and containing at least one (for n large enough). Now let B be the maximal purely applicative prefix of A. Precisely, B is the maximal binary tree whose leaves are either special leaves or variables among and such that where are terms starting with a lambda, and denotes the term obtained by replacing successive special leaves of B by terms respectively. By hypothesis on term t, we have . Let denote the terms obtained from by removing the leading lambda. Consider the term which is of size , where b is the number of application nodes in B. Finally, let be the set of purely applicative terms of size b whose variables are chosen among a set of size . Consider the leftmost deepmost lambda of term T (leftmost among deepmost), and let denote the term rooted at this position. For any , we define the term by subsituting with in T and binding all variables of u according to their name to lambda above u. Firstly, is a well-defined closed term because the insertion of u occurs at depth at least and u has at most different variables. Secondly, has size exactly n.
The set of binary trees with b internal nodes and leaves labelled either 'special' or by a variable among has cardinality (where denotes the Catalan number). Besides, the set has carinality . Therefore, since and , there exists a function with and a function from to subsets of such that:
- for each , contains elements;
- for n large enough, .
What we have established so far is the following: to term we assosiate injectively the 4-uple ; then, if u is the first element of in some arbitrary (but fixed) order, we assotiate to the term . By hypothesis on function defined above, the function is injective for n large enough. Moreover, we have so the image set of has density 0 and the theorem follows.
Head lambdas bind "many" occurrences of the corresponding variables
Theorem. \label{th:biding_lambdas} Let g and h be two functions belonging to and let be the set of terms t where the total number of occurrences of variables bound to one of the head lambdas is less than . Then has density 0.
Remark. The proof of this theorem uses arguments similar to those used in proof of theorem 'starting lambdas'. But is this theorem useful ?
Yes, since it is used in the next theorem. However, it is enough to show that for fixed integers k and k' each of k head lambdas binds at least k' variables.
every fixed closed term (including the identity !) does not appear in a random term (in fact we have much more than that)
Let denote the set of lambda terms which have as a subterm and let be the set of those lambda trees in which the number of unbounded leaves (occurrences of free variables) is equal to .
- Theorem
For any integers and and for any term of length the following holds:
- Proof
Fix and . Let be a lambda tree of size with exactly occurrences of free variables, denoted (not necessarily distinct). Since there are at most occurrences of lambdas and at most leaves in , there are at most
such trees and we can enumerate them in a fixed way. Let be the number of . The tree contains at least one occurrence of lambda, otherwise it would either contain more free variables or would be of a bigger size.
Let be an integer satisfying .
We will construct an injective function such that its image is of density 0 in .
Let be a term of size with as its subterm. Let us consider the tree which is built from the tree by adding an additional unary node (labelled with ) at depth . Next, let be the tree obtained by replacing the subtree in by the tree , where is a binary tree of size such that and , so the size of is equal to . Thus, the size of is equal to . The variable is bound by the -th lambda in the tree . Let us take . Since is the number of the tree , the function is injective.
By Theorem \ref{?}, each of head lambdas in a random tree of size binds more than variables. Trees from the image do not have this property, since the -th lmabda binds only variables. Thus, those trees are negligible among all trees of size .
- Corollary
Let be a closed lambda term. Then the density of is equal to .
- Corollary
Let be a lambda term in which there are at least two occurrences of lambdas. Then the density of is equal to .
comment : so different situation in combinatory logic and lambda calculus ; the coding uses a big size so need to count variables in a different way
Random lambda terms are strongly normalizable
Jakub says: I think that it might be possible to simplify the proof, by using other argument for the proof that the density of the image of encoding equals 0.
Chambéry says: we checked the proof and we are convinced by the result. Since Jakub made an important step for the second time, we suggest to explicitely acknowledge his contribution in the paper (for instance, by naming some key lemma/theorems "Jakub's lemma/theorem"). Here are the points to fix in the proof:
- as René said, the definition of "dangerous redex" must be changed to allow a single occurrence of x in A (and a proof sketch of why "no dangerous redex" implies SN must be added)
- version 1 of encoding is sufficient
- in the encoding, there is some substituion of variable to do in C (to have a closed term at the end, and also to ensure that the decoding is unambiguous)
- to complete the proof with the new definition of "dangerous redex" we need the result saying that terms containing identity have density 0 (as said by Jakub by e-mail). We don't see how to conclude without that. However, we have to further check if there is no shorter proof...
Experiments
results of the experiments we have done
some experiments that have to be done : e.g. density of terms having or big Omega pattern ...
to be done
Upper and lower bounds for with other size for variables especially one, binary with fixed size
Open questions and Future work
.....