E-PIC: Probability, Information Theory and Computing

As part of the DAAD project Probability, Combinatorics and Spin Glasses in the program Hochschulkooperationen with the African Institutes for Mathematical Sciences (AIMS) II



Organizers: 
Amin Coja-Oghlan, Max Hahn-Klimroth, Jan Hazla, Jan Lukas Igelbrink, Nicola Kistler, Olga Scheftelowitsch

Talks: 

  • Emmanuel Abbe
  • Ines Armendariz
  • Petra Berenbrink
  • Charilaos Efthymiou
  • Weiming Feng
  • Aernout van Enter
  • Andreas Göbel
  • Joon Lee
  • Alexander Munteanu
  • Ido Nachum
  • Jan Nagel
  • Malin Rau
  • Jean Ravelomanana
  • Alex Samorodnitsky
  • Thomas Sauerwald
  • Adrien Schertzer
  • Yanina Shkel
  • Konstantinos Zampetakis

The workshop will take place in person at the "Hotel Dortmund am Technologiezentrum" from the 17th of April 2023 (12:30 PM) to the 21st of April 2023 (2 PM). The hotel is located close to the TU Dortmund.

All talks take place in the REFA conference center directly next to the hotel (Room 228, first floor).

During the workshop we will survey the aforementioned topics and the related open problems.


Interested participants can register via the following link. If you would like to present a poster, please also submit an abstract:

Registration form


The workshop will start with lunch at 12:30 pm on Monday and will conclude after lunch on Friday at 2:00 pm.


Lunch is included (in the hotel restaurant).


If you have any questions, do not hesitate to send an email to our secretary at ulrike.spear@tu-dortmund.de.

Emmanuel Abbe

Title: Leap complexity and neural network learning  

Abstract: Boolean functions with sparse Fourier transform on uniform inputs are known to be efficiently learnable in the PAC model. Here we consider the more specific model of learning with SGD on 'regular' neural networks.  
We claim that the sample complexity of learning sparse target functions in this model is controlled by a new "leap" complexity measure, which measures how "hierarchical" target functions are in the orthonormal L2 basis (the Fourier transform for Boolean functions). For depth 2, we prove such a claim: we show that a time complexity of d^Leap is sufficient for a (layerwise projected) SGD and necessary for noisy GD. In particular, it is shown that SGD learns with a saddle-to-saddle dynamic that climbs the degree of hierarchical features. We then discuss how this leads to a new 'degree curriculum' learning algorithm. Joint works with E. Boix (MIT), T. Misiakiewicz (Stanford) and S. Bengio (Apple), A. Lotfi (EPFL).
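
As a toy illustration (ours, not code from the talk): reading the leap informally as the smallest, over orderings of the Fourier-support sets of the target, of the largest number of new coordinates any set introduces, it can be computed by brute force for tiny supports.

```python
# Brute-force "leap" of a sparse Boolean target under the informal
# reading above: minimise, over orderings of the support sets, the
# maximum number of *new* coordinates any set introduces.
from itertools import permutations

def leap(support_sets):
    best = float("inf")
    for order in permutations(support_sets):
        seen, worst = set(), 0
        for s in order:
            worst = max(worst, len(s - seen))
            seen |= s
        best = min(best, worst)
    return best

# The staircase x1 + x1x2 + x1x2x3 climbs one coordinate at a time:
print(leap([frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]))  # 1
# A lone degree-3 monomial must be learned in one jump of size 3:
print(leap([frozenset({1, 2, 3})]))  # 3
```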


Ines Armendariz

Title: Soliton decomposition of a Brownian path

Abstract: The Box Ball System, or BBS for short, was introduced by Takahashi and Satsuma in 1990 as a cellular automaton that exhibits solitons (travelling waves). In a recent work, Ferrari, Nguyen, Rolla and Wang propose a hierarchical decomposition of a fixed configuration of the BBS into solitons, called the slot decomposition, and Ferrari and Gabrielli identified the distribution of this decomposition for a random walk with negative drift.
In this project we extend these results to a Brownian motion with negative drift. We consider the excursions over past minima of the trajectory, and show that they can be decomposed as a superposition of solitons. These are distributed as a Poisson process in the first quadrant of the plane, with an intensity that is homogeneous in the abscissa (associated to the location of the solitons) but not in the ordinate (denoting the size of the solitons).
Ongoing work with Pablo Blanc, Pablo Ferrari and Davide Gabrielli.
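
For readers new to the model, a single BBS update can be carried out with the standard Takahashi-Satsuma carrier algorithm; the configuration below is our own toy example.

```python
# One update of the Takahashi-Satsuma Box-Ball System via the carrier
# algorithm: the carrier sweeps left to right, picking up every ball it
# meets and dropping one ball into every empty box while loaded.
# Solitons (blocks of balls) of different sizes separate over time.
def bbs_step(config):
    carrier, out = 0, []
    for box in config:
        if box == 1:            # pick up the ball
            carrier += 1
            out.append(0)
        elif carrier > 0:       # drop a ball into the empty box
            carrier -= 1
            out.append(1)
        else:
            out.append(0)
    return out

conf = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for _ in range(4):
    print("".join(map(str, conf)))
    conf = bbs_step(conf)
```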


Petra Berenbrink

Title: Fast Convergence of k-Opinion Undecided State Dynamics in the Population Protocol Model

Abstract: We analyze the convergence of the k-opinion Undecided State Dynamics (USD) in the population protocol model.
For k=2 opinions it is well known that the USD reaches consensus with high probability within O(n log n) interactions.
Proving that the process also quickly solves the consensus problem for k>2 opinions has remained open.
In this paper we prove such convergence: under mild assumptions on k and on the initial number of undecided agents we prove that the USD achieves plurality consensus within O(k n log n) interactions with high probability, regardless of the initial bias.
Moreover, if there is an initial additive bias we prove that the initial plurality opinion wins with high probability, and if there is a multiplicative bias the convergence time is further improved.
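
A minimal simulation sketch of one common formulation of the USD (the paper's exact interaction rule may differ): an undecided initiator copies the responder's opinion, while a decided initiator meeting a different opinion becomes undecided.

```python
# k-opinion Undecided State Dynamics in the population protocol model,
# one common formulation: random ordered pairs interact until consensus.
import random
from collections import Counter

def usd(n=1000, k=3, seed=0):
    rng = random.Random(seed)
    U = -1                                   # the undecided state
    state = [i % k for i in range(n)]        # balanced initial opinions
    counts = Counter(state)
    t = 0
    while max(counts.values()) < n:          # until all agents agree
        u, v = rng.sample(range(n), 2)
        t += 1
        new = None
        if state[u] == U and state[v] != U:
            new = state[v]                   # undecided initiator copies
        elif U not in (state[u], state[v]) and state[u] != state[v]:
            new = U                          # clash: initiator goes undecided
        if new is not None:
            counts[state[u]] -= 1
            counts[new] += 1
            state[u] = new
    return state[0], t

winner, t = usd()
print(f"consensus on opinion {winner} after {t} interactions")
```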


Charilaos Efthymiou

Title: Spectral Independence Beyond Uniqueness using the Topological Method.

Abstract: We present novel results for fast mixing of Glauber dynamics using the newly introduced and powerful Spectral Independence method from [Anari, Liu, Oveis-Gharan: FOCS 2020]. In our results, the parameters of the Gibbs distribution are expressed in terms of the spectral radius of the adjacency matrix of G, or that of the Hashimoto non-backtracking matrix. The analysis relies on new techniques that we introduce to bound the maximum eigenvalue of the pairwise influence matrix I_G for the two-spin Gibbs distribution \mu. There is a common framework that underlies these techniques which we call the topological method. The idea is to systematically exploit the well-known connections between I_G and the topological construction called the tree of self-avoiding walks. Our approach is novel and gives new insights into the problem of establishing spectral independence for Gibbs distributions. More importantly, it allows us to derive new and improved rapid mixing bounds for Glauber dynamics on distributions such as the hard-core model and the Ising model for graphs whose spectral radius is smaller than the maximum degree.
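
For concreteness, this is the chain whose mixing time such results control: Glauber dynamics for the hard-core model, sketched here on a cycle with parameters of our choosing.

```python
# Glauber dynamics for the hard-core model with fugacity lam: pick a
# uniform vertex; if a neighbour is occupied it must be empty, otherwise
# occupy it with probability lam / (1 + lam).
import random

def glauber_hardcore(adj, lam=0.5, steps=100_000, seed=0):
    rng = random.Random(seed)
    n = len(adj)
    occupied = [False] * n
    for _ in range(steps):
        v = rng.randrange(n)
        if any(occupied[u] for u in adj[v]):
            occupied[v] = False              # blocked: v must stay empty
        else:
            occupied[v] = rng.random() < lam / (1 + lam)
    return occupied

n = 20
cycle = [[(v - 1) % n, (v + 1) % n] for v in range(n)]
sample = glauber_hardcore(cycle)
print("independent set:", [v for v in range(n) if sample[v]])
```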


Weiming Feng

Title: Perfect sampling from spatial mixing

Abstract: In this talk, I will introduce a perfect sampling technique that can be applied to general Gibbs distributions and runs in linear time if the correlation decays faster than the neighbourhood growth. In particular, in graphs with sub-exponential neighbourhood growth like Z^d, our algorithm achieves linear running time as long as Gibbs sampling is rapidly mixing. As concrete applications, we obtain the currently best perfect samplers for colourings and for monomer-dimer models in such graphs.
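
To illustrate what "perfect" means here, the following is the classic Propp-Wilson coupling-from-the-past sampler for the ferromagnetic Ising model on a small torus (not the algorithm of the talk): its output has exactly the Gibbs distribution, with no mixing-time error.

```python
# Monotone coupling from the past: run coupled heat-bath chains from the
# all-plus and all-minus states with shared randomness; once they
# coalesce, the common state is an exact Gibbs sample.
import math, random

def cftp_ising(L=6, beta=0.3, seed=1):
    rng = random.Random(seed)
    n = L * L
    nbrs = [[((i // L + di) % L) * L + ((i % L) + dj) % L
             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            for i in range(n)]
    moves, block = [], 32
    while True:
        # prepend fresh randomness further in the past; CFTP must reuse
        # the randomness of the more recent steps on every retry
        moves = [(rng.randrange(n), rng.random()) for _ in range(block)] + moves
        hi, lo = [1] * n, [-1] * n
        for v, u in moves:                   # shared heat-bath updates
            for s in (hi, lo):
                h = sum(s[w] for w in nbrs[v])
                s[v] = 1 if u < 1 / (1 + math.exp(-2 * beta * h)) else -1
        if hi == lo:                         # coalescence: exact sample
            return hi
        block *= 2

print("magnetisation of one exact sample:", sum(cftp_ising()))
```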


Aernout van Enter

Title: Dyson models with random boundary conditions

Abstract: I discuss the low-temperature behaviour of Dyson models (polynomially decaying long-range Ising models in one dimension) in the presence of random boundary conditions. For typical random (i.i.d.) boundary conditions Chaotic Size Dependence occurs, that is, the pointwise thermodynamic limit of the finite-volume Gibbs states for increasing volumes does not exist, but the sequence of states moves between various possible limit points. As a consequence it makes sense to study distributional limits, the so-called "metastates" which are measures on the possible limiting Gibbs measures.
The Dyson model is known to have a phase transition for decay parameters α between 1 and 2. We show that the metastate obtained from random boundary conditions changes character at α = 3/2. It is dispersed in both cases, but it changes from being supported on two pure Gibbs measures when α is less than 3/2 to being supported on mixtures thereof when α is larger than 3/2.
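
A finite-volume caricature (ours) of the object under study: Metropolis dynamics for a one-dimensional long-range Ising chain with coupling |i-j|^(-α), with the spins in a surrounding window frozen to i.i.d. random boundary conditions. The metastate itself concerns the infinite-volume limit, which no finite simulation captures.

```python
# Finite-volume Dyson chain: spins in [-N, N] evolve by Metropolis;
# spins in the truncated window outside are frozen i.i.d. boundary terms.
import math, random

def dyson_metropolis(N=40, alpha=1.5, beta=2.0, steps=10_000, seed=0):
    rng = random.Random(seed)
    W = 3 * N                                  # truncation window for the b.c.
    spin = {i: rng.choice((-1, 1)) for i in range(-W, W + 1)}
    bulk = range(-N, N + 1)                    # only these spins are updated
    for _ in range(steps):
        i = rng.choice(bulk)
        field = sum(spin[j] / abs(i - j) ** alpha for j in spin if j != i)
        dE = 2 * spin[i] * field               # energy H = -sum J_ij s_i s_j
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spin[i] = -spin[i]
    return sum(spin[i] for i in bulk) / len(bulk)

print("bulk magnetisation:", dyson_metropolis())
```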


Andreas Göbel

Title: Sampling and approximation algorithms for Gibbs point processes.

Abstract: Gibbs point processes are probability distributions over continuous spaces that describe random spatial events. In statistical physics they are used to model gases, fluids, or crystals, while in other fields they are used to model e.g. the growth of trees in a forest, the distribution of stars in the universe, or the location of cities on a map. The main computational tasks related to such a model are sampling from the point process and computing its partition function, which is the normalising constant of its distribution. In this talk we will discuss algorithmic approaches, developed over a series of articles, that yield polynomial-time algorithms for (approximately) sampling from such point processes and for obtaining a randomised approximation of their partition functions, as well as quasi-polynomial deterministic algorithms for approximating their partition functions. Our viewpoint is that of obtaining rigorous asymptotic running time guarantees for the broadest possible parameter range, in terms of the fugacity \lambda \in \mathbb{R}_{\ge 0} and a repulsive symmetric pair potential \phi on a bounded measurable region \mathbb{V} of a complete separable metric space, equipped with a locally finite reference volume measure \nu. We will focus on an algorithmic approach based on mapping repulsive Gibbs point processes to hard-core models on a natural family of geometric random graphs. We also give an overview of other approaches that have appeared in the literature.
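
As a baseline illustrating why low fugacity is easy, and why rigorous guarantees over broad parameter ranges are the real challenge, here is a naive rejection sampler (ours) for the hard-sphere model, the simplest repulsive pair potential.

```python
# Exact rejection sampler for the hard-sphere Gibbs point process on
# [0,1]^2: propose a Poisson(lam) point cloud and accept iff no two
# points are closer than r. Only practical at low intensity.
import math, random

def poisson(lam, rng):
    """Sample Poisson(lam) by counting rate-lam arrivals before time 1."""
    n, t = 0, 0.0
    while True:
        t += rng.expovariate(lam)
        if t > 1.0:
            return n
        n += 1

def hard_sphere(lam=20.0, r=0.05, seed=0):
    rng = random.Random(seed)
    tries = 0
    while True:
        tries += 1
        pts = [(rng.random(), rng.random()) for _ in range(poisson(lam, rng))]
        if all(math.dist(p, q) >= r
               for i, p in enumerate(pts) for q in pts[i + 1:]):
            return pts, tries

pts, tries = hard_sphere()
print(f"{len(pts)} points accepted after {tries} proposals")
```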


Joon Lee

Title: The sparse parity matrix

Abstract: Let A be a random n x n matrix over F_2 whose entries equal 1 with probability d/n independently, for a fixed d>0. Draw a vector y randomly from the column space of A. It is a simple observation that the entries of a random solution x to Ax=y are asymptotically pairwise independent, i.e., \sum_{i<j} E|P[x_i=s, x_j=t | A] - P[x_i=s | A] P[x_j=t | A]| = o(n^2) for s,t \in F_2. But what can we say about the overlap of two random solutions x, x', defined as n^{-1} \sum_{i=1}^n 1{x_i = x'_i}? We prove that for d<e the overlap concentrates on a single deterministic value \alpha_*(d). By contrast, for d>e the overlap concentrates on a single value once we condition on the matrix A, while over the probability space of A its conditional expectation vacillates between two different values \alpha_*(d) < \alpha^*(d), either of which occurs with probability 1/2+o(1). This anti-concentration result provides an instructive contribution to both the theory of random constraint satisfaction problems and of inference problems on random structures.
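
A small experiment (ours) that makes the overlap tangible: sample A and y, compute the solution space of Ax = y by Gaussian elimination over F_2, draw two uniform solutions and report their overlap.

```python
# Overlap of two uniform solutions of a sparse linear system over F_2.
import numpy as np

def solve_affine(A, y):
    """Gauss-Jordan over F_2: a particular solution of Ax = y plus a
    kernel basis (y is assumed to lie in the column space of A)."""
    A, y = A.copy(), y.copy()
    n, m = A.shape
    pivots, row = [], 0
    for col in range(m):
        r = next((i for i in range(row, n) if A[i, col]), None)
        if r is None:
            continue
        A[[row, r]], y[[row, r]] = A[[r, row]], y[[r, row]]
        for i in range(n):
            if i != row and A[i, col]:
                A[i] ^= A[row]
                y[i] ^= y[row]
        pivots.append(col)
        row += 1
    x0 = np.zeros(m, dtype=np.uint8)
    for i, col in enumerate(pivots):
        x0[col] = y[i]
    kernel = []
    for f in (c for c in range(m) if c not in pivots):
        v = np.zeros(m, dtype=np.uint8)
        v[f] = 1
        for i, col in enumerate(pivots):
            v[col] = A[i, f]      # back-substitute the free column
        kernel.append(v)
    return x0, kernel

rng = np.random.default_rng(0)
n, d = 300, 2.0                   # d < e: a single deterministic overlap
A = (rng.random((n, n)) < d / n).astype(np.uint8)
y = (A.astype(int) @ rng.integers(0, 2, n)).astype(np.uint8) % 2
x0, kernel = solve_affine(A, y)

def random_solution():
    x = x0.copy()
    for v in kernel:              # flip a uniform subset of kernel vectors
        if rng.integers(2):
            x ^= v
    return x

x, xp = random_solution(), random_solution()
print("overlap:", float(np.mean(x == xp)))
```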


Alexander Munteanu

Title: Oblivious Sketching for Logistic and Sparse Linear Regression

Abstract: We review recent advances on oblivious linear sketching. The toolbox of sketching enables efficient approximation algorithms for analyzing data streams and distributed data by compressing the data while preserving various symmetric functions such as l_p norms or M-estimators. Going one step further to highly imbalanced and asymmetric functions, we quickly run into impossibility results.

Our recent research thus focused on new sketching techniques to cope with these issues under natural assumptions on the data distribution, captured by a parameter mu. Our sketches provably reduce n data points to poly(mu, d, log(n)) while preserving a constant factor approximation for logistic regression. State of the art sketches compress to almost linear size in mu and d, which roughly matches the best results obtained via subsampling, and which is close to optimal.

Model sparsity is a common assumption for very high dimensional data, where we do not regress on all, but only on at most k << d dimensions. While this problem becomes computationally hard, the sparsity allows us to sketch to a smaller o(d) size. We show that reducing to k*log(d/k)/eps^2 is essentially optimal for any sketching algorithm under several regression losses, combining high-dimensional probability, information-theoretic arguments and metric embedding theory.
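
The logistic and sparse sketches of the talk are specialized; as a generic illustration (ours) of the oblivious sketch-and-solve paradigm they build on, here is a CountSketch, one random row and one random sign per data point drawn independently of the data, applied to plain least squares rather than logistic loss.

```python
# Oblivious sketch-and-solve for least squares with a CountSketch.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100_000, 20, 2_000             # n points sketched down to k rows
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

rows = rng.integers(0, k, size=n)        # hash each point to a sketch row
signs = rng.choice((-1.0, 1.0), size=n)
SX = np.zeros((k, d))
Sy = np.zeros(k)
np.add.at(SX, rows, signs[:, None] * X)  # S is k x n with one +-1 per column
np.add.at(Sy, rows, signs * y)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
beta_sketch = np.linalg.lstsq(SX, Sy, rcond=None)[0]
print("relative error:",
      np.linalg.norm(beta_sketch - beta_full) / np.linalg.norm(beta_full))
```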


Ido Nachum

Title: A Johnson--Lindenstrauss Framework for Randomly Initialized CNNs

Abstract: Fix a dataset \{ x_i \}_{i=1}^n \subset R^d. The celebrated Johnson--Lindenstrauss (JL) lemma shows that a random projection preserves geometry: with high probability, <x_i,x_j> \approx < W \cdot x_i , W \cdot x_j >.
How does the JL lemma relate to neural networks?
A neural network is a sequential application of projections, each followed by a non-linearity \sigma: R \rightarrow R (applied coordinate-wise). For example, a fully connected network (FCN) with two hidden layers has the form: N(x)=W_3 \cdot \sigma( W_2 \cdot \sigma( W_1 \cdot x ) ).
By the JL lemma, any layer of a random linear FCN (\sigma(x)=x) essentially preserves the original geometry of the dataset.
How does the quantity < \sigma( W \cdot x_i ) , \sigma( W \cdot x_j ) > change with other non-linearities or a convolution (*) instead of matrix multiplication (\cdot)?
For FCNs with the prevalent ReLU activation (ReLU(x):=\max\{ x , 0 \}), the angle between two inputs contracts according to a known mapping. The question for non-linear convolutional neural networks (CNNs) becomes much more intricate. To answer this question, we introduce a geometric framework. For linear CNNs, we show that the Johnson--Lindenstrauss lemma continues to hold, namely, that the angle between two inputs is preserved. For CNNs with ReLU activation, on the other hand, the behavior is richer: The angle between the outputs contracts, where the level of contraction depends on the nature of the inputs. In particular, after one layer, the geometry of natural images is essentially preserved, whereas, for Gaussian correlated inputs, CNNs exhibit the same contracting behavior as FCNs with ReLU activation.
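
The "known mapping" for fully connected ReLU layers is the arccosine kernel of Cho and Saul: for Gaussian W and unit inputs at angle theta, the expected inner product of the ReLU outputs is (sin theta + (pi - theta) cos theta) / (2 pi). A quick numerical check (ours):

```python
# Empirical vs. predicted inner product after one random ReLU layer.
import numpy as np

rng = np.random.default_rng(0)
d, width, theta = 50, 20_000, 1.2

x = np.zeros(d); x[0] = 1.0                       # unit vectors at angle theta
y = np.zeros(d); y[0] = np.cos(theta); y[1] = np.sin(theta)

W = rng.normal(size=(width, d))
relu = lambda t: np.maximum(t, 0.0)
empirical = relu(W @ x) @ relu(W @ y) / width
predicted = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
print(empirical, predicted)  # agree up to sampling noise
# Normalising by the self inner product 1/2, the output angle
# arccos(2 * predicted) is smaller than theta: ReLU contracts angles.
```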


Jan Nagel

Title: On the speed of random walk on randomly weighted trees

Abstract: We consider random walk on trees where each edge is assigned a random conductance, and the random walk crosses an edge with probability proportional to the conductance of the edge. For this process, the distance to the root satisfies a law of large numbers whose limit is the effective speed of the random walk. We study how the speed changes when the distribution of conductances changes, in particular when it is close to a critical case, where the tree on which the random walk moves might be finite.
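
A Monte Carlo caricature (ours, with i.i.d. exponential conductances on the binary tree, not the paper's setup): estimate the speed by running the walk and dividing distance from the root by time.

```python
# Random walk on a lazily grown binary tree with random conductances;
# the walk crosses an edge with probability proportional to its conductance.
import random

def speed_estimate(steps=100_000, seed=0):
    rng = random.Random(seed)
    parent = {0: None}               # node ids; 0 is the root
    children = {}                    # node -> [child ids]
    econd = {}                       # node -> conductance of edge to parent
    depth = {0: 0}
    nxt = 1

    def kids(v):
        nonlocal nxt
        if v not in children:        # grow the tree only where the walk goes
            children[v] = [nxt, nxt + 1]
            for u in children[v]:
                parent[u] = v
                depth[u] = depth[v] + 1
                econd[u] = rng.expovariate(1.0)   # i.i.d. Exp(1) conductances
            nxt += 2
        return children[v]

    v = 0
    for _ in range(steps):
        moves = kids(v)[:]
        weights = [econd[u] for u in moves]
        if parent[v] is not None:    # the edge back towards the root
            moves.append(parent[v])
            weights.append(econd[v])
        r = rng.random() * sum(weights)
        for u, w in zip(moves, weights):
            r -= w
            if r < 0:
                v = u
                break
    return depth[v] / steps

print("estimated speed:", speed_estimate())
```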


Jean Ravelomanana

Title: Warning Propagation: stability and subcriticality

Abstract: Warning Propagation is a combinatorial message passing algorithm that unifies and generalises a wide variety of recursive combinatorial procedures. Special cases include the Unit Clause Propagation and Pure Literal algorithms for satisfiability as well as the peeling process for identifying the k-core of a random graph. Here we analyse Warning Propagation in full generality on a very general class of multi-type random graphs. We prove that under mild assumptions on the random graph model and the stability of the message limit, Warning Propagation converges rapidly. In effect, the analysis of the fixed point of the message passing process on a random graph reduces to analysing the process on a multi-type Galton-Watson tree. This result corroborates and generalises a heuristic first put forward by Pittel, Spencer and Wormald in their seminal k-core paper (JCTB 1996).
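
One of the special cases named above, the peeling process for the k-core, fits in a few lines; the Erdős-Rényi example is ours.

```python
# Peeling to the k-core: repeatedly delete vertices of degree < k;
# whatever survives is the (possibly empty) k-core.
import random

def k_core(n, edges, k):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    queue = [v for v in range(n) if len(adj[v]) < k]
    removed = [False] * n
    while queue:
        v = queue.pop()
        if removed[v]:
            continue
        removed[v] = True
        for u in adj[v]:
            adj[u].discard(v)
            if not removed[u] and len(adj[u]) < k:
                queue.append(u)
    return [v for v in range(n) if not removed[v]]

rng = random.Random(0)
n, d = 2000, 4.0            # average degree above the 3-core threshold
edges = [(u, v) for u in range(n) for v in range(u + 1, n)
         if rng.random() < d / n]
print("3-core size:", len(k_core(n, edges, 3)))
```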


Malin Rau

Title: Asynchronous Opinion Dynamics in Social Networks

Abstract: Opinion spreading in a society decides the fate of elections, the success of products, and the impact of political or social movements.
A prominent model to study opinion formation processes is due to Hegselmann and Krause. It has the distinguishing feature that stable states do not necessarily show consensus, i.e., the population of agents might not agree on the same opinion.
We focus on the social variant of the Hegselmann-Krause model. There are n agents connected by a social network. Their opinions evolve in an iterative, asynchronous process, in which agents are activated one after another at random. When activated, an agent adopts the average of the opinions of its neighbors having a similar opinion (where similarity of opinions is defined using a parameter ɛ). Thus, the set of influencing neighbors of an agent may change over time. To the best of our knowledge, social Hegselmann-Krause systems with asynchronous opinion updates have only been studied with the complete graph as the social network.
We show that such opinion dynamics are guaranteed to converge for any social network. We provide an upper bound of O(n|E|^2 (ɛ/δ)^2) on the expected number of opinion updates until convergence to a stable state, where |E| is the number of edges of the social network, and δ is a parameter of the stability concept. For the complete social network we show a bound of O(n^3(n^2 +  (ɛ/δ)^2)) that represents a major improvement over the previously best upper bound of O(n^9 (ɛ/δ)^2).
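
A minimal sketch (ours) of the asynchronous dynamics just described, with the common convention that the averaging agent counts itself among its ɛ-close neighbours.

```python
# Asynchronous Hegselmann-Krause opinion dynamics on a social network.
import random

def hk_async(adj, eps=0.1, steps=50_000, seed=0):
    rng = random.Random(seed)
    n = len(adj)
    op = [rng.random() for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)                 # activate a random agent
        close = [j for j in adj[i] if abs(op[j] - op[i]) <= eps] + [i]
        op[i] = sum(op[j] for j in close) / len(close)
    return op

rng = random.Random(1)
n, p = 200, 0.1                              # an Erdos-Renyi social network
adj = [[] for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < p:
            adj[u].append(v); adj[v].append(u)

ops = hk_async(adj)
print("approximate cluster locations:", sorted({round(o, 2) for o in ops}))
```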


Alex Samorodnitsky

Title: A contractive inequality for functions on the boolean cube.

Abstract: We will describe a contractive inequality for functions on the boolean cube, upper-bounding the l_p norm of the image of a function F under the noise operator by the expected norm of a projection of F on a random coordinate subset of a certain size. We use this inequality to study binary linear codes and, in particular, obtain new bounds on the weight distribution of Reed-Muller codes of positive rate and show that these codes perform reasonably well on binary symmetric channels.
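
For orientation, the noise operator T_ρ damps the level-S Fourier coefficient by ρ^{|S|}; equivalently, T_ρ f(x) averages f over inputs whose coordinates are each flipped independently with probability (1-ρ)/2. A small sanity check of the two descriptions against each other (ours):

```python
# Two equivalent definitions of the noise operator on the Boolean cube.
import itertools, random

n, rho = 4, 0.6
cube = list(itertools.product((-1, 1), repeat=n))
rng = random.Random(0)
f = {x: rng.choice((-1.0, 1.0)) for x in cube}   # a random +-1 function

def chi(S, x):
    p = 1.0
    for i in S:
        p *= x[i]
    return p

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
fhat = {S: sum(f[x] * chi(S, x) for x in cube) / len(cube) for S in subsets}

# Fourier description: damp hat f(S) by rho^{|S|}.
T_fourier = {x: sum(rho ** len(S) * fhat[S] * chi(S, x) for S in subsets)
             for x in cube}

# Probabilistic description: average f over noisy copies of x.
def T_direct(x):
    total = 0.0
    for y in cube:
        p = 1.0
        for xi, yi in zip(x, y):
            p *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
        total += p * f[y]
    return total

x0 = cube[0]
print(T_fourier[x0], T_direct(x0))               # the two should coincide
```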


Thomas Sauerwald

Title: Balanced Allocations: The Power of Choice versus Noise

Abstract: In the balanced allocation problem we wish to allocate m balls (jobs) into n bins (servers) by allowing each ball to choose from some bins sampled uniformly at random. The goal is to maintain a small gap between the maximum load and average load. For the one-choice protocol, where each ball is allocated to a random bin, the gap diverges for large m. However, for the two-choice protocol, where each ball samples two bins and is placed in the least loaded of the two, it was shown that the gap is only O(log log n) for all m. This dramatic improvement is widely known as the "power of two choices", and similar effects have been observed in hashing and routing. In this talk, we will first give some intuition why two-choice maintains such a good balance in practice and theory. Then we will present our results in settings where the load information is subject to some noise. For example, the queried load information of bins might be (i) outdated, (ii) subject to some adversarial or random perturbation or (iii) only retrievable by binary queries. We prove that, roughly speaking, if the noise is not too strong, then performance is not affected and the O(log log n) gap bound of two-choice still holds. We also prove that under strong noise or outdated information, two-choice is no longer the optimal protocol.
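
A toy experiment (ours) comparing one-choice, two-choice, and two-choice under additive random noise on the queried loads, a caricature of setting (ii):

```python
# Balanced allocations: gap = max load - average load after m balls.
import random

def allocate(n, m, choices, noise=0.0, seed=0):
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(m):
        bins = [rng.randrange(n) for _ in range(choices)]
        # pick the bin whose *noisy* queried load is smallest
        best = min(bins, key=lambda b: load[b] + rng.gauss(0, noise))
        load[best] += 1
    return max(load) - m / n

n, m = 1000, 50_000
print("one-choice gap:      ", allocate(n, m, 1))
print("two-choice gap:      ", allocate(n, m, 2))
print("noisy two-choice gap:", allocate(n, m, 2, noise=1.0))
```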


Adrien Schertzer

Title: Fluctuations of the free energy in p-spin SK models on two scales

Abstract: Twenty years ago, Bovier, Kurkova, and Löwe proved a central limit theorem (CLT) for the fluctuations of the free energy in the p-spin version of the Sherrington-Kirkpatrick model of spin glasses at high temperatures. We improve their results in two ways. First, we extend the range of temperatures to cover the entire regime where the quenched and annealed free energies are known to coincide. Second, we identify the main source of the fluctuations as a purely coupling-dependent term, and we show a further CLT for the deviation of the free energy around this random object.
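
For intuition about the quantity in question, a sketch of ours (p=2 for brevity and tiny n, far from the asymptotic regime of the theorem): the free energy by exact enumeration, repeated over the disorder to exhibit its fluctuations.

```python
# Finite-n free energy of the p=2 Sherrington-Kirkpatrick model.
import itertools, math, random

def free_energy(n, beta, rng):
    g = {(i, j): rng.gauss(0, 1) for i in range(n) for j in range(i + 1, n)}
    Z = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        H = sum(gij * s[i] * s[j] for (i, j), gij in g.items()) / math.sqrt(n)
        Z += math.exp(beta * H)
    return math.log(Z) / n

rng = random.Random(0)
beta = 0.5
samples = [free_energy(10, beta, rng) for _ in range(50)]
mean = sum(samples) / len(samples)
var = sum((f - mean) ** 2 for f in samples) / len(samples)
# at high temperature the mean is close to the annealed value log 2 + beta^2/4
print(f"mean {mean:.4f}  vs  log 2 + beta^2/4 = {math.log(2) + beta**2 / 4:.4f}")
print(f"variance over the disorder: {var:.2e}")
```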


Yanina Shkel

Title: Minimum-Entropy Functional Representations (And How to Find Them)

Abstract: Given two jointly distributed random variables (X,Y), a functional representation of X is a random variable Z independent of Y, together with a deterministic function g, such that X=g(Y,Z). The problem of finding a minimum entropy functional representation is known to be equivalent to the problem of finding a minimum entropy coupling where, given a collection of probability distributions {P_1, ..., P_m}, the goal is to find a coupling X_1, ..., X_m (X_i ~ P_i) with the smallest Rényi entropy H_\alpha(X_1, ..., X_m). In this talk we present a new information spectrum converse and apply it to obtain direct lower bounds on the minimum entropy in both problems. The new results improve on all known lower bounds, including previous lower bounds based on the concept of majorization. In particular, the presented proofs leverage both the information-spectrum and the majorization perspectives on minimum entropy couplings and functional representations. Finally, we review known greedy algorithms for solving the two problems and give an overview of several applications of minimum-entropy functional representations.
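
One known greedy heuristic for minimum entropy coupling, in the form we believe the talk refers to (an assumption on our part): repeatedly match the largest remaining atom of every marginal, jointly assign the smallest of those masses, subtract, and recurse.

```python
# Greedy minimum-entropy coupling of several marginal distributions.
import math

def greedy_coupling(*marginals):
    P = [list(p) for p in marginals]
    atoms = []                    # (mass, (i_1, ..., i_m)) joint outcomes
    while True:
        idx = [max(range(len(p)), key=p.__getitem__) for p in P]
        mass = min(p[i] for p, i in zip(P, idx))
        if mass <= 1e-12:         # all mass has been assigned
            break
        atoms.append((mass, tuple(idx)))
        for p, i in zip(P, idx):
            p[i] -= mass
    return atoms

atoms = greedy_coupling([0.5, 0.25, 0.25], [0.6, 0.4])
H = -sum(m * math.log2(m) for m, _ in atoms)
print(atoms)
print(f"coupling entropy: {H:.3f} bits")
```

At every step at least one residual atom is exhausted, so the loop terminates with a valid coupling of the given marginals.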


Kostas Zampetakis

Title: The Lovász Local Lemma as Approximate Dynamic Programming

Abstract: If we assign random values to the variables of a Constraint Satisfaction Problem, the probability that we get a solution is positive if and only if the instance is satisfiable. Crucially, to establish that the instance is satisfiable, any multiplicative underestimation of this probability, no matter how poor, suffices. The Lovász Local Lemma (LLL), a cornerstone of the Probabilistic Method, is a tool for establishing such positive probability lower bounds. In this talk, we will present a new, computational view of the LLL which leads to significant generalizations, including a passage from the setting of probabilities to that of general supermodular functions, and improved bounds for the negative-fugacity singularity of the hard-core model for several lattices.
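
The computational view has a classic algorithmic companion, the Moser-Tardos resampling algorithm; the k-SAT sketch below (ours) illustrates it.

```python
# Moser-Tardos resampling for k-SAT: start from a random assignment and,
# while some clause is violated, resample the variables of one violated
# clause. LLL-type conditions guarantee fast expected termination.
import random

def moser_tardos(n_vars, clauses, seed=0):
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n_vars)]
    def sat(cl):
        return any(x[v] == sign for v, sign in cl)
    while True:
        bad = [cl for cl in clauses if not sat(cl)]
        if not bad:
            return x
        for v, _ in rng.choice(bad):         # resample one violated clause
            x[v] = rng.random() < 0.5

rng = random.Random(1)
n, m = 200, 300                              # an easily satisfiable density
clauses = [tuple((rng.randrange(n), rng.random() < 0.5) for _ in range(3))
           for _ in range(m)]
x = moser_tardos(n, clauses)
print("all clauses satisfied:",
      all(any(x[v] == s for v, s in cl) for cl in clauses))
```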