# Documentation

## For Students at FAU

I am currently moving to the University of Toronto and hence cannot supervise new theses or research projects. You are, however, always welcome to reach out to me if you have any questions regarding your research project or thesis!

# Brief description of a few recent topics

## Over-the-Air Federated Learning

Federated learning strikes a trade-off between classical centralized and decentralized learning over a network. To understand the basic concept, consider a network of devices, each holding a local dataset. For instance, we can assume that these devices are smartphones that have collected input-output samples of the same function and want to learn this function. The traditional techniques for learning this function are as follows:

In centralized learning, all devices share their datasets, i.e., the collected input-output samples, with a central node, often called the parameter server (PS). The PS then uses the pooled data to learn the function; for instance, it runs the least-squares (LS) method over all collected samples.

In decentralized learning, each device learns from its own dataset individually. In our example, this means that it runs the LS method over its own samples only.

The former approach leads to better learning performance, but it suffers from several issues. The main ones are the high communication load, since the whole datasets must be transferred to the PS, and privacy, since the dataset of each device is now accessed by a third party, i.e., the PS, which is not necessarily trustworthy; the data could even be intercepted by eavesdroppers during the communication phase. The latter approach does not suffer from these issues; however, its performance can be very poor: just think of learning a high-dimensional function from a small number of samples!
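The performance gap between the two approaches can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the text above: the function is linear, the samples are Gaussian, and the dimensions and the noise level are arbitrary. Each device deliberately holds fewer samples than there are unknowns.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 20            # number of unknown parameters (assumed)
num_devices = 10    # devices in the network (assumed)
samples_each = 5    # samples per device, fewer than dim on purpose

w_true = rng.normal(size=dim)  # the common function to be learned

def make_dataset(n):
    """Draw n noisy input-output samples of the linear function w_true."""
    X = rng.normal(size=(n, dim))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

datasets = [make_dataset(samples_each) for _ in range(num_devices)]

# Centralized learning: the PS stacks all datasets and runs LS once.
X_all = np.vstack([X for X, _ in datasets])
y_all = np.concatenate([y for _, y in datasets])
w_central, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

# Decentralized learning: device 0 runs LS on its own samples only.
# With fewer samples than unknowns, lstsq returns the minimum-norm fit.
X0, y0 = datasets[0]
w_local, *_ = np.linalg.lstsq(X0, y0, rcond=None)

err_central = np.linalg.norm(w_central - w_true)
err_local = np.linalg.norm(w_local - w_true)
print(f"centralized error: {err_central:.3f}")
print(f"local error:       {err_local:.3f}")
```

In this toy setting the single device simply does not have enough samples to pin down all parameters, whereas the pooled data does.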

Federated learning tries to reconcile these two extremes of the spectrum. It suggests that the devices first learn locally and then gradually share their results (not their datasets) with the PS. In each step, the PS collects the results and combines them, mainly by averaging. It then sends the combined version back to each device, and each device continues running its learning algorithm starting from the combined result (not from its own local result). Since the learning now relies on the averaged results of all devices, it performs much better than the traditional decentralized approach. On the other hand, the devices only share their results, often referred to as parameters. These parameters are much smaller than the datasets and are not obviously connected to the data points in the datasets. Hence, compared to the centralized approach, the communication load can drop significantly and privacy can be better preserved.
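The loop described above can be sketched as follows. This is only a minimal illustration under assumptions that are not part of the text: a linear model with an LS loss, plain gradient descent as the local learning algorithm, and a simple unweighted average at the PS. The helper `local_update`, the step size, and the number of rounds are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_devices, samples_each = 20, 10, 5
w_true = rng.normal(size=dim)  # assumed linear function to be learned

datasets = []
for _ in range(num_devices):
    X = rng.normal(size=(samples_each, dim))
    y = X @ w_true + 0.1 * rng.normal(size=samples_each)
    datasets.append((X, y))

def local_update(w, X, y, step=0.05, num_steps=10):
    """Run a few gradient steps on the device's own LS loss, starting from w."""
    w = w.copy()
    for _ in range(num_steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= step * grad
    return w

# Federated loop: devices refine the shared model locally, the PS averages.
w_global = np.zeros(dim)
for _ in range(500):
    local_models = [local_update(w_global, X, y) for X, y in datasets]
    w_global = np.mean(local_models, axis=0)  # only parameters reach the PS

# Baseline: device 0 trains alone for the same total number of steps.
X0, y0 = datasets[0]
w_alone = local_update(np.zeros(dim), X0, y0, num_steps=5000)

fed_error = np.linalg.norm(w_global - w_true)
alone_error = np.linalg.norm(w_alone - w_true)
print(f"federated error:   {fed_error:.3f}")
print(f"device-only error: {alone_error:.3f}")
```

Note that in every round each device uploads only `dim` numbers, regardless of how many samples it holds.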

Over-the-air federated learning refers to implementing federated learning in a network of mobile devices with wireless communication channels. Compared to the simple noise-free settings, wireless systems are subject to more challenges, such as noise and fading in the channels and passive eavesdropping. Addressing these challenges is the main focus of this topic.
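To get a feeling for the effect of the channel, the averaging step at the PS can be simulated with a toy channel. The channel model below is purely an assumption for illustration: each device sees a real-valued Rayleigh-distributed fading gain and additive Gaussian noise, and the PS is assumed to know the gains and to invert them. The function `noisy_aggregate` is a hypothetical helper, not an established scheme.

```python
import numpy as np

def noisy_aggregate(local_models, fading, noise_std, rng):
    """Average local models received over faded, noisy channels (assumed model)."""
    noise = noise_std * rng.normal(size=local_models.shape)
    received = fading[:, None] * local_models + noise
    equalized = received / fading[:, None]  # PS is assumed to know the gains
    return equalized.mean(axis=0)

rng = np.random.default_rng(2)
num_devices, dim = 10, 4
local_models = rng.normal(size=(num_devices, dim))  # stand-in for local parameters
fading = rng.rayleigh(scale=1.0, size=num_devices)  # assumed fading gains

ideal_average = local_models.mean(axis=0)
aggregated = noisy_aggregate(local_models, fading, noise_std=0.05, rng=rng)

print("ideal average:   ", ideal_average)
print("received average:", aggregated)
```

Devices with a weak channel gain amplify the noise after inversion, which is one reason why aggregation over wireless links needs more care than in the noise-free setting.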

## Linear Computation Coding

Linear computation coding (LCC) is a recently proposed paradigm for efficiently implementing vector-matrix multiplication. The idea can be illustrated easily: Assume you want to design a module which multiplies any input vector by a fixed matrix and gives you the product as the output. The classical approach is to store quantized versions of the matrix elements on the device and program it to run a finite-resolution multiplication. LCC shows that this is in fact not the most efficient approach!

LCC proposes the following: Break your matrix into a product of multiple matrices. Let us call each matrix in this decomposition a factor. Each factor is a sparse matrix, i.e., it has only a few nonzero entries, and its nonzero entries are integer powers of 2. Multiplying a factor by a vector is now a simple task: you only need to perform some bit shifts on the vector entries and then sum them up. The whole matrix-vector product can then be implemented efficiently: sequentially run this simplified procedure for all the factors.
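The shift-and-add procedure can be sketched as follows. The two factors below are hand-picked for illustration and are not the output of any decomposition algorithm; how to find good factors for a given matrix is not covered here. The storage format (a list of `(column, sign, shift)` triples per row), the helper names, and the restriction to signed entries, integer inputs, and nonnegative exponents are likewise assumptions that keep the example exact.

```python
import numpy as np

# Each factor is stored row by row as (column, sign, shift) triples, meaning
# the entry sign * 2**shift in that column. Unlisted entries are zero.
factor_1 = [
    [(0, +1, 1), (2, -1, 0)],   # row 0:  2*x0 - x2
    [(1, +1, 0), (2, +1, 2)],   # row 1:  x1 + 4*x2
    [(0, +1, 0), (1, -1, 1)],   # row 2:  x0 - 2*x1
]
factor_2 = [
    [(0, +1, 0), (1, +1, 1)],   # row 0:  x0 + 2*x1
    [(1, -1, 0), (2, +1, 3)],   # row 1:  -x1 + 8*x2
    [(0, +1, 2), (2, +1, 0)],   # row 2:  4*x0 + x2
]

def apply_factor(factor, x):
    """Multiply a sparse power-of-two factor by an integer vector via shifts and adds."""
    out = []
    for row in factor:
        acc = 0
        for col, sign, shift in row:
            term = x[col] << shift          # multiplication by 2**shift
            acc += term if sign > 0 else -term
        out.append(acc)
    return out

def to_dense(factor, num_cols):
    """Expand a factor into an ordinary integer matrix (for checking only)."""
    dense = np.zeros((len(factor), num_cols), dtype=int)
    for r, row in enumerate(factor):
        for col, sign, shift in row:
            dense[r, col] = sign * 2**shift
    return dense

x = [3, -1, 5]
y = apply_factor(factor_2, apply_factor(factor_1, x))  # factors applied in sequence

# Reference: the same product computed with ordinary multiplications.
A = to_dense(factor_2, 3) @ to_dense(factor_1, 3)
print(y, (A @ np.array(x)).tolist())
```

Each row here costs one addition and at most two shifts, while the dense product `A` has no zero entries, which hints at where the savings come from.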

LCC has been shown to give a significant gain in accuracy compared to the classical finite-resolution implementation. More precisely, for a fixed complexity budget, implementing a vector-matrix product via LCC yields a significantly smaller quantization error.