## April 30, 2010

Notes for the lectures on achieving the optimal trade-off between rate and list-decoding radius via folded Reed-Solomon codes are now posted on the course webpage. Notes 7, 8 on Reed-Solomon unique decoding, GMD decoding, and expander codes have also been edited.

## April 13, 2010

### Notes for lectures 18-21

Drafts of the notes for the lectures up till last Friday are now posted on the course webpage. I plan to proofread and make necessary edits to portions of the notes (for lecture 15 and later) in the next couple of weeks or so. But the current versions should already be useful if you need a refresher on something we covered in lecture, or as reference for working on the problem set.

## March 31, 2010

### Notes on lectures 16,17

A draft of the notes on expander codes and their decoding, which we covered in last week’s lectures, is now posted on the course webpage. Also posted are related notes (which include some illustrative figures) from a previous offering of the course.

## March 25, 2010

### Notes 7: Justesen codes and Reed-Solomon & GMD decoding

A draft version of the notes for the material we covered on Justesen codes, Reed-Solomon unique decoding, and GMD decoding of concatenated codes is posted on the course webpage. The material from Section 4 (Peterson’s algorithm) onwards has not been proofread and will be edited soon, but I wanted to make this version available now to avoid too much lag.

## March 5, 2010

### Notes 6: Code constructions: Reed-Solomon, BCH, Reed-Muller, and concatenated

*[Lectures scribed by Eric Blais]*

[The notes are also available in pdf format on the course webpage. Unless there is some explicit demand for it, I suspect that these might be the *last set* of notes for this course that I make available in wordpress html format.]

In this lecture, we begin the algorithmic component of the course by introducing some explicit families of good algebraic codes. We begin by looking at Reed-Solomon codes.

**1. Reed-Solomon codes **

Reed-Solomon codes are a family of codes defined over large fields as follows.

Definition 1 (Reed-Solomon codes) For integers $1 \le k \le n$, a field $\mathbb{F}$ of size $|\mathbb{F}| \ge n$, and a set $S = \{\alpha_1, \dots, \alpha_n\} \subseteq \mathbb{F}$ of $n$ distinct elements, we define the Reed-Solomon code

$\mathrm{RS}_{\mathbb{F},S}[n,k] = \{ (p(\alpha_1), p(\alpha_2), \dots, p(\alpha_n)) \mid p(X) \in \mathbb{F}[X] \text{ is a polynomial of degree } \le k-1 \}.$

A natural interpretation of the code is via its encoding map. To encode a message $m = (m_0, m_1, \dots, m_{k-1}) \in \mathbb{F}^k$, we interpret the message as the polynomial

$p(X) = m_0 + m_1 X + \cdots + m_{k-1} X^{k-1} \in \mathbb{F}[X].$

We then evaluate the polynomial $p(X)$ at the points $\alpha_1, \alpha_2, \dots, \alpha_n$ to get the codeword corresponding to $m$.

To evaluate the polynomial on the points $\alpha_1, \alpha_2, \dots, \alpha_n$, we multiply the message vector $m$ on the left by the $k \times n$ Vandermonde matrix

$G = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_n \\ \alpha_1^2 & \alpha_2^2 & \cdots & \alpha_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_1^{k-1} & \alpha_2^{k-1} & \cdots & \alpha_n^{k-1} \end{pmatrix}.$

The matrix $G$ is a generator matrix for $\mathrm{RS}_{\mathbb{F},S}[n,k]$, so we immediately obtain that Reed-Solomon codes are linear codes over $\mathbb{F}$.
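To make the encoding map concrete, here is a small Python sketch of Reed-Solomon encoding via the Vandermonde matrix; the prime field GF(13) and the parameters $n = 6$, $k = 3$ are illustrative choices, not anything canonical.

```python
# Reed-Solomon encoding over the prime field GF(13) (illustrative parameters).

q = 13                       # field size (prime, so arithmetic is just mod q)
n, k = 6, 3                  # block length and dimension, with k <= n <= q
alphas = [1, 2, 3, 4, 5, 6]  # evaluation set S: n distinct field elements

# k x n Vandermonde generator matrix G with G[i][j] = alphas[j]**i.
G = [[pow(a, i, q) for a in alphas] for i in range(k)]

def rs_encode(m):
    """Encode m = (m_0, ..., m_{k-1}) as (p(a_1), ..., p(a_n)), where
    p(X) = m_0 + m_1*X + ... + m_{k-1}*X^{k-1}; equivalently, m times G."""
    return [sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n)]

print(rs_encode([5, 0, 7]))  # codeword of p(X) = 5 + 7X^2: [12, 7, 3, 0, 11, 10]
```

Linearity is immediate from this form: the map $m \mapsto mG$ is linear over the field.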

** 1.1. Properties of the code **

Let’s now examine the parameters of the above Reed-Solomon code. The block length of the code is clearly $n$. As we will see below, the code has minimum distance $n - k + 1$. This also means that the encoding map is injective (distinct messages map to distinct codewords), and therefore the code has dimension equal to $k$.

The key to establishing the minimum distance of Reed-Solomon codes is the “degree mantra” that we saw in the previous lecture: *A non-zero polynomial of degree $d$ with coefficients from a field $\mathbb{F}$ has at most $d$ roots in $\mathbb{F}$.*

Theorem 2 The Reed-Solomon code $\mathrm{RS}_{\mathbb{F},S}[n,k]$ has minimum distance exactly $n - k + 1$.

*Proof:* Since $\mathrm{RS}_{\mathbb{F},S}[n,k]$ is a linear code, to prove the theorem it suffices to show that any non-zero codeword has Hamming weight at least $n - k + 1$.

Let $(p(\alpha_1), \dots, p(\alpha_n))$ be a non-zero codeword. The polynomial $p(X)$ is a non-zero polynomial of degree at most $k-1$. So by our degree mantra, $p$ has at most $k-1$ roots, which implies that the codeword has at most $k-1$ coordinates equal to zero, and hence Hamming weight at least $n - k + 1$.

By the Singleton bound, the distance cannot exceed $n - k + 1$, and therefore must equal $n - k + 1$. The upper bound on the distance can also be seen directly by noting that the codeword corresponding to the polynomial $p(X) = \prod_{i=1}^{k-1} (X - \alpha_i)$ has Hamming weight exactly $n - k + 1$.

Note that the minimum distance of Reed-Solomon codes meets the Singleton bound. This is quite interesting: Reed-Solomon codes are a simple, natural family of codes based only on univariate polynomials, and yet their rate-versus-distance trade-off is optimal.
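These parameters can also be confirmed by brute force on a toy instance; the following Python sketch (with my illustrative choices $q = 13$, $n = 6$, $k = 3$) checks that the minimum distance is exactly $n - k + 1 = 4$.

```python
# Brute-force check that a small Reed-Solomon code meets the Singleton bound.
from itertools import product

q, n, k = 13, 6, 3           # illustrative parameters
alphas = [1, 2, 3, 4, 5, 6]  # evaluation set

def evaluate(coeffs, x):
    """Evaluate a polynomial (coefficient list, lowest degree first) at x, mod q."""
    return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q

def weight(word):
    return sum(1 for symbol in word if symbol != 0)

# The codeword of p(X) = (X - 1)(X - 2) = 2 - 3X + X^2 vanishes at exactly
# k - 1 = 2 evaluation points, so its weight is exactly n - (k - 1).
p = [2, 10, 1]               # -3 = 10 mod 13
assert weight([evaluate(p, a) for a in alphas]) == n - k + 1

# No non-zero codeword has smaller weight (degree mantra: at most k - 1 roots).
min_wt = min(weight([evaluate(m, a) for a in alphas])
             for m in product(range(q), repeat=k) if any(m))
print(min_wt)  # 4
```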

In our definition above, we have presented Reed-Solomon codes in the most general setting, where $S$ can be an arbitrary subset of $\mathbb{F}$ of size $n$. This presentation highlights the flexibility of Reed-Solomon codes. In practice, however, there are two common choices of the evaluation set $S$ used to instantiate Reed-Solomon codes:

- Take $S = \mathbb{F}$, so that $n = q$, or
- Take $S = \mathbb{F}^*$, the set of non-zero elements of $\mathbb{F}$, so that $n = q - 1$.

These two choices attain the best possible trade-off between the field size and the block length.

** 1.2. Alternative characterization **

We presented Reed-Solomon codes from an encoding point of view. It is also possible to look at these codes from the “parity-check” point of view. This approach is used in many textbooks, and leads to the following characterization of Reed-Solomon codes.

Theorem 3 (Parity-check characterization) For integers $1 \le k \le n$, a field $\mathbb{F}$ of size $q = n + 1$, a primitive element $\alpha \in \mathbb{F}^*$, and the set $S = \{1, \alpha, \alpha^2, \dots, \alpha^{n-1}\}$, the Reed-Solomon code over $\mathbb{F}$ with evaluation set $S$ is given by

$\mathrm{RS}_{\mathbb{F},S}[n,k] = \{ (c_0, c_1, \dots, c_{n-1}) \in \mathbb{F}^n \mid c(X) = c_0 + c_1 X + \cdots + c_{n-1} X^{n-1} \text{ satisfies } c(\alpha^l) = 0 \text{ for } 1 \le l \le n-k \}. \qquad (1)$

In other words, Theorem 3 states that the codewords of the Reed-Solomon code with evaluation points $1, \alpha, \dots, \alpha^{n-1}$ correspond to the polynomials of degree at most $n-1$ that vanish at the points $\alpha, \alpha^2, \dots, \alpha^{n-k}$.

The code given by the characterization in Theorem 3 has the same dimension $k$ as the code obtained with our original definition (it is cut out by $n-k$ linearly independent parity checks); to complete the proof of Theorem 3, we only need to check that every codeword obtained in Definition 1 satisfies the parity-check condition (1).

Exercise 1 Complete the proof of Theorem 3.

(Hint: The proof uses the fact that for every $x \ne 1$ in $\mathbb{F}^*$, $\sum_{i=0}^{n-1} x^i = \frac{x^n - 1}{x - 1} = 0$, since $x^n = 1$ for every $x \in \mathbb{F}^*$.)
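Before doing the algebra, it can be reassuring to verify the parity-check condition numerically; this Python sketch checks it for every codeword of a toy instance (the field GF(7) with primitive element 3, giving $n = 6$, and $k = 3$, are my illustrative choices).

```python
# Check the parity-check characterization on a small Reed-Solomon code.
from itertools import product

q, n, k = 7, 6, 3
alpha = 3                                      # a primitive element of GF(7)
alphas = [pow(alpha, j, q) for j in range(n)]  # evaluation set (1, a, ..., a^{n-1})

def evaluate(coeffs, x):
    """Evaluate a polynomial (coefficient list, lowest degree first) at x, mod q."""
    return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q

checked = 0
for m in product(range(q), repeat=k):
    cw = [evaluate(m, a) for a in alphas]      # c_j = p(alpha^j)
    # The codeword polynomial c(X) = c_0 + c_1 X + ... + c_{n-1} X^{n-1}
    # must vanish at alpha^l for l = 1, ..., n - k.
    assert all(evaluate(cw, pow(alpha, l, q)) == 0 for l in range(1, n - k + 1))
    checked += 1
print(checked)  # all 7**3 = 343 codewords pass
```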

## February 12, 2010

### Basics of Finite Fields

In the next segment of the course, we will study explicitly constructed codes, and develop error-correction algorithms for them. Prominent among these will be certain algebraic constructions of codes based on polynomials over finite fields.

It is possible to get quite far treating finite fields as “black-boxes” that allow the field operations to be performed efficiently as atomic steps, along with just one important mantra:

*A non-zero polynomial of degree $d$ with coefficients from a field $\mathbb{F}$ has at most $d$ roots in $\mathbb{F}$.*

But it is nevertheless desirable to have a good working knowledge of the basics of the theory of finite fields, and we will appeal to some of these results later on for list decoding some powerful algebraic codes. You are likely already familiar with this material from your undergraduate algebra. You can refer to your favorite algebra text for the basic theorems and their proofs, but I wanted to point to some notes that you can turn to if you need a refresher and a convenient reference.

So here are some excellently done notes on finite fields, written by G. David Forney and available on MIT’s OpenCourseWare for the course 6.451 Principles of Digital Communication II. These notes rigorously prove everything that we would need (and more!) from first principles, in a nice sequence.

Collected below are some basic results about finite fields, for quick reference. (I do not recall the definition of fields and the field axioms here.) All these facts are proved in the above linked notes.

- For every prime $p$, there is a unique finite field of size $p$, which is isomorphic to $\mathbb{Z}_p$, the set $\{0, 1, \dots, p-1\}$ under mod-$p$ addition and multiplication.
- For each prime $p$, positive integer $m \ge 1$, and polynomial $f(X)$ with coefficients in $\mathbb{Z}_p$ of degree $m$ that is *irreducible* (in $\mathbb{Z}_p[X]$), the set of polynomials in $\mathbb{Z}_p[X]$ of degree at most $m-1$, with addition and multiplication of the polynomials defined modulo $f(X)$, is a finite field (denoted $\mathbb{Z}_p[X]/(f(X))$) with $p^m$ elements.
- Every finite field is isomorphic to such a field, and therefore must have $p^m$ elements for some prime $p$ and positive integer $m$.
- For every prime $p$ and integer $m \ge 1$, there exists an irreducible polynomial in $\mathbb{Z}_p[X]$ of degree $m$. Therefore, there is a finite field with $p^m$ elements for every prime $p$ and positive integer $m$.
- Additively, a finite field with $p^m$ elements has the structure of a vector space of dimension $m$ over $\mathbb{Z}_p$.
- The multiplicative group of a finite field (consisting of its non-zero elements) is cyclic. In other words, the non-zero elements of a field $\mathbb{F}$ of size $q$ can be written as $\{1, \gamma, \gamma^2, \dots, \gamma^{q-2}\}$ for some $\gamma \in \mathbb{F}$.
- A $\gamma$ with such a property is called a *primitive element* of the field $\mathbb{F}$. A field of size $q$ has $\varphi(q-1)$ primitive elements, where $\varphi$ is the Euler totient function.
- All fields of size $q = p^m$ are isomorphic to $\mathbb{Z}_p[X]/(f(X))$ for an arbitrary choice of degree-$m$ irreducible polynomial $f(X) \in \mathbb{Z}_p[X]$. The finite field with $q$ elements is therefore unique up to isomorphism and will be denoted by $\mathbb{F}_q$. **Remark:** While one can pick any irreducible $f(X)$ to represent the field as $\mathbb{Z}_p[X]/(f(X))$, sometimes a special choice can be judicious. For example, the complexity of multiplication is better if $f(X)$ is sparse (i.e., has very few non-zero coefficients).
- The elements of $\mathbb{F}_q$ are exactly the $q$ distinct roots of the polynomial $X^q - X$.
- For each $m'$ dividing $m$, the field $\mathbb{F}_{p^m}$ has a unique subfield of size $p^{m'}$, which consists of the roots of the polynomial $X^{p^{m'}} - X$.
- The polynomial $X^{p^m} - X$ is the product of all monic irreducible polynomials in $\mathbb{Z}_p[X]$ whose degree divides $m$, with no repetitions.
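Two of these facts are easy to verify numerically in a small prime field; the Python sketch below does so (the choice $p = 11$ is just for illustration).

```python
# Check, in GF(11): the multiplicative group is cyclic with phi(p-1)
# generators (primitive elements), and every element is a root of X^p - X.
from math import gcd

p = 11

def order(g):
    """Multiplicative order of g modulo p (g must be non-zero mod p)."""
    x, k = g % p, 1
    while x != 1:
        x, k = (x * g) % p, k + 1
    return k

primitive = [g for g in range(1, p) if order(g) == p - 1]
phi = sum(1 for t in range(1, p - 1) if gcd(t, p - 1) == 1)  # Euler phi(p - 1)
assert len(primitive) == phi

# Every field element is a root of the polynomial X^p - X.
assert all((pow(x, p, p) - x) % p == 0 for x in range(p))
print(primitive)  # the primitive elements of GF(11): [2, 6, 7, 8]
```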

## February 9, 2010

### Notes 5.2: Proof of the first MRRW bound

*[Lecture scribed by Srivatsan Narayanan]*

**1. Fourier analytic proof of the first MRRW bound **

We develop a proof of the first MRRW (JPL) bound for binary *linear* codes based on a covering argument. Our main theorem (Theorem 1) states that the dual code has a small essential covering radius. From this, we conclude that the size of the dual code is large, and equivalently, the size of the code is small.

Theorem 1 (Dual codes have a small ”essential covering radius”) Let $C$ be a binary linear code of block length $n$ and minimum distance at least $\delta n$. Then, for every $\varepsilon > 0$ and all large enough $n$, Hamming balls of radius $r = \left(\frac{1}{2} - \sqrt{\delta(1-\delta)} + \varepsilon\right) n$ centered at the codewords of the dual code $C^\perp$ cover at least half of $\{0,1\}^n$.

Remark 1 We say that a set $S \subseteq \{0,1\}^n$ has covering radius at most $r$ if every point in $\{0,1\}^n$ is inside $B(x, r)$ for some $x \in S$. Theorem 1 asserts that the dual code $C^\perp$ satisfies a relaxed version of this property: for large enough $n$, at least half of the points are inside $B(x, r)$ for some $x \in C^\perp$. This is sufficient for establishing an asymptotic bound on the rate of the codes, as shown in Corollary 2.

Before we prove Theorem 1, we observe that it directly implies an asymptotic upper bound on the rate of codes.

Corollary 2 Let $C$ be a binary linear code of block length $n$ with minimum distance at least $\delta n$. Then,

$\frac{\log_2 |C|}{n} \le h\left(\frac{1}{2} - \sqrt{\delta(1-\delta)}\right) + o(1),$

where $h(\cdot)$ denotes the binary entropy function.

## February 7, 2010

### Notes 5.1: Fourier Transform, MacWilliams identities, and LP bound

*[Lecture scribed by Srivatsan Narayanan]*

We will discuss the last and most sophisticated of our (upper) bounds on the rate of codes with certain relative distance, namely the first linear programming bound or the first JPL bound due to McEliece, Rodemich, Rumsey, and Welch, 1977 (henceforth, MRRW). This bound is the best known asymptotic upper bound on the rate of a binary code for a significant range of relative distances (roughly $0.273 < \delta < \frac{1}{2}$). We will present a complete and self-contained proof of this bound. A variant called the second JPL bound gives the best known upper bound for the remainder of the range, and we will mention it (without proof) at the end.

The linear programming bound is so called because it is based on Delsarte’s linear programming approach, which shows that the distance distribution of a binary code satisfies a family of linear constraints whose coefficients are the evaluations of a certain family of orthogonal polynomials (in this case, the Krawtchouk polynomials). The optimum (maximum) of this linear program gives an upper bound on the size of the code. MRRW constructed good feasible solutions to the dual of the linear program using tools from the theory of orthogonal polynomials, and the value of these solutions gave an upper bound on the size of the code by weak duality.
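For concreteness, here is a small Python sketch of the (binary) Krawtchouk polynomials and their orthogonality relation with respect to the binomial weights; the length $n = 8$ is an illustrative choice, and we will not actually need these polynomials in our proof.

```python
# Binary Krawtchouk polynomials K_k(x) for length n, and a numeric check
# of their orthogonality under the binomial weights C(n, x).
from math import comb

n = 8

def krawtchouk(k, x):
    """K_k(x) = sum_j (-1)^j C(x, j) C(n - x, k - j); math.comb gives 0 when j > x."""
    return sum((-1) ** j * comb(x, j) * comb(n - x, k - j) for j in range(k + 1))

# Orthogonality: sum_x C(n,x) K_r(x) K_s(x) equals 2^n C(n,r) if r == s, else 0.
for r in range(n + 1):
    for s in range(n + 1):
        inner = sum(comb(n, x) * krawtchouk(r, x) * krawtchouk(s, x)
                    for x in range(n + 1))
        assert inner == (2 ** n * comb(n, r) if r == s else 0)
print("orthogonality holds for n =", n)
```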

In these notes, we will use Fourier analysis of functions defined on the hypercube to derive a relationship between the weight distribution of a linear code and that of its dual, called the MacWilliams identities. These give the linear constraints of the above-mentioned linear program.

Instead of using the linear program or its dual and the theory of orthogonal polynomials (specifically, properties of Krawtchouk polynomials), in the second part of these notes we will give a *self-contained proof* of the first linear programming bound for binary linear codes using a Fourier-analytic approach. This is based on the methods of Friedman and Tillich, later extended to general codes by Navon and Samorodnitsky, which show that the dual of a linear code of large distance must have a small “essential covering radius” (meaning that Hamming balls of small radius around the dual codewords cover a large fraction of the Hamming space $\{0,1\}^n$). This shows that the dual must have large size, and therefore the code itself cannot be too large. The method can be extended to non-linear codes, but we will be content with deriving the linear programming bound for (binary) linear codes.

## January 30, 2010

### Notes 4: Elementary bounds on codes

We now turn to *limitations* of codes, in the form of *upper bounds* on the rate of codes as a function of their relative distance. We will typically give concrete bounds on the size of codes, and then infer as corollaries the asymptotic statements for code families relating rate and relative distance. All the bounds apply to general codes; they do not take advantage of linearity. However, for the most sophisticated of our bounds, the linear programming bound, which we discuss in the next set of notes, we will present the proof only for linear codes, as it is simpler in that case.

We recall the two bounds we have already seen. The Gilbert-Varshamov bound asserted the existence of (linear) $q$-ary codes of block length $n$, distance at least $d$, and size at least $q^n / \mathrm{Vol}_q(n, d-1)$, where $\mathrm{Vol}_q(n, r)$ denotes the volume of a Hamming ball of radius $r$ in $[q]^n$; or, in asymptotic form, the existence of codes of rate approaching $1 - h_q(\delta)$ and relative distance $\delta$. The Hamming or sphere-packing bound gave an upper bound on the size (or rate) of codes, which is our focus in these notes. The Hamming bound says that a $q$-ary code of block length $n$ and distance $d$ can have at most $q^n / \mathrm{Vol}_q(n, \lfloor (d-1)/2 \rfloor)$ codewords; or, in asymptotic form, a $q$-ary code of relative distance $\delta$ can have rate at most $1 - h_q(\delta/2) + o(1)$.
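To get a quantitative feel for the gap between the two bounds, here is a small Python computation; the binary case and the parameters $n = 20$, $d = 5$ are illustrative choices.

```python
# Gilbert-Varshamov (lower) and Hamming (upper) bounds for binary codes.
from math import comb

def vol(n, r):
    """Volume of a Hamming ball of radius r in {0,1}^n."""
    return sum(comb(n, i) for i in range(r + 1))

n, d = 20, 5
gv_lower = 2 ** n // vol(n, d - 1)              # codes at least this large exist
hamming_upper = 2 ** n // vol(n, (d - 1) // 2)  # no code can be larger
print(gv_lower, hamming_upper)  # 169 4969
```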

As remarked in our discussion on Shannon’s theorem for the binary symmetric channel, if the Hamming bound could be attained, that would imply that we can correct *all* (as opposed to most) error patterns of weight $pn$ with rate approaching $1 - h(p)$. Recall that there are perfect codes (such as the Hamming codes) that meet the Hamming bound. However, these codes have very small distance ($3$ in the case of Hamming codes). A generalization of Hamming codes called binary BCH codes (the acronym stands for the codes’ independent inventors Hocquenghem (1959) and Bose and Ray-Chaudhuri (1960)) shows that when $d$ is a fixed constant and the block length is allowed to grow, the Hamming bound is again tight up to lesser order terms. However, we will improve upon the Hamming bound and show that its asymptotic form (for any relative distance bounded away from zero) cannot be attained for any fixed alphabet. The proof method has some connections to *list decoding*, which will be an important focus topic later in the course.

**1. Singleton bound **

We begin with the simplest of the bounds:

Theorem 1 Let $C$ be a code of block length $n$ and minimum distance $d$ over an alphabet of size $q$. Then $|C| \le q^{n-d+1}$.

*Proof:* Suppose not, so that $|C| > q^{n-d+1}$. By the pigeonhole principle there must be two distinct codewords $c_1, c_2 \in C$ that agree on the first $n-d+1$ locations. But then $c_1$ and $c_2$ can differ only in the remaining $d-1$ positions, so $\Delta(c_1, c_2) \le d-1$, contradicting the hypothesis that $C$ has minimum distance $d$.

This gives an alphabet-independent asymptotic upper bound on the rate as a function of relative distance.

Corollary 2 The rate $R$ and relative distance $\delta$ of a code satisfy $R \le 1 - \delta + o(1)$.

## January 25, 2010

### Notes 3: Stochastic channels and noisy coding theorem

We now turn to the basic elements of Shannon’s theory of communication over an intervening noisy channel.

**1. Model of information communication and noisy channel **

To quote Shannon from his paper *A Mathematical Theory of Communication*: “The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” The basic setup of the communication problem consists of a source that generates digital information which is to be reliably communicated to a destination through a channel, preferably in the most efficient manner possible. This “destination” could be spatially separated (e.g., a distant satellite is sending images of Jupiter back to the space station on Earth), or could be temporally separated (e.g., we want to retrieve data stored on our hard disk at a later point in time).

The following is a schematic of the communication model:

The first step in the communication model is to exploit the redundancy in the output of the source and compress the information, to economize the amount of “raw, non-redundant” data that must be transmitted across the channel. This data compression step is called *source coding.* If at each time step the source outputs an i.i.d. copy of a random variable $Z$ supported on a finite set $\mathcal{Z}$, then Shannon’s source coding theorem states that one can compress its output to $H(Z)$ bits per time step (on average, over $n$ i.i.d. samples from the source $Z$, as $n \to \infty$). In other words, $n$ samples from the source can be coded as one of roughly $2^{H(Z)n}$ possible outputs. Here $H(Z)$ is the fundamental Shannon entropy, defined as

$H(Z) = \sum_{z \in \mathcal{Z}} \mathbb{P}[Z = z] \log \frac{1}{\mathbb{P}[Z = z]},$

where the logarithm is to the base $2$. Thus the entropy of a fair coin toss is $1$, and that of a $p$-biased coin toss is $h(p) = -p \log p - (1-p) \log (1-p)$. The *source decoder* at the other end of the communication channel then decompresses the received information into (hopefully) the original output of the source.
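As a small numeric companion to the definition, the Python sketch below computes the entropy of a distribution given as a probability list (the helper names are my own).

```python
# Shannon entropy (in bits) of a finite distribution, and the binary
# entropy h(p) of a p-biased coin as a special case.
from math import log2

def entropy(dist):
    """H = sum_z P[z] * log2(1 / P[z]), skipping zero-probability outcomes."""
    return sum(pr * log2(1 / pr) for pr in dist if pr > 0)

def h(p):
    """Binary entropy: entropy of a coin with bias p."""
    return entropy([p, 1 - p])

print(h(0.5))               # fair coin: 1.0 bit
print(entropy([0.25] * 4))  # uniform over 4 outcomes: 2.0 bits
```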

The output of the source coder, say $m$, must then be communicated over a noisy channel. The channel’s noisy behavior causes errors in the received symbols at the destination. To recover from the errors incurred due to the channel, one should *encode* the information output by the source coder by adding systematic redundancy to it. This is done through *channel coding*, which maps $m$ to a codeword $c$ of some suitable error-correcting code (the study of channel coding will be our focus in this course).