Introduction to coding theory (CMU, Spr 2010)

April 30, 2010

Notes on list decoding folded RS codes

Filed under: Lecture notes — Venkat Guruswami @ 4:43 pm

Notes for the lectures on achieving the optimal trade-off between rate and list decoding radius via folded Reed-Solomon codes are now posted on the course webpage. Notes 7,8 on Reed-Solomon unique decoding, GMD decoding, and expander codes have also been edited.


April 13, 2010

Notes for lectures 18-21

Filed under: Lecture notes — Venkat Guruswami @ 9:51 pm

Drafts of the notes for the lectures up till last Friday are now posted on the course webpage. I plan to proofread and make necessary edits to portions of the notes (for lecture 15 and later) in the next couple of weeks or so. But the current versions should already be useful if you need a refresher on something we covered in lecture, or as reference for working on the problem set.

March 31, 2010

Notes on lectures 16,17

Filed under: Lecture notes — Venkat Guruswami @ 11:15 am

A draft of the notes on expander codes and their decoding, which were covered in last week’s lectures, is now posted on the course webpage. Also posted are related notes (which include some illustrative figures) from a previous offering of the course.

March 25, 2010

Notes 7: Justesen codes and Reed-Solomon & GMD decoding

Filed under: Lecture notes — Venkat Guruswami @ 10:48 pm

A draft version of the notes for the material we covered on Justesen codes, Reed-Solomon unique decoding, and GMD decoding of concatenated codes is posted on the course webpage. The material from Section 4 (Peterson’s algorithm) onwards has not been proofread and will be edited soon, but I wanted to make this version available now to avoid too much lag.

March 5, 2010

Notes 6: Code constructions: Reed-Solomon, BCH, Reed-Muller, and concatenated

Filed under: Lecture notes — Venkat Guruswami @ 6:54 pm

[Lectures scribed by Eric Blais]

[The notes are also available in pdf format on the course webpage. Unless there is some explicit demand for it, I suspect that these might be the last set of notes for this course that I make available in wordpress html format.]

In this lecture, we begin the algorithmic component of the course by introducing some explicit families of good algebraic codes. We begin by looking at Reed-Solomon codes.

1. Reed-Solomon codes

Reed-Solomon codes are a family of codes defined over large fields as follows.

Definition 1 (Reed-Solomon codes) For integers {1 \le k < n}, a field {\mathbb{F}} of size {|\mathbb{F}| \ge n}, and a set {S = \{\alpha_1,\ldots,\alpha_n\} \subseteq \mathbb{F}}, we define the Reed-Solomon code

\displaystyle \mathrm{RS}_{\mathbb{F}, S}[n,k] = \{(p(\alpha_1), p(\alpha_2), \ldots, p(\alpha_n)) \in \mathbb{F}^n \mid p \in \mathbb{F}[X],\ \mathrm{deg}(p) \le k-1\} \ .

A natural interpretation of the {\mathrm{RS}_{\mathbb{F}, S}[n,k]} code is via its encoding map. To encode a message {m = (m_0, m_1,\ldots, m_{k-1}) \in \mathbb{F}^k}, we interpret the message as the polynomial

\displaystyle  p(X) = m_0 + m_1 X + \cdots + m_{k-1} X^{k-1} \in \mathbb{F}[X].

We then evaluate the polynomial {p} at the points {\alpha_1, \alpha_2, \ldots, \alpha_n} to get the codeword corresponding to {m}.

To evaluate the polynomial {p} on the points {\alpha_1,\alpha_2,\ldots,\alpha_n}, we multiply the message vector {m} on the left by the {n \times k} Vandermonde matrix

\displaystyle G = \begin{pmatrix} 1 & \alpha_1 & \alpha_1^{\,2} & \cdots & \alpha_1^{\,k-1} \\ 1 & \alpha_2 & \alpha_2^{\,2} & \cdots & \alpha_2^{\,k-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha_n & \alpha_n^{\,2} & \cdots & \alpha_n^{\,k-1} \end{pmatrix}.

The matrix {G} is a generator matrix for {\mathrm{RS}_{\mathbb{F}, S}[n,k]}, so we immediately obtain that Reed-Solomon codes are linear codes over {\mathbb{F}}.
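To make the encoding map concrete, here is a minimal sketch of Reed-Solomon encoding over a prime field {{\mathbb F}_p} (so field arithmetic is just mod-{p} arithmetic); the parameters {p=13}, {n=6}, {k=3} and the evaluation set are illustrative choices, not from the notes:

```python
# Reed-Solomon encoding over GF(p): evaluate the message polynomial
# p(X) = m_0 + m_1 X + ... + m_{k-1} X^{k-1} at each point of S.
p = 13                       # field size (prime, so GF(p) is mod-p arithmetic)
alphas = [1, 2, 3, 4, 5, 6]  # evaluation set S: n distinct field elements
n, k = len(alphas), 3

def rs_encode(m):
    """Encode message m = (m_0, ..., m_{k-1}) in F^k as a codeword in F^n.
    Equivalent to multiplying m by the n x k Vandermonde matrix G."""
    assert len(m) == k
    return [sum(m[j] * pow(a, j, p) for j in range(k)) % p for a in alphas]

codeword = rs_encode([5, 0, 7])  # message polynomial 5 + 7X^2
```

Since the map is multiplication by {G}, it is linear: the sum of two codewords is again a codeword, as the test of any two encodings confirms.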

1.1. Properties of the code

Let’s now examine the parameters of the above Reed-Solomon code. The block length of the code is clearly {n}. As we will see, the code {\mathrm{RS}_{\mathbb{F}, S}[n, k]} has minimum distance {n-k+1}. This also means that the encoding map is injective and therefore the code has dimension equal to {k}.

The key to establishing the minimum distance of Reed-Solomon codes is the “degree mantra” that we saw in the previous lecture: A non-zero polynomial of degree {d} with coefficients from a field {\mathbb{F}} has at most {d} roots in {\mathbb{F}}.

Theorem 2 The Reed-Solomon code {\mathrm{RS}_{\mathbb{F},S}[n,k]} has distance {n-k+1}.

Proof: Since {\mathrm{RS}_{\mathbb{F},S}[n,k]} is a linear code, to prove the theorem it suffices to show that any non-zero codeword has Hamming weight at least {n-k+1}.

Let {(m_0, m_1,\ldots,m_{k-1}) \neq 0}. The polynomial {p(X) = m_0 + m_1 X + \cdots + m_{k-1}X^{k-1}} is a non-zero polynomial of degree at most {k-1}. So by our degree mantra, {p} has at most {k-1} roots, which implies that {(p(\alpha_1),\ldots, p(\alpha_{n}))} has at most {k-1} zero coordinates, and hence Hamming weight at least {n - (k-1) = n-k+1}.

By the Singleton bound, the distance cannot exceed {n-k+1}, and therefore must equal {n-k+1}. The upper bound on distance can also be seen by noting that the codeword corresponding to the polynomial {\prod_{i=1}^{k-1} (X-\alpha_i)} has Hamming weight exactly {n-k+1}. \Box

Note that the minimum distance of Reed-Solomon codes meets the Singleton bound. This is quite interesting: Reed-Solomon codes are a simple, natural family of codes based only on univariate polynomials, and yet their trade-off between rate and distance is optimal.
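Theorem 2 can be checked by brute force for small parameters. The sketch below (illustrative parameters: GF(7), {n=6}, {k=3}) enumerates all non-zero messages and confirms that the minimum Hamming weight is exactly {n-k+1 = 4}:

```python
# Brute-force verification that RS over GF(7) with n = 6, k = 3
# has minimum distance n - k + 1 (= 4), as Theorem 2 asserts.
from itertools import product

p, alphas, k = 7, [1, 2, 3, 4, 5, 6], 3
n = len(alphas)

def weight(m):
    # Hamming weight of the codeword encoding message m:
    # count evaluation points where the message polynomial is non-zero
    return sum(1 for a in alphas
               if sum(m[j] * pow(a, j, p) for j in range(k)) % p != 0)

min_wt = min(weight(m) for m in product(range(p), repeat=k) if any(m))
assert min_wt == n - k + 1   # the degree mantra in action
```

The minimum is attained by the message corresponding to {(X-\alpha_1)(X-\alpha_2)}, which vanishes at exactly {k-1 = 2} of the {n} points.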

In our definition above, we have presented Reed-Solomon codes in the most general setting, where {S} can be any arbitrary subset of {\mathbb{F}} of size {n}. This presentation highlights the flexibility of Reed-Solomon codes. In practice, however, there are two common choices of {S} used to instantiate Reed-Solomon codes:

  1. Take {S = \mathbb{F}}, or
  2. Take {S = \mathbb{F}^*} to be the set of non-zero elements in {\mathbb{F}}.

These two choices attain the best possible trade-off between the field size and the block length.

1.2. Alternative characterization

We presented Reed-Solomon codes from an encoding point of view. It is also possible to look at these codes from the “parity-check” point of view. This approach is used in many textbooks, and leads to the following characterization of Reed-Solomon codes.

Theorem 3 (Parity-check characterization) For integers {1 \le k < n}, a field {\mathbb{F}} of size {|\mathbb{F}| = q = n+1}, a primitive element {\alpha \in \mathbb{F}^*}, and the set {S = \{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}}, the Reed-Solomon code over {\mathbb{F}} with evaluation set {S} is given by \displaystyle \mathrm{RS}_{\mathbb{F}, S}[n,k] = \{ \, (c_0,c_1,\ldots,c_{n-1}) \in \mathbb{F}^n \,\mid\, c(X) = c_0 + c_1 X + \cdots + c_{n-1} X^{n-1} \ \mathrm{satisfies}\ c(\alpha) = c(\alpha^2) = \cdots = c(\alpha^{n-k}) = 0 \,\} \ . \ \ \ \ \ (1)

In other words, Theorem 3 states that the codewords of the Reed-Solomon code with evaluation points {1,\alpha,\dots,\alpha^{n-1}} correspond to the polynomials of degree at most {n-1} that vanish at the points {\alpha, \alpha^2,\ldots,\alpha^{n-k}}.

The code described in Theorem 3 has the same dimension as the code obtained from our original definition; to complete the proof of Theorem 3, we therefore only need to check that every codeword obtained via Definition 1 satisfies the parity-check condition (1).

Exercise 1 Complete the proof of Theorem 3.

(Hint: The proof uses the fact that for every {x \neq 1} in {\mathbb{F}^*}, {\sum_{\ell=0}^{n-1} x^\ell = \frac{1-x^n}{1-x} = 0}.)
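A quick numeric sanity check of Theorem 3 can be run over a small prime field (illustrative parameters: {q = 7}, {n = 6}, primitive element {\alpha = 3}, {k = 2}); the sketch encodes a low-degree message polynomial and confirms the parity-check condition (1):

```python
# Check that an RS codeword over GF(7) with evaluation set
# {1, a, a^2, ..., a^{n-1}} satisfies c(a^j) = 0 for j = 1, ..., n-k.
q, alpha, k = 7, 3, 2          # 3 is a primitive element mod 7
n = q - 1
S = [pow(alpha, i, q) for i in range(n)]   # evaluation set 1, a, a^2, ...

def encode(m):
    # evaluate the message polynomial at each point of S
    return [sum(m[j] * pow(x, j, q) for j in range(len(m))) % q for x in S]

def c_at(c, x):
    # evaluate c(X) = c_0 + c_1 X + ... + c_{n-1} X^{n-1} at x
    return sum(c[i] * pow(x, i, q) for i in range(n)) % q

c = encode([4, 2])             # message polynomial 4 + 2X
assert all(c_at(c, pow(alpha, j, q)) == 0 for j in range(1, n - k + 1))
```

The assertion is exactly the computation in the hint: expanding {c(\alpha^j)} and swapping the order of summation leaves geometric sums {\sum_{\ell=0}^{n-1} x^\ell} with {x = \alpha^{t+j} \neq 1}, each of which vanishes.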


February 12, 2010

Basics of Finite Fields

Filed under: Lecture notes — Venkat Guruswami @ 11:48 am

In the next segment of the course, we will study explicitly constructed codes, and develop error-correction algorithms for them. Prominent among these will be certain algebraic constructions of codes based on polynomials over finite fields.

It is possible to get quite far treating finite fields as “black-boxes” that allow the field operations to be performed efficiently as atomic steps, along with just one important mantra:

A non-zero polynomial of degree {d} with coefficients from a field {{\mathbb F}} has at most {d} roots in {{\mathbb F}}.

But it is nevertheless desirable to have a good working knowledge of the basics of the theory of finite fields, and we will appeal to some of these results later on for list decoding some powerful algebraic codes. You are likely already familiar with this material from your undergraduate algebra. You can refer to your favorite algebra text for the basic theorems and their proofs, but I wanted to point to some notes that you can turn to if you need a refresher and a convenient reference.

So here are some excellently done notes on finite fields, written by G. David Forney and available on MIT’s OpenCourseWare for the course 6.451 Principles of Digital Communication II. These notes rigorously prove everything that we would need (and more!) from first principles, in a nice sequence.

Collected below are some basic results about finite fields, for quick reference. (I do not recall the definition of fields and the field axioms here.) All these facts are proved in the above linked notes.

  1. For every prime {p}, there is a unique (up to isomorphism) finite field of size {p}, namely {{\mathbb F}_p}: the set {\{0,1,\dots,p-1\}} under mod-{p} addition and multiplication.
  2. For each prime {p}, positive integer {m \ge 1}, and polynomial {g(X)} with coefficients in {{\mathbb F}_p} of degree {m} that is irreducible (in {{\mathbb F}_p[X]}), the set of polynomials in {{\mathbb F}_p[X]} of degree at most {m-1} with addition and multiplication of the polynomials defined modulo {g(X)} is a finite field (denoted {{\mathbb F}_{g(X)}}) with {p^m} elements.
  3. Every finite field is isomorphic to such a field, and therefore must have {p^m} elements for some prime {p} and positive integer {m}.
  4. For every prime {p} and integer {m \ge 1}, there exists an irreducible polynomial {g(X) \in {\mathbb F}_p[X]} of degree {m}. Therefore, there is a finite field with {p^m} elements for every prime {p} and positive integer {m}.
  5. Additively, a finite field with {p^m} elements has the structure of a vector space of dimension {m} over {{\mathbb F}_p}.
  6. The multiplicative group of a finite field (consisting of its non-zero elements) is cyclic. In other words, the non-zero elements of a field {{\mathbb F}} can be written as {\{1,\gamma,\gamma^2,\dots,\gamma^{|{\mathbb F}|-2}\}} for some {\gamma \in {\mathbb F}}.
    • A {\gamma} with such a property is called a primitive element of the field {{\mathbb F}}.
    • A field {{\mathbb F}} has {\varphi(|{\mathbb F}|-1)} primitive elements, where {\varphi(\cdot)} is Euler’s totient function.
  7. All fields of size {p^m} are isomorphic to {{\mathbb F}_{g(X)}} for an arbitrary choice of degree-{m} irreducible polynomial {g(X) \in {\mathbb F}_p[X]}.
    The finite field with {p^m} elements is therefore unique up to isomorphism and will be denoted by {{\mathbb F}_{p^m}}.

    Remark: While one can pick any irreducible {g(X)} to represent the field {{\mathbb F}_{p^m}} as {{\mathbb F}_{g(X)}}, sometimes a special choice can be judicious. For example, the complexity of multiplication is better if {g(X)} is sparse (i.e., has very few non-zero coefficients).

  8. The elements of {{\mathbb F}_{p^m}} are the {p^m} distinct roots of the polynomial {X^{p^m}-X \in {\mathbb F}_p[X]}.
  9. For each {k} dividing {m}, the field {{\mathbb F}_{p^m}} has a unique subfield of size {p^k}, which consists of the roots of the polynomial {X^{p^k}-X}.
  10. The polynomial {X^{p^m}-X} is the product of all monic irreducible polynomials in {{\mathbb F}_p[X]} whose degree divides {m}, with no repetitions.
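Fact 10 is easy to verify by hand for tiny parameters. The sketch below (illustrative choice {p = 2}, {m = 2}) finds all monic irreducible polynomials over {{\mathbb F}_2} of degree dividing {2} and multiplies them out, recovering {X^4 - X} (which equals {X^4 + X} over {{\mathbb F}_2}):

```python
# Verify fact 10 for p = 2, m = 2: X^{p^m} - X is the product of all
# monic irreducible polynomials over GF(2) of degree dividing m.
p, m = 2, 2

def poly_mul(a, b):
    # multiply polynomials over GF(p); coefficient lists, low degree first
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def is_irreducible(f):
    deg = len(f) - 1
    if deg == 1:
        return True   # linear polynomials are always irreducible
    # for degree 2 (or 3), irreducible iff it has no root in GF(p)
    return all(sum(c * pow(x, i, p) for i, c in enumerate(f)) % p != 0
               for x in range(p))

# all monic polynomials of degree 1 and 2 over GF(2)
candidates = ([[c, 1] for c in range(p)] +
              [[c0, c1, 1] for c0 in range(p) for c1 in range(p)])
irreducibles = [f for f in candidates if is_irreducible(f)]

prod = [1]
for f in irreducibles:
    prod = poly_mul(prod, f)

# X^4 + X has coefficients (low to high) 0, 1, 0, 0, 1
assert prod == [0, 1, 0, 0, 1]
```

Here the three factors are {X}, {X+1}, and {X^2+X+1}: the two linear polynomials contribute the elements of the subfield {{\mathbb F}_2}, and the irreducible quadratic contributes the two elements of {{\mathbb F}_4 \setminus {\mathbb F}_2}, consistent with facts 8 and 9.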

February 9, 2010

Notes 5.2: Proof of the first MRRW bound

Filed under: Lecture notes — Venkat Guruswami @ 1:18 pm

[Lecture scribed by Srivatsan Narayanan]

1. Fourier analytic proof of the first MRRW bound

We develop a proof of the first MRRW (JPL) bound for binary linear codes based on a covering argument. Our main theorem (Theorem 1) states that the dual code {C^\perp} has a small essential covering radius. From this, we conclude that the size of the dual code {|C^\perp|} is large, and equivalently, the size of the code {C} is small.

Theorem 1 (Dual codes have a small “essential covering radius”) Let {C} be a binary linear code of distance at least {d}. Then, \displaystyle   \left\vert \bigcup_{z \in C^\perp} B(z,r) \right\vert \geq \frac {2^n}n \ \ \ \ \ (1)

for {r = \frac 12 n - \sqrt{d(n-d)} + o(n)}.

Remark 1 We say that a set {S} has covering radius at most {r} if every point in {\{0,1\}^n} is inside {B(z,r)} for some {z\in S}. Theorem 1 asserts that the dual code satisfies a relaxed version of this property: for large enough {r}, at least an {n^{-O(1)}} fraction of the points are inside {B(z,r)} for some {z\in C^\perp}. This is sufficient for establishing an asymptotic bound on the rate of the codes, as shown in Corollary 2.

Before we prove Theorem 1, we observe that it directly implies an asymptotic upper bound on the rate of codes.

Corollary 2 Let {C} be a binary linear code with distance at least {d = \delta n}. Then,

  1. {|C| \leq n \mathrm{Vol}(n,r)}
  2. {R(C) \leq h\left( \frac12 - \sqrt{\delta (1-\delta)} \right) + o(1)}
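The rate bound in Corollary 2 can be compared numerically with the Hamming bound {R \le h(\delta/2) + o(1)} from the earlier notes. The sketch below (illustrative value {\delta = 0.3}, which lies in the range where MRRW is the best known bound) shows the MRRW bound is strictly stronger there:

```python
# Numeric comparison of the first MRRW bound R <= h(1/2 - sqrt(d(1-d)))
# with the Hamming bound R <= h(d/2), at relative distance delta = 0.3.
from math import log2, sqrt

def h(x):
    # binary entropy function h(x) = -x log2 x - (1-x) log2 (1-x)
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

delta = 0.3
mrrw = h(0.5 - sqrt(delta * (1 - delta)))
hamming = h(delta / 2)
assert mrrw < hamming   # MRRW gives a smaller (stronger) upper bound here
```

At {\delta = 0.3} the MRRW bound is roughly {0.25} while the Hamming bound is roughly {0.61}, a substantial improvement.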


February 7, 2010

Notes 5.1: Fourier Transform, MacWilliams identities, and LP bound

Filed under: Lecture notes — Venkat Guruswami @ 3:47 pm

[Lecture scribed by Srivatsan Narayanan]

We will discuss the last and most sophisticated of our (upper) bounds on the rate of codes with certain relative distance, namely the first linear programming bound or the first JPL bound due to McEliece, Rodemich, Rumsey, and Welch, 1977 (henceforth, MRRW). This bound is the best known asymptotic upper bound on the rate of a binary code for a significant range of relative distances (which is roughly {\delta \in (0.273,1/2)}). We will present a complete and self-contained proof of this bound. A variant called the second JPL bound gives the best known upper bound for the remainder of the range, and we will mention this bound (without proof) at the end.

The linear programming bound is so-called because it is based on Delsarte’s linear programming approach, which shows that the distance distribution of a binary code satisfies a family of linear constraints whose coefficients are the evaluations of a certain family of orthogonal polynomials (in this case, the Krawtchouk polynomials). The optimum (maximum) of this linear program gives an upper bound on {A(n,d)}. MRRW constructed good feasible solutions to the dual of this linear program using tools from the theory of orthogonal polynomials, and the value of these solutions gave an upper bound on {A(n,d)} by weak duality.

In these notes, we will use Fourier analysis of functions defined on the hypercube to derive a relationship between the weight distribution of a linear code and its dual, called the MacWilliams identities. These give the linear constraints of the above-mentioned linear program.

Instead of using the linear program or its dual and the theory of orthogonal polynomials (specifically, properties of Krawtchouk polynomials), in the second part of these notes we will give a self-contained proof of the first linear programming bound for binary linear codes using a Fourier analytic approach. This is based on the method of Friedman and Tillich (later extended to general codes by Navon and Samorodnitsky), which shows that the dual of a linear code of large distance must have small “essential covering radius” (meaning that Hamming balls of small radius around the dual codewords cover a large fraction of the Hamming space {\{0,1\}^n}). This shows that the dual must have large size, and therefore the code itself cannot be too large. The method can be extended to non-linear codes, but we will be content with deriving the linear programming bound for (binary) linear codes.


January 30, 2010

Notes 4: Elementary bounds on codes

Filed under: Lecture notes — Venkat Guruswami @ 11:54 pm

We now turn to limitations of codes, in the form of upper bounds on the rate of codes as a function of their relative distance. We will typically give concrete bounds on the size of codes, and then infer as corollaries the asymptotic statement for code families relating rate and relative distance. All the bounds apply to general codes and do not take advantage of linearity. However, for the most sophisticated of our bounds, the linear programming bound, which we discuss in the next set of notes, we will present the proof only for linear codes, as it is simpler in this case.

We recall the two bounds we have already seen. The Gilbert-Varshamov bound asserted the existence of (linear) {q}-ary codes of block length {n}, distance at least {d}, and size at least {\frac{q^n}{{\rm Vol}_q(n,d-1)}}. Or in asymptotic form, the existence of codes of rate approaching {1-h_q(\delta)} and relative distance {\delta}. The Hamming or sphere-packing bound gave an upper bound on the size (or rate) of codes, which is our focus in these notes. The Hamming bound says that a {q}-ary code of block length {n} and distance {d} can have at most {\frac{q^n}{{\rm Vol}_q(n,\lfloor (d-1)/2 \rfloor)}} codewords. Or in asymptotic form, a {q}-ary code of relative distance {\delta} can have rate at most {1-h_q(\delta/2)+o(1)}.

As remarked in our discussion on Shannon’s theorem for the binary symmetric channel, if the Hamming bound could be attained, that would imply that we can correct all (as opposed to most) error patterns of weight {pn} with rate approaching {1-h(p)}. Recall that there are perfect codes (such as the Hamming codes) that meet the Hamming bound. However, these codes have very small distance ({3} in the case of Hamming codes). A generalization of Hamming codes called binary BCH codes (the acronym stands for the code’s independent inventors Hocquenghem (1959) and Bose and Ray-Chaudhuri (1960)) show that when {d} is a fixed constant and the block length is allowed to grow, the Hamming bound is again tight up to lesser order terms. However, we will improve upon the Hamming bound and show that its asymptotic form (for any relative distance bounded away from zero) cannot be attained for any fixed alphabet. The proof method has some connections to list decoding, which will be an important focus topic later in the course.
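The asymptotic forms of the two bounds recalled above are easy to tabulate. The sketch below computes the {q}-ary entropy function and compares the GV rate {1-h_q(\delta)} (achievable) with the Hamming rate {1-h_q(\delta/2)} (an upper bound); the parameters {q=2}, {\delta = 0.2} are illustrative:

```python
# GV bound (existence) vs. Hamming bound (impossibility), asymptotically,
# via the q-ary entropy function h_q.
from math import log

def h_q(x, q):
    # q-ary entropy: x*log_q(q-1) - x*log_q(x) - (1-x)*log_q(1-x)
    if x == 0:
        return 0.0
    if x == 1:
        return log(q - 1, q)
    return (x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q))

q, delta = 2, 0.2
gv_rate = 1 - h_q(delta, q)           # codes of this rate exist (GV)
hamming_rate = 1 - h_q(delta / 2, q)  # no code can beat this rate (Hamming)
assert gv_rate < hamming_rate         # the two bounds leave a gap
```

The gap between the two curves is exactly what the improved upper bounds in these notes and the next (Plotkin, Elias-Bassalygo, MRRW) chip away at.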

1. Singleton bound

We begin with the simplest of the bounds:

Theorem 1 Let {C} be a code of block length {n} and minimum distance {d} over an alphabet of size {q}. Then {|C| \le q^{n-d+1}}.

Proof: Suppose not, and {|C| > q^{n-d+1}}. By the pigeonhole principle there must be two codewords {c_1,c_2 \in C}, {c_1 \neq c_2} that agree on the first {n-d+1} locations. But then {\Delta(c_1,c_2) \le d-1 < d}, contradicting the hypothesis that {C} has minimum distance {d}. \Box

This gives an alphabet-independent asymptotic upper bound on the rate as a function of relative distance.

Corollary 2 The rate {R} and relative distance {\delta} of a code satisfy {R \le 1- \delta + o(1)}.


January 25, 2010

Notes 3: Stochastic channels and noisy coding theorem

Filed under: Lecture notes — Venkat Guruswami @ 11:17 pm

We now turn to the basic elements of Shannon’s theory of communication over an intervening noisy channel.

1. Model of information communication and noisy channel

To quote Shannon from his paper A Mathematical theory of communication: “The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” The basic setup of the communication problem consists of a source that generates digital information which is to be reliably communicated to a destination through a channel, preferably in the most efficient manner possible. This “destination” could be spatially separated (e.g., a distant satellite sending images of Jupiter back to a station on Earth), or could be temporally separated (e.g., we want to retrieve data stored on our hard disk at a later point in time).

The following is a schematic of the communication model:

The first step in the communication model is to exploit the redundancy in the output of the source and compress the information to economize the amount of “raw, non-redundant” data that must be transmitted across the channel. This data compression step is called source coding. If at each time step the source outputs an i.i.d. copy of a random variable {Z} supported on a finite set {{\cal Z}}, then Shannon’s source coding theorem states that one can compress its output to {H(Z)} bits per time step (on average, over {n} i.i.d. samples from the source {Z}, as {n \rightarrow \infty}). In other words, {n} samples from the source can be coded as one of {M \approx 2^{H(Z) n}} possible outputs. Here {H(Z)} is the fundamental Shannon entropy defined as

\displaystyle H(Z) = \sum_{z \in {\cal Z}} \mathop{\mathbb P}[Z=z] \log \frac{1}{\mathop{\mathbb P}[Z=z]} \ . \ \ \ \ \ (1)

where {\log} is to the base {2}. Thus the entropy of a fair coin toss is {1}, and that of a {p}-biased coin toss is {h(p) = -p\log p - (1-p) \log(1-p)}. The source decoder at the other end of the communication channel then decompresses the received information into (hopefully) the original output of the source.
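The entropy formula (1) can be checked directly on the two examples just mentioned; the sketch below computes {H(Z)} for a fair coin and a {p}-biased coin (the value {p = 0.11} is an illustrative choice):

```python
# Shannon entropy H(Z), formula (1), for distributions on a finite set.
from math import log2

def H(dist):
    # entropy of a distribution given as a list of probabilities
    # (terms with zero probability contribute nothing)
    return sum(pr * log2(1 / pr) for pr in dist if pr > 0)

assert H([0.5, 0.5]) == 1.0        # fair coin toss: 1 bit
p = 0.11
biased = H([p, 1 - p])             # this is h(p) = -p log p - (1-p) log(1-p)
assert abs(biased - (-p * log2(p) - (1 - p) * log2(1 - p))) < 1e-12
```

As expected, the biased coin has entropy strictly below 1 bit, which is what the source coder exploits when compressing.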

The output of the source coder, say {m}, must then be communicated over a noisy channel. The channel’s noisy behavior causes errors in the received symbols at the destination. To recover from the errors incurred due to the channel, one should encode the information output by the source coder by adding systematic redundancy to it. This is done through channel coding which maps {m} to a codeword {c} of some suitable error-correcting code (the study of channel coding will be our focus in this course).

