# Introduction to coding theory (CMU, Spr 2010)

## January 30, 2010

### Notes 4: Elementary bounds on codes

Filed under: Lecture notes — Venkat Guruswami @ 11:54 pm

We now turn to limitations of codes, in the form of upper bounds on the rate of codes as a function of their relative distance. We will typically give concrete bounds on the size of codes, and then infer as corollaries the asymptotic statements for code families relating rate and relative distance. All the bounds apply to general codes; they do not take advantage of linearity. However, for the most sophisticated of our bounds, the linear programming bound, which we discuss in the next set of notes, we will present the proof only for linear codes, as it is simpler in that case.

We recall the two bounds we have already seen. The Gilbert-Varshamov bound asserted the existence of (linear) ${q}$-ary codes of block length ${n}$, distance at least ${d}$, and size at least ${\frac{q^n}{{\rm Vol}_q(n,d-1)}}$. Or in asymptotic form, the existence of codes of rate approaching ${1-h_q(\delta)}$ and relative distance ${\delta}$. The Hamming or sphere-packing bound gave an upper bound on the size (or rate) of codes, which is our focus in these notes. The Hamming bound says that a ${q}$-ary code of block length ${n}$ and distance ${d}$ can have at most ${\frac{q^n}{{\rm Vol}_q(n,\lfloor (d-1)/2 \rfloor)}}$ codewords. Or in asymptotic form, a ${q}$-ary code of relative distance ${\delta}$ can have rate at most ${1-h_q(\delta/2)+o(1)}$.
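The gap between these two asymptotic bounds is easy to see numerically. The following sketch (illustrative; the choice ${\delta = 0.1}$ and the function names are mine) compares the rate that the GV bound guarantees with the ceiling the Hamming bound imposes for binary codes:

```python
from math import log2

def h2(p):
    """Binary entropy function h(p) = -p log2(p) - (1-p) log2(1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

delta = 0.1                       # relative distance (illustrative choice)
gv_rate = 1 - h2(delta)           # a rate guaranteed to be achievable (GV)
hamming_rate = 1 - h2(delta / 2)  # an upper bound on achievable rate (Hamming)

print(f"GV bound:      rate >= {gv_rate:.4f}")
print(f"Hamming bound: rate <= {hamming_rate:.4f}")
```

For every ${\delta \in (0, 1/2)}$ the two values differ, and the bounds discussed in these notes aim to narrow that gap from above.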

As remarked in our discussion of Shannon’s theorem for the binary symmetric channel, if the Hamming bound could be attained, that would imply that we can correct all (as opposed to most) error patterns of weight ${pn}$ with rate approaching ${1-h(p)}$. Recall that there are perfect codes (such as the Hamming codes) that meet the Hamming bound. However, these codes have very small distance (${3}$ in the case of Hamming codes). A generalization of Hamming codes, the binary BCH codes (the acronym stands for their independent inventors Hocquenghem (1959) and Bose and Ray-Chaudhuri (1960)), shows that when ${d}$ is a fixed constant and the block length is allowed to grow, the Hamming bound is again tight up to lower order terms. However, we will improve upon the Hamming bound and show that its asymptotic form (for any relative distance bounded away from zero) cannot be attained for any fixed alphabet. The proof method has some connections to list decoding, which will be an important focus topic later in the course.

1. Singleton bound

We begin with the simplest of the bounds:

Theorem 1 Let ${C}$ be a code of block length ${n}$ and minimum distance ${d}$ over an alphabet of size ${q}$. Then ${|C| \le q^{n-d+1}}$.

Proof: Suppose not, and ${|C| > q^{n-d+1}}$. By the pigeonhole principle there must be two codewords ${c_1,c_2 \in C}$, ${c_1 \neq c_2}$ that agree on the first ${n-d+1}$ locations. But then ${\Delta(c_1,c_2) \le d-1 < d}$, contradicting the hypothesis that ${C}$ has minimum distance ${d}$. $\Box$

This gives an alphabet-independent asymptotic upper bound on the rate as a function of relative distance.

Corollary 2 The rate ${R}$ and relative distance ${\delta}$ of a code satisfy ${R \le 1- \delta + o(1)}$.
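As a sanity check, one can verify Theorem 1 on a concrete code. The sketch below uses the standard systematic generator matrix of the [7,4,3] binary Hamming code, enumerates its codewords, computes the minimum distance by brute force, and checks ${|C| \le q^{n-d+1}}$:

```python
from itertools import product

# Generator matrix of the [7,4,3] binary Hamming code (systematic form).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    """Multiply the message row vector by G over GF(2)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2
                 for col in zip(*G))

code = {encode(m) for m in product([0, 1], repeat=4)}
n, q = 7, 2

# Minimum distance by brute force over all pairs of distinct codewords.
d = min(sum(a != b for a, b in zip(c1, c2))
        for c1 in code for c2 in code if c1 != c2)

print(len(code), d)                   # 16 codewords, minimum distance 3
assert len(code) <= q ** (n - d + 1)  # Singleton: 16 <= 2^5 = 32
```

The Hamming code is well inside the Singleton bound; codes meeting it with equality (MDS codes, such as Reed-Solomon codes) require large alphabets.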

### Lecture 6 summary

Filed under: Lecture summary — Venkat Guruswami @ 3:29 pm

We finished the proof of the Plotkin upper bound ${1 - q\delta/(q-1)}$ on the rate of ${q}$-ary codes of relative distance ${\delta}$. We discussed the Johnson bound, and its implications for list decoding, a topic we will study in some detail later in the course. We mentioned the relation between the Johnson bound and bounds on ${A_q(n,d)}$ (you are asked to prove this for the binary case on your problem set). Using this and the proof of the Johnson bound for the binary case, we concluded the Elias-Bassalygo upper bound, our last of the “elementary” upper bounds.

In the next lecture, we will start our discussion of the linear programming bound, which remains the best known bound to date. We proved the following simple fact that plays a crucial role in the LP bound: For a linear code $C$, the quantity ${\sum_{c \in C} (-1)^{x \cdot c}}$ equals $|C|$ if $x \in C^\perp$ and equals $0$ otherwise.
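This character-sum fact is easy to verify computationally on a small example. The sketch below uses an arbitrary [4,2] binary linear code chosen for illustration, computes its dual by brute force, and checks the sum for every ${x \in \{0,1\}^4}$:

```python
from itertools import product

# Generator matrix of a small binary linear code (an illustrative [4,2] code).
G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
n = 4

code = {tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
        for msg in product([0, 1], repeat=len(G))}

def dot(x, y):
    """Inner product over GF(2)."""
    return sum(a * b for a, b in zip(x, y)) % 2

# The dual code: all vectors orthogonal to every codeword.
dual = {x for x in product([0, 1], repeat=n)
        if all(dot(x, c) == 0 for c in code)}

for x in product([0, 1], repeat=n):
    s = sum((-1) ** dot(x, c) for c in code)
    assert s == (len(code) if x in dual else 0)
print("verified for all x in {0,1}^4")
```

The underlying reason is that for ${x \notin C^\perp}$, the map ${c \mapsto x \cdot c}$ is a nontrivial linear functional on ${C}$, so it takes the values ${0}$ and ${1}$ equally often and the ${\pm 1}$ terms cancel.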

## January 27, 2010

### Lecture 5 summary

Filed under: Lecture summary — Venkat Guruswami @ 6:09 pm

We discussed the connections of Shannon’s capacity theorem for BSC and limitations of codes with large minimum distance, and then proved the capacity theorem for the binary symmetric channel. We began a discussion on limitations of codes, i.e., upper bounds on size/rate as a function of minimum distance, and presented the Singleton bound and a version of the Plotkin bound.

## January 25, 2010

### Notes 3: Stochastic channels and noisy coding theorem

Filed under: Lecture notes — Venkat Guruswami @ 11:17 pm

We now turn to the basic elements of Shannon’s theory of communication over an intervening noisy channel.

1. Model of information communication and noisy channel

To quote Shannon from his paper A Mathematical Theory of Communication: “The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” The basic setup of the communication problem consists of a source that generates digital information which is to be reliably communicated to a destination through a channel, preferably in the most efficient manner possible. This “destination” could be spatially separated (e.g., a distant satellite sending images of Jupiter back to the space station on Earth), or temporally separated (e.g., we want to retrieve data stored on our hard disk at a later point in time).

The following is a schematic of the communication model:

The first step in the communication model is to exploit the redundancy in the output of the source and compress the information to economize the amount of “raw, non-redundant” data that must be transmitted across the channel. This data compression step is called source coding. If at each time step the source outputs an i.i.d. copy of a random variable ${Z}$ supported on a finite set ${{\cal Z}}$, then Shannon’s source coding theorem states that one can compress its output to ${H(Z)}$ bits per time step (on average, over ${n}$ i.i.d. samples from the source ${Z}$, as ${n \rightarrow \infty}$). In other words, ${n}$ samples from the source can be coded as one of ${M \approx 2^{H(Z) n}}$ possible outputs. Here ${H(Z)}$ is the fundamental Shannon entropy defined as

$\displaystyle H(Z) = \sum_{z \in {\cal Z}} \mathop{\mathbb P}[Z=z] \log \frac{1}{\mathop{\mathbb P}[Z=z]} \ . \ \ \ \ \ (1)$

where ${\log}$ is to the base ${2}$. Thus the entropy of a fair coin toss is ${1}$, and that of a ${p}$-biased coin toss is ${h(p) = -p\log p - (1-p) \log(1-p)}$. The source decoder at the other end of the communication channel then decompresses the received information into (hopefully) the original output of the source.
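As a quick illustration of (1), here is a minimal entropy computation (the function name and the list representation of a distribution are my own choices):

```python
from math import log2

def entropy(dist):
    """Shannon entropy H(Z) in bits, for a distribution given as a
    list of probabilities (terms with probability 0 contribute 0)."""
    return sum(p * log2(1 / p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))      # fair coin: 1.0
print(entropy([0.25] * 4))      # uniform on 4 outcomes: 2.0
print(entropy([0.9, 0.1]))      # biased coin: h(0.1), about 0.469
```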

The output of the source coder, say ${m}$, must then be communicated over a noisy channel. The channel’s noisy behavior causes errors in the received symbols at the destination. To recover from the errors incurred due to the channel, one should encode the information output by the source coder by adding systematic redundancy to it. This is done through channel coding which maps ${m}$ to a codeword ${c}$ of some suitable error-correcting code (the study of channel coding will be our focus in this course).

## January 24, 2010

### Problem set 1

Filed under: Problem sets — Venkat Guruswami @ 7:27 pm

Problem set 1 is now posted here. It is due on Feb 17, so you have just over 3 weeks to work on it (but I encourage you to start early!). Please use the comments section of this post to ask any questions, point out inaccuracies if you find any, etc.

### Lecture 4 summary

Filed under: Lecture summary — Venkat Guruswami @ 11:52 am

Today we discussed Shannon’s model of communication in two stages: source coding followed by channel coding. We introduced the model of a discrete memoryless channel, and saw some examples such as the binary symmetric channel, the binary erasure channel, the noisy typewriter channel, and a continuous example: the binary input additive white Gaussian noise channel. We stated Shannon’s noisy coding theorem for a discrete memoryless channel (we will later prove it for the BSC). Together with the source coding theorem, this led to the joint source-channel coding theorem. We mentioned the “separation theorem” on optimality of doing source and channel coding separately.  The discussion of the capacity of the noisy typewriter channel led us into a brief discussion of zero-error capacity and Lovasz’s theorem on Shannon capacity of the 5-cycle.

## January 21, 2010

### Lecture 3 summary

Filed under: Lecture summary — Venkat Guruswami @ 3:23 pm

We discussed first order Reed-Muller codes, and contrasted the two extremes of Hadamard and Hamming codes. We defined asymptotically good code families, and posed the question of their existence. We proved the Gilbert-Varshamov bound in two ways, by picking a code greedily and by picking a random linear code. This in particular showed the existence of asymptotically good codes of any rate. En route to proving the asymptotic version of the Gilbert-Varshamov bound, we proved entropy estimates for the volume of Hamming balls. We concluded with some remarks on attaining the GV bound explicitly, mentioning the “better than random” algebraic-geometric code constructions, and the conjectured optimality of the GV bound on rate for binary codes.

## January 18, 2010

### Notes 2: Gilbert-Varshamov bound

Filed under: Lecture notes — Venkat Guruswami @ 5:19 pm

1. Asymptotically good codes and Gilbert-Varshamov bound

We begin by answering the question raised at the end of the previous notes on the existence of asymptotically good codes.

Suppose we are interested in ${q}$-ary codes (not necessarily linear) of block length ${n}$ and minimum distance ${d}$ that have many codewords. What is the largest size such a code can have? This is a fundamental quantity for which we define a notation below.

Definition 1 Let ${A_q(n,d)}$ be the largest size of a ${q}$-ary code of block length ${n}$ and minimum distance ${d}$. The binary case is of special importance, and in this case ${A_2(n,d)}$ is denoted simply as ${A(n,d)}$.

There is a natural greedy approach to construct a code of distance at least ${d}$: start with any codeword, and keep on adding codewords which have distance at least ${d}$ from all previously chosen codewords, until we can proceed no longer. Suppose this procedure halts after picking a code ${C}$. Then Hamming balls in ${\{0,1,\dots,q-1\}^n}$ of radius ${d-1}$ centered at the codewords of ${C}$ must cover the whole space. (Otherwise, we can pick one more codeword which has distance at least ${d}$ from every element of ${C}$, and the process would not have terminated.)

Definition 2 For integers ${q,n,\ell}$, denote by ${{\rm Vol}_q(n,\ell)}$ the volume of (i.e., the number of strings in) a Hamming ball of radius ${\ell}$ in ${\{0,1,\dots,q-1\}^n}$. Note that this number does not depend on where the ball is centered and equals

$\displaystyle {\rm Vol}_q(n,\ell) = \sum_{j=0}^{\ell} {n \choose j} (q-1)^j \ .$

Therefore, the greedy procedure terminates with a code ${C}$ satisfying

$\displaystyle |C| \cdot {\rm Vol}_q(n,d-1) \ge q^n \ .$

We therefore have the following lower bound.

Lemma 3 (Gilbert-Varshamov bound) The maximal size of a ${q}$-ary code of block length ${n}$ and distance ${d}$ satisfies

$\displaystyle A_q(n,d) \ge \frac{q^n}{{\rm Vol}_q(n,d-1)} = \frac{q^n}{\sum_{j=0}^{d-1} {n \choose j} (q-1)^j} \ . \ \ \ \ \ (1)$
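Both quantities in (1) are easy to compute directly. A minimal sketch (function names are illustrative), using exact integer arithmetic for the ceiling:

```python
from math import comb

def vol(q, n, ell):
    """Volume of a Hamming ball of radius ell in {0,...,q-1}^n."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(ell + 1))

def gv_lower_bound(q, n, d):
    """Gilbert-Varshamov lower bound on A_q(n, d), rounded up to an integer."""
    v = vol(q, n, d - 1)
    return (q ** n + v - 1) // v  # ceil(q^n / v) in exact integer arithmetic

print(vol(2, 10, 2))             # 1 + 10 + 45 = 56
print(gv_lower_bound(2, 10, 3))  # ceil(1024 / 56) = 19
```

So, for example, a binary code of block length 10 and distance 3 with at least 19 codewords must exist (the true value of ${A(10,3)}$ is in fact considerably larger, which already hints that the greedy argument is far from tight).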

## January 15, 2010

### Lecture 2 summary

Filed under: Lecture summary — Venkat Guruswami @ 3:32 pm

Today we discussed linear codes, their representation via generator and parity check matrices, and the fact that they admit systematic encodings. We extended the [7,4,3] Hamming code to larger dimensions (and alphabets), and saw a simple syndrome decoder for correcting one error with Hamming codes. We proved the Hamming/sphere-packing upper bound on code size as a function of distance, and saw that Hamming codes are “perfect” in that they meet this upper bound. We listed, without proof, all the binary perfect codes there are. We defined the dual of a linear code, and then studied the distance properties of the dual of the Hamming code, the simplex code, and its extended form, the Hadamard code.
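The one-error syndrome decoder for the Hamming code can be sketched in a few lines. This is the standard presentation (not necessarily verbatim from lecture): with the parity-check matrix whose ${i}$-th column is the binary representation of ${i}$, the syndrome of a word with a single error spells out the error position directly:

```python
# Parity-check matrix of the [7,4,3] Hamming code: column i (1-indexed)
# is the binary representation of i, so the syndrome of a single error
# at position i reads off i directly.
H = [[(i >> k) & 1 for i in range(1, 8)] for k in range(3)]

def syndrome(word):
    """Compute H * word over GF(2)."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def correct_one_error(word):
    """Fix up to one bit flip; a zero syndrome means no correction needed."""
    s = syndrome(word)
    pos = s[0] + 2 * s[1] + 4 * s[2]  # syndrome, read as an integer, = error position
    if pos:
        word = list(word)
        word[pos - 1] ^= 1
    return tuple(word)

# The all-zeros word is a codeword; flip bit 5 and watch it get repaired.
received = (0, 0, 0, 0, 1, 0, 0)
print(correct_one_error(received))  # -> (0, 0, 0, 0, 0, 0, 0)
```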

## January 13, 2010

### Lecture 1 summary

Filed under: Lecture summary — Venkat Guruswami @ 4:42 pm

In today’s lecture, we discussed some administrative aspects concerning problem sets and scribing. (For those who are taking the class for credit and couldn’t make it to lecture today,  I expect there will be about 3 problem sets that can be solved and written up in pairs, and each student will be responsible for scribing in detail lectures for one week starting in about two weeks.)

We discussed some historical context of Shannon’s and Hamming’s work, informally stated the nosieless and noisy coding theorems of Shannon (a formal statement and proof of the latter for a special channel will be covered probably next week), introduced the length 7 Hamming code, made a few basic definitions (Hamming distance, rate, distance, etc.), and argued why distance combinatorially captures  the error/erasure correcting capabilities of codes. There was discussion led by student questions/comments about using the fact that an error-correcting decoder also recovers the identity of error locations to limit the rate of ${t}$-error-correcting codes, leading to the Hamming/Volume upper bound on rate.