Math 305 Advanced Linear Algebra

Logic and Set Theory

Introduction

Proof:

  • In Mathematics, a proof is a demonstration that, assuming certain axioms, some statement is necessarily true.
  • A proof is a logical argument, not an empirical one
  • One must demonstrate that a proposition is true in all cases before it is considered a theorem of mathematics. An unproven proposition for which there is some sort of empirical evidence is known as a conjecture

1.1 Statement

What is a Statement?

A statement is an assertion that is either true or false, but not both.

Example

  • Fred is twenty years old
  • 3 + x = 12 ---> This is NOT a statement, since it cannot be decided as True or False.

Remark:

Law of excluded middle: for any proposition, either that proposition is true or its negation is true.

Combining Statements

Five ways of forming new statements based on statements P and Q.

  1. Conjunctions (P and Q)

    • Denoted \(P\wedge Q\)

    • Is true only if both statements are true, and is false otherwise

      | P | Q | \(P\wedge Q\) |
      |---|---|---|
      | T | T | T |
      | T | F | F |
      | F | T | F |
      | F | F | F |
  2. Disjunction (P or Q)

    • Denoted \(P\vee Q\)

    • Is true if either statement is true, and false otherwise

      | P | Q | \(P\vee Q\) |
      |---|---|---|
      | T | T | T |
      | T | F | T |
      | F | T | T |
      | F | F | F |
  3. Negation (Not P)

    • Denoted \(\neg P\)

    • Is true if the statement is false, and is false otherwise

      | P | \(\neg P\) |
      |---|---|
      | T | F |
      | F | T |
  4. Conditional Connective (If P then Q; P only if Q; Q if P)

    • Denoted \(P\to Q\)

    • Is false if P (the condition) is true and Q (the result) is false, and is true otherwise; this is the only case in which it is false.

    • P is called the antecedent and Q is called the consequent

    • Think of \(P\to Q\) as a promise that Q is true whenever P is true. When P is false, the promise is kept by default.

    • Example: suppose your friend promises “if it is sunny tomorrow, I will ride my bike". We will say this is true if they keep their promise. If it rains and they don't ride their bike, most people would agree that they have still kept their promise.

      | P | Q | \(P\to Q\) |
      |---|---|---|
      | T | T | T |
      | T | F | F |
      | F | T | T |
      | F | F | T |
    • Variations of \(P\to Q\)

      • Converse: \(Q\to P\)

      • Inverse: \(\neg P\to \neg Q\) (If not P, then not Q)

      • Contrapositive: \(\neg Q \to \neg P\)

        | P | Q | \(P\to Q\) | \(Q\to P\) | \(\neg P \to \neg Q\) | \(\neg Q \to \neg P\) |
        |---|---|---|---|---|---|
        | T | T | T | T | T | T |
        | T | F | F | T | T | F |
        | F | T | T | F | F | T |
        | F | F | T | T | T | T |

        Note that \(P\to Q\) and its contrapositive always have the same truth value, as do the converse and the inverse.
  5. Biconditional (P if and only if Q)

    • Denoted \(P \leftrightarrow Q\)

    • True if P and Q have the same truth value, and false otherwise

      | P | Q | \(P\leftrightarrow Q\) |
      |---|---|---|
      | T | T | T |
      | T | F | F |
      | F | T | F |
      | F | F | T |

Compound Logical Statements

  • Form more complicated compound statements based on the above five.
  • Parentheses to avoid ambiguity
  • Convention: \(\neg\) takes precedence over the other four operations.

Example

\((P\to R)\wedge (Q \vee \neg R)\)

Step:

  1. Copy values
  2. Negation
  3. \(P\to R\)
  4. \(Q \vee \neg R\)
  5. \(\wedge\)

By truth table:

| P | Q | R | \(P\to R\) | \(\neg R\) | \(Q\vee\neg R\) | \((P\to R)\wedge(Q\vee\neg R)\) |
|---|---|---|---|---|---|---|
| T | T | T | T | F | T | T |
| T | T | F | F | T | T | F |
| T | F | T | T | F | F | F |
| T | F | F | F | T | T | F |
| F | T | T | T | F | T | T |
| F | T | F | T | T | T | T |
| F | F | T | T | F | F | F |
| F | F | F | T | T | T | T |
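
The table above can also be generated mechanically. A minimal Python sketch (the `implies` helper is our own, since Python has no implication operator):

```python
# Enumerate all truth assignments for P, Q, R and evaluate
# (P -> R) and (Q or not R); P -> R is equivalent to (not P) or R.
from itertools import product

def implies(p, q):
    return (not p) or q

print("P      Q      R      result")
for p, q, r in product([True, False], repeat=3):
    value = implies(p, r) and (q or not r)
    print(f"{p!s:<7}{q!s:<7}{r!s:<7}{value}")
```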

Applications

Translating language to mathematical symbols.

Example

  1. You can access the Internet from campus only if you are a computer science major or you are not a freshman

    P = "You can access the Internet from campus"

    Q = "You are a computer science major"

    R = "You are a freshman"

    \(P\to Q\): P only if Q; Q if P

    \(P\to (Q\vee \neg R)\)

  2. You cannot ride the roller coaster if you are under 4 feet tall unless you are older than 16 years old

    P = "You can ride the roller coaster"

    Q = "You are under 4 feet tall"

    R = "You are older than 16 years old"

    \(\neg P\) = " You cannot ride the roller coaster" --> This is the result

    The condition for the result is: you are under 4 feet tall and you are not older than 16 years old. This means that if you are under 4 feet tall but older than 16, you can still ride the roller coaster.

    \((Q\wedge \neg R)\to \neg P\)

Tautology and Contradiction

  • A tautology is a statement that is true by logical necessity, regardless of whether the component statements are true or false

    Example

    \(P\vee (\neg P)\)

    | P | \(\neg P\) | \(P\vee \neg P\) |
    |---|---|---|
    | T | F | T |
    | F | T | T |
    F F T F
  • A contradiction is a statement that is always false by logical necessity, regardless of whether the component statements are true or false.

    Example

    \(P\wedge (\neg P)\)

    | P | \(\neg P\) | \(P\wedge (\neg P)\) |
    |---|---|---|
    | T | F | F |
    | F | T | F |

    Try to show whether the following statement is a tautology, a contradiction, or neither:

    \(((P\wedge Q)\to R)\to (P\to(Q\to R))\)

    | P | Q | R | \(P\wedge Q\) | \((P\wedge Q)\to R\) | \(Q\to R\) | \(P\to(Q\to R)\) | whole statement |
    |---|---|---|---|---|---|---|---|
    | T | T | T | T | T | T | T | T |
    | T | T | F | T | F | F | F | T |
    | T | F | T | F | T | T | T | T |
    | T | F | F | F | T | T | T | T |
    | F | T | T | F | T | T | T | T |
    | F | T | F | F | T | F | T | T |
    | F | F | T | F | T | T | T | T |
    | F | F | F | F | T | T | T | T |

    Every row of the final column is T, so the statement is a tautology.

1.2 Relations Between Statements

Meta Statement

A meta statement is a logical statement about logical statements.

Example

If the statement "Ethel is tall and Agnes is short" is true, then the statement "Ethel is tall" is true.

Implication

For two compound statements P and Q, an implication \(P\implies Q\) states that Q is true whenever P is true. To justify an implication, show that \(P\to Q\) is a tautology.

Example

  • My thumb will hurt if I hit it with a hammer

  • \(x=2\) implies \(x+1=3\)

  • \(\neg (P\to Q)\implies P\vee Q\)

    To verify this, one only needs to check that \((\neg (P\to Q))\to(P\vee Q)\) is a tautology using a truth table.

Remark:

  • The difference between \(P\to Q\) and \(P\implies Q\)

    \(P\to Q\) is a compound statement that may or may not be true.

    \(P\implies Q\) is a relation stating that the compound statement \(P\to Q\) is true under all instances of the external propositions.

  • Be careful not to allow contradictions in logical arguments because, starting from a contradiction, anything can be proven true.

    Example

    \(P\wedge \neg P\implies Q\) is a valid logical implication, even though Q does not appear on the LHS.

    Thus, a contradiction in your assumptions can lead to a “correct” proof of an arbitrary statement.

  • Logical implication is NOT always reversible

    Example

    \(\neg (P\to Q)\implies P\vee Q\)

    But \(P\vee Q\) does not imply \(\neg (P\to Q)\): when P and Q are both True, \(P\vee Q\) is True but \(\neg (P\to Q)\) is False.

Fact 1.2.1 Implication

Let P, Q, R and S be statements

  1. \((P\to Q)\wedge P \implies Q\) (Modus Ponens)
  2. \((P\to Q)\wedge \neg Q\implies \neg P\) (Modus Tollens)
  3. \(P\wedge Q\implies P\) (Simplification)
  4. \(P\wedge Q\implies Q\) (Simpification)
  5. \(P\implies P\vee Q\) (Addition)
  6. \(Q\implies P\vee Q\) (Addition)
  7. \((P\vee Q)\wedge \neg P\implies Q\) (Modus Tollendo Ponens)
  8. \((P\vee Q)\wedge \neg Q\implies P\) (Modus Tollendo Ponens)
  9. \(P\longleftrightarrow Q\implies P\to Q\) (Biconditional-Conditional)
  10. \(P\longleftrightarrow Q\implies Q\to P\) (Biconditional-Conditional)
  11. \((P\to Q)\wedge(Q\to P)\implies P\longleftrightarrow Q\) (Conditional-Biconditional)
  12. \((P\to Q)\wedge(Q\to R)\implies P\to R\)
  13. \((P\to Q)\wedge(R\to S)\wedge(P\vee R)\implies Q\vee S\) (Constructive Dilemma)

Equivalence

An equivalence \(P\Longleftrightarrow Q\) states that P is true if and only if Q is true. To justify an equivalence, show that the biconditional \(P\longleftrightarrow Q\) is a tautology.

Example

\(P\to Q \Longleftrightarrow \neg P\vee Q\)

Fact 1.2.2 Equivalence

  1. \(\neg (\neg P)\Longleftrightarrow P\) (Double Negation)
  2. \(P\vee Q\Longleftrightarrow Q\vee P\) (Commutative Law)
  3. \(P \wedge Q \Longleftrightarrow Q\wedge P\) (Commutative Law)
  4. \((P\vee Q)\vee R\Longleftrightarrow P\vee(Q\vee R)\) (Associative Law)
  5. \((P\wedge Q)\wedge R \Longleftrightarrow P\wedge(Q\wedge R)\) (Associative Law)
  6. \(P\wedge(Q\vee R)\Longleftrightarrow(P\wedge Q)\vee(P\wedge R)\) (Distributive Law)
  7. \(P\vee(Q\wedge R)\Longleftrightarrow(P\vee Q)\wedge(P\vee R)\) (Distributive Law)
  8. \(P\to Q\Longleftrightarrow\neg P\vee Q\)
  9. \(P\to Q\Longleftrightarrow \neg Q\to \neg P\) (Contrapositive)
  10. \(P\longleftrightarrow Q \Longleftrightarrow Q\longleftrightarrow P\)
  11. \(P\longleftrightarrow Q\Longleftrightarrow (P\to Q)\wedge(Q\to P)\)
  12. \(\neg (P\wedge Q)\Longleftrightarrow \neg P \vee \neg Q\) (De Morgan's Law)
  13. \(\neg (P\vee Q)\Longleftrightarrow \neg P \wedge \neg Q\) (De Morgan's Law)
  14. \(\neg (P\to Q)\Longleftrightarrow P\wedge \neg Q\)
  15. \(\neg (P\longleftrightarrow Q)\Longleftrightarrow (P\wedge \neg Q)\vee (\neg P \wedge Q)\)
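
Each of these equivalences (and the implications in Fact 1.2.1) can be verified by brute force over all truth assignments. A small Python sketch, with `is_tautology` as our own helper, checking Modus Ponens and one De Morgan law as samples:

```python
# Brute-force tautology check: a statement with k propositional
# variables is a tautology iff it is true under all 2^k assignments.
from itertools import product

def is_tautology(f, arity):
    return all(f(*vals) for vals in product([True, False], repeat=arity))

implies = lambda p, q: (not p) or q

# Modus Ponens: ((P -> Q) and P) -> Q
print(is_tautology(lambda p, q: implies(implies(p, q) and p, q), 2))         # True
# De Morgan: not(P and Q) <-> (not P) or (not Q)
print(is_tautology(lambda p, q: (not (p and q)) == ((not p) or (not q)), 2)) # True
```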

Application

Example

Three boxes with messages on each:

The first box: "This box is empty"

The second box: "This box is empty"

The third box: "Box 2 is not empty"

Only one of these messages is true. Which box is not empty?

Sol:

Denote \(P_i\) as "Box i is not empty", i = 1,2,3.

Then the messages of boxes become \(\neg P_1, \neg P_2, P_2\)

Then, since exactly one of the messages is true, we consider three cases: \[ (\neg P_1\wedge \neg (\neg P_2)\wedge \neg P_2)\vee(\neg(\neg P_1)\wedge\neg P_2\wedge\neg P_2)\vee(\neg(\neg P_1)\wedge \neg (\neg P_2)\wedge P_2) \\ \Longleftrightarrow (\neg P_1 \wedge P_2 \wedge \neg P_2)\vee (P_1\wedge \neg P_2)\vee (P_1\wedge P_2\wedge P_2) \\ \Longleftrightarrow (P_1\wedge \neg P_2)\vee (P_1 \wedge P_2) \\ \Longleftrightarrow P_1\wedge (\neg P_2\vee P_2) \\ \Longleftrightarrow P_1 \] Note that the term \((\neg P_1 \wedge P_2 \wedge \neg P_2)\) is canceled since \((P_2 \wedge \neg P_2)\) is always false. Therefore \(P_1\) is true: Box 1 is not empty.
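
The same case analysis can be checked by brute force. A Python sketch (variable names are our own) that keeps exactly the truth assignments under which precisely one message is true; \(P_1\) holds in every surviving assignment:

```python
# P1, P2, P3 mean "box i is not empty"; the messages are ¬P1, ¬P2, P2.
from itertools import product

solutions = [(p1, p2, p3)
             for p1, p2, p3 in product([True, False], repeat=3)
             if [not p1, not p2, p2].count(True) == 1]  # exactly one true
print(all(p1 for p1, _, _ in solutions))   # True: box 1 is not empty
```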

1.3 Quantifiers

The expression \(x^3 \geq 8\) is not a statement.

The expression "\(x^3 \geq 8\) for all real numbers \(x\geq2\)" is a statement.

Definition

Quantifiers specify for how many values of a variable a statement is asserted.

Example

  • for all real numbers \(x\geq2\)
  • there exists a real number \(x\) such that \(x^2=9\)

First-order Predicate Logic

A predicate is an expression that contains variables (predicate variables) and may be true or false depending on the values of those variables.

Example

  • "\(x^2\) is greater than \(x\)"

The domain of a predicate variable is the collection of all possible values that variable may take.

Example

Let \(U\) be a collection of elements and \(P(x)\) be a predicate that can be applied to any \(x\in U\).

For instance, take \(U\) to be the collection of all vehicles and \(P(x)\) = "x has 4 tires".

Universal Quantifier

  • The statement \(\forall x \in U,P(x)\) is true if \(P(x)\) is true for all values of \(x\) in \(U\)
  • \(\forall x\in U,P(x)\Longleftrightarrow \wedge_{x\in U}P(x)\)

Example

\(\forall x\in U,P(x)\) = "all vehicles have 4 tires"

Existential Quantifier

  • The statement \(\exist x \in U, P(x)\) is true if \(P(x)\) is true for at least one value of \(x\) in \(U\)
  • \(\exist x \in U,P(x)\Longleftrightarrow \vee_{x\in U}P(x)\)
  • Natural implication: \(\forall x\in U, P(x)\implies \exist x\in U,P(x)\) (provided \(U\) is non-empty)

Example

  • \(\exist x \in U,P(x)\) = "there is at least one vehicle with 4 tires"
  • \(D=\{\text{birds}\}\), \(P(x)\) = "\(x\) is angry"; "some birds are angry" \(\longleftrightarrow\) "\(\exist x \in D,P(x)\)"

Remark

  • If \(U=\empty\) is the empty set, then
    • \(\forall x \in U, P(x)\) is vacuously true by convention since there are no elements in \(U\) to test with \(P(x)\)
    • \(\exist x \in U,P(x)\) Is false by convention because there are no elements in U
  • If we have \(P(x)=\)"x is a person", \(Q(x)=\) "x is mortal". Then we can write the statement "Every person is mortal" as \(\forall x,(P(x)\to Q(x))\)

Negation (De Morgan's law)

  • \(\neg (\forall x \in U,P(x))\Longleftrightarrow \exist x\in U,\neg P(x)\)
  • \(\neg(\exist x\in U,P(x))\Longleftrightarrow\forall x \in U,\neg P(x)\)

Remark:

\(\neg(\forall x\in U,P(x))\Longleftrightarrow\neg(\wedge_{x\in U}P(x))\Longleftrightarrow\vee_{x\in U}\neg P(x)\Longleftrightarrow\exist x\in U, \neg P(x)\)
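
Over a finite domain, these quantifiers are just the iterated conjunction and disjunction above, which Python's `all()` and `any()` compute directly. A small sketch of the De Morgan duality (the domain and predicate are arbitrary choices):

```python
# ¬(∀x P(x)) ⟺ ∃x ¬P(x), checked on a finite domain.
U = range(10)
P = lambda x: x % 2 == 0   # an arbitrary predicate

lhs = not all(P(x) for x in U)   # ¬(∀x∈U, P(x))
rhs = any(not P(x) for x in U)   # ∃x∈U, ¬P(x)
print(lhs == rhs)                # True
```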

Nested Quantification

"Every rabbit is faster than some tortoise"

Domains: \(R=\{rabbits\},T=\{tortoises\}\)

Predicate \(C(x,y)\): rabbit x is faster than tortoise y

The sentence is then written \(\forall x\in R,\exist y\in T,C(x,y)\)

Nested quantifiers are quantifiers that occur within the scope of other quantifiers.

Multiple Quantifiers

Consider predicate P(x,y) with free variables x and y:

  • In the statement "\(\exist x \in U,P(x)\)", the variable \(x\) is called a bound variable because its value cannot be chosen freely.
  • In the statement "\(\exist y\in U,P(x,y)\)", \(x\) is a free variable and \(y\) is a bound variable

Example

Let \(I\) be a collection of images and \(C\) be a collection of colors. Define predicate P(x,y) = "x contains y" for \(x\in I,y\in C\)

Then

"\(\forall x \in I,\forall y\in C,P(x,y)\)" = "For any images in I, it contain any color in C"

"\(\forall x\in I,\exist y\in C,P(x,y)\)" = "For any images in I, there is a color in C contained in it"

  • The order of Quantifiers

    • "\(\exist y\in C,\forall x \in I,P(x,y)\)"\(\implies\)"\(\forall x \in I,\exist y\in C,P(x,y)\)"

      Example

      "\(\forall x \in I,\exist y\in C,P(x,y)\)" = "For any images in I, there is a color in C contained in it"

      "\(\exist y\in C,\forall x \in I,P(x,y)\)" = "There is at least one color in C exist in all imgaes in I"

    • "\(\exist x \in I,\forall y \in C,P(x,y)\)" \(\implies\)"\(\forall y \in C,\exist x \in I,P(x,y)\)"

      Example

      "\(\exist x \in I,\forall y \in C,P(x,y)\)" = "There is a image in I containing all color in C"

      "\(\exist y\in C,\forall x \in I,P(x,y)\)" = "For each color in C there is an image containing that color"

  • 8 total possibilities to choose order and quantifiers for each variable.

  • Negation

    • \(\neg(\forall x \in I,\forall y \in C,P(x,y))\Longleftrightarrow\exist x \in I,\exist y \in C,\neg P(x,y)\)
    • \(\neg(\forall x \in I,\exist y\in C,P(x,y))\Longleftrightarrow \exist x\in I,\forall y\in C,\neg P(x,y)\)
  • Multiple Quantifiers Application

    Example

    Use quantifiers to express the definition of the limit of a real-valued function \(f(x)\) of a real variable \(x\) at a point \(a\) in its domain, and express when \(\lim_{x\to a}f(x)\) does not exist.

    Sol:

    Recall that the definition of the statement \(\lim_{x\to a}f(x)=L\) is: "for every real number \(\epsilon>0\), there exists a real number \(\delta>0\) such that \(|f(x)-L|<\epsilon\) whenever \(0<|x-a|<\delta\)".

    Using quantifiers, "the limit exists" is: \(\exist L,\forall \epsilon>0,\exist \delta>0,\forall x\in D:(0<|x-a|<\delta\implies|f(x)-L|<\epsilon)\)

    Negation ("the limit does not exist"): \(\forall L,\exist \epsilon>0,\forall\delta>0,\exist x:0<|x-a|<\delta\wedge|f(x)-L|\geq\epsilon\)

    Note that by the negation of implication: \(\neg[0<|x-a|<\delta\implies|f(x)-L|<\epsilon]\Longleftrightarrow 0<|x-a|<\delta\wedge|f(x)-L|\geq\epsilon\)

1.4 Strategies for Proofs

  • A mathematical proof is a convincing argument that starts from the premises, and logically deduces the desired conclusion.
  • A theorem is a statement that can be proved on the basis of explicitly stated or previously agreed assumptions.
  • A proposition is a statement not associated with any particular theorem; this term sometimes connotes a statement with a simple proof.
  • A lemma is a proven proposition which is used as a stepping stone to a larger result rather than an independent statement in itself.
  • A corollary is a mathematical statement which follows easily from a previously proven statement, typically a mathematical theorem.

Direct

Assume P true and give steps that lead to Q:

If P is true

.... some arguments ...

then Q is true

Contrapositive

Proof of equivalent statement \(\neg Q \to \neg P\)

\(P\to Q \Longleftrightarrow\neg Q\to\neg P\)

If \(\neg Q\) is true

... some argument ...

\(\neg P\) is true

\(P\to Q\) is true

Example

Let n be an integer. Prove that if \(n^2\) is odd, then \(n\) is odd.

Sol: Denote P = "\(n^2\) is odd", Q = "\(n\) is odd", then we have \(\neg Q\) = "n is even", \(\neg P\) = "\(n^2\) is even". By proving if \(\neg Q\) then \(\neg P\), we can prove \(P\to Q\).

Suppose \(\neg Q\) holds, i.e. n is even: \(\exist k \in \N:n=2k\)

\(\implies n^2=(2k)^2=4k^2=2(2k^2)\)

\(\implies n^2\) is an even number \(\implies \neg P\)

Which means we have \(\neg Q\to\neg P\), and therefore \(P\to Q\).

Contradiction

Using \(\neg(P\to Q)\Longleftrightarrow P\wedge\neg Q\), one supposes that both P and \(\neg Q\) are true and then gives steps leading to a contradiction.

Since \((P\to Q)\Longleftrightarrow\neg (\neg (P\to Q))\)

To prove \(P\to Q\) is true is to prove \(\neg(P\to Q)\) is false.

Since \(\neg(P\to Q)\Longleftrightarrow \neg(\neg P \vee Q)\Longleftrightarrow P\wedge\neg Q\)

To prove \(P\to Q\) is true is to prove \(P\wedge\neg Q\) is false.

Then assume \(P\wedge\neg Q\) is satisfied, find contradiction

Example

Prove if \(x\in \R, x^2=2\), then \(x\) is an irrational number.

Sol:

P = "\(x\in \R, x^2=2\)", Q = "x is an irrational number"

To prove the statement true, we prove \(P\wedge\neg Q\) is false.

If we have \(x\in \R, x^2=2\) and x is a rational number:

\(\implies \exist n,m\in \N:x=\frac{n}{m}\), m≠0, where m and n do not have common factors

\(\implies x^2=(\frac{n}{m})^2=2\implies n^2=2m^2\)

\(\implies n^2\) is even \(\implies n\) is even

\(\implies \exist k \in \N, n=2k\implies 4k^2=2m^2\implies m^2=2k^2\)

\(\implies m^2\) is even \(\implies m\) is even

Since both n and m are even, they have the common factor 2, which contradicts "m and n do not have common factors".

Thus, the statement is true.

Induction

Let P(n) be a logical statement for each \(n\in \N\). The principle of mathematical induction states that P(n) is true for all \(n\in \N\) if:

P(1) is true, and

\(P(n)\to P(n+1)\) is true for all \(n\in \N\)

Example

Prove \(\sum_{i=1}^ni^2=\frac{n(n+1)(2n+1)}{6}\)

Sol:

Base case: \(n=1\), \(S(1)=\frac{1\times 2\times 3}{6}=1\)

Assume \(S(n)=\sum_{i=1}^n i^2=\frac{n(n+1)(2n+1)}{6}\)

Then \(S(n+1)=\sum_{i=1}^{n+1}i^2=\frac{n(n+1)(2n+1)}{6}+(n+1)^2=\frac{(n+1)[n(2n+1)+6(n+1)]}{6}=\frac{(n+1)(n+2)(2n+3)}{6}=\frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}\)
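
The induction establishes the identity for every n; as a quick numeric sanity check (not a proof), the closed form can be compared against a direct sum:

```python
# Verify sum_{i=1}^n i^2 == n(n+1)(2n+1)/6 for small n.
for n in range(1, 20):
    assert sum(i * i for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
print("formula holds for n = 1..19")
```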

1.5 Set Theory

Basics

A set is an unordered collection of distinct objects, called elements or members of the set.

\(a\in A\): A is a set and a is an element of A

\(a\notin A\): A is a set and a is not an element of A

  • Element order is irrelevant
  • Repeated elements have no effect
  • Singleton: a set containing exactly one element.
  • Empty set \(\emptyset\): (null set) the set that does not have any elements in it
  • Standard sets: Integers \(\Z\), Rational numbers \(\Q\), Real numbers \(\R\), Complex numbers \(\C\)

Bounding

Open bounded interval: \((a,b)=\{ x\in \R|a<x<b\}\)

Close bounded interval: \([a,b]=\{x\in \R|a\leq x\leq b\}\)

Half-open bounded: \((a,b]=\{x\in \R|a<x \leq b\}\) or \([a,b)=\{x\in \R|a\leq x<b\}\)

Open unbounded: \((a,\infty)=\{x\in \R|a<x\}\) or \((-\infty,b)=\{x\in R|x<b\}\) Or \((-\infty,\infty)=\R\)

Close unbounded: \([a,\infty)=\{x\in \R|x\geq a\}\) or \((-\infty,b]=\{x\in \R|x\leq b\}\)

Subset

Set A is a subset of set B: If \(x\in A\implies x\in B\), we have \(A\subseteq B\)

If set A is not a subset of set B: \(A\not\subseteq B\) \[ A\subseteq B \Longleftrightarrow \forall x((x\in A)\to(x\in B)) \]

Example

  • If \(A=\{a,b,c\}\) then \(a\in A, \{a\}\subseteq A\)
  • If \(B=\{\{a\},b,c\}\) then \(\{\{a\}\}\subseteq B, \{a\}\in B\)

Lemma: Let A, B and C be sets

  • \(A\subseteq A\)
  • \(\emptyset\subseteq A\)
  • If \(A\subseteq B\), and \(B\subseteq C\), then \(A\subseteq C\)

Equivalence and Proper Subset

Let A and B be sets:

\(A=B\): If \(A\subseteq B, B\subseteq A\), set A equals the set B.

\(A\subsetneq B\): If \(A\subseteq B, A\not= B\), set A is a proper subset of set B \[ A=B\Longleftrightarrow\forall x((x\in A)\longleftrightarrow (x\in B)) \]

Lemma: Let A, B and C be sets

  • A = A
  • If A = B, then B = A
  • If A = B, and B = C, then A = C

Cardinality

For set A, the cardinality |A| is the number of elements in A.

Countably Infinite: If there is a one-to-one correspondence between A and the natural numbers \(\N\), then A is called countably infinite and |A| = \(\infty\)

Example

Set of rational numbers \(\Q=\{\frac{p}{q}|p\in \Z,q\in\N\}\), is countably infinite.

Uncountably Infinite: If |A| = \(\infty\) but A is not countably infinite, then A is uncountably infinite

Example

The set of real numbers is uncountably infinite by Cantor's diagonal argument.

Power Set

Let A be a set. The power set of A, denoted \(P(A)\), is defined by \(P(A)=\{X|X\subseteq A\}\)

Example

  • \(X\in P(A)\) if and only if \(X\subseteq A\)
  • \(P(\emptyset)=\{\empty\}\)
  • \(P(P(\empty))=\{\empty,\{\empty\}\}\)
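
A small Python sketch of the power set built from `itertools.combinations` (the `power_set` helper is our own; `frozenset` is used so subsets can themselves be elements of a set):

```python
from itertools import chain, combinations

def power_set(A):
    # All subsets of A: choose r elements for every r = 0..|A|.
    A = list(A)
    return {frozenset(c) for c in
            chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))}

print(power_set(set()))           # {frozenset()}, i.e. P(∅) = {∅}
print(len(power_set({1, 2, 3})))  # 8 = 2^3 subsets
```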

Set Operations

Union: \(A\cup B = \{x|x\in A \vee x\in B\}\)

Intersection: \(A\cap B=\{x|x\in A\wedge x\in B\}\)

Disjoint: \(A\cap B=\empty\)

Remark:

  • If \(P(A\cup B)=P(A)\cup P(B)\), then \((A\subseteq B)\vee(B\subseteq A)\)

    Proof:

    Since \(P(A\cup B)=P(A)\cup P(B)\)

    We have \(A\cup B\in P(A)\cup P(B)\)

    Then \((A\cup B)\in P(A)\vee ((A\cup B)\in P(B))\)

    Then \(((A\cup B)\subseteq A)\vee((A\cup B)\subseteq B)\)

    Thus, \((B\subseteq A)\vee(A\subseteq B)\)

  • \(P(A\cap B)=P(A)\cap P(B)\)

    Proof:

    \(\Longrightarrow\) \(\forall v\in P(A\cap B)\), we have \(v\subseteq A\cap B\)

    so \((v\subseteq A)\) and \((v\subseteq B)\)

    then we have \((v\in P(A))\) and \((v\in P(B))\)

    Thus, \(v\in P(A)\cap P(B)\)

    \(\Longleftarrow\) if \(x\in P(A)\cap P(B)\), then \(x\in P(A)\) and \(x\in P(B)\)

    then we have \(x\subseteq A\) and \(x\subseteq B\), which means \(x\subseteq A\cap B\)

    Thus, \(x\in P(A\cap B)\)

Difference: \(A-B=\{x|x\in A\wedge x\not\in B\}\)

Remark: \(A-B=A-(A\cap B)\)

Complement: \(A^C=\{x|x\in U\wedge x\not\in A\}=U-A\)

Theorems: (can be intuitively proved with a Venn diagram)

  • \(A-B\subseteq A\)
  • \((A-B)\cap B=\empty\)
  • \(A-B=\empty \longleftrightarrow A\subseteq B\)
  • If \(A\subseteq B\), then \(A-C=A\cap(B-C)\)
  • If \(A\subseteq B\), then \(C-B\subseteq C-A\)
  • \(C-(A\cup B)=(C-A)\cap(C-B)\)
  • \(C-(A\cap B)=(C-A)\cup(C-B)\)
  • \((A\cup B)^C=A^C\cap B^C\)
  • \((A\cap B)^C=A^C\cup B^C\)
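
These identities can be spot-checked with Python's built-in set operations; a sketch on arbitrary concrete sets (an example check, not a proof):

```python
# Check C-(A∪B) = (C-A)∩(C-B) and (A∩B)^C = A^C ∪ B^C on one example.
U = set(range(10))                      # the universe for complements
A, B, C = {1, 2, 3}, {3, 4, 5}, {2, 3, 4, 9}

assert C - (A | B) == (C - A) & (C - B)
assert U - (A & B) == (U - A) | (U - B)
print("both identities hold on this example")
```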

For arbitrarily many sets:

\(\bigcup_{\alpha\in I}S_\alpha=\{x|x\in S_\alpha \ for\ some\ \alpha\in I\}\)

\(\bigcap_{\alpha\in I}S_\alpha=\{x|x\in S_\alpha \ for\ all\ \alpha\in I\}\)

Cartesian Product:

\(A\times B=\{(a,b)|a\in A, b\in B\}\)

Remark:

  • Normally, \(A\times B\not= B\times A\), since the order matters
  • For a set A with m elements, the number of elements in \(A^n\) is \(m^n\)

Theorems: (Can be intuitively proved by x-y coordinate)

  • If \(A\subseteq B,C\subseteq D\), then \((A\times C)\subseteq(B\times D)\)

  • \(A\times(B\cup C)=(A\times B)\cup(A\times C)\)

    \((B\cup C)\times A=(B\times A)\cup(C\times A)\)

  • \(A\times(B\cap C)=(A\times B)\cap(A\times C)\)

    \((B\cap C)\times A=(B\times A)\cap(C\times A)\)

  • \(A\times\empty=\empty\)

    \(\empty \times A=\empty\)

  • \((A\cap B)\times(C\cap D)=(A\times C)\cap (B\times D)\)
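
`itertools.product` computes exactly this Cartesian product; a sketch illustrating both remarks above (the sets are arbitrary):

```python
from itertools import product

A, B = {1, 2}, {'x', 'y', 'z'}
print(set(product(A, B)) == set(product(B, A)))  # False: order matters
print(len(set(product(A, repeat=3))))            # 8 = 2^3 elements in A^3
```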

Relation

Definition

Let A and B be sets. A relation R from A to B is \(R\subseteq A\times B\). If \(a\in A, b\in B\), we write \(aRb\) if \((a,b)\in R\). A relation on \(A\) is a relation from \(A\) to \(A\).

Example

  • A linear relation on \(\R^2=\{(a,b)|a\in \R, b\in \R\}\):

    \(R_1\) is a relation such that \(R_1=\{(a,a+1)|a\in \R\}\subseteq \R^2\)

  • The symbols \(<\) and \(\leq\) both represent relations on \(\R\)

  • We have set \(A=\{1,2,3\}\) and \(B=\{x,y,z\}\)

    \(R_1\) is a relation such that \(R_1=\{(1,x),(1,y),(2,z)\}\subseteq A\times B\)

    Thus we have \(1R_1x\), \(1R_1y\),\(1\not R_{1} z\)

Relation Class

The relation class of x with respect to R is \(R[x]=\{y\in B|xRy\}=\{y\in B|(x,y)\in R\}\), the set of all y corresponding to x.

Example

  • We have set \(A=\{1,2,3\}\) and \(B=\{x,y,z\}\)

    \(R_1\) is a relation such that \(R_1=\{(1,x),(1,y),(2,z)\}\subseteq A\times B\)

    \(R_1[1]=\{x,y\}\), \(R_1[2]=\{z\}\)

  • For the relation \(R_<=\{(x,y)|x<y\}\subseteq \R^2\),

    \(R_<[x]=\{y\in \R|y>x\}=(x,\infty)\)

    For the relation \(R_{\leq}=\{(x,y)|x\leq y\}\subseteq \R^2\),

    \(R_{\leq}[x]=\{y\in \R|y\geq x\}=[x,\infty)\)

Relation Properties

A relation R on a non-empty set A (so \(R\subseteq A\times A\)) is said to be:

Reflexive: if \(xRx\) holds for all \(x\in A\)---> \((x,x)\in R, \forall x\in A\)

Symmetric: if \(xRy\) implies \(yRx\), for all \(x,y\in A\) ---> \((x,y)\in R\to (y,x)\in R\)

Transitive: if \(xRy\) and \(yRz\), then \(xRz\), for all \(x,y,z\in A\) ---> \((x,y)\in R\wedge(y,z)\in R\to(x,z)\in R\)

Example

  • Compare \(\leq\) and \(<\):

    | Property | \(\leq\) | \(<\) |
    |---|---|---|
    | Reflexive | Yes, since \(x\leq x\) | No |
    | Symmetric | No | No |
    | Transitive | Yes | Yes |
  • Set \(B=\{x,y,z\}\). Let E be the relation defined on B by \(E=\{(x,x),(y,y),(z,z),(x,y),(y,z)\}\), then the relation E is:

    Reflexive?

    Since (x,x),(y,y),(z,z)\(\in E\), Yes.

    Symmetric?

    We have (x,y),(y,z)\(\in E\), but we do not have (y,x),(z,y)\(\in E\). Thus, No.

    Transitive?

    We have (x,y),(y,z)\(\in E\), but we do not have (x,z)\(\in E\). Thus, No.

Equivalence

A relation is called an equivalence relation if it is reflexive, symmetric, and transitive.

Example

Suppose X and Y are sets and that \(f:X\to Y\) is a function. Let us define a relation ~ on X by \(x_1\sim x_2\) if and only if \(f(x_1)=f(x_2)\). Show that ~ is an equivalence relation.

Solution:

Reflexive: Since \(f(x)=f(x)\) for every \(x\in X\), then we have \(x\sim x\)

Symmetric: If \(x_1 \sim x_2\), then \(f(x_1)=f(x_2)\). But then \(f(x_2)=f(x_1)\), hence \(x_2\sim x_1\)

Transitive: If \(x_1\sim x_2\) and \(x_2\sim x_3\), then \(f(x_1)=f(x_2)\) and \(f(x_2)=f(x_3)\). But then \(f(x_1)=f(x_3)\), hence \(x_1\sim x_3\).
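
A minimal sketch of this construction on a finite domain: grouping the domain by the value of \(f\) yields exactly the equivalence classes of ~ (the quotient set discussed next); the choices of `X` and `f` here are arbitrary:

```python
# x1 ~ x2 iff f(x1) == f(x2): the classes are the groups of equal f-value.
from collections import defaultdict

X = range(-3, 4)
f = lambda x: x * x

classes = defaultdict(set)
for x in X:
    classes[f(x)].add(x)

for value, cls in sorted(classes.items()):
    print(value, cls)   # classes {0}, {±1}, {±2}, {±3}
```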

Equivalence Class

The relation classes of a set A with respect to an equivalence relation ~ are called **equivalence classes**.

Theorem:

Let A be a non-empty set, and let ~ be an equivalence relation on A

  • Let \(x,y\in A\). If \(x\sim y\), then \([x]=[y]\). If \(x\not\sim y\), then \([x]\cap[y]=\empty\)
  • \(\bigcup_{x\in A}[x]=A\)
  • \([x]=[y]\) if and only if \(x\sim y\)
Quotient Set

Let A be a non-empty set, and let ~ be an equivalence relation on A. The quotient set is \(A/\sim=\{[x]|x\in A\}\)

Example

Let P be the set of all people, and let ~ be the relation on P defined by x ~ y if and only if x and y are the same age (in years). If person x is 19 years old, then the equivalence class of x is the set of all 19-year-olds. Each element of the quotient set P/~ is itself a set: one such set consists of all 1-year-olds, another of all 2-year-olds, and so on. Although there are billions of people in P, there are fewer than 125 elements in P/~, because no currently living person has reached the age of 125.

Remark:

The quotient set is in fact a partition of A

Partition

Let A be a non-empty set. A partition of A is a family \(D\) of non-empty subsets of A such that:

  • If \(P,Q\in D\) and \(P≠ Q\), then \(P\cap Q=\empty\)
  • \(\bigcup_{P\in D}P=A\)

Example

  • Let \(C=\{[n,n+1)\}_{n\in \Z}\). Then C is a partition of \(\R\).

1.6 Function

Definition

A function from X to Y, denoted \(f:X\to Y\), is a subset \(F\subseteq X\times Y\) such that for each \(x\in X\) there is one and only one pair in F of the form \((x,y)\).

Remark:

  • Domain: X
  • Codomain: Y
  • Range: \(\{f(x)\in Y|x\in X\}\)
  • Two functions are equal if they have the same domain, codomain, and value for all elements of the domain.
Injective/Surjective/Bijective

Let X and Y be sets, and let \(f:X\to Y\) be a function:

Injection (One-to-one): \(x_1≠x_2\) implies \(f(x_1)≠f(x_2)\), for all \(x_1,x_2\in X\); equivalently, \(f(x_1)=f(x_2)\) implies \(x_1=x_2\) for all \(x_1,x_2\in X\)

Surjection (Onto): If for every \(y\in Y\), there exist some \(x\in X\) such that \(f(x)=y\); equivalently, the range equals Y, which is \(\{f(x)|x\in X\}=Y\)

Bijective: If it is both injective and surjective

Example

  • Let \(k:[0,\infty)\to[0,\infty)\) be defined by \(k(x)=x^2\) for all \(x\in [0,\infty)\) --> Bijective
  • Let \(g:[0,\infty)\to\R\) be defined by \(g(x)=x^2\) for all \(x\in [0,\infty)\) --> Injective
  • Let \(h:\R\to[0,\infty)\) be defined by \(h(x)=x^2\) for all \(x\in\R\) --> Surjective
Composition

Let A, B, C be sets, and let \(f:A\to B\) and \(g:B\to C\) be functions. The composition of \(f\) and \(g\) is the function \((g\circ f):A\to C\) defined by \[ (g\circ f)(x)=g(f(x)) \]

Remark: the codomain of the first function must equal the domain of the second function for the composition to be defined.

Example

\(k:\R\to\R\) is defined by \(k(x)=\sin x\) for all \(x\in \R\).

\(h:(0,\infty)\to \R\) is defined by \(h(x)=\ln x\)

Then \((k\circ h)(x)=k(h(x))=\sin(\ln x)\) is defined for all \(x\in(0,\infty)\)

However, \(h\circ k\) cannot be defined, since \(\ln(\sin x)\) is not defined for all \(x\in\R\)

Lemma: Let A, B, C and D be sets, and let \(f:A\to B\) and \(g:B\to C\) and \(h:C\to D\) be functions:

  • \((h\circ g)\circ f=h\circ(g\circ f)\) (Associative Law)

  • \(f\circ 1_A= f\) and \(1_B\circ f=f\) (Identity Law)

    Where \(1_A:A\to A\) is the identity function defined by \(1_A(x)=x\), for all \(x\in A\)

Proof:

\(m(a)=(h\circ g)\circ f(a)=(h\circ g)(f(a))=h(g(f(a)))\)

\(n(a)=h\circ(g\circ f)(a)=h((g\circ f)(a))=h(g(f(a)))\)

Lemma: Let A, B and C be sets, and let \(f:A\to B\) and \(g:B\to C\) be functions:

  • If \(f\) and \(g\) are injective, then \(g\circ f\) is injective
  • If \(f\) and \(g\) are surjective, then \(g\circ f\) is surjective
  • If \(f\) and \(g\) are bijective, then \(g\circ f\) is bijective

Proof:

We need to prove \(\forall x, y\in A\), if \((g\circ f)(x)=(g\circ f)(y)\), then \(x=y\)

Given \((g\circ f)(x)=(g\circ f)(y)\), since \(g\) is injective, then \(f(x)=f(y)\)

Given \(f(x)=f(y)\), since \(f\) is injective, then \(x=y\).

We need to prove \(\forall c\in C\), there exist \(a\in A\) such that \((g\circ f)(a)=c\)

Since \(g\) is surjective, then \(\forall c\in C\), there is \(b\in B\) such that \(g(b)=c\)

Since \(f\) is surjective, for that \(b\in B\) there is \(a\in A\) such that \(f(a)=b\); then \((g\circ f)(a)=g(f(a))=g(b)=c\)

Inverse Function

A bijective function has a unique inverse function \(f^{-1}:Y\to X\), satisfying: \[ \forall x\in X, f^{-1}(f(x))=x \ and\ \forall y\in Y, f(f^{-1}(y))=y \] Any one-to-one function \(f:X\to Y\) defines a bijection \(g:X\to R\), where R is the range of \(f\) and \(g(x)=f(x),\forall x \in X\)

Example

Suppose we have \(f(x)=\sin x,x\in(-\infty,\infty)\). Restricted to \(x\in[-\frac{\pi}{2},\frac{\pi}{2}]\), the function is bijective and hence has an inverse function.

Image

For a function \(f:A\to B\) and a subset \(P\subseteq A\), the image of P under \(f\), denoted \(f(P)\), is \[ f(P)=\{f(p)|p\in P\}\subseteq B \text{ (the codomain)} \] The range of \(f\) is the set \(f(A)\)

Let \(Q\subseteq B\). The inverse image or preimage of Q under \(f\), denoted \(f^{-1}(Q)\), is the set defined by \[ f^{-1}(Q)=\{a\in A|f(a)\in Q\}\subseteq A \text{ (the domain)} \]

Example

graph TB
subgraph A
a1(1)
a2(2)
a3(3)
end
a1-->b1
a2-->b1
a3-->b2
subgraph B
b1(x)
b2(y)
b3(z)
end

Then the following would be:

\(f(A)=\{f(1),f(2),f(3)\}=\{x,y\}\)

\(f(\{1,3\})=\{x,y\}\)

\(f^{-1}(\{x\})=\{1,2\}\)

\(f^{-1}(B)=\{1,2,3\}\)

\(f^{-1}(\{x,y\})=\{1,2,3\}\)

\(f^{-1}(\{z\})=\empty\)
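
A finite function can be modeled as a Python dict, and image/preimage become set comprehensions. A sketch mirroring the diagram above (the `image` and `preimage` helpers are our own):

```python
f = {1: 'x', 2: 'x', 3: 'y'}           # f(1)=f(2)=x, f(3)=y

def image(f, P):
    return {f[p] for p in P}           # f(P) = {f(p) | p in P}

def preimage(f, Q):
    return {a for a in f if f[a] in Q} # f^{-1}(Q) = {a | f(a) in Q}

print(image(f, {1, 3}))        # {'x', 'y'}
print(preimage(f, {'x'}))      # {1, 2}
print(preimage(f, {'z'}))      # set(), the empty set
```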

Remark:

  • In general, we have \[ f(f^{-1}(B))\subseteq B \ and\ f^{-1}(f(A))\supseteq A \]

    Example

    graph TB
    subgraph X
    1
    2
    3
    end
    1-->a
    2-->b
    3-->c
    subgraph Y
    a
    b
    c
    d
    e
    end
    

    We have \(B=\{a,b,c,d\}\)

    thus \(f^{-1}(B)=\{1,2,3\}\)

    then \(f(f^{-1}(B))=\{f(1),f(2),f(3)\}=\{a,b,c\}\subseteq B\)

    graph TB
    subgraph X
    1
    2
    3
    end
    1-->a
    2-->b
    3-->b
    subgraph Y
    a
    b
    end
    

    \(A=\{1,3\}\)

    \(f(A)=\{a,b\}\)

    \(f^{-1}(f(A))=f^{-1}(\{a,b\})=\{1,2,3\}\supseteq A\)

  • \(f(\empty)=\empty,f^{-1}(\empty)=\empty\)

  • For a bijective function \(f:A\to B\), the preimage of a singleton set \(\{f(x)\}\subseteq B\) is a singleton set \(\{x\}\subseteq A\)

Theorem:

  • Let \(f:A\to B,C\subseteq A,S\subseteq B\), then \(f(C)\subseteq S\) if and only if \(C\subseteq f^{-1}(S)\)

    Proof:

    If \(f(C)\subseteq S\),

    \(\forall x\in C\), then \(f(x)\in S\)

    Thus \(x\in f^{-1}(S)\) and \(C\subseteq f^{-1}(S)\)

    If \(C\subseteq f^{-1}(S)\)

    \(\forall y\in f(C)\), then \(\exist x\in C\) such that \(f(x)=y\)

    Since \(C\subseteq f^{-1}(S)\)

    \(x\in f^{-1}(S)\), so \(f(x)\in S\), i.e. \(y\in S\); hence \(f(C)\subseteq S\)

  • If \(A_1,A_2\) are two subsets of A and \(A_1\subseteq A_2\), then \(f(A_1)\subseteq f(A_2)\)

    Note that \(f(A_1)\subseteq f(A_2)\) does not imply \(A_1\subseteq A_2\)

  • \(f:A\to B\), if \(S,T\subseteq B\) and \(S\subseteq T\), then \(f^{-1}(S)\subseteq f^{-1}(T)\)

    Note that \(f^{-1}(S)\subseteq f^{-1}(T)\) does not imply \(S\subseteq T\)

    Proof:

    \(\forall x\in f^{-1}(S)\implies f(x)\in S\subseteq T\)

    thus \(f(x)\in T, x\in f^{-1}(T)\)

    Thus \(f^{-1}(S)\subseteq f^{-1}(T)\)

Theorem:

Let \(f:A\to B, C\subseteq A, D\subseteq B\)

  • \(x\in C\implies f(x)\in f(C)\)

  • \(y\in f(C)\implies \exist x\in C\ \text{s.t.}\ f(x)=y\)

  • \(x\in f^{-1}(D)\Longleftrightarrow f(x)\in D, x\in A\)


Theorem:

Let \(f:A\to B\), and let \(C\subseteq A, D\subseteq B\). Further, let \(C_1, C_2\subseteq A\) and \(D_1,D_2\subseteq B\), then

  • \(C\subseteq f^{-1}(f(C))\) --> Not necessarily equal

    The equality holds for all subsets C of A if and only if \(f\) is injective

    Proof:

    If \(f\) is injective, we need to show \(f^{-1}(f(C))\subseteq C\) and \(C\subseteq f^{-1}(f(C))\)

    \(\forall x \in C, f(x)\in f(C)\)

    thus \(x\in f^{-1}(f(C))\) then \(C\subseteq f^{-1}(f(C))\)

    Also, \(\forall x\in f^{-1}(f(C))\), we have \(f(x)\in f(C)\)

    since \(f\) is injective, we have \(x\in C\)

    Thus, \(f^{-1}(f(C))\subseteq C\)

    If \(f^{-1}(f(C))\subseteq C\) for all subsets C, we need to show \(f\) is injective. Prove by contrapositive: assume \(f\) is not injective

    then, \(\exist x_1, x_2\in A\) and \(x_1≠x_2\) but \(f(x_1)=f(x_2)=a\)

    then, letting \(C=\{x_1\}\), we have \(f^{-1}(f(C))=f^{-1}(\{a\})\supseteq\{x_1,x_2\}≠C\)

  • \(f(f^{-1}(D))\subseteq D\) --> Not necessarily equal

    The equality holds for all subsets D of B if and only if \(f\) is surjective

  • \(f(C_1\cap C_2)\subseteq f(C_1)\cap f(C_2)\) --> Not necessarily equal

    The equality holds for all subsets \(C_1,C_2\) of A if and only if \(f\) is injective

  • \(f(C_1\cup C_2)=f(C_1)\cup f(C_2)\)

    Proof:

    \(\Longrightarrow\) \(\forall y\in f(C_1\cup C_2)\), we have \(\exist x\in C_1\cup C_2\) such that \(f(x)=y\)

    ​ thus, we have \(x\in C_1, f(x)= y\) or \(x\in C_2, f(x)=y\), implies that \(y\in f(C_1)\) or \(y\in f(C_2)\)

    ​ so we have \(y\in f(C_1)\cup f(C_2)\)

    \(\Longleftarrow\) \(\forall y\in f(C_1)\cup f(C_2)\),

    then we have \(y\in f(C_1)\) or \(y\in f(C_2)\), which implies that \(\exist x_1\in C_1\) such that \(f(x_1)=y\) or \(\exist x_2\in C_2\) such that \(f(x_2)=y\)

    thus, \(y\in f(C_1\cup C_2)\)

  • \(f^{-1}(D_1\cap D_2)=f^{-1}(D_1)\cap f^{-1}(D_2)\)

  • \(f^{-1}(D_1\cup D_2)=f^{-1}(D_1)\cup f^{-1}(D_2)\)

    Proof:

    We have:

    \(D_1\subseteq D_1\cup D_2\) and \(D_2\subseteq D_1\cup D_2\),implies that:

    \(f^{-1}(D_1)\subseteq f^{-1}(D_1\cup D_2)\) and \(f^{-1}(D_2)\subseteq f^{-1}(D_1\cup D_2)\)

    thus, \(f^{-1}(D_1)\cup f^{-1}(D_2)\subseteq f^{-1}(D_1\cup D_2)\)

    At the same time,

    \(\forall x\in f^{-1}(D_1\cup D_2)\implies f(x)\in D_1\cup D_2\)

    \(f(x)\in D_1\) or \(f(x)\in D_2\)

    then \(x\in f^{-1}(D_1)\) or \(x\in f^{-1}(D_2)\)

    \(x\in f^{-1}(D_1)\cup f^{-1}(D_2)\)

    \(f^{-1}(D_1\cup D_2)\subseteq f^{-1}(D_1)\cup f^{-1}(D_2)\)

Example

  • Let the function \(f:\R\to \R\) be defined by \(f(x)=x^2\). Let \(A=[1,2]\); then \(B=f(A)=[1,4]\), and \(f^{-1}(B)=f^{-1}([1,4])=[-2,-1]\cup[1,2]\supseteq A\)
  • Let the function \(f:\R\to \R\) be defined by \(f(x)=x^2+1\). Let \(B=[0,2]\) and notice that \(A=f^{-1}(B)=[-1,1]\). Then \(f(A)=f([-1,1])=[1,2]\subseteq B\)

Theorem:

Let \(f:X\to Y\), with \(A_i\subseteq X\) and \(B_i\subseteq Y\) for all \(i\in I\). Then:

  • \(f(\bigcup_{i\in I}A_i)=\bigcup_{i\in I}f(A_i)\)
  • \(f(\bigcap_{i\in I}A_i)\subseteq \bigcap_{i\in I}f(A_i)\)
  • \(f^{-1}(\bigcup_{i\in I}B_i)=\bigcup_{i\in I}f^{-1}(B_i)\)
  • \(f^{-1}(\bigcap_{i\in I}B_i)=\bigcap_{i\in I}f^{-1}(B_i)\)

Metric Spaces and Topology

2.1 Metric Spaces

A metric space (X,d) is a set X together with a well-defined metric d.

Distance

Definition

A metric on a set \(X\) is a function \(d:X\times X\to \R\) satisfying the following properties:

  • \(d(x,y)\geq 0, \forall x, y\in X\); equality holds if and only if \(x=y\)
  • \(d(x,y)=d(y,x),\forall x, y\in X\)
  • \(d(x,y)+d(y,z)\geq d(x,z),\forall x,y,z\in X\)

Then \(d(x,y)\) is called distance between points x and y.

Example

The set of real numbers equipped with the metric of absolute distance \(d(x,y)=|x-y|\) defines the standard metric space of real numbers \(\R\).

L2 Distance / Euclidean Distance

Given \(x=(x_1,\dots,x_n),y=(y_1,\dots,y_n)\in \R^n\), note that here \(x\) are column vectors \[ d_2(x,y)=\sqrt{ \sum_{i=1}^{n}(x_i-y_i)^2} \] Proof:

Remark: Cauchy-Schwarz inequality

For column vectors \(\vec p=\left[\begin{matrix}x_1 \\ x_2 \\ \vdots\\x_n\end{matrix}\right],\vec q=\left[\begin{matrix}y_1 \\ y_2 \\ \vdots\\y_n\end{matrix}\right]\)

we have \[ \vec{p} \cdot \vec{q} = |\vec p|\cdot|\vec q|\cdot \cos\theta \\ |\vec p\cdot \vec q|\leq |\vec p||\vec q|\\ |\vec p\cdot \vec q|^2\leq |\vec p|^2\cdot|\vec q|^2 \]

  1. \(d_2(\vec x,\vec y)\geq 0,d_2(\vec x,\vec y)=0\implies x_i=y_i\forall i=1,\dots,n\implies \vec x=\vec y\)

  2. \(d_2(\vec x,\vec y)=d_2(\vec y,\vec x)\)

  3. Prove \(d_2(\vec x,\vec y)\leq d_2(\vec x,\vec z)+d_2(\vec z,\vec y)\)

    let \(\vec p=\vec x - \vec z,\vec q=\vec z-\vec y,\vec p+\vec q=\vec x-\vec y\)

    image-20211110220717113

    \[ d_2(\vec x,\vec y)^2=|\vec x-\vec y|^2=|\vec p+\vec q|^2\\ =|\vec p|^2+2\vec p\cdot\vec q+|\vec q|^2\\ \leq |\vec p|^2 +2|\vec p||\vec q|+|\vec q|^2 \quad \text{(by the Cauchy–Schwarz inequality)}\\ =(|\vec p|+|\vec q|)^2=(d_2(\vec x,\vec z)+d_2(\vec z,\vec y))^2\\ \implies d_2(\vec x,\vec y)\leq d_2(\vec x,\vec z)+d_2(\vec z,\vec y) \]

L1 Distance / Manhattan Distance

Given \(x=(x_1,\dots,x_n),y=(y_1,\dots,y_n)\in \R^n\), note that here \(x\) and \(y\) are column vectors \[ d_1(x,y)=\sum_{i=1}^{n}|x_i-y_i| \]

L Infinity / Chebyshev Distance

Given \(x=(x_1,\dots,x_n),y=(y_1,\dots,y_n)\in \R^n\), note that here \(x\) and \(y\) are column vectors \[ d_{\infty}(x,y)=\max_{1\leq i\leq n}|x_i-y_i| \]

L0 Distance / Discrete Distance

Let \(Z\) be any non-empty set whatsoever, and \[ d(x,y)=\begin{cases}1& x≠y\\0& x=y\end{cases} \] \(d(x,y)\) is called discrete distance on \(Z\)

Proof:

  1. \(d(x,y)=0\Longleftrightarrow x=y\)

  2. \(d(x,y)=d(y,x)\)

  3. Prove \(d(x,y)\leq d(x,z)+d(z,y)\)

    1. When \(x=y=z\): \(0\leq 0+0\)

    2. When \(x≠y\) and \(z=x\) or \(z=y\): \(1\leq 0+1\)

    3. When \(x,y,z\) are all distinct: \(1\leq 1+1\)

    Thus, the triangle inequality holds.

Jaccard Distance

\(d_J(A,B)=1-\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cup B|-|A\cap B|}{|A\cup B|}=\frac{|A\bigtriangleup B|}{|A\cup B|}\)

  1. \(d_J(A,B)\geq 0,d(A,B)=0\ iff\ A=B\)

  2. \(d_J(A,B)=d_J(B,A)\)

  3. Prove \(d_J(A,B)\leq d_J(A,C)+d_J(B,C)\)

    When A=B=C --> True

    When \(C\subseteq A, C\subseteq B\)

    \(d_J(C,A)+d_J(C,B)=\frac{|C\cup A|-|C\cap A|}{|C\cup A|}+\frac{|C\cup B|-|C\cap B|}{|C\cup B|}=\frac{|A-C|}{|A|}+\frac{|B-C|}{|B|}\geq \frac{|A-C|+|B-C|}{|A\cup B|}\geq\frac{|A\bigtriangleup B|}{|A\cup B|}=d_J(A,B)\)

    The remaining cases can be checked similarly.

Cosine Distance (Not distance)

\(d_{cos}(a,b)=1-\frac{\vec a\cdot \vec b}{||\vec a||||\vec b||}=1-\frac{\sum a_i b_i}{||\vec a||||\vec b||}\)

  1. \(d_{cos}(\vec a,\vec b)\geq 0\), but \(d_{cos}(\vec a,\vec b)=0\) holds whenever \(\vec a\) and \(\vec b\) point in the same direction (\(\cos\theta=1\)), not only when \(\vec a=\vec b\); the "if and only if" fails --> False
  2. \(d_{cos}(\vec a,\vec b)=d_{cos}(\vec b,\vec a)\)
  3. Triangle inequality --> False, by counterexample
Example of Distances
  • Given \(x=(0,0)^T,y=(2,1)^T\)

    Then

    \(d_2(x,y)=\sqrt{2^2+1^2}=\sqrt 5\)

    \(d_1(x,y)=|2|+|1|=3\)

    \(d_\infty(x,y)=max(2,1)=2\)

  • Given an interval [a,b], let \(C[a,b]\) be the set of all continuous functions on [a,b]. Show that \[ \rho(x(t),y(t))=\max_{a\leq t\leq b}|x(t)-y(t)| \] is a distance between two functions \(x(t), y(t)\).

    Sol

    1. \(\rho(x(t),y(t))\geq 0\); and \(\rho(x(t),y(t))=\max_{a\leq t\leq b}|x(t)-y(t)|=0\implies x(t)=y(t)\ \forall t\)

    2. \(\rho (x(t),y(t))=\rho(y(t),x(t))\)

    3. Note that \(|a+b|\leq|a|+|b|\) \[ \rho(x(t),y(t))=\max_{a\leq t\leq b}|x(t)-y(t)|\\=\max_{a\leq t\leq b}|x(t)-z(t)+z(t)-y(t)|\\ \leq \max_{a\leq t\leq b}(|x(t)-z(t)|+|z(t)-y(t)|)\\ \leq\rho(x,z)+\rho(z,y) \]
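
A minimal sketch of the three \(\R^n\) metrics above in plain Python (helper names are our own), reproducing the values from the first bullet:

```python
import math

def d2(x, y):    # Euclidean / L2
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d1(x, y):    # Manhattan / L1
    return sum(abs(a - b) for a, b in zip(x, y))

def dinf(x, y):  # Chebyshev / L-infinity
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0, 0), (2, 1)
print(d2(x, y), d1(x, y), dinf(x, y))   # sqrt(5), 3, 2
```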

Linear Algebra

3.1 Fields

Abelian Group

Consider a set G of objects and one operation \(\cdot\) on the elements of G; the set with the operation, \((G,\cdot)\), is called an Abelian group / commutative group if:

Closure: \(\forall a,b\in G\), then \(a\cdot b\in G\)

Associativity: \(\forall a,b,c\in G\), then \((a\cdot b)\cdot c=a\cdot(b\cdot c)\)

Identity element: \(\exist e\in G\) such that \(a\cdot e=a\) for \(\forall a\in G\)

Inverse element: \(\forall a\in G,\exist b\in G\) such that \(a\cdot b=e\)

Commutativity: \(\forall a,b\in G\) then \(a\cdot b=b\cdot a\)

Example

  • The integers with the operation addition "+", denoted \((\Z, +)\), form an Abelian group, while \((\Z,\times)\) is not a group, since most integers have no multiplicative inverse in \(\Z\)

Field

Consider a set F of objects and two operations on the elements of F, addition \(+\) and multiplication \(\cdot\). The set with the two operations \((F,+,\cdot)\) is a field if:

Closure: \(\forall a,b\in F\) then \(a+b\in F,a\cdot b\in F\)

Associativity: \(\forall a,b,c\in F\) then \((a+b)+c=a+(b+c),(a\cdot b)\cdot c=a\cdot (b\cdot c)\)

Identity element: \(\exist 0,1\in F\) such that \(a+0=a,a\cdot 1=a\) for \(\forall a\in F\)

Additive Inverse: \(\forall a\in F\), there exists a unique element \(-a\in F\) such that \(a+(-a)=0\)

Multiplicative Inverse: \(\forall a≠0\in F\), there exists a unique element \(a^{-1}\in F\) such that \(a\cdot a^{-1}=1\)

Distributivity of multiplication over addition: \(\forall a,b,c\in F\), \(a\cdot(b+c)=a\cdot b+a\cdot c\)

Remark

If there is no inverse element for multiplication, \((F,+,\cdot)\) is called a commutative ring.

3.2 Vector Space

Vector Space

A vector space (essentially a set) consists of the following:

  1. A field \(F\) of scalars
  2. A set \(V\) of objects
  3. A binary operation called vector addition, which maps any pair of vectors \(\vec v,\vec w\in V\) to a vector \(\vec v+\vec w\in V\) such that:
    1. Commutative: \(v+w=w+v\)
    2. Associative: \(u+(v+w)=(u+v)+w\)
    3. There is a unique vector \(\vec 0\in V\) such that \(v+\vec 0=v,\forall v\in V\)
    4. To each \(v\in V\), there is a unique vector \(-v\in V\), such that \(v+(-v)=\vec 0\)
  4. An operation called scalar multiplication, which associates with each \(s\in F\) and \(\vec v\in V\) a vector \(s\vec v\in V\) such that:
    1. \(1v=v,\forall v\in V\)
    2. \((s_1s_2)\vec v=s_1(s_2\vec v)\)
    3. \(s(\vec v+\vec w)=s\vec v+s\vec w\)
    4. \((s_1+s_2)\vec v=s_1\vec v+s_2\vec v\)

Example

Standard vector space of functions:

Let \(X\) be a non-empty set and \(Y\) be a vector space over \(F\). Consider the set \(V\) of all functions mapping from \(X\) to \(Y\).

The vector addition of two functions \(f,g\in V\) is the function from \(X\) into \(Y\) denoted by: \[ (f+g)(x)=f(x)+g(x) \] where the RHS uses vector addition from \(Y\).

The scalar product of \(s\in F\) and the function \(f\in V\) is the function \(sf\) denoted by : \[ (sf)(x)=sf(x),\forall x\in X \] where the RHS uses scalar mulitiplication from \(Y\).

Check whether \(V\) is a vector space:

  • \((f+0)(x)=f(x)+0(x)=f(x)\), where \(0(x)\) is the zero function
  • \(f(x)+(-f(x))=0(x)\)

Subspace

Definition

Let \(V\) be a vector space over \(F\). A subspace of \(V\) is a subset \(W\subseteq V\) which is itself a vector space over F (with the same operations).

Theorem:

A non-empty subset \(W\subseteq V\) is a subspace of \(V\) if \(W\) satisfies the following conditions:

  • \(\vec 0\in W\)
  • \(\forall \vec u,\vec v\in W,\vec u+\vec v\in W\)
  • \(\forall s\in F,\vec u\in W,s\vec u\in W\)

Example

  • Let \(A\) be an \(m\times n\) matrix over \(F\). The set \(W\) of all \(n\times 1\) column vectors \(\vec v\) such that \(A\vec v=\vec 0\) is a subspace of \(F^{n\times 1}\)

    Proof:

    \(W=\{\vec v|A\vec v=\vec 0\}\)

    1. \(A\vec 0=\vec 0\), so \(\vec 0\in W\)
    2. \(A(\vec v_1+\vec v_2)=A\vec v_1+A\vec v_2=\vec 0\), so \(\vec v_1+\vec v_2\in W\)
    3. \(A(x\vec v)=x(A\vec v)=\vec 0\), so \(x\vec v\in W\)
  • The subspaces of \(\R^2\) are precisely \(\{\vec 0\}\), \(\R^2\), and all lines in \(\R^2\) through the origin.

  • Prove that \(V=\{(x_1,x_2,x_3,x_4)^T\in F^4|x_4=5x_3+b\}\), where \(b\in F\), is a subspace of \(F^4\) if and only if \(b=0\)

    Sol:

    If \(b=0\), then \(V=\{(x_1,x_2,x_3,x_4)^T\in F^4|x_4=5x_3\}\)

    1. \((0,0,0,0)^T\in V\)
    2. \(\vec v_1=[x_1,x_2,x_3,x_4]^T,\vec v_2=[y_1,y_2,y_3,y_4]^T,\vec v_1+\vec v_2=[x_1+y_1,x_2+y_2,x_3+y_3,x_4+y_4]^T\), and we have \(x_4=5x_3,y_4=5y_3\implies x_4+y_4=5(x_3+y_3)\)
    3. For \(\lambda \vec v_1\): since \(x_4=5x_3\), we get \(\lambda x_4=5(\lambda x_3)\)

    If \(V\) is a subspace,

    then \(\vec 0\in V\), so \(0=5\cdot 0+b\), giving \(b=0\)

Remark:

  • \(\{\vec 0\}\) is the smallest subspace of \(V\)
  • \(V\) itself is the largest subspace of \(V\)
  • The empty set is NOT a subspace of \(V\) since a vector space must contain at least one element.
Sum Of Subspaces

Suppose \(U_1,U_2,U_3,...,U_m\) are subspaces of \(V\). The sum of subspaces is the set of all possible sums of elements of \(U_1,...,U_m\): \[ U_1+\cdots+U_m=\{u_1+\cdots+u_m|u_1\in U_1,\cdots, u_m\in U_m\} \]

Example

  • \(U=\{(x,0,0)^T|x\in F\}\), \(W=\{(0,y,0)^T|y\in F\}\); then \(U+W=\{(x,y,0)^T|x,y\in F\}\)

  • For \(U=\{(x,y,y)\in F^3|x,y\in F\}\), \(W=\{(x,x,y)\in F^3|x,y\in F\}\), prove that \(U+W=\{(x,y,z)\in F^3|x,y,z\in F\}=F^3\).

    Sol:

    We have the following decomposition:

    \((x,y,z)=(0,0,z-y)+(x,y,y)\), with \((0,0,z-y)\in W\) and \((x,y,y)\in U\);

    for example, \((1,1,1)=(1,1,1)+(0,0,0)\) with \((1,1,1)\in W\) and \((0,0,0)\in U\)

Remark: Sum of subspaces in the theory of vector spaces are analogous to unions of subsets in set theory.

Theorem:

Suppose \(U_1,U_2,...,U_m\) are subspaces of \(V\). Then \(U_1+\cdots+U_m\) is the smallest subspace of \(V\) containing \(U_1,...,U_m\)

Proof:

\(\vec 0\in U_1+U_2+\cdots+U_m\), and \(U_1+U_2+\cdots+U_m\) is closed under addition and scalar multiplication; thus it is a subspace.

At the same time,

we have \(U_1,U_2,...U_m\subseteq U_1+U_2+\cdots+U_m\).

And conversely, every subspace of \(V\) containing \(U_1,U_2,...,U_m\) contains \(U_1+U_2+\cdots +U_m\), because subspaces are closed under addition. Thus, \(U_1+U_2+\cdots +U_m\) is the smallest such subspace.

Direct Sum

Suppose \(U_1,U_2,...,U_m\) are subspaces of \(V\). The sum \(U_1+\cdots+U_m\) is called a direct sum, denoted by \(U_1\oplus\cdots \oplus U_m\), if each element \(u\in U_1+\cdots +U_m\) can be written in only one way (decomposed uniquely) as \(u=u_1+\cdots+u_m\) where \(u_j\in U_j\)

Theorem:

Suppose \(U_1,U_2,...,U_m\) are subspaces of \(V\). The sum \(U_1+\cdots+U_m\) is a direct sum if and only if the only way to write \(\vec 0\) as sum \(u_1+\cdots+u_m\) with \(u_j\in U_j\), is by taking each \(u_j\) equal to \(\vec0\)

\(\implies\) Immediate from the definition of direct sum.

\(\Longleftarrow\) Suppose some \(u\in U_1+\cdots+U_m\) could be represented in two ways:

\(u=u_1+\cdots+u_m=u_1'+\cdots+u_m'\), with \(u_i,u_i'\in U_i\)

Then \((u_1-u_1')+\cdots+(u_m-u_m')=\vec 0\)

Since \(\vec 0\) can only be written as \(0+0+\cdots+0\),

we have \(u_i=u_i'\)

Example

  • Let \(U=\{(x,y,0)\in F^3:x,y\in F\}\), \(W=\{(0,0,z)\in F^3:z\in F\}\); show that \(F^3=U\oplus W\)

    Sol:

    1. Show \(F^3=U+W\)

      \(\forall (x,y,z)\in F^3\): \((x,y,z)=(x,y,0)+(0,0,z)\), with \((x,y,0)\in U\) and \((0,0,z)\in W\)

    2. Show the decomposition is unique

      If \((x,y,z)=(x_1,y_1,0)+(0,0,z_1)=(x_2,y_2,0)+(0,0,z_2)\)

      then \((x_1-x_2,y_1-y_2,0)=(0,0,z_2-z_1)\)

      then \(x_1-x_2=0,y_1-y_2=0,z_2-z_1=0\)

  • Let \(U_1=\{(x,y,0)\in F^3:x,y\in F\}\\U_2=\{(0,0,z)\in F^3:z\in F\}\\U_3=\{(0,y,y)\in F^3:y\in F\}\), then is \(U_1+U_2+U_3\) a direct sum?

Sol:

\((x,y,z)=(x,y,0)+(0,0,z)+(0,0,0)\\(0,0,0)=(0,1,0)+(0,0,1)+(0,-1,-1)=(0,0,0)+(0,0,0)+(0,0,0)\)

Thus, NO

Theorem:

Suppose \(U\) and W are subspaces of \(V\). Then \(U+W\) is a direct sum if and only if \(U\cap W=\{\vec 0\}\).

Note that the result above deals only with the case of two subspaces.

Proof:

If \(U+W\) is a direct sum, then \(\forall v\in U\cap W,-v\in U\cap W\),

\(\vec 0=v+(-v),v\in U,(-v)\in W\)

then by the unique decomposition, we have \(v=0\implies U\cap W=\{\vec 0\}\)

If \(U\cap W=\{\vec 0\}\)

suppose \(\vec 0=u+w\), with \(u\in U,w\in W\), we have \(u=-w\)

Then \(w=-u\in U,u=-w\in W\), thus \(u,w\in U\cap W\)

thus \(u=w=0\)

\(U+W\) is a direct sum.

Example

we have \(U_1=\{(x,y,0)\in F^3|x,y\in F\}\\U_2=\{(0,0,z)\in F^3|z\in F\}\\U_3=\{(0,y,y)\in F^3|y\in F\}\)

then \(U_1\cap U_2 \cap U_3=\{0\}\), but \(U_1+U_2+U_3\) is not a direct sum because

\((0,0,0)=(0,1,0)+(0,0,1)+(0,-1,-1)=(0,0,0)+(0,0,0)+(0,0,0)\)

Span

Linear Combination

A vector \(w\in V\) is said to be a linear combination of vectors \(v_1,...,v_n\in V\) provided that there exist scalars \(s_1,...,s_n\in F\) such that \[ w=\sum_{i=1}^{n}s_iv_i \] Let \(S\) and \(T\) be two subsets of a vector space \(V\). S and T are said to be equivalent if every element of S is a linear combination of elements of T, and every element of T is a linear combination of elements of S.

Example

  • \((17,-4,2)^T=6(2,1,-3)^T+5(1,-2,4)^T\)
  • But for \((17,-4,5)\) there do not exist numbers \(a_1,a_2\in F\) such that \((17,-4,5)^T=a_1(2,1,-3)^T+a_2(1,-2,4)^T\)
Span

Let \(U\) be a list or set of vectors in \(V\). Then the span of \(U\),\(span(U)\) is defined to be the set of all finite linear combinations of the vectors in U: \[ span(U)=\{v|v=\sum\lambda_i u_i,\lambda_i\in F,u_i\in U\} \]

Remark:

\(\vec 0\in span(U)\)

Example

  • \((17,-4,2)^T\in span((2,1,-3)^T,(1,-2,4)^T)\) while \((17,-4,5)^T\not \in span((2,1,-3)^T,(1,-2,4)^T)\)
  • Span of empty set: \(\{0\}\)

Theorem:

For a vector space \(V\), the span of any list or set of vectors in \(V\) forms a subspace.

Example

For \(U=\{\vec e_1,\vec e_2\}\), where \(\vec e_1=(1,0)^T,\vec e_2=(0,1)^T\)

\(span(U)=\R^2\)

Theorem:

The span of a list of vectors \(U\) in V is the intersection of all subspaces of \(V\) that contain \(U\); that is, it is the smallest subspace of V containing U.

Proof:

Let \(v_1,v_2,...,v_m\) be a list of vectors in V

  1. Show that \(span(v_1,...,v_m)\) is a subspace.

    1. \(\vec 0=0\cdot v_1+\cdots+0\cdot v_m\in span(v_1,...,v_m)\)

    2. \((a_1v_1+\cdots+a_mv_m)+(b_1v_1+\cdots+b_mv_m)=(a_1+b_1)v_1+\cdots+(a_m+b_m)v_m\in span(v_1,...,v_m)\)

    3. \(\lambda (a_1v_1+\cdots+a_mv_m)=\lambda a_1v_1+\cdots+\lambda a_mv_m\in span(v_1,...,v_m)\)

  2. Show it is the smallest one

    \(\forall \vec v_j, v_j\in span(v_1,...,v_m)\)

    For any subspace \(U\subseteq V\) with \(v_j\in U,j=1,...,m\), the linear combination \(\sum_{j=1}^{m}\lambda_j v_j\in U\)

If \(span(v_1,...,v_m)\) equals \(V\), we say that \(v_1,...,v_m\) spans \(V\).

Example

\((1,0,\cdots,0)^T,(0,1,\cdots,0)^T,\cdots,(0,0,\cdots,1)^T\) spans \(F^n\)

A vector space is called finite-dimensional if some finite number of vectors span the space.

A vector space is called infinite-dimensional if it is not finite-dimensional.

Example

A function \(p:F\to F\) is called a polynomial with coefficients in \(F\) if there exist \(a_0,\cdots,a_m\in F\) such that, \[ p(z)=a_0+a_1z+\cdots+a_mz^m \] for all \(z\in F\)

\(P(F)\), the set of all polynomials with coefficients in F, is infinite-dimensional.

\(P_m(F)\), the set of all polynomials with coefficients in F and degree at most m, is a finite-dimensional vector space for each non-negative integer m.

Linear Dependence and Independence

Let V be a vector space over F. A list of vectors \(u_1,\cdots,u_n\in V\) is called linearly dependent if there are scalars \(s_1,...,s_n\in F\), not all equal to 0, such that \(\sum_{i=1}^{n}s_iu_i=0\)

  • A list that is not linearly dependent is called linearly independent
  • A subset \(U\subseteq V\) is called linearly dependent if there is a finite list \(u_1,...,u_n\in U\) of distinct vectors that is linearly dependent.

Example

  • For \(V=\R^4\), the vectors \(v_1=(1,1,0,0),v_2=(0,1,1,0),v_3=(0,0,1,1)\) are linearly independent.

Remark:

  • Any subset of a linearly independent set is also linearly independent
  • Any set which contains \(\vec 0\) is linearly dependent.
  • A set \(U\subseteq V\) is linearly independent if and only if each finite subset of U is linearly independent.
Steinitz Exchange Lemma

Let \(V\) be a vector space over \(F\). Suppose that \(\alpha_1,\cdots,\alpha_s\) is a linearly independent set of vectors in \(V\), each of which is a linear combination of \(\beta_1,\cdots,\beta_t\in V\) (the \(\beta\) list carries at least as much information). Then

  • \(s\leq t\)
  • \(\alpha_1,\cdots,\alpha_s\) can replace \(s\) of the vectors \(\beta_1,\cdots,\beta_t\): after reordering, \(span(\alpha_1,\cdots,\alpha_s,\beta_{s+1},\cdots,\beta_t)=span(\beta_1,\cdots,\beta_t)\)

Back to Dimension

Example

\(\vec\alpha_1=[1,0]^T,\vec\alpha_2=[0,1]^T\\ \vec\beta_1=[0,1]^T,\vec\beta_2=[1,1]^T,\vec\beta_3=[2,3]^T\)

Then \((\vec\alpha_1,\vec\alpha_2,\vec\beta_3)\) is equivalent with \((\vec\beta_1,\vec\beta_2,\vec\beta_3)\) --> How to Check Equivalence?

Basis

A basis of \(V\) is a list of vectors in V that is linearly independent and spans V. The vector space is finite-dimensional if it has a finite basis.

Check a Basis
  1. \(V=span(v_1,\cdots,v_n)\)
  2. \(v_1,\cdots,v_n\) are linearly independent.

Example

Check that \((1,-1,0),(1,0,-1)\) is a basis of \(V=\{(x,y,z)^T\in F^3|x+y+z=0\}\). Note that all vectors are column vectors.

Sol:

  1. Check span

    Since \((x,y,z)=(-y-z,y,z)\), we have

    \((-y-z,y,z)=(-y,y,0)+(-z,0,z)=-y(1,-1,0)-z(1,0,-1)\)

  2. Check linear independence

    \(\lambda_1(1,-1,0)+\lambda_2(1,0,-1)=(0,0,0)=(\lambda_1+\lambda_2,-\lambda_1,-\lambda_2)=(0,0,0)\implies \lambda_1=\lambda_2=0\)
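
Both conditions can also be checked numerically. A sketch assuming numpy is available: linear independence corresponds to full column rank of the matrix whose columns are the given vectors:

```python
import numpy as np

v1, v2 = np.array([1, -1, 0]), np.array([1, 0, -1])
M = np.column_stack([v1, v2])
print(np.linalg.matrix_rank(M) == 2)   # True: linearly independent

# Spot check of the span computation: for (x,y,z) = (3,-1,-2) in V,
# the coefficients are -y = 1 and -z = 2.
print(np.allclose(1 * v1 + 2 * v2, [3, -1, -2]))   # True
```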

Remark:

Basis of a vector space V contains:

  • The most vectors that are linearly independent.
  • The fewest vectors that span the vector space V.

Theorem: Every spanning list in a vector space can be reduced to form a basis of the vector space.

Proof:

If \(V=span(\vec v_1,\vec v_2,...,\vec v_n)\), we want to remove some vectors to form a basis of \(V\).

Let \(B=\{\vec v_1,...,\vec v_n\}\)

Step 1: If \(\vec v_1=\vec 0\), then remove \(\vec v_1\), else keep \(\vec v_{1}\).

Step 2: If \(\vec v_j\) is in \(span(\vec v_1,...,\vec v_{j-1})\), delete \(\vec v_{j}\) from B

​ If \(\vec v_j\) is not in \(span(\vec v_1,...,\vec v_{j-1})\), keep \(\vec v_{j}\) In B

Finally we obtain a linearly independent list of the kept vectors whose span is still \(span(\vec v_1,...,\vec v_n)=V\), i.e. a basis.

Example

\(V=span\{(1,1),(0,1),(1,0)\}\), we want to find the basis of V.

  1. Check \((1,1)\) is not \(\vec 0\).
  2. Check whether \((1,1)\) and \((0,1)\) are linearly independent --> Yes
  3. Check whether \(\{(1,1),(0,1)\}\) and \((1,0)\) are linearly independent --> No, since \((1,0)=(1,1)-(0,1)\)

Thus \(\{(1,1),(0,1)\}\) is a basis, and \(V=span((1,1),(0,1))\).

But, by changing the order of checking, we can find other bases, such as \(\{(1,0),(0,1)\}\).
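
The reduction procedure can be sketched numerically with numpy (the `reduce_to_basis` helper is our own): keep \(\vec v_j\) exactly when adding it increases the rank, i.e. when it is not already in the span of the vectors kept so far:

```python
import numpy as np

def reduce_to_basis(vectors):
    kept = []
    for v in vectors:
        candidate = kept + [np.asarray(v, dtype=float)]
        # v is independent of `kept` iff the stacked matrix has full rank.
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            kept = candidate
    return kept

print(reduce_to_basis([(1, 1), (0, 1), (1, 0)]))
# [array([1., 1.]), array([0., 1.])] -- (1,0) is dropped, as above
```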

Theorem:

Every vector space has a basis.

Theorem:

Every finite linearly independent list of vectors in a finite-dimensional vector space can be extended to a basis of the vector space.

Theorem:

Suppose \(V\) is finite-dimensional and \(U\) is a subspace of \(V\). Then there is a subspace \(W\) of \(V\) such that \(V=U\oplus W\).

Proof:

Since \(V\) is finite-dimensional, \(U\) is also finite-dimensional.

Let \(\vec u_1,...,\vec u_m\) be a basis of \(U\); we have \(span(\vec u_1,...,\vec u_m)=U\), and by the extension theorem above there exist \(\vec w_1,...,\vec w_n\in V\) such that \(\vec u_1,...,\vec u_m,\vec w_1,...,\vec w_n\) is a basis of \(V\).

Thus, we define \(W=span(\vec w_1,...,\vec w_n)\); then \(W\subseteq V\) is a subspace.

Next we prove \(U\oplus W=V\)

  1. Since \(\vec u_1,...,\vec u_m,\vec w_1,...,\vec w_n\) is the basis of \(V\),

    \(\forall \vec v\in V, \vec v=\lambda_1\vec u_1+\cdots+\lambda_m\vec u_m+\beta_1\vec w_1+\cdots+\beta_n\vec w_n=\vec u+\vec w\), where \(\vec u\in U,\vec w\in W\)

  2. \(\forall \vec v\in U\cap W\implies \vec v=\lambda_1\vec u_1+\cdots+\lambda_m\vec u_m=\beta_1\vec w_1+\cdots+\beta_n\vec w_n\implies \lambda_1\vec u_1+\cdots+\lambda_m\vec u_m-\beta_1\vec w_1-\cdots-\beta_n\vec w_n=0\)

    Since \(\vec u_1,...,\vec u_m,\vec w_1,...,\vec w_n\) is a basis of V, they are linearly independent.

    We have \(\lambda_1=\cdots=\lambda_m=\beta_1=\cdots=\beta_n=0\implies \vec v=0\)

Example

For \(V=P_5,U=span\{1,x^2\}\), find \(W\) such that \(V=U\oplus W\)

Sol:

Every \(p\in P_5\) has the form \(p=a_0+a_1x+a_2x^2+a_3x^3+a_4x^4+a_5x^5\),

which means \(V=span\{1,x,x^2,x^3,x^4,x^5\}\)

Since \(U=span\{1,x^2\}\), we can take \(W=span\{x,x^3,x^4,x^5\}\)

Dimension

By the Steinitz exchange lemma, if \(V\) is a finite-dimensional vector space, then any two bases of \(V\) have the same number of elements.

Definition

The dimension of a finite-dimensional vector space is the number of elements in any basis for \(V\), denoted by \(dim V\)

Theorem:

Suppose \(V\) is finite-dimensional; then every linearly independent list of vectors in \(V\) with length \(dim V\) is a basis for \(V\).

Theorem:

If \(V\) is finite-dimensional and \(U\) is a subspace of \(V\), then \(dim U\leq dim V\).

Example

Show that \(1,(x-5)^2,(x-5)^3\) is a basis of the subspace \(U\) of \(P_3(R)\) defined by \(U=\{p\in P_3(\R):p'(5)=0\}\).

Sol:

Expand \(p\) about \(x=5\): every \(p\in P_3(\R)\) can be written as \(p=a_0+a_1(x-5)+a_2(x-5)^2+a_3(x-5)^3\). Then \(p'(5)=a_1\), so \(p\in U\) if and only if \(a_1=0\), i.e.

\(U=span\{1,(x-5)^2,(x-5)^3\}\)

Since \(1,(x-5)^2,(x-5)^3\) have distinct degrees, they are linearly independent. Thus they are a basis of \(U\).

Theorem (Dimension of a sum)

If \(U_1\) and \(U_2\) are subspaces of a finite-dimensional vector space, then: \[ dim(U_1+U_2)=dim U_1+dim U_2-dim(U_1\cap U_2) \]

Suppose \(U_1,U_2,...,U_m\) are the subspaces of \(V\). The sum \(U_1+\cdots+U_m\) is a direct sum if and only if \[ dim(U_1+\cdots+U_m)=dimU_1+\cdots+dimU_m \]

Example

Let \(U=\{p\in P_4(F): p(6)=0\}\)

  1. Find the basis of U

Since \(p(6)=0\), \((x-6)\) is a factor of \(p\), thus \(p=g(x)(x-6)\in P_4\), where \(g\in P_3\) is spanned by \(1,x,x^2,x^3\) (the zero function is included by taking \(g=0\)). Finally we have \((x-6),x(x-6),x^2(x-6),x^3(x-6)\) as a basis.

General Method:

Every \(p\in P_4\) has the standard form \(p=a_0+a_1x+a_2x^2+a_3x^3+a_4x^4\)

For \(p(6)=0\) we have \(p(6)=a_0+a_1 6+a_2 6^2+a_3 6^3+a_4 6^4=0\); the solutions of this equation are spanned by:

\((-6,1,0,0,0)^T,(-6^2,0,1,0,0)^T,(-6^3,0,0,1,0)^T,(-6^4,0,0,0,1)^T\). Plugging these coefficient vectors in, we get the basis \(x-6,\ x^2-6^2,\ x^3-6^3,\ x^4-6^4\) of \(U\).

3.3 Matrices

Definition

An \(m\times n\) matrix over \(F\) is a rectangular array \(A=(a_{ij})\in F^{m\times n}\) of scalars \(a_{ij}\in F\), with \(m\) rows and \(n\) columns.
Matrix Multiplication

Let A be an \(m\times n\) matrix over F and let B be an \(n\times p\) Matrix over F. The matrix product AB is the \(m\times p\) matrix C: \[ c_{ij}=\sum_{k=1}^na_{ik}b_{kj} \]

Remark:

  • The j-th column vector \(\vec c_j\) of C is a linear combination of the columns of A

    \((A\cdot B)_j=\vec a_1 b_{1j}+\vec a_2 b_{2j}+\cdots+\vec a_nb_{nj}\), where the \(\vec a_k\) are the column vectors of A

  • The i-th row vector \(\vec c_i^*\) of C is a linear combination of the rows of B

    \((A\cdot B)_i=a_{i1}\vec b_1^*+a_{i2}\vec b_2^*+\cdots +a_{in}\vec b_n^*\), where the \(\vec b_k^*\) are the row vectors of B

  • \(Rank(A)=Rank(A^T)\)

Lemma:

If \(\vec a=\begin{bmatrix}a_1\\a_2\\ \vdots\\ a_m \end{bmatrix}\in F^m\) and \(\vec b^*=(b_1,b_2,...,b_n)\in F^n\) are two nonzero vectors and \(A=\vec a\,\vec b^*\), then \(Rank(A)=1\).

Note: This lemma holds because every column of A, namely \(b_j\vec a\), is a scalar multiple of \(\vec a\), so all columns can be represented by a single nonzero column.

Then for \(AB\), we have \(AB=\sum_{k=1}^n\vec a_k\vec b_k^*\), where \(\vec a_k=\begin{bmatrix}a_{1k}\\a_{2k}\\ \vdots\\ a_{mk} \end{bmatrix}_{m\times 1}, \vec b_k^*=(b_{k1},b_{k2},...,b_{kp})_{1\times p}\). For each k, \(Rank(\vec a_k \vec b_k^*)\leq 1\), so \(AB\) is a sum of n rank-one matrices.
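A quick numpy check of this rank-one (outer-product) expansion of the matrix product:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # 2x3
B = np.arange(12).reshape(3, 4)   # 3x4
# AB as a sum of n = 3 rank-one outer products a_k b_k^*
S = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.array_equal(S, A @ B)
```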

Theorem

Let \(A\) be an \(m\times n\) matrix over \(F\) and let \(B\) be an \(n\times p\) matrix over \(F\); then:

  • \(Rank(AB)\leq min\{Rank(A), Rank(B)\}\)
  • \(Rank(A+B)\leq Rank(A)+Rank(B)\) (for A and B of the same size)
  • \(Rank(A^TA)=Rank(AA^T)=Rank(A)=Rank(A^T)\) (over \(\R\); over \(\C\) use \(A^H\) in place of \(A^T\))
  • If \(P\in F^{m\times m}, Q\in F ^{n\times n}\) are invertible, then \(Rank(PAQ)=Rank(PA)=Rank(AQ)=Rank(A)\)

Echelon Forms

Row Echelon Form:

A matrix is in row echelon form if all zero rows are below every nonzero row, and the leading (pivot) entry of each nonzero row lies strictly to the right of the pivot of the row above it.

Reduced Row Echelon Form:

A matrix is in reduced row echelon form if, in addition, every pivot equals 1 and is the only nonzero entry in its column.

Remark:

An elementary matrix is a matrix obtained from the identity matrix by a single elementary row (or column) operation.

  • An elementary matrix is a square matrix
  • Multiplying an elementary matrix on the left of A performs an elementary row operation on A
  • Multiplying an elementary matrix on the right of A performs an elementary column operation on A
  • Every elementary matrix is invertible
  • Elementary operations do not change the rank of a matrix

Lemma:

For any \(m\times n\) matrix A over F, there is an invertible \(m\times m\) matrix \(P\) over \(F\) such that \(R=PA\) is in reduced row echelon form.

Remark:

  • This gives the factorization \(A=P^{-1}R\), where R is the reduced row echelon form.
  • If \(Rank(A)=r\) and \(R'\) is the matrix obtained by removing the zero rows of R, then \(A=CR'\), where \(C\in F^{m\times r}\) is the first \(r\) columns of \(P^{-1}\). (The zero rows of the RREF contribute nothing.) A code sketch follows.
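A sympy sketch of this \(A=CR'\) factorization. One can check that the first \(r\) columns of \(P^{-1}\) are exactly the pivot columns of \(A\), which is what the code uses:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3], [2, 4, 7], [3, 6, 10]])
R, pivots = A.rref()       # R: reduced row echelon form, pivots: pivot column indices
r = len(pivots)
Rp = R[:r, :]              # R': drop the zero rows of R
C = A[:, list(pivots)]     # pivot columns of A = first r columns of P^{-1}
assert C * Rp == A
```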

Lemma:

Let \(A\) be an \(m\times n\) matrix over \(F\) with \(m\lt n\) (#rows < #columns). Then there exists a length-n column vector \(\vec x≠0\) such that \(A\vec x=0\)

Remark: If the number of unknowns n is larger than the number of equations m, then \(A\vec x=0\) always has a nontrivial solution.

Theorem:

Let \(A\in F^{m\times n}\), with \(Rank(A)=r\), then there exist invertible matrix \(P\in F^{m\times m}\) and \(Q\in F^{n\times n}\) such that \(A=P\begin{bmatrix}I_{r\times r} & 0_{r\times (n-r) }\\ 0_{(m-r)\times r}&0_{(m-r)\times (n-r)} \end{bmatrix}Q\). (P as row operation, Q as column operation)

Corollary 1:

A matrix \(A\in F^{n\times n}\) is invertible if and only if it can be written as a product of elementary matrices.

Corollary 2:

A matrix \(A\in F^{n\times n}\) is invertible if and only if A can be transformed into the \(n\times n\) identity matrix by finitely many elementary operations: \(PA_{n\times n}Q=I_{n\times n}\)

Theorem (Rank Factorization): Let \(A\in F^{m\times n}\); then \(Rank(A)=r\geq 1\) if and only if there exist a full-column-rank matrix \(B\in F^{m\times r}\) and a full-row-rank matrix \(C\in F^{r\times n}\) such that \(A_{m\times n}=B_{m\times r}C_{r\times n}\) (B and C are NOT unique)

3.4 Linear Transformation

Definition

Let \(V\) and \(W\) be vector spaces. A linear transformation from \(V\) to \(W\) is a function \(T:V\to W\) with the following properties:

  • Additivity: \(T(u+v)=Tu+Tv,\forall u,v\in V\)
  • Homogeneity: \(T(\lambda v)=\lambda T(v),\forall \lambda\in F,\forall v\in V\)

Example

  • Let A be a fixed \(m\times n\) matrix over F. The function T defined by \(T(v)=Av\) is a linear transformation from \(F^n\to F^m\).

    \(\vec v_{n\times 1}\in V,A_{m\times n}\vec v_{n\times 1}=\vec w_{m\times 1}\in W\)

    \(T:F^n\to F^m,T\vec v=A\vec v\)

    graph LR
    subgraph F^n
    v
    end
    v--Tv-->Av
    subgraph F^m
    Av
    end
    
  • Let V be the subspace of continuous functions from \(\R\to\R\), and define \(T\) by \(Tf(x)=\int_0^xf(t)dt\). Then T is a linear transformation from V to V. The function \(Tf\) is continuous and differentiable.

    \(T(f_1+f_2)=\int_0^x(f_1(t)+f_2(t))dt=Tf_1(x)+Tf_2(x)\)

  • Zero map: \(0\in L(V,W),0(\vec v)=\vec 0, \forall\vec v\in V\)

  • Identity map: \(I\in L(V,V),I(\vec v)=\vec v\)

  • Differentiation: \(D\in L(P(R),P(R)), DP=P'\)

  • Integration: \(T\in L(P(R),R), TP=\int_0^1P(x)dx\)

  • Multiplication by \(x^2\): \(T\in L(P(R),P(R)),(TP)(x)=x^2P(x)\)

  • Backward shift: \(T\in L(R^{\infty},R^{\infty}), T(x_1,x_2,...)^T=(x_2,x_3,...)^T\)

Let \(L(V,W)\) denote the collection of all linear transformations from \(V\) into \(W\), where V and W are vector spaces over a field F.

Suppose \(S,T\in L(V,W)\) and \(\lambda \in F\). The sum \(S+T\) and the product \(\lambda T\) are linear maps from V to W defined by \((S+T)\vec v=S\vec v+T\vec v\) and \((\lambda T)\vec v=\lambda (T\vec v),\forall \vec v\in V\)

Remark:

  • \(L(V,W)\) is a vector space. (The additive identity is the zero map.)
  • If \(T\) is a linear transformation, then \(T(0)=0\)
  • If \(a_1,a_2,...,a_k\in V\) are linearly dependent, then under a linear transformation T, \(T(a_1),T(a_2),...,T(a_k)\) are linearly dependent
  • If \(T(a_1),T(a_2),...,T(a_k)\) are linearly independent, then \(a_1,a_2,...,a_k\) are linearly independent

If \(T\in L(U,V)\) and \(S\in L(V,W)\), then the product \(ST\in L(U,W)\) is defined by \[ (ST)u=S(Tu) \] for \(u\in U\). Note that multiplication is not commutative: ST≠TS in general.

Remark:

  • It is just the composition \(S\circ T\) of the two functions.
  • The product satisfies the following properties:
    • Associativity: \((T_1T_2)T_3=T_1(T_2T_3)\)
    • Identity: \(TI=IT=T\)
    • Distributive: \((S_1+S_2)T=S_1T+S_2T\) and \(S(T_1+T_2)=ST_1+ST_2\) whenever \(T,T_1,T_2\in L(U,V)\) and \(S,S_1,S_2\in L(V,W)\)

Theorem: Suppose \(v_1,...,v_n\) is a basis of V and \(w_1,...,w_n\in W\). Then there exists a unique linear transformation \(T:V\to W\) such that \[ T\vec v_j=\vec w_j,\ for\ each\ j=1,...,n \] Proof:

Since \(\vec v_1,...,\vec v_n\) is a basis of V, \(\forall \vec v\in V,\vec v=\sum_{i=1}^n\lambda_i\vec v_i\) with unique \(\lambda_i\).

Then we define \(T(\vec v)=T(\sum_{i=1}^n\lambda_i\vec v_i)=\sum_{i=1}^n \lambda_i\vec w_i\in W\)

In particular, for each \(\vec v_j\), taking \(\lambda_i=0\) for \(i≠j\) and \(\lambda_j=1\) gives \(T(\vec v_j)=\vec w_j\).

Next we need to show T is a linear transformation

Then show its uniqueness

Ranges and Null Space

Range (the collection of outputs)

For a linear transformation \(T:V\to W\), the range of T is the set of all vectors \(\vec w\in W\) such that \(\vec w=T\vec v\) for some \(\vec v\in V\): \[ R(T)=\{\vec w\in W|\exists \vec v\in V\ s.t.\ T\vec v=\vec w\}=\{T\vec v|\vec v\in V\} \] Nullspace/Kernel: \[ Null(T)=\{\vec v\in V|T\vec v=\vec 0\} \]

Example

  • If \(T\) is the zero transformation from \(V\) to \(W\), i.e. \(T\vec v=0,\forall \vec v\in V\), then \(Null T=V\) and \(R(T)=\{\vec 0\}\)
  • Suppose \(T\) is the backward shift on \(F^{\infty}\), defined by \(T(x_1,x_2,...)=(x_2,x_3,...)\). Then \(T(x_1,x_2,...)=0\) iff \(x_2,x_3,...\) are all 0. Thus \(Null T=\{(a,0,...)|a\in F\}\) and \(R(T)=F^{\infty}\)
  • For the differentiation map \(D:P_4\to P_4\), \(R(D)=P_3\subseteq P_4\) and \(Null(D)=\{c\,|\,c\in F\}\), the constant polynomials

Theorem:

For a linear transformation \(T:V\to W\), Null T is a subspace of V and R(T) is a subspace of W

Theorem: For a linear transformation \(T:V\to W\), T is injective if and only if \(NullT=\{0\}\); T is surjective if and only if its range \(R(T)=W\)

Example

The differentiation transformation \(D\in L(P_5(\R),P_5(\R))\) is not surjective; the same map viewed as an element of \(L(P_5(\R),P_4(\R))\) is surjective

Rank and Nullity

\[ Rank(T)=dim(range(T)),\quad nullity(T)=dim(Null(T)) \]

Fundamental Theorem of Linear Transformation

Suppose \(V\) is finite-dimensional and \(T\in L(V,W)\). Then range \(T\) is finite-dimensional and \[ dim V=rank(T)+nullity(T) \] Proof:

\(Null(T)\) is a subspace of V \(\implies \exists\) a subspace \(U \subseteq V\) such that \(V=Null(T)\oplus U\)

Then we have \(dim(V)=dim(Null(T))+dim(U)\)

We need to prove \(dim(U)=dim(R(T))\)

If \(\vec u_1,...,\vec u_m\) is a basis of U, then \(T\vec u_1,...,T\vec u_m\) spans R(T) and is linearly independent (a dependence relation would place a nonzero vector of U in \(Null(T)\), contradicting \(U\cap Null(T)=\{0\}\)), so it is a basis of R(T).

Example

  • For \(D\in L(P_5(\R),P_5(\R)), Dp(x)=p'(x)\)

    \(R(D)=P_4(\R),dim(R(D))=5\\Null(D)=P_0(\R),dim(Null(T))=1\\dim(V)=dim(P_5)=6\)

  • For \(T\in L(F^m,F^n),T\vec x=A_{n\times m}\vec x_{m\times 1},rank(A)=8\)

    \(Null(T)=\{\vec x| A\vec x=0\}\), \(R(T)=\{A\vec x\}\), \(dim(R(T))=rank(T)=rank(A)=8\), \(dim(Null T)=nullity(T)=m-8\)

Corollary

Suppose \(V\) and \(W\) are finite-dimensional vector spaces

  • If \(dimV>dimW\), then no linear map from \(V\) to \(W\) is injective
  • If \(dimV<dimW\) then no linear map from \(V\) to \(W\) is surjective

Application

A homogeneous system of linear equations with more variables than equations has nonzero solutions.

An inhomogeneous system of linear equations with more equations than variables has no solution for some choice of the constant terms.

Linear Transformation and Matrix

Assume \(V\) and \(W\) are vector spaces over \(F\) with \(dim V=n\) and \(dim W=m\), and \(T:V\to W\) is a linear transformation. \(\{\alpha_1,...,\alpha_n\}\) and \(\{\beta_1,...,\beta_m\}\) are bases of V and W. Since \(T(\alpha_j)\in W\) and \(\{\beta_1,...,\beta_m\}\) is a basis of W, then

\(T\alpha_j=a_{1j}\beta_1+a_{2j}\beta_2+\cdots+a_{mj}\beta_m\) for each \(j=1,\dots,n\), i.e.

\(T(\alpha_1,\dots,\alpha_n)=(\beta_1,\dots,\beta_m)A\), where \(A=(a_{ij})\in F^{m\times n}\) is called the matrix of \(T\).

The matrix A depends on the basis of V, the basis of W and the linear transformation T. In general, the matrix of T with respect to particular bases is denoted \(M(T,(v_1,...,v_n),(w_1,...,w_m))\), which is uniquely determined by these three elements.

Example

  • Suppose \(T\in L(F^2,F^3)\) is defined by \(T(x,y)=(x+3y,2x+5y,7x+9y)\). Find the matrix of T with respect to the standard basis of \(F^2\) and \(F^3\).

    Sol:

    \(A=M(T,(\begin{bmatrix}1\\0\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix}),(\begin{bmatrix}1\\0\\0\end{bmatrix}\begin{bmatrix}0\\1\\0\end{bmatrix}\begin{bmatrix}0\\0\\1\end{bmatrix}))\)

    We have \(\vec e_1=\begin{bmatrix}1\\0\end{bmatrix},\vec e_2=\begin{bmatrix}0\\1\end{bmatrix}\) are basis of \(F^2\) and \(\beta_1=\begin{bmatrix}1\\0\\0\end{bmatrix},\beta_2=\begin{bmatrix}0\\1\\0\end{bmatrix},\beta_3=\begin{bmatrix}0\\0\\1\end{bmatrix}\) are basis of \(F^3\)

    Then \(T\vec e_1=T\begin{bmatrix}1\\0\end{bmatrix}=\begin{bmatrix}1\\2\\7\end{bmatrix}(plug\ in\ x=1,y=0)=\vec \beta_1+2\vec \beta_2+7\vec \beta_3=(\vec \beta_1,\vec\beta_2,\vec\beta_3)\begin{bmatrix}1\\2\\7\end{bmatrix}\\T\vec e_2=\begin{bmatrix}3\\5\\9\end{bmatrix}=(\vec \beta_1,\vec \beta_2,\vec\beta_3)\begin{bmatrix}3\\5\\9\end{bmatrix}\)

    Thus, \(T(\vec e_1,\vec e_2)=(\vec\beta_1,\vec\beta_2,\vec\beta_3)\begin{bmatrix}1&3\\2&5\\7&9\end{bmatrix}\)

  • Suppose \(D\in L(P_3(\R),P_2(\R)),Dp=p'\). Find the matrix of D with respect to the standard bases of \(P_3(\R)\) and \(P_2(\R)\).

    Sol:

    \(A=M(D,(1,x,x^2,x^3),(1,x,x^2))\)

    Basis of \(P_3(\R):1,x,x^2,x^3\)

    Basis of \(P_2(\R):1,x,x^2\)

    \(D1=0=(1,x,x^2)\begin{bmatrix}0\\0\\0\end{bmatrix}\\Dx=1=(1,x,x^2)\begin{bmatrix}1\\0\\0\end{bmatrix}\\Dx^2=2x=(1,x,x^2)\begin{bmatrix}0\\2\\0\end{bmatrix}\\Dx^3=3x^2=(1,x,x^2)\begin{bmatrix}0\\0\\3\end{bmatrix}\\ \implies D(1,x,x^2,x^3)=(1,x,x^2)\begin{bmatrix}0&1&0&0\\0&0&2&0\\0&0&0&3\end{bmatrix}\)
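A numpy sketch building this matrix column by column: each column holds the coordinates of \(Dx^j\) in the basis \((1,x,x^2)\). Polynomials are represented as coefficient arrays in increasing degree:

```python
import numpy as np
from numpy.polynomial import polynomial as P

cols = []
for j in range(4):                 # basis 1, x, x^2, x^3 of P_3
    c = np.zeros(4); c[j] = 1.0    # coefficients of x^j
    d = P.polyder(c)               # derivative, a polynomial in P_2
    col = np.zeros(3)
    col[:len(d)] = d               # coordinates in the basis (1, x, x^2)
    cols.append(col)
A = np.column_stack(cols)
print(A)  # [[0,1,0,0],[0,0,2,0],[0,0,0,3]]
```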

Assume that for \(\alpha\in V\), under the basis \((\alpha_1,\alpha_2,...,\alpha_n)\), the coordinate vector is \((x_1,x_2,...,x_n)\), i.e. \(\alpha=\sum_{i=1}^nx_i\alpha_i\) (\(\alpha\) is a linear combination of the \(\alpha_i\)). Then \[ T\alpha=T(x_1\alpha_1+\cdots+x_n\alpha_n)=x_1T\alpha_1+\cdots+x_nT\alpha_n=T(\alpha_1,\alpha_2,\cdots,\alpha_n)\vec x=(\beta_1,\beta_2,...,\beta_m)A\vec x \\ \implies T\alpha=(\beta_1,\beta_2,...,\beta_m)A\vec x \]

Theorem

Suppose \(S,T\in L(V,W),\lambda\in F\), (basis are fixed) then

  • \(M(S+T)=M(S)+M(T)\)

  • \(M(\lambda S)=\lambda M(S)\) \[ A=M(T,\vec\alpha,\vec\beta)\\B=M(S,\vec\alpha,\vec\beta)\\ A+B=M(T+S,\vec\alpha,\vec\beta) \]

Suppose \(T\in L(U,V),S\in L(V,W)\) then

  • \(M(ST)=M(S)M(T)\) \[ A=M(T,\vec\alpha,\vec\beta) \\B=M(S,\vec\beta,\vec\gamma)\\B\cdot A=M(ST,\vec\alpha,\vec\gamma) \]

Invertibility

A linear transformation \(T\in L(V,W)\) is called invertible if there exists a linear map \(S\in L(W,V)\) such that \(ST\) equals the identity function on \(V\) and \(TS\) equals the identity function on \(W\). Then \(S\) is called an inverse of \(T\). If \(T\) is invertible, the function \(S\) is unique and is denoted by \(T^{-1}\).

Theorem: A linear transformation is invertible if and only if it is injective and surjective. (\(Null(T)=\{0\},Range(T)=W\))

Example

  • The multiplication-by-\(x^2\) linear transformation from \(P(\R)\to P(\R)\) is not invertible because it is not surjective. (1 is not in the range)
  • The backward shift linear map from \(F^{\infty}\to F^{\infty}\) is not invertible because it is not injective. (\((1,0,...)\) is in the null space)

Isomorphic Vector Space

An isomorphism is an invertible linear transformation. Two vector spaces are called isomorphic if there is an isomorphism from one vector space onto the other one.

Theorem:

Two finite-dimensional vector spaces over \(F\) are isomorphic if and only if they have the same dimension

Theorem:

Suppose \(v_1,...,v_n\) is a basis of \(V\) and \(w_1,...,w_m\) is a basis of \(W\). Then \(M\) is an isomorphism between \(L(V,W)\) and \(F^{m\times n}\)

Note:

This means that for \(A=M(T,\vec\alpha,\vec\beta)\), M is a linear map such that \(M:T\to A\) and is injective and surjective

Proof:

  • If \(T\in L(V,W)\) and \(M(T)=0\) (here 0 is the zero matrix), then \(T\vec v_k=\vec 0\) for every basis vector \(\vec v_k\), where \(\vec v_1,...,\vec v_n\) is the basis of V.

    Thus \(T\vec v=0,\forall \vec v\in V\), so \(T=0\) and \(Null(M)=\{0\}\), i.e. M is injective. (Surjectivity: any \(A\in F^{m\times n}\) equals \(M(T)\) for the T defined on the basis by the columns of A.)

Theorem:

Suppose V and W are finite-dimensional. Then \(dim L(V,W)=dim V\cdot dim W\)

Theorem:

Let \(T\in L(V,W)\) and \(A=M(T,\vec\alpha,\vec\beta)\); then \(Rank(T)=dim(range(T))=Rank(A)\) and \(Nullity(T)=dim(Null(T))=n-Rank(A)\)

Matrix Equivalence

Let \(T\in L(V,W)\), \(A=M(T,(\vec\alpha_1,...,\vec\alpha_n),(\vec\beta_1,...,\vec\beta_m))\), and \(B=M(T,(\vec\alpha_1',...,\vec\alpha_n'),(\vec\beta_1',...,\vec\beta_m'))\).

Then there are two invertible matrices \(P\in F^{m\times m},Q\in F^{n\times n}\) such that \[ (\vec\alpha_1',...,\vec\alpha_n')=(\vec\alpha_1,...,\vec\alpha_n)Q\\ (\vec\beta_1,...,\vec\beta_m)=(\vec\beta_1',...,\vec\beta_m')P \] This means that \[ T(\vec\alpha_1',...,\vec\alpha_n')=T(\vec\alpha_1,...,\vec\alpha_n)Q=(\vec\beta_1,...,\vec\beta_m)AQ=(\vec\beta_1',...,\vec\beta_m')PAQ\\\implies B=PAQ \] i.e. the matrices of a linear transformation T with respect to different bases are equivalent.

Example

For \(D:P_3\to P_2\)

Basis of \(P_3:1,x,x^2,x^3\)

Basis of \(P_2:1,x,x^2\)

\(D(1,x,x^2,x^3)=(1,x,x^2)\begin{bmatrix}0&1&0&0\\0&0&2&0\\0&0&0&3\end{bmatrix}\)

For another basis of \(P_3:1,x,x^2,x^3+x^2\), we have \(Q:(1,x,x^2,x^3+x^2)=(1,x,x^2,x^3)\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&1\\0&0&0&1\end{bmatrix}\)

For another basis of \(P_2:1+x,x,x^2\), we have \(P:(1,x,x^2)=(1+x,x,x^2)\begin{bmatrix}1&0&0\\-1&1&0\\0&0&1\end{bmatrix}\)

If we directly write D with the alternative basis, \(D(1,x,x^2,x^2+x^3)=(0,1,2x,2x+3x^2)=(1+x,x,x^2)\begin{bmatrix}0&1&0&0\\0&-1&2&2\\0&0&0&3\end{bmatrix}\)

And then we can verify that \(B=PAQ\)
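A quick numpy verification of \(B=PAQ\) for the matrices above:

```python
import numpy as np

A = np.array([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])   # M(D), standard bases
Q = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]])
P = np.array([[1, 0, 0], [-1, 1, 0], [0, 0, 1]])
B = np.array([[0, 1, 0, 0], [0, -1, 2, 2], [0, 0, 0, 3]])  # M(D), new bases
assert np.array_equal(P @ A @ Q, B)
```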

Theorem:

Let \(T\in L(V,W)\) with \(A=M(T,\vec\alpha,\vec\beta)\) and \(rank(A)=r\); then there exist a basis \((\vec\alpha')\) in V and a basis \((\vec\beta')\) in W such that \[ T(\vec\alpha')=(\vec\beta')\begin{bmatrix}I_r&0\\0&0\end{bmatrix} \]

Operator

A linear transformation from a vector space to itself is called an operator. \(L(V)=L(V,V)\)

Theorem:

Suppose \(V\) is finite-dimensional and \(T\in L(V)\). Then the following are equivalent:

  • T is invertible
  • T is injective
  • T is surjective

Corollary:

Suppose \(V\) is finite-dimensional and \(T\in L(V)\). Then \(T\) is invertible/bijective if and only if \(T\) maps a basis of \(V\) to a basis of \(V\), i.e. if \(v_1,...,v_n\) is a basis of \(V\), then \(Tv_1,...,Tv_n\) is also a basis of \(V\).

Remark: Suppose \(V\) is finite-dimensional and \(T\in L(V)\). If T is invertible and A is its matrix under the basis \(\{\alpha_1,...,\alpha_n\}\), i.e. \[ T(\alpha_1,...,\alpha_n)=(\alpha_1,...,\alpha_n)A \] then the matrix of the inverse operator \(T^{-1}\) under the same basis is \(A^{-1}\): \[ M(T,\vec\alpha,\vec\alpha)=A\\ M(T^{-1},\vec\alpha,\vec\alpha)=A^{-1} \]

Similar Matrix

\(A,B\in F^{n\times n}\); we say \(B\) is similar to \(A\) if there exists an invertible matrix \(P\in F^{n\times n}\) such that \[ B=P^{-1}AP \]

Theorem:

Let \(T\in L(V)\); then the matrices of T with respect to different bases are similar to each other.

Proof: Let \(V\) be an n-dimensional vector space over \(F\), \((\vec a_1,...,\vec a_n)\) a basis of V, and \(T\) an operator on V.

A is the matrix of T under basis \((\vec a_1,...\vec a_n)\) i.e. \(T(\vec a_1,...\vec a_n)=(\vec a_1,...\vec a_n)A\)

\((\vec b_1,...,\vec b_n)\) is another basis and B is the matrix under this basis: \(T(\vec b_1,...,\vec b_n)=(\vec b_1,...,\vec b_n)B\)

Then we have an invertible matrix \(P\in F^{n\times n}\) such that \((\vec b_1,..\vec b_n)=(\vec a_1,...\vec a_n)P\)

Since \(T(\vec b_1,...\vec b_n)=T(\vec a_1,...\vec a_n)P=(\vec a_1,...\vec a_n)AP=(\vec b_1,...\vec b_n)P^{-1}AP\)

we have \(B=P^{-1}AP\)

Remark:

Matrix similarity is an equivalence relation

Functional

A linear functional on V is a linear map from V to F. In other words, a linear functional is an element of \(L(V,F)\)

Example

  • Define \(\phi:R^3\to R\) by \(\phi(x,y,z)=4x-5y+2z\). Then \(\phi\) is a linear functional on \(R^3\)
  • Fix \((c_1,...,c_n)^T\in F^n\) and define \(\phi:F^n\to F\) by \(\phi((x_1,...,x_n)^T)=\sum_{i=1}^n c_ix_i\)
  • Define \(\phi:P(R)\to R\) by \(\phi(p)=3p''(5)+7p(4)\). Then \(\phi\) is a linear functional on \(P(R)\)
  • Define \(\phi:P(R)\to R\) by \(\phi(p)=\int_0^1p(x)dx\). Then \(\phi\) is a linear functional on \(P(R)\)

Dual Space

The dual space of \(V\), denoted \(V'\), is the vector space of all linear functionals on V. In other words, \(V'=L(V,F)\)

Theorem:

Suppose V is finite-dimensional. Then \(V'\) is also finite-dimensional and \(dim V'=dim V\)

Proof:

\(dim(L(V,W))=dimV\cdot dimW\)

\(dim(L(V,F))=dim V\cdot dim F=dim V\cdot 1\)

\(dimV'=dimV\)

Dual Basis

If \(\vec v_1,...\vec v_n\) is a basis of V, then the dual basis of \(\vec v_1,...\vec v_n\) is the list \(\vec\phi_1,...\vec\phi_n\) of the elements of \(V'\), where each \(\phi_j\) is the linear functional on V such that \[ \phi_j(\vec v_k)=\begin{cases}1&j=k\\0&j≠k\end{cases} \] Example

What is the dual basis of the standard basis \(\vec e_1,...\vec e_n\) of \(F^n\).

Sol:

For \(1\leq j\leq n\) , define \(\phi_j\) to be the linear functional on \(F^n\) and defined as \(\phi_j((x_1,x_2,...x_n)^T)=x_j\)

then \(\phi_j(\vec e_k)=\begin{cases}1&j=k\\0&j≠k\end{cases}\), so \(\phi_1,...,\phi_n\) is the dual basis of \(\vec e_1,...,\vec e_n\)

Theorem:

Suppose V is finite-dimensional. Then the dual basis of a basis of V is a basis of \(V'\)

Proof:

Suppose \(\vec v_1,...,\vec v_n\) is a basis of V and \(\vec\phi_1,...,\vec\phi_n\) is the dual basis.

Let \(\lambda_1\phi_1+\cdots+\lambda_n\phi_n=0\) (here 0 is the zero map); then \(\forall v\in V,\ (\lambda_1\phi_1+\cdots+\lambda_n\phi_n)(v)=0\in F\)

Then \(\forall v_j,j=1,...,n:\ \lambda_1\phi_1(v_j)+\lambda_2\phi_2(v_j)+\cdots+\lambda_j\phi_j(v_j)+\cdots+\lambda_n\phi_n(v_j)=0\)

Since \(\phi_i(v_j)=0\) for \(i≠j\) and \(\phi_j(v_j)=1\), the sum reduces to \(\lambda_j=0\)

So \(\lambda_j=0\) for \(j=1,...,n\); thus the \(\phi_j\) are linearly independent

Also, the list \(\phi_1,...,\phi_n\) has length \(n=dim(V')\)

Therefore, \(\phi_1...\phi_n\) is a basis of \(V'\)

Dual Map

If \(T\in L(V,W)\), then the dual map of \(T\) is the linear map \(T'\in L(W',V')\) defined by \(T'(\phi)=\phi\circ T\) for \(\phi\in W'\)

Remark:

  • \(\phi \in W'=L(W,F)\): \(T'\) takes \(\phi\in W'\) as input and returns \(\phi\circ T\in V'\) as output, i.e. \(T'(\phi):V\to F\)
  • If \(\phi,\psi\in W'\) then \(T'(\phi+\psi)=(\phi+\psi)\circ T=\phi\circ T+\psi\circ T=T'(\phi)+T'(\psi)\)
  • If \(\lambda\in F\) and \(\phi \in W'\), then \(T'(\lambda\phi)=\lambda\phi\circ T=\lambda(\phi\circ T)=\lambda T'(\phi)\)

Example

Define \(D:P(R)\to P(R)\) by \(Dp=p'\), and let \(\phi\) be a linear functional on \(P(R)\).

Then if \(\phi(p)=p(3)\). Then \(D'(\phi)\) is the linear functional on \(P(R)\) given by \((D'(\phi))(p)=(\phi\circ D)(p)=\phi(D(p))=\phi(p')=p'(3)\)

Algebraic Properties of Dual Maps

  • \((S+T)'=S'+T',\forall S,T\in L(V,W)\)

  • \((\lambda T)'=\lambda T',\forall \lambda\in F,T\in L(V,W)\)

  • \((ST)'=T'S',\forall T\in L(U,V),S\in L(V,W)\)

    • \((ST)'(\phi)=\phi\circ(ST)=\phi\circ(S\circ T)=(\phi\circ S)\circ T=T'(\phi\circ S)=T'(S'(\phi))=(T'S')(\phi)\)

Theorem:

Suppose \(T\in L(V,W)\). Then \(M(T')=(M(T))^T\), i.e.

\(T:V\to W\), where \(\vec\alpha_1,...,\vec\alpha_n\) is a basis of \(V\) and \(\vec\beta_1,...,\vec\beta_m\) is a basis of W

\(T':W'\to V'\), where \(\vec\psi_1,...,\vec\psi_m\) is the dual basis in W' and \(\vec\phi_1,...,\vec\phi_n\) is the dual basis in V'

if \(A=M(T,(\vec\alpha_1...\vec\alpha_n),(\vec\beta_1...\vec\beta_m))\)

then \(A^T=M(T',(\vec\psi_1...\vec\psi_m),(\vec\phi_1...\vec\phi_n))\)

Annihilator

For \(U\subseteq V\), the annihilator of \(U\), denoted \(U^0\), is defined by \[ U^0=\{\phi\in V'|\phi(u)=0,\forall u\in U\} \] Note that here 0 is the scalar zero.

Example

  • Suppose \(U\) is the subspace of \(P(R)\) consisting of all polynomial multiples of \(x^2\). If \(\phi\) is the linear functional on \(P(R)\) defined by \(\phi(p)=p'(0)\), show that \(\phi\in U^0\)

    Sol:

    We have \(U=\{p(x)\in P(R)|p(x)=x^2g(x),g(x)\in P(R)\}\)

    We need to show that \(\forall p\in U, \phi(p)=0\):

    \(\phi(p)=p'(0)=((x^2g(x))')|_{x=0}=(2xg(x)+x^2g'(x))|_{x=0}=0\)

  • Let \(e_1,e_2,e_3,e_4,e_5\) denote the standard basis of \(\R^5\), and let \(\phi_1,\phi_2,\phi_3,\phi_4,\phi_5\) denote the dual basis of \((\R^5)'\). Suppose \[ U=span(e_1,e_2) \] Show that \(U^0=span(\phi_3,\phi_4,\phi_5)\)

    Sol:

    1. We need to show \(span(\phi_3,\phi_4,\phi_5)\subseteq U^0\), i.e. that \(\forall\phi\in span(\phi_3,\phi_4,\phi_5),\phi\in U^0\)

      we have \(\phi=\lambda_3\phi_3+\lambda_4\phi_4+\lambda_5\phi_5\)

      \(\forall v\in U=span(e_1,e_2), v=\lambda_1e_1+\lambda_2e_2\)

      \(\phi(v)=\phi(\lambda_1e_1+\lambda_2e_2)=\lambda_1\phi(e_1)+\lambda_2\phi(e_2)=\lambda_1(\lambda_3\phi_3(e_1)+\lambda_4\phi_4(e_1)+\lambda_5\phi_5(e_1))+\lambda_2(\lambda_3\phi_3(e_2)+\lambda_4\phi_4(e_2)+\lambda_5\phi_5(e_2))\)

      Since \(\phi_i(e_1)=\phi_i(e_2)=0\) for \(i=3,4,5\),

      \(\phi(v)=0\), so \(\phi\in U^0\)

    2. We need to show \(U^0\subseteq span(\phi_3,\phi_4,\phi_5)\), i.e. that \(\forall \phi\in U^0, \phi\in span(\phi_3,\phi_4,\phi_5)\)

      Since \(\phi\in U^0\subseteq (\R^5)'\), we can write \(\phi=\lambda_1\phi_1+\lambda_2\phi_2+\lambda_3\phi_3+\lambda_4\phi_4+\lambda_5\phi_5\)

      Evaluating at \(e_1\) and \(e_2\) gives \(\phi(e_1)=\lambda_1\) and \(\phi(e_2)=\lambda_2\); but \(\phi(e_1)=\phi(e_2)=0\) since \(\phi\in U^0\), so \(\lambda_1=\lambda_2=0\)

      Thus \(\phi\in span(\phi_3,\phi_4,\phi_5)\)

      Then we will find that:

      \(V:e_1,e_2,e_3,e_4,e_5\)

      \(V':\phi_1,\phi_2,\phi_3,\phi_4,\phi_5\)

      \(e_1,e_2\) span \(U\); \(\phi_3,\phi_4,\phi_5\) span \(U^0\)

      \(dimV=dimU+dimU^0=2+3=5\)

Theorem:

Suppose V is finite-dimensional and U is a subspace of V. Then \[ dim U+dim U^0=dim V=dim V' \] Note:

\(U\subseteq V,U^0\subseteq V'\). They live in different spaces.

Theorem:

Suppose \(U\subseteq V\). Then \(U^0\) is a subspace of \(V'\).

The Null Space of T'

For \(T'\in L(W',V')\): \[ Null(T')=\{\phi\in W'\,|\,T'(\phi)=0\ (the\ zero\ map)\}=\{\phi\in W'\,|\,\forall v\in V,\phi(T(v))=0\}\subseteq W' \]

Theorem:

Suppose V and W are finite-dimensional and \(T\in L(V,W)\). Then,

  • \(NullT'=(RangeT)^0\)
  • \(dim(NullT')=dim((RangeT)^0)=dimW-dim(RangeT)=dim(NullT)+dimW-dimV\)


Theorem:

Suppose V and W are finite-dimensional and \(T\in L(V,W)\). Then \(T\) is surjective if and only if \(T'\) is injective.

The Range Space of T'

For \(T'\in L(W',V')\): \[ Range(T')=\{T'(\phi)|\phi\in W'\}\subseteq V' \]

Theorem:

Suppose \(V\) and \(W\) are finite-dimensional and \(T\in L(V,W)\). Then,

  • \(dim(Range T')=dim(Range T)\)
  • \(Range T'=(Null T)^0\)

Theorem:

Suppose V and W are finite-dimensional and \(T\in L(V,W)\). Then \(T\) is injective if and only if \(T'\) is surjective.

Rank and Nullity

Combining the theorems above: \(Rank(T')=Rank(T)\) and \(Nullity(T')=Nullity(T)+dim W-dim V\).

3.5 Norm and Vector Space

Norm Definition

A norm on a vector space \(V\) is a real-valued function \(\lVert\cdot\rVert:V\to \R\) that satisfies the following properties:

  • Nonnegativity: \(\lVert v\rVert\geq 0,\forall v\in V\), equality holds if and only if \(\vec v=\vec 0\)
  • Homogeneity: \(\lVert s\vec v\rVert=|s|\lVert\vec v\rVert,\forall \vec v\in V,s\in F\)
  • Triangular Inequality: \(\lVert\vec v+\vec w\rVert\leq\lVert\vec v\rVert+\lVert\vec w\rVert,\forall \vec v,\vec w\in V\)

Remark:

  • A vector space V with a norm \(\lVert\cdot\rVert:V\to \R\) is called a normed vector space.
  • The concept of a norm is closely related to that of a metric. For instance, a metric can be defined from any norm: if \(\lVert\cdot\rVert\) is a norm on the vector space V, then \(d(v,w)=\lVert \vec v-\vec w\rVert\) is the metric (distance) induced by the norm.

Example

  • Standard norms for \(\R^n\) and \(\C^n\)

    • \(l^1\) norm: \(\lVert\vec v \rVert_1=\sum_{i=1}^n|\vec v_i|\)
    • \(l^p\) norm: \(\lVert\vec v \rVert_p=(\sum_{i=1}^n|\vec v_i|^p)^{\frac{1}{p}}\)
    • \(l^{\infty}\) norm: \(\lVert\vec v \rVert_{\infty}=max_{i=1,...,n}|v_i|\)
  • Standard norms for the vector space of function from \([a,b]\) to \(\R\)

    • \(L^1\) norm: \(\lVert f(t)\rVert_1=\int_a^b|f(t)|dt\)
    • \(L^p\) norm: \(\lVert f(t)\rVert_p=(\int_a^b|f(t)|^pdt)^{\frac{1}{p}}\)
    • \(L^\infty\) norm: \(\lVert f(t)\rVert_\infty=\operatorname{ess\,sup}_{[a,b]}|f(t)|\)

    Note that here the integral notation refers to the Lebesgue integral instead of the Riemann integral

  • Consider vectors in \(\R^n\) with the Euclidean metric \(d(v,w)=\sqrt{\sum_{i=1}^n(v_i-w_i)^2}\). Let \(\bar d(v,w)=min\{d(v,w),1\}\) and define \(f(v)=\bar d(v,0)\). Is \(f\) a norm?

    Sol:

    We know that \(f(v)=min(d(v,0),1)\)

    Checking homogeneity to see if \(f(\lambda v)=|\lambda|f(v)\)

    \(f(\lambda v)=min(d(\lambda v,0),1)=min(|\lambda|d(v,0),1)\) (this is because \(d(v,w)\) itself is homogeneous)

    \(|\lambda| f(v)=|\lambda|\, min(d(v,0),1)\)

    Let \(\vec e\) be a unit vector (\(d(\vec e,0)=1\)) and \(\lambda=2\); then \(\begin{cases}f(2\vec e)=min(2d(\vec e,0),1)=1\\2f(\vec e)=2\end{cases}\), so homogeneity fails, meaning that \(f\) is not a norm.

Normalized Vectors

A vector \(\vec v\in V\) is said to be normalized if \(\lVert\vec v\rVert=1\). Any nonzero vector can be normalized: \[ \vec u=\frac{\vec v}{\lVert \vec v\rVert} \] has norm \(\lVert \vec u\rVert=1\). A normalized vector is also referred to as a unit vector.

3.6 Operator Norms

Supremum: least upper bound, \(sup X\).

  • The smallest extended real number \(M\in \R\cup\{-\infty,\infty\}\) such that \(x\leq M,\forall x\in X\).
  • It is always well-defined and equals \(-\infty\) if \(X=\emptyset\)

Maximum: the largest value achieved by the set.

  • It equals \(sup X\) if \(sup X\in X\)

Infimum: greatest lower bound, \(inf X\)

  • The largest extended real number \(m\in\R\cup\{-\infty,\infty\}\) such that \(x\geq m,\forall x\in X\)
  • It is always well-defined and equals \(\infty\) if \(X=\emptyset\)

Minimum: the smallest value achieved by the set.

  • It equals \(inf X\) if \(inf X\in X\)

Example

  • \(E_1=(1,\frac{1}{2},\frac{1}{3},...\frac{1}{n},...),\begin{cases}maxE_1=1\\minE_1=NA\\infE_1=0\\supE_1=1\end{cases}\)
  • \(E_4=\{x\in\R|0<x\leq 1\},\begin{cases}maxE_4=1\\minE_4=NA\\infE_4=0\\supE_4=1\end{cases}\)

Induced Operator Norm

Let \(V\) and \(W\) be two normed vector spaces and let \(T:V\to W\) be a linear transformation. The induced operator norm of \(T\) is defined as \[ \lVert T\rVert=sup_{v\in V\setminus\{0\}}\frac{\lVert Tv\rVert_W}{\lVert v\rVert_V}=sup_{u\in V,\lVert u\rVert=1}\lVert T u\rVert_W \]

Proof:

\(\lVert T\rVert=sup_{v\in V-\{0\}}\frac{\lVert Tv\rVert}{\lVert v\rVert}=sup_{v\in V-\{0\}}\lVert T\frac{v}{\lVert v\rVert_V}\rVert_W\)

Let \(u=\frac{v}{\lVert v\rVert},\lVert u \rVert=1\), we have \(Tu=\frac{Tv}{\lVert v\rVert_V}\)

Remark:

Induced operator norm inequality: \[ \lVert Tu\rVert\leq\lVert T\rVert\lVert u\rVert \]

For the space \(L(V,V)\) of linear operators on \(V\), a norm is called submultiplicative if \(\lVert TU\rVert\leq\lVert T\rVert\lVert U\rVert,\forall T,U\in L(V,V)\)

Remark:

The induced operator norm inequality shows that all induced operator norms are submultiplicative because \[ \lVert UTu\rVert\leq \lVert U\rVert\lVert Tu\rVert\leq\lVert U\rVert\lVert T\rVert\lVert u\rVert \]


Matrix Norm

A norm of a vector space of matrices is called a matrix norm.

For \(A\in F^{m\times n}\), the matrix norm induced by the \(l^p\) vector norm \(\lVert\cdot\rVert_p\) is \[ \lVert A\rVert_p=max_{\lVert\vec v\rVert_p=1}\lVert A\vec v \rVert_p \] In special cases:

  • \(\lVert A\rVert_\infty=max_{\lVert\vec v\rVert_\infty=1}\lVert A\vec v \rVert_\infty=max_i\sum_j|a_{ij}|\) (max row sum)
  • \(\lVert A\rVert_1=max_{\lVert\vec v\rVert_1=1}\lVert A\vec v \rVert_1=max_j\sum_i|a_{ij}|\) (max column sum)
  • \(\lVert A\rVert_2=\sqrt{\rho(A^H A)}\), where \(\rho(B)\) is the maximum absolute value of the eigenvalues of B
  • \(\lVert A\rVert_F=(\sum_{i=1}^m\sum_{j=1}^n|a_{ij}|^2)^{\frac{1}{2}}\) (Frobenius norm / Euclidean norm; not induced by a vector norm)

Example

\(A=\begin{bmatrix}1&1\\2&1\end{bmatrix}\), then \(A^TA=\begin{bmatrix}5&3\\3&2\end{bmatrix}\) and the eigenvalues of this matrix are \(\sigma_1^2=6.8541,\sigma_2^2=0.1459\)

  • \(\lVert A\rVert_\infty=max(2,3)=3\)
  • \(\lVert A\rVert_1=max(3,2)=3\)
  • \(\lVert A\rVert_2=\sqrt{max(\sigma_1^2,\sigma_2^2)}\)
  • \(\lVert A\rVert_F=\sqrt{1^2+1^2+2^2+1^2}=\sqrt 7=\sqrt{\sigma_1^2+\sigma_2^2}\)
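These values can be checked directly with numpy.linalg.norm:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0]])
print(np.linalg.norm(A, np.inf))    # 3.0, max row sum
print(np.linalg.norm(A, 1))         # 3.0, max column sum
print(np.linalg.norm(A, 2))         # 2.6180 = sqrt(6.8541)
print(np.linalg.norm(A, 'fro'))     # 2.6458 = sqrt(7)
```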

Coordinate System

Let \(\{\vec v_1,\vec v_2,...,\vec v_n\}\) be a basis for the n-dimensional vector space V; then any vector \(\vec w\in V\) can be expressed uniquely as \[ \vec w=\sum_{i=1}^ns_i\vec v_i \] Ordering this set as \((v_1,v_2,...,v_n)\) allows the first element in the coordinate vector to be associated with the first vector in our basis, and so on.

If \(V\) is a finite-dimensional vector space, an ordered basis for V is a finite set of vectors that is linearly independent and spans V.

The ordered basis \(B\), denoted by \((\vec v_1,...,\vec v_n)\), defines the set and a specific ordering of the vectors. Based on this ordered basis, a vector \(v\in V\) can be unambiguously represented as an n-tuple \((s_1,...,s_n)\in F^n\) such that \[ \vec v=\sum_{i=1}^n s_i\vec v_i \] For a finite-dimensional vector space V with ordered basis \(B=(\vec v_1,...,\vec v_n)\), the coordinate vector of \(\vec v\in V\) is denoted by \([\vec v]_B\) and equals the unique vector \(\vec s\in F^n\) such that \[ \vec v=\sum_{i=1}^n s_i\vec v_i \] Example

Suppose that \(A=(\vec v_1,...,\vec v_n)\) is an ordered basis for V. Let \(P\) be an \(n\times n\) invertible matrix. Show that there exists an ordered basis \(B=(\vec w_1,...,\vec w_n)\) for \(V\) such that \([u]_A=P[u]_B\) for every \(u\in V\)

Sol:

Let \((\vec w_1,...\vec w_n)=(\vec v_1,...\vec v_n)P\)

then we have \(\vec u=(\vec v_1,...,\vec v_n)[\vec u]_A=(\vec w_1,...,\vec w_n)P^{-1}[\vec u]_A\)

Since \(\vec u=(\vec w_1,...\vec w_n)[\vec u]_B\)

We have \([\vec u]_B=P^{-1}[\vec u]_A\)

Which means \([\vec u]_A=P[\vec u]_B\)

More specifically:

\(A=(1,x,x^2),B=(1,x+x^2,x^2)\)

We have \((1,x+x^2,x^2)=(1,x,x^2)\begin{bmatrix}1&0&0\\0&1&0\\0&1&1\end{bmatrix}\)

Suppose \(\vec v=1+3x+5x^2=(1,x,x^2)(1,3,5)^T,[\vec v]_A=(1,3,5)^T\)

we have \(\vec v=(1,x+x^2,x^2)(1,3,2)^T, [\vec v]_B=(1,3,2)^T\)

then we can verify that \([\vec v]_A=P[\vec v]_B\)
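A numpy check of this change of coordinates (P is the matrix taking the B-basis to the A-basis, as above):

```python
import numpy as np

P = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 1]])  # (1, x+x^2, x^2) = (1, x, x^2) P
vA = np.array([1, 3, 5])                          # v = 1 + 3x + 5x^2 in basis A
vB = np.linalg.solve(P, vA)                       # [v]_B = P^{-1} [v]_A
print(vB)                                         # [1. 3. 2.]
assert np.allclose(P @ vB, vA)                    # [v]_A = P [v]_B
```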

3.7 Inner Products

Definition

Let \(F\) be the field of real numbers or the field of complex numbers, and assume \(V\) is a vector space over \(F\). An inner product on \(V\) is a function which assigns to each ordered pair of vectors \(\vec v,\vec w\in V\) a scalar \(<\vec v,\vec w>\in F\) in such a way that for all \(\vec u,\vec v,\vec w\in V,\forall \lambda\in F\):

  • \(<\vec v,\vec v>\geq0,\forall \vec v\in V\) with equality iff \(\vec v=0\)
  • \(<\vec u+\vec v,\vec w>=<\vec u,\vec w>+<\vec v,\vec w>\)
  • \(<\lambda \vec v,\vec w>=\lambda<\vec v,\vec w>\)
  • \(<\vec v,\vec w>=\overline{<\vec w,\vec v>}\)

Remark:

  • \(<\lambda \vec v_1 + \vec v_2, \vec w>=\lambda<\vec v_1,\vec w>+<\vec v_2,\vec w>\)

  • \(<\vec v,s\vec w_1+\vec w_2>=\overline{s}<\vec v,\vec w_1>+<\vec v,\vec w_2>\)

    \(<u,v+w>=\overline{<v+w,u>}=\overline{<v,u>+<w,u>}=\overline{<v,u>}+\overline{<w,u>}=<u,v>+<u,w>\)

    \(<v_1,sv_2>=\overline{s}<v_1,v_2>\)

Example

  • The Euclidean inner product on \(F^n\) is defined by \[ <(w_1,...,w_n)^T,(z_1,...,z_n)^T>=w_1\overline{z_1}+\cdots+ w_n\overline{z_n} \] For column vectors, \(<v,w>=w^H\cdot v\)

  • An inner product on a function space: let V be the vector space of all continuous complex-valued functions on the unit interval \([0,1]\). Then the following defines an inner product \[ <f,g>=\int_0^1 f\overline{g}dx \]

    Proof:

    The other properties are straightforward; note this one:

    \(\overline{<g,f>}=\overline{\int_0^1 g\bar{f}dx}=\int_0^1 f\overline{g}dx=<f,g>\)

Orthogonal

Two vectors \(\vec u,\vec v\in V\) are called orthogonal if \(<\vec u,\vec v>=0\)

Remark:

  • \(\vec 0\) is orthogonal to every vector in V
  • \(\vec 0\) is the only vector in V that is orthogonal to itself

Inner Products and Matrix

Theorem:

Let V be a finite-dimensional vector space, and suppose that \(B=(w_1,...,w_n)\) is an ordered basis for V. Any inner product on V is determined by the values \(G_{ij}=<w_j,w_i>\) that it takes on pairs of vectors in B.

Proof:

Let \(\vec u,\vec v\in V\), \(\vec u=\sum_{j=1}^ns_j\vec w_j,\vec v=\sum_{i=1}^n t_i\vec w_i,[\vec u]_B=\begin{bmatrix}s_1\\ \vdots\\s_n\end{bmatrix},[\vec v]_B=\begin{bmatrix}t_1\\\vdots\\t_n\end{bmatrix}\)

\(<\vec u,\vec v>=<\sum_{j=1}^ns_jw_j,\sum_{i=1}^nt_iw_i>\\=\sum_{j=1}^n s_j<w_j,\sum_{i=1}^nt_iw_i>\\=\sum_{i=1}^n\sum_{j=1}^ns_j \bar{t_i}<w_j,w_i>\\=(\bar t_1,...,\bar t_n)G\begin{bmatrix}s_1\\\vdots\\s_n\end{bmatrix}\\=[\vec v]_B^H G[\vec u]_B\)

Remark:

  • The matrix G is called the weight (Gram) matrix of the inner product in the ordered basis B.
  • \(G=G^H\Longleftrightarrow G_{ij}=<w_j,w_i>=\overline{<w_i,w_j>}=\overline{G_{ji}}\)
  • For any \(\vec x\in F^n,\vec x^HG\vec x\geq 0\), with equality iff \(\vec x=0\) (G is positive definite)

Theorem: If \(S\in F^{n\times n}\) is positive definite and \(A=(\alpha_1,...,\alpha_n)\) is a basis of the vector space V, then for any \(\alpha,\beta\in V\) with coordinates x, y under the basis \((\alpha_1,...,\alpha_n)\), the function defined by \(<\alpha,\beta>=y^HSx\) is an inner product, and under this basis its Gram matrix is S: \[ f(\alpha,\beta)=[\beta]^H_AS[\alpha]_A \] Remark:

For a finite-dimensional vector space \(V^n\) with basis \((\alpha_1,...,\alpha_n)\), define the map \(\sigma\) sending an inner product \(<\cdot,\cdot>\) to its Gram matrix G. The theorems above show that this map is a bijection between inner products on V and positive definite matrices.

Induced Norm

Let V be an inner-product space with inner product \(<\cdot,\cdot>\). The inner product naturally induces a norm, the induced norm \[ \lVert u\rVert^2=<u,u> \] Projection:

Let \(w,v\) be vectors in an inner-product space V with inner product \(<\cdot,\cdot>\). The projection of \(w\) onto \(v\) is defined to be \[ u=\frac{<w,v>}{\lVert v\rVert}(norm/length)\cdot\frac{v}{\lVert v\rVert}(direction)=\frac{<w,v>}{\lVert v\rVert^2}v \]

Lemma:

Let \(u\) be the projection of \(w\) onto \(v\). Then \(<w-u,u>=0\) and \(\lVert w\rVert^2=\lVert u\rVert^2+\lVert w-u\rVert^2\)

Proof:

\(\lVert w\rVert^2=<w,w>=<u+(w-u),u+(w-u)>=<u,u>+<w-u,u>+<u,w-u>+<w-u,w-u>=\lVert u\rVert^2+\lVert w-u\rVert^2\), using \(<w-u,u>=<u,w-u>=0\)

Cauchy-Schwarz Inequality:

Suppose \(\vec u,\vec v\in V\), then \(|<\vec u,\vec v>|\leq \lVert\vec u\rVert\lVert\vec v\rVert,|<\vec u,\vec v>|^2\leq <\vec u,\vec u><\vec v,\vec v>\). This inequality is an equality if and only if \(\vec u=\lambda\vec v\) for some \(\lambda \in F\)

Proof:

If \(\vec v=0\) or \(\vec u=0\), then the inequality holds trivially.

If \(\vec v\not=0,\vec u\not=0\), then we have, by orthogonal decomposition,

\(\vec u=\frac{<\vec u,\vec v>}{\lVert \vec v\rVert^2}\vec v+\vec w\), where \(\vec w\) is the residual, \(<\vec v,\vec w>=0\)

Orthogonal Decomposition:

Suppose \(\vec u,\vec v\in V\) with \(\vec v\not=0\); then \[ \vec u=c\vec v\ (projection)+\vec w\ (residual),\ with\ c=\frac{<\vec u,\vec v>}{\lVert \vec v\rVert^2}\ and\ <\vec w,\vec v>=0 \]

then \(\lVert \vec u\rVert^2=\lVert \frac{<\vec u,\vec v>}{\lVert\vec v\rVert^2}\vec v\rVert^2+\lVert\vec w\rVert^2=\frac{|<\vec u,\vec v>|^2}{\lVert \vec v\rVert^2}+\lVert\vec w\rVert^2\geq \frac{|<\vec u,\vec v>|^2}{\lVert \vec v\rVert^2}\), by the Pythagorean theorem

Pythagorean theorem:

Suppose \(\vec u,\vec v\) are orthogonal vectors in V. Then \[ \lVert\vec u+\vec v\rVert^2=\lVert\vec u\rVert^2+\lVert\vec v\rVert^2 \]

Then, \(|<\vec u,\vec v>|^2\leq\lVert\vec u\rVert^2\cdot\lVert\vec v\rVert^2\implies|<\vec u,\vec v>|\leq\lVert\vec u\rVert\cdot\lVert\vec v\rVert\)

Equality holds exactly when \(\lVert\vec w\rVert^2=0\), i.e. when \(\vec u\) is a scalar multiple of \(\vec v\)

Example:

  • If \((x_1,x_2,...,x_n),(y_1,...,y_n)\in \R^n\) then \[ |<\vec x,\vec y>|^2=|x_1y_1+...+x_ny_n|^2\leq(x_1^2+...+x_n^2)\cdot(y_1^2+...+y_n^2) \]

  • If \(f\) and \(g\) are continuous real-valued functions on \([-1,1]\) then \[ |<f,g>|^2=|\int_{-1}^1fg\,dx|^2\leq(\int_{-1}^1f^2dx)(\int_{-1}^1g^2dx)=<f,f>\cdot<g,g>=\lVert f\rVert^2\lVert g\rVert^2 \]

Inner Product and Norm

If \(V\) is an inner-product space over \(F\) and \(\lVert\vec v \rVert=\sqrt{<\vec v,\vec v>}\), then for any \(\vec v,\vec w\in V\) and any \(s\in F\)

  • \(\lVert s\vec v\rVert=|s|\lVert\vec v\rVert\)
  • \(\lVert\vec v \rVert> 0\) for \(\vec v≠0\)
  • \(|<\vec v,\vec w>|\leq \lVert\vec v\rVert\lVert\vec w\rVert\) with equality iff \(\vec v=0,\vec w=0,or\ \vec v=s\vec w\)
  • \(\lVert\vec v+\vec w\rVert\leq \lVert\vec v\rVert+\lVert\vec w\rVert\) with equality iff \(\vec v=0,\vec w=0,or\ \vec v=s\vec w\)

3.8 Sets of Orthogonal Vectors

Orthonormal

A list of vectors is called orthonormal if each vector in the list has norm 1 and is orthogonal to all the other vectors in the list. In other words, a list \(e_1,...,e_m\) of vectors is orthonormal if \[ <e_i,e_j>=\begin{cases}1&if\ i=j\\0&if\ i≠j\end{cases} \]

Properties:

If \(e_1,...,e_m\) is an orthonormal list of vectors in V, then

  • \(\lVert a_1\vec e_1+\cdots+a_m\vec e_m\rVert^2=|a_1|^2+\cdots+|a_m|^2\)

    \(\lVert a_1\vec e_1+\cdots+a_m\vec e_m\rVert^2=<a_1e_1+...+a_me_m,a_1e_1+...+a_me_m>=<a_1e_1,a_1e_1>+...+<a_me_m,a_me_m>\) (cross terms vanish since \(<e_i,e_j>=0\) for \(i≠j\)) \(=\lVert a_1e_1\rVert^2+...+\lVert a_me_m\rVert^2=|a_1|^2+...+|a_m|^2\) (since \(\lVert e_i\rVert=1\))

  • An orthonormal list is linearly independent

    Suppose \(\lVert a_1e_1+...+a_me_m\rVert^2=|0|^2=0\)

    which means \(|a_1|^2+...+|a_m|^2=0\implies a_j=0\ for\ every\ j\implies\) the list is linearly independent

Orthonormal Basis

An orthonormal basis of \(V\) is an orthonormal list of vectors in V that is also a basis of V.

Remark:

An orthonormal basis combines two properties:

  • orthonormal \(\implies <e_i,e_j>=\begin{cases}1&if\ i=j\\0&if\ i≠j\end{cases}\)
  • Basis \(\implies\) linearly independent and spans the vector space

Theorem:

Every orthonormal list of vectors in \(V\) with length \(dim V\) is an orthonormal basis of V.

Example

  • Standard orthonormal basis of \(\R^4:(1,0,0,0)^T,(0,1,0,0)^T,(0,0,1,0)^T,(0,0,0,1)^T\)

  • Let V be the vector space over \(\C\) of continuous complex-valued functions on the interval \(0\leq x\leq 1\) with the inner product \[ <f,g>=\int_0^1f\bar gdx \] Let \(f_n=\sqrt{2}\cos 2\pi nx,g_n=\sqrt{2}\sin 2\pi nx\). Then \(\{1,f_1,g_1,f_2,g_2,\cdots\}\) is a countably infinite orthonormal set that is a Schauder basis for this vector space.

    Proof: \(<g_n,f_n>=\int_0^1 \sqrt{2} \sin(2n\pi x)\sqrt 2 \cos(2n\pi x)dx=\int_0^1\sin(4n\pi x)dx=0\)

    \(<f_n,f_n>=\int_0^1\sqrt 2 cos(2n\pi x)\sqrt 2cos(2n\pi x)dx=1\)

Theorem:

Suppose \(\vec e_1,...\vec e_n\) is an orthonormal basis of V and \(\vec v\in V\). Then \[ v=<v,e_1>e_1+<v,e_2>e_2+\cdots+<v,e_n>e_n\\and\ \lVert v\rVert^2=|<v,e_1>|^2+|<v,e_2>|^2+\cdots+|<v,e_n>|^2 \] Proof:

Since \(e_1,...e_n\) are basis, we have

\(v=\sum_{i=1}^n\lambda_i e_i\)

we need to determine what the \(\lambda_i\) are:

\(<v,e_j>=<\lambda_1 e_1+...+\lambda_n e_n,e_j> \ (plug\ in\ \vec v)\\=\lambda_1<e_1,e_j>+...+\lambda_n<e_n,e_j>\ (distributive\ law)\\=\lambda_j<e_j,e_j>=\lambda_j\ (<e_i,e_j>=\begin{cases}1&if\ i=j\\0&if\ i≠j\end{cases})\)

Thus, \(\lambda_j=<\vec v,\vec e_j>\)

Gram-Schmidt orthogonalization process:

Let \(V\) be an inner-product space and assume \(\vec v_1,...,\vec v_n\) are linearly independent vectors in \(V\). Then it is possible to construct an orthogonal sequence of vectors \(\vec w_1,...,\vec w_n\in V\) such that for each \(k=1,...,n\) the set \(\{\vec w_1,...,\vec w_k\}\) is a basis of the subspace spanned by \(\vec v_1,...,\vec v_k\)

The process: \(\vec w_1=\vec v_1\), and for \(k=2,...,n\), \[ \vec w_k=\vec v_k-\sum_{j=1}^{k-1}\frac{<\vec v_k,\vec w_j>}{\lVert\vec w_j\rVert^2}\vec w_j \] Normalizing, \(\vec e_k=\vec w_k/\lVert\vec w_k\rVert\) gives an orthonormal list; a code sketch follows.
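A numpy sketch of the process, applied to the polynomial example worked out below (inner product \(<p,q>=\int_{-1}^1 pq\,dx\), polynomials as coefficient arrays in increasing degree; the helpers `inner` and `gram_schmidt` are ours):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def inner(p, q):
    """<p,q> = integral over [-1, 1] of p(x) q(x) dx."""
    antider = P.polyint(P.polymul(p, q))
    return P.polyval(1.0, antider) - P.polyval(-1.0, antider)

def gram_schmidt(vs):
    """Orthonormalize a linearly independent list, in order."""
    es = []
    for v in vs:
        w = np.array(v, dtype=float)
        for e in es:
            w = w - inner(w, e) * e   # subtract the projection onto e
        es.append(w / np.sqrt(inner(w, w)))
    return es

e1, e2, e3 = gram_schmidt([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # 1, x, x^2
print(e1)  # [0.7071 0 0]  = sqrt(2)/2
print(e2)  # [0 1.2247 0]  = sqrt(3/2) x
```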

Corollary

Every finite-dimensional inner-product space has an orthonormal basis.

Remark: Let \(V\) be an inner-product space. Then every orthonormal list of vectors in V can be extended to an orthonormal basis of V

Example

Find an orthonormal basis of \(P_2(\R)\), where the inner product is given by \(<p,q>=\int_{-1}^1p(x)q(x)dx\).

Sol: The standard basis of \(P_2(\R)\) is \(\vec v_1=1,\vec v_2=x,\vec v_3=x^2\)

Then we use the Gram-Schmidt process to get the orthonormal basis:

\(v_1=1,\beta_1=v_1=1,\vec e_1=\frac{\beta_1}{\lVert\beta_1\rVert}=\frac{1}{\sqrt{\int_{-1}^1 1^2dx}}=\frac{\sqrt{2}}{2}\)

\(v_2=x,\beta_2=v_2-<v_2,\vec e_1>\vec e_1=x-\int^1_{-1}x\cdot\frac{\sqrt 2}{2}dx\cdot \frac{\sqrt 2}{2}=x\\\vec e_2=\frac{\beta_2}{\lVert\beta_2\rVert}=\frac{x}{\sqrt{\int_{-1}^1x^2 dx}}=\sqrt{\frac{3}{2}}x \\\dots\)

Orthogonal Complement

Let \(V\) be an inner-product space and \(W\) be any set of vectors in \(V\). Then the orthogonal complement of \(W\), denoted \(W^{\bot}\), is the set of all vectors in V that are orthogonal to every vector in W \[ W^{\bot}=\{v\in V|<v,w>=0,\forall w\in W\} \]

Remark:

Let \(W\) be any subset of the vector space \(V\). Then \(W^{\bot}\) is a closed subspace of \(V\), and any vector in the subspace spanned by \(W\) is orthogonal to every vector in \(W^{\bot}\)

  • \(\{\vec 0\}^{\bot}=V \ and\ V^{\bot}=\{\vec 0\}\)

  • If \(U\) is a subset of V then \(U\cap U^{\bot}\subseteq \{\vec 0\}\)

  • If \(U\) and \(W\) are subsets of \(V\) and \(U\subseteq W\), then \(W^{\bot}\subseteq U^{\bot}\)

    Example

    For \(\R^3\)

    \(W=span\{\vec e_1,\vec e_2\}\to W^{\bot}=span\{\vec e_3\}\)

    \(U=span\{\vec e_2\}\to U^{\bot}=span\{\vec e_1,\vec e_3\}\)

Theorem:

Suppose \(U\) is a finite-dimensional subspace of \(V\). Then \(V=U\oplus U^{\bot}\)

Proof: Let \(\vec e_1,...,\vec e_m\) be an orthonormal basis of \(U\); then it can be extended to an orthonormal basis of \(V\), \(\vec e_1,...,\vec e_m,\vec e_{m+1},...,\vec e_n\)

Then we need to show \(\vec e_{m+1},...,\vec e_n\) is a basis of \(U^{\bot}\)

  • By orthonormality, \(<e_j,e_k>=0\) for \(j> m,k\leq m\implies \vec e_j\in U^{\bot}\) for \(j>m\)
  • \(\forall \vec v\in U^{\bot}, \vec v\in V\implies \vec v=\lambda_1 e_1+...+\lambda_m e_m+\lambda_{m+1}e_{m+1}+...+\lambda_n e_n\); by orthogonality, \(\lambda_i=<\vec v,e_i>=0\) for \(i=1,...,m\implies \vec v=\lambda_{m+1}\vec e_{m+1}+...+\lambda_n\vec e_n\)

Thus \(span(\vec e_{m+1},...,\vec e_n)=U^{\bot}\), \(U^{\bot} \cap U=\{0\}\), and \(V=U\oplus U^{\bot}\)

This means we have \[ \vec v=<\vec v,\vec e_1>\vec e_1+...+<\vec v,\vec e_m>\vec e_m+<\vec v,\vec e_{m+1}>\vec e_{m+1}+...+<\vec v,\vec e_n>\vec e_n=\vec u+\vec w,\vec u\in U,\vec w\in U^{\bot} \]

Corollary:

Suppose \(V\) is finite-dimensional and \(U\) is a subspace of \(V\); then \[ dim V=dim U+dim U^{\bot} \]

Theorem:

Suppose \(U\) is a finite-dimensional subspace of \(V\). Then \[ (U^{\bot})^{\bot}=U \]

Remark:

Suppose \(U\subseteq V\) is a finite-dimensional subspace

  • \(V=U\oplus W_1=U\oplus W_2\) is possible, since a direct-sum complement is not unique in general

  • However, the orthogonal complement is unique: \(V=U\oplus U^{\bot}\), and for each \(\vec v\) there are unique \(\vec u\in U,\vec w\in U^{\bot}\) such that \(\vec v=\vec u+\vec w\)

    Proof:

    We need to show that if \(V=U\oplus T_1=U\oplus T_2\), where \(T_1, T_2\) are both orthogonal to \(U\), then $ T_1=T_2$

    Suppose we have \(V=U\oplus T_2\), let \(\vec t_1\in T_1\subseteq V\), then there is \(\vec u\in U,\vec t_2\in T_2\) such that \(\vec t_1=\vec u+\vec t_2\)

    then we have \(<\vec t_1,\vec u>=<\vec u,\vec u>+<\vec t_2,\vec u>\); since \(T_1\) and \(T_2\) are orthogonal to \(U\), both \(<\vec t_1,\vec u>\) and \(<\vec t_2,\vec u>\) vanish, so \(0=<\vec u,\vec u>\Longleftrightarrow \vec u=0\)

    Thus \(\vec t_1=\vec t_2\in T_2\), so \(T_1\subseteq T_2\)

    Similarly \(T_2\subseteq T_1\)

Orthogonal Projection

Suppose \(V\) is a finite-dimensional vector space and \(U\) is a subspace of \(V\).

Then there is orthogonal decomposition \[ V=U\oplus U^{\bot} \]

  • \(\forall \vec v\in V, \vec v=\vec u+\vec w,\ where\ \vec u\in U,\vec w\in U^{\bot},\ and\ <\vec u,\vec w>=0\)
  • Both \(U\) and \(U^{\bot}\) are closed, and \((U^{\bot})^{\bot}=U\)

The orthogonal projection of \(V\) onto \(U\) is the operator \(P_U\in L(V)\) defined as follows (it maps \(\vec v\in V\) to \(\vec u\in U\)): \[ for\ \vec v\in V,\ \vec v=\vec u+\vec w,\ where\ \vec u\in U,\vec w\in U^{\bot};\ then\ P_U\vec v=\vec u \]

Example

Suppose \(\vec x\in V\) with \(\vec x≠\vec 0\) and \(U=span(\vec x)\). Show that \[ P_U\vec v=\frac{<\vec v,\vec x>}{\lVert\vec x\rVert^2}\vec x,\forall \vec v\in V \]

Sol:

By orthogonal decomposition, \(\vec v\) can be written as

\(\vec v=\frac{<\vec v,\vec x>}{\lVert \vec x\rVert^2}\vec x(projection\in U=span(\vec x))+(\vec v-\frac{<\vec v,\vec x>}{\lVert \vec x\rVert^2}\vec x)(residual\in U^{\bot})\)

since \(<\frac{<\vec v,\vec x>}{\lVert \vec x\rVert^2}\vec x,\vec v-\frac{<\vec v,\vec x>}{\lVert \vec x\rVert^2}\vec x>=0\)

\(P_U\vec v=\frac{<\vec v,\vec x>}{\lVert \vec x\rVert^2}\vec x\)
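A numpy sketch of this one-dimensional projection, checking that the residual is orthogonal to \(U=span(\vec x)\) (the vectors are chosen arbitrarily for illustration):

```python
import numpy as np

x = np.array([1.0, 1.0, -1.0, 4.0])
v = np.array([2.0, 0.0, 1.0, 1.0])
proj = (v @ x) / (x @ x) * x      # P_U v = <v,x> / ||x||^2 * x
resid = v - proj
assert abs(resid @ x) < 1e-12     # residual is orthogonal to span(x)
```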

Orthogonal Projection Properties

Suppose \(U\) is a finite-dimensional subspace of \(V\) and \(\vec v\in V\). Then

  • \(P_U\in L(V)\)

  • For every orthonormal basis \(\vec e_1,...\vec e_m\) of \(U\), \(P_U\vec v=<\vec v,\vec e_1>\vec e_1+\cdots+<\vec v,\vec e_m>\vec e_m\)

    \(\forall \vec v\in V,\vec v=\vec u+\vec w,\vec u\in U,\vec w\in U^{\bot}\)

    Since \(\vec e_1,...,\vec e_m\) is an orthonormal basis of \(U\), we have

    \(\vec u=\lambda_1\vec e_1+...+\lambda_m\vec e_m\), then we need to find \(\lambda\)

    \(<\vec v,\vec e_j>=<\vec u+\vec w,e_j>=<\vec u,\vec e_j>\) (since \(\vec w\in U^{\bot}\) and \(\vec e_j\in U\), \(<\vec w,\vec e_j>=0\) for \(j=1,...,m\)) \(=\lambda_j<\vec e_j,\vec e_j>=\lambda_j\\\implies\lambda_j=<\vec v,\vec e_j>,\ \vec u=P_U\vec v=<\vec v,\vec e_1>\vec e_1+\cdots+<\vec v,\vec e_m>\vec e_m\)

  • \(P_U\vec u=\vec u,\forall \vec u\in U\)

    No residual

  • \(P_U\vec w=0,\forall \vec w\in U^{\bot}\)

    No projection

  • \(Range(P_U)=U\)

    \(P_U\vec v\in U\implies Range(P_U)\subseteq U\); and \(\forall \vec u\in U, P_U\vec u=\vec u\implies U\subseteq Range(P_U)\implies Range(P_U)=U\)

  • \(Null P_U=U^{\bot}\)

    \(NullP_U=\{\vec v|P_U(\vec v)=0\}=U^{\bot}\)

  • \(\vec v-P_U\vec v\in U^{\bot}\)

    \(\forall \vec v\in V, \vec v=\vec u+\vec w=P_U\vec v+(\vec v-P_U\vec v)\ (Orthogonal \ decomposition)\implies <P_U\vec v,\vec w>=<P_U\vec v,\vec v-P_U\vec v>=0\)

  • \((P_U)^2=P_U\)

    \((P_U)^2=P_U\circ(P_U)\implies (P_U)^2(\vec v)=P_U(P_U(\vec v))=P_U(\vec u)=\vec u=P_U\vec v\)

  • \(\lVert P_U\vec v\rVert\leq\lVert\vec v\rVert\)

Example

\(V=F^4, W=span(\vec u_1,\vec u_2)\), where \(\vec u_1=(1,1,-1,4)^T,\vec u_2=(1,-1,1,2)^T\). Find a basis of \(W^{\bot}\)

Sol:

\(dimW^{\bot}=4-2=2\)

\(\vec v\in W^{\bot}\) if and only if \(<\vec v,\vec u_1>=0\) and \(<\vec v,\vec u_2>=0\)

Let \(\vec v=\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}\\\implies\begin{cases}x_1+x_2-x_3+4x_4=0\\x_1-x_2+x_3+2x_4=0\end{cases}\\\implies \begin{bmatrix}1&1&-1&4\\1&-1&1&2\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}=\vec 0\\\implies Basis=\{(0,1,1,0)^T,(-3,-1,0,1)^T\}\)
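This null-space computation can be reproduced with sympy (Matrix.nullspace returns a basis of \(\{x : Ax=0\}\)):

```python
from sympy import Matrix

A = Matrix([[1, 1, -1, 4], [1, -1, 1, 2]])
for b in A.nullspace():    # basis vectors of W-perp
    print(b.T)             # Matrix([[0, 1, 1, 0]]), Matrix([[-3, -1, 0, 1]])
```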

Minimizing The Distance to a Subspace

Suppose \(U\) is a finite-dimensional subspace of \(V\), \(\vec v\in V,\vec u\in U\); then \[ \lVert\vec v-P_U\vec v\rVert\ (minimal\ residual)\leq \lVert\vec v-\vec u\rVert\ (residual) \] and the inequality above is an equality iff \(\vec u=P_U\vec v\)


Best Approximation (Finite-dimensional)

\(V\) is a vector space with inner product (defined), \(\lVert\cdot\rVert\) is the induced norm, \(U\) is a subspace of \(V\)

For \(\vec v\in V\), we have \[ \lVert\vec v-P_U\vec v\rVert\ (minimal\ loss)\leq\lVert\vec v-\vec u\rVert,\forall\vec u\in U \] Then \(\hat{\vec v}=P_U\vec v\) is called the best approximation of \(\vec v\) in \(U\): \[ \lVert\vec v-\hat{\vec v}\rVert\leq\lVert\vec v-\vec u\rVert,\forall\vec u\in U \] Moreover, if \(\hat{\vec v}\) is the best approximation, \[ <\vec v-\hat{\vec v},\vec u>=0,\forall \vec u\in U \]

Example

Find a polynomial u with real coefficients and degree at most 5 that approximates \(\sin x\) as well as possible on the interval \([-\pi,\pi]\), in the sense that \(\int_{-\pi}^\pi|\sin x-u(x)|^2dx\) is as small as possible.

Sol:

Let \(C_R[-\pi,\pi]\) denote the real inner-product space of continuous real-valued functions on \([-\pi,\pi]\) with inner product (derived from the loss function) \[ <f,g>=\int_{-\pi}^\pi fg\,dx \] \(U=P_5([-\pi,\pi])\) is the space of polynomials of degree at most 5, \(U\subseteq C_R[-\pi,\pi]\); then our problem is to find \(\vec u\in U\) such that \(\lVert\vec v-\vec u\rVert\) is minimal

Then the best approximation of \(\sin x\) in \(P_5\) is \(P_U(\sin x)=P_U(v)=<v,\vec e_1>\vec e_1+\cdots+<v,\vec e_6>\vec e_6\)

We first apply the Gram-Schmidt procedure to the standard basis of \(P_5: 1,x,x^2,x^3,x^4,x^5\), producing an orthonormal basis \(\vec e_1,...,\vec e_6\) of \(U\)

Then \(P_U(v)=<v,\vec e_1>\vec e_1+\cdots+<v,\vec e_6>\vec e_6\); applying \(<f,g>=\int_{-\pi}^\pi fg\,dx\) gives \(P_U\vec v=0.987862x-0.155271x^3+0.00564312x^5\)

Compare:

Real (red): \(\vec v=sinx\)

Best approximation (blue): \(P_U\vec v\)

Taylor expansion (green): \(T\vec v=x-\frac{x^3}{3!}+\frac{x^5}{5!}\)


3.9 Linear Functionals and Riesz Theorem

Linear Functional

A linear functional on \(V\) is a linear transformation from \(V\) to \(F\) (map vector to number). In other words, a linear functional is an element of \(L(V,F)\)

Example

The function \(\phi:F^3\to F\) defined by \[ \phi(z_1,z_2,z_3)=2z_1-5z_2+z_3\implies \phi(z)=<z,u>=<(z_1,z_2,z_3)^T,(2,-5,1)^T>, \forall z\in F^3 \] is a linear functional on \(F^3\).

Riesz Theorem

Suppose \(V\) is finite-dimensional and \(\phi\) is a linear functional on \(V\). Then there is a unique \(\vec u\in V\) such that \[ \phi(\vec v)=<\vec v,\vec u>,\ \forall\vec v\in V \]

Note:

\(\vec u=\overline{\phi(\vec e_1)}\vec e_1+\cdots+\overline{\phi(\vec e_n)}\vec e_n\)

Proof:

Let \(\vec e_1,...,\vec e_n\) be an orthonormal basis of \(V\).

Then \(\forall \vec v\in V\) we have

\(\vec v=<v,e_1>e_1+\cdots+<v,e_n>e_n\\\implies \phi(\vec v)=\phi(<v,e_1>e_1+...+<v,e_n>e_n)\\=<v,e_1>\phi(e_1)+...+<v,e_n>\phi(e_n) \ (the\ <v,e_j>\ and\ \phi(e_j)\ are\ scalars,\ \phi\ is\ linear)\\=<v,\overline{\phi(e_1)}e_1>+...+<v,\overline{\phi(e_n)}e_n>\)

Then let \(\vec u=\overline{\phi(e_1)}e_1+...+\overline{\phi(e_n)}e_n\); we have \(\phi(\vec v)=<\vec v,\vec u>\)

Example

Find \(\vec u\in P_2(\R)\) such that \(\int_{-1}^1p(t)(\cos\pi t)dt=\int_{-1}^1 p(t)u(t)dt,\forall p\in P_2(\R)\)

Sol:

Use the note above with the orthonormal basis of \(P_2(\R)\) from the Gram-Schmidt example: \(e_1=\frac{\sqrt2}{2},e_2=\sqrt{\frac32}t,e_3=\sqrt{\frac{45}{8}}(t^2-\frac13)\), and \(\phi(p)=\int_{-1}^1p(t)\cos(\pi t)dt\). Then \(\phi(e_1)=0\) and \(\phi(e_2)=0\) (odd integrand), while \(\int_{-1}^1t^2\cos(\pi t)dt=-\frac{4}{\pi^2}\) gives \(\phi(e_3)=-\sqrt{\frac{45}{8}}\cdot\frac{4}{\pi^2}\). Hence \(u(t)=\phi(e_3)e_3=-\frac{45}{2\pi^2}\left(t^2-\frac13\right)=\frac{15}{2\pi^2}-\frac{45}{2\pi^2}t^2\).

3.10 Adjoint

Definition

Suppose \(T\in L(V,W)\). The adjoint of \(T\) is the function \(T^*:W\to V\) such that \(<T\vec v,\vec w>_W=<\vec v,T^*\vec w>_V,\forall \vec v\in V,\vec w\in W\)

By the Riesz Representation Theorem:

for fixed \(\vec w\), the map \(\vec v\mapsto<T\vec v,\vec w>_W\) is a linear functional \(\phi_w\) on \(V\), so there is a unique \(\vec u\in V\) with \(\phi_w(\vec v)=<\vec v,\vec u>_V\); define \(T^*\vec w=\vec u\)

Example

Define \(T:R^3\to R^2\) by \(T(x_1,x_2,x_3)^T=(x_2+3x_3,2x_1)^T\). Find formula for \(T^*\)

Sol:

\(T:R^3\to R^2, T^*:R^2\to R^3\), by definition

\(<v,T^*w>=<Tv,w>\\\implies <(x_1,x_2,x_3)^T,T^*(y_1,y_2)^T>=<T(x_1,x_2,x_3)^T,(y_1,y_2)^T>\\=<(x_2+3x_3,2x_1)^T,(y_1,y_2)^T>\\=(x_2+3x_3)y_1+2x_1y_2\\=2x_1y_2+x_2y_1+3x_3y_1\\=<(x_1,x_2,x_3)^T,(2y_2,y_1,3y_1)^T>\\\implies T^*(y_1,y_2)^T=(2y_2,y_1,3y_1)^T\)

Fix \(u\in V\) and \(x\in W\), and define \(T\in L(V,W)\) by \(Tv=<v,u>x\) for every \(v\in V\). Find a formula for \(T^*\)

Sol:

\(\forall \vec v\in V\), fix \(\vec w \in W\).

Since \(T:V\to W, T^*:W\to V\)

\(<v,T^*w>_V=<Tv,w>_W=<<v,u>x,w>_W\\=<v,u>_V<x,w>_W\ (<v,u>_V\ and\ <x,w>_W\ are\ scalars)\\=<v,\overline{<x,w>_W}u>_V\\=<v,<w,x>_Wu>_V\\\implies T^*w=<w,x>_Wu\in V\)

\(T^*\) is a linear transformation, i.e. \(T^*\in L(W,V)\)

Properties of the Adjoint

  • \((S+T)^*=S^*+T^*,\forall S,T\in L(V,W)\)

  • \((\lambda T)^*=\bar \lambda T^*,\forall \lambda \in F, T\in L(V,W)\)

  • \((T^*)^*=T,\forall T\in L(V,W)\)

  • \(I^*=I\), where \(I\) is the identity operator on \(V\)

  • \((ST)^*=T^*S^*,\forall T\in L(V,W), S\in L(W,U)\), \(U\) is an inner product space over \(F\)

    \(T:V\to W,S:W\to U\implies ST:V\to U,(ST)^*:U\to V\)

    \(<v,(ST)^*u>_V=<(ST)v,u>_U\\=<S(Tv),u>_U\\=<Tv,S^*u>_W\ (by\ definition)\\=<v,T^*S^*u>_V\)

Theorem:

Suppose \(T\in L(V,W)\), then

\(NullT^*=(RangeT)^{\bot}\)

\(w\in Null T^*\Longleftrightarrow T^*w=0\ (definition\ of\ null\ space)\\\Longleftrightarrow <v,T^*w>=0,\forall v\in V\\\Longleftrightarrow <Tv,w>_W=0,\forall v\in V \ (adjoint)\\\Longleftrightarrow w\in (R(T))^{\bot}\)

\(Range T^*=(NullT)^{\bot}\)

\(NullT=(Range T^*)^{\bot}\)

\(Range T=(NullT^*)^{\bot}\)

The other three identities follow from the first by replacing \(T\) with \(T^*\) (using \((T^*)^*=T\)) and taking orthogonal complements.

(figure: diagram of the four fundamental subspaces)

\(V=Null(T)\oplus Range(T^*)\)

\(W=Null(T^*)\oplus Range (T)\)
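
These identities are easy to illustrate numerically (a sketch; the orthonormal bases of \(Range(A)\) and \(Null(A^H)\) are read off from an SVD of a random rank-deficient \(A\)):

```python
# Sketch: for a rank-deficient complex A, the SVD splits C^m into
# Range(A) (first r left singular vectors) and Null(A^H) (the rest).
import numpy as np

rng = np.random.default_rng(1)
A = (rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))) @ \
    (rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4)))  # 5x4, rank 2

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
range_A, null_AH = U[:, :r], U[:, r:]

print(np.allclose(A.conj().T @ null_AH, 0))        # A^H kills Null(A^H)
print(np.allclose(range_A.conj().T @ null_AH, 0))  # Null(A^H) = (Range A)^perp
print(r + null_AH.shape[1] == A.shape[0])          # dims: C^m = Range + Null
```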

The Matrix of T*

Let \(T\in L(V,W)\). Suppose \(e_1,...,e_n\) is an orthonormal basis of \(V\) and \(f_1,...,f_m\) is an orthonormal basis of \(W\). Then \[ M(T^*,(f_1...f_m),(e_1...e_n))=A^H \] is the conjugate transpose of \[ M(T,(e_1...e_n),(f_1...f_m))=A \]

Example

For \(T:R^3\to R^2\), \(T(x_1,x_2,x_3)^T=(x_2+3x_3,2x_1)^T\), and \(T^*:R^2\to R^3\), \(T^*(y_1,y_2)^T=(2y_2,y_1,3y_1)^T\).

Let \(e_1=(1,0,0)^T,e_2=(0,1,0)^T,e_3=(0,0,1)^T,f_1=(1,0)^T,f_2=(0,1)^T\)

We have \[ T(e_1,e_2,e_3)=(f_1,f_2)\underbrace{\begin{bmatrix}0&1&3\\2&0&0\end{bmatrix}}_{A}\\T^*(f_1,f_2)=(e_1,e_2,e_3)\underbrace{\begin{bmatrix}0&2\\1&0\\3&0\end{bmatrix}}_{A^H} \]

Big Picture

(figure: how \(T\) and \(T^*\) map the four fundamental subspaces of \(R^n\) and \(R^m\))

\(R^m=Range(T)\oplus (RangeT)^{\bot}=Range(T)\oplus Null(T^*)\)

\(R^n=Null(T)\oplus(NullT)^{\bot}=NullT\oplus Range(T^*)\)

3.11 Fundamental Subspaces

Let \(T\) be a bounded linear transformation from \(V\to W\). The equation \(T\vec v=\vec w\) has a solution if and only if \(<\vec w,\vec u>=0,\forall \vec u\in NullT^*\): \[ \vec w \in R(T)\Longleftrightarrow \vec w \bot Null T^* \] In matrix notation, \(A\vec v=\vec w\) has a solution if and only if \(\vec u^H\vec w=0\) for every \(\vec u\) such that \(A^H\vec u=0\).

Least Squares

Let \(T\) be a bounded linear transformation from \(V\to W\) (\(\dim V<\dim W\)). The vector \(\vec v\in V\) minimizes \(\lVert T\vec v-\vec w\rVert\) if and only if \[ T^*T\vec v=T^*\vec w,\quad \vec w\in W \] If \(A\) is a matrix such that \(A^HA\) is invertible, the minimizer of \(\lVert A\vec v-\vec w\rVert\) is \[ \vec v=(A^HA)^{-1}A^H\vec w \] Now suppose instead that \(T\vec v=\vec w\) has solutions and we want the one of minimal norm. The minimizer \(\hat{\vec v}\) of \(\lVert\vec v\rVert\) among all solutions lies in \(R(T^*)\); i.e. \(\hat{\vec v}=T^*\vec u\), where \[ TT^*\vec u=\vec w \] If \(A\) is a matrix such that \(AA^H\) is invertible, the minimum-norm solution is \[ \vec v=A^H(AA^H)^{-1}\vec w \]
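
Both formulas can be checked against numpy's built-in solvers (a sketch; `lstsq` computes the least-squares minimizer and `pinv` returns the minimum-norm solution):

```python
# Sketch: least squares via the normal equations vs numpy's lstsq,
# and the minimum-norm solution of an underdetermined system.
import numpy as np

rng = np.random.default_rng(2)

# Overdetermined: A is tall, A^H A invertible.
A = rng.normal(size=(6, 3))
w = rng.normal(size=6)
v_normal = np.linalg.solve(A.T @ A, A.T @ w)      # (A^H A)^{-1} A^H w
v_lstsq = np.linalg.lstsq(A, w, rcond=None)[0]
print(np.allclose(v_normal, v_lstsq))             # True

# Underdetermined: B is wide, B B^H invertible; minimum-norm solution.
B = rng.normal(size=(3, 6))
y = rng.normal(size=3)
v_min = B.T @ np.linalg.solve(B @ B.T, y)         # A^H (A A^H)^{-1} w
print(np.allclose(B @ v_min, y))                  # True: it solves Bv = y
print(np.allclose(v_min, np.linalg.pinv(B) @ y))  # True: pinv agrees
```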

Projection Operator

Let \(W\in \C^{m\times n}\), \(\vec v\in \C^m\); then the orthogonal projection of \(\vec v\) onto the column space of \(W\) is given by \[ P_W\vec v= W(W^HW)^{-1}W^H\vec v \] In particular, if \(W\) has orthonormal columns, then \[ P_W\vec v=WW^H\vec v \]
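
A sketch of both projection formulas (any full-column-rank \(W\) works; the orthonormal-columns case is obtained here from a QR factorization):

```python
# Sketch: orthogonal projection onto the column space of W.
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
P = W @ np.linalg.solve(W.conj().T @ W, W.conj().T)  # W (W^H W)^{-1} W^H

v = rng.normal(size=5) + 1j * rng.normal(size=5)
print(np.allclose(P @ P, P))                     # idempotent
print(np.allclose(P.conj().T, P))                # Hermitian
print(np.allclose(W.conj().T @ (v - P @ v), 0))  # residual is orthogonal to col(W)

# With orthonormal columns (from QR), the formula collapses to W W^H:
Q, _ = np.linalg.qr(W)
print(np.allclose(Q @ Q.conj().T @ v, P @ v))    # same projection
```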

Pseudoinverse

For a matrix \(A\in \C^{m\times n}\), the matrix \(A^+\in \C^{n\times m}\) is the pseudoinverse iff:

  • \(AA^+A=A\) (\(AA^+\) is idempotent)
  • \(A^+AA^+=A^+\) (\(A^+A\) is idempotent)
  • \((AA^+)^H=AA^+\) (\(AA^+\) is Hermitian)
  • \((A^+A)^H=A^+A\) (\(A^+A\) is Hermitian)
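
The four conditions can be verified for `np.linalg.pinv` (a sketch; the test matrix is deliberately rank-deficient, so \(A\) has no ordinary inverse):

```python
# Sketch: check the four Moore-Penrose conditions for A^+ = np.linalg.pinv(A).
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))  # 4x3, rank 2
Ap = np.linalg.pinv(A)                                 # 3x4

print(np.allclose(A @ Ap @ A, A))       # A A^+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))     # A^+ A A^+ = A^+
print(np.allclose((A @ Ap).T, A @ Ap))  # A A^+ Hermitian (real: symmetric)
print(np.allclose((Ap @ A).T, Ap @ A))  # A^+ A Hermitian
```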

Application

  • Fitting a straight line to 3 points \((-2,1),(0,2),(2,4)\)

    Sol:

    Fit \(f(t)=c+dt\). The data give \(A\begin{pmatrix}c\\d\end{pmatrix}=\vec b\) with \(A=\begin{bmatrix}1&-2\\1&0\\1&2\end{bmatrix},\ \vec b=\begin{pmatrix}1\\2\\4\end{pmatrix}\). The normal equations \(A^TA\begin{pmatrix}c\\d\end{pmatrix}=A^T\vec b\) become \(\begin{bmatrix}3&0\\0&8\end{bmatrix}\begin{pmatrix}c\\d\end{pmatrix}=\begin{pmatrix}7\\6\end{pmatrix}\), so \(c=\frac73,d=\frac34\) and the best line is \(f(t)=\frac73+\frac34 t\) (see the numerical sketch after this list).
  • Fit m points with a parabola that minimizes \(\lVert b-f(t)\rVert^2\)

    Sol:

    Fit \(f(t)=c_0+c_1t+c_2t^2\). With one row \((1,t_i,t_i^2)\) of \(A\) per data point and \(\vec b=(b_1,...,b_m)^T\), the minimizer of \(\lVert \vec b-A\vec c\rVert^2\) solves the normal equations \(A^TA\vec c=A^T\vec b\).
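
A numerical sketch of both fits (the line uses the three points given above; the parabola data are made up here purely for illustration):

```python
# Sketch: least-squares line through (-2,1), (0,2), (2,4), then a parabola
# fit to illustrative points; both solved with np.linalg.lstsq.
import numpy as np

# Line f(t) = c + d t through the three given points.
t = np.array([-2.0, 0.0, 2.0])
b = np.array([1.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(t), t])
(c, d), *_ = np.linalg.lstsq(A, b, rcond=None)
print(c, d)          # 2.333... and 0.75, i.e. f(t) = 7/3 + (3/4) t

# Parabola f(t) = c0 + c1 t + c2 t^2 on some illustrative data.
t2 = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
b2 = np.array([2.1, 1.0, 1.9, 5.2, 10.1])
A2 = np.column_stack([np.ones_like(t2), t2, t2**2])
coef, *_ = np.linalg.lstsq(A2, b2, rcond=None)
print(coef)          # coefficients minimizing ||b - f(t)||^2
```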

Weighted Least Squares Solution of a Linear System

For a Hermitian positive definite weight matrix \(W\), the weighted least squares problem minimizes \((A\vec v-\vec b)^HW(A\vec v-\vec b)\); when \(A^HWA\) is invertible, the minimizer is \[ \vec v=(A^HWA)^{-1}A^HW\vec b \]
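
A minimal sketch of this formula, assuming a diagonal weight matrix (the weights and data below are illustrative, not from the lecture):

```python
# Sketch: weighted least squares v = (A^H W A)^{-1} A^H W b with a
# diagonal positive weight matrix W (illustrative weights).
import numpy as np

A = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])
W = np.diag([1.0, 4.0, 1.0])   # trust the middle measurement more

v = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print(v)   # pulled toward fitting (0, 2) exactly, vs. [7/3, 3/4] unweighted
```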

Unitary Matrices

A matrix \(U\in\C^{n\times n}\) is unitary if \(U^HU=UU^H=I\), i.e., its columns form an orthonormal basis of \(\C^n\). Unitary matrices preserve inner products, \(<U\vec x,U\vec y>=<\vec x,\vec y>\), and hence norms.
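
A quick check of these properties (a sketch; a random unitary matrix is obtained from a QR factorization):

```python
# Sketch: a unitary Q preserves inner products and norms.
import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q, _ = np.linalg.qr(Z)   # Q is unitary

x, y = rng.normal(size=4), rng.normal(size=4)
print(np.allclose(Q.conj().T @ Q, np.eye(4)))                # Q^H Q = I
print(np.isclose((Q @ x).conj() @ (Q @ y), x.conj() @ y))    # <Qx,Qy> = <x,y>
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # norms preserved
```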

Singular Value Decomposition

Self-Adjoint Operator

An operator \(T\in L(V)\) is self-adjoint if \(T=T^*\), i.e. \(<T\vec v,\vec w>=<\vec v,T\vec w>\) for all \(\vec v,\vec w\in V\). With respect to an orthonormal basis, the matrix of a self-adjoint operator satisfies \(A=A^H\).

Example

(figures: examples of checking whether an operator is self-adjoint)

Theorem:

Suppose \(V\) is a complex inner product space and \(T\in L(V)\). If \(<T\vec v,\vec v>=0\) for all \(\vec v\in V\), then \(T=0\).

Not always true for a real inner product space; counterexample: rotation by \(90^\circ\) on \(\R^2\), \(T(x,y)^T=(-y,x)^T\), satisfies \(<T\vec v,\vec v>=0\) for all \(\vec v\) but \(T\ne0\).

(figure: related theorem on self-adjoint operators)

Check Self-Adjoint:

To check whether \(T\) is self-adjoint: compute its matrix \(A\) with respect to an orthonormal basis and verify \(A=A^H\); equivalently, verify \(<T\vec v,\vec w>=<\vec v,T\vec w>\) on a basis.
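
In matrix form the check is one line (a sketch with an assumed example matrix):

```python
# Sketch: an operator is self-adjoint iff its matrix in an orthonormal
# basis equals its conjugate transpose.
import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
print(np.allclose(A, A.conj().T))   # True: A is Hermitian, so self-adjoint
```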

Normal Operators

An operator \(T\in L(V)\) is normal if it commutes with its adjoint: \(TT^*=T^*T\). Every self-adjoint operator is normal, but not conversely.

Example

  • (figures: example of a normal operator)

Check Normal Operator:

\(T\) is normal if and only if \(\lVert T\vec v\rVert=\lVert T^*\vec v\rVert\) for all \(\vec v\in V\); in matrix form, check \(AA^H=A^HA\).

Remark:

If \(T\in L(V)\) is normal, then \(NullT=NullT^*\) and \(R(T)=R(T^*)\)

(figure: proof of the remark)
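
A sketch checking both the definition and the remark (the matrix below, a \(90^\circ\) rotation block padded with a zero row and column, is normal but not self-adjoint):

```python
# Sketch: a normal (but not self-adjoint) matrix, with Null A = Null A^H.
import numpy as np

A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 0.0]])

print(np.allclose(A @ A.T, A.T @ A))  # True: A is normal (A A^H = A^H A)
print(np.allclose(A, A.T))            # False: A is not self-adjoint

# Null A = Null A^H: both are one-dimensional, spanned by e3.
e3 = np.array([0.0, 0.0, 1.0])
print(np.allclose(A @ e3, 0), np.allclose(A.T @ e3, 0))              # True True
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T) == 2)  # True
```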

Invariant Subspace

A subspace \(U\subseteq V\) is invariant under \(T\in L(V)\) if \(T\vec u\in U\) for every \(\vec u\in U\), i.e. \(T|_U\) is an operator on \(U\).

Example

  • \(\{0\}\) and \(V\) are invariant under every \(T\in L(V)\)
  • \(NullT\) and \(RangeT\) are invariant under \(T\)

(figure: further result on invariant subspaces)

Eigenvalues and Eigenvectors

A scalar \(\lambda\in F\) is an eigenvalue of \(T\in L(V)\) if there is \(\vec v\ne\vec 0\) with \(T\vec v=\lambda\vec v\); such a \(\vec v\) is an eigenvector. Equivalently, \(\lambda\) is an eigenvalue iff \(T-\lambda I\) is not injective.

Theorem:

Every eigenvalue of a self-adjoint operator is real.

Example

(figures: worked example)

Theorem:

If \(T\in L(V)\) is normal and \(T\vec v=\lambda\vec v\), then \(T^*\vec v=\bar\lambda\vec v\).

(figures: proof)

Example

(figure: worked example)

Orthogonal Eigenvectors for Normal operators

If \(T\in L(V)\) is normal, then eigenvectors of \(T\) corresponding to distinct eigenvalues are orthogonal.

(figure: proof)

Example

(figures: worked example and solution)

Example

(figure: worked example)

Positive Operators

An operator \(T\in L(V)\) is positive if \(T\) is self-adjoint and \(<T\vec v,\vec v>\ \ge 0\) for all \(\vec v\in V\).

(figures: equivalent characterizations of positive operators and the existence of a positive square root)

Isometry

An operator \(S\in L(V)\) is an isometry if \(\lVert S\vec v\rVert=\lVert\vec v\rVert\) for every \(\vec v\in V\); equivalently \(S^*S=I\), i.e. \(S\) preserves inner products.

(figures: further characterizations of isometries)

Example

(figures: worked example)
