Exercise: Combining Linear Regression and Measurements

Let's return to our example from the previous chapter, where we defined a new type called Measurement to store a measurement and its measurement error. We will now define appropriate methods for this type.

Let's first take a look at addition: if we add two measurements, $a \pm b$ and $c \pm d$, we will get a new measurement

\[(a \pm b) + (c \pm d) = (a + c) \pm (b + d)\]

That is, we add the measured values and the error terms separately.
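For instance, adding the measurements $2.0 \pm 0.5$ and $3.0 \pm 0.1$ gives

\[(2.0 \pm 0.5) + (3.0 \pm 0.1) = 5.0 \pm 0.6\]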

Exercise

Define a method for the addition of two measurements. Hint: recall that functions from other modules (like + from Base) have to be imported before you can define a new method for them.

Solution
import Base: +
+(x::Measurement, y::Measurement) = Measurement(x.value + y.value, x.error + y.error)
+ (generic function with 229 methods)


We can now add measurements together:

m1 = 2.98 ± 0.43
m2 = 0.34 ± 1.34

m1 + m2
Main.Measurement(3.32, 1.77)

Works like a charm! Subtraction works similarly, so I will define this method for you:

import Base: -
-(x::Measurement, y::Measurement) = Measurement(x.value - y.value, x.error + y.error)
- (generic function with 237 methods)
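As a quick check, here is a minimal self-contained sketch of subtraction in action. It restates the Measurement type from the previous chapter (assuming a plain struct with `value` and `error` fields, as used throughout this section):

```julia
# Restating the Measurement type from the previous chapter (sketch):
# a struct holding a measured value and its measurement error.
struct Measurement
    value
    error
end

import Base: +, -

+(x::Measurement, y::Measurement) = Measurement(x.value + y.value, x.error + y.error)
-(x::Measurement, y::Measurement) = Measurement(x.value - y.value, x.error + y.error)

m1 = Measurement(2.98, 0.43)
m2 = Measurement(0.34, 1.34)

d = m1 - m2   # measured values subtract, but the errors still add
```

Note that the errors add under subtraction as well: uncertainty never cancels, it accumulates.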

To make working with our Measurement type a bit more aesthetic, we can even define a new method for Base.show that defines how our type is printed to the REPL:

import Base: show

function show(io::IO, m::Measurement)
    value_as_string = string(round(m.value, digits = 2))
    error_as_string = string(round(m.error, digits = 2))
    print(io, value_as_string*" ± "*error_as_string)
end

m1
2.98 ± 0.43

(This function definition contains some concepts that you are not familiar with yet, e.g. IO, so don't worry if it looks incomprehensible to you at this point.)

But let's get back to more serious business: the next thing we need is the ability to add measurements and real numbers together. We can think of a real number as a measurement without error, so adding a measurement $(a \pm b)$ and a real number $c$ should yield

\[(a \pm b) + c = (a + c) \pm b\]

Exercise

Add methods for the addition of measurements and real numbers. Hint: real numbers are denoted by the type Real.

Solution
+(a::Real, b::Measurement) = Measurement(a + b.value, b.error)
+(a::Measurement, b::Real) = b + a
m1 + 3.53
6.51 ± 0.43


Okay, to spare you some time, I will define the last few methods we need: subtraction and multiplication of real numbers and measurements. Subtraction works just like addition. For multiplication, the error is scaled by the absolute value of the factor, since an error should never be negative:

\[c \cdot (a \pm b) = ca \pm |c|\,b\]

-(x::Measurement, y::Real) = Measurement(x.value - y, x.error)
-(x::Real, y::Measurement) = Measurement(x - y.value, y.error)

import Base: *
*(x::Measurement, y::Real) = Measurement(x.value*y, abs(x.error*y))
*(x::Real, y::Measurement) = y*x
* (generic function with 328 methods)
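The abs in the multiplication method matters as soon as the scalar is negative. Here is a minimal self-contained sketch (again restating the Measurement struct for completeness) showing that the value flips sign while the error stays positive:

```julia
# Sketch: multiplying a measurement by a negative scalar.
struct Measurement
    value
    error
end

import Base: *

*(x::Measurement, y::Real) = Measurement(x.value * y, abs(x.error * y))
*(x::Real, y::Measurement) = y * x

m = Measurement(1.0, 0.5)
scaled = -2 * m   # value becomes -2.0, error stays +1.0 thanks to abs
```

Without the abs, `-2 * m` would produce a negative error term, which has no physical meaning.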

Let's see if it all works as planned:

m1 + m2
9.34 + m1
m2 + 4.52
3.5 * m2
m2 * 3.5
1.19 ± 4.69

Now for the fun part: In the exercise to the first chapter, we tried to predict income from years of education with the help of linear regression. Let's simulate some data again:

using Random
Random.seed!(1243)

x = 10 .+ 3*randn(20)
β = 300
α = 1000
y = α .+ β*x + 500*randn(20)

This time, let's assume we observed education with some measurement error:

x = Measurement.(x, abs.(2*randn(20)))  # abs. keeps the errors nonnegative
20-element Vector{Main.Measurement}:
  9.93 ± 2.08
 14.01 ± 0.9
 12.94 ± 3.99
  9.38 ± 0.27
  9.52 ± 0.33
  3.87 ± 1.53
  7.33 ± 2.77
 11.65 ± 3.15
  8.56 ± 2.92
  9.61 ± 0.88
 14.36 ± 1.65
 12.81 ± 3.58
 12.52 ± 1.33
  4.87 ± 0.11
  9.36 ± 3.1
 11.49 ± 2.22
  5.74 ± 2.4
 14.41 ± 2.28
 10.65 ± 0.55
  8.31 ± 1.8

But what happens when we use the education data with measurement error to predict income? Let's find out! We defined our prediction function as

function predict(x, α, β)
    y = α .+ β*x
    return y
end

and if we plug in the values for education with measurement error, we achieve

predict(x, 1000, 300)
20-element Vector{Main.Measurement}:
 3979.81 ± 623.21
 5203.53 ± 271.33
 4883.21 ± 1196.21
 3814.21 ± 79.95
 3854.61 ± 97.9
 2161.85 ± 457.95
 3198.92 ± 832.0
 4494.55 ± 945.31
 3567.1 ± 875.31
 3883.65 ± 263.47
 5306.83 ± 493.83
 4843.26 ± 1074.22
 4757.03 ± 399.06
 2459.52 ± 34.17
 3808.14 ± 928.98
 4447.9 ± 667.33
 2721.06 ± 719.89
 5321.68 ± 684.61
 4196.09 ± 165.8
 3492.08 ± 539.58

a prediction of income with the respective measurement error.
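The REPL snippets above build on each other, so here is a sketch that collects all the definitions from this section into one runnable script (the exact numbers depend on the random number generator of your Julia version, so only the structure of the result is checked, not the values shown above):

```julia
using Random

# The Measurement type from the previous chapter (sketch).
struct Measurement
    value
    error
end

import Base: +, -, *, show

# Arithmetic between measurements: values combine, errors add.
+(x::Measurement, y::Measurement) = Measurement(x.value + y.value, x.error + y.error)
-(x::Measurement, y::Measurement) = Measurement(x.value - y.value, x.error + y.error)

# Arithmetic between measurements and real numbers.
+(a::Real, b::Measurement) = Measurement(a + b.value, b.error)
+(a::Measurement, b::Real) = b + a
-(x::Measurement, y::Real) = Measurement(x.value - y, x.error)
-(x::Real, y::Measurement) = Measurement(x - y.value, y.error)
*(x::Measurement, y::Real) = Measurement(x.value * y, abs(x.error * y))
*(x::Real, y::Measurement) = y * x

# Pretty-printing for the REPL.
function show(io::IO, m::Measurement)
    print(io, round(m.value, digits = 2), " ± ", round(m.error, digits = 2))
end

# The prediction function from the first chapter.
function predict(x, α, β)
    y = α .+ β*x
    return y
end

# Simulate education data with measurement error and predict income.
Random.seed!(1243)
x = 10 .+ 3*randn(20)
x_obs = Measurement.(x, abs.(2*randn(20)))
pred = predict(x_obs, 1000, 300)
```

Note that predict itself never mentions Measurement: it works unchanged because `+` and `*` now have methods for our type.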

Hopefully, this serves as a nice illustration of what multiple dispatch, together with Julia's type system, is able to achieve. When you wrote the predict function at the beginning of this workshop, you probably had no idea what multiple dispatch even is. But because we are able to define methods for important operations (like * and +), we can use any function or algorithm that is composed of these operations. This is a very powerful idea that allows for great extensibility and interoperability between packages. If you don't believe me, imagine somebody has written an R package for linear regression, and you now have to deal with measurement error. I believe it would be impossible to get the published R package to work with measurement error without rewriting the whole package.

Similarly, think about our pokemon example from the beginning: if we wanted to add more types of pokemon, we would just have to define the types (Water, Fire, etc.) and the relevant methods for our functions (effectiveness). Of course, the code we wrote in the pokemon example is rather limited, but the same idea translates seamlessly to complicated packages with more functionality.