Unknown: So in this video, we are briefly going to cover
random number generators in NumPy. I will not talk about all
the different ways we can generate random numbers in
NumPy. However, the focus is more on understanding random
number generators and getting an idea why they are useful in
practice. So maybe let's start with a simple example. So let's
just draw a random number, one random sample. Let's first
import NumPy. And then there's this random module in NumPy,
which contains a lot of different functions for drawing
random numbers. Here's a nice overview. It's from an older version of
the documentation, but I couldn't find the equivalent in the newer
documentation that has such a nice list of different things.
For example, there is np.random.rand, which draws random samples
from a uniform distribution, given a certain
shape. So if we provide two and three here, it will generate a
two-by-three array with random numbers from a
uniform distribution. There are also other ones here,
for example, the function np.random.random, which returns
random floats in the half-open interval between zero and
one, and so forth. So there are many different types of
functions you can use. Also, you can draw random samples from
different distributions here. For example, one way is
using the np.random.randn function to draw samples from a
standard normal distribution. There is also another one, I think,
right here. Same thing: it is also drawing samples
from a standard normal distribution. If you want to
draw samples from a normal distribution and you don't want to
rescale it manually, you can just directly use the normal
function here, which takes the mean, the standard deviation, and
the size. I think it should be the standard deviation; it could be the
variance, but I think this is the standard deviation. Yeah.
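
As a rough illustration of the functions mentioned so far, here is a minimal sketch. The `np` alias, the shapes, and the mean and standard deviation values are assumptions chosen only for the example.

```python
import numpy as np

# Uniform samples on [0, 1); the shape is passed as separate arguments
a = np.random.rand(2, 3)

# Uniform floats on the half-open interval [0, 1); the shape is a tuple
b = np.random.random((2, 3))

# Standard normal samples (mean 0, standard deviation 1)
c = np.random.randn(2, 3)

# Normal samples with an explicit mean (loc) and standard deviation (scale)
d = np.random.normal(loc=5.0, scale=2.0, size=(2, 3))

print(a.shape, b.shape, c.shape, d.shape)
```

Note that `scale` in `np.random.normal` is the standard deviation, not the variance.
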
Okay. So let's draw, using a simple approach, a random sample
using rand, and we will just draw a random sample of 10
values. Yeah, this is nice. However, as you can see, if I
execute it again, these numbers will change, which is useful in
practice in many applications. But in machine learning, if we
want to, for example, generate random numbers for shuffling, or
if we just want to shuffle an array, and we implement some
functions to test our implementation, sometimes we
want to always get the same results. That guarantees
that our code is reproducible, which means that if we send our
code and an example execution of that code to another person,
the other person who runs the code gets exactly the
same results. This is often very useful in
situations where you submit a paper to reviewers
and they run your code. Ideally, they should get the
same numbers as you get, right? So if everyone gets different
numbers every time you execute it, then it's kind of hard to
tell what's going on. So in order to do that, to guarantee
that you always get the same results, you can set a random
seed at the top of your notebook, for example. So this
number is totally arbitrary. It's just some
number. And here now, when I execute this, I get these three
values, for example. After setting the random seed, if I
execute it again, it will always be the same three values, right?
However, if I draw another sample here, this will, of
course, be different, because it's still a random sample. It's
just that, if you execute the code all over again, it will be the
same numbers drawn from the uniform distribution.
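
The behavior just described can be sketched as follows. The seed value 123 and the sample size of three are arbitrary assumptions for the example.

```python
import numpy as np

np.random.seed(123)             # 123 is an arbitrary choice
first = np.random.rand(3)
second = np.random.rand(3)      # continues the sequence, so it differs from first

np.random.seed(123)             # setting the seed again restarts the sequence
first_again = np.random.rand(3)

print(np.array_equal(first, first_again))  # True
print(np.array_equal(first, second))       # False
```

So the seed does not make each draw identical; it makes the whole sequence of draws repeatable from the point where the seed was set.
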
Yes, so why and when is that useful? If you think back to our
KNN notebook from lecture three, I showed you how to
shuffle an array. There, for example, it would be useful to
set a random number seed, so every time we split the data
set, it's the same split. So if someone wants to compare our
method, the KNN method, with a different KNN method, it
would always generate the same split of the data set if we use
the same random seed before we shuffle it. Or maybe just to
give you an intuitive example here, let's consider a simple,
let's say, KNN implementation. So if we have a KNN
classifier, from the neighbors module, I think, in scikit-learn, let's just
say we want to test it. For some reason, we want to implement a
test. We want to test it on some random data, and we
generate labels. Let's say we have 50 examples from class zero
and 50 examples from class one. So it's just a quick way of
creating a label array. Random labels; I mean, the labels are
not random, but we will associate random values with them.
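
One possible way to build such a toy data set is sketched below. The exact code used in the video is not shown in the transcript, so the construction here is an assumption; the shape of 100 examples with two features follows the description that comes next.

```python
import numpy as np

# 50 examples labeled 0 followed by 50 examples labeled 1
y = np.array([0] * 50 + [1] * 50)

# Random feature values attached to the fixed labels (100 rows, 2 columns)
X = np.random.rand(100, 2)

print(y.shape, X.shape)  # (100,) (100, 2)
print(np.bincount(y))    # [50 50]
```
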
So we can then, for example, draw a sample. In NumPy, sorry,
in scikit-learn, the convention is usually that we use y for the class
labels and X for the features. So we have, let's say, 100
training examples, corresponding to our labels here, and then two
features. So this will be random data. And then let's say we
fit our K-nearest neighbors classifier:
knn.fit, right, with X and y. And then let's say we have our score
function, which computes the accuracy. So we get an accuracy
of 66%. Now if we do that again, our accuracy goes up to
68%. So this is kind of fluctuating. If we want to stabilize that,
we would, of course, have to use our random seed. We would insert
that usually at the top of our notebook or code. So this would
be the top of our notebook; it would go right here,
before we use the random function. So it will always be
the same value. Now we can, for example, let's say we
develop a unit test. I'm just showing a very simplified
example. We can include a test like this. And then this should
always pass, until maybe I change my random number generator, and
then this doesn't work anymore. Or if I draw, so this would work, and
then I draw another random sample: because the second
random sample is different, this is not true anymore. So that
will give us an error. So this is a very trivial
example, but the same issue exists if we shuffle our data
set, for example, using the train_test_split function in
scikit-learn that I introduced in the K-nearest neighbors
notebook. But yeah, let's not get into too much
detail about K-nearest neighbors. This was just an
example showing you how a random number seed and a random
number generator work. Personally, I always prefer to
use a RandomState object in NumPy. This is an object that
is its own random number generator, because usually when
I write code, I have different places where I use
randomness. And this allows me a little bit more fine-grained
control over where I want my randomness. So if I have a very
long code example, let's say I want randomness somewhere because I
want to collect a sample of something, some large
random sample, and I only want the shuffling of the training
set to be fixed. But otherwise, every time I execute my code, I
want different results. So in that case, I would only set the
random seed for the specific part of my code, for
example, the train/test split. So, for example, when I
draw random indices to shuffle the arrays, I would only
use it there. And this RandomState allows me to do that. So
if I use this one, it won't, I will show you maybe here, it
won't affect anything we do with np.random.rand. So if I
set this random seed here, it will still generate
different numbers every time. So this one is not affected by this object,
because if we want to use this random number generator, we
have to use it directly. So here we are calling a NumPy
function; the function is rand, which is contained in the
NumPy random module. And here we are initializing an object from
the RandomState class, and then we are calling the method rand on
this object. So we are calling it here. Every time I
execute this, I will get the same results. And this is the
same as calling np.random.seed.
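
A minimal sketch of this comparison, assuming the arbitrary seed 123 and the variable name `rng`, neither of which is taken from the video:

```python
import numpy as np

# A RandomState object carries its own seed and its own stream of numbers
rng = np.random.RandomState(123)
from_object = rng.rand(3)

# Seeding the global generator with the same value gives the same numbers
np.random.seed(123)
from_global = np.random.rand(3)
print(np.array_equal(from_object, from_global))  # True

# Creating another RandomState object does not reset the global stream
np.random.RandomState(123)
next_global = np.random.rand(3)
print(np.array_equal(from_global, next_global))  # False
```

The last two lines show the separation: the module-level function keeps advancing its own sequence no matter how many RandomState objects are created.
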
So it's the same thing. You can also see the numbers match. However,
I can have multiple of these objects; in principle, I can
have infinitely many of these. And I can use them at different
places in my code. So this one always gives me the same
results. But then if I call it twice, of course,
the second result will be different from the first result,
right? So this will always give me the same results, but this
one will be different from this one. This is usually how I
prefer using a random state in my code. However, there's also a
newer version of that. This is because I'm old school; back
then there was only RandomState. However, nowadays, the
NumPy community recommends the new random Generator. So if you
like, you can read more about that here. For
our class, it's not necessary. Actually, I kind of even
prefer the old one, because it generates results similar to
using the function. However, yeah, I should probably recommend
using the new random Generator in your projects, if you like. I
think the improvement is just in the way they generate random
numbers, so the algorithm that is used for generating random
numbers. In practice, for simple applications, you won't notice a
difference, because in our case, it doesn't really matter how the
random numbers are generated. For us, it's more important that
we get consistency and reproducible code. And this is
more due to setting a random seed. In any case, this one will
give different numbers than, let's say, calling this one. So
be aware that it's not a one-to-one mapping: this one and
this one will generate different values. It's just using a
different method for generating these pseudo-random or random
numbers. So in code, random number generators are not truly random.
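
To make the difference between the legacy RandomState and the newer Generator concrete, here is a small sketch. It assumes NumPy 1.17 or later, where `np.random.default_rng` is available; the seed 123 is arbitrary.

```python
import numpy as np

legacy = np.random.RandomState(123)   # older interface
modern = np.random.default_rng(123)   # newer Generator interface

legacy_draw = legacy.rand(3)
modern_draw = modern.random(3)

# Same seed, different algorithm, so the values do not match
print(np.array_equal(legacy_draw, modern_draw))  # False

# The new generator is still reproducible on its own
repeat_draw = np.random.default_rng(123).random(3)
print(np.array_equal(modern_draw, repeat_draw))  # True
```
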
They are usually pseudo-random. But yeah, that would be a very
long video to explain here. At this point, all you need to
know is that there are different methods for generating random
numbers. It doesn't really matter which one you use,
especially not for this class. The only thing that matters is
having a random seed. So, for example, if you submit homework
and you get certain results, and I execute the homework to check
if your results are correct, it would be great if I
get the same results as you get, right? So in that way, what's
more important here is setting a random seed. Okay, so that
is all I wanted to say about random number generators. It is
not super important what type of random number
generator you use, but using a random seed is important.
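
Tying this back to the shuffling example mentioned earlier, the sketch below fixes only the data split with a local RandomState and leaves the rest of the program's randomness alone. The helper `shuffle_split`, its parameters, and the toy data are hypothetical and not from the lecture notebook.

```python
import numpy as np

def shuffle_split(X, y, seed, n_train):
    """Hypothetical helper: shuffle rows with a local RandomState, then split."""
    rng = np.random.RandomState(seed)          # local generator, global state untouched
    indices = rng.permutation(X.shape[0])      # shuffled row indices
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 examples with 2 features, 5 per class
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

split_a = shuffle_split(X, y, seed=123, n_train=7)
split_b = shuffle_split(X, y, seed=123, n_train=7)

# The same seed yields the same split on every call
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True
```

Anyone who runs this with the same seed should get the same training and test sets, which is the property that matters when someone else has to check your results.
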