4.6 NumPy Random Number Generators (L04: Scientific Computing in Python)


So in this video, we are briefly going to cover random number generators in NumPy. I will not talk about all the different ways we can generate random numbers in NumPy; the focus is more on understanding random number generators and getting an idea of why they are useful in practice. So maybe let's start with a simple example and just draw one random sample. Let's first import NumPy. NumPy has a random module, np.random, which contains a lot of different functions for drawing random numbers. Here's a nice overview. It's from an older version of the documentation, but I couldn't find an equivalent page in the newer documentation with such a nice list of the different functions.

For example, np.random.rand draws random samples from a uniform distribution over [0, 1), given a certain shape. So if we provide 2 and 3 here, it will generate a two-by-three array of random numbers from that uniform distribution. There are also others, for example, np.random.random, which returns random floats in the half-open interval [0.0, 1.0), and so forth. So there are many different types of functions you can use. You can also draw random samples from different distributions. For example, np.random.randn draws samples from a standard normal distribution, and there's another function right here that does the same thing. If you want to draw samples from a normal distribution with a different mean and spread, and you don't want to rescale them manually, you can directly use the np.random.normal function, which takes the mean, the standard deviation, and the size. The second argument (scale) is the standard deviation, not the variance.
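A minimal sketch of the functions mentioned above, using the legacy np.random API. The shapes and the mean and standard deviation values (5.0 and 2.0) are arbitrary choices for illustration. The last lines check empirically that the scale argument of np.random.normal behaves like the standard deviation rather than the variance.

```python
import numpy as np

# Uniform samples from [0, 1); rand takes the dimensions as separate arguments
a = np.random.rand(2, 3)

# Also uniform on [0, 1); random takes the shape as a single tuple
b = np.random.random((2, 3))

# Standard normal samples (mean 0, standard deviation 1), two equivalent calls
c = np.random.randn(2, 3)
d = np.random.standard_normal((2, 3))

# Normal samples with mean `loc` and standard deviation `scale`
e = np.random.normal(loc=5.0, scale=2.0, size=(2, 3))

# With many samples, the empirical standard deviation is close to scale (2.0),
# not to the variance (4.0)
sample_std = np.random.normal(loc=0.0, scale=2.0, size=100_000).std()

print(a.shape, b.shape, c.shape, d.shape, e.shape)
print(round(sample_std, 1))
```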

Okay, so let's use a simple approach and draw a random sample using rand; we will just draw a random sample of 10 values. Yeah, this is nice. However, as you can see, if I execute it again, these numbers change, which is useful in many practical applications. But in machine learning, say we want to generate random numbers for shuffling, or just shuffle an array, and we implement some functions to test our implementation. Sometimes we want to always get the same results. That gives us reproducible code, which means that if we send our code and an example execution of it to another person, the other person who runs our code gets exactly the same results. This is very useful, for example, when you submit a paper and the reviewers run your code. Ideally, they should get the same numbers as you, right? If everyone gets different numbers every time the code is executed, it's hard to tell what's going on. So to guarantee that you always get the same results, you can set a random seed with np.random.seed, for example, at the top of your notebook. The number itself is totally arbitrary; it's just some number. Now, when I execute this, I get these three values. After setting the random seed, if I execute it again, I always get the same three values, right?

However, if I draw another sample right after, it will of course be different, because it's still a random sample. It's just that if you execute the code again from the top, you get the same sequence of numbers drawn from the uniform distribution.
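A minimal sketch of the seeding behavior described above. The seed value 123 is an arbitrary assumption; any integer works the same way.

```python
import numpy as np

np.random.seed(123)               # arbitrary seed value
first = np.random.rand(3)         # three uniform values from [0, 1)
second = np.random.rand(3)        # the next draw continues the sequence

np.random.seed(123)               # re-seeding restarts the same sequence
first_again = np.random.rand(3)
second_again = np.random.rand(3)

print(np.array_equal(first, first_again))    # True
print(np.array_equal(second, second_again))  # True
print(np.array_equal(first, second))         # False
```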

So why and when is that useful? Think back to the k-nearest neighbors (KNN) notebook from lecture 3, where I showed you how to shuffle an array. There, for example, it would be useful to set a random seed so that every time we split the dataset, we get the same split. If someone wants to compare our KNN method with a different KNN method, they would always get the same split of the dataset if we use the same random seed before we shuffle it.

Or, to give you an intuitive example, let's consider a simple KNN implementation, the KNeighborsClassifier in scikit-learn. Let's say, for some reason, we want to implement a test and run it on some random data. We generate labels: let's say we have 50 examples from class 0 and 50 examples from class 1. This is just a quick way of creating a label array. The labels themselves are not random, but we will associate random feature values with them.
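The setup described above can be sketched in plain NumPy. make_toy_data and shuffle_data are hypothetical helper names, not functions from the lecture notebooks, and the seeds are arbitrary. The point is that with fixed seeds, both the random features and the shuffled order come out identical on every run, which is what makes a test on this data stable.

```python
import numpy as np


def make_toy_data(seed=None):
    """Hypothetical helper: 100 examples with 2 random features and fixed labels."""
    if seed is not None:
        np.random.seed(seed)
    # The labels are not random: 50 examples of class 0, then 50 of class 1
    y = np.array([0] * 50 + [1] * 50)
    # The features are random values from the uniform distribution on [0, 1)
    X = np.random.rand(100, 2)
    return X, y


def shuffle_data(X, y, seed=None):
    """Hypothetical helper: shuffle X and y with the same random permutation."""
    if seed is not None:
        np.random.seed(seed)
    indices = np.random.permutation(y.shape[0])
    return X[indices], y[indices]


# Two independent "runs" with the same seeds produce identical data
X1, y1 = make_toy_data(seed=123)
X2, y2 = make_toy_data(seed=123)
Xs1, ys1 = shuffle_data(X1, y1, seed=1)
Xs2, ys2 = shuffle_data(X2, y2, seed=1)

# A unit-test-style check that only passes because the seeds are fixed
assert np.array_equal(X1, X2)
assert np.array_equal(Xs1, Xs2) and np.array_equal(ys1, ys2)
print("reproducible:", np.array_equal(Xs1, Xs2))
```

For the split itself, scikit-learn's train_test_split accepts a random_state argument that serves the same purpose.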

Then we can draw random feature values with NumPy. In scikit-learn, our convention is to use y for the class labels and X for the features. So we have, let's say, 100 training examples, corresponding to the labels here, and two features. This is random data. Then we fit the k-nearest neighbors classifier with knn.fit(X, y). Then we use the score method, which computes the accuracy. We get an accuracy of 66%. If we do that again, the accuracy goes up to 68%. So it's fluctuating. If we want to stabilize it, we of course have to use a random seed. We would usually insert it at the top of our notebook or code; here, it goes right before we use the random function, so that we always get the same values.

Now, let's say we develop a unit test. I'm just showing a very simplified example: we can include a test like this, and it should always pass, until maybe I change my random number generator and it doesn't work anymore. Or, if I draw another random sample first, the test fails: because the second random sample is different, the assertion no longer holds, and that gives us an error. This is a very trivial example, but the same issue exists if we shuffle our dataset, for example, using the train_test_split function in scikit-learn that I introduced in the k-nearest neighbors notebook. But let's not get into too much detail about k-nearest neighbors. This was just an example showing how a random seed and a random number generator work.

Personally, I always prefer to use a RandomState object in NumPy. This is an object that has its own random number generator. Usually, when I write code, I use randomness in several different places, and a RandomState object gives me more fine-grained control over where I want my randomness. Let's say I have a very long piece of code where I want randomness, for example, because I want to collect a large random sample of something, and I only want the shuffling of the training set to be fixed. For everything else, I want different results every time I execute my code. In that case, I would only fix the seed for the specific part of my code, for example, the train/test split: when I draw the random indices to shuffle the arrays, I would use the seeded generator only there. A RandomState object allows me to do that. If I use this one, let me show you, it won't affect anything we do with the np.random functions. Even though I set a seed for this object, np.random.rand still generates different numbers every time. It is not affected by this object, because if we want to use this random number generator, we have to call it directly. Here, we are calling a NumPy function, rand, which is contained in the np.random module. And here, we are initializing an object from the RandomState class and then calling its rand method. Every time I execute this, I get the same results. And this is the same as calling np.random.seed with the same seed value.
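A minimal sketch of the RandomState behavior described above, assuming the arbitrary seed 123. It checks that a RandomState object reproduces the values from np.random.seed followed by np.random.rand, and that drawing from a separate RandomState object does not advance the global generator behind the np.random functions. draw_shuffled_indices is a hypothetical helper showing how a local generator can be handed only to the step that should be fixed, such as the shuffling for a train/test split.

```python
import numpy as np

# A RandomState object carries its own generator state
local_values = np.random.RandomState(seed=123).rand(3)

# The same seed through the global generator gives the same numbers
np.random.seed(123)
global_values = np.random.rand(3)
print(np.array_equal(local_values, global_values))  # True

# Drawing from a separate RandomState object leaves the global state untouched
np.random.seed(123)
reference = np.random.rand(3)
np.random.seed(123)
_ = np.random.RandomState(seed=1).rand(1000)  # separate generator
after_local_draws = np.random.rand(3)
print(np.array_equal(reference, after_local_draws))  # True


def draw_shuffled_indices(n, rng):
    """Hypothetical helper: only this step uses the seeded local generator."""
    return rng.permutation(n)


# Two objects created with the same seed produce the same sequence,
# while repeated calls on one object continue its own sequence
rng_a = np.random.RandomState(seed=123)
rng_b = np.random.RandomState(seed=123)
idx_a1 = draw_shuffled_indices(20, rng_a)
idx_b1 = draw_shuffled_indices(20, rng_b)
idx_a2 = draw_shuffled_indices(20, rng_a)
print(np.array_equal(idx_a1, idx_b1))  # True
```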

So it's the same thing; you can also see that the numbers match. However, I can have multiple of these objects, as many as I like, and use them at different places in my code. This one gives me the same results, but if I call it twice, the second result will of course be different from the first, right? So re-running this always gives me the same results, but this call gives different numbers than that one. This is usually how I prefer using a RandomState in my code.

However, there's also a newer alternative. I use RandomState because I'm old school; back then, there was only RandomState. Nowadays, the NumPy community recommends the new random Generator, and if you like, you can read more about that here. For our class, it's not necessary. I actually kind of prefer the old one, because its methods give the same results as the corresponding np.random functions. However, I should probably recommend using the new random Generator in your projects. I think the improvement is mainly in the algorithm used for generating the random numbers. In practice, for simple applications, you won't notice a difference, because in our case it doesn't really matter how the random numbers are generated. For us, what matters is consistency and reproducible code, and that comes from setting a random seed. In any case, the new Generator gives different numbers than RandomState, even with the same seed. So be aware that there's no one-to-one mapping between them: the Generator just uses a different method for generating these pseudo-random numbers. Random number generators in code are not truly random.

They are usually pseudo-random, but explaining that would take a very long video. At this point, all you need to know is that there are different methods for generating random numbers, and it doesn't really matter which one you use, especially not for this class. What matters is setting a random seed. For example, if you submit a homework with certain results and I execute your homework to check whether they are correct, it would be great if I got the same results as you, right? So what type of random number generator you use is not super important, but using a random seed is. That is all I wanted to say about random number generators.
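For comparison, a minimal sketch of the newer Generator API that the NumPy documentation recommends, created with np.random.default_rng. Its default bit generator is PCG64, whereas RandomState uses the Mersenne Twister (MT19937), which is why the same seed gives different numbers in the two APIs. The seed 123 is again an arbitrary assumption.

```python
import numpy as np

# A Generator seeded with an arbitrary value; its default bit generator is PCG64
rng = np.random.default_rng(123)
values = rng.random(3)                                 # uniform floats from [0, 1)
normals = rng.normal(loc=0.0, scale=1.0, size=(2, 3))  # scale is the std. dev.

# Re-creating the Generator with the same seed reproduces the values
rng_again = np.random.default_rng(123)
print(np.array_equal(values, rng_again.random(3)))  # True

# The legacy RandomState (MT19937) uses a different algorithm,
# so the same seed does not map to the same numbers
legacy_values = np.random.RandomState(123).random_sample(3)
print(np.array_equal(values, legacy_values))  # False
```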
