Now that we have converted or created dataframes, we can explore what we can do with them. You should already be familiar with the .head() function, which returns the first rows of a dataframe (five by default).
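As a quick refresher, here is a minimal sketch of .head() on a small made-up dataframe. Passing a number n returns the first n rows instead of the default five.

```python
import pandas as pd

# A small toy dataframe (hypothetical data, for illustration only)
df = pd.DataFrame({"key": range(10), "key1": [x * 2 for x in range(10)]})

top = df.head()      # first 5 rows by default
top3 = df.head(3)    # pass n to choose how many rows to return

print(top3)
```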
You can also select sections of a dataframe. This is needed for many machine learning procedures.
Let's say we have a dataframe called df, which has a column named "key". Here's how we can extract this column:
col = df[["key"]]
This returns a new dataframe whose only column is "key".
We can also extract multiple columns. Let's say that df has columns called "key", "key1", and "key2". We can extract these columns with:
cols = df[["key", "key1", "key2"]]
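To make the selections above concrete, here is a small sketch on a toy dataframe; the column "other" and all values are made up. Passing a list of column names inside the brackets always returns a dataframe, even when the list holds a single name.

```python
import pandas as pd

# Toy dataframe using the column names from the text (values are made up)
df = pd.DataFrame({
    "key": [0.1, 0.9, 0.5],
    "key1": [1, 2, 3],
    "key2": ["a", "b", "c"],
    "other": [True, False, True],
})

col = df[["key"]]                    # list with one name -> dataframe with one column
cols = df[["key", "key1", "key2"]]   # list of names -> dataframe with those columns

print(col.shape)   # (3, 1)
print(cols.shape)  # (3, 3)
```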
There's a technicality here, though. There's another way to extract a single column, but it doesn't produce a dataframe; instead, it produces what's called a Series:
col = df["key"]
A Series is a one-dimensional Pandas array that can hold any data type. We won't really use Series much, so don't worry about them for now.
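The difference between single and double brackets can be checked directly. In this sketch (toy data, hypothetical values) the double-bracket version is two-dimensional, the single-bracket version is one-dimensional, and .to_frame() turns a Series back into a one-column dataframe if you need one.

```python
import pandas as pd

df = pd.DataFrame({"key": [0.1, 0.9, 0.5], "key1": [1, 2, 3]})

as_frame = df[["key"]]   # double brackets -> DataFrame (2-D)
as_series = df["key"]    # single brackets -> Series (1-D)

print(type(as_frame).__name__, as_frame.ndim)    # DataFrame 2
print(type(as_series).__name__, as_series.ndim)  # Series 1

# Convert the Series back into a one-column DataFrame
back = as_series.to_frame()
```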
Pandas also provides useful functions for inspecting the unique values in a particular column. This can be very useful in many contexts, and we'll use it occasionally.
For the column "key" in the dataframe df, we produce an array of its unique values with the following command:
uniques = df["key"].unique()
To get only the number of unique values, use df["key"].nunique() instead.
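As a sketch on a made-up column with repeated values: unique() returns the distinct values in order of first appearance, while nunique() returns how many there are (missing values are excluded by default).

```python
import pandas as pd

# Hypothetical column with repeated values
df = pd.DataFrame({"key": ["a", "b", "a", "c", "b"]})

uniques = df["key"].unique()   # array of distinct values, in order of first appearance
count = df["key"].nunique()    # number of distinct values

print(uniques)
print(count)    # 3
```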
Again, this chapter is like an encyclopedia, so you can refer back to it whenever you need.
We can also extract the rows of a dataframe that satisfy a particular condition. This is very important; although it will be most useful in classification, we may still use it elsewhere. Let's say that we want to keep only the rows where the value in the "key" column is greater than 0.8:
df[df["key"] > 0.8]
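It can help to see the two steps separately. In this sketch with made-up values, the comparison produces a boolean Series (True where the condition holds), and indexing the dataframe with it keeps only those rows. Note that the surviving rows keep their original index labels.

```python
import pandas as pd

df = pd.DataFrame({"key": [0.5, 0.9, 0.85, 0.2], "key1": [1, 2, 3, 4]})

mask = df["key"] > 0.8   # boolean Series: True where the condition holds
filtered = df[mask]      # keeps only the rows where mask is True

print(mask.tolist())               # [False, True, True, False]
print(filtered["key1"].tolist())   # [2, 3]
```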
And that's all the basics you should need to know about Pandas. If there's anything else, we'll cover it later.