- The first "column" in the DataFrame is the index, which defaults to incrementing integers
- Just as each column has a name, the index value is the "name" (label) of each row
- We can assign a column to be the index of a DataFrame:
listings_df = listings_df.set_index('id')
listings_df
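A minimal sketch of what `set_index` does. The real listings data is not shown here, so the DataFrame below is a small made-up stand-in: its ids, names, and prices are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the listings data
listings_df = pd.DataFrame({
    'id': ['l9995141', 'l12026015', 'l44688136'],
    'name': ['Sunny loft', 'Quiet studio', 'Garden flat'],
    'price': [120, 85, 95],
})

# Before: the index is the default 0, 1, 2, ...
print(listings_df.index.tolist())    # [0, 1, 2]

# Make the 'id' column the index; 'id' is no longer a regular column
listings_df = listings_df.set_index('id')
print(listings_df.index.tolist())    # ['l9995141', 'l12026015', 'l44688136']
print(listings_df.columns.tolist())  # ['name', 'price']
```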
Why do we need to assign the result of `set_index()`?
- Calling `.set_index()` does not change the original DataFrame.
- Instead, `.set_index()` returns a new DataFrame with the index changed, which we then assign back to the original variable.
- Most pandas methods return a new value rather than changing the original value.
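To see that the original DataFrame really is left alone, here is a short sketch on a tiny made-up DataFrame (the `df` and `indexed` names and their contents are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'id': ['a1', 'b2'], 'price': [50, 75]})

# set_index returns a NEW DataFrame; df itself is left unchanged
indexed = df.set_index('id')

print(df.columns.tolist())       # ['id', 'price']  (original still has the 'id' column)
print(indexed.columns.tolist())  # ['price']
print(indexed.index.tolist())    # ['a1', 'b2']

# Calling set_index without assigning the result has no lasting effect
df.set_index('id')
print('id' in df.columns)        # True
```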
We can index and slice DataFrames by integer position using `.iloc`:
To get the first row:
listings_df.iloc[0]
To get the value in the second column of the first row:
listings_df.iloc[0, 1]
To get the second column of the first five rows (positions 0 through 4, since the end position of a slice is excluded):
listings_df.iloc[0:5, 1]
To get the second column of all rows:
listings_df.iloc[:, 1]
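The four `.iloc` forms above can be tried on a small made-up DataFrame. The columns and values below are hypothetical stand-ins; note that once `id` is the index, it no longer counts as a column position.

```python
import pandas as pd

# Hypothetical listings data, already indexed by id
listings_df = pd.DataFrame(
    {
        'name': ['Loft', 'Studio', 'Flat', 'Room', 'Cabin', 'Villa'],
        'price': [120, 85, 95, 40, 150, 300],
        'accommodates': [2, 1, 3, 1, 4, 8],
    },
    index=pd.Index(['l1', 'l2', 'l3', 'l4', 'l5', 'l6'], name='id'),
)

first_row = listings_df.iloc[0]          # a Series; its index is the column names
print(first_row['name'])                 # Loft

print(listings_df.iloc[0, 1])            # 120  (row 0, column 1 = 'price')

first_five = listings_df.iloc[0:5, 1]    # rows 0-4; end position 5 is excluded
print(len(first_five))                   # 5

all_prices = listings_df.iloc[:, 1]      # column 1 for every row
print(all_prices.tolist())               # [120, 85, 95, 40, 150, 300]
```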
We can also index and slice rows and columns by their names (labels) using `.loc`:
To get a single row by its name in the index:
listings_df.loc['l9995141']
To get several rows by their names:
listings_df.loc[['l9995141', 'l12026015', 'l44688136']]
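A small sketch of label-based row selection. The three ids match the ones used above, but the names and prices attached to them are made up for illustration.

```python
import pandas as pd

# Hypothetical data indexed by id
listings_df = pd.DataFrame(
    {'name': ['Loft', 'Studio', 'Flat'], 'price': [120, 85, 95]},
    index=pd.Index(['l9995141', 'l12026015', 'l44688136'], name='id'),
)

# One label -> a Series describing that row
row = listings_df.loc['l9995141']
print(row['price'])              # 120

# A list of labels -> a DataFrame with those rows, in the order listed
subset = listings_df.loc[['l44688136', 'l9995141']]
print(subset.index.tolist())     # ['l44688136', 'l9995141']
```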
While you can use `:` slicing to specify start and end names for a range (with `.loc`, the end name is included), it is more common to specify a list of names.
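One detail worth checking is how label slices differ from position slices. This sketch uses a hypothetical four-row DataFrame to compare them:

```python
import pandas as pd

df = pd.DataFrame(
    {'price': [120, 85, 95, 40]},
    index=pd.Index(['l1', 'l2', 'l3', 'l4'], name='id'),
)

# .loc label slice: the end label IS included
print(df.loc['l2':'l3'].index.tolist())     # ['l2', 'l3']

# .iloc position slice: the end position is NOT included
print(df.iloc[1:3].index.tolist())          # ['l2', 'l3']

# A list of labels picks exactly those rows, in the order given
print(df.loc[['l4', 'l1']].index.tolist())  # ['l4', 'l1']
```

A label range also depends on the current row order, which is one reason a list of names is often the clearer choice.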
To get the `name` column of all rows:
listings_df.loc[:, 'name']
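Column selection with `.loc` can also be combined with a row label. A sketch on a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Loft', 'Studio'], 'price': [120, 85]},
    index=pd.Index(['l1', 'l2'], name='id'),
)

names = df.loc[:, 'name']                   # a Series indexed by id
print(names.tolist())                       # ['Loft', 'Studio']

# One row label and one column label -> a single value
print(df.loc['l2', 'name'])                 # Studio

# A list of column names returns a DataFrame
print(df.loc[:, ['name', 'price']].shape)   # (2, 2)
```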
Use sorting and indexing on `listings_df` to find:
- The value in the third column of the fifth row.
- The `name` of the listing with an `id` of `'l6113'`.
- The `review_scores_rating` of the most reviewed listing.
- The `latitude` and `longitude` of the least expensive listing.
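A hedged sketch of the sort-then-index pattern, using `sort_values` with `.index`/`.iloc`/`.loc` on made-up data. The column names `price` and `number_of_reviews` are assumptions, so check the real column names in `listings_df` before adapting it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['Loft', 'Studio', 'Flat'],
        'price': [120, 85, 95],                # hypothetical column
        'number_of_reviews': [10, 42, 7],      # hypothetical column
    },
    index=pd.Index(['l1', 'l2', 'l3'], name='id'),
)

# Sort by price (ascending), then take the first row's label
cheapest_id = df.sort_values('price').index[0]
print(cheapest_id)                        # l2
print(df.loc[cheapest_id, 'name'])        # Studio

# Sort by review count (descending), then take the top row by position
most_reviewed = df.sort_values('number_of_reviews', ascending=False).iloc[0]
print(most_reviewed['name'])              # Studio
```

Like `set_index`, `sort_values` returns a new DataFrame, so `df` keeps its original row order.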