Dask DataFrames support two types of indexing: label-based and positional. The main limitation of Dask indexing is that Dask does not keep track of how many rows each partition holds. Without those partition lengths, it cannot map an integer row position to the partition that contains it, so positional row indexing is not supported. DataFrame.loc supports label-based selection of both rows and columns. DataFrame.iloc supports integer-based selection of columns only.
Let's perform these indexing operations on a Dask DataFrame:
- First, we must create a DataFrame and perform column indexing:
# Import Dask and Pandas DataFrame
import dask.dataframe as dd
import pandas as pd
# Create Pandas DataFrame
df = pd.DataFrame({"P": [10, 20, 30], "Q": [40, 50, 60]},
                  index=['p', 'q', 'r'])
# Create Dask DataFrame
ddf = dd.from_pandas(df, npartitions=2)
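# Check top records; npartitions=-1 makes head() read every partition,
# not just the first, so all three rows are returned
ddf.head(npartitions=-1)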
This results in the following output:
    P   Q
p  10  40
q  20  50
r  30  60
In the preceding example, we created a pandas...