Using AWS Glue and Amazon Athena
In this section, we will use AWS Glue to create a crawler, an ETL job, and a job that runs KMeans clustering algorithm on the input data.
We use a publicly available dataset about the students' knowledge status on a subject. The dataset and the field descriptions are available for download from the UCI site: https://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling
- Log in to the AWS Management Console and go to the Glue console. Click on the
Add crawler
button. - Specify the
Crawler name
asUser Modeling Data Crawler
as shown here. Click on theNext
button:

- In the
Add a data store
screen, selectS3
as theData store
, and select theSpecified path in my account
option. Specify the path for the S3 bucket containing the input data. Click on theNext
button:

- Select
No
on theAdd another data store
and click on theNext
button. - On the IAM console, select the
Glue service
and click on theNext: Permissions
button:

- Next, we attach the appropriate policies to the role...