Using Python for research (edX project)
Final assignment for the course HarvardX PH526x: Using Python for Research.
Description
The goal was to develop a model that predicts the type of physical activity (standing, walking, climbing stairs, or going down stairs) from tri-axial smartphone accelerometer data.
We were given three files of data.
train_time_series.csv. Raw accelerometer data. Used to build the model.
| | timestamp | UTC time | accuracy | x | y | z |
|---|---|---|---|---|---|---|
| 20586 | 1565109930787 | 2019-08-06T16:45:30.787 | unknown | -0.0064849… | -0.9348602… | -0.0690460… |
| 20587 | 1565109930887 | 2019-08-06T16:45:30.887 | unknown | -0.0664672… | -1.0154418… | 0.08955383… |
| 20588 | 1565109930987 | 2019-08-06T16:45:30.987 | unknown | -0.0434875… | -1.0212554… | 0.17846679… |

… (3744 rows x 6 columns)
train_labels.csv. The activity labels for every tenth observation in train_time_series.csv. Also needed to build the model.
| | timestamp | UTC time | label |
|---|---|---|---|
| 20589 | 1565109931087 | 2019-08-06T16:45:31.087 | 1 |
| 20599 | 1565109932090 | 2019-08-06T16:45:32.090 | 1 |
| 20609 | 1565109933092 | 2019-08-06T16:45:33.092 | 1 |

… (375 rows x 3 columns)
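
Because only every tenth observation carries a label, one way to pair the two training files is to summarise the window of readings that ends at each labeled row. The snippet below is a minimal sketch of that idea, not the code used in the project: the function name `window_features`, the window length of 10, the choice of mean and standard deviation of the acceleration magnitude as features, and all the readings are assumptions made up for illustration.

```python
import math
import statistics

def window_features(samples, labeled_index, window=10):
    """Summarise up to `window` readings ending at `labeled_index`.

    `samples` maps a row index to an (x, y, z) tuple. Returns the mean and
    population standard deviation of the acceleration magnitude.
    """
    magnitudes = [
        math.sqrt(sum(c * c for c in samples[i]))
        for i in range(labeled_index - window + 1, labeled_index + 1)
        if i in samples  # rows before the start of the recording are skipped
    ]
    return statistics.mean(magnitudes), statistics.pstdev(magnitudes)

# Made-up readings (NOT taken from train_time_series.csv):
# a phone at rest, with one jolt at row 119.
samples = {i: (0.0, -1.0, 0.0) for i in range(100, 120)}
samples[119] = (0.0, -2.0, 0.0)

# Made-up labels for every tenth row, mimicking train_labels.csv.
labels = {109: 1, 119: 2}

features = {idx: window_features(samples, idx) for idx in labels}
for idx, (mean_mag, std_mag) in features.items():
    print(idx, labels[idx], round(mean_mag, 3), round(std_mag, 3))
```

Each labeled row then yields one feature vector, so the 375 labels would give 375 training examples instead of leaving nine out of every ten readings unused.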
test_time_series.csv. We were asked to provide the activity labels predicted by our code for this test data set.
| | timestamp | UTC time | accuracy | x | y | z |
|---|---|---|---|---|---|---|
| 24330 | 1565110306139 | 2019-08-06T16:51:46.139 | unknown | 0.0342864… | -1.5044555… | 0.1576232… |
| 24331 | 1565110306239 | 2019-08-06T16:51:46.239 | unknown | 0.4091644… | -1.0385437… | 0.0309753… |
| 24332 | 1565110306340 | 2019-08-06T16:51:46.340 | unknown | -0.234390… | -0.9845581… | 0.1247711… |

… (1250 rows x 6 columns)
Because this was a classification problem, I tried several approaches, including k-nearest neighbors (KNN) and logistic regression. At the end of the course we were not given the actual labels for the test set, only our accuracy score. I achieved a very modest 46%.
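
To make the KNN idea concrete, here is a minimal from-scratch sketch using only the standard library. It is an illustration, not the project's code: the helper names `knn_predict` and `accuracy`, the value `k=3`, the two-feature layout, the integer class codes, and every data point are assumptions invented for the example. In practice a library implementation such as scikit-learn's `KNeighborsClassifier` would be the usual choice.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, point, k=3):
    """Predict the label of `point` by majority vote among its k nearest
    training points (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, point), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up feature vectors (hypothetical: mean and spread of acceleration
# magnitude per window) with arbitrary class codes 1 and 2.
train_X = [(1.00, 0.01), (1.00, 0.02), (1.02, 0.01),
           (1.10, 0.30), (1.15, 0.35), (1.20, 0.28)]
train_y = [1, 1, 1, 2, 2, 2]

test_X = [(1.01, 0.015), (1.12, 0.31)]
predictions = [knn_predict(train_X, train_y, p) for p in test_X]
print(predictions)
```

The accuracy score reported by the course is the same quantity that `accuracy` computes: the share of test rows whose predicted label matches the true one.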
The complete code is available here
Comments
- This was my first data science project, so my knowledge of Python (and data science) was extremely basic. I would definitely change some things now that I know more.