Diego Hernández Jiménez

Welcome to my personal website! Here I share some of my little projects.

Using Python for research (edX project)

Final assignment of the course HarvardX: PH526x Using Python for Research.

Description

The goal was to develop a model that predicts the type of physical activity (standing, walking, going up stairs, or going down stairs) from tri-axial smartphone accelerometer data.

We were given three files of data.

train_time_series.csv. Raw accelerometer data. Used to build the model.

       timestamp      UTC time                 accuracy  x            y            z
20586  1565109930787  2019-08-06T16:45:30.787  unknown   -0.0064849…  -0.9348602…  -0.0690460…
20587  1565109930887  2019-08-06T16:45:30.887  unknown   -0.0664672…  -1.0154418…  0.08955383…
20588  1565109930987  2019-08-06T16:45:30.987  unknown   -0.0434875…  -1.0212554…  0.17846679…
… (3744 rows x 6 columns)

train_labels.csv. The activity labels for every tenth observation in train_time_series.csv. Also needed to build the model.

       timestamp      UTC time                 label
20589  1565109931087  2019-08-06T16:45:31.087  1
20599  1565109932090  2019-08-06T16:45:32.090  1
20609  1565109933092  2019-08-06T16:45:33.092  1
… (375 rows x 3 columns)
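
Since only every tenth observation carries a label, one way to build training rows is to pair each label with the window of raw samples that ends at it. The sketch below is an illustration under stated assumptions, not the code from the original project: the function name `window_features`, the window length of 10, and the two features (mean and standard deviation of the acceleration magnitude) are all hypothetical choices, and it assumes every label timestamp exactly matches a sample timestamp. It uses only the standard library and synthetic data so that it runs anywhere.

```python
from math import sqrt
from statistics import mean, pstdev


def window_features(samples, labels, window=10):
    """Pair each label with features from the samples that lead up to it.

    samples: list of (timestamp, x, y, z), sorted by timestamp.
    labels:  list of (timestamp, label).
    Returns a list of ((mean_magnitude, std_magnitude), label) tuples.
    Assumption: each label timestamp is also a sample timestamp;
    labels without a matching sample are skipped.
    """
    index_by_ts = {ts: i for i, (ts, *_) in enumerate(samples)}
    rows = []
    for ts, label in labels:
        end = index_by_ts.get(ts)
        if end is None:
            continue  # no sample with this exact timestamp
        start = max(0, end - window + 1)
        # Acceleration magnitude is independent of phone orientation.
        mags = [sqrt(x * x + y * y + z * z)
                for _, x, y, z in samples[start:end + 1]]
        rows.append(((mean(mags), pstdev(mags)), label))
    return rows


# Synthetic stand-in data (not the course files): 30 samples, 100 ms apart,
# with a label attached to every tenth one.
demo_samples = [(i * 100, 0.0, -1.0, 0.0) for i in range(30)]
demo_labels = [(900, 1), (1900, 1), (2900, 2)]
demo_rows = window_features(demo_samples, demo_labels)
```

With the real files, the same idea would apply after reading train_time_series.csv and train_labels.csv, for example with the csv module or pandas.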

test_time_series.csv. We were asked to provide the activity labels predicted by our code for this test data set.

       timestamp      UTC time                 accuracy  x            y            z
24330  1565110306139  2019-08-06T16:51:46.139  unknown   0.0342864…   -1.5044555…  0.1576232…
24331  1565110306239  2019-08-06T16:51:46.239  unknown   0.4091644…   -1.0385437…  0.0309753…
24332  1565110306340  2019-08-06T16:51:46.340  unknown   -0.234390…   -0.9845581…  0.1247711…
… (1250 rows x 6 columns)

Because this was a classification problem, I tried several approaches, including k-nearest neighbors (KNN) and logistic regression. We were never given the true labels for the test set; at the end of the course we were only told our accuracy score. I achieved a very modest 46%.
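
To make the KNN idea concrete, here is a minimal sketch of a k-nearest-neighbors classifier and an accuracy check. It is not the project's actual code: the helper names `knn_predict` and `accuracy`, the choice of k=3, and the tiny two-feature data set are all made up for illustration, and the labels 1 and 2 are arbitrary class codes. In practice a library such as scikit-learn would be the usual choice; this version sticks to the standard library so it is self-contained.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training rows.

    train: list of (features, label), where features is a tuple of floats.
    query: tuple of floats with the same length as the training features.
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k=3):
    """Fraction of held-out rows whose predicted label matches the true one."""
    hits = sum(knn_predict(train, feats, k) == label
               for feats, label in held_out)
    return hits / len(held_out)


# Synthetic (mean magnitude, std magnitude) features for two made-up classes.
demo_train = [
    ((1.00, 0.01), 1), ((1.00, 0.02), 1), ((1.01, 0.015), 1),
    ((1.20, 0.30), 2), ((1.25, 0.35), 2), ((1.15, 0.28), 2),
]
demo_held_out = [((1.00, 0.00), 1), ((1.22, 0.31), 2)]
demo_accuracy = accuracy(demo_train, demo_held_out)
```

Holding out part of the labeled training data like this gives an accuracy estimate before submitting, which matters here because the true test labels were never released.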

The complete code is available here

Comments

  • This was my first data science project, so my knowledge of Python (and data science) was extremely basic. I would definitely change some things now that I know more.