Data Science Specialization at Coursera

Last year, I took the nine online courses in the Data Science Specialization offered by Johns Hopkins University via Coursera. It could have been called “Data Science with R” since one whole course and a good part of the other courses were more about R programming than data analysis. I certainly learned more about R and the R ecosystem than about statistics and data science, but I already had some knowledge of the latter. The nine courses are:

  1. The Data Scientist’s Toolbox
  2. R Programming
  3. Getting and Cleaning Data
  4. Exploratory Data Analysis
  5. Reproducible Research
  6. Statistical Inference
  7. Regression Models
  8. Practical Machine Learning
  9. Developing Data Products

There’s also a “capstone” course/project, but it’s not offered as often, and the timing didn’t work for me.

Course Structure

Each course lasted four weeks, and the pattern for most weeks was about 30 minutes of video lectures and a multiple choice quiz. Some courses also had a project involving some data analysis, and those were the most educational parts of the series. There were also practice assignments and online discussion forums, but I found the Coursera site too sluggish to use those casually.

As would be expected from courses like this, what you get out of them depends on the extra effort you put in. Someone completely new to the material would have to spend a lot of time on self-study, because they’re not going to learn technical material like programming and statistics from a few video lectures. Unfortunately for me, the multiple choice quizzes made it a little too easy to get by with little effort. That’s partly a necessity of an online course, but being able to take each quiz three times seems to undermine any rigor. If you can eliminate just one of the four choices, you’re guaranteed to get each question correct by the third try.
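To make the quiz arithmetic concrete, here is a small Python sketch. It assumes (my assumption, not something the course documented) that each attempt shows which answers were wrong and that the correct answer doesn’t change between attempts. With one wrong choice ruled out, three candidates remain, so three attempts always suffice. The function name `attempts_needed` is purely illustrative.

```python
def attempts_needed(choices, correct, eliminated):
    """Hypothetical model: try the remaining choices in order, one per attempt,
    crossing off each one the quiz marks wrong. Returns the attempt number
    on which the correct answer is picked."""
    remaining = [c for c in choices if c != eliminated]
    for attempt, guess in enumerate(remaining, start=1):
        if guess == correct:
            return attempt
    raise ValueError("the correct answer was eliminated")


choices = ["A", "B", "C", "D"]
worst = 0
for correct in choices:
    for eliminated in choices:
        if eliminated == correct:
            continue  # only a wrong choice can be ruled out
        worst = max(worst, attempts_needed(choices, correct, eliminated))

print(worst)  # 3
```

The worst case is when the correct answer happens to be the last of the three remaining choices you try, which is exactly the third attempt.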

Positives

Instructors. One thing that kept me in the series (besides my stubbornness to finish what I started) was the quality of the instructors, Brian Caffo, Jeff Leek and Roger D. Peng. They obviously have an enthusiasm for the subject and did a great job organizing the material. I especially liked that they would often do live R coding during the lectures (I’m sure there were some edits, but still…). Sometimes that’s where I picked up the best R tips.

Material. I liked that they spent a good amount of time just on programming and just on reproducible results, two topics that could be ignored or brushed over and still pass for data science. For the modeling, they made a point of avoiding linear algebra, which was a nice change from the standard approach.

Projects. While still constrained by the time available, these longer assignments forced you to apply the material to perform some basic analysis and publish the result. The projects used real data sets which often required some clean-up/preprocessing, which was a good lesson in itself.

Negatives

Peer grading. The longer assignments were better measures, but due to the compressed timing, they had to be kept pretty simple, and due to the large number of students, they had to be peer graded. For peer grading, each student had to grade four others. I can only hope they required some sort of agreement among duplicate graders because the grading guidance was minimal and coarse. (Is there a model provided? Yes/No. Is the model correct? Yes/No…)

The one course I didn’t get high marks in was the course I expected to be my best: Exploratory Data Analysis. The peer graders gave me poor scores on the long assignment. I knew I wasn’t strictly following the assignment guidelines, but I hoped the graders would be more flexible. The assignment called for a report with something like three pages of text and an appendix with two pages of graphs. I prefer to put the graphs inline with the text, but some graders stuck with the literal guidelines and gave me zeros for that part of the grading (there was often no way of giving partial credit).

Coursera. Besides the sluggish forums and grading limitations already mentioned, I would like to go back to review parts of the material (partly so I could be more specific in this review), but the archives are no longer available. Last month at least some of the course archives were available, but now none are. Given the scope of the material, providing later access seems essential.

Bottom Line

For someone new to the field, I think these courses will be too brief to learn the material well, but they provide a great tour from some great tour guides and will help frame further study.
