OMSCS — Big Data for Health Informatics

Jonathan Lao
Jan 13, 2021

Atlanta is a good TV show.

Overview

This was the first class I’ve taken that left me with an overall negative impression. Other reviewers have described it as a ‘bootcamp’ that firehoses you with different technologies, rather than a traditional ‘academic’ course about big data. I agree with that characterization, but ultimately the class turns out not to be a very good bootcamp.

Practical Tips

  • It’s true that you will touch a lot of different technologies like Hadoop, Pig, Hive, and Spark. But I think the key to succeeding in the class is being comfortable with the basics of four things: Python, SQL, Scala, and machine learning (for example, understanding how to train and test a regression model). If you’re comfortable with at least three of the four, you’re probably in good shape. If not, expect to spend more time than average just getting up to speed on new syntax.
  • The class Piazza discussion threads for each assignment will answer almost any question you might have about the homework. One tactic is to wait a bit after an assignment is released so the early go-getters can hash out all the nitty-gritty details and clarifications. If you would rather finish the homework early, know that there will be many other students ready and willing to work with you.
  • The sunlabs are the single best resource for the homework. The corresponding lab, followed step by step, often overlaps with more than 50% of each homework. I tried ‘front-loading’ the labs ahead of time, but I think doing the corresponding lab alongside each homework is the best use of your time.
  • Knowing the basics of Docker can be useful. I followed along with a LinkedIn Learning class, which students can access through a free subscription.
  • Individual grading for each homework can be very hit or miss. In general, the local tests are rather simplistic, so passing them is a low bar. Sometimes passing them is sufficient to get a 100 in the end. Other times, they are wholly inadequate. And in one instance, I was passing the local tests despite knowing that I had an incorrect implementation of the algorithm, and I still got a 100.
  • Group projects can also be hit or miss. A good group can really make your life easier at the end of the semester. I don’t have any good tips on how to find a good group, but in general the slackers tend to drop out of the class.
  • To my knowledge, as long as you hit every deliverable for the project, you will get an A.
  • The assignments take up the first half of the course and the project the second half. Overall, the course is quite front-heavy.
  • Overall grading for the class is very generous. Know that if you stick with the class to the end and at least turn in every deliverable, you’re very likely to get an A.
  • In light of the lenient class grading, don’t stress too much about the exam. Some of the questions are easy if you have a good grasp of the basic machine learning algorithms. Others felt like obscure trivia questions.
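
For anyone wondering what “train and test a regression model” means in the first tip, here is a minimal sketch in plain Python. It is not taken from the course materials: the synthetic data, the 80/20 split, and the helper names `fit_line` and `mean_squared_error` are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (made up for this example): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# Shuffle the row indices, then hold out 20% of the rows for testing.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:80], indices[80:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Train on the training rows only, then score on rows the model never saw.
slope, intercept = fit_line(train_x, train_y)
test_mse = mean_squared_error(test_x, test_y, slope, intercept)

print(f"slope={slope:.2f} intercept={intercept:.2f} test MSE={test_mse:.3f}")
```

The point is the workflow rather than the model: fit on one slice of the data and measure error on a held-out slice. If that feels familiar, the machine learning side of the homework should not be the bottleneck.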
