For interested outsiders

If you’re reading this, a former student of mine may have directed you here to give you more information about the large project they completed in my course. This page doesn’t have information about specific student projects, but it should give you a sense of the skills that students who completed this project will have. You may wish to skip directly to the skills description.

One sentence description: Students received some data directly and gathered more by web scraping, accessing an API and retrieving licensed data, then merged, wrangled, visualized, summarized and modelled data on fitness tracker customers to meet a client brief, and reported on their methods and findings appropriately for a general executive audience and, separately, for a technical audience.

What is STA303/1002 about?

STA303: Methods of Data Analysis II is a course delivered by the Department of Statistical Sciences at the University of Toronto. (STA1002 is the code used for graduate students from other departments.)

In Winter 2022 (January to April), STA303 remained completely online due to COVID-19. Students rose to this challenge remarkably well, but it was nonetheless a significant strain on wellbeing and focus.

STA303 is a communication- and application-focused course where students learn:

The models covered include: linear mixed models, generalized linear models, generalized linear mixed models and generalized additive models.

Task summary

Full information on this assessment can be found on the rest of this site, but here is a quick overview. The project was worth 45% of students’ final grades and students could choose to complete it individually or as a team. Teamwork was recommended, but the task was the same either way. Teams were not required because some students had good reasons for working individually, such as being located in a challenging time zone, lacking internet access suitable for calls, or having caring or work obligations that made scheduling meetings untenable.

Students were consulting for MINGAR, analyzing customer data for their Canadian fitness tracker/watch market. Note: MINGAR isn’t a real company. Personal data at this level of detail would be inappropriate to share with a class of 600 students. I also wanted students to have a project where they could share (most of) the code, data and outputs in their portfolios, on GitHub, etc. That said, this dataset was simulated based on real research and trends, and draws on my own experiences running a small company.

Each team or individual created a consulting company for the purposes of this activity. To register their group or individual status, they completed a pseudo-NDA. The only real part of it was a reminder that they had already agreed to several codes of conduct as part of their enrolment at U of T, along with a clear statement of my expectations of their professionalism. It also gave them a chance to familiarize themselves with a common requirement in consulting.

The deliverable

The final submission was a report that included:

Students were tasked with answering the research questions posed by the client, communicating their findings in ways appropriate to the audience for each section of the report, choosing appropriate methods and creating professional visualizations and tables to explain their results.

Reports were written in a reproducible R Markdown file (a code and text file type popular for use with the programming language R). Students were provided with a basic template that they could choose to use.

Skills demonstrated

Students who completed this project to a reasonable standard can do the following (organized under broad headings):

Statistical reasoning and knowledge

Ethical professional practice

Modern data practices

Writing

Programming

General