In nearly every field of science, our ability to generate data has exceeded our capacity for analysis. For me, this means that there is the potential for loss to science; many important discoveries may go unnoticed because we are unable to efficiently analyze data. AI is helping a lot with this, but there is still a need for people who can manage, transform, and visualize data. This course is designed to give you the tools you need to do this using the R programming language.
The goal for this course is to help get you started using the R programming language. You will learn to clearly and neatly organize messy data, transform it in ways that address your questions, and communicate results in a variety of formats. The course is designed for people with no prior programming experience. There is a substantial learning curve but, working together, we will make learning R easier, interesting, and fun.
This class is NOT focused on statistical analysis, interpretation, or modeling. The goal is to provide you with the foundational skills you need to perform these tasks.
What about AI? Will programming become an irrelevant skill? AI is a powerful tool, but it is just that. To use it effectively, you need to understand the data you are working with and the questions you are trying to answer. Programming skills can help you use AI more effectively.
The first two weeks of the class can seem slow, tedious, and frustrating. These first steps are like learning a new language; you often won’t know what is being said or why. Please be patient; I promise that it will get easier.
Given that R is open source, many resources are available online. We will use a combination of resources in the class, but key items are listed below.
Our campus has a wealth of expertise in data science. There is even a major in data science. Should your interests progress, here are some links. The Davis R Users Group offers regular workshops. Among the goals of this class is to get you set up so that you can attend and learn more!
When you need help with homework or a class topic, please post on the class Canvas site. We are here to support you, and everyone is encouraged to participate.
The goal of lab 1 is to get everyone started using R, RStudio, and GitHub. All of our work will be done in RStudio and uploaded to the class GitHub repository. It is important that everyone is set up correctly before we are done today. In the spirit of the R universe, our class is a community. If you see someone struggling, please give them some help.
This quarter we are fortunate to be able to use the computers in the SCC. Each of these computers is up-to-date, and the installed software is exactly the same. This makes following my instructions much easier, but you will likely want to have the ability to work at home. Please follow the directions below to set up your personal computer.
Because you will need to work on assignments at home, it is important that you spend time making sure that your computer is set up and ready to go. The first step is basic maintenance, i.e., clean up your desktop and update your OS. Data scientists are neat and tidy! Spend time getting yourself organized; it will pay off.
Please follow the steps here to set up your computer.
Keeping your computer happy and healthy takes some work. Here are some steps that you can take to make sure that your computer is running optimally.
I have set up Canvas to mirror the class webpage. Both sites contain the same information except for grades. Most people will be fine using the Canvas site, and it is more accessible. Please make sure that you pay attention to class announcements; these will all come from Canvas.
R is the programming language that we will use in this class. RStudio is an interface (Integrated Development Environment, IDE) that makes using R easier. It’s a graphical interface that allows you to write and run code, visualize data, and manage files. RStudio is not required to use R, but it makes the experience much more user-friendly.
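To make "write and run code" and "visualize data" a little more concrete, here is a minimal sketch of what a few lines of R can look like. The object names (`heights_cm`, `mean_height`) and the numbers are made up purely for illustration; they are not class data.

```r
# Store a handful of made-up values in a vector (hypothetical example data)
heights_cm <- c(150, 162, 171, 158, 180)

# Compute the average and save it as a new object
mean_height <- mean(heights_cm)

# Show the result in the console
print(mean_height)

# Draw a simple bar chart of the values
barplot(heights_cm, main = "Example heights (cm)")
```

If you run these lines in RStudio, the printed result should appear in the console and the chart should typically show up in the Plots tab. Don't worry about the details yet; we will go through each piece step by step.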
When you open RStudio, you will see that the window is divided into four sections. Each of these sections has a specific function that we will use throughout the course.