Introduction to statistics with R (beginners)

Biostatistics: Introduction to biostatistics (1st level)

This course is part of the Plateforme Biostatistique de Toulouse training session: “Initiation à LA statistique avec R”. The first session was held on September 21-23, 2020, the second on March 29-31, 2021, and the third is scheduled for March 14-16, 2022.

The content of this course is basic statistics, illustrated with the programming language R. It is taught by Nathalie Vialaneix and Sandrine Laguerre. The course covers the following topics:

  • exploratory statistics in one and two dimensions, including plots

  • statistical inference and statistical tests

  • PCA and clustering

This page gathers information about the course and material to download.

Please contact Nathalie Vialaneix with any questions, including questions about the technical setup.

Install R

Before the course begins, you must install R, RStudio (with the ability to compile RMarkdown files) and a few packages on your personal computer. The installation steps are described below.

Do not hesitate to contact me (email preferred) if you run into a problem during the installation. When reporting a problem, please describe the error message precisely (a screenshot is a plus).

Install R (preferably version 4.0 or higher)

R can be downloaded for free from the official repository website. Choose the version that matches your OS (Windows, Linux or Mac). Mac users should probably also install tcltk, which is available in the section called tools. Some Linux users might also find R in their distribution repositories (this is the case for Ubuntu and Debian users; further details are provided at this page, third bullet point).
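Once R is installed, the version recommendation above can be checked from the R console. This is a minimal sketch: the variable names are illustrative, and the threshold simply restates the "4.0 or higher" recommendation of this page.

```r
# Minimum version recommended on this page (written as a full version string).
recommended <- "4.0.0"

# getRversion() returns the version of the R session that is currently running.
current <- getRversion()

if (current >= recommended) {
  message("R ", as.character(current), " meets the recommendation (>= ", recommended, ").")
} else {
  message("R ", as.character(current), " is older than ", recommended, "; consider updating.")
}
```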
   

Install RStudio

RStudio (Desktop version) can be downloaded for free at this link. Choose the version that matches your OS (Windows, Linux or Mac); the "Installers" are preferred. Ubuntu users can install the .deb file with:
sudo dpkg -i rstudio-XX.deb
sudo apt-get install -f
      
To make sure that you can compile RMarkdown files, open RStudio, click on New / RMarkdown file and then click on the "knit" button. If some of the packages needed to knit your file are not installed, you should be prompted to install them.
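The same check can be approximated from the R console. The sketch below only reports whether the rmarkdown package and pandoc (the converter that RStudio normally bundles) are visible to R; it does not replace the "knit" test above, and pandoc may be reported as unavailable when the code is run outside RStudio.

```r
# Check that the rmarkdown package is installed, without attaching it.
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

if (has_rmarkdown) {
  message("rmarkdown version: ", as.character(packageVersion("rmarkdown")))
  # pandoc_available() returns TRUE when rmarkdown can find a pandoc executable.
  message("pandoc available: ", rmarkdown::pandoc_available())
} else {
  message("rmarkdown is not installed; RStudio should offer to install it on the first knit.")
}
```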

Install required CRAN packages

The following packages (available on CRAN) will be required:
  • RColorBrewer
  • FactoMineR
  • factoextra
  • lubridate
  • scales
They are installed (with dependencies) using:
install.packages(c("RColorBrewer", "FactoMineR", "factoextra", "lubridate", "scales"))
      
We also recommend that you check the installation with
library("...")
      
where ... is a package name.
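To run this check for the five packages at once, something along the following lines can be used. It is a sketch rather than an official course script: the variable names are illustrative, and requireNamespace() is used instead of library() so that a missing package yields FALSE instead of an error.

```r
# Packages listed on this page.
required <- c("RColorBrewer", "FactoMineR", "factoextra", "lubridate", "scales")

# TRUE for each package that can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report the packages that still need to be installed, if any.
not_installed <- required[!installed]
if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages load properly.")
}
```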

Special warning for INRAE users: in several INRAE Toulouse units, the installation settings place your personal R library in a remote folder. When you are not on-site, this can cause errors or delays with installed packages. If you intend to follow the course from home, carefully check that the packages load properly (with the library command as stated above) after rebooting your computer.
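To see where R looks for packages on your computer, the library folders can be listed from the console. This is only a sketch: it shows the paths and whether each folder is currently reachable, but deciding whether a path is a remote (network) folder is left to you.

```r
# Folders in which R searches for installed packages.
lib_paths <- .libPaths()

# dir.exists() tells whether each folder can be reached right now.
reachable <- dir.exists(lib_paths)

print(data.frame(path = lib_paths, reachable = reachable))
```

If your personal library does not appear in this list when you are off-site, R could probably not reach it when the session started.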

Material for the class

Download the material and have it ready on your computer for the class.

  1. Course material: Slides are available at this link. An HTML version for offline use is also provided.
  2. Material for the practical session: Slides are available at this link with the corresponding source (Rmd) file (with the R script included).
  3. Datasets: The class will be illustrated with the following datasets:
    • for the first two days: stages_TDF.csv is a dataset describing Tour de France stages. It originates from Kaggle, where its complete description is available. Be careful to download this file directly (with a right click on the mouse) and do not open it with Excel!
    • for the third day: the CSV file athle_records.csv (which must not be opened with Excel either).
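Instead of opening the files with Excel, you can check that they were downloaded properly by reading them in R. The sketch below makes several assumptions that this page does not state: the files are in the current working directory, they are comma-separated and they have a header row. The helper load_course_csv is hypothetical and is not part of the course material.

```r
# Hypothetical helper: read a CSV file if it exists, otherwise say where R looked.
load_course_csv <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path, " (working directory: ", getwd(), ")")
    return(invisible(NULL))
  }
  # Assumption: comma-separated values with a header row.
  read.csv(path, stringsAsFactors = FALSE)
}

stages <- load_course_csv("stages_TDF.csv")
records <- load_course_csv("athle_records.csv")

# Show the structure of each dataset that could be read.
if (!is.null(stages)) str(stages)
if (!is.null(records)) str(records)
```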