Introduction to statistics with R (beginners)

Biostatistics: Introduction to biostatistics (1st level)

This course is part of the Plateforme Biostatistique de Toulouse training session: “Initiation à LA statistique avec R”. The first session will be held on September 21-23, 2020.

The content of this course is basic statistics, illustrated with the programming language R. The course covers the following topics:

  • exploratory statistics in one and two dimensions, including plots (Nathalie Vialaneix and Sandrine Laguerre)

  • statistical inference and statistical tests (Nathalie Vialaneix and Sandrine Laguerre)

  • PCA and clustering (Sébastien Déjean and Jérome Mariette)

This page gathers information about the course and the material to download.

Warning: For this first session, the INRAE health and safety rules apply. In addition, a videoconference version will be organized for the first two days. Please contact Nathalie Vialaneix for more information and the technical setup.

Install R

For this course, R, RStudio (with the ability to compile RMarkdown files) and a few packages must be installed on your personal computer before the course begins. The installation steps are described below.

Do not hesitate to contact me (email preferred) if you run into a problem during the installation. When reporting a problem, please describe the error message precisely (a screenshot is a plus).

Install R (preferably version 4.0 or higher)

R can be downloaded for free from the official repository website. Choose the version that matches your OS (Windows, Linux or Mac). Mac users should probably also install tcltk, which is available in the section called tools. Some Linux users might also find R in their distribution repositories (this is the case for Ubuntu and Debian users; further details are provided at this page, third bullet point).
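Once R is installed, the version can be checked from the R console. The lines below are a minimal sketch using base R only; the variable name r_ok is just an illustrative choice.

```r
# Print the full version string of the running R installation
print(R.version.string)

# Compare the running version with the one recommended for this course (4.0 or higher)
r_ok <- getRversion() >= "4.0.0"

if (r_ok) {
  message("R version is 4.0 or higher.")
} else {
  message("R is older than 4.0: consider upgrading before the course.")
}
```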

Install RStudio

RStudio (Desktop version) can be downloaded for free at this link. Choose the version ("Installers" preferred) that matches your OS (Windows, Linux or Mac). Ubuntu users can install the .deb file with
sudo dpkg -i rstudio-XX.deb
sudo apt-get install -f
To make sure that you can compile RMarkdown files, open RStudio, click on New / RMarkdown file and then click on the "knit" button. If some of the packages needed to knit your file are not installed, you should be prompted to install them.
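As a complement to the "knit" button, the same check can be sketched from the R console. This is only an illustration, not part of the official course instructions: it tests whether the rmarkdown package is installed and whether pandoc (the converter that rmarkdown relies on) can be found. The variable names are illustrative, and the result may differ when the code is run outside RStudio.

```r
# TRUE if the rmarkdown package is installed (the package is not attached)
rmarkdown_installed <- requireNamespace("rmarkdown", quietly = TRUE)

# pandoc is only looked up when rmarkdown itself is available
pandoc_found <- rmarkdown_installed && rmarkdown::pandoc_available()

if (!rmarkdown_installed) {
  message("rmarkdown is missing: install.packages(\"rmarkdown\") should fix it.")
} else if (!pandoc_found) {
  message("rmarkdown is installed but pandoc was not found from this session.")
} else {
  message("rmarkdown and pandoc are both available.")
}
```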

Install required CRAN packages

The following packages (available on CRAN) will be required:
  • RColorBrewer
  • FactoMineR
They are installed (with dependencies) using:
install.packages(c("RColorBrewer", "FactoMineR"))
We also recommend that you check the installation with
library(...)
where ... is a package name.
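The check can also be done for both packages at once. The following is a minimal sketch; the variable names required_pkgs, installed and missing_pkgs are illustrative and not part of the course material.

```r
# Packages required for the course (names taken from the list above)
required_pkgs <- c("RColorBrewer", "FactoMineR")

# For each package, TRUE if it can be loaded, FALSE otherwise
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report the packages that still need to be installed, if any
missing_pkgs <- required_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If a package is reported as missing, running the install.packages() command above again and reading the error message is usually the best starting point.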

Material for the class

Download the material and have it ready on your computer for the class.

  1. Course material
  2. Datasets: the class will be illustrated with the following datasets:
    • for the first two days: an extract of the Health Nutrition and Population Statistics database, which is hosted by the World Bank Group and provides various key indicators related to health issues, population dynamics and nutrition. The data have been gathered from 258 countries and the current dataset is restricted to the year 2019. It comes from this page. Be careful to download this file directly (with a right click) and not to open it with Excel!
      In addition, the variables in this dataset are described in this Excel file (which you can open with Excel or LibreOffice).
    • for the third day: the three CSV files athle_records.csv, body_full.csv and body_light.csv (which must also not be opened with Excel).
  3. Material for the practical session
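Rather than opening the CSV files with Excel, they can be inspected directly in R. The sketch below rests on several assumptions: the helper load_course_csv is hypothetical (it is not part of the course material), the file is assumed to sit in the current working directory, and it is assumed to be comma-separated with a header row (pass another sep if that is not the case).

```r
# Hypothetical helper: read a CSV file if it exists, otherwise return NULL
load_course_csv <- function(path, sep = ",") {
  if (!file.exists(path)) {
    message("File not found: ", path, " (working directory: ", getwd(), ")")
    return(NULL)
  }
  # Assumption: the first row contains the column names
  read.csv(path, sep = sep, header = TRUE)
}

# File name taken from the list above
records <- load_course_csv("athle_records.csv")

# Show the structure of the data (column names and types) if the file was found
if (!is.null(records)) {
  str(records)
}
```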