Training Courses

Introduction

Vanlog’s training offer ranges from courses for R beginners to specialization courses. The specialist courses cover both specific analysis areas and the advanced programming techniques needed to get the desired results from the technology.

The trainers have experience in both corporate and academic training, and they actively use the techniques taught in these courses in business solutions and research.

Data Science Courses (2-day courses)

1. R for Data Science

This course introduces attendees to modern R for Data Science. RStudio and the Tidyverse are the core tools used for data manipulation, visualization and modelling, and for exporting the results as simple, attractive reports.

Outline

  • An overview of the Data Science toolbox
  • Introduction to R and RStudio
  • Why Data Science with R
  • Advantages of an interpreted language
  • Data Objects for Continuous and Categorical Variables
  • Data Objects for Scalars, Vectors and Matrices
  • Tables, Data Frames and Tibbles
  • Functions to wrap complex calculations
  • Data formats
  • Data Import
  • Missing data handling
  • Data Manipulation with Window functions
  • Relational Data Manipulation
  • Tidy datasets
  • Reproducible analysis with pipelines
  • Data Discovery
  • Data Visualization: box plots, density plots, histograms, bar charts, scatter and line plots
  • Introduction to the Grammar of Graphics
  • Introduction to Statistical Models
  • Iterative investigation method
  • Result presentation with RMarkdown Reports

What you will be able to do

  • Use data from several data sources
  • Tidy your dataset
  • Discover relations among your data
  • Create visualizations
  • Fit a model for your data
  • Deliver insights and results with a clear report or presentation

Duration

2 days

Prerequisites

None.

Audience

This course is fundamental for every business area. The example data can be chosen according to industry type, for better understanding and faster adoption of the concepts.

2. Analysis Communication: Data Visualization & Automated Reports

Communication is the last and most delicate step of an analysis. It delivers sophisticated results to people who may have no technical background or deep statistical skills. It is therefore important to know how to convey the message simply, with the help of expressive, well-thought-out graphics. Finally, we will look at the R technologies that make reports reproducible on updated data at the cost of a click.

This course will teach you how to use ggplot2 as well as interactive JavaScript visualization tools. The second part will show you how to build customized, automated PDF or HTML reports and slides with R Markdown.

Topics

  • General Grammar of Graphics with ggplot2
  • How to explore data with Visualizations
  • Univariate analysis plots
  • Multivariate analysis plots
  • Matrix and Grid plots
  • Graphics with multiple layers
  • Specific plots and graphs with HTMLWidgets
    • Bar charts
    • Area plots
    • Dots and Bubble charts
    • Time series plots
    • Carpet plots
    • Graph and Sankey plots
    • Plots from models
    • Interactive plots
    • 3D plots
    • Hexbin plots
    • Sunburst plots
    • Heatmaps
    • Pivot
    • Contour plots
  • RMarkdown reproducible and instantaneous reports
  • Static Printable PDF reports
  • Interactive Reports
  • Interactive Dashboards
  • Presentations with Data

What you will be able to do

  • Quickly create visualizations to understand the data set
  • Graphically highlight relationships in data
  • Choose the best representation for the data types you have
  • Use specific plots for graphs or other visualizations
  • Present results as PDFs, verbose reports, dashboards or slides

Duration

2 days

Prerequisites

None.

Audience

This course is fundamental for every business area. It is especially useful for professionals who need to express insights with professional-quality graphics and to create clear presentations.

3. Web Dashboards with R

R and Shiny are an effective way to build a pilot program: a small-scale feasibility study, a short-term and inexpensive experiment that helps an organization discover whether a project could be useful for its business. In this course we will learn how to build a pilot application with Shiny, an R framework for expressing your data workflow and making it accessible as a dashboard to anyone with a web browser.

Outline

  • Standalone Application
  • Basics of Reactive Programming
  • Interactive Document (RMarkdown and Shiny)
  • Dashboards
  • ggplot2 Plot Integration
  • HTMLWidgets Integration
  • Data Table Interactivity
  • ggplot2 Interactivity
  • Upload input data
  • Download or export outputs
  • Shiny Gadgets
  • Introduction to Golem
  • Share results

What you will be able to do

  • Create an interactive dashboard
  • Integrate powerful visualizations and interactions
  • Improve your data exploration and presentation solutions
  • Share your results online

Duration

2 days

Prerequisites

  • Basics of R programming and Tidyverse
  • Nice to have: Dplyr or SQL basics and ggplot2 experience

Audience

This course is fundamental for every business area. It is especially useful for professionals who need to create an easy-to-use graphical interface for navigating data and analyses and for sharing them online.

4. Databases and Big Data with R

As your data grows, it is important to find the solution that best suits your needs. In this course we start by using databases to analyze data much larger than what plain R alone can handle. We then keep increasing the data size until we understand how distributed architectures work and how to use Apache Hadoop and Apache Spark from R with Sparklyr.

Outline

  • Databases
    • Manipulate data on the disk
    • Lazy operations
    • Data Import and Manipulation
  • Distributed Architectures
    • Principles
    • Cloudera HDFS
    • Hive and Apache Spark
  • Sparklyr
    • Standalone testing mode and distributed mode
    • Data Import and Distributed Manipulation
    • Relational data and Joins
    • In-memory Caching
    • In-memory Distributed computation
    • Machine Learning Pipelines
    • Introduction to the main Machine Learning Algorithms

What you will be able to do

  • Select the type of database or distributed architecture that best suits your needs
  • Import and manipulate data in that architecture
  • Understand the basics and limitations of the architecture in terms of data and type of analysis
  • Use the main Machine Learning algorithms on a distributed architecture

Duration

2 days

Prerequisites

  • Basics of R programming and Tidyverse
  • Nice to have: Dplyr or SQL basics

Audience

This course provides the foundations for analyzing and using large datasets. It is also recommended for anyone who needs to understand how data size affects the analysis process and its results and, consequently, what benefits these technologies bring to the business.

Programming Specialization (1-day courses)

1. Parallel computing

This class covers the main constraints of parallel programming and describes the main parallelization paradigms and the differences between them (shared memory, distributed computation, map-reduce, futures, …). We will do exercises using several R packages and measure the time gained. (R packages: “future”, “parallel”, “foreach” and others)

Outline

  • Shared Memory and Distributed Computation
  • Map and Reduce Paradigm
  • The parallel, foreach and doSNOW packages
  • Future
  • doSNOW Cluster
  • Understand the parallelization overhead

What you will learn

  • Understand how to rewrite your code in a parallelizable way.
  • Use different parallelization paradigms and libraries.
  • Recognize bottlenecks and benchmark your code.

Duration

1 day

Prerequisites

  • Good knowledge of R programming, including loops and functions

Audience

This is an advanced course for professionals and researchers who need to improve algorithm performance by parallelizing computations to take full advantage of the available hardware. The course explains the basics of parallelization so that you can tell where and how it provides a real performance improvement.

2. Optimize R code

In the most computationally demanding projects, it is important to know how to optimize the performance of your algorithms. You need a tool to identify the bottlenecks, and you need to know how to find your way among the possible solutions. In a nutshell: can I solve this quickly by optimizing the R code, or do I need to rewrite the affected part in a compiled language? In this course we will see how to do profiling (analysis of computational run time) directly from RStudio and what the main rules for writing efficient R code are. We will also cover the basics of the interaction between C++ and R, and try to understand how much performance is gained with this tool and at what price.

Outline

  • Benchmark R code
  • I/O benchmark and optimization
  • Vectorization
  • Understand copy-on-modify and OOP overhead
  • Optimal design
  • Introduction to Rcpp and compiled languages

What you will be able to do

  • Write better code on the first try
  • Benchmark your code and improve it. The run time can easily be reduced to 20% at this stage.
  • Understand which parts to rewrite in a compiled language and the gain/cost trade-off

Duration

1 day

Prerequisites

  • Good knowledge of R programming, including loops and functions

Audience

This is an advanced course for professionals and researchers who need to write high-performance code and know where and how to improve it.

3. Professional Programming

Writing code means creating a piece of software functionality. Writing good code, however, has stricter requirements in terms of reliability, robustness, reusability and extensibility. The problem must be solved correctly, in a wide variety of cases, in a readable and extensible way. This course describes the “best practices” for the following working methods: debugging, tracing, error handling, logging, assertions, documentation and unit tests.

Outline

  • General programming best practices
  • Unit Testing
  • Assertions
  • Error handling
  • R debugging
  • Tracing and logging
  • Working documentation

What you will be able to do

  • Write maintainable analyses and applications
  • Release new versions with confidence
  • Quickly find and fix errors

Duration

1 day

Prerequisites

  • Good knowledge of R programming, including loops and functions

Audience

This is an advanced course for professionals and researchers who need to deliver professional, reusable work and improve their workflow.

4. Professional Shiny Programming

About Course

R and Shiny are an effective way to build a pilot program: a small-scale feasibility study, a short-term and inexpensive experiment that helps an organization discover whether a project could be useful for its business. In this course we will learn how to build a pilot application that can easily be put into production to work on real data at a larger scale. We will code a Shiny application organized into Shiny Modules (“Shiny”) for clear and maintainable code, install it as a package (“Golem”), and create unit tests (“Testthat”) to avoid regressions (TDD or TAD). A good logging system (“futile.logger”) is useful to see what happens on your server: to find bugs and discover user usage paths. Configuration files make the application more adaptable. We will track the package environment for reproducibility (“renv”) and deploy on a Linux server using “Git” and “GitHub”.

Outline

  • Application as a Package with Golem
  • Shiny Dashboard Framework
  • Reproducibility with renv
  • Understanding Shiny Reactivity
  • Tidy and readable code with Shiny Modules
  • Application configuration
  • Monitor the application with logging

What you will be able to do

  • Write maintainable Shiny applications
  • Split your Shiny applications in reusable modules
  • Release new versions with confidence
  • Quickly find and fix errors
  • Understand how users use your applications

Duration

1 day

Prerequisites

  • Experience writing simple Shiny applications

Audience

This is an advanced course for professionals and researchers who need to write complex Shiny applications, where a well-thought-out programming method is important.

IT Training (2-day course)

1. DevOps R

This course will teach you how to set up a server that provides any R service: RStudio Server, Shiny Server (free or Pro version), ShinyProxy (the open-source Shiny server based on Docker) and custom services written in R.

We will install this software on a Linux system and see how best to use the operating system’s features to provide services to end users with appropriate security. Finally, we will make everything automated and reproducible with Ansible and establish a CI/CD pipeline that installs the latest version of the software only if the automated tests pass.

Outline

  • Linux Operating Systems
  • RStudio Server (Free and Pro version)
  • Shiny Server (Free and Pro version)
  • ShinyProxy
  • Plumber Server
  • Connect Server
  • R Packages installation and scope
  • Data storage and databases
  • Orchestration tools and infrastructure as code
  • Install R software
  • Ansible
  • Deploy with Ansible
  • Set up the entire server with Ansible
  • Ansible extensions for R
  • Linux File System and collaborative analysis environment
  • Authentication and Authorization
  • Single or multi worker
  • HTTPS
  • Shiny applications for non-R developers
  • R Markdown for non-R developers
  • Continuous integration with Git and GitHub
  • Continuous deployment

What you will be able to do

  • Understand modern techniques for setting up server architectures
  • Know the R server tools: how they operate, authentication and other features
  • Set up an R server on a Linux system
  • Create a continuous integration pipeline with GitHub
  • Continuously deploy new releases

Duration

2 days

Prerequisites

  • Basic knowledge of the Linux operating system

Audience

This course is aimed at Operations staff rather than Data Scientists. It is a fundamental course for professionals who need to set up R servers for applications or collaborative analysis.

Trainer

Andrea Melloncelli

Andrea works as a Data Scientist Consultant and Trainer.

Andrea teaches Statistics and how to use statistical software to solve business problems. He has taught university courses at Milano Bicocca and Milano Cattolica and has experience in on-the-job training at several major companies.

Andrea is an expert Data Science programmer. His expertise ranges from spreadsheets (Excel and Google Sheets) to software programming, depending on the type of problem to solve. He has solid experience in R, Python and Scala programming and development, along with extensive skills in Unix system management, IT automation tools, cloud technologies and big-data platforms such as Hadoop and Spark. Andrea graduated in Physics from the Università Degli Studi Di Milano.
