R

Open Science, Open Source and R

Andy Wills — Tue, 19 Feb 2019 12:30:00 +0000

Free software will save psychology from the Replication Crisis.

"Study reveals that a lot of psychology research really is just 'psycho-babble'".—The Independent.

Psychology changed forever on August 27, 2015. For the previous four years, the 270 psychologists of the Open Science Collaboration had been quietly re-running 100 published psychology experiments. Now, finally, they were ready to share their findings. The results were shocking. Fewer than half of the re-run experiments had worked.

When someone tries to re-run an experiment, and it doesn't work, we call this a failure to replicate. Scientists had known about failures to replicate for a while, but it was only quite recently that the extent of the problem became apparent. Now, an almost existential crisis loomed. That crisis even gained a name: the Replication Crisis. Soon, people started asking the same questions about other areas of science. Often, they got similar answers. Only half of results in economics replicated. In pre-clinical cancer studies, it was worse; only 11% replicated.

Open Science

Clearly, something had to be done. One option would have been to conclude that psychology, economics and parts of medicine could not be studied scientifically. Perhaps those parts of the universe were not lawful in any meaningful way? If so, you shouldn't be surprised if two researchers did the same thing and got different results.

Alternatively, perhaps different researchers got different results because they were doing different things. In most cases, it wasn't possible to tell whether you'd run the experiment exactly the same way as the original authors. This was because all you had to go on was the journal article: a short summary of the methods used and results obtained. If you wanted more detail, you could, in theory, request it from the authors. But we'd already known for a decade that this approach was seriously broken: in about 70% of cases, data requests ended in failure.

A Good Front End for R

Joey Bernard — Thu, 26 Apr 2018 14:30:00 +0000

R is the de facto standard statistical package in the Open Source world. It's also quickly becoming the default data-analysis tool in many scientific disciplines.

R's core design includes a central processing engine that runs your code, with a very simple interface to the outside world. Because that interface is so basic, it's been easy to build graphical front ends that wrap the core of R, so you have plenty of GUI options to choose from.

In this article, I look at one of the available GUIs: RStudio. RStudio is a commercial program, with a free community version, available for Linux, macOS and Windows, so your data analysis work should port easily regardless of environment.

For Linux, you can install the main RStudio package from the download page. From there, you can download RPM files for Red Hat-based distributions or DEB files for Debian-based distributions, then use either rpm or dpkg to do the installation.

For example, in Debian-based distributions, use the following to install RStudio (the exact filename depends on the version you downloaded):


sudo dpkg -i rstudio-xenial-1.1.423-amd64.deb

Note that RStudio is only the graphical front end, so you need to install R itself as a separate step. Install the core parts of R with:


sudo apt-get install r-base

There's also a community repository of packages, the Comprehensive R Archive Network (CRAN), that can add huge amounts of functionality to R. You'll want to install at least the recommended set so that some common tools are available:


sudo apt-get install r-recommended

There are equivalent commands for RPM-based distributions too.
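As a rough sketch of what those equivalents might look like, the helper below prints (rather than runs) the install commands for either package family. The function name print_install_cmds and the ./rstudio.rpm path are hypothetical placeholders, so substitute the file you actually downloaded. The RPM branch assumes a Fedora-style system where R is packaged as "R"; on RHEL or CentOS, R typically comes from the EPEL repository instead.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that would install RStudio and R
# for a given package family. It only prints; nothing is installed.
#   $1 = package family ("deb" or "rpm")
#   $2 = path to the downloaded RStudio package file
print_install_cmds() {
    family=$1
    pkg=$2
    case "$family" in
        deb)
            echo "sudo dpkg -i $pkg"
            echo "sudo apt-get install -f"   # dpkg does not resolve dependencies; this pulls in any that are missing
            echo "sudo apt-get install r-base"
            ;;
        rpm)
            echo "sudo rpm -i $pkg"
            echo "sudo dnf install R"        # assumption: Fedora-style system with dnf
            ;;
        *)
            echo "unknown package family: $family" >&2
            return 1
            ;;
    esac
}

# Example: show the commands for an RPM-based system.
print_install_cmds rpm ./rstudio.rpm
```

Printing the commands first lets you check them against your own distribution before running anything with sudo.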

At this point, you should have a complete system to do some data analysis.
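One quick way to confirm that is to check that both programs are on your PATH. This is a minimal sketch; the helper name have_cmd is made up for illustration, and it assumes the packages install their launchers as "R" and "rstudio", which may differ on your system.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "<name>: found" or "<name>: not found"; returns non-zero if missing.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Assumed launcher names for the R interpreter and the RStudio desktop GUI.
have_cmd R || true
have_cmd rstudio || true
```

If either one is reported as not found, revisit the corresponding install step before launching RStudio.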

When you first start RStudio, you'll see a window that looks somewhat like Figure 1.

Figure 1. RStudio creates a new session, including a console interface to R, where you can start your work.

The main pane of the window, on the left-hand side, provides a console interface where you can interact directly with the R session that's running in the back end.

The right-hand side is divided into two sections, where each section has multiple tabs. The default tab in the top section is an environment pane. Here, you'll see all the objects that have been created and exist within the current R session.

The other two tabs in that top section provide the history of every command given and a list of any connections to external data sources.

Galit Shmueli et al.'s Data Mining for Business Analytics (Wiley)

James Gray — Fri, 03 Nov 2017 16:11:00 +0000

The updated 5th edition of Data Mining for Business Analytics, from Galit Shmueli and collaborators and published by Wiley, is a standard guide to data mining and analytics that adds two new co-authors and a trove of new material vis-à-vis its predecessor. R is a free, open-source and increasingly popular software environment for statistical computing and graphics. Subtitled Concepts, Techniques, and Applications in R, the new 5th edition continues to provide an applied approach to data-mining concepts and methods, using R as the canvas on which to illustrate them.

With the book, readers learn how to implement a variety of popular data-mining algorithms in R to tackle business problems and opportunities. Material covered in depth includes both statistical and machine-learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining and network analysis.

The new 5th edition includes material drawn from business and government, a dozen case studies demonstrating applications of the data-mining techniques described, and exercises in each chapter that help readers gauge and expand their comprehension of, and competency with, the material. Data Mining for Business Analytics can serve as either a textbook or a reference for analysts, researchers and practitioners working with quantitative methods in myriad fields.
