by Antonio Sánchez
Learning to code can be quite hard. Apart from the difficulties of learning a new language , following a book can be quite boring. From my point of view, one of the bests ways to become a good programmer is choosing small and funny experiments oriented to train specific techniques of programming. This is what I usually do in my blog Fronkonstin . In this tutorial, we will learn to combine C++ with R to create efficient loops. We will also learn interesting things about ggplot2
, a very important package for data visualization. The excuse to do all this is to create beautiful images using a mathematical concept called Strange Attractors , that we will discuss briefly also. Combining art and coding is often called generative art , so be ready because after following this tutorial you can discover a new use of R to fall for as I am. Here we go!
Iteration is an indispensable technique in every programming language. Mastering it will help you to produce robust, fast, concise and easy to understand code. Iterating is repeating the same task over and over and the first way to do that efficiently is writing a loop , which is a way to resume many lines of code in just a few of them. In R, you can do this using the for
function. For example, this loop calculates the mean of each column of a data frame called df
:
library(tidyverse) n <- 1000 # Rows of data frame # df with 4 columns and n rows sampled from N(0,1) df <- tibble( a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n) ) output <- vector("double", ncol(df)) # Empty output # For loop for (i in 1:ncol(df)) { output[[i]] <- mean(df[[i]]) } output
## [1] -0.0069680221 0.0247380568 0.0014428622 -0.0002675645
In the previous example, we repeated 4 times the task (which is the piece of code inside the curly brackets). Another way to do iterations in R, sometimes more efficient than a for
loop, is by using the apply family. This is how the previous loop can be rewritten using the apply
function:
output <- apply(df, 2, mean) output
These approaches have limitations. For example, if you want to use the result of an iteration as the input of the next one, you can not use apply . Another one is that sometimes R is not enough fast, which makes the for
function quite inefficient. When you face this kind of situations, a good option is combinnig C++ and R. To illustrate this, let’s briefly introduce a stunning mathematical creature : the strange attractors.
In mathematics, a dynamical system is a set of functions which describes the time dependence of a point in a geometrical space. An attractor is a set of numerical values towards a dynamical system tend to evolve in discrete time regardless of their starting conditions. Simple attractors can be fixed points , sets of points or limit cycles . More interesting attractors are strange , chaotic or itinerant attractors, which span an array of possible states in which a dynamical system can roam around without repeating itself.
One of the best-known examples of strange attractors was discovered by Kiyohiro Ikeda ^{1} and bears his name. It is defined by the next equations:
\[
x_{n+1}=1+u(x_{n}\cos t_{n}-y_{n}\sin t_{n}) \\
y_{n+1}=u(x_{n}\sin t_{n}+y_{n}\cos t_{n})
\]
with \(u\ge0.6\) and
\[
t_{n}=0.4-{\frac {6}{1+x_{n}^{2}+y_{n}^{2}}}
\]
In this example, our imaginary point moves in a 2D dimensional space, since we just need (x, y)
coordinates to describe its movement. In the previous system, time is represented by n
. As you can see, the Ikeda’s system only depends on one parameter, called u
. The location of the point at n
depends only on its location at n-1
. For example, this image ^{2} shows the trajectories of 2000 random points following this previous equation with \(u = 0.918\) :
All points, regardless of their starting place, tend to end following the same path, represented in bright white color. Another interesting property of strange attractors is that points that are very close at some instant, can differ significantly in the next step. This is why strange attractors are also known as chaotic attractors.
Today we will experiment with Clifford attractors, which are defined by these equations:
\[
x_{n+1} = sin(a \: y_{n}) + c \: cos(a \: x_{n}) \\
y_{n+1} = sin(b \: x_{n}) + d \: cos(b \: y_{n})
\]
These attractors were discovered by Clifford Pickover , an american scientific divulgator and depend on four parameters; a
, b
, c
and d
.
To represent the trajectory of a particular point, we need to store in a data frame its locations at every single (discrete) moment of time. For example, this is how we can do it using a for
loop:
# Parameters setting a <- -1.25 b <- -1.25 c <- -1.82 d <- -1.91 # Number of jumps n <- 5 # Initialization of df to store locations of point df <- data.frame(x = double(n+1), y = double(n+1)) # The point starts at (0,0) df[1, "x"] <- 0 df[1, "y"] <- 0 # For loop to store locations for (i in 1:n) { df[i+1, "x"] <- sin(a*df[i, "y"])+c*cos(a*df[i, "x"]) df[i+1, "y"] <- sin(b*df[i, "x"])+d*cos(b*df[i, "y"]) } df
## x y ## 1 0.0000000 0.0000000 ## 2 -1.8200000 -1.9100000 ## 3 1.8629450 2.1543131 ## 4 0.8172482 0.9946399 ## 5 -1.8969558 -1.4673201 ## 6 2.2716156 1.1936322
As you can see, our point started at (0,0)
and jumped 5 times (the number of jumps is represented by n
). You can try to change n
from 5
to 10000000
and see what happens (disclaimer: be patient).
A better approach to do it is using purrr
, a functional programming package which provides useful tools for iterating through lists and vectors, generalizing code and removing programming redundancies. The purrr
tools work in combination with functions, lists and vectors and results in code that is consistent and concise. This can be a purrr
approach to our problem:
library(purrrlyr) n <- 5 # Initialization of our data frame df <- tibble(x = numeric(n+1), y = numeric(n+1)) # Convert our data frame into a list by_row(df, function(v) list(v)[[1L]], .collate = "list")$.out -> df # This function computes current location depending of previous one f <- function(j, k, a, b, c, d) { tibble( x = sin(a*j$y)+c*cos(a*j$x), y = sin(b*j$x)+d*cos(b*j$y) ) } # We apply accumulate on our list to store all steps and convert into a data frame accumulate(df, f, a, b, c, d) %>% map_df(~ .x) -> df df
## # A tibble: 6 x 2 ## x y ## <dbl> <dbl> ## 1 0 0 ## 2 -1.82 -1.91 ## 3 1.86 2.15 ## 4 0.817 0.995 ## 5 -1.90 -1.47 ## 6 2.27 1.19 </dbl> </dbl>
Even this approach is not efficient since the accumulate
function must be applied on an existing object (the df
in our case)
The next paragraph, extracted from Hadley Wickham’s Advanced R , explains the motivations of combining C++ and R:
Rcpp
makes it very simple to connect C++ to R and provides a clean, approachable API that lets you write high-performance code, insulated from R’s complex C API. Typical bottlenecks that C++ can address include:
Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.
Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.
Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.
This code populates our data frame using C++:
# Load in libaries library(Rcpp) library(tidyverse) # Here comes the C++ code cppFunction('DataFrame createTrajectory(int n, double x0, double y0, double a, double b, double c, double d) { // create the columns NumericVector x(n); NumericVector y(n); x[0]=x0; y[0]=y0; for(int i = 1; i < n; ++i) { x[i] = sin(a*y[i-1])+c*cos(a*x[i-1]); y[i] = sin(b*x[i-1])+d*cos(b*y[i-1]); } // return a new data frame return DataFrame::create(_["x"]= x, _["y"]= y); } ') a <- -1.25 b <- -1.25 c <- -1.82 d <- -1.91 # Call the function starting at (0,0) to store 10M jumps df <- createTrajectory(10000000, 0, 0, a, b, c, d) head(df)
## x y ## 1 0.0000000 0.0000000 ## 2 -1.8200000 -1.9100000 ## 3 1.8629450 2.1543131 ## 4 0.8172482 0.9946399 ## 5 -1.8969558 -1.4673201 ## 6 2.2716156 1.1936322
Note that, apart from Rcpp
, we also loaded the tidyverse package. This is because we will use some of its features, as dplyr
or ggplot2
, later.
The function createTrajectory
is written in C++
and stores the itinerary of an initial point. You can try it to see that It’s extremely fast. In the previous example, I populated on the fly our data frame with 10 million points starting at (0,0)
in less than 5 seconds in a conventional laptop.
Once we have built our data frame, it’s time to represent it with ggplot
.
Making graphics is a necessary skill and for this ggplot
is an amazing tool, which by
the way, I encourage you to learn. Usually, a good visualization is the best way to show your results. R is an extraordinary tool for visualizing data and one of the most important libraries to do it is ggplot2
: it is powerful, flexible and is updated to include functionalities regularly. Although its syntax may seem a bit strange at first, here are four important concepts from my point of view to help you out with understanding it:
geom_point
to represent points, geom_line
to represent time series, geom_bar
to do bar chars … Have a look to ggplot2
package, there are many other types. x
column of our data frame df
and y
on the y-axis, so we did aes(x = x, y = y)
. +
operator. This operator is very important: it doesn’t only combine geometries, it’s also used to add features to the plot as we did to define its aspect ratio (with coord_equal
) or its appearance (with theme_void
). theme_void
. There are many of them and it’s is a quick and easy way to change the appearance of your plot: just a way of making life easier and simpler. The ggplot2
package is a new world by itself and takes part of the tidyverse
, a collection of efficient libraries designed to analyze data in a consistent way.
This is how we can represent the points of our data frame:
ggplot(df, aes(x = x, y = y)) + geom_point(shape = 46, alpha = .01) + coord_equal() + theme_void() -> plot ggsave("strange1.png", plot, height = 5, width = 5, units = 'in')
The syntax of our plot is quite simple. Just two comments about it:
Strange attractors are extremely sensitive to their parameters. This feature makes them very entertaining as well as an infinite source of surprises. This is what happens to Clifford attractor when setting a
, b
, c
and d
parameters to -1.2
, -1.9
, 1.8
and -1.6
respectively:
a <- -1.2 b <- -1.9 c <- 1.8 d <- -1.6 df <- createTrajectory(10000000, 0, 0, a, b, c, d) ggplot(df, aes(x = x, y = y)) + geom_point(shape = 46, alpha = .01) + coord_equal() + theme_void() -> plot ggsave("strange2.png", plot, height = 5, width = 5, units = 'in')
Don’t you feel like playing with parameters? From my point of view, this is a very classy way of spending your time.
Once we know what strange attractors look like , let’s try another family of them. This is the new formula we will experiment with:
\[
x_{n+1} = a_{1} + a_{2}x_{n} + a_{3}y_{n} + a_{4}|x_{n}|^{a_{5}} + a_{6}|y_{n}|^{a_{7}} \\
y_{n+1} = a_{8} + a_{9}x_{n} + a_{10}y_{n} + a_{11}|x_{n}|^{a_{12}} + a_{13}|y_{n}|^{a_{14}}
\]
These attractors are localized mostly to a small region of the XY plane with tentacles that stretch out to large distances. If any of the exponents are negative and the attractor intersects the line along which the respective variable is zero, a point on the line maps to infinity. However, large values are visited infrequently by the orbit, so many iterations are required to determine that the orbit is unbounded. For this reason most of the attractors in the figures have holes in their interiors where their orbits are precluded from coming too close to their origins.
cppFunction('DataFrame createTrajectory(int n, double x0, double y0, double a1, double a2, double a3, double a4, double a5, double a6, double a7, double a8, double a9, double a10, double a11, double a12, double a13, double a14) { // create the columns NumericVector x(n); NumericVector y(n); x[0]=x0; y[0]=y0; for(int i = 1; i < n; ++i) { x[i] = a1+a2*x[i-1]+ a3*y[i-1]+ a4*pow(fabs(x[i-1]), a5)+ a6*pow(fabs(y[i-1]), a7); y[i] = a8+a9*x[i-1]+ a10*y[i-1]+ a11*pow(fabs(x[i-1]), a12)+ a13*pow(fabs(y[i-1]), a14); } // return a new data frame return DataFrame::create(_["x"]= x, _["y"]= y); } ') a1 <- -0.8 a2 <- 0.4 a3 <- -1.1 a4 <- 0.5 a5 <- -0.6 a6 <- -0.1 a7 <- -0.5 a8 <- 0.8 a9 <- 1.0 a10 <- -0.3 a11 <- -0.6 a12 <- -0.3 a13 <- -1.2 a14 <- -0.3 df <- createTrajectory(10000000, 1, 1, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14) ggplot(df) + geom_point(aes(x = x, y = y), shape = 46, alpha = .01, color = "black") + coord_fixed() + theme_void() -> plot ggsave("strange3.png", plot, height = 5, width = 5, units = 'in')
Ops! What happened? Why our image is so small now? The reason is because some few points are located quite far from the vast majority (you can think about them as outliers ). Let’s remove 5% of extreme points from both tails independently:
mx <- quantile(df$x, probs = 0.05) Mx <- quantile(df$x, probs = 0.95) my <- quantile(df$y, probs = 0.05) My <- quantile(df$y, probs = 0.95) df %>% filter(x > mx, x < Mx, y > my, y < My) -> df ggplot(df) + geom_point(aes(x = x, y = y), shape = 46, alpha = .01, color = "black") + coord_fixed() + theme_void() -> plot ggsave("strange3.png", plot, height = 5, width = 5, units = 'in')
Now a stunning image appears.
Sometimes you will need to process the data frame after you generate it since:
To conclude, let’s change the appearance of our plot. We will use the theme
function to customize it setting the background color to black and adding a white line as frame.
opt <- theme(legend.position = "none", panel.background = element_rect(fill="black", color="white"), plot.background = element_rect(fill="black"), axis.ticks = element_blank(), panel.grid = element_blank(), axis.title = element_blank(), axis.text = element_blank()) plot <- ggplot(df) + geom_point(aes(x = x, y = y), shape = 46, alpha = .01, color = "white") + coord_fixed() + opt ggsave("strange4.png", plot, height = 5, width = 5, units = 'in')
More about recursivity and ggplot
:
ggplot
and the second one introduces Rcpp
to use C++ when R is not fast enough. More about strange attractors:
我来评几句
登录后评论已发表评论数()