Why does R allow us to exceed array bounds here?
A colleague recently sent me the following code snippet in R:
> a=c(1,2,3,40)
> b=a[1:10]
> b
[1] 1 2 3 40 NA NA NA NA NA NA
The fact that R didn’t issue a warning upset him since exceeding array bounds, as we did when we created b, is usually a programming error.
I’m less concerned and simply file the above away in an area of my memory entitled ‘Odd things to remember about R’ — I find that most programming languages have things that look odd when you encounter them for the first time. With that said, I am curious as to why the designers of R thought that the above behaviour was a good idea.
Does anyone have any insights here?
Perhaps something to do with ‘check.bounds’ default behavior? http://stat.ethz.ch/R-manual/R-patched/library/base/html/options.html
Not sure about the design decisions behind it. The ease of dropping NAs? The assumption that the statistician knows the dimensions of the vector but not yet the exact values?
It is actually not a bad idea. If a statistical value does not exist it is NA. If you can filter out NAs later, that is okay.
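To make the "filter out NAs later" point concrete, here is a minimal sketch using the vector from the post. The names b_clean and m are only illustrative.

```r
a <- c(1, 2, 3, 40)
b <- a[1:10]                 # positions 5 to 10 do not exist in a and come back as NA

# Drop the NA padding explicitly ...
b_clean <- b[!is.na(b)]

# ... or ask a summary function to skip the NAs
m <- mean(b, na.rm = TRUE)
```

Either way the padding is harmless as long as you remember it is there. Calling mean(b) without na.rm = TRUE would return NA.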
The other options are an error message or omitting out-of-bounds indices. R's check.bounds option comes close to the former, although it issues a warning rather than an error, and only when an assignment extends a vector. Omitting the indices was the default in early versions of EMT, by the way, but it turned out to be confusing.
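For anyone who wants to try check.bounds, here is a rough sketch. As far as I understand the documentation for options(), the setting only produces a warning when a vector is extended by assignment, so it should not fire for a read such as a[1:10]. The names read_warned and write_warned are made up for this example.

```r
old <- options(check.bounds = TRUE)   # remember the previous setting

a <- c(1, 2, 3, 40)

# Reading past the end of the vector: does this raise a warning?
read_warned <- tryCatch({ a[1:10]; FALSE }, warning = function(w) TRUE)

# Assigning past the end of the vector: does this raise a warning?
write_warned <- tryCatch({ a[6] <- 5; FALSE }, warning = function(w) TRUE)

options(old)                          # restore the previous setting
```

If my reading of the documentation is right, read_warned stays FALSE and write_warned becomes TRUE, so the option would not have caught the example in the post.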
You want to use [[ if you want it to error out when out of bounds.
a=c(1,2,3,40)
a[[6]]
Error in a[[6]] : subscript out of bounds
Hadley has a table that shows the difference between [ and [[ indexing.
http://adv-r.had.co.nz/Subsetting.html (search for ‘Missing/out of bounds indices’)
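The out-of-bounds rows of that table can be checked with a short sketch like the one below. The helper errors() is a hypothetical name introduced here, not something from base R or from the book.

```r
a <- c(1, 2, 3, 40)          # atomic vector
l <- list(1, 2, 3, 40)       # list of the same length

# Hypothetical helper: TRUE if evaluating expr raises an error, FALSE otherwise
errors <- function(expr) tryCatch({ expr; FALSE }, error = function(e) TRUE)

atomic_single <- a[6]            # [ on an atomic vector pads with NA
atomic_double <- errors(a[[6]])  # [[ on an atomic vector raises an error
list_single   <- l[6]            # [ on a list gives a list holding NULL
list_double   <- errors(l[[6]])  # [[ on a list raises an error
```

So [[ is the stricter of the two for both atomic vectors and lists, which makes it the safer choice when you expect exactly one existing element.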
Thanks for all the feedback.
I recommend a book by Patrick Burns called “The R Inferno” (http://www.burns-stat.com/pages/Tutor/R_inferno.pdf), which describes all these little confusing things about R in quite a humorous and ironic way.