--- title: "Benchmarks" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Benchmarks} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, eval = FALSE, include = FALSE} # to run the timings in this readme, set the following env variable: Sys.setenv(ODBC_EVAL_BENCHMARKS = "true") ``` ```{r, include = FALSE} eval_timings <- as.logical(Sys.getenv("ODBC_EVAL_BENCHMARKS", "false")) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = eval_timings ) ``` The odbc package is often much faster than the existing [RODBC](https://cran.r-project.org/package=RODBC) and DBI compatible [RODBCDBI](https://cran.r-project.org/package=RODBCDBI) packages. We'll benchmark writing and reading data from the [nycflights13](https://github.com/tidyverse/nycflights13) package using the three packages. ```{r, include = FALSE} # The CRAN version of RODBC uses iODBC, so to use unixODBC we need to # recompile it from source, specifying the odbc manager explicitly: # # install.packages("RODBC", type = "source", INSTALL_opts="--configure-args='--with-odbc-manager=odbc'") # # see `vignette("develop")` for more details. ``` ```{r setup} library(odbc) library(RODBC) library(RODBCDBI) library(DBI) library(nycflights13) ``` ## Setup RODBC and RODBCDBI trip up on `flights`' tibble subclass, so we'll convert it to a data frame before passing to each package. Also, the RODBCDBI package does not support writing timestamps, so we remove that column as well. ```{r} flights$time_hour <- NULL flights <- as.data.frame(flights) ``` Now, let's configure three driver connections to a Microsoft SQL Server database, one for each package we'll be benchmarking. ```{r, echo = FALSE} options("connectionObserver" = NULL) ``` ```{r} odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", uid = "SA", pwd = "BoopBop123!") rodbc <- RODBC::odbcConnect(dsn = "MicrosoftSQLServer", uid = "SA", pwd = "BoopBop123!") rodbcdbi <- dbConnect(RODBCDBI::ODBC(), dsn = "MicrosoftSQLServer", user = "SA", password = "BoopBop123!") ``` The above code requires a `dsn` `MicrosoftSQLServer` to have been previously configured; see `vignettes("setup")` to learn more. ## Writing We'll first benchmark writing the flights dataset, which contains ~300,000 rows and 18 columns, to a database. The code for odbc and RODBCDBI looks quite similar, as they both use the DBI front-end: ```{r, include = FALSE} if ("flights" %in% dbListTables(odbc)) { dbExecute(odbc, "drop table flights") } ``` ```{r} odbc_write <- system.time(dbWriteTable(odbc, "flights", flights)) odbc_write ``` ```{r, include = !eval_timings} #> user system elapsed #> 0.883 0.176 8.108 ``` ```{r, include = FALSE} if ("flights" %in% dbListTables(odbc)) { dbExecute(odbc, "drop table flights") } ``` ```{r} rodbcdbi_write <- system.time(dbWriteTable(rodbcdbi, "flights", flights)) rodbcdbi_write ``` ```{r, include = !eval_timings} #> user system elapsed #> 8.287 11.107 257.841 ``` ```{r, include = FALSE} if ("flights" %in% dbListTables(odbc)) { dbExecute(odbc, "drop table flights") } ``` The timings for RODBC and RODBCDBI are quite similar, as they both utilize the RODBC back-end: ```{r} rodbc_write <- system.time(sqlSave(rodbc, flights, "flights")) rodbc_write ``` ```{r, include = !eval_timings} #> user system elapsed #> 8.266 11.023 235.825 ``` ```{r, include = FALSE} if ("flights" %in% dbListTables(odbc)) { dbExecute(odbc, "drop table flights") } ``` ## Reading ```{r, include = FALSE} if (!"flights" %in% dbListTables(odbc)) { dbWriteTable(odbc, "flights", flights) } ``` Again, the syntax for odbc and RODBCDBI is identical, while the timings for RODBCDBI and RODBC are quite similar. ```{r} odbc_read <- system.time(result <- dbReadTable(odbc, "flights")) rodbcdbi_read <- system.time(result <- dbReadTable(rodbcdbi, "flights")) rodbc_read <- system.time(result <- sqlFetch(rodbc, "flights")) ``` ```{r} odbc_read ``` ```{r, include = !eval_timings} #> user system elapsed #> 0.515 0.024 0.557 ``` ```{r} rodbcdbi_read ``` ```{r, include = !eval_timings} #> user system elapsed #> 1.308 0.035 1.356 ``` ```{r} rodbc_read ``` ```{r, include = !eval_timings} #> user system elapsed #> 1.291 0.033 1.343 ```