
How to Read a Data Set in RMySQL

DBI Package

Establish a connection

The first step in importing data from a SQL database is creating a connection to it. You need different packages depending on the database you want to connect to. All of these packages do this in a uniform way, as specified in the DBI package.

dbConnect() creates a connection between your R session and a SQL database. The first argument has to be a DBIDriver object, which specifies how connections are made and how data is mapped between R and the database. Specifically for MySQL databases, you can build such a driver with RMySQL::MySQL().

If the MySQL database is a remote database hosted on a server, you'll also have to specify the following arguments in dbConnect(): dbname, host, port, user and password. Most of these details have already been provided.

            library(DBI)
            Warning message: package 'RMySQL' was built under R version 3.6.3
            # Edit dbConnect() call
            con <- dbConnect(RMySQL::MySQL(),
                             dbname = "tweater",
                             host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com",
                             port = 3306,
                             user = "student",
                             password = "datacamp")

List the database tables

After you've successfully connected to a remote MySQL database, the next step is to see what tables the database contains. You can do this with the dbListTables() function.

            dbListTables(con)          
            [1] "comments" "tweats"   "users"                      
            # Build a vector of table names: tables
            tables <- dbListTables(con)

            # Display structure of tables
            str(tables)
             chr [1:3] "comments" "tweats" "users"

Import data

You do this with the dbReadTable() function. Simply pass it the connection object (con), followed by the name of the table you want to import. The resulting object is a standard R data frame.

            # Import the users table from tweater: users
            users <- dbReadTable(con, "users")

            # Print users
            users

Import all tables

Next to the users, we're also interested in the tweats and comments tables. However, separate dbReadTable() calls for each and every one of the tables in your database would mean a lot of code duplication. Remember the lapply() function? You can use it again here! A connection is already coded for you, as well as a vector table_names, containing the names of all the tables in the database.

            # Get table names
            table_names <- dbListTables(con)

            # Import all tables
            tables <- lapply(table_names, dbReadTable, conn = con)

            # Print out tables
            tables
            [[1]]
            [[2]]
            [[3]]
            NA          
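
As a hedged sketch of the lapply() pattern above, the snippet below also names each imported data frame after its table, so that a table can be picked out by name instead of by position. It assumes the RSQLite package is installed and uses an in-memory SQLite database with two made-up tables as a stand-in for the remote MySQL server; the table contents are purely illustrative.

```r
library(DBI)

# Stand-in for the remote MySQL database (assumption): an in-memory
# SQLite database holding two small, made-up tables.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "users", data.frame(id = 1:2, name = c("elisabeth", "mike")))
dbWriteTable(con, "tweats", data.frame(id = 1:3, user_id = c(1, 1, 2)))

# Get table names
table_names <- dbListTables(con)

# Import all tables, then label each list element with its table name
tables <- lapply(table_names, dbReadTable, conn = con)
names(tables) <- table_names

# Individual tables can now be accessed by name
str(tables$users)

dbDisconnect(con)
```

With named elements, code further down does not depend on the order in which dbListTables() happens to return the table names.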

SQL Queries from R

dbGetQuery() is what you need. As usual, you first pass the connection object to it. The second argument is an SQL query in the form of a character string.

            # Import tweat_id column of comments where user_id is 1: elisabeth
            elisabeth <- dbGetQuery(con, "SELECT tweat_id FROM comments WHERE user_id = 1")

            # Print elisabeth
            elisabeth
            # Import post column of tweats where date is higher than '2015-09-21': latest
            latest <- dbGetQuery(con, "SELECT post FROM tweats WHERE date > '2015-09-21'")

            # Print latest
            latest
            # Create data frame specific
            specific <- dbGetQuery(con, "SELECT message FROM comments WHERE tweat_id = 77 AND user_id > 4")

            # Print specific
            specific

There are also dedicated SQL functions that you can use in the WHERE clause of an SQL query. For example, CHAR_LENGTH() returns the number of characters in a string.

            # Create data frame short
            short <- dbGetQuery(con, "SELECT id, name FROM users WHERE CHAR_LENGTH(name) < 5")

            # Print short
            short

Another very often used keyword is JOIN, and more specifically INNER JOIN.

            dbGetQuery(con, "SELECT name, post
                             FROM users INNER JOIN tweats ON users.id = user_id
                             WHERE date > '2015-09-19'")
            dbGetQuery(con, "SELECT post, message
                             FROM tweats INNER JOIN comments ON tweats.id = tweat_id
                             WHERE tweat_id = 77")

DBI internals

Send - Fetch - Clear

You've used dbGetQuery() multiple times now. This is a virtual function from the DBI package, but is actually implemented by the RMySQL package. Behind the scenes, the following steps are performed:

- Sending the specified query with dbSendQuery();
- Fetching the result of executing the query on the database with dbFetch();
- Clearing the result with dbClearResult().

Let's not use dbGetQuery() this time and implement the steps above. This is tedious to write, but it gives you the ability to fetch the query's result in chunks rather than all at once. You can do this by specifying the n argument inside dbFetch().

              # Send query to the database
              res <- dbSendQuery(con, "SELECT * FROM comments WHERE user_id > 4")

              # Use dbFetch() twice
              dbFetch(res, n = 2)
              dbFetch(res, n = 2)
              # Clear res
              dbClearResult(res)
              [1] TRUE
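
The send, fetch, clear steps can also be wrapped in a loop that keeps fetching until the result set is exhausted. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database (via the RSQLite package, which must be installed) with a made-up comments table in place of the remote MySQL server, and relies on DBI's dbHasCompleted() to detect the end of the result.

```r
library(DBI)

# Stand-in database (assumption): in-memory SQLite with a made-up comments table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "comments",
             data.frame(id = 1:5, user_id = c(5, 6, 7, 5, 6)))

# Send the query, but do not fetch anything yet
res <- dbSendQuery(con, "SELECT * FROM comments WHERE user_id > 4")

# Fetch at most two rows at a time until the result set is exhausted
chunk_sizes <- integer(0)
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 2)
  chunk_sizes <- c(chunk_sizes, nrow(chunk))
}

# Clear the result and close the connection
dbClearResult(res)
dbDisconnect(con)

# Number of rows received in each chunk
chunk_sizes
```

Processing each chunk inside the loop, rather than collecting everything first, keeps memory use bounded when the query result is large.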

RMySQL automatically specifies a maximum of open connections and closes some of the connections for you. Still, it's always polite to manually disconnect from the database afterwards. You do this with the dbDisconnect() function.

              # Create the data frame long_tweats
              long_tweats <- dbGetQuery(con, "SELECT post, date
                                              FROM tweats
                                              WHERE CHAR_LENGTH(post) > 40")

              # Print long_tweats
              print(long_tweats)
              # Disconnect from the database
              dbDisconnect(con)
              [1] TRUE
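
One way to make sure the disconnect always happens, even when a query fails, is to register it with on.exit() inside a function. This is only a sketch: query_once() is a hypothetical helper name, and the in-memory SQLite database (RSQLite package assumed to be installed) with its made-up tweats table stands in for the real MySQL server.

```r
library(DBI)

# Hypothetical helper: open a connection, run one query, always disconnect
query_once <- function(sql) {
  # Stand-in connection (assumption): in-memory SQLite instead of MySQL
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  # Runs when the function exits, whether normally or through an error
  on.exit(dbDisconnect(con), add = TRUE)

  # Made-up data so that the query has something to return
  dbWriteTable(con, "tweats",
               data.frame(id = 1:3, post = c("first", "second", "third")))
  dbGetQuery(con, sql)
}

recent <- query_once("SELECT post FROM tweats WHERE id > 1")
recent
```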

Information on the web

The utils functions read.csv() and read.delim() are capable of automatically importing from URLs that point to flat files on the web.

You must be wondering whether Hadley Wickham's alternative package, readr, is equally powerful.

            # Load the readr package
            library(readr)

            # Import the csv file: pools
            url_csv <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"
            pools <- read_csv(url_csv)
            Parsed with column specification:
            cols(
              Name = col_character(),
              Address = col_character(),
              Latitude = col_double(),
              Longitude = col_double()
            )
            # Import the txt file: potatoes
            url_delim <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/potatoes.txt"
            potatoes <- read_tsv(url_delim)
            Parsed with column specification:
            cols(
              area = col_double(),
              temp = col_double(),
              size = col_double(),
              storage = col_double(),
              method = col_double(),
              texture = col_double(),
              flavor = col_double(),
              moistness = col_double()
            )
            # Print pools and potatoes
            pools
            potatoes

There is a safer alternative to HTTP, namely HTTPS, which stands for HyperText Transfer Protocol Secure. Just remember this: HTTPS is relatively safe, HTTP is not.

            # https URL to the swimming_pools csv file.
            url_csv <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"

            # Import the file using read.csv(): pools1
            pools1 <- read.csv(url_csv)

            # Load the readr package
            library(readr)

            # Import the file using read_csv(): pools2
            pools2 <- read_csv(url_csv)
            Parsed with column specification:
            cols(
              Name = col_character(),
              Address = col_character(),
              Latitude = col_double(),
              Longitude = col_double()
            )
            # Print the structure of pools1 and pools2
            str(pools1)
            'data.frame':   20 obs. of  4 variables:
             $ Name     : Factor w/ 20 levels "Acacia Ridge Leisure Centre",..: 1 2 3 4 5 6 19 7 8 9 ...
             $ Address  : Factor w/ 20 levels "1 Fairlead Crescent, Manly",..: 5 20 18 10 9 11 6 15 12 17 ...
             $ Latitude : num  -27.6 -27.6 -27.6 -27.5 -27.4 ...
             $ Longitude: num  153 153 153 153 153 ...
            str(pools2)
            Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    20 obs. of  4 variables:
             $ Name     : chr  "Acacia Ridge Leisure Centre" "Bellbowrie Pool" "Carole Park" "Centenary Pool (inner City)" ...
             $ Address  : chr  "1391 Beaudesert Road, Acacia Ridge" "Sugarwood Street, Bellbowrie" "Cnr Boundary Road and Waterford Road Wacol" "400 Gregory Terrace, Spring Hill" ...
             $ Latitude : num  -27.6 -27.6 -27.6 -27.5 -27.4 ...
             $ Longitude: num  153 153 153 153 153 ...
             - attr(*, "spec")=
              .. cols(
              ..   Name = col_character(),
              ..   Address = col_character(),
              ..   Latitude = col_double(),
              ..   Longitude = col_double()
              .. )

Import Excel files from the web

When you learned about gdata, it was already mentioned that gdata can handle .xls files that are on the internet. readxl can't, at least not yet. The URL with which you'll be working is already available in the sample code. You will import it once using gdata and once with the readxl package via a workaround.

            # Load the readxl and gdata package
            library(readxl)
            library(gdata)
            gdata: Unable to locate valid perl interpreter
            gdata:
            gdata: read.xls() will be unable to read Excel XLS and XLSX
            gdata: files unless the 'perl=' argument is used to specify
            gdata: the location of a valid perl intrpreter.
            gdata:
            gdata: (To avoid display of this message in the future, please
            gdata: ensure perl is installed and available on the
            gdata: executable search path.)
            gdata: Unable to load perl libaries needed by read.xls()
            gdata: to support 'XLX' (Excel 97-2004) files.

            gdata: Unable to load perl libaries needed by read.xls()
            gdata: to support 'XLSX' (Excel 2007+) files.

            gdata: Run the function 'installXLSXsupport()'
            gdata: to automatically download and install the perl
            gdata: libaries needed to support Excel XLS and XLSX formats.

            Attaching package: 'gdata'

            The following object is masked from 'package:stats':

                nobs

            The following object is masked from 'package:utils':

                object.size

            The following object is masked from 'package:base':

                startsWith
            # Specification of url: url_xls
            url_xls <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls"

            # Import the .xls file with gdata: excel_gdata
            excel_gdata = read.xls(url_xls, perl="c:/Program Files/Git/usr/bin/perl.exe")
            trying URL 'http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls'
            Content type '' length 37888 bytes (37 KB)
            downloaded 37 KB
            # Download file behind URL, name it local_latitude.xls
            download.file(url_xls, "local_latitude.xls", mode="wb")
            trying URL 'http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls'
            Content type '' length 37888 bytes (37 KB)
            downloaded 37 KB
            # Import the local .xls file with readxl: excel_readxl
            excel_readxl = read_excel("local_latitude.xls")

There's more: with download.file() you can download any kind of file from the web, using HTTP and HTTPS: images, executable files, but also .RData files. An RData file is a very efficient format to store R data.

You can load data from an RData file using the load() function, but this function does not accept a URL string as an argument.

            # https URL to the wine RData file.
            url_rdata <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/wine.RData"

            # Download the wine file to your working directory
            download.file(url_rdata, "wine_local.RData")
            trying URL 'https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/wine.RData'
            Content type '' length 4781 bytes
            downloaded 4781 bytes
            # Load the wine data into your workspace using load()
            load("wine_local.RData")

            # Print out the summary of the wine data
            summary(wine)
                Alcohol        Malic acid        Ash        Alcalinity of ash
             Min.   :11.03   Min.   :0.74   Min.   :1.360   Min.   :10.60
             1st Qu.:12.36   1st Qu.:1.60   1st Qu.:2.210   1st Qu.:17.20
             Median :13.05   Median :1.87   Median :2.360   Median :19.50
             Mean   :12.99   Mean   :2.34   Mean   :2.366   Mean   :19.52
             3rd Qu.:13.67   3rd Qu.:3.10   3rd Qu.:2.560   3rd Qu.:21.50
             Max.   :14.83   Max.   :5.80   Max.   :3.230   Max.   :30.00
               Magnesium      Total phenols     Flavanoids
             Min.   : 70.00   Min.   :0.980   Min.   :0.340
             1st Qu.: 88.00   1st Qu.:1.740   1st Qu.:1.200
             Median : 98.00   Median :2.350   Median :2.130
             Mean   : 99.59   Mean   :2.292   Mean   :2.023
             3rd Qu.:107.00   3rd Qu.:2.800   3rd Qu.:2.860
             Max.   :162.00   Max.   :3.880   Max.   :5.080
             Nonflavanoid phenols Proanthocyanins Color intensity
             Min.   :0.1300       Min.   :0.410   Min.   : 1.280
             1st Qu.:0.2700       1st Qu.:1.250   1st Qu.: 3.210
             Median :0.3400       Median :1.550   Median : 4.680
             Mean   :0.3623       Mean   :1.587   Mean   : 5.055
             3rd Qu.:0.4400       3rd Qu.:1.950   3rd Qu.: 6.200
             Max.   :0.6600       Max.   :3.580   Max.   :13.000
                  Hue           Proline
             Min.   :1.270   Min.   : 278.0
             1st Qu.:1.930   1st Qu.: 500.0
             Median :2.780   Median : 672.0
             Mean   :2.604   Mean   : 745.1
             3rd Qu.:3.170   3rd Qu.: 985.0
             Max.   :4.000   Max.   :1680.0
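
Because an RData file can hold several objects under names you did not choose, it can help to load it into a separate environment and inspect what arrives. The sketch below uses only base R; wine_demo and the temporary file are made-up stand-ins for the downloaded wine_local.RData.

```r
# Stand-in for a downloaded file (assumption): save a small, made-up
# data frame to a temporary .RData file.
wine_demo <- data.frame(Alcohol = c(12.4, 13.1), Hue = c(1.0, 1.2))
rdata_path <- tempfile(fileext = ".RData")
save(wine_demo, file = rdata_path)
rm(wine_demo)

# Load into a fresh environment instead of the global workspace;
# load() returns the names of the objects it restored.
loaded_env <- new.env()
loaded_names <- load(rdata_path, envir = loaded_env)

loaded_names
str(loaded_env$wine_demo)
```

Loading into a dedicated environment avoids silently overwriting an object of the same name in your workspace.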

HTTR

Downloading a file from the Internet means sending a GET request and receiving the file you asked for. Internally, all the previously discussed functions use a GET request to download files.

httr provides a convenient function, GET(), to execute this GET request. The result is a response object, which provides easy access to the status code, content-type and, of course, the actual content.

You can extract the content from the request using the content() function. There are three ways to retrieve this content: as a raw object, as a character vector, or as an R object, such as a list. If you don't tell content() how to retrieve the content through the as argument, it'll try its best to figure out which type is most appropriate based on the content-type.

            # Load the httr package
            library(httr)

            # Get the url, save response to resp
            url <- "http://www.example.com/"
            resp <- GET(url)

            # Print resp
            resp
            Response [http://www.example.com/]
              Date: 2020-04-05 13:58
              Status: 200
              Content-Type: text/html; charset=UTF-8
              Size: 1.26 kB
            <!doctype html>
            <html>
            <head>
                <title>Example Domain</title>

                <meta charset="utf-8" />
                <meta http-equiv="Content-type" content="text/html; charset=utf...
                <meta name="viewport" content="width=device-width, initial-scal...
                <style type="text/css">
                body {
            ...
            # Get the raw content of resp: raw_content
            raw_content <- content(resp, as = "raw")

            # Print the head of raw_content
            head(raw_content)
            [1] 3c 21 64 6f 63 74
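
To see how the raw and text forms of the content relate, the six bytes printed above can be decoded by hand. This is a small base R sketch; first_bytes simply re-types those six values rather than calling the web server again.

```r
# The first six bytes of the response body, as printed by head(raw_content)
first_bytes <- as.raw(c(0x3c, 0x21, 0x64, 0x6f, 0x63, 0x74))

# Decode the bytes into a character string
first_chars <- rawToChar(first_bytes)
first_chars
# [1] "<!doct"
```

The bytes are the start of the "<!doctype html>" line shown in the printed response, which is what retrieving the content as text would give you directly.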

Web content does not limit itself to HTML pages and files stored on remote servers such as DataCamp's Amazon S3 instances. There are many other data formats out there. A very common one is JSON. This format is very often used by so-called Web APIs, interfaces to web servers with which you as a client can communicate to get or store information in more complicated ways.

            # Get the url
            url <- "http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json"
            resp <- GET(url)

            # Print resp
            resp
            Response [http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json]
              Date: 2020-04-05 13:58
              Status: 200
              Content-Type: application/json; charset=utf-8
              Size: 929 B
            # Print content of resp as text
            content(resp, as = "text")
            [1] "{\"Title\":\"Annie Hall\",\"Year\":\"1977\",\"Rated\":\"PG\",\"Released\":\"20 Apr 1977\",\"Runtime\":\"93 min\",\"Genre\":\"Comedy, Romance\",\"Director\":\"Woody Allen\",\"Writer\":\"Woody Allen, Marshall Brickman\",\"Actors\":\"Woody Allen, Diane Keaton, Tony Roberts, Carol Kane\",\"Plot\":\"Neurotic New York comedian Alvy Singer falls in love with the ditzy Annie Hall.\",\"Language\":\"English, German\",\"Country\":\"USA\",\"Awards\":\"Won 4 Oscars. Another 26 wins & 8 nominations.\",\"Poster\":\"https://m.media-amazon.com/images/M/MV5BZDg1OGQ4YzgtM2Y2NS00NjA3LWFjYTctMDRlMDI3NWE1OTUyXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg\",\"Ratings\":[{\"Source\":\"Internet Movie Database\",\"Value\":\"8.0/10\"},{\"Source\":\"Rotten Tomatoes\",\"Value\":\"97%\"},{\"Source\":\"Metacritic\",\"Value\":\"92/100\"}],\"Metascore\":\"92\",\"imdbRating\":\"8.0\",\"imdbVotes\":\"243,072\",\"imdbID\":\"tt0075686\",\"Type\":\"movie\",\"DVD\":\"28 Apr 1998\",\"BoxOffice\":\"N/A\",\"Production\":\"United Artists\",\"Website\":\"N/A\",\"Response\":\"True\"}"
            # Print content of resp
            content(resp)
            $Title
            [1] "Annie Hall"

            $Year
            [1] "1977"

            $Rated
            [1] "PG"

            $Released
            [1] "20 Apr 1977"

            $Runtime
            [1] "93 min"

            $Genre
            [1] "Comedy, Romance"

            $Director
            [1] "Woody Allen"

            $Writer
            [1] "Woody Allen, Marshall Brickman"

            $Actors
            [1] "Woody Allen, Diane Keaton, Tony Roberts, Carol Kane"

            $Plot
            [1] "Neurotic New York comedian Alvy Singer falls in love with the ditzy Annie Hall."

            $Language
            [1] "English, German"

            $Country
            [1] "USA"

            $Awards
            [1] "Won 4 Oscars. Another 26 wins & 8 nominations."

            $Poster
            [1] "https://m.media-amazon.com/images/M/MV5BZDg1OGQ4YzgtM2Y2NS00NjA3LWFjYTctMDRlMDI3NWE1OTUyXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg"

            $Ratings
            $Ratings[[1]]
            $Ratings[[1]]$Source
            [1] "Internet Movie Database"

            $Ratings[[1]]$Value
            [1] "8.0/10"


            $Ratings[[2]]
            $Ratings[[2]]$Source
            [1] "Rotten Tomatoes"

            $Ratings[[2]]$Value
            [1] "97%"


            $Ratings[[3]]
            $Ratings[[3]]$Source
            [1] "Metacritic"

            $Ratings[[3]]$Value
            [1] "92/100"



            $Metascore
            [1] "92"

            $imdbRating
            [1] "8.0"

            $imdbVotes
            [1] "243,072"

            $imdbID
            [1] "tt0075686"

            $Type
            [1] "movie"

            $DVD
            [1] "28 Apr 1998"

            $BoxOffice
            [1] "N/A"

            $Production
            [1] "United Artists"

            $Website
            [1] "N/A"

            $Response
            [1] "True"

JSON

In the simplest setting, fromJSON() can convert character strings that represent JSON data into a nicely structured R list.

            library(jsonlite)
            # wine_json is a JSON
            wine_json <- '{"name":"Chateau Migraine", "year":1997, "alcohol_pct":12.4, "color":"red", "awarded":false}'

            # Convert wine_json into a list: wine
            wine <- fromJSON(wine_json)

            # Print structure of wine
            str(wine)
            List of 5
             $ name       : chr "Chateau Migraine"
             $ year       : int 1997
             $ alcohol_pct: num 12.4
             $ color      : chr "red"
             $ awarded    : logi FALSE

Quandl API

fromJSON() also works if you pass a URL as a character string or the path to a local file that contains JSON data. Let's try this out on the Quandl API, where you can fetch all sorts of financial and economic data.

              # Definition of quandl_url
              quandl_url <- "https://www.quandl.com/api/v3/datasets/WIKI/FB/data.json?auth_token=i83asDsiWUUyfoypkgMz"

              # Import Quandl data:
              quandl_data <- fromJSON(quandl_url)

              # Print structure of quandl_data
              str(quandl_data)
              List of 1
               $ dataset_data:List of 10
                ..$ limit       : NULL
                ..$ transform   : NULL
                ..$ column_index: NULL
                ..$ column_names: chr [1:13] "Date" "Open" "High" "Low" ...
                ..$ start_date  : chr "2012-05-18"
                ..$ end_date    : chr "2018-03-27"
                ..$ frequency   : chr "daily"
                ..$ data        : chr [1:1472, 1:13] "2018-03-27" "2018-03-26" "2018-03-23" "2018-03-22" ...
                ..$ collapse    : NULL
                ..$ order       : NULL

OMDb API

Compare the release year of two movies in the Open Movie Database.

              # Definition of the URLs
              url_sw4 <- "http://www.omdbapi.com/?apikey=72bc447a&i=tt0076759&r=json"
              url_sw3 <- "http://www.omdbapi.com/?apikey=72bc447a&i=tt0121766&r=json"

              # Import two URLs with fromJSON(): sw4 and sw3
              sw4 <- fromJSON(url_sw4)
              sw3 <- fromJSON(url_sw3)

              # Print out the Title element of both lists
              sw4$Title
              [1] "Star Wars: Episode IV - A New Hope"
              sw3$Title
              [1] "Star Wars: Episode 3 - Revenge of the Sith"
              # Is the release year of sw4 later than sw3?
              Year <- sw4$Year > sw3$Year
              Year
              [1] FALSE
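
One caveat worth sketching: as the Annie Hall output earlier shows, OMDb returns Year as a character string, so the comparison above compares strings rather than numbers. It gives the right answer for two four-digit years, but converting first is safer. The values below are typed in by hand (assumed to match what the API returns for these two films), so no web request is needed.

```r
# Release years as character strings, the form in which OMDb returns them
# (values typed in by hand for illustration)
year_sw4 <- "1977"
year_sw3 <- "2005"

# Convert to numbers before comparing
later_numeric <- as.numeric(year_sw4) > as.numeric(year_sw3)
later_numeric
# [1] FALSE

# Why conversion matters: strings are compared character by character,
# so "999" sorts after "1000"
string_pitfall <- "999" > "1000"
string_pitfall
# [1] TRUE
```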

JSON: more functions

JSON is built on two structures: objects and arrays.

            # Challenge 1
            json1 <- '[1, 2, 3, 4, 5, 6]'
            fromJSON(json1)
            [1] 1 2 3 4 5 6
            # Challenge 2
            json2 <- '{"a": [1, 2, 3], "b": [4, 5, 6]}'
            fromJSON(json2)
            $a
            [1] 1 2 3

            $b
            [1] 4 5 6
            # Challenge 1
            json1 <- '[[1, 2], [3, 4]]'
            fromJSON(json1)
                 [,1] [,2]
            [1,]    1    2
            [2,]    3    4
            # Challenge 2
            json2 <- '[{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]'
            fromJSON(json2)
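
The last challenge has no printed result above, so here is a hedged sketch of what to expect: by default jsonlite simplifies a JSON array of objects with matching keys into a data frame, and simplifyVector = FALSE keeps the nested list structure instead. The variable names are illustrative, and the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# A JSON array of objects that all share the same keys
json_records <- '[{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]'

# Default behaviour: the array of objects is simplified to a data frame
records <- fromJSON(json_records)
class(records)
records$a

# Turning simplification off keeps one list element per JSON object
records_list <- fromJSON(json_records, simplifyVector = FALSE)
length(records_list)
```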

toJSON()

Apart from converting JSON to R with fromJSON(), you can also use toJSON() to convert R data to a JSON format. In its most basic use, you simply pass this function an R object to convert to a JSON. The result is an R object of the class json, which is basically a character string representing that JSON.

              # URL pointing to the .csv file
              url_csv <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/water.csv"

              # Import the .csv file located at url_csv
              water <- read.csv(url_csv, stringsAsFactors = FALSE)

              # Convert the data file according to the requirements
              water_json <- toJSON(water)

              # Print out water_json
              water_json
              [{"water":"Algeria","X1992":0.064,"X2002":0.017},{"water":"American Samoa"},{"water":"Angola","X1992":0.0001,"X2002":0.0001},{"water":"Antigua and Barbuda","X1992":0.0033},{"water":"Argentina","X1992":0.0007,"X1997":0.0007,"X2002":0.0007},{"water":"Australia","X1992":0.0298,"X2002":0.0298},{"water":"Austria","X1992":0.0022,"X2002":0.0022},{"water":"Bahamas","X1992":0.0013,"X2002":0.0074},{"water":"Bahrain","X1992":0.0441,"X2002":0.0441,"X2007":0.1024},{"water":"Barbados","X2007":0.0146},{"water":"British Virgin Islands","X2007":0.0042},{"water":"Canada","X1992":0.0027,"X2002":0.0027},{"water":"Cape Verde","X1992":0.002,"X1997":0.0017},{"water":"Cayman Islands","X1992":0.0033},{"water":"Central African Rep."},{"water":"Chile","X1992":0.0048,"X2002":0.0048},{"water":"Colombia","X1992":0.0027,"X2002":0.0027},{"water":"Cuba","X1992":0.0069,"X1997":0.0069,"X2002":0.0069},{"water":"Cyprus","X1992":0.003,"X1997":0.003,"X2002":0.0335},{"water":"Czech Rep.","X1992":0.0002,"X2002":0.0002},{"water":"Denmark","X1992":0.015,"X2002":0.015},{"water":"Djibouti","X1992":0.0001,"X2002":0.0001},{"water":"Ecuador","X1992":0.0022,"X1997":0.0022,"X2002":0.0022},{"water":"Egypt","X1992":0.025,"X1997":0.025,"X2002":0.1},{"water":"El Salvador","X1992":0.0001,"X2002":0.0001},{"water":"Finland","X1992":0.0001,"X2002":0.0001},{"water":"France","X1992":0.0117,"X2002":0.0117},{"water":"Gibraltar","X1992":0.0077},{"water":"Greece","X1992":0.01,"X2002":0.01},{"water":"Honduras","X1992":0.0002,"X2002":0.0002},{"water":"Hungary","X1992":0.0002,"X2002":0.0002},{"water":"India","X1997":0.0005,"X2002":0.0005},{"water":"Indonesia","X1992":0.0187,"X2002":0.0187},{"water":"Iran","X1992":0.003,"X1997":0.003,"X2002":0.003,"X2007":0.2},{"water":"Iraq","X1997":0.0074,"X2002":0.0074},{"water":"Ireland","X1992":0.0002,"X2002":0.0002},{"water":"Israel","X1992":0.0256,"X2002":0.0256,"X2007":0.14},{"water":"Italy","X1992":0.0973,"X2002":0.0973},{"water":"Jamaica","X1992":0.0005,"X1997":0.0005,"X2002":0.0005},{"water":"Japan","X1997":0.04,"X2002":0.04},{"water":"Jordan","X1997":0.002,"X2007":0.0098},{"water":"Kazakhstan","X1997":1.328,"X2002":1.328},{"water":"Kuwait","X1992":0.507,"X1997":0.231,"X2002":0.4202},{"water":"Lebanon","X2007":0.0473},{"water":"Libya","X2002":0.018},{"water":"Malaysia","X1992":0.0043,"X2002":0.0043},{"water":"Maldives","X1992":0.0004},{"water":"Malta","X1992":0.024,"X1997":0.031,"X2002":0.031},{"water":"Marshall Islands","X1992":0.0007},{"water":"Mauritania","X1992":0.002,"X2002":0.002},{"water":"Mexico","X1992":0.0307,"X2002":0.0307},{"water":"Morocco","X1992":0.0034,"X1997":0.0034,"X2002":0.007},{"water":"Namibia","X1992":0.0003,"X2002":0.0003},{"water":"Netherlands Antilles","X1992":0.063},{"water":"Nicaragua","X1992":0.0002,"X2002":0.0002},{"water":"Nigeria","X1992":0.003,"X2002":0.003},{"water":"Norway","X1992":0.0001,"X2002":0.0001},{"water":"Oman","X1997":0.034,"X2002":0.034,"X2007":0.109},{"water":"Peru","X1992":0.0054,"X2002":0.0054},{"water":"Poland","X1992":0.007,"X2002":0.007},{"water":"Portugal","X1992":0.0016,"X2002":0.0016},{"water":"Qatar","X1992":0.065,"X1997":0.099,"X2002":0.099,"X2007":0.18},{"water":"Saudi Arabia","X1992":0.683,"X1997":0.727,"X2002":0.863,"X2007":1.033},{"water":"Senegal","X1992":0,"X2002":0},{"water":"Somalia","X1992":0.0001,"X2002":0.0001},{"water":"South Africa","X1992":0.018,"X2002":0.018},{"water":"Spain","X1992":0.1002,"X2002":0.1002},{"water":"Sudan","X1992":0.0004,"X1997":0.0004,"X2002":0.0004},{"water":"Sweden","X1992":0.0002,"X2002":0.0002},{"water":"Trinidad and Tobago","X2007":0.036},{"water":"Tunisia","X1992":0.008,"X2002":0.013},{"water":"Turkey","X1992":0.0005,"X2002":0.0005,"X2007":0.0005},{"water":"United Arab Emirates","X1992":0.163,"X1997":0.385,"X2007":0.95},{"water":"United Kingdom","X1992":0.0333,"X2002":0.0333},{"water":"United States","X1992":0.58,"X2002":0.58},{"water":"Venezuela","X1992":0.0052,"X2002":0.0052},{"water":"Yemen, Rep.","X1992":0.01,"X2002":0.01}]
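
Some records in the output above, such as American Samoa, carry no year fields at all. A likely explanation is that toJSON() leaves out missing values when it writes a data frame row by row. The sketch below illustrates this with a tiny made-up data frame (water_demo is not the real data set) and shows that fromJSON() turns the absent field back into NA; it assumes the jsonlite package is installed.

```r
library(jsonlite)

# Tiny made-up stand-in for the water data: one value is missing
water_demo <- data.frame(water = c("Algeria", "American Samoa"),
                         X1992 = c(0.064, NA),
                         stringsAsFactors = FALSE)

# Convert to JSON: the record with the NA simply has no X1992 field
water_demo_json <- toJSON(water_demo)
water_demo_json

# Converting back restores the missing value as NA
water_back <- fromJSON(water_demo_json)
water_back
```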

Minify and prettify

JSONs can come in different formats. Take these two JSONs, that are in fact exactly the same: the first one is in a minified format, the second one is in a pretty format with indentation, whitespace and new lines:

              # Mini
              {"a":1,"b":2,"c":{"x":5,"y":6}}

              # Pretty
              {
                "a": 1,
                "b": 2,
                "c": {
                  "x": 5,
                  "y": 6
                }
              }

Unless you're a computer, you surely prefer the second version. However, the standard form that toJSON() returns is the minified version, as it is more concise. You can adapt this behavior by setting the pretty argument inside toJSON() to TRUE. If you already have a JSON string, you can use prettify() or minify() to make the JSON pretty or as concise as possible.
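
As a small sketch of the last point, prettify() and minify() can be applied directly to a JSON string, and both forms parse to the same R object. The string below is the minified example from above; the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# The minified example, held as a plain character string
mini <- '{"a":1,"b":2,"c":{"x":5,"y":6}}'

# Add indentation and line breaks
pretty <- prettify(mini)
pretty

# Strip the whitespace again
mini_again <- minify(pretty)
mini_again

# Both forms describe exactly the same data
same_data <- identical(fromJSON(pretty), fromJSON(mini))
same_data
# [1] TRUE
```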

              # Convert mtcars to a pretty JSON: pretty_json
              pretty_json <- toJSON(mtcars, pretty = TRUE)

              # Print pretty_json
              pretty_json
              [   {     "mpg": 21,     "cyl": 6,     "disp": 160,     "hp": 110,     "drat": 3.9,     "wt": 2.62,     "qsec": 16.46,     "vs": 0,     "am": 1,     "gear": 4,     "carb": 4,     "_row": "Mazda RX4"   },   {     "mpg": 21,     "cyl": 6,     "disp": 160,     "hp": 110,     "drat": 3.9,     "wt": 2.875,     "qsec": 17.02,     "vs": 0,     "am": 1,     "gear": 4,     "carb": 4,     "_row": "Mazda RX4 Wag"   },   {     "mpg": 22.8,     "cyl": 4,     "disp": 108,     "hp": 93,     "drat": 3.85,     "wt": 2.32,     "qsec": 18.61,     "vs": 1,     "am": 1,     "gear": 4,     "carb": 1,     "_row": "Datsun 710"   },   {     "mpg": 21.4,     "cyl": 6,     "disp": 258,     "hp": 110,     "drat": 3.08,     "wt": 3.215,     "qsec": 19.44,     "vs": 1,     "am": 0,     "gear": 3,     "carb": 1,     "_row": "Hornet 4 Drive"   },   {     "mpg": 18.7,     "cyl": 8,     "disp": 360,     "hp": 175,     "drat": 3.15,     "wt": 3.44,     "qsec": 17.02,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 2,     "_row": "Hornet Sportabout"   },   {     "mpg": 18.1,     "cyl": 6,     "disp": 225,     "hp": 105,     "drat": 2.76,     "wt": 3.46,     "qsec": 20.22,     "vs": 1,     "am": 0,     "gear": 3,     "carb": 1,     "_row": "Valiant"   },   {     "mpg": 14.3,     "cyl": 8,     "disp": 360,     "hp": 245,     "drat": 3.21,     "wt": 3.57,     "qsec": 15.84,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 4,     "_row": "Duster 360"   },   {     "mpg": 24.4,     "cyl": 4,     "disp": 146.7,     "hp": 62,     "drat": 3.69,     "wt": 3.19,     "qsec": 20,     "vs": 1,     "am": 0,     "gear": 4,     "carb": 2,     "_row": "Merc 240D"   },   {     "mpg": 22.8,     "cyl": 4,     "disp": 140.8,     "hp": 95,     "drat": 3.92,     "wt": 3.15,     "qsec": 22.9,     "vs": 1,     "am": 0,     "gear": 4,     "carb": 2,     "_row": "Merc 230"   },   {     "mpg": 19.2,     "cyl": 6,     "disp": 167.6,     "hp": 123,     "drat": 3.92,     "wt": 3.44,     "qsec": 18.3,     "vs": 1,     "am": 0,     "gear": 4,     "carb": 4,     "_row": "Merc 280"   },   {     "mpg": 17.8,     "cyl": 6,     "disp": 167.6,     "hp": 123,     "drat": 3.92,     "wt": 3.44,     "qsec": 18.9,     "vs": 1,     "am": 0,     "gear": 4,     "carb": 4,     "_row": "Merc 280C"   },   {     "mpg": 16.4,     "cyl": 8,     "disp": 275.8,     "hp": 180,     "drat": 3.07,     "wt": 4.07,     "qsec": 17.4,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 3,     "_row": "Merc 450SE"   },   {     "mpg": 17.3,     "cyl": 8,     "disp": 275.8,     "hp": 180,     "drat": 3.07,     "wt": 3.73,     "qsec": 17.6,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 3,     "_row": "Merc 450SL"   },   {     "mpg": 15.2,     "cyl": 8,     "disp": 275.8,     "hp": 180,     "drat": 3.07,     "wt": 3.78,     "qsec": 18,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 3,     "_row": "Merc 450SLC"   },   {     "mpg": 10.4,     "cyl": 8,     "disp": 472,     "hp": 205,     "drat": 2.93,     "wt": 5.25,     "qsec": 17.98,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 4,     "_row": "Cadillac Fleetwood"   },   {     "mpg": 10.4,     "cyl": 8,     "disp": 460,     "hp": 215,     "drat": 3,     "wt": 5.424,     "qsec": 17.82,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 4,     "_row": "Lincoln Continental"   },   {     "mpg": 14.7,     "cyl": 8,     "disp": 440,     "hp": 230,     "drat": 3.23,     "wt": 5.345,     "qsec": 17.42,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 4,     "_row": "Chrysler Imperial"   },   {     "mpg": 32.4,     "cyl": 4,     "disp": 78.7,     "hp": 66,     "drat": 4.08,     "wt": 2.2,     "qsec": 19.47,     "vs": 1,     "am": 1,     "gear": 4,     "carb": 1,     "_row": "Fiat 128"   },   {     "mpg": 30.4,     "cyl": 4,     "disp": 75.7,     "hp": 52,     "drat": 4.93,     "wt": 1.615,     "qsec": 18.52,     "vs": 1,     "am": 1,     "gear": 4,     "carb": 2,     "_row": "Honda Civic"   },   {     "mpg": 33.9,     "cyl": 4,     "disp": 71.1,     "hp": 65,     "drat": 4.22,     "wt": 1.835,     "qsec": 19.9,     "vs": 1,     "am": 1,     "gear": 4,     "carb": 1,     "_row": "Toyota Corolla"   },   {     "mpg": 21.5,     "cyl": 4,     "disp": 120.1,     "hp": 97,     "drat": 3.7,     "wt": 2.465,     "qsec": 20.01,     "vs": 1,     "am": 0,     "gear": 3,     "carb": 1,     "_row": "Toyota Corona"   },   {     "mpg": 15.5,     "cyl": 8,     "disp": 318,     "hp": 150,     "drat": 2.76,     "wt": 3.52,     "qsec": 16.87,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 2,     "_row": "Dodge Challenger"   },   {     "mpg": 15.2,     "cyl": 8,     "disp": 304,     "hp": 150,     "drat": 3.15,     "wt": 3.435,     "qsec": 17.3,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 2,     "_row": "AMC Javelin"   },   {     "mpg": 13.3,     "cyl": 8,     "disp": 350,     "hp": 245,     "drat": 3.73,     "wt": 3.84,     "qsec": 15.41,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 4,     "_row": "Camaro Z28"   },   {     "mpg": 19.2,     "cyl": 8,     "disp": 400,     "hp": 175,     "drat": 3.08,     "wt": 3.845,     "qsec": 17.05,     "vs": 0,     "am": 0,     "gear": 3,     "carb": 2,     "_row": "Pontiac Firebird"   },   {     "mpg": 27.3,     "cyl": 4,     "disp": 79,     "hp": 66,     "drat": 4.08,     "wt": 1.935,     "qsec": 18.9,     "vs": 1,     "am": 1,     "gear": 4,     "carb": 1,     "_row": "Fiat X1-9"   },   {     "mpg": 26,     "cyl": 4,     "disp": 120.3,     "hp": 91,     "drat": 4.43,     "wt": 2.14,     "qsec": 16.7,     "vs": 0,     "am": 1,     "gear": 5,     "carb": 2,     "_row": "Porsche 914-2"   },   {     "mpg": 30.4,     "cyl": 4,     "disp": 95.1,     "hp": 113,     "drat": 3.77,     "wt": 1.513,     "qsec": 16.9,     "vs": 1,     "am": 1,     "gear": 5,     "carb": 2,     "_row": "Lotus Europa"   },   {     "mpg": 15.8,     "cyl": 8,     "disp": 351,     "hp": 264,     "drat": 4.22,     "wt": 3.17,     "qsec": 14.5,     "vs": 0,     "am": 1,     "gear": 5,     "carb": 4,     "_row": "Ford Pantera L"   },   {     "mpg": 19.7,     "cyl": 6,     "disp": 145,     "hp": 175,     "drat": 3.62,     "wt": 2.77,     "qsec": 15.5,     "vs": 0,     "am": 1,     "gear": 5,     "carb": 6,     "_row": "Ferrari Dino"   },   {     "mpg": 15,     "cyl": 8,     "disp": 301,     "hp": 335,     "drat": 3.54,     "wt": 3.57,     "qsec": 14.6,     "vs": 0,     "am": 1,     "gear": 5,     "carb": 8,     "_row": "Maserati Bora"   },   {     "mpg": 21.4,     "cyl": 4,     "disp": 121,     "hp": 109,     "drat": 4.11,     "wt": 2.78,     "qsec": 18.6,     "vs": 1,     "am": 1,     "gear": 4,     "carb": 2,     "_row": "Volvo 142E"   } ]
            # Minify pretty_json: mini_json
            mini_json <- minify(pretty_json)

            # Print mini_json
            mini_json
            [{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3.9,"wt":2.62,"qsec":16.46,"vs":0,"am":1,"gear":4,"carb":4,"_row":"Mazda RX4"},{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3.9,"wt":2.875,"qsec":17.02,"vs":0,"am":1,"gear":4,"carb":4,"_row":"Mazda RX4 Wag"},{"mpg":22.8,"cyl":4,"disp":108,"hp":93,"drat":3.85,"wt":2.32,"qsec":18.61,"vs":1,"am":1,"gear":4,"carb":1,"_row":"Datsun 710"},{"mpg":21.4,"cyl":6,"disp":258,"hp":110,"drat":3.08,"wt":3.215,"qsec":19.44,"vs":1,"am":0,"gear":3,"carb":1,"_row":"Hornet 4 Drive"},{"mpg":18.7,"cyl":8,"disp":360,"hp":175,"drat":3.15,"wt":3.44,"qsec":17.02,"vs":0,"am":0,"gear":3,"carb":2,"_row":"Hornet Sportabout"},{"mpg":18.1,"cyl":6,"disp":225,"hp":105,"drat":2.76,"wt":3.46,"qsec":20.22,"vs":1,"am":0,"gear":3,"carb":1,"_row":"Valiant"},{"mpg":14.3,"cyl":8,"disp":360,"hp":245,"drat":3.21,"wt":3.57,"qsec":15.84,"vs":0,"am":0,"gear":3,"carb":4,"_row":"Duster 360"},{"mpg":24.4,"cyl":4,"disp":146.7,"hp":62,"drat":3.69,"wt":3.19,"qsec":20,"vs":1,"am":0,"gear":4,"carb":2,"_row":"Merc 240D"},{"mpg":22.8,"cyl":4,"disp":140.8,"hp":95,"drat":3.92,"wt":3.15,"qsec":22.9,"vs":1,"am":0,"gear":4,"carb":2,"_row":"Merc 230"},{"mpg":19.2,"cyl":6,"disp":167.6,"hp":123,"drat":3.92,"wt":3.44,"qsec":18.3,"vs":1,"am":0,"gear":4,"carb":4,"_row":"Merc 280"},{"mpg":17.8,"cyl":6,"disp":167.6,"hp":123,"drat":3.92,"wt":3.44,"qsec":18.9,"vs":1,"am":0,"gear":4,"carb":4,"_row":"Merc 280C"},{"mpg":16.4,"cyl":8,"disp":275.8,"hp":180,"drat":3.07,"wt":4.07,"qsec":17.4,"vs":0,"am":0,"gear":3,"carb":3,"_row":"Merc 450SE"},{"mpg":17.3,"cyl":8,"disp":275.8,"hp":180,"drat":3.07,"wt":3.73,"qsec":17.6,"vs":0,"am":0,"gear":3,"carb":3,"_row":"Merc 450SL"},{"mpg":15.2,"cyl":8,"disp":275.8,"hp":180,"drat":3.07,"wt":3.78,"qsec":18,"vs":0,"am":0,"gear":3,"carb":3,"_row":"Merc 450SLC"},{"mpg":10.4,"cyl":8,"disp":472,"hp":205,"drat":2.93,"wt":5.25,"qsec":17.98,"vs":0,"am":0,"gear":3,"carb":4,"_row":"Cadillac Fleetwood"},{"mpg":10.4,"cyl":8,"disp":460,"hp":215,"drat":3,"wt":5.424,"qsec":17.82,"vs":0,"am":0,"gear":3,"carb":4,"_row":"Lincoln Continental"},{"mpg":14.7,"cyl":8,"disp":440,"hp":230,"drat":3.23,"wt":5.345,"qsec":17.42,"vs":0,"am":0,"gear":3,"carb":4,"_row":"Chrysler Imperial"},{"mpg":32.4,"cyl":4,"disp":78.7,"hp":66,"drat":4.08,"wt":2.2,"qsec":19.47,"vs":1,"am":1,"gear":4,"carb":1,"_row":"Fiat 128"},{"mpg":30.4,"cyl":4,"disp":75.7,"hp":52,"drat":4.93,"wt":1.615,"qsec":18.52,"vs":1,"am":1,"gear":4,"carb":2,"_row":"Honda Civic"},{"mpg":33.9,"cyl":4,"disp":71.1,"hp":65,"drat":4.22,"wt":1.835,"qsec":19.9,"vs":1,"am":1,"gear":4,"carb":1,"_row":"Toyota Corolla"},{"mpg":21.5,"cyl":4,"disp":120.1,"hp":97,"drat":3.7,"wt":2.465,"qsec":20.01,"vs":1,"am":0,"gear":3,"carb":1,"_row":"Toyota Corona"},{"mpg":15.5,"cyl":8,"disp":318,"hp":150,"drat":2.76,"wt":3.52,"qsec":16.87,"vs":0,"am":0,"gear":3,"carb":2,"_row":"Dodge Challenger"},{"mpg":15.2,"cyl":8,"disp":304,"hp":150,"drat":3.15,"wt":3.435,"qsec":17.3,"vs":0,"am":0,"gear":3,"carb":2,"_row":"AMC Javelin"},{"mpg":13.3,"cyl":8,"disp":350,"hp":245,"drat":3.73,"wt":3.84,"qsec":15.41,"vs":0,"am":0,"gear":3,"carb":4,"_row":"Camaro Z28"},{"mpg":19.2,"cyl":8,"disp":400,"hp":175,"drat":3.08,"wt":3.845,"qsec":17.05,"vs":0,"am":0,"gear":3,"carb":2,"_row":"Pontiac Firebird"},{"mpg":27.3,"cyl":4,"disp":79,"hp":66,"drat":4.08,"wt":1.935,"qsec":18.9,"vs":1,"am":1,"gear":4,"carb":1,"_row":"Fiat X1-9"},{"mpg":26,"cyl":4,"disp":120.3,"hp":91,"drat":4.43,"wt":2.14,"qsec":16.7,"vs":0,"am":1,"gear":5,"carb":2,"_row":"Porsche 914-2"},{"mpg":30.4,"cyl":4,"disp":95.1,"hp":113,"drat":3.77,"wt":1.513,"qsec":16.9,"vs":1,"am":1,"gear":5,"carb":2,"_row":"Lotus Europa"},{"mpg":15.8,"cyl":8,"disp":351,"hp":264,"drat":4.22,"wt":3.17,"qsec":14.5,"vs":0,"am":1,"gear":5,"carb":4,"_row":"Ford Pantera L"},{"mpg":19.7,"cyl":6,"disp":145,"hp":175,"drat":3.62,"wt":2.77,"qsec":15.5,"vs":0,"am":1,"gear":5,"carb":6,"_row":"Ferrari Dino"},{"mpg":15,"cyl":8,"disp":301,"hp":335,"drat":3.54,"wt":3.57,"qsec":14.6,"vs":0,"am":1,"gear":5,"carb":8,"_row":"Maserati Bora"},{"mpg":21.4,"cyl":4,"disp":121,"hp":109,"drat":4.11,"wt":2.78,"qsec":18.6,"vs":1,"am":1,"gear":4,"carb":2,"_row":"Volvo 142E"}]

haven

haven is an extremely easy-to-use package to import data from three software packages: SAS, STATA and SPSS. Depending on the software, you use different functions:

  • SAS: read_sas()
  • STATA: read_dta() (or read_stata(), which is identical)
  • SPSS: read_sav() or read_por(), depending on the file type.

All these functions take one key argument: the path to your local file. In fact, you can even pass a URL; haven will then automatically download the file for you before importing it.

            library(haven)          
            # Import sales.sas7bdat: sales
            sales <- read_sas("sales.sas7bdat")

            # Display the structure of sales
            str(sales)

            Classes 'tbl_df', 'tbl' and 'data.frame':   431 obs. of  4 variables:
             $ purchase: num  0 0 1 1 0 0 0 0 0 0 ...
             $ age     : num  41 47 41 39 32 32 33 45 43 40 ...
             $ gender  : chr  "Female" "Female" "Female" "Female" ...
             $ income  : chr  "Low" "Low" "Low" "Low" ...
             - attr(*, "label")= chr "SALES"
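The choice of reader depends only on the file type, so it can be captured in a small helper. The following is a minimal sketch, not part of haven: pick_reader() and import_stats_file() are hypothetical names introduced here, and the lookup assumes that the file extension correctly reflects the file format.

```r
# Hypothetical helper: map a file extension to the name of the matching
# haven reader (the mapping follows the list of functions above).
pick_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    sas7bdat = "read_sas",
    dta      = "read_dta",
    sav      = "read_sav",
    por      = "read_por",
    stop("No haven reader known for extension: '", ext, "'")
  )
}

# Hypothetical wrapper: look the reader up in haven and call it.
# haven only needs to be installed when this function is actually called.
import_stats_file <- function(path) {
  reader <- getExportedValue("haven", pick_reader(path))
  reader(path)
}

# Which reader would be used for the files in this section?
pick_reader("sales.sas7bdat")
pick_reader("person.sav")
```

With haven installed, import_stats_file("sales.sas7bdat") would then behave like the read_sas() call above.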

Next up are STATA data files; you can use read_dta() for these.

When inspecting the result of the read_dta() call, you will notice that one column is imported as a labelled vector, an R equivalent of a data structure that is common in other statistical environments. To continue working on the data effectively in R, it's best to change this data into a standard R class. To convert a variable of the class labelled to a factor, you'll need haven's as_factor() function.

            # Import the data from the URL: sugar
            sugar <- read_dta('http://assets.datacamp.com/production/course_1478/datasets/trade.dta')

            # Structure of sugar
            str(sugar)

            Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  5 variables:
             $ Date    : 'haven_labelled' num  10 9 8 7 6 5 4 3 2 1
              ..- attr(*, "label")= chr "Date"
              ..- attr(*, "format.stata")= chr "%9.0g"
              ..- attr(*, "labels")= Named num  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "2004-12-31" "2005-12-31" "2006-12-31" "2007-12-31" ...
             $ Import  : num  37664782 16316512 11082246 35677943 9879878 ...
              ..- attr(*, "label")= chr "Import"
              ..- attr(*, "format.stata")= chr "%9.0g"
             $ Weight_I: num  54029106 21584365 14526089 55034932 14806865 ...
              ..- attr(*, "label")= chr "Weight_I"
              ..- attr(*, "format.stata")= chr "%9.0g"
             $ Export  : num  5.45e+07 1.03e+08 3.79e+07 4.85e+07 7.15e+07 ...
              ..- attr(*, "label")= chr "Export"
              ..- attr(*, "format.stata")= chr "%9.0g"
             $ Weight_E: num  9.34e+07 1.58e+08 8.80e+07 1.12e+08 1.32e+08 ...
              ..- attr(*, "label")= chr "Weight_E"
              ..- attr(*, "format.stata")= chr "%9.0g"
             - attr(*, "label")= chr "Written by R."
            # Convert values in Date column to dates
            sugar$Date <- as.Date(as_factor(sugar$Date))

            # Structure of sugar again
            str(sugar)

            Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  5 variables:
             $ Date    : Date, format: "2013-12-31" ...
             $ Import  : num  37664782 16316512 11082246 35677943 9879878 ...
              ..- attr(*, "label")= chr "Import"
              ..- attr(*, "format.stata")= chr "%9.0g"
             $ Weight_I: num  54029106 21584365 14526089 55034932 14806865 ...
              ..- attr(*, "label")= chr "Weight_I"
              ..- attr(*, "format.stata")= chr "%9.0g"
             $ Export  : num  5.45e+07 1.03e+08 3.79e+07 4.85e+07 7.15e+07 ...
              ..- attr(*, "label")= chr "Export"
              ..- attr(*, "format.stata")= chr "%9.0g"
             $ Weight_E: num  9.34e+07 1.58e+08 8.80e+07 1.12e+08 1.32e+08 ...
              ..- attr(*, "label")= chr "Weight_E"
              ..- attr(*, "format.stata")= chr "%9.0g"
             - attr(*, "label")= chr "Written by R."
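To see what this conversion amounts to, it can help to rebuild it by hand. The sketch below uses base R only and made-up values modelled on the str() output of the Date column: a vector of numeric codes plus a named vector of value labels. It mimics the effect of as_factor() followed by as.Date(); it is not how haven implements the conversion.

```r
# Made-up stand-ins for the Date column: numeric codes and their value labels
codes <- c(10, 9, 8, 1)
value_labels <- c("2004-12-31" = 1, "2011-12-31" = 8,
                  "2012-12-31" = 9, "2013-12-31" = 10)

# Step 1: replace every code by its label (the effect of as_factor())
date_factor <- factor(codes, levels = value_labels, labels = names(value_labels))

# Step 2: parse the label text as dates
dates <- as.Date(as.character(date_factor))

str(dates)
```

The point of the two steps is that the numbers stored in the file are only codes; the actual dates live in the labels, so the labels have to be substituted in before as.Date() can parse anything.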

A plot can be very useful to explore the relationship between two variables. If you pass the plot() function two arguments, the first one will be plotted on the x-axis and the second one on the y-axis.

            plot(x = sugar$Import, y = sugar$Weight_I)          

The haven package can also import data files from SPSS. Again, importing the data is pretty straightforward. Depending on the SPSS data file you're working with, you'll need either read_sav() for .sav files or read_por() for .por files.

            # Import person.sav: traits
            traits <- read_sav("person.sav")

            # Summarize traits
            summary(traits)

                Neurotic      Extroversion   Agreeableness   Conscientiousness
             Min.   : 0.00   Min.   : 5.00   Min.   :15.00   Min.   : 7.00
             1st Qu.:18.00   1st Qu.:26.00   1st Qu.:39.00   1st Qu.:25.00
             Median :24.00   Median :31.00   Median :45.00   Median :30.00
             Mean   :23.63   Mean   :30.23   Mean   :44.55   Mean   :30.85
             3rd Qu.:29.00   3rd Qu.:34.00   3rd Qu.:50.00   3rd Qu.:36.00
             Max.   :44.00   Max.   :65.00   Max.   :73.00   Max.   :58.00
             NA's   :14      NA's   :16      NA's   :19      NA's   :14

            # Print out a subset
            subset(traits, Extroversion > 40 & Agreeableness > 40)

With SPSS data files, it can also happen that some of the variables you import have the labelled class. This is done to keep all the labelling information that was originally present in the .sav and .por files. It's advised to coerce (or change) these variables to factors or other standard R classes.

            # Import SPSS data from the URL: work
            work <- read_sav("http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/employee.sav")

            # Display summary of work$GENDER
            summary(work$GENDER)

                    Length          Class           Mode
                       474 haven_labelled      character

            # Convert work$GENDER to a factor
            work$GENDER <- as_factor(work$GENDER)

            # Display summary of work$GENDER again
            summary(work$GENDER)

            Female   Male
               216    258
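When a file contains several labelled variables, converting them one by one gets tedious. The following is a minimal sketch of a helper that converts every labelled column in one go. labelled_to_factor() is a hypothetical name, the staff data is made up, and the demo only runs when haven is installed.

```r
# Hypothetical helper: turn every labelled column of a data frame into a factor
labelled_to_factor <- function(df) {
  is_lab <- vapply(df, haven::is.labelled, logical(1))
  if (any(is_lab)) {
    df[is_lab] <- lapply(df[is_lab], haven::as_factor)
  }
  df
}

# Small made-up example; skipped when haven is not available
if (requireNamespace("haven", quietly = TRUE)) {
  staff <- data.frame(id = 1:4)
  staff$GENDER <- haven::labelled(c(1, 2, 2, 1),
                                  labels = c(Female = 1, Male = 2))
  staff <- labelled_to_factor(staff)
  print(summary(staff$GENDER))
}
```

Columns that are not labelled, such as id here, are left untouched.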

foreign

The foreign package offers a simple function to import and read STATA data: read.dta().

            library(foreign)          
            package 'foreign' was built under R version 3.6.3
            # Import florida.dta and name the resulting data frame florida
            florida <- read.dta("florida.dta")

            # Check tail() of florida
            tail(florida)

Data can be very diverse, going from character vectors to categorical variables, dates and more. It's in these cases that the additional arguments of read.dta() come in handy.

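The arguments you will use most often are convert.dates, convert.factors, missing.type and convert.underscore. Their meaning is pretty straightforward: convert.dates turns Stata dates into R Date objects, convert.factors turns variables with value labels into factors, missing.type keeps information about Stata's different types of missing values, and convert.underscore replaces "_" in variable names with ".".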

            # Specify the file path using file.path(): path
            path <- file.path("edequality.dta")

            # Create and print structure of edu_equal_1
            edu_equal_1 <- read.dta(path)
            str(edu_equal_1)

            'data.frame':   12214 obs. of  27 variables:
             $ hhid              : num  1 1 1 2 2 3 4 4 5 6 ...
             $ hhweight          : num  627 627 627 627 627 ...
             $ location          : Factor w/ 2 levels "urban location",..: 1 1 1 1 1 2 2 2 1 1 ...
             $ region            : Factor w/ 9 levels "Sofia city","Bourgass",..: 8 8 8 9 9 4 4 4 8 8 ...
             $ ethnicity_head    : Factor w/ 4 levels "Bulgaria","Turks",..: 2 2 2 1 1 1 1 1 1 1 ...
             $ age               : num  37 11 8 73 70 75 79 80 82 83 ...
             $ gender            : Factor w/ 2 levels "male","female": 2 2 1 1 2 1 1 2 2 2 ...
             $ relation          : Factor w/ 9 levels "head                      ",..: 1 3 3 1 2 1 1 2 1 1 ...
             $ literate          : Factor w/ 2 levels "no","yes": 1 2 2 2 2 2 2 2 2 2 ...
             $ income_mnt        : num  13.3 13.3 13.3 142.5 142.5 ...
             $ income            : num  160 160 160 1710 1710 ...
             $ aggregate         : num  1042 1042 1042 3271 3271 ...
             $ aggr_ind_annual   : num  347 347 347 1635 1635 ...
             $ educ_completed    : int  2 4 4 4 3 3 3 3 4 4 ...
             $ grade_complete    : num  4 3 0 3 4 4 4 4 5 5 ...
             $ grade_all         : num  4 11 8 11 8 8 8 8 13 13 ...
             $ unemployed        : int  2 1 1 1 1 1 1 1 1 1 ...
             $ reason_OLF        : int  NA NA NA 3 3 3 9 9 3 3 ...
             $ sector            : int  NA NA NA NA NA NA 1 1 NA NA ...
             $ occupation        : int  NA NA NA NA NA NA 5 5 NA NA ...
             $ earn_mont         : num  0 0 0 0 0 0 20 20 0 0 ...
             $ earn_ann          : num  0 0 0 0 0 0 240 240 0 0 ...
             $ hours_week        : num  NA NA NA NA NA NA 30 35 NA NA ...
             $ hours_mnt         : num  NA NA NA NA NA ...
             $ fulltime          : int  NA NA NA NA NA NA 1 1 NA NA ...
             $ hhexp             : num  100 100 100 343 343 ...
             $ legacy_pension_amt: num  NA NA NA NA NA NA NA NA NA NA ...
             - attr(*, "datalabel")= chr ""
             - attr(*, "time.stamp")= chr ""
             - attr(*, "formats")= chr  "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
             - attr(*, "types")= int  100 100 108 108 108 100 108 108 108 100 ...
             - attr(*, "val.labels")= chr  "" "" "location" "region" ...
             - attr(*, "var.labels")= chr  "hhid" "hhweight" "location" "region" ...
             - attr(*, "expansion.fields")=List of 12
              ..$ : chr  "_dta" "_svy_su1" "cluster"
              ..$ : chr  "_dta" "_svy_strata1" "strata"
              ..$ : chr  "_dta" "_svy_stages" "1"
              ..$ : chr  "_dta" "_svy_version" "2"
              ..$ : chr  "_dta" "__XijVarLabcons" "(sum) cons"
              ..$ : chr  "_dta" "ReS_Xij" "cons"
              ..$ : chr  "_dta" "ReS_str" "0"
              ..$ : chr  "_dta" "ReS_j" "group"
              ..$ : chr  "_dta" "ReS_ver" "v.2"
              ..$ : chr  "_dta" "ReS_i" "hhid dur"
              ..$ : chr  "_dta" "note1" "variables g1pc, g2pc, g3pc, g4pc, g5pc, g7pc, g8pc, g9pc, g10pc, g11pc, g12pc,  gall, health, rent, durables we"| __truncated__
              ..$ : chr  "_dta" "note0" "1"
             - attr(*, "version")= int 7
             - attr(*, "label.table")=List of 12
              ..$ location: Named int  1 2
              .. ..- attr(*, "names")= chr  "urban location" "rural location"
              ..$ region  : Named int  1 2 3 4 5 6 7 8 9
              .. ..- attr(*, "names")= chr  "Sofia city" "Bourgass" "Varna" "Lovetch" ...
              ..$ ethnic  : Named int  1 2 3 4
              .. ..- attr(*, "names")= chr  "Bulgaria" "Turks" "Roma" "Other"
              ..$ s2_q2   : Named int  1 2
              .. ..- attr(*, "names")= chr  "male" "female"
              ..$ s2_q3   : Named int  1 2 3 4 5 6 7 8 9
              .. ..- attr(*, "names")= chr  "head                      " "spouse/partner            " "child                     " "son/daughter-in-law       " ...
              ..$ lit     : Named int  1 2
              .. ..- attr(*, "names")= chr  "no" "yes"
              ..$         : Named int  1 2 3 4
              .. ..- attr(*, "names")= chr  "never attanded" "primary" "secondary" "postsecondary"
              ..$         : Named int  1 2
              .. ..- attr(*, "names")= chr  "Not unemployed" "Unemployed"
              ..$         : Named int  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "student" "housewife/childcare" "in retirement" "illness, disability" ...
              ..$         : Named int  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "agriculture" "mining" "manufacturing" "utilities" ...
              ..$         : Named int  1 2 3 4 5
              .. ..- attr(*, "names")= chr  "private company" "public works program" "government,public sector, army" "private individual" ...
              ..$         : Named int  1 2
              .. ..- attr(*, "names")= chr  "no" "yes"
            # Create and print structure of edu_equal_2
            edu_equal_2 <- read.dta(path, convert.factors = FALSE)
            str(edu_equal_2)

            'data.frame':   12214 obs. of  27 variables:
             $ hhid              : num  1 1 1 2 2 3 4 4 5 6 ...
             $ hhweight          : num  627 627 627 627 627 ...
             $ location          : int  1 1 1 1 1 2 2 2 1 1 ...
             $ region            : int  8 8 8 9 9 4 4 4 8 8 ...
             $ ethnicity_head    : int  2 2 2 1 1 1 1 1 1 1 ...
             $ age               : num  37 11 8 73 70 75 79 80 82 83 ...
             $ gender            : int  2 2 1 1 2 1 1 2 2 2 ...
             $ relation          : int  1 3 3 1 2 1 1 2 1 1 ...
             $ literate          : int  1 2 2 2 2 2 2 2 2 2 ...
             $ income_mnt        : num  13.3 13.3 13.3 142.5 142.5 ...
             $ income            : num  160 160 160 1710 1710 ...
             $ aggregate         : num  1042 1042 1042 3271 3271 ...
             $ aggr_ind_annual   : num  347 347 347 1635 1635 ...
             $ educ_completed    : int  2 4 4 4 3 3 3 3 4 4 ...
             $ grade_complete    : num  4 3 0 3 4 4 4 4 5 5 ...
             $ grade_all         : num  4 11 8 11 8 8 8 8 13 13 ...
             $ unemployed        : int  2 1 1 1 1 1 1 1 1 1 ...
             $ reason_OLF        : int  NA NA NA 3 3 3 9 9 3 3 ...
             $ sector            : int  NA NA NA NA NA NA 1 1 NA NA ...
             $ occupation        : int  NA NA NA NA NA NA 5 5 NA NA ...
             $ earn_mont         : num  0 0 0 0 0 0 20 20 0 0 ...
             $ earn_ann          : num  0 0 0 0 0 0 240 240 0 0 ...
             $ hours_week        : num  NA NA NA NA NA NA 30 35 NA NA ...
             $ hours_mnt         : num  NA NA NA NA NA ...
             $ fulltime          : int  NA NA NA NA NA NA 1 1 NA NA ...
             $ hhexp             : num  100 100 100 343 343 ...
             $ legacy_pension_amt: num  NA NA NA NA NA NA NA NA NA NA ...
             - attr(*, "datalabel")= chr ""
             - attr(*, "time.stamp")= chr ""
             - attr(*, "formats")= chr  "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
             - attr(*, "types")= int  100 100 108 108 108 100 108 108 108 100 ...
             - attr(*, "val.labels")= chr  "" "" "location" "region" ...
             - attr(*, "var.labels")= chr  "hhid" "hhweight" "location" "region" ...
             - attr(*, "expansion.fields")=List of 12
              ..$ : chr  "_dta" "_svy_su1" "cluster"
              ..$ : chr  "_dta" "_svy_strata1" "strata"
              ..$ : chr  "_dta" "_svy_stages" "1"
              ..$ : chr  "_dta" "_svy_version" "2"
              ..$ : chr  "_dta" "__XijVarLabcons" "(sum) cons"
              ..$ : chr  "_dta" "ReS_Xij" "cons"
              ..$ : chr  "_dta" "ReS_str" "0"
              ..$ : chr  "_dta" "ReS_j" "group"
              ..$ : chr  "_dta" "ReS_ver" "v.2"
              ..$ : chr  "_dta" "ReS_i" "hhid dur"
              ..$ : chr  "_dta" "note1" "variables g1pc, g2pc, g3pc, g4pc, g5pc, g7pc, g8pc, g9pc, g10pc, g11pc, g12pc,  gall, health, rent, durables we"| __truncated__
              ..$ : chr  "_dta" "note0" "1"
             - attr(*, "version")= int 7
             - attr(*, "label.table")=List of 12
              ..$ location: Named int  1 2
              .. ..- attr(*, "names")= chr  "urban location" "rural location"
              ..$ region  : Named int  1 2 3 4 5 6 7 8 9
              .. ..- attr(*, "names")= chr  "Sofia city" "Bourgass" "Varna" "Lovetch" ...
              ..$ ethnic  : Named int  1 2 3 4
              .. ..- attr(*, "names")= chr  "Bulgaria" "Turks" "Roma" "Other"
              ..$ s2_q2   : Named int  1 2
              .. ..- attr(*, "names")= chr  "male" "female"
              ..$ s2_q3   : Named int  1 2 3 4 5 6 7 8 9
              .. ..- attr(*, "names")= chr  "head                      " "spouse/partner            " "child                     " "son/daughter-in-law       " ...
              ..$ lit     : Named int  1 2
              .. ..- attr(*, "names")= chr  "no" "yes"
              ..$         : Named int  1 2 3 4
              .. ..- attr(*, "names")= chr  "never attanded" "primary" "secondary" "postsecondary"
              ..$         : Named int  1 2
              .. ..- attr(*, "names")= chr  "Not unemployed" "Unemployed"
              ..$         : Named int  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "student" "housewife/childcare" "in retirement" "illness, disability" ...
              ..$         : Named int  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "agriculture" "mining" "manufacturing" "utilities" ...
              ..$         : Named int  1 2 3 4 5
              .. ..- attr(*, "names")= chr  "private company" "public works program" "government,public sector, army" "private individual" ...
              ..$         : Named int  1 2
              .. ..- attr(*, "names")= chr  "no" "yes"
            # Create and print structure of edu_equal_3
            edu_equal_3 <- read.dta(path, convert.underscore = TRUE)
            str(edu_equal_3)

            'data.frame':   12214 obs. of  27 variables:
             $ hhid              : num  1 1 1 2 2 3 4 4 5 6 ...
             $ hhweight          : num  627 627 627 627 627 ...
             $ location          : Factor w/ 2 levels "urban location",..: 1 1 1 1 1 2 2 2 1 1 ...
             $ region            : Factor w/ 9 levels "Sofia city","Bourgass",..: 8 8 8 9 9 4 4 4 8 8 ...
             $ ethnicity.head    : Factor w/ 4 levels "Bulgaria","Turks",..: 2 2 2 1 1 1 1 1 1 1 ...
             $ age               : num  37 11 8 73 70 75 79 80 82 83 ...
             $ gender            : Factor w/ 2 levels "male","female": 2 2 1 1 2 1 1 2 2 2 ...
             $ relation          : Factor w/ 9 levels "head                      ",..: 1 3 3 1 2 1 1 2 1 1 ...
             $ literate          : Factor w/ 2 levels "no","yes": 1 2 2 2 2 2 2 2 2 2 ...
             $ income.mnt        : num  13.3 13.3 13.3 142.5 142.5 ...
             $ income            : num  160 160 160 1710 1710 ...
             $ aggregate         : num  1042 1042 1042 3271 3271 ...
             $ aggr.ind.annual   : num  347 347 347 1635 1635 ...
             $ educ.completed    : int  2 4 4 4 3 3 3 3 4 4 ...
             $ grade.complete    : num  4 3 0 3 4 4 4 4 5 5 ...
             $ grade.all         : num  4 11 8 11 8 8 8 8 13 13 ...
             $ unemployed        : int  2 1 1 1 1 1 1 1 1 1 ...
             $ reason.OLF        : int  NA NA NA 3 3 3 9 9 3 3 ...
             $ sector            : int  NA NA NA NA NA NA 1 1 NA NA ...
             $ occupation        : int  NA NA NA NA NA NA 5 5 NA NA ...
             $ earn.mont         : num  0 0 0 0 0 0 20 20 0 0 ...
             $ earn.ann          : num  0 0 0 0 0 0 240 240 0 0 ...
             $ hours.week        : num  NA NA NA NA NA NA 30 35 NA NA ...
             $ hours.mnt         : num  NA NA NA NA NA ...
             $ fulltime          : int  NA NA NA NA NA NA 1 1 NA NA ...
             $ hhexp             : num  100 100 100 343 343 ...
             $ legacy.pension.amt: num  NA NA NA NA NA NA NA NA NA NA ...
             - attr(*, "datalabel")= chr ""
             - attr(*, "time.stamp")= chr ""
             - attr(*, "formats")= chr  "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
             - attr(*, "types")= int  100 100 108 108 108 100 108 108 108 100 ...
             - attr(*, "val.labels")= chr  "" "" "location" "region" ...
             - attr(*, "var.labels")= chr  "hhid" "hhweight" "location" "region" ...
             - attr(*, "expansion.fields")=List of 12
              ..$ : chr  "_dta" "_svy_su1" "cluster"
              ..$ : chr  "_dta" "_svy_strata1" "strata"
              ..$ : chr  "_dta" "_svy_stages" "1"
              ..$ : chr  "_dta" "_svy_version" "2"
              ..$ : chr  "_dta" "__XijVarLabcons" "(sum) cons"
              ..$ : chr  "_dta" "ReS_Xij" "cons"
              ..$ : chr  "_dta" "ReS_str" "0"
              ..$ : chr  "_dta" "ReS_j" "group"
              ..$ : chr  "_dta" "ReS_ver" "v.2"
              ..$ : chr  "_dta" "ReS_i" "hhid dur"
              ..$ : chr  "_dta" "note1" "variables g1pc, g2pc, g3pc, g4pc, g5pc, g7pc, g8pc, g9pc, g10pc, g11pc, g12pc,  gall, health, rent, durables we"| __truncated__
              ..$ : chr  "_dta" "note0" "1"
             - attr(*, "version")= int 7
             - attr(*, "label.table")=List of 12
              ..$ location: Named int  1 2
              .. ..- attr(*, "names")= chr  "urban location" "rural location"
              ..$ region  : Named int  1 2 3 4 5 6 7 8 9
              .. ..- attr(*, "names")= chr  "Sofia city" "Bourgass" "Varna" "Lovetch" ...
              ..$ ethnic  : Named int  1 2 3 4
              .. ..- attr(*, "names")= chr  "Bulgaria" "Turks" "Roma" "Other"
              ..$ s2_q2   : Named int  1 2
              .. ..- attr(*, "names")= chr  "male" "female"
              ..$ s2_q3   : Named int  1 2 3 4 5 6 7 8 9
              .. ..- attr(*, "names")= chr  "head                      " "spouse/partner            " "child                     " "son/daughter-in-law       " ...
              ..$ lit     : Named int  1 2
              .. ..- attr(*, "names")= chr  "no" "yes"
              ..$         : Named int  1 2 3 4
              .. ..- attr(*, "names")= chr  "never attanded" "primary" "secondary" "postsecondary"
              ..$         : Named int  1 2
              .. ..- attr(*, "names")= chr  "Not unemployed" "Unemployed"
              ..$         : Named int  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "student" "housewife/childcare" "in retirement" "illness, disability" ...
              ..$         : Named int  1 2 3 4 5 6 7 8 9 10
              .. ..- attr(*, "names")= chr  "agriculture" "mining" "manufacturing" "utilities" ...
              ..$         : Named int  1 2 3 4 5
              .. ..- attr(*, "names")= chr  "private company" "public works program" "government,public sector, army" "private individual" ...
              ..$         : Named int  1 2
              .. ..- attr(*, "names")= chr  "no" "yes"
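If you don't have edequality.dta at hand, you can still try out these arguments on a file you write yourself. The following is a minimal sketch under that assumption: the survey data frame and the tmp_path variable are made up for illustration, and write.dta() from the same foreign package is used only to create a small Stata file to read back.

```r
library(foreign)

# Made-up survey data with one factor column and an underscore in a name
survey <- data.frame(
  hhid       = c(1, 2, 3),
  region     = factor(c("Varna", "Lovetch", "Varna")),
  income_mnt = c(13.3, 142.5, 80)
)

# Write a temporary Stata file so the example does not depend on edequality.dta
tmp_path <- tempfile(fileext = ".dta")
write.dta(survey, tmp_path)

as_factors <- read.dta(tmp_path)                             # defaults
as_codes   <- read.dta(tmp_path, convert.factors = FALSE)    # keep the codes
dot_names  <- read.dta(tmp_path, convert.underscore = TRUE)  # "_" becomes "."

# Compare how the region column and the column names come back
class(as_factors$region)
class(as_codes$region)
names(dot_names)
```

Comparing the three results side by side shows the same differences as the str() outputs above, just on a data set small enough to read at a glance.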

How many observations (e.g. how many people) have an age higher than 40 and are literate?

            nrow(subset(edu_equal_1, age > 40 & literate == "yes"))
            [1] 6506          

How many observations/individuals of Bulgarian ethnicity have an income above 1000?

            nrow(subset(edu_equal_1, ethnicity_head == "Bulgaria" & income > 1000))
            [1] 8997          
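The nrow(subset(...)) pattern used for both questions can also be written as a sum over a logical vector. The sketch below compares the two on a made-up data frame called people (not the edequality data), including a missing age to show how NA values are treated.

```r
# Made-up data in the shape of edu_equal_1
people <- data.frame(
  age      = c(37, 11, 73, 70, NA),
  literate = factor(c("no", "yes", "yes", "yes", "yes"), levels = c("no", "yes"))
)

# Same pattern as above: keep the matching rows, then count them
n_subset <- nrow(subset(people, age > 40 & literate == "yes"))

# Equivalent count without building the intermediate data frame;
# na.rm = TRUE mirrors subset(), which drops rows where the condition is NA
n_sum <- sum(people$age > 40 & people$literate == "yes", na.rm = TRUE)

c(n_subset, n_sum)
```

Both approaches count two people here; the row with the missing age is ignored by each of them.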

Where foreign provided read.dta() to read Stata data, there's also read.spss() to read SPSS data files. To get a data frame, make sure to set to.data.frame = TRUE inside read.spss().

            # Import international.sav as a data frame: demo
            demo <- read.spss("international.sav", to.data.frame = TRUE)
            re-encoding from CP1252          
            # Create boxplot of gdp variable of demo
            boxplot(demo$gdp)

If you're familiar with statistics, you'll have heard about Pearson's correlation. It is a measure of the linear dependency between two variables, say X and Y. It can range from -1 to 1; if it's close to 1, there is a strong positive association between the variables: if X is high, Y also tends to be high. If it's close to -1, there is a strong negative association: if X is high, Y tends to be low. When the Pearson correlation between two variables is 0, there is no linear association between X and Y; the variables may be independent, but a correlation of 0 alone does not prove it.

You can calculate the correlation between two vectors with the cor() function. Take this code for example, which computes the correlation between the columns height and width of a fictional data frame size:

            cor(size$height, size$width)
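To make the definition concrete, the sketch below computes Pearson's correlation by hand and compares it with cor(). The height and width vectors are made-up numbers standing in for the columns of the fictional size data frame.

```r
# Made-up measurements; width grows roughly linearly with height
height <- c(150, 160, 170, 180, 190)
width  <- c(40, 44, 47, 53, 55)

# Pearson's r: the summed products of the deviations from the means,
# divided by the square root of the product of the summed squared deviations
r_manual <- sum((height - mean(height)) * (width - mean(width))) /
  sqrt(sum((height - mean(height))^2) * sum((width - mean(width))^2))

r_builtin <- cor(height, width)

# The hand-computed value agrees with cor()
all.equal(r_manual, r_builtin)

# A perfectly decreasing linear relationship gives -1
cor(height, -2 * height + 500)

# A correlation of 0 does not rule out dependence:
# here y is fully determined by x, yet there is no linear trend
x <- -2:2
cor(x, x^2)
```

The last example is worth keeping in mind when you interpret a correlation close to 0: it only tells you that there is no linear relationship.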

What is the correlation coefficient for the two numerical variables gdp and f_illit (female illiteracy rate)?

            cor(demo$gdp, demo$f_illit)          
            [1] -0.4476856          

There are many other ways to customize the way your SPSS data is imported. One of them is the use.value.labels argument. It specifies whether variables with value labels should be converted into R factors with levels that are named accordingly. The argument is TRUE by default, which means that so-called labelled variables inside SPSS are converted to factors in R.

            # Import international.sav as demo_1
            demo_1 <- read.spss("international.sav", to.data.frame = TRUE)

            re-encoding from CP1252

            # Print out the head of demo_1
            head(demo_1)

            # Import international.sav as demo_2
            demo_2 <- read.spss("international.sav", to.data.frame = TRUE, use.value.labels = FALSE)

            re-encoding from CP1252

            # Print out the head of demo_2
            head(demo_2)




Source: https://rstudio-pubs-static.s3.amazonaws.com/595074_83acb908a9b54a6eb3da9d702fe64620.html