Title: | Tools Developed by the Long Term Ecological Research Community |
---|---|
Description: | Set of data science tools created by various members of the Long Term Ecological Research (LTER) community. These functions were initially written largely as standalone operations and were later aggregated into this package. |
Authors: | Nicholas Lyon [aut, cre] (https://njlyon0.github.io/), Angel Chen [aut] (https://angelchen7.github.io), Miguel C. Leon [ctb] (https://luquillo.lter.network/), National Science Foundation [fnd] (NSF 1929393, 09/01/2019 - 08/31/2024), University of California, Santa Barbara [cph] |
Maintainer: | Nicholas Lyon <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 2.0.0.900 |
Built: | 2025-03-27 14:22:17 UTC |
Source: | https://github.com/lter/ltertools |
Creates the start of a 'column key' for harmonizing data. A column key includes a column for the names of the files to be harmonized into a single data object as well as a column for the column names in those files. Finally, it includes a column indicating the tidied name that corresponds with each raw column name. Harmonization can accept this key object and use it to rename all raw column names in a reproducible way, standardizing them across datasets. Currently supports raw files of the following formats: CSV, TXT, XLS, and XLSX.
begin_key( raw_folder = NULL, data_format = c("csv", "txt", "xls", "xlsx"), guess_tidy = FALSE )
raw_folder | (character) folder / folder path containing data files to include in key |
data_format | (character) file extensions to identify within the |
guess_tidy | (logical) whether to attempt to "guess" what the tidy name equivalent should be for each raw column name. This is accomplished via coercion to lowercase and removal of special character/repeated characters. If |
(dataframe) skeleton of column key
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3), "unwanted" = c("not", "needed", "column"), "yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7], "NUMBERS" = c(4:7), "BONUS" = c("plantae", "animalia", "fungi", "protista"))

# Generate a local folder for exporting
temp_folder <- tempdir()

# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)

# Generate a column key with "guesses" at tidy column names
ltertools::begin_key(raw_folder = temp_folder, data_format = "csv", guess_tidy = TRUE)
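The "guessing" step described for guess_tidy (coercion to lowercase plus cleanup of special and repeated characters) can be approximated in base R. The sketch below is not the package's implementation: the helper name guess_tidy_name and the choice to replace runs of special characters with a single underscore are assumptions made for illustration.

```r
# Hypothetical helper (NOT the ltertools implementation):
# lowercases names and collapses runs of non-alphanumeric characters
guess_tidy_name <- function(raw_names) {
  tidy <- tolower(raw_names)
  # Assumption: any run of special characters becomes one underscore
  tidy <- gsub("[^a-z0-9]+", "_", tidy)
  # Drop underscores left at the start or end of a name
  gsub("^_+|_+$", "", tidy)
}

guess_tidy_name(c("LETTERS", "Site..Name", "Temp (C)"))
```

The actual output of begin_key may differ from this sketch, so any guessed names are best reviewed by hand before harmonizing.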
Accepts a column key dataframe and checks to make sure it has the needed structure for ltertools::harmonize. Also removes unnecessary columns and rows that lack a "tidy_name". Function invoked 'under the hood' by ltertools::harmonize.
check_key(key = NULL)
key | (dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
(dataframe) key object with only "source", "raw_name" and "tidy_name" columns and only retains rows where a "tidy_name" is specified.
# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3), rep("df2.csv", 3)),
                      "raw_name" = c("xx", "unwanted", "yy", "LETTERS", "NUMBERS", "BONUS"),
                      "tidy_name" = c("numbers", NA, "letters", "letters", "numbers", "kingdom"))

# Check it
ltertools::check_key(key = key_obj)
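For readers who want to see what such a structural check might involve, here is a minimal base R sketch. The function name check_key_sketch and its error message are hypothetical; the sketch only mirrors the behavior described above (required columns present, extra columns dropped, rows without a "tidy_name" removed) and is not the package's code.

```r
# Hypothetical stand-in for the described checks (NOT ltertools::check_key)
check_key_sketch <- function(key) {
  needed <- c("source", "raw_name", "tidy_name")
  missing_cols <- setdiff(needed, names(key))
  if (length(missing_cols) > 0) {
    stop("Key is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the three required columns
  key <- key[, needed, drop = FALSE]
  # Keep only rows where a tidy name is actually specified
  has_tidy <- !is.na(key$tidy_name) & nzchar(as.character(key$tidy_name))
  key[has_tidy, , drop = FALSE]
}

# Key with one extra column and one row lacking a tidy name
key_demo <- data.frame(
  source = c(rep("df1.csv", 3), rep("df2.csv", 3)),
  raw_name = c("xx", "unwanted", "yy", "LETTERS", "NUMBERS", "BONUS"),
  tidy_name = c("numbers", NA, "letters", "letters", "numbers", "kingdom"),
  notes = "extra column"
)
checked <- check_key_sketch(key_demo)
checked
```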
Converts a given set of temperature values from one unit to another
convert_temp(value = NULL, from = NULL, to = NULL)
value | (numeric) temperature values to convert |
from | (character) starting units of the value, not case sensitive. |
to | (character) units to which to convert, not case sensitive. |
(numeric) converted temperature values
# Convert from Fahrenheit to Celsius
convert_temp(value = 32, from = "Fahrenheit", to = "c")
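The arithmetic behind such a conversion is simple enough to sketch in base R. The function below is a hypothetical illustration, not the package's code: matching units on their first letter and supporting Kelvin are both assumptions, since the documentation above only shows "Fahrenheit" and "c".

```r
# Hypothetical sketch of temperature conversion (NOT ltertools::convert_temp)
convert_temp_sketch <- function(value, from, to) {
  # Assumption: units are identified by their first letter, ignoring case
  from <- substr(tolower(from), 1, 1)
  to <- substr(tolower(to), 1, 1)
  stopifnot(from %in% c("c", "f", "k"), to %in% c("c", "f", "k"))
  # Step 1: express the input in Celsius
  celsius <- switch(from,
                    c = value,
                    f = (value - 32) * 5 / 9,
                    k = value - 273.15)
  # Step 2: convert from Celsius to the requested unit
  switch(to,
         c = celsius,
         f = celsius * 9 / 5 + 32,
         k = celsius + 273.15)
}

convert_temp_sketch(value = 32, from = "Fahrenheit", to = "c")
```

Routing every conversion through Celsius keeps the number of formulas small: each unit needs only one formula in and one formula out.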
Computes the coefficient of variation (CV) by dividing the standard deviation (SD) by the arithmetic mean of a set of numbers. If na_rm is TRUE then missing values are removed before the calculation is completed.
cv(x, na_rm = TRUE)
x | (numeric) vector of numbers for which to calculate CV |
na_rm | (logical) whether to remove missing values from both average and SD calculation |
(numeric) coefficient of variation
# Calculate the coefficient of variation of a numeric vector
cv(x = c(4, 5, 6, 4, 5, 5), na_rm = TRUE)
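The definition above (SD divided by the arithmetic mean) can be reproduced with base R alone, which is a handy cross-check. The helper name cv_sketch is hypothetical, and the sketch assumes the ratio is returned as-is rather than as a percentage.

```r
# Hypothetical base R equivalent of the stated definition: SD / mean
cv_sketch <- function(x, na_rm = TRUE) {
  stats::sd(x, na.rm = na_rm) / mean(x, na.rm = na_rm)
}

x <- c(4, 5, 6, 4, 5, 5)
cv_sketch(x)

# With na_rm = FALSE, a single missing value makes the result NA
cv_sketch(c(x, NA), na_rm = FALSE)
```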
Data discovery and harmonization are iterative processes. For those already depending upon a column key and the harmonize function, it can be cumbersome to add rows to an existing column key. This function generates new rows for an existing column key, covering only datasets that are not already (A) in the column key or (B) in the harmonized data table.
expand_key( key = NULL, raw_folder = NULL, harmonized_df = NULL, data_format = c("csv", "txt", "xls", "xlsx"), guess_tidy = FALSE )
key | (dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
raw_folder | (character) folder / folder path containing data files to include in key |
harmonized_df | (dataframe) harmonized data table produced with the current version of the column key. Must include a "source" column but other columns are ignored. |
data_format | (character) file extensions to identify within the |
guess_tidy | (logical) whether to attempt to "guess" what the tidy name equivalent should be for each raw column name. This is accomplished via coercion to lowercase and removal of special character/repeated characters. If |
(dataframe) skeleton of rows to add to column key for data sources not already in harmonized data table
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3), "unwanted" = c("not", "needed", "column"), "yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7], "NUMBERS" = c(4:7), "BONUS" = c("plantae", "animalia", "fungi", "protista"))

# Generate a local folder for exporting
temp_folder <- tempdir()

# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)

# Generate a column key with "guesses" at tidy column names
key1 <- ltertools::begin_key(raw_folder = temp_folder, data_format = "csv", guess_tidy = TRUE)

# Harmonize the data
harmony <- ltertools::harmonize(key = key1, raw_folder = temp_folder)

# Make a new data file
df3 <- data.frame("xx" = c(10:15), "letters" = letters[10:15])

# Export this locally to the temp folder too
utils::write.csv(x = df3, file = file.path(temp_folder, "df3.csv"), row.names = FALSE)

# Identify what needs to be added to the existing column key
ltertools::expand_key(key = key1, raw_folder = temp_folder, harmonized_df = harmony, data_format = "csv", guess_tidy = TRUE)
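The core idea described above, finding data files that appear in neither the key nor the harmonized table, amounts to a set difference. The following base R sketch illustrates that logic with hard-coded file names; the variable names are hypothetical and this is not how expand_key is necessarily written.

```r
# Files currently present in the raw data folder (illustrative values)
files_in_folder <- c("df1.csv", "df2.csv", "df3.csv")

# Sources already covered by the key and by the harmonized table
key_sources <- c("df1.csv", "df2.csv")
harmonized_sources <- c("df1.csv", "df2.csv")

# Only files absent from both need new key rows
new_files <- setdiff(files_in_folder, union(key_sources, harmonized_sources))
new_files
```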
A "column key" is meant to streamline harmonization of disparate datasets. This key must include three columns containing: (1) the name of each raw data file to be harmonized, (2) the name of all of the columns in each of those files, and (3) the "tidy name" that corresponds to each raw column name. This function accepts that key and the path to a folder containing all raw data files included in the key. Each dataset is then read in and the original column names are replaced with their respective "tidy_name" indicated in the key. Once this has been done to all files, a single dataframe is returned with only the columns indicated in the column key. Currently the following file formats are supported for the raw data: CSV, TXT, XLS, and XLSX.
Note that raw column names without an associated tidy name in the key are removed. We recommend using the begin_key function in this package to generate the skeleton of the key to make achieving the required structure simpler.
harmonize( key = NULL, raw_folder = NULL, data_format = c("csv", "txt", "xls", "xlsx"), quiet = TRUE )
key | (dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
raw_folder | (character) folder / folder path containing data files to include in key |
data_format | (character) file extensions to identify within the |
quiet | (logical) whether to suppress certain non-warning messages. Defaults to |
(dataframe) harmonized dataframe including all columns defined in the "tidy_name" column of the key object
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3), "unwanted" = c("not", "needed", "column"), "yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7], "NUMBERS" = c(4:7), "BONUS" = c("plantae", "animalia", "fungi", "protista"))

# Generate a local folder for exporting
temp_folder <- tempdir()

# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)

# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3), rep("df2.csv", 3)),
                      "raw_name" = c("xx", "unwanted", "yy", "LETTERS", "NUMBERS", "BONUS"),
                      "tidy_name" = c("numbers", NA, "letters", "letters", "numbers", "kingdom"))

# Use that to harmonize the 'raw' files we just created
ltertools::harmonize(key = key_obj, raw_folder = temp_folder, data_format = "csv")
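Once each file's columns have been renamed, the remaining step is to stack tables that do not all share the same columns. The base R sketch below shows one way to do that, padding absent columns with NA before binding rows. The helper pad_columns and the two pre-renamed tables are hypothetical; the package may combine tables differently.

```r
# Two tables assumed to be already renamed to their tidy names
std1 <- data.frame(source = "df1.csv", numbers = 1:3, letters = letters[1:3])
std2 <- data.frame(source = "df2.csv", letters = letters[4:7], numbers = 4:7,
                   kingdom = c("plantae", "animalia", "fungi", "protista"))

# Every tidy column found in any table
all_cols <- unique(unlist(lapply(list(std1, std2), names)))

# Hypothetical helper: add missing columns as NA and fix the column order
pad_columns <- function(df) {
  for (col in setdiff(all_cols, names(df))) {
    df[[col]] <- NA
  }
  df[, all_cols, drop = FALSE]
}

# Stack the padded tables into one harmonized-style dataframe
combined <- do.call(rbind, lapply(list(std1, std2), pad_columns))
combined
```

Rows from the first table carry NA in "kingdom" because that file has no column mapped to that tidy name.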
There are currently 28 field sites involved with the Long Term Ecological Research (LTER) network. These sites occupy a range of habitats and were started / are renewed on site-specific timelines. To make this information more readily available to interested parties, this data object summarizes the key components of each site in an easy-to-use data format.
lter_sites
Dataframe with 8 columns and 32 rows
Full name of the LTER site
Abbreviation (typically three letters) of the site name
Simplified habitat designation of the site (or "mixed" for more complex habitat contexts)
Year of initial funding by NSF as an official LTER site
End of current funding cycle grant
Degrees latitude of site
Degrees longitude of site
Website URL for the site
Long Term Ecological Research Network Office. https://lternet.edu/site/
Reads in all data files of specified types found in the designated folder. Returns a list with one element for each data file. Currently supports CSV, TXT, XLS, and XLSX files.
read(raw_folder = NULL, data_format = c("csv", "txt", "xls", "xlsx"))
raw_folder | (character) folder / folder path containing data files to read |
data_format | (character) file extensions to identify within the |
(list) data found in specified folder of specified file format(s)
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3), "unwanted" = c("not", "needed", "column"), "yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7], "NUMBERS" = c(4:7), "BONUS" = c("plantae", "animalia", "fungi", "protista"))

# Generate a local folder for exporting
temp_folder <- tempdir()

# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)

# Read in all CSV files in that folder
read(raw_folder = temp_folder, data_format = "csv")
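For CSV files, the pattern described above (find files by extension, read each one, return a list named by file) can be sketched with base R and utils. This is an illustration under assumptions, not the package's implementation: it handles only CSV, and the subfolder name "read_sketch" is made up so the sketch does not pick up unrelated files.

```r
# Work in a dedicated subfolder of the session's temporary directory
sketch_folder <- file.path(tempdir(), "read_sketch")
dir.create(sketch_folder, showWarnings = FALSE)

# Write two small example files
utils::write.csv(data.frame(xx = 1:3, yy = letters[1:3]),
                 file = file.path(sketch_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(data.frame(LETTERS = letters[4:7], NUMBERS = 4:7),
                 file = file.path(sketch_folder, "df2.csv"), row.names = FALSE)

# Identify files by extension, then read each into a named list
csv_files <- list.files(path = sketch_folder, pattern = "\\.csv$", ignore.case = TRUE)
data_list <- lapply(file.path(sketch_folder, csv_files), utils::read.csv)
names(data_list) <- csv_files

names(data_list)
```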
Subsets the information on long term ecological research (LTER) sites based on user-specified site codes (i.e., three letter abbreviations), and/or desired habitats. See lter_sites for the full set of site information.
site_subset(sites = NULL, habitats = NULL)
sites | (character) three letter site code(s) identifying site(s) of interest |
habitats | (character) habitat(s) of interest. See |
(dataframe) complete site information (8 columns) for all sites that meet the provided site code and/or habitat criteria
Creates a ggplot2 plot of all sites that meet the user-specified site code (i.e., three letter abbreviation) and/or habitat criteria. See lter_sites for the full set of site information including accepted site codes and habitat designations (unrecognized entries will trigger a warning and be ignored). Lines are grouped and colored by habitat to better emphasize possible similarities among sites.
site_timeline(sites = NULL, habitats = NULL, colors = NULL)
sites | (character) three letter site code(s) identifying site(s) of interest |
habitats | (character) habitat(s) of interest. See |
colors | (character) colors to assign to the timelines, expressed as hexadecimal codes (e.g., #00FF00). Note there must be as many colors as habitats included in the graph |
(ggplot2) plot object of timeline of site(s) that meet user-specified criteria
# Make the full timeline of all sites with default colors by supplying no arguments
site_timeline()

# Or make a timeline of only sites that meet certain criteria
site_timeline(habitats = c("grassland", "forest"))
For all days between the specified start and end date, identify the time of sunrise, sunset, and solar noon (in UTC) as well as the day length. The idea for this function was contributed by Miguel C. Leon and a Python equivalent lives in the Luquillo site's LUQ-general-utils GitHub repository.
solar_day_info( lat = NULL, lon = NULL, start_date = NULL, end_date = NULL, quiet = FALSE )
lat | (numeric) latitude coordinate for which to find day length |
lon | (numeric) longitude coordinate for which to find day length |
start_date | (character) starting date in 'YYYY-MM-DD' format |
end_date | (character) ending date in 'YYYY-MM-DD' format |
quiet | (logical) whether to suppress certain non-warning messages. Defaults to |
(dataframe) table of 6 columns and a number of rows equal to the number of days between the specified start and end dates (inclusive). Columns contain: (1) date, (2) sunrise time, (3) sunset time, (4) solar noon, (5) day length, and (6) time zone of columns 2 to 4.
## Not run:
# Identify day information in Santa Barbara (California) for one week
solar_day_info(lat = 34.416857, lon = -119.712777, start_date = "2022-02-07", end_date = "2022-02-12", quiet = TRUE)
## End(Not run)
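As a rough sanity check on the day length column, day length can be approximated offline from latitude and day of year using the standard sunrise equation with a simple cosine model of solar declination. This is a swapped-in textbook approximation, not the method solar_day_info uses: it ignores atmospheric refraction and the size of the solar disk, so values can differ from the function's output by several minutes. The name day_length_sketch is hypothetical.

```r
# Hypothetical approximation of day length in hours (NOT solar_day_info)
day_length_sketch <- function(lat, date) {
  day_of_year <- as.integer(format(as.Date(date), "%j"))
  # Approximate solar declination in degrees
  declination <- -23.44 * cos(2 * pi / 365 * (day_of_year + 10))
  to_rad <- pi / 180
  # Sunrise equation: cosine of the hour angle at sunrise/sunset
  cos_hour_angle <- -tan(lat * to_rad) * tan(declination * to_rad)
  # Clamp so polar day and polar night give 24 and 0 hours
  cos_hour_angle <- pmin(pmax(cos_hour_angle, -1), 1)
  hour_angle_deg <- acos(cos_hour_angle) / to_rad
  # The Earth rotates 15 degrees per hour
  2 * hour_angle_deg / 15
}

dates <- seq(as.Date("2022-02-07"), as.Date("2022-02-12"), by = "day")
approx_days <- data.frame(date = dates,
                          day_length_hours = day_length_sketch(lat = 34.416857, date = dates))
approx_days
```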
A "column key" is meant to streamline harmonization of disparate datasets. This key must include three columns containing: (1) the name of each raw data file to be harmonized, (2) the name of all of the columns in each of those files, and (3) the "tidy name" that corresponds to each raw column name. This function accepts that key and a list of datasets that can be standardized with that key. The function standardizes one specified dataset out of any number of datasets in the key or list. While usable on its own, this function is intended to streamline internal operations of ltertools::harmonize, which is the recommended tool for key-based harmonization.
standardize(focal_file = NULL, key = NULL, df_list = NULL)
focal_file | (character) filename corresponding to one value of "source" column of "key" data and to one name in "df_list". |
key | (dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
df_list | (list) named list of dataframe-like objects where each name is the filename initially containing that data |
(dataframe) single standardized dataframe including all columns defined in the "tidy_name" column of the key object
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3), "unwanted" = c("not", "needed", "column"), "yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7], "NUMBERS" = c(4:7), "BONUS" = c("plantae", "animalia", "fungi", "protista"))

# Generate a local folder for exporting
temp_folder <- tempdir()

# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)

# Read in list of these data files
data_list <- ltertools::read(raw_folder = temp_folder, data_format = "csv")

# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3), rep("df2.csv", 3)),
                      "raw_name" = c("xx", "unwanted", "yy", "LETTERS", "NUMBERS", "BONUS"),
                      "tidy_name" = c("numbers", NA, "letters", "letters", "numbers", "kingdom"))

# Standardize one dataset
ltertools::standardize(focal_file = "df1.csv", key = key_obj, df_list = data_list)
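The renaming step for a single dataset can be written out by hand in base R, which may help when debugging a key. The sketch below is only an illustration of the described behavior (keep columns that have a tidy name, then rename them); the object names are hypothetical and it is not the package's code.

```r
# One raw table and the key rows that describe it
raw_df <- data.frame(xx = 1:3,
                     unwanted = c("not", "needed", "column"),
                     yy = letters[1:3])
key_demo <- data.frame(source = rep("df1.csv", 3),
                       raw_name = c("xx", "unwanted", "yy"),
                       tidy_name = c("numbers", NA, "letters"))

# Keep only key rows for the focal file that have a tidy name
focal_key <- key_demo[key_demo$source == "df1.csv" & !is.na(key_demo$tidy_name), ]

# Select the matching raw columns, then swap in the tidy names
std_df <- raw_df[, focal_key$raw_name, drop = FALSE]
names(std_df) <- focal_key$tidy_name
std_df
```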