R/R Studio Codes - Short Codes for Common Roadblocks

My work with Dell EMC has led to me automate a few processes because:
1) Ease of data handling
2) Increase efficiency
3) Reduce the time taken to analyze loads and loads of data

All of these advantages help me make increased strategic decisions and drive multiple actions constantly. R is one tool which is helped me do regular reporting in the quickest way possible. Here are a few examples of codes that might help others who are just starting off:

Assigning all NA’s in a particular column with a required value:
dataframe$column_name[is.na(dataframe$column_name)] <- “Character/Value”

df_main$LOB[is.na(df_main$LOB)] <- “Storage”

Replacing particular value in a column with another value:
dataframe$column_name <- dataframe$column_name <- str_replace_all(“Original”,”Replacement”)

df_main$LOB <- df_main$LOB %>% str_replace_all(“Packaging”, “Storage”)

Creating Pivot Table Function similar to MS Excel in R:
dataframe <- dataframe %>% select(`All required columns`)
%>% group_by(`common grouping elements`)
%>% summarise(column_name = sum(column_name))

df_main <- df_main %>% select(`Part No`,`Supplier`,`LOB`,`STD Cost`,`Total Quantity`,`Total Cost`)%>% group_by(`Part No`,`LOB`,`STD Cost`) %>% summarise(‘Total Quantity’ = sum(`Total Quantity`), ‘Total Cost’ = sum(`Total Cost`))

Removing duplicate values in single column:
dataframe <- dataframe[!duplicated(dataframe$column_name),]

df_main <- df_main[!duplicated(df_main$LOB),]

Removing duplicate rows (where all values are same):
dataframe$concatenate <- paste(‘all columns in table’)
dataframe <- dataframe[!duplicated(dataframe$column_name),]

df_main$concatenate <- paste(df_main$`Part No`, df_main$`Supplier`, df_main$`Total Quantity`)
df_main <- df_main[!duplicated(df_main$concatenate),]

Merging different dataframes:
dataframe <- merge(dataframe1, dataframe2, by.x = ‘Common Column from dataframe1’, by.y = ‘Common Column from dataframe2’)

df_main <- merge(df_test1 ,df_test2 , by.x = ‘Part No’, by.y =’Material’)

dataframe <- merge(dataframe1, dataframe2, by.x = ‘Common Column from dataframe1’, by.y = ‘Common Column from dataframe2’, all = T)

df_main <- merge(df_test1 ,df_test2 , by.x = ‘Part No’, by.y =’Material’, all = T)

dataframe <- merge(dataframe1, dataframe2, by.x = ‘Common Column from dataframe1’, by.y = ‘Common Column from dataframe2’, all = T)

df_main <- merge(df_test1 ,df_test2 , by.x = ‘Part No’, by.y =’Material’, all.x = TRUE)

Selecting only required columns:
dataframe <- dataframe %>% select(‘Required Columns’)

df_main <- df_main%>% select(Material,Supplier)

Conditionally change value of one column depending on another column value of same row of a dataframe:
dataframe <- dataframe %>% mutate(column_name1 = replace (column_name1, which(column_name2 == “Conditional Value”), “New Value”))

df_main <- df_main %>% mutate(Supplier = replace(Supplier, which(family == “TOYOTA”), “Toyota Yaris”))

Basic data cleaning (removing comma, slash, dashes and dots in R):
dataframe$column_name <- chartr(“Character to be trimmed”, “Replace with other character or space”, dataframe$column_name)

df_main$description <- chartr(“,”, ” “, df_main$description)
df_main$description <- chartr(“/”, ” “, df_main$description)
df_main$description <- chartr(“-“, ” “, df_main$description)
df_main$description <- chartr(“.”, ” “, df_main$description)

Conditionally change value of one column depending on another column value of same row of a dataframe (where you would have find a word or pattern – for example here the code the word Yaris in ‘Desciption’ and replaces the value in ‘Supplier’ to Toyota Yaris):

df_flag <- df_main
count <- grep(“YARIS”, df_flag$description)
flag <- length(count)
for (val in count){df_flag$description[count[flag]] <- “YARIS”
flag <- flag – 1}
df_main <- merge(df_main,df_flag, by.x = ‘Part No’, by.y =’Part No’)
df_main <- df_main %>% mutate(Supplier = replace(Supplier, which(description == “YARIS”), “Toyota Yaris”))

Filtering a column to get only required rows:
dataframe <- dataframe %>% select(‘Required Columns’) %>% filter(column_name == ‘Filter Condition’)

ds_test3 <- ds_test3 %>% select(`Part No`,`Supplier`,`LOB) %>% filter(`LOB` == “Storage”)

Creating dataframe row wise (Headers in columns and values across the rows):
ds_main <- data.frame( c(“Column Headers”), c(“Column Values))

ds_values <- data.frame( c(“Total”,”Status”,”Current Total Value”), c(“”,Status,Current_Total_Value))
#Make sure that the number of items in both column headers and column values is the same.

Setting current dates as column headers:
date <- format(seq(dmy(“Start Date”), by = “weeks/days, length.out = ‘Number of repeatations’), format = “Month – Date/Date -Month/Date – Month – Year”)

date <- format(seq(dmy(“23/06/2020”), by = “weeks”, length.out = 13), format=”%m-%d”)
#Count helps to start from a particular point in the dataframe. If its first column, make count = 1. If there is gaps between columns – increase the increment in the for loop.
count <- 9
for (val in date){names(df_main)[count]<-val
count<-count+1}
rm(count)

Dropping a column from a dataframe:
dataframe <- subset(dataframe, select = -c(description))

df_main <- subset(df_main, select = -c(description))
df_main <- select (df_main, -c(description))

Removing rows based on multiple conditions:
dataframe <- dataframe[!(dataframe$column_name2 == “Condition 1” & dataframe$column_name2 == “Condition 2”),]

df_main<-df_main[!(df_main$Supplier==”Toyota” & df_main$Color==”Blue)”),]

Creating new column conditonally based on another column:
dataframe$’New Column Name’ <- apply(dataframe, 1, FUN = function(x) if(x[column number]==”Condition”) “New Column Value” else “Not-Known”)

df_main$Region <- apply(df_main, 1, FUN = function(df_main) if(df_main$Supplier==”Toyota”) “Japan” else “Not-Known”)

Concatenate multiple column values:
dataframe$concatenate <- paste(dataframe$column1, “Seperator”, dataframe$column2)

df_main$Concat <- paste(df_main$Supplier, “/”, df_main$Model)

Libraries Used:
tidyverse – The core packages are ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats, which provide functionality to model, transform, and visualize data. The use of dplyr which help write constructive and efficient codes.
naniar – It provides data structures and functions that facilitate the plotting of missing values and examination of imputations.
magrittr – The magrittr package offers a set of operators which make your code more readable by structuring sequences of data operations left-to-right (as opposed to from the inside and out), avoiding nested function calls,
minimizing the need for local variables and function definitions, and
making it easy to add steps anywhere in the sequence of operations. %>%
lubridate – Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not.
readxl & writexl – Provides us the ability to read and write xls sheets.

R/R Studio Codes – Short Codes for Common Roadblocks

Related

Leave a Reply Cancel reply

Share this:

Related

Leave a Reply Cancel reply