Filtering in R Programming

 

Filtering is a common operation in R programming used to extract specific subsets of data from a larger data structure, such as a vector or matrix. In R, filtering can be done using logical indexing, which involves specifying a logical condition that is applied to each element of the data structure. Elements that satisfy the condition are included in the filtered subset, while those that do not are excluded.


Types of Filtering

This post covers two ways to filter data in R:


A. Boolean Indexing: Boolean (logical) indexing subsets data with a logical condition. The condition is evaluated for each element of the data structure and produces a logical vector of TRUE and FALSE values. Placing that vector inside square brackets returns only the elements at the positions where it is TRUE.
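To make the two steps explicit, the sketch below first evaluates the condition on its own and then uses the result as an index. The vector name scores and its values are made up for illustration.

```r
# Hypothetical data for illustration
scores <- c(12, 7, 25, 3, 18)

# Step 1: the comparison is evaluated element by element
# and yields a logical vector of the same length
keep <- scores > 10
print(keep)

# Step 2: indexing with that logical vector keeps only the
# elements whose position holds TRUE
selected <- scores[keep]
print(selected)
```

Writing scores[scores > 10] does both steps in one expression.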


Syntax:

vector[condition] # filtering a vector

matrix[condition] # filtering a matrix (returns the matching elements as a vector)


Parameters:

The condition can be a logical expression, comparison, or a combination of both. It can also use logical operators such as &, |, and !.
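As a minimal sketch of combining conditions, the example below applies each of the three operators to one vector. The vector ages and the cut-off values are assumptions chosen only for illustration.

```r
# Hypothetical data for illustration
ages <- c(15, 22, 37, 41, 68, 73)

# & keeps elements where both conditions hold
working_age <- ages[ages >= 18 & ages < 65]

# | keeps elements where at least one condition holds
young_or_senior <- ages[ages < 18 | ages >= 65]

# ! negates a condition
not_over_40 <- ages[!(ages > 40)]

print(working_age)
print(young_or_senior)
print(not_over_40)
```

Filtering needs the element-wise forms & and |. The doubled forms && and || are meant for single TRUE or FALSE values, for example inside an if statement.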


Example:

# filtering a vector using boolean indexing

x <- c(1, 2, 3, 4, 5)

x[x > 3]

# filtering a matrix using boolean indexing

m <- matrix(1:9, nrow = 3)

m[m > 5]


Output:

[1] 4 5

[1] 6 7 8 9
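Note that the matrix result above is a plain vector, not a matrix. If the goal is to keep whole rows, one possible approach (a sketch, not the only way) is to put a logical vector in the row position of the index:

```r
m <- matrix(1:9, nrow = 3)

# A logical matrix as the index returns a plain vector,
# read off in column-major order
values <- m[m > 5]

# To keep whole rows, build a logical vector from one column
# and use it in the row position; drop = FALSE preserves the
# matrix structure even if only one row matches
rows <- m[m[, 1] > 1, , drop = FALSE]

print(values)
print(rows)
```

Here rows holds the second and third rows of m, because those are the rows whose first-column value is greater than 1.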


B. Filter Function: The filter() function from the dplyr package selects the rows of a data frame that meet specific criteria. Once dplyr is loaded, its filter() masks the unrelated stats::filter() function, which performs linear filtering on time series.


Syntax:

filter(data, condition) # filtering a data frame


Parameters:

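The filter function takes the data frame to be filtered as its first argument, followed by one or more conditions. Each condition is a logical expression or comparison written in terms of the column names. When several conditions are given, a row must satisfy all of them to be kept, and rows for which a condition evaluates to NA are dropped.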
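The sketch below shows filter() with more than one condition. It assumes dplyr is installed; the data frame people and its columns are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical data for illustration
people <- data.frame(name = c("Ana", "Ben", "Cleo", "Dev"),
                     age = c(22, 34, 45, 51),
                     city = c("Pune", "Delhi", "Pune", "Goa"))

# Conditions separated by commas must all hold (logical AND)
pune_over_30 <- filter(people, age > 30, city == "Pune")

# | inside a single condition expresses logical OR
young_or_goa <- filter(people, age < 25 | city == "Goa")

print(pune_over_30)
print(young_or_goa)
```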


Example:

# create a data frame

df <- data.frame(name = c("Alice", "Bob", "Charlie", "David"),

                 age = c(25, 30, 35, 40),

                 sex = c("F", "M", "M", "M"))

# filter rows based on a condition

library(dplyr)

filtered_df <- filter(df, age > 30)

filtered_df # returns "Charlie" and "David" rows


Output:

     name age sex
1 Charlie  35   M
2   David  40   M
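For comparison, the same rows can be selected without dplyr by applying Boolean indexing to the data frame. This is a sketch using the same df as above; the names older and older2 are introduced only for illustration.

```r
df <- data.frame(name = c("Alice", "Bob", "Charlie", "David"),
                 age = c(25, 30, 35, 40),
                 sex = c("F", "M", "M", "M"))

# Logical vector in the row position; the empty column
# position after the comma keeps all columns
older <- df[df$age > 30, ]

# subset() is a base R convenience function for the same task
older2 <- subset(df, age > 30)

print(older)
print(older2)
```

Unlike the dplyr result, the base R versions keep the original row names, so the two rows are labelled 3 and 4 rather than 1 and 2.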
