Higher Dimensional Arrays In R Programming

 

Higher Dimensional Arrays

A higher dimensional array is an array with more than two dimensions. In the R programming language, a higher dimensional array can be created using the array() function.

Syntax:

array(data = NA, dim = length(data), dimnames = NULL)

Parameters:

• data: A vector or matrix containing the data for the array.

• dim: A vector of integers specifying the dimensions of the array.

• dimnames: An optional list of character vectors giving the names of the dimensions.

Types of Higher Dimensional Arrays:

There are several types of higher dimensional arrays in R, such as:

• Three-dimensional arrays

• Four-dimensional arrays

• Five-dimensional arrays

• n-dimensional arrays (a four-dimensional example is sketched below)
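Higher dimensions follow the same pattern: only the dim vector changes. A minimal sketch of a four-dimensional array (the 2x3x2x2 dimension sizes are chosen arbitrarily for illustration):

my_4d <- array(1:24, dim = c(2, 3, 2, 2))   # create a 2x3x2x2 array
dim(my_4d)                                  # [1] 2 3 2 2
my_4d[1, 2, 1, 2]                           # one element, indexed by four subscripts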

Creating a Higher Dimensional Array:

To create a three-dimensional array in R, we can use the array() function.

For example, let's create a 3x3x2 array:

my_array <- array(1:18, dim = c(3, 3, 2))   # create a 3x3x2 array

print(my_array)                             # print the array

Output:

, , 1

     [,1] [,2] [,3]

[1,]    1    4    7

[2,]    2    5    8

[3,]    3    6    9

, , 2

     [,1] [,2] [,3]

[1,]   10   13   16

[2,]   11   14   17

[3,]   12   15   18

Accessing and Modifying a Higher Dimensional Array:

We can access and modify the elements of a higher dimensional array using the same indexing methods as for a matrix.

For example, let's access and then modify the element in the second row, third column, and first layer of the above array.

my_array[2, 3, 1]        # access the element in row 2, column 3, layer 1

my_array[2, 3, 1] <- 20  # modify that element

print(my_array)          # print the modified array

Output:

[1] 8

, , 1

     [,1] [,2] [,3]

[1,]    1    4    7

[2,]    2    5   20

[3,]    3    6    9

, , 2

     [,1] [,2] [,3]

[1,]   10   13   16

[2,]   11   14   17

[3,]   12   15   18

Naming Columns and Rows in a Higher Dimensional Array:

We can name the rows, columns, and layers of a higher dimensional array by assigning a list to dimnames(). For a three-dimensional array, the list must contain one character vector per dimension: rows, columns, and layers.

For example, let's name the rows, columns, and layers of the above array:

# name the rows, columns, and layers of the array

dimnames(my_array) <- list(c("row1", "row2", "row3"),
                           c("col1", "col2", "col3"),
                           c("layer1", "layer2"))

print(my_array)

Output:

, , layer1

     col1 col2 col3
row1    1    4    7
row2    2    5   20
row3    3    6    9

, , layer2

     col1 col2 col3
row1   10   13   16
row2   11   14   17
row3   12   15   18
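Once the dimensions are named, elements can also be indexed by name rather than position; a quick sketch using the names assigned above:

my_array["row2", "col3", "layer1"]   # same element as my_array[2, 3, 1], i.e. 20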

Manipulating Array Elements

As an array is made up of matrices stacked in multiple dimensions, operations on array elements are carried out by accessing elements of those matrices. Various operations can be performed on arrays, such as extracting layers as matrices and doing matrix arithmetic on them.

Example:

# Create two vectors of different lengths.

vector1 <- c(5, 9, 3)

vector2 <- c(10, 11, 12, 13, 14, 15)

# Take these vectors as input to the array.

array1 <- array(c(vector1, vector2), dim = c(3, 3, 2))

# Create two vectors of different lengths.

vector3 <- c(9, 1, 0)

vector4 <- c(6, 0, 11, 3, 14, 1, 2, 6, 9)

array2 <- array(c(vector3, vector4), dim = c(3, 3, 2))

# create matrices from these arrays.

matrix1 <- array1[,,2]

matrix2 <- array2[,,2]

# Add the matrices.

result <- matrix1 + matrix2

print(result)

Output:

     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26
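Beyond element-wise arithmetic, apply() can summarise an array across chosen margins. A minimal sketch using array1 from above:

apply(array1, c(1, 2), sum)   # sum across layers for each row/column cell
apply(array1, c(1, 3), sum)   # row sums within each layer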


Need for Dimension Reduction and Some Tips for Effective Use of Dimension Reduction

Use of Dimension Reduction

Dimension reduction techniques are used to reduce the number of variables or features in a dataset while retaining most of the important information.

There are several reasons why we use dimension reduction:

a. Simplification of the analysis: With a large number of variables, it can be difficult to analyze and interpret the data. Dimension reduction techniques simplify the analysis by reducing the number of variables and making the data easier to understand.

b. Noise reduction: Many datasets contain noise or irrelevant variables that can obscure the important information. Dimension reduction techniques can help to remove this noise and focus on the important variables.

c. Improved efficiency: High-dimensional data can be computationally expensive to analyze. Dimension reduction techniques can reduce the number of variables, which can improve the computational efficiency of the analysis.

d. Better visualization: It can be difficult to visualize high-dimensional data. Dimension reduction techniques can help to visualize the data in a lower-dimensional space.

e. Improved predictive performance: In many cases, reducing the number of variables can improve the predictive performance of machine learning models. By removing noise and irrelevant variables, dimension reduction techniques can help to improve the accuracy of predictions.


Some Tips for Effective Use of Dimension Reduction

Here are some tips for the effective use of dimension reduction:

a. Understand the data: Before applying any dimension reduction technique, it is important to understand the data and the underlying patterns. This can help in selecting a technique that is suitable for the data.

b. Select the appropriate technique: There are many different dimension reduction techniques available, each with its own strengths and weaknesses. It is important to select a technique that is suitable for the data and the task at hand.

c. Check the assumptions: Different dimension reduction techniques have different assumptions. It is important to check that the assumptions are met before applying the technique.

d. Choose the right number of dimensions: The number of dimensions to keep is an important decision in dimension reduction. It is important to choose a number that retains the most important information while avoiding overfitting (a sketch of one common heuristic follows this list).

e. Evaluate the results: It is important to evaluate the results of dimension reduction techniques to ensure that they are meaningful and useful for the intended task.

f. Consider the interpretability: Some dimension reduction techniques may produce results that are difficult to interpret. It is important to consider the interpretability of the results and select a technique that produces results that are easy to interpret.

g. Use visualization techniques: Visualization techniques can be useful for exploring the data and the results of dimension reduction techniques. They can help in identifying patterns and relationships that may not be apparent in the raw data.

h. Consider computational complexity: Some dimension reduction techniques can be computationally expensive and may not be feasible for large datasets. It is important to consider the computational complexity of the technique and select one that is suitable for the available computational resources.

i. Combine multiple techniques: It may be useful to combine multiple dimension reduction techniques to achieve the best results. For example, a nonlinear technique may be followed by a linear technique to further reduce the dimensionality of the data.

j. Keep the original data: It is important to keep the original data, even after applying dimension reduction techniques. This can help in re-analyzing the data if needed and can also be useful for validating the results of the dimension reduction technique.
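As a concrete illustration of tip (d), a common heuristic with PCA is to keep the smallest number of components that explain a fixed share of the variance. A minimal sketch, assuming the built-in USArrests dataset stands in for any numeric data and 90% is the chosen threshold:

pca <- prcomp(USArrests, scale. = TRUE)                 # PCA on standardized data
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)   # cumulative variance explained
k <- which(var_explained >= 0.90)[1]                    # smallest k reaching 90%
k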


Dimension Reduction and Techniques

Dimension reduction is a technique used to reduce the number of variables or features in a dataset while retaining the maximum amount of information; avoiding dimension reduction simply means preserving the original dimensionality of the data in a machine learning or statistical analysis. Dimension reduction is used to simplify data visualization, reduce computation time, and prevent overfitting in machine learning models.

There are two main types of dimension reduction techniques:

a. Feature Selection: this involves selecting a subset of the original features from the dataset based on their importance or relevance to the problem at hand.

b. Feature Extraction: this involves transforming the original features into a smaller set of new features that capture the most important information in the original features (a sketch contrasting the two follows).
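A minimal sketch of the two approaches, assuming the built-in mtcars dataset stands in for any numeric data and variance is used as a crude importance score:

# Feature selection: keep the 3 original columns with the highest variance
top3 <- names(sort(apply(mtcars, 2, var), decreasing = TRUE))[1:3]
selected <- mtcars[, top3]

# Feature extraction: replace all columns with 3 new principal components
extracted <- prcomp(mtcars, scale. = TRUE)$x[, 1:3]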

In short, the main reasons for using dimension reduction are to reduce the computational complexity of models, improve model performance, and make data visualization easier.

There are several common dimension reduction techniques:

A. Principal Component Analysis (PCA): PCA is a statistical technique that reduces the dimensionality of a dataset by identifying the most important features in the data. PCA transforms the original features into a new set of uncorrelated features called principal components.

Example: Consider a dataset with features such as age, income, education level, and occupation. PCA can be used to identify the most important features that contribute the most to the overall variance of the data.

Code:

# Generate sample data

set.seed(123)

x1 <- rnorm(100)

x2 <- rnorm(100)

x3 <- 2*x1 + 3*x2 + rnorm(100)

dat <- data.frame(x1, x2, x3)

# Perform PCA

pca <- prcomp(dat, scale. = TRUE)

summary(pca)

Output:

Importance of components:

                          PC1    PC2     PC3

Standard deviation     1.3860 1.0207 0.19316

Proportion of Variance 0.6403 0.3473 0.01244

Cumulative Proportion  0.6403 0.9876 1.00000
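The transformed data (the principal component scores) are returned in the x component of the prcomp result; a quick usage sketch:

head(pca$x)          # observations expressed in the new PC coordinates
plot(pca$x[, 1:2])   # data projected onto the first two components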

B. Singular Value Decomposition (SVD): SVD is a matrix factorization technique used to reduce the dimensionality of a dataset by identifying the most important features in the data. SVD decomposes a matrix into three components: a left singular matrix, a diagonal matrix, and a right singular matrix.

Example: SVD can be used to identify the most important features in an image dataset. The diagonal matrix represents the most important features that contribute the most to the overall variance of the data.

Code:

# Generate sample data

set.seed(123)

x1 <- rnorm(100)

x2 <- rnorm(100)

x3 <- 2*x1 + 3*x2 + rnorm(100)

dat <- data.frame(x1, x2, x3)

# Perform SVD

svd_dat <- svd(scale(dat))

u <- svd_dat$u   # left singular vectors
d <- svd_dat$d   # singular values
v <- svd_dat$v   # right singular vectors
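The singular values in d measure how much each component contributes, so a low-rank approximation of the data can be rebuilt from the leading components. A minimal sketch using the decomposition above:

k <- 2
approx <- u[, 1:k] %*% diag(d[1:k]) %*% t(v[, 1:k])   # rank-2 approximation of scale(dat)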

C. t-distributed Stochastic Neighbor Embedding (t-SNE): t-SNE is a technique used for visualization of high-dimensional data. t-SNE reduces the dimensionality of the dataset while preserving the relationships between data points.

Example:

Consider a dataset with high-dimensional features such as gene expression levels. t-SNE can be used to visualize the relationships between different genes and identify clusters of genes that are related to a particular biological process.

Code:

# Load the Rtsne package; the MNIST object used in some tutorials does not
# ship with Rtsne, so the built-in iris data is used here as a stand-in.
# Rtsne() does not allow duplicate rows, so remove them first.
library(Rtsne)
dat <- unique(iris[, 1:4])

# Perform t-SNE
set.seed(123)
tsne <- Rtsne(dat, dims = 2, perplexity = 30, verbose = TRUE)
plot(tsne$Y)   # plot the 2-D embedding

D. Non-negative Matrix Factorization (NMF): NMF is a matrix factorization technique used to identify the most important features in a dataset. NMF decomposes a matrix into two non-negative matrices, where the columns of the first matrix represent the most important features in the data.

Example: NMF can be used to identify the most important features in a dataset of text documents. The columns of the first matrix represent the most important topics in the documents.

Code:

# Generate sample data (NMF requires non-negative values)
library(NMF)
set.seed(123)
dat <- matrix(abs(rnorm(100)), nrow = 10, ncol = 10)

# Perform NMF with rank 5; extract the factor matrices with the
# package's basis() and coef() accessors
nmf_res <- nmf(dat, 5)
W <- basis(nmf_res)   # basis matrix (the important features/topics)
H <- coef(nmf_res)    # coefficient matrix

E. Independent Component Analysis (ICA): ICA is a statistical technique used to identify independent components in a dataset. ICA separates a dataset into independent sources by assuming that the sources are non-Gaussian and statistically independent.

Example: ICA can be used to identify independent components in a dataset of EEG signals, separating the independent sources of brain activity from the signals recorded by the electrodes.

Code:

# cereals_num is assumed to be a numeric data frame of cereal measurements
# with label and classification columns (it is not defined in this excerpt)
library(ica)
library(ggplot2)

ica_cereals_num <- icafast(cereals_num[, 1:12], 2,
  center = TRUE, maxit = 100,
  tol = 1e-6
)

ica_cereals_num <- data.frame(

  ICA1 = ica_cereals_num$Y[, 1],

  ICA2 = ica_cereals_num$Y[, 2],

  label = cereals_num$label,

  classification = cereals_num$classification

)

ggplot(ica_cereals_num, aes(

  x = ICA1, y = ICA2,

  label = label, col = classification

)) +

  geom_point() +

  ggrepel::geom_text_repel(cex = 2.5)

Output: a scatter plot of the first two independent components (ICA1 vs ICA2), with points coloured by classification and labelled by cereal name.


Vectors vs Matrices

Difference Between Vectors and Matrices

The main differences between vectors and matrices in R programming are summarized below:

| Feature           | Vectors                       | Matrices                                                     |
|-------------------|-------------------------------|--------------------------------------------------------------|
| Dimensionality    | 1D                            | 2D                                                           |
| Syntax            | c(1, 2, 3)                    | matrix(1:9, nrow = 3)                                        |
| Element access    | vec[1]                        | mat[1, 2]                                                    |
| Element addition  | vec <- c(vec, 4)              | mat <- rbind(mat, c(10, 11, 12)); mat <- cbind(mat, c(4, 5, 6)) |
| Element deletion  | vec <- vec[-2]                | mat <- mat[-2, ]; mat <- mat[, -2]                           |
| Concatenation     | c(vec1, vec2)                 | rbind(mat1, mat2); cbind(mat1, mat2)                         |
| Mathematical ops  | vec1 + vec2                   | mat1 + mat2                                                  |
| Element-wise ops  | vec1 * vec2                   | mat1 * mat2                                                  |
| Transpose         | t(vec)                        | t(mat)                                                       |
| Special functions | sum(vec)                      | apply(mat, 1, sum)                                           |
| Names             | vec <- c(a = 1, b = 2, c = 3) | colnames(mat) <- c("A", "B", "C"); rownames(mat) <- c("X", "Y", "Z") |

Some additional points of difference are:

i. Vectors can have any length, whereas matrices must have a specified number of rows and columns.

ii. Vectors are 1-dimensional, whereas matrices are 2-dimensional. This means that vectors have only one "axis" or "dimension", whereas matrices have two (rows and columns).

iii. Vectors are usually created using the c() function, whereas matrices are created using the matrix() function (although matrices can also be created using other functions such as cbind() and rbind()).

iv. Accessing an element of a vector requires only one index (e.g. vec[1]), whereas accessing an element of a matrix requires two indices, one for the row and one for the column (e.g. mat[1, 2]).

v. Vectors can be concatenated using the c() function, whereas matrices are concatenated using either rbind() (for appending rows) or cbind() (for appending columns).

vi. Mathematical operations on both vectors (e.g. vec1 + vec2) and matrices (e.g. mat1 + mat2) are performed element-wise on the corresponding elements.

vii. Special functions like sum() can be applied to vectors to compute a summary statistic, whereas for matrices the apply() function is typically used to apply a function to either rows or columns (specified by the second argument). A short demo follows this list.

viii. Vectors can have named elements, whereas matrices can have named rows and columns (set using the rownames() and colnames() functions).
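A minimal runnable sketch contrasting the two structures (names and values chosen arbitrarily):

vec <- c(a = 1, b = 2, c = 3)   # a named 1-D vector
mat <- matrix(1:9, nrow = 3)    # a 3x3 matrix

vec[1]               # access by position -> 1
vec["b"]             # access by name -> 2
mat[1, 2]            # row 1, column 2 -> 4

sum(vec)             # 6
apply(mat, 1, sum)   # row sums -> 12 15 18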

