Title: | Perform Monothetic Clustering with Extensions to Circular Data |
---|---|
Description: | Implementation of the Monothetic Clustering algorithm (Chavent, 1998 <doi:10.1016/S0167-8655(98)00087-7>) on continuous data sets. The package also includes several extensions: monothetic clustering on data sets with circular variables, visualizations of the results, and permutation- and cross-validation-based tests to support the decision on the number of clusters. |
Authors: | Tan Tran [aut, cre] |
Maintainer: | Tan Tran |
License: | GPL (>= 2) |
Version: | 1.2.1 |
Built: | 2025-01-20 04:01:17 UTC |
Source: | https://github.com/vinhtantran/monoclust |
The function turns a MonoClust-similar object into a MonoClust object so that it can use the functions supported for MonoClust, such as print.MonoClust() and plot.MonoClust().
as_MonoClust(x, ...)

## Default S3 method:
as_MonoClust(x, ...)
x |
An object that can be coerced to MonoClust object. |
... |
For extensibility. |
as_MonoClust() is an S3 generic. The function itself does nothing unless a method is implemented for another similar object. Currently, no such method is implemented within the monoClust package.
Add/subtract two circular variables in degrees (%cd+% and %cd-%) and radians (%cr+% and %cr-%).
x %cd+% y
x %cd-% y
x %cr+% y
x %cr-% y
x, y |
Circular values in degrees/radians. |
A value in [0, 360) for degrees or in [0, 2*pi) for radians.
90 %cd+% 90
250 %cd+% 200
25 %cd-% 80
pi %cr+% (pi/2)
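The wrap-around behaviour can be sketched in base R with the modulo operator. The helper names `circ_add_deg` and `circ_sub_deg` are hypothetical stand-ins, not the package's operators, and the sketch assumes the result is simply reduced modulo 360.

```r
# Hypothetical helpers (not part of monoClust): circular arithmetic in degrees.
# Assumption: results are reduced modulo 360 so they land in [0, 360).
circ_add_deg <- function(x, y) (x + y) %% 360
circ_sub_deg <- function(x, y) (x - y) %% 360

circ_add_deg(250, 200)  # sum exceeds 360, so it wraps around
circ_sub_deg(25, 80)    # difference is negative, so it wraps around
```

The radian versions would follow the same pattern with `2 * pi` in place of 360.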
Calculates the distance matrix of observations with circular variables using an adapted version of Gower's distance. This distance should be compatible with the Gower's distance for other variable types.
circ_dist(frame)
frame |
A data frame in which all columns are circular variables measured in degrees. |
The distance between two observations $i$ and $j$ of a circular variable $q$ is suggested to be

$$d(y_{iq}, y_{jq}) = \frac{180 - |180 - |y_{iq} - y_{jq}||}{180}.$$
Object of class "dist".
Tran, T. V. (2019). Chapter 3. Monothetic Cluster Analysis with Extensions to Circular and Functional Data. Montana State University - Bozeman.
# Make a sample data set of 20 observations with 2 circular variables
data <- data.frame(var1 = sample.int(359, 20),
                   var2 = sample.int(359, 20))
circ_dist(data)
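A circular distance of this kind can be sketched in base R. The function `circ_dist_sketch` is hypothetical and is not the package's implementation; it assumes that the per-variable distance is the shorter arc between two angles divided by 180, and that variables are averaged as in Gower's distance.

```r
# Hypothetical sketch (not the package's implementation): a Gower-style
# distance for circular variables measured in degrees within [0, 360).
circ_dist_sketch <- function(frame) {
  frame <- as.matrix(frame)
  n <- nrow(frame)
  total <- matrix(0, n, n)
  for (q in seq_len(ncol(frame))) {
    diff_q <- abs(outer(frame[, q], frame[, q], "-"))
    # Shorter arc between the two angles, scaled to [0, 1]
    total <- total + (180 - abs(180 - diff_q)) / 180
  }
  # Assumption: average over variables, as Gower's distance does
  as.dist(total / ncol(frame))
}

angles <- data.frame(a = c(10, 350, 190), b = c(0, 90, 180))
circ_dist_sketch(angles)
```

Under these assumptions, 10 and 350 degrees are only 20 degrees apart, which is the property a plain absolute difference would miss.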
Perform a cross-validation test for different numbers of clusters of Monothetic Clustering.
cv.test(data, fold = 10L, minnodes = 2L, maxnodes = 10L, ncores = 1L, ...)
data |
Data set to be partitioned. |
fold |
Number of folds (k). |
minnodes |
Minimum number of clusters to be checked. |
maxnodes |
Maximum number of clusters to be checked. |
ncores |
Number of CPU cores on the current host. When set to NULL, all available cores are used. |
... |
Other parameters transferred to |
The $k$-fold cross-validation randomly partitions the data into $k$ subsets with equal (or close to equal) sizes. $k - 1$ subsets are used as the training data set to create a tree with a desired number of leaves, and the other subset is used as the validation data set to evaluate the predictive performance of the trained tree. The process repeats for each subset as the validation set ($m = 1, \ldots, k$) and the mean squared difference,

$$MSE_m = \frac{1}{n_m} \sum_{q=1}^{Q} \sum_{i \in m} d^2_{euc}(y_{iq}, \hat{y}_{(-i)q}),$$

is calculated, where $\hat{y}_{(-i)q}$ is the cluster mean on the variable $q$ of the cluster created by the training data where the observed value, $y_{iq}$, of the validation data set will fall into, and $d^2_{euc}(y_{iq}, \hat{y}_{(-i)q})$ is the squared Euclidean distance (dissimilarity) between two observations at variable $q$. This process is repeated for the $k$ subsets of the data set and the average of these test errors is the cross-validation-based estimate of the mean squared error of predicting a new observation,

$$CV_k = \overline{MSE} = \frac{1}{k} \sum_{m=1}^{k} MSE_m.$$
An object of class cv.MonoClust containing a data frame of the mean sum of squared errors and its standard deviation.
This function supports parallel processing with foreach::foreach(). It distributes MonoClust calls to processes.
plot.cv.MonoClust(), MonoClust(), predict.MonoClust()
library(cluster)
data(ruspini)

# Leave-one-out cross-validation
cv.test(ruspini, fold = 1, minnodes = 2, maxnodes = 4)

# 5-fold cross-validation
cv.test(ruspini, fold = 5, minnodes = 2, maxnodes = 4)
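The fold mechanics can be sketched in base R. This is not cv.test(): the function `cv_sketch` is hypothetical, stats::kmeans() stands in for the tree-based clustering, and held-out points are scored against their nearest cluster mean rather than routed through splitting rules.

```r
# Hypothetical sketch of k-fold cross-validation for a clustering method.
# stats::kmeans() is a stand-in for the monothetic tree; this is not cv.test().
cv_sketch <- function(data, k = 5, nclusters = 2) {
  data <- as.matrix(data)
  n <- nrow(data)
  # Randomly assign each observation to one of k folds of (nearly) equal size
  fold_id <- sample(rep(seq_len(k), length.out = n))
  mse <- numeric(k)
  for (m in seq_len(k)) {
    train <- data[fold_id != m, , drop = FALSE]
    test <- data[fold_id == m, , drop = FALSE]
    centers <- kmeans(train, centers = nclusters, nstart = 5)$centers
    # Squared Euclidean distance from each held-out point to every cluster mean
    d2 <- sapply(seq_len(nrow(centers)), function(j) {
      rowSums(sweep(test, 2, centers[j, ])^2)
    })
    d2 <- matrix(d2, nrow = nrow(test))
    # Assumption: each held-out point is scored against its nearest mean
    mse[m] <- mean(apply(d2, 1, min))
  }
  c(mean = mean(mse), sd = sd(mse))
}

set.seed(1)
cv_sketch(iris[, 1:4], k = 5, nclusters = 3)
```

Running the sketch over a range of `nclusters` values and comparing the means gives the kind of table that the plotting functions visualize.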
GGPlot the Mean Square Error with Error Bar for +/- 1 Standard Error
ggcv(
  cv.obj,
  title = "MSE for CV of monothetic clustering",
  xlab = "Number of clusters",
  ylab = "MSE +/- 1 SE",
  type = c("b", "p", "l"),
  linetype = 2,
  err.col = "red",
  err.width = 0.2
)
cv.obj |
A |
title |
Overall title for the plot. |
xlab |
Title for x axis. |
ylab |
Title for y axis. |
type |
What type of plot should be drawn. Choosing between |
linetype |
The line type. See |
err.col |
Color of the error bars. |
err.width |
Width of the bars. |
A ggplot2 object.
Plot using base R plot.cv.MonoClust()
library(cluster)
data(ruspini)

# 10-fold cross-validation
cptable <- cv.test(ruspini, minnodes = 2, maxnodes = 4)
ggcv(cptable)
Makes a parallel coordinates plot in which the circular variables are plotted as ellipses. The function currently works well with data that have one circular variable.
ggpcp(
  data,
  circ.var = NULL,
  is.degree = TRUE,
  rotate = 0,
  north = 0,
  cw = FALSE,
  order.appear = NULL,
  linetype = 1,
  size = 0.5,
  alpha = 0.5,
  clustering,
  medoids = NULL,
  cluster.col = NULL,
  show.medoids = FALSE,
  labelsize = 4,
  xlab = "Variables",
  ylab = NULL,
  legend.cluster = "groups"
)
data |
Data set. |
circ.var |
Circular variable(s) in the data set, indicated by names or index in the data set. |
is.degree |
Whether the unit of the circular variables is degree or not
(radian). Default is |
rotate |
The rotate (offset, shift) of the circular variable, in radians. Default is 0 (no rotation). |
north |
What value of the circular variable is labeled North. Default is 0 radian. |
cw |
Which direction of the circular variable is considered increasing
in value, clockwise ( |
order.appear |
The order of appearance of the variables, listed by a vector of names or index. If set, length has to be equal to the number of variables in the data set. |
linetype |
Line type. Default is solid line. See details in
|
size |
Size of a line is its width in mm. Default is 0.5. See details in
|
alpha |
The transparency of the lines. Default is 0.5. |
clustering |
Cluster membership. |
medoids |
Vector of medoid observations of cluster. Only required when
|
cluster.col |
Color of clusters, indicated by a vector. If set, the
length of this vector must be equal to the number of clusters in
|
show.medoids |
Whether to highlight the medoid lines or not. Default is
|
labelsize |
The size of labels on the plot. Default is 4. |
xlab |
Labels for x-axis. |
ylab |
Labels for y-axis. |
legend.cluster |
Labels for group membership. Implemented by setting
label for ggplot |
A ggplot2 object.
# Set color constant
COLOR4 <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")

# Reduce the size of the data for the sake of example speed
set.seed(12345)
wind_reduced <- wind_sensit_2007[sample.int(nrow(wind_sensit_2007), 50), ]

sol42007 <- MonoClust(wind_reduced, cir.var = 3, nclusters = 4)

library(ggplot2)
ggpcp(data = wind_reduced,
      circ.var = "WDIR",
      # To improve aesthetics
      rotate = pi*3/4-0.3,
      order.appear = c("WDIR", "has.sensit", "WS"),
      alpha = 0.5,
      clustering = sol42007$membership,
      medoids = sol42007$medoids,
      cluster.col = COLOR4,
      show.medoids = TRUE) +
  theme(panel.background = element_rect(color = "white"),
        panel.border = element_rect(color = "white", fill = NA),
        panel.grid.major = element_line(color = "#f0f0f0"),
        panel.grid.minor = element_blank(),
        axis.line = element_line(color = "black"),
        legend.key = element_rect(color = NA),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_text(face = "italic"),
        legend.justification = "center")
Calculate inertia for a given subset of the distance matrix from the original data set provided to x. Assumes that distance matrices are stored as matrices and not distance objects.
inertia_calc(x)
x |
Distance matrix, not an object of some distance measure. |
Inertia value of the matrix, formula in Chavent (1998). If x is a single number, return 0.
data(iris)

# Euclidean distance on first 20 rows of the 4 continuous variables
dist_mat <- as.matrix(dist(iris[1:20, 1:4]))
inertia_calc(dist_mat)
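As a rough illustration of what an inertia computed from a distance matrix looks like, the sketch below assumes the form sum(d_ij^2) / (2 * n) over the full symmetric matrix. The function `inertia_sketch` is hypothetical and may differ from the package's inertia_calc().

```r
# Hypothetical sketch: inertia from a full (symmetric) distance matrix,
# assuming the form sum(d_ij^2) / (2 * n) over all ordered pairs.
inertia_sketch <- function(x) {
  if (length(x) == 1) return(0)
  sum(x^2) / (2 * nrow(x))
}

pts <- as.matrix(iris[1:20, 1:4])
d <- as.matrix(dist(pts))
inertia_sketch(d)

# For Euclidean distances this equals the sum of squared deviations
# from the cluster centroid:
sum(sweep(pts, 2, colMeans(pts))^2)
```

Working from pairwise distances rather than coordinates is what lets the same quantity be used with non-Euclidean dissimilarities, where no centroid is available.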
This function returns TRUE for MonoClust objects, and FALSE for all other objects.
is_MonoClust(mono_obj)
mono_obj |
An object. |
TRUE if the object inherits from the MonoClust class.
The medoid is the point that has the minimum total distance to all other points in the cluster.
medoid(members, dist_mat)
members |
index vector indicating which observation belongs to the cluster. |
dist_mat |
distance matrix of the whole data set. A class of |
index of the medoid point in the members vector.
library(cluster)
data(ruspini)
ruspini4sol <- MonoClust(ruspini, nclusters = 4)
ruspini4sol

medoid(which(ruspini4sol$membership == 4), ruspini4sol$dist)

# Compare with the entry labeled "4" in the medoids output
ruspini4sol$medoids
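The medoid definition can be sketched directly from a distance matrix. The function `medoid_sketch` is hypothetical, not the package's medoid(); it returns the observation index of the selected member and breaks ties by taking the first candidate.

```r
# Hypothetical sketch (not the package's code): pick the cluster member whose
# summed distance to the other members is smallest.
medoid_sketch <- function(members, dist_mat) {
  dist_mat <- as.matrix(dist_mat)
  if (length(members) == 1) return(members)
  sub <- dist_mat[members, members, drop = FALSE]
  # which.min() returns the first minimum, so ties go to the earlier member
  members[which.min(rowSums(sub))]
}

pts <- data.frame(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
medoid_sketch(1:3, dist(pts))
```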
Creates a MonoClust object after partitioning the data set using Monothetic Clustering.
MonoClust(
  toclust,
  cir.var = NULL,
  variables = NULL,
  distmethod = NULL,
  digits = getOption("digits"),
  nclusters = 2L,
  minsplit = 5L,
  minbucket = round(minsplit/3),
  ncores = 1L
)
toclust |
Data set as a data frame. |
cir.var |
Index or name of the circular variable in the data set. |
variables |
List of variables selected for clustering procedure. It could be a vector of variable indexes, or a vector of variable names. |
distmethod |
Distance method to use with the data set. Can be chosen
from "euclidean" (for Euclidean distance), "manhattan" (for Manhattan
distance), or "gower" (for Gower distance). If not set, Euclidean distance
is used unless |
digits |
Number of significant digits printed in the output. |
nclusters |
Number of clusters created. Default is 2. |
minsplit |
The minimum number of observations that must exist in a node in order for a split to be attempted. Default is 5. |
minbucket |
The minimum number of observations in any terminal leaf
node. Default is |
ncores |
Number of CPU cores on the current host. If greater than 1,
parallel processing with |
A MonoClust object. See MonoClust.object.
Chavent, M. (1998). A monothetic clustering method. Pattern Recognition Letters, 19(11), 989-996. doi:10.1016/S0167-8655(98)00087-7.
Tran, T. V. (2019). Monothetic Cluster Analysis with Extensions to Circular and Functional Data. Montana State University - Bozeman.
# Very simple data set
library(cluster)
data(ruspini)
ruspini4sol <- MonoClust(ruspini, nclusters = 4)
ruspini4sol

# Data with a circular variable
library(monoClust)
data(wind_sensit_2007)

# Use a small data set
set.seed(12345)
wind_reduced <- wind_sensit_2007[sample.int(nrow(wind_sensit_2007), 10), ]
circular_wind <- MonoClust(wind_reduced, cir.var = 3, nclusters = 2)
circular_wind
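The idea of a monothetic split, one variable and one cut value chosen to minimize within-cluster inertia, can be sketched in base R. The functions `inertia` and `best_split` are hypothetical and cover only a single split on Euclidean distances; MonoClust() itself handles repeated splitting, circular variables, and minimum node sizes.

```r
# Hypothetical sketch of a single monothetic split (not MonoClust() itself):
# try every variable and every cut point, and keep the binary split that
# leaves the smallest total within-cluster inertia.
inertia <- function(d) if (length(d) <= 1) 0 else sum(d^2) / (2 * nrow(d))

best_split <- function(data) {
  d <- as.matrix(dist(data))
  best <- list(inertia = Inf)
  for (v in names(data)) {
    values <- sort(unique(data[[v]]))
    # Candidate cuts are midpoints between consecutive observed values
    cuts <- (head(values, -1) + tail(values, -1)) / 2
    for (cut_val in cuts) {
      left <- data[[v]] < cut_val
      total <- inertia(d[left, left, drop = FALSE]) +
        inertia(d[!left, !left, drop = FALSE])
      if (total < best$inertia) {
        best <- list(var = v, cut = cut_val, inertia = total)
      }
    }
  }
  best
}

toy <- data.frame(x = c(1, 2, 3, 11, 12, 13), y = c(5, 1, 4, 2, 6, 3))
best_split(toy)
```

Because every split is defined by a single variable and threshold, the resulting clusters can be described by simple rules, which is the main appeal of the monothetic approach.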
The structure and objects contained in MonoClust, an object returned from the MonoClust() function and used as the input in other functions in the package.
Data frame in the form of a tibble::tibble() representing a tree structure with one row for each node. The columns include:
Index of the node. Depth of a node can be derived by number %/% 2.
Name of the variable used in the split at a node or "<leaf>" if it is a leaf node.
Splitting value: values of var smaller than it go to the left branch, while values greater than it go to the right branch.
Cluster size, the number of observations in that cluster.
Inertia value of the cluster at that node.
Position of the next split row in the data set (that position will belong to left node (smaller)).
Position of the next split variable in the data set.
Proportion of inertia value of the cluster at that node to the inertia of the root.
Position of the data point regarded as the medoid of its cluster.
y-coordinate of the splitting node to facilitate showing on the tree. See plot.MonoClust() for details.
Order of the splits, with the root being 0.
Percent inertia explained as described in Chavent (2007). It is 1 - (sum(current inertia)/inertia[1]).
A nested tibble of alternate splits at a node. It contains bipartsplitrow and bipartsplitcol with the same meaning as above. Note that this is for information purposes only. Currently monoClust does not support choosing an alternate splitting route. If needed, MonoClust() can be run step by step with nclusters = 2.
Vector of the same length as the number of rows in the data, containing the value of frame$number corresponding to the leaf node that an observation falls into.
Distance matrix calculated using the method indicated in the distmethod argument of MonoClust().
Vector of variable names in the data that were used to split.
Data frame with one row for centroid value of each cluster.
Named vector of positions of the data points regarded as medoids of clusters.
Indicator of whether an alternate splitting route occurred when splitting.
List of values designed for the circular variable in the data set. var is the name of the circular variable and cut is its first best split value. If no circular variable is available, both objects are NULL.
Chavent, M., Lechevallier, Y., & Briant, O. (2007). DIVCLUS-T: A monothetic divisive hierarchical clustering method. Computational Statistics & Data Analysis, 52(2), 687-701. doi:10.1016/j.csda.2007.03.013.
Tests the significance of each monothetic clustering split by permutation methods. The "simple-withhold" method ("sw") shuffles the observations between two groups without the splitting variable. The other two methods shuffle the values in the splitting variable to create a new data set, then either split again on that variable ("resplit-limit", "rl") or use all variables as the splitting candidates ("resplit-nolimit", "rn").
perm.test(
  object,
  data,
  auto.pick = FALSE,
  sig.val = 0.05,
  method = c("sw", "rl", "rn"),
  rep = 1000L,
  stat = c("f", "aw"),
  bon.adj = TRUE,
  ncores = 1L
)
object |
The |
data |
The data set which is being clustered. |
auto.pick |
Whether the algorithm stops when p-value becomes larger than
|
sig.val |
Significance value to decide when to stop splitting. This
option is ignored if |
method |
Can be chosen between |
rep |
Number of permutations required to calculate test statistic. |
stat |
Statistic to use. Choosing between |
bon.adj |
Whether to adjust for multiple testing problem using Bonferroni correction. |
ncores |
Number of CPU cores on the current host. When set to NULL, all available cores are used. |
Method 1 (simple-withhold, "sw"): The stat calculated from the shuffles creates the reference distribution used to find the p-value. Because the splitting variable that was chosen is already the best in terms of reduction of inertia, that variable is withheld from the distance matrix used in the permutation test.
Method 2 (resplit-limit, "rl"): This method shuffles the values of the splitting variable while keeping the other variables fixed to create a new data set, then the chosen stat is calculated for each rep to compare with the observed stat.
Method 3 (resplit-nolimit, "rn"): Similar to Method 2, but all variables are splitting candidates.
A hypothesis test that occurs lower in the monothetic clustering tree can have its p-value corrected for the multiple tests that happened before it in order to reach that node. The formula is

$$adj.p = unadj.p \times depth,$$

where $depth$ is 1 at the root node.
The same MonoClust object with an extra column (p-value), as well as the numofclusters object if auto.pick = TRUE.
This function uses foreach::foreach() to facilitate parallel processing. It distributes reps to processes.
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1-27. doi:10.1080/03610927408827101.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. doi:10.1016/0377-0427(87)90125-7.
library(cluster)
data(ruspini)
ruspini6sol <- MonoClust(ruspini, nclusters = 6)
ruspini6.p_value <- perm.test(ruspini6sol, data = ruspini, method = "sw",
                              rep = 1000)
ruspini6.p_value
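The general shape of a permutation test for one binary split can be sketched in base R. This is an illustration only: `pseudo_f` and `perm_p_value` are hypothetical names, the statistic is the Calinski-Harabasz pseudo-F computed on raw coordinates, group labels are shuffled over all supplied variables, and the "+1" correction in the p-value is a common convention assumed here. perm.test() differs in how it withholds or reshuffles the splitting variable.

```r
# Hypothetical sketch of a permutation p-value for one binary split
# (an illustration only; perm.test() itself differs in detail).
pseudo_f <- function(data, groups) {
  data <- as.matrix(data)
  n <- nrow(data)
  k <- length(unique(groups))
  total_ss <- sum(sweep(data, 2, colMeans(data))^2)
  within_ss <- sum(sapply(unique(groups), function(g) {
    part <- data[groups == g, , drop = FALSE]
    sum(sweep(part, 2, colMeans(part))^2)
  }))
  # Calinski-Harabasz: between-group variance over within-group variance
  ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))
}

perm_p_value <- function(data, groups, rep = 999) {
  observed <- pseudo_f(data, groups)
  # Shuffle group labels to build the reference distribution
  shuffled <- replicate(rep, pseudo_f(data, sample(groups)))
  # Assumed convention: count the observed statistic as one permutation
  (sum(shuffled >= observed) + 1) / (rep + 1)
}

set.seed(42)
toy <- data.frame(x = c(rnorm(15, 0), rnorm(15, 5)), y = rnorm(30))
groups <- rep(1:2, each = 15)
perm_p_value(toy, groups, rep = 199)
```

A small p-value indicates that a separation this strong is unlikely to arise from random group assignments.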
Plot the Mean Square Error with Error Bar for +/- 1 Standard Error
## S3 method for class 'cv.MonoClust'
plot(
  x,
  main = "MSE for CV of monothetic clustering",
  xlab = "Number of clusters",
  ylab = "MSE +/- 1 SE",
  type = "b",
  lty = 2,
  err.col = "red",
  err.width = 0.1,
  ...
)
x |
A |
main |
Overall title for the plot. |
xlab |
Title for x axis. |
ylab |
Title for y axis. |
type |
What type of plot should be drawn. See |
lty |
The line type. |
err.col |
Color of the error bars. |
err.width |
Width of the bars. |
... |
Arguments to be passed to |
A line plot with error bars.
Plot using ggplot2 ggcv()
library(cluster)
data(ruspini)

# 10-fold cross-validation
cptable <- cv.test(ruspini, minnodes = 2, maxnodes = 4)
plot(cptable)
Plot the MonoClust tree in the form of a dendrogram.
## S3 method for class 'MonoClust'
plot(
  x,
  uniform = FALSE,
  branch = 1,
  margin = c(0.12, 0.02, 0, 0.05),
  minbranch = 0.3,
  text = TRUE,
  which = 4,
  stats = TRUE,
  abbrev = c("no", "short", "abbreviate"),
  digits = getOption("digits") - 2,
  cols = NULL,
  col.type = c("l", "p", "b"),
  rel.loc.x = TRUE,
  show.pval = TRUE,
  ...
)
x |
MonoClust result object. |
uniform |
If TRUE, uniform vertical spacing of the nodes is used; this may be less cluttered when fitting a large plot onto a page. The default is to use a non-uniform spacing proportional to the inertia in the fit. |
branch |
Controls the shape of the branches from parent to child node. Any number from 0 to 1 is allowed. A value of 1 gives square-shouldered branches, a value of 0 gives V-shaped branches, with other values being intermediate. |
margin |
An extra fraction of white space to leave around the borders of the tree. (Long labels sometimes get cut off by the default computation). |
minbranch |
Set the minimum length for a branch to |
text |
Whether to print the labels on the tree. |
which |
Labeling modes, which are:
|
stats |
Whether to show statistics (cluster sizes and medoid points) on the tree. |
abbrev |
Whether to print the abbreviated versions of variable names. Can be either "no" (default), "short", or "abbreviate". Short forms of them can also be used. If "no", the labels recorded in If "short", variable names will be turned into "V1", "V2", ... If "abbreviate", |
digits |
Number of significant digits to print. |
cols |
Whether to show color bars at leaves or not. It helps match
this tree plot with other plots whose cluster memberships were colored. It
only works when |
col.type |
When |
rel.loc.x |
Whether to use the relative distance between clusters as x coordinate of the leaves. Default is TRUE. |
show.pval |
If MonoClust object has been run through |
... |
Arguments to be passed to |
A plot of splitting rule.
library(cluster)
data(ruspini)

# MonoClust tree
ruspini4sol <- MonoClust(ruspini, nclusters = 4)
plot(ruspini4sol)

# MonoClust tree after permutation test is run
ruspini6sol <- MonoClust(ruspini, nclusters = 6)
ruspini6_test <- perm.test(ruspini6sol, data = ruspini, method = "sw",
                           rep = 1000)
plot(ruspini6_test, branch = 1, uniform = TRUE)
Predict the cluster memberships of a new data set from a MonoClust object.
## S3 method for class 'MonoClust'
predict(object, newdata, type = c("centroid", "medoid"), ...)
object |
MonoClust result object. |
newdata |
Data frame containing the values to be predicted. If missing, the memberships of the MonoClust object are returned. |
type |
Type of returned cluster representatives. Either |
... |
Further arguments passed to or from other methods. |
A tibble of cluster index in cname and either centroid values or medoid observation indices, based on the value of the type argument.
library(cluster)
data(ruspini)

set.seed(1234)
test_index <- sample(1:nrow(ruspini), nrow(ruspini)/5)
train_index <- setdiff(1:nrow(ruspini), test_index)
ruspini_train <- ruspini[train_index, ]
ruspini_test <- ruspini[test_index, ]

ruspini_train_4sol <- MonoClust(ruspini_train, nclusters = 4)
predict(ruspini_train_4sol, newdata = ruspini_test)
Print MonoClust Cross-Validation Result
## S3 method for class 'cv.MonoClust'
print(x, ...)
x |
A |
... |
Further arguments passed to or from other methods. |
library(cluster)
data(ruspini)

# 10-fold cross-validation
cp_table <- cv.test(ruspini, minnodes = 2, maxnodes = 4)
print(cp_table)
Render the MonoClust split tree in an easy-to-read format with important information such as terminal nodes, p-value (if possible), etc.
## S3 method for class 'MonoClust'
print(
  x,
  abbrev = c("no", "short", "abbreviate"),
  spaces = 2L,
  digits = getOption("digits"),
  ...
)
x |
MonoClust result object. |
abbrev |
Whether to print the abbreviated versions of variable names. Can be either "no" (default), "short", or "abbreviate". Short forms of them can also be used. If "no", the labels recorded in If "short", variable names will be turned into "V1", "V2", ... If "abbreviate", |
spaces |
Spaces indent between 2 tree levels. |
digits |
Number of significant digits to print. |
... |
Optional arguments to |
A nicely displayed MonoClust split tree.
library(cluster)
data(ruspini)
ruspini4sol <- MonoClust(ruspini, nclusters = 4)
print(ruspini4sol, digits = 2)
This function transforms a circular angle from degree to radian or from radian to degree.
torad(x)

todeg(x)
x |
A degree value if |
A radian value if torad or a degree value if todeg.
torad(90)
torad(-45)
todeg(pi/2)
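The conversions themselves amount to scaling by pi/180. The helpers `to_rad` and `to_deg` below are hypothetical equivalents written for illustration; whether the package's functions also wrap negative or out-of-range angles is not assumed here.

```r
# Hypothetical equivalents of the conversions (not the package's source).
# No wrapping is applied, so negative inputs stay negative.
to_rad <- function(x) x * pi / 180
to_deg <- function(x) x * 180 / pi

to_rad(90)
to_rad(-45)
to_deg(pi / 2)
```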
Data set is a part of a study on microorganisms carried in strong föhn winds at the Bonney Riegel location of Taylor Valley, an ice-free area in the Antarctic continent. Wind direction and wind speed data were obtained from the meteorological station. Wind direction was recorded every 30 seconds and wind speeds every 4 seconds at 1.15 meters above the ground surface. The recorded wind directions and speeds were averaged at 15 minute intervals. For wind direction, as discussed previously, winds from the north are defined as 0/360 degrees and from the east as 90 degrees. 2007 data were collected from August 4–11, 2007.
wind_sensit_2007
A data frame with 671 rows and 3 variables:
has.sensit: A binary variable indicating the presence of particles in the wind (1) or not (0).
WS: Wind speed measured in m/s.
WDIR: Wind direction in degrees, where 0 indicates "from the north" and 90 indicates "from the east".
Sabacka, M., Priscu, J. C., Basagic, H. J., Fountain, A. G., Wall, D. H., Virginia, R. A., & Greenwood, M. C. (2012). Aeolian flux of biotic and abiotic material in Taylor Valley, Antarctica. Geomorphology, 155-156, 102-111. doi:10.1016/j.geomorph.2011.12.009.
Data set is a part of a study on microorganisms carried in strong föhn winds at the Bonney Riegel location of Taylor Valley, an ice-free area in the Antarctic continent. Wind direction and wind speed data were obtained from the meteorological station. Wind direction was recorded every 30 seconds and wind speeds every 4 seconds at 1.15 meters above the ground surface. The recorded wind directions and speeds were averaged at 15 minute intervals. For wind direction, as discussed previously, winds from the north are defined as 0/360 degrees and from the east as 90 degrees. 2008 data were collected from July 7–14, 2008.
wind_sensit_2008
A data frame with 673 rows and 3 variables:
has.sensit: A binary variable indicating the presence of particles in the wind (1) or not (0).
WS: Wind speed measured in m/s.
WDIR: Wind direction in degrees, where 0 indicates "from the north" and 90 indicates "from the east".
Sabacka, M., Priscu, J. C., Basagic, H. J., Fountain, A. G., Wall, D. H., Virginia, R. A., & Greenwood, M. C. (2012). Aeolian flux of biotic and abiotic material in Taylor Valley, Antarctica. Geomorphology, 155-156, 102-111. doi:10.1016/j.geomorph.2011.12.009.