Algorithms
SystemDS supports several machine learning algorithms out of the box.
As an example, the lm algorithm (linear regression) can be used as follows:
# Import numpy and SystemDS matrix
import numpy as np
from systemds.context import SystemDSContext
from systemds.matrix import Matrix
from systemds.operator.algorithm import lm
# Set a seed
np.random.seed(0)
# Generate matrix of feature vectors
features = np.random.rand(10, 15)
# Generate a 1-column matrix of response values
y = np.random.rand(10, 1)
# Compute the weights
with SystemDSContext() as sds:
  weights = lm(Matrix(sds, features), Matrix(sds, y)).compute()
  print(weights)
The output should be similar to:
[[-0.11538199]
[-0.20386541]
[-0.39956035]
[ 1.04078623]
[ 0.4327084 ]
[ 0.18954599]
[ 0.49858968]
[-0.26812763]
[ 0.09961844]
[-0.57000751]
[-0.43386048]
[ 0.55358873]
[-0.54638565]
[ 0.2205885 ]
[ 0.37957689]]
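The weights returned by lm are the coefficients of a linear model fitted to the features and responses. As a rough illustration of that idea only, the following pure-Python sketch fits two weights by solving the least-squares normal equations. The helper name fit_two_weights and the toy data are hypothetical, and SystemDS's own lm may use a different solver, regularization, and intercept handling.

```python
def fit_two_weights(features, responses):
    """Least-squares weights for exactly two feature columns (illustrative only)."""
    # Entries of X^T X (a symmetric 2x2 matrix) and X^T y (a 2-vector).
    a = sum(row[0] * row[0] for row in features)
    b = sum(row[0] * row[1] for row in features)
    d = sum(row[1] * row[1] for row in features)
    p = sum(row[0] * y for row, y in zip(features, responses))
    q = sum(row[1] * y for row, y in zip(features, responses))
    det = a * d - b * b
    if det == 0:
        raise ValueError("feature columns are linearly dependent")
    # Solve the 2x2 system with Cramer's rule.
    return [(p * d - b * q) / det, (a * q - b * p) / det]


# Toy data generated from y = 2 * x0 + 3 * x1 (hypothetical values).
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_responses = [2.0, 3.0, 5.0]
toy_weights = fit_two_weights(toy_features, toy_responses)
print(toy_weights)  # [2.0, 3.0]
```

Because the toy responses are an exact linear function of the features, the sketch recovers the generating weights. With random data, as in the SystemDS example above, the fitted weights have no such simple interpretation.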
systemds.operator.algorithm.kmeans(x: systemds.operator.operation_node.OperationNode, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]]) → systemds.operator.operation_node.OperationNode
- Performs K-Means clustering on the input matrix.
- Parameters
- x – Input dataset to perform K-Means on. 
- k – The number of centroids to use for the algorithm. 
- runs – The number of concurrent instances of K-Means to run (with different initial centroids). 
- max_iter – The maximum number of iterations to run the K-Means algorithm for. 
- eps – Tolerance for the algorithm to declare convergence using WCSS change ratio. 
- is_verbose – Boolean flag if the algorithm should be run in a verbose manner. 
- avg_sample_size_per_centroid – The average number of records per centroid in the data samples. 
 
- Returns
- OperationNode list containing two outputs: 1. the clusters, 2. the cluster ID associated with each row in x. 
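To make the max_iter and eps parameters more concrete, here is a minimal pure-Python sketch of K-Means on one-dimensional points that stops once the relative change in the within-cluster sum of squares (WCSS) falls below eps. The function kmeans_1d, the exact form of the convergence test, and the 0-based cluster IDs are assumptions for illustration. It does not reproduce the SystemDS implementation and ignores runs and avg_sample_size_per_centroid.

```python
def kmeans_1d(points, centroids, max_iter=100, eps=1e-6):
    """Lloyd-style K-Means on 1-D points; stops on a small relative WCSS change."""
    centroids = list(centroids)
    previous_wcss = float("inf")
    assignments = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        assignments = [
            min(range(len(centroids)), key=lambda c: (p - centroids[c]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
        # Within-cluster sum of squares for the updated centroids.
        wcss = sum((p - centroids[a]) ** 2 for p, a in zip(points, assignments))
        # Converged when the WCSS improvement is small relative to the current WCSS.
        if previous_wcss - wcss <= eps * wcss:
            break
        previous_wcss = wcss
    return centroids, assignments


# Hypothetical data: two well-separated groups and two initial centroids (k = 2).
final_centroids, cluster_ids = kmeans_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0])
print(final_centroids)  # [1.5, 9.5]
print(cluster_ids)      # [0, 0, 1, 1]
```

The two return values mirror the two outputs described above: the cluster centers and one cluster ID per input row.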
 
systemds.operator.algorithm.l2svm(x: systemds.operator.operation_node.OperationNode, y: systemds.operator.operation_node.OperationNode, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]]) → systemds.operator.operation_node.OperationNode
- Performs L2SVM on the input matrix with the given labels.
- Parameters
- x – Input dataset 
- y – Input labels in shape of one column 
- kwargs – Dictionary of extra arguments 
 
- Returns
- OperationNode containing the model fit. 
 
systemds.operator.algorithm.lm(x: systemds.operator.operation_node.OperationNode, y: systemds.operator.operation_node.OperationNode, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]]) → systemds.operator.operation_node.OperationNode
- Performs LM on the input matrix with the given labels.
- Parameters
- x – Input dataset 
- y – Input labels in shape of one column 
- kwargs – Dictionary of extra arguments 
 
- Returns
- OperationNode containing the model fit. 
 
systemds.operator.algorithm.multiLogReg(x: systemds.operator.operation_node.OperationNode, y: systemds.operator.operation_node.OperationNode, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]]) → systemds.operator.operation_node.OperationNode
- Performs multiclass logistic regression on the matrix input using a trust region method.
- See: Trust Region Newton Method for Logistic Regression, Lin, Weng and Keerthi, JMLR 9 (2008) 627-650.
- Parameters
- x – Input dataset to perform logistic regression on. 
- y – Labels, row-aligned with the input dataset. 
- icpt – Intercept mode, default 2. Controls intercept presence and the shifting and rescaling of X columns: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift and rescale X columns to mean = 0, variance = 1. 
- tol – Tolerance (float) for the algorithm. 
- reg – Regularization parameter (lambda = 1/C); intercept settings are not regularized. 
- maxi – Maximum outer iterations of the algorithm. 
- maxii – Maximum inner iterations of the algorithm. 
 
- Returns
- OperationNode of a matrix containing the trained regression parameters. 
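The three icpt modes listed above can be pictured as a preprocessing step applied to the feature matrix before training. The sketch below is a pure-Python illustration under stated assumptions: the function prepare_features is hypothetical, the intercept is modelled as an appended column of ones, and the population standard deviation is used for rescaling. SystemDS may differ in each of these details.

```python
import statistics


def prepare_features(rows, icpt):
    """Return a copy of rows transformed according to the icpt mode (0, 1 or 2)."""
    if icpt not in (0, 1, 2):
        raise ValueError("icpt must be 0, 1 or 2")
    columns = [list(col) for col in zip(*rows)]
    if icpt == 2:
        # Shift each column to mean 0 and rescale it to variance 1.
        scaled = []
        for col in columns:
            mean = statistics.fmean(col)
            std = statistics.pstdev(col)  # population std: an assumption
            scaled.append([(v - mean) / std if std > 0 else 0.0 for v in col])
        columns = scaled
    if icpt >= 1:
        # Add the intercept as a constant column of ones.
        columns.append([1.0] * len(rows))
    return [list(row) for row in zip(*columns)]


# Hypothetical 2x2 feature matrix.
sample_rows = [[1.0, 10.0], [3.0, 30.0]]
mode0 = prepare_features(sample_rows, 0)  # unchanged copy
mode1 = prepare_features(sample_rows, 1)  # ones column appended
mode2 = prepare_features(sample_rows, 2)  # standardized, then ones column appended
print(mode0, mode1, mode2, sep="\n")
```

In mode 2 both columns of the sample become [-1.0, 1.0] before the intercept column is appended, which shows why features on very different scales end up comparable.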
 
 
systemds.operator.algorithm.multiLogRegPredict(x: systemds.operator.operation_node.OperationNode, b: systemds.operator.operation_node.OperationNode, y: systemds.operator.operation_node.OperationNode, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]]) → systemds.operator.operation_node.OperationNode
- Performs prediction on input data x using the model trained, b.
- Parameters
- x – The data to perform classification on. 
- b – The regression parameters trained from multiLogReg. 
- y – The expected labels for the rows of x, used to calculate accuracy. 
- verbose – Boolean specifying if the prediction should be verbose. 
 
- Returns
- OperationNode list containing three outputs: 1. the predicted means / probabilities, 2. the predicted response vector, 3. the scalar value of accuracy. 
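The relationship between the three outputs can be sketched in a few lines of pure Python: the response vector picks the most probable class in each row of probabilities, and the accuracy compares those predictions with y. The function predict_and_score, the 1-based class numbering, and reporting accuracy as a fraction are assumptions made for this illustration, not confirmed SystemDS behavior.

```python
def predict_and_score(probabilities, labels):
    """Pick the most probable class per row and compare against the labels."""
    # Classes are numbered from 1 here; that numbering is an assumption.
    predictions = [
        max(range(len(row)), key=lambda c: row[c]) + 1 for row in probabilities
    ]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    # Accuracy as a fraction in [0, 1]; the scale is an assumption.
    accuracy = correct / len(labels)
    return predictions, accuracy


# Hypothetical class probabilities for four rows and three classes.
sample_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
sample_labels = [1, 3, 2, 2]
predicted, accuracy = predict_and_score(sample_probabilities, sample_labels)
print(predicted)  # [1, 3, 2, 1]
print(accuracy)   # 0.75
```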
 
systemds.operator.algorithm.pca(x: systemds.operator.operation_node.OperationNode, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]]) → systemds.operator.operation_node.OperationNode
- Performs PCA on the matrix input.
- Parameters
- x – Input dataset to perform Principal Component Analysis (PCA) on. 
- K – The number of reduced dimensions. 
- center – Boolean specifying if the input values should be centered. 
- scale – Boolean specifying if the input values should be scaled. 
 
- Returns
- OperationNode list containing two outputs: 1. the dimensionality-reduced X input, 2. a matrix for applying the same dimensionality reduction to unseen data. 
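As a conceptual illustration of the two outputs, the following pure-Python sketch centers the data, estimates a single principal direction (the K = 1 case) by power iteration, and then reuses that direction to reduce unseen rows. The function names, the use of power iteration, and the toy data are assumptions for this sketch. Scaling is not covered, and the SystemDS implementation may compute the components differently.

```python
import math


def first_principal_component(rows, iterations=200):
    """Center the rows and estimate the top principal direction by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    vector = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in product))
        if norm == 0:
            break
        vector = [v / norm for v in product]
    return means, vector


def project(rows, means, component):
    """Reduce rows to one dimension using a previously fitted component."""
    return [
        sum((row[j] - means[j]) * component[j] for j in range(len(means)))
        for row in rows
    ]


# Hypothetical training data lying on the line y = 2x.
train_rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pca_means, pca_component = first_principal_component(train_rows)
reduced_train = project(train_rows, pca_means, pca_component)
reduced_unseen = project([[4.0, 8.0]], pca_means, pca_component)
print(pca_component)
print(reduced_unseen)
```

Here reduced_train plays the role of the first output and the pair (pca_means, pca_component) plays the role of the second, since it is all that is needed to reduce rows that were not part of the fit.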