November 5, 2015 By Brad Harris 3 min read

In 2010, Steffen Rendle, currently a senior research scientist at Google, introduced a seminal paper in the world of machine learning. In this work, Rendle described a concept known as a factorization machine. Many years have passed since such an impactful algorithm has been introduced in the world of machine learning.

Deep learning has become a popular topic, but the idea behind it has been understood for years. However, only now do we have the computational power to execute its more theoretical aspects. In this post, I will examine something brand new in this sphere.

About Factorization and Support Vector Machines

Factorization machines can be compared to support vector machines (SVMs) with a polynomial kernel, according to Rendle and others. This algorithm has been well-studied and evaluated. The interesting thing behind support vector machines is that they are still somewhat of a mystery. They are often verified empirically rather than against a theoretical baseline or limit. Even so, they are widely used as general predictors. SVMs define a multidimensional hyperplane, which learns the shape of the curve of the data.

However, SVMs have known weaknesses, some of which are addressed by Rendle’s factorization machines. For example, they require that the output model contain support vectors, which are training examples that help the algorithm define the shape of the hyperplane as well as other parameters such as the margin for prediction.

SVMs function best on dense data — that is, data with few to no missing values and data whose plotted points fall near each other. Finally, in SVMs, the input variables are still independent variables even though the polynomial kernel attempts to model the interaction among the variables. This is computed in polynomial time based upon the number of dimensions in the input data as well as the order of the polynomial (e.g., quadratic, cubic or quartic).

Working Around Weaknesses

Factorization machines were designed to address these weaknesses. Firstly, no training examples are required in the model parameters, making the models much more compact. Factorization machines perform extremely well on sparse data, including data of very high sparsity. As a trade-off, they do not perform well on dense data, so other algorithms are more suited to this class of data.

Finally, and perhaps most amazing of all, factorization machines can model n-way variable interactions, where n is the number of polynomial order. Note that in all current implementations, the order has been fixed at two. While the equations generalize to higher orders, there are some issues such as the numerical stability of the optimization methods that have prevented the generalization of n.

Rendle ingeniously proposed a method of reducing the polynomial computation time to linear complexity. It is this capability of factorization machines that makes them extremely attractive to researchers.

Additionally, Rendle demonstrates that, through proper feature engineering, the machines can mimic the best specialized factorization models developed for very specific situations. This allows them to be applied in a multitude of situations where a specific form of learning algorithm and predictor are required.

Additional Methods

In his original paper, Rendle discussed a method of optimization for the model parameters known as stochastic gradient descent, which works well with several loss functions. However, the optimization algorithm is extremely dependent on the learning rate, one of the hyperparameters of the method. If the learning rate is too high, the model parameters will not converge, while if it is too low, the algorithm is no longer time-efficient.

Because of this, Rendle reviewed three more methods of optimization in “Factorization Machines With LibFM,” known as alternating least-squares, marcov chain monte carlo inference and adaptive stochastic gradient descent. He recommends marcov chain monte carlo inference because there are fewer hyperparameters, and those that must be specified are not as sensitive to their initial values.

Factorization machines have been widely applied in the field of collaborative recommendation systems, where their sparse predictive power and ability to mimic state-of-the-art, specific factorization methods make them generalizable to several tasks. Not only are they used in recommending movies and music, for example, but researchers have even applied them to use social media to predict the stock market with surprisingly positive results.

Factorization and Data Science

As I discussed above, factorization machines can function as general predictors akin to support vector machines in high-dimensional sparse data. While they are not commonly used outside of recommendation systems at present, they have the potential to become classifiers similar to SVMs.

There are several implementations of factorization machines. The state-of-the-art continues to be libFM, created by Rendle himself. However, there are other interesting implementations such as fastFM. There’s even a version of libFM designed for the Spark framework, known as spark-libFM, that allows factorization machines to be parallelized.

Factorization machines are just now being studied rigorously, but they hold great promise in the world of sparse predictors, especially when pairwise interaction of variables is useful and linear complexity with polynomial results is desired. I suspect that soon, as these machines are studied in the realm of general prediction, they will become a critical tool in the data scientist’s toolbox.

More from X-Force

Being a good CLR host – Modernizing offensive .NET tradecraft

14 min read - The modern red team is defined by its ability to compromise endpoints and take actions to complete objectives. To achieve the former, many teams implement their own custom command-and-control (C2) or use an open-source option. For the latter, there is a constant stream of post-exploitation tooling being released that takes advantage of various features in Windows, Active Directory and third-party applications. The execution mechanism for this tooling has, for the last several years, relied heavily on executing .NET assemblies in…

Abusing MLOps platforms to compromise ML models and enterprise data lakes

15 min read - For full details on this research, see the X-Force Red whitepaper “Disrupting the Model: Abusing MLOps Platforms to Compromise ML Models and Enterprise Data Lakes”.Machine learning operations (MLOps) platforms are used by enterprises of all sizes to develop, train, deploy and monitor large language models (LLMs) and other foundation models (FMs), as well as the generative AI (gen AI) applications built on top of these models. The rush to leverage AI throughout enterprises has meant that security has been often…

FYSA – Adobe Cold Fusion Path Traversal Vulnerability

2 min read - Summary Adobe has released a security bulletin (APSB24-107) addressing an arbitrary file system read vulnerability in ColdFusion, a web application server. The vulnerability, identified as CVE-2024-53961, can be exploited to read arbitrary files on the system, potentially leading to unauthorized access and data exposure. Threat Topography Threat Type: Arbitrary File System Read Industries Impacted: Technology, Software, and Web Development Geolocation: Global Environment Impact: Web servers running ColdFusion 2021 and 2023 are vulnerable Overview X-Force Incident Command is monitoring the disclosure…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today