Most popular programming language frameworks and tools for machine learning

If you're wondering which of the growing number of programming-language libraries and tools are a good choice for implementing machine-learning models, help is at hand.

More than 1,300 people, mainly working in the tech, finance, and healthcare sectors, revealed which machine-learning technologies they use at their firms in a new O'Reilly survey.

The list is a mix of software frameworks and libraries for data science favorite Python, big data platforms, and cloud-based services that handle each stage of the machine-learning pipeline.

Most firms are still at the evaluation stage when it comes to using machine learning, or AI as the report refers to it, and the most common tools being implemented were those for 'model visualization' and 'automated model search and hyperparameter tuning'.
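
To give a flavor of what hyperparameter tuning involves, here is a minimal sketch of a grid search in plain Python. It is not taken from the survey or from any of the tools listed; the threshold classifier, the candidate values, and the validation data are all invented for illustration.

```python
# Toy hyperparameter tuning: exhaustive grid search over one hyperparameter.
# All data and candidate values are invented for illustration.

def accuracy(threshold, dataset):
    """Fraction of examples a simple threshold classifier labels correctly."""
    correct = sum((score >= threshold) == label for score, label in dataset)
    return correct / len(dataset)

def grid_search(candidates, dataset):
    """Return the candidate value with the highest validation accuracy."""
    return max(candidates, key=lambda t: accuracy(t, dataset))

# Each pair is (model score, true label).
validation = [(0.1, False), (0.3, False), (0.4, False),
              (0.6, True), (0.7, True), (0.9, True)]

best = grid_search([0.2, 0.5, 0.8], validation)
print(best, accuracy(best, validation))
```

Automated tuning tools typically search far larger spaces with smarter strategies, but the basic loop of trying a setting, scoring it on held-out data, and keeping the best is the same.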

Unsurprisingly, the most common form of ML being used was supervised learning, where a machine-learning model is trained using large amounts of labelled data. For instance, a computer-vision model tasked with spotting people in video might be trained on images annotated to indicate whether they contain a person.
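
As a rough illustration of supervised learning, the sketch below trains a nearest-centroid classifier on a handful of labelled examples using only plain Python. The two-number feature vectors and the "person" / "no_person" labels are hypothetical stand-ins for annotated images; a real computer-vision model would be far larger and would normally be built with one of the frameworks listed below.

```python
# Toy supervised learning: a nearest-centroid classifier in plain Python.
# The features and labels below are made up for illustration.

def train(examples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Labelled training data: (features, label) pairs.
labelled = [
    ([0.9, 0.8], "person"),
    ([0.8, 0.9], "person"),
    ([0.1, 0.2], "no_person"),
    ([0.2, 0.1], "no_person"),
]

model = train(labelled)
print(predict(model, [0.85, 0.75]))
```

The key point is that the model never sees a rule for what a person looks like; it infers one from the labels attached to the training examples.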

Here are the libraries, frameworks, big data platforms, and cloud services that businesses say they're using for machine learning.

Software libraries and frameworks

TensorFlow

Google's widely used machine-learning framework, designed to handle the numerical computation demanded when training machine learning models and able to split calculations between CPUs, GPUs and specialized chips such as Google's Tensor Processing Units (TPUs).

scikit-learn

A popular Python library for data mining and data analysis that implements a wide range of machine-learning algorithms.

PyTorch

An open-source, deep learning framework that has a reputation for being easier to learn than some competing frameworks like TensorFlow and that is designed to be used at each stage of the machine-learning pipeline.

Keras

A deep-learning framework for working with neural networks, the brain-inspired mathematical models that underpin deep learning, that is designed to be simpler for people to work with than competing frameworks.

Written in Python, it is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit (CNTK), and the Python library Theano.

Cloud suites

Microsoft Azure ML Studio

This suite of services is designed to help firms build, train, and deploy machine-learning models, both on Microsoft's Azure cloud and also on computing devices close to the edge of the network. Tools help automate the process of identifying and tuning an appropriate machine-learning model, as well as with scaling the underlying compute to match demand.

Google Cloud ML Engine

Similar to Azure ML Studio, Google Cloud ML Engine also provides tools for training, evaluating, tuning, and deploying machine-learning models.

Amazon SageMaker

Amazon SageMaker similarly offers services for building, training, and deploying machine-learning models, with a view to making it possible to get models to production more rapidly and at a lower cost.

If you're interested in the relative merits of these machine-learning suites from the major cloud platforms, check out this comparison from our sister site ZDNet.

Big data platform tools

H2O

An open-source, in-memory platform that can scale machine-learning workloads across distributed systems.

The platform is designed to support the most widely used statistical and machine-learning algorithms and also offers a degree of automation to help data scientists identify and tune appropriate machine-learning models.

Prodigy

Designed to streamline the process of training and evaluating machine-learning models, Prodigy is a tool for helping data scientists annotate training datasets appropriately.

Spark NLP

Spark NLP provides a Natural Language Processing (NLP) library designed to work with distributed systems running the in-memory, big-data platform Apache Spark.

OpenAI Gym

OpenAI Gym is described as a toolkit for developing and comparing algorithms for reinforcement learning, a type of machine learning where software agents learn how to perform tasks by being rewarded for actions that result in a desired outcome.
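
The reward-driven learning described above can be sketched with tabular Q-learning, one common reinforcement-learning algorithm. This example deliberately does not use OpenAI Gym's API; the five-cell corridor environment, the reward of 1.0 for reaching the last cell, and the learning settings are all assumptions made for illustration.

```python
import random

# Toy reinforcement learning: tabular Q-learning on a five-cell corridor.
# The environment, reward, and settings are invented for illustration.
N_STATES = 5                  # cells 0..4; reaching cell 4 ends the episode
ACTIONS = ("left", "right")

def step(state, action):
    """Move one cell; reward 1.0 only for arriving at the final cell."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by acting at random and updating from rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore by acting at random
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Greedy policy: in each non-terminal cell, pick the highest-valued action.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Toolkits such as Gym supply standardized environments in place of the hand-written `step` function here, so that different learning algorithms can be compared on the same tasks.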

Analytics Zoo

Analytics Zoo brings together a series of big data and machine-learning technologies into what it describes as a unified analytics and AI platform.

The platform integrates Spark, TensorFlow, Keras, and the deep learning library BigDL, and can scale machine-learning models across distributed Hadoop and Spark clusters for training and inference.

AllenNLP

Designed to simplify the process of designing and evaluating new deep-learning models for Natural Language Processing problems.

The library includes reference implementations of high-quality models for both core NLP problems and NLP applications.

RISELab Ray

A framework for running machine-learning models across distributed systems, designed to offer high performance, fault tolerance, and scalability.

If you're interested in more information about which programming language libraries and frameworks are typically used for machine learning, check out GitHub's round-up of the top 10 languages.
