DM/ML tools
In the two previous lectures, we looked at a variety of algorithms for DM and ML, eg. kNN, clustering, neural networks.
Now we'll examine how these have been/can be implemented - using tools/frameworks or APIs/languages/hardware.
Note that in 'industry' (ie. outside academia and gov't), tools and APIs are heavily used to build real-life products - so you need to be aware of, and knowledgeable in, as many of them as possible.
These are the most heavily used:
TL;DR: simply learn Keras or PyTorch, and if necessary, TF.
Upcoming/lesser-used/'internal'/specific:
Here is an article about deep learning tools.
The virtually unlimited computing power and storage that a cloud offers make it an ideal platform for data-heavy and computation-heavy applications such as ML.
Amazon: https://aws.amazon.com/machine-learning/ Their latest offerings make it possible to 'plug in' data analysis anywhere.
Google: https://cloud.google.com/products/ai/ [in addition, Colab is an awesome resource!]
Microsoft: https://azure.microsoft.com/en-us/services/machine-learning-studio/ [and AutoML] [aside: alternatives to brute-force 'auto ML' include 'Neural Architecture Search' [incl. this], pruning, and better network design (eg. using ODEs - see this)].
IBM Cloud, Watson: https://www.ibm.com/cloud/ai [eg. look at https://www.ibm.com/cloud/watson-language-translator]
Others:
With so much available out of the box, it's time for citizen data scientists?
A pre-trained model includes an architecture, and weights obtained by training the architecture on specific data (eg. flowers, typical objects in a room, etc) - ready to be deployed.
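To make the 'architecture + weights' idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the JSON layout, the names `SAVED_MODEL`, `load_model` and `predict`, and the hand-picked weights (which stand in for numbers that a real training run would have produced). Real interchange formats are far richer than this.

```python
import json
import math

# Hypothetical serialized model: the architecture describes the layers,
# the weights are the numbers that training would normally have produced.
SAVED_MODEL = json.dumps({
    "architecture": [
        {"inputs": 2, "outputs": 2, "activation": "relu"},
        {"inputs": 2, "outputs": 1, "activation": "sigmoid"},
    ],
    "weights": [
        {"W": [[1.0, -1.0], [-1.0, 1.0]], "b": [0.0, 0.0]},
        {"W": [[2.0, 2.0]], "b": [-1.0]},
    ],
})

ACTIVATIONS = {
    "relu": lambda z: max(0.0, z),
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
}


def load_model(serialized):
    """Parse the model and check that the weights fit the architecture."""
    model = json.loads(serialized)
    for layer, params in zip(model["architecture"], model["weights"]):
        rows_ok = len(params["W"]) == layer["outputs"]
        cols_ok = all(len(row) == layer["inputs"] for row in params["W"])
        bias_ok = len(params["b"]) == layer["outputs"]
        if not (rows_ok and cols_ok and bias_ok):
            raise ValueError("weights do not match the declared architecture")
    return model


def predict(model, x):
    """Forward pass only: no training happens at deployment time."""
    for layer, params in zip(model["architecture"], model["weights"]):
        act = ACTIVATIONS[layer["activation"]]
        x = [act(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(params["W"], params["b"])]
    return x


model = load_model(SAVED_MODEL)
print(predict(model, [1.0, 0.0]))  # high score: the two inputs differ
print(predict(model, [1.0, 1.0]))  # low score: the two inputs are equal
```

The point of the sketch is that 'deploying' needs only the loader and the forward pass; the expensive training step has already been paid for by whoever produced the weights.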
Eg. this is simple object detection in the browser! You can even run this detector on a command line.
TinyMOT: https://venturebeat.com/2020/04/08/researchers-open-source-state-of-the-art-object-tracking-ai
Apple's CreateML is useful for creating a pre-trained model, which can then be deployed (eg. as an iPad app) using the companion CoreML product. NNEF and ONNX are other formats for NN interchange.
Pre-trained models in language processing include Transformer-based BERT and GPT-2. Try this demo (of GPT etc). GPT-3 is currently available, GPT-4 is in the works, and there are also Wu Dao 2.0, MT-NLG...
There are also combined (bimodal) models, based on language+image data.
Several end-to-end applications exist for DM/ML. Here are popular ones.
Weka is a Java-based collection of machine learning algorithms.
RapidMiner uses a dataflow ("blocks wiring") approach for building ML pipelines.
KNIME is another dataflow-based application.
TIBCO's 'Data Science' software is a platform similar to Weka etc. Statistica [similar to Mathematica] is flexible, powerful analytics software [with an old-fashioned UI].
bonsai is a newer platform.
To do ML at scale, a job scheduler such as the one from cnvrg.io can help.
SynapseML is a new ML library from Microsoft.
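The dataflow ("blocks wiring") approach mentioned above for RapidMiner and KNIME can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how either product works internally: the `Block` class, the block functions and the toy data are all hypothetical.

```python
from statistics import mean


class Block:
    """One box in the dataflow graph: a function plus its upstream blocks."""

    def __init__(self, name, func, *inputs):
        self.name = name
        self.func = func
        self.inputs = inputs

    def run(self, cache=None):
        # Pull results from upstream blocks first; cache so that each
        # block executes at most once per run.
        cache = {} if cache is None else cache
        if self.name not in cache:
            args = [block.run(cache) for block in self.inputs]
            cache[self.name] = self.func(*args)
        return cache[self.name]


def read_rows():
    # Toy (value, label) rows; one value is missing on purpose.
    return [(1.0, "a"), (2.0, "a"), (None, "b"), (9.0, "b"), (11.0, "b")]


def drop_missing(rows):
    return [row for row in rows if row[0] is not None]


def class_means(rows):
    groups = {}
    for value, label in rows:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}


def nearest_mean_classifier(means):
    # Returns a function that picks the label whose mean is closest to x.
    return lambda x: min(means, key=lambda label: abs(means[label] - x))


# "Wiring the boxes": each block names the blocks it reads from.
source = Block("read", read_rows)
clean = Block("clean", drop_missing, source)
train = Block("train", class_means, clean)
model = Block("model", nearest_mean_classifier, train)

classify = model.run()
print(classify(1.2), classify(8.0))
```

In a GUI tool the wiring lines at the bottom are what you draw with the mouse; the blocks themselves come from a palette.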
There are a variety of DATAFLOW ('connect the boxes') tools! This category is likely to become HUGE:
These languages are popular for building ML applications (the APIs we saw earlier are good examples):
Because (supervised) ML is computationally intensive, and detection/inference almost always needs to happen in real time, it makes sense to accelerate the calculations using hardware. Following are examples.
Google TPU: TF is in hardware! Google uses a specialized chip called a 'TPU', and documents TPUs' improved performance compared to GPUs. Here is a pop-sci writeup, and a Google blog post on it.
Amazon Inferentia: a chip, for accelerating inference (detection): https://aws.amazon.com/machine-learning/inferentia/
NVIDIA DGX-1: an 'ML supercomputer': https://www.nvidia.com/en-us/data-center/dgx-1/ [here is another writeup]
Intel's Movidius (VPU): https://www.movidius.com/ - on-device computer vision
In addition to chips and machines, there are also boards and devices:
Overall, there's an explosion/resurgence in 'chip design', for accelerating AI training and inference. In April '21, NVIDIA announced its new A30 and A10 GPUs, at the annual [GTC] conference.
We looked at a plethora of ways to 'do' ML. Pick a few, and master them - they complement your coursework-based (theoretical) knowledge, and make you marketable to employers! Aside: LOTS of salaries etc., revealed here :)
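A rough back-of-the-envelope sketch shows why inference hardware matters: even a small fully connected network needs a large number of multiply-accumulate (MAC) operations per input. The layer sizes and the 30-inputs-per-second rate below are assumptions chosen only for illustration, and biases and activations are ignored.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulates for one forward pass through dense layers."""
    return sum(n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: 784 inputs, two hidden layers of 512, 10 outputs.
layer_sizes = [784, 512, 512, 10]
macs_per_input = dense_macs(layer_sizes)

# Assumed real-time rate, eg. classifying 30 video frames per second.
inputs_per_second = 30
macs_per_second = macs_per_input * inputs_per_second

print(f"{macs_per_input:,} MACs per input")
print(f"{macs_per_second:,} MACs per second at {inputs_per_second} inputs/s")
```

Convolutional and Transformer models are typically orders of magnitude larger than this toy network, which is the workload these chips are built to handle.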
Also, FYI - in industry (G-MAFIA/FAANG/MAMMA MIA, BAT, more!), ML is part of a bigger 'production pipeline':