More DM/ML tools
In the previous lectures, we looked at a variety of algorithms for DM and ML, eg. kNN, clustering, and neural networks.
Now we'll examine how these have been (or can be) implemented, using tools/frameworks, APIs/languages, and hardware. Note that you've already seen some of them, and even used some in HWs.
Note that in 'industry' (ie. outside academia and gov't), tools and APIs are heavily used to build real-world products, so you need to be aware of, and knowledgeable in, as many of them as possible.
ADVICE: you are valued (compensated!) for your knowledge of tools, as much as (if not more than) your knowledge of theory - so, MASTER USEFUL TOOLS!
These are the most heavily used:
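JAX is relatively new: it is a high-performance numerical computing library that offers a NumPy-like API (jax.numpy), plus automatic differentiation and XLA-based just-in-time compilation for CPUs, GPUs, and TPUs. You can learn more here.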
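To give a flavor of JAX's style, here is a minimal sketch (our own toy example, not taken from the JAX docs) that fits a line by gradient descent: `jax.grad` derives the gradient of the loss automatically, and `jax.jit` compiles the update step. The data (points on y = 2x + 1), the `loss`/`sgd_step` names, and the learning rate are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # illustrative choice, small enough to converge on this toy data

def loss(params, x, y):
    """Mean squared error of a one-feature linear model y_hat = w * x + b."""
    w, b = params
    y_hat = w * x + b
    return jnp.mean((y_hat - y) ** 2)

@jax.jit  # compile the whole update step with XLA (runs on CPU, GPU, or TPU)
def sgd_step(params, x, y):
    # jax.grad differentiates loss w.r.t. its first argument, the (w, b) tuple
    grad_w, grad_b = jax.grad(loss)(params, x, y)
    w, b = params
    return (w - LEARNING_RATE * grad_w, b - LEARNING_RATE * grad_b)

# Toy data lying exactly on y = 2x + 1
x = jnp.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

params = (jnp.array(0.0), jnp.array(0.0))  # start from w = 0, b = 0
for _ in range(500):
    params = sgd_step(params, x, y)

w, b = params
print(f"w = {float(w):.3f}, b = {float(b):.3f}")  # should approach w = 2, b = 1
```

Note that the model is written with ordinary `jnp` array code; differentiation and compilation are added by wrapping functions, not by rewriting them.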
Upcoming/lesser-used/'internal'/specific:
The virtually unlimited computing power and storage that a cloud offers make it an ideal platform for data-heavy and computation-heavy applications such as ML.
Amazon: https://aws.amazon.com/machine-learning/
Google: https://cloud.google.com/products/ai/ [in addition, Colab is an awesome resource!]
Microsoft: https://azure.microsoft.com/en-us/services/machine-learning-studio/ [and AutoML]
IBM Cloud, Watson: https://www.ibm.com/cloud/ai [eg. look at https://www.ibm.com/cloud/watson-language-translator]
Others:
A pre-trained model includes an architecture, and weights obtained by training the architecture on specific data (eg. flowers, typical objects in a room, etc) - ready to be deployed.
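The split between architecture and weights can be sketched in NumPy. This is a toy illustration under stated assumptions: a hypothetical two-layer network, random numbers standing in for weights that would really come from training, and NumPy's `.npz` file standing in for whatever format a real framework uses.

```python
import os
import tempfile
import numpy as np

def architecture(weights, x):
    """Hypothetical 2-layer network: x -> ReLU(x @ W1 + b1) -> (h @ W2 + b2)."""
    h = np.maximum(0.0, x @ weights["W1"] + weights["b1"])
    return h @ weights["W2"] + weights["b2"]

# Stand-in for training: random weights (real pre-trained weights come from training on data)
rng = np.random.default_rng(0)
trained = {
    "W1": rng.normal(size=(4, 8)),
    "b1": np.zeros(8),
    "W2": rng.normal(size=(8, 3)),
    "b2": np.zeros(3),
}

# "Publish" the weights to a file...
path = os.path.join(tempfile.mkdtemp(), "model_weights.npz")
np.savez(path, **trained)

# ...then "deploy": load them and run inference with the same architecture
with np.load(path) as data:
    loaded = {name: data[name] for name in data.files}

x = rng.normal(size=(2, 4))             # two example inputs with 4 features each
scores = architecture(loaded, x)        # one score per class (3 classes)
predicted_class = scores.argmax(axis=1)
```

Deploying only requires the architecture code and the weights file; the training data and training loop are not needed at inference time.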
Eg. this is simple object detection in the browser! You can even run this detector on a command line.
Apple's CreateML is useful for training a model, which can then be deployed (eg. in an iPad app) using the companion CoreML framework.
Several end-to-end applications exist for DM/ML. Here are popular ones.
Weka is a Java-based collection of machine learning algorithms.
RapidMiner uses a dataflow ("blocks wiring") approach for building ML pipelines.
KNIME is another dataflow-based application.
Orange is yet another dataflow-based tool.
Analytic Solver Data Mining is an Excel add-on.
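The dataflow idea behind RapidMiner, KNIME, and Orange (blocks whose outputs are wired to the next block's inputs) can be mimicked in a few lines of plain Python. This is only an analogy, not how those tools work internally; the block names `drop_missing` and `min_max_scale`, the `wire` helper, and the tiny dataset are all hypothetical.

```python
from typing import Callable, List

def drop_missing(rows):
    """Block 1: drop records that contain a missing (None) value."""
    return [r for r in rows if None not in r]

def min_max_scale(rows):
    """Block 2: rescale each column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def wire(*blocks: Callable[[List], List]) -> Callable[[List], List]:
    """Connect blocks in series: each block's output feeds the next block's input."""
    def pipeline(data):
        for block in blocks:
            data = block(data)
        return data
    return pipeline

pipeline = wire(drop_missing, min_max_scale)
rows = [[1.0, 10.0], [None, 20.0], [3.0, 30.0], [5.0, 50.0]]
result = pipeline(rows)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Rewiring (adding, removing, or reordering blocks) changes only the `wire(...)` call, which is the convenience the GUI tools provide visually.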
These languages are popular for building ML applications (the APIs we saw earlier are good examples):
Because (supervised) ML is computationally intensive, and detection/inference often needs to happen in real time, it makes sense to accelerate the calculations using hardware. Following are examples.
Google TPU: TF (TensorFlow) in hardware! Google uses a specialized chip called a 'TPU' (Tensor Processing Unit), and documents TPUs' improved performance compared to GPUs. Here is a pop-sci writeup, and a Google blog post on it.
Amazon Inferentia: a chip for accelerating inference (detection): https://aws.amazon.com/machine-learning/inferentia/
NVIDIA DGX-1: an 'ML supercomputer': https://www.nvidia.com/en-us/data-center/dgx-1/ [here is another writeup]
Intel's Movidius (VPU): https://www.movidius.com/ - on-device computer vision
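To see why the accelerators above target matrix arithmetic, a back-of-the-envelope count helps: a fully connected layer with n_in inputs and n_out outputs needs n_in * n_out multiply-accumulate operations (MACs) per input, and a batch of inputs turns each layer into a single matrix multiplication. The sketch assumes a hypothetical 784-128-10 network (eg. for 28x28 images) and a hypothetical rate of 30 inputs per second, and it ignores biases and activation functions.

```python
import numpy as np

def dense_macs(layer_sizes):
    """Multiply-accumulate ops for one input through fully connected layers
    (biases and activation functions are ignored)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layer_sizes = [784, 128, 10]                # hypothetical network
macs_per_input = dense_macs(layer_sizes)    # 784*128 + 128*10 = 101632
macs_per_second = macs_per_input * 30       # at a hypothetical 30 inputs/s: 3048960

# With a batch, each layer's work is one matrix multiplication,
# the operation that GPUs, TPUs, and inference chips are built to speed up.
rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 784))          # 32 hypothetical inputs
W1 = rng.normal(size=(784, 128))
W2 = rng.normal(size=(128, 10))
hidden = np.maximum(0.0, batch @ W1)        # (32, 784) @ (784, 128): 32*784*128 MACs
outputs = hidden @ W2                       # (32, 128) @ (128, 10): 32*128*10 MACs
```

Real vision models are far larger than this toy network, so the count only indicates how the workload scales, not a figure for any product listed above.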
In addition to chips and machines, there are also boards and devices:
We looked at a plethora of ways to 'do' ML. Pick a few and master them: they complement your coursework-based (theoretical) knowledge, and make you marketable to employers!