'More'
Below you'll find a (growing) assortment of useful notes/links on data/bases (and related topics)... The point of this content is to help connect what you learn in the lectures with the real world. There is likely to be a Final exam question based on what's below (so, pay attention to these items!).
The first wife of Larry Ellison [Larry founded Oracle]: https://peoplepill.com/people/adda-quinn
Look for the part about 'modeling': https://aeon.co/essays/a-mathematician-a-philosopher-and-a-gambler-walk-into-a-bar
https://iaifi.org/talks/2021_03_04_IAIFI_Colloquium_Tegmark.pdf
https://www.technologyreview.com/2019/07/01/65601/machine-learning-has-been-used-to-automatically-translate-long-lost-languages/
"Pair programming": https://www.google.com/search?q=GitHub+CoPilot+youtube&rlz=1C1CHBF_enUS723US723&oq=GitHub+CoPilot+youtube&aqs=chrome..69i57j69i64.7200j0j4&sourceid=chrome&ie=UTF-8
https://techcrunch.com/2022/05/24/microsofts-new-power-apps-feature-turns-sketches-into-apps/
THIS is the paper (from Ed Codd at IBM San Jose) that launched the relational DB revolution! Section 2.1, on relational operators, became the basis for SQL... Fun fact: prior to this, Ed Codd worked on cellular automata.
Raven, writing desk :) https://www.google.com/search?q=Raven%2C+writing+desk
Text DATA plus image DATA is used to train this generator [DALL·E 2 and VQ-GAN+CLIP are other systems that are similar in functionality]: https://imagen.research.google/
Auto-extraction of Es and Rs [after training]: https://primer.ai/developer/using-nlp-entities-and-their-relationships-from-unstructured-financial-documents/
History of R: https://cran.r-project.org/doc/html/interface98-paper/paper_1.html
And here is a history, of sorts, of C#.
Backus, Naur [a la BNF]: https://dl.acm.org/doi/pdf/10.1145/359576.359579 and https://amturing.acm.org/award_winners/naur_1024454.cfm
'History' of modern CS: https://amturing.acm.org/byyear.cfm
https://www.metrowestdailynews.com/story/business/2021/10/22/amazon-opens-new-robotics-facility-westborough-creates-200-jobs/6109385001/
'Pre-cloud' cloud :) http://satysfactory.blogspot.com/2009/11/lets-go-clubbing.html
SQL standard: https://en.wikipedia.org/wiki/ISO/IEC_9075
https://en.wikipedia.org/wiki/Amdahl%27s_law [processor count vs speedup]
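To make 'processor count vs speedup' concrete, here is a minimal sketch of Amdahl's law, speedup = 1 / ((1 - p) + p/N), where p is the fraction of the work that can be parallelized and N is the processor count. The function name and the 95% figure are just illustrative assumptions, not taken from the linked page.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the work (0..1) that can run in parallel.
    processors: number of processors (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Illustrative run: a job that is 95% parallelizable (hypothetical figure).
for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Note how the speedup flattens out: with 5% of the work stuck being serial, no number of processors can push the speedup past 1/0.05 = 20.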
Hub-and-spoke (wrt distributed DBs): https://blog.bmtmicro.com/hub-and-spoke-model/ and https://bizfluent.com/info-8452377-list-united-parcel-service-hubs.html [the model is useful in business contexts as well]
WebContainers! https://blog.logrocket.com/stackblitz-webcontainers-nextjs-browser/
https://bytes.usc.edu/~saty/tools/xem/run.html?x=Vue
A company from which your HW1/HW2 fictitious school could purchase products :) http://www.yahboom.net/home [eg. https://category.yahboom.net/products/rosmaster-x3]
https://bytes.usc.edu/~saty/tools/xem/run.html?x=pitch :)
Informatica: https://www.informatica.com/solutions/power-cloud-analytics.html
'Boomers' etc.: https://www.defense.gov/Multimedia/Experience/Americas-Nuclear-Triad/
Six cool (USEFUL!) pdfs:
• Query optimization, via access plan analysis
• DB Readings [I showed this in class]
• 'Change data'
• DBs: overview
• Architecting for scale
• System design
A tutorial that teaches you to build an e-commerce app in Python, driven by 9 microservices :): https://www.minos.run/learn/
https://www.vox.com/recode/23170900/leaked-amazon-memo-warehouses-hiring-shortage
Another questionable (imo) approach, to bootstrap intelligence (good idea) but without a detailed, corresponding body+brain design (bad idea): https://www.technologyreview.com/2021/05/27/1025453/artificial-intelligence-learning-create-itself-agi/
VR is also built upon -data-: https://www.awn.com/blog/metaverse-musings-alvin-wang-graylin
'OLAP cube': https://www.actian.com/blog/cloud-data-warehouse/olap-in-data-warehouses
https://csweb.rice.edu/news/rice-cs-xia-ben-hu-named-d2k-director
OMG on p.16: https://arxiv.org/pdf/2206.10498.pdf, and OMG II [TL;DR: LLMs can't plan, can't reason!]
https://sqlbolt.com/
https://www.postgresqltutorial.com/
https://towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0
CTEs: https://towardsdatascience.com/take-your-sql-from-good-to-great-part-1-3ae61539e92a
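A tiny taste of a CTE (common table expression), as a sketch you can run with Python's built-in sqlite3 module. The `orders` table and its rows are made up for illustration; the `WITH totals AS (...)` part is the CTE, which names an intermediate result so that the main query can use it twice.

```python
import sqlite3

# In-memory database with a small, hypothetical 'orders' table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ann', 120.0), ('ann', 80.0), ('bob', 40.0), ('cy', 300.0);
""")

# The CTE 'totals' computes per-customer totals once; the main query
# then keeps the customers whose total is above the average total.
query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM totals
WHERE total > (SELECT AVG(total) FROM totals)
ORDER BY customer;
"""

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Without the CTE, the per-customer aggregation would need to be repeated as a subquery in both the FROM and the WHERE clauses.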
https://count.co/, https://count.co/n/jFWwjRetKMx
All the math behind ML:
A 'packed' cheatsheet: https://github.com/soulmachine/machine-learning-cheat-sheet/raw/master/machine-learning-cheat-sheet.pdf
MML: https://mml-book.github.io/book/mml-book.pdf
PML: https://github.com/probml/pml-book [book{0,1,2}] and https://www.ics.uci.edu/~smyth/courses/cs274/notes.html
Amazon's book[s]: https://d2l.ai/, https://zh-v2.d2l.ai/ [https://www.amazon.science/latest-news/amazon-scientists-author-popular-deep-learning-book]
Deep Learning principles: https://arxiv.org/pdf/2106.10165.pdf [https://arxiv.org/abs/2106.10165]
The 'supervision' part of supervised ML: https://crowdsource.google.com/cs/contribute/image-labeler
TinyML: https://bdtechtalks.com/2022/04/18/fomo-tinyml-object-detection/
Excellent article on Transformers, etc: https://ravishrawal.medium.com/transforming-natural-language-understanding-c1ac7f57613f
Hierarchical topic extraction and labeling: https://primer.ai/blog/generating-data-driven-topic-hierarchies-from-text-using-deep-nlp-models
Deploying ML models: https://towardsdatascience.com/the-easiest-way-to-deploy-your-ml-dl-models-in-2022-streamlit-bentoml-dagshub-ccf29c901dac
OSSO: https://www.youtube.com/watch?v=1wIiPVNDXY0 [and https://osso.is.tue.mpg.de/]
https://www.digitaltrends.com/mobile/google-health-ai/
https://github.com/savan77/Practical-Machine-Learning-With-Python
https://www.oxfordsemantic.tech/blog/the-intuitions-behind-knowledge-graphs-and-reasoning
https://www.protocol.com/enterprise/chip-ai-venture-capital
Waymo/Uber lawsuit: https://jolt.law.harvard.edu/digest/waymo-v-uber-surprise-settlement-five-days-into-trial
Amazon's Inferentia ML CHIP [includes a mini description of ML]: https://www.cloudmanagementinsider.com/amazon-inferentia-for-machine-learning-and-artificial-intelligence/
OPT-175B, a free GPT-3 clone from Meta: https://www.technologyreview.com/2022/05/03/1051691/meta-ai-large-language-model-gpt3-ethics-huggingface-transparency/ and https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b
https://medium.com/syncedreview/meta-ai-open-sources-a-175b-parameter-language-model-gpt-3-comparable-performance-at-one-seventh-b1e099c382cd
https://informationmatters.org/2022/05/the-power-and-the-pitfalls-of-large-language-models-a-fireside-chat-with-ricardo-baeza-yates/ [LLMs - Large Language Models - still don't actually 'understand' anything the way we do]
An SDC paper [see p.284; from IJCAI, 1979!!!!!]
Yikes - likely no bad consequences, but still: https://www.cnn.com/2022/06/24/asia/japan-amagasaki-usb-data-intl-hnk/index.html
Creepy: https://www.cnn.com/2022/06/24/tech/abortion-laws-data-privacy/index.html
OMG: https://cybernews.com/security/rockyou2021-alltime-largest-password-compilation-leaked/
Speaking of hashing... :) https://www.blockchain.com/btc/address/197usDS6AsL9wDKxtGM6xaWjmR5ejgqem7
https://www.infoworld.com/article/3615195/googles-logica-language-addresses-sqls-flaws.amp.html
https://arstechnica.com/science/2021/06/researchers-build-a-metadata-based-image-database-using-dna-storage/
https://www.futureintelligence.co.uk/2021/06/09/big-tech-data-scraping-to-discover-our-emotions/
"It's always data data data!!!" :) [https://www.urbandictionary.com/define.php?term=Marsha%20Marsha%20Marsha] https://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data
Here is an interesting article, on DATA.
Very useful: https://towardsdatascience.com/distributed-training-on-aws-sagemaker-8bcbea28466c and https://github.com/aaronwangy/Data-Science-Cheatsheet
Big, bigger, biggest, biggestest... :) https://en.pingwest.com/a/8693 The article says: '“The way to artificial general intelligence is big models and big computer,” said Dr. Zhang Hongjiang, chairman of BAAI...' - with all due respect, the chairman is very wrong :( Even a trillion trillion (10^24) or more parameters won't make a system 'intelligent'!
Oh no... https://incidentdatabase.ai/ [ALL the incidents: https://incidentdatabase.ai/apps/discover]; eg. in https://incidentdatabase.ai/cite/13, search for 'Perspective', look at the results
So much data gathering, in the open... https://www.wired.com/story/all-seeing-eyes-new-york-15000-surveillance-cameras/
DL's principles (at last - June '21!): https://ai.facebook.com/blog/advancing-ai-theory-with-a-first-principles-understanding-of-deep-neural-networks [direct link: PDLT.pdf]
Mongo vs MySQL: https://www.simform.com/mongodb-vs-mysql-databases/
Spatial indexing's evolution ('phylogenetic') tree: https://bytes.usc.edu/~saty/courses/docs/data/SpatialIndexing.pdf
Imagined text becomes real: https://www.nature.com/articles/s41586-021-03506-2
An example of 'AgTech': https://www.freethink.com/articles/farming-robot
Real-life ML stages/steps/pipeline: https://www.cognistx.com/cx-blog/our-strategy-and-evolution-over-the-past-six-years
IPFS: https://www.cio.com/article/3176189/the-quiet-revolution-the-internet-of-data-structures-with-ipfs.html
TensorFlow on a Raspberry Pi: https://bytes.usc.edu/~saty/courses/pics/TFLite_on_Pi.jpg
Crowdsourced QA/testing: https://www.microsoft.com/en-us/msrc/bounty-microsoft-azure
ML for vision (in SDCs): https://bdtechtalks.com/2021/06/28/tesla-computer-vision-autonomous-driving/
Two types of explanations, of functioning NNs: https://arxiv.org/pdf/2010.01496.pdf
Lots of cool ML articles! https://ai-scholar.tech/en/
Oooh: JUICY: https://webkid.io/blog/datablocks-node-based-editor-data-processing-visualization/ and https://towardsdatascience.com/introducing-flowpy-an-intuitive-front-end-for-processing-data-with-python-a619ebe6bb9e and https://github.com/schlerp/pelt-studio - all are examples of 'visual dataflow' used for data analysis
Video frame generation: https://www.vox.com/2016/6/1/11787262/blade-runner-neural-network-encoding
China's 'brain-scale' AI model (174 trillion params), trained on an exaflops supercomputer: https://www.tomshardware.com/news/china-builds-brain-scale-ai-model-using-exaflops-supercomputer
Nice: https://spectrum.ieee.org/low-power-ai-spiking-neural-net
Blockchain is a form of distributed DB: https://www.makeuseof.com/blockchain-technology-simplified
Postgres, on Azure: https://www.techrepublic.com/article/azure-database-postgresql-flexible-server/
LOL: https://techcrunch.com/2022/06/22/jacuzzi-flaws-admin-exposed-users/
https://www.vice.com/en/article/g5vbx9/dall-e-is-now-generating-realistic-faces-of-fake-people
https://coda.io/gallery?filter=Packs
OpenShift: https://www.redhat.com/en/technologies/cloud-computing/openshift [more here]
k-means clustering, on large datasets: https://towardsdatascience.com/large-scale-k-means-clustering-with-gradient-descent-c4d6236acd7a
Nice: https://spectrum.ieee.org/ai-guided-robots-are-ready-to-sort-your-recyclables
A visual intro' to the Transformer [thanks for the link, Geet!]: https://jalammar.github.io/illustrated-transformer/