Newsletter Issue 2


Intro

Hello again, and welcome to the second issue of my newsletter. So yes, I have made it to the second consecutive week! Since this issue is more flooded with my own comments and ideas than its predecessor, I will save you the trouble of reading through a formal introduction - let’s dive in!

Inbox

Effective Altruism

  • I first came across the book 80,000 Hours on my social media feed and assumed it was just another internet personality’s ballyhoo. I saw it again in my college library shortly after, so I couldn’t help flipping through it to test my hypothesis. Despite my a priori bias against it, I was a bit surprised to find that it actually contains useful career information even for me, not to mention those who are less certain about their career choices. I later realised the book falls under the realm of Effective Altruism, which, according to their website, is about

…using evidence and careful reasoning to take actions that help others as much as possible.

  • The core idea of effective altruism is to do the most “good” for communities and the broader society with limited energy and resources. However, on their FAQ page (which is, btw, very informative), they argue that while the resemblance between effective altruism and utilitarianism appears strong, many effective altruists do not identify as utilitarians at all. Several main differences between the two are listed:

    • EA doesn’t necessarily say doing everything possible to help others is obligatory.
    • EA does not advocate for violating people’s rights even if doing so would lead to the best consequences.
    • EA cares intrinsically about things other than welfare, such as rights, freedom, inequality, personal virtue, etc.
  • I genuinely think this idea and its community are a very plausible gateway for people (myself included) to become more socially involved and to make an impact in the world in the most efficient way. Here’s an introduction to effective altruism, and there is also a very detailed handbook that delves much deeper.

So You Want to Study Philosophy

  • Six years ago, as a puerile high school kid, I read my first “philosophy” book, Tolerance by van Loon. The quotation marks are there because it is really a history book about religious intolerance and the push against it in Western history, but it sat in the philosophy section of my city library. I was, after all, a curious kid, and there were many copies of that book for some reason, so I thought it must represent a very popular philosophical school. Naturally, it was a really tough read, and I got next to nothing out of it at the time. That was my first attempt at “philosophy”.
  • Nevertheless, years of physics studies all the way from middle school to college inevitably led me to think beyond the mathematical frameworks to which most of my physics courses are confined. To be honest, though, I never explored far enough to collect my thoughts and formulate any coherent argument of my own.
  • I stumbled upon this study guide, which lays out the foundations of how to really study philosophy, along with a full syllabus and reading recommendations for each course. Even though most of us cannot follow the syllabus in full, it is still worth skimming to get an overview of the field, and perhaps to find your next bedtime read. I will give the introductory book a go - Think: A Compelling Introduction to Philosophy.

Amazon announced its purchase of One Medical

  • The venturesome Amazon is about to make its third-largest acquisition, this time taking on the healthcare sector. One Medical is a one-stop healthcare company that provides premium medical services to patients who are willing to pay $200 a year just for it to be around, on top of the normal care charges.
  • Like the other overarching big tech companies, Amazon wants to build its own social ecosystem. Indeed, its earlier venture into online shopping made it what it is today, and its investment in Prime Video has also largely been worth the price.

Blots on a field

  • The whistle has been blown on this fraudulent paper in the field of Alzheimer’s disease, with millions of dollars of investment and years of research in the field potentially down the drain. According to the article, protein bands in a key image appear to have been tampered with, and the fact that the claim that the Aβ*56 protein causes memory impairment has never since been reproduced casts even stronger doubt on the validity of the “discovery”.
  • Falsification is not rare in academia, where competition for tenure-track positions and research funding is so fierce that some sacrifice not only their own integrity but also the progress of the entire field for personal gain. Have a look at this Wikipedia list for many more examples.
  • As despicable as such conduct is, the researchers may well get away with it easily, even when no one can reproduce their findings. In ML research, although there are communities such as Papers with Code that make experimental environments and codebases open-source, many papers do not conform to the same standards for various reasons. Thus, if some malicious researchers decided to concoct a beautiful table beating the SOTA (state of the art) method, it would be excruciatingly difficult to prove otherwise when insufficient information about the experimental environment is provided. This is rather alarming for all of us, as one brick crumbling at the bottom can topple a whole research area.
  • This is exactly why I am a firm advocate of a next-generation medium for publishing scientific research, to replace the existing conferences that are confined to paper publishing. Have a look at this article: The Scientific Paper Is Obsolete.

How to log in Python like a pro

  • So there’s more to logging than print() and logging.info()? This article goes deep into Python logging, from fundamental logging principles to production-ready logging practices, starting with setting up a logger from the ground up. I might stubbornly stick to my one-line print logs for everyday work and probably for debugging too (before you ask, yes I know what a debugger is, change my mind :) ); however, when the day comes that I build gadgets for more than just myself, I will return to this article for best practices.
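As a rough illustration of what "setting up a logger from the ground up" can look like, here is a minimal sketch using only the standard library logging module. It is not taken from the article: the helper name build_logger, the logger name "newsletter.demo" and the format string are my own assumptions, and the output goes to an in-memory buffer only so that the result is easy to inspect.

```python
import io
import logging


def build_logger(name, stream, level=logging.INFO):
    """Create a named logger that writes formatted records to `stream`.

    `build_logger` is a hypothetical helper for this sketch, not a
    function from the standard library or from the linked article.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # do not also emit through the root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger


# In everyday use the stream would be sys.stderr or a file; a StringIO
# buffer is used here purely to capture the output.
buffer = io.StringIO()
log = build_logger("newsletter.demo", buffer)
log.debug("hidden, because DEBUG is below the INFO threshold")
log.info("training started")
log.warning("learning rate is %s", 0.1)
captured = buffer.getvalue()
```

Compared with print(), the level threshold and the formatter live in one place, so silencing or reformatting every message is a one-line change rather than a hunt through the code.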

Thoughts

Federated Learning - Is it a fake demand?

  • How do I define a fake demand?
    • Artificial demands are more widely seen in marketing and advertising, where businesses create new demands that would not have existed had it not been for their advertisements.
    • I would categorise as a fake demand any applied ML research field or topic that addresses a problem satisfying one or more of the following (non-exhaustive) criteria:
      1. The research problem is highly hypothetical, such that it barely exists in the real world, regardless of how it is advertised.
      2. The research problem simplifies the real-world problem in the lab setting to such an extent that taking away any of the assumptions of the research would render the whole work impractical.
      3. The proxy problem that stands in for the real-world problem introduces new, ineffaceable systematic issues that could invalidate the proxy approach.
    • I would also like to clarify one point: I absolutely believe in the necessity and importance of theoretical groundwork in ML research, even where there is no explicit demand. My definition of fake demands only applies to applied ML research.
  • This question was first recommended to me on Zhihu, and by the time I saw it I already had doubts about what the endgame of FL was, and how we could realistically get there with current research.
  • I collaborated with my course mates on an FL topic dealing with data and model heterogeneity (See the full project [here]). While I was catching up on FL review and research papers, the key idea driving the field that popped up the most was “privacy”. Indeed, FL is expected to be most effective when training on distributed mobile devices provides an evident edge over training on proxy data in a data centre, or when the stakes for privacy and safety are very high. One of the groundbreaking works that leveraged FL in a business scenario is Google’s application to the next-word prediction task, which demonstrated the feasibility and advantage of FL for language modelling in a real-life case.
  • Some of the main criticisms of FL approaches are outlined below:
    • Local data can be anonymised and aggregated in the data centre without violating data privacy.
    • In terms of model performance, could the marginal benefits (if any) justify the much higher cost of deploying FL?
    • There are very few examples of FL in production.
    • The reverse privacy issue - how could FL protect the models trained by companies that invested millions into them from being exposed?
  • Although I agree to some extent with some of the arguments above, I still don’t think it’s fair to say that the whole field of FL is a fake demand per se.
  • For one, the research problem of privacy truly exists, and its gravity will only grow in the foreseeable future. For instance, the GDPR heavily regulates data sharing, and in very sensitive industries such as the financial sector and healthcare, amassing data from different institutions is effectively impossible, even if compliance permits.
  • Also, solutions do already exist for simpler problems (like the Google application), and although they are still far from ready for large-scale deployment, they are indeed ONE plausible solution to the privacy issue when direct data aggregation is simply not possible. They might not completely eradicate privacy concerns - one may argue no such approach exists - but they do not seem to bring on any additional irresolvable issues.
  • The strengths of FL, especially with non-IID datasets, can be more compelling than those of traditional data-centre model training, and a growing body of research focuses on FL performance under strong data heterogeneity (such as FedMD, FedRep, etc.).
  • I am, however, certainly not an expert in FL in either academia or industry, so I am only defending its name from a passerby’s point of view. And I will keep monitoring the field for new developments. This thought, nonetheless, led me to ponder over a bigger topic - if FL is not a fake demand, does one even exist in ML or the broader applied science communities?
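For readers who have not seen the mechanics behind the debate, here is a toy sketch of federated averaging (FedAvg), one common FL training scheme. It is not the method of any paper or product mentioned above, and everything in it is an assumption made for illustration: the one-parameter model y = w * x, the two made-up clients, the learning rate and the number of rounds. The point it shows is that raw (x, y) pairs never leave a client; only the locally updated weight is sent to the server.

```python
def local_update(w, data, lr=0.05, epochs=2):
    """Client step: refine the global weight on local (x, y) pairs for y = w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def train(clients, rounds=20):
    """Alternate local updates and server averaging for a fixed number of rounds."""
    w = 0.0  # initial global weight
    for _ in range(rounds):
        updates = [local_update(w, data) for data in clients]
        w = federated_average(updates, [len(data) for data in clients])
    return w


# Two hypothetical clients whose private data both follow y = 2 * x
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
]
final_w = train(clients)  # should end up close to the true slope of 2
```

This toy is deliberately the easy case: both clients share the same underlying relationship. The heterogeneity research mentioned above (FedMD, FedRep and the like) targets the setting where they do not.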

Other fake demands in ML research?

  • Surely, Federated Learning is not the only research field that faces such debates, and I can’t help but wonder which other areas face the same charges.
  • Researching this topic was not easy at all, as there will always be a survivorship bias - the topics might be too niche for the general public to find out about easily, and those who know them might be fully invested in the subject and may not even realise its actual applicability issues.
  • I will, again, push my further thoughts on this issue to a separate piece to fully investigate the question, and I would love to discuss this with you if you have any candidates in mind or if you don’t think such fake demands exist.

Miscellaneous

How GPU Computing Works

  • GPUs are indispensable in ML research nowadays - this Nvidia video sheds light on how the magic actually works.

Anna Karenina by Karen Shakhnazarov

  • Very impressed by this Russian series adapted from Tolstoy’s novel Anna Karenina. Plot aside, the cinematography and acting in this movie-series are stunning, and the retrospective narrative set against the Russo-Japanese War adds an extra layer of engagement.

Explosions&Fire

  • Full disclaimer: I do not support using any of these techniques for illegal purposes. The videos on this channel, however, reminded me of my initial interest in science, particularly chemistry, during middle school.

A Hedge Fund Analyst Christmas List

  • A compilation of quality free and paid resources for professional investors.