Two Nobel Prizes: AI is Still Resting on Giant Shoulders
John Hopfield and Geoffrey Hinton received the Nobel Prize in Physics, Demis Hassabis and John Jumper the Nobel Prize in Chemistry. It is obvious that the first prize was not given merely for their contributions to physics, but mostly for their profound and foundational contributions to what is today modern AI.
Let's talk about the second Nobel Prize.
AlphaFold was put on the map by beating other methods in a competition (CASP13/CASP14) that has been running for years on a well-established dataset. As such, AlphaFold winning is more like an ImageNet moment (when Geoffrey Hinton's team demonstrated the superiority of convolutional networks for image classification) than a triumph of multi-disciplinary AI research.
AlphaFold's dataset rests on many years of slow and arduous research to compile data in a format that could be understood not by machines, but by computer scientists. Through that humongous work, the massive problem of finding a protein's structure was reduced to a simple question of minimizing distances: a problem that could now be tackled with little to no knowledge of chemistry, biology or proteomics.
This in no way reduces the profound impact of AlphaFold. However, it does highlight a major issue in applied AI: computer scientists, not AI, are still reliant on other disciplines to drastically simplify complex problems for them. The contributions and hard work required to do so are unfortunately forgotten once everything has been reduced to a dataset and a competition.
What should we do when we do not have problems that computer scientists can easily understand? This is the case in all fields that require a very high level of domain knowledge. Through experience, I have come to consider the pairing of AI specialists with specialists of other disciplines a sub-optimal strategy at best. The billions of dollars invested in such enterprises have failed to produce any significant return on investment.
The number one blind spot of these endeavours is the supply chain. It usually takes years and looks like this:
1- Domain specialists identify a question
2- Years are spent to develop methods to measure and tackle it
3- The methods are made cheaper
4- The missing links: Computational chemists, Bioinformaticians, ... start the work on what will become the dataset
5- AI can finally enter the scene
Point number (1) is the foundation. You can measure and ask an infinite number of questions about anything. Finding the most important one is not as obvious as it seems. For example, it is not at all obvious a priori that a protein's structure is an important feature. Another example is debugging code. A successful debugging session involves asking and answering a succession of relevant questions. Imagine giving code to someone with no programming experience and asking them to debug it. The probability of them asking the right questions is very close to 0.
Identifying what is important is called inserting inductive biases. In theory, LLMs could integrate the inductive biases of a field and generate interesting questions, or even format datasets from open-source data. However, until this ability has been fully demonstrated, the only cost-efficient way to accelerate AI-driven scientific discovery is to build the multidisciplinarity into the people: AI researchers who know enough about the field to identify the relevant questions of the future.
https://bluwr.com/p/39046977
A Formal Definition of Stealing
One of the basic rules of economics is that value is created by exchanging (not by printing money).
----
==Let's imagine a simple example:==
Person A has lots of pens. For them, a pen is only worth $1; a sheet of paper, however, is worth $4. Person B has a lot of paper. For them, a sheet is only worth $1, but a pen is a valuable item worth $4.
Person A wants a sheet and Person B wants a pen. They decide to exchange: A gives a pen to B, and B gives a sheet of paper to A.
At the end of the exchange, each has given up $1 of value and received $4 in return, meaning that each has gained $3 of value.
A total of $6 of value has been created by the exchange.
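The arithmetic of this example can be written down as a small Python sketch. It is only an illustration: the `Person` class and the `gain_from_swap` helper are hypothetical names introduced here, and the valuations are the ones from the example, in dollars.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    valuations: dict  # what one unit of each item is worth to this person, in dollars


def gain_from_swap(person, gives, receives):
    """Net value a person gains by giving away one item and receiving another."""
    return person.valuations[receives] - person.valuations[gives]


# Valuations taken from the example: each person values what they lack more.
a = Person("A", {"pen": 1, "paper": 4})
b = Person("B", {"pen": 4, "paper": 1})

gain_a = gain_from_swap(a, gives="pen", receives="paper")
gain_b = gain_from_swap(b, gives="paper", receives="pen")
total_created = gain_a + gain_b

print(gain_a, gain_b, total_created)  # 3 3 6
```

Nothing was produced and no money was printed; the value appears only because each item moved to the person who values it more.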
----
Now let's look at what happens during theft. When something is stolen, no exchange happens between the parties, therefore no value is created. In fact, for society as a whole the yield is negative, as the thief had to spend energy (value) to get what he wanted. So although he enriched himself, he also made everybody poorer.
We can consider this a definition of stealing: a transfer of goods that results in a negative creation of value.
The same is true, to a lesser degree, when one of the parties cheats the other by providing an item that is less valuable than previously thought, like a pen that does not write.
https://bluwr.com/p/22023977
Data is Not the new Oil, Data is the new Diamonds (maybe)
Over the past decade I have heard this sentence more times than I can count: "Data is the new oil". At the time it sounded right; now I see it as misguided.
That simple sentence took hold when people realized that Big Tech (mostly Facebook and Google) was collecting huge amounts of data on its users. Although this was before AI blew up into the massive thing it is now, it had a profound effect on people's minds. The competitive advantages that companies with data were able to achieve inspired a new industry and a new speciality in computer science, Big Data, and fostered the creation of many new technologies that have become essential to the modern internet.
"Data is the new oil" means two things:
1- Every drop is valuable
2- The more you have, the better.
And it seemed true, but it was an artifact of a Big Tech use case. What Big Tech was doing at the time was selling ads with AI. To sell ads to people, you need to model their behaviour and psychology; to achieve that you need behavioural data, and that is what Google and Facebook had. It is a perfect use case, where the data collected is very clean and tightly fits the application. In other words, the noise-to-signal ratio is low, and in this case, the more data you can collect the better.
This early success, however, hid a major truth for years: for AI to work well, the quality of the dataset matters a great deal. Unlike oil, when it comes to data, some drops are more valuable than others.
In other words, data, like a diamond, needs to be cut and polished before it can be presented. Depending on the application, we need people able to understand the type of data, the meanings associated with it, the issues associated with its collection and, most importantly, how to clean and normalize it.
In my opinion, data curation is a major factor in what differentiates a great AI from a below-average one. Those who misunderstood this ended up significantly increasing their costs with complex Big Data infrastructures, only to drown themselves in heaps of data that they do not need and that hinder the training of their models.
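To make "clean and normalize" concrete, here is a minimal Python sketch of one possible curation pass. It is an assumption-laden toy: the record layout (`id`, `value`), the `curate` function and the sample data are all hypothetical, and real curation depends heavily on the domain.

```python
def curate(records):
    """Drop incomplete and duplicate records, then min-max normalize the values."""
    seen_ids = set()
    values = []
    for record in records:
        if record["value"] is None:   # incomplete measurement: discard
            continue
        if record["id"] in seen_ids:  # same record collected twice: discard
            continue
        seen_ids.add(record["id"])
        values.append(float(record["value"]))
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:                   # avoid dividing by zero on constant data
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical raw collection: one missing value and one duplicated record.
raw = [
    {"id": 1, "value": 10.0},
    {"id": 2, "value": None},
    {"id": 3, "value": 20.0},
    {"id": 3, "value": 20.0},
    {"id": 4, "value": 30.0},
]
curated = curate(raw)
print(curated)  # [0.0, 0.5, 1.0]
```

Five raw records become three usable values. Deciding which records count as duplicates or as invalid is exactly where knowledge of the data comes in.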
When it comes to data, hoarding and greed are not the way to go. We should keep in mind that data has no intrinsic value: the universe keeps generating infinite amounts of it. What we need is useful data.
https://bluwr.com/p/17474669
The future of AI is Small and then Smaller.
We need smaller models, but don't expect big tech to develop them.
Current state-of-the-art architectures are very inefficient. The cost of training them is getting out of hand, and is more and more unaffordable for most people and institutions. This is effectively creating a three-tier society in AI:
1- Those who can afford model development and training (mostly Big Tech), and who make *foundation models* for everybody else
2- Those who can only afford the fine-tuning of the *foundation models*
3- Those who can only use the fine-tuned models through APIs.
This is far from an ideal situation for innovation and development because it effectively creates one producer tier (1) and two consumer tiers (2 and 3). It concentrates most of the research and development into tier 1, leaves a little for tier 2 and almost completely eliminates tier 3 from R&D in AI. Tier 3 is most of the countries and most of the people.
This also explains why most of the AI startups we see all over the place are at best tier 2, which means that their *Intellectual Property* is low. The barrier to entry for competition is very low, as someone else can easily replicate their product. The situation for tier 3 AI startups is even worse.
This is all due to two things:
1- It took almost 20 years for governments and people to realize that AI was coming. In fact, they only realized it after the fact. By then the prices for computer hardware (GPUs) were already through the roof and real talent was already very rare. Most people still think they need *Data scientists*; in fact they need AI Researchers, DevOps Engineers, Software Engineers, Machine Learning Engineers, Cloud Infrastructure Engineers, ... The list of specialties is long. The ecosystem is now complex and most countries do not have the right curricula in place at their universities.
2- The current state-of-the-art models are **huge and extremely inefficient**: they require a lot of compute resources and electricity.
Point number 2 is the most important one, because if we solve 2, the need for cloud, DevOps, etc. decreases significantly. This means we not only solve the problem of training and development cost, we also solve part of the talent acquisition problem. Therefore, it should be the absolute priority: __we need smaller, more efficient models__.
But why are current models so inefficient? The answer is simple: the first solution that works is usually not efficient, it just works. We have seen the same thing with steam engines and computers. Current transformer-based models, for example, need several layers of huge matrices, including embedding matrices that span the whole vocabulary. That is a very naive approach, but it works. In a way we still have not moved past the Deep Learning trope of 15 years ago: just add more layers.
Research in AI should not focus on large language models; it should focus on small language models that have results on par with the large ones. That is the only way to keep research and development in AI alive, thriving and open to most. The alternative is to keep using these huge models that only extremely wealthy organisations can make, leading to a concentration of knowledge and to too many tier 2 and tier 3 startups, which will lead us to a disastrous pop of the AI investment bubble.
However, don't count on Big Tech to develop and popularize these efficient models. They are unlikely to, as having a monopoly on AI development is to their advantage as long as they can afford it.
Universities, that's your job.
https://bluwr.com/p/16874904
Digital: The perfect undying art
Great paintings deteriorate; great statues erode, fall and break; great literature is forgotten and its subtleties are lost as languages forever evolve and disappear. But now we have a new kind of art, a type of art that in theory cannot die. It transcends space and time and can remain pristine for ever and ever. That is digital art.
Digital art is pure information. Therefore it can be copied for ever and ever, exactly reproduced for later generations. Digital art cannot erode, cannot break; it is immortal. Such is the power of bits: such simple zeros and ones, and yet so awesome. Through modern AI and Large Language Models we can now store the subtleties of languages in an abstract vector space, also pure information, that can be copied ad infinitum without loss of information. Let's think about the future, a future so deep that we can barely see its horizon. In that future, with that technology, we can resurrect languages. However, the languages resurrected will be the ones we speak today.
We have a technology that allows us to store reliably and copy indefinitely. That technology is called the *Blockchain*, the most reliable and resilient ledger we have today. We have almost everything we need to preserve what we cherish.
Let's think of a deep future.
https://bluwr.com/p/9930050
AI+Health: An Undelivered Promise
AI is everywhere, or so it would seem, but the promises made for Drug Discovery and Medicine are yet to be fulfilled. AI seems to always spring from a Promethean impulse: the goal of creating a life beyond life, doing the work of gods by creating a new life form as Prometheus created humanity. From Techne to independent life, a life that looks like us. Something most people refer to as AGI today.
This is the biggest blind spot of AI development. The big successes of AI are in a certain way always in the same domains:
- Image Processing
- Natural Language Processing
The reason is simple: we are above all visual, talking animals. Our Umwelt, the world we inhabit, is mostly a world of images and language, and every human is an expert in these two fields. Interestingly, most humans are not as aware of sound as they are of what they see. Very few people can separate the different tracks in a piece of music, let alone identify certain frequencies or hear delicate compressions and distortions. We are not so good with sound, and it shows in the relatively less groundbreaking AI tools available for sound processing.
The same phenomenon explains why AI struggles to deliver in very complex domains such as Biology and Chemistry.
At its core, modern AI is nothing more than a powerful, general way to automatically guess, from collected data, relevant mathematical functions describing a phenomenon: what statisticians call a *Model*. From this great power derives the domain's chief illusion: because the tool is general, its wielder can apply it to any domain. Experience shows that this thinking is flawed.
Every AI model is framed between two things: its dataset (input) and its desired output as represented by the loss function. What is important? What is good? What is bad? How should the dataset be curated? How should the model be adjusted? For all these questions and more, you need a deep knowledge of the domain, of its assumptions, of its technicalities, and of the limitations that are inherent to data collection in that domain. Domain knowledge is paramount, because AI algorithms are always guided by the researchers and engineers. This I know from experience, having spent about 17 years working closely with biologists.
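The framing between a dataset and a loss function can be made concrete with a toy sketch. The example below is plain Python rather than PyTorch so that it runs on its own; the dataset, the linear model and the hyperparameters are all assumptions chosen for illustration.

```python
# Hypothetical dataset: inputs x with outputs y, generated by y = 2x + 1.
dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def mse_loss(w, b, data):
    """Mean squared error: this function is where 'good' and 'bad' get defined."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(data, lr=0.05, steps=2000):
    """Guess the function y = w*x + b from data by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit(dataset)
print(round(w, 3), round(b, 3), mse_loss(w, b, dataset))
```

The optimizer only ever sees the four data points and the loss. Which measurements go into `dataset`, and whether squared error is the right notion of "bad" for the problem at hand, are decisions the algorithm cannot make; they come from whoever knows the domain.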
Pairing AI specialists with domain specialists who have little knowledge of AI also rarely delivers. This strategy has been tested time and time again over the last 10 years. Communication is hard and slow, and most is lost in translation. The best solution is to have AI experts who are also experts in the applied domain, or domain experts who are also AI experts. The current discrepancies we see in AI performance across domains could therefore be laid at the feet of universities and their siloed structures.
Universities are organized in independent departments that teach independently. AI is taught in the Computer Science department, biology in the Biochemistry department. These two rarely meet in any substantial manner. It was true when I was a student; it is still true today.
This is one of the things we are changing at the Faculty of Medical Science of the University Mohammed VI Polytechnic. Students in Medicine and Pharmacy have to go through a serious AI and Data Science class over a few years. They learn to code, they learn the mathematical concepts of AI, and they learn to gather their own datasets, derive their own hypotheses, and build, train and evaluate their own models using PyTorch.
The goal is to produce a new generation of scientists who are intimate with their domain as well as with modern AI, one that can consistently deliver on the promises of AI for Medicine and Drug Discovery.
https://bluwr.com/p/9804934
El Salvador: The most important country you barely hear about
El Salvador has a significant diaspora, so much so that money coming from the US is a major source of income. **Not so long ago you would have been hard-pressed to find a Salvadorian who wanted to go back to El Salvador. Now things seem to be changing.**
El Salvador used to have one of the highest homicide rates in the Americas; now it looks relatively safe. El Salvador showed an interesting strategy: first boost the economy, then handle the crime situation. Crime is indeed a part of GDP, albeit a hard one to quantify. Since it is an economic activity, it participates in exchanges and provides people with activities that support them and their families. Drastically reducing crime has the effect of creating *'unemployed criminals'*: people with a skillset that is hard to sell in a traditional economy.
El Salvador probably did take a hit to its GDP, but that was compensated by the increase in economic activity and investments.
Bitcoin was a big part of that.
Bitcoin got a lot of bad press as a technology only used by criminals, or a crazy investment for crazy speculators. These takes failed to understand the technology and its potential. What Bitcoin offers is a decentralized, fast and secure payment system for free. El Salvador doesn't have to maintain it, regulate it, or even monitor it, all very costly activities that a small country can do without. Bitcoin is a mathematically secure means of payment.
In a country where road infrastructures are challenging, Bitcoin offers people in remote areas the possibility to pay their bills without travelling for hours. In a country that was unsafe, Bitcoin offered people the possibility to go out without the fear of being robbed.
It also attracted a kind of investor that would go nowhere else. And even if these investments can appear small, for a country like El Salvador it's a big change.
The Salvadorian experiment with a freer, crypto-friendly economy and a smaller government, in a time of increasing inflation, has a lot of people watching. In a continent that leaned left for so long, this is a big change.
My opinion is that there would be no Javier Milei had there not been a Nayib Bukele before. Argentina has been a bastion of the left for decades. If the libertarian policies of Milei succeed in bettering the lives of Argentinians, we might be on the brink of a major cultural shift in the Americas and then the world.
Argentina is a far bigger country than El Salvador, with far more people watching.
https://bluwr.com/p/9336980
Applied Machine Learning Africa!
I have been to more scientific conferences than I can count, from the smallest to the biggest, like NeurIPS (even back when it was still called NIPS). Of all these events, AMLD Africa is my favorite, by far.
I first met the team two years ago when they organized the first in-person edition of the conference at the University Mohammed VI Polytechnic. I was immediately charmed by the warmth and professionalism, ambition and fearlessness of the team. So much so that I joined the organization.
AMLD Africa is unique in every aspect: in its focus on Africa, in its scope and ambition, in its incredibly dynamic, young, passionate, honest and resourceful team, all volunteers. It is hard to believe that this year in Nairobi was only the second in-person edition.
AMLD Africa does the impossible without even realizing it. It has an old school vibe of collegiality, community and most importantly **__fun__** that is so lacking in most conferences today. All without compromising on the quality of the science.
It offers one of the best windows into everything AI and Machine Learning happening in Africa. Africa is a continent on the rise, but a very hard continent to navigate because of information bottlenecks. Traveling across Africa is not easy (it took me 28 hours to get from Nairobi to Casablanca), and there are language barriers separating the continent into different linguistic regions (French, English and Portuguese being the main ones). On top of that, all too often we do not look to Africa for solutions.
AMLD Africa is solving all that, by bringing everybody together for a few days in one of the best environments I got to experience.
Thank you AMLD Africa.
https://bluwr.com/p/9030113
GenZ: The Fiscally Aware Generation
I am sitting at Paul's cafe at the airport en route to Nairobi via Cairo for Applied Machine Learning Days (AMLD) Africa (a wonderful conference, more on that later). **In front of me are four young men in their early 20s. They speak loudly in French as they eat the burgers and fries they bought at another restaurant.**
They talk about money.
"You have no idea how much money I lose to taxes", says one of them. "40 to 50%! It's a lot of money, I would make so much more without it". He sees taxes not as a net necessary good, as most have been trained to see it, but as any other cost.
Interesting. That's not the type of conversation you would expect from someone that young. It's not the first time I have heard this type of conversation from GenZs. Why are GenZs becoming more fiscally aware than previous generations? I think it comes down to two factors:
- Inflation
- The entrepreneurial culture
Inflation has hit everybody, for obvious reasons. However one constant with inflation is that it hits the poorest hardest. Young people tend to have less money. But that's not enough to raise awareness about a subject that most consider beyond boring. This brings us to the next point: *The entrepreneurial culture*.
As a millennial I witnessed its burgeoning and blossoming. It started timidly with a few books and blogs, then massive blogs, then best sellers, then YouTube videos and finally podcasts. Not so long ago, being an entrepreneur was considered an unwise life choice. Successful people go to work for established companies: such was the common wisdom. However, as the 2008 recession hit and people started to look for more revenue streams, they also discovered that having one's own business can mean more freedom and better financial security.
There is however a big difference between the Millennial Entrepreneur and the GenZ Entrepreneur. The Millennial was still uneasy with the idea of making money and as such would speak about *"making a positive impact in the world"*; the GenZ is not burdened in this way. You can see the shift in YouTube ads: today it's all about how much you will make if you buy this or that business course.
So whatever online business they start, be it drop shipping or anything else, they tend to do it in a money-aware way. Starting an online business is hard and the competition is fierce. Naturally, they try to invest their hard-earned money wisely. When the tax bill comes, they see it as it is: an unexpected cost that does not necessarily translate to a better quality of life. Nothing is free in this incarnation. Some are not even shy about relocating to fiscally advantageous locations like Dubai and making videos about it.
This could be the end of the blissful fiscally unaware generations.
https://bluwr.com/p/8912938
How Bluwr is optimized for SEO, Speed and Worldwide Accessibility.
TL;DR: Bluwr is Fast & Writing on Bluwr will help you get traffic.
We made some unusual choices while building Bluwr. In an age where front-end web development means JavaScript frameworks, we took a *hybrid*, somewhat old-school approach. Our stack is super lean, fast, and optimized for ease of maintenance and search engines.
----
Most of the website is served as static HTML rendered through Python Jinja templates, and we use JavaScript only when interaction is needed. For these cases we use Vue.js, 100% homemade vanilla JS and jQuery. For looks we use UIkit and in-house custom-made CSS.
These choices allow us to have a lightning-fast website and have great benefits for our writers. Because most of Bluwr appears as static HTML, articles appear first, readers never have to wait for them to load, and search engines have no difficulty indexing what's on Bluwr.com. This makes everything you write on Bluwr easier to find on the internet. It also means that Bluwr.com loads fast even on the worst of connections, which is noteworthy, as even a slight delay in loading can significantly reduce the chances of your article being read.
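To illustrate the general idea of server-side rendering, here is a minimal Python sketch. It is not Bluwr's code: to stay self-contained it uses the standard library's `string.Template` as a stand-in for Jinja, and the `PAGE` template and `render_article` function are hypothetical names.

```python
from html import escape
from string import Template

# Stand-in for a Jinja template: the whole page is assembled on the server.
PAGE = Template(
    "<!doctype html>\n"
    "<html><head><title>$title</title></head>\n"
    "<body><article><h1>$title</h1><p>$body</p></article></body></html>"
)


def render_article(title, body):
    """Return the complete HTML page for an article, with user text escaped."""
    return PAGE.substitute(title=escape(title), body=escape(body))


html_page = render_article("Hello <world>", "The text is already in the HTML.")
print(html_page)
```

The response already contains the article text, so a browser on a slow connection or a search engine crawler can read it without executing any JavaScript.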
Our goal is to make Bluwr accessible to anybody on the internet, even on a limited 3G connection.
https://bluwr.com/p/856465