Swapnam's Comments (group member since Dec 18, 2024)




Feb 04, 2025 08:27AM

Chapter 1 notes

We want to infer an output y from an input x using a Model f (i.e. a family of relations), represented as :
y = f(x)
Here x and y are (multi-dimensional) vectors encoding the input and output in a suitable manner, and f is a function.

A particular relation is determined by the choice of the Parameters p :
y = f(x, p)
Then, the goal of Supervised Learning is to learn the model's parameters p by using a Training Dataset, which is a collection of pairs {x_i, y_i} of input and output vectors.

Define the Loss L as the degree of mismatch (which needs to be precisely quantified) between the model's predicted input-output mapping and the actual relation.
Then our objective is to find the parameters p_m that minimize the loss function :
p_m = argmin_p (L(p, {x_i, y_i}))
The practical challenge, of course, is to find the optimal parameters p_m with as little resource consumption as possible.

After the training phase is done, we run the model against the Test Data to evaluate its Generalization, i.e. its performance on instances it has not seen during training.
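
A minimal sketch of this recipe in Python (a toy 1-D linear model of my own, not an example from the book): f(x, p) = p[0] + p[1]*x, a squared-error loss, and plain gradient descent to approximate p_m = argmin_p L.

import numpy as np

# Toy training set {x_i, y_i} generated from a "true" relation plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.5 + 0.1 * rng.standard_normal(50)

p = np.zeros(2)    # parameters to learn: [intercept, slope]
lr = 0.1           # learning rate

def loss(p):
    pred = p[0] + p[1] * x
    return np.mean((pred - y) ** 2)

# Gradient descent on the squared-error loss
for step in range(500):
    err = (p[0] + p[1] * x) - y
    grad = np.array([2 * err.mean(), 2 * (err * x).mean()])
    p -= lr * grad

print("learned parameters:", p)   # should end up close to [0.5, 2.0]
print("final loss:", loss(p))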
Feb 03, 2025 01:12AM

This is the discussion thread for Understanding Deep Learning, the pick for February 2025.
Happy reading and let's hope we can have illuminating discussions on the same!
Feb 03, 2025 01:10AM

I didn't see any suggestions here, so I didn't create the poll.
We're reading Understanding Deep Learning for February.
Jan 27, 2025 07:38AM

This is an invitation to read the seminal research papers that have shaped the field of Deep Learning, alongside relevant chapters from books, blog posts, video explainers etc.

We can proceed in chronological order, starting all the way back from the introduction of backpropagation in the 80s, which made training multi-layer neural networks practical, and working up to the state of the art, which, as of January 2025, seems to be DeepSeek-R1 - or is it too soon to speak? ;)

This undertaking will thus be a walk through ideas that have been demonstrated to work at scale. One can never dismiss alternative paradigms entirely (e.g. evolutionary algorithms), lest a clever refinement (or just throwing tons of computational power at them) should resurrect them from the dead; still, rigorously understanding the Transformer-based LLM is crucial, given its total dominance at present.
Jan 27, 2025 02:32AM

My two suggestions for February are :

1. Understanding Deep Learning, Simon J. Prince : Neither filled with rigorous mathematical proofs nor concerned with exploring library implementations. Instead, its focus is on a conceptual understanding of the theory and of the ideas that make Deep Learning work. It starts off with the basic perceptron and gradient descent techniques and goes all the way to the most exciting recent development, i.e. Transformers.

2. Algorithms and Data Structures for Massive Datasets, Dzejla Medjedovic, Emin Tahirovic, and Ines Dedovic : Covers data structures like Bloom/quotient filters, Count-min sketch and HyperLogLog, along with techniques such as sampling from real-time data streams and external-memory models. These are used when datasets grow so humongous that the exact structures typically taught in undergrad curricula (e.g. simple hash tables) are no longer feasible, and a tradeoff has to be made between space/time constraints and accuracy (a toy Bloom filter sketch follows below).
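
To make that tradeoff concrete, here is a toy Bloom filter in Python (my own illustration, not code from the book): a fixed-size bit array plus k hash functions, so membership queries can return false positives but never false negatives, trading a little accuracy for a lot of space.

import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0   # a Python int used as the bit array

    def _positions(self, item):
        # derive k bit positions from k salted hashes of the item
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print("alice@example.com" in bf)   # True
print("bob@example.com" in bf)     # False with high probability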
Jan 27, 2025 02:20AM

It's that time of the year again.

Welcome to all the wonderful new members of the group. We have ballooned from 5 to 20 within a month! Don't hesitate to invite your math/science loving friends, partners, neighbors, colleagues to participate in the collective endeavor of seeking knowledge.

Please mention (at most) two books in the comments on this thread that you would like to read with the group in February, preferably with a short summary of why each is a meaningful choice.

I will create a poll and add entries to it as the suggestions start to flow in.
Jan 20, 2025 07:24AM

The geometry of a surface is characterized through an object called the "metric tensor" : a collection of components, one for each pair of coordinates, each component being a function of position.

For Euclidean space this is trivially the identity matrix (recall the distance element : ds^2 = dx^2 + dy^2, i.e. the only non-zero components are the diagonal ones where a coordinate is paired with itself, and these are 1); for Minkowski geometry we get a -1 in the (t, t) position and 1s along the rest of the diagonal (recall the definition of the proper time/distance interval in SR : ds^2 = -dt^2 + dx^2 + dy^2); and for a generally curved spacetime it can be a complicated matrix of functions.
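
Written out explicitly (standard notation, just restating the line elements above in matrix form) :

\[
g^{\text{Euclid}} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad ds^2 = dx^2 + dy^2; \qquad
g^{\text{Minkowski}} = \mathrm{diag}(-1, 1, 1, 1), \quad ds^2 = -dt^2 + dx^2 + dy^2 + dz^2;
\]
\[
\text{general curved spacetime :} \quad ds^2 = g_{\mu\nu}(x)\, dx^{\mu} dx^{\nu}.
\]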

Thus an analogue of the problem in GR had already been tackled by Gauss, Riemann etc. in the development of Differential Geometry, and it can be restated as : Can we find a coordinate transformation such that the metric tensor becomes the identity matrix (or, for spacetime, the Minkowski form) everywhere in the space? If so, the metric, and hence the space, was only apparently curved and actually flat; if not, the curvature is real, and so is the gravitational field.

Establishing the above requires a framework for the transformation of objects across coordinate systems, which is the subject of Tensor Analysis.
Jan 20, 2025 05:13AM

Next we come to the issue of curvilinear coordinate systems. Note that we can represent even flat spacetime with curved coordinates - what differentiates flat from curved space is that in the former we can always find globally applicable Lorentz coordinates if we wish to, which is not possible in a curved space.

As an example, consider the surface of a flat paper rolled up without stretching or tearing. It appears curved when we see it from the 3d space in which it is embedded, i.e. it has "extrinsic curvature", but an observer living on the surface of the paper would notice no differences in his observations before and after the rolling. Compare this to the surface of a doughnut (a torus), which is "intrinsically curved".

So we need to differentiate between real vs. apparent curvature in spacetime, which in fact is precisely the difference between a real gravitational field vs. an accelerating reference frame in flat space, i.e. between fictitious forces (e.g. centrifugal force in a rotating frame) due to coordinate choice and physically real tidal forces due to gravitation.
Jan 20, 2025 02:56AM

However, note that this equivalence is a local approximation : if two particles starting from an initial separation d are freely falling towards a source of a gravitational field, e.g. the Earth, they'll both be drawn towards its center, and by carefully measuring the decrease in d over time, one will be able to detect the presence of a real gravitational field as opposed to a merely accelerating frame of reference, where no such central convergence occurs.

Obviously, the uniformity of the gravitational field and the "point particle" are themselves idealizations : points at various distances from a mass will experience different field values, and likewise for different parts of any extended rigid body, leading to its distortion, which will cause an observational difference from a uniformly accelerating reference frame. In summary, tidal forces, being a physical manifestation of a real gravitational field, can be used as a test for it.
Jan 20, 2025 02:53AM

The starting point is the Equivalence Principle, which states that no physical experiment can differentiate a reference frame in a uniform gravitational field from one in a state of uniform acceleration (Compare with SR, where there is no notion of absolute velocity).

To see this, recall the equivalence of the two masses of a body appearing in Newton's law of gravitation : ma = GmM/r^2, where the m on the left side is the inertial mass, representing resistance to acceleration (by Newton's second law), whereas the m on the right side is the gravitational mass and M is the mass of the gravitating source (the gravitational mass is the analogue of a particle's charge for Coulomb's electrostatic force; notice that the inertial mass and the charge of a particle have no relation).
It is a peculiar feature of gravity that these two masses are experimentally found to be exactly equal (to an accuracy of one part in 10^12), which implies that all objects, whatever their masses and constitutions, freely falling in the gravitational field produced by a mass M experience the same acceleration a = GM/r^2. (Recall the apocryphal account of Galileo dropping different balls from the Tower of Pisa and reporting that they reach the ground at the same time.)
But this implies that an object falling in a uniform gravitational field of strength g has the same experience as one inside an elevator accelerating upwards at g in empty space.
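
Spelled out (standard notation; this is just the argument above written as equations) :

\[
m_{\text{inertial}}\, a = \frac{G\, m_{\text{grav}}\, M}{r^2},
\qquad m_{\text{inertial}} = m_{\text{grav}}
\;\Longrightarrow\; a = \frac{G M}{r^2},
\]

independent of the falling body's mass or composition.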

Immediately we are led to a famous prediction of GR : since a light ray fired horizontally in an elevator accelerating upwards will appear bent downwards (which the observer in the elevator attributes to the presence of an external force, i.e. gravity, while the observer outside attributes to viewing a straight line from an accelerated frame, i.e. a "wrong" coordinate choice), the very same observation holds for light travelling in a gravitational field - gravity bends light away from its straight line trajectory. The verification of this effect (gravitational lensing) was one of the triumphs of Einstein's theory.
Jan 20, 2025 02:43AM

The following is an elucidation of GR as gleaned from Susskind's book and some parallel readings. The focus is on the theory's overall conceptual logic rather than heavy mathematical techniques.

The objective of General Relativity (GR) is to extend the study of spacetime in Special Relativity (SR) by including acceleration within the purview of analysis. The flat Minkowski geometry is replaced by a curved structure in the presence of gravitating bodies. A pithy lesson (due to John Wheeler) is "Spacetime tells matter how to move, matter tells spacetime how to curve". Or even simply Gravity = Acceleration = Geometry.

This implies a revision of old ideas such as a global Lorentz reference frame, where all clocks run at the same rate. Now we have to contend with transformations across curvilinear coordinate systems, while ensuring the laws of physics retain the same form. (Compare with Galilean principle of relativity, one of the axioms of SR : The laws of physics take the same form for all observers in a state of uniform relative motion.)

Let p denote the wavefunction as it evolves under the Schrodinger equation. A contrast Maudlin draws attention to is between p-ontic and p-epistemic interpretations of QM (due to Harrigan and Spekkens). A p-ontic view sees the mathematical wavefunction as reflective of some real physical aspect of the system, while a p-epistemic view instead sees it as encoding the observer's degrees of belief in facts about the world. In some variations, the wavefunction is a statistical characteristic of a collection and is not even mapped to a unique object.
[Compare E.T.Jaynes : "our present QM formalism is not purely epistemological; it is a peculiar mixture describing in part realities of Nature, in part incomplete human information about Nature — all scrambled up by Heisenberg and Bohr into an omelette that nobody has seen how to unscramble."]

Thus, a p-epistemic theory would try to circumvent paradoxes such as "But where is the real physical object present under superposition? Is the cat dead or alive or both or neither?" by positing that the wavefunction only represents our information about where the object is, and thus if the wavefunction is spread out, it doesn't correspond to a point particle being mysteriously "smeared across reality" but that our knowledge regarding its whereabouts has become diffuse.

Most interestingly, the PBR theorem from 2012 tightens the screws in favor of the p-ontic theories. Their claim is "any model in which a quantum state represents mere information about an underlying physical state of the system, and in which systems that are prepared independently have independent physical states, must make predictions which contradict those of quantum theory. The result is in the same spirit as Bell’s theorem, which states that no local theory can reproduce the predictions of quantum theory."

Their idea is to take two collections of electrons prepared in different (non-orthogonal pure) quantum states (and hence described by different wavefunctions) and ask : Is every single electron in the first physically different from every electron in the second?
By the p-epistemic approach, it need not be, since a physically unique electron doesn't need an associated unique wavefunction, which is "merely" a representation of our ignorance. Per p-ontic approach, they are necessarily different.

So given a pair of electrons, each in one of the two possible states (X/Y), we have four combinations (XX, XY, YX, YY) which all might correspond to the same underlying physical state (SS) as per a p-epistemic theory, and thus all predictions of QM on SS must be compatible with all quantum states assigned to it.
PBR come up with an experimental procedure that can produce four possible outcomes for a given input pair, such that each outcome is incompatible (i.e. QM gives it probability 0) with one of the four combinations. In other words, no single underlying state can produce results consistent with all four preparations, so distinct quantum states must correspond to physically distinct states of affairs.
This argument can be extended for any preparations with different wavefunctions, with appropriate experimental setups.
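
For the concrete pair of preparations PBR start with, |0> and |+>, the argument can be checked numerically. A small Python sketch (the measurement basis below is my reconstruction of the one in the published PBR paper, so treat the exact states as an assumption) verifies that each of the four outcomes has probability 0 for exactly one of the four product preparations :

import numpy as np

z0 = np.array([1.0, 0.0]); z1 = np.array([0.0, 1.0])
plus = (z0 + z1) / np.sqrt(2); minus = (z0 - z1) / np.sqrt(2)

# The four ways of preparing the pair of systems
preparations = {
    "00": np.kron(z0, z0), "0+": np.kron(z0, plus),
    "+0": np.kron(plus, z0), "++": np.kron(plus, plus),
}

# The four (entangled) measurement outcomes
outcomes = [
    (np.kron(z0, z1) + np.kron(z1, z0)) / np.sqrt(2),
    (np.kron(z0, minus) + np.kron(z1, plus)) / np.sqrt(2),
    (np.kron(plus, z1) + np.kron(minus, z0)) / np.sqrt(2),
    (np.kron(plus, minus) + np.kron(minus, plus)) / np.sqrt(2),
]

for i, out in enumerate(outcomes):
    probs = {k: round(abs(out @ prep) ** 2, 3) for k, prep in preparations.items()}
    print("outcome", i + 1, probs)   # each row contains exactly one zero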

p-epistemic theorists accept PBR while denying that one or more of its crucial assumptions holds true in their formulation.
e.g. In the paper "FAQBism", defenders of QBism draw the difference between PBR's notion of "epistemic belief in an ontic variable" and "belief assigned to one's personal experiences upon interactions with the world". Thus PBR applies when one associates the quantum state with a real physical property, while QBism is safe since it interprets quantum states as representing probabilities about the measuring agent's own future observations.
Their motivation for their own approach is Bell's (experimentally confirmed) invalidation of "local realism" : one must either change one's stance on locality, which they find deeply problematic, or revise realism, which they choose to do by denying objective physical reality to the quantum state in the first place.

What open questions do we have regarding the early universe?

We want explanations of the parameter values needed at the early epochs to lead to current cosmic structure. These include :
Expansion rate
Proportion of atoms, dark matter, dark energy, radiation
Character of fluctuations
Values of fundamental constants of physics

Let Q denote the 'roughness' of the fluctuations in the early microscopic universe that gives rise to present structures observed. Then for
Q << 10^-5 : there would be no stars as the universe would be too smooth for the amorphous hydrogen to condense and gravitate
Q >> 10^-5 : there would be too much aggregation early on in the expansion and the universe would be filled with black holes of the size of galactic clusters
Thus Q must be in a small range around 10^-5 for stars, planets, people to exist. Why does our universe have this value?

Gravity is far weaker than the other forces, which are either short-ranged or cancel out because opposite charges balance in massive bodies. It is the only force dominating at large distances, and its attraction causes mass to aggregate, leading to star formation, galaxies and clusters. But it can't be too strong, otherwise mass even at small scales would be crushed together and complex life, which is usually oblivious to gravity, wouldn't emerge.
Why is the value of the gravitational fine structure constant not too low and not too high?

Another conundrum is whether the laws governing the universe are derivable from a more fundamental principle, i.e. explainable from purely physical considerations, or whether we should appeal to anthropic reasoning to rest the case : the current laws are just one instance of an "ensemble of possibilities", and we happen to live in the happy accident where the laws are observed to be what they are because they are necessary for conscious observers like us.

Woody Allen : "Eternity is very long, especially towards the end!"

Will the expansion go on forever?
Calculations suggest a critical density of around 5 atoms per cubic meter on average is needed to halt the expansion and eventually reverse it. Current observations of visible matter produce a value of only about 0.2 atoms per cubic meter.

However, the visible universe is not everything : dark matter is hypothesized to explain the stability of galaxies and clusters, which in the absence of this missing mass would have disintegrated rather than retained their shapes. The gravitational lensing of the light reaching us from other galaxies also requires far more mass than is visible to explain the extent of the distortion we observe.

Let S = actual density / critical density. Then taking the proposed dark matter into account, we reach S = 0.3, which should make the geometry of the universe hyperbolic. In such a universe, distant objects look smaller than they would in Euclidean space. However, using the relation between the temperature fluctuations in the CMBR and the geometry of the universe, we conclude the universe is flat. Thus something else must contribute the remaining ~70% of the density (the dark energy discussed below) to account for the observed flatness.
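
In standard notation (my own summary of the numbers quoted above; Omega is the ratio called S here) :

\[
\Omega \equiv \frac{\rho}{\rho_{\text{crit}}}, \qquad
\Omega_{\text{atoms}} \approx 0.04, \qquad
\Omega_{\text{matter, incl. dark}} \approx 0.3, \qquad
\text{flatness} \Rightarrow \Omega_{\text{total}} \approx 1
\Rightarrow \Omega_{\Lambda} \approx 0.7 .
\]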

Furthermore, dark energy is posited to exist as the latent energy of empty space, whose high negative pressure has a repulsive effect strong enough to counteract gravity and cause a net accelerating expansion. This is verified by observing Type Ia supernovae, whose measured brightness acts as a gauge of their relative distances; the resulting redshift-distance relation confirms the accelerated expansion of the universe.

Einstein : "The most incomprehensible thing about the universe is that it is comprehensible."
John D. Barrow : "Any universe simple enough to be understood is too simple to produce a mind able to understand it."
Emerson Pugh : "If the human brain were so simple that we could understand it, we would be so simple that we couldn’t."

How is matter distributed in the universe on the largest scales?
The distribution of galaxies in the universe is not endlessly hierarchical (i.e. we see clusters and even superclusters but the sequence doesn't continue ad infinitum). There is a global smoothness that vastly simplifies analysis. Thus the universe is homogeneous and isotropic on the large scale.

The overall motion is also correspondingly simple. All galaxies recede from each other at a speed proportional to their distance. One can visualize this as occupying a node on a lattice connected by rods, with the rods stretching in every direction so that the separation between any pair of nodes grows at a speed proportional to the number of rods between them. There is no preferred center - one would make the same observations from any other node in the lattice. The expansion is accelerating and happens everywhere.
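
This proportionality is Hubble's law (the formula and the approximate value of the constant are standard, not quoted in the article) :

\[
v = H_0\, d, \qquad H_0 \approx 70 \ \text{km}\,\text{s}^{-1}\,\text{Mpc}^{-1}.
\]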

Since the speed of light is finite, the galaxies that we see farther away appear to us as they were in the distant past. They are crowded together, which implies that in the earlier stages of the universe its density, and hence temperature, were much higher than the current values. Extrapolating back to the very beginning gives us the "Big Bang", where everything emerges from a singularity. Note that this is not an "explosion in space-time" but an "expansion of space-time". There is no exterior to this event - it lies in the past of every event that has ever occurred, everywhere.

How do we know the Big Bang happened?
Before the galaxies had time to form, radiation permeated the universe in the earliest epochs. Today we observe intergalactic space filled with this cosmic microwave background radiation (CMBR) - an afterglow of creation. It demonstrates a blackbody spectrum, i.e. equilibrium at extreme heat and density, which gives credence to the big bang cosmological model. The subsequent expansion led to a cooling down and stretching of the wavelengths to their current values of about 3 kelvin (2.7 K) and microwave lengths. Together with the measured expansion rate, this places the Big Bang at roughly 13.8 billion years ago.

If the early universe was extremely hot and dense, why didn't all the hydrogen undergo fusion, as it does now in stars, to form bigger elements?
This is because the state of extreme heat and density didn't last long enough to exhaust all the fuel. The model predicts about 23% helium left over, along with small traces of deuterium and lithium, and this is matched by observations.

What dynamical principles lead to the large scale structures we see now?
Gravity clumps masses together and its strength falls as the inverse square of distance. In an expanding universe, if a region has slightly higher density, it exerts greater gravitational attraction and its expansion decelerates, while regions with less matter expand faster. Thus the density contrast is enhanced by gravitational amplification.

Collisions of galaxies produce instabilities in huge gas clouds, sending shockwaves travelling through them. This doesn't affect existing stars, owing to the vast interstellar distances, but it triggers gas condensation in the nebulae and new star formation.

This is an article published as part of the collected lectures given on Stephen Hawking's 60th birthday. It summarizes our knowledge regarding the origin and fate of the universe (as of 2003), the theory and evidence supporting our beliefs and the open questions yet to be resolved.

Some of us might have encountered similar expositions in e.g. Hawking's "Brief History of Time". Below I am providing a paraphrase that might act as a starting point for thoughts on astrophysics/cosmology.

Turing identifies the mechanical and chemical modes of analysis and focuses upon the latter, with morphogens as the unit of attention. The examples he provides hint at a functionalist approach - genes, hormones, skin pigments all fall under the category as long as they satisfy the relational properties of a morphogen as an evocator.

Straight off he makes one ponder an epistemological issue - how do different "layers of science" interact, i.e. strong vs. weak emergence? Can everything eventually be reduced to the dynamical laws operating over quantum fields (or whatever the lowest level might be), or do different conceptual classes like atoms -> molecules -> cells -> humans have independent explanatory power that contributes something original and not derivable, even in principle, from the other levels?

Morphogens diffuse into a tissue and react together, at times catalyzing the processes, which in turn lead to other morphogens, eventually resulting in a substance with a novel function. This development proceeds through diffusion from regions of higher to lower concentration, at a rate influenced by the properties of the substance (diffusibility) and of the environment (the concentration gradient). Additionally, cell walls act as a filter, passing some molecules more readily than others.

So our model is that of N cells and M morphogens reacting in them and diffusing across, and the state of the system is described by MN numbers i.e. concentration of each morphogen in each cell, evolving as per the underlying law. Note that if the concentrations move towards an extreme value that is not sustainable biologically, that means we will simply not observe it in nature.

We come to the central problem - how does an (almost) spherically symmetrical blastula evolve into a highly differentiated organism? Turing contends that there can be a large number of slight irregularities that nudge the system from its initial unstable equilibrium into a few stable configurations lacking the initial homogeneity. The cause can be random disturbances such as temperature fluctuations or cell growth.

This reminded me of "spontaneous symmetry breaking" in particle physics, which is used to explain the present form of the forces, e.g. the electroweak force, unified at high energies, which we observe at low energies as the separate electromagnetic and weak nuclear forces.

In our particular example, the rates at which different morphogens are produced, destroyed, transformed into one another and diffused across cells represent this drifting. He sets up a ring model of cells where diffusion is limited to a cell's left and right neighbors alongside intra-cell reactions, and proceeds to solve the differential equations for their concentrations under some simplifying assumptions (e.g. linearity of the chemical reaction rates, so that the system doesn't move too far from the initial homogeneity, and no external disturbances once the system has been provoked out of its stability).
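
A minimal numerical sketch of such a ring model in Python (the parameter values are my own choices in the Turing-instability regime, not Turing's): two morphogens react linearly in each of N cells and diffuse to the two neighboring cells; a tiny random perturbation of the homogeneous state grows into a stationary spatial wave because the inhibitor diffuses much faster than the activator.

import numpy as np

N = 64                                # number of cells on the ring
a, b, c, d = 0.5, -1.0, 1.0, -1.5     # linearized reaction matrix (assumed values)
Du, Dv = 0.2, 4.0                     # diffusion rates; Dv >> Du drives the instability
dt, steps = 0.01, 8000

rng = np.random.default_rng(0)
u = 1e-3 * rng.standard_normal(N)     # small random deviations from homogeneity
v = 1e-3 * rng.standard_normal(N)

def lap(x):
    # discrete Laplacian on the ring: left neighbor + right neighbor - 2*self
    return np.roll(x, 1) + np.roll(x, -1) - 2 * x

for _ in range(steps):
    du = a * u + b * v + Du * lap(u)
    dv = c * u + d * v + Dv * lap(v)
    u, v = u + dt * du, v + dt * dv

# Crude picture of the emerging stationary wave: cells where u is above average
print("".join("#" if ui > u.mean() else "." for ui in u))

(Being a linearized model, the amplitude keeps growing exponentially; Turing makes the same caveat and stops his analysis before nonlinear effects take over.)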

This leads to wave solutions in the concentrations, which can be stationary or travelling depending on the parameter values. He analyzes the various possibilities and gives biological examples that can be posited to emerge from them, such as dappled patterns, the recursive development of hydra's tentacles, woodruff's whorl of leaves, the polygonal symmetry of flowers etc. Computational simulations are also provided as supporting evidence.

Finally he generalizes from the ring to a spherical model, which is relevant to the gastrulation of the blastula. The concentration solutions are then given by spherical surface harmonics. The breakdown of homogeneity comes out to be axially symmetrical, and the growth can be amplified at one pole over the other, leading to organ formation.

Ana suggested I read this seminal paper by Turing, which uses reaction-diffusion theory to describe how chemical processes can lead to the spontaneous formation of biological patterns like stripes, spots, and other morphologies in organisms. As she said, "Turing provides a framework for understanding how patterns emerge in nature, bridging the gap between simple mathematical rules and the complex forms we observe." Its historical importance and the subsequent work it has inspired can be found in Philip Ball's wonderful book Shapes, the first in his trilogy on Nature's patterns.

The paper itself is freely available online. I am posting a paraphrase below that I jotted down as I read the paper, trying to capture the essence beneath the mathematical intricacies.

An important clarification by Maudlin is that the claim that "observation" magically changes the result of the experiment (in particular, destroying the double slit interference pattern) does not appear necessary if one follows the logic of wavefunction evolution and entanglement.

Which is to say - if one has already accepted that an electron is well described by modelling it as a wavefunction (i.e. an evolution of probability amplitudes which yields measurable outcomes via the Born rule), and further that the wavefunctions of two independent particles become entangled after they interact and thereafter evolve as a unified state, completely differently from how they did when initially separated, then the fact that the interference pattern of the electron disappears upon trying to ascertain which of the two slits it went through does not require any additional explanation. Decoherence is thus predicted by the conceptual framework we are already working under; there is no extra secret sauce to layer on top.
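
A small numerical illustration of this point (a toy example of my own, not from Maudlin's book): an electron in an equal superposition of "slit L" and "slit R" has off-diagonal terms in its density matrix, and those terms are what produce interference; entangling it with a which-path marker and then ignoring (tracing out) the marker kills them.

import numpy as np

L = np.array([1, 0], dtype=complex)
R = np.array([0, 1], dtype=complex)

psi = (L + R) / np.sqrt(2)                  # isolated electron: |L> + |R>
rho_isolated = np.outer(psi, psi.conj())    # off-diagonals = 0.5 -> interference

# Entangle with a two-state which-path marker: |L>|0> + |R>|1>
state = (np.kron(L, [1, 0]) + np.kron(R, [0, 1])) / np.sqrt(2)
rho_joint = np.outer(state, state.conj())

# Reduced density matrix of the electron alone: trace out the marker
rho_electron = rho_joint.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

print(np.round(rho_isolated.real, 3))   # [[0.5 0.5] [0.5 0.5]]
print(np.round(rho_electron.real, 3))   # [[0.5 0. ] [0.  0.5]] -> no interference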

He says :

"The more a given system interacts with other systems, the more entangled it becomes, and the more it tends to decohere. Experiments done on such a decohered system exhibit no interference. So if one takes interference to be the calling card of quantum theory, entanglement and decoherence make the world appear less quantum mechanical. But since the cause of the decoherence is entanglement, by Schrödinger’s lights, the observable interference disappears because the world is more quantum mechanical!
Entanglement and the consequent decoherence explain why we do not encounter quantum interference effects in everyday life. Avoiding decoherence requires severely limiting the interactions a system has with its environment (and even with parts of itself). Such isolation usually requires carefully prepared laboratory conditions."

Not surprisingly, he has little patience for the standard idea of "a measurement collapsing the wavefunction into one of the eigenstates of the Hermitian operator", since a measurement is ordinarily supposed to be an interaction with a system that yields information about features that existed prior to the interaction. Thus, I don't "collapse my weight" and produce it as a result of the "measurement" when I climb onto a scale - I use the correlation of the behavior of the apparatus with whatever was already present to find it. However, the recipe of applying the Born rule to calculate the probabilities of observable outcomes when a "measurement" happens leaves one in a conundrum about how to make sense of it in terms of how the word is usually understood.

It appears that the entire previous discussion thread was deleted by Goodreads, presumably because I had included links to external websites such as Wikipedia, YouTube and Physics Today.
So I am posting the references again just as text. Readers will have to look them up themselves.

Other notable books on philosophical explorations of QM : Speakable and Unspeakable in Quantum Mechanics (J. Bell), Quantum Mechanics and Experience (D. Albert), What Is Real? (Adam Becker).

Important (real and thought) experiments and discussions : Double Slit, Stern-Gerlach, Mach-Zehnder, Einstein-Podolsky-Rosen (EPR), Bell's inequalities and the tests that demonstrate that QM violates them, the Bohr-Einstein debates and Born-Einstein correspondences. (These are where the now famous "spooky action at a distance" and other quips were originally made)

N. David Mermin has published great articles in Nature, Physics Today etc. In particular, look up "Is the moon there when nobody looks?"

MIT 8.04 Quantum Physics I, Spring 2013 - First lecture from here, "Introduction to Superposition", is a wonderful introduction to the central paradox of QM.

Tim Maudlin and David Albert make multiple appearances, solo and with others, in the podcasts by Robinson Erhardt, Curt Jaimungal and Sean Carroll. Searching their names along with "Quantum Mechanics" will link to the relevant episodes on YouTube / Spotify.

The single most popular and authoritative blog for Quantum Mechanics and Computing is Scott Aaronson's Shtetl-Optimized