SnakeByte[17] The Metropolis Algorithm

Frame Grab From the movie Metropolis 1927

Who told you to attack the machines, you fools? Without them you’ll all die!!

~ Grot, the Guardian of the Heart Machine

First, as always, Oh Dear Reader, i hope you are safe. There are many unsafe places in and around the world in this current time. Second, this blog is a SnakeByte[] based on something that i knew about but had no idea it was called this by this name.

Third, relative to this, i must confess, Oh, Dear Reader, i have a disease of the bibliomaniac kind. i have an obsession with books and reading. “They” say that belief comes first, followed by admission. There is a Japanese word that translates to having so many books you cannot possibly read them all. This word is tsundoku. From the website (if you click on the word):

“Tsundoku dates from the Meiji era, and derives from a combination of tsunde-oku (to let things pile up) and dokusho (to read books). It can also refer to the stacks themselves. Crucially, it doesn’t carry a pejorative connotation, being more akin to bookworm than an irredeemable slob.”

Thus, while perusing a math-related book site, i came across a monograph entitled “The Metropolis Algorithm: Theory and Examples” by C Douglas Howard [1].

i was intrigued, and because it was 5 bucks (Side note: i always try to buy used and loved books), i decided to throw it into the virtual shopping buggy.

Upon receiving said monograph, i sat down to read it, and i was amazed to find it was closely related to something I was very familiar with from decades ago. This finally brings us to the current SnakeByte[].

The Metropolis Algorithm is a method in computational statistics used to sample from complex probability distributions. It is a type of Markov Chain Monte Carlo (MCMC) algorithm (i had no idea), which relies on Markov Chains to generate a sequence of samples that can approximate a desired distribution, even when direct sampling is complex. Yes, let me say that again – i had no idea. Go ahead LazyWebTM laugh!

So let us start with how the Metropolis Algorithm and how it relates to Markov Chains. (Caveat Emptor: You will need to dig out those statistics books and a little linear algebra.)

Markov Chains Basics

A Markov Chain is a mathematical system that transitions from one state to another in a state space. It has the property that the next state depends only on the current state, not the sequence of states preceding it. This is called the Markov property. The algorithm was introduced by Metropolis et al. (1953) in a Statistical Physics context and was generalized by Hastings (1970). It was considered in the context of image analysis (Geman and Geman, 1984) and data augmentation (Tanner (I’m not related that i know of…) and Wong, 1987). However, its routine use in statistics (especially for Bayesian inference) did not take place until Gelfand and Smith (1990) popularised it. For modern discussions of MCMC, see e.g. Tierney (1994), Smith and Roberts (1993), Gilks et al. (1996), and Roberts and Rosenthal (1998b).

Ergo, the name Metropolis-Hastings algorithm. Once again, i had no idea.

Anyhow,

A Markov Chain can be described by a set of states S and a transition matrix P , where each element P_{ij} represents the probability of transitioning from state i to state j .

Provide The Goal: Sampling from a Probability Distribution \pi(x)

In many applications (e.g., statistical mechanics, Bayesian inference, as mentioned), we are interested in sampling from a complex probability distribution \pi(x). This distribution might be difficult to sample from directly, but we can use a Markov Chain to create a sequence of samples that, after a certain period (called the burn-in period), will approximate \pi(x) .

Ok Now: The Metropolis Algorithm

The Metropolis Algorithm is one of the simplest MCMC algorithms to generate samples from \pi(x). It works by constructing a Markov Chain whose stationary distribution is the desired probability distribution \pi(x) . A stationary distribution is a probability distribution that remains the same over time in a Markov chain. Thus it can describe the long-term behavior of a chain, where the probabilities of being in each state do not change as time passes. (Whatever time is, i digress.)

The key steps of the algorithm are:

Initialization

Start with an initial guess x_0 , a point in the state space. This point can be chosen randomly or based on prior knowledge.

Proposal Step

From the current state x_t , propose a new state x^* using a proposal distribution q(x^*|x_t) , which suggests a candidate for the next state. This proposal distribution can be symmetric (e.g., a normal distribution centered at x_t ) or asymmetric.

Acceptance Probability

Calculate the acceptance probability \alpha for moving from the current state x_t to the proposed state x^* :

    \[\alpha = \min \left(1, \frac{\pi(x^) q(x_t | x^)}{\pi(x_t) q(x^* | x_t)} \right)\]

In the case where the proposal distribution is symmetric (i.e., q(x^|x_t) = q(x_t|x^)), the formula simplifies to:

    \[\alpha = \min \left(1, \frac{\pi(x^*)}{\pi(x_t)} \right)\]

Acceptance or Rejection

Generate a random number u from a uniform distribution U(0, 1)
If u \leq \alpha , accept the proposed state x^* , i.e., set x_{t+1} = x^* .
If u > \alpha , reject the proposed state and remain at the current state, i.e., set x_{t+1} = x_t .

Repeat

Repeat the proposal, acceptance, and rejection steps to generate a Markov Chain of samples.

Convergence and Stationary Distribution:

Over time, as more samples are generated, the Markov Chain converges to a stationary distribution. The stationary distribution is the target distribution \pi(x) , meaning the samples generated by the algorithm will approximate \pi(x) more closely as the number of iterations increases.

Applications:

The Metropolis Algorithm is widely used in various fields such as Bayesian statistics, physics (e.g., in the simulation of physical systems), machine learning, and finance. It is especially useful for high-dimensional problems where direct sampling is computationally expensive or impossible.

Key Features of the Metropolis Algorithm:

  • Simplicity: It’s easy to implement and doesn’t require knowledge of the normalization constant of \pi(x) , which can be difficult to compute.
  • Flexibility: It works with a wide range of proposal distributions, allowing the algorithm to be adapted to different problem contexts.
  • Efficiency: While it can be computationally demanding, the algorithm can provide high-quality approximations to complex distributions with well-chosen proposals and sufficient iterations.

The Metropolis-Hastings Algorithm is a more general version that allows for non-symmetric proposal distributions, expanding the range of problems the algorithm can handle.

Now let us code it up:

i am going to assume the underlying distribution is Gaussian with a time-dependent mean \mu_t, which changes slowly over time. We’ll use a simple time-series analytics setup to sample this distribution using the Metropolis Algorithm and plot the results. Note: When the target distribution is Gaussian (or close to Gaussian), the algorithm can converge more quickly to the true distribution because of the symmetric smooth nature of the normal distribution.

import numpy as np
import matplotlib.pyplot as plt

# Time-dependent mean function (example: sinusoidal pattern)
def mu_t(t):
    return 10 * np.sin(0.1 * t)

# Target distribution: Gaussian with time-varying mean mu_t and fixed variance
def target_distribution(x, t):
    mu = mu_t(t)
    sigma = 1.0  # Assume fixed variance for simplicity
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Metropolis Algorithm for time-series sampling
def metropolis_sampling(num_samples, initial_x, proposal_std, time_steps):
    samples = np.zeros(num_samples)
    samples[0] = initial_x

    # Iterate over the time steps
    for t in range(1, num_samples):
        # Propose a new state based on the current state
        x_current = samples[t - 1]
        x_proposed = np.random.normal(x_current, proposal_std)

        # Acceptance probability (Metropolis-Hastings step)
        acceptance_ratio = target_distribution(x_proposed, time_steps[t]) / target_distribution(x_current, time_steps[t])
        acceptance_probability = min(1, acceptance_ratio)

        # Accept or reject the proposed sample
        if np.random.rand() < acceptance_probability:
            samples[t] = x_proposed
        else:
            samples[t] = x_current

    return samples

# Parameters
num_samples = 10000  # Total number of samples to generate
initial_x = 0.0      # Initial state
proposal_std = 0.5   # Standard deviation for proposal distribution
time_steps = np.linspace(0, 1000, num_samples)  # Time steps for temporal evolution

# Run the Metropolis Algorithm
samples = metropolis_sampling(num_samples, initial_x, proposal_std, time_steps)

# Plot the time series of samples and the underlying mean function
plt.figure(figsize=(12, 6))

# Plot the samples over time
plt.plot(time_steps, samples, label='Metropolis Samples', alpha=0.7)

# Plot the underlying time-varying mean (true function)
plt.plot(time_steps, mu_t(time_steps), label='True Mean \\mu_t', color='red', linewidth=2)

plt.title("Metropolis Algorithm Sampling with Time-Varying Gaussian Distribution")
plt.xlabel("Time")
plt.ylabel("Sample Value")
plt.legend()
plt.grid(True)
plt.show()

Output of Python Script Figure 1.0

Ok, What’s going on here?

For the Target Distribution:

The function mu_t(t) defines a time-varying mean for the distribution. In this example, it follows a sinusoidal pattern.
The function target_distribution(x, t) models a Gaussian distribution with mean \mu_t and a fixed variance (set to 1.0).


Metropolis Algorithm:

The metropolis_sampling function implements the Metropolis algorithm. It iterates over time, generating samples from the time-varying distribution. The acceptance probability is calculated using the target distribution at each time step.


Proposal Distribution:

A normal distribution centered around the current state with standard deviation proposal_std is used to propose new states.


Temporal Evolution:

The time steps are generated using np.linspace to simulate temporal evolution, which can be used in time-series analytics.


Plot The Results:

The results are plotted, showing the samples generated by the Metropolis algorithm as well as the true underlying mean function \mu_t (in red).

The plot shows the Metropolis samples over time, which should cluster around the time-varying mean \mu_t of the distribution. As time progresses, the samples follow the red curve (the true mean) as time moves on like and arrow in this case.

Now you are probably asking “Hey is there a more pythonic library way to to this?”. Oh Dear Reader i am glad you asked! Yes There Is A Python Library! AFAIC PyMC started it all. Most probably know it as PyMc3 (formerly known as…). There is a great writeup here: History of PyMc.

We are golden age of probabilistic programming.

~ Chris Fonnesbeck (creator of PyMC) 

Lets convert it using PyMC. Steps to Conversion:

  1. Define the probabilistic model using PyMC’s modeling syntax.
  2. Specify the Gaussian likelihood with the time-varying mean \mu_t .
  3. Use PyMC’s built-in Metropolis sampler.
  4. Visualize the results similarly to how we did earlier.
import pymc as pm
import numpy as np
import matplotlib.pyplot as plt

# Time-dependent mean function (example: sinusoidal pattern)
def mu_t(t):
    return 10 * np.sin(0.1 * t)

# Set random seed for reproducibility
np.random.seed(42)

# Number of time points and samples
num_samples = 10000
time_steps = np.linspace(0, 1000, num_samples)

# PyMC model definition
with pm.Model() as model:
    # Prior for the time-varying parameter (mean of Gaussian)
    mu_t_values = mu_t(time_steps)

    # Observational model: Normally distributed samples with time-varying mean and fixed variance
    sigma = 1.0  # Fixed variance
    x = pm.Normal('x', mu=mu_t_values, sigma=sigma, shape=num_samples)

    # Use the Metropolis sampler explicitly
    step = pm.Metropolis()

    # Run MCMC sampling with the Metropolis step
    samples_all = pm.sample(num_samples, tune=1000, step=step, chains=5, return_inferencedata=False)

# Extract one chain's worth of samples for plotting
samples = samples_all['x'][0]  # Taking only the first chain

# Plot the time series of samples and the underlying mean function
plt.figure(figsize=(12, 6))

# Plot the samples over time
plt.plot(time_steps, samples, label='PyMC Metropolis Samples', alpha=0.7)

# Plot the underlying time-varying mean (true function)
plt.plot(time_steps, mu_t(time_steps), label='True Mean \\mu_t', color='red', linewidth=2)

plt.title("PyMC Metropolis Sampling with Time-Varying Gaussian Distribution")
plt.xlabel("Time")
plt.ylabel("Sample Value")
plt.legend()
plt.grid(True)
plt.show()

When you execute this code you will see the following status bar:

It will be a while. Go grab your favorite beverage and take a walk…..

Output of Python Script Figure 1.1

Key Differences from the Previous Code:

PyMC Model Usage Definition:
In PyMC, the model is defined using the pm.Model() context. The x variable is defined as a Normal distribution with the time-varying mean \mu_t . Instead of manually implementing the acceptance probability, PyMC handles this automatically with the specified sampler.

Metropolis Sampler:
PyMC allows us to specify the sampling method. Here, we explicitly use the Metropolis algorithm with pm.Metropolis().

Samples Parameter:
We specify shape=num_samples in the pm.Normal() distribution to indicate that we want a series of samples for each time step.

Plotting:
The resulting plot will show the sampled values using the PyMC Metropolis algorithm compared with the true underlying mean, similar to the earlier approach. Now, samples has the same shape as time_steps (in this case, both with 10,000 elements), allowing you to plot the sample values correctly against the time points; otherwise, the x and y axes would not align.

NOTE: We used this library at one of our previous health startups with great success.

Optimizations herewith include several. There is a default setting in PyMC which is called NUTS.
No need to manually set the number of leapfrog steps. NUTS automatically determines the optimal number of steps for each iteration, preventing inefficient or divergent sampling. NUTS automatically stops the trajectory when it detects that the particle is about to turn back on itself (i.e., when the trajectory “U-turns”). A U-turn means that continuing to move in the same direction would result in redundant exploration of the space and inefficient sampling. When NUTS detects this, it terminates the trajectory early, preventing unnecessary steps. Also the acceptance rates on convergence are higher.

There are several references to this set of algorithms. It truly a case of both mathematical and computational elegance.

Of course you have to know what the name means. They say words have meanings. Then again one cannot know everything.

Until Then,

#iwishyouwater <- Of all places Alabama getting the memo From Helene 2024

𝕋𝕖𝕕 ℂ. 𝕋𝕒𝕟𝕟𝕖𝕣 𝕁𝕣. (@tctjr) / X

Music To Blog By: View From The Magicians Window, The Psychic Circle

References:

[1] The Metropolis Algorithm: Theory and Examples by C Douglas Howard

[2] The Metropolis-Hastings Algorithm: A note by Danielle Navarro

[3] Github code for Sample Based Inference by bashhwu

Entire Metropolis Movie For Your Viewing Pleasure. (AFAIC The most amazing Sci-Fi movie besides BladeRunner)

What Would Nash,Shannon,Turing, Wiener and von Neumann Think?

An image of the folks as mentioned above via the GAN de jour

First, as usual, i trust everyone is safe. Second, I’ve been “thoughting” a good deal about how the world is being eaten by software and, recently, machine learning. i personally have a tough time with using the words artificial intelligence.

What Would Nash, Shannon, Turing, Wiener, and von Neumann Think of Today’s World?

The modern world is a product of the mathematical and scientific brilliance of a handful of intellectual pioneers who happen to be whom i call the Horsemen of The Digital Future. i consider these humans to be my heroes and persons that i aspire to be whereas most have not accomplished one-quarter of the work product the humans have created for humanity. Among these giants are Dr. John Nash, Dr. Claude Shannon, Dr. Alan Turing, Dr. Norbert Wiener, and Dr. John von Neumann. Each of them, in their own way, laid the groundwork for concepts that now define our digital and technological age: game theory, information theory, artificial intelligence, cybernetics, and computing. But what would they think if they could see how their ideas, theories and creations have shaped the 21st century?

A little context.

John Nash: The Game Theorist

John Nash revolutionized economics, mathematics, and strategic decision-making through his groundbreaking work in game theory. His Nash Equilibrium describes how parties, whether they be countries, companies, or individuals, can find optimal strategies in competitive situations. Today, his work influences fields as diverse as economics, politics, and evolutionary biology. NOTE: Computational Consensus Not So Hard; Carbon (Human) Consensus Nigh Impossible.

The Nash equilibrium is the set of degradation strategies 

    \[(E_i^*,E_j^*)\]

 

such that, if both players adopt it, neither player can achieve a higher payoff by changing strategies. Therefore, two rational agents should be expected to pick the Nash equilibrium as their strategy.

If Nash were alive today, he would be amazed at how game theory has permeated decision-making in technology, particularly in algorithms used for machine learning, cryptocurrency trading, and even optimizing social networks. His equilibrium models are at the heart of competitive strategies used by businesses and governments alike. With the rise of AI systems, Nash might ponder the implications of intelligent agents learning to “outplay” human actors and question what ethical boundaries should be set when AI is used in geopolitical or financial arenas.

Claude Shannon: The Father of Information Theory

Claude Shannon’s work on information theory is perhaps the most essential building block of the digital age. His concept of representing and transmitting data efficiently set the stage for everything from telecommunications to the Internet as we know it. Shannon predicted the rise of digital communication and laid the foundations for the compression and encryption algorithms protecting our data. He also is the father of my favorite equation mapping the original entropy equation from thermodynamics to channel capacity:

    \[H=-1/N \sum_{i=1}^{N} P_i\,log_2\,P_i\]

The shear elegance and magnitude is unprecedented. If he were here, Shannon would witness the unprecedented explosion of data, quantities, and speeds far beyond what was conceivable in his era. The Internet of Things (IoT), big data analytics, 5G/6G networks, and quantum computing are evolutions directly related to his early ideas. He might also be interested in cybersecurity challenges, where information theory is critical in protecting global communications. Shannon would likely marvel at the sheer volume of information we produce yet be cautious of the potential misuse and the ethical quandaries regarding privacy, surveillance, and data ownership.

Alan Turing: The Architect of Artificial Intelligence

Alan Turing’s vision of machines capable of performing any conceivable task laid the foundation for modern computing and artificial intelligence. His Turing Machine is still a core concept in the theory of computation, and his famous Turing Test continues to be a benchmark in determining machine intelligence.

In today’s world, Turing would see his dream of intelligent machines realized—and then some. From self-driving cars to voice assistants like Siri and Alexa, AI systems are increasingly mimicking human cognition human capabilities in specific tasks like data analysis, pattern recognition, and simple problem-solving. While Turing would likely be excited by this progress, he might also wrestle with the ethical dilemmas arising from AI, such as autonomy, job displacement, and the dangers of creating highly autonomous AI systems as well as calling bluff on the fact that LLM systems do not reason in the same manner as human cognition on basing the results on probabilistic convex optimizations. His work on breaking the Enigma code might inspire him to delve into modern cryptography and cybersecurity challenges as well. His reaction-diffusion model called Turings Metapmorphsis equation, is foundational in explaining biological systems:

Turing’s reaction-diffusion system is typically written as a system of partial differential equations (PDEs):

    \[\frac{\partial u}{\partial t} &= D_u \nabla^2 u + f(u, v),\]


    \[\frac{\partial v}{\partial t} &= D_v \nabla^2 v + g(u, v),\]

where:

    \[\begin{itemize}\item $u$ and $v$ are concentrations of two chemical substances (morphogens),\item $D_u$ and $D_v$ are diffusion coefficients for $u$ and $v$,\item $\nabla^2$ is the Laplacian operator, representing spatial diffusion,\item $f(u, v)$ and $g(u, v)$ are reaction terms representing the interaction between $u$ and $v$.\end{itemize}\]

In addition to this, his contributions to cryptography and game theory alone are infathomable.
In his famous paper, Computing Machinery and Intelligence,” Turing posed the question, “Can machines think?” He proposed the Turing Test as a way to assess whether a machine can exhibit intelligent behavior indistinguishable from a human. This test has been a benchmark in AI for evaluating a machine’s ability to imitate human intelligence.

Given the recent advances made with large language models, I believe he would find it amusing, not that they think or reason.

Norbert Wiener: The Father of Cybernetics

Norbert Wiener’s theory of cybernetics explored the interplay between humans, machines, and systems, particularly how systems could regulate themselves through feedback loops. His ideas greatly influenced robotics, automation, and artificial intelligence. He wrote the books “Cybernetics” and “The Human Use of Humans”. During World War II, his work on the automatic aiming and firing of anti-aircraft guns caused Wiener to investigate information theory independently of Claude Shannon and to invent the Wiener filter. (The now-standard practice of modeling an information source as a random process—in other words, as a variety of noise—is due to Wiener.) Initially, his anti-aircraft work led him to write, with Arturo Rosenblueth and Julian Bigelow, the 1943 article ‘Behavior, Purpose and Teleology. He was also a complete pacifist. What was said about those who can hold two opposing views?

If Wiener were alive today, he would be fascinated by the rise of autonomous systems, from drones to self-regulated automated software, and the increasing role of cybernetic organisms (cyborgs) through advancements in bioengineering and robotic prosthetics. He, I would think, would also be amazed that we could do real-time frequency domain filtering based on his theories. However, Wiener’s warnings about unchecked automation and the need for human control over machines would likely be louder today. He might be deeply concerned about the potential for AI-driven systems to exacerbate inequalities or even spiral out of control without sufficient ethical oversight. The interaction between humans and machines in fields like healthcare, where cybernetics merges with biotechnology, would also be a keen point of interest for him.

John von Neumann: The Architect of Modern Computing

John von Neumann’s contributions span so many disciplines that it’s difficult to pinpoint just one. He’s perhaps most famous for his von Neumann architecture, the foundation of most modern computer systems, and his contributions to quantum mechanics and game theory. His visionary thinking on self-replicating machines even predated discussions of nanotechnology.

Von Neumann would likely be astounded by the ubiquity and power of modern computers. His architectural design is the backbone of nearly every device we use today, from smartphones to supercomputers. He would also find significant developments in quantum computing, aligning with his quantum mechanics work. As someone who worked on the Manhattan Project (also Opphenhiemer), von Neumann might also reflect on the dual-use nature of technology—the incredible potential of AI, nuclear power, and autonomous weapons to both benefit and harm humanity. His early concerns about the potential for mutual destruction could be echoed in today’s discussions on AI governance and existential risks.

What Would They Think Overall?

Together, these visionaries would undoubtedly marvel at how their individual contributions have woven into the very fabric of today’s society. The rapid advancements in AI, data transmission, computing power, and autonomous systems would be thrilling, but they might also feel a collective sense of responsibility to ask:

Where do we go from here?

Once again Oh Dear Reader You pre-empt me….

A colleague sent me this paper, which was the impetus for this blog:

My synopsis of said paper:


The Tensor as an Informational Resource” discusses the mathematical and computational importance of tensors as resources, particularly in quantum mechanics, AI, and computational complexity. The authors propose new preorders for comparing tensors and explore the notion of tensor rank and transformations, which generalize key problems in these fields. This paper is vital for understanding how the foundational work of Nash, Shannon, Turing, Wiener, and von Neumann has evolved into modern AI and quantum computing. Tensors offer a new frontier in scientific discovery, building on their theories and pushing the boundaries of computational efficiency, information processing, and artificial intelligence. It’s an extension of their legacy, providing a mathematical framework that could revolutionize our interaction with quantum information and complex systems. Fundamental to systems that appear to learn where the information-theoretic transforms are the very rosetta stone of how we perceive the world through perceptual filters of reality.

This shows the continuing relevance in ALL their ideas in today’s rapidly advancing AI and fluid computing technological landscape.

They might question whether today’s technology has outpaced ethical considerations and whether the systems they helped build are being used for the betterment of all humanity. Surveillance, privacy, inequality, and autonomous warfare would likely weigh heavily on their minds. Yet, their boundless curiosity and intellectual rigor would inspire them to continue pushing the boundaries of what’s possible, always seeking new answers to the timeless question of how to create the future we want and live better, more enlightened lives through science and technology.

Their legacy lives on, but so does their challenge to us: to use the tools they gave us wisely for the greater good of all.

Or would they be dismayed that we use all of this technology to make a powerpoint to save time so we can watch tik tok all day?

Until Then,

#iwishyouwater <- click and see folks who got the memo

𝕋𝕖𝕕 ℂ. 𝕋𝕒𝕟𝕟𝕖𝕣 𝕁𝕣. (@tctjr) / X

Music To blog by: Bach: Mass in B Minor, BWV 232. By far my favorite composer. The John Eliot Gardiner and Monterverdi Choir version circa 1985 is astounding.

Snake_Byte:[13] The Describe Function.

DALLE-2 Draws Describe

First i trust everyone is safe. Second i hope people are recovering somewhat from the SVB situation. We are at the end of a era, cycle or epoch; take your pick. Third i felt like picking a Python function that was simple in nature but very helpful.

The function is pandas.describe(). i’ve previously written about other introspection libraries like DABL however this is rather simple and in place. Actually i never had utilized it before. i was working on some other code as a hobby in the areas of transfer learning and was playing around with some data and decided to to use the breast cancer data form the sklearn library which is much like the iris data used for canonical modeling and comparison. Most machine learning is data cleansing and feature selection so lets start with something we know.

Breast cancer is the second most common cancer in women worldwide, with an estimated 2.3 million new cases in 2020. Early detection is key to improving survival rates, and machine learning algorithms can aid in diagnosing and treating breast cancer. In this blog, we will explore how to load and analyze the breast cancer dataset using the scikit-learn library in Python.

The breast cancer dataset is included in scikit-learn's datasets module, which contains a variety of well-known datasets for machine learning. The features describe the characteristics of the cell nuclei present in the image. We can load the dataset using the load_breast_cancer function, which returns a dictionary-like object containing the data and metadata about the dataset.

It has been surmised that machine learning is mostly data exploration and data cleaning.

from sklearn.datasets import load_breast_cancer
import pandas as pd

#Load the breast cancer dataset
data = load_breast_cancer()

The data object returned by load_breast_cancer contains the feature data and the target variable. The feature data contains measurements of 30 different features, such as radius, texture, and symmetry, extracted from digitized images of fine needle aspirate (FNA) of breast mass. The target variable is binary, with a value of 0 indicating a benign tumor and a value of 1 indicating a malignant tumor.

We can convert the feature data and target variable into a pandas dataframe using the DataFrame constructor from the pandas library. We also add a column to the dataframe containing the target variable.

#Convert the data to a pandas dataframe
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = pd.Series(data.target)

Finally, we can use the describe method of the pandas dataframe to get a summary of the dataset. The describe method returns a table containing the count, mean, standard deviation, minimum, and maximum values for each feature, as well as the count, mean, standard deviation, minimum, and maximum values for the target variable.

#Use the describe() method to get a summary of the dataset
print(df.describe())

The output of the describe method is as follows:

mean radius  mean texture  ...  worst symmetry      target
count   569.000000    569.000000  ...      569.000000  569.000000
mean     14.127292     19.289649  ...        0.290076    0.627417
std       3.524049      4.301036  ...        0.061867    0.483918
min       6.981000      9.710000  ...        0.156500    0.000000
25%      11.700000     16.170000  ...        0.250400    0.000000
50%      13.370000     18.840000  ...        0.282200    1.000000
75%      15.780000     21.800000  ...        0.317900    1.000000
max      28.110000     39.280000  ...        0.663800    1.000000

[8 rows x 31 columns]

From the summary statistics, we can see that the mean values of the features vary widely, with the mean radius ranging from 6.981 to 28.11 and the mean texture ranging from 9.71 to 39.28. We can also see that the target variable is roughly balanced, with 62.7% of the tumors being malignant.

Pretty nice utility.

Then again in looking at this data one would think we could get to first principles engineering and root causes and make it go away? This directly affects motherhood which i still believe is the hardest job in humanity. Makes you wonder where all the money goes?

Until then,

#iwishyouwater <- Free Diver Steph who is also a mom hunting pelagics on #onebreath

Muzak To Blog By Peter Gabriel’s “Peter Gabriels 3: Melt (remastered). He is coming out with a new album. Games Without Frontiers and Intruder are timeless. i applied long ago to work at Real World Studios and received the nicest rejection letter.

Snake_Byte[11] Linear Algebra, Matrices and Products – Oh My!

Algebra is the metaphysics of arithmetic.

~ John Ray
Looks Hard.

First, as always, i hope everyone is safe, Second, as i mentioned in my last Snake_Byte [] let us do something a little more technical and scientific. For context, the catalyst for this was a surprising discussion that came from how current machine learning interviews are being conducted and how the basics of the distance between two vectors have been overlooked. So this is a basic example and in the following Snake_Byte [] i promise to get into something a little more say carnivore.

With that let us move to some linear algebra. For those that don’t know what linear algebra is, i will refer you to the best book on the subject, Professor Gilbert Strang’s Linear Algebra and its Applications.

i am biased here; however, i do believe the two most important areas of machine learning and data science are linear algebra and probability, with optimization techniques coming in a close third.

So dear reader, please bear with me here. We will review a little math; maybe for some, this will be new, and for those that already know this, you can rest your glass-balls.

We denote x\in\mathbb(R)^N be N-dimensional vectors taking real numbers as their entries. For example:

\begin{bmatrix}\Huge 0 \\ 1 \\ 2 \end{bmatrix}

where \{a_i\} are the indices respectively. In this case [3].

An M-by-N matrix is denoted as X\in\mathbb(R)^N . The transpose of a matrix is denoted as X^T. A matrix X can be viewed according to its columns and its rows:

\begin{bmatrix}  0 & 1 & 2 \\ 3 & 4 & 5\\ 6 & 7 & 8 \\ \end{bmatrix}

where \{a_i_j\} are the row and column indices.

An array is a data structure in python programming that holds fix number of elements and these elements should be of the same data type. The main idea behind using an array of storing multiple elements of the same type. Most of the data structure makes use of an array to implement their algorithm. There is two important parts of the array:

  • Element: Each item stored in the array is called an element.
  • Index: Every element in the array has its own numerical value to identify the element.

Think of programming a loop, tuple, list,array,range or matrix:

from math import exp
v1 = [x, y] # list of variables
v2 = (-1, 2) # tuple of numbers
v3 = (x1, x2, x3) # tuple of variables

v4 = [exp(-i*0.1) for i in range(150)] #ye ole range loop

and check this out for a matrix:

import numpy as np
a = np.matrix('0 1:2 3')
print (a)
output: [[0 1]
 [2 3]]

which folks is why we like the Snake Language. Really that is about it for vectors and matrices. The theory is where you get into proofs and derivations which can save you a ton of time on optimizations.

So now let’s double click on some things that will make you sound cool at the parties or meetups.

A vector can be multiplied by a number. This number a is usually denoted as a scalar:

a\cdot (v_1,v_2) = (av_1,av_2)

Now given this one of the most fundamental aspects in all of machine-learning is the inner product, also called dot product, or scalar product, of two vectors, is a number. Most of all, machine learning algorithms have some form of a dot product somewhere within the depths of all the mathz. Nvidia GPUs are optimized for (you guessed it) dot products.

So how do we set this up? Multiplication of scalar a and a vector (v_1,\dots,v_{n-1}) yields:

(av_0,\dots,av_{n-1})

Ok good so far.

The inner or dot product of two n-vectors is defined as:

(u_0,\dots,u_{n-1})\cdot(v_0,\dots,v_{n-1}) = u_0v_0 +,\dots,+ u_{n-1}v_{n-1}

which, if you are paying attention yields:

(1)   \begin{equation*} = \sum_{j=0}^{N-1}{u_jv_j}\end{equation*}

Geometrically, the dot product of U and V equals the length of U times the length of V times the cosine of the angle between them:

\textbd{U}\cdot\textbf{V}=|\textbf{U}||\textbf{V}|\cos\theta

ok so big deal huh? yea, but check this out in the Snake_Language:

# dot product of two vectors
 
# Importing numpy module
import numpy as np
 
# Taking two scalar values
a = 5
b = 7
 
# Calculating dot product using dot()
print(np.dot(a, b))
output: 35

hey now!

# Importing numpy module
import numpy as np
 
# Taking two 2D array
# For 2-D arrays it is the matrix product
a = [[2, 1], [0, 3]]
b = [[1, 1], [3, 2]]
 
# Calculating dot product using dot()
print(np.dot(a, b))
output:[[5 4]
       [9 6]]

Mathematically speaking the inner product is a generalization of a dot product. As we said constructing a vector is done using the command np.array. Inside this command, one needs to enter the array. For a column vector, we write [[1],[2],[3]], with an outer [], and three inner [] for each entry. If the vector is a row vector, the one can omit the inner []’s by just calling np.array([1, 2, 3]).

Given two column vectors x and y, the inner product is computed via np.dot(x.T,y), where np.dot is the command for inner product, and x.T returns the transpose of x. One can also call np.transpose(x), which is the same as x.T.

 # Python code to perform an inner product with transposition
 import numpy as np
 x = np.array([[1],[0],[-1]])
 y = np.array([[3],[2],[0]]) 
 z = np.dot(np.transpose(x),y)
print (z) 


Yes, now dear read you now can impress your friends with your linear algebra and python prowess.

Note: In this case, the dot product is scale independent for actual purposes of real computation you must do something called a norm of a vector. i won’t go into the mechanics of this unless asked for further explanations on the mechanics of linear algebra. i will gladly go into pythonic examples if so asked and will be happy to write about said subject. Feel free to inquire in the comments below.

Unitl Then,

#iwishyouwater <- Nathan Florence with Kelly Slater at the Box. Watch.

tctjr.

Muzak to Blog By: INXS. i had forgotten how good of a band they were and the catalog. Michael Hutchinson, the lead singer, hung himself in a hotel room. Check out the song “By My Side”, “Dont Change” and “Never Tear Us Apart” and “To Look At You”. They weren’t afraid the take production chances.

Note[2]: i resurrected some very old content from a previous site i owned i imported the older blogs. Some hilarious. Some sad. Some infuriating. i’m shining them up. Feel free to look back in time.

Snake_Byte[6] Algorithm Complexity

The Lighter Side of Complexity - The Complexity Project
Your software design?

Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction.

E.F. Schumacher

First, i hope everyone is safe.

Second, i had meant this for reading over Thanksgiving but transparently I was having technical difficulties with \LATEX rendering and it appears that both MATHJAX and native LATEX are not working on my site. For those interested i even injected the MATHJAX code into my .php header. Hence i had to rewrite a bunch of stuff alas with no equations. Although for some reason unbenowst to me my table worked.

Third, Hey its time for a Snake_Byte [] !

In this installment, i will be discussing Algorithm Complexity and will be using a Python method that i previously wrote about in Snake_Byte[5]: Range.

So what is algorithm complexity?  Well, you may remember in your mathematics or computer science classes “Big Oh” notation.  For those that don’t know this involves both space and time complexity not to be confused with Space-Time Continuums.  

Let’s hit the LazyWeb and particularly Wikipedia:

“Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. It is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others collectively called Bachmann–Landau notation or asymptotic notation.”

— Wikipedia’s definition of Big O notation

Hmmm.   Let’s try to parse that a little better shall we?

So you want to figure out how slow or hopefully how fast your code is using fancy algebraic terms and terminology.  So you want to measure the algorithmic behavior as a function of two variables with time complexity and space complexity.  Time is both the throughput as well as how fast from t0-tni1 the algorithm operates.  Then we have space complexity which is literally how much memory (either in memory or persistent memory) the algorithms require as a function of the input.  As an added bonus you can throw around the word asymptotic:

From Dictionary.com

/ (ˌæsɪmˈtɒtɪk) / adjective. of or referring to an asymptote. (of a function, series, formula, etc) approaching a given value or condition, as a variable or an expression containing a variable approaches a limit, usually infinity.

Ergo asymptotic analysis means how the algorithm responds “to” or “with” values that approach ∞.

So “Hey what’s the asymptotic response of the algorithm?”

Hence we need a language that will allow us to say that the computing time, as a function of (n), grows ‘on the order of n3,’ or ‘at most as fast as n3,’ or ‘at least as fast as n *log*n,’ etc.

There are five symbols that are used in the language of comparing the rates of growth of functions they are the following five: ‘o’ (read ‘is little oh of’), O (read ‘is big oh of’), ‘θ’ (read ‘is theta of’), ‘∼’ (read ‘is asymptotically equal to’ or, irreverently, as ‘twiddles’), and Ω (read ‘is omega of’). It is interesting to note there are discrepancies amongst the ranks of computer science and mathematics as to the accuracy and validity of each. We will just keep it simple and say Big-Oh.

So given f(x) and g(x) be two functions of x. Where each of the five symbols above are intended to compare the rapidity of growth of f and g. If we say that f(x) = o(g(x)), then informally we are saying that f grows more slowly than g does when x is very large.

Let’s address the time complexity piece i don’t want to get philosophical on What is Time? So for now and this blog i will make the bounds it just like an arrow t(0) – t(n-1)

That said the analysis of the algorithm is for an order of magnitude not the actual running time. There is a python function called time that we can use to do an exact analysis for the running time.  Remember this is to save you time upfront to gain an understanding of the time complexity before and while you are designing said algorithm.

Most arithmetic operations are constant time; multiplication usually takes longer than addition and subtraction, and division takes even longer, but these run times don’t depend on the magnitude of the operands. Very large integers are an exception; in that case, the run time increases with the number of digits.

So for Indexing operations whether reading or writing elements in a sequence or dictionary are also constant time, regardless of the size of the data structure.

A for loop that traverses a sequence or dictionary is usually linear, as long as all of the operations in the body of the loop are constant time.

The built-in function sum is also linear because it does the same thing, but it tends to be faster because it is a more efficient implementation; in the language of algorithmic analysis, it has a smaller leading coefficient.

If you use the same loop to “add” a list of strings, the run time is quadratic because string concatenation is linear.

The string method join is usually faster because it is linear in the total length of the strings.

So let’s look at an example using the previous aforementioned range built-in function:

So this is much like the linear example above: The lowest complexity is O(1). When we have a loop:


k = 0
for i in range(n):
    for j in range(m):
        print(i)
        k=k+1

In this case for nested loops we multiply the time complexity thus O(n*m). it also works the same for a loop with time complexity (n) we call a function a function with time complexity (m). When calculating complexity we omit the constant regardless if its execution 5 or 100 times.

When you are performing an analysis look for worst-case boundary conditions or examples.

Linear O(n):

for i in range(n):
 if t[i] == 0:
   return 0
return 1

Quadratic O(n**2):

res = 0
for i in range (n):
   for in range (m):
      res += 1
return (res)

There are other types if time complexity like exponential time and factorial time. Exponential Time is O(2**n) and Factorial Time is O(n!).

For space complexity memory has a limit especially if you have ever chased down a heap allocation or trash collection bug. Like we said earlier there is no free lunch you either trade space for time or time for space. Data-driven architectures respond to the input size of the data. Thus the dimensionality of the input space needs to be addressed. If you have a constant number of variables: O(1). If you need to declare an array like using numpy for instance with (n) elements then you have linear space complexity O(n). Remember these are independent of the size of the problem.

For a great book on Algorithm Design and Analysis i highly recommend:

The Algorithm Design Manual by Steven S. Skiena (click it takes you to amazon)

It goes in-depth to growth rates and dominance relations etc `as it relates to graph algorithms, search and sorting as well as cryptographic functions.

There is also a trilogy of sorts called Algorithms Unlocked and Illuminated by Roughgarden and Cormen which are great and less mathematically rigorous if that is not your forte.

Well, i hope this gave you a taste. i had meant this to be a much longer and more in-depth blog however i need to fix this latex issue so i can properly address the matters at hand.

Until then,

#iwishyouwater <- Alexey Molchanov new world freedive record. He is a really awesome human.

Muzak To Blog By: Maddalena (Original Motion Picture Soundtrack) by the Maestro Ennio Morricone – Rest in Power Maestro i have spent many hours listening to your works.

Snake_Byte[4]: Random and PseudoRandom Numbers

Expose yourself to as much randomness as possible.


~ Ben Casnocha
Visualization of the algorithmic random data
A Visualization Of Randomness

First i trust everyone is safe.

Second it is WEDNESDAY and that must mean a Snake_Byte or you are working in a startup because every day is WEDNESDAY in a startup!

i almost didn’t get this one done because well life happens but i want to remain true to the goals herewith to the best of my ability.

So in today’s Snake_Byte we are going to cover Random and PseudoRandom Numbers.  i really liked this one because it was more in line with scientific computing and numerical optimization.

The random module in Python generates what is called pseudorandom numbers.  It is in the vernacular a pseudorandom number generator (PRNG).  This generation includes different types of distributions for said numbers. 

So what is a pseudorandom number:

“A pseudorandom number generator (PRNG), also known as a deterministic random bit generator, is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers.” ~ Wikipedia

The important aspect here is:  the properties approximate sequences of random numbers.  So this means that it is statistically random even though it was generated by a deterministic response.

While i have used the random module and have even generated various random number algorithms i learned something new in this blog.  The pseudorandom number generator in Python uses an algorithm called the Mersenne Twister algorithm.  The period of said algorithm is length 2**19937-1 for the 32 bit version and there is also a 64-bit version.  The underlying implementation in C is both fast and thread-safe. The Mersenne Twister is one of the most extensively tested random number generators in existence. One issue though is that due to the deterministic nature of the algorithm it is not suitable for cryptographic methods.

Let us delve down into some code into the various random module offerings, shall we?

i like using %system in Jupyter Lab to create an interactive session. First we import random. Lets look at random.random() which returns a uniform distribution and when multiplied by a integer bounds it within that distribution range:

%system
import random
for i in range (5):
    x = random.random() * 100
    print (x)
63.281889167063035
0.13679757425121286
47.697874648329
96.66882808709684
76.63300711554905

Next let us check out random.choice(seq) which returns a random element from the non-empty sequence seq. If seq is empty, raises IndexError:

for z in range (5):
mySurfBoardlist = ["longboard", "shortboard", "boogieboard"]
print(random.choice(mySurfBoardlist))
longboard
boogieboard
boogieboard
longboard
shortboard

Next let us look at random.randrange(startstop[, step]) which returns a randomly selected element from range(start, stop, step). This is equivalent to choice(range(start, stop, step)) but doesn’t actually build a range object.

ParameterDescription
startOptional. An integer specifying at which position to start.
Default 0
stopRequired. An integer specifying at which position to end.
stepOptional. An integer specifying the incrementation.
Default 1
random.ranrange parameters
for i in range (5): 
      print(random.randrange(10, 100,1))
84
21
94
91
87

Now let us move on to some calls that you would use in signal processing, statistics or machine learning. The first one is gauss(). gauss() returns a gaussian distribution using the following mathematics:

\Large f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)

Gaussian distribution (also known as normal distribution) is a bell-shaped curve (aka the bell curve), and it is assumed that during any measurement values will follow a normal distribution with an equal number of measurements above and below the mean value.

ParameterDescription
muthe mean
sigmathe standard deviation
returns a random gaussian distribution floating number
gauss() parameters
# import the required libraries 
import random 
import matplotlib.pyplot as plt 
#set the inline magic
%matplotlib inline   
# store the random numbers in a list 
nums = [] 
mu = 100
sigma = 50
    
for i in range(100000): 
    temp = random.gauss(mu, sigma) 
    nums.append(temp) 
        
# plot the distribution 
plt.hist(nums, bins = 500, ec="red") 
plt.show()
Gaussian Distribution in Red

There are several more parameters in the random module, setter functions, seed functions and very complex statistical functions. Hit stack overflow and give it a try! Also it doesn’t hurt if you dust off that probability and statistics textbook!

As a last thought which came first the framework of entropy or the framework of randomness? As well as is everything truly random? i would love to hear your thought in the comments!

Until then,

#iwishyouwater <- click here on this one!

tctjr

References:

Python In A Nutshell by Alex Martelli

M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator”, ACM Transactions on Modeling and Computer Simulation Vol. 8, No. 1, January pp.3–30 1998

Muzak To Muzak To Blog By:  Black Sabbath  – The End: Live In Birmingham

Snake_Byte[2]: Comparisons and Equality

Contrariwise, continued Tweedledee, if it was so, it might be, and if it were so, it would be; but as it isn’t, it ain’t. That’s logic!

TweedleDee
Algebra, trigonometry and mathematical logic lessons by Janetvr | Fiverr
It’s all rational isn’t it?

First, i trust everyone is safe.

Second, i am going to be pushing a blog out every Wednesday called Snake_Bytes.  This is the second one hot off the press.  Snake as in Python and Bytes as in well you get it. Yes, it is a bad pun but hey most are bad. 

i will pick one of the myriads of python based books i have in my library and randomly open it to a page.  No matter how basic or advanced i will start from there and i will create a short concise blog on said subject.  For some possibly many the content will be rather pedantic for others i hope you gain a little insight.  As a former professor told me “to know a subject in many ways is to know it well.”  Just like martial arts or music performing the basics hopefully makes everything else effortless at some point.

Ok so in today’s installment we have Comparison and Equality.

I suppose more philosophically what is the Truth?

All Python objects at some level respond to some form of comparisons such as a test for equality or a magnitude comparison or even binary TRUE and FALSE.

For all comparisons in Python, the language traverses all parts of compound objects until a result can be ascertained and this includes nested objects and data structures.  The traversal for data structures is applied recursively from left to right.  

So let us jump into some simple snippets there starting with lists objects.  

List objects compare all of their components automatically.

%system #command line majik in Jupyterlab
# same value with unique objects
A1 = [2, (‘b’, 3)] 
A2 = [2, (‘b’, 3)]

#Are they equivalent?  Same Objects?
A1 == A2, A1 is A2
(True, False)

 So what happened here?  A1 and A2 are assigned lists which in fact are equivalent but distinct objects.  

So for comparisons how does that work?

  •  The ==  tests value equivalence

Python recursively tests nested comparisons until a result is ascertained.

  • The is operator tests object identity

Python tests whether the two are really the same object and live at the same address in memory.

So let’s compare some strings, shall we?

StringThing1 = "water"
StringThing2 = "water"
StringThing1 == StringThing2, StringThing1 is StringThing2
(True, True)

Ok, what just happened?  We need to be very careful here and i have seen this cause some really ugly bugs when performing long-chained regex stuff with health data.  Python internally caches and reuses some strings as an optimization technique.  Here there is really just a single string ‘water’ in memory shared by S1, S2 thus the identity operator evaluates to True.

The workaround is thus:

StringThing1 = "i wish you water"
StringThing2 = "i wish you water"
StringThing1 == StringThing2,StringThing1 is StringThing2
(True, False)

Given the logic of this lets see how we have conditional logic comparisons.

I believe Python 2.5 introduced ternary operators.  Once again interesting word:

Ternary operators ternary means composed of three parts or three as a base.

The operators are the fabled if/else you see in almost all programming languages.

Whentrue if condition else whenfalse

The condition is evaluated first.  If condition is true the result is whentrue; otherwise the result is whenfalse.  Only one of the two subexpressions whentrue and whenfalse evaluates depending on the truth value of condition.

Stylistically you want to palace parentheses around the whole expression.

Example of operator this was taken directly out the Python docs with a slight change as i thought it was funny:

is_nice = True
state = "nice" if is_nice else "ain’t nice"
print(state)

Which also shows how Python treats True and False.

In most programming languages an integer 0 is FALSE and an integer 1 is TRUE.

However, Python looks at an empty data structure as False.  True and False as illustrated above are inherent properties of every object in Python.

So in general Python compares types as follows:

  • Numbers are compared by the relative magnitude
  • Non-numeric mixed types comparisons where ( 3 < ‘water’) doesn’t fly in Python 3.0  However they are allowed in Python 2.6 where they use a fixed arbitrary rule.  Same with sorts non-numeric mixed type collections cannot be sorted in Python 3.0
  • Strings are compared lexicographically (ok cool word what does it mean?). Iin mathematics, the lexicographic or lexicographical order is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a totally ordered set. In other words like a dictionary. Character by character where (“abc” < “ac”)
  • Lists and tuples are compared component by component left to right
  • Dictionaries are compared as equal if their sorted (key, value) lists are equal.  However relative magnitude comparisons are not supported in Python 3.0

With structured objects as one would think the comparison happens as though you had written the objects as literal and compared all the components one at a time left to right.  

Further, you can chain the comparisons such as:

a < b <= c < d

Which functionally is the same thing as:

a < b and b <= c and c < d

The chain form is more compact and more readable and evaluates each subexpression once at the most.

Being that most reading this should be using Python 3.0 a couple of words on dictionaries per the last commentary.  In Python 2.6 dictionaries supported magnitude comparisons as though you were comparing (key,value) lists.

In Python 3.0 magnitude comparisons for dictionaries are removed because they incur too much overhead when performing equality computations.  Python 3.0 from what i can gather uses an in-optimized scheme for equality comparisons.  So you write loops or compare them manually.  Once again no free lunch. The documentation can be found here: Ordering Comparisons in Python 3.0.

One last thing.  There is a special object called None.  It’s a special data type in Python in fact i think the only special data type.  None is equivalent to a Null pointer in C.  

This comes in handy if your list size is not known:

MyList = [None] * 50
Print (MyList)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

The output makes me think of a Monty Python skit. See what I did there? While the comparison to a NULL pointer is correct the way in which it allocates memory and doesn’t limit the size of the list it allocates presets an initial size to allow for future indexing assignments. In this way, it kind of reminds me of malloc in C.  Purist please don’t shoot the messenger. 

Well, i got a little long in the tooth as they say.  See what i did again?  Teeth, Snakes and Python.

See y’all next week.

Until Then,

#iwishyouwater

@tctjr

Muzak To Blog By: various tunes by : Pink Martini, Pixies, Steve Miller.

Computing The Human Condition – Project Noumena (Part 1)

“I am putting myself to the fullest possible use, which is all I think any conscious entity can ever hope to do.” ~ HAL 9000

“If you want to make the world a better place take a look at yourself and then make a change.” ~ MJ.

First and foremost with this blog i trust everyone is safe.  The world is in an interesting place, space, and time both physically and dare i say collectively – mentally.

A Laundry List

Introduction

This past week we celebrated  Earth Day.  i believe i heard it was the 50th year of Earth Day.  While I applaud the efforts and longevity for a day we should have Earth Day every day.  Further just “thoughting” about or tweeting about Earth Day – while it may wake up your posterior lobe of the pituitary gland and secret some oxytocin – creating the warm fuzzies for you it really doesn’t create an action for furthering Earth Day.  (much like typing /giphy YAY! In Slack).

 As such, i decided to embark on a multipart blog that i have been “thinking” about what i call an Ecological Computing System.  Then the more i thought about it why stop at Ecology?   We are able to model and connect essentially anything, we now have models for the brain that while are coarse-grained can account for gross behaviors, we have tons of data on buying habits and advertisement data and everything is highly mobile and distributed.  Machine learning which can optimize, classify and predict with extremely high dimensionality is no longer an academic exercise.  

Thus, i suppose taking it one step further from ecology and what would differentiate it from other efforts is that <IT>  would actually attempt to provide a compute framework that would compute The Human Condition.  I am going to call this effort Project Noumena.  Kant the eminent thinker of 18th century Germany defined Noumena as a thing as it is in itself, as distinct from a thing as it is knowable by the senses through phenomenal attributes and proposed that the experience was a product of the mind.

My impetus for this are manifold:

  • i love the air, water, trees, and animals,
  • i am an active water person,
  • i want my children’s children’s children to know the wonder of staring at the azure skies, azure oceans and purple mountains,
  • Maybe technology will assist us in saving us from The Human Condition.

Timing

i have waited probably 15+ years to write about this ideation of such a system mainly due to the technological considerations were nowhere near where they needed to be and to be extremely transparent no one seemed to really think it was an issue until recently.  The pandemic seems to have been a global wakeup call that in fact, Humanity is fragile.  There are shortages of resources in the most advanced societies.  Further due to the recent awareness that the pollution levels appear (reported) to be subsiding as a function in the reduction of humans’ daily involvement within the environment. To that point over the past two years, there appears to be an uptake of awareness in how plastics are destroying our oceans.  This has a coupling effect that with the pandemic and other environmental concerns there could potentially be a food shortage due to these highly nonlinear effects.   This uptake in awareness has mainly been due to the usage of technology of mobile computing and social media which in and of itself probably couldn’t have existed without plastics and massive natural resource consumption.  So i trust the irony is not lost there.   

From a technical perspective, Open source and Open Source Systems have become the way that software is developed.  For those that have not read The Cathedral and The Bazaar and In The Beginning Was The Command Line i urge you to do so it will change your perspective.

We are no longer hampered by the concept of scale in computing. We can also create a system that behaves at scale with only but a few human resources.  You can do a lot with few humans now which has been the promise of computing.

Distributed computing methods are now coming to fruition. We no longer think in terms of a monolithic operating system or in place machine learning. Edge computing and fiber networks are accelerating this at an astonishing rate.  Transactions now dictate trust. While we will revisit this during the design chapters of the blog I’ll go out on a limb here and say these three features are cogent to distributed system processing (and possibly the future of computing at scale).

  • Incentive models
  • Consensus models
  • Protocol models

We will definitely be going into the deeper psychological, mathematical, and technical aspects of these items.

Some additional points of interest and on timing.  Microsoft recently released press about a Planetary Computer and announced the position of Chief Ecology Officer.  While i do not consider Project Nuomena to be of the same system type there could be similarities on the ecological aspects which just like in open source creates a more resilient base to work.

The top market cap companies are all information theoretic-based corporations.  Humans that know the science, technology, mathematics and liberal arts are key to their success.  All of these companies are woven and interwoven into the very fabric of our physical and psychological lives.

Thus it is with the confluence of these items i believe the time is now to embark on this design journey.  We must address the Environment, Societal factors and the model of governance.

A mentor once told me one time in a land far away: “Timing is everything as long as you can execute.”  Ergo Timing and Execution Is Everything.

Goals

It is my goal that i can create a design and hopefully, an implementation that is utilizing computational means to truly assist in building models and sampling the world where we can adhere to goals in making small but meaningful changes that can be used within what i am calling the 3R’s:  recycle, redact, reuse.  Further, i hope with the proper incentive models in place that are dynamic it has a mentality positive feedback effect.  Just as in complexity theory a small change – a butterfly wings – can create hurricanes – in this case positive effect. 

Here is my overall plan. i’m not big on the process or gant charts.  I’ll be putting all of this in a README.md as well.  I may ensconce the feature sets etc into a trello or some other tracking mechanism to keep me focused – WebSphere feel free to make recommendations in the comments section:

Action Items:

  • Create Comparative Models
  • Create Coarse-Grained Attributes
  • Identify underlying technical attributes
  • Attempt to coalesce into an architecture
  • Start writing code for the above.

Preamble

Humanity has come to expect growth as a material extension of human behavior.  We equate growth with progress.  In fact, we use the term exponential growth as it is indefinitely positive.  In most cases for a fixed time interval, this means a doubling of the relevant system variable or variables.  We speak of growth as a function of gross national production.  In most cases, exponential growth is treacherous where there are no known or perceived limits.  It appears that humanity has only recently become aware that we do not have infinite resources.  Psychologically there is a clash between the exponential growth and the psychological or physical limit.  The only significance is the relevant (usually local) limit.  How does it affect me, us, and them?  This can be seen throughput most game theory practices – dominant choice.  The pattern of growth is not the surprise it is the collision of the awareness of the limit to the ever-increasing growth function is the surprise.

One must stop and ask: 

Q: Are progress (and capacity) and the ever-increasing function a positive and how does it relate to 2nd law of thermodynamics aka Entropy?  Must it always expand?

We are starting to see that our world can exert dormant forces that within our life can greatly affect our well being. When we approach the actual or perceived limit the forces which are usually negative begin to gain strength.

So given these aspects of why i’ll turn now to start the discussion.  If we do not understand history we cannot predict the future by inventing it or in most cases re-inventing it as it where.

I want to start off the history by referencing several books that i have been reading and re-reading on subjects of modeling the world, complexity, and models for collapse throughout this multipart blog.  We will be addressing issues concerning complex dynamics as are manifested with respect to attributes model types, economics, equality, and mental concerns.  

These core references are located at the end of the blog under references.  They are all hot-linked.  Please go scroll and check them out.  i’ll still be here.  i’ll wait.

Checked them out?  i know a long list. 

As you can see the core is rather extensive due to the nature of the subject matter.  The top three books are the main ones that have been the prime movers and guides of my thinking.  These three books i will refer to as The Core Trilogy:

World Dynamics

The Collapse of Complex Societies 

Six Sources of Collapse 

 As i mentioned i have been deeply thinking about all aspects of this system for quite some time. I will be mentioning several other texts and references along the continuum of creation of this design.

We will start by referencing the first book: World Dynamics by J.W. Forrestor.  World Dynamics came out of several meetings of the Rome Club a 75 person invite-only club founded by the President of Fiat.  The club set forth the following attributes for a dynamic model that would attempt to predict the future of the world:

  • Population Growth
  • Capital Investment
  • Geographical Space
  • Natural Resources
  • Pollution
  • Food Production

The output of this design was codified in a computer program called World3.  It has been running since the 1970s what was then termed a golden age of society in many cases.  All of these variables have been growing at an exponential rate. Here we see the model with the various attributes in action. There have been several criticisms of the models and also analysis which i will go into in further blogs. However, in some cases, the variants have been eerily accurate. The following plot is an output of the World3 model:

2060 does not look good

Issues Raised By World3 and World Dynamics

The issues raised by World3 and within the book World Dynamics are the following:

  • There is a strong undercurrent that technology might not be the savior of humankind
  • Industrialism (including medicine and public health) may be a more disturbing force than the population.  
  • We may face extreme psychological stress and pressures from a four-pronged dilemma via suppression of the modern industrial world.
  • We may be living in a “golden age” despite a widely acknowledged feeling of malaise.  
  • Exhtortions and programs directed at population control may be self-defeating.  Population control, if it works, would yield excesses thereby allowing further procreation.
  • Pollution and Population seem to oscillate whereas the high standard of living increases the production of food and material goods which outrun the population.  Agriculture as it hits a space limit and as natural resources reach a pollution limit then the quality of life falls in equalizing population.
  • There may be no realistic hope of underdeveloped countries reaching the same standard and quality of life as developed countries.  However, with the decline in developed countries, the underdeveloped countries may be equalized by that decline.
  • A society with a high level of industrialization may be unsustainable.  
  • From a long term 100 years hence it may be unwise for underdeveloped countries to seek the same levels of industrialization.  The present underdeveloped nations may be in better conditions for surviving the forthcoming pressures.  These underdeveloped countries would suffer far less in a world collapse.  

Fuzzy Human – Fuzzy Model

The human mind is amazing at identifying structures of complex situations. However, our experiences train us poorly for estimating the dynamic consequences of said complexities.  Our mind is also not very accurate at estimating ad hoc parts of the complexities and the variational outcomes.  

One of the problems with models is well it is just a model  The subject-observer reference could shift and the context shifts thereof.  This dynamic aspect needs to be built into the models.

Also while we would like to think that our mental model is accurate it is really quite fuzzy and even irrational in most cases.  Also attempting to generalize everything into a singular model parameter is exceedingly difficult.  It is very difficult to transfer one industry model onto another.  

In general parameterization of most of these systems is based on some perceptual model we have rationally or irrationally invented.  

When these models were created there was the consideration of modeling social mechanics of good-evil, greed – altruism, fears, goals, habits, prejudice, homeostasis, and other so-called human characteristics.  We are now at a level of science where we can actually model the synaptic impulse and other aspects that come with these perceptions and emotions.

There is a common cross-cutting construct in most complex models within this text that consists of and mainly concerned with the concept of feedback and how the non-linear relationships of these modeled systems feedback into one another.  System-wide thinking permeates the text itself.  On a related note from the 1940’s of which Dr Norbert Weiner and others such as Claude Shannon worked on ballistic tracking systems and coupled feedback both in a cybernetic and information-theoretic fashion of which he attributed the concept of feedback as one of the most fundamental operations in information theory.  This led to the extremely famous Weiner Estimation Filters.  Also, side note: Dr Weiner was a self-styled pacifist proving you can hold two very opposing views in the same instance whilst being successful at executing both ideals.   

Given that basic function of feedback, lets look at the principle structures.  Essentially the model states there will be levels and rates.  Rates are flows that cause levels to change.  Levels can accumulate the net level. Either addition or subtraction to that level.  The various system levels can in aggregate describe the system state at any given time (t).  Levels existing in all subsystems of existence.  These subsystems as you will see include but are not limited to financial, psychological, biological, and economic.   The reason that i say not limited to because i also believe there are some yet to be identified subsystems at the quantum level.  The differential or rate of flow is controlled by one or more systems.  All systems that have some Spatio-temporal manifestation can be represented by using the two variables levels and rates.  Thus with respect to the spatial or temporal variables, we can have a dynamic model.  

The below picture is the model that grew out of interest from the initial meetings of the Club of Rome.  The inaugural meeting which was the impetus for the model was held in Bern, Switzerland on June 29, 1970.  Each of the levels presents a variable in the previously mentioned major structures. System levels appear as right triangles.  Each level is increased or decreased by the respective flow.  As previously mentioned on feedback any closed path through the diagram is a feedback loop.  Some of the closed loops given certain information-theoretic attributes be positive feedback loops that generate growth and others that seek equilibrium will be negative feedback loops.  If you notice something about the diagram it essentially is a birth and death loop. The population loop if you will.  For the benefit of modeling, there are really only two major variables that affect the population.  Birth Rate (BR) and Death Rate (DR).  They represent the total aggregate rate at which the population is being increased or decreased.  The system has coefficients that can initialize them to normal rates.  For example, in 1970 BRN is taken as 0.0885 (88.5 per thousand) which is then multiplied by population to determine BR.  DRN by the same measure is the outflow or reduction.  In 1970 it was 9.5% or 0.095.  The difference is the net and called normal rates.  The normale rates correspond to a physical normal world.  When there are normal levels of food, material standard of living, crowding, and pollution.  The influencers are then multipliers that increase or decrease the normal rates.

Feedback and isomorphisms abound


As a caveat, there have been some detractors of this model. To be sure it is very coarse-grained however while i haven’t seen the latest runs or outputs it is my understanding as i said the current outputs are close. The criticisms come in the shape of “Well its just modeling everything as a y=x*e^{{rt}}. I will be using this concept and map if you will as the basis for Noumena.  The concepts and values as i evolve the system will vary greatly from the World3 model but i believe starting with a minimum viable product is essential here as i said humans are not very good at predicting all of the various outcomes in high dimensional space. We can asses situations very quickly but probably outcomes no so much. Next up we will be delving into the loops deeper and getting loopier.

So this is the first draft if you will as everything nowadays can be considered an evolutionary draft.  

Then again isn’t really all of this just  The_Inifinite_Human_Do_Loop?

until then,

#iwishyouwater

tctjr

References:

(Note: They are all hotlinked)

World Dynamics

The Collapse of Complex Societies 

Six Sources of Collapse 

Beyond The Limits 

The Limits To Growth 

Thinking In Systems Donella Meadows

Designing Distributed Systems Brendan Burns

Introduction to Distributed Algorithms 

A Pragmatic Introduction to Secure Multi-Party Computation 

Reliable Secure Distributed Programming 

Distributed Algorithms 

Dynamic General Equilibrium Modeling 

Advanced Information Systems Engineering 

Introduction to Dynamic Systems Modeling 

Nonlinear Dynamics and Chaos 

Technological Revolutions and Financial Capital 

Marginalism and Discontinuity 

How Nature Works 

Complexity and The Economy 

Complexity a Guided Tour

Future Shock 

Agent_Zero 

Nudge Theory In Action

The Structure of Scientific Revolutions

Agent-Based Modelling In Economics

Cybernetics

Human Use Of Human Beings

The Technological Society

The Origins Of Order

The Lorax

Blog Muzak: Brain and Roger Eno: Mixing Colors