SnakeByte[17] The Metropolis Algorithm

Frame Grab From the movie Metropolis 1927

Who told you to attack the machines, you fools? Without them you’ll all die!!

~ Grot, the Guardian of the Heart Machine

First, as always, Oh Dear Reader, i hope you are safe. There are many unsafe places in and around the world in this current time. Second, this blog is a SnakeByte[] based on something that i knew about but had no idea it was called by this name.

Third, relative to this, i must confess, Oh, Dear Reader, i have a disease of the bibliomaniac kind. i have an obsession with books and reading. “They” say that belief comes first, followed by admission. There is a Japanese word that translates to having so many books you cannot possibly read them all. This word is tsundoku. From the website (if you click on the word):

“Tsundoku dates from the Meiji era, and derives from a combination of tsunde-oku (to let things pile up) and dokusho (to read books). It can also refer to the stacks themselves. Crucially, it doesn’t carry a pejorative connotation, being more akin to bookworm than an irredeemable slob.”

Thus, while perusing a math-related book site, i came across a monograph entitled “The Metropolis Algorithm: Theory and Examples” by C Douglas Howard [1].

i was intrigued, and because it was 5 bucks (Side note: i always try to buy used and loved books), i decided to throw it into the virtual shopping buggy.

Upon receiving said monograph, i sat down to read it, and i was amazed to find it was closely related to something I was very familiar with from decades ago. This finally brings us to the current SnakeByte[].

The Metropolis Algorithm is a method in computational statistics used to sample from complex probability distributions. It is a type of Markov Chain Monte Carlo (MCMC) algorithm (i had no idea), which relies on Markov Chains to generate a sequence of samples that can approximate a desired distribution, even when direct sampling is complex. Yes, let me say that again – i had no idea. Go ahead LazyWebTM laugh!

So let us start with the Metropolis Algorithm and how it relates to Markov Chains. (Caveat Emptor: You will need to dig out those statistics books and a little linear algebra.)

Markov Chains Basics

A Markov Chain is a mathematical system that transitions from one state to another in a state space. It has the property that the next state depends only on the current state, not the sequence of states preceding it. This is called the Markov property. The algorithm was introduced by Metropolis et al. (1953) in a Statistical Physics context and was generalized by Hastings (1970). It was considered in the context of image analysis (Geman and Geman, 1984) and data augmentation (Tanner (I’m not related that i know of…) and Wong, 1987). However, its routine use in statistics (especially for Bayesian inference) did not take place until Gelfand and Smith (1990) popularised it. For modern discussions of MCMC, see e.g. Tierney (1994), Smith and Roberts (1993), Gilks et al. (1996), and Roberts and Rosenthal (1998b).

Ergo, the name Metropolis-Hastings algorithm. Once again, i had no idea.

Anyhow,

A Markov Chain can be described by a set of states S and a transition matrix P , where each element P_{ij} represents the probability of transitioning from state i to state j .
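
To make that concrete, here is a tiny two-state example (my own toy numbers, not from the monograph), where row i of P holds the probabilities of leaving state i; note the next state is drawn using only the current row:

import numpy as np

rng = np.random.default_rng(0)

# Toy two-state chain: row i gives the transition probabilities out of state i
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

state = 0
visits = np.zeros(2)
for _ in range(100_000):
    visits[state] += 1
    state = rng.choice(2, p=P[state])  # Markov property: depends only on the current state

print(visits / visits.sum())  # empirical time spent in each state, roughly [0.83, 0.17]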

The Goal: Sampling from a Probability Distribution \pi(x)

In many applications (e.g., statistical mechanics, Bayesian inference, as mentioned), we are interested in sampling from a complex probability distribution \pi(x). This distribution might be difficult to sample from directly, but we can use a Markov Chain to create a sequence of samples that, after a certain period (called the burn-in period), will approximate \pi(x) .

Ok Now: The Metropolis Algorithm

The Metropolis Algorithm is one of the simplest MCMC algorithms to generate samples from \pi(x). It works by constructing a Markov Chain whose stationary distribution is the desired probability distribution \pi(x) . A stationary distribution is a probability distribution that remains the same over time in a Markov chain. Thus it can describe the long-term behavior of a chain, where the probabilities of being in each state do not change as time passes. (Whatever time is, i digress.)
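
To see a stationary distribution numerically (a sketch using the same toy chain as above): repeatedly pushing any starting distribution through the transition matrix settles onto a vector that P no longer changes:

import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

p = np.array([1.0, 0.0])   # start with all probability mass on state 0
for _ in range(200):
    p = p @ P              # one step of the chain applied to the distribution itself

print(p)                   # ~[0.8333, 0.1667]; applying P again leaves it unchanged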

The key steps of the algorithm are:

Initialization

Start with an initial guess x_0 , a point in the state space. This point can be chosen randomly or based on prior knowledge.

Proposal Step

From the current state x_t , propose a new state x^* using a proposal distribution q(x^*|x_t) , which suggests a candidate for the next state. This proposal distribution can be symmetric (e.g., a normal distribution centered at x_t ) or asymmetric.

Acceptance Probability

Calculate the acceptance probability \alpha for moving from the current state x_t to the proposed state x^* :

    \[\alpha = \min \left(1, \frac{\pi(x^*) q(x_t | x^*)}{\pi(x_t) q(x^* | x_t)} \right)\]

In the case where the proposal distribution is symmetric (i.e., q(x^*|x_t) = q(x_t|x^*)), the formula simplifies to:

    \[\alpha = \min \left(1, \frac{\pi(x^*)}{\pi(x_t)} \right)\]
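
As a quick worked example (numbers chosen purely for illustration): suppose the proposal is symmetric, \pi(x^*) = 0.2, and \pi(x_t) = 0.5. Then \alpha = \min(1, 0.2/0.5) = 0.4, so the proposed move is accepted with probability 0.4 and otherwise we stay put. If instead \pi(x^*) \geq \pi(x_t), the ratio is at least 1 and the move is always accepted. Notice that any normalization constant in \pi cancels in the ratio.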

Acceptance or Rejection

Generate a random number u from a uniform distribution U(0, 1)
If u \leq \alpha , accept the proposed state x^* , i.e., set x_{t+1} = x^* .
If u > \alpha , reject the proposed state and remain at the current state, i.e., set x_{t+1} = x_t .

Repeat

Repeat the proposal, acceptance, and rejection steps to generate a Markov Chain of samples.

Convergence and Stationary Distribution:

Over time, as more samples are generated, the Markov Chain converges to a stationary distribution. The stationary distribution is the target distribution \pi(x) , meaning the samples generated by the algorithm will approximate \pi(x) more closely as the number of iterations increases.

Applications:

The Metropolis Algorithm is widely used in various fields such as Bayesian statistics, physics (e.g., in the simulation of physical systems), machine learning, and finance. It is especially useful for high-dimensional problems where direct sampling is computationally expensive or impossible.

Key Features of the Metropolis Algorithm:

  • Simplicity: It’s easy to implement and doesn’t require knowledge of the normalization constant of \pi(x) , which can be difficult to compute.
  • Flexibility: It works with a wide range of proposal distributions, allowing the algorithm to be adapted to different problem contexts.
  • Efficiency: While it can be computationally demanding, the algorithm can provide high-quality approximations to complex distributions with well-chosen proposals and sufficient iterations.

The Metropolis-Hastings Algorithm is a more general version that allows for non-symmetric proposal distributions, expanding the range of problems the algorithm can handle.

Now let us code it up:

i am going to assume the underlying distribution is Gaussian with a time-dependent mean \mu_t, which changes slowly over time. We’ll use a simple time-series analytics setup to sample this distribution using the Metropolis Algorithm and plot the results. Note: When the target distribution is Gaussian (or close to Gaussian), the algorithm can converge more quickly to the true distribution because of the symmetric smooth nature of the normal distribution.

import numpy as np
import matplotlib.pyplot as plt

# Time-dependent mean function (example: sinusoidal pattern)
def mu_t(t):
    return 10 * np.sin(0.1 * t)

# Target distribution: Gaussian with time-varying mean mu_t and fixed variance
def target_distribution(x, t):
    mu = mu_t(t)
    sigma = 1.0  # Assume fixed variance for simplicity
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Metropolis Algorithm for time-series sampling
def metropolis_sampling(num_samples, initial_x, proposal_std, time_steps):
    samples = np.zeros(num_samples)
    samples[0] = initial_x

    # Iterate over the time steps
    for t in range(1, num_samples):
        # Propose a new state based on the current state
        x_current = samples[t - 1]
        x_proposed = np.random.normal(x_current, proposal_std)

        # Acceptance probability (Metropolis-Hastings step)
        acceptance_ratio = target_distribution(x_proposed, time_steps[t]) / target_distribution(x_current, time_steps[t])
        acceptance_probability = min(1, acceptance_ratio)

        # Accept or reject the proposed sample
        if np.random.rand() < acceptance_probability:
            samples[t] = x_proposed
        else:
            samples[t] = x_current

    return samples

# Parameters
num_samples = 10000  # Total number of samples to generate
initial_x = 0.0      # Initial state
proposal_std = 0.5   # Standard deviation for proposal distribution
time_steps = np.linspace(0, 1000, num_samples)  # Time steps for temporal evolution

# Run the Metropolis Algorithm
samples = metropolis_sampling(num_samples, initial_x, proposal_std, time_steps)

# Plot the time series of samples and the underlying mean function
plt.figure(figsize=(12, 6))

# Plot the samples over time
plt.plot(time_steps, samples, label='Metropolis Samples', alpha=0.7)

# Plot the underlying time-varying mean (true function)
plt.plot(time_steps, mu_t(time_steps), label='True Mean \\mu_t', color='red', linewidth=2)

plt.title("Metropolis Algorithm Sampling with Time-Varying Gaussian Distribution")
plt.xlabel("Time")
plt.ylabel("Sample Value")
plt.legend()
plt.grid(True)
plt.show()

Output of Python Script Figure 1.0

Ok, What’s going on here?

For the Target Distribution:

The function mu_t(t) defines a time-varying mean for the distribution. In this example, it follows a sinusoidal pattern.
The function target_distribution(x, t) models a Gaussian distribution with mean \mu_t and a fixed variance (set to 1.0).


Metropolis Algorithm:

The metropolis_sampling function implements the Metropolis algorithm. It iterates over time, generating samples from the time-varying distribution. The acceptance probability is calculated using the target distribution at each time step.


Proposal Distribution:

A normal distribution centered around the current state with standard deviation proposal_std is used to propose new states.


Temporal Evolution:

The time steps are generated using np.linspace to simulate temporal evolution, which can be used in time-series analytics.


Plot The Results:

The results are plotted, showing the samples generated by the Metropolis algorithm as well as the true underlying mean function \mu_t (in red).

The plot shows the Metropolis samples over time, which should cluster around the time-varying mean \mu_t of the distribution. As time progresses, the samples track the red curve (the true mean), moving forward like an arrow in this case.

Now you are probably asking, “Hey, is there a more pythonic library way to do this?”. Oh Dear Reader, i am glad you asked! Yes There Is A Python Library! AFAIC PyMC started it all. Most probably know it as PyMC3 (formerly known as…). There is a great writeup here: History of PyMC.

We are in a golden age of probabilistic programming.

~ Chris Fonnesbeck (creator of PyMC) 

Let’s convert it using PyMC. Steps to Conversion:

  1. Define the probabilistic model using PyMC’s modeling syntax.
  2. Specify the Gaussian likelihood with the time-varying mean \mu_t .
  3. Use PyMC’s built-in Metropolis sampler.
  4. Visualize the results similarly to how we did earlier.
import pymc as pm
import numpy as np
import matplotlib.pyplot as plt

# Time-dependent mean function (example: sinusoidal pattern)
def mu_t(t):
    return 10 * np.sin(0.1 * t)

# Set random seed for reproducibility
np.random.seed(42)

# Number of time points and samples
num_samples = 10000
time_steps = np.linspace(0, 1000, num_samples)

# PyMC model definition
with pm.Model() as model:
    # Prior for the time-varying parameter (mean of Gaussian)
    mu_t_values = mu_t(time_steps)

    # Observational model: Normally distributed samples with time-varying mean and fixed variance
    sigma = 1.0  # Fixed variance
    x = pm.Normal('x', mu=mu_t_values, sigma=sigma, shape=num_samples)

    # Use the Metropolis sampler explicitly
    step = pm.Metropolis()

    # Run MCMC sampling with the Metropolis step
    samples_all = pm.sample(num_samples, tune=1000, step=step, chains=5, return_inferencedata=False)

# Extract a single draw of the x vector for plotting
samples = samples_all['x'][0]  # first draw: one sampled value per time step

# Plot the time series of samples and the underlying mean function
plt.figure(figsize=(12, 6))

# Plot the samples over time
plt.plot(time_steps, samples, label='PyMC Metropolis Samples', alpha=0.7)

# Plot the underlying time-varying mean (true function)
plt.plot(time_steps, mu_t(time_steps), label='True Mean \\mu_t', color='red', linewidth=2)

plt.title("PyMC Metropolis Sampling with Time-Varying Gaussian Distribution")
plt.xlabel("Time")
plt.ylabel("Sample Value")
plt.legend()
plt.grid(True)
plt.show()

When you execute this code you will see the following status bar:

It will be a while. Go grab your favorite beverage and take a walk…..

Output of Python Script Figure 1.1

Key Differences from the Previous Code:

PyMC Model Definition:
In PyMC, the model is defined using the pm.Model() context. The x variable is defined as a Normal distribution with the time-varying mean \mu_t . Instead of manually implementing the acceptance probability, PyMC handles this automatically with the specified sampler.

Metropolis Sampler:
PyMC allows us to specify the sampling method. Here, we explicitly use the Metropolis algorithm with pm.Metropolis().

Samples Parameter:
We specify shape=num_samples in the pm.Normal() distribution to indicate that we want a series of samples for each time step.

Plotting:
The resulting plot will show the sampled values using the PyMC Metropolis algorithm compared with the true underlying mean, similar to the earlier approach. Now, samples has the same shape as time_steps (in this case, both with 10,000 elements), allowing you to plot the sample values correctly against the time points; otherwise, the x and y axes would not align.

NOTE: We used this library at one of our previous health startups with great success.

Several optimizations are available here. The default sampler in PyMC is called NUTS (the No-U-Turn Sampler).
There is no need to manually set the number of leapfrog steps: NUTS automatically determines the optimal number of steps for each iteration, preventing inefficient or divergent sampling. NUTS stops the trajectory when it detects that the particle is about to turn back on itself (i.e., when the trajectory “U-turns”). A U-turn means that continuing to move in the same direction would result in redundant exploration of the space and inefficient sampling. When NUTS detects this, it terminates the trajectory early, preventing unnecessary steps. Acceptance rates at convergence are also higher than with plain Metropolis.
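
For comparison, here is a minimal sketch (assuming the same model context defined above and a recent PyMC version) of simply letting PyMC choose its default sampler; with no step argument, NUTS is selected for the continuous variable, and the returned InferenceData object can be summarized with ArviZ:

# Minimal sketch: drop the explicit Metropolis step and let PyMC pick NUTS.
# Assumes the `model` context from the code above has already been defined.
with model:
    idata = pm.sample(draws=2000, tune=1000, chains=4)  # NUTS chosen by default

import arviz as az
print(az.summary(idata, var_names=["x"]).head())  # convergence diagnostics per element of x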

There are several references to this set of algorithms. It is truly a case of both mathematical and computational elegance.

Of course you have to know what the name means. They say words have meanings. Then again one cannot know everything.

Until Then,

#iwishyouwater <- Of all places Alabama getting the memo From Helene 2024

𝕋𝕖𝕕 ℂ. 𝕋𝕒𝕟𝕟𝕖𝕣 𝕁𝕣. (@tctjr) / X

Music To Blog By: View From The Magicians Window, The Psychic Circle

References:

[1] The Metropolis Algorithm: Theory and Examples by C Douglas Howard

[2] The Metropolis-Hastings Algorithm: A note by Danielle Navarro

[3] Github code for Sample Based Inference by bashhwu

Entire Metropolis Movie For Your Viewing Pleasure. (AFAIC The most amazing Sci-Fi movie besides BladeRunner)

SnakeByte[16]: Enhancing Your Code Analysis with pyastgrep

Dalle 3’s idea of an Abstract Syntax Tree in R^3 space

If you would know strength and patience, welcome the company of trees.

~ Hal Borland

First, I hope everyone is safe. Second, I am changing my usual SnakeByte [] stance process. I am pulling this from a website I ran across. I saw the library mentioned, so I decided to pull from the LazyWebTM instead of the usual snake-based tomes I have in my library.

As a Python developer, understanding and navigating your codebase efficiently is crucial, especially as it grows in size and complexity. Trust me, it will, as does Entropy. Traditional search tools like grep or IDE-based search functionalities can be helpful, but they often cannot “understand” the structure of Python code – sans some of the Co-Pilot developments. (I’m using understand here *very* loosely, Oh Dear Reader).

This is where pyastgrep comes into play, offering a powerful way to search and analyze your Python codebase using Abstract Syntax Trees (ASTs). Going into the theory of ASTs is tl;dr for a SnakeByte[], and there appears to be some ambiguity on the history and definition of who actually invented ASTs, so i have placed some references at the end of the blog for your reading pleasure, Oh Dear Reader. In parlance, if you have ever worked on compilers or core embedded systems, Abstract Syntax Trees are data structures widely used in compilers and the like to represent the structure of program code. An AST is usually the result of the syntax analysis phase of a compiler. It often serves as an intermediate representation of the program through several stages that the compiler requires, and it has a strong impact on the final output of the compiler.
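
If you have never looked at one, the standard library’s ast module will show you the tree Python parses from a snippet of source; this little illustration (independent of pyastgrep itself) is the structure that AST-aware search tools query:

import ast

source = """
def greet(name):
    print(f"hello {name}")
"""

tree = ast.parse(source)
# Dump the tree: you will see FunctionDef, arguments/arg, Call and Name nodes,
# the same node names used in AST-aware searches (indent= needs Python 3.9+).
print(ast.dump(tree, indent=2))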

So what is the Python Library that you speak of? i’m Glad you asked.

What is pyastgrep?

pyastgrep is a command-line tool designed to search Python codebases with an understanding of Python’s syntax and structure. Unlike traditional text-based search tools, pyastgrep leverages the AST, allowing you to search for specific syntactic constructs rather than just raw text. This makes it an invaluable tool for code refactoring, auditing, and general code analysis.

Why Use pyastgrep?

Here are a few scenarios where pyastgrep excels:

  1. Refactoring: Identify all instances of a particular pattern, such as function definitions, class instantiations, or specific argument names.
  2. Code Auditing: Find usages of deprecated functions, unsafe code patterns, or adherence to coding standards.
  3. Learning: Explore and understand unfamiliar codebases by searching for specific constructs.

I have a mantra: Reduce, Refactor, and Reuse. Please raise your hand if y’all need to refactor your code. (C’mon now, no one is watching… tell the truth…). See if it is possible to reduce the code footprint, refactor the code into more optimized transforms, and then let others reuse it across the enterprise.

Getting Started with pyastgrep

Let’s explore some practical examples of using pyastgrep to enhance your code analysis workflow.

Installing pyastgrep

Before we dive into how to use pyastgrep, let’s get it installed. You can install pyastgrep via pip:

(base)tcjr% pip install pyastgrep #dont actually type the tctjr part that is my virtualenv

Example 1: Finding Function Definitions

Suppose you want to find all function definitions in your codebase. pyastgrep queries are written as XPath expressions over an XML rendering of the AST, so with pyastgrep this is straightforward:

pyastgrep './/FunctionDef'

This command searches for all function definitions (FunctionDef nodes) in your codebase, providing a list of files and line numbers where these definitions occur. Ok, pretty basic as far as searches go.

Example 2: Searching for Specific Argument Names

Imagine you need to find all functions that take an argument named config. This is how you can do it:

pyastgrep './/arg[@arg="config"]'

This query searches for function arguments named config, helping you quickly locate where configuration arguments are being used.

Example 3: Finding Class Instantiations

To find all instances where a particular class, say MyClass, is instantiated, you can use:

pyastgrep './/Call/func/Name[@id="MyClass"]'

This command searches for instantiations of MyClass, making it easier to track how and where specific classes are utilized in your project.

Advanced Usage of pyastgrep

For more complex queries, you can combine multiple AST nodes. For instance, to find all print statements in your code, you might use:

pyastgrep './/Call/func/Name[@id="print"]'

This command finds all calls to the print function. You can also use more detailed queries to find nested structures or specific code patterns.

Integrating pyastgrep into Your Workflow

Integrating pyastgrep into your development workflow can greatly enhance your ability to analyze and maintain your code. Here are a few tips:

  1. Pre-commit Hooks: Use pyastgrep in pre-commit hooks to enforce coding standards or check for deprecated patterns.
  2. Code Reviews: Employ pyastgrep during code reviews to quickly identify and discuss specific code constructs.
  3. Documentation: Generate documentation or code summaries by extracting specific patterns or structures from your codebase.

Example Script

To get you started, here’s a simple Python script using pyastgrep to search for all function definitions in a directory:

import os
from subprocess import run

def search_function_definitions(directory):
    # Run pyastgrep as a subprocess and print whatever it reports
    result = run(['pyastgrep', './/FunctionDef', directory], capture_output=True, text=True)
    print(result.stdout)

if __name__ == "__main__":
    directory = "path/to/your/codebase"  # yes this is not optimal folks just an example.
    search_function_definitions(directory)

Replace "path/to/your/codebase" with the actual path to your Python codebase, and run the script to see pyastgrep in action.

Conclusion

pyastgrep is a powerful tool that brings the capabilities of AST-based searching to your fingertips. By understanding and leveraging the syntax and structure of your Python code, pyastgrep allows for more precise and meaningful code searches. Whether you’re refactoring, auditing, or simply exploring code, pyastgrep can significantly enhance your productivity and code quality. This is a great direct addition to your arsenal. Hope it helps and i hope you found this interesting.

Until Then,

#iwishyouwater <- The best of the best at Day1 Tahiti Pro presented by Outerknown 2024

𝕋𝕖𝕕 ℂ. 𝕋𝕒𝕟𝕟𝕖𝕣 𝕁𝕣. (@tctjr) / X

MUZAK to Blog By: SweetLeaf: A Stoner Rock Salute to Black Sabbath. While i do not really like bands that do covers, this is very well done. For other references to the Best Band In Existence ( Black Sabbath) i also refer you to Nativity in Black Volumes 1&2.

References:

[1] Basics Of AST

[2] The person who made pyastgrep

[3] Wikipedia page on AST

SnakeByte [15]: Debugging With pdb In The Trenches

Dalle3 idea of debugging code from the view of the code itself.

If debugging is the process of removing software bugs, then programming must be the process of putting them in.

~ Edsger W. Dijkstra

Image Explanation: Above is the image showing the perspective of debugging code from the viewpoint of the code itself. The scene features lines of code on a large monitor, with sections highlighted to indicate errors. In the foreground, anthropomorphic code characters are working to fix these highlighted sections, set against a digital landscape of code lines forming a cityscape.

So meta and canonical.

In any event, Dear Readers, it is Wednesday! That means, as usual, every day is Wednesday in a startup, you actually work at a company where you enjoy the work, or it is SnakeByte [ ] time!

i haven’t written a SnakeByte in quite some time. Also, recently, in a previous blog, i mentioned that i had a little oversight on my part, and my blog went down. i didn’t have alerting turned on in ye ole AWS, and those gosh darn binlogs were not deleting in ye ole WordPress, as such we took the proverbial digger into downtime land. i re-provisioned with an additional sizing of the volume, changed the disk from magnetic to SSD, turned on alerts, and we are back in business.

So per my usual routine, i grab one of the python books from the bookshelf, randomly open it, then read about a command or commands and attempt to write a blog as fast as possible.

Today’s random command from Python In A Nutshell is pdb, the Python Debugger. i’ll walk you through the basics of using pdb to debug your Python code, which, as it turns out, is better than a bunch of print() statements.

Getting Started with pdb

To leverage pdb, import it in your Python script. You can then set breakpoints manually with pdb.set_trace(). When the execution hits this line, the script pauses, allowing you to engage in an interactive debugging session.

Consider this straightforward example:

# example.py
import pdb

def add(a, b):
    result = a + b
    return result

def main():
    x = 10
    y = 20
    pdb.set_trace()  # Breakpoint set here
    total = add(x, y)
    print(f'Total: {total}')

if __name__ == '__main__':
    main()

Here, we have a simple add function and a main function that invokes add. The pdb.set_trace() in the main function sets a breakpoint where the debugger will halt execution.
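
As an aside (not required for this example), on Python 3.7+ the built-in breakpoint() call drops you into the same pdb prompt without the import, and it can be switched off globally by setting the PYTHONBREAKPOINT=0 environment variable. A sketch:

# breakpoint_example.py -- hypothetical variant of the script above
def main():
    x = 10
    y = 20
    breakpoint()   # Python 3.7+: calls pdb.set_trace() by default
    total = x + y
    print(f'Total: {total}')

if __name__ == '__main__':
    main()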

Running the Code with pdb

Execute the script from the command line to initiate the debugger:

python example.py

When the execution reaches pdb.set_trace(), you will see the debugger prompt ((Pdb)):

> /path/to/example.py(11)main()
-> total = add(x, y)
(Pdb)

Essential pdb Commands

Once at the debugger prompt, several commands can help you navigate and inspect your code. Key commands include:

  • l (list): Displays the surrounding source code.
  • n (next): Moves to the next line within the same function.
  • s (step): Steps into the next function call.
  • c (continue): Resumes execution until the next breakpoint.
  • p (print): Evaluates and prints an expression.
  • q (quit): Exits the debugger and terminates the program.

Let’s walk through using these commands in our example:

List the surrounding code:

(Pdb) l
  6         def main():
  7             x = 10
  8             y = 20
  9             pdb.set_trace()  # Breakpoint here
 10  ->         total = add(x, y)
 11             print(f'Total: {total}')
 12     
 13         if __name__ == '__main__':
 14             main()

Print variable values:

(Pdb) p x
10
(Pdb) p y
20

Step into the add function:

(Pdb) s
> /path/to/example.py(3)add()
-> result = a + b
(Pdb)

Inspect parameters a and b:

(Pdb) p a
10
(Pdb) p b
20

Continue execution:

(Pdb) c
Total: 30

Advanced pdb Features

For more nuanced debugging needs, pdb offers advanced capabilities:

Conditional Breakpoints: Trigger breakpoints based on specific conditions:

if x == 10:
    pdb.set_trace()

Post-Mortem Debugging: Analyze code after an exception occurs:

import pdb

def faulty_function():
    return 1 / 0

try:
    faulty_function()
except ZeroDivisionError:
    pdb.post_mortem() #they need to add a pre-mortem what can go wrong will...

Command Line Invocation: Run a script under pdb control directly from the command line like the first simple example:

python -m pdb example.py
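
When a script is launched this way, pdb pauses before the first line executes; from that prompt you can set breakpoints with the b command before continuing (the line number below is illustrative):

(Pdb) b example.py:12
(Pdb) c

Here b file:lineno sets a breakpoint at a specific line, and c resumes execution until that breakpoint is hit.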

Effective debugging is crucial for building robust and maintainable software. pdb provides a powerful, interactive way to understand and fix your Python code. By mastering pdb, you can streamline your debugging process, saving time and enhancing your productivity.

pdb, the Python Debugger, is an indispensable tool for any developer. It allows us to set breakpoints, step through code, inspect variables, and evaluate expressions on the fly. This level of control is crucial for diagnosing and resolving issues efficiently. While i used the CLI in the examples, it also works within Jupyter Notebooks.

We’ve covered the basics and advanced features of pdb, equipping you with the knowledge to tackle even the most challenging bugs. As you integrate these techniques into your workflow, you’ll find that debugging becomes less of a chore and more of a strategic advantage unless you create a perfect design which is no code at all!

Until then,

#iwishyouwater <- La Vaca Gigante Wipeouts 2024

𝕋𝕖𝕕 ℂ. 𝕋𝕒𝕟𝕟𝕖𝕣 𝕁𝕣. (@tctjr)

Muzak to Blog By: Apostrophe (‘) by Frank Zappa. i was part of something called the Zappa Ensemble in graduate school as one of the engineers. The musicians were amazing. Apostrophe (‘) is an amazing album. The solo on Uncle Remus is just astounding, as are most of his solos.

Snake_Byte[15] Fourier, Discrete and Fast Transformers

The frequency domain of mind (a mind, it must be stressed, is an unextended, massless, immaterial singularity) can produce an extended, spacetime domain of matter via ontological Fourier mathematics, and the two domains interact via inverse and forward Fourier transforms.

~ Dr. Cody Newman, The Ontological Self: The Ontological Mathematics of Consciousness

I am Optimus Transformer Ruler Of The AutoCorrelation Bots

First i trust everyone is safe. i haven’t written a technical blog in a while so figured i would write a Snake_Byte on one of my favorite equations, The Fourier Transform:

    \[\hat{f} (\xi)=\int_{-\infty}^{\infty}f(x)e^{-2\pi ix\xi}dx\]

More specifically, we will be dealing with the Fast Fourier Transform, which is an implementation of The Discrete Fourier Transform. The Fourier Transform operates on continuous signals, and while i do believe we will have analog computing devices (again) in the future, we have to operate on 0’s and 1’s at this juncture, thus we have a discrete version thereof. The discrete version:

    \[f[k] = \sum_{j=0}^{N-1} x[j]\left(e^{-2\pi i k/N}\right)^j, \qquad 0 \leq k < N\]

where the FFT recursively splits the sum into even- and odd-indexed sub-transforms f_e and f_o via the butterfly relations:

    \[\begin{aligned} f[k] &= f_e[k]+e^{-2\pi i k/N}f_o[k] \\ f[k+N/2] &= f_e[k]-e^{-2\pi i k/N}f_o[k] \end{aligned}\]

The Discrete Fourier Transform (DFT) is a mathematical operation. The Fast Fourier Transform (FFT) is an efficient algorithm for the evaluation of that operation (actually, a family of such algorithms). However, it is easy to get these two confused. Often, one may see a phrase like “take the FFT of this sequence”, which really means to take the DFT of that sequence using the FFT algorithm to do it efficiently.

The Fourier sequence is a kernel operation for any number of transforms, where the kernel is matched to the signal if possible. The Fourier Transform decomposes a signal into a series of sin(\theta) and cos(\theta) terms, which makes it really useful for audio and radar analysis.

For the FFT it only takes O(n\log{}n) for the sequence computation and as one would imagine this is a substantial gain. The most commonly used FFT algorithm is the Cooley-Tukey algorithm, which was named after J. W. Cooley and John Tukey. It’s a divide and conquer algorithm for the machine calculation of complex Fourier series. It breaks the DFT into smaller DFTs. Other FFT algorithms include the Rader’s algorithm, Winograd Fourier transform algorithm, Chirp Z-transform algorithm, etc. The only rub comes as a function of the delay throughput.
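
To make the divide and conquer concrete, here is a minimal recursive radix-2 Cooley-Tukey sketch (my own illustration; it assumes the input length is a power of two) that applies the even/odd butterfly relations above and checks the result against numpy.fft.fft:

import numpy as np

def fft_radix2(x):
    # Minimal recursive Cooley-Tukey FFT; len(x) must be a power of two.
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    f_even = fft_radix2(x[0::2])   # DFT of the even-indexed samples
    f_odd = fft_radix2(x[1::2])    # DFT of the odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([f_even + twiddle * f_odd,    # f[k]
                           f_even - twiddle * f_odd])   # f[k + N/2]

signal = np.random.rand(8)
print(np.allclose(fft_radix2(signal), np.fft.fft(signal)))  # True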

There have been amazing textbooks written on this subject, and i will list them at the end of the blarg [1,2,3].

So let’s get on with some code. First we do the usual housekeeping on import libraries as well as doing some majik for inline display if you are using Jupyter Notebooks. Of note is fftpack, which is a package of Fortran subroutines for the fast Fourier transform. It includes complex, real, sine, cosine, and quarter-wave transforms. It was developed by Paul Swarztrauber of the National Center for Atmospheric Research, and is included in the general-purpose mathematical library SLATEC.

# House keeping libraries imports and inline plots:
import numpy as np
from scipy import fftpack
%matplotlib inline
import matplotlib.pyplot as plt

We now set up some signals where we create a sinusoid with a given sample rate. We use linspace to set up the time axis and signal length.

# Frequency of the sinusoid in cycles per second (Hertz)
Frequency = 20

# Sampling rate, or the number of measurements per second
Sampling_Frequency = 100

# set up the signal space:
time = np.linspace(0,2,2 * Sampling_Frequency, endpoint = False)
signal = np.sin(Frequency * 2 * np.pi * time)

Next we plot the sinusoid under consideration:

# plot the signal:
fig, ax = plt.subplots()
ax.plot(time, signal)
ax.set_xlabel('Time [seconds]')
ax.set_ylabel('Signal Amplitude')

Next we apply the Fast Fourier Transform and transform into the frequency domain:

X_Hat = fftpack.fft(signal)
Frequency_Component = fftpack.fftfreq(len(signal)) * Sampling_Frequency

We now plot the transformed sinusoid depicting the frequencies we generated:

# plot frequency components of the signal:
fig, ax = plt.subplots()
ax.stem(Frequency_Component, np.abs(X_Hat)) # absolute value of spectrum
ax.set_xlabel ('Frequency in Hertz [HZ] Of Transformed Signal')
ax.set_ylabel ('Frequency Domain (Spectrum) Magnitude')
ax.set_xlim(-Sampling_Frequency / 2, Sampling_Frequency / 2)
ax.set_ylim(-5,110)

To note, you will see two frequency components; this is because the transform of a real signal contains both positive and negative frequency components, which is what we see using the stem plots as expected. This is because the kernel, as mentioned before, is both sin(\theta) and cos(\theta).

So something really cool happens when using the FFT. It is called the convolution theorem as well as Dual Domain Theory. Convolution in the time domain yields multiplication in the frequency domain. Mathematically, the convolution theorem states that under suitable conditions the Fourier transform of a convolution of two functions (or signals) is the point-wise (Hadamard multiplication) product of their Fourier transforms. More generally, convolution in one domain (e.g., time domain) equals point-wise multiplication in the other domain (e.g., frequency domain).

Where:

    \[x(t)*h(t) = y(t)\]

    \[X(f) H(f) = Y(f)\]
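
A quick numerical sanity check of the theorem (a sketch using NumPy, with zero-padding so the circular convolution from the FFT matches the linear one):

import numpy as np

x = np.random.rand(16)
h = np.random.rand(16)

# Time-domain linear convolution
y_time = np.convolve(x, h)

# Frequency domain: zero-pad to the full output length, multiply point-wise, invert
N = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real

print(np.allclose(y_time, y_freq))  # True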

So there you have it. A little taster on the powerful Fourier Transform.

Until Then,

#iwishyouwater <- Cloudbreak this past year

Muzak To Blarg by: Voyager Essential Max Richter. Phenomenal. November is truly staggering.

References:

[1] The Fourier Transform and Its Applications by Dr Ronald N Bracewell. i had the honor of taking the actual class at Stanford University from Professor Bracewell.

[2] The Fast Fourier Transform and Its Applications by E. Oran Brigham. Great book on butterfly and overlap-add derivations thereof.

[3] Adaptive Digital Signal Processing by Dr. Claude Lindquist. A phenomenal book on frequency domain signal processing and kernel analysis. A book ahead of its time. Professor Lindquist was a mentor and had a direct effect and affect on my career and the way i approach information theory.

Snake_Byte:[14] Coding In Philosophical Frameworks

DALL-E Generated Philosopher

Your vision will only become clear when you can look into your heart. Who looks outside, dreams; who looks inside, awakes. Knowing your own darkness is the best method for dealing with the darknesses of other people. We cannot change anything until we accept it.

~ C. Jung

(Caveat Emptor: This blog is rather long in the snakes tooth and actually more like a CHOMP instead of a BYTE. tl;dr)

First, Oh Dear Reader, i trust everyone is safe. Second, it sure feels like we are living in an age of Deus Ex Machina, doesn’t it? Third, with this in mind i wanted to write a Snake_Byte that i have been “thoughting” about for quite some time but never really knew how to approach, if truth be told. i can’t take full credit for this ideation, nor do i actually want to claim any ideation. Jay Sales and i were talking a long while after i believe i gave a presentation on creating Belief Systems using BeliefNetworks or some such nonsense.

The net of the discussion was we both believed that in the future we will code in philosophical frameworks.

Maybe we are here?

So how would one go about coding an agent-based distributed system that allowed one to create an agent or a piece of evolutionary code to exhibit said behaviors of a philosophical framework?

Well we must first attempt to define a philosophy and ensconce it into a quantized explanation.

Stoicism seemed to me at least the best first mover here as it appeared to be the tersest by definition.

So first, for those not familiar with said philosophy, Marcus Aurelius was probably the most famous practitioner of Stoicism. i have put some references that i have read at the end of this blog [1].

Stoicism is a philosophical school that emphasizes rationality, self-control, and inner peace in the face of adversity. In thinking about this, i figured that to build an agent-based software system that embodies Stoicism, we would need to consider several key aspects of this philosophy.

  • Stoics believe in living in accordance with nature and the natural order of things. This could be represented in an agent-based system through a set of rules or constraints that guide the behavior of the agents, encouraging them to act in a way that is in harmony with their environment and circumstances.
  • Stoics believe in the importance of self-control and emotional regulation. This could be represented in an agent-based system through the use of decision-making algorithms that take into account the agent’s emotional state and prioritize rational, level-headed responses to stimuli.
  • Stoics believe in the concept of the “inner citadel,” or the idea that the mind is the only thing we truly have control over. This could be represented in an agent-based system through a focus on internal states and self-reflection, encouraging agents to take responsibility for their own thoughts and feelings and strive to cultivate a sense of inner calm and balance.
  • Stoics believe in the importance of living a virtuous life and acting with moral purpose. This could be represented in an agent-based system through the use of reward structures and incentives that encourage agents to act in accordance with Stoic values such as courage, wisdom, and justice.

So given a definition of Stoicism, we then need to create a quantized or discrete model of those behaviors that encompass a “Stoic Individual”. i figured we could use the evolutionary library called DEAP (Distributed Evolutionary Algorithms in Python). DEAP contains both genetic algorithm and genetic programming utilities as well as evolutionary strategy methods for this type of programming.

Genetic algorithms and genetic programming are both techniques used in artificial intelligence and optimization, but they have some key differences.

This is important as people confuse the two.

Genetic algorithms are a type of optimization algorithm that use principles of natural selection to find the best solution to a problem. In a genetic algorithm, a population of potential solutions is generated and then evaluated based on their fitness. The fittest solutions are then selected for reproduction, and their genetic information is combined to create new offspring solutions. This process of selection and reproduction continues until a satisfactory solution is found.

On the other hand, genetic programming is a form of machine learning that involves the use of genetic algorithms to automatically create computer programs. Instead of searching for a single solution to a problem, genetic programming evolves a population of computer programs, which are represented as strings of code. The programs are evaluated based on their ability to solve a specific task, and the most successful programs are selected for reproduction, combining their genetic material to create new programs. This process continues until a program is evolved that solves the problem to a satisfactory level.

So the key difference between genetic algorithms and genetic programming is that genetic algorithms search for a solution to a specific problem, while genetic programming searches for a computer program that can solve the problem. Genetic programming is therefore a more general approach, as it can be used to solve a wide range of problems, but it can also be more computationally intensive due to the complexity of evolving computer programs [2].

So returning back to the main() function as it were, we need to create a genetic program that models Stoic behavior using the DEAP library.

First, we need to define the problem and the relevant fitness function. This is where the quantized part comes into play. Since Stoic behavior involves a combination of rationality, self-control, and moral purpose, we could define a fitness function that measures an individual’s ability to balance these traits and act in accordance with Stoic values.

So lets get to the code.

To create a genetic program that models Stoic behavior using the DEAP library in a Jupyter Notebook, we first need to install the DEAP library. We can do this by running the following command in a code cell:

pip install deap

Next, we can import the necessary modules and functions:

import random
import operator
import numpy as np
from deap import algorithms, base, creator, tools

We can then define the problem and the relevant fitness function. Since Stoic behavior involves a combination of rationality, self-control, and moral purpose, we could define a fitness function that measures an individual’s ability to balance these traits and act in accordance with Stoic values.

Here’s an example of how we might define a “fitness function” for this problem:

# Define the fitness function.  NOTE: # i am open to other ways of defining this and other models
# the definition of what is a behavior needs to be quantized or discretized and 
# trying to do that yields a lossy function most times.  It's also self-referential

def fitness_function(individual):
    # Calculate the fitness based on how closely the individual's behavior matches stoic principles
    fitness = 0
    # Add points for self-control, rationality, focus, resilience, and adaptability can haz Stoic?
    fitness += individual[0]  # self-control
    fitness += individual[1]  # rationality
    fitness += individual[2]  # focus
    fitness += individual[3]  # resilience
    fitness += individual[4]  # adaptability
    return fitness,

# Define the genetic programming problem
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

# Initialize the genetic algorithm toolbox
toolbox = base.Toolbox()

# Define the genetic operators
toolbox.register("attribute", random.uniform, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attribute, n=5)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", fitness_function)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.1, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

# Run the genetic algorithm
population = toolbox.population(n=10)
for generation in range(20):
    offspring = algorithms.varAnd(population, toolbox, cxpb=0.5, mutpb=0.1)
    fits = toolbox.map(toolbox.evaluate, offspring)
    for fit, ind in zip(fits, offspring):
        ind.fitness.values = fit
    population = toolbox.select(offspring, k=len(population))
    
# Print the best individual found
best_individual = tools.selBest(population, k=1)[0]

print ("Best Individual:", best_individual)
 

Here, we define the genetic programming parameters (i.e., the traits that we’re optimizing for) using the toolbox.register function. We also define the evaluation function (fitness_function), genetic operators (mate and mutate), and selection operator (select) using DEAP’s built-in functions.

We then define the fitness function that the genetic algorithm will optimize. This function takes an “individual” (represented as a list of five attributes) as input, and calculates the fitness based on how closely the individual’s behavior matches stoic principles.

We then define the genetic programming problem via the quantized attributes, and initialize the genetic algorithm toolbox with the necessary genetic operators.

Finally, we run the genetic algorithm for 20 generations, and print the best individual found. The selBest function is used to select the top individual fitness agent or a “behavior” if you will for that generation based on the iterations or epochs. This individual represents an agent that mimics the philosophy of stoicism in software, with behavior that is self-controlled, rational, focused, resilient, and adaptable.

Best Individual: [0.8150247518866958, 0.9678037028949047, 0.8844195735244268, 0.3970642186025506, 1.2091810770505023]

This denotes the best individual with those best-balanced attributes, or in this case, the Most Stoic.

As i noted this is a first attempt at this problem i think there is a better way with a full GP solution as well as a tunable fitness function. In a larger distributed system you would then use this agent as a framework amongst other agents you would define.
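
If you want to push past the hand-rolled loop above, DEAP’s canned eaSimple driver together with a HallOfFame and a Statistics object will run the same evolution while logging progress. A sketch that assumes the creator/toolbox registrations from the cells above have already been executed:

import numpy as np
from deap import algorithms, tools

# Sketch: assumes `toolbox` (and the creator classes) defined earlier in this post
population = toolbox.population(n=10)
hof = tools.HallOfFame(1)                       # keep the single best individual seen
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("max", np.max)

population, logbook = algorithms.eaSimple(
    population, toolbox, cxpb=0.5, mutpb=0.1, ngen=20,
    stats=stats, halloffame=hof, verbose=True)

print("Best Individual:", hof[0])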

i at least got this out of my head.

until then,

#iwishyouwater <- Alexey Molchanov and Dan Bilzerian at Deep Dive Dubai

Muzak To Blog By: Phil Lynott “The Philip Lynott Album”. If you don’t know who this is, there is a statue of him in Ireland that i walked a long way to with my co-founder, Lisa Maki, a long time ago to pay homage to the great Irish singer of the amazing band Thin Lizzy. Alas, they took Phil to be cleaned that day. At least we got to walk and talk, and i’ll never forget that day. This is one of his solo efforts, and i believe he is one of the best artists of all time. The first track is deeply emotional.

References:

[1] A list of books on Stoicism -> click HERE.

[2] Genetic Programming (On the Programming of Computers by Means of Natural Selection), by Professor John R. Koza. There are multiple volumes, i think four, and i have all of them, but this is a great place to start, along with the DEAP documentation. Just optimizing transcendental functions is mind-blowing given what GP comes out with using arithmetic alone.

Snake_Byte:[13] The Describe Function.

DALLE-2 Draws Describe

First i trust everyone is safe. Second i hope people are recovering somewhat from the SVB situation. We are at the end of an era, cycle, or epoch; take your pick. Third i felt like picking a Python function that was simple in nature but very helpful.

The function is pandas.DataFrame.describe(). i’ve previously written about other introspection libraries like DABL; however, this is rather simple and in place. Actually, i had never utilized it before. i was working on some other code as a hobby in the areas of transfer learning and was playing around with some data, and decided to use the breast cancer data from the sklearn library, which is much like the iris data used for canonical modeling and comparison. Most machine learning is data cleansing and feature selection, so let's start with something we know.

Breast cancer is the second most common cancer in women worldwide, with an estimated 2.3 million new cases in 2020. Early detection is key to improving survival rates, and machine learning algorithms can aid in diagnosing and treating breast cancer. In this blog, we will explore how to load and analyze the breast cancer dataset using the scikit-learn library in Python.

The breast cancer dataset is included in scikit-learn's datasets module, which contains a variety of well-known datasets for machine learning. The features describe the characteristics of the cell nuclei present in the image. We can load the dataset using the load_breast_cancer function, which returns a dictionary-like object containing the data and metadata about the dataset.

It has been surmised that machine learning is mostly data exploration and data cleaning.

from sklearn.datasets import load_breast_cancer
import pandas as pd

#Load the breast cancer dataset
data = load_breast_cancer()

The data object returned by load_breast_cancer contains the feature data and the target variable. The feature data contains measurements of 30 different features, such as radius, texture, and symmetry, extracted from digitized images of fine needle aspirate (FNA) of breast mass. The target variable is binary, with a value of 0 indicating a malignant tumor and a value of 1 indicating a benign tumor.

We can convert the feature data and target variable into a pandas dataframe using the DataFrame constructor from the pandas library. We also add a column to the dataframe containing the target variable.

#Convert the data to a pandas dataframe
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = pd.Series(data.target)

Finally, we can use the describe method of the pandas dataframe to get a summary of the dataset. The describe method returns a table containing the count, mean, standard deviation, minimum, and maximum values for each feature, as well as the count, mean, standard deviation, minimum, and maximum values for the target variable.

#Use the describe() method to get a summary of the dataset
print(df.describe())

The output of the describe method is as follows:

mean radius  mean texture  ...  worst symmetry      target
count   569.000000    569.000000  ...      569.000000  569.000000
mean     14.127292     19.289649  ...        0.290076    0.627417
std       3.524049      4.301036  ...        0.061867    0.483918
min       6.981000      9.710000  ...        0.156500    0.000000
25%      11.700000     16.170000  ...        0.250400    0.000000
50%      13.370000     18.840000  ...        0.282200    1.000000
75%      15.780000     21.800000  ...        0.317900    1.000000
max      28.110000     39.280000  ...        0.663800    1.000000

[8 rows x 31 columns]

From the summary statistics, we can see that the mean values of the features vary widely, with the mean radius ranging from 6.981 to 28.11 and the mean texture ranging from 9.71 to 39.28. We can also see that the target variable is roughly balanced, with 62.7% of the tumors being benign (the mean of the 0/1 target is 0.627).
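
You can confirm that split directly with a pandas one-liner on the same dataframe:

# Class balance of the target column (1 = benign, 0 = malignant)
print(df['target'].value_counts(normalize=True))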

Pretty nice utility.

Then again in looking at this data one would think we could get to first principles engineering and root causes and make it go away? This directly affects motherhood which i still believe is the hardest job in humanity. Makes you wonder where all the money goes?

Until then,

#iwishyouwater <- Free Diver Steph who is also a mom hunting pelagics on #onebreath

Muzak To Blog By Peter Gabriel’s “Peter Gabriels 3: Melt (remastered). He is coming out with a new album. Games Without Frontiers and Intruder are timeless. i applied long ago to work at Real World Studios and received the nicest rejection letter.

Snake_Byte[12]: Dabl A High-Level Data Analysis Library in Python

Not To Be Confused With The Game

It enables us to dabble in vicarious vice and to sit in smug judgment on the result.

Online Quote Generator

First, i hope everyone is safe. Second, i haven’t written a Snake_Byte [ ] in quite some time, so here goes. This is a library i ran across late last night, and well, for what it achieves even for data exploration, it is well worth the pip install dabl cost of it all.

Data analysis is an essential task in the field of machine learning and artificial intelligence. However, it can be a challenging and time-consuming task, especially for those who are not familiar with programming. That’s where the dabl library comes into play.

dabl, short for Data Analysis Baseline Library, is a high-level data analysis library in python, designed to make data analysis as easy and effortless as possible. It is an open-source library, developed and maintained by the scikit-learn community.

The library provides a collection of simple and intuitive functions for exploring, cleaning, transforming, and visualizing data. With dabl, users can perform various data analysis tasks such as regression, classification, clustering, anomaly detection, and more, with just a few lines of code.

One of the main benefits of dabl is that it helps users get started quickly by providing a set of default actions for each task. For example, to perform a regression analysis, users can simply construct a SimpleRegressor, pass in their data, and dabl will take care of the rest.

Another advantage of dabl is that it provides easy-to-understand visualizations of the results, allowing users to quickly understand the results of their analysis and make informed decisions based on the data. This is particularly useful for non-technical users who may not be familiar with complex mathematical models or graphs.

dabl also integrates well with other popular data analysis libraries such as pandas, numpy, and matplotlib, making it a convenient tool for those already familiar with these libraries.

So let us jump into the code shall we?

This code uses the dabl library to analyze the Titanic dataset. The dataset is loaded using the pandas library, cleaned with dabl.clean, and its column types are detected with dabl.detect_types. The dabl.plot function is used to visualize the data against the survived target, and finally a dabl.SimpleClassifier is fit to the cleaned data.

import dabl
import pandas as pd
import matplotlib.pyplot as plt

# Load the Titanic dataset from the disk
titanic = pd.read_csv(dabl.datasets.data_path("titanic.csv"))
#check shape columns etc
titanic.shape
titanic.head
#all that is good tons of stuff going on here but now let us ask dabl whats up:
titanic_clean = dabl.clean(titanic, verbose=1)

#a cool call to detect types
types = dabl.detect_types(titanic_clean)
print (types)
#lets do some eye candy
dabl.plot(titanic, 'survived')
#lets check the distribution
plt.show()
#let us try a simple classifier; if it works it works
# Perform classification analysis
fc = dabl.SimpleClassifier(random_state=0)
X = titanic_clean.drop("survived", axis=1)
y = titanic_clean.survived
fc.fit(X, y)                     

Ok so let’s break this down a little.

We load the data set: (make sure the target directory is the same)

# Load the Titanic dataset from the disk
titanic = pd.read_csv(dabl.datasets.data_path("titanic.csv"))

Of note, we loaded this into a pandas dataframe. Assuming we can use python and load a comma-separated values file, let’s now do some exploration:

#check shape columns etc
titanic.shape
titanic.head

You should see the following:

(1309, 14) 

Which is [1309 rows x 14 columns]

and then:

pclass  survived                                             name  \
0          1         1                    Allen, Miss. Elisabeth Walton   
1          1         1                   Allison, Master. Hudson Trevor   
2          1         0                     Allison, Miss. Helen Loraine   
3          1         0             Allison, Mr. Hudson Joshua Creighton   
4          1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)   
...      ...       ...                                              ...   
1304       3         0                             Zabour, Miss. Hileni   
1305       3         0                            Zabour, Miss. Thamine   
1306       3         0                        Zakarian, Mr. Mapriededer   
1307       3         0                              Zakarian, Mr. Ortin   
1308       3         0                               Zimmerman, Mr. Leo   

         sex     age  sibsp  parch  ticket      fare    cabin embarked boat  \
0     female      29      0      0   24160  211.3375       B5        S    2   
1       male  0.9167      1      2  113781    151.55  C22 C26        S   11   
2     female       2      1      2  113781    151.55  C22 C26        S    ?   
3       male      30      1      2  113781    151.55  C22 C26        S    ?   
4     female      25      1      2  113781    151.55  C22 C26        S    ?   
...      ...     ...    ...    ...     ...       ...      ...      ...  ...   
1304  female    14.5      1      0    2665   14.4542        ?        C    ?   
1305  female       ?      1      0    2665   14.4542        ?        C    ?   
1306    male    26.5      0      0    2656     7.225        ?        C    ?   
1307    male      27      0      0    2670     7.225        ?        C    ?   
1308    male      29      0      0  315082     7.875        ?        S    ?   

     body                        home.dest  
0       ?                     St Louis, MO  
1       ?  Montreal, PQ / Chesterville, ON  
2       ?  Montreal, PQ / Chesterville, ON  
3     135  Montreal, PQ / Chesterville, ON  
4       ?  Montreal, PQ / Chesterville, ON  
...   ...                              ...  
1304  328                                ?  
1305    ?                                ?  
1306  304                                ?  
1307    ?                                ?  
1308    ?                                ?  

Wow, tons of stuff going on here, and really this is cool data from an awful disaster. Ok, let’s have dabl exercise some muscle here and ask it to clean it up a bit:

titanic_clean = dabl.clean(titanic, verbose=1)
types = dabl.detect_types(titanic_clean)
print (types)

i set verbose = 1 in this case and dabl.detect_types() shows the types detected which i found helpful:

Detected feature types:
continuous      0
dirty_float     3
low_card_int    2
categorical     5
date            0
free_string     4
useless         0
dtype: int64

However, look what dabl did for us:

                      continuous  dirty_float  low_card_int  categorical  \
pclass                     False        False         False         True   
survived                   False        False         False         True   
name                       False        False         False        False   
sex                        False        False         False         True   
sibsp                      False        False          True        False   
parch                      False        False          True        False   
ticket                     False        False         False        False   
cabin                      False        False         False        False   
embarked                   False        False         False         True   
boat                       False        False         False         True   
home.dest                  False        False         False        False   
age_?                      False        False         False         True   
age_dabl_continuous         True        False         False        False   
fare_?                     False        False         False        False   
fare_dabl_continuous        True        False         False        False   
body_?                     False        False         False         True   
body_dabl_continuous        True        False         False        False   

                       date  free_string  useless  
pclass                False        False    False  
survived              False        False    False  
name                  False         True    False  
sex                   False        False    False  
sibsp                 False        False    False  
parch                 False        False    False  
ticket                False         True    False  
cabin                 False         True    False  
embarked              False        False    False  
boat                  False        False    False  
home.dest             False         True    False  
age_?                 False        False    False  
age_dabl_continuous   False        False    False  
fare_?                False        False     True  
fare_dabl_continuous  False        False    False  
body_?                False        False    False  
body_dabl_continuous  False        False    False 
Target looks like classification
Linear Discriminant Analysis training set score: 0.578
 

Ah sweet! So data science, machine learning, or data mining is 80% cleaning up the data. Take what you can get and go with it, folks. dabl even informs us that the target looks like a classification problem. As the name suggests, classification means classifying the data on some grounds; it is a type of supervised learning. In classification, the target column should be a categorical column. If the target has only two categories, like the one in the dataset above (survived / did not survive), it's called a binary classification problem. When there are more than two categories, it's a multi-class classification problem. The "target" column is also called the "class" in a classification problem.
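As a quick sanity check, and a minimal sketch assuming the titanic_clean frame from above, you can confirm the target really is binary by counting its classes:

# Count the classes in the target column (assumes titanic_clean from above)
print(titanic_clean["survived"].value_counts())
# two classes (1 = survived, 0 = did not) -> a binary classification problem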

Now let's do some analysis. Yep, we are just getting to some statistics: there are univariate and bivariate plots in this case.

Bivariate analysis is the simultaneous analysis of two variables. It explores the relationship between two variables: whether an association exists and how strong that association is, or whether there are differences between the two variables and how significant those differences are.

The main three types we will see here are (a quick pandas sketch follows this list):

  1. Categorical vs. Numerical 
  2. Numerical vs. Numerical
  3. Categorical vs. Categorical data
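Here is the promised pandas sketch of two of these, assuming the titanic and titanic_clean frames from above (the age_dabl_continuous column name comes straight from the dabl.detect_types() output):

import pandas as pd

# Categorical vs. categorical: survival counts broken out by sex
print(pd.crosstab(titanic["sex"], titanic["survived"]))

# Categorical vs. numerical: mean age by survival, using dabl's cleaned age column
print(titanic_clean.groupby("survived")["age_dabl_continuous"].mean())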

Also of note, Linear Discriminant Analysis, or LDA, is a dimensionality reduction technique used as a pre-processing step in machine learning. The goal of LDA is to project features from a higher-dimensional space onto a lower-dimensional space, in order to avoid the curse of dimensionality and reduce computational cost. The original technique was developed in 1936 by Ronald A. Fisher and was named Linear Discriminant or Fisher's Discriminant Analysis. 

(NOTE: there is another LDA (Latent Dirichlet Allocation), which is used in Semantic Engineering and is quite different.)
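If you want to poke at LDA directly, outside of dabl, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis on the iris toy data; this is purely illustrative and is not how dabl computed the 0.578 score above:

# Minimal LDA sketch with scikit-learn (illustrative only)
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)   # project 4 features down to 2
X_lda = lda.fit_transform(X, y)                    # LDA needs the class labels y
print(X_lda.shape)                                 # (150, 2)
print(lda.score(X, y))                             # training accuracy of the LDA classifier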

dabl.plot(titanic, 'survived')

The following plots happen auto-magically, starting with continuous feature plots for discriminant analysis.

Continuous Feature PairPlots

In the plots you will also see PCA (Principal Component Analysis). PCA was invented in 1901 by Karl Pearson as an analog of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also called the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, and proper orthogonal decomposition (POD) in mechanical engineering. PCA is used extensively in many fields, and my first usage of it was in 1993 for three-dimensional rendering of sound.

Discriminating PCA Directions

What is old is new again.

The main difference is that Linear Discriminant Analysis is a supervised dimensionality reduction technique that also achieves classification of the data simultaneously; LDA focuses on finding a feature subspace that maximizes the separability between the groups. Principal Component Analysis, on the other hand, is an unsupervised dimensionality reduction technique that ignores the class label; PCA focuses on capturing the direction of maximum variation in the data set.

LDA

Both reduce the dimensionality of the dataset and make it more computationally tractable. LDA and PCA both form a new set of components.
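To see the contrast in code, here is a small PCA sketch to set against the LDA sketch above (again on the iris toy data, purely for illustration): PCA never looks at the class labels.

# PCA sketch: unsupervised, the class label is never used
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # labels deliberately ignored
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)     # fraction of variance captured by each component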

The last plot is categorical versus target.

So now let's try, as dabl suggested, a SimpleClassifier and fit it to the data (hey, some machine learning!):

fc = dabl.SimpleClassifier(random_state=0)
X = titanic_clean.drop("survived", axis=1)
y = titanic_clean.survived
fc.fit(X, y) 

This should produce the following outputs with accuracy metrics:

Running DummyClassifier(random_state=0)
accuracy: 0.618 average_precision: 0.382 roc_auc: 0.500 recall_macro: 0.500 f1_macro: 0.382
=== new best DummyClassifier(random_state=0) (using recall_macro):
accuracy: 0.618 average_precision: 0.382 roc_auc: 0.500 recall_macro: 0.500 f1_macro: 0.382

Running GaussianNB()
accuracy: 0.970 average_precision: 0.975 roc_auc: 0.984 recall_macro: 0.964 f1_macro: 0.968
=== new best GaussianNB() (using recall_macro):
accuracy: 0.970 average_precision: 0.975 roc_auc: 0.984 recall_macro: 0.964 f1_macro: 0.968

Running MultinomialNB()
accuracy: 0.964 average_precision: 0.988 roc_auc: 0.990 recall_macro: 0.956 f1_macro: 0.961
Running DecisionTreeClassifier(class_weight='balanced', max_depth=1, random_state=0)
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974
=== new best DecisionTreeClassifier(class_weight='balanced', max_depth=1, random_state=0) (using recall_macro):
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974

Running DecisionTreeClassifier(class_weight='balanced', max_depth=5, random_state=0)
accuracy: 0.969 average_precision: 0.965 roc_auc: 0.983 recall_macro: 0.965 f1_macro: 0.967
Running DecisionTreeClassifier(class_weight='balanced', min_impurity_decrease=0.01,
                       random_state=0)
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974
Running LogisticRegression(C=0.1, class_weight='balanced', max_iter=1000,
                   random_state=0)
accuracy: 0.974 average_precision: 0.991 roc_auc: 0.993 recall_macro: 0.970 f1_macro: 0.972
Running LogisticRegression(C=1, class_weight='balanced', max_iter=1000, random_state=0)
accuracy: 0.975 average_precision: 0.991 roc_auc: 0.994 recall_macro: 0.971 f1_macro: 0.973

Best model:
DecisionTreeClassifier(class_weight='balanced', max_depth=1, random_state=0)
Best Scores:
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974

This actually calls the sklearn routines in aggregate. Looks like a one-level decision tree (a decision stump) is declared the best model on recall_macro, with our old friend logistic regression essentially tied. Keep it simple, sam; it ain't gotta be complicated.
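If you want to re-create something like the winning model outside of dabl, a rough sketch with plain scikit-learn might look like the following; the naive one-hot preprocessing here is my own shortcut for illustration, not what dabl does internally, so the scores will differ from the numbers above:

# Rough sketch: a decision stump like dabl's best model, with naive preprocessing
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_simple = pd.get_dummies(titanic_clean[["pclass", "sex", "embarked"]], drop_first=True)
y = titanic_clean["survived"]

stump = DecisionTreeClassifier(class_weight="balanced", max_depth=1, random_state=0)
print(cross_val_score(stump, X_simple, y, cv=5).mean())   # expect a lower score than above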

In conclusion, dabl is a highly recommended library for those looking to simplify their data analysis tasks. With its intuitive functions and visualizations, it provides a quick and easy way to perform data analysis, making it an ideal tool for both technical and non-technical users. Again, the real strength of dabl is in providing simple interfaces for data exploration. For more information:

dabl github. <- click here

Until Then,

#iwishyouwater <- hold your breath on a dive with my comrade at arms @corepaddleboards. great video and the clarity was astounding.

Muzak To Blog By: “Ballads For Two”, Chet Baker and Wolfgang Lackerschmid; trumpet meets vibraphone sparsity. The space between the notes is where all of the action lives.

Snake_Byte[11] Linear Algebra, Matrices and Products – Oh My!

Algebra is the metaphysics of arithmetic.

~ John Ray
Looks Hard.

First, as always, i hope everyone is safe. Second, as i mentioned in my last Snake_Byte[], let us do something a little more technical and scientific. For context, the catalyst for this was a surprising discussion that came from how current machine learning interviews are being conducted and how the basics of the distance between two vectors have been overlooked. So this is a basic example, and in the following Snake_Byte[] i promise to get into something a little more, say, carnivore.

With that let us move to some linear algebra. For those that don’t know what linear algebra is, i will refer you to the best book on the subject, Professor Gilbert Strang’s Linear Algebra and its Applications.

i am biased here; however, i do believe the two most important areas of machine learning and data science are linear algebra and probability, with optimization techniques coming in a close third.

So dear reader, please bear with me here. We will review a little math; maybe for some, this will be new, and for those that already know this, you can rest your glass-balls.

We let x\in\mathbb{R}^N be an N-dimensional vector taking real numbers as its entries. For example:

\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}

where x_i denotes the i-th entry; in this case N = 3.

An M-by-N matrix is denoted as X\in\mathbb{R}^{M\times N} . The transpose of a matrix is denoted as X^T. A matrix X can be viewed according to its columns and its rows:

\begin{bmatrix}  0 & 1 & 2 \\ 3 & 4 & 5\\ 6 & 7 & 8 \\ \end{bmatrix}

where x_{ij} is the entry in row i and column j .
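As a minimal numpy sketch, the vector and matrix above can be built like so:

# The vector and matrix from the notation above, as numpy arrays
import numpy as np

v = np.array([0, 1, 2])            # the 3-vector
A = np.arange(9).reshape(3, 3)     # the 3x3 matrix with entries 0..8
print(v.shape, A.shape)            # (3,) (3, 3)
print(A.T)                         # the transpose A^T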

An array is a data structure in Python that holds a fixed number of elements, all of the same data type. The main idea behind using an array is storing multiple elements of the same type, and most other data structures use an array under the hood to implement their algorithms. There are two important parts of an array (a tiny example follows the list):

  • Element: Each item stored in the array is called an element.
  • Index: Every element in the array has its own numerical value to identify the element.
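Here is the tiny example promised above, using Python's built-in array module (a plain list would work just as well, but the array type enforces a single element type):

# Elements and indices in a typed array
from array import array

a = array('i', [10, 20, 30])   # 'i' -> signed integers, all elements the same type
print(a[0], a[2])              # index 0 -> element 10, index 2 -> element 30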

Think of programming a loop, tuple, list, array, range, or matrix:

from math import exp

x, y = 1.0, 2.0                # placeholder values so the lists below are concrete
x1, x2, x3 = 3.0, 4.0, 5.0

v1 = [x, y]                    # list of variables
v2 = (-1, 2)                   # tuple of numbers
v3 = (x1, x2, x3)              # tuple of variables

v4 = [exp(-i*0.1) for i in range(150)]   # ye ole range loop (list comprehension)

and check this out for a matrix:

import numpy as np
a = np.matrix('0 1; 2 3')   # rows are separated by semicolons (np.array is generally preferred these days)
print(a)
output: [[0 1]
 [2 3]]

which, folks, is why we like the Snake Language. Really, that is about it for vectors and matrices. The theory is where you get into proofs and derivations, which can save you a ton of time on optimizations.

So now let’s double click on some things that will make you sound cool at the parties or meetups.

A vector can be multiplied by a number. This number a is usually called a scalar:

a\cdot (v_1,v_2) = (av_1,av_2)

Now, given this, one of the most fundamental operations in all of machine learning is the inner product, also called the dot product or scalar product, of two vectors, which is a number. Most, if not all, machine learning algorithms have some form of a dot product somewhere within the depths of all the mathz. Nvidia GPUs are optimized for (you guessed it) dot products.

So how do we set this up? Multiplication of a scalar a and a vector (v_0,\dots,v_{n-1}) yields:

(av_0,\dots,av_{n-1})

Ok good so far.
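A one-line numpy sketch of that scaling:

# Scalar times vector, elementwise
import numpy as np

a = 3
v = np.array([1.0, 2.0, -1.0])
print(a * v)    # [ 3.  6. -3.]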

The inner or dot product of two n-vectors is defined as:

(u_0,\dots,u_{n-1})\cdot(v_0,\dots,v_{n-1}) = u_0v_0 + \cdots + u_{n-1}v_{n-1}

which, if you are paying attention, yields:

\mathbf{u}\cdot\mathbf{v} = \sum_{j=0}^{n-1}{u_j v_j}
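Written out in plain Python, that sum is just (a tiny sketch, no numpy required):

# Dot product as an explicit sum
u = [1, 0, -1]
v = [3, 2, 0]
print(sum(u_j * v_j for u_j, v_j in zip(u, v)))   # 1*3 + 0*2 + (-1)*0 = 3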

Geometrically, the dot product of U and V equals the length of U times the length of V times the cosine of the angle between them:

\textbf{U}\cdot\textbf{V}=|\textbf{U}||\textbf{V}|\cos\theta

ok so big deal huh? yea, but check this out in the Snake_Language:

# dot product of two vectors
 
# Importing numpy module
import numpy as np
 
# Taking two scalar values
a = 5
b = 7
 
# Calculating dot product using dot()
print(np.dot(a, b))
output: 35

hey now!

# Importing numpy module
import numpy as np
 
# Taking two 2D array
# For 2-D arrays it is the matrix product
a = [[2, 1], [0, 3]]
b = [[1, 1], [3, 2]]
 
# Calculating dot product using dot()
print(np.dot(a, b))
output:[[5 4]
       [9 6]]

Mathematically speaking, the inner product is a generalization of the dot product. As we said, constructing a vector is done using the command np.array. Inside this command, one needs to enter the array. For a column vector, we write [[1],[2],[3]], with an outer [] and three inner []s, one for each entry. If the vector is a row vector, one can omit the inner []s by just calling np.array([1, 2, 3]).

Given two column vectors x and y, the inner product is computed via np.dot(x.T,y), where np.dot is the command for inner product, and x.T returns the transpose of x. One can also call np.transpose(x), which is the same as x.T.

# Python code to perform an inner product with transposition
import numpy as np

x = np.array([[1], [0], [-1]])    # column vector
y = np.array([[3], [2], [0]])     # column vector
z = np.dot(np.transpose(x), y)    # same as np.dot(x.T, y)
print(z)                          # output: [[3]]
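And to recover the geometric picture from earlier, you can back out the angle between those same two vectors using their norms; a minimal sketch, where np.linalg.norm gives the length of a vector:

# Angle between x and y from U.V = |U||V|cos(theta)
import numpy as np

x = np.array([1.0, 0.0, -1.0])
y = np.array([3.0, 2.0, 0.0])
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # roughly 54 degrees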


Yes, dear reader, you can now impress your friends with your linear algebra and python prowess.

Note: the raw dot product depends on the scale of the vectors; for actual purposes of real computation you often want to normalize by something called the norm of a vector. i won't go into the mechanics of this unless asked for further explanations on the mechanics of linear algebra. i will gladly go into pythonic examples if so asked and will be happy to write about said subject. Feel free to inquire in the comments below.

Until Then,

#iwishyouwater <- Nathan Florence with Kelly Slater at the Box. Watch.

tctjr.

Muzak to Blog By: INXS. i had forgotten how good of a band they were and how deep the catalog is. Michael Hutchence, the lead singer, hanged himself in a hotel room. Check out the songs “By My Side”, “Don’t Change”, “Never Tear Us Apart” and “To Look At You”. They weren’t afraid to take production chances.

Note[2]: i resurrected some very old content from a previous site i owned and imported the older blogs. Some hilarious. Some sad. Some infuriating. i’m shining them up. Feel free to look back in time.

Snake_Byte[10] – Module Packages

Complexity control is the central problem of writing software in the real world.

Eric S. Raymond
AI-Generated Software Architecture Diagram

Hello dear readers! First, i hope everyone is safe. Secondly, it is the monday-iest WEDNESDAY ever! Ergo it's time for a Snake_Byte!

Grabbing a tome off the bookshelf, we randomly open it, and the subject matter today is Module Packages. So there will not be much, if any, code, but more discussion as it were on the explanations thereof.

Module imports are the mainstay of the snake language.

A Python module is a file that has a .py extension, and a Python package is any folder that has modules inside it (or, if you're still in Python 2, a folder that contains an __init__.py file).

What happens when you have code in one module that needs to access code in another module or package? You import it!

In python, a directory is said to be a package, and thus these imports are known as package imports. What happens in an import is that the dotted name is resolved to a directory path and the code is loaded from there, whether local (your come-pooter) or on that cloud thing everyone talks about these days.

It turns out that hierarchy simplifies things: organizing files into package directories tames the complexity of the module search path and trends toward simpler search path settings.

Absolute imports are preferred because they are direct. It is easy to tell exactly where the imported resource is located and what it is just by looking at the statement. Additionally, absolute imports remain valid even if the current location of the import statement changes. In addition, PEP 8 explicitly recommends absolute imports. However, sometimes they get so complicated you want to use relative imports.

So how do imports work?

import dir1.dir2.mod
from dir1.dir2.mod import x

Note the “dotted path” in these statements is assumed to correspond to a path through the directory tree on the machine you are developing on; in this case it leads to mod.py. So directory dir1 contains subdirectory dir2, which contains the module mod.py. Historically, the dotted path syntax was created for platform neutrality, and from a technical standpoint, paths in import statements become object paths.

In general, the leftmost name in the dotted path must live in a directory on the module search path (unless it is a top-level file in the program's home directory); that is exactly where the import machinery starts looking.
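To make the dir1.dir2.mod example concrete, here is a self-contained sketch that builds that exact (hypothetical) layout in a temporary directory, puts it on the search path, and imports from it:

# Build the hypothetical dir1/dir2/mod.py layout and import it
import pathlib, sys, tempfile

root = pathlib.Path(tempfile.mkdtemp())              # stands in for a directory on sys.path
(root / "dir1" / "dir2").mkdir(parents=True)
(root / "dir1" / "__init__.py").write_text("")
(root / "dir1" / "dir2" / "__init__.py").write_text("")
(root / "dir1" / "dir2" / "mod.py").write_text("x = 42\n")

sys.path.insert(0, str(root))                        # make the package findable
from dir1.dir2.mod import x                          # the absolute import from above
print(x)                                             # 42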

In Python 3.x, package-relative import behavior changed slightly, and it only applies to imports within files located in package directories. The changes include:

  • A modified module import search path semantic that skips the package’s own directory by default; these imports are essentially absolute imports.
  • An extension of the syntax of from statements to allow them to explicitly request that imports search the package’s directory only; this is the relative import mentioned above.

so for instance:

from . import spam   # relative to this package

Instructs Python to import a module named spam located in the same package directory as the file in which this statement appears.

Similarly:

from .spam import name

states: from a module named spam, located in the same package as the file that contains this statement, import the variable name.

Something to remember is that an import without a leading dot always causes Python to skip the relative components of the module import search path and look instead in the absolute directories that sys.path contains. You can only force the dot nomenclature, i.e. relative imports, with the from statement.
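Extending the same idea, here is a self-contained sketch of the relative form; the package name pkg and its contents are invented purely for illustration:

# A package 'pkg' whose module 'mod' relatively imports its sibling 'spam'
import pathlib, sys, tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / "pkg").mkdir()
(root / "pkg" / "__init__.py").write_text("")
(root / "pkg" / "spam.py").write_text("name = 'eggs'\n")
(root / "pkg" / "mod.py").write_text("from .spam import name\n")   # relative: same package only

sys.path.insert(0, str(root))
from pkg.mod import name
print(name)                                          # eggs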

Packages are standard now in Python 3.x. It is now very common to see very large third-party extensions deployed as sets of package directories rather than flat lists of modules. Also, caveat emptor: importing only what you need via the from forms can save memory. Read the documentation. Many times, importing AllTheThings results in major memory usage, an issue when you are going to production with highly optimized python.

There is much more to this import stuff. Transitive Module Reloads, Managing other programs with Modules (meta-programming), Data Hiding etc. i urge you to go into the LazyWebTM and poke around.

In addition, a very timely post:

PyPI is running a survey on packages:

Take the survey here -> PyPI Survey on Packages

Here some great comments and suggestions via Y-Combinator News:

Y-Combinator News Commentary on PyPI Packages,

That is all for now. i think next time we are going to delve into some more scientific or mathematical snake language bytes.

Until Then,

#iwishyouwater <- Wedge top 50 wipeouts. Smoookifications!

@tctjr

MUZAK TO BLOG BY: NIN – “The Downward Spiral (Deluxe Edition)”. A truly phenomenal piece of work. NIN’s second album; trent reznor told jimmy iovine upon delivering the concept album, “I’m Sorry I had to…”. In 1992, Reznor moved to 10050 Cielo Drive in Benedict Canyon, Los Angeles, where actress Sharon Tate formerly lived, and where he made the record. i believe it changed the entire concept of music and created a new genre. From an engineering point of view, Digidesign‘s TurboSynth and Pro Tools were used extensively.

Snake_Byte[9] XKCD PLOTS

An algorithm must be seen to be believed.

~ D. Knuth

First, i trust everyone is safe. Second, it's WEDNESDAY, so we got us a Snake_Byte! Today i wanted to keep this simple and fun and return to a set of fun methods that are included in the de facto standard for plotting in python, which is Matplotlib. The method(s) are called XKCD Style plotting via plt.xkcd().

If you don’t know what that is referencing, it is xkcd, sometimes styled XKCD, which is a webcomic created in 2005 by American author Randall Munroe. The comic’s tagline describes it as “a webcomic of romance, sarcasm, math, and language”. Munroe states on the comic’s website that the name of the comic is not an initialism but “just a word with no phonetic pronunciation”. i personally have read it since its inception in 2005. The creativity is astounding.

Which brings us to the current Snake_Byte. If you want to have some fun and creativity in your marketechure[1] and spend fewer hours on those power points bust out some plt.xkcd() style plots!

First thing is you need to install matplotlib:

pip install matplotlib

in this simple example we need numpy:

pip install numpy
import numpy as np
import matplotlib.pyplot as plt

plt.xkcd()
plt.plot(np.sin(np.linspace(0, 10)))
plt.plot(np.sin(np.linspace(10, 20)))
plt.title('Sorta Lissajous')
plt.show()
Sorta Lissajous

So really that is all there is to it, with all the bells and whistles that matplotlib has to offer.
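One caveat worth noting: calling plt.xkcd() globally, as above, flips the style for the rest of the session. If you want it scoped, use it as a context manager (as the next example does), and you can always fall back to plt.rcdefaults() to reset the rc parameters. A tiny sketch:

# Scope the xkcd style to a single figure
import matplotlib.pyplot as plt

with plt.xkcd():                      # style applies only inside this block
    plt.plot([0, 1, 2], [0, 1, 4])
    plt.title('scoped sketchiness')
    plt.show()

plt.rcdefaults()                      # optional: reset rcParams to matplotlib defaults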

The following script was based on Randall Munroe’s Stove Ownership.

(Some will get the inside industry joke.)

with plt.xkcd():
    # Based on "Stove Ownership" from XKCD by Randall Munroe
    # https://xkcd.com/418/

    fig = plt.figure()
    ax = fig.add_axes((0.1, 0.2, 0.8, 0.7))
    ax.spines.right.set_color('none')
    ax.spines.top.set_color('none')
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_ylim([-30, 10])

    data = np.ones(100)
    data[70:] -= np.arange(30)

    ax.annotate(
        'THE DAY I TRIED TO CREATE \nAN INTEROPERABLE SOLUTION\nIN HEALTH IT',
        xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

    ax.plot(data)

    ax.set_xlabel('time')
    ax.set_ylabel('MY OVERALL MENTAL SANITY')
    fig.text(
        0.5, 0.05,
        '"Stove Ownership" from xkcd by Randall Munroe',
        ha='center')

plt.show()
Interoperability In Health IT

So dear readers, there it is: an oldie but goodie, and it is so flexible! Add it to your slideware or marketechure, or just add it because it's cool.

Until Then,

#iwishyouwater <- Mentawis surfing paradise. At least someone is living.

Muzak To Blog By: NULL

[1] Marchitecture is a portmanteau of the words marketing and architecture. The term is applied to any form of electronic architecture perceived to have been produced purely for marketing reasons; in many companies it has replaced actual software creation.