This is a fairly literal port of Phil's C code. Of course, management of the display is quite different since it's different hardware. The major optimizations are using an array of Boolean to track where grains are, and replacing as much multiplication and division as possible with bit shifting. Other than that it is a pretty faithful port of Phil's code. So much so that I've reused his comments verbatim.

The code is complex and not immediately obvious, but the comments do a good job of walking you through it.

It doesn't always act exactly as you might expect, but keep in mind that this isn't a physics simulation, it's more of an approximation. The rules are quite simple and, as Phil said on Show & Tell, it's an example of fairly complex emergent behavior.

I'm quite pleased and impressed with how much I could get CircuitPython to do and how well it performs given the speed and memory constraints it's running under.

## Division & multiplication by powers of 2

The optimizations around division and multiplication are based on the mapping between base 10 numbers and binary numbers. Especially where 2 and powers of 2 are concerned.

If we take the number 10 and represent it in binary, we have 00001010 (we'll stick to 8 bits for these examples). If we divide 10 by 2 we get 5. In binary that's 00000101. So if we keep dividing by 2 (integer division)

`10        00001010`
` 5        00000101`
` 2        00000010`
` 1        00000001`

In decimal it's a free for all without any noticeable pattern. But look at the binary representations. Each time we divide by 2, the digits move one place to the right (with the rightmost one being discarded and a 0 begin shifted into the left.

Another thing to notice is, that since each shift right is a dividing by 2, shifting multiple times divides by a power of 2. Shifting twice is dividing by 4 (2 squared), 3 times is dividing by 8 (2 cubed), etc.  In general, shifting right n digits is dividing by 2 to the power of n.

Multiplying by powers of 2 works the same way, but by shifting left rather than right.

So why is this relevant? Shifting is a lot faster than division. A good compiler (like GCC used by the Arduino IDE) or a hardware integer math unit will make this optimization (i.e. shifting for factors that are powers of 2)  without the programmer having to worry about it. However, CircuitPython and MicroPython don't.

This comes up quite a bit in this code. Grain positions are kept in a coordinate system that has 256 times the resolution of the pixel coordinate system. To get the pixel location of a grain, both x and y need to be divided by 256. Fortunately that's just shifting 8 times.

## Remove unnecessary code

One optimization that had a significant impact was getting rid of my Dotstar Featherwing library in favor of using the underlying Dotstar library directly. When it struck me that the display was just being updated all at once, in one place, getting rid of the library was an obvious win.  Not only did it free memory, but it also removed a non-trivial amount of work that was being done each time through the loop.

Consider the Dotstar update code. In early versions I started by clearing the display out of habit. That's completely unnecessary when it's updating each pixel. Look for easy wins like this. The code should only do what it absolutely needs to. Anything more is wasting memory and/or time.

I expect I could get another jump in performance if I stripped down the Dotstar library and made a version customized for this application. One that did just what is required.

## Optimize carefully

This was a good exercise in optimization. The first version of the python code could barely move 5 grains around in anything approaching a smooth speed. The current version does a reasonable job with 10 grains.

I do want to warn you about trying to optimize in an all or nothing way. Be very deliberate and cautious. Pick one thing to optimize and do it.  Measure the results. If it didn't help significantly, remove it and try something else. The thing about optimizing is that it generally makes the code harder to understand so only do optimizations that make a difference.

## The code

Here is the code in its entirety.

```# SPDX-FileCopyrightText: 2018 Dave Astels for Adafruit Industries
#

#
# Copyright (c) 2018 Dave Astels
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

"""
Ported from the C code writen by Phillip Burgess
Explainatory comments are used verbatim from that code.
"""

import math
import random

import board
import busio

N_GRAINS = 10  # Number of grains of sand
WIDTH = 12  # Display width in pixels
HEIGHT = 6  # Display height in pixels
NUMBER_PIXELS = WIDTH * HEIGHT
MAX_FPS = 45  # Maximum redraw rate, frames/second
GRAIN_COLOR = (0, 0, 16)
MAX_X = WIDTH * 256 - 1
MAX_Y = HEIGHT * 256 - 1

class Grain:
"""A simple struct to hold position and velocity information
for a single grain."""

def __init__(self):
"""Initialize grain position and velocity."""
self.x = 0
self.y = 0
self.vx = 0
self.vy = 0

grains = [Grain() for _ in range(N_GRAINS)]
i2c = busio.I2C(board.SCL, board.SDA)
board.D13, board.D11, WIDTH * HEIGHT, 0.25, False)

oldidx = 0
newidx = 0
delta = 0
newx = 0
newy = 0

occupied_bits = [False for _ in range(WIDTH * HEIGHT)]

def index_of_xy(x, y):
"""Convert an x/column and y/row into an index into
a linear pixel array.

:param int x: column value
:param int y: row value
"""
return (y >> 8) * WIDTH + (x >> 8)

"""Check if a pixel is already used.

:param int limit: the index into the grain array of
the grain being assigned a pixel Only grains already
allocated need to be checks against.
:param int x: proposed clumn value for the new grain
:param int y: proposed row valuse for the new grain
"""
for j in range(limit):
if x == grains[j].x or y == grains[j].y:
return True
return False

for g in grains:
placed = False
while not placed:
g.x = random.randint(0, WIDTH * 256 - 1)
g.y = random.randint(0, HEIGHT * 256 - 1)
placed = not occupied_bits[index_of_xy(g.x, g.y)]
occupied_bits[index_of_xy(g.x, g.y)] = True
g.vx = 0
g.vy = 0

while True:
# Display frame rendered on prior pass.  It's done immediately after the
# FPS sync (rather than after rendering) for consistent animation timing.

for i in range(NUMBER_PIXELS):
wing[i] = GRAIN_COLOR if occupied_bits[i] else (0, 0, 0)
wing.show()

f_x, f_y, f_z = sensor.raw_acceleration
ax = f_x >> 8  # Transform accelerometer axes
ay = f_y >> 8  # to grain coordinate space
az = abs(f_z) >> 11  # Random motion factor
az = 1 if (az >= 3) else (4 - az)  # Clip & invert
ax -= az  # Subtract motion factor from X, Y
ay -= az
az2 = (az << 1) + 1  # Range of random motion to add back in

# ...and apply 2D accel vector to grain velocities...
v2 = 0  # Velocity squared
v = 0.0  # Absolute velociy
for g in grains:
g.vx += ax + random.randint(0, az2)  # A little randomness makes
g.vy += ay + random.randint(0, az2)  # tall stacks topple better!

# Terminal velocity (in any direction) is 256 units -- equal to
# 1 pixel -- which keeps moving grains from passing through each other
# and other such mayhem.  Though it takes some extra math, velocity is
# clipped as a 2D vector (not separately-limited X & Y) so that
# diagonal movement isn't faster

v2 = g.vx * g.vx + g.vy * g.vy
if v2 > 65536:  # If v^2 > 65536, then v > 256
v = math.floor(math.sqrt(v2))  # Velocity vector magnitude
g.vx = (g.vx // v) << 8  # Maintain heading
g.vy = (g.vy // v) << 8  # Limit magnitude

# ...then update position of each grain, one at a time, checking for
# collisions and having them react.  This really seems like it shouldn't
# work, as only one grain is considered at a time while the rest are
# regarded as stationary.  Yet this naive algorithm, taking many not-
# technically-quite-correct steps, and repeated quickly enough,
# visually integrates into something that somewhat resembles physics.
# (I'd initially tried implementing this as a bunch of concurrent and
# "realistic" elastic collisions among circular grains, but the
# calculations and volument of code quickly got out of hand for both
# the tiny 8-bit AVR microcontroller and my tiny dinosaur brain.)

for g in grains:
newx = g.x + g.vx  # New position in grain space
newy = g.y + g.vy
if newx > MAX_X:  # If grain would go out of bounds
newx = MAX_X  # keep it inside, and
g.vx //= -2  # give a slight bounce off the wall
elif newx < 0:
newx = 0
g.vx //= -2
if newy > MAX_Y:
newy = MAX_Y
g.vy //= -2
elif newy < 0:
newy = 0
g.vy //= -2

oldidx = index_of_xy(g.x, g.y)  # prior pixel
newidx = index_of_xy(newx, newy)  # new pixel
# If grain is moving to a new pixel...
if oldidx != newidx and occupied_bits[newidx]:
# but if that pixel is already occupied...
# What direction when blocked?
delta = abs(newidx - oldidx)
if delta == 1:  # 1 pixel left or right
newx = g.x  # cancel x motion
# and bounce X velocity (Y is ok)
g.vx //= -2
newidx = oldidx  # no pixel change
elif delta == WIDTH:  # 1 pixel up or down
newy = g.y  # cancel Y motion
# and bounce Y velocity (X is ok)
g.vy //= -2
newidx = oldidx  # no pixel change
else:  # Diagonal intersection is more tricky...
# Try skidding along just one axis of motion if
# possible (start w/ faster axis). Because we've
# already established that diagonal (both-axis)
# motion is occurring, moving on either axis alone
# WILL change the pixel index, no need to check
# that again.
if abs(g.vx) > abs(g.vy):  # x axis is faster
newidx = index_of_xy(newx, g.y)
# that pixel is free, take it! But...
if not occupied_bits[newidx]:
newy = g.y  # cancel Y motion
g.vy //= -2  # and bounce Y velocity
else:  # X pixel is taken, so try Y...
newidx = index_of_xy(g.x, newy)
# Pixel is free, take it, but first...
if not occupied_bits[newidx]:
newx = g.x  # Cancel X motion
g.vx //= -2  # Bounce X velocity
else:  # both spots are occupied
newx = g.x  # Cancel X & Y motion
newy = g.y
g.vx //= -2  # Bounce X & Y velocity
g.vy //= -2
newidx = oldidx  # Not moving
else:  # y axis is faster. start there
newidx = index_of_xy(g.x, newy)
# Pixel's free! Take it! But...
if not occupied_bits[newidx]:
newx = g.x  # Cancel X motion
g.vx //= -2  # Bounce X velocity
else:  # Y pixel is taken, so try X...
newidx = index_of_xy(newx, g.y)
# Pixel is free, take it, but first...
if not occupied_bits[newidx]:
newy = g.y  # cancel Y motion
g.vy //= -2  # and bounce Y velocity
else:  # both spots are occupied
newx = g.x  # Cancel X & Y motion
newy = g.y
g.vx //= -2  # Bounce X & Y velocity
g.vy //= -2
newidx = oldidx  # Not moving
occupied_bits[oldidx] = False
occupied_bits[newidx] = True
g.x = newx
g.y = newy
```

This guide was first published on Jan 23, 2018. It was last updated on Jan 23, 2018.