The Sieve of Eratosthenes is a wonderfully simple way to calculate primes below a specific number, with nothing more than integer addition (and optionally a multiplication, which can speed things up).
The idea behind it is to create an array of flags, initially true, indicating whether each number in the range is prime. The first flag in the array represents the number 2, the smallest prime. Then, starting with the first flag, find the next prime number (wherever a flag is still true), and reset the flags for all multiples of that number (but not the number itself) to false.
After one pass, 2 will remain prime, while all multiples of two will be marked non-prime. The next pass will find that 3 is still marked, and mark all multiples as non-prime. Then, 4 will already be non-prime, so the process continues from 5, and so on.
After the end of the array is reached, only prime numbers will still have their flags set, and they can be collected into a list (this can also happen during the sieve, as each prime number is found).
Languages
Zig
Zig is a fairly new programming language, and is perhaps described as “a better C” – not in the way that C++ claims to be, by adding more complex features, but by staying simple while bringing in all sorts of modern improvements.
I’ve found myself really enjoying Zig as a language recently, though it’s a little too unstable still for me to want to build anything big with it, since the standard library hasn’t settled yet.
One really useful tool that Zig provides for this task is a bitset type. Unlike the one I used last week in Nim, this is a standard library feature, not a language feature.
const std = @import("std");
const StaticBitSet = std.bit_set.StaticBitSet;
pub fn primes(buffer: []u32, comptime limit: u32) []u32 {
// ...
}
I’ve imported the standard library, and created a shorter alias for the StaticBitSet
type. This is the type to use for bitsets that have a size known at compile time… but wait, surely we don’t know how many primes to generate until runtime?
Zig has a really powerful comptime
feature that allows some parts of the code to be processed during the compilation process, and it’s been used here in the primes
function signature. limit
, the upper bound for the prime sieve is passed as a comptime
value, so it can’t be something provided dynamically at runtime. Instead, it needs to be given either as a constant, or as a value computed with only comptime
operations.
Because of this, the code uses the StaticBitSet
type to allocate the necessary flag space ahead of time.
var sieve = StaticBitSet(limit - 1).initFull();
var n: u32 = 2;
var i: usize = 0;
Inside primes
, the first thing to do is set up some variables. The StaticBitSet
I’ve been explaining is here, along with the current factor being checked, and an index into the output buffer
.
while (n <= limit) : (n += 1) {
if (!sieve.isSet(n - 2)) continue;
buffer[i] = n;
i += 1;
var m = n * n;
while (m <= limit) : (m += n) {
sieve.unset(m - 2);
}
}
This is the main body of the sieve. The loops here use an interesting piece of Zig syntax that acts a little like a C for loop. The : (n += 1)
on the end of the outer loop means that after each iteration, n
will be incremented.
If the current value was found to be non-prime, the loop just continues to the next iteration, otherwise the prime is written to the output buffer, and all multiples of n
are marked. This is also where the optimisation I mentioned comes in – everything below n * n
will already have been marked by a previous iteration, so I skip right to that point.
return buffer[0..i];
Finally, I just slice off the part of the buffer that wasn’t written to, and return it. It’s quite common in Zig to be very explicit about memory usage, and this function does so by expecting memory to be allocated and passed in as a slice. It’s up to the caller how this memory is obtained (in this case it’s a fixed array on the stack), and primes
just needs to return a sub-slice of the buffer as the result.
Bash
I was kind of dreading this one, because Bash is a shell scripting language, not a full on programming language. It’s designed for automating small shell tasks, not building mathematical algorithms. Weirdly though, it turned out to be pretty easy!
#!/usr/bin/env bash
for ((i=2; i<=$1; i++)); do
[[ ${flag[i]} ]] && continue
for ((j=i*i; j<=$1; j+=i)); do
flag[j]=1
done
primes+=("$i")
done
echo "${primes[@]}"
If you haven’t done much with Bash before, this might be completely nonsense to you, and even if you have, I’ve used some cheeky tricks to make things smaller, at the cost of overall code quality.
Starting from the top, primes
and flag
are both arrays, but I don’t declare or initialise them explicitly here. Good Bash style would suggest that I use declare -a
to mark them as arrays.
flag
is being used in the opposite way to my earlier description (1 = non-prime, 0 = prime). This allows me to treat all numbers as prime by default without initialisation.
for ((i=2; i<=$1; i++)); do
Bash uses C-style for
loops, with initialiser, condition, update. The syntax is slightly different, but the operation is the same. Again, i
and j
weren’t declared.
[[ ${flag[i]} ]] && continue
[[ ]]
is conditional syntax, which is used here to check if flag[i]
is set. If it isn’t the short-circuiting &&
won’t run continue
, so this is like if (flag[i]) continue
.
for ((j=i*i; j<=$1; j+=i)); do
flag[j]=1
done
This is the multiples loop, just like the last one. The only thing to remember here is that flag
is inverted.
primes+=("$i")
i
was the latest prime, so add it to the list.
done
echo "${primes[@]}"
Finally, print out all the primes separated by spaces. [@]
basically means “all the elements”.
While it’s far from code golf levels of compact, I had fun trying to squeeze this one down to a tiny script, while still keeping the same flow as the other solutions.
Fortran
Fortran, originally FORTRAN, was the first really successful high-level programming language. It was created in 1957, originally targeted the IBM 704, and until Fortran 90 (released in 1991?), didn’t support lowercase letters. It’s still popular today for high-performance computing, and is even used for many efficient linear algebra libraries, such as NumPy!
This implementation does one main thing differently from the previous ones - in the first loop, rather than adding primes to a list, I just increment a counter. This is because I need to allocate the right amount of memory upfront, but I can only know how many primes are in the range once I’ve found them. Technically the Zig version suffers from this problem as well, but just hides it by allocating a fixed amount of memory that is known to be enough.
function primes(limit)
integer, intent(in) :: limit
integer, allocatable :: primes(:)
logical, allocatable :: is_prime(:)
integer :: num_primes, i, j
allocate(is_prime(limit - 1))
num_primes = 0
I start with type declarations, including those of the input and output values, because Fortran doesn’t place those in the function signature. The return array is called primes
, because functions return a variable with the same name as the function by default. Initialisation of num_primes
is not done in the declaration, because in Fortran that acts like a C static
variable – it would only be initialised once across all calls.
do i = 1, limit - 1
is_prime(i) = .true.
end do
Initialising is_prime
to all true is easy enough, but notice that Fortran starts arrays at 1 :(
do i = 1, limit - 1
if (.not. is_prime(i)) then
cycle
end if
num_primes = num_primes + 1
do j = i * 2 + 1, limit - 1, (i + 1)
is_prime(j) = .false.
end do
end do
We’ve seen this loop before, just needs a few tweaks for the Fortran version. There’s a bit of extra calculation for the loop bounds to account for the 1-indexed arrays and 2-based flag array. cycle
is Fortran’s continue
, several Fortran keywords are surrounded by .
, but the logic is no different to previous solutions.
allocate(primes(num_primes))
j = 1
do i = 1, limit - 1
if (.not. is_prime(i)) then
cycle
end if
primes(j) = i + 1
j = j + 1
end do
end function primes
Now that I’ve found how many primes there are, I can stick them all into the primes
array, after allocating the right amount of memory, and it will be returned automatically.
Fortran is verbose, and does some things in ways that seem strange in a modern language, but the solution is still fundamentally the same, behind a new syntax – it’s still the Sieve of Eratosthenes, after all.
Summary
Because the goal of this week was to implement an existing algorithm, there’s not that much difference in the resulting solutions. Instead of fundamental changes in how languages let you write code, we saw a range of verbosity – the idea is always the same, the amount of code it takes is not. Bash can be tiny but unreadable, Fortran can be well-specified at the cost of code space, and personally I think that Zig hits a sweet spot between the two.
My full solutions are public: Zig, Bash, and Fortran, so feel free to check them out and see how the code looks when it’s all put together. Next week is the beginning of Analytical April, so the languages will move to slightly higher level ones, which should be a good change of pace.