======================================================================
Title: Plaintext Weblog Posts
Date: 2025-01-25
Tags: smolnet
Link: https://spool-five.com/posts/2025-01-25-plaintext_weblog_posts/
Word Count: 620
======================================================================

I use sourcehut pages for both my gemini capsule and my website. It is a simple, great service where you don't need to worry about managing a server.

Sourcehut pages[1]
=>[1] https://srht.site/

One downside is that you are forced into a 'static site' mode of thinking about your content. This isn't necessarily a bad thing, but it means you can't do much with the content on the server side. For example, I needed to use a separate server instance for the interactive/games content that is available at dev.spool-five.com.

Gemini games[2]
=>[2] gemini://dev.spool-five.com

Recently Kris Occhipinti posted a video where he demonstrates the cli-friendly nature of his website. His site serves the content differently if your client is curl/wget, and lets you do things like search the site without leaving the terminal. You can see for yourself:

wget -qO- "https://filmsbykris.com"

Kris Occhipinti Video[3]
=>[3] https://www.youtube.com/watch?v=IsKN6nuTauY

I loved this idea, but in the context of a static site hosted on sourcehut pages this isn't really an option. So I wanted to try an alternative that would at least help make my site more 'grep-able'. I didn't necessarily succeed in that sense, but I ended up writing some scripts to generate plaintext versions of my web content that can sit alongside the (more 'bloated') html. It was a fun exercise.

Like a lot of people on here, I'm sure, I've always been curious about trying to build a static site generator myself, but have been dissuaded by all the great options that are already available. So this scratched that itch a bit by giving me a chance to write something that could at least generate a more stripped-down, portable version of the content I've posted here. The same 'source' content persists in the form of markdown, and the plaintext versions of the files are generated alongside them within the sourcehut build workflow.

You can see an alternative 'index' of all the plaintext content at the link below.

Index of plaintext version of site[4]
=>[4] https://spool-five.com/pt/

If you want to read this post in the terminal you can try:

wget -qO- "https://spool-five.com/pt/2025-01-25-plaintext_weblog_posts.txt"

While not as elegant as Kris Occhipinti's cli interface, you can also use the 'plaintext' index of posts to filter for information using wget. Each line of the plaintext index has four fields separated by spaces: date, title, tags, link. The formatting of these lines is similar to the Denote emacs package, where title words are separated by dashes and tags are separated by underscores:

20250101 Title-of-post _foo_bar https://example.com

In this format, if you wanted to get the link for the oldest post tagged with 'philosophy', you would filter for '_philosophy' and take the last entry:

wget -qO- "https://spool-five.com/pt/index.txt" | grep "_philosophy" | tail -n1 | awk '{print $4}'

Or, to print out a random page to the terminal:

wget -qO- "https://spool-five.com/pt/index.txt" | shuf | head -n1 | awk '{print $4}' | xargs -I {} wget -O- {} | less

The scripts for converting/building these features were written using babashka.
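
To give a rough sense of what that kind of conversion involves, here is a minimal babashka-style sketch of building one of those Denote-style index lines from a post's metadata. This is purely illustrative: the function name, the metadata keys and the /pt/ base URL are assumptions made for the example, not code taken from the actual scripts.

(require '[clojure.string :as str])

;; Illustrative sketch only: turn a post's metadata into a Denote-style index line.
;; The metadata shape and the /pt/ base URL are assumed for this example.
(defn index-line [{:keys [date title tags slug]}]
  (str/join " "
            [(str/replace date "-" "")            ; "2025-01-25" -> "20250125"
             (str/replace title #"\s+" "-")       ; title words joined with dashes
             (apply str (map #(str "_" %) tags))  ; tags marked with underscores
             (str "https://spool-five.com/pt/" slug ".txt")]))

(index-line {:date "2025-01-25"
             :title "Plaintext Weblog Posts"
             :tags ["smolnet"]
             :slug "2025-01-25-plaintext_weblog_posts"})
;; => "20250125 Plaintext-Weblog-Posts _smolnet https://spool-five.com/pt/2025-01-25-plaintext_weblog_posts.txt"

The real scripts do a bit more than this (converting the markdown body itself, and writing out the index file), but the idea is the same.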
I was pleasantly surprised at how simple it was to integrate into the sourcehut build process. I'm not a professional developer, so I don't engage with CI/CD much, but the sourcehut build instructions were quite clear and easy to follow.

Sourcehut builds[5]
=>[5] https://man.sr.ht/builds.sr.ht/

Babashka[6]
=>[6] https://babashka.org/

Source code for the scripts[7]
=>[7] https://git.sr.ht/~loopdreams/spv-plaintext