GitHub Pages

I recently decided that I should probably write more. GitHub Pages is a good place to store what I write.

What Is GitHub Pages

GitHub Pages is a service provided by GitHub to host… pages. It is a great way to add a website to a project hosted on GitHub, and also quite simple to do by using git. The idea is that you create a dedicated branch in your project repository, called gh-pages, and put the website there. GitHub is then smart enough to take the contents from that branch and expose them on the Internet at the right address.

If your nickname on GitHub is mynick, and the project is called myproject, then:

the project will be at https://github.com/mynick/myproject
its pages will be at http://mynick.github.io/myproject

For example, my repository for the Potrace Perl bindings has:

project repository at address https://github.com/polettix/Graphics-Potrace
its associated page(s) at address http://polettix.github.io/Graphics-Potrace
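
The naming rule above is mechanical enough to be written as two tiny shell functions. This is only a sketch: project_url and pages_url are names made up for this example, not part of any GitHub tool.

```shell
# Hypothetical helpers: build the two addresses from a nickname and a project name.
project_url() { printf 'https://github.com/%s/%s\n' "$1" "$2"; }
pages_url()   { printf 'http://%s.github.io/%s\n' "$1" "$2"; }

project_url polettix Graphics-Potrace   # prints: https://github.com/polettix/Graphics-Potrace
pages_url polettix Graphics-Potrace     # prints: http://polettix.github.io/Graphics-Potrace
```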

Project or User/Organization?

What is written above is fine for projects hosted on GitHub. As a matter of fact, there is also a standardized way to have similar pages for a user or an organization.

There is a slight inconsistency in how the thing is handled though:

it still relies on a GitHub project - good
the GitHub project MUST have a specific name, e.g. mynick.github.io - still good
the pages are hosted in the master branch instead of gh-pages - this is a bummer!
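
The difference between the two cases boils down to which branch holds the pages. A minimal sketch of the rule as described here, where pages_branch is a hypothetical helper name and the comparison is a plain, case-sensitive string match:

```shell
# Hypothetical helper: print the branch that holds the pages for a repository.
# $1 is the GitHub nickname, $2 is the repository name.
pages_branch() {
   case "$2" in
      "$1.github.io") echo master ;;     # user/organization pages
      *)              echo gh-pages ;;   # ordinary project pages
   esac
}

pages_branch mynick mynick.github.io   # prints: master
pages_branch mynick myproject          # prints: gh-pages
```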

Blog?

With the tools above - especially considering the pages for a user or an organization - it is easy to think about hosting a blog on GitHub. We will assume that the blog is hosted as a project of its own, not as the user/organization project above (although you can easily tweak the instructions below to make that work).

The basic idea is that maintaining a blog’s pages by hand is too cumbersome. You will probably want to provide a consistent look, with all headers, navigation, sidebars, footers… all the bells and whistles.

One of the best approaches to take is to use some blog generation system - we’ll use Jekyll here - so that we can concentrate on writing the stuff, and let the system do the heavy lifting to generate the final pages. Hence, it makes sense to consider the blog from two points of view:

the generating system where you put your articles in
the final generated site

This fits perfectly with GitHub: you can keep the generating system as the project, and its associated GitHub Pages as the real blog that is served on the Internet.

Let’s Start!

I set up my blog infrastructure using Jekyll. After installing it, create your new blog like this:

jekyll new myblog
cd myblog
git init
git add .
git commit -m 'initial import'

Now you have your local repository for the blog. At this point, you are ready to create a new repository on GitHub (let’s call it myblog, under user mynick) and tie the two together:

git remote add origin git@github.com:mynick/myblog.git
git push -u origin master

It’s time to start generating pages at this point. Depending on how you installed Jekyll, you might have to run it through bundle, which is what we will assume here:

bundle exec jekyll build

Now the generated site will live inside the _site subdirectory. This should already be listed in the .gitignore file that Jekyll generates automatically; if it is not, this is a good moment to add it.
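
One possible way to make sure the entry is there without adding it twice is sketched below. ensure_ignored is a hypothetical helper; it only looks for an exact line match, so a .gitignore that lists _site/ (with a trailing slash) would get an additional _site line.

```shell
# Hypothetical helper: append an entry to a .gitignore file unless an
# identical line is already there.
# $1 is the entry, $2 is the file (defaults to .gitignore in the current directory).
ensure_ignored() {
   grep -qxF -- "$1" "${2:-.gitignore}" 2>/dev/null \
      || printf '%s\n' "$1" >> "${2:-.gitignore}"
}

# Usage, from the root of the repository:
#    ensure_ignored _site
```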

The suggestion is to keep the master and gh-pages branches completely separate from one another. Again, there might be many ways to do this; I’m just providing one here:

git checkout master
git checkout --orphan gh-pages
git rm -rf .

At this point you should still have the _site directory lying around (git rm only removes tracked files, and _site is ignored), and this is where the real contents of your site actually are. A basic strategy can be to just copy the contents of that directory into the root directory of the repository:

tar cf - -C _site . | tar xvf -
git add .
git commit -m 'gh-pages initial import'
git push origin gh-pages:gh-pages

There you go, your blog is online!
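
If you want to double check that the two branches really share no history, git merge-base can help: it exits with an error when it finds no common ancestor. The function name below is made up for this sketch, and note that it also reports success when one of the branches does not exist at all.

```shell
# Hypothetical helper: succeed when the two branches have no common ancestor.
# Caveat: git merge-base also fails when a branch name is unknown, so make
# sure both branches exist before trusting the answer.
branches_are_unrelated() {
   ! git merge-base "$1" "$2" >/dev/null 2>&1
}

# Usage, from inside the repository:
#    branches_are_unrelated master gh-pages && echo 'no shared history'
```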

Routine Workflow

What’s the workflow from now on? You will normally work in the master branch - we set all this up for this reason, actually - and will switch to the gh-pages branch only when needed.

Adding posts or pages in Jekyll is quite easy and there is plenty of documentation. When you’re done, make sure you are in the master branch and that changes are committed, otherwise git will most likely refuse to switch to the gh-pages branch later on. It’s OK to have files that are not yet tracked by git though; git will not complain about them.

At this point, you have to follow these steps:

bundle exec jekyll build
git checkout gh-pages
tar cf - -C _site . | tar xvf -
git add .
git commit -m "$(date '+blog status at %Y%m%d-%H%M%S')"
git push origin gh-pages:gh-pages
git checkout master
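
A word on the commit message: it comes out of date and contains spaces, so the command substitution has to be wrapped in double quotes when it is handed to git commit -m. The sketch below uses a throwaway count_args function (a name invented here) to show how many arguments the shell produces in each case.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

stamp='+blog status at %Y%m%d-%H%M%S'
count_args $(date "$stamp")     # prints: 4
count_args "$(date "$stamp")"   # prints: 1
```

Without the quotes, git commit -m would take only the first word as the message and treat the remaining words as paths to commit.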

The copy using tar is effective although not completely correct. In particular, it does not account for things that you delete, because new items are simply added on top of what is already saved and committed. In general this should not be a problem though, because you will mostly be adding things, will you not?
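
To see the limitation in isolation, here is a small experiment in a temporary directory, with no git involved. copy_tree is a hypothetical wrapper around the same tar pipeline, and the file names are invented for the example.

```shell
# Hypothetical helper: copy the contents of directory $1 into directory $2
# with the same tar pipeline used for publishing.
copy_tree() {
   tar cf - -C "$1" . | (cd "$2" && tar xf -)
}

work=$(mktemp -d)
mkdir "$work/_site" "$work/published"

echo old > "$work/_site/old.html"
copy_tree "$work/_site" "$work/published"    # first publication

rm "$work/_site/old.html"                    # the page is removed from the site...
echo new > "$work/_site/new.html"
copy_tree "$work/_site" "$work/published"    # ...and the site is published again

ls "$work/published"    # lists new.html and also the stale old.html
```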

A better strategy is to use git ls-files to list all tracked files and remove most of them before doing the copy with tar. We should not get rid of all of them though, because some might be important for the generic management of the pages (e.g. the .gitignore file). We will assume that there are no file names containing spaces, so this will work:

bundle exec jekyll build
git checkout gh-pages
rm $(git ls-files | grep -v '^\.gitignore$')
tar cf - -C _site . | tar xvf -
git add .
git commit -m "$(date '+blog status at %Y%m%d-%H%M%S')"
git push origin gh-pages:gh-pages
git checkout master
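
If file names with spaces do show up sooner or later, a possible variation is to let git separate the names with NUL characters and leave the filtering to a pathspec instead of grep. This is a sketch only: clean_tracked is a made-up name, and it assumes a git version that understands the exclude pathspec magic.

```shell
# Hypothetical helper: remove every tracked file except the top-level
# .gitignore, coping with spaces in file names.
clean_tracked() {
   git ls-files -z -- . ':(exclude).gitignore' | xargs -0 rm -f --
}

# Usage, from the root of the repository while on the gh-pages branch:
#    clean_tracked
```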

I eventually put the commands above in a publish.sh file:

#!/bin/bash
MYDIR=$(dirname "$0")
FULLME=$(readlink -f "$0")
BAREME=$(basename "$0")

die() {
   echo "$*" >&2
   exit 1
}

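main() {
   # The script is assumed to live in a direct subdirectory of the
   # repository, so the repository root is one level up.
   cd "$MYDIR" || die "unable to go in $MYDIR"
   cd .. || die "unable to go in parent directory of $MYDIR"
   echo "in $PWD now"

   # Regenerate the site from master; _site is ignored by git, so it
   # survives the switch to gh-pages.
   git checkout master || die 'unable to switch to master'
   bundle exec jekyll build || die "unable to update contents"
   git checkout gh-pages || die 'unable to switch to gh-pages'

   # Copy, commit and push; whatever happens, go back to master afterwards.
   tar cf - -C _site . | tar xvf - \
   && git add . \
   && git commit -m "$(date '+update at %Y%m%d-%H%M%S')" \
   && git push origin gh-pages
   git checkout master || die 'unable to switch to master'
}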

main

I’m not an expert on this, but it seems very likely that things could go wrong without the trick of defining a main function and calling it. The script lives in the master branch and might be unavailable in the gh-pages branch, while bash reads a script piece by piece as it runs. Wrapping everything in a function makes bash parse the whole body before executing any of it.

Flavio Poletti