Building for the Parallella with FuseSOC

I am currently working in the background on porting the FuseSOC build tool to the Parallella.

I will be reporting updates in the thread here, but I wanted somewhere to keep an up-to-date set of instructions, which will be updated each time I refine this.

Current status:

Currently, you can use FuseSOC to build the parallella_7020_headless example project from the parallella-hw repository. This needs a few hacks, a new repository with a config for the Parallella, and two extra scripts (I am knocking these off one at a time).

I have now tested the output on target (read on below), and verified I can still talk to the Epiphany, but for now you use the output of this tool at your own risk. Before too long, I hope to be building all my projects using FuseSOC, and will be able to remove the above warning 🙂

Desired result:

The goal is to be able to fetch straight from the parallella-hw repo (update: achievement unlocked), and build for other configurations (headless/HDMI, z7020/z7010) with minimal changes, and then to provide an easy way to build (all) your own projects on top of this. I envisage this would be done by patching the base project with your changes to those files, plus pulling in any new files either by adding them to this repo or by provisioning them from your own repo (say, if you want to pull in a custom core). However, managing the patches is going to be an interesting challenge (ideally you don’t want to commit for every build), but I’ll flesh those ideas out as the port improves.

Installing FuseSOC:

Fetch the repositories from github:

git clone https://github.com/yanidubin/fusesoc
git clone https://github.com/yanidubin/parallella-cores

Go to the fusesoc folder and build, install and configure:

cd fusesoc
autoreconf -i && ./configure && make && sudo make install
fusesoc init

When you are prompted on the location of orpsoc-cores, you can go ahead and install this to a convenient location – this installs the OpenRISC cores (which I am not using as of yet).
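Before running the install steps above, it may be worth confirming that the tools they rely on are on your PATH. The following is a minimal sketch: `check_tools` is a hypothetical helper name (not part of FuseSOC), and the tool list in the usage comment simply mirrors the commands used above.

```shell
# check_tools: hypothetical helper, not part of FuseSOC.
# Reports each named tool that is missing from PATH on stderr
# and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tools taken from the install commands above):
#   check_tools git autoreconf make
```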

Starting a build:

NOTE: at present you will see seven warnings about system files not being found (in bold yellow at the very start of the build). You can ignore these; they are a temporary side effect of one of the workarounds I have implemented, and the build will still succeed. The various other ERROR-type messages seen during the build are not actually fatal errors, just rough edges in how I am driving the Xilinx CLI tools.

Create a test folder (anywhere), import Xilinx environment settings, and start the build.

mkdir test
cd test
. /opt/Xilinx/14.7/ISE_DS/settings64.sh
/usr/bin/time -v fusesoc --cores-root=../parallella-cores build parallella

The build shouldn’t take too much longer than planAhead reports (if you add synth+impl). It takes a little longer because it has to run synthesis twice, due to some current limitations.

Process "Generate Programming File" completed successfully
INFO:TclTasksC:1850 - process run : Generate Programming File is done.

	Command being timed: "fusesoc build parallella"
	User time (seconds): 428.61
	System time (seconds): 4.04
	Percent of CPU this job got: 94%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:37.23

Note that if you do not wish to specify the path to parallella-cores on the commandline as above, you may wish to edit ~/.config/fusesoc/fusesoc.conf manually and add the path to parallella-cores after orpsoc-cores, for example:


$ cat ~/.config/fusesoc/fusesoc.conf
[main]
cores_root = /stash/files/dev/embedded/fpga/fusesoc-cores /stash/files/dev/embedded/fpga/parallella-cores
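If you would rather script that edit than make it by hand, something like the following could work. It is only a sketch: `add_cores_root` is a hypothetical helper, and it assumes the file already contains a `cores_root = ...` line holding a space-separated list of paths, as in the example above.

```shell
# add_cores_root: hypothetical helper, not part of FuseSOC.
# Appends a path to the space-separated cores_root list in a fusesoc.conf,
# unless that path is already listed. If the file has no cores_root line,
# it is left unchanged.
add_cores_root() {
    conf=$1
    tmp=$(mktemp) || return 1
    awk -v p="$2" '
        /^cores_root[ \t]*=/ {
            found = 0
            # Split "cores_root = /a /b" into: cores_root, /a, /b
            n = split($0, parts, /[ \t=]+/)
            for (i = 2; i <= n; i++)
                if (parts[i] == p) found = 1
            if (!found) $0 = $0 " " p
        }
        { print }
    ' "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Usage (the second path is a placeholder for your own checkout):
#   add_cores_root ~/.config/fusesoc/fusesoc.conf /path/to/parallella-cores
```

Running it twice with the same path leaves the file unchanged the second time, so it should be safe to call from a setup script.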

Testing the build:

The resulting .bit file is at build/parallella/bld-ise/parallella_z7_top.bit. I will be adding automatic conversion to a .bit.bin very soon, but for now I just use:

$ promgen -b -w -p bin -data_width 32 -u 0 build/parallella/bld-ise/parallella_z7_top.bit -o FuseSOC-test-001.bit.bin
$ scp FuseSOC-test-001.bit.bin linaro@Parallella:
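Until the conversion is automated, the promgen step can be wrapped in a small shell function. This is only a sketch: `bitbin_name` and `bit_to_bin` are hypothetical helper names, the promgen flags are copied unchanged from the command above, and the default output name (the input file name with `.bin` appended) is my own assumption.

```shell
# bitbin_name: hypothetical helper; derives the .bit.bin file name
# from the path of a .bit file (directory part is dropped).
bitbin_name() {
    printf '%s.bin\n' "$(basename "$1")"
}

# bit_to_bin: hypothetical helper; runs promgen with the same flags
# as the manual command above. The second argument (output file) is optional.
bit_to_bin() {
    bit=$1
    out=${2:-$(bitbin_name "$bit")}
    promgen -b -w -p bin -data_width 32 -u 0 "$bit" -o "$out"
}

# Usage (requires the Xilinx settings script to have been sourced):
#   bit_to_bin build/parallella/bld-ise/parallella_z7_top.bit FuseSOC-test-001.bit.bin
```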

So far I have loaded this onto my board and verified that it programs and that I can still talk to the Epiphany chip:

linaro-nano:~> sudo su
root@linaro-nano:/home/linaro# cat FuseSOC-test-001.bit.bin > /dev/xdevcfg
root@linaro-nano:/home/linaro# cat /sys/devices/amba.1/f8007000.devcfg/prog_done
1
root@linaro-nano:/home/linaro# exit
linaro-nano:~> cd ~/epiphany-examples/apps/matmul-16
linaro-nano:~/epiphany-examples/apps/matmul-16> ./run.sh

and the test passes – so I at least got something right.
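The programming step and the prog_done check can be combined so that a failed load is reported rather than silently ignored. The following is a sketch only: `program_fpga` is a hypothetical helper, and the default device and sysfs paths are the ones from the session above, which may differ on other kernels or images. It needs to be run as root, as above.

```shell
# program_fpga: hypothetical helper wrapping the manual steps above.
# Arguments: bitstream file, then optional device and status paths
# (the defaults are the paths used in this post).
program_fpga() {
    bin=$1
    dev=${2:-/dev/xdevcfg}
    done_file=${3:-/sys/devices/amba.1/f8007000.devcfg/prog_done}

    # Write the bitstream to the configuration device
    cat "$bin" > "$dev" || return 1

    # prog_done read back as 1 in the successful session above
    if [ "$(cat "$done_file")" = "1" ]; then
        echo "FPGA programmed"
    else
        echo "prog_done is not 1, programming may have failed" >&2
        return 1
    fi
}

# Usage (as root):
#   program_fpga FuseSOC-test-001.bit.bin
```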

I am not aware of any issues at this time, however my testing is far from extensive.

Build tool validation – work required

The images are completely different – I did a hexdump / diff of the FuseSOC output versus the parallella_e16_headless_gpiose_7020.bit.bin which I expect comprises the same source inputs, built from planAhead.
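A rough way to put a number on "completely different" is to count the byte positions at which the two images disagree. This is a sketch using the standard `cmp` utility rather than the hexdump/diff approach I actually used; `count_diff_bytes` is a hypothetical helper name.

```shell
# count_diff_bytes: hypothetical helper; prints how many byte positions
# differ between two files. cmp -l prints one line per differing byte.
# If the files differ in length, only the common prefix is compared.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Usage:
#   count_diff_bytes FuseSOC-test-001.bit.bin parallella_e16_headless_gpiose_7020.bit.bin
```

Comparing that count against the file size gives a quick feel for whether the two builds share any large identical regions.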

Ideally, since I am theoretically using the same tools, I would have produced something very similar, which would show that the build system is a drop-in replacement for planAhead. We might see a different timestamp or some such, but the output would otherwise be binary identical.

I will do some further tests by comparing some of the intermediary files at some point, so others can be confident it is safe to use – but this is not all that important for me at this stage – so long as it runs.

Why do the bitstreams differ?

Olof (author of FuseSOC) has confirmed in his comment below that this is to be expected, and not at all surprising.

So the following observations would seem to be correct – read on only if interested.

My work has recently involved binary patching of mobile radios in the field (including the FPGA hosting our soft processor, the soft processor firmware, and a DSP), and so I have looked into how much impact a tiny change in the VHDL has on a binary bitstream. We must limit the amount of patch data to send over a very low speed radio link.

I found that bzdiff (modified bsdiff4 to use zlib for stream compression rather than bzip2) made the patch file larger than simply using naive zlib compression – not at all the case with our firmware and DSP which benefited from the bsdiff algorithm.

Granted that this was using a much older FPGA from a different vendor (Cyclone, from Altera) – but given that observation, the above does not surprise/worry me. I had conjectured that if a minor maintenance change to our Radio FPGAs can entirely change the bitstream, then something like using different optimisation settings, might cause a vastly different result, as the map/routing might make quite different choices on where to put things – even if a small number of initial changes are introduced.

4 thoughts on “Building for the Parallella with FuseSOC”

  1. Olof Kindgren

    Hi Yani,

    Thanks for doing this. It’s been on my todo list for quite some time, but I never got around to start working on it. I’ll try to take a closer look at the patches and merge them as soon as possible. Please let me know if you have any questions or ideas as well. I will definitely talk about this on the OpenRISC conference that we are having in Munich this weekend.

    There are also a few things you might be interested in knowing.
    1. You can have multiple core libraries in FuseSoC. The cores_root parameter in fusesoc.conf is a space-separated list, so you can add more like cores_root = /path/to/orpsoc-cores /path/to/parallella-cores /path/to/other/cores. I have several cores_roots in my own fusesoc.conf. If two cores with the same name are found in several libraries, the last one is used, so this can also be used to override cores from orpsoc-cores. The other option is to specify additional core libraries with fusesoc --cores-root=/path/to/parallella-cores --cores-root=/path/to/other/cores build, and the new libraries will be searched when you run FuseSoC.

    2. It is practically impossible to diff two FPGA image files. It’s enough just to change a single option in any of the stages, or change the order of two source files, and the synthesis result will be different, which will yield a different place-and-route database and so on. Even two identical sets of inputs aren’t always guaranteed to give you the same result. We noticed this at my work about a year ago where the Linux kernel on the build machine was upgraded and suddenly we got different results. FPGA toolchains are just full of black magic

    Looking forward to hear more of this, and I’ll make sure to try it out myself!

    //Olof Kindgren

    1. Yani Dubin Post author

      Hi Olof,

      No problem – that’s very cool, cheers for letting me know 🙂

      I have a bit of tidy up to do still. But I would like to discuss with you some of the bits I have come up against. So far I’m content with hacking things to make it work, as I build up an idea of how I can use the tool – but will be keen to discuss whether all of these are things which should necessarily be added (and how they are architected in), or if there is simply a better way to use the tool and some are not necessary. I don’t have so much experience with FPGA projects – just an idea of what I would like to do with the Parallella. So some perspective would be great.

      What would you prefer: shall I email you with questions, or would you like to make it an open discussion? If you don’t have a mailing list or forum, we could either continue here, or use this thread over on the Parallella forums (where I’ve mentioned the things I’m struggling with).

      No, I hadn’t realised you could have multiple core libraries. That is very good to know, since it makes keeping parallella-cores separate much more viable. I don’t want to clutter up orpsoc-cores, but I see FuseSOC being the build tool for potentially each Parallella tutorial I do (currently IDE based, but I am thinking just diffs with the nitty gritty would be more signal and less noise), and even providing some sort of sandbox for newcomers not wanting to see the IDE at all initially. So each project might be a new system definition which is essentially a clone of a base system, plus a few patches, and that doesn’t fit in with orpsoc-cores. We could add maybe one reference system, and that would be it. So having these co-exist seems best. At present, I am not dealing with cores, since the base projects don’t use any yet.

      Thanks for the info on the difference between builds – interesting that a kernel change (rather than a toolchain change) was able to affect it!

      I’ll update the article, since I’ve progressed things again yesterday.

      Regards,
      Yani.

      1. Olof Kindgren

        Hi again Yani,

        Sorry for the delay. The OpenRISC conference last weekend stole all my time, but it was a great event!

        Anyway, I have a few ideas for how to extend FuseSoC with different kinds of autogenerated code (like coregen, platgen and custom code generators). The quickest way to talk about these things would probably be in the #openrisc channel on irc.freenode.net if you are an IRC user, otherwise mails or the thread you posted would work fine as well.

        I also want to say that I think you’re on the right track with the idea of a base system for Parallella that can be built upon. It’s exactly how I envisioned that FuseSoC should be used. I’ve had some ideas for the parallella-hw repo on github too. The elink code should be one core, the common code is useful in other places and could also be moved to a separate core. That would make it easier to just clone the top-level glue logic for all different systems

        Anyway. Looking forward to hear more of your progress and eventually add this to mainline FuseSoC – and don’t hesitate to ask any questions

        Best Regards,
        Olof

  2. Yani Dubin Post author

    Hi Olof,

    Sorry about the delay. Glad it went well. I will have to have a look into what you guys are up to when I have a chance (the Risc-V project caught my eye in particular).

    I’m not currently an IRC user, but could be persuaded to give it another go. However I think an issue we will have is that we are 11 hours apart (I am GMT+13 here in NZ). Having a slow back and forth over an active channel will not work so well I think – I foresee it ending up as private messages. So, I am not in the habit of checking IRC (and would need to set one up secured in a chroot on my webhost in order to monitor it 24/7), and I gather you are not in the habit of checking the Parallella forums. Would email be the best compromise here?

    To be honest I haven’t looked at this much since my last update. I did embark on the next step, which was removing all the custom scripts in the parallella-cores project. I didn’t quite get it finished before I put it down, but I certainly passed the point where I felt convinced I was putting too many specifics into the tool just to make the project simple. Mostly it is a workaround for an issue I am having with xst when driven from xtclsh, so not a valid thing to burden the tool with. I put it down, and then set up hosting, threw together a new website framework, and am just finishing up another tutorial. I do have more content to add and clean up, and then will get back onto this (and will complete and check my latest into a branch, just for reference). But we can certainly start discussing already, always a good source of motivation 🙂 I will certainly have more questions once I get back into it in earnest.

    What you say about the partitioning with the cores is interesting – I haven’t looked at the verilog code in any detail, and considered it all as just glue. It certainly never occurred to me these could be nicely packaged up like that – but then the idea of a core is new to me (I am not an FPGA guy). I agree anything which is not just glue is a good candidate to treat as a core – this sounds like a great idea. I will have to give the verilog files a more thorough examination and learn what exactly constitutes a core.

    Then there are modules like the one for GPIO which I think could be improved so that the file itself doesn’t need to be modified as other peripherals are added (to release some of the GPIOs). I was thinking maybe an interface clean up (using generics) to better encapsulate it might be preferable to the current approach of global includes. Maybe not the best example of a common module – but the only one I have looked at in any detail. If these could be better encapsulated, they are better candidates to put into a core, and parameterise in the top level glue logic.

    Regards,
    Yani.
