To help people get started with FPGA programming and make the most of their Parallella, I have created this tutorial as a quick introduction. It draws on other tutorial material out there – the only new thing is presenting it in a format relevant to the Parallella and the parallella-hw repository. It is written for ISE WebPack (14.7), not Vivado.
ATTENTION – this post is now deprecated, and remains only for archival purposes. It will not be maintained, and will eventually be deleted. Please refer to the up-to-date version at my new website (parallellagram.org). Please direct any questions/comments there. New tutorials are available, which will not be posted here at all, so please update your bookmarks.
This tutorial will include:
- Generating an FPGA peripheral with an AXI4 interface using the XPS wizard (in particular, for the Parallella)
- Modifying the result to do something meaningful
- Run-time reconfiguration of the FPGA
- Accessing the peripheral via /dev/mem
- Python test scripts to verify correct operation
It will not include:
- Simulation (because I am a hacker, not an FPGA designer)
- Well writtenness / nice layout (because I am also not a blogger)
- How to install ISE WebPack, or anything relating to Windross
- Full AXI4 interface – this will be covered in the next tutorial
NOTE – I strongly recommend reading the prerequisite tutorial before embarking on this one. There is some required setup covered in there, and this tutorial assumes you already have the project all set up and ready to go. If you want to skip that and use a ready-made project instead, you can check out https://github.com/yanidubin/parallella-fpga-tutorials and place it in the same parent folder as parallella-hw. Tutorial000_BaseProject contains a base you should work from (planAhead: File >> Save Project As <project name>). Tutorial001_AXI4Lite contains the completed project sources.
Step 1: Open the planAhead tool, and get to the system builder using the provided system file for the Parallella
Open up your project in the planAhead tool, starting from the folder in which you cloned the git repo at https://github.com/parallella/parallella-hw
[img] Under the Project Manager tab, find the Sources pane and expand the tree entries for parallella_z7_top and then system_stub to expose system_i. Double click this entry to launch the xps tool, which will allow altering the system design.
NOTE – you will probably get a Xilinx License Error. This is an annoying bug which you can easily work around. Just click “ok”, and when it brings up the dialog to manage licenses, simply click “Close”. You can still use the tool just fine.
NOTE also – this means of invoking XPS is a pain, since it causes planAhead to be unresponsive until XPS is closed. I recommend that in future you run xps from the command line and point it at your system.xmp. The main thing is not to start with a blank XPS project and have it try to generate a blank system config, oblivious to the Parallella hardware.
Step 2: Create and configure an AXI4 IP core
NOTE – from this point on, you can follow any tutorial out there for the Zynq in general (ZedBoard, etc – how I learnt this stuff myself, after all), provided you retain any customisations to the Parallella system.mhs file. But not everything will be helpful, as often a different workflow is prescribed (such as using Xilinx SDK to build a bootloader and driver framework – all of which is completely unnecessary).
Once the xps is running, you will be faced with a beautifully complex system diagram of the SoC. You can ignore this for this tutorial, since no customisation of this is necessary just yet.
[img] From the menu bar, go to Hardware >> Create or Import Peripheral.
[img] Click Next, then select Create templates for a new peripheral and click Next again.
[img] You then are asked where to create the core. I prefer to add all my cores to an EDK repository rather than the project. This is so I can re-use them in multiple projects.
[img] I recommend creating your own edk_user_repository – under parallella-hw/fpga/ if you are using a checkout/fork of parallella-hw, or better still in your own project repository (but at the top level as IP cores are designed to be shared between projects).
[img] Click Next, and give the module a Name, and click Next again.
For the first example, we will use the AXI4-Lite interface. This provides access to a simple set of registers. This humble register file is all you need to implement your own ALU and instruction set. I intend to provide such an example as a separate example/tutorial – no need to do anything so fancy for now.
[img] Select AXI4-Lite, and click Next.
[img] You will be asked about IP Interface services. All you need is the User logic software register – we will not be using a data phase timer. Click Next.
[img] Click Next, and you will be asked how many S/W registers you want. I went for 8 for my ALU peripheral – this will depend what you wish to do with yours, but you’ll need at least 4 to complete this example.
[img] Click Next, and you will be shown a list of ports. We do not need to make any changes here. Click Next again.
[img] I know nothing of simulation, and a license is required for this besides – so I clicked Next again.
[img] You will then be asked about Peripheral Implementation Support. Since I prefer to use VHDL rather than Verilog, and do not require any driver support, I simply clicked Next. While you may prefer Verilog, I chose to do the rest of this tutorial in VHDL – so if you plan to keep working through the final steps, stick with VHDL for now.
[img] Then click Finish and you are done.
You now have a new IP core which you can include in your projects. The IP core itself doesn’t do much – it simply supports writing to the registers, and reading back the values. But before we can use it, there are a few steps to go through. Only then will we customise it to make it do something a little more interesting.
Step 3: Reconfiguring the Processing System
If you are using the HDMI build, or for some reason already have M_AXI_GP0 enabled, then you can skip this section (although it is instructional).
We cannot use M_AXI_GP1, as it is routed externally and used by the Epiphany link. M_AXI_GP0 therefore is the master interface we will be using. If you simply go to step 4 without enabling M_AXI_GP0 first, I have found the XPS UI refuses to allow you to connect the clock to the peripheral – there is evidently some bug. In the past, I hacked the system.mhs by hand – but if we do it this way round, it is a bit friendlier 🙂
I have not investigated whether we can get peripherals working on the M_AXI_GP1 bus. But even if we could, I am not sure slaves can communicate directly, so it is probably not a means to establish direct FPGA <-> Epiphany communications as you might have started wondering (I know I did!).
[img] If you go to the Bus Interfaces tab, you should see that M_AXI_GP1 is present (but M_AXI_GP0 is not).
[img] On the Zynq tab, click on the 32b GP AXI Master Ports (green box, bottom left of diagram).
[img] In the General Purpose Master AXI Interfaces tab, select Enable M_AXI_GP0 interface. Then click OK.
[img] Now go back to the Bus Interfaces tab. You will now see both M_AXI_GP0 and M_AXI_GP1 shown.
Step 4: Instantiate and configure an AXI4 IP core
[img] You should now be able to go to the IP Catalog tab in the left window of the XPS and expand the entry at the bottom of your list to expose your PCore. The location will depend on whether you put it in a repository or made it project local.
[img] Right click on your core and click Add to add an instance to your project.
[img] Select Yes when XPS asks stupid questions.
[img] You can click Ok to skip past the configuration dialogue.
[img] Leave the default setting for “processing_system7_0“.
[img] Under the Bus Interfaces tab, you should now see that your peripheral is connected to the same axi_interconnect_1 as M_AXI_GP0. The numbering of the axi_interconnect is not important, just that they are connected to the same one.
[img] Under the Ports tab, you will see various references to your peripheral, such as the clock being used for axi_interconnect_1. If axi4lite_example_0.S_AXI.S_AXI_ACLK is not connected to port processing_system7_0::FCLK0, you are not going to get very far – the previous step was all about making sure this would connect cleanly.
[img] Now navigate to the Addresses tab. You should now see your peripheral near the top of the list, allocated a region of the memory map. This is the address space for the peripheral, which is memory mapped (and can be accessed on the ARM via /dev/mem).
[img] To make things easier, I suggest you change the address to something simpler to work with. I chose 0x60000000. I also limited the reserved memory to 4k (the smallest allocation XPS will allow for this interface type), since the 8 registers only occupy 32 bytes and we are not accessing BRAM yet. Finally, you should lock the address – because if it moves (not sure what causes this), it’s not going to work so well.
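To make the resulting memory map concrete, here is a minimal Python sketch that lists where each software register would sit. It assumes the registers are laid out at consecutive 4-byte offsets starting at the base address, with reg0 at offset 0; the names `BASE_ADDR`, `WINDOW_SIZE`, `NUM_REGS` and `reg_address` are made up for this illustration and do not come from the generated core.

```python
# Assumed layout: reg0 at the base address, one 32-bit register every 4 bytes.
BASE_ADDR = 0x60000000   # address chosen in the Addresses tab
WINDOW_SIZE = 0x1000     # 4k window reserved for the peripheral
NUM_REGS = 8             # number of S/W registers picked in the wizard
REG_BYTES = 4            # each register is 32 bits wide


def reg_address(index):
    """Return the physical address of software register `index`."""
    if not 0 <= index < NUM_REGS:
        raise IndexError("register index out of range: %d" % index)
    return BASE_ADDR + index * REG_BYTES


if __name__ == "__main__":
    # Print the address of every register in the window.
    for i in range(NUM_REGS):
        print("reg%d @ 0x%08x" % (i, reg_address(i)))
```

Under that assumption the eight registers use only the first 32 bytes of the 4k window, which is why the smallest allocation is plenty.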
You should now be able to run Generate NetList and have this complete without any errors. If not, you may wish to use git to revert system.mhs and start again (leave system.xmp, as this contains the search path to your PCore – this way, you only need to do the system config and PCore instantiation/configuration again).
Step 5: Build and test the resultant core
I don’t have time to cover customising the PCore – but this probably works best as a separate tutorial anyway.
Go back to planAhead and generate the Bitstream. I reset the runs first, and then usually right click system_i and run Generate Output Products, since planAhead does not reliably do this on its own – I try to do this every time I modify my core. Can someone suggest something better?
Since I haven’t written tutorial 0 part B, here is a quick and dirty guide to doing bitgen and runtime reconfiguring your FPGA. Rather than using bootgen as described in the readme, which requires a bif and a dummy ELF file, I prefer to use promgen.
Tutorial001_AXI4Lite$ promgen -b -w -p bin -data_width 32 -u 0 Tutorial001_AXI4Lite.runs/impl_1/parallella_z7_top.bit -o Tutorial001_AXI4Lite-001.bit.bin
Copy to the target board:
scp Tutorial001_AXI4Lite-001.bit.bin linaro@Parallella:
If you have not already, create the /dev/xdevcfg device node so you can runtime reconfigure the FPGA. Since we are just using /dev/mem, there is no dependence on devicetrees or overlays:
root@linaro-nano:/home/linaro# mknod /dev/xdevcfg c 250 0
As root on the Parallella, load in the new bitstream and verify it programmed.
root@linaro-nano:/home/linaro# cat Tutorial001_AXI4Lite-001.bit.bin > /dev/xdevcfg
root@linaro-nano:/home/linaro# cat /sys/devices/amba.1/f8007000.devcfg/prog_done
The second command returns 1 on success, 0 on failure. If it failed, I believe this means you have a corrupt bitstream.
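The same two steps can be scripted. The following is only a sketch in Python, assuming the device node and sysfs path shown in the commands above; the function names `load_bitstream` and `programmed_ok` are hypothetical helpers, not part of any existing tool. It needs to run as root on the board.

```python
import shutil

# Paths taken from the shell commands above; adjust if your system differs.
XDEVCFG = "/dev/xdevcfg"
PROG_DONE = "/sys/devices/amba.1/f8007000.devcfg/prog_done"


def load_bitstream(bin_path, devcfg_path=XDEVCFG):
    """Copy the .bit.bin file into the devcfg node (like `cat file > node`)."""
    with open(bin_path, "rb") as src, open(devcfg_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def programmed_ok(prog_done_path=PROG_DONE):
    """Return True if prog_done reads back as 1, False otherwise."""
    with open(prog_done_path) as status:
        return status.read().strip() == "1"
```

A typical use would be to call `load_bitstream("Tutorial001_AXI4Lite-001.bit.bin")` and then refuse to run any register tests unless `programmed_ok()` returns True.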
Now you are ready to probe the device. I use the gpio-dev-mem-test tool from Sven Andersson of FPGA Design From Scratch. As a normal user:
linaro-nano:~> wget http://svenand.blogdrive.com/files/gpio-dev-mem-test.c
linaro-nano:~> gcc gpio-dev-mem-test.c -o gpio-dev-mem-test
Then you can do a quick read/write test using:
linaro-nano:~> sudo ./gpio-dev-mem-test -g 0x60000000 -i
gpio dev-mem test: input: 00000000
linaro-nano:~> sudo ./gpio-dev-mem-test -g 0x60000000 -o 42
linaro-nano:~> sudo ./gpio-dev-mem-test -g 0x60000000 -i
gpio dev-mem test: input: 0000002a
So we read a 0, wrote a value, and read it back, only we got the result in hexadecimal (42 == 0x2a). This is one of the things I dislike about Sven’s tool: there is no way to input values in hex, and almost everyone works in hex, right? So what you see is rarely what you get. I never bothered to fix it since I drive it from Python, and do all the magic conversion there anyway.
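For comparison, the peripheral can also be reached from Python directly by mapping /dev/mem, which sidesteps the decimal/hex mismatch since values are plain integers. This is a rough sketch rather than the script used for this tutorial: the class name `RegisterWindow` and its methods are invented for illustration, it assumes the 0x60000000 base and 4k window configured earlier, and it assumes little-endian 32-bit registers at 4-byte offsets. Note that `struct` does not strictly guarantee a single 32-bit bus access.

```python
import mmap
import os
import struct

BASE_ADDR = 0x60000000  # peripheral base address from the Addresses tab
WINDOW_SIZE = 0x1000    # 4k window reserved for the peripheral


class RegisterWindow:
    """Map a window of a memory device and access it as 32-bit registers."""

    def __init__(self, path="/dev/mem", base=BASE_ADDR, size=WINDOW_SIZE):
        # O_SYNC asks for uncached access when the file is /dev/mem.
        self._fd = os.open(path, os.O_RDWR | os.O_SYNC)
        try:
            self._mem = mmap.mmap(self._fd, size, mmap.MAP_SHARED,
                                  mmap.PROT_READ | mmap.PROT_WRITE,
                                  offset=base)
        except Exception:
            os.close(self._fd)
            raise

    def read_reg(self, index):
        """Read register `index` as an unsigned 32-bit integer."""
        return struct.unpack_from("<I", self._mem, index * 4)[0]

    def write_reg(self, index, value):
        """Write the low 32 bits of `value` to register `index`."""
        struct.pack_into("<I", self._mem, index * 4, value & 0xFFFFFFFF)

    def close(self):
        self._mem.close()
        os.close(self._fd)
```

On the board, as root, creating `RegisterWindow()`, calling `write_reg(0, 0x2a)` and then `read_reg(0)` should mirror the write/read test above.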
Step 6: Improvements
I ran out of time, but I want to share the following improvements:
- A Python script to exercise the core via /dev/mem and verify the results are as expected. For example, when I run my image smoothing algorithm, I do the calculations in Python and compare them against the pixel array I get back, to ensure the algorithm is operating correctly.
- A very basic ALU. For my first project, I assigned reg0 as the instruction register, reg1 and reg2 as operands, and reg3 and reg4 for the result. Then I implemented a very simple add/subtract/multiply/AND/OR/etc ALU (with implicit addressing – that is, given the instruction, the operands come from a specific register – always reg1/reg2). Rather than fetching instructions from a location as a real processor would, I simply execute whatever instruction is in reg0 on every cycle. What you have is effectively a register file – so there are many possibilities here.
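To illustrate the "calculate in Python and compare" idea for such an ALU, here is a small software reference model. It is a sketch under several assumptions that are not stated in this post: the opcode numbering is invented, operands are treated as unsigned 32-bit values, and the result is taken to be 64 bits wide with the low word in reg3 and the high word in reg4.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

# Hypothetical opcode numbering, for illustration only.
OP_ADD, OP_SUB, OP_MUL, OP_AND, OP_OR = range(5)


def alu_model(instr, a, b):
    """Return the (reg3, reg4) pair expected for `instr` applied to a and b."""
    a &= MASK32
    b &= MASK32
    if instr == OP_ADD:
        result = a + b
    elif instr == OP_SUB:
        result = a - b
    elif instr == OP_MUL:
        result = a * b
    elif instr == OP_AND:
        result = a & b
    elif instr == OP_OR:
        result = a | b
    else:
        raise ValueError("unknown instruction: %r" % (instr,))
    result &= MASK64  # wrap to 64 bits (two's complement for negatives)
    # Assumed split: low word goes to reg3, high word to reg4.
    return result & MASK32, result >> 32
```

A test script would write the instruction and operands to reg0, reg1 and reg2, read back reg3 and reg4, and compare them with the tuple returned by `alu_model`.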
Step 7: Where to next?
You may like to refer to this guide at FPGA CPU News. It is fairly tailored towards the ZedBoard early on (which is why I wrote this tutorial) – but he goes on to explore another means of talking to the peripheral which I have not investigated as of yet. I was more interested in getting high performance AXI-stream going, than getting into writing drivers.
I also intend to provide a guide to using AXI4 full as a means to access BRAM, as this is the interface I use for my image processing engine (still in development as a bit of a side project) until I get AXI-stream working. Much of it is in common with what we have done here (you just drive a wizard), but the VHDL side is quite different. Actually, the auto-generated code from Xilinx has some annoying shortcomings which I want to address before recommending it.