William R Sowerbutts
2014-04-26 13:50:12 UTC
Hello everyone
This is not strictly N8VEM related, but I've seen your thread about FPGA
systems on chip ... I built a similar system last year and I thought some of
you may be interested. I should start by saying this was only my second FPGA
project and was also my first attempt at writing code for a Z80, so the
quality of my code is probably not very good! The machine works well though
and I've had a great deal of fun with it.
I built my system using a Papilio Pro FPGA board:
http://papilio.cc/index.php?n=Papilio.PapilioPro
The PPro is a great board and I thoroughly recommend it. It has a Xilinx
Spartan 6 LX9 FPGA, 8MB of SDRAM, 8MB of SPI flash memory, and an FTDI USB
interface that is used to connect JTAG and UART to a host PC. My main
criticism of the board is that the serial link via the UART has no flow
control lines hooked up to the FPGA -- the FTDI has a deep FIFO (several KB)
and you can build a receive FIFO inside the FPGA, but at high data rates you
will eventually overflow both of them.
I also have a Pipistrello FPGA board which is based on the same Papilio form
factor. It has the UART flow-control hooked up, has a larger and faster DDR
SDRAM chip as well as a much larger LX45 FPGA. You can use the Xilinx on-chip
memory controller block to drive the DDR SDRAM chip. I've not yet had time to
get this board working.
The Papilio form factor is very hardware-hacker friendly; all the IO pins are
broken out on 0.1" headers so you can easily pop a bit of veroboard on top
and solder up a MAX3232 or SD card or LEDs or whatever.
I started my Z80 system with the open-source T80 CPU core, a UART that I'd
written for an earlier project, and some of the on-chip block SRAM for
memory. I then wrote a simple monitor program for it.
Xilinx have a "data2mem" tool that you can use to quickly replace the data
loaded into a block RAM without resynthesising the FPGA design (which is
rather slow), so you can assemble your monitor program, use data2mem to have
the code loaded into block RAM, then reprogram the FPGA which will run the
code when it comes out of reset. This affords a very quick edit/compile/test
cycle (about 3 seconds from hitting enter to running code).
Once I had a monitor program running I imported Mike Field's brilliant SDRAM
controller to drive the 8MB SDRAM chip on the board:
http://hamsterworks.co.nz/mediawiki/index.php/Simple_SDRAM_Controller
This gave me access to far more memory than the Z80 could address, so I added
a 4K paged MMU to translate the 16-bit (64K) logical address space into a
26-bit (64MB) physical address space. Each 4KB logical page can be mapped
independently to any 4KB physical page. There's also what I call the "17th
page" which allows you to access physical memory addresses without mapping
them into the CPU address space -- it has a 26-bit pointer in the MMU and an
I/O port that translates I/O cycles into memory cycles, automatically
incrementing the pointer after each cycle so you can use the INIR instruction
with it to do block copies of unmapped physical memory to/from mapped memory.
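To make the paging and the "17th page" concrete, here is a minimal C model of that kind of MMU. It is a hypothetical sketch, not the VHDL from this design. The names mmu_t, mmu_translate, mmu_port17_in and inir_from_port17 are invented for illustration, and the `phys` array stands in for the SDRAM. The INIR loop follows the real Z80 rule that B = 0 transfers 256 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of a 4KB-paged MMU (not the original VHDL):
 * 16 logical pages of 4KB cover the Z80's 64KB, and each can map to any
 * 4KB frame in a 26-bit (64MB) physical address space. */
#define PAGE_SHIFT 12
#define PAGE_MASK  0x0FFFu
#define NUM_PAGES  16u
#define PHYS_MASK  0x3FFFFFFu            /* 26-bit physical address */

typedef struct {
    uint16_t frame[NUM_PAGES];           /* 14-bit physical frame per logical page */
    uint32_t ptr17;                      /* "17th page" pointer (26 bits) */
} mmu_t;

/* Translate a 16-bit CPU address to a 26-bit physical address. */
uint32_t mmu_translate(const mmu_t *m, uint16_t logical)
{
    uint32_t frame = m->frame[logical >> PAGE_SHIFT];
    return ((frame << PAGE_SHIFT) | (logical & PAGE_MASK)) & PHYS_MASK;
}

/* One IN from the "17th page" port: read the byte at the pointer, then
 * post-increment the pointer (wrapping at 26 bits). `phys` stands in for
 * the SDRAM and must cover every address touched. */
uint8_t mmu_port17_in(mmu_t *m, const uint8_t *phys)
{
    uint8_t value = phys[m->ptr17];
    m->ptr17 = (m->ptr17 + 1u) & PHYS_MASK;
    return value;
}

/* Model of INIR against that port: B bytes go to logical memory at HL,
 * with B == 0 meaning 256 bytes as on a real Z80. */
void inir_from_port17(mmu_t *m, uint8_t *phys, uint16_t hl, uint8_t b)
{
    do {
        phys[mmu_translate(m, hl)] = mmu_port17_in(m, phys);
        hl++;
        b--;
    } while (b != 0);
}
```

Because the pointer lives in the MMU rather than in a page register, a block copy like this leaves the CPU's current mapping untouched.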
The SDRAM takes on the order of 10 cycles to supply data after a read request
so I implemented a 16KB direct-mapped cache using the on-chip block SRAM in
order to conceal this latency. This works very well. The FPGA block SRAM is
36 bits wide, which allows for a 4-byte wide cache line plus 4 bits to
indicate the validity of each byte. You can also use it in a 9-bit wide mode,
which turned out to be perfect for storing the cache address tags.
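A rough C model of a cache with that shape (16KB, 4-byte lines, one valid bit per byte) is sketched below. Two things here are assumptions, because the post doesn't cover them. First, writes are treated as write-through, so a line can be reclaimed without writing anything back. Second, the tag is taken to be the physical address bits above the 14-bit index/offset. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES  4u
#define CACHE_BYTES (16u * 1024u)
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)    /* 4096 lines */

typedef struct {
    uint8_t  data[LINE_BYTES]; /* the 32 data bits of a 36-bit block RAM word */
    uint8_t  valid;            /* the other 4 bits: one valid flag per byte */
    uint16_t tag;              /* address bits above the 14-bit index/offset */
} cache_line_t;

typedef struct { cache_line_t line[NUM_LINES]; } cache_t;

static uint32_t line_index(uint32_t addr) { return (addr / LINE_BYTES) % NUM_LINES; }
static uint32_t line_tag(uint32_t addr)   { return addr / CACHE_BYTES; }

/* Hit only if the tag matches AND this particular byte is valid. */
bool cache_read(const cache_t *c, uint32_t addr, uint8_t *out)
{
    const cache_line_t *l = &c->line[line_index(addr)];
    unsigned byte = addr % LINE_BYTES;
    if (l->tag == line_tag(addr) && (l->valid & (1u << byte))) {
        *out = l->data[byte];
        return true;
    }
    return false;
}

/* Store one byte (a CPU write or a fill from SDRAM). If the line holds a
 * different tag it is reclaimed and all its valid bits are cleared first
 * (safe only because this model assumes write-through). */
void cache_write(cache_t *c, uint32_t addr, uint8_t value)
{
    cache_line_t *l = &c->line[line_index(addr)];
    unsigned byte = addr % LINE_BYTES;
    if (l->tag != line_tag(addr)) {
        l->tag = (uint16_t)line_tag(addr);
        l->valid = 0;
    }
    l->data[byte] = value;
    l->valid |= (uint8_t)(1u << byte);
}
```

Per-byte valid bits let a single byte write allocate a line without first fetching the other three bytes from the slow SDRAM.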
Debugging the cache was a pain. I ended up writing several programs to
exercise and test the memory in various ways; when I found a fault it often
took some head-scratching to determine if it was a bug in the hardware or the
software! This is doubly hard when the software is itself executing from
unreliable memory, so I added a 4K block of SRAM to the system, used the MMU
to map it wherever I wanted, and stored the memory test program in there.
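As a hypothetical example of the kind of test that catches cache and address-decoding faults, the sketch below writes a pattern derived from each cell's offset and then reads it back. It is not one of the original test programs. It uses three passes, each writing a different byte of the offset. Two aliased addresses must differ in at least one of those bytes, so at least one pass will read back the wrong value.

```c
#include <assert.h>
#include <stdint.h>

/* Pass n writes byte n of each cell's offset. If two offsets alias to the
 * same cell, they differ in at least one of these bytes, and in that pass
 * the later write clobbers the earlier one. */
static uint8_t pattern(uint32_t offset, unsigned pass)
{
    return (uint8_t)(offset >> (8u * pass));
}

/* Fill-then-verify test over `len` bytes. Returns the offset of the first
 * mismatch, or -1 if every pass succeeds. The volatile qualifier stops the
 * compiler from optimising the read-back away. */
long mem_test(volatile uint8_t *base, uint32_t len)
{
    assert(len <= (1u << 24)); /* three passes cover 24 offset bits */
    for (unsigned pass = 0; pass < 3; pass++) {
        for (uint32_t i = 0; i < len; i++)
            base[i] = pattern(i, pass);
        for (uint32_t i = 0; i < len; i++)
            if (base[i] != pattern(i, pass))
                return (long)i;
    }
    return -1;
}
```

Running the tester from a separate block of SRAM, as described above, means a fault in the memory under test can't corrupt the test code itself.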
The Xilinx synthesis tools tell me my design is good for about 64MHz. I've
always run it at 128MHz without problems -- I think the critical timing paths
are probably on the address bus and so have an extra cycle to propagate.
I've not figured out how to tell the Xilinx tools about this, or how to
interpret their output to understand which are the critical paths. The Z80 is
rather fast at 128MHz and the cache all but eliminates the need for
wait states.
Once I had the hardware working I had a lot of fun writing software for it,
extending the hardware capabilities as the software grew more sophisticated.
I wrote a CP/M 2.2 BIOS and got CP/M running. There's so much RAM in the
system that I just used the top 6MB as three 2MB RAM disks, which hugely
simplified writing storage drivers (you just map the relevant page of RAM
disk into the address space, copy the data, and then map back the original
page -- this was before I implemented the "17th page" trick). I wrote SPI
master hardware and some routines in the monitor ROM to copy the RAM disk
contents to and from the on-board 8MB SPI flash for persistent storage.
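As a hypothetical illustration of that RAM disk layout, the C below works out where a CP/M sector lives in the top 6MB of the 8MB SDRAM. The geometry of 128 sectors per track is an assumption (128 tracks x 128 sectors x 128 bytes = 2MB). The function and type names are also invented. The BIOS would then map `loc.frame` into a spare logical page, copy 128 bytes starting at `loc.offset` within it, and map the original page back. A 128-byte sector never straddles a 4KB page, so one mapping is always enough.

```c
#include <assert.h>
#include <stdint.h>

#define SDRAM_BYTES     (8u * 1024u * 1024u)
#define RAMDISK_BYTES   (2u * 1024u * 1024u)
#define RAMDISK_COUNT   3u
#define RAMDISK_BASE    (SDRAM_BYTES - RAMDISK_COUNT * RAMDISK_BYTES) /* 2MB */
#define SECTOR_BYTES    128u   /* CP/M logical sector */
#define SECTORS_PER_TRK 128u   /* hypothetical geometry: 128 x 128 x 128 = 2MB */
#define PAGE_SHIFT      12     /* 4KB MMU pages */

typedef struct {
    uint32_t phys;   /* physical byte address of the sector */
    uint16_t frame;  /* 4KB physical frame to map into a spare logical page */
    uint16_t offset; /* sector's offset within that page */
} sector_loc_t;

/* Locate a CP/M sector on RAM disk `disk` (0-2).
 * Returns 0 on success, -1 if the disk, track or sector is out of range. */
int ramdisk_locate(unsigned disk, unsigned track, unsigned sector, sector_loc_t *loc)
{
    uint32_t lba;
    if (disk >= RAMDISK_COUNT || sector >= SECTORS_PER_TRK)
        return -1;
    lba = (uint32_t)track * SECTORS_PER_TRK + sector;
    if (lba >= RAMDISK_BYTES / SECTOR_BYTES)
        return -1;
    loc->phys   = RAMDISK_BASE + disk * RAMDISK_BYTES + lba * SECTOR_BYTES;
    loc->frame  = (uint16_t)(loc->phys >> PAGE_SHIFT);
    loc->offset = (uint16_t)(loc->phys & ((1u << PAGE_SHIFT) - 1u));
    return 0;
}
```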
Once I had CP/M working I wrote an MP/M-II XIOS and got MP/M-II running. I
added a second UART and a simple interval timer and got interrupt-driven
serial and pre-emptive multitasking working. I was really very impressed with
MP/M-II; I had not realised that these Z80 systems could multitask and
support multiple concurrent users (and all before I was even born!)
Once I had that working I got a bit ambitious and decided to port UZI, Doug
Braun's 8-bit UNIX-like operating system. There's little or no documentation,
so this was harder than writing the BIOS/XIOS, where there is a clear
specification of what you need to do. I started with the P112 UZI-180 port,
which uses the Hi-Tech CP/M C compiler. I ported the kernel to ANSI C and
made it build with the modern SDCC compiler, added drivers for my MMU, UART,
RAM disk, an SD card interface, and removed the Z180 instructions. I modified
the context switching mechanism to make it much more efficient by eliminating
all the memory copying. I also increased the amount of memory available to
processes -- a native UZI process can use up to 0xF900 (62.25KB) and a CP/M
process running under emulation has a 60KB TPA (larger than under real CP/M!)
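The post doesn't describe the new context-switch mechanism in detail. Still, when every process owns its own physical pages, the usual way to avoid copying is to reprogram the MMU page registers. Below is a hypothetical sketch of that idea. The split of pages 0-14 per process with page 15 common is an assumption made for illustration, as are `mmu_regs`, `proc_t` and `switch_address_space`. On real hardware, each register write would be an OUT to an MMU I/O port.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PAGES  16u
#define USER_PAGES 15u   /* hypothetical split: pages 0-14 per process, page 15 common */

/* Hypothetical stand-in for the MMU's page registers; on real hardware
 * each assignment would be an OUT to an MMU I/O port. */
static uint16_t mmu_regs[NUM_PAGES];

typedef struct {
    uint16_t frames[USER_PAGES]; /* physical 4KB frames owned by this process */
    /* saved CPU registers, stack pointer, etc. would also live here */
} proc_t;

/* Switch to another process's address space by rewriting page registers
 * rather than copying its memory into a fixed region. The code doing this
 * must run from the common page, which stays mapped throughout. */
void switch_address_space(const proc_t *next)
{
    for (unsigned i = 0; i < USER_PAGES; i++)
        mmu_regs[i] = next->frames[i];
}
```

The cost of a switch is then a fixed handful of register writes, regardless of how much memory the process is using.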
I've not yet come up with a good solution for building UZI userspace
applications. I'd really like to write a Z-machine interpreter that runs
under UZI and can play the newer Z-machine formats (e.g. versions 5 and 8).
These require much more memory than fits in the 64K logical address space and
I can't come up with a clean way to expose the MMU's abilities through
standard UNIX system calls (neither mmap nor shared memory segments are a
good match).
Anyway, if anyone else has a Papilio Pro and is interested, please let me
know and I'll send you an FPGA bitstream so you can have a play yourself. If
there's interest I'd consider tidying up the source code enough to share it.
I've recently ordered one of John's Mark IV SBC boards and I'm hoping to
assemble that and then port my UZI kernel over -- that software was the most
work and I'd love to see it used. The Z180 MMU is less capable than my
synthetic hardware but I believe I've come up with a workable scheme.
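For comparison, here is a model of how the Z180's on-chip MMU translates addresses. CBAR splits the 64K logical space into three regions: common area 0, a bank area and common area 1. The bank area is offset by BBR and common area 1 by CBR, each in 4KB units, giving a 20-bit (1MB) physical address. This gives two movable windows, where the design above has 16 independent page registers. The model below is only a sketch of that translation. It is not the scheme mentioned above, and `z180_mmu_t` and `z180_translate` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the three Z180 MMU registers. */
typedef struct {
    uint8_t cbar; /* high nibble CA: first page of common area 1;
                     low nibble BA: first page of the bank area */
    uint8_t bbr;  /* bank base register (4KB units) */
    uint8_t cbr;  /* common base register (4KB units) */
} z180_mmu_t;

/* Translate a 16-bit logical address to a 20-bit physical address. */
uint32_t z180_translate(const z180_mmu_t *m, uint16_t logical)
{
    unsigned page = logical >> 12;
    unsigned ca = m->cbar >> 4;
    unsigned ba = m->cbar & 0x0Fu;
    uint32_t base = 0;                   /* common area 0: no offset */
    if (page >= ca)
        base = (uint32_t)m->cbr << 12;   /* common area 1 */
    else if (page >= ba)
        base = (uint32_t)m->bbr << 12;   /* bank area */
    return (logical + base) & 0xFFFFFu;  /* wraps within 1MB */
}
```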
Has anyone considered building an ECB board to carry an FPGA? You'd need some
tri-state buffers that handle level shifting down to 3.3V on the FPGA side,
but the right design would allow you to implement either peripherals or a CPU
on the FPGA (or both). FPGAs are generally in TQFP or BGA packages but a few
rows of 0.1" header pins would allow you to plug on a Papilio or Pipistrello
daughterboard. I do realise this is somewhat contrary to the N8VEM philosophy
of using original parts, but it might be fun to have one board that can be
reconfigured quickly to work as a Z80, 68K, MIPS, ARM or even a bespoke CPU
board that talks to your N8VEM peripherals.
Hope someone found something interesting buried in there.
Will
_________________________________________________________________________
William R Sowerbutts will-***@public.gmane.org
"Carpe post meridiem" http://sowerbutts.com
main(){char*s=">#=0> ^#X@#@^7=",c=0,m;for(;c<15;c++)for
(m=-1;m<7;putchar(m++/6&c%3/2?10:s[c]-31&1<<m?42:32));}