On Cartridge widths:
--------------------

The ROM width is set in MEMCON1.  There is a jumper Tom reads on startup, which is hard-wired to 8 bits.  That's how the 68K reads the 8-bit boot ROM.  The 68K then reads address $800400 to determine the cart width.  There's a tricky part there:  the boot ROM first has to initialize DRAM and then copy itself there.  Only once it is running from DRAM can it safely set MEMCON1 to the new ROM width, because the boot ROM becomes inaccessible in 32-bit mode.  The Memory Track uses $800400 to say 'I'm 8 bits wide', then once the Memory Track BIOS is copied into DRAM (at $2800 -- in reserved DRAM), it can set MEMCON1 to 32-bit mode.  At that point, it can read every 4th byte to access the Flash, which is mapped in the high word (only accessible in 32-bit mode).

Looks like you already knew how the Memory Track worked before I worked out the wiring. ;) Hmm.. so *all* carts start in 8-bit mode? I don't quite understand yet.

Well, yes -- but only for the very first read.  The very first cart read is a single byte at $800400 to determine ROM width and ROM speed.  Because of the way the address bus is wired, that read always reads from $800400, no matter what the actual ROM width is.  The boot code does a ton more stuff, then, once safely in DRAM, it sets MEMCON1 to the ROM size and speed.  At that point the cart is accessed as an 8, 16, or 32-bit cart.
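
A rough sketch of decoding that first byte.  It assumes the byte uses the same field positions as the low byte of MEMCON1 (ROM width in bits 1-2, ROM speed in bits 3-4).  That layout and the cycle counts are assumptions taken from the commonly circulated register description, not from the notes above, and `decode_rom_config` is a made-up helper name.

```python
# ASSUMED layout (not stated in the text above): bits 1-2 = ROM width,
# bits 3-4 = ROM speed, mirroring the low byte of MEMCON1.
ROM_WIDTH_BITS = {0: 8, 1: 16, 2: 32, 3: 64}
ROM_SPEED_CYCLES = {0: 10, 1: 8, 2: 6, 3: 5}   # assumed clock cycles per access

def decode_rom_config(header_byte):
    """Return (width in bits, cycles per access) for the byte read at $800400."""
    width = ROM_WIDTH_BITS[(header_byte >> 1) & 0x3]
    speed = ROM_SPEED_CYCLES[(header_byte >> 3) & 0x3]
    return width, speed

print(decode_rom_config(0x04))   # width field = 2, speed field = 0
print(decode_rom_config(0x00))   # width field = 0, speed field = 0
```

Under that assumed layout, a cart like the Memory Track would present a width field of 0 here and switch MEMCON1 to 32-bit itself later.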


On hacking the boot for more speed:
-----------------------------------

Okay.  There are two tricks.  The first is that the Jaguar 
checks location $400 in ROM to see if it should skip the intro.  So, if 
you use TYPE AB (the universal signature that homebrew ROMs use), then 
you can flip bit 0 at $400 and it will still boot -- only with no 
startup logo. Only problem is, it takes 5 seconds to decrypt TYPE AB. 
So, for five seconds you see a black screen.  So I wanted to fix that, 
which led to the second part.

The second trick involves hacking the encryption.  On 
startup, the 68K loads the GPU with an RSA decryption routine.  While 
the GPU works, the 68K, OP, DSP, and Blitter are all working together on 
the startup sequence.  Once the GPU writes '0x03d0dead' to GPU SRAM, 
(to the first address??) the 68K jumps to the cartridge startup 
vector. In other words, the GPU encryption check must be satisfied 
before a single line of 68K code runs.

So, I disassembled the GPU cart check to see how it works 
and see if it can be bypassed somehow.  The GPU reads these 65-byte 
chunks from ROM starting at $0.  Usually, there are 10 of these chunks 
(650 bytes).  Each chunk takes about a half second to decrypt.  Each 
65-byte chunk decrypts into 64 bytes.  So, 650 bytes get read from ROM 
and 640 bytes are stored in GPU SRAM.  Those 640 bytes contain GPU code, 
which is executed.

The decrypted GPU code contains an MD5 check that runs 
across the whole ROM.  That only takes a half second or so.  So, the 
only way to speed things up is to tell the GPU to decrypt fewer 65-byte 
chunks.  In fact, byte $0 contains the number of chunks to decrypt (as a 
negative number - the default $F6 means -10).  If you change that to 
'-1' ($FF), the Jaguar boots in half a second -- but then, when it goes 
to run the 64 bytes of MD5 checking code, the GPU crashes.  This is 
because the other 576 bytes are missing.  But, it proves there's a way 
to boot fast.
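
The chunk arithmetic above can be sketched in a few lines.  This is only bookkeeping based on the figures quoted in these notes (65 bytes in, 64 bytes out, roughly half a second per chunk); the function names are made up for illustration.

```python
CHUNK_IN = 65             # encrypted bytes read from ROM per chunk
CHUNK_OUT = 64            # decrypted bytes written to GPU SRAM per chunk
SECONDS_PER_CHUNK = 0.5   # rough figure quoted in the notes

def chunk_count(count_byte):
    """Interpret byte $0 as a negative two's-complement chunk count."""
    signed = count_byte - 256 if count_byte >= 0x80 else count_byte
    return -signed

def boot_cost(count_byte):
    """Return (chunks, bytes read, bytes stored, approx. seconds)."""
    n = chunk_count(count_byte)
    return n, n * CHUNK_IN, n * CHUNK_OUT, n * SECONDS_PER_CHUNK

print(boot_cost(0xF6))   # default count byte: (10, 650, 640, 5.0)
print(boot_cost(0xFF))   # count byte of -1:   (1, 65, 64, 0.5)
```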

I got to this point and then realized that the whole thing 
is absurdly weak.  If I can make the GPU crash, I can probably make it 
crash in a way that benefits me.  Obviously, the 65 bytes are weakly 
protected -- at _most_ there can only be an 8-bit checksum there (since 
the other 64 bytes are required to contain the encrypted data).  In 
fact, it turns out there are only 5 bits of checksum in those 65 bytes. 
One in 32 of those chunks will succeed.

To get that statistic I wrote a little 68K harness around 
the GPU code, and just added 1 to the encrypted block and ran it 
through. Because they use a sound encryption algorithm, changing 1 bit 
in the input changes roughly half the bits in the output -- it's very 
random.  So, that means it is very easy to make the GPU execute random 
garbage instead of the MD5 check -- 1 in 32 bit patterns will do it. 
All I needed was a garbage pattern that also did something useful...
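
To illustrate how that statistic falls out, here is a toy model in Python rather than the real 68K/GPU harness.  Everything specific in it is an assumption: the modulus and exponent are made up (they are NOT the Jaguar key), and the "5-bit check" is simply pretended to be "low 5 bits of the output are zero".  It only shows the shape of the experiment: add 1 to the input, decrypt, count how often the check passes.

```python
# Toy stand-in for the boot ROM's RSA step. N and E are made-up values.
N = (1 << 512) - 569
E = 65537

def toy_decrypt(block):
    return pow(block, E, N)

def passes_check(plain):
    # Hypothetical 5-bit check: the low 5 bits must all be zero.
    return (plain & 0x1F) == 0

def bit_difference(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

start = int("0123456789ABCDEF" * 8, 16)   # arbitrary 512-bit starting block
trials = 32000
hits = sum(passes_check(toy_decrypt(start + i)) for i in range(trials))

print(hits / trials)   # should land in the neighbourhood of 1/32
print(bit_difference(toy_decrypt(start), toy_decrypt(start + 1)))
```

With 5 check bits the pass rate hovers around 1/32, and a one-step change in the input flips on the order of half of the 512 output bits, which is the behaviour described above.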

It turns out that one of the registers has a pointer to the 
next encrypted chunk in ROM.  All you need is a branch or call opcode to 
that register, and the GPU will transfer control to your ROM routine. 
It turns out you only need to control 7 bits to do this -- the other 
bits are don't cares -- because they encode condition information -- the 
majority of which will work regardless of the particular flags.
NOTE! The above paragraph is incorrect - it was Kevv's first stab at
it. The cartridge address register is overwritten before the encrypted
code starts. Kevv ended up with a MOVEI/JUMP combination, though he was
very flexible on the address jumped to - anywhere in cart space would
work because of the way his board starts up.

So, at this point I realized I only had to try 4096 
combinations -- 2^(7+5) -- and the GPU would resume execution from ROM. 
At that point I can do anything I please, including activating the 68K, 
or copying my own code into the GPU and resuming inside the GPU.  Heck, 
the 68K entry point doesn't even matter.

So I outfitted my 68K harness with the ability to check for 
the needed bit pattern in GPU RAM, and let it run overnight.  In the 
morning I had a 66-byte chunk that contained the right jump instruction 
followed by a whole bunch of random garbage.  And now I can boot my 
Jaguar in half a second.  :)
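
A minimal sketch of that overnight search, again as a toy model and not the actual harness.  The modulus, exponent, bit positions and opcode pattern are all placeholders chosen for illustration; the only thing taken from the notes is the idea of 5 check bits plus 7 opcode bits that must come out right, giving about 2^12 = 4096 expected tries.

```python
# Made-up stand-ins, NOT the real Jaguar key or opcode encoding.
N = (1 << 512) - 569
E = 65537

CHECK_MASK = 0x1F            # hypothetical: 5 check bits that must be zero
OPCODE_MASK = 0x7F << 5      # hypothetical: 7 opcode bits that must match
WANTED_OPCODE = 0x5A << 5    # arbitrary placeholder pattern

def is_useful(plain):
    """True when the check passes and the opcode bits hold the wanted pattern."""
    return (plain & CHECK_MASK) == 0 and (plain & OPCODE_MASK) == WANTED_OPCODE

def search(start, limit=200_000):
    """Keep adding 1 to the encrypted block until the decrypted form is useful."""
    for tries in range(1, limit + 1):
        block = start + tries
        if is_useful(pow(block, E, N)):
            return block, tries
    return None, limit

block, tries = search(int("0123456789ABCDEF" * 8, 16))
print(block is not None, tries)
```

The rest of the decrypted block is whatever it happens to be, which matches the "jump followed by random garbage" result described above.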

It's sort of silly since the encryption is ridiculously 
strong -- I mean, it's 512-bit RSA!  There is no way you could crack it, 
except for the fact that they _execute_ whatever you pass in, and with a 
dense 16-bit opcode format it is not hard to get some useful opcodes. 
;)

If they had chosen instead to encrypt only the MD5 sum, 
instead of the MD5 routine, the Jaguar would boot a lot quicker and it 
would have been uncrackable.  TYPE AB works by NOPing out a critical 
test in the MD5 test, so maybe they intended it to be crackable all 
along.

Okay, from my notes:  f6 at $0 is decimal -10, which means 
decode 10 65-byte blocks.  

Most of the rest of my notes are just observations on cute 
tricks they use, like if anything goes wrong they zero out the GPU and 
try to crash the 68K.  But I was able to disable interrupts in my 
harness to prevent that.

Oh, it's 100% pure computational code.  However, it is 
_very_ tricky with the multiply-accumulates.  Whoever wrote it was some 
kind of genius.

Sorry, I got that backwards.  The 68K copies from 
e00256-e00526 to f032ec (Tom SRAM).
Also, it copies e00222-e00261 to f03000 -- that's the public 
key.  Finally, it starts execution at f032ec, with all registers zeroed.
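
A quick size check on those ranges, reading the addresses as hex and assuming both ends are inclusive (neither is stated outright in the notes).  Under that reading the key region comes out at exactly 64 bytes, which fits a 512-bit key.  The two source ranges as written overlap by a few bytes; the sketch just takes them at face value.

```python
def span(first, last):
    """Byte count of a range, ASSUMING both end addresses are inclusive."""
    return last - first + 1

key_bytes = span(0xE00222, 0xE00261)    # copied to $F03000
code_bytes = span(0xE00256, 0xE00526)   # copied to $F032EC

print(key_bytes, key_bytes * 8)                     # bytes, bits
print(code_bytes, hex(0xF032EC + code_bytes - 1))   # bytes, last destination address
```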


It's not that hard to figure out!  All you need is a 
disassembler and some patience.  I have to admit I never figured out the 
RSA code itself.  It is really genius.  It's tight, it's dense, it's 
_fast_, it even has proper instruction pairing to work around JagRISC 
stalls. It essentially does 512b^512b mod 512b.  Just think about the 
size of 2^512 to the 2^512 power -- it's a wonder it can do those 
operations in .5 seconds each.  Atari probably wasted their best 
programmer on that routine. Imagine doing that kind of math when all you 
have is a 16bx16b multiplier.  ;)
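
For a sense of why this is feasible at all, here is the textbook square-and-multiply method in Python.  This is not Atari's routine (which these notes say was never fully worked out), just the standard technique: the huge power is never formed, and a 512-bit exponent costs at most 2 x 512 modular multiplications.  The operands below are arbitrary made-up values.

```python
def modexp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation.

    Returns (result, number of modular multiplications performed).
    """
    result = 1
    mults = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus     # square for every exponent bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus   # extra multiply for each 1 bit
            mults += 1
    return result, mults

# Arbitrary 512-bit operands, purely for illustration.
base = (1 << 511) + 12345
exponent = (1 << 511) + 6789
modulus = (1 << 512) - 569

value, mults = modexp(base, exponent, modulus)
print(value == pow(base, exponent, modulus), mults)
```

On hardware with only a 16x16 multiplier, each of those 512-bit multiplies would itself have to be built from 16-bit pieces (32 x 32 partial products with the schoolbook method), which is where the clever multiply-accumulate work comes in.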

I love the fact that you broke it, 
though - so you have 65 bytes that go at the front of your ROM which 
basically, when the Jaguar decrypts them, make the GPU jump to the next 
address on the cart?

Yep, exactly.  That actually turned out handy for other 
reasons.  I only have 4MB of Flash ROM.  If I used TYPE AB, my cart 
would start at address $802000 -- but, that location usually contains 
the last-played game.  This way, I can start at address $800100 --  
memory that is never used by any Jaguar cart since that's where the MD5 
hash check rests.  There's enough space down there for me to read the 
boot block out of the USB microcontroller.

Yeah, the low 8KB is reserved by Atari for the boot block. 
But, in fact, only $800400-$80040B and $800000-(however many 65-byte 
chunks you need) are actually used.  So, the rest of that area is 
playground for your bootloader.  The same thing applies to DRAM 
actually.  The low 16KB were reserved for the Alpine and RDB debugger 
stub.  Then, when the CD-ROM came out, part of the low 16KB was used for 
the CD BIOS and the Memory Track BIOS.  So, you can safely put stuff in 
that area of DRAM too.
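
As a sketch of how much of the low 8KB of cart space is left over, the snippet below lists the gaps around the two used regions.  It assumes one count byte at offset 0 followed directly by the 65-byte chunks; that exact layout is an assumption, and `free_ranges` is a made-up helper.

```python
CART_BASE = 0x800000
BOOT_BLOCK_SIZE = 0x2000     # low 8KB reserved for the boot block
HEADER = (0x400, 0x40B)      # $800400-$80040B, as inclusive offsets

def free_ranges(chunks):
    """List unused (first, last) address ranges in the low 8KB of cart space."""
    # ASSUMPTION: one count byte at offset 0, then `chunks` 65-byte blocks.
    used = [(0, 65 * chunks), HEADER]
    free, cursor = [], 0
    for first, last in sorted(used):
        if first > cursor:
            free.append((CART_BASE + cursor, CART_BASE + first - 1))
        cursor = max(cursor, last + 1)
    if cursor < BOOT_BLOCK_SIZE:
        free.append((CART_BASE + cursor, CART_BASE + BOOT_BLOCK_SIZE - 1))
    return free

for first, last in free_ranges(1):
    print(f"${first:06X}-${last:06X}")
```

With a single chunk this prints $800042-$8003FF and $80040C-$801FFF, so a bootloader placed at $800100 sits comfortably inside the first gap.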
