Important
Work In Progress
Resurrection attempt of the UNIX PDP-11 C Compiler from December 1972 (a.k.a. prestruct-c).
- PDP-11/45 (native floating point and multiplication/division support)
- UNIX V3 beta
- Origin at
0
- Virt11 (my own PDP-11/20 simulator; currently private) for debugging and testing the compiler
- UNIX V1 syscall simulation
- Origin at
040000
- Origin at
- Built-in interactive debugger
- Disassembly
- Breakpoints (exec, read, write)
- Register dump
- Instruction trace
- Register manipulation
- Flag manipulation
- Memory dump
- Focus on accuracy and determinism rather than speed
- Runs on Windows and Linux
- UNIX V1 syscall simulation
- Apout for compiling/assembling the compiler
- PDP-11/45 instructions
- Very fast, sufficiently accurate
- Supports virtually every PDP-11 binary
- Origin at
0
or040000
depending on the simulated UNIX version
- Origin at
- Highly flexible
- Written in the June 1972 (last1120c) syntax (i.e.,
char []
forchar
pointer) - Accepts a newer syntax (i.e.,
char *
forchar
pointer)- Therefore, it cannot compile itself
- Not entirely compatible with the June 1972 compiler either
- It uses negative integer constants in switch cases
- Missing all
.s
files- Some tables can be taken from other versions of the compiler
- Does not work properly if assembled with the origin set to
040000
- The binary is too big, and some buffers are located at addresses >
0100000
- Pointers being effectively negative causes the following logic from
liba.a/put.o/fl
to misbehave:mov r0,0f neg r0 add 4(r1),r0 ble 1f mov r0,0f+2
- Can be fixed by replacing the above code with:
mov r0,0f cmp 4(r1),r0 blos 1f neg r0 add 4(r1),r0 mov r0,0f+2
- The binary is too big, and some buffers are located at addresses >
- Should be assembled with origin
0
to simulate its natural habitat, but it doesn't work:as
built from source code leftover on the 's1' tape can assemble source files with the relocation constant defaulted to0
- However, the objects do not want to link, and I'm forced to switch back to
as
from the 's2' tape - The relocation constant can be manually changed to
0
- However, the
liba.a
andlibc.a
libraries hardcode it as040000
- The source code of
libc
is available, andlibc
can be rebuilt to use0
as the relocation constant - But the source code of
liba
is not available- The most important objects are
get.o
andput.o
- I have disassembled them and ensured the re-assembled binaries match the original
- They can be rebuilt with the relocation constant set to
0
- The most important objects are
- Running the executable after linking crashes:
- I believe some pointers still use the relocation constant of
040000
- Perhaps the link editor
ld
plays a role?
- I believe some pointers still use the relocation constant of
- Forced to use
040000
as the origin with patchedget.o
to fix the negative pointer bug - The assembler from the
s2
tape does not support the PDP-11/54MUL
instruction used byc0t.s
- Can be worked around by replacing:
with:mov _cval,r1 mul base,r1 add r0,r1
mov _cval,r1 mov base,(r4)+ mov r1,(r4) mov -(r4),r1 add r0,r1
- The June 1972 compiler does not support negative integer constants in switch cases. Therefore:
needs to be replaced with:
/* sort unmentioned */ case -2: cs[0] = 5; /* auto */
/* sort unmentioned */ case 0177776: /* -2 */ cs[0] = 5; /* auto */
- Since the origin is not
040000
and the code assumes it to be0
,c00.c
,c03.c
, andc10.c
need to be modified accordingly.
- Implement PDP-11/45 floating point and arithmetic instructions
- Implement UNIX V2 (and possibly V3) syscall simulation
- Switch environment and target to PDP-11/45 (currently the compiler generates PDP-11/20 code)
- Give origin
0
another try
A lot of changes have occurred, the biggest one probably being the way function calls are handled. For the following code:
func1()
{
func2();
}
func2() {}
The June 1972 compiler compiles it to something like this (pseudo-assembly):
.text
L0:
CALL [_func2]
RET
L1:
RET
.data
_func1:
.word L0
_func2:
.word L1
This compiler removes the indirection and performs calls/jumps (JSR
and JMP
) directly:
.text
func1:
CALL _func2
RET
func2:
RET
This effectively renders libc.a
from the 's2' tape incompatible. Luckily, the source code of libc is available on the 'last1120c' tape, and it can be modified to work with code generated by this compiler.
Another change is the pointer declaration syntax. This compiler uses the modern pointer declaration syntax (char *name;
instead of char name[];
).
Almost nothing works (and by "nothing", I mean it, nothing), but at least it can compile this program with the very short-lived struct declaration syntax:
struct foo (
char x;
int y;
char *z;
);
main(argc, argv)
char **argv;
{
struct foo bruh;
bruh.x = 'C';
bruh.y = 123;
bruh.z = "test";
printf("x = '%c', y = %d, z = \"%s\"\n", bruh.x, bruh.y, bruh.z);
}
If you rename the variable bruh
to something else, like bar
, it will throw an error saying Unimplemented pointer conversion
. I have no idea why. I've also never gotten struct pointers to work - it always complains about Illegal structure ref
when I try to use ->
. It also seems to accept .
on structure pointers (and does not actually dereference the pointer when accessing the members), so something is probably very wrong with referencing and dereferencing.
bin
- binarieslib
- C runtime and library modified to work with code generated by this compilerwith_float
- compiler binaries with floating point support enabledwithout_float
- compiler binaries with floating point support disabled (recommended)
samples
- sample.c
source files and the.s
files produced by the compilersrc_build
- source files for compiling using the toolchain from the 's2' tapesrc
- source files for compiling using a hypothetical UNIX V3 beta toolchain; no changes to original C codetools
- compiler and libraries used for compiling the compiler
c0
and c1
currently need to be built with different compilers and libraries. They're in the tools
directory. It is suggested that you build this compiler without floating point support because floating point support increases the chance of the compiler outputting invalid assembly code. Make sure you use the toolchain from the 's2' tape and source files from src_build
.
# c0 - COPY FILES FROM "tools/for_building_c0" TO "/usr/lib" cc -c c0[0123].c ______________________________________________________________________ # IF FLOAT SUPPORT EDIT c0t.s TO SET "fpp" TO "1" # ELSE EDIT c0t.s TO SET "fpp" TO "0" ______________________________________________________________________ as c0t.s; mv a.out c0t.o cc c0?.o; mv a.out c0 # c1 - COPY FILES FROM "tools/for_building_c1" TO "/usr/lib" cc -c c1[01].c as c1t.s; mv a.out c1t.o cvopt < cctab.s > cctab.i; as cctab.i; mv a.out cctab.o cvopt < efftab.s > efftab.i; as efftab.i; mv a.out efftab.o cvopt < fptab.s > fptab.i; as fptab.i; mv a.out fptab.o cvopt < regtab.s > regtab.i; as regtab.i; mv a.out regtab.o cvopt < sptab.s > sptab.i; as sptab.i; mv a.out sptab.o ______________________________________________________________________ # IF FLOAT SUPPORT cc c1?.o fptab.o efftab.o cctab.o sptab.o; mv a.out c1 # ELSE cc c1?.o regtab.o efftab.o cctab.o sptab.o; mv a.out c1 ______________________________________________________________________ # cleanup rm *.o *.i