Unsigned arithmetic in strongly typed languages

How does subtraction of unsigned integers work in languages that don’t allow invalid values?

Would it not be possible to support subtraction?


From what direction are you asking this question?

There are certainly languages that do not have unsigned integers at all; OCaml is one such. (I believe OCaml does have extension libraries with unsigned types.)

Assuming that your concern is underflow, the answer is frequently that subtractive underflow is handled in the same manner as integer overflow; exactly how varies considerably from language to language. Even strongly typed languages will sometimes leave numeric overflow loosely defined; for example, they may guarantee that the result is of the same type as the operands and that using the result (for example) as an array index will never violate memory safety, but be perfectly happy to take 0 - 1 and yield 2^32 - 1.

I would say that the likely cases are (with examples where I know of them):

  • Exactly as described above; unsigned minus unsigned will be unsigned, but might have an algebraically nonsensical result (C++, Rust)
  • Underflow is permitted and may yield a nonsensical result, but the condition is reliably detectable with explicit checks by the programmer (underflow is algebraically detectable)
  • Underflow yields a well-defined result (Java)
  • Underflow is detected when it occurs and triggers an error mechanism such as a trap or exception (Ada)
  • Unsigned integers are unsupported (OCaml)
  • (For languages with strong dynamic typing) Underflow causes integer promotion to a signed type (Lisps promote along the numeric tower, but I’m not aware of one with an unsigned type)

There are quite possibly other options that are not occurring to me right now.


Sure, I can make that guarantee by compromising the performance of all array accesses by applying a guard to each one. It would be nicer to guarantee the range the index can ever have.

I guess I presumed strongly typed languages promise to always have valid values for a particular type. If the ‘get-out’ is always “well, it’s invalid, so who knows”, it makes strong typing a bit meh.

I’ve always thought there has to be some mileage in compilers doing ‘type computation’ for expressions and generating meta-types with limited domains. E.g., if I have an integer and do:

a & 64

The compiler could create a meta-type for this expression whose only values are 0 and 64, and leverage that when emitting code.

That is very common in modern compilers, but it’s typically done at the optimization stage, not in the type system. There are certainly many languages that would allow you to express the type “exactly 0 or exactly 64”, but in those languages it is typically a distinct type from any kind of integer that would be, for example, permitted to index an array.


Sure, but synthesising a type as the RTL is generated, rather than waiting until you do a bunch of transformations on the tree, would perhaps be better. I just like the idea that a conditional block of code whose condition is “if (i >= 0 && i < 8)” should make the type of i an integer with a specific range. None of the compilers I’ve ever used do that.

Fortran is strongly typed but has no unsigned integers in its standard.

But the feature is regularly proposed, like in this proposal to the J3 committee (2024-February-28):

https://j3-fortran.org/doc/year/24/24-116.txt

And it has been included in GFortran 15 as an experimental feature:

And there are long discussions about how it should function if included in the standard:


Fortran IV was good. After that it became a mess, like every other standard, once the PDP-11 came out, because now machines had char, short, and long data types, signed and unsigned. Then the x86 came along with near and far pointers. Nothing seems backward compatible after that.

You had REAL and INTEGER; what more do you need? Character data was unsigned. A standard needs to be unchanging and respect all earlier versions and punched-card encodings, is my view on the matter. People today seem to forget FORTRAN was designed for a 36-bit signed-magnitude machine. Unsigned would make it unable to run on an IBM 7090.
The last time I studied programming was in 1980 on an IBM 1130.
ONE WORD INTEGERS was a non-standard option.

One way to make a type that is either 0 or 64 is to use an enum:

You can manually assign values to the constants.

Pascal? Strongly typed, but not inherently unsigned unless you use something like a subrange type:

So…

var foo : 0 .. 100 ;

then

foo := 1 ;
foo := foo - 1 ; (* OK *)
foo := foo - 1 ; (* Run-time error? *)

The 2nd subtraction ought to throw a run-time error. (Although it’s been decades since I last used Pascal).

But to do that it needs to run on a system that can handle subtraction of whatever the native type is, then do a compare…

-Gordon

Try checking the other thread on Edison; it seems to be a bit more modern than Pascal.

A piece missing here is: what does “unsigned arithmetic” mean?
In C it has a specific meaning: modulo-2^n arithmetic with integers in the range 0..2^n - 1. If so, emulating that in a language like Pascal or Algol 60, where there are only (signed) integers, is a bit tricky but certainly doable. Presumably the emulation would use signed integers whose representation matches the unsigned value in question.
If you have a language where overflow is defined to wrap modulo 2^n, the job is nearly trivial (many operations are identical for signed and unsigned arithmetic). If the language says that overflow either is an exception or undefined, it gets harder.

C is a particularly ugly language in this respect, because it defines unsigned to wrap silently while signed overflow is undefined. Many program bugs result from this design error…

(You may know this, but for those who don’t…) This is specifically because C does not take a position on sign representation, and choosing a specific arithmetic signed over/underflow behavior would privilege whatever representation’s behavior was chosen.

Good point, but treating unsigned the way C does depends on a computer that does arithmetic that way, which either means a 2’s complement machine, or a machine that has separate unsigned arithmetic instructions.
For example, you can’t (easily) do C unsigned arithmetic on a CDC 6600, or an EL-X8, because one’s complement machines can’t easily treat UINT_MAX and 0 differently.
I suppose you could hack it by having unsigned ints limited to the same range as positive signed ints, i.e., don’t use the top bit in the word.

I have very little experience with one’s complement machines, but those instruction sets I am familiar with have separate signed and unsigned arithmetic operations. I agree that if a machine did not have this, C’s ordained behavior would be problematic. I’m guessing that by the time C was standardized, one’s complement machines without unsigned arithmetic were simply unusual enough that no one barked? Or perhaps there was too much extant C code that depended on overflow behaviors for pointers. The fact that the PDP-11, Interdata 7/32 and 8/32, and VAX were all two’s complement may have sealed this fate.

That’s interesting. The two machines I mentioned both have only signed arithmetic instructions (60 and 27 bits, respectively). There are some others I don’t know well (PDP-1, ARRA II, ARMAC); I don’t think those have two sets of integer instructions either.
C is later than a bunch of other high level languages (such as ALGOL), and it probably arrived around the time that one’s complement machines were no longer seen much. Perhaps enough to explain why C wanted to be agnostic about signed representation, but uncommon enough that the C rule of unsigned arithmetic (or even the notion that there is such a thing) was tenable.
Another point may be that the PDP-11 explicitly accommodates signed and unsigned operations, not so much for the arithmetic (that, where it matters like div and mul, is signed) but in the conditional branches. And the address space of the PDP-11 is small enough that learning to use unsigned branches when operating on pointers is a lesson programmers learned early and often.

There are a bunch of other representational issues beyond the one’s or two’s complement game that made signed maths hairy on the older machines. Several systems, for example, had a 16-bit word with a sign bit at the top. However, the signed double-word maths instructions were effectively 31-bit and ignored the “sign” bit of the lower word. Some of the mainframes were even stranger and had different instructions for loading signed and unsigned values, let alone manipulating them.
Then of course there is hardware that faults on overflow, which was often (IMHO rightfully) seen as the sane, right, and proper thing to do to avoid programmer errors producing rubbish.

A lot of older C is built around the reality of hardware at the time. Things like the rules on valid pointers emerge from segmented or object-based architectures where merely creating a pointer to an invalid object is an offence in the eyes of the CPU.

There’s also a lot of stuff in C that people misunderstand entirely (like pointers), because they assume that a pointer is a memory address whereas it’s actually an (object, offset) pair (or, for C functions, just a handle - which really matters if you have overlays).


I never thought about overlays with C. Unix with the PDP-11 MMU let one use the full memory of the 11 for user programs, an important feature for the time. That may have made the 11 stand out compared to the other 16-bit computers that got MMUs late in product development.
The fact that UNIX had a modern file system, rather than relying on cards and magnetic tape, let C become a more versatile language compared to Pascal, which often compiled just to p-code.