ffi: allow bitfields based on endian types.

This is very nice: you can now declaratively match a structure that
uses bitfields and comes from a system with a different endianness.

* ffi.c (ffi_kind_t): Replace FFI_KIND_NUM with FFI_KIND_INT,
FFI_KIND_UINT and FFI_KIND_FLO. Now, using the tft->kind value, we
can distinguish integer and floating types, and determine signedness.

(struct txr_ffi_type): New flag, bigendian, which is set to 1 for all
types that are big endian. This is not just the endian types like
be-int32, but also natural types like int, if the underlying platform
is big endian.

(swap_get32, swap_put32, swap_get64, swap_put64): New static functions.

(ffi_generic_swap_fat_ubit_put, ffi_generic_swap_fat_sbit_put,
ffi_generic_swap_fat_ubit_get, ffi_generic_swap_fat_sbit_get,
ffi_generic_swap_ubit_put, ffi_generic_swap_sbit_put,
ffi_generic_swap_ubit_get, ffi_generic_swap_sbit_get): New static
functions.

(ffi_make_type_builtin): On big endian, set the bigendian flag on
every type. For the endian types, this will get adjusted as an
additional construction step.

(make_ffi_type_endian): New static function. Calls
ffi_make_type_builtin, and then initializes the bigendian flag with
the given value.

(make_ffi_type_struct, make_ffi_type_union): Because bitfields can
have endianness now, we can no longer refer to the machine's own
endianness in laying them out; we have to look at the mtft->bigendian
value. Furthermore, we need an additional rule: when a bitfield member
follows a member that has a different endianness, a new allocation
unit has to begin.

(ffi_type_compile): The sbit and ubit types must now set the kind to
FFI_KIND_INT and FFI_KIND_UINT. For the bit operator, we can simplify
the error checking: instead of exhaustively comparing the type to all
the supported integer types, we now just check whether it is
FFI_KIND_INT or FFI_KIND_UINT. Here, if we detect that an endian
bitfield has the opposite byte order from the machine, then we
instantiate the bitfield with the ffi_generic_swap_* virtual
functions. These perform the required byte-swapping accesses to the
bitfield storage cells, so that the bitfield manipulation code just
works using the local integer representation (shifting and masking).
Of course, the shift amount depends on the endianness, and that is
calculated at type creation time in make_ffi_type_struct.

(ffi_init_types): Replace FFI_KIND_NUM with the appropriate constant
for each affected type. In some cases, we have to calculate whether
to use the INT or UINT one, for the types whose signedness is not
specified. We switch all the endian types to the new constructor
make_ffi_type_endian, passing the required value of the bigendian
flag.

* txr.1: Documented.
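
The byte-swapping access described for the ffi_generic_swap_* functions can be pictured with the following minimal C sketch. It is not the code from ffi.c: the names bswap32, cell_get_swapped, cell_put_swapped, ubit_get_swapped and ubit_put_swapped are hypothetical, and the sketch assumes a 32-bit storage cell and an unsigned field whose shift and mask have already been computed at type creation time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xFF00u) |
         ((x << 8) & 0xFF0000u) | (x << 24);
}

/* Load a storage cell whose byte order is opposite to the machine's,
   yielding a value in the local integer representation. */
static uint32_t cell_get_swapped(const unsigned char *mem)
{
  uint32_t raw;
  memcpy(&raw, mem, sizeof raw);
  return bswap32(raw);
}

/* Store a local integer into an opposite-byte-order storage cell. */
static void cell_put_swapped(unsigned char *mem, uint32_t val)
{
  uint32_t raw = bswap32(val);
  memcpy(mem, &raw, sizeof raw);
}

/* Unsigned bitfield read: swap the cell in, then mask and shift. */
static uint32_t ubit_get_swapped(const unsigned char *mem,
                                 int shift, uint32_t mask)
{
  return (cell_get_swapped(mem) & mask) >> shift;
}

/* Unsigned bitfield write: swap the cell in, replace only the masked
   bits, then swap the cell back out. */
static void ubit_put_swapped(unsigned char *mem, int shift,
                             uint32_t mask, uint32_t val)
{
  uint32_t cell = cell_get_swapped(mem);
  cell = (cell & ~mask) | ((val << shift) & mask);
  cell_put_swapped(mem, cell);
}
```

Once the cell has been swapped into local order, the field logic is ordinary shifting and masking, which is why only the load and store steps need endian-specific variants in this sketch.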
author: Kaz Kylheku <kaz@kylheku.com> 2022-05-22 09:05:15 -0700
committer: Kaz Kylheku <kaz@kylheku.com> 2022-05-22 09:05:15 -0700
commit: 16eb3d22a29911981371b98e83008f8741903cc8 (patch)
tree: 4f80b90d1026ec44fb06d1359b368589040a69b4 /txr.1
parent: 663b75b357e77bfc47e31830b4e49cf1b026b141 (diff)
download: txr-16eb3d22a29911981371b98e83008f8741903cc8.tar.gz
txr-16eb3d22a29911981371b98e83008f8741903cc8.tar.bz2
txr-16eb3d22a29911981371b98e83008f8741903cc8.zip
1 files changed, 78 insertions, 20 deletions
diff --git a/txr.1 b/txr.1
index 971a96fc..70904936 100644
--- a/txr.1
+++ b/txr.1
@@ -80867,25 +80867,7 @@ operator is more general than
 .code ubit
 and
 .codn sbit .
-It allows for bitfields based on integer units smaller than or equal to 
-.codn uint .
-
-The
-.meta type
-argument may be any of the types
-.codn char ,
-.codn short ,
-.codn int ,
-.codn uchar ,
-.codn ushort ,
-.codn uint ,
-.codn int8 ,
-.codn int16 ,
-.codn int32 ,
-.codn uint8 ,
-.code uint16
-and 
-.codn uint32 .
+It allows for bitfields based on any integer type up to 64 bits wide.
 
 When the character types
 .code char
@@ -80895,7 +80877,7 @@ are used as the basis of bitfields, they convert integer values, not
 characters.
 In the case of
 .codn char ,
-the bitfield is signed. 
+the bitfield is signed.
 
 All remarks about
 .code ubit
@@ -80909,6 +80891,82 @@ Details about the algorithm by which bitfields are allocated within a structure
 are given in the paragraph below entitled
 .BR "Bitfield Allocation Rules" .
 
+Under the
+.code bit
+operator, the endian types such as
+.code be-int32
+or
+.code le-int16
+may also be used as the basis for bitfields.
+If
+.meta type
+is an endian type, the bitfield is then allocated in the same way that a
+bitfield of the corresponding ordinary type would be allocated on a target
+machine which has the byte order of that endian type.
+
+When a bitfield member follows a member which has a different byte order,
+the bitfield is placed into a new allocation cell. This is true even if
+the previous member has the same alignment.
+
+Note: the allocation of bits within a bitfield based on byte storage
+cells also differs between different endian systems. However, the FFI
+type system does not offer one byte endian types such as
+.codn be-uint8 .
+The workaround is to switch to a wider type.
+
+Note: endian bitfields may be used to match the image of a C structure which
+contains bitfields, without having to conditionally define the FFI struct type
+differently based on whether the current machine is big or little endian.
+Conditionally defining a structure for two different byte orders adds
+verbiage to the program and is highly error-prone, since the bitfields
+change order within an allocation unit.
+
+For instance, on a big endian system, the definition of a structure
+representing an IPv4 packet might begin like this:
+
+.verb
+  (struct ipv4-header
+    (ver (bit 4 uint16))
+    (ihl (bit 4 uint16))
+    (dscp (bit 6 uint16))
+    (ecn (bit 2 uint16))
+    (len uint16)
+    ...)
+.brev
+
+To port this to a little endian system, the programmer has to recognize
+that the first pair of fields is packed into one byte, and the next pair
+of fields into a second byte. The bytes stay in the same order, but
+the pairs are reversed:
+
+.verb
+  (struct ipv4-header
+    (ihl (bit 4 uint16)) ;; reversed pair
+    (ver (bit 4 uint16))
+    (ecn (bit 2 uint16)) ;; reversed pair
+    (dscp (bit 6 uint16))
+    (len be-uint16)
+    ...)
+.brev
+
+Endian bitfields allow this to be defined naturally. The IPv4 header
+is based on network byte order, which is big-endian, so big endian types
+are used. The little endian version above already uses
+.code be-uint16
+for the
+.meta len
+field. This just has to be done for the bitfields also:
+
+.verb
+  (struct ipv4-header
+    (ver (bit 4 be-uint16))
+    (ihl (bit 4 be-uint16))
+    (dscp (bit 6 be-uint16))
+    (ecn (bit 2 be-uint16))
+    (len be-uint16)
+    ...)
+.brev
+
 .coNP FFI types @ buf and @ buf-d
 .synb
 .mets ({buf | buf-d} << size )
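
To make the documented IPv4 layout concrete, here is a minimal C sketch of what the four be-uint16 bitfields amount to when the first two header bytes are decoded by hand. It assumes that big-endian bitfield allocation starts at the most significant bit of the 16-bit cell, which agrees with the IPv4 wire format; struct ipv4_first_word and decode_first_word are hypothetical names used only for illustration and are not part of TXR.

```c
#include <stdint.h>

/* Hypothetical container for the first four IPv4 header fields. */
struct ipv4_first_word {
  unsigned ver, ihl, dscp, ecn;
};

/* Decode the first 16 bits of an IPv4 header from raw bytes.
   The cell is assembled in big-endian (network) order regardless of
   the host byte order, then split by shifting and masking. */
static struct ipv4_first_word decode_first_word(const unsigned char *p)
{
  uint16_t cell = (uint16_t) ((p[0] << 8) | p[1]);
  struct ipv4_first_word f;

  f.ver  = (cell >> 12) & 0xF;   /* top 4 bits */
  f.ihl  = (cell >> 8) & 0xF;    /* next 4 bits */
  f.dscp = (cell >> 2) & 0x3F;   /* next 6 bits */
  f.ecn  = cell & 0x3;           /* low 2 bits */
  return f;
}
```

For the bytes 0x45 0xB9, this decodes to version 4, header length 5, DSCP 46 and ECN 1, independently of the host machine's byte order.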