
Python’s struct module converts between Python values and packed binary bytes using format strings that describe a fixed layout similar to C structs. It solves the problem of matching exact byte layouts for network protocols, binary file formats, and interoperability with C code. Binary packing matters whenever wire formats or on-disk records must agree with a specification byte for byte. By the end of this tutorial, you will be able to pack and unpack binary data, control byte order with format prefixes, reuse compiled formats with the Struct class, write into and read from buffers with pack_into and unpack_from, and spot common errors before they corrupt data.
- struct.pack() returns a bytes object that contains the binary representation described by the format string.
- struct.unpack() reads from a bytes-like buffer and returns a tuple of Python values.
- struct.pack_into() writes into an existing writable buffer, and it uses an explicit offset to choose where to write.
- Byte order prefixes (@, =, <, >, !) change how integers and floats are encoded, and mismatches can silently corrupt data.
- struct.Struct compiles the format once, which saves work when you pack or unpack many records with the same format.
- struct.calcsize() tells you how many bytes a given format requires.

The struct module is part of the Python standard library. It converts Python values into packed binary bytes and back, using format strings that describe a fixed memory layout similar to C structs. Use it when a protocol, file format, or C interface requires exact byte widths and positions.
In production code, struct is a good fit when your protocol or file format is described in terms of fixed-width integers, floats, and fixed-length byte fields, and when you need to convert them deterministically. It is not designed for variable-length fields, optional fields, or deeply nested layouts. For those cases, see the construct library or protobuf in the comparison table later in this tutorial.
The authoritative reference is the Python documentation for struct.
A format string tells struct how many bytes to allocate for each field and what Python type to map it to. It combines an optional byte order prefix with one or more type codes and optional repeat counts. The order of codes defines the on-wire layout.
Common format characters grouped by category:
| Category | Code | C Type | Python Type | Standard Size |
|---|---|---|---|---|
| Boolean | ? | _Bool | bool | 1 byte |
| Integer | b | signed char | int | 1 byte |
| Integer | B | unsigned char | int | 1 byte |
| Integer | h | short | int | 2 bytes |
| Integer | H | unsigned short | int | 2 bytes |
| Integer | i | int | int | 4 bytes |
| Integer | I | unsigned int | int | 4 bytes |
| Integer | q | long long | int | 8 bytes |
| Integer | Q | unsigned long long | int | 8 bytes |
| Float | f | float | float | 4 bytes |
| Float | d | double | float | 8 bytes |
| Bytes | s | char[] | bytes | 1 byte per char |
| Padding | x | pad byte | no value | 1 byte |
Standard sizes apply when you use a byte order prefix (<, >, !, =). Without a prefix, sizes are platform-native and may differ.
For the complete list, see Python struct format characters.
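One way to double-check the standard sizes in the table is to ask struct.calcsize directly. The sketch below assumes the < prefix so that standard sizes apply; the codes list and the sizes dictionary are illustrative names, not part of the module.

```python
import struct

# Codes from the table above; the '<' prefix selects standard sizes.
codes = ['?', 'b', 'B', 'h', 'H', 'i', 'I', 'q', 'Q', 'f', 'd', 's', 'x']

# Map each code to the number of bytes it occupies on its own.
sizes = {code: struct.calcsize('<' + code) for code in codes}
for code, size in sizes.items():
    print(code, size)

# A repeat count multiplies the field: three unsigned shorts take 6 bytes.
three_shorts = struct.calcsize('<3H')
print('3H', three_shorts)
```

Running a check like this on your own machine is a quick way to confirm a layout before you write any packing code.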
A few format-string rules that readers use often:
- A number before a code is a repeat count: 3H means three unsigned shorts.
- s formats bytes with an explicit length, for example 8s stores exactly eight bytes.
- x adds a padding byte: packing writes a zero byte without consuming a value, and unpacking skips it without producing one.
- calcsize(fmt) returns the number of bytes the format string will occupy.

The s format requires extra attention. Unlike integer codes, 8s packs exactly eight bytes as a single bytes value, not eight separate values. You must encode a Python str to bytes before packing. struct pads a shorter value with null bytes and silently truncates a longer one, so pad or truncate explicitly when you want that behavior to be visible in your code.
import struct
label = b'hello'
padded = label.ljust(8, b'\x00') # pad to exactly 8 bytes
packed = struct.pack('>8sI', padded, 42)
print('packed_hex', packed.hex())
name_raw, number = struct.unpack('>8sI', packed)
print('name', name_raw.rstrip(b'\x00'))
print('number', number)
packed_hex 68656c6c6f0000000000002a
name b'hello'
number 42
Notice that struct.unpack returns the full eight bytes including the
null padding. Call .rstrip(b'\x00') to recover the original value.
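Because silent truncation is easy to miss, a small wrapper can make the length check explicit. The following is a minimal sketch: pack_label is a hypothetical helper, and ASCII encoding with a default width of 8 bytes are assumptions chosen for the example.

```python
import struct

# struct itself pads short values with null bytes and truncates long ones.
short = struct.pack('8s', 'hello'.encode('ascii'))
truncated = struct.pack('4s', b'hello')
print(short)      # padded to 8 bytes
print(truncated)  # cut to 4 bytes

def pack_label(text, width=8):
    """Hypothetical helper: encode text and refuse values that would be truncated."""
    raw = text.encode('ascii')  # the encoding is an assumption for this sketch
    if len(raw) > width:
        raise ValueError(f'label needs {len(raw)} bytes, field holds {width}')
    return struct.pack(f'{width}s', raw)

print(pack_label('node'))
```

Raising an error instead of truncating is a design choice; some formats expect truncation, in which case the plain struct.pack call is enough.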
struct.pack(format, v1, v2, ...) takes Python values and returns a bytes object whose length always equals struct.calcsize(format). The number of values must match the number of fields in the format string.
The signature is:
struct.pack(format, v1, v2, ...) -> bytes
Example: pack three integers as two signed shorts and one signed int:
import struct
packed = struct.pack('>hhi', 5, 10, 15)
print(packed)
print(packed.hex())
print('size_bytes', struct.calcsize('>hhi'))
b'\x00\x05\x00\n\x00\x00\x00\x0f'
0005000a0000000f
size_bytes 8
In this example, the format string '>hhi' uses big-endian byte order, two signed shorts, and a signed int. The > prefix makes the output identical on any platform, which is important when you compare results against protocol specs.
If you are building a network protocol header, you typically combine struct.pack with socket send and receive code. For a full client/server example, see Python socket programming server-client.
When a format string mixes integers and a bytes field, the value count
must still match the number of fields. For most codes, a leading number
is a repeat count, so 3H needs three values. For s, the number is a byte
length, so 4s always consumes exactly one bytes argument.
import struct
# Format: big-endian, uint16 version, 4-byte name, uint32 timestamp
fmt = '>H4sI'
print('expected_size', struct.calcsize(fmt))
packed = struct.pack(fmt, 1, b'node', 1700000000)
print('packed_hex', packed.hex())
expected_size 10
packed_hex 00016e6f64656553f100
If calcsize returns a number you did not expect, check whether your
format uses native types without a prefix. Native types like l and i
follow platform alignment rules, which can add padding bytes between
fields. Switching to a prefixed format such as > or < gives you
fixed, predictable sizes on any machine.
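The effect of native alignment is easiest to see by comparing calcsize results for the same field list. This sketch uses a signed char followed by an int; the native result depends on your platform (8 is common on mainstream 64-bit systems, but that is an assumption, not a guarantee).

```python
import struct

fields = 'bi'  # signed char followed by int

native = struct.calcsize('@' + fields)    # native sizes and alignment
standard = struct.calcsize('<' + fields)  # standard sizes, no padding

print('native', native)
print('standard', standard)
print('padding_bytes', native - standard)
```

If padding_bytes is greater than zero, the native layout contains alignment padding that a prefixed format would not produce.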
struct.unpack(format, buffer) reads exactly struct.calcsize(format) bytes from the buffer and returns a tuple of Python values. The buffer must be exactly the right length, not shorter and not longer.
The signature is:
struct.unpack(format, buffer) -> tuple
Example: pack values, then unpack them back to Python objects:
import struct
fmt = '>hhi'
data = (5, 10, 15)
wire = struct.pack(fmt, *data)
values = struct.unpack(fmt, wire)
print('wire_hex', wire.hex())
print('values', values)
wire_hex 0005000a0000000f
values (5, 10, 15)
struct.unpack always returns a tuple, even when the format contains a single element. Code that expects a single scalar can still unpack the tuple with tuple unpacking, for example x, = struct.unpack('i', buf).
In a real stream or file, you rarely have a buffer that contains exactly one record. Use struct.calcsize to slice the right number of bytes before calling struct.unpack:
import struct
fmt = '>HH'
record_size = struct.calcsize(fmt) # 4 bytes per record
# Simulate a stream containing three back-to-back records.
stream = struct.pack(fmt, 1, 100) + struct.pack(fmt, 2, 200) + struct.pack(fmt, 3, 300)
offset = 0
while offset + record_size <= len(stream):
    record = struct.unpack(fmt, stream[offset:offset + record_size])
    print('record', record)
    offset += record_size
record (1, 100)
record (2, 200)
record (3, 300)
This pattern works for binary files too. Open the file in 'rb' mode, read record_size bytes at a time, and stop when read() returns fewer bytes than record_size.
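The read loop described above might look like the following sketch. io.BytesIO stands in for a file opened in 'rb' mode, and the trailing stray byte is added on purpose to show how a truncated record ends the loop.

```python
import io
import struct

fmt = '>HH'
record_size = struct.calcsize(fmt)

# io.BytesIO stands in for open(path, 'rb'); the last byte is a truncated record.
source = io.BytesIO(struct.pack(fmt, 1, 100) + struct.pack(fmt, 2, 200) + b'\x00')

records = []
while True:
    chunk = source.read(record_size)
    if len(chunk) < record_size:
        break  # end of data, or a partial trailing record
    records.append(struct.unpack(fmt, chunk))

print(records)
```

Whether a partial trailing record should be ignored or reported as an error depends on the file format, so treat the silent break here as one possible choice.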
A byte order prefix at the start of a format string controls how multi-byte integers and floats are encoded. Without a prefix, struct uses native byte order, which varies by platform and makes output non-portable.
Binary protocols often define a single byte order for multi-byte fields. For example, PNG file headers use big-endian integers, and Windows BMP headers use little-endian integers.
Python struct supports five prefix characters:
| Prefix | Name | Byte Order | Size/Alignment |
|---|---|---|---|
| @ | Native with alignment | Native | Native alignment may add padding |
| = | Standard sizes, native order | Native | Standard sizes, no alignment padding |
| < | Little-endian | Little-endian | Standard sizes, no alignment padding |
| > | Big-endian | Big-endian | Standard sizes, no alignment padding |
| ! | Network order | Big-endian | Standard sizes, no alignment padding |
To see how each prefix changes the bytes, pack a concrete integer:
import struct
value = 0x12345678
for prefix in ['@', '=', '<', '>', '!']:
packed = struct.pack(prefix + 'I', value)
print(prefix, packed.hex())
@ 78563412
= 78563412
< 78563412
> 12345678
! 12345678
On this example host, native byte order is little-endian. On a big-endian host, @ and = switch to big-endian, while <, >, and ! stay fixed.
A byte order mismatch between sender and receiver does not raise an exception. It silently produces wrong values.
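If you want an independent check of what a prefix does, int.to_bytes produces the same bytes for unsigned integers. The sketch below cross-checks three prefixes against it and uses sys.byteorder to identify the host order.

```python
import struct
import sys

value = 0x12345678

# Each prefix should agree with the matching int.to_bytes call.
big_ok = struct.pack('>I', value) == value.to_bytes(4, 'big')
little_ok = struct.pack('<I', value) == value.to_bytes(4, 'little')
native_ok = struct.pack('=I', value) == value.to_bytes(4, sys.byteorder)

print('host byte order:', sys.byteorder)
print(big_ok, little_ok, native_ok)
```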
struct.Struct(fmt) compiles a format string once and stores the result. Calling st.pack(...) or st.unpack(...) on the instance skips format parsing on every call, which matters in loops and high-throughput packet processing.
Side-by-side timing example:
import struct
import timeit
fmt = '>Ih'
data = (1, 2)
N = 200000
module_time = timeit.timeit(
    'struct.pack(fmt, *data)',
    number=N,
    globals={'struct': struct, 'fmt': fmt, 'data': data},
)
st = struct.Struct(fmt)
struct_time = timeit.timeit(
    'st.pack(*data)',
    number=N,
    globals={'st': st, 'data': data},
)
speedup = module_time / struct_time
print('module_level_s', round(module_time, 6))
print('Struct_instance_s', round(struct_time, 6))
print('speedup_x', round(speedup, 2))
module_level_s 0.020752
Struct_instance_s 0.012069
speedup_x 1.72
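Exact timings vary by CPU and Python build. The module-level functions cache recently compiled formats, so they do not fully re-parse the string on every call, but a Struct instance still skips the per-call cache lookup and keeps the format and its size together in one object.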
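Beyond speed, a Struct instance bundles the format, its size, and the pack and unpack methods in one object. The sketch below packs a few records and reads them back with iter_unpack; the record layout is an assumption made up for the example.

```python
import struct

record = struct.Struct('>Ih')  # unsigned int, signed short
print('size', record.size)

# Pack three records back to back into one bytes object.
blob = b''.join(record.pack(i, -i) for i in range(3))

# iter_unpack walks the buffer in record.size steps.
rows = list(record.iter_unpack(blob))
print(rows)
```

iter_unpack requires the buffer length to be a multiple of the record size, so it fits complete buffers better than partially received streams.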
struct.pack_into(fmt, buffer, offset, v1, ...) writes packed bytes into an existing writable buffer at a given byte offset. struct.unpack_from reads from the same kind of buffer at a given offset. Both work with bytearray and memoryview.
struct.pack_into writes into an existing buffer and returns None, so you typically inspect the buffer after packing.
Example using bytearray:
import struct
fmt = '>Ih' # unsigned int, signed short
st = struct.Struct(fmt)
buf = bytearray(st.size)
print('buf_len', len(buf))
st.pack_into(buf, 0, 0x12345678, -2)
print('buf_hex', buf.hex())
values = st.unpack_from(buf, 0)
print('unpacked', values)
buf_len 6
buf_hex 12345678fffe
unpacked (305419896, -2)
Python’s bytes, bytearray, and memoryview types represent immutable data, writable buffers, and zero-copy views. For background, see the Python data types tutorial.
Using offset to pack at a specific location in a larger buffer:
import struct
fmt = '>Ih'
st = struct.Struct(fmt)
big = bytearray(st.size + 4)
offset = 4
st.pack_into(big, offset, 0x01020304, 7)
print('big_hex', big.hex())
print('unpacked_at_offset', st.unpack_from(big, offset))
big_hex 00000000010203040007
unpacked_at_offset (16909060, 7)
If you already have a slice view, memoryview can avoid extra copies:
import struct
fmt = '<HH'
st = struct.Struct(fmt)
backing = bytearray(st.size + 6)
view = memoryview(backing)[3:] # view starts at offset three bytes
st.pack_into(view, 0, 0x1122, 0x3344)
out = st.unpack_from(view, 0)
print('backing_hex', backing.hex())
print('unpacked', out)
backing_hex 00000022114433000000
unpacked (4386, 13124)
Legacy code sometimes uses ctypes.create_string_buffer to allocate writable memory for pack_into. For pure-Python buffer handling, prefer bytearray and memoryview unless you are already integrating with a C ABI memory region.
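A common use of pack_into is filling one preallocated buffer with several records instead of concatenating many small bytes objects. This is a sketch under assumed data: the points list and the '<HH' layout are illustrative.

```python
import struct

st = struct.Struct('<HH')
points = [(1, 2), (3, 4), (5, 6)]

# Allocate space for all records once, then write each at its own offset.
buf = bytearray(st.size * len(points))
for index, point in enumerate(points):
    st.pack_into(buf, index * st.size, *point)

print('buf_hex', buf.hex())

# Read the middle record back without slicing the buffer.
middle = st.unpack_from(buf, 1 * st.size)
print('middle', middle)
```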
The examples below cover three common production patterns: building a binary network header, writing and reading a binary file, and matching a C struct layout for interop.
The format '>BBHI' packs a four-field packet header in big-endian byte order. Pack it before sending and unpack it on receipt to recover the original field values.
import struct
version = 1
packet_type = 2
payload = b'hello'
length = len(payload) # payload length in bytes
checksum = sum(payload) & 0xFFFFFFFF
fmt = '>BBHI' # version (u8), type (u8), length (u16), checksum (u32)
header = struct.pack(fmt, version, packet_type, length, checksum)
print('packed_header_hex', header.hex())
# send simulation
sent = header
# receive simulation
received = sent
v, t, l, c = struct.unpack(fmt, received)
print('unpacked', (v, t, l, c))
packed_header_hex 0102000500000214
unpacked (1, 2, 5, 532)
When you build real client/server code around this, pair the packed header with socket send and receive calls. For patterns and buffering strategies, see Python socket programming server-client.
If you compute checksums using masking and shifts, Python’s bitwise operators are the same primitives you use for protocol-level arithmetic.
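On the receiving side, the header usually arrives together with the payload, so the parser has to read the length field before it can slice the body. The sketch below assumes the same '>BBHI' header and the same simple sum checksum as the example above; build_packet and parse_packet are hypothetical helper names.

```python
import struct

HEADER = struct.Struct('>BBHI')  # version, type, payload length, checksum

def build_packet(version, packet_type, payload):
    """Hypothetical helper: prepend the header to the payload."""
    checksum = sum(payload) & 0xFFFFFFFF
    return HEADER.pack(version, packet_type, len(payload), checksum) + payload

def parse_packet(data):
    """Hypothetical helper: validate and split one packet."""
    if len(data) < HEADER.size:
        raise ValueError('incomplete header')
    version, packet_type, length, checksum = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) < length:
        raise ValueError('incomplete payload')
    if sum(payload) & 0xFFFFFFFF != checksum:
        raise ValueError('checksum mismatch')
    return version, packet_type, payload

packet = build_packet(1, 2, b'hello')
parsed = parse_packet(packet)
print(parsed)
```

A byte sum is a weak checksum and is used here only because the original example uses it; real protocols typically specify something stronger.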
Open the file in binary mode ('wb' and 'rb'), write packed bytes directly, and read them back for unpacking. Context managers handle file closing even if an error occurs.
import struct
import tempfile
from pathlib import Path
values = (10, 20, 30, 40)
fmt = '>4I' # four unsigned ints in big-endian order
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / 'data.bin'
    packed = struct.pack(fmt, *values)
    print('packed_hex', packed.hex())
    print('size_bytes', struct.calcsize(fmt))
    with path.open('wb') as f:
        f.write(packed)
    with path.open('rb') as f:
        raw = f.read()
    unpacked = struct.unpack(fmt, raw)
    print('unpacked', unpacked)
packed_hex 0000000a000000140000001e00000028
size_bytes 16
unpacked (10, 20, 30, 40)
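Files often hold a variable number of fixed-size records. One possible layout, sketched below, writes a record count first and then the records. The count prefix, the '>If' record layout, and the readings data are all assumptions for illustration.

```python
import struct
import tempfile
from pathlib import Path

COUNT = struct.Struct('>I')    # number of records that follow
RECORD = struct.Struct('>If')  # hypothetical record: id and a float reading

readings = [(1, 0.5), (2, 1.25), (3, -2.0)]

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / 'readings.bin'
    with path.open('wb') as f:
        f.write(COUNT.pack(len(readings)))
        for item in readings:
            f.write(RECORD.pack(*item))
    with path.open('rb') as f:
        (count,) = COUNT.unpack(f.read(COUNT.size))
        loaded = [RECORD.unpack(f.read(RECORD.size)) for _ in range(count)]

print(loaded)
```

The sample floats are chosen so that they survive the round trip through a 4-byte float exactly; arbitrary values may come back slightly rounded.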
The format '<IHBB' matches a fixed-width C record layout using little-endian byte order. Call struct.calcsize first to confirm your Python format string produces the same byte width as the C struct.
Assume a C struct like this:
// C (conceptual)
// uint32_t magic; // 4 bytes
// uint16_t code; // 2 bytes
// uint8_t flags; // 1 byte
// uint8_t reserved; // 1 byte
//
// Total size is 8 bytes on typical ABIs with this field order.
Python format string:
<IHBB means little-endian, uint32, uint16, uint8, uint8, with standard sizes and no extra alignment padding.
import struct
fmt = '<IHBB'
print('calcsize', struct.calcsize(fmt))
# Bytes received from the wire or a binary file.
wire = bytes.fromhex('ddccbbaa34120100')
print('wire_hex', wire.hex())
decoded = struct.unpack(fmt, wire)
print('decoded', decoded)
calcsize 8
wire_hex ddccbbaa34120100
decoded (2864434397, 4660, 1, 0)
If you need to match a C struct that includes compiler-inserted padding, check your C compiler ABI rules and consider using @ for native alignment in Python, or define explicit packing in C. The safest path is to confirm sizes with both sides and add tests that parse real fixtures.
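When no C compiler is at hand, ctypes can serve as a rough stand-in for checking sizes, because ctypes.Structure follows the platform's native alignment rules. The sketch below mirrors the record above and adds a second, hypothetical struct (Padded) whose layout normally contains padding.

```python
import ctypes
import struct

class Record(ctypes.Structure):
    # Mirrors the C struct shown above.
    _fields_ = [
        ('magic', ctypes.c_uint32),
        ('code', ctypes.c_uint16),
        ('flags', ctypes.c_uint8),
        ('reserved', ctypes.c_uint8),
    ]

class Padded(ctypes.Structure):
    # Hypothetical struct: a 1-byte field before a 4-byte field invites padding.
    _fields_ = [
        ('tag', ctypes.c_uint8),
        ('value', ctypes.c_uint32),
    ]

print('Record', ctypes.sizeof(Record), struct.calcsize('<IHBB'))
print('Padded native', ctypes.sizeof(Padded), struct.calcsize('@BI'))
print('Padded packed', struct.calcsize('<BI'))
```

If the native and packed sizes differ, the C side either needs an explicit packing directive or the Python side needs a format that reproduces the padding, for example with x pad bytes.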
Most struct errors fall into three categories: buffer length mismatches, wrong item counts, and silent byte order corruption. The third category raises no exception, which makes it the most dangerous in production.
struct.error: Unpack Requires a Buffer of X Bytes

Error message: struct.error: unpack requires a buffer of 4 bytes
import struct
try:
    struct.unpack('>I', b'\x00\x01')  # 2 bytes, but '>I' needs 4 bytes
except struct.error as e:
    print(e)
unpack requires a buffer of 4 bytes
Root cause: the input buffer is shorter than struct.calcsize(format). The unpack call must read exactly the number of bytes required by the format string.
Fix: slice or read exactly struct.calcsize(format) bytes, then pass that buffer to struct.unpack. When parsing stream data, buffer until you have enough bytes.
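Buffering until enough bytes are available can be wrapped in a small function. The following is a sketch: read_exact is a hypothetical helper, and io.BytesIO stands in for a file or socket-like stream whose read call may return fewer bytes than requested.

```python
import io
import struct

def read_exact(stream, size):
    """Hypothetical helper: read exactly size bytes or raise EOFError."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f'expected {size} bytes, got {size - remaining}')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

fmt = '>I'
stream = io.BytesIO(b'\x00\x00\x01\x00')
(value,) = struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))
print('value', value)

# A short stream now fails with a clear error before struct.unpack is called.
try:
    read_exact(io.BytesIO(b'\x00\x01'), struct.calcsize(fmt))
except EOFError as e:
    print('error', e)
```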
struct.error: Pack Expected X Items for Packing (Y Given)

Error message: struct.error: pack expected 2 items for packing (got 1)
import struct
try:
    struct.pack('>Ih', 123)  # format requires two values
except struct.error as e:
    print(e)
pack expected 2 items for packing (got 1)
Root cause: the number of Python values passed to struct.pack does not match the number of fields described by the format string.
Fix: provide one value per format element, or expand a tuple/list into positional arguments, for example struct.pack(fmt, *values).
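One way to keep the value count and the format in step is to tie both to a named record type. The sketch below uses typing.NamedTuple; the Header class and its field names are assumptions for illustration.

```python
import struct
from typing import NamedTuple

class Header(NamedTuple):
    # Hypothetical record: field order must match the format string.
    sequence: int
    delta: int

HEADER = struct.Struct('>Ih')

header = Header(sequence=123, delta=-4)
wire = HEADER.pack(*header)                   # expands to exactly two values
decoded = Header._make(HEADER.unpack(wire))   # rebuilds the named record

print('wire_hex', wire.hex())
print(decoded)
```

Adding a field to the class without updating the format string still raises struct.error, but the mismatch is now confined to one place in the code.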
Symptom: no exception is raised. The decoded values are silently wrong, which makes byte order mismatch the hardest of the three errors to catch in production.
import struct
value = 0x1234
wire = struct.pack('>H', value) # sender uses big-endian
# receiver mistakenly uses little-endian
decoded = struct.unpack('<H', wire)[0]
print('wire_hex', wire.hex())
print('decoded', decoded)
wire_hex 1234
decoded 13330
Root cause: the receiver interprets multi-byte fields with a different byte order prefix from the sender.
Fix: use the same prefix on both sides. For most network protocols, choose > or !. For host-to-host binary files, decide whether you need native alignment (@) or standard sizes (=, <, or >).
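Some formats guard against this mistake by starting with a known magic number that the reader can test in both byte orders. This is a sketch of that idea, not part of the struct module: MAGIC is a hypothetical constant and detect_prefix is a hypothetical helper.

```python
import struct

MAGIC = 0xA1B2C3D4  # hypothetical magic number for this sketch

def detect_prefix(data):
    """Hypothetical helper: return the prefix under which the magic number matches."""
    for prefix in ('>', '<'):
        (magic,) = struct.unpack_from(prefix + 'I', data, 0)
        if magic == MAGIC:
            return prefix
    raise ValueError('unrecognized magic number')

big = struct.pack('>IH', MAGIC, 0x1234)
little = struct.pack('<IH', MAGIC, 0x1234)

results = []
for blob in (big, little):
    prefix = detect_prefix(blob)
    _, value = struct.unpack(prefix + 'IH', blob)
    results.append((prefix, value))
    print(prefix, hex(value))
```

The check only works if the magic number reads differently when its bytes are reversed, which is why a value such as 0x11111111 would be a poor choice.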
struct is the right tool for fixed-width, C-compatible binary layouts. For more complex needs, Python offers several alternatives with different trade-offs.
| Approach | Use Case | Pros | Cons |
|---|---|---|---|
| struct | Fixed layouts, C-like binary packing and unpacking | Standard library, explicit format strings, predictable sizes with <, >, !, and = | Manual format strings, limited for highly nested or variable-length formats |
| ctypes | Interop with C APIs, mapping foreign memory | Can mirror C structs and call C functions | ABI and alignment differences by platform, more error-prone memory handling |
| array module | Homogeneous numeric arrays for I/O | Simple for one numeric type, easy to convert to bytes in some workflows | Not designed for mixed field layouts or padding rules |
| construct library | Declarative parsing and building of complex binary formats | Rich schema language for conditional and nested parsing | Extra dependency, parsing overhead compared to struct in tight loops |
| protobuf | Message serialization with schemas | Cross-language schema compatibility, versioning support | Not a byte-for-byte match to C struct layouts, uses variable-length encodings |
The questions below cover the most common points of confusion when working with struct.pack, struct.unpack, byte order prefixes, and buffer sizing.
Q: What does struct.pack return in Python?
A: struct.pack(format, ...) returns a bytes object containing the binary representation described by format. The returned length always matches struct.calcsize(format).
Q: What is the difference between struct.pack and struct.pack_into?
A: struct.pack creates and returns a new bytes object. struct.pack_into writes the packed values into an existing writable buffer, using an offset to choose where to write.
Q: Why does struct.unpack return a tuple?
A: struct.unpack returns a tuple because a format string can describe multiple fields. Even if the format has one field, returning a tuple keeps the API consistent.
Q: How do I handle byte order when using Python struct with network data?
A: Network protocols commonly use a fixed network byte order, usually big-endian. Use > or ! in your format strings for multi-byte fields, and ensure both sender and receiver use the same prefix.
Q: What is the difference between native byte order @ and standard byte order = in Python struct?
A: @ uses native byte order and native alignment, which can add padding between fields. = uses native byte order but standard sizes with no alignment padding, so layouts match the documented sizes rather than the platform ABI.
Q: How do I pack a string or bytes object using Python struct?
A: Use the s format with an explicit length, for example 8s. For str, encode to bytes first, then pass the resulting bytes to struct.pack, and make sure it is exactly the required length.
Q: When should I use the Struct class instead of module-level struct.pack and struct.unpack?
A: Use struct.Struct(fmt) when your code repeatedly packs or unpacks with the same format string. It compiles the format once, so repeated operations avoid re-parsing the format each call.
Q: What causes struct.error: unpack requires a buffer of X bytes and how do I fix it?
A: This error happens when the buffer you pass to struct.unpack is shorter than struct.calcsize(format). Fix it by buffering until you have enough bytes or by slicing the correct number of bytes before calling struct.unpack.
Most binary format problems reduce to three decisions: which fields to pack, what byte order the other side expects, and whether you are calling the same format string often enough to justify a Struct instance. Get those three right and struct rarely surprises you.
The format character table and the byte order prefix table in this tutorial are the two references you will return to most often. The errors section covers the cases that trip up even experienced users, particularly the silent byte order mismatch that produces wrong values with no exception.
From here, a natural next step is to put a packed header onto a real socket. For a working client and server you can extend with your own header format, see Python socket programming server-client.