Building a Basic TCP Protocol Parser in Go from Scratch
Min-jun Kim
Dev Intern · Leapcell

Introduction to TCP Protocol Parsing in Go
Understanding and interacting with network protocols is a fundamental skill for any software engineer. Among these, TCP (Transmission Control Protocol) stands as a cornerstone of the internet, enabling reliable, ordered, and error-checked delivery of data streams between applications. While Go's net package provides high-level APIs for network communication, there are scenarios where dissecting TCP packets at a lower level becomes crucial. This could be for security analysis, debugging network issues, implementing custom network proxies, or simply gaining a deeper understanding of how data flows across the network. This article will guide you through the process of building a basic TCP protocol parser from scratch in Go, empowering you to peek into the raw bytes of TCP segments.
Essential Concepts for TCP Parsing
Before we dive into the code, let's briefly define some core concepts related to TCP and network packet parsing that will be relevant to our discussion:
- Ethernet Frame: The lowest layer of our journey. Data packets on a local network are encapsulated within Ethernet frames. These frames contain source and destination MAC addresses, and a type field indicating the next protocol (e.g., IPv4).
- IP Packet (Internet Protocol): Encapsulated within an Ethernet frame, an IP packet handles routing across networks. It contains source and destination IP addresses, and a protocol field indicating the next layer (e.g., TCP).
- TCP Segment (Transmission Control Protocol): Our primary focus. A TCP segment is encapsulated within an IP packet. It provides reliable, connection-oriented data transfer. Key fields include:
  - Source Port / Destination Port: Identify the sending and receiving application.
  - Sequence Number: Tracks the order of bytes in the data stream.
  - Acknowledgement Number: Confirms receipt of data from the other end.
  - Data Offset (Header Length): Specifies the length of the TCP header in 32-bit words.
  - Flags: Control bits like SYN (synchronize), ACK (acknowledge), FIN (finish), PSH (push data), RST (reset), URG (urgent pointer significant).
  - Window Size: Indicates the amount of data the receiver is willing to accept.
  - Checksum: For error detection.
  - Urgent Pointer: Indicates urgent data.
  - Options: Optional fields that can extend the header.
  - Payload: The actual application data.
- Endianness: The order in which bytes are stored in multi-byte data types (e.g., big-endian vs. little-endian). Network protocols typically use big-endian (network byte order).
- Byte Buffer: A common way to read and write bytes, often used for parsing binary data. Go's bytes.Buffer type and encoding/binary package are invaluable here.
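To make the byte-order point concrete, here is a minimal sketch that decodes the same two bytes both ways with the standard encoding/binary package. The helper name portFromBytes is made up for this illustration and is not part of the parser built later.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// portFromBytes is a hypothetical helper for this illustration: it decodes a
// 16-bit port number from the first two bytes of b in network byte order.
func portFromBytes(b []byte) uint16 {
	return binary.BigEndian.Uint16(b)
}

func main() {
	raw := []byte{0x00, 0x50}

	// Network byte order: most significant byte first.
	fmt.Println(portFromBytes(raw)) // 80

	// The same bytes read as little-endian give a different number.
	fmt.Println(binary.LittleEndian.Uint16(raw)) // 20480
}
```

Reading a port with the wrong byte order does not fail; it silently yields a wrong value, which is why every multi-byte read in a protocol parser should name its byte order explicitly.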
Building a Simple TCP Parser
Our goal is to parse an incoming stream of bytes representing a raw TCP segment and extract its key header fields. We'll simulate receiving a raw TCP segment for simplicity, though in a real-world scenario, you'd typically capture these using libraries like gopacket.
Let's start by defining a struct to hold the parsed TCP header information:
```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// TCPHeader represents the fixed 20-byte portion of a TCP header.
type TCPHeader struct {
	SourcePort      uint16
	DestinationPort uint16
	SequenceNumber  uint32
	Acknowledgement uint32
	DataOffset      uint8 // Header length in bytes (the 4-bit wire value multiplied by 4)
	Flags           uint8 // Low byte of the offset/flags word: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
	WindowSize      uint16
	Checksum        uint16
	UrgentPointer   uint16
	// Options and Payload follow
}

// ParseTCPHeader takes a byte slice representing a TCP segment and attempts to parse its header.
// It returns the parsed header and the payload that follows it.
func ParseTCPHeader(data []byte) (*TCPHeader, []byte, error) {
	if len(data) < 20 { // Minimum TCP header length is 20 bytes
		return nil, nil, fmt.Errorf("tcp segment too short, expected at least 20 bytes, got %d", len(data))
	}

	reader := bytes.NewReader(data)
	header := &TCPHeader{}

	// Source Port (2 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.SourcePort); err != nil {
		return nil, nil, fmt.Errorf("failed to read source port: %w", err)
	}

	// Destination Port (2 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.DestinationPort); err != nil {
		return nil, nil, fmt.Errorf("failed to read destination port: %w", err)
	}

	// Sequence Number (4 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.SequenceNumber); err != nil {
		return nil, nil, fmt.Errorf("failed to read sequence number: %w", err)
	}

	// Acknowledgment Number (4 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.Acknowledgement); err != nil {
		return nil, nil, fmt.Errorf("failed to read acknowledgment number: %w", err)
	}

	// Data Offset (4 bits), reserved bits, and flags share the next 2 bytes,
	// so we read them as a single 16-bit word.
	var offsetFlags uint16
	if err := binary.Read(reader, binary.BigEndian, &offsetFlags); err != nil {
		return nil, nil, fmt.Errorf("failed to read data offset and flags: %w", err)
	}
	header.DataOffset = uint8((offsetFlags >> 12) * 4) // Upper 4 bits, times 4, gives header length in bytes
	header.Flags = uint8(offsetFlags & 0xFF)           // Low byte holds the flag bits

	// Window Size (2 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.WindowSize); err != nil {
		return nil, nil, fmt.Errorf("failed to read window size: %w", err)
	}

	// Checksum (2 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.Checksum); err != nil {
		return nil, nil, fmt.Errorf("failed to read checksum: %w", err)
	}

	// Urgent Pointer (2 bytes)
	if err := binary.Read(reader, binary.BigEndian, &header.UrgentPointer); err != nil {
		return nil, nil, fmt.Errorf("failed to read urgent pointer: %w", err)
	}

	// Validate the header length and extract the payload
	tcpHeaderLength := int(header.DataOffset)
	if tcpHeaderLength < 20 {
		return nil, nil, fmt.Errorf("data offset (%d bytes) is below the 20-byte minimum header length", tcpHeaderLength)
	}
	if tcpHeaderLength > len(data) {
		return nil, nil, fmt.Errorf("data offset (%d) indicates a header longer than segment length (%d)", tcpHeaderLength, len(data))
	}
	payload := data[tcpHeaderLength:]

	return header, payload, nil
}
```
Explanation of Header Parsing Logic
- TCPHeader struct: We define TCPHeader to mirror the structure of a TCP segment header, using uint16 and uint32 for appropriately sized fields. On the wire, the Data Offset is given in 32-bit words, so the parser multiplies it by 4 and stores the header length in bytes in DataOffset.
- ParseTCPHeader function:
  - Input: It takes a []byte as input, representing the raw TCP segment.
  - Minimum Length Check: A TCP header is at least 20 bytes long. We check this first so that truncated input is rejected early with a clear error.
  - bytes.NewReader: This creates an io.Reader from our byte slice, making it easy to read fixed-size data using binary.Read.
  - binary.Read: This function reads binary data from the io.Reader and populates our struct fields. binary.BigEndian ensures we're interpreting the bytes in network byte order.
  - Data Offset and Flags: This is the tricky part. Bytes 12 and 13 of the header pack several fields into 16 bits: in the original TCP specification, a 4-bit Data Offset, 6 reserved bits, and 6 flag bits (later standards reassigned two of the reserved bits as the ECE and CWR flags). We read both bytes into the offsetFlags variable, shift right by 12 to isolate the Data Offset, and mask off the low bits to isolate the flags. Because Flags is a uint8, only the low byte survives the conversion, which covers all six classic flags.
  - Payload Extraction: Once the header is parsed, we use header.DataOffset (which we've already converted to bytes) to slice the original byte slice and get the remaining payload.
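The shift-and-mask step described above is easier to follow in isolation. The sketch below is a standalone illustration, not part of the parser: splitOffsetFlags and flagNames are hypothetical helper names, and only the six classic flags are named.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOffsetFlags is a hypothetical helper: given the 16-bit word at bytes
// 12-13 of a TCP header, it returns the header length in bytes and the low
// byte that carries the flag bits.
func splitOffsetFlags(word uint16) (headerLen int, flags uint8) {
	headerLen = int(word>>12) * 4 // upper 4 bits count 32-bit words
	flags = uint8(word & 0x00FF)  // flag bits live in the low byte
	return headerLen, flags
}

// flagNames lists the six classic flags that are set, lowest bit first.
func flagNames(flags uint8) string {
	names := []struct {
		mask uint8
		name string
	}{
		{0x01, "FIN"}, {0x02, "SYN"}, {0x04, "RST"},
		{0x08, "PSH"}, {0x10, "ACK"}, {0x20, "URG"},
	}
	var set []string
	for _, n := range names {
		if flags&n.mask != 0 {
			set = append(set, n.name)
		}
	}
	return strings.Join(set, "|")
}

func main() {
	headerLen, flags := splitOffsetFlags(0x5012)
	fmt.Println(headerLen, flagNames(flags)) // 20 SYN|ACK
}
```

Keeping the bit manipulation in small pure functions like these makes it straightforward to unit-test the trickiest part of the header without building a whole segment.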
Simulating a TCP Segment and Usage
Let's create a main function to demonstrate how to use our parser. We'll handcraft a simple TCP segment for illustration purposes.
```go
func main() {
	// A sample TCP segment (20-byte header + 7-byte payload "HELLO\r\n").
	// The flags match a SYN-ACK; the payload is included only to exercise the parser.
	// Source Port: 12345
	// Dest Port: 80
	// Seq Num: 0x12345678
	// Ack Num: 0x98765432
	// Data Offset: 5 (20 bytes)
	// Flags: SYN (0x02), ACK (0x10) -> (0x02 | 0x10) = 0x12
	// Window Size: 0xFFFF (65535)
	// Checksum: 0xAAAA (placeholder for this example)
	// Urgent Pointer: 0x0000
	// Payload: "HELLO\r\n"
	rawTCPSegment := []byte{
		0x30, 0x39, // Source Port: 12345 (0x3039)
		0x00, 0x50, // Dest Port: 80 (0x0050)
		0x12, 0x34, 0x56, 0x78, // Sequence Number
		0x98, 0x76, 0x54, 0x32, // Acknowledgment Number
		0x50, 0x12, // Data Offset (5*4 = 20 bytes), Flags (SYN, ACK): 0x5 is the data offset, 0x012 the reserved bits and flags
		0xFF, 0xFF, // Window Size
		0xAA, 0xAA, // Checksum
		0x00, 0x00, // Urgent Pointer
		// Payload starts here
		'H', 'E', 'L', 'L', 'O', '\r', '\n',
	}

	header, payload, err := ParseTCPHeader(rawTCPSegment)
	if err != nil {
		fmt.Printf("Error parsing TCP header: %v\n", err)
		return
	}

	fmt.Println("--- TCP Header ---")
	fmt.Printf("Source Port: %d\n", header.SourcePort)
	fmt.Printf("Destination Port: %d\n", header.DestinationPort)
	fmt.Printf("Sequence Number: 0x%X\n", header.SequenceNumber)
	fmt.Printf("Acknowledgement Number: 0x%X\n", header.Acknowledgement)
	fmt.Printf("Header Length (bytes): %d\n", header.DataOffset)
	fmt.Printf("Flags: 0x%X\n", header.Flags)

	// Decode specific flags
	fmt.Printf("  SYN Flag: %t\n", (header.Flags&0x02) != 0)
	fmt.Printf("  ACK Flag: %t\n", (header.Flags&0x10) != 0)
	fmt.Printf("  PSH Flag: %t\n", (header.Flags&0x08) != 0)
	fmt.Printf("  RST Flag: %t\n", (header.Flags&0x04) != 0)
	fmt.Printf("  FIN Flag: %t\n", (header.Flags&0x01) != 0)
	fmt.Printf("  URG Flag: %t\n", (header.Flags&0x20) != 0)

	fmt.Printf("Window Size: %d\n", header.WindowSize)
	fmt.Printf("Checksum: 0x%X\n", header.Checksum)
	fmt.Printf("Urgent Pointer: %d\n", header.UrgentPointer)
	// %q keeps the trailing "\r\n" visible instead of printing raw control characters.
	fmt.Printf("Payload (%d bytes): %q\n", len(payload), payload)

	// Example with a different data offset (with options).
	// The options add 4 bytes, so Data Offset becomes 6 (24 bytes).
	rawTCPSegmentWithOptions := []byte{
		0xC0, 0x01, // Source Port: 49153
		0x00, 0x50, // Dest Port: 80
		0x00, 0x00, 0x00, 0x01, // Sequence Number
		0x00, 0x00, 0x00, 0x01, // Acknowledgment Number
		0x60, 0x12, // Data Offset (6*4 = 24 bytes), Flags (SYN, ACK)
		0x04, 0x00, // Window Size
		0x00, 0x00, // Checksum
		0x00, 0x00, // Urgent Pointer
		0x02, 0x04, 0x05, 0xB4, // Example TCP Option: Maximum Segment Size (kind 2, length 4, value 1460)
		'A', 'B', 'C',
	}

	fmt.Println("\n--- TCP Header with Options ---")
	headerWithOptions, payloadWithOptions, err := ParseTCPHeader(rawTCPSegmentWithOptions)
	if err != nil {
		fmt.Printf("Error parsing TCP header with options: %v\n", err)
		return
	}
	fmt.Printf("Header Length (bytes): %d\n", headerWithOptions.DataOffset)
	fmt.Printf("Flags: 0x%X\n", headerWithOptions.Flags)
	fmt.Printf("Payload (%d bytes): %q\n", len(payloadWithOptions), payloadWithOptions)
}
```
When you run this main function, you will see the extracted TCP header fields and the payload, demonstrating that our parser can correctly interpret the raw byte stream.
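The parser skips over any option bytes without interpreting them. A rough sketch of how those bytes could be walked is shown below, assuming the standard kind/length/value layout in which kind 0 ends the list and kind 1 is a single-byte no-op. The TCPOption type and parseOptions function are hypothetical names introduced for this illustration; the input would be the bytes between offset 20 and the header length.

```go
package main

import (
	"errors"
	"fmt"
)

// TCPOption is a hypothetical type for this sketch: one decoded option.
type TCPOption struct {
	Kind uint8
	Data []byte // option value, without the kind and length bytes
}

// parseOptions walks the option bytes that sit between the fixed 20-byte
// header and the payload.
func parseOptions(opts []byte) ([]TCPOption, error) {
	var out []TCPOption
	for i := 0; i < len(opts); {
		kind := opts[i]
		switch kind {
		case 0: // End of Option List: anything after it is padding
			return out, nil
		case 1: // No-Operation: a single byte used for alignment
			out = append(out, TCPOption{Kind: kind})
			i++
			continue
		}
		// Every other option has a length byte that counts the kind and
		// length bytes themselves.
		if i+1 >= len(opts) {
			return nil, errors.New("option truncated before length byte")
		}
		length := int(opts[i+1])
		if length < 2 || i+length > len(opts) {
			return nil, fmt.Errorf("option kind %d has invalid length %d", kind, length)
		}
		out = append(out, TCPOption{Kind: kind, Data: opts[i+2 : i+length]})
		i += length
	}
	return out, nil
}

func main() {
	// Maximum Segment Size option: kind 2, length 4, value 0x05B4 (1460).
	opts, err := parseOptions([]byte{0x02, 0x04, 0x05, 0xB4})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, o := range opts {
		fmt.Printf("kind=%d data=% X\n", o.Kind, o.Data)
	}
}
```

Validating each length byte before slicing matters here: option data comes straight from the network, and a bogus length would otherwise cause an out-of-range slice or an endless loop.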
Applications and Further Enhancements
This basic parser is a starting point. Here are some ways it can be extended and its real-world applications:
- Full Packet Dissection: Integrate this TCP parser with an IP parser, and an Ethernet parser, to form a complete network packet dissector. Libraries like gopacket already do this efficiently.
- Packet Capture and Analysis: Use this with pcap bindings (e.g., github.com/google/gopacket/pcap) to capture live network traffic and analyze TCP segments for debugging, security monitoring, or performance insights.
- Custom Proxies/Firewalls: Implement rules based on TCP port numbers, flags, or even payload content for custom network filtering or routing.
- Stateful Protocol Analysis: Track TCP connection states (SYN, SYN-ACK, ACK, FIN) using the flags to understand connection lifecycle.
- Error Checking: Implement TCP checksum verification to ensure data integrity, although this is more complex as it involves a pseudo-header.
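As a starting point for the checksum item, here is a minimal sketch of the Internet checksum over an IPv4 pseudo-header plus the TCP segment. It assumes IPv4 only; the names sum16 and tcpChecksumIPv4 are hypothetical, and the addresses are documentation-range placeholders. In a real dissector the source and destination addresses would come from the enclosing IP header.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sum16 adds data to acc as big-endian 16-bit words, padding an odd
// trailing byte with a zero byte.
func sum16(acc uint32, data []byte) uint32 {
	for i := 0; i+1 < len(data); i += 2 {
		acc += uint32(binary.BigEndian.Uint16(data[i:]))
	}
	if len(data)%2 == 1 {
		acc += uint32(data[len(data)-1]) << 8
	}
	return acc
}

// tcpChecksumIPv4 returns the one's-complement checksum over the IPv4
// pseudo-header (addresses, zero byte, protocol, TCP length) and the segment.
func tcpChecksumIPv4(src, dst [4]byte, segment []byte) uint16 {
	pseudo := make([]byte, 12)
	copy(pseudo[0:4], src[:])
	copy(pseudo[4:8], dst[:])
	pseudo[9] = 6 // IP protocol number for TCP
	binary.BigEndian.PutUint16(pseudo[10:], uint16(len(segment)))

	acc := sum16(0, pseudo)
	acc = sum16(acc, segment)
	for acc>>16 != 0 { // fold carries back into the low 16 bits
		acc = (acc & 0xFFFF) + (acc >> 16)
	}
	return ^uint16(acc)
}

func main() {
	src := [4]byte{192, 0, 2, 1}
	dst := [4]byte{192, 0, 2, 2}
	segment := []byte{
		0x30, 0x39, 0x00, 0x50, // ports
		0x12, 0x34, 0x56, 0x78, // sequence number
		0x98, 0x76, 0x54, 0x32, // acknowledgment number
		0x50, 0x12, 0xFF, 0xFF, // offset/flags, window
		0x00, 0x00, 0x00, 0x00, // checksum (zero for now), urgent pointer
		'H', 'E', 'L', 'L', 'O', '\r', '\n',
	}

	// Sender side: compute with the checksum field zeroed, then store it.
	binary.BigEndian.PutUint16(segment[16:], tcpChecksumIPv4(src, dst, segment))

	// Receiver side: recomputing over the filled-in segment yields zero.
	fmt.Println("valid:", tcpChecksumIPv4(src, dst, segment) == 0)
}
```

The same function serves both directions: a sender calls it with the checksum field zeroed and stores the result, while a receiver calls it on the segment as received and treats any non-zero result as corruption.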
Conclusion
Building a TCP protocol parser from scratch in Go, even a basic one, is an excellent exercise for understanding how network protocols operate at a byte level. It demystifies the structure of TCP segments and provides a foundational skill for various networking tasks. While higher-level libraries often abstract these details, knowing how to interpret raw bytes empowers you to diagnose complex network issues and build highly specialized network applications. This simple parser demonstrates the power of Go's standard library for handling binary data and provides a stepping stone for more advanced network programming endeavors.