Syslog Format: A Comprehensive Guide to Modern Event Logging

What is the Syslog Format?
The syslog format is a standard method for sending and storing log messages from devices, operating systems, applications and network appliances. It provides a lightweight, extensible framework that enables centralised collection and analysis of events across heterogeneous environments. In practice, the syslog format defines how a log message is structured so that receivers can parse, interpret and correlate information reliably. The term “syslog format” is used widely in documentation, vendor manuals and engineering discussions, and you will frequently see variations and refinements as technologies evolve. Understanding the core of the syslog format is essential for anyone responsible for monitoring, security operations or IT compliance.
A Brief History of Syslog and Its Formats
The roots of the syslog format lie in early Unix systems, where it began as a simple, text-based convention for handing messages to a logging daemon and was later carried across the network over UDP. This simplicity made syslog extremely popular, especially in large networks where delivery guarantees could be traded for speed and broad compatibility. For years the format was a de facto convention rather than a standard: RFC 3164, published in 2001, documented how existing implementations behaved rather than prescribing a new design. The need for richer metadata and structured data then led to RFC 5424, published in 2009, which obsoletes RFC 3164 and defines the modern syslog protocol. The legacy RFC 3164 layout is still widely deployed and referenced for compatibility, while RFC 5424 gives administrators a more precise, machine-parseable representation without giving up the flexibility the syslog format is known for.
The Core Elements of the Syslog Format
PRI: Priority, Facility and Severity
The PRI value is a composite integer that encodes both the facility and the severity of the event, calculated as the facility number multiplied by 8 plus the severity number. It is enclosed in angle brackets at the start of the message, for example, <34>, which corresponds to facility 4 (security and authorisation messages) and severity 2 (critical). The facility designates the general category of the source (kernel messages, mail system, system daemons, etc.), while the severity indicates the level of importance or impact (emergency, alert, critical, error, warning, notice, informational, debug), numbered 0 to 7 from most to least severe. Decoding PRI allows a log management system to apply filters, routing rules and escalation workflows automatically.
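As a minimal sketch of the arithmetic involved, the following Python helpers split a PRI value into facility and severity and combine them again. The function names and the SEVERITIES tuple are illustrative choices, not part of any standard library.

```python
from typing import Tuple

# Severity names indexed by their numeric value (0 = most severe).
SEVERITIES = (
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
)


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a PRI value into (facility, severity)."""
    # Facilities run from 0 to 23, so the largest valid PRI is 23 * 8 + 7 = 191.
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    return divmod(pri, 8)


def encode_pri(facility: int, severity: int) -> int:
    """Combine a facility and a severity into a single PRI value."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


facility, severity = decode_pri(34)
print(facility, SEVERITIES[severity])  # 4 critical
```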
Timestamps: TIMESTAMP
In the syslog format, the TIMESTAMP field captures when the event occurred. Modern practice prefers a precise, ISO 8601-style timestamp, typically in UTC with a trailing ‘Z’ to indicate Zulu time, or with an explicit numeric offset such as +01:00 so that ordering is preserved across time zones. Consistency here is vital for correlation across disparate systems. RFC 5424 bases the TIMESTAMP on RFC 3339 and always reserves a position for it in the header; an originator that cannot obtain the time sends the NILVALUE ‘-’ in its place. In older deployments, you may still encounter variable formats, often without a year or time zone; plan to normalise timestamps during ingestion for reliable analytics.
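One way to normalise RFC 5424-style timestamps at ingestion is sketched below in Python. The function name is hypothetical, and the sketch assumes timestamps carry either a trailing Z or a numeric offset; on Python versions before 3.11, datetime.fromisoformat may reject fractional seconds that are not exactly three or six digits long.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_rfc5424_timestamp(value: str) -> Optional[datetime]:
    """Return the timestamp as an aware UTC datetime, or None for NILVALUE."""
    if value == "-":
        return None  # the originator could not supply a time
    if value.endswith("Z"):
        # Older Python versions do not accept "Z", so spell out the offset.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    return parsed.astimezone(timezone.utc)


print(parse_rfc5424_timestamp("2024-07-14T14:34:56+02:00"))
```

Storing the result in UTC means that two events logged in different time zones can be compared directly.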
Hostname: HOSTNAME
The HOSTNAME field identifies the originating device or host that generated the message. This could be a server, switch, router or a container instance. In large deployments, the accuracy and consistency of hostnames become a prerequisite for successful aggregation, deduplication and attribution.
App name: APP-NAME
APP-NAME records the name of the application or daemon that emitted the log. This is immensely helpful when a single host runs multiple services. A well-structured APP-NAME helps operators distinguish, for example, a security tool from a general system process without inspecting the message body.
Process ID: PROCID
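PROCID usually contains a numeric identifier, most often the operating-system process ID of the emitting process. Because process IDs are reassigned when a service restarts or a host reboots, a change in PROCID can signal a discontinuity in logging, but the value should not be relied on as a stable key for cross-referencing events across reboots.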
Message ID: MSGID
MSGID provides a persistent identifier for the type of message. It is useful when correlating logs across systems and when implementing canonical event flows. While some implementations omit MSGID, including it can improve traceability and searchability in SIEM tools and log stores.
Structured Data: STRUCTURED-DATA
STRUCTURED-DATA allows a message to carry additional structured information in a machine-readable format. Its content is optional: when there is nothing to send, the field holds the NILVALUE ‘-’. It can be used to attach contextual attributes such as application version, environment, correlation IDs or custom metrics. When present, each data element is enclosed in square brackets and consists of an identifier followed by name="value" parameters, and several elements can follow one another. The addition of structured data marks a significant improvement over pure free-text messages because it supports richer querying and automation.
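A minimal Python sketch of building one structured data element is shown below. The helper names are hypothetical; the escaping step reflects the RFC 5424 rule that a double quote, a backslash and a closing square bracket inside a parameter value must each be preceded by a backslash.

```python
from typing import Dict


def escape_param_value(value: str) -> str:
    """Escape the three characters that are special inside a parameter value."""
    # Backslashes go first so the escapes added afterwards are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_sd_element(sd_id: str, params: Dict[str, str]) -> str:
    """Render one element, e.g. [exampleSDID@32473 iut="4"]."""
    rendered = [f'{name}="{escape_param_value(str(value))}"'
                for name, value in params.items()]
    return "[" + " ".join([sd_id] + rendered) + "]"


print(format_sd_element("exampleSDID@32473", {"iut": "4", "eventSource": "SSH"}))
# [exampleSDID@32473 iut="4" eventSource="SSH"]
```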
Message Payload: MSG
The MSG field contains the human-readable portion of the log, describing the event or action that occurred. In the syslog format, MSG is the last field and may be of arbitrary length. To ensure compatibility with parsers and storage systems, many organisations limit the MSG to sensible lengths or implement truncation policies. Although MSG is readable by humans, well-designed log pipelines will also surface key attributes from STRUCTURED-DATA for automated analysis.
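A truncation policy has to avoid cutting a multi-byte character in half. The sketch below, in Python, is one possible approach under the assumption that the limit is expressed in bytes of UTF-8; the function name and the limits used are illustrative only.

```python
def truncate_utf8(msg: str, max_bytes: int) -> str:
    """Shorten msg to at most max_bytes of UTF-8 without splitting a character."""
    data = msg.encode("utf-8")
    if len(data) <= max_bytes:
        return msg
    # errors="ignore" drops a trailing partial character left by the byte cut.
    return data[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("héllo", 3))  # hé
```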
RFC 3164 vs RFC 5424: A Practical Comparison
Understanding the differences between the older RFC 3164 format and the newer RFC 5424 format is essential for effective log management. RFC 3164 describes a relaxed, largely free-form layout: the PRI value, a timestamp with no year or time zone, a hostname, and a text-first message whose tag and content follow convention rather than strict rules. RFC 5424 keeps PRI but introduces a fixed field order, explicit versioning, RFC 3339 timestamps and the structured data field. It also separates the header from the message content more clearly, enabling more precise parsing, filtering and enrichment. In practice, many organisations support both formats in mixed environments, but for new deployments, adopting RFC 5424 provides long-term benefits for consistency and tooling support.
Practical Examples of Syslog Format Messages
RFC 5424 style example
<165>1 2024-07-14T12:34:56.789Z myhost.example.co.uk sshd 1234 ID47 [exampleSDID@32473 iut="4" eventSource="SSH" eventID="1012"] Accepted publickey for user "jbloggs" from 203.0.113.27
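To show how the fixed field order helps a receiver, here is a minimal Python sketch that splits an RFC 5424 message into its header fields using a regular expression. It is not a complete validator: the pattern, the function name and the SAMPLE string are illustrative assumptions, and details such as field length limits and the optional byte order mark at the start of MSG are ignored.

```python
import re

# PRI VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID SP SD [SP MSG]
RFC5424_RE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>[1-9]\d{0,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    # Either the NILVALUE or one or more [ ... ] elements with \-escapes.
    r"(?P<sd>-|(?:\[(?:\\.|[^\]\\])*\])+)"
    r"(?: (?P<msg>.*))?",
    re.DOTALL,
)


def parse_rfc5424(line: str) -> dict:
    """Return the message fields as a dict; NILVALUE fields become None."""
    match = RFC5424_RE.fullmatch(line)
    if match is None:
        raise ValueError("not an RFC 5424 message")
    fields = match.groupdict()
    fields["facility"], fields["severity"] = divmod(int(fields.pop("pri")), 8)
    for key in ("timestamp", "hostname", "app_name", "procid", "msgid", "sd"):
        if fields[key] == "-":
            fields[key] = None
    return fields


# Hypothetical sample message in the RFC 5424 layout.
SAMPLE = (
    '<165>1 2024-07-14T12:34:56.789Z myhost.example.co.uk sshd 1234 ID47 '
    '[exampleSDID@32473 iut="4" eventSource="SSH" eventID="1012"] '
    'Accepted publickey for user "jbloggs" from 203.0.113.27'
)
parsed = parse_rfc5424(SAMPLE)
print(parsed["facility"], parsed["severity"], parsed["app_name"])  # 20 5 sshd
```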
RFC 3164 style example
<6>Aug 12 17:01:23 myhost kernel: INFO: Device ready, interface eth0 up
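Legacy messages are looser, so a parser has to lean on convention. The Python sketch below assumes the common layout of PRI, a "Mmm dd hh:mm:ss" timestamp, a hostname and a tag with an optional process ID in square brackets; real devices deviate from this, and both the pattern and the sample message are hypothetical.

```python
import re

RFC3164_RE = re.compile(
    r"<(?P<pri>\d{1,3})>"
    # The day of the month is padded with a space when below 10, e.g. "Feb  5".
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"
    r"(?P<content>.*)",
    re.DOTALL,
)


def parse_rfc3164(line: str) -> dict:
    """Split a legacy-style message into fields; pid is None when absent."""
    match = RFC3164_RE.fullmatch(line)
    if match is None:
        raise ValueError("not in the expected legacy layout")
    fields = match.groupdict()
    fields["facility"], fields["severity"] = divmod(int(fields.pop("pri")), 8)
    return fields


# Hypothetical sample: facility 1 (user), severity 5 (notice).
print(parse_rfc3164("<13>Feb  5 17:32:18 myhost su[4321]: session opened"))
```

Note that the timestamp is returned as text, because the legacy layout carries neither a year nor a time zone.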
Framing and Transport: RFC 6587
Transmission of syslog messages over TCP is described in RFC 6587, which documents two framing methods: octet counting and non-transparent framing. Octet counting prefixes each message with its length in bytes followed by a space, making stream boundaries unambiguous over TCP. Non-transparent framing relies on delimiter-based separation, typically using a newline character, so a message that itself contains a newline can be split incorrectly. These framing rules are important when you deploy syslog over TCP or when you route logs through middleboxes that might coalesce or split messages. To maximise reliability, many modern deployments use TCP with octet counting, which keeps message boundaries intact in high-volume streams and simplifies downstream parsing.
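Octet counting can be sketched in a few lines of Python. The function names are illustrative; the splitter assumes it is fed whatever bytes have arrived so far and returns any incomplete tail so that the caller can prepend it to the next read.

```python
from typing import List, Tuple


def frame_octet_counted(message: str) -> bytes:
    """Prefix the UTF-8 encoded message with its byte length and a space."""
    payload = message.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload


def split_octet_counted(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Return (complete messages, leftover bytes still waiting for more data)."""
    messages = []
    while True:
        space = buffer.find(b" ")
        if space == -1:
            break  # the length prefix itself is not complete yet
        length = int(buffer[:space])  # raises ValueError on a corrupt prefix
        end = space + 1 + length
        if len(buffer) < end:
            break  # the message body has not fully arrived
        messages.append(buffer[space + 1:end])
        buffer = buffer[end:]
    return messages, buffer


stream = frame_octet_counted("first\nline") + frame_octet_counted("second")
print(split_octet_counted(stream))  # ([b'first\nline', b'second'], b'')
```

Because the length is counted in bytes rather than characters, the embedded newline in the first message does not disturb the boundary.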
Secure and Reliable Syslog Transport
Syslog messages can travel over UDP, TCP, TLS-encrypted TCP (often over port 6514) or even via secure transports like SSH or VPNs. UDP is lightweight but unreliable; TCP provides reliability and ordering at a modest performance cost; TLS ensures confidentiality and integrity of messages in transit. When security and compliance are priorities, encrypting the channel and authenticating the sender are standard best practices. The syslog format itself is orthogonal to transport; you can transport the same message in different ways, but you should consider the trade-offs of each option in your environment.
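As an illustration of the TLS option, the Python sketch below builds a client context that verifies the server certificate and sends a single octet-counted message. The function names, the timeout and the choice of TLS 1.2 as a minimum are assumptions for this example, and the host name in the usage comment is hypothetical; a production sender would also need reconnection and buffering logic.

```python
import socket
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client context that verifies the server certificate and name."""
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: local policy
    return context


def send_over_tls(host: str, port: int, message: str,
                  context: ssl.SSLContext) -> None:
    """Open a TLS connection and send one octet-counted syslog message."""
    payload = message.encode("utf-8")
    frame = str(len(payload)).encode("ascii") + b" " + payload
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(frame)


# Usage (hypothetical collector address):
# send_over_tls("logs.example.com", 6514, "<34>1 - myhost app - - - test",
#               build_tls_context())
```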
Extensions: JSON and Unicode in the Syslog Format
Tools and Libraries for Handling the Syslog Format
Syslog Servers and Forwarders
Integration with SIEM and Observability Tools
Best Practices for Deploying the Syslog Format in Organisations
Consistency Across Devices
Aim for consistent message formats across all devices and services. Standardise the timestamp format, the host naming conventions, and the use of STRUCTURED-DATA where possible. Consistency reduces the complexity of parsing rules and makes dashboards more intuitive for on-call engineers and security professionals.
Time Synchronisation and Time Zones
Prioritise accurate time synchronisation using NTP (Network Time Protocol) to ensure that timestamps are comparable across systems. Time drift can degrade the usefulness of correlations and the reliability of analytics. In cloud environments, consider converting to UTC at the ingestion layer and maintaining a single canonical time reference in storage and dashboards.
Security and Compliance
Protect log data both in transit and at rest. Use TLS for transport, implement access controls for log stores, and enforce rotation and retention policies aligned with regulatory requirements. Carefully manage who can query or export structured data, as log data can contain sensitive information. Where feasible, sanitise or redact sensitive fields before long-term storage, and record who accesses log data so that an audit trail exists.
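Redaction before storage can be as simple as masking the values of known sensitive parameters. The Python sketch below assumes the sensitive data appears as name="value" pairs and that the parameter names password, token and secret are the ones to mask; both the names and the sample message are hypothetical and would need adapting to your own log sources.

```python
import re

# Matches name="value" where the value may contain backslash-escaped characters.
SENSITIVE_PARAM = re.compile(
    r'\b(password|token|secret)="(?:\\.|[^"\\])*"',
    re.IGNORECASE,
)


def redact(message: str) -> str:
    """Replace the value of each sensitive parameter with a fixed mask."""
    return SENSITIVE_PARAM.sub(lambda m: f'{m.group(1)}="***"', message)


print(redact('[auth@32473 user="jbloggs" password="hunter2"] login failed'))
# [auth@32473 user="jbloggs" password="***"] login failed
```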
Retention, Archiving and Compliance
Define retention schedules that balance operational needs against storage costs and compliance obligations. For many organisations, raw logs are kept for a defined period for security investigations, followed by longer-term summaries or anonymised data. Automation helps enforce these policies and prevents administrative drift that could lead to non-compliance.
Monitoring and Alerting
Translate syslog format messages into actionable alerts. Leverage severity levels and structured data to drive rules that can automatically escalate incidents. A well-tuned alerting strategy reduces alarm fatigue and ensures that genuine issues are surfaced promptly to the right teams.
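A severity-driven rule can be expressed very compactly. The Python sketch below maps a PRI value to one of three actions; the thresholds and the action names are assumptions chosen for illustration, not recommended values.

```python
def route_by_severity(pri: int, page_at: int = 2, ticket_at: int = 4) -> str:
    """Pick an action from the severity encoded in PRI (lower is more severe)."""
    severity = pri % 8
    if severity <= page_at:
        return "page"      # emergency, alert, critical
    if severity <= ticket_at:
        return "ticket"    # error, warning
    return "log-only"      # notice, informational, debug


print(route_by_severity(34))   # page
print(route_by_severity(165))  # log-only
```

Keeping the thresholds as parameters makes it easier to tune the rule per team or per facility when alarm fatigue becomes a concern.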
Common Pitfalls in Using the Syslog Format
Inconsistent Timestamp Handling
When devices generate timestamps in varying formats or without time zone information, correlation becomes error-prone. Normalise inputs at ingestion time and store canonical time representations to prevent misalignment between systems.
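Legacy timestamps without a year or time zone can only be normalised by supplying the missing pieces. The Python sketch below makes those assumptions explicit as parameters: the caller provides the year and the sender's UTC offset, both of which are guesses that the ingestion layer has to make from context. It also assumes English month abbreviations, since %b depends on the process locale.

```python
from datetime import datetime, timedelta, timezone


def normalise_legacy_timestamp(text: str, year: int,
                               utc_offset_hours: float = 0) -> str:
    """Convert 'Mmm dd hh:mm:ss' to an ISO-style UTC string ending in Z."""
    # Collapse the double space used for single-digit days, e.g. "Feb  5".
    compact = " ".join(text.split())
    # The year is parsed together with the date so that 29 February is accepted.
    naive = datetime.strptime(f"{year} {compact}", "%Y %b %d %H:%M:%S")
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


print(normalise_legacy_timestamp("Aug 12 17:01:23", 2024, utc_offset_hours=1))
# 2024-08-12T16:01:23Z
```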
Overlooking Structured Data
Relying solely on MSG for context can hinder automation. Structured Data is designed for machine readability; neglecting it reduces the usefulness of logs for dashboards, alerts and analytics. Where possible, standardise common attributes as part of the structured-data payload.
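To make structured data usable in dashboards and rules, a pipeline first has to turn it into key-value pairs. The Python sketch below is one lenient way to do that; the patterns and the function name are illustrative, and the parser does not enforce every constraint of RFC 5424 (for example name length limits).

```python
import re
from typing import Dict

# One element: [SD-ID name="value" name="value" ...]
SD_ELEMENT = re.compile(
    r'\[(?P<id>[^ \]="]+)'
    r'(?P<params>(?: [^ \]="]+="(?:\\.|[^"\\])*")*)\]'
)
SD_PARAM = re.compile(r' (?P<name>[^ \]="]+)="(?P<value>(?:\\.|[^"\\])*)"')
# Undo the backslash escapes for '"', '\' and ']'.
UNESCAPE = re.compile(r'\\(["\\\]])')


def parse_structured_data(sd: str) -> Dict[str, Dict[str, str]]:
    """Map each SD-ID to a dict of its parameters; '-' yields an empty dict."""
    result: Dict[str, Dict[str, str]] = {}
    for element in SD_ELEMENT.finditer(sd):
        params = {}
        for param in SD_PARAM.finditer(element.group("params")):
            params[param.group("name")] = UNESCAPE.sub(r"\1", param.group("value"))
        result[element.group("id")] = params
    return result


print(parse_structured_data('[exampleSDID@32473 iut="4" eventSource="SSH"]'))
# {'exampleSDID@32473': {'iut': '4', 'eventSource': 'SSH'}}
```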
Under- or Over-Scaling Log Infrastructure
Too few resources can lead to dropped messages and gaps in visibility, while excessive capacity without proper retention policies can inflate costs. Design the log pipeline to scale with workload, using partitioning, indexing strategies and tiered storage to maintain performance and cost-effectiveness.
Future Directions: The Syslog Format in Cloud-Native Environments
A Glossary of Key Terms
- Syslog format: The standard structure for log messages used across devices, applications and network equipment.
- RFC 5424: The modern specification for the syslog protocol, introducing versioning, structured data and a richer header.
- RFC 3164: An older reference model for syslog messages, still encountered for compatibility in legacy deployments.
- PRI: Priority value encoding facility and severity in a single integer.
- STRUCTURED-DATA: A field in RFC 5424 that carries additional contextual information in a machine-readable form.
- RFC 6587: Framing rules for transporting syslog messages over TCP, including octet counting.
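- OID: Object identifier. The private enterprise number that follows the ‘@’ in a custom structured data identifier, such as exampleSDID@32473, is taken from the IANA enterprise OID registry.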
- NTP: Network Time Protocol, used for clock synchronisation across devices.
- SIEM: Security Information and Event Management, a platform for aggregating, indexing and analysing logs for security and compliance.