



RASPRG                                                        C. Perkins
Internet-Draft                                     University of Glasgow
Intended status: Informational                                 I. Castro
Expires: 7 November 2026                 Queen Mary University of London
                                                             R. Yanagida
                                                            S. McQuistin
                                                University of St Andrews
                                                              6 May 2026


       Analysing Internet Standards Development Organisation Data
                  draft-perkins-analysing-sdo-data-00

Abstract

   This document outlines some issues to consider when studying data
   relating to the Internet standards development ecosystem.  It
   identifies observable components of standards development processes,
   proposes a taxonomy of possible measurements, and highlights
   methodological, interpretive, and ethical considerations.  It is
   intended to support a range of uses, including monitoring standards
   development organisations (SDOs), evaluating the evolution of
   technical work, understanding technology deployment, and informing
   community, leadership, and governance discussions.

   This document is submitted for consideration by the Research and
   Analysis of Standard-Setting Processes Research Group (RASPRG) in the
   IRTF.  It is not an IETF product and is not a standard.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://csperkins.github.io/draft-analysing-sdo-data/draft-perkins-
   analysing-sdo-data.html.  Status information for this document may be
   found at https://datatracker.ietf.org/doc/draft-perkins-analysing-
   sdo-data/.

   Discussion of this document takes place on the RASPRG Research Group
   mailing list (mailto:rasprg@irtf.org), which is archived at
   https://mailarchive.ietf.org/arch/browse/rasprg/.  Subscribe at
   https://www.ietf.org/mailman/listinfo/rasprg/.

   Source for this draft and an issue tracker can be found at
   https://github.com/csperkins/draft-analysing-sdo-data.





Perkins, et al.          Expires 7 November 2026                [Page 1]

Internet-Draft        Analysing Internet Standards              May 2026


Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 November 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Standards Development as a Socio-Technical System . . . . . .   4
   3.  Analysing the IETF  . . . . . . . . . . . . . . . . . . . . .   7
     3.1.  Datatracker . . . . . . . . . . . . . . . . . . . . . . .   8
     3.2.  RFC Editor  . . . . . . . . . . . . . . . . . . . . . . .   9
     3.3.  Mailing List Archives . . . . . . . . . . . . . . . . . .   9
     3.4.  Session Recordings  . . . . . . . . . . . . . . . . . . .  10
     3.5.  Chat Archives . . . . . . . . . . . . . . . . . . . . . .  10
     3.6.  GitHub  . . . . . . . . . . . . . . . . . . . . . . . . .  10
   4.  Analysing Other SDOs  . . . . . . . . . . . . . . . . . . . .  10
     4.1.  Data Availability Across SDOs . . . . . . . . . . . . . .  11
     4.2.  Integrating Data Across SDOs  . . . . . . . . . . . . . .  11
   5.  Data Processing . . . . . . . . . . . . . . . . . . . . . . .  12
   6.  Ethics and Data Protection  . . . . . . . . . . . . . . . . .  13
   7.  Recommendations . . . . . . . . . . . . . . . . . . . . . . .  14
     7.1.  Recommendations for the IETF  . . . . . . . . . . . . . .  16
     7.2.  Recommendations for Researchers . . . . . . . . . . . . .  17
   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  18
   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  18
   10. Informative References  . . . . . . . . . . . . . . . . . . .  18
   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . .  19
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  19

1.  Introduction

   Internet technologies are developed and standardised by a range of
   standards development organisations (SDOs), including the IETF, W3C,
   IEEE, 3GPP, ITU-T, and others.  The standards that these
   organisations produce underpin the interoperability and architectural
   evolution of the Internet and the Web.

   Understanding how standards are developed, including who participates
   in the standards process, what collaborations occur during the
   development of standards, how the process is organised and governed,
   and the technical outputs, can support analysis of standards
   ecosystems.  Such analysis can assist with monitoring standards
   development organisations, evaluating the evolution of technical
   work, understanding technology deployment, and ultimately be used to
   inform community leadership and governance discussions.

   This document outlines considerations for studying data from the
   Internet standards development ecosystem.  It aims to:

   *  identify observable components of the Internet standards
      development ecosystem;

   *  describe considerations for measuring and analysing the standards
      development process;

   *  provide a taxonomy of possible measurements and analytical
      approaches;

   *  highlight methodological, interpretive, and ethical
      considerations;

   *  illustrate the application of these methods to the IETF, given the
      availability of rich data about the IETF participants, documents,
      processes, and communication channels;

   *  discuss the relevance and limits of applying these methods to
      other SDOs and the extent to which differences in governance,
      transparency, and data availability affect such analysis; and

   *  encourage reproducible research practices and transparent
      analysis.

   This document does not prescribe specific metrics, define evaluation
   criteria, or recommend approaches to comparative rankings of
   standards bodies, groups, or participants.

   *TODO*: Cite prior work.  This document currently does a very poor
   job of citing prior work.  This omission will be remedied in future
   versions.  While it is not intended that this document forms the
   basis for a comprehensive literature survey, if you know of relevant
   prior work that should be cited please contact the authors.

2.  Standards Development as a Socio-Technical System

   Internet standards development can be understood as a socio-technical
   system in which technical artefacts, human participants,
   organisational interests, and governance processes interact over
   time.  Standards do not emerge solely from technical design choices,
   nor solely from institutional processes; rather, they arise through
   structured collaboration among individuals and organisations
   operating within formal and informal rules.

   Technical outputs emerge from a socio-technical process in which
   engineering choices interact with expertise, incentives,
   organisational structures, review processes, historical precedent,
   deployment constraints, and the cultural norms and practices of the
   standards community.  At the same time, the organisational and
   cultural context is not fixed: governance structures, working
   practices, and community norms evolve together over time and these
   changes in turn shape future participation and technical decision-
   making.

   For analytical purposes, standards development ecosystems can be
   viewed as comprising several interacting components:

   *  *Participants:* Participants are the individuals who contribute to
      standards development.  They may include engineers, researchers,
      operators, implementers, academics, independent contributors,
      civil society representatives, policy specialists, and others with
      relevant expertise or interests.  Participation criteria differ
      across SDOs.  Some use open participation, while others structure
      participation through organisational- or state-based membership,
      sometimes with additional exceptions or parallel open mechanisms.

      Participation models affect standards development by shaping both
      who is able to contribute, and how they are permitted to
      contribute.  Open participation can broaden the pool of
      contributors and make it easier for individuals to join without
      prior institutional affiliation, which may increase diversity of
      experience and viewpoints.  At the same time, openness does not
      eliminate all the barriers to participation.  Effective
      participation may still depend on having sufficient time, funding,
      employer support, travel resources, and familiarity with the
      processes, tools, and norms of the community.  Membership-based
      models may provide clearer institutional commitment and
      resourcing, but they can also limit participation to those acting
      through recognised organisations or membership categories.

   *  *Organisations:* Participants are often affiliated with
      organisations such as companies, academic institutions,
      governments, consultancies, or civil society groups.  These
      organisations may provide forms of support including funding,
      staff time, technical expertise, or implementation experience.

      The relationship between participants and organisations is not
      equally visible across SDOs.  In some models, participation is
      individual, and so any recorded affiliation may be incomplete, and
      reflect a specific contribution rather than the sustained view of
      the participant.  In other models, where individuals participate
      on behalf of a clearly indicated affiliation, the institutional
      link is clearer.

      Even where affiliations are recorded, they may not fully describe
      the organisational context.  A company may be a subsidiary of
      another company (or in the process of becoming so), and
      consultants or contractors may work for clients whose interests
      are not directly visible in participation records.

   *  *Technical Groups:* SDOs typically organise work through technical
      groups such as working groups, research groups, study groups,
      committees, or similar bodies.  These groups define scope,
      coordinate discussion, and develop technical outputs.  They are
      not always organised as a single flat layer, with hierarchical and
      other structures in use.

      The number, names, and functions of these structures differ across
      organisations.  In some cases, they reflect administrative
      oversight or broad technical areas; in others, they distinguish
      between different forms of technical development.

   *  *Artefacts:* Standards processes generate artefacts such as
      drafts, specifications, recommendations, reports, agendas,
      minutes, presentations, issue trackers, and final published
      standards.  These artefacts provide an observable record of
      technical development.  Revision histories, references, and
      relationships between documents may help reveal aspects such as
      participation dynamics, design iteration, and the evolution of the
      underlying technologies subject to standardisation.

      Different SDOs vary in how openly they make such information
      available and in how easily it can be accessed and reused.
      Artefact availability can support the work of participants,
      researchers, and other observers, but collecting, maintaining,
      publishing, and organising this information also imposes costs on
      SDOs.

   *  *Collaboration Infrastructure:* Standards development requires
      communication among participants to propose work, discuss
      technical issues, review contributions, coordinate activity,
      resolve disagreements, and build support for possible outcomes.
      SDOs therefore rely on systems such as mailing lists, code
      repositories, and meetings to facilitate this debate.

      The mix of communication, collaboration, and coordination
      mechanisms differs across SDOs, often to support the other
      attributes described.

   *  *Governance Structures:* Standards bodies have formal governance
      structures, with charters specifying the scope of different
      activities, defined leadership roles, review and approval stages,
      appeals processes, voting rules, consensus procedures, and so on.
      These structures define how work is initiated, scoped, reviewed,
      approved, and contested.

      At the same time, influence is also exercised through reputation,
      recognised expertise, community norms, procedural familiarity, and
      control over agendas, drafting, or review capacity.  Governance
      structures therefore shape how decisions are made, how priorities
      are established, how disagreements are managed, and, ultimately,
      how influence is distributed within standards development.

   *  *Standards Implementation and Deployment:* Implementation usually
      occurs outside the formal standards process, and may be voluntary
      by interested parties or mandated by policy in certain
      jurisdictions.

      In many cases, publication of a standard does not by itself
      require implementation.  Adoption may therefore vary widely: some
      standards are widely deployed, while others see limited or no
      implementation.  Adoption may also be shaped by factors outside
      the standards process, including regulation, procurement, cost,
      and compatibility with existing systems.

      Data on implementation and operational use is often limited.

   Measuring SDO activity is challenging.  Observable metrics such as
   publication counts, message volume, attendance figures, authorship,
   or leadership roles can provide useful evidence, but each captures
   only part of the standards process.  Analysis of artefacts and logs
   from the collaboration infrastructure (e.g., analysis of mailing list
   messages) can provide more detail and nuance, at the expense of
   additional complexity, but even these do not provide a complete view.

   There are several reasons for this.  One is that critical aspects of
   standards development are hard to observe directly.  Influence,
   agenda setting, informal coordination, negotiation, and the practical
   exercise of power and authority may not be well represented by any
   single metric, or group of metrics, and are extremely challenging to
   infer from collaboration infrastructure logs.

   Another reason is that the available data is often limited.  Data
   availability and quality vary across SDOs.  Different parts of the
   process are not equally observable, and even within a single SDO some
   information may be incomplete, difficult to access, inconsistently
   structured, or unavailable.

   Combining multiple data sources introduces additional challenges.
   Observations from different parts of the process may not share stable
   identifiers, identifiers may change over time, and the same entity
   may appear in different forms across records.  Voluntary
   declarations, non-standard terminology, and organisational changes
   such as mergers or acquisitions may further complicate linkage.

   Metrics, artefacts, and other data sources may also differ in
   accuracy, representativeness, and relevance.  Not all artefacts have
   the same significance, not all forms of participation have the same
   effect, and visible activity does not necessarily correspond to
   implementation, adoption, or wider impact.  Measures should therefore
   be interpreted cautiously and, where possible, considered alongside
   complementary indicators.

3.  Analysing the IETF

   IETF participation is open with no formal membership.  Individuals
   can participate by joining mailing lists, contributing to
   discussions, submitting Internet-Drafts, and attending meetings.
   Contributions ordinarily reflect the opinion of individual
   participants, and not necessarily their affiliation; exceptions to
   this norm exist for specific aspects such as draft authorship and
   intellectual property rights disclosures.

   The IETF has a hierarchical group structure, with technical working
   groups (that have working group chairs) organised into distinct areas
   (that have area directors).

   Reflecting its open participation model, many of the IETF's processes
   are publicly observable through open records and dedicated APIs.
   Mailing lists are a central forum for working group discussion,
   alongside meetings; some groups also use externally hosted
   repositories, for example on GitHub, to support drafting and issue
   discussion.

3.1.  Datatracker

   The IETF Datatracker (https://datatracker.ietf.org/) is the main
   source of day-to-day and historical data about the operation of the
   IETF.  It can be accessed via the website or programmatically using a
   REST API and provides information about:

   *  Participants including names, email addresses, pronouns,
      biography, and photo, and external resources such as personal
      websites, GitHub usernames, and ORCID identifiers.  The
      Datatracker maintains a record of the different names and email
      addresses used by individuals.

   *  Artefacts such as RFCs, Internet-Drafts, agendas, blue sheets,
      working group charters, conflict reviews, shepherd write-ups,
      liaison statements, minutes, and presentation slides, including:

      -  Metadata such as the title, name ("draft-ietf-..."), revision,
         date, state, and where appropriate abstract, working group, RFC
         number and publication stream, status on the standards track,
         area director, and document shepherd.

      -  Submissions (e.g., different revisions of Internet-Drafts) with
         document name, revision, date, title, abstract, authors, group,
         and metadata about documents the submission replaces.

      -  Authors with email address, affiliation, and country.

      -  Events such as state changes, expiration, details of IESG
         processing, IETF last call, directorate reviews, IANA reviews,
         etc., with the document name, revision, date, and responsible
         person.

      -  Relationships including normative and informative references,
         and documents replaced, updated, or obsoleted.

   *  Working groups, research groups, areas, directorates, and
      leadership bodies such as the IESG, IRSG, and IAB, including the
      group name and acronym, group state, relationships between groups
      (e.g., working groups are organised in areas), the mailing list,
      charter text, milestones, and who occupies key roles in the group.

   *  IESG processing, including ballot positions, the text of comments
      and discusses, and scheduling of the IESG review.

   *  Directorate membership and directorate reviews, including the
      document, reviewer, outcome, date, and the review text.

   *  Meetings, including both plenary and interim meetings, with
      venues, dates, and times, details of what groups met in what time
      slots, and registration and attendance data.

   *  IPR disclosures including the document that the IPR relates to,
      the person making disclosure, details of the patent, and licensing
      terms.

   The Datatracker has been developed over time, and this is reflected
   in the data that is available, with more recent data being
   significantly more complete than earlier data.  Datatracker profiles
   are only required for a subset of IETF activities (e.g., draft
   submission, meeting registration), and so a number of active
   participants do not have a profile.
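
   As an illustration only, the following Python sketch shows one way to
   walk a paginated listing from the Datatracker REST API mentioned
   above.  It assumes, rather than asserts, that the API is rooted at
   /api/v1/ and that each JSON page carries a "meta" object with a
   "next" link alongside an "objects" list.  The helper names
   iter_objects() and fetch_json() are hypothetical.  These details
   should be checked against the current API documentation before use.

```python
import json
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://datatracker.ietf.org"  # site named in the text above


def fetch_json(url: str) -> Dict:
    """Fetch one page of results and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_objects(path: str,
                 fetch: Callable[[str], Dict] = fetch_json,
                 limit: Optional[int] = None) -> Iterator[Dict]:
    """Yield objects from a paginated endpoint, following 'next' links.

    Assumption: each page looks like
    {"meta": {"next": <path or None>}, "objects": [...]}.
    """
    url = urljoin(BASE_URL, path)
    seen = 0
    while url:
        page = fetch(url)
        for obj in page.get("objects", []):
            if limit is not None and seen >= limit:
                return
            yield obj
            seen += 1
        next_path = page.get("meta", {}).get("next")
        url = urljoin(BASE_URL, next_path) if next_path else None
```

   Under those assumptions, a call such as iter_objects(path, limit=10),
   with path naming a document listing endpoint, would yield up to ten
   records.  Passing the fetch function as a parameter allows cached or
   recorded responses to be substituted, which may help reproducibility.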

3.2.  RFC Editor

   The RFC Editor makes the RFC index available in machine readable form
   at https://www.rfc-editor.org/rfc-index.xml.  The RFC index includes
   title, authors, publication date, status, abstract, publication
   stream, name of the precursor Internet-Draft, and the IETF area and
   working group that developed the RFC, if appropriate.  This
   information is also available in the IETF Datatracker.

   Information about RFC errata is available on the RFC Editor website
   at https://www.rfc-editor.org/errata.php.  This data is also
   available in machine readable form.
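
   A minimal sketch of reading the RFC index with the Python standard
   library follows.  The element names used here ("rfc-entry", "doc-id",
   "title", "author", "name", "current-status", "stream", and "draft")
   are assumptions about the index schema and should be verified against
   the published file.  The sketch ignores XML namespaces so that it
   does not depend on the exact namespace URI.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional


def local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem: ET.Element, name: str) -> Optional[str]:
    """Return the stripped text of the first direct child called name."""
    for child in elem:
        if local(child.tag) == name:
            return (child.text or "").strip() or None
    return None


def iter_rfc_entries(xml_text: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per <rfc-entry>; element names are assumptions."""
    root = ET.fromstring(xml_text)
    for entry in root.iter():
        if local(entry.tag) != "rfc-entry":
            continue
        authors: List[str] = []
        for child in entry:
            if local(child.tag) == "author":
                name = child_text(child, "name")
                if name:
                    authors.append(name)
        yield {
            "doc_id": child_text(entry, "doc-id"),
            "title": child_text(entry, "title"),
            "authors": authors,
            "status": child_text(entry, "current-status"),
            "stream": child_text(entry, "stream"),
            "draft": child_text(entry, "draft"),  # None when absent
        }
```

   Fields that are missing from an entry are returned as None rather
   than raising an error, since not every entry can be expected to carry
   every field.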

3.3.  Mailing List Archives

   The IETF maintains public mail archives at
   https://mailarchive.ietf.org/ that are also available in machine
   readable form via IMAP from imap.ietf.org.  The recent mail archives
   are essentially complete, but some historical lists that were not
   originally hosted on ietf.org are missing.  Spam emails have largely,
   but not entirely, been removed from the archive.  As of March 2026,
   the IETF mail archive contains approximately 3 million messages from
   almost 1400 mailing lists, around 40GB of data, with some messages
   dating back to the late 1980s.

   There are significant data quality problems with older messages in the
   IETF mail archive, due to problems with the original messages rather
   than the archive, that make them difficult to process.

3.4.  Session Recordings

   The IETF makes video recordings of its plenary meetings available on
   YouTube (https://www.youtube.com/user/ietf).  Audio recordings of
   IETF sessions from IETF 49 through to IETF 106 are available at
   https://get.ietf.org/archive/audio.

3.5.  Chat Archives

   The IETF makes chat logs available.  Jabber was used prior to 2021,
   with archives at https://get.ietf.org/archive/jabber/. More recently,
   Zulip has been used and is accessible at zulip.ietf.org.

3.6.  GitHub

   Some IETF working groups, and some individuals, make extensive use of
   GitHub for document development and issue tracking.  The IETF does
   not maintain a complete list of GitHub repositories associated with
   its work.  The IETF Datatracker contains links to some repositories
   and user profiles.

   Using the GitHub API, the following information is available:

   *  Information about GitHub users that contribute (e.g., username,
      email address, and other biography information).

   *  Contributions and changes, by way of Git commits, made by those
      users to documents.

   *  Discussion that takes place through comments and issues.

4.  Analysing Other SDOs

   Standards relevant to the Internet and the Web are also developed
   within the W3C, 3GPP, ITU-T, and others.  Each organisation has its
   own governance model, participation structure, institutional culture,
   and data availability.  These differences affect both what can be
   observed, and how observations should be interpreted.

4.1.  Data Availability Across SDOs

   SDOs vary considerably in terms of the data that they make publicly
   available about their activities, and in how easily that data can be
   accessed and processed.

   The W3C provides a REST API at https://api.w3.org, covering metadata
   about documents, participants, affiliations, and groups, and
   maintains a public mailing list archive.  W3C groups make extensive
   use of GitHub for document development and issue tracking.  The W3C
   operates under a membership model, in which participation is
   primarily through affiliated organisations.  This affects how data
   about participants and their contributions should be interpreted,
   particularly when being compared to data from the IETF and other SDOs
   with individual participation models.

   The ITU-T and 3GPP both operate under membership-based models where
   access to documents, meeting records, and contribution data is
   typically restricted to member organisations.  Some ITU-T
   Recommendations are made publicly available after publication, while
   the 3GPP makes its specifications available at https://www.3gpp.org/
   specifications.  The working documents, contributions, and meeting
   records are generally not accessible to non-members.

   Differences in data availability mean that the methods applicable to
   the IETF, where rich longitudinal data is publicly available, may not
   be replicable across all SDOs.  Any analyses should account for these
   availability differences.

4.2.  Integrating Data Across SDOs

   Efforts to understand the wider standardisation landscape require
   combining data across multiple SDOs.

   SDOs do not share common identifiers for participants, organisations,
   documents, or other metadata.  An individual that participates across
   multiple SDOs may appear under different names, e-mail addresses, or
   usernames in the records of each SDO.  Resolving these identities
   requires suitable entity resolution mechanisms, and carries the risk
   of both incorrect matches (where two unrelated entities are linked
   together) and missed matches (where one entity has multiple,
   separate records in each SDO).  The same risks apply to affiliations:
   companies may be recorded under different names, abbreviations, or
   subsidiary identities across SDOs.

   Standards developed within one organisation may reference, build
   upon, or be coordinated with work at another SDO, but these
   relationships are not captured in any shared record.  Reconstructing
   these relationships requires either manual effort, or natural
   language processing of document content, introducing the risk of
   errors.

   SDOs operate on different timescales and with different process
   structures.  Comparing activity across organisations at a given point
   in time may not reflect equivalent stages of development.

   Finally, differences in governance and participation models affect
   which comparisons are meaningful.  Data analyses, and the
   interpretation of them, must consider that apparent differences
   between SDOs may reflect structural factors (e.g., open vs.
   membership-based participation) rather than substantive differences
   in behaviour or outcomes.

5.  Data Processing

   Significant processing effort is required to clean, normalise, and
   link data records before they can be analysed.

   The same individual participant may appear across each of the data
   sources with different identifiers, including names, e-mail
   addresses, and usernames.  These identifiers may change over time.
   Entity resolution (using exact and heuristic matching) is feasible in
   many instances, but requires careful validation to prevent the
   introduction of errors into later analyses.  Entity resolution across
   organisations is similarly challenging, where companies may be
   subsidiaries of another, might merge or be acquired, or, given the
   unstructured nature of the dataset, appear under different names (to
   illustrate the scope of the entity resolution problem note that, as
   of May 2026, there are 282 variants of the name "Huawei" in the IETF
   Datatracker).  Information external to the Datatracker, and other
   data sources, is often needed to process organisational data.
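
   To make the exact-matching step concrete, the following Python sketch
   reduces free-text affiliation strings to a crude comparison key and
   groups the variants that share a key.  The suffix list and the
   function names are hypothetical and chosen for illustration; this is
   not a description of how any existing analysis performs entity
   resolution.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical list of legal-form suffixes to drop; extend for real data.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "co", "corp", "corporation",
                  "gmbh", "ab", "limited", "company"}


def normalise_affiliation(raw: str) -> str:
    """Reduce an affiliation string to a crude comparison key."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lower-case and keep only alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Strip trailing legal-form suffixes, always keeping one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_variants(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw affiliation strings by their normalised key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[normalise_affiliation(name)].append(name)
    return dict(groups)
```

   With this sketch, "Huawei Technologies Co., Ltd." and "Huawei
   Technologies" share the key "huawei technologies", while "HUAWEI"
   maps to the separate key "huawei".  This illustrates why exact
   matching on a normalised key is only a first step, and why heuristic
   matching and manual validation are still needed.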

   Participants may have more than one affiliation, including across the
   lifetime of a particular contribution (e.g., an Internet-Draft).
   Affiliation data is only recorded for a subset of activities, and may
   need to be inferred (e.g., from corporate domain names) in other
   cases.  As a result, affiliation data, where recorded, indicates the
   participant's affiliation at a moment in time for a particular
   contribution, making it difficult to form a continuous history.

   Document life cycles are non-linear, and documents might pass through
   multiple working groups, be replaced or updated by later drafts, and
   change authorship over time.
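
   One way to follow a document through successive replacements is
   sketched below in Python.  The input mapping from a document name to
   the name of the document that replaced it is a hypothetical structure
   that a researcher would build from the "replaces" metadata.  As a
   simplification, the sketch assumes each document has at most one
   successor, which will not hold where drafts are split or merged.

```python
from typing import Dict, List


def lineage(name: str, replaced_by: Dict[str, str]) -> List[str]:
    """Follow 'replaced by' links from name to the latest known document.

    replaced_by maps a document name to its (single) successor.
    """
    chain = [name]
    seen = {name}
    while chain[-1] in replaced_by:
        successor = replaced_by[chain[-1]]
        if successor in seen:
            # Inconsistent data can contain cycles; stop rather than loop.
            break
        chain.append(successor)
        seen.add(successor)
    return chain
```

   The cycle check is included because replacement metadata is entered
   by hand and cannot be assumed to be consistent.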






   Working group leadership is difficult to reconstruct: knowing who
   chaired a working group during a particular period, or which area a
   given group belonged to at a given time, requires the reconstruction
   of a timeline from historical event records held in the Datatracker.
   These records can be incomplete or inconsistently formatted.
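
   The reconstruction described above can be sketched in Python as
   follows.  The RoleEvent record, with its "start" and "end" actions,
   is a hypothetical encoding introduced for illustration; the
   Datatracker's own event records would first need to be mapped into
   some such form.  Events that cannot be paired are skipped here, where
   a real analysis would more likely log them for manual review.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RoleEvent:
    when: date
    person: str
    action: str  # "start" or "end"; a hypothetical encoding


@dataclass
class Tenure:
    person: str
    start: date
    end: Optional[date] = None  # None means no end event was found


def build_tenures(events: Iterable[RoleEvent]) -> List[Tenure]:
    """Pair start and end events, in date order, into tenures."""
    open_tenures: Dict[str, Tenure] = {}
    tenures: List[Tenure] = []
    for ev in sorted(events, key=lambda e: e.when):
        if ev.action == "start" and ev.person not in open_tenures:
            tenure = Tenure(ev.person, ev.when)
            open_tenures[ev.person] = tenure
            tenures.append(tenure)
        elif ev.action == "end" and ev.person in open_tenures:
            open_tenures.pop(ev.person).end = ev.when
        # Duplicate starts and ends without a start are skipped.
    return tenures


def holders_on(tenures: Iterable[Tenure], day: date) -> List[str]:
    """Return the people whose tenure covers day (end date exclusive)."""
    return sorted(t.person for t in tenures
                  if t.start <= day and (t.end is None or day < t.end))
```

   Treating a missing end event as an open-ended tenure is a design
   choice; for historical data it may instead indicate an incomplete
   record, so results for older periods should be interpreted with care.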

   E-mail metadata and message content present a number of challenges.
   A significant number of messages contain malformed or archaic header
   fields that break widely used email processing libraries and need
   correction.  Mail clients perform the threading of messages in
   different ways, with the separation between new and quoted text
   becoming unclear.  Natural language processing of message content
   requires contextualisation, with informal conventions, technical
   vocabulary, and the use of acronyms (all of which may evolve over
   time) presenting challenges that are unique to the dataset.
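
   A defensive approach to parsing such messages, using only the Python
   standard library, is sketched below.  The helper names are
   hypothetical.  The quoted-text split relies on the common convention
   of prefixing quoted lines with ">", which is an assumption that many
   messages in the archive will not follow (e.g., top-posted replies or
   HTML mail), so it should be read as a rough heuristic.

```python
from datetime import datetime
from email import message_from_bytes
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple


def safe_date(msg: Message) -> Optional[datetime]:
    """Parse the Date header, returning None rather than raising."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        return parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        # Different Python versions raise different errors on bad dates.
        return None


def text_body(msg: Message) -> str:
    """Return the first text/plain part, decoding leniently."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown or misspelt charset name
                return payload.decode("utf-8", errors="replace")
    return ""


def split_quoted(body: str) -> Tuple[List[str], List[str]]:
    """Split body lines into (new, quoted) using the leading '>' rule."""
    new: List[str] = []
    quoted: List[str] = []
    for line in body.splitlines():
        (quoted if line.lstrip().startswith(">") else new).append(line)
    return new, quoted
```

   A message would typically be loaded with message_from_bytes() and
   then passed to these helpers.  Returning None or replacement
   characters, instead of raising, lets a batch job continue past
   malformed messages while still allowing them to be counted and
   reported.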

   As noted, the quality of the dataset degrades significantly for
   historical records.  Data that was not gathered by the Datatracker at
   the time, or that has been subject to partial backfilling later, must
   be treated with caution, both in terms of data processing and later
   analyses.

6.  Ethics and Data Protection

   Data is made available by the IETF, and other Internet SDOs, subject
   to their particular privacy and data protection policies and terms of
   use.  For the IETF, these are described at https://www.ietf.org/
   privacy-statement/; other SDOs will have their own policies.

   The available data includes considerable amounts of personal data
   that is potentially sensitive and subject to legal restrictions on
   processing and use in many jurisdictions (e.g., the GDPR in Europe).
   Researchers must ensure that their use of such data conforms to any
   applicable regulations.  It is important to note that the regulations
   that apply to research use of such data may differ from those that
   apply to the IETF, or other SDOs, with regard to their use of the
   data as part of the standards process.

   Researchers must ensure that their research, in particular research
   that involves personal data from the IETF or other SDOs, is conducted
   ethically and with respect for persons, in careful consideration of
   the risks and benefits of the work, taking care to ensure that those
   who bear the risk also gain some benefit, and with respect for the
   law and public interest.  Researchers should consult with their
   organisation's Institutional Review Board, Research Ethics Committee,
   or similar, prior to conducting research that might raise ethical
   concerns, and are referred to the guidance in the Menlo Report
   [MENLO], the Belmont Report [BELMONT], and the ACM Policy on Research
   Involving Human Participants and Subjects [ACM] for further
   discussion of issues around ethical conduct of research.

   Researchers are reminded that while data may be public, the
   implications of that data are not always well known.  For example,
   data that can be collected from the IETF Datatracker makes it
   possible to derive measures of the effectiveness of individuals in
   certain roles that, if presented out of context, might be considered
   sensitive.  It is inappropriate to publish data about specific
   individuals without their explicit consent.

   Finally, we note that researchers must take care to avoid disruption
   to the Internet standards process.  In part, this requires that they
   consult with the operations staff in the IETF LLC, or other SDOs, to
   ensure their data access does not cause operational difficulties
   (e.g., overload of servers that might disrupt an ongoing meeting).
   More broadly, researchers should ensure that any results that might
   be considered sensitive or disruptive are responsibly disclosed to
   the affected parties prior to publication.  The effective operation
   of the Internet standards process directly affects critical global
   infrastructure, and researchers should be mindful of this when
   presenting results.

7.  Recommendations

   Analysis of standards development data is useful to support
   transparency and provide insight into the health, structure, and
   evolution of standards ecosystems, including patterns of
   participation, collaboration, concentration, and the development of
   technologies.  It can inform discussions within SDOs and provide
   indicators of how technical work progresses over time.  It can also
   inform broader Internet governance questions, such as how decision-
   making is structured, how participation is distributed, and the
   extent of centralisation in these processes [RFC9518], and can be
   useful to external stakeholders, including regulators, policy makers,
   and civil society, seeking to understand how standards are developed
   and governed.

   Analysis of standards development is constrained by what can be
   observed.  Important aspects of the process such as informal
   discussion, trust, institutional memory, cultural norms, and the
   exercise of influence may be only partially visible.  In addition,
   the available data is often incomplete, inconsistently structured,
   and shaped by changes in tools and processes over time, with
   historical records in particular being sparse or unreliable.

   As a result, analyses based on these data provide only a partial view
   of the process.  Quantitative metrics such as message volume,
   authorship, participation counts, or leadership roles can be useful
   indicators, but do not directly capture influence, authority, or
   impact.  They should therefore be interpreted with care and in
   context, rather than in isolation.

   Where data is derived or reconstructed (e.g., via entity resolution,
   affiliation inference, or automated extraction), it is important to
   retain a clear link to the original sources.  The provenance of such
   transformations should be documented, and derived data should be
   distinguishable from primary records.  This allows results to be
   checked and, where necessary, corrected.
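
   A minimal sketch of keeping derived data distinguishable from
   primary records, and linked to its source, is shown below.  The
   class, its field names, and the email-domain heuristic are
   assumptions chosen for illustration and do not describe an existing
   Datatracker schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedRecord:
    value: str           # the inferred value
    source_id: str       # identifier of the primary record used
    method: str          # how the value was produced
    derived: bool = True  # marks this as inferred, not primary


def infer_affiliation(source_id, email_address):
    """Infer an affiliation hint from an email domain (a heuristic)."""
    domain = email_address.rsplit("@", 1)[-1].lower()
    return DerivedRecord(
        value=domain,
        source_id=source_id,
        method="email-domain-heuristic",
    )


# Hypothetical source identifier and address.
record = infer_affiliation("message-0001", "Author@Example.ORG")
```

   Because every derived value carries its source identifier and
   method, a reader can trace it back to the primary record and
   correct it if the heuristic was wrong.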

   SDOs can support analysis of their processes by ensuring that the
   data they produce remains consistent, well-structured, and accessible
   over time.  This includes maintaining clear, timestamped
   documentation of artefacts and processes, recording changes and their
   implications, and using consistent data formats and identifiers.
   Providing structured access to data, for example through stable and
   well-documented APIs, can be especially helpful.  When introducing
   changes to tools, processes, or working practices, it is important to
   consider how these affect what is recorded and how it can be
   analysed.  Where changes introduce discontinuities, these should be
   clearly documented, including their scope and implications, so that
   their impact on the data can be understood and accounted for in
   subsequent analysis.

   Comparisons across standards development organisations require
   particular care.  Differences in governance, participation models,
   and transparency affect both what is observable and how it should be
   interpreted.  Apparent differences between organisations may reflect
   these structural factors rather than substantive differences in
   behaviour or outcomes.

   Finally, although much of the data used in this type of analysis is
   publicly available, its use still raises ethical questions.  Analyses
   can have implications for individuals and organisations, especially
   if results are presented without sufficient context.  Researchers
   should take care in how findings are reported, particularly where
   they relate to identifiable participants.

7.1.  Recommendations for the IETF

   *  *Preserving centralised and stable data access:* The Datatracker
      provides a central interface for structured data about IETF
      activity.  Maintaining this role, including stable identifiers,
      consistent schemas, and well-documented APIs, supports
      reproducible and longitudinal analysis.  Where data is maintained
      across multiple systems, stable references to authoritative
      sources help ensure consistency and integration.

   *  *Data quality and consistency:* The data reflects changes in tools
      and practices over time, which can make it harder to interpret,
      especially for older records.  Common data such as events, roles,
      group metadata, and document states may be inconsistent across
      time.  Where possible, these inconsistencies should be resolved
      or clearly documented.

   *  *Historical data and backfilling:* Historical data may be
      incomplete.  Where records can be reconstructed with confidence,
      backfilling can improve coverage.  Backfilled data should be
      clearly identified, and its provenance documented.

   *  *Provenance of derived data:* Where data is derived from primary
      sources (e.g., extraction from archival material), the
      relationship between source and derived data should be explicit.
      Original artefacts should be retained where possible, and derived
      records clearly distinguished to allow validation and correction.

   *  *Error reporting and correction:* Datasets will contain errors,
      particularly in historical or reconstructed records.  Providing a
      transparent mechanism for reporting and correcting errors, along
      with maintaining a record of changes, improves reliability.

   *  *Separation of primary and inferred data:* Some data useful for
      analysis (e.g., identity resolution, affiliation inference)
      involves interpretation.  Such data should be distinguishable from
      primary records, with clear documentation of how it was produced.

      -  *TODO:* is this done by the IETF, by the researchers, or both?

   *  *Impact of process and tooling changes:* Changes to tools and
      working practices affect what is recorded and how it can be
      analysed.  Where such changes introduce differences in data
      structure or coverage (e.g., adoption of different collaboration
      platforms), these should be documented clearly, including their
      scope and implications, to preserve comparability across groups
      and over time.

7.2.  Recommendations for Researchers

   Analysis of standards development data requires careful handling of
   both the data and its interpretation.  The following practices can
   improve the robustness and reproducibility of such work:

   *  *Care in Datatracker use:* When using the Datatracker, it is
      preferable to download a local snapshot of the data, while
      respecting any access limits, and perform analysis on that copy.
      This avoids repeated queries to the live API.

   *  *Use versioned data snapshots:* The underlying datasets evolve
      over time.  Analyses should be based on well-defined snapshots
      rather than live data, so that results can be reproduced and
      compared.

   *  *Document data processing steps:* Significant processing is often
      required before analysis, including cleaning, normalisation, and
      entity resolution.  These steps can materially affect results and
      should be clearly documented, including any assumptions or
      heuristics used.

   *  *Handle identity and affiliation data with care:* Participants may
      appear under multiple identifiers, and affiliations may be
      incomplete, ambiguous, or change over time.  Methods used to
      resolve identities or infer affiliations should be validated where
      possible and treated as approximations.

   *  *Account for incomplete and inconsistent data:* Not all aspects of
      the standards process are equally observable, and available data
      may be incomplete or inconsistent, particularly for historical
      records.  Analyses should account for these limitations and avoid
      over-interpreting gaps or trends.

   *  *Be cautious in interpreting metrics:* Common metrics such as
      message volume, authorship, or participation counts do not
      directly capture influence, authority, or impact.  Results should
      be interpreted in context and, where possible, supported by
      complementary evidence.

   *  *Consider the impact of tooling and process changes:* Changes in
      tools or working practices (e.g., use of different collaboration
      platforms) can affect what is recorded and how it is structured.
      These changes should be considered when interpreting longitudinal
      trends or comparing across groups.

   *  *Engage with the community:* Data alone provides an incomplete
      view of the standards process.  Engagement with participants or
      domain experts can help interpret results and identify factors
      that are not visible in the data.

   *  *Support reproducibility and reuse:* Where possible, researchers
      should share datasets, code, and methods, subject to applicable
      policies and privacy considerations.  This reduces duplication of
      effort and improves the reliability of results.

   *  *Contribute improvements where appropriate:* Effort spent cleaning
      or structuring data may be of broader value.  Where feasible,
      contributing corrections or improvements back to shared data
      sources can benefit the wider community.

   *  *Consider ethical implications:* As discussed in the Ethics and
      Data Protection section, analysis may have implications for
      individuals or organisations.  Care should be taken in how results
      are presented, particularly where they may be sensitive or open to
      misinterpretation.
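
   One way to make a snapshot well-defined, as suggested in the list
   above, is to record a content digest for each file along with the
   time the manifest was created.  The sketch below assumes the
   snapshot has already been downloaded to a local directory; the
   function name, manifest layout, and file name are hypothetical.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(snapshot_dir):
    """Return a manifest with a SHA-256 digest for each snapshot file."""
    root = Path(snapshot_dir)
    files = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        name = path.relative_to(root).as_posix()
        files[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }


# Demonstration on a temporary directory standing in for a snapshot.
with tempfile.TemporaryDirectory() as snapshot_dir:
    sample = Path(snapshot_dir, "documents.json")
    sample.write_text("[]", encoding="utf-8")
    manifest = build_manifest(snapshot_dir)
    manifest_text = json.dumps(manifest, indent=2, sort_keys=True)
```

   Publishing the manifest alongside results lets others confirm that
   they are working from the same data, even where the snapshot itself
   cannot be shared.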

8.  Security Considerations

   Research into the operation of the Internet standards development
   ecosystem does not directly affect the security of the Internet.
   Effective operation of the Internet standards process is, however,
   critical to the security of the network, and researchers studying the
   development of Internet standards must consider potential security
   implications of their results and ensure that any such implications
   are responsibly disclosed to the relevant SDO.  Examples might
   include, but are not limited to, research that discovers attempts to
   subvert or disrupt the operation of the standards process.

9.  IANA Considerations

   This document has no IANA actions.

10.  Informative References

   [ACM]      ACM Publications Board, "ACM Publications Policy on
              Research Involving Human Participants and Subjects", n.d.,
              <https://www.acm.org/publications/policies/research-
              involving-human-participants-and-subjects>.

   [BELMONT]  National Commission for the Protection of Human Subjects
              of Biomedical and Behavioral Research, "The Belmont Report
              - Ethical Principles and Guidelines for the Protection of
              Human Subjects of Research", n.d.,
              <https://www.hhs.gov/ohrp/regulations-and-policy/belmont-
              report/>.

   [MENLO]    US Department of Homeland Security Science and Technology
              Directorate, "The Menlo Report - Ethical Principles
              Guiding Information and Communication Technology
              Research", August 2012,
              <https://www.dhs.gov/sites/default/files/publications/CSD-
              MenloPrinciplesCORE-20120803_1.pdf>.

   [RFC9518]  Nottingham, M., "Centralization, Decentralization, and
              Internet Standards", RFC 9518, December 2023,
              <https://datatracker.ietf.org/doc/html/rfc9518>.

Acknowledgments

   This document builds on work funded, in part, by the UK Engineering
   and Physical Sciences Research Council under grants EP/S033564/1 and
   EP/S036075/1.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   Email: csp@csperkins.org


   Ignacio Castro
   Queen Mary University of London
   Email: i.castro@qmul.ac.uk


   Ryo Yanagida
   University of St Andrews
   Email: ryo@htonl.net


   Stephen McQuistin
   University of St Andrews
   Email: sjm55@st-andrews.ac.uk