About This Paper
A first draft of this paper was released for review
by members of the CNI Access Management list on March 28, 1998
and generated a great deal of electronic discussion within the
closed CNI-AUTHENTICATE mailing list. This was followed by a meeting
in Washington DC on April 5, 1998 to review and discuss the draft
paper and comments generated on the list up to that date. The
revision has also benefited from discussions at a Digital Library
Federation/National Science Foundation Workshop held in Washington
on April 6, 1998 on closely related issues. My thanks to all
who contributed.
This version, which incorporates many of the ideas
from this process, is being prepared for distribution at the Spring
CNI Task Force meeting in Washington DC, April 14-15; it is also
being placed on the CNI web site (www.cni.org) for wider dissemination.
Note that in some places time did not permit me to fully incorporate
earlier comments or to research questions that were identified,
and I have tried to indicate where changes will be made prior
to the preparation of the final version. The paper also still
needs some considerable editorial work, and I ask readers to be
forgiving of editorial problems. Comments are invited and should
be sent to
<cliff@cni.org>.
About 10 May, 1998, I will prepare
a final version of the white paper which will be placed on the
CNI web site.
1.0 Introduction
As institutions implement networked information strategies
which call for sharing and licensing access to information resources
in the networked environment, authentication and access management
have emerged as major issues which threaten to impede progress.
While considerable work has been done over the last two decades
on authentication within institutions and, more recently, in support
of consumer-oriented electronic commerce on the Internet, a series
of new technical and policy issues emerge in the cross-organizational
authentication and access management context. This white paper,
which is being prepared by the Coalition for Networked Information
in conjunction with a large group of volunteer reviewers and contributors,
is intended to serve several purposes:
- To identify and scope the new issues that emerge
in the cross-organizational setting and to provide a framework
for analyzing them.
- To map out the various best-practice approaches
to solving these problems using existing and emerging technology
so that institutions and information providers can make informed
choices among the alternatives and consider how these choices
relate to institutional authentication and access management strategies.
- To provide a common vocabulary and framework
to assist in the development of licensing and resource-sharing
agreements, and to highlight technical and policy considerations
that need to be addressed as part of these business negotiations.
- To lay the foundation for possible follow-on
formal or de facto community standards development in access management.
If large scale use of networked information resources is to flourish,
we need to move away from the specialized case-by-case access
management systems in use today and towards a small number of
general approaches which will let institutionally-based access
management infrastructures interoperate with arbitrary resources.
2.0 Defining The Cross-Organizational Access Management Problem
The basic cross-organizational access management
problem is exemplified by most licensing agreements for networked
information resources today; it also arises in situations where
institutions agree to share limited-access resources with other
institutions as part of consortia or other resource sharing collaborations.
In such an agreement, an institution -- a university, a school,
a public library, a corporation -- defines a user community which
has access to some network resource. This community is typically
large, numbering perhaps in the tens of thousands of individuals,
and membership may be volatile over time, reflecting for example
the characteristics of a student body. The operator of the network
resource, which may be a web site or a resource reached by other
protocols such as Telnet terminal emulation or the Z39.50 information
retrieval protocol, needs to decide whether users seeking access
to the resource are actually members of the user community that
the licensee institution defined as part of the license agreement.
Note that the issue here is not how the licensee
defines the user community -- for example how a university might
define students, staff members and faculty (all of the problems
about alumni, part time and extension students, adjunct faculty,
affiliated medical staff and the like); it is assumed that the
institution and the resource operator have reached some satisfactory
resolution on this question. Rather, the issue is one of testing
or verifying that individuals are really members of this community
according to pre-agreed criteria, of having the institution vouch
for or credential the individuals in some way that the resource
operator can understand. Such arrangements are often called "site"
licenses, but this term is really inaccurate; while physical presence
at a specific site may be one criterion for having access, a better
term is "group" license or "community"
license, emphasizing that the key consideration is membership
in some community, and that physical location is often not the
key membership criterion.
Progress in inter-organizational access management
will benefit everyone. To the extent that resource operators and
licensing institutions can agree on common methods for performing
this authentication and access management function, it greatly
facilitates both licensing and resource sharing by making it quick,
easy and inexpensive to implement business arrangements. It benefits
users by making their navigation through a network of resources
provided by different operators more seamless and less cumbersome.
The central challenge of cross-institutional access management
is not to set up barriers to access; it is to facilitate access
in a responsible fashion, recognizing the needs of all parties
involved in the access arrangements.
While this white paper will give some particular
emphasis to issues that arise in the higher education and library
communities (particularly at the policy level) the problem under
consideration here is very general, and in fact occurs in general
corporate licensing of networked information services, or cooperation
among business partners.
As we will see in the next section, there are not only
questions about how best to accomplish this technically,
but also a series of intertwined policy and management considerations
which need to be addressed.
The focus here is on group licenses that may be subject
to some additional constraints (for example concurrent user limits)
rather than on transactional models where individual users may
take actions to incur specific incremental costs back to the licensing
institution over and above base community licensing costs. Any
incremental cost transactional model will need to incorporate
at least two additional features: a set of user constraints that
become part of the attributes for each authenticated user and
which are made available to the resource operator, and a means
by which the resource operator can obtain permission for transactions
by passing a query back to the licensing institution. This involves
a much more complex trust, liability and business relationship
between resource operator and licensing institution, as well as
consideration of financial controls and a careful assessment of
security threats. It will not be considered further here.
Note that there are several other cross-organizational
authentication, authorization and access management issues which
are beyond the scope of this paper, including the authentication
of service providers and verifying the integrity and provenance
of information retrieved from networked resources.
2.1 Terminology and Definitions
Throughout the rest of this paper we'll use
the general terms "resource operator" to cover publishers,
web site operators, and other content providers (including libraries
and universities in their roles as providers of content), and
"licensee institution" to cover organizations such
as universities or public libraries that arrange for access to
resources on behalf of their user communities.
Authentication and authorization actually have very
specific meanings, though the two processes are often confounded,
and in practice are often not clearly distinguished. We will use
the term "access management" to describe broader
systems that may make use of both authentication and authorization
services in order to control use of a networked resource.
Authentication is the process where a network user
establishes a right to an identity -- in essence, the right to
use a name. There are a large number of techniques that may be
used to authenticate a user -- passwords, biometric techniques,
smart cards, certificates. Note that names need not correspond
to the usual names of individuals in the physical world. A user
may have the rights to use more than one name: we view this as
a central philosophical assumption in the cross-organizational
environment. There is a scope or authority problem associated
with names; in essence, when a user is authorized to use an identity
this is a statement that some organization has accepted the user's
right to that name. For authentication within an institution
this issue often isn't important, and in some schemes a
user may only have a single identity; for cross-organizational
applications such as those of interest here, this relativistic
character of identity is of critical importance. A user may have
rights to use identities established by multiple organizations
(such as universities and scholarly societies) and more than one
identity may figure in an access management decision. Users may
have to decide what identity to present to a resource: they may
have access because they are a member of a specific university's
community, or a member of a specific scholarly society, for example.
Making these choices will be a considerable burden on users,
much like trying to shop for the best discount rate on a service
that offers varying discounts to different membership and affinity
groups (corporate rate, senior citizen rate, weekly rate, government
rate, etc.).
A single, network-wide (not merely institution wide)
access management authority would simplify many processes by allowing
rights assigned to an individual by different organizations to
become attributes of a master name rather than having them embodied
in different names authorized by different organizations; yet
such a centralized identity system probably represents an unacceptable
concentration of power, as well as being technically impractical
at the scale we will ultimately need. It should be noted that
within the UK Athens project we can see a model of a rather centralized
authorization system which has been scaled successfully to quite
a large number of users, and which by virtue of its centralized
nature has allowed rapid progress in wide access to networked
information. The Athens experience and the factors -- technical,
social, cultural, and legal -- that have enabled it to work in
the UK call for very careful study as we consider approaches for
other nations such as the US.
A name or identity has attributes associated with
it. These may be demographic in nature -- for example, this identity
signifying a faculty member in engineering, or signifying a student
enrolled in a specific course -- or they may capture permissions
to use resources. Attributes may be bound closely to a name (for
example, in a certificate payload) or they may be stored in a
directory or other demographic database under a key corresponding
to the name. Attributes may change over time; for example, from
semester to semester the set of courses that a given identity
is associated with may well change. Just because some system
on a network has knowledge of a name does not necessarily imply
that it has access to attributes associated with that name. There
is a fine line between rights to names (authentication) and attributes;
for some purposes, simply knowing that a user has a right to a
name from a given authorizing authority may itself represent sufficient
information (an implicit attribute, if one wishes) that can support
access management decisions.
Authorization is the process of determining whether
an identity (plus a set of attributes associated with that identity)
is permitted to perform some action, such as accessing a resource.
Note that permission to perform an action does not guarantee
that the action can be performed; for example, a common practice
in cross-organizational licensing is to further limit access to
a maximum number of concurrent users from among an authorized
user community.
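The distinction between being permitted to act and actually being able to act can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the `Resource` class, the `community-member` attribute, and the three outcome strings are assumptions, not part of any existing system or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A licensed resource with a per-institution concurrent-user cap (hypothetical model)."""
    licensed_institutions: dict  # institution name -> maximum concurrent users
    active_sessions: dict = field(default_factory=dict)  # institution -> set of names in use

    def is_authorized(self, institution: str, attributes: set) -> bool:
        # Authorization: is this identity, with these attributes, permitted at all?
        return (institution in self.licensed_institutions
                and "community-member" in attributes)

    def try_open_session(self, name: str, institution: str, attributes: set) -> str:
        if not self.is_authorized(institution, attributes):
            return "denied"
        sessions = self.active_sessions.setdefault(institution, set())
        if name in sessions:
            return "granted"  # this name already holds a session
        if len(sessions) >= self.licensed_institutions[institution]:
            # Permitted in principle, but the concurrent-user limit has been reached.
            return "busy"
        sessions.add(name)
        return "granted"
```

With a limit of one concurrent user, a first community member would be granted a session and a second, equally authorized member would receive "busy" rather than "denied"; the two outcomes reflect different decisions.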
Note that authentication and authorization decisions
can be made at different points, by different organizations.
Some libraries are establishing consortia which involve
reciprocal borrowing and user-initiated interlibrary loan services;
in a real sense these consortia are developing what amounts to
a union or distributed shared patron file. One can view this as
moving beyond just common authentication and access management
to a system of shared access to a common directory structure for
user attributes, and a common definition of user attributes among
the consortium members. This is an example of a situation where
very rich attributes are available to each participant in the
consortium as they make authorization decisions; interlibrary
loan and reciprocal borrowing represent a much richer and more
nuanced set of actions than would be typical of a networked information
resource.
A subsection on models for access management,
discussing the locus of authorization decisions and trust relationships
between the resource operator and licensing institution, will
probably be added here in the next revision.
3.0 Evaluation and Analysis Criteria
We will be examining a number of different proposed
solutions to the access management problem. Before describing
and analyzing these proposed solutions, this section considers
the various requirements that a viable solution needs to address.
Obviously, there are trade-offs which will need to be made among
the conflicting goals in the context of each specific resource
access arrangement, and institutions will have to make policy
choices about the relative importance of the various requirements.
3.1 Feasibility and Deployability
First and foremost, the authentication and access
management solution needs to work at a practical level. From the
user's perspective, it should facilitate access, minimizing
redundant authentication interactions and providing a single sign-on,
user-friendly view of the array of available networked information
resources. It needs to scale; it must be feasible for institutions
to deploy and manage for large and dynamic populations of community
members. It needs to be sufficiently robust and simple so that
user support issues are tractable; for example, a forgotten password
should not be an intractable problem. It needs to be affordable.
From the resource operator viewpoint, a viable access
management system should not require a vast amount of ongoing
production and maintenance. Configuration to add a new licensing
institution should be simple, and ongoing maintenance of that
configuration should not call for large amounts of information
to be interchanged between resource operator and licensing institution
on an ongoing basis (such as file updates). Software parameter
changes -- not new software -- should be necessary to add additional
institutions. There should be a clean, simple, and well-defined
(standard) interface between resource operator and licensing institution.
A systems or network failure at one institution should not degrade
a resource operator's service to other licensing institutions.
Practical solutions are inextricably linked to the
installed base of software. Ideally, all of the software needed
to implement an authentication and access management solution
should be available either commercially or as free software. Good
solutions will build on the installed technology base,
and also current investments in upgrading that technology base:
they should not be specific to libraries or even to higher education
if possible, at a mechanism level (though libraries or higher
educational institutions may use these mechanisms in conjunction
with policies that vary from those common in the corporate or
consumer markets). Most importantly, the software support that
end users require should be available in common packages -- such
as web browsers -- that are already part of the installed base.
Any solution that requires custom specialized software to be installed
on every potential user's desktop machine starts with a
severe handicap. Similarly, any solution requiring specialized
hardware, such as biometric systems or smart card readers, is
certainly not going to be feasible on a cross-institutional basis,
and while it might imaginably be workable within an institution's
internal authentication system, some other technique would be
needed to convey cross-organizational access management data.
Few resource providers will be willing to limit access to users
equipped with such specialized facilities.
Software isn't enough; there is also the question
of whether the user knows how to configure and employ it. For
example, current web browsers contain considerable support for
client-side certificates and proxies, but few users know how to
use these features. Education about an existing software base
is easier than first replacing or upgrading an installed software
base and then teaching users how to employ the new software,
but it's still a substantial issue.
Kerberos is an interesting case study of the feasibility
constraints. An institution could certainly make a successful
decision to deploy Kerberos as a local authentication system
by placing Kerberos support software on each user's workstation
(perhaps via a site license to a vendor); however, inter-realm
Kerberos is probably too intimate a connection between resource
operators and licensee institutions to be viable, and most resource
operators would also reject Kerberos as an inter-organizational
approach because of the requirements it places on end user systems
at institutions that were not using Kerberos for local authentication.
In the cases where Kerberos is being used for inter-organizational
resource sharing, I believe that one could argue that the participating
institutions (typically consortium members) have made commitments
to link their administrative and other support systems at a much
more sophisticated level than one would find in the typical resource
operator - licensing institution relationship and are coming more
to resemble a single "consortium institution" with
an internal (local) authentication system.
Any solution also needs to reflect current realities;
in particular, it must be able to recognize the need for a user
community member to access a resource both independent of his
or her physical location (for example, a user must be able to
connect to the internet via a commercial ISP, a mobile IP link,
or a cable television internet connection from home), and also
the need for people to access resources by virtue of their location
(for example, access may be granted to anyone who is physically
present in a library, whether or not they are actually members
of the licensee institutional community).
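The two access paths described here, by membership regardless of location and by location regardless of membership, can be sketched as follows. This is a rough illustration only: it assumes that a network source address stands in for physical presence in the library, and the address range and the `may_access` function are hypothetical.

```python
import ipaddress

# Hypothetical address range for public terminals inside the library building.
LIBRARY_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def may_access(client_ip: str, has_valid_credential: bool) -> bool:
    """Grant access by community membership OR by physical presence on site."""
    if has_valid_credential:
        # A credentialed community member may connect from anywhere
        # (commercial ISP, mobile link, home cable connection).
        return True
    # Otherwise, access depends only on where the connection originates.
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LIBRARY_NETWORKS)
```

A walk-in user at a library terminal would pass the second test without any credential, while a community member at home would pass the first test from an address the resource operator has never seen.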
3.2 Authentication Strength
The solution needs to be reasonably secure. The resource
operator needs confidence that an attacker can't forge
a credential easily. All parties need confidence that credentials
cannot easily be stolen by eavesdroppers on the net (for example,
through sniffer attacks), and that they cannot be stolen easily
from a user that exercises reasonable precautions. Also, systemic
compromise is a concern: there is a very real difference between
having an individual user's credentials compromised (in
which case they can be canceled and new ones issued) and having
the system as a whole compromised, which might call for reissuing
credentials to everybody in the user community.
Authentication strength is a somewhat subjective
question. For many of the approaches that we will discuss, strength
comes from the details of cryptographic algorithms and key lengths
used; but part lies also in overall system design and implementation
and in the realities of user behavior, and this can often be the
source of the largest number of vulnerabilities. Some level of
reason is called for here; most of the resources being access
controlled, while certainly valuable assets, do not represent
imminent dangers to public safety or national security if access
control is breached. An access management system needs to be complemented
by monitoring and other controls on the part of the resource operator
to limit the impact of a breach. Further, there are after-the-fact
legal remedies which can be applied to limit the damage caused
by such a breach.
The cryptographic technology underlying many access
management systems is legally sensitive on an international export
and import basis, and may also be constrained by various national
laws (though within the US, cryptographic technology can be employed
freely, at least today). This is important for two reasons:
resource access may cross national boundaries, and
members of an institution's user community may need to
access networked information resources when traveling outside
of their home nation. We will see international resource sharing
consortia, and also see institutions in one nation licensing access
to resources in other nations.
It should be noted that virtually any strong access
management system that incorporates general purpose cryptographic
services will be illegal for export since all strong cryptographic
implementations for general encryption/decryption are export controlled
in the United States under current laws governing trafficking
in arms. Note however that it may be possible for members of a
user community traveling abroad to export cryptographic software
for temporary personal use under some specific limitations; depending
on where they are traveling it may or may not be legal for them
to use it under the laws of the country they are in at the time.
Matters are more complex than they may seem, however, because
US export control laws are mostly concerned with cryptography
that can support encryption (for confidentiality or concealment);
export licensing of systems specifically for authentication or
digital signatures which do not serve dual use as encryption systems
has been much less of a problem. The legality
of developing, importing, exporting, and operating access management
systems outside the US needs to be analyzed on a country-by-country
basis; laws vary considerably.
3.3 Granularity and Extensibility
There is a need for fine-grained access control where
institutions want to limit resource access to only individuals
registered for a specific class; this arises in electronic reserves
and distance education contexts, especially when a class may be
offered to students at multiple institutions. Other variations
are also possible: limiting access to law students, to faculty,
to graduate students and faculty in physics. This sort of fine-grained
access management is likely to be very complex, since there will
be great variation from institution to institution in how groups
of users are identified, named and specified. There is also
some overlap between fine-grained authentication and demographic
information that may be needed to generate management information
(discussed below).
Granularity of access has been one of the most controversial
issues in the discussions of the first draft of this paper and
related issues. Without arguing against the need for fine-grained
access control for some applications, I will summarize a few observations:
- At present, most access to network information
resources is not controlled on a fine-grained basis. There
is a very real danger that by accommodating all of the needs for
fine-grained access management into the basic access management
mechanisms we will produce a system that is too complex and costly
to see widespread implementation anytime soon.
- The information needed to support fine-grained
access management probably needs to be kept within institutions
for privacy reasons, and should be treated as attributes to an
identity rather than expressed as additional identities (in other
words, one should record that a user with a given identity happens
to be enrolled as a member of course X, rather than issuing the
user an identity as member-21-of-course-X). This also has implications
for the locus of authorization decisions for fine-grained access
management.
- In many -- but certainly not all -- cases, the
resources (such as electronic course reserves) that are subject
to fine-grained access management will be within an institution,
or within one of the institutions in a consortium of institutions
that are collaborating closely through shared courses or similar
projects. The case where an external commercial networked resource
will be access controlled to members of a small group like a class
will be rare.
- In some cases, the presence of fine grained access
management mechanisms may encourage irrational license economics.
For example: suppose there is an electronic journal that prices
based on the number of people that have access, rather than on
the number of people that actually use it. This would encourage
an institution to define a fine-grained group of authorized users
to this journal in order to save money. Such an arrangement is
complex and sets up barriers to access for the rest of the university
community. It would probably make more sense to initially price
access for the entire university community based on the approximate
number of people who will actually use the journal, and then if
it turns out a few more people are using it than were originally
expected, negotiate a slightly higher fee at license renewal rather
than defining a special access group. Revenues to the publisher
will be roughly the same in either case, but additional use would
be encouraged rather than discouraged. Note that of course this
reasoning doesn't apply in cases where there is wide demand
for a resource, and the licensing institution is making a policy
decision to deliberately and systematically limit access to the
resource to a specific closed user community; but this is, reviewers
believed, the exception rather than the common case.
3.4 Cross-Protocol Flexibility
Some approaches work for a wide range of applications
protocols that might be used for accessing information. Others
are designed to work only with specific protocols, or would require
the development of special software extensions or modifications
in order to support a full range of protocols. For our purposes,
HTTP-based Web access is the critical application protocol; we
will also consider Telnet terminal emulation and the Z39.50 information
retrieval protocol, although these are far less critical. The
main locus of concern here is the user's desktop machine,
which normally uses HTTP or Telnet to connect to machines that
are part of the system of networked information resources; Z39.50
is seldom used at the desktop today and finds its main application
in linking major networked information resources together.
Reviewers of the earlier draft of the paper felt that
the X Window protocol was not an issue, as this was primarily
a local access application. The ability to sign electronic mail
messages is certainly an issue for email-enabled networked information
applications, though probably not a major one. Secure email access
-- authenticated SMTP, POP, or IMAP, for example -- is viewed
as primarily an issue within an institution rather than a cross-organizational
question; while it is certainly useful to have an authentication
infrastructure which will support these applications, as well
as local administrative applications, this is again not central
to the cross-organizational problem. Directory access protocols
such as LDAP are also potentially serious issues.
CORBA and DCOM are potential questions, though it
is not clear to what extent these will be used from desktop machines
in the future. There is also a set of issues involving authentication
in conjunction with Java applets and systems like Authenticode
or PICS which are not well understood at this point. Many of the
authentication and authorization problems in this area deal with
a user's machine making decisions about what applets it
is willing to accept and to execute, and what authorizations it
is willing to assign them; these are similar to questions about
document authenticity and integrity and are out of scope for this
paper. The other set of problems centers on an applet making
decisions about a user's rights; while technology
and standards in this area are still in flux, most of the current
approaches seem to assume some kind of certificate infrastructure.
This is an area where more work is clearly needed.
3.5 Privacy Considerations
The application scenarios here involve access to
information resources. In many cases libraries will pay for these
licenses to electronic resources as a replacement for physically
acquiring information in paper form.
The licensee institution, in the print world, has
a set of internal policies about record-keeping and use reporting
(both who used it and how often it was used); generally these
are very restrictive and stress user privacy. The institution
then has a separate set of policies (which may in fact never have
been explicitly codified) about sharing this usage information
with the content supplier: in general this policy has been very
simple -- the supplier got no information about usage other than
that which the institution chose to make public for other reasons.
In the electronic environment, the situation changes.
Because information is often accessed at the publisher site, the
publisher may know a great deal about who is accessing what material
and how often. Aggregators and service bureaus may also complicate
both the collection and flow of information. To some extent the
collection, use, retention, and even potential resale of this
information can, and should, be covered by license contract.
Institutions will have to develop realistic policies about privacy
of readers in the networked information environment which are
acceptable to their user communities and well understood by readers.
However, some authentication and access management approaches
offer licensee institutions much greater flexibility than others
to limit the amount of information that can technically be collected
by the resource operator. In general, it is desirable that the
amount of privacy at risk which needs to be controlled by contractual
provision be minimized.
Clearly, one strategy for ensuring user privacy is
to ensure that users remain anonymous in their use of information
resources. We can distinguish several common situations:
- Repeat users cannot be identified; each session
is completely anonymous. We will call this anonymous access.
- Repeat users can be identified, but the identity
of a user cannot be determined. The resource operator knows only
that some specific individual is accessing the resource repeatedly,
not who that individual is. The user may be identified by some
arbitrary identifier, such as USER123. We will call this pseudonymous
access.
- Demographic characteristics of users can be determined,
but not actual identities. We will call this pseudonymous access
with demographic identification.
- Actual identities can be associated with sessions.
We will call this identified access. It may be supplemented with
demographics; just because the resource operator knows who someone
is does not mean that they automatically know the user's
demographic characteristics as well as his or her name.
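To make the pseudonymous case concrete, the following is a minimal
sketch of how a licensee institution might derive an arbitrary but
stable identifier for each user. It is an illustration only: the use
of a keyed hash (HMAC-SHA256), the function name pseudonym, the secret,
and the host names are all assumptions introduced here, not part of
any scheme described in this paper.

```python
import hashlib
import hmac


def pseudonym(institutional_id: str, secret: bytes, resource: str) -> str:
    """Derive a stable, opaque identifier for one user at one resource."""
    message = f"{resource}:{institutional_id}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return "USER" + digest[:12].upper()


# Hypothetical secret, known only to the licensee institution.
SECRET = b"example-institutional-secret"

first_visit = pseudonym("jsmith", SECRET, "journals.example.org")
second_visit = pseudonym("jsmith", SECRET, "journals.example.org")
other_resource = pseudonym("jsmith", SECRET, "abstracts.example.org")
```

In this sketch the same user always receives the same identifier at a
given resource, so repeat use can be recognized, but the identifier
differs from one resource to another, so two resource operators cannot
correlate the same individual by comparing logs.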
Note that many users choose to identify themselves
in order to obtain added value services, such as electronic mail
notification of changes to a resource, or to preserve context
from one session to the next, or to maintain a user profile at
a resource. It's important to distinguish voluntary user
self-identification from automatic identification that is generated
as a byproduct of an authentication and access management system.
It is also worth considering, at least briefly, how an institution
might provide services for its community that permit community
members to enjoy these added value services without identifying
themselves to resource operators, and whether it's worth
going to the trouble to make this possible.
Understanding the coupling between pseudonymous
or identified access as provided by an access management system
and the desire to implement such capabilities as part of an information
access system is a crucial issue. A given information resource
may rely on an authentication and access management system to
provide identified or pseudonymous access automatically, or
it may offer some weak or strong higher level functions (using
a userid/password or cookie scheme, for example) that give the
already authenticated and authorized user the option of identifying
himself or herself (literally or pseudonymously) in order to obtain
personalized services from the information resource. In the latter
case, assuming that it's a real choice and the level of
service offered to the anonymous user is meaningful, this isn't
an authentication and access management system issue at all: it's
a choice that users of the information resource are free to make
on an individual basis.
Privacy is not a purely political or moral issue.
To the extent that researchers are pursuing patents, developing
grant applications in a competitive environment, or seeking precedence
for discoveries, confidential access to information resources
is a critical issue with potentially significant economic consequences.
Many higher education institutions are bound by laws about privacy
of student records; some public libraries may face legislative
constraints on patron privacy; and medical institutions (including
university hospitals) may have to consider issues involving privacy
of medical records. And, of course, beyond the United States --
for example in Europe -- the overall legal framework grants stronger
privacy rights for all citizens.
Finally, in discussing privacy, we should recognize
the overall need for a secure environment; this goes beyond authentication
and access management. If user interactions with networked information
resources are conducted in the clear, they are subject to eavesdropping
by other machines on a local area network near the user (for example,
by sniffer-based attacks within the campus network) or by attackers
anywhere along the network path to the resource. Very few information
resources today support searching and information retrieval (as
opposed to ordering) via encrypted SSL-secured HTTP. If privacy
is to be honored in the licensing of networked information resources,
then contractual arrangements, resource sharing designs, and procurements
must recognize the importance of providing such support.
In some situations privacy and confidentiality issues
go beyond access management and session encryption. Some users
may be concerned that even knowledge that they are using a resource
(not necessarily what they are doing with it) becomes known through
traffic analysis. Link level encryption helps with this to an
extent, but is not widely deployed and is unlikely to be widely
deployed anytime soon. Very large scale aggregating proxies and
experimental systems such as Crowds, which build on work done
with anonymizing remailer systems such as Mixmaster, also help to
address these needs. Robust protection against traffic analysis
in the public internet requires very large overheads. We will
not consider this problem further here, other than to observe
that credential-based approaches seem likely to be most flexible
in these environments, and that if they are used it will be necessary
to consider traffic analysis vulnerabilities created by the credentials
verification process as well as the submission process. Similarly,
there are situations where some users are unwilling to permit
a resource operator to know what sort of information they are
searching for (even beyond contractual restrictions on the collection
and use of this information); in these cases it may be necessary
for such users to locally replicate an entire resource or large
subsets of it.
3.6 Accountability
In negotiating a license agreement, all parties recognize
that the resource being licensed is of value and that the rights
of the licenser must be respected. Typically, a licensee institution
will agree to educate members of the user community about the
license terms and restrictions relevant to the information resource
in question, and to work with the resource operator to identify,
investigate and put a stop to improper use of the resource. Thus,
both the resource operator and the licensee institution share
a common interest in having some individual user accountability
as part of an authentication and access management system, so
that if inappropriate use is detected (for example, if a single
user seems to be accessing the resource thousands of times a day
from computers on three continents) the organizations know where
to begin investigating.
Of course, there's a tension between accountability
and privacy; to the extent that privacy is achieved through anonymity,
there is no accountability. Note that this balance may be managed
by compartmentalizing information, for example: if a specific
user is identified to the resource operator simply as USER2345,
and the licensee institution knows who USER2345 actually is (but
the resource operator does not) then the resource operator could
call for an investigation of what USER2345 is doing, and the licensee
institution might then follow its own due process in that investigation,
which might result in internal disciplinary action but might never
result in revealing the individual's actual identity to
the resource operator. In a real sense, the obligations of the
members of the user community are to the licensee institution,
and the licensee institution in turn has obligations to the resource
operator to ensure that members of its user community behave responsibly;
it is not at all clear that it's appropriate for the resource
operator to be dealing with individual members of the user community
directly.
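The compartmentalization described above can be sketched as a lookup
table that only the licensee institution holds. This is a hedged
illustration: the class PseudonymRegistry, its method names, and the
sequential USERnnnn numbering are hypothetical and chosen only to
mirror the USER2345 example.

```python
class PseudonymRegistry:
    """Held by the licensee institution; never shared with resource operators."""

    def __init__(self):
        self._user_by_pseudonym = {}
        self._pseudonym_by_user = {}

    def pseudonym_for(self, user_id):
        """Return the user's pseudonym, assigning a new one on first use."""
        if user_id not in self._pseudonym_by_user:
            pseudonym = f"USER{len(self._pseudonym_by_user) + 1:04d}"
            self._pseudonym_by_user[user_id] = pseudonym
            self._user_by_pseudonym[pseudonym] = user_id
        return self._pseudonym_by_user[user_id]

    def resolve(self, pseudonym):
        """Internal lookup, used only during an institutional investigation."""
        return self._user_by_pseudonym.get(pseudonym)


registry = PseudonymRegistry()
seen_by_operator = registry.pseudonym_for("jsmith")
```

The resource operator sees only the value of seen_by_operator and can
cite it when requesting an investigation; the resolve step stays inside
the institution and is subject to its own due process.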
Accountability will also have some interactions with
institutional policies about inappropriate use of network resources,
particularly to the extent that interaction with these resources
may go beyond simply retrieving information to participation in
interactive communications. For example, policies that typically
govern the use of electronic mail may come into play. But even
if resources are used purely for information retrieval purposes
some accountability (coupled with management data) may be desired
in support of policies prohibiting use of university resources
for personal commercial gain, for example; a useful analogy may
be drawn to practices and policies in areas such as telephone
logs.
3.7 Ability to Collect Management Data
The licensing institution has a legitimate need to
gather management data in order to guide future decisions; if
it is spending a great deal of money to license access to a resource,
or to participate in a consortium resource sharing arrangement
it is only reasonable that it will want to know how much various
resources are being used and what sectors of the user community
are making use of them. For public institutions, in particular,
collection of management data is an essential part of institutional
accountability, and some collection of management data may even
be considered part of public records responsibilities for these
institutions.
There are many reasons to collect management data
besides guiding licensing or resource sharing decisions. These
include the allocation of costs within a licensing organization
or even the development of enhanced services such as collaborative
filtering systems.
It's useful to define some terms. Management
data can be faceted in two ways. The first is by user: this might
include faceting by source IP address, by identity (name), or
by user attributes that figure into a contractually based authorization
decision (e.g. a resource is limited to faculty and graduate students;
this user had the faculty attribute), or by demographic information
that the licensee institution knows and wants to correlate with
usage patterns (e.g. this is a first-year graduate student in
civil engineering, or even, in theory though likely not in practice,
this is a male student). The second way to facet management data
is by the objects being accessed or the services being used: which
pages of which articles are being read, which one of several different
databases on a server is being searched, how often searching is
by author rather than by date, etc.
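As a rough illustration of the two kinds of faceting, a usage record
might carry both user facets and object or service facets, and the
same set of records can then be counted along either axis. The record
layout, field names, and sample values below are assumptions made for
the sake of the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    # User facets
    source_ip: str
    status: str        # attribute used in the authorization decision
    department: str    # demographic attribute known to the institution
    # Object and service facets
    database: str
    search_type: str   # for example "author" or "date"


def count_by(events, facet):
    """Count events grouped by the named facet."""
    return Counter(getattr(event, facet) for event in events)


events = [
    UsageEvent("192.0.2.10", "faculty", "civil engineering", "abstracts", "author"),
    UsageEvent("192.0.2.11", "graduate", "civil engineering", "abstracts", "date"),
    UsageEvent("192.0.2.10", "faculty", "civil engineering", "fulltext", "author"),
]

by_status = count_by(events, "status")            # a user facet
by_search_type = count_by(events, "search_type")  # a service facet
```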
Collecting highly aggregated data is not particularly
problematic; there's no way to prevent the resource operator
from having aggregated data (although its use can obviously be
managed by contract). The only question is whether the licensee
institution can collect its own aggregate data or whether it must
take it as a return feed from the resource operator; in the latter
case, there are a whole series of scaling issues related to standards,
since it will be a significant burden for the licensee institution
to receive use statistics feeds from potentially hundreds of resource
operators in different formats, reflecting different conceptual
models about what is being counted, and with different delivery
schedules.
The larger problems arise when one wants demographically
faceted use data, or even individual use data. In the case of
demographically faceted data, either the licensee institution
must use the authentication and access management system to pass
demographic faceting to the resource operator so that it can become
part of the usage data that the resource operator returns, or
the licensee institution must be able to capture its own demographically
faceted use data. Privacy considerations begin to emerge when
demographic data must be passed to the resource operator.
In the case of individual use data the problems become
even more sensitive. Clearly, if users are individually tracked
by the resource operator (whether or not their identities are
known -- i.e. whether they are pseudonymous or identified) then
the resource operator can collect individual level data and return
it to the licensee institution. The resource operator may even
get supplemental demographic data about the individuals from the
licensee organization. There are also a series of institutional
policy problems having to do with individual level data at the
licensee institution: who can see this data -- for example, can
a faculty member look at the statistics for his or her students'
use of specific information resources? Under what procedures
are usage records subject to audit to detect misuse? Again, we
need to consider when these issues should be defined by policy
and trust in implementation of policy as opposed to being managed
by technical means.
While many scenarios are possible, I suspect that
the most common practical situations today will be these:
- usage is tracked on an aggregated basis either
by the institution or the resource operator; I suspect tracking
by the resource operator will be more common since the resource
operator will be able to count events that are more meaningful
in measuring resource utilization (for example, by journal rather
than just page accesses).
- usage is tracked on an individual (pseudonymous
or identified) basis by the resource operator, who then passes
use logs back to the institution, which processes them to factor
in demographic data and obtain a demographically faceted usage
report.
- institution and resource operator agree on some
very simple demographic faceting and demographic data is passed
to the resource operator by the access management system; these
demographics are then factored into the usage reports developed
by the resource operator.
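The second scenario in the list above, in which the institution
factors its own demographic data into logs returned by the resource
operator, can be sketched as a simple join. The log format, the
pseudonyms, and the demographic labels are hypothetical; real feeds
would differ from one resource operator to another.

```python
from collections import Counter

# Returned by the resource operator: (pseudonym, journal) pairs.
operator_log = [
    ("USER0001", "Journal A"),
    ("USER0002", "Journal A"),
    ("USER0001", "Journal B"),
    ("USER0003", "Journal A"),
]

# Known only to the licensee institution.
demographics = {"USER0001": "faculty", "USER0002": "graduate student"}


def faceted_report(log, demographics):
    """Count uses per (journal, demographic group) pair."""
    report = Counter()
    for pseudonym, journal in log:
        group = demographics.get(pseudonym, "unknown")
        report[(journal, group)] += 1
    return report


report = faceted_report(operator_log, demographics)
```

Because the join happens at the institution, no demographic data needs
to be passed to the resource operator in this arrangement.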
Management data is a major problem in the current
access framework. Part of the problem is the conflict between
privacy and a desire for demographic or individual data. Most
of this is going to have to be sorted out at the institutional
policy level, and may involve making sacrifices in order to ensure
privacy. Some institutions may be legally limited in their ability
to collect certain management data. It would be very useful to
have some real-world examples of how this trade-off has been settled.
A very insightful comment was made at the meeting
to review the first draft of this paper. From the perspective
of the licensing institution, particularly when facing difficult
collection and resource allocation decisions, the observation
was "there's never enough management information
-- the issue here is to define what you absolutely have to have,
not what you would ideally like".
4.0 Approaches to Access Management
Having summarized the many and sometimes conflicting
requirements that an access management system must address, we
now consider a number of actual schemes currently in use or under
consideration and analyze how well they meet these requirements.
It's important to recognize that in solving
real-world problems more than one approach may be relevant at
a single institution; one might use one scheme for one class of
users and a different scheme for another class. For example,
an institution might choose to manage access for kiosks and public
workstations by IP source address, and to use a credential scheme
for other users. Indeed, virtually all of the major institutional
systems that are currently being deployed combine multiple approaches.
Also, note that approaches can be cascaded in a hierarchy; for
example, a resource might be set up to first check whether a user
could be validated by an IP source filtering approach but if the
IP source address isn't valid for access, the resource
might then apply a credential-based access management test.
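The cascade described above might look like the following at the
resource operator. This is a sketch under stated assumptions: the
network block, the function names, and demo_validator (a stand-in for
whatever credential check the parties have agreed on) are all
hypothetical.

```python
import ipaddress

# Hypothetical configuration supplied by the licensee institution.
LICENSED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def ip_source_valid(source_ip):
    """First test: is the request from a licensed network?"""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in LICENSED_NETWORKS)


def authorize(source_ip, credential=None, validate_credential=None):
    """Return the method that admitted the request, or None if refused."""
    if ip_source_valid(source_ip):
        return "ip-source"
    if credential is not None and validate_credential is not None:
        if validate_credential(credential):
            return "credential"
    return None


def demo_validator(credential):
    # Stand-in for a query to a trusted institutional server.
    return credential == ("jsmith", "correct-password")
```

A request from a kiosk on the licensed network is admitted by the
first test without any credential; a user at home falls through to the
credential test.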
At the most general level, there are three approaches
-- proxies, IP source filtering, and credential-based access management.
Basically, with IP filtering, the licensee institution
guarantees to the resource operator that all traffic coming from
a given set of IP addresses (perhaps all IP addresses on one or
more networks) represents legitimate traffic on behalf of the licensee
institution's user community. The resource operator then
simply checks the source IP address of each incoming request.
In the case of a proxy, the licensee institution
has deployed some sort of local authentication system, and users
employ specific proxy machines to send traffic to the resource
and receive responses back from that resource; the local authentication
system (which is invisible to the resource operator, except that
the resource operator knows that it is in place in order to guarantee
that traffic coming from the proxy machines is legitimate) is
used to control who can have access to the proxy machine. As a
business matter, the resource operator may want to know something
about how the local authentication system works in order to have
confidence in the proxy, but this does not enter into the actual
authentication which is performed operationally by the resource
operator. The resource operator will most commonly identify the
proxy machines by their IP addresses (or some variation such as
reverse DNS lookup), and for this reason from the resource operator's
point of view proxies are often just considered to be a special
case of IP source address filtering -- a resource operator who
is set up to do IP source address filtering can accommodate a
licensing institution employing proxies with essentially no additional
work. However, proxies can actually be identified using either
IP addresses or any credential-based cross-organizational authentication
scheme (such as certificates). Because of this, and also because
many of the policy and technical issues surrounding proxies at
a higher level are quite distinct from those involved in IP source
address filtering, we will treat proxies as a separate approach.
The third approach is credential-based. Here the
user presents some form of credential -- a user id and password,
or a cryptographic certificate, for example -- to the resource
operator as evidence that he or she is a legitimate member of
the user community. The resource operator then validates this
credential with some trusted institutional server (or third party
server operating under contract to the institution) before deciding
whether to allow access. Note that there needs to be advance agreement
(most likely as part of the license contract or resource sharing
agreement) as to how the mutually trusted institutional servers
or third parties (such as certificate authorities) are identified
and authenticated themselves.
For completeness, it is worth noting that there is
one other possibility: the resource operator assigns credentials
to individual members of the licensee community (perhaps in cooperation
with the licensee institution). This is what was done historically
when small numbers of users needed access to a few specialized
information resources. The trouble is that it does not scale manageably
to large numbers of users or large numbers of resources, and particularly
not to both. While it's reasonable for an institution to
distribute one set of credentials to each member of its user community
(for example, in conjunction with an internal authentication system)
it's not reasonable to distribute hundreds of different
credentials for different resources to each user, or to expect
the users to manage them or to keep straight which credentials
are for use with which resource. Thus, we will not consider this
model further, other than to recognize that it may have its place
for specialized resources that serve only a handful of users.
4.1 IP source address filtering
Currently, IP source address filtering is the major
mechanism used to implement authentication and access management
for cross-institutional resource access. The way this works is
that the licensee institution provides the resource operator with
a list of IP addresses that are authorized for access; this can include
some wildcarding to permit entire subnets or networks to have
access, and also occasionally incorporates exclusion lists (all
hosts on a given net or subnet EXCEPT for the following specific
hosts). There is general agreement that this approach is unsatisfactory
for a number of reasons, and it is instructive to evaluate it
against our seven functional requirements both to see where it
works and where it actually falls short.
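A minimal sketch of such a table, with wildcarded networks and an
exclusion list, follows. It uses Python's standard ipaddress module;
the class name SourceFilter and the address blocks (taken from the
ranges reserved for documentation) are illustrative assumptions, not
any resource operator's actual implementation.

```python
import ipaddress


class SourceFilter:
    """Allow whole networks, EXCEPT specifically excluded hosts or subnets."""

    def __init__(self, allowed, excluded=()):
        self.allowed = [ipaddress.ip_network(n) for n in allowed]
        self.excluded = [ipaddress.ip_network(n) for n in excluded]

    def permits(self, source_ip):
        address = ipaddress.ip_address(source_ip)
        # Exclusions take precedence over the allowed networks.
        if any(address in network for network in self.excluded):
            return False
        return any(address in network for network in self.allowed)


campus = SourceFilter(
    allowed=["198.51.100.0/24", "203.0.113.0/25"],
    excluded=["198.51.100.17"],   # e.g. a machine open to outsiders
)
```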
Feasibility and Deployment: This is relatively easy
to deploy and manage from the perspective of both the institution
and the resource operator. No special software is needed at the
user side, and at the resource operator side the support is not
difficult. There is some maintenance involved in keeping the tables
at the resource providers up to date, but this is not unmanageable.
It is necessary for the licensee institution to perform some analysis
on access and use policies for the machines within the institution
to make sure that machines that aren't access-limited to
the institutional community are excluded where necessary, and
to educate members of the community that giving outsiders an account
on a machine also gives them access to institutional resources
that they may not be entitled to; there are some real dangers
of access control breaches by the creation of proxies either through
ignorance of the implications or deliberately.
The major problem, from a feasibility point of view,
is that many legitimate users are not coming through the institutional
network at all times; they may want access through commercial
ISPs, at their workplaces outside of the institution, or from
home. Some other solution is needed to handle these users.
One should not underestimate the management complexities
of IP source address based access management, particularly from
the point of view of a resource operator. Configuration changes
are frequent, and configurations for a large licensee institution
can be quite complex. Also, the move from the older class-based
network addresses owned by institutions to classless IP network
addressing with the address space managed by the ISP has introduced
new problems; not only must the licensee institution get the network
masks right, but there's no easy way for the resource operator
to independently verify this (for example, that an institution's
network is a /18 rather than a /19).
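The consequence of a one-bit error in the mask can be worked through
directly. The addresses below are hypothetical; the arithmetic is
simply that a /18 covers 2^(32-18) = 16384 addresses while a /19
covers 2^(32-19) = 8192.

```python
import ipaddress

reported = ipaddress.ip_network("10.20.0.0/18")   # mask as submitted
assigned = ipaddress.ip_network("10.20.0.0/19")   # mask actually assigned

extra_addresses = reported.num_addresses - assigned.num_addresses

# An address in the upper half of the /18, outside the institution's /19.
neighbor = ipaddress.ip_address("10.20.40.1")
admitted_in_error = neighbor in reported and neighbor not in assigned
```

In this example the mistaken mask would admit 8192 addresses that
belong to some other customer of the same ISP, and nothing in the
table itself reveals the error to the resource operator.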
Authentication Strength: Source IP filtering is actually
relatively strong. While it's not difficult to introduce
packets to the network with spoofed source addresses unless appropriate
packet filters are in place (and this has become a major problem
in the context of network denial of service attacks), getting
responses back to a spoofed network address is much harder, and
basically involves hijacking entire network addresses within the
routing infrastructure. This is relatively unlikely; it's
a sophisticated and complex attack, and is very likely to be noticed
quickly. Resolving the threat of IP spoofing needs to be addressed
at the network routing infrastructure level, and considerable
work is going on in this area (packet filters and authenticated
BGP peering, for example).
A specific machine with an excluded source IP address
that sits on a generally authorized network can circumvent that
restriction more easily, if the machine isn't under institutional
administration (for example, its owner can just give it a new
IP address on the same network).
Source IP filtering isn't subject to systemic
compromise, and doesn't come with export control restrictions.
Granularity and Extensibility: To the extent that
membership in specific groups can be linked unambiguously to specific
network addresses (for example, in an office, a dorm room, or
a computer lab) fine grained access is feasible. Such direct linkage
is often not the case, however; students in a class may share
use of a computer lab, or need to use public workstations in a
library.
Cross-Protocol Flexibility: Since all protocols of
interest run on top of IP, source IP address based access control
is quite universal.
Privacy Considerations: To the extent that source
IP addresses can be linked to individuals (for example, personal
workstations in offices) there are some privacy issues. And certainly
source IP addresses are correlated to demographics, if the resource
provider is willing to invest in understanding the campus network
architecture. Access in a source IP filtering authentication
environment is probably somewhere between anonymous and pseudonymous,
with some ability to move from pseudonymous to identified access
in individual cases if the resource provider is willing to go
to the trouble to do so (this is the case of personal workstations
used primarily by a single individual).
Accountability: There is limited accountability --
at the level of machines rather than people -- which mirrors the
privacy situation. One has relatively good accountability for
individually-owned personal workstations and relatively poor accountability
for everything else; for a large, shared machine one gets accountability
to the machine level, and then has to work with the administrator
of that machine to identify a specific user or users. If dynamic
IP address assignment is used (as is often the case for laptops
in public areas, for example), then accountability is particularly
weak.
Management Data: An institution can collect some
usage data at a highly aggregated level that is not well correlated
to application-level constructs through a border router, or get
aggregated usage data from the resource operator. Demographic
data can be obtained to the extent there is correlation between
IP address blocks and demographics (for example, there might be
a campus subnet for a medical school); this demographic data will
be sketchy and imperfect at best, and some differentiations (such
as students as opposed to faculty) will be very hard to extract.
Individual level usage data will be possible only in the case
where there are personal workstations, and all work by an individual
is done on that workstation.
Summary: IP source address based access management
tracks the activities of machines rather than people. To the extent
that there's a very close correlation between the two,
it works reasonably well. Unfortunately, the correlation has never
been that good and many trends (such as the move from institutional
modem banks to purchase of commercial dial up access to the internet)
continue to weaken this correlation. IP source address access
management may work particularly well for fixed-location, institutionally
managed public terminals, such as public workstations in libraries
or computer labs.
There are several additional issues and variations
on source IP filtering which deserve some additional comment.
Many organizations are moving to dynamic assignment
of IP addresses, either for limited situations such as laptops
that may be docked in classrooms, computer labs, or public areas
such as library reading rooms, or in some cases, campus wide in
order to simplify address management. This dynamic assignment
weakens accountability, strengthens privacy, and complicates the
collection of meaningful management data. However, since dynamic
IP addresses are assigned within an organizational network number,
use of dynamic IP addresses does not invalidate the use of IP
source address based access management.
To mitigate the problems with access via dialup ISP
connections, a few universities have negotiated special arrangements
with specific ISPs so that members of their community are assigned
addresses on a specific (private) net or subnet when connecting
via the ISP (since the ISP does authentication on the users as
part of the establishment of the dialup connection, this is feasible
if the ISP can maintain this information as part of its user attribute
database). While this makes it possible to extend IP source authentication
to dialup users obtaining service through the ISP, it should be
clear that this approach will not scale reasonably to offer users
a wide range of choice in the ISP marketplace (including wireless
and cable TV based ISPs); it is most practical in situations with
large educational institutions who have the marketplace power
to negotiate such arrangements and where members of the institution's
user community are willing to select from at most a small number
of competing ISPs.
Approaches using IP tunneling and/or Mobile IP
type support can be used to mitigate some of the limitations of
traditional source IP based access management schemes, though
they may have considerable performance and complexity drawbacks.
The next revision of the paper will include a discussion of these
approaches.
Some organizations have used reverse Domain Name
System (DNS) lookups on source IP addresses and then checked the
DNS name in order to perform access management. This changes matters
very little except that it means that access management must also
rely on the security of the DNS system itself (which can be a
problem; secure DNS is not yet deployed widely) and requires that
all hosts have DNS names tabled, which is often not the case.
This approach also does not work well with DHCP (dynamic assignment
of IP addresses) which is often used to support laptop machines.
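A reverse-lookup check of this kind can be sketched as follows. The
standard library call socket.gethostbyaddr performs the reverse
lookup; the function names, the resolver parameter (present so the
logic can be exercised without a live DNS), and the domain names are
assumptions for illustration.

```python
import socket


def hostname_allowed(hostname, allowed_domains):
    """True if hostname is one of the allowed domains or falls under one."""
    name = hostname.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in allowed_domains)


def reverse_dns_permits(source_ip, allowed_domains,
                        resolver=socket.gethostbyaddr):
    """Look up the name for source_ip and test it against allowed domains."""
    try:
        hostname, _aliases, _addresses = resolver(source_ip)
    except OSError:
        return False   # host has no DNS name tabled
    return hostname_allowed(hostname, allowed_domains)


def fake_resolver(ip):
    # Stand-in for the DNS, used only to demonstrate the logic.
    return ("lab12.library.example.edu", [], [ip])


def empty_resolver(ip):
    raise OSError("no reverse entry")
```

Note that the result is only as trustworthy as the DNS answer itself,
which is the dependency on DNS security discussed above, and that a
host without a reverse entry is simply refused.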
4.2 Proxies
In some sense, proxy based approaches simply shift
the problem, since an institution will still have to deploy an
internal authentication and access management system in order
to control use of the proxy servers. However, it may be easier
to implement an internal system than to implement a system that
must be used by a wide range of resource providers; proxies modularize
and compartmentalize the authentication problem.
Let us assume for the time being that an institution
has implemented a viable internal authentication system and analyze
various proxy schemes under that assumption. Our comments, then,
will only cover the proxy scheme itself, not the institutional
authentication system necessary to support the proxy.
We need to distinguish between two different kinds
of services that are sometimes referred to as proxies. The first,
which we will call mechanical proxies, are services which
make use of facilities designed directly into implementations
of protocols such as HTTP. To use a web proxy server, one configures
a browser to pass all HTTP requests not directly to the destination
host, but instead to a proxy server, which intercepts these requests
and when necessary retransmits them to the true destination host.
In this case, the operation of the proxy should be invisible to
the end user.
The second type of proxy is what we will call an
application-level proxy (historically, these have often been called
"protocol translation systems" or "gateways").
An application level proxy functionally forwards requests where
appropriate, but does not rely on protocol mechanisms. An example
might be a Telnet proxy, where in order to reach an access-controlled
Telnet based resource, one telnets to an institutional system;
this might engage the user in an authentication and authorization
dialog, and then manage a Telnet session to the remote resource,
with some editing. In the web environment, a service such as the
anonymizer (www.anonymizer.com) is a good example; here, one accesses
the web page of the service and provides the URL of the remote
resource one really wishes to access. The anonymizer service not
only forwards requests on, but also dynamically re-writes each
page coming back from the remote resource prior to presenting
it to the end user, for example, replacing each URL in the retrieved
page with a URL that accesses the anonymizer with a parameter
of the actual remote page that is being requested. As the environment
becomes more sophisticated, application-level proxies become increasingly
problematic: for example, an application-level proxy generally
will not handle pages that contain Java applets properly.
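The page rewriting performed by an application-level web proxy can be
sketched as below. This is not how the anonymizer service is actually
implemented; the proxy address, the url parameter, and the use of a
simple regular expression are assumptions chosen to keep the example
short.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical address of the institution's application-level proxy.
PROXY_PREFIX = "http://proxy.example.edu/fetch?url="

# Matches href="..." and src="..." attributes only.
LINK = re.compile(r'(href|src)="([^"]*)"', re.IGNORECASE)


def rewrite_links(html, page_url):
    """Point every link in the page back through the proxy."""
    def replace(match):
        attribute, target = match.group(1), match.group(2)
        absolute = urljoin(page_url, target)      # resolve relative links
        encoded = quote(absolute, safe="")        # make it a safe parameter
        return f'{attribute}="{PROXY_PREFIX}{encoded}"'
    return LINK.sub(replace, html)


page = '<a href="/issue/2.html">next</a>'
rewritten = rewrite_links(page, "http://journal.example.com/issue/1.html")
```

The sketch also shows why such proxies are fragile: any link that is
not a literal attribute in the page, such as one constructed by an
applet at run time, passes through unrewritten.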
Feasibility and Deployment: This is not entirely
straightforward. Proxies introduce a considerable amount of overhead,
and the institution will need to invest in the installation and
operation of proxy servers. Some overhead may be mitigated by
having the proxy server perform caching operations as well as
access management, although this introduces a range of other responsibilities
and problems. Also, proxy servers become mission critical systems;
they need to be available and reliable, and to be sized so that
they do not represent a performance bottleneck.
Proxies -- and in particular application level proxies
-- have scaling problems not only in terms of computational resources
to support a large user community, but also in terms of configuration
management and support as the number of resources available to
the user community multiplies. Each resource needs to be configured,
and as resources change, configuration changes will be needed
in the proxy.
In the case of mechanical proxies, user browsers
have to be properly configured to make use of the proxy rather
than communicating directly with resources on the network. This
will be a particular problem when pre-configured browsers are
supplied by sources other than the licensee institution; for example,
cable-TV based internet service providers like @home make extensive
use of proxies and caching within their own networks, and supply
browsers that are configured to use the ISP's network.
In the case of applications level proxies, users will have to
be taught to go through the application in order to reach remote
information resources.
Integrating a local authentication system with a
commercial (usually mechanical) proxy server may be non-trivial.
Programming for an application level proxy can become quite complex.
One useful distinction is the locus and complexity of decision
making that the proxy must perform. At the simplest level, a proxy
can just screen all potential users without regard to the resource
that they want to access; essentially there's a single
authorization to use the proxy, and through it all of the resources
that it permits access to. At a more complex level, the proxy
might consider both the user and the resource in order to make
an authorization decision; at the most complex level, it may track
in detail the user's interaction with various resources
and make very specialized decisions about what requests it will
and will not pass through to the resources.
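The three levels of decision making just described can be sketched as follows. This is only an illustration under stated assumptions: the user names, resource names, the "/admin" path rule and the per-session request limit are all invented for the example, and a real proxy would consult the institutional authentication and authorization systems rather than in-memory tables.

```python
# Hypothetical data; a real proxy would query institutional systems.
AUTHENTICATED_USERS = {"alice", "bob"}
RESOURCE_RIGHTS = {"alice": {"medline", "eb-online"}, "bob": {"medline"}}
MAX_REQUESTS_PER_SESSION = 3  # invented policy for the third level


def level1_allow(user):
    """Simplest level: a single authorization to use the proxy at all."""
    return user in AUTHENTICATED_USERS


def level2_allow(user, resource):
    """Middle level: consider both the user and the resource."""
    return level1_allow(user) and resource in RESOURCE_RIGHTS.get(user, set())


class Level3Session:
    """Most complex level: track the interaction and judge each request."""

    def __init__(self, user, resource):
        self.user = user
        self.resource = resource
        self.count = 0

    def allow(self, request_path):
        if not level2_allow(self.user, self.resource):
            return False
        if request_path.startswith("/admin"):  # example of a blocked request
            return False
        if self.count >= MAX_REQUESTS_PER_SESSION:
            return False
        self.count += 1
        return True
```

Each level needs more configuration data and more custom code than the one before it, which is where the programming complexity noted above comes from.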
Telnet application proxies are tricky to build (consider
problems like the handling of break signals as they are propagated
across the proxy), and as far as I know, standard commercial
software to support construction of such proxies doesn't
exist. For Z39.50 applications, it's certainly possible
to construct custom proxies, although I am not aware of general
purpose software to do this. The proxy strategy is a very general
one architecturally.
From the point of view of the resource operator,
proxies are easy to work with; they usually just look like a particularly
simple form of IP source address authentication. However, they
may raise some user support problems; if an institutionally-provided
proxy is out of service or overloaded, the resource operator can
expect complaints about bad service for reasons that are outside
of its control.
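From the resource operator's side, the check can be as simple as the following sketch, which maps the source address of an incoming connection to a licensee institution. The institution names and address ranges are placeholders drawn from the documentation address blocks, not real licensees.

```python
import ipaddress

# Hypothetical licensee table: institution name -> proxy addresses or networks.
LICENSED_PROXIES = {
    "Example University": [ipaddress.ip_network("192.0.2.10/32")],
    "Example College": [ipaddress.ip_network("198.51.100.0/28")],
}


def licensee_for(source_ip):
    """Return the licensee whose proxy owns source_ip, or None if unknown."""
    addr = ipaddress.ip_address(source_ip)
    for name, networks in LICENSED_PROXIES.items():
        if any(addr in net for net in networks):
            return name
    return None
```

Because every member of the institution arrives from the same few addresses, the resource operator learns which institution is calling but nothing about the individual user.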
Authentication strength: obviously, this depends
on the local authentication system. There is the danger of systemic
compromise if the proxy server is successfully attacked (that
is, the local authentication built into the proxy server is broken)
or the proxy is misconfigured. A breach of the local authentication
system is likely to be a very high visibility event which will
receive rapid response from the licensing institution; a breach
of the proxy may be more insidious and more difficult to detect.
The communication between the proxy server and the resource can
be very strongly secured and authenticated using certificates
and session level encryption.
Granularity and extensibility: in theory, anything
is possible if enough work is done on the proxy server. For fine-grained
access control, however, it's necessary for the proxy to
consider who is trying to access what, rather than just having
the proxy server authenticate members of the user community prior
to any use of the proxy. It's not clear how hospitable
commercial proxy software is to this kind of application, or how
complex the institution-specific programming will have to be;
the more complex it gets, the more likely there are going to be
security vulnerabilities.
Cross-Protocol Flexibility: Because the authentication
mechanism used between proxy and user and between proxy and resource
need not be the same, there's a particularly high level
of cross-protocol flexibility. In the worst case, the proxy can
use a very general authentication approach like source IP filtering
to support protocols between the proxy and the resource, and can
use specialized methods (even embedded within application proxy
code) to authenticate users to the proxy server.
Privacy: proxies can provide real anonymity of use
if they are set up properly; the resource operator need not even
get a source IP address for the end user. On the other hand,
they provide a choke point for potential systematic institutional
monitoring of what the user community is doing, which may be some
cause for concern.
Accountability: in general, proxies provide poor
accountability, since they offer anonymous access. At best, some
level of accountability can be provided by correlating local logs
at the proxy (which is tied into the local authentication system)
and monitoring at the resource. In theory it would be possible
for the proxy to pass some pseudonym or identity to the resource,
but it's not clear how this would be accomplished in a
standard and interoperable fashion.
Management data: just as a proxy is a choke point
for monitoring, it is also a choke point for collecting management
data, including demographically faceted data or individual data
since it authenticates users and then sees all of their requests
to resources. Of course, correlating this to applications-level
events and terminology is hard. It is not clear how a proxy could
pass demographic data along with requests to a resource to permit
faceted statistics collection at the resource side.
Summary: it's hard to fully evaluate the proxy
approach for two reasons. To some extent it just moves the authentication
problem because it presupposes the existence of an institutional
authentication system, and the problems of deploying such a system
really need to be considered. Second, because a proxy -- particularly
an applications level proxy -- is a point at which custom programming
can be inserted, almost anything is possible, at least in theory,
but it's hard to evaluate the implementation and maintenance
cost of such a system, and the extent to which it demands custom
interfaces to the resources themselves, as opposed to using completely
standard interfaces.
4.3 Credential based approaches
In a credential based approach, the user interacts
directly with resources on the net rather than working through
an institutionally-provided proxy intermediary. The key problems
here are:
- What are the credentials that the user presents
to the resource?
- How are these credentials presented securely?
- How are the credentials validated with the issuing
institution?
For a credential based approach to scale, all of
these activities need to take place in a standardized fashion.
The most commonly discussed credentials are X.509 certificates,
which are attractive because browsers and servers already have
some support for them (designed to enable electronic commerce)
and because other software components needed for an X.509 public
key infrastructure are already becoming available on the marketplace.
However, many other forms of credentials are possible, including
userids and passwords, one time passwords, and the like. Indeed,
it's useful to differentiate between application-level
credentials -- where the collection of the credential and its
validation is packaged into the application itself, such as obtaining
and checking a userid and password -- and credentials which are
built into protocol mechanisms, such as the use of certificates
with HTTP and SSL. The protocol based mechanisms are more general
and often require less work to implement on the part of the resource
operator, but are less familiar to end users, calling for a larger
investment in infrastructure and user education.
Credentials can be confusing to analyze because they
can potentially carry both authentication and attribute information
together, or they can be used purely (or almost purely) for authentication.
We will analyze two credential-based approaches:
a userid/password scheme at the application level, and a certificate
based approach.
4.3.1 Password based credentials
Assume that institutions simply maintained databases
of (pseudonymous or identified) user ids and passwords. Note carefully
that the idea here is that a member of the institutional user
community has a single userid and password for access to all licensed
resources, and not a separate userid and password for each
licensed resource.
Using SSL-encrypted forms (which eliminates the problems
of transmitting passwords in the clear), it would be fairly easy
for a resource to ask for this userid and password securely; one
could then have a special purpose protocol so that a resource
could securely check whether the userid and password were valid
by querying an institutional userid/password database server.
Note that SSL can set up an encrypted connection with a server
certificate but no client-side certificate.
The special purpose userid/password checking protocol
doesn't exist today, but is not hard to design or implement,
and since it only needs to be implemented by the resource operator
and by an institutional server or two at each licensee institution,
it might be much less problematic than making all licensee community
users go through the complications of obtaining and installing
certificates on their machines. Further, similar protocols for
userid/password checking are already in use for validating users
to terminal servers (e.g. TACACS, RADIUS); these might be used,
or at least adapted.
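To make the idea concrete, here is a minimal sketch of the institutional side of such a check. It is not TACACS or RADIUS and not a proposed standard: the class and method names are invented, the salted-hash storage is one reasonable assumption about how the institution would hold passwords, and the transport between resource operator and institution (which would itself need to be authenticated and encrypted) is left out entirely.

```python
import hashlib
import hmac
import os
from collections import Counter


class InstitutionalPasswordServer:
    """Toy stand-in for an institutional userid/password database server."""

    def __init__(self):
        self._records = {}             # userid -> (salt, derived key)
        self.query_counts = Counter()  # resource id -> number of queries seen

    @staticmethod
    def _derive(password, salt):
        # Store only a salted, iterated hash, never the password itself.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 50_000)

    def enroll(self, userid, password):
        salt = os.urandom(16)
        self._records[userid] = (salt, self._derive(password, salt))

    def verify(self, resource_id, userid, password):
        """Answer a resource operator's query with a bare yes or no."""
        self.query_counts[resource_id] += 1  # the server knows who is asking
        record = self._records.get(userid)
        if record is None:
            return False
        salt, stored = record
        return hmac.compare_digest(stored, self._derive(password, salt))


if __name__ == "__main__":
    server = InstitutionalPasswordServer()
    server.enroll("u1001", "correct horse battery")
    print(server.verify("journal-site", "u1001", "correct horse battery"))
```

Note that in this arrangement the resource operator still handles the user's password in the clear before forwarding it, which is the trust problem discussed below.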
Users are already familiar with user ids and passwords,
including the need to keep passwords secure, to change them, and
to pick them well (or at least they are more familiar with these
issues than, for example, certificate use). Userids and passwords
can be carried in the minds of people rather than being installed
on specific machines the way that certificates are; this helps
with kiosks, computer labs, libraries and other shared machine
settings -- assuming that one can teach the user to log off when
he or she is finished, rather than just leaving the machine signed
on. Probably the biggest problem with this approach -- which is
not shared with certificates -- is that the resource operator
obtains a set of globally valid credentials for the user, and
has to be trusted to keep them secure. There are also some secondary
problems -- Trojan horse resources that capture user ids and passwords
under false pretenses, for example, are a much more serious threat
than they are in a certificate exchange environment.
Let's consider passwords and user ids carried
over SSL encryption from the perspective of our requirements definition.
It's clear that they are feasible and deployable. Assuming
that a protocol for verifying user ids and passwords with an institutional
server is standardized and deployed, the amount of work faced
either by a licensee institution or a resource operator is quite
manageable. Special desktop software is not required for web access;
for other protocols, such as Telnet, an SSL-capable Telnet is
needed (my understanding is that some of these are under development).
Z39.50 credentials are a particular problem because no Z39.50
interface to a service like SSL is currently defined. User ids
and passwords are clearly linked to people rather than network
addresses of machines. One problem with userids and passwords
is that they don't encourage seamless navigation among
resources; each resource is going to explicitly annoy the user
by asking for his or her userid and password on each visit.
While passwords represent relatively weak security,
a system can be put in place to require them to be difficult to
guess (by forcing the use of pass phrases rather than passwords,
or avoiding use of words in a dictionary), and also to insist
that they be changed frequently. The use of an SSL based transport
removes the security problems of transmitting them in the clear.
The protection provided by SSL will depend on whether US-only
(long key) or international (short key) versions of SSL are supported
by the user's browser. Userids and passwords are subject
to systemic compromise from two perspectives; if the institutional
password verification server is compromised, new passwords would
have to be issued to all members of the user community. Also,
each resource operator now shares in the responsibility for keeping
userids and passwords secure; if any resource operator's
site is retaining user ids and passwords, and is compromised,
this will compromise all other resource operators as well as the
home institution (if the institution is using the same userid
and password for internal and external authentication and authorization
purposes).
Granularity and extensibility. An institutional password
server will just verify that a particular userid/password combination
is valid (it would also know what resource operator was asking).
In situations where an access management decision needs to be
made that goes beyond validity of the userid/password pair, the
key question is the locus of that decision. The resource operator
will either have to maintain a list of valid IDs (identities)
or the password server will have to keep information about what
resources a userid has access to. Or the institution would have
to offer resource operators access to a user attribute database
keyed on userid.
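The three possible places for this decision can be sketched side by side. All identifiers, the "status" attribute and the sample data are hypothetical; the point is only to show which party has to hold which table.

```python
# Option 1: the resource operator keeps its own list of valid IDs.
RESOURCE_LOCAL_IDS = {"u1001", "u1002"}

# Option 2: the password server records which resources each userid may use.
SERVER_ENTITLEMENTS = {"u1001": {"journal-site"}}

# Option 3: the institution exposes an attribute database keyed on userid.
ATTRIBUTES = {"u1001": {"status": "faculty"}, "u1003": {"status": "alumni"}}


def decide_at_resource(userid):
    """Decision made by the resource operator from its own list."""
    return userid in RESOURCE_LOCAL_IDS


def decide_at_password_server(userid, resource_id):
    """Decision made by the institution's password server."""
    return resource_id in SERVER_ENTITLEMENTS.get(userid, set())


def decide_from_attributes(userid, allowed_statuses):
    """Decision made by the resource operator from institutional attributes."""
    status = ATTRIBUTES.get(userid, {}).get("status")
    return status in allowed_statuses
```

The first option pushes list maintenance onto every resource operator, the second onto the institution, and the third splits the work but exports attribute data about users.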
Cross-protocol flexibility: because passwords operate
at a higher level of abstraction than protocols, they are general.
Telnet and Z39.50 support should be straightforward, assuming
that there is encryption on the link over which the passwords
are transmitted, as discussed above.
Privacy and accountability. The use of user ids and
passwords transfers personal information directly to the resource
operator. This information may be pseudonymous or identified;
it will not be anonymous. To this extent, it undermines privacy
but offers accountability. Management data faceted by demographic
categories will be available from the resource operator only to
the extent that the licensee institution provides demographic
data as a byproduct of userid/password validation. There is no
opportunity for the licensee institution to collect statistical
information directly, other than a count of how often userid/password
pairs are validated by the various resource operators.
Summary: to the extent that an institutional password
verification server controls the export of individual and demographic
information, passwords could work surprisingly well in an SSL-protected
context. A primary benefit is that users are familiar with the
model. There are important missing pieces here, particularly
the protocol to permit resource operators to verify userid/password
pairs with institutions that issued them. Probably the greatest
weakness of this approach is the dependency on each resource operator
to protect userid/password pairs, and the danger of systemic compromise
due to a security failure on the part of a single resource operator.
Further comments. Clearly, by issuing different passwords
and userids for different resources, it is possible to reduce
the interdependence among resource operators and the dependence
on each resource operator in maintaining security. However, large
numbers of passwords and userids are extremely unfriendly and
confusing for users, and probably impractical. For users who
only use a single machine (or who are willing to store a cookie
file in a network file system), and for resources that don't
require high security, it's certainly possible to store
userids and passwords as cookies on the user's machine
(though many users have become "cookie-phobic" due
to the overly dire publicity surrounding cookies); once stored,
the user doesn't have to enter them at all, improving seamless
cross-resource navigation. This is the approach that is taken
by many low-security commercial services in the consumer marketplace
today.
4.3.2 Certificate based Credentials
X.509 certificate based credentials are substantially
more complex than passwords, but offer a number of advantages.
In essence, an X.509 certificate (plus the private key that goes
with the certificate) gives a machine credentials that support
its right to make use of a name, and allows this assertion to
be verified by checking with a certificate authority (which might
be operated by the licensee institution, or operated by a third
party under contract to the licensee institution). X.509 certificates
include expiration dates, and certificate authorities can also
provide revocation lists to invalidate certificates prior to their
expiration date (though checking such lists can involve substantial
overhead, and not all systems supporting certificates currently
check revocation lists.)
Rather than making a complete analysis of certificate
based credentials, we will simply highlight how they differ from
the password based credential approach already discussed.
X.509 certificates and corresponding private keys
are messy to distribute (much more so than, for example, a starter
single use password for a local authentication system), and complicated
for users to install, particularly in cases where the certificate
needs to be installed in multiple machines owned by a single user.
Backup and recovery need to be considered carefully lest a user
lose his or her certificates permanently as the result of a failure
such as a disk crash. Certificates are
highly intractable in cases where users share machines, such as
public workstations. X.509 certificates can contain demographic
data (though there are standardization problems here about how
to encode them in the certificate payload) which could be used
for resource-operator based statistics gathering or fine-grained
authorization decisions.
In contrast to passwords, there is already a well
defined protocol/process which can be used to validate an X.509
certificate-based credential that has been presented to a resource
operator.
Note that an X.509 certificate based credential does
not consist of simply the certificate itself, but rather a complex
object that includes the certificate and is signed with the (secret)
private key corresponding to the certificate; since this is computed
anew each time a credential is needed, X.509 certificate based credentials
do not share the password-approach problem that security depends
on each resource operator carefully protecting the user's
credentials.
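The following toy sketch illustrates why a freshly signed credential is of no lasting value to the resource operator that receives it. It assumes a simple challenge-response exchange (the resource sends a random challenge, the user signs it) as the way freshness is obtained, and it uses unpadded textbook RSA with a deliberately small key built from two well-known Mersenne primes so that it runs with the Python standard library alone (Python 3.8 or later). It is not how SSL or any X.509 toolkit does this, and it must not be used for real security.

```python
import hashlib
import secrets

# Toy RSA key built from two known Mersenne primes. Far too small, and
# unpadded, for real use; it only illustrates the flow.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                 # public key, as carried in a certificate
D = pow(E, -1, (P - 1) * (Q - 1))   # private key, held only by the user


def _digest(data):
    """Hash the data and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def new_challenge():
    """Resource side: a fresh random value for each connection."""
    return secrets.token_bytes(16)


def sign(challenge):
    """User side: sign the challenge; the private key never leaves the user."""
    return pow(_digest(challenge), D, N)


def verify(challenge, signature):
    """Resource side: check the signature using only the public key."""
    return pow(signature, E, N) == _digest(challenge)


if __name__ == "__main__":
    first = new_challenge()
    sig = sign(first)
    print(verify(first, sig))            # accepted for this connection
    print(verify(new_challenge(), sig))  # a replayed signature is rejected
```

A resource operator that records the signature gains nothing, because the next connection uses a different challenge; a recorded password, by contrast, remains valid everywhere.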
Userids and passwords are application level constructs;
they can be designed into an application using any protocol, assuming
only that the connection can be encrypted. The exchange of X.509
certificates is a lower level, protocol-integrated operation and
does not rely on encryption. Thus, there is work involved in extending
the use of X.509 certificates to work with protocols other than
HTTP, such as Telnet. (Z39.50 already contains facilities for
certificate exchange). There is also still a need for an SSL-type
service to encrypt the connection where confidentiality is desired;
SSL can also handle many aspects of certificate exchange without
the need for upper level protocol engineering, if it is available
(though the application -- if not the applications-level protocol
-- still needs to know something about certificates). One advantage
of certificates is that they are more flexible than most other
mechanisms; they can be used for signing electronic mail messages,
for example (though generally a separate key is used for signing).
And much of the current work on new protocols and services --
for example in the Java environment -- seems to be based on certificate
models.
The issues involving privacy, accountability and
management data change little from the password scenario already
discussed. One point worth noting is that if the user has several
certificates -- for example, an identified one for use with an
internal institutional authentication and authorization system
and a pseudonymous one for use with external services -- he
or she must select the correct certificate for presentation in
order to maintain privacy.
4.4 Proxy/Credential Hybrid Schemes
There are several interesting and confusing schemes
that, after much discussion, the initial reviewers of the paper recognized
are really hybrids of the proxy and credential approaches. In
these schemes, the user contacts an applications proxy in order
to gain access to the resource. The proxy authenticates the user,
checks his or her authorization, and then prepares and submits
a set of credentials to the resource. After the user's
connection to the resource is established through these credentials,
the proxy steps out of the way (via an HTTP redirect) and the
user interacts directly with the resource. This has several useful
results. It greatly reduces the overhead generated by use of a
proxy, and minimizes the resource requirements for the proxy machines.
It reduces some of the privacy concerns related to the proxy.
And it means that short lived rather than long-lived credentials
(something perhaps more akin to a Kerberos ticket, philosophically,
though it may be embodied in a certificate based credential) can
be sent to the resource operator; further, it may avoid the need
to store these short-term credentials locally on the end user's
machine.
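One way such a hybrid could work is sketched below: after authenticating the user, the proxy builds a redirect URL carrying a short-lived ticket, and the resource checks the ticket before serving the user directly. Everything specific here is an assumption for illustration: the shared secret between institution and resource operator, the HMAC-based ticket format, the parameter names and the five minute lifetime. A deployed scheme might instead embody the ticket in a certificate based credential, as noted above.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret agreed between the institution and the resource operator.
SHARED_KEY = b"example-shared-secret"
TICKET_LIFETIME = 300  # seconds; deliberately short lived


def _mac(pseudonym, expires):
    message = ("%s|%d" % (pseudonym, expires)).encode("utf-8")
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()


def redirect_url(resource_url, pseudonym, now=None):
    """Proxy side: after authenticating the user, build the redirect target."""
    issued = int(now if now is not None else time.time())
    expires = issued + TICKET_LIFETIME
    query = urlencode({"user": pseudonym, "exp": expires,
                       "mac": _mac(pseudonym, expires)})
    return resource_url + "?" + query


def accept(url, now=None):
    """Resource side: accept only an untampered, unexpired ticket."""
    params = parse_qs(urlsplit(url).query)
    try:
        pseudonym = params["user"][0]
        expires = int(params["exp"][0])
        mac = params["mac"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    return hmac.compare_digest(mac, _mac(pseudonym, expires)) and current <= expires
```

Once the resource has accepted the ticket, the proxy carries none of the subsequent traffic, which is where the savings in overhead and the reduced privacy exposure come from.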
5.0 Conclusions
Both proxies and credential-based authentication
schemes seem to be viable approaches. Proxies have the advantage
of compartmentalizing and modularizing authentication issues within
an institution. But they also place heavy responsibilities upon
the licensee institution to operate proxy servers professionally
and responsibly. Proxy servers will become a focal point for policy
debates about privacy, accountability and the collection of management
information; successful operation of a proxy server implies that
the user community is prepared to trust the licensee institution
to behave responsibly and to respect privacy. Similarly, resource
operators have to trust the licensee institution to competently
implement and operate a local authentication system; anomaly monitoring
of aggregated traffic from a proxy server by a resource operator
is very difficult, and the resource operator will have to largely
rely on the institution to carry out a program of anomalous access
monitoring.
A cross-organizational authentication system based
on a credential approach has the advantage of greater transparency.
Resource operators can have a higher level of confidence in the
access management mechanisms, and a much greater ability to monitor
anomalous access patterns. The downside is much greater complexity;
issues of privacy, accountability and the collection of management
statistics become a matter for discussion among a larger group
of parties. Further, it seems that a credential system means that
there has to be cross organizational interdependency in order
to avoid systemic compromise of the authentication system, as
opposed to a simple relationship of trust -- recognized in a contract
-- for the proxy approach.
One point that seems clear is that an institutional
public key infrastructure may not extend directly to a cross-institutional
one; it may be desirable to issue community members a set of pseudonymous
certificates for presentation outside the institution as well
as individually identified ones that are used within the institution
in order to provide a privacy firewall while still maintaining
some level of accountability.
IP source filtering does not seem to be a viable
general solution, although it may be very useful for some niche
applications, such as supporting public workstations or kiosks.
It can be used more widely -- indeed today it usually is the basic
access management tool -- but it definitely cannot support remote
users flexibly in its basic form. Most real-world access management
systems are going to have to employ multiple approaches, and IP
source address filtering is likely to be one of them.
Reviewers of the first draft of the paper were very
concerned with the costs of deploying access management systems
and the supporting authentication infrastructure. There is relatively
little good data on this, though some early adopter institutions
are seeing rather high costs, particularly for public key (certificate)
based approaches. There is an urgent need to develop a better
basis for estimating the initial deployment and operating costs
of the various approaches, and this need should be addressed in any
follow-on work to the white paper.
A final issue: this white paper has focused on inter-institutional
issues in authentication and access management. It should be clear
that the role of the licensee institution as a mediator adds some
very significant value for the members of the user community.
There are many users of networked information resources who do
not have a natural affiliation with a licensee organization, and
who thus do not have a way to obtain these benefits. We can expect
these users to seek affiliations -- such as that of alumni --
which allow them to obtain these benefits. The idea of being able
to have a single ID that allows access to a vast array of networked
information resources is a very powerful one, and it is one that
today is available only in an institutional context.
Appendix: Notes on the State of the Art in Available Software
This appendix provides a snapshot of the currently
available state of the art for key software components in terms
of their support for authentication, authorization and access
management. One of the key issues that the white paper has identified
is the need for off the shelf software to provide the needed facilities,
particularly at the user's desktop.
Web Browsers
Two web browsers -- Netscape Navigator and Microsoft's
Internet Explorer -- currently dominate the browser marketplace.
Both support a wide range of platforms, including Microsoft Windows,
the Mac OS, and various varieties of UNIX.
Both browsers support SSL for encrypting forms that
include passwords. It is worth noting that while both browsers
support 128-bit encryption in their US-only products, users must
take special action to obtain these versions and the vast majority
of users probably are still running the much less secure 40-bit
export qualified versions that are available as the default distributions.
Both browsers support proxy servers as a configuration option.
Both browsers support the incorporation of X.509 certificates.
The browsers do not yet support certificate revocation lists (verify
this).
There are many problems with certificates. They are
not simple for the average user to import. Certificate backup
and recovery (for example, in the case of a disk crash) is a problem.
Certificates may not be moved smoothly as part of an upgrade;
they definitely won't move if a user switches between Netscape
and Internet Explorer (Netscape will import IE certificates via
explicit action, but neither browser will simply make use of certificates
installed in its competitor).
Both browsers include a built-in Telnet. This Telnet
does not support SSL for protecting the transmission of user ids
and passwords. Both browsers can be configured to use independent
Telnet helper applications rather than the built-in Telnet. I
am aware of work going on in the Mac world to provide a stand-alone
Telnet application which incorporates SSL encryption. Reconfiguration
of any browser to substitute an external Telnet is non-trivial
for the average user.
One issue that was identified during early reviews
of this paper was the Lynx character-based web browser. Lynx is
important for two reasons: because there is still a large installed
base of trailing-edge character-based terminal technology, and,
perhaps more compellingly, because Lynx, in conjunction with other
specialized assistive software, is a key part of many institutional
strategies for meeting the needs of disabled users and the requirements
of the Americans with Disabilities Act (ADA). Lynx capabilities
remain to be researched.
Web servers
Commercial web servers from Netscape and Microsoft
support SSL, as does Stronghold (commercial Apache); Apache proper
supports SSL only on a limited basis with the addition of the
shareware SSLeay module.
Need to review X.509 support, including what Certificate
Authorities are supported/will issue, support of Certificate Revocation
Lists, etc.