Responsible Operations: Data Science, Machine Learning, and AI in Libraries

Responsible Operations is intended to help chart library community engagement with data science, machine learning, and artificial intelligence (AI). It was developed in partnership with an advisory group and a landscape group comprising more than 70 librarians and professionals from universities, libraries, museums, archives, and other organizations.

This research agenda presents an interdependent set of technical, organizational, and social challenges to be addressed en route to library operationalization of data science, machine learning, and AI.

Challenges are organized across seven areas of investigation:

  1. Committing to Responsible Operations
  2. Description and Discovery
  3. Shared Methods and Data
  4. Machine-Actionable Collections
  5. Workforce Development
  6. Data Science Services
  7. Sustaining Interprofessional and Interdisciplinary Collaboration

Organizations can use Responsible Operations to make a case for addressing challenges, and the recommendations provide an excellent starting place for discussion and action.

https://www.oclc.org/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html
by Thomas Padilla

Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

Data-centric approach to enterprise architecture

Data is the key to taking a measured approach to change, rather than a simple, imprudent reaction to an internal or external stimulus. But it is not simple to uncover the right insights in real time, and how your technology is built can have a very real impact on data discovery. Data architecture and enterprise architecture are linked in responding to change while limiting unintended consequences.

DBTA recently held a webcast featuring Donald Soulsby, vice president of Architecture Strategies at Sandhill Consultants, and Jeffrey Giles, principal architect at Sandhill Consultants, who discussed a data-centric approach to enterprise architecture. According to Soulsby and Giles, Sandhill Consultants is a group of people, products, and processes that helps clients build comprehensive data architectures, resulting from a persistent data management process founded on a robust data governance practice and producing trusted, reliable data.

A good architecture for data solutions addresses:

Risk management: strategic, regulatory, media, and consumer risk
Compliance: statutory, supervising body, watchdog, commercial value chain, and professional compliance

Enterprise architecture frameworks start with risk management as their building blocks, Soulsby and Giles said. A typical model asks what, how, where, when, and who; a unified architectural approach also asks why. This type of solution is offered by erwin and is called Enterprise Architecture Prime 6. According to Soulsby and Giles, the platform can achieve compliance, whether regulatory or value chain; can limit unintended consequences; and provides risk management for classification, valuation, detection, and mitigation. Together, the erwin and Sandhill Consultants offerings provide a holistic view for governing architectures from an enterprise perspective. This set of solutions provides a strong data foundation across the enterprise to understand the impact of change, reduce risk, and achieve compliance, Soulsby and Giles said.
An archived on-demand replay of this webinar is available here.

via The Building Blocks of Great Enterprise Architecture for Uncovering Data — Architectural CAD Drawings

UK introduces new Data Protection Bill, because, you know, GDPR

Digital Minister Matt Hancock has confirmed the UK government will introduce a new data protection law. “It will provide everyone with the confidence that their data will be managed securely and safely.”

Source: UK introduces new Data Protection Bill, because, you know, GDPR

GDPR: PII Data vs. Personal data

The European Union’s new General Data Protection Regulation (GDPR), which goes into full effect in May 2018, significantly strengthens the data privacy rights of consumers and the requirements on companies that solicit and retain customer identities. A positive aspect of GDPR is that companies cannot hide from it: it applies to all companies anywhere in the world that do business in Europe and/or retain EU citizens’ data.

The US-based concept of personally identifiable information (PII) and the European concept of personal data make up a critical demarcation line related to data types and privacy consequences. To become compliant with GDPR, one has to understand the difference between the way the two legal systems approach the concept of personal information and its meaning in the context in which it is used. PII is any data that could potentially identify a specific individual. Any information that can distinguish one person from another, and that can be used to de-anonymize anonymous data, can be considered PII.

PII, or SPI (sensitive personal information), as used in information security and privacy laws, is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. The term PII is used in the US context and is grounded in commonly applied US law. Examples of PII include full name, maiden name, Social Security number, phone number, email address, asset information, owned properties, etc. Some variation may be observed from state to state.

Personal data is defined in the EU Directive 95/46/EC, and it covers a much wider range of information that may include transaction history, social media posts, photographs, and other data that relates to an identified or identifiable person, directly or indirectly. The term personal data applies across all 28 EU member states of the European Economic Area (EEA). The concept reflects European lawmakers’ intention to treat privacy as a fundamental human right and to hold businesses accountable for handling this sensitive data.

We can say that all PII is personal data, but not all personal data is PII. It is important that data and IT architects, along with the Data Protection Officer (DPO), consider personal data beyond the narrow scope of PII, especially at US-based companies, to build a successful GDPR compliance program.
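
The PII-versus-personal-data distinction above can be sketched in code. This is a minimal illustration for a data inventory exercise, not a legal determination; the field names and category assignments are assumptions chosen for the example.

```python
# Illustrative GDPR data-inventory sketch. Field names and category
# assignments are assumptions for the example, not legal advice.

# PII (narrow, US-style): data that directly identifies a specific individual.
PII_FIELDS = {"full_name", "maiden_name", "ssn", "phone", "email"}

# Personal data (GDPR): any data relating to an identifiable person,
# a superset of PII that also covers indirect identifiers.
PERSONAL_DATA_FIELDS = PII_FIELDS | {
    "transaction_history", "social_media_posts", "photo", "ip_address",
}

def classify(field: str) -> str:
    """Classify a field for a compliance inventory."""
    if field in PII_FIELDS:
        return "PII (and personal data)"   # all PII is personal data
    if field in PERSONAL_DATA_FIELDS:
        return "personal data (not PII)"   # not all personal data is PII
    return "unclassified"

print(classify("ssn"))                  # PII (and personal data)
print(classify("transaction_history"))  # personal data (not PII)
```

The asymmetry in `classify` mirrors the point above: the GDPR set strictly contains the PII set, so a US-centric PII inventory will miss fields the regulation still covers.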

GDPR Data Portability and Master Data Sharing

The title of this blog caught my attention, for it talks about data portability between competitors. Yes, this scenario is not far off: competitors will share customer profiles, perhaps via DaaS (Data as a Service). Henrik calls it another “Sunny side of GDPR.” #AbhiSrivastava, #GDPR, #DataArchitecture, #DataPortability

Liliendahl on Data Quality

One of the controversial principles in the upcoming EU GDPR enforcement is the concept of data portability.

In legal lingo data portability means: “Where the data subject has provided the personal data and the processing is based on consent or on a contract, the data subject shall have the right to transmit those personal data and any other information provided by the data subject and retained by an automated processing system, into another one, in an electronic format which is commonly used, without hindrance from the controller from whom the personal data are withdrawn.”

In other words, if you are processing personal data provided by a (prospective) customer or another kind of end user of your products and services, you must be able to hand these data over to your competitor.
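
As a rough sketch of what "an electronic format which is commonly used" might look like in practice, the snippet below bundles a data subject's provided records into JSON. The function name, field names, and profile contents are illustrative assumptions, not a prescribed GDPR mechanism.

```python
# Sketch of a data-portability export: package everything a data subject
# provided into a commonly used electronic format (JSON). Names and
# fields are illustrative assumptions.
import json

def export_subject_data(subject_id: str, records: dict) -> str:
    """Bundle one data subject's provided personal data for handover."""
    package = {
        "subject_id": subject_id,
        "format": "application/json",
        "data": records,
    }
    return json.dumps(package, indent=2)

profile = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "preferences": {"newsletter": True},
}
portable = export_subject_data("cust-001", profile)
print(portable)  # JSON the subject could hand to another controller
```

The point of using a plain, widely supported format is exactly the "without hindrance" clause in the definition: the receiving controller needs no proprietary tooling to ingest it.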

I am sure this is a new way of handling party master data for almost every business. However, sharing master…

View original post 40 more words

Data Architecture: How to build the castle?

“Architecture is frozen music.” This famous quote comes from the 18th-century writer Johann Wolfgang von Goethe. The statement reveals a quality of architecture as a creative discipline. Both architecture and music are wide open to interpretation; however, they intrinsically bind things in harmony. Data architecture is a symphony of data that is collected, stored, arranged, integrated, and put to use in corporations. We dealt with the definition of data architecture, and the need for it, using a house analogy in my last blog, “Data Architecture: What is it and Why should we care?”. In the current article, we will recount how to put together the fortress of data architecture.

Architecture frameworks such as TOGAF, Zachman, and DoDAF offer us a method for thinking about systems and architecture. Although plenty of consortium-developed, proprietary, defense-industry, government, and open source frameworks are available, one should use them judiciously, because it is easy to overdo things beyond what is necessary. Many research papers argue that EA frameworks are theoretical and impossible to carry out in full. With this in mind, experts agree that foundational artifacts are needed to document data architecture. Organizations decide on this foundational set of artifacts based on the potential value the artifacts provide and the investment required to create them. These artifacts are an integrated set of specifications that define data requirements, guide the integration and control of data assets, and align with the business's information needs and strategy. An architect must ensure coherence and integrity among the artifacts created, whether diagrams, data models, or other documents.

DAMA DMBOK divides Enterprise data architecture artifacts into three broad categories of specifications –

  1. The enterprise data model: The heart and soul of enterprise data architecture,
  2. The information value chain analysis: Aligns data with business processes and other enterprise architecture components, and
  3. Related data delivery architecture: Including database architecture, data integration architecture, data warehousing/business intelligence architecture, document content architecture, and meta-data architecture.

In practice, enterprise architecture includes data, process, application, technology, and business architecture. The business architecture may include goals, strategies, principles, projects, roles, and organizational structures. Process architecture covers processes (functions, activities, tasks, steps, flows, products) and events (triggers, cycles). Application architecture covers macro-level and micro-level application component architecture across the entire application portfolio, governing the design of components and interfaces, such as a service-oriented architecture (SOA).

To create the data architecture, one has to define business information needs. The core of any enterprise data architecture is an enterprise data model (EDM). The EDM is an integrated, subject-oriented data model defining the essential data created and consumed across the enterprise. Building the enterprise data model is the first step in establishing those needs and data requirements. Organizations cannot build an EDM overnight; each strategic and enhancement project should contribute to building it piece by piece. Every project that touches the organization's data assets classifies, within its limited scope, the inputs and outputs required. These details should list data entities, data attributes, and business rules, which can then be organized by business units and subject areas. Proper categorization and completeness are key to building the enterprise data model.
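
The project-by-project contribution described above can be sketched as a simple catalog: each project registers the entities, attributes, and business rules it touches, organized by subject area. The subject areas, entity names, and rules below are illustrative assumptions.

```python
# Sketch: building an enterprise data model piece by piece. Each project
# contributes entities within its limited scope, organized by subject
# area. All names and rules here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    attributes: list
    business_rules: list = field(default_factory=list)

# The growing EDM: subject area -> list of contributed entities.
edm = {}

def contribute(subject_area: str, entity: Entity) -> None:
    """Register one project's entity under its subject area."""
    edm.setdefault(subject_area, []).append(entity)

# Two hypothetical projects adding their inputs and outputs:
contribute("Customer", Entity("Customer",
                              ["customer_id", "name", "email"],
                              ["customer_id is unique"]))
contribute("Sales", Entity("Order",
                           ["order_id", "customer_id", "total"],
                           ["total >= 0"]))

for area, entities in edm.items():
    print(area, [e.name for e in entities])
```

The value of a structure like this is the categorization step the text calls out: once contributions accumulate under consistent subject areas, gaps and overlaps in the model become visible.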

Planner View (Scope Contexts): A list of subject areas and business entities.

Owner View (Business Concepts): Conceptual data models showing the relationships between entities.

Designer View (System Logic): Fully attributed and normalized logical data models.

Builder View (Technology Physics): Physical data models optimized for constraining technology.

Implementer View (Component Assemblies): Detailed representations of data structures, typically in SQL Data Definition Language (DDL).

Functioning Enterprise: Implemented instances.
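
To make the progression from the Designer view down to the Implementer view concrete, here is a minimal sketch that renders a logical model as SQL DDL. The table, column, and type choices are assumptions for illustration; a real physical model would also carry keys, indexes, and technology-specific optimizations.

```python
# Sketch: from the Designer view (logical model) toward the Implementer
# view (SQL DDL). Entity, column, and type names are illustrative
# assumptions, not a complete physical design.

LOGICAL_MODEL = {
    "customer": {
        "customer_id": "INTEGER",
        "name": "VARCHAR(100)",
        "email": "VARCHAR(255)",
    },
}

def to_ddl(model: dict) -> str:
    """Render each entity as a CREATE TABLE statement."""
    statements = []
    for table, columns in model.items():
        cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return "\n".join(statements)

print(to_ddl(LOGICAL_MODEL))
```

Generating DDL from a model dictionary, rather than hand-writing it, is one small way to keep the Builder and Implementer views coherent with the views above them.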

The enterprise data model by itself is not enough. The data model is part of the overall enterprise architecture. It is important to understand how data relates to business strategy, organization, process, application systems, and technology infrastructure. In forthcoming articles, we will go over EDM, information value chain analysis, data delivery architecture and some additional aspects of data architecture.

Data Architecture: What is it and Why should we care?

Basic tenets of Data architecture

Data architecture, as understood by most in the industry, has many different definitions. Here is what Wikipedia says: “In information technology, data architecture is composed of models, policies, rules or standards that govern what data is collected, and how it is stored, arranged, integrated, and put to use in data systems and within the organization. Data is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.” I would define data architecture as a discipline that deals with designing, constructing, and integrating an organization's data assets so they are well optimized for the organization to run its business.

Data Architecture Vs House Architecture

Enterprise architecture is often compared with the architecture of a house, which defines the individual design elements of every nook and corner so the house can be built to specification. Similarly, data architecture is a design that defines the way data enters the organization, lives in its systems and applications, moves within the organization, and is consumed to run the business. It is a blueprint of the data design. Most enterprises deal with unmanageable data sprawl that continues to grow at tremendous speed. This “just a bunch of data,” or JBOD, is a major driving force behind the need for a data strategy and an enterprise data architecture. Data architecture is much like a road network for reaching the IT goal, and thus the business goal, while data strategy defines “how” to reach that goal. In principle, the data architecture defines a framework that helps organize data ingestion, data storage and management, and data processing. Components of this architecture include data integration, DBMS selection, data modeling, performance and measurement, security and privacy, business intelligence, metadata, data quality, and data governance.

The analogy of house design (see attached table) works well in this context to understand what components need to be taken care of when we talk about data architecture.

Data Architecture | House Architecture

Policies, Rules, and Standards | Building code
Policies, rules, and standards are the first things required to build the house. One can use existing industry-standard frameworks such as TOGAF, Zachman, etc. These are the guard rails for the full life cycle of data within the organization.

Data Subject Areas (Inventories) | Naming each space, e.g., living room, kitchen
Naming each space helps identify what types and categories of data are being used in the organization. It is the inventory of the data space an organization has.

Data Models | House plan
The data model is the actual diagram of the various data entities at the conceptual, logical, and physical levels. These are the detailed levels of data classification that help with collaborating on, implementing, and testing data specifications on the system.

Metadata | Room specifications
This is the lowest and last level of detail about the data, describing the properties of the data being stored. This is where data context is added.

Integration | Utility hookups
This is where data movement between systems is handled. An integration plan includes what data is transferred and how it is managed in flight.

Data | Residents of the house
Data is the resident of the house: it lives, moves, and is archived, deleted, and updated. A great data architecture plans for organic data growth for the foreseeable future.

So why do we need a road network to reach our goals? Because we all value order over chaos; we prefer following etiquette to having no rules of conduct. Okay, but what help will this “order” provide, and how? The question is simple to answer but rather difficult to execute on. An enterprise data architecture helps us onboard data quickly and deliver clean, trustworthy data at the speed required by patrons and the business. It also ensures that data is handled in a more secure and compliant way, which may be required by local laws and regulations. It makes it easier to incorporate new data types and technologies, and it enables end-user self-service. The list goes on, but the cardinal point is that data architecture ties it all together.

However, it is unwieldy to build from scratch an enterprise data architecture that can meet our needs. A more pragmatic approach is to build the future state of the architecture piece by piece within each new strategic business initiative. More in the next issue…