In SQL Server Analysis Services (SSAS), a Data Source defines how SSAS connects to a relational database. It includes the connection string and authentication credentials, allowing the SSAS engine to retrieve data for use in cubes, dimensions, and data mining models. These data sources are critical because they serve as the pipeline through which raw operational data flows into the analytical model.
SSAS supports connecting to a wide variety of data sources, and it does so using either OLE DB providers or .NET managed data providers. These technologies abstract the underlying database systems, allowing SSAS to communicate with different kinds of databases using a consistent interface.
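As a concrete illustration, an OLE DB connection string for a SQL Server source might look like the following (the server and database names here are placeholders, and SQLNCLI11 refers to the SQL Server Native Client 11.0 provider):

```text
Provider=SQLNCLI11;Data Source=MyServer;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI;
```

The provider clause selects the OLE DB driver, while the remaining keys identify the server, database, and authentication mode; a .NET managed provider would use a similar but provider-specific connection string.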
Operational Data Store as a Data Source
An Operational Data Store (ODS) is a frequently used data source in SSAS projects. It is designed to collect and consolidate data from multiple operational systems. Typically, an ODS includes one or more fact tables and several dimension tables, all connected by relational joins. These structures mimic the layout of a star or snowflake schema, which are foundational in multidimensional modeling.
The ODS serves as a central repository from which SSAS can efficiently extract consistent and timely data. Because it integrates multiple operational databases, the ODS simplifies the ETL process and provides a clean, unified dataset for analysis. This centralized approach also makes maintenance and data governance more manageable.
Data Source Wizard
The Data Source Wizard in SSAS simplifies the process of setting up and managing data sources. When using this wizard, you can either define a new data source or reuse an existing one. This reusability encourages consistency across SSAS projects and makes management easier. Additionally, the wizard allows connections to objects within the current SSAS project, which is especially useful when developing and testing mining models or cubes.
The wizard provides flexibility by allowing you to restrict the scope of the connection, for example, by limiting access to specific schemas or filtering out unnecessary tables. This can improve performance and simplify the development environment by focusing only on the data required for a specific analytical model.
Authentication and Impersonation in Data Sources
Impersonation in SSAS
Impersonation determines the credentials SSAS uses to access data in the relational data source. Because SSAS runs as a service, it must impersonate a user or account that has access to the database. SSAS provides four impersonation options, each suited to different scenarios and security requirements.
Use a Specific Windows Account
This option is commonly used when the SSAS service account lacks permissions to access the database. Here, you specify a Windows username and password explicitly. These credentials are encrypted and securely stored. This method is typically selected in environments where access must be tightly controlled or audited, and where individual accounts are preferred over service accounts.
Use the Service Account
This is the default choice for many deployments. It instructs SSAS to use the Windows account under which the SSAS service is running. As long as that account has the necessary permissions in the database, this option is straightforward and efficient. It also aligns well with Active Directory-based access control, making it easy to manage permissions centrally.
Use the Current User’s Credentials
This option is generally reserved for development or data mining use cases. It tells SSAS to use the credentials of the user who is currently interacting with the cube or model. This is useful in interactive or ad hoc analysis scenarios but is not recommended for production environments, especially where data processing or scheduled jobs are involved.
Inherit Impersonation Settings
The “Inherit” option uses the impersonation settings specified at the database connection level. This allows impersonation to be defined once and reused across multiple components in the project. It is particularly useful for standardizing access across large SSAS deployments.
Data Provider Technologies: OLE DB vs .NET
OLE DB Providers
OLE DB is a well-established technology that allows applications to access data from a variety of sources using a standard set of COM interfaces. SSAS uses OLE DB providers to connect to many traditional databases, including Microsoft Access, Oracle, IBM DB2, and Sybase. Because of its long history and broad support, OLE DB remains a reliable choice for many enterprise environments.
.NET Data Providers
In contrast, .NET data providers are part of the Microsoft .NET Framework and operate within the Common Language Runtime (CLR). These managed code interfaces offer benefits such as improved memory management, type safety, and deeper integration with .NET-based applications. They are more modern and may offer better performance in certain scenarios, especially when working with SQL Server.
.NET data providers are particularly advantageous in environments that already use .NET technologies extensively. They can also be preferred when performance optimization and integration with custom .NET applications are important.
Trade-offs Between the Two
Earlier versions of SSAS supported only OLE DB providers; beginning with SSAS 2005, both provider types are supported. While OLE DB is widely compatible and time-tested, .NET providers offer advantages in terms of modern programming paradigms and tighter integration with .NET applications. The choice between the two often depends on the source system, performance needs, and internal development standards.
Supported Data Sources and Connectivity
Major Supported Databases
SSAS supports many popular relational databases. Among them are:
- Microsoft SQL Server
- Oracle
- IBM DB2
- Teradata
- Sybase
- Informix
This wide-ranging support allows SSAS to be used in diverse environments, including those with legacy systems or heterogeneous data infrastructures. The cartridge mechanism in SSAS allows it to use database-specific dialects and extensions, improving performance and compatibility.
Cartridge Mechanism
The cartridge mechanism allows SSAS to adapt its SQL generation for the nuances of each supported database. This flexibility is key in environments with non-standard SQL implementations or where performance tuning is crucial. For example, SSAS can issue native PL/SQL for Oracle databases or T-SQL for SQL Server, depending on the data source.
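The kind of dialect difference the cartridge mechanism handles can be sketched with a simple row-limiting query (table and column names here are illustrative, AdventureWorks-style):

```sql
-- T-SQL (SQL Server): limit the result set with TOP
SELECT TOP 10 ProductKey, SalesAmount
FROM FactInternetSales
ORDER BY SalesAmount DESC;

-- Classic Oracle SQL: the same request expressed with ROWNUM
-- (the ORDER BY must be applied in an inline view before ROWNUM filters rows)
SELECT ProductKey, SalesAmount
FROM (SELECT ProductKey, SalesAmount
      FROM FactInternetSales
      ORDER BY SalesAmount DESC)
WHERE ROWNUM <= 10;
```

SSAS itself generates such statements internally during processing; the point of the sketch is only that the same logical request must be phrased differently per database, which is exactly what the cartridges encapsulate.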
Establishing Connections in SSAS
Establishing a connection to a data source in SSAS involves specifying the server name, database name, authentication method, and optional provider-specific settings. This configuration is stored in the data source object and can be reused across cubes, dimensions, and mining models. It forms the foundation for data access in the SSAS project and should be tested thoroughly before deployment.
Understanding Data Source Views (DSVs) in Analysis Services
A Data Source View (DSV) is a metadata layer in SQL Server Analysis Services that sits between the data source and the SSAS objects, such as cubes and dimensions. While the data source points directly to the relational database, the DSV allows you to define a logical schema based on the data source tables. This schema is not materialized in the database itself but exists only within the SSAS project.
The primary advantage of using a DSV is that it allows modelers to work only with the tables and columns that are needed for analysis, filtering out unrelated data. It also allows for the creation of derived tables, named queries, and calculated columns without altering the source data. This logical layer simplifies the modeling process and improves the manageability of complex data models.
Creating a Data Source View
Creating a DSV is typically the next step after defining a data source. The Data Source View Wizard guides users through the process. This includes selecting the tables from the underlying database, creating relationships, and optionally defining custom queries or restricting certain schemas.
If a data source has not been created beforehand, the DSV Wizard provides the option to define one on the fly. Once the data source is in place, the wizard displays the available tables and allows the user to select which ones to include in the DSV. This selection helps reduce clutter and focus only on the necessary elements.
The wizard also allows for schema filtering, which is particularly useful when dealing with databases that contain hundreds of tables. By limiting the view to only relevant schemas or table prefixes, the modeling process becomes more efficient.
The DSV Designer Interface
Once the DSV is created, it can be modified and enhanced using the DSV Designer. This interface includes several panes that help users visualize and manage the schema.
The central pane presents a graphical view of all tables included in the DSV. Tables are shown with their columns and primary keys, and relationships are illustrated using connecting lines. This graphical interface helps modelers quickly understand the structure and associations of the data.
The top-left pane, known as the Diagram Organizer, allows users to create and manage different diagrams within a large DSV. Each diagram can contain a subset of tables, which makes it easier to manage views when dealing with extensive schemas. For example, a data warehouse with over 50 tables may benefit from separate diagrams for sales, inventory, and customer data.
The lower-left pane, labeled Tables, displays a hierarchical view of all tables in the DSV along with their relationships. This tree view is particularly useful for navigating large models and identifying related objects quickly.
Modifying a Data Source View
Modifying a DSV is a common task during the lifecycle of a project. As business requirements evolve, new tables may need to be added, or existing tables may need to be removed. The DSV Designer makes these tasks straightforward.
To add or remove tables, users can right-click the diagram view and select the Add/Remove Tables option. This opens a dialog that displays available and included tables. Moving tables between these lists updates the DSV accordingly.
Tables can also be removed directly from the graphical pane or the table view. By right-clicking a table and selecting the Delete Table from DSV option, the table is removed from the logical view. It’s important to note that this operation does not affect the physical data source or underlying database.
Defining Primary Keys and Relationships
While some databases define foreign key relationships at the schema level, others may not. In such cases, SSAS allows modelers to define relationships manually within the DSV.
To set a primary key, a user selects a column within a table and chooses the Set Logical Primary Key option. This informs SSAS of the uniqueness of the column and enables it to establish meaningful joins.
To create relationships between tables, one column is selected and dragged onto a corresponding column in another table. This forms a logical join that is used when building dimensions and cubes. These relationships mimic those found in relational database design but exist only within the DSV.
Additional relationships can also be created through the context menu by selecting the New Relationship option. This opens a dialog where users can specify the source and destination tables, as well as the joining columns. Defining accurate relationships is critical to ensure data integrity and support correct aggregation logic in cubes.
Using Named Queries in a DSV
Named Queries are a powerful feature in SSAS that allow users to define virtual tables within the DSV. These queries can be used to filter data, join multiple tables, or create summary views without changing the underlying database.
To create a Named Query, users right-click a table and select the Replace Table With New Named Query option. This opens a query editor where a SQL statement can be entered. After executing the query, the results are displayed and can be previewed before being added to the DSV.
Named Queries are particularly useful when certain transformations or joins are required that are not available in the raw source tables. For example, a Named Query could combine order header and order detail tables to provide a flat view suitable for cube modeling.
Named Queries also support the creation of multiple instances of the same table with different filters. This is useful in scenarios such as year-over-year comparisons or data segmentation.
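A minimal sketch of the flattening example above, assuming hypothetical SalesOrderHeader and SalesOrderDetail tables, shows the kind of SQL a Named Query typically contains:

```sql
-- Named Query "Flat Order Lines": one row per order line,
-- with header attributes repeated on each line
SELECT h.SalesOrderNumber,
       h.OrderDate,
       h.CustomerKey,
       d.ProductKey,
       d.OrderQuantity,
       d.UnitPrice
FROM SalesOrderHeader AS h
JOIN SalesOrderDetail AS d
  ON d.SalesOrderNumber = h.SalesOrderNumber;
```

The result behaves like any other table in the DSV: it can receive a logical primary key, participate in relationships, and serve as the source for measures.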
Creating Calculated Columns
In addition to Named Queries, SSAS allows the creation of calculated columns within the DSV. These columns do not exist in the original data source but are defined using expressions based on existing columns.
For example, a calculated column might multiply quantity and unit price to derive total sales. The data type of the calculated column is inferred from the expression, and the logic is stored in the DSV metadata. It is not persisted in the database, which preserves the integrity of the source system.
To add a calculated column, a user right-clicks a table and selects the New Named Calculation option. The expression is then entered and evaluated against the table structure. The new column becomes part of the table in the DSV and can be used just like any other column in dimensions or cubes.
This functionality enables lightweight transformations to be performed directly within SSAS, reducing the need for complex ETL logic in external tools.
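For the total-sales example above, the expression entered in the Named Calculation dialog is just a per-row column expression; conceptually, SSAS then treats the table as if the derived column were part of it (table and column names here are illustrative):

```sql
-- Expression entered for a named calculation, say TotalSales:
OrderQuantity * UnitPrice

-- Conceptually, SSAS evaluates the table as if it were defined as:
SELECT t.*,
       t.OrderQuantity * t.UnitPrice AS TotalSales
FROM SalesOrderDetail AS t;
```

Because the expression is evaluated by the source database at processing time, it must be written in the SQL dialect of the underlying data source.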
Advanced Capabilities of Data Source Views
As data warehouse models grow, it becomes essential to logically organize and manage the large number of tables involved. Data Source Views in Analysis Services support this need through the use of diagrams. A diagram in a DSV is a curated visual layout containing a selected subset of tables from the overall view.
To create a new diagram, you open the DSV Designer and right-click within the Diagram Organizer pane. From there, selecting the option to create a new diagram allows you to define a name and begin customizing it. This process helps focus development efforts on smaller parts of the model, such as sales, inventory, or marketing data, while reducing visual complexity.
After naming the diagram, the next step is to add tables by right-clicking in the empty diagram space and selecting “Show Tables.” A dialog box appears where you can choose which tables should appear in this specific view. For example, if you create a diagram named Internet Sales, you might include the Internet Sales fact table and its associated dimension tables such as Customer, Product, and Date.
Diagrams allow modelers to manage and navigate large DSVs more efficiently. Instead of working with dozens or hundreds of tables in one flat space, separate diagrams let you break down the model into logical, business-focused segments. These diagrams are saved as part of the project and can be revisited or modified later as needed.
Viewing and Managing Object Properties in the DSV
The DSV Designer in Analysis Services allows you to inspect and manage metadata properties associated with tables, views, columns, and relationships. These properties help ensure data is interpreted correctly and support additional configuration needs for your cubes or dimensions.
When a table is selected in the DSV Designer, the Properties window within Business Intelligence Development Studio (BIDS) displays detailed information about the selected object. This includes the table name, its associated data source, schema, and any custom properties like friendly names, descriptions, or logical primary keys.
Columns within tables also expose their metadata. You can view and edit column-level details such as data types, nullable status, and formatting options. Relationships between tables are also fully documented and can be manually created or edited through the interface.
Having direct access to these properties in the DSV simplifies model validation and helps maintain consistency across multiple development efforts. It also aids in debugging when data relationships do not produce expected results in the cube or dimension structures.
Using Multiple Data Sources in a Single DSV
Data Source Views are not limited to a single data source. Analysis Services allows modelers to bring together tables from multiple data sources into one unified DSV. This is particularly useful for scenarios where data is distributed across different systems or storage platforms.
For example, you might have customer profile data stored in a CRM system using Microsoft SQL Server, and transactional data captured in a legacy Oracle database. By defining both as separate data sources within the SSAS project, you can then include tables from each into the same DSV.
When working with multiple sources, one source is typically designated as the primary. This source should be SQL Server-based because the Analysis Services engine uses SQL Server–specific features like OPENROWSET to integrate and process data from external providers.
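To make the OPENROWSET mechanism concrete, the following is a sketch of the kind of ad hoc query SQL Server can issue against an external OLE DB provider; the provider name is the real Oracle OLE DB provider, but the server alias, credentials, and table are hypothetical:

```sql
-- Ad hoc query through an external OLE DB provider (illustrative)
SELECT c.CustomerID, c.CustomerName
FROM OPENROWSET('OraOLEDB.Oracle',
                'Data Source=LegacyOra;User Id=reader;Password=secret;',
                'SELECT CustomerID, CustomerName FROM Customers') AS c;
```

SSAS generates such statements internally when it processes objects bound to a secondary source; you do not normally write them yourself, but they explain why the primary data source must be SQL Server, which hosts the OPENROWSET call.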
Once all desired tables are added, you can create logical relationships between them, just as you would within a single-source DSV. These relationships are based on matching keys and logical business rules. From the perspective of the model, all tables appear to be part of the same logical structure, even though they originate from different systems.
This capability makes it possible to build comprehensive analytical models without needing to consolidate all data into a single warehouse. It supports federated data architectures and hybrid integration approaches.
Limitations and Considerations for Multi-Source DSVs
While the ability to create a multi-source DSV is powerful, it also introduces certain limitations and design considerations. The most important limitation is that Analysis Services requires the primary data source to be SQL Server. This is because only SQL Server supports certain integration mechanisms used by SSAS during cube processing.
Another consideration is performance. Joining tables across different sources can impact query execution times, particularly if the data must be pulled in real-time during processing. For this reason, many organizations choose to stage or replicate the necessary data into a centralized SQL Server environment before modeling it in SSAS.
Security is also a key factor. Each data source requires valid connection credentials and appropriate impersonation settings. When multiple sources are used, SSAS must be configured to securely manage and maintain access to each backend system.
Despite these challenges, the flexibility offered by multi-source DSVs makes them a valuable option for complex environments where data integration through ETL is not immediately feasible.
Enhancing the DSV with Named Queries and Custom Tables
Named Queries allow modelers to enhance a DSV by creating custom views of the data without altering the underlying database. These queries can join multiple tables, filter rows, or perform aggregations to produce a table structure better suited for analysis.
For example, instead of using a normalized schema with separate order headers and order details, a Named Query could flatten this structure into a single virtual table. This simplifies cube design and supports more efficient measure calculations.
To create a Named Query, right-click a table in the DSV Designer and choose the option to replace it with a new Named Query. This opens a SQL query editor where you can define the custom logic. After the query is executed and validated, the results are presented as a new table within the DSV.
Named Queries are especially useful when there is a need to create multiple filtered views of the same table. For example, creating one view for current-year data and another for historical data can simplify model logic and improve cube performance.
Named Calculations, a similar feature, allows users to add derived columns to tables using expressions. These columns are calculated at runtime and behave like any other column within the model. The expression is defined within the DSV metadata and does not affect the underlying source system.
Customizing Data in the Data Source View
In many cases, existing tables do not contain all the columns needed for building analytical solutions. To support these additional requirements without modifying the source tables, Analysis Services allows users to add calculated columns directly within the Data Source View.
Calculated columns are metadata-level constructs that appear in the DSV and behave like physical columns. They are created using expressions based on existing columns in the table. These expressions can perform mathematical operations, string manipulations, or conditional logic to generate new insights.
For example, in an Employee table, if the data includes HireDate and BirthDate columns, a calculated column could be created to compute the number of years of service for each employee. This would involve subtracting the HireDate from the current date and formatting the result accordingly.
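Assuming a SQL Server source, the years-of-service expression for such a named calculation might be written as follows; note that a bare DATEDIFF counts calendar-year boundaries, so a CASE adjustment is needed for employees whose anniversary has not yet occurred this year:

```sql
-- Named calculation "YearsOfService" (T-SQL, illustrative):
DATEDIFF(YEAR, HireDate, GETDATE())
  - CASE WHEN DATEADD(YEAR, DATEDIFF(YEAR, HireDate, GETDATE()), HireDate) > GETDATE()
         THEN 1
         ELSE 0
    END
```

The same pattern with BirthDate in place of HireDate yields an employee's age, another common derived attribute.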
To add a calculated column, open the DSV Designer, right-click the table where the column is needed, and select the option to create a new Named Calculation. This opens a dialog where you can define the column name and provide an SQL expression to compute its value. The system validates the expression before saving it.
These calculated columns are not written back to the source database. Instead, they are stored in the SSAS project metadata and evaluated during processing or querying. This non-invasive approach allows developers to enrich their data models without requiring changes to the production database.
Using Named Queries for Enhanced Flexibility
Named Queries are an essential feature within the DSV that provides even greater flexibility. Unlike calculated columns, which operate on a single table, Named Queries can span multiple tables, define joins, apply filters, and reshape data into a format that is more suitable for analysis.
When you replace a table with a Named Query, you essentially create a virtual table that uses SQL to define its structure and contents. This is useful when the relational model in the source system is not ideal for analysis, for example, when dealing with highly normalized schemas or fragmented historical data.
To create a Named Query, right-click a table in the DSV and select Replace Table with New Named Query. You can then enter a custom SQL SELECT statement that joins multiple tables, filters out unnecessary rows, or groups data for summarization.
After running and validating the query, the result is added to the DSV as a new logical table. You can create relationships from this table to others just as you would with regular tables. This capability enables more advanced modeling without altering the source systems or requiring ETL transformations upstream.
Named Queries support the reuse of business logic directly within the model and enable more efficient cube processing. They help isolate business rules, simplify measure calculations, and support scenarios where the original data layout does not align with dimensional modeling requirements.
Maintaining and Optimizing the DSV
Once a DSV has been built and customized, it must be maintained to ensure continued accuracy, performance, and usability. This involves validating relationships, removing unused tables, and refreshing schema metadata to reflect changes in the underlying source systems.
As source databases evolve, column names may change, tables may be altered, and relationships may become obsolete. SSAS does not automatically detect these changes, so it’s essential to periodically refresh the DSV. This can be done by right-clicking the DSV and choosing the option to refresh the schema. The system will compare the DSV metadata with the source and prompt you to resolve any discrepancies.
In addition, it’s good practice to remove unused tables and columns from the DSV to reduce clutter and improve usability. Unused objects not only add confusion but can also affect performance during cube processing and browsing. The DSV Designer makes it easy to right-click and delete tables that are no longer relevant.
Managing relationships is another key task. Relationships in the DSV should reflect real-world business connections between entities. Incorrect or missing relationships can lead to inaccurate aggregations or improper dimension usage. Use the DSV Designer to validate and edit relationships, ensuring that keys are properly defined and mapped.
When working with multiple diagrams in large models, naming conventions become critical. Diagram names should identify their business focus or subject area. This helps maintain order in the project and improves team collaboration.
Best Practices for Using Data Sources and DSVs in Business Intelligence Projects
Working with Data Sources and DSVs effectively requires following a set of best practices to ensure scalability, maintainability, and alignment with business requirements.
First, always use descriptive names for data sources, tables, and relationships. This enhances the readability of the model and helps new team members understand its structure. Avoid abbreviations or technical codes that might confuse business users.
Second, define logical primary keys and relationships early in the modeling process. These elements are essential for building correct dimensions and fact-to-dimension relationships. Logical keys in the DSV do not need to match physical constraints in the database, but should represent the correct business identity.
Third, leverage Named Queries and Calculated Columns strategically. While they provide flexibility, overuse can result in complex models that are hard to manage. Limit these customizations to scenarios where they add measurable value to the cube structure or user experience.
Fourth, document your model thoroughly. Use comments within the DSV and include business definitions for each table and column. Good documentation reduces dependence on individual developers and speeds up onboarding for new analysts and engineers.
Fifth, test DSV changes in a development environment before deploying them to production. Since DSVs underpin cubes and dimensions, any modification can have a wide-reaching impact. Unit tests, regression tests, and test processing should be part of the workflow for any DSV update.
Finally, align the DSV with the needs of your cube design. The DSV should serve as a clean, simplified abstraction over potentially messy source systems. It should present the data in a way that aligns with analytical goals, whether that means supporting historical trend analysis, current performance dashboards, or predictive modeling.
Real-World Use Cases and Implementation Scenarios
In real-world projects, Data Source Views play a central role in connecting disparate data sources into a unified analytical model. A global retailer, for instance, might have sales data in an ERP system, customer loyalty data in a CRM, and marketing campaign data stored in cloud-based platforms. By defining separate data sources and combining them in a DSV, the company can analyze customer behavior, campaign effectiveness, and regional sales trends in one integrated model.
In healthcare settings, hospitals often maintain separate systems for patient records, billing, and clinical trial data. A DSV allows these datasets to be joined logically, enabling population health analysis, treatment outcomes tracking, and operational efficiency reporting without compromising the integrity of the source systems.
In finance, DSVs support modeling complex hierarchies such as a chart of accounts, budget cycles, and legal entities. Multiple data sources are frequently used, such as regulatory systems, general ledger platforms, and internal performance tracking tools. The DSV serves as the bridge that brings these datasets together for multidimensional analysis and reporting.
Government agencies may use DSVs to consolidate data from various departments—such as housing, education, transportation, and finance—into a comprehensive model for resource planning and impact assessments.
Across all these industries, Data Source Views help bridge the technical and business divide. They abstract the raw structure of the source data into a more meaningful, usable form. This enables better decision-making, faster insight delivery, and more flexible adaptation to changing business needs.
Final Thoughts
Understanding and effectively utilizing Data Sources and Data Source Views (DSVs) is fundamental to building robust, scalable, and insightful business intelligence solutions in Analysis Services. These components serve as the critical link between raw data stored in various systems and the structured analytical models that deliver actionable insights to business users.
Data Sources provide the foundation by establishing secure and efficient connections to operational databases, data warehouses, and other enterprise systems. The flexibility to use both OLE DB and .NET providers ensures compatibility across a wide array of platforms and technologies. Through impersonation options and provider configurations, developers can manage data access in a way that aligns with both security requirements and project needs.
DSVs, on the other hand, offer a logical abstraction over these data sources. They allow you to shape, organize, and enrich your datasets before building dimensions, cubes, or mining models. The ability to define relationships, create Named Queries, and add Calculated Columns turns the DSV into a powerful data modeling layer that aligns technical structures with business logic.
Customization features like Named Queries and logical relationships make it possible to adapt to real-world scenarios where source data may be fragmented, overly normalized, or inconsistent. DSVs also help reduce dependencies on external ETL processes by allowing lightweight transformations within the model.
However, with this flexibility comes the responsibility of ongoing maintenance and governance. As source systems evolve, DSVs must be updated to reflect schema changes, remove deprecated objects, and validate relationships. Well-documented, logically structured, and regularly reviewed DSVs contribute to long-term project success and ease of collaboration across development teams.
In today’s data-driven world, where organizations rely on timely, accurate, and integrated information for strategic decisions, mastering the use of Data Sources and DSVs in Analysis Services is not just a technical skill—it is a strategic capability. Whether you’re building analytical solutions for retail, healthcare, finance, or government, leveraging these tools effectively can make the difference between a functional data warehouse and a truly impactful business intelligence platform.
By approaching your data modeling efforts with clarity, discipline, and attention to both business context and technical detail, you can ensure that your solutions are not only correct and performant but also meaningful to the people who rely on them every day.