Thursday, 24 April 2014

Sequence Container in SSIS - Different ways of using Sequence Containers

Integration Services includes the Sequence container, which makes it simple to divide the control flow in a package into groups of tasks and containers that you can manage as a unit.

Using Sequence containers lets you handle the control flow in a package in more detail, without having to manage individual tasks and containers.
If a package has many tasks, it can be helpful to group the tasks in Sequence containers.

I'll explain different scenarios where Sequence Container paly an important role to implement business rules.

Scenario 1:
  • Sequence Task SC1 must execute first. SC1 may have many task with or without precedence constraints. The group of these task must execute prior to any other tasks in the package.
  • Sequence Task SC2 must start after completion of SC1. SC2 must execute irrespective of Failure/Success of SC1.
  • Sequence Task SC3 contains Maintenence Plan Tasks and should be followed by SC2. SC3 must execute only after successfully execution of SC2. 

Scenario 2:

  • Sequence Task SC1 must execute first.
  • Sequence Task SC2 must start based on logical condition. This condition can be based on package variables. 
  • Sequence Task SC3 contains Maintenence Plan Tasks and should be followed by both SC1 and SC2. SC3 must execute after successfully execution of SC1. SC3 should execute regardless of SC2 but if SC2 executes, SC3 must execute after SC2.

  Scenario 3: This is real scenario in many BI applications.

  • SC1 must execute prior to other task. This Sequence Container may contain many tasks to pull data from hetrogenious data sources.
  • SC2 should be followed by SC1. This can be used to Insert/Update dimension tables and fact tables along with intermediate tables (if any). This container may further contain Sub - Sequence Containers e.g. SC2(a) to handle dimension Insert/Update and SC2(b) to update Intermediate and Fact tables.
  • SC3 should be followed by SC2. This may contains SSAS Tasks to Build/Prcoess Cubes. This container may also have Sub - Sequence Containers e.g. SC3(a) to check whether Current Month partition exists or not. If Yes than Do Nothing Else Create Current Month Partitions using Script task. SC3(b) can be used to Process Dimensions and Current Month Partitions.
  • SC4 can be used for Maintenence Plan tasks. If SC3 successfully completed (Evolution operation - Expression and Constraint) than take Backup of Cube, Shrink Datamart and Take Backup of Datamart.
  • SC5 can be used for Recovery Paln. If SC3 fails, Restore Cube from previos & Latest Backup available..

However, there are many more ways of utilizing Sequence Containers. Further, things depends on complexity of business requirement rules defined.

Sequence Container in SSIS

The Sequence Container defines a control flow that is a subset of the control flow in a package. Sequence containers group the package into multiple separate control flows, each containing one or more tasks and containers that run within the overall package control flow.

There are some benefits of using a Sequence container which are mentioned below:
  • Provides the facility of disabling groups of tasks to focus debugging on one subset of the package control flow.
  • Managing multiple tasks in one location by setting properties on a Sequence Container instead of setting properties on the individual tasks.
  • Provides scope for variables that a group of related tasks and containers use.
For more information about using Sequence Containers, see Different ways of using Sequence Containers
Using Sequence Containers lets you handle the control flow in more detail, without having to manage individual tasks and containers. For example, you can set the Disable property of the Sequence container to True to disable all the tasks and containers in the Sequence container.

If a package has many tasks then it is easier to group the tasks in Sequence Containers and you can collapse and expand Sequence Containers. 
Note: You can also create task groups which collapse and expand using the Group box this is a design-time feature that has no properties or run-time behavior.

Now I'll explain you how to use sequence Container in a package.
1. To begin, right click on SSIS Packages folder in Solution Explorer and click New SSIS Package. Rename it with SequenceContainer.dtsx as shown below:













2. Add a package variable DayOfWeek









3. Drag and drop Script Task from toolbox. This task will be used to initialize DayOfWeek variable with current day of week. Rename Task name as Script Task - Set DayOfWeek.
4. Double click on Script Task to open Script Task Editor. Enter User::DayOfWeek in ReadOnlyVariables property.
5. Click on Edit Script... and write below code in Main() function of ScriptMain.cs:

I have just attached an image of code to avoid formatting problems of the code.

6. Drag and Drop 7 Sequence Container tasks from Toolbox and rename the task on Week Days e.g. "SC - Sunday", "SC - Monday" etc.
I am not placing any controls flow items inside Sequence Container to narrow down the example and focus more on how to use sequence container.

7. Connect all these Sequence Containers to Script Task. Now double click on green arrow to open Precedence Constraint Editor. Select Evaluation operator as Expression and Constraint and Expression as @DayOfWeek == "Sunday". Click OK to close and save changes. The Expression for remaining task will be differ as per week day e.g.@DayOfWeek == "Monday" for Monday and so on.


















8. Save the package. Now right click on the package in Solution Explorer and execute the package.
















9. This package will execute only one sequence container as per expression set for the precedence constraints.

















This is just an example of how we can use Sequence Container. But we can use this in many ways which depends on the requirement.

Create Shared Data Source in SSRS

A shared data source is a set of data source connection properties that can be referenced by multiple reports, models, and data-driven subscriptions that run on a Reporting Services report server. It provide an easy way to manage data source properties in our Report Project.

When we create a Data Source we can specify it as shared. In general rule you probably want to create Shared Data Sources. If a Data Source isn’t shared it just means that its definition is stored inside the report and it cannot be shared with another report and if Data Source is Shared it means that its definition is stored outside the report.
In this section we will create a Shared Data Source.
1.  After Creating Sql Server Report Project, Right-click on Shared Data Sourcesfolder in your project and Click on Add New Data Source.
Create Shared Data Source in SSRS-1
2. A new window Shared Data Sources Properties pop-up as shown below. Give a name to your shared data source.
As our database is AdventureWorks, I am giving a name sdsAdventureWorks.
Create Shared Data Source in SSRS-2
3. Then select type of DataSource. Here we are going to Connect Sql Server database so we select Microsoft Sql Server. If you want to connect any other database then select that database from Drop Down List.
Create Shared Data Source in SSRS-3
4.  Then click on Edit button to the right. A new window Connection Propertiespops-up.In this window give your sql server name, sql server credentials and select AdventureWorks Database.
Create Shared Data Source in SSRS-4
5.  Connection Properties Window will be different based upon your data source.Here you can also test your database connection. To check connection click on Test Connection Button in left bottom corner. If succeed than it will give below message.
Create Shared Data Source in SSRS-5
Then click on OK button of Test Results Window. After than click on OK button of Connection Properties Window.
6.  So now you can see the connection string in your Shared Data Source Properties Window.
Create Shared Data Source in SSRS-6
Now click on Ok button.
7.  So your new shared data source will be created under Shared Data Source folder in your project. You can create multiple shared data source in your project.
Create Shared Data Source in SSRS-7
Congratulations! Our Shared Data Source is created.

Recursive Hierarchy Group in SSRS 2008

In this article, I am going to explain how to create a Recursive Hierarchy Group in SSRS 2008 to display hierarchical levels. This is helpful when we want to display hierarchical data in a report e.g. employees in an organizational chart as shown below: 


Pre-requisites for this article:
  1. SQL Server 2008 along with SSRS installed
  2. Business Intelligence Development Studio (BIDS)
  3. AdventureWorksDW2008R2 database. Click here to install this database from codeplex.
  4. Familier with SSRS
Below are the stpes to create Recursive Hierarchy Group:
STEP 1: To begin, create new report in your existing solution. Add new Shared Data Source. If you don't know how to create a shared data source, click Creating Shared Data Source in SSRS 2008.

STEP 2: Create a new dataset for the report. Just right click on the data source and click Add Dataset... to create new data set as shown below:
In Dataset Properties window, enter dsMain in Name textbox and below T-SQL code in Query textbox:
SELECT [EmployeeKey],[ParentEmployeeKey
,[FirstName] + ' ' + ISNULL([MiddleName],'') + ' ' + [LastName] EmployeeName],[Title]
FROM [AdventureWorksDW2008R2].[dbo].[DimEmployee]
Finally click OK to close the window and save changes. The Report Data will look like below image:

STEP 3: Drag and drop Table control and map with dsMain dataset. Select EmployeeName in first column, Title in second column and =Level() expression in the third column. Set following properties for table control:
  • Header Background color = DarkBlue
  • Header Color = White
  • Font = Verdana, 8pt


STEP 4:  Set Group Properties to reference Recursive Parent. Select table control --> click View and check Groupingas shown below:

Open Row Group properties, click on add button and select [EmployeeKey] in Group on box.
Click on Advance tab of Group Properties window and select [ParentEmployeeKey] in Recursive parent box. Click OK and save changes.
STEP 5: We are done with report development. Click on Preview to generate the report. You can see a report like one shown below:
Here, you can see the Employee Hierarchy but not as per standard. We need to format the text to make it better report. Do below formating for EmployeeName textbox:
1) Enter below expression in FontWeight under Font Property:
=IIF(Count(Fields!EmployeeName.Value, "Details", Recursive) > 1, "Bold", "Normal")

2)  Enter below expression in LeftIndent under Indent Property:
=CStr(2 + (Level()*20)) + "pt"

Again Preview the report, report will look like below image:

Wednesday, 24 July 2013

SQL Server complete : Part:-3 Normalization , 1NF, 2NF, 3NF

Normalization
The relational model also defines normalization rules (also known as normal forms). Normalization is a formal mathematical process to guarantee that each entity will be represented by a single relation. In a normalized database you avoid anomalies upon data modification and keep redundancy to a minimum without sacrificing completeness. If you follow Entity Relationship Modeling (ERM), and represent each entity and its attributes, you probably won't need normalization; rather you will apply normalization only to reinforce and assure that the model is correct. The following sections briefly cover the first three normal forms (1NF, 2NF, and 3NF) introduced by Codd.

1NF
The first normal form says that rows in the table must be unique, and attributes should be atomic. This is a redundant definition of a relation; in other words, if a table truly represents a relation, it is in first normal form.

Uniqueness of rows is achieved by defining a unique key in the table.

You can only operate on attributes with operations defined as part of the attribute's datatype. Atomicity of attributes is subjective in the same way that the definition of a set is subjective. For example, take an employee name in an Employees relation. Should it be expressed with one attribute (fullname), two (firstname and lastname), or three (firstname, middlename, and lastname)? That depends on the application. If the application will need to manipulate the employee name parts separately (such as for search purposes), it makes sense to break them apart; otherwise, it doesn't.

In the same way that an attribute might not be atomic enough based on the needs of the application, an attribute might be subatomic. For example, if an address attribute is considered atomic for a given application, not including the city as part of the address would violate the first normal form.

This normal form is often misunderstood. Some people think that an attempt to mimic arrays violates the first normal form. An example would be defining a YearlySales relation with the following attributes: salesperson, qty2006, qty2007, and qty2008. However, in this example you don't really violate the first normal form; you simply impose a constraint—restricting the data to three specific years: 2006, 2007, and 2008.

2NF
The second normal form involves two rules. One rule is that the data must meet the first normal form. The other rule addresses the relationship between nonkey and candidate key attributes. For every candidate key, every nonkey attribute has to be fully functionally dependent on the entire candidate key. In other words, a nonkey attribute cannot be fully functionally dependent on part of a candidate key. Informally, if you need to obtain any nonkey attribute value, you need to provide the values of all attributes of a candidate key from the same row. You can find any value of any attribute of any row if you know all attribute values of a candidate key.

As an example of violating the second normal form, suppose that you define a relation called Orders that represents information about orders and order details (see Figure 1-1). The Orders relation contains the following attributes: orderid, productid, orderdate, qty, customerid, and companyname. The primary key is defined on orderid and productid.


The second normal form is violated in Figure 1-1 because there are nonkey attributes that depend only on part of a candidate key (the primary key in this example). For example, you can find the orderdate of an order, as well as customerid and companyname, based on the orderid alone. To conform to the second normal form, you would need to split your original relation into two relations: Orders and OrderDetails (as shown in Figure 1-2). The Orders relation would include the attributes: orderid, orderdate, customerid, and companyname. The primary key is defined on orderid. The OrderDetails relation would include the following attributes: orderid, productid, and qty, with the primary key
defined on orderid, productid.


3NF

The third normal form also has two rules. The data must meet the second normal form. Also, all nonkey attributes must be dependent on candidate keys nontransitively. Informally this rule means that all nonkey attributes must be mutually independent. In other words, one nonkey attribute cannot be dependent on another nonkey attribute.

Our Orders and OrderDetails relations now conform to the second normal form. Remember that the Orders relation at this point contains the attributes orderid, orderdate, customerid, and companyname with the primary key defined on orderid. Both customerid and companyname depend on the whole primary key—orderid. For example, you need the whole primary key to find the customerid who placed the order. Similarly, you need the whole primary key to find the company name of the customer who placed the order. However, customerid and companyname are also dependent on each other. To meet the third normal form, you need to add a Customers relation (shown in Figure 1-3) with attributes customerid (primary key) and companyname and remove the companyname attribute from the Orders relation.




Informally, 2NF and 3NF are commonly summarized with the sentence: "Every nonkey attribute is dependent on the key, the whole key, and nothing but the key—so help me Codd."


There are higher normal forms beyond Codd's original first three normal forms that involve compound primary keys and temporal databases.

SQL Server complete : Part:-2 Missing Values & Constraints

Missing Values

One aspect of the relational model is the source of many passionate debates—whether propositions should be restricted to use two-valued predicate logic. That is, using two-valued predicate logic, a proposition is either true or false. If a proposition is not true, it must be false. However, some say that there's room for three-valued predicate logic (or even four-valued), taking into account cases where something is unknown. Take, for example, a cellphone attribute of an Employees relation. Suppose that a certain employee's cell phone number is missing. How do you represent this fact in the database? In a three-valued logic implementation, the cellphone attribute should allow a special mark for a missing value.

Some people believe that three-valued predicate logic is nonrelational, while others believe that it is relational. Codd actually advocated four-valued predicate logic, saying that there were two different cases of missing values: missing but applicable, and missing but not applicable. An example of missing but applicable is when an employee has a cell phone, but we don't know what the cell phone number is. An example of missing but not applicable is when an employee doesn't have a cell phone at all. Per Codd, two special markers should be used to support these two cases of missing values. SQL implements three-valued predicate logic by supporting the NULL mark to signify the generic concept of a missing value. Support for NULLs and three-valued predicate logic in SQL is the source of a great deal of confusion and complexity, though one can argue that missing values are part of reality and the alternative—using two-valued predicate logic—is no less problematic.

Constraints

One of the greatest benefits of the relational model is having data integrity defined as part of the model. Integrity is achieved through rules, or constraints, that are defined in the data model and enforced by the RDBMS. The simplest methods of enforcing integrity are the attribute type and its NULLability (whether it supports or doesn't support NULLs), which enforce domain integrity. Constraints are also enforced through the model itself; for example, the relation Orders(orderid, orderdate, duedate, shipdate) allows three distinct dates per order, while the relations Employees(empid) and EmployeeChildren(empid, childname) allow zero to countable infinity of children per employee.

Other examples of constraints include candidate keys that provide entity integrity and foreign keys that provide referential integrity. A candidate key is a key defined on one or more attributes preventing more than one occurrence of the same tuple (row) in a relation. A predicate based on a candidate key can uniquely identify a row (such as an employee). You can define multiple candidate keys in a relation. For example, in an Employees relation you can define candidate keys on employeeid, on ssn (social security number), and others. One of the candidate keys is arbitrarily chosen as the primary key (say, employeeid in the Employees relation), and is used as the preferred way to identify a row. All other candidate keys are also known as alternate keys.


Foreign keys are used to enforce referential integrity. A foreign key is defined on one or more attributes of a relation (known as the referencing relation) and references a candidate key in another (or possibly the same) relation. This constraint restricts the values in the referencing relation's foreign key attributes to the values that appear in the referenced relation's candidate key attributes. For example, say that the Employees relation has a foreign key defined on the attribute departmentid, referencing the primary key attribute departmentid in the Departments relation. This means that the values in Employees.departmentid are restricted to the values that appear in Departments.departmentid.

SQL Server complete : Part:-1 Introduction & The Relational Model

Introduction


SQL is both an ANSI and ISO standard language based on the relational model, designed for querying and managing data in an RDBMS.

In the early 1970s, IBM developed a language called SEQUEL (short for Structured English QUEry Language) for their RDBMS product called System R. The name of the language was later changed from SEQUEL to SQL because of a trademark dispute. SQL first became an ANSI standard in 1986, and an ISO standard in 1987. Since 1986, ANSI and ISO have been releasing revisions for the SQL standard every few years. So far, the following standards have been released: SQL-86 (1986), SQL-89 (1989), SQL-92 (1992), SQL:1999 (1999), SQL:2003 (2003), SQL:2006 (2006), and SQL:2008 (2008).

Interestingly, SQL resembles English and is also very logical. Unlike many other programming languages, SQL requires you to specify what you want to get and not how to get it. The task of the RDBMS is to figure out the physical mechanics of processing your request.

SQL has several categories of statements, including Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL). DDL deals with object definitions and includes statements such as CREATE, ALTER, and DROP. DML allows you to query and modify data and includes statements such as SELECT, INSERT, UPDATE, DELETE, and MERGE. It's a common misunderstanding that DML includes only data modification statements, but as I mentioned, it also includes SELECT. DCL deals with permissions and includes statements such as GRANT and REVOKE. I will focus on DML.


The Relational Model

The relational model is a semantic model for representing data and is based on set theory and predicate logic. It was created by Dr. Edgar F. Codd, and later explained and developed by Chris Date, Hugh Darwen, and others. The first version of the relational model was proposed by Codd in 1969 in an IBM research report called "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks." A revised version was proposed by Codd in 1970 in a paper called "A Relational Model of Data for Large Shared Data Banks" published in the journal Communications of the ACM.

The goal of the relational model is to enable consistent representation of data with minimal or no redundancy and without sacrificing completeness, and to define data integrity (enforcement of data consistency) as part of the model. An RDBMS is supposed to implement the relational model and provide the means to store, manage, enforce the integrity of, and query data. The fact that the relational model is based on a strong mathematical foundation means that given a certain data model instance (from which a physical database will later be generated), you can tell with certainty when a design is flawed, rather than relying solely on intuition.

The relational model involves concepts such as propositions, domains, n-ary relations, n-tuples, ordered pairs, and so on. For non-mathematicians these concepts can be quite intimidating. The sections that follow cover some of the key aspects of the model in an informal, nonmathematical manner and explain how they relate to databases.


Propositions, Predicates, and Relations

The common belief that the term relational stems from relationships between tables is incorrect. "Relational" actually pertains to the mathematical term relation. A relation is a representation of a set in set theory. In the relational model, a relation is a set of related information, with the implementation in the database being a table. A key point in the relational model is that a single relation should represent a single set (for example, Customers). It is interesting to note that operations on relations (based on relational algebra) result in a relation (for example, a join between two relations).

When you design a data model for a database, you represent all data with relations (tables). You start by identifying propositions that you will need to represent in your database. A proposition is an assertion or a statement that must be true or false. For example, the statement "employee Itzik Ben-Gan was born on February 12, 1971 and belongs to department IT" is a proposition. If this proposition is true, it will manifest itself as a row in a table of Employees. A false proposition simply won't manifest itself.

The next step is to formalize propositions. You do this by taking out the actual data (the body of the relation) and defining the structure (the heading of the relation)—for example, by creating predicates out of propositions. The heading of a relation comprises a set of attributes. Note the use of the term "set"; in the relational model, attributes are unordered. An attribute is identified by an attribute name and a domain (type) name. For example, the heading of an Employees relation might consist of the following attributes (expressed as pairs of attribute name and type name): employeeid integer, firstname character string, lastname character string, birthdate date, departmentid integer. A domain or type is one of the most fundamental relational building blocks. A domain is the set of possible/valid values for an attribute. For example, the domain INT is the set of all integers in the range–,147,483,648 to 2,147,483,647. A domain is one of the simplest forms of a predicate in our database because it restricts the attribute values that are allowed. For example, the database would not accept a proposition where an employee birth date is February 31, 1971 (not to speak of birth date "abc"!). Note that omains are not restricted to base types such as integers or character strings. For example, a domain can be an enumeration of possible values, such as an enumeration of possible job positions. A domain can be complex. Probably the best way to think of a domain is as a class—encapsulated data and

the behavior supporting it. An example of a complex domain would be a geometry domain that supports polygons.