Azure Data Factory or SSIS

I have been a on few calls with my customers lately and there seems to be confusion when it comes to Azure Data Factory (ADF) and how it compares to SSIS. So I thought I would post a blog entry on it.

First ADF is not “SSIS in the cloud”. What makes them different you might ask? SSIS is an Extract-Transfer-Load(ETL) tool, but ADF is a Extract-Load Tool(ELT), as it does not do any transformations within the tool, instead those would be done by ADF calling a stored procedures on a SQL Server that does the transformation, or calling a Hive job, or a U-SQL job in Azure Data Lake Analytics, for example. Think of it more of an orchestration tool. SSIS has the added benefit of doing transformations in memory and is extremely fast utilizing this method of ETL, keeping in mind the performance of any transformation depends on the power of the server hardware that SSIS is installed on.

Here is a table of some of the other differences that may be helpful:

  • Azure Data FactorySQL Server Integration ServicesCloud-based service (via ADF editor in Azure portal)
  • SSIS is a desktop tool (via SQL Server Data Tools)
  • PaaS service does not require hardware or any installation
  • Requires a decent sized server that you manage and install SQL Server with SSIS
  • Uses JSON scripts for its orchestration (coding)
  • Uses drag-and-drop objects (very little to no coding)
  • Pay-as-you-go via an Azure subscription
  • License cost as part of SQL Server
  • Can spin up HDInsight clusters and run Pig and Hive scripts for BIG Data
  • Can work with HDInsight and run other scripts via the Azure Feature Pack for Integration Services (SSIS)
  • ADF has a basic editor and no intellisense or debugging as of authoring of this blog.
  • Powerful GUI, intellisense, and debugging
  • Administered via the Azure portal
  • Administered via SSMS
  • Does not have a programming SDK, has automation via PowerShell, and no third-party components
  • Programming SDK, automation via BIML, and third-party components
  • No Error Handling
  • Extensive Error Handling
  • Has “data lineage“, tagging and tracking the data from different sources
  • SSIS does not have Data Lineage
  • Currently a limited range of data sources and destinations
  • Wide range of supported data sources and destinations

Think of ADF as a complementary service to SSIS, with its main use case confined to dealing with big data in the cloud. In my opinion I would expect this product offering to evolve or be replaced with SSIS or Informatica functionality(since Microsoft has a large multi-billion investment in Informatica).