The F&O Twist on Process Mining – Ingest from OneLake


Investments and improvements are made by Microsoft on many different solutions. You can hardly miss the updates around Copilot and autonomous agents. In this post, I will share information about a new feature within Process Mining, which is in preview at the time of writing: an option to load event logs directly from Fabric OneLake.

Ingest data

In previous posts, I shared my experiences using a Dataflow to ingest data for analyzing a process. For example, you can read the blog where I created a new process with an event log. Using the Dataflow, you can get data from several connectors and transform it before Process Mining starts analyzing it.

When I got interested in Process Mining, my first question was how to get an event log from Microsoft Dynamics 365 processes like order to cash and procure to pay. You would need a Dynamics 365 and data model expert to help your organization get event logs with not only the minimum attributes but also additional attributes for analyzing the processes. Additional attributes can be very helpful in finding root causes in situations where, for example, processes have multiple paths.

Looking at the definition and purpose of Microsoft Fabric, it is positioned as a unified, AI-powered data platform that simplifies data management and analytics. It combines several Azure cloud services and should reduce the complexity of operations like extract, transform, and load (ETL) management. If you need to transform ERP transactions from processes, this comes with a lot of complexity, as you can read in my previous Process Mining post: Order to Cash example.

Fabric OneLake

In my previous post, I showed that several tables were made available in Microsoft Fabric by enabling the Fabric link in the Power Platform. This simplified making Dynamics 365 data available externally, where otherwise data entities would need to be used or developed. With a union query, a view was created that combined data from several tables. This was the base for an event log to be used in Process Mining. To load the data into Process Mining, I had to use the Azure SQL Server connector with the Dataflow option when creating the process. That requires an additional lookup of the connection string and database name to load data from the view into my process. For both tables and files, there was no direct connector.
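To give an idea of what such a union view could look like, here is a rough sketch. The table and column names below are simplified placeholders for an order to cash event log, not the exact query from my earlier post; each branch of the union contributes one activity type:

    CREATE VIEW dbo.SalesOrderEventLog AS
    SELECT SALESID         AS CaseId,
           'Order created' AS Activity,
           CREATEDDATETIME AS EventTime,
           DATAAREAID      AS LegalEntity
    FROM   salestable
    UNION ALL
    SELECT SALESID, 'Packing slip posted', CREATEDDATETIME, DATAAREAID
    FROM   custpackingslipjour
    UNION ALL
    SELECT SALESID, 'Invoice posted', CREATEDDATETIME, DATAAREAID
    FROM   custinvoicejour;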

Currently in preview, you will find a new option to load data into a process. As you can see in the screenshot below, a third option is provided as the type to import data from.

The first time I tried this option, I could only see an option to get data from files in my OneLake. Today, there is an option to choose between files and tables. There are some prerequisites you need to fulfill before you can load data from your OneLake. The most important setting is granting the Admin role to the user setting up the workspace and the Process Insight Prod service principal. In addition, the workspace you need to load data from should not be your My workspace. You can read more about the OneLake ingestion feature on Microsoft Learn: Ingest files from Fabric OneLake (preview) – Power Automate.

Let’s explore the next steps after selecting the OneLake (preview) option.

In the first step, you need to provide the Fabric workspace and the Lakehouse. You can then continue to step 2, where you can choose the data source: Files or Tables. The view I created in my previous example is not visible; at this moment, only the tables are listed. So there is a requirement to copy the data into either a file or another table. In this case, I used a Data pipeline that creates a file in one of the subfolders. How I created this file is explained later in this post.

When you click next to the name, the option is selected and this triggers Copilot to search for processes. If you click on the name itself, e.g. on Files, it opens that option and shows subfolders or files, depending on how this is organized in your Lakehouse.

When you select a specific folder, Copilot again gives a summary of the file(s) in that subfolder and lets you know if and what type of process is discovered. This is helpful in case there are a lot of subfolders or folders with less meaningful names.

You can click on the subfolder name to see the individual files, or directly click Next. If you click Next, it is assumed that all files in the folder belong to the same process.

In the last step, you need to link the Attribute types to the attributes as provided in the file. Then you can click Save and analyze, which directly creates the process report.
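For example, assuming the placeholder column names from the view sketch earlier in this post, CaseId would map to the Case ID attribute type, Activity to Activity, and EventTime to Start timestamp, while LegalEntity could be mapped as an additional attribute.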

Note that with this OneLake data import option, there is no additional Power Query step to transform the data before starting the report. It is assumed that all data available in OneLake is ready to use for the consumer. In this case, the file I created should be fully ready for the user. If a user should create the process only for the USMF company, there should be a file present with data for this legal entity only.

Suppose a user wants to analyze a single legal entity, or multiple legal entities in different process reports; then you can consider creating a file per legal entity, placed in a single subfolder or in multiple subfolders, as sketched below. This OneLake option does not take away the complexity for the data specialist, but it makes it easier for the user to create the process mining map with data directly available in OneLake.
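As an illustration, a possible (hypothetical) layout of the Files section in the Lakehouse could look like this, with one event log file per legal entity:

    Files/
        eventlogs/
            usmf/
                order2cash_usmf.csv
            demf/
                order2cash_demf.csv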

Data pipeline

As mentioned above, the view I created in the SQL endpoint of the Lakehouse cannot be used with the OneLake data import in Process Mining. You need to get the data into another table or into the Files section. If you are new to this, you can follow a walkthrough available on Microsoft Learn: Copy sample data into Lakehouse and transform with dataflow – Microsoft Fabric.

In this scenario, I created a new pipeline where it is possible to provide a query, read the tables, and copy the result to a file.

As in the Order2Cash example, I need to use the Azure SQL database option to read the various tables in OneLake.

The server name and database name should be provided. Then click Next.

In my post about process mining for the Order2Cash process, I shared a query. You can paste this query in this step. Then click Next.
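To give an idea, the query pasted in this step could look like the sketch below. It reuses the hypothetical view from earlier in this post and filters on a single legal entity; your actual query will differ.

    SELECT CaseId,
           Activity,
           EventTime,
           LegalEntity
    FROM   dbo.SalesOrderEventLog
    WHERE  LegalEntity = 'USMF'
    ORDER BY CaseId, EventTime;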

You can decide where to copy the data: create a new Lakehouse in your Fabric workspace or select an existing Lakehouse at the bottom of the dialog.

In this step, you can provide the file details. You can select the subfolder where you want to create the file. Note that the filename should get the required extension, as without a supported extension the file will not be visible in Process Mining.
In the next step, you can provide details about the file format. I excluded a screenshot as I did not make any changes.

In the last step, you can review the summary and start the transfer. Click Save + Run to finish the data pipeline.

On the created pipeline, you can make some changes, like renaming the Copy data step and scheduling a recurrence for the copy task.

Conclusion

The OneLake data import option makes it possible to separate tasks between a data specialist and the end user consuming the event log for process mining. There is no difference in complexity between creating a view in the SQL endpoint of your Lakehouse and creating the data pipeline to query the tables and create the file.

The end user will have an easy option to find and select the data source for their process mining report, but is not able to transform data using a Power Query interface when using the OneLake option. If that is required, the Dataflow option should be used.



I do hope you liked this post and that it will add value for you in your daily work as a professional. If you have related questions or feedback, don’t hesitate to use the Comment feature below.


That’s all for now. Till next time!

2 replies

Otto Buurman says:

Hi André, have you considered using the Virtual Entity functionality on the Power Platform? Any CRUD action can trigger a Power Automate flow, creating or updating a file or table that can be used as the source data for Process Mining.

André Arnaud de Calavon says:

Hi Otto,

Yes, I have considered this. I used this for some POCs and for evaluating the possibilities of process mining. Depending on the process and the number of records or updates, a flow can update a file or table. Actually, my first real use case with an ISV solution was maintaining an event log in a Dataverse table filled by a Power Automate flow.
