We continue our discussion of modern data consultancies with a consideration of the various roles required to build one. The roles will vary depending on the type of consultancy, but all three types share a common core, and these shared roles are what we will consider here. We will divide them into two categories: technical and commercial. The technical roles are responsible for building and delivering analytical value, while the commercial roles are responsible for acquiring work for the consultancy and managing its relationships with clients. We will begin our discussion by presenting an example architecture, which will be used to clarify the responsibilities of the technical roles as they are introduced.

Prologue: An Example Architecture

Let us assume that our client wishes to build an entire platform to address all of their analytical needs. The client has a variety of transactional and non-transactional sources to pull data from. The data comes in structured, semi-structured and unstructured flavors, and the expectation is that the platform should handle all three gracefully. The client also expects analysis to be performed both in real time and in batch mode over historical data. The community within the client’s organization that will consume the data maintained on the platform will include a mix of analysts and scientists who will enrich the data with their work, and we are expected to feed the insights they glean back into the platform to further enhance the value of the data maintained on it.

One possible architecture that might be a good fit for this scenario is a data lakehouse. In very simple terms, a data lakehouse combines the best features of a data lake and a data warehouse. The data lake maintains the incoming data in varying stages of refinement, from raw to curated. The data warehouse presents some (or all) of the data maintained in the data lake in a structured form based on dimensional modeling (or a similar modeling strategy) for analytical consumption. The data warehousing capability could be completely virtual (as in the Data Lakehouse architectures presented by Databricks) or physical (in an architecture sometimes referred to as a Modern Data Warehouse). Data maintained in both the data lake and the data warehouse is routinely enriched using machine learning capabilities, or at least that is the expectation. Data from both is consumed by a number of tools in the form of reports, dashboards and custom applications. An example of this oversimplified architecture can be seen in Figure 1 below. A cohort of professionals will be required to build out and maintain this architecture, and we will use it to discuss the roles involved in building its different parts.

Figure 1: A Simplified Data Lakehouse Architecture
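To make the difference between the lake's refinement stages and the warehouse's dimensional presentation a little more concrete, here is a minimal Python sketch. It assumes a hypothetical orders feed; the record fields, the `curate` and `to_star_schema` helpers and the single customer dimension are illustrative inventions, and a real lakehouse would do this work with its platform's own tooling rather than with in-memory lists.

```python
from dataclasses import dataclass

# Hypothetical records as they might land in the lake's raw zone:
# every value is a string and some records are incomplete.
RAW_ORDERS = [
    {"order_id": "1", "customer": " Acme ", "amount": "120.50"},
    {"order_id": "2", "customer": "Globex", "amount": "75"},
    {"order_id": "3", "customer": "acme", "amount": "30"},
    {"order_id": "4", "customer": "Initech", "amount": ""},  # missing amount
]


@dataclass(frozen=True)
class CuratedOrder:
    order_id: int
    customer: str
    amount: float


def curate(raw_rows):
    """Raw -> curated: enforce types, normalize names, drop unusable rows."""
    curated = []
    for row in raw_rows:
        if not row.get("amount"):
            continue  # a real pipeline might quarantine such rows instead
        curated.append(
            CuratedOrder(
                order_id=int(row["order_id"]),
                customer=row["customer"].strip().title(),
                amount=float(row["amount"]),
            )
        )
    return curated


def to_star_schema(orders):
    """Curated -> warehouse: a customer dimension plus an order fact table."""
    dim_customer = {}  # customer name -> surrogate key
    fact_orders = []
    for order in orders:
        # setdefault assigns the next surrogate key only for unseen customers
        key = dim_customer.setdefault(order.customer, len(dim_customer) + 1)
        fact_orders.append(
            {"order_id": order.order_id, "customer_key": key, "amount": order.amount}
        )
    return dim_customer, fact_orders


curated = curate(RAW_ORDERS)
dim_customer, fact_orders = to_star_schema(curated)
print(dim_customer)  # {'Acme': 1, 'Globex': 2}
print(fact_orders)
```

Note how the two spellings of the same customer collapse into a single dimension row during curation; deciding where that kind of cleansing happens is one of the design choices that falls to the data engineers discussed below.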

Technical Roles

The core roles required to build and deliver analytical value to the consultancy’s clients are:

  • Data Engineers
  • Data Scientists
  • Data Analysts
  • DataOps Engineers
  • MLOps Engineers

Let us take each of these roles and do a deep dive on their responsibilities and contributions to the consultancy’s offering.

Data Engineers

Data engineers are responsible for acquiring data from its originating systems, transforming it and presenting it for analytical consumption. In practice, this means that data engineers select an appropriate analytical architecture for the client’s needs, then build and maintain it. The analytical architecture specifies how a set of tools (open source, or potentially from different vendors) will be used to acquire and ingest data from sources of interest, cleanse it, integrate it, transform it into different datasets, and present those datasets for consumption. The main consumers in this regard are the data scientists and data analysts (discussed shortly). Their work results in insights that can be used to enrich the body of data maintained by the engineers. The data engineers are therefore also responsible for facilitating the capture of the insights discovered by the scientists and analysts, and for integrating those insights with the data maintained on the organization’s analytical infrastructure in a positive feedback loop. Senior data engineers are usually promoted to Data Architects, who are responsible for selecting the analytical architecture, with the implementation falling to the less senior data engineers. Additionally, data engineers can be involved in pre-sales technical activities, such as contributing to demonstrations or proofs of concept. The sections of our example architecture where data engineers and architects would be involved are highlighted in Figure 2.

Figure 2: An example of the responsibilities of a Data Engineer. The highlighted arrows and processes indicate the potential flows a data engineer would be responsible for building and maintaining in this example.
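The feedback loop described above, in which insights from scientists and analysts are integrated back into the maintained data, might look something like the following minimal Python sketch. The customer records, the churn score, the `churn-v3` label and the `enrich` helper are all hypothetical; the point is only that an insight is attached through a shared key and tagged with its origin so that it can be traced later.

```python
# Hypothetical curated customer records maintained by the data engineers.
curated_customers = [
    {"customer_id": 1, "name": "Acme", "region": "EMEA"},
    {"customer_id": 2, "name": "Globex", "region": "AMER"},
    {"customer_id": 3, "name": "Initech", "region": "APAC"},
]

# Hypothetical insight produced by a data scientist: a churn-risk score per
# customer, delivered together with the model version that produced it.
churn_scores = {1: 0.82, 2: 0.15}
MODEL_VERSION = "churn-v3"


def enrich(records, scores, model_version):
    """Return new records with the insight attached; the inputs are not modified."""
    enriched = []
    for rec in records:
        out = dict(rec)  # shallow copy so the curated data stays untouched
        out["churn_score"] = scores.get(rec["customer_id"])  # None if not scored
        out["churn_model"] = model_version  # lineage: which model produced the score
        enriched.append(out)
    return enriched


enriched_customers = enrich(curated_customers, churn_scores, MODEL_VERSION)
for row in enriched_customers:
    print(row)
```

Returning new records rather than mutating the curated ones keeps the pre-enrichment data available, which makes it easier to re-run the enrichment when a newer model arrives.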

Data Scientists

Data Scientists use mathematics, statistics and computer science to extract knowledge and insights from structured, semi-structured and unstructured data. Data scientists are one of the main consumers of an organization’s analytical infrastructure, and their job is to derive insights from latent data assets. The tools used to derive insights usually take the form of machine learning and artificial intelligence models. These models are used to provide predictive (projecting performance into the future), prescriptive (uncovering possible solutions to foreseen problems before they occur) and cognitive (uncovering hidden patterns in data that can be used to gain value) analytics, and the insights are usually incorporated into the main body of maintained data with the help of data engineers. Since the form of data a data scientist needs depends on the tools and models at hand, data scientists are given access to all forms of data (unstructured, semi-structured and structured) at various levels of refinement, and it is up to them to use whichever form they see fit for their purposes. The sections of our example architecture where data scientists would be involved are highlighted in Figure 3.

Figure 3: An example of the responsibilities of a Data Scientist. A Data Scientist would be mainly responsible for the enrichment of data at any stage, and feeding that enriched data back into the data lakehouse in this example.

Data Analysts

Data Analysts build reports and dashboards to enable discovery, insights and monitoring of information maintained on an organization’s data infrastructure. Data analysts are the second main consumer of an organization’s analytical platform, and their primary job is to develop dashboards and reports that provide descriptive analytics: a complete picture of where the organization stands now and where it stood in the past. Data analysts usually consume data that has been highly refined by the pipelines built by data engineers, and present it in a visual format to the relevant stakeholders. Since data analysts are primarily focused on describing the state of the business, it is not uncommon to find that they have industry experience and are familiar with the relevant business metrics and KPIs. The sections of our example architecture where data analysts would be involved are highlighted in Figure 4.

Figure 4: An example of the responsibilities of a Data Analyst. A Data Analyst is largely responsible for the presentation of data maintained within the data lakehouse in this example for analytical consumption.

DataOps Engineers

DataOps Engineers are an emerging role that supports the consistent, safe and rapid development of analytical pipelines. Their main responsibilities lie in providing development and test environments for all three of the aforementioned roles, and in facilitating the running of tests both during development and in production to ensure the integrity of the pipelines themselves. Tests written during development ensure that the pipelines are built correctly, and continuing to run them in production ensures that the pipelines stay in a consistent state and alerts the team to any deviations from the expected running parameters. Although the tests are mainly authored by the data engineers, it is the DataOps engineers who ensure that they also run in production.
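One way to picture the idea of running the same tests in development and in production is a small set of data checks that act as test assertions during development and as monitors in production. The sketch below is a minimal Python illustration under that assumption; the check functions, the thresholds and the `alert` placeholder are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], List[str]]  # a check returns failure messages


def not_null(column: str) -> Check:
    def check(rows: List[Row]) -> List[str]:
        bad = sum(1 for r in rows if r.get(column) is None)
        return [f"{column}: {bad} null value(s)"] if bad else []
    return check


def row_count_between(low: int, high: int) -> Check:
    def check(rows: List[Row]) -> List[str]:
        n = len(rows)
        return [] if low <= n <= high else [f"row count {n} outside [{low}, {high}]"]
    return check


# Hypothetical checks authored by data engineers for an orders pipeline.
ORDER_CHECKS: List[Check] = [
    not_null("order_id"),
    not_null("amount"),
    row_count_between(1, 10_000),
]


def run_checks(rows: List[Row], checks: List[Check]) -> List[str]:
    failures: List[str] = []
    for check in checks:
        failures.extend(check(rows))
    return failures


def alert(failures: List[str]) -> None:
    # Placeholder: a real deployment might page someone or post to a channel.
    for failure in failures:
        print("ALERT:", failure)


# Development: the checks act as tests against a known-good fixture batch.
fixture = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.5}]
assert run_checks(fixture, ORDER_CHECKS) == []

# Production: the same checks run on every incoming batch and raise alerts.
todays_batch = [{"order_id": 3, "amount": None}]
failures = run_checks(todays_batch, ORDER_CHECKS)
alert(failures)  # prints: ALERT: amount: 1 null value(s)
```

Keeping the checks as plain functions is what lets the same definitions serve both purposes: the development harness asserts on them, while the production scheduler only reports.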

MLOps Engineers

MLOps Engineers support Data Scientists in building and deploying machine learning models rapidly and reliably. Among other things, they are responsible for setting up environments for developing machine learning models, deploying models to production, maintaining the collection of models deployed over time, and maintaining both a feature catalog and a dataset catalog.
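As a rough illustration of how a collection of deployed models, a feature catalog and a dataset catalog relate to one another, here is a minimal in-memory Python sketch. The `ModelRegistry` class, its stage names and the example entries (including the storage path) are hypothetical; MLOps teams would typically rely on a dedicated registry service rather than anything this simple.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    features: List[str]     # must exist in the feature catalog
    training_dataset: str   # must exist in the dataset catalog
    registered_at: datetime
    stage: str = "staging"  # hypothetical stages: staging, production, archived


@dataclass
class ModelRegistry:
    feature_catalog: Dict[str, str] = field(default_factory=dict)  # feature -> description
    dataset_catalog: Dict[str, str] = field(default_factory=dict)  # dataset -> location
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, features: List[str], training_dataset: str) -> ModelVersion:
        # Refuse models built on features or datasets nobody has catalogued.
        missing = [f for f in features if f not in self.feature_catalog]
        if missing:
            raise ValueError(f"unknown features: {missing}")
        if training_dataset not in self.dataset_catalog:
            raise ValueError(f"unknown dataset: {training_dataset}")
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(name, len(history) + 1, list(features),
                          training_dataset, datetime.now(timezone.utc))
        history.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        """Put one version (1-based) in production and archive the previous one."""
        for mv in self.versions[name]:
            if mv.stage == "production":
                mv.stage = "archived"
        self.versions[name][version - 1].stage = "production"

    def production(self, name: str) -> Optional[ModelVersion]:
        return next((mv for mv in self.versions.get(name, [])
                     if mv.stage == "production"), None)


registry = ModelRegistry(
    feature_catalog={"tenure_months": "months as a customer",
                     "monthly_spend": "average monthly spend"},
    dataset_catalog={"churn_training_v1": "lake/curated/churn/v1"},
)
registry.register("churn", ["tenure_months", "monthly_spend"], "churn_training_v1")
registry.register("churn", ["tenure_months"], "churn_training_v1")
registry.promote("churn", 1)
registry.promote("churn", 2)
print(registry.production("churn").version)  # 2
```

Linking each model version to catalogued features and datasets is what keeps the growing collection of models traceable over time.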

Commercial Roles

Along with the technical roles mentioned so far, a consultancy requires a number of additional roles to bring in new clients, maintain relationships with existing ones and keep the doors open. Since this series of articles is more of a meta-blueprint than an actual one, the discussion in this section will not be as detailed or specific as the one on the technical roles, and it is left to the reader to flesh out these roles during implementation.

The first, and perhaps the most important, of the commercial roles is sales. The sales unit is responsible for bringing new clients to the consultancy. To be successful, it must be acutely aware of the consultancy’s current capabilities (in terms of its knowledge of and experience with different architectures, tools, industries, etc.) in order to approach potential clients and convert them into actual ones. The sales unit also needs to be aware of the current bandwidth of the teams within the consultancy so as to avoid overallocating them, which could compromise the quality of the solutions being delivered.

The second most important role is that of marketing. The marketing unit is responsible for positioning the consultancy within the market and for identifying new areas in the market where the consultancy can be competitive. Positioning the consultancy is a function of the type of consultancy, its current capabilities and its targeted market segment, and it is up to the marketing unit to define this function in concert with sales and the technical roles. To identify potential future market segments, the marketing unit should be kept constantly up to date with the latest trends in analytics and/or the targeted industry, which will help it identify new areas to penetrate.

Finally, we have account management. Account managers are responsible for the ongoing relationship with a client during its tenure with the consultancy. Their main responsibilities are to maintain a high level of customer satisfaction and to identify opportunities to extend the relationship and increase its lifetime value.

Along with these commercial roles, a consultancy requires the usual complement of business units (such as Human Resources, Accounting, Finance, etc.) in order to be a viable business. These functions can be shared with other units within the organization if the consultancy is an internal one rather than a standalone business.

Variations based on Type of Consultancy

Depending on the type of consultancy being set up, there might be slight variations in the backgrounds associated with the roles mentioned above. For Brokers, the pool of talent being brokered is highly dependent on the clients engaged, so a keen awareness of each client’s operational parameters is key to acquiring and brokering technical staff with the correct skill sets and experience. For Specialists, hiring personnel across all roles who have experience in the industry targeted by the consultancy is key to being able to speak the customer’s language, understand the industry challenges they might be facing and help them address those challenges. It is not uncommon to see industry experts resident in a Specialist consultancy, as they keep awareness of the challenges faced by potential clients within the consultancy itself, and can also serve a double role in approaching clients during the early stages of an engagement. Finally, for Augmentors, the focus would be on hiring personnel with skill sets and experience relevant to the consultancy’s core technological competencies. Sales, Marketing and Account Management staff can be hired from consultancies with similar profiles in the services sector.

Epilogue

The roles presented so far reflect my particular point of view. Every organization has its own quirks when it comes to implementation, and it is not uncommon to see organizations use one role’s title to refer to another. You may, for example, see an organization advertise for a data scientist while the job description lists the responsibilities of a data engineer as described above. Another quirk is overlap: we may, for example, see a data scientist who pulls double duty as both a data scientist and a data engineer. The views presented in this series are meant to be a guideline and not a rule. You are free to implement this meta-blueprint as you see fit.