What is Big Data? We usually talk about Big Data when traditional data processing methods can no longer cope with the volume, velocity, variety and veracity (reliability) of the data. No specific amount of data qualifies as Big Data; the term usually describes a data set that exceeds the capabilities of traditional databases and data processing tools. What should you do if you need to analyse such a data set and don't have immediate access to a fully-fledged Big Data platform? In this article, we look at how you can use readily available tools to gain insight into Big Data when no specialised solution is yet available.
One such tool is Power Query, a data transformation tool that is integrated into Excel and Power BI. It allows you to perform initial data exploration and preparation before moving on to more complex analyses. Whilst Power Query is not a complete Big Data solution in itself, it is a valuable aid to understanding and preparing data for further processing.
Power Query plays a crucial role in the early stages of your workflow, before the main processing begins. It acts as a bridge between different data sources. Whether your data is in large CSV files, large databases or elsewhere, Power Query can make the connection and extract the information you need. However, it is often impractical to load millions or billions of rows directly into Excel. This is where Power Query's ability to work with subsets of the data comes into play.
There are several ways to obtain a smaller, representative sample of Big Data for analysis. Taking the first N rows provides quick insights, while selecting a random sample provides a more statistically sound set. Filtering by specific criteria, such as transactions from a particular region or period, lets you focus on relevant subsets of the data. Once a manageable sample has been selected, you can use Power Query's data preparation functions. Common tasks include dealing with missing values, standardising data formats (dates, currencies, etc.), aggregating data to a higher (coarser) level and creating calculated columns to gain new insights. For example, you can calculate each customer's share of sales from the transaction history or categorise products by sales volume.
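The three sampling approaches described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not Power Query code; the column names, the inline data and the helper functions (`first_n`, `random_sample`, `filter_rows`) are all assumptions made for the example. The random sample uses reservoir sampling, one possible technique for drawing a uniform sample in a single pass without loading the whole file.

```python
import csv
import io
import random
from itertools import islice

# Hypothetical transaction data; in practice this would be a large CSV file.
CSV_TEXT = """transaction_id,region,amount
1,North,120.50
2,South,80.00
3,North,45.25
4,East,300.00
5,South,15.75
6,North,60.00
"""

def read_rows(text):
    """Yield rows one at a time so the full file never has to sit in memory."""
    yield from csv.DictReader(io.StringIO(text))

def first_n(rows, n):
    """Take the first n rows: fast, but a sorted file can bias the result."""
    return list(islice(rows, n))

def random_sample(rows, k, seed=0):
    """Reservoir sampling: a uniform random sample of k rows in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = row     # replace with probability k / (i + 1)
    return reservoir

def filter_rows(rows, **criteria):
    """Keep only rows whose columns equal every given value."""
    return [r for r in rows if all(r[c] == v for c, v in criteria.items())]

head = first_n(read_rows(CSV_TEXT), 3)
sample = random_sample(read_rows(CSV_TEXT), 3)
north = filter_rows(read_rows(CSV_TEXT), region="North")
```

The first-N sample is the cheapest to produce, but if the source is sorted (for example by date) it reflects only one end of the data, which is why a random sample is usually the safer basis for conclusions.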
However, Power Query has limitations when working with data sets that can be labelled as Big Data. Performance can degrade significantly as the volume of data increases, and if you use Excel as the final destination for the data, the worksheet limit of 1,048,576 rows can become a hindrance. It is therefore very important to optimise the query steps that affect processing speed. Filter the data at the beginning of the process to reduce the amount of data loaded, keep the number of transformation steps low, and apply appropriate data type conversions to avoid wasting processing power. This approach is suitable for initial exploration and for analyses on a smaller scale. For tasks that require the processing of large amounts of data, specialised Big Data platforms such as Hadoop, Spark or cloud data warehouses are required.
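The advice to filter early and convert types sparingly can be illustrated with a small sketch. It is written in Python rather than Power Query, and the data, column names and the `load_filtered` function are hypothetical. The point is the order of the steps: a cheap text comparison discards most rows before any date or number conversion takes place.

```python
import csv
import io
from datetime import date

# Hypothetical export; the column names and values are assumptions.
CSV_TEXT = """order_date,region,amount
2024-01-05,North,120.50
2024-01-09,South,80.00
2024-02-11,North,45.25
2024-02-20,East,300.00
2024-03-02,North,60.00
"""

def load_filtered(text, region, start):
    """Stream the file, drop unwanted rows first, then convert types once."""
    converted = 0   # counts how many rows needed a type conversion
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        # Cheap string comparison first: most rows are dropped here.
        if row["region"] != region:
            continue
        # Convert the date only for rows that survived the first filter.
        order_date = date.fromisoformat(row["order_date"])
        converted += 1
        if order_date < start:
            continue
        kept.append({
            "order_date": order_date,
            "region": region,
            "amount": float(row["amount"]),
        })
    return kept, converted

rows, converted = load_filtered(CSV_TEXT, "North", date(2024, 2, 1))
```

In this sample only three of the five rows are ever converted, and two are kept. On a file with millions of rows the same ordering avoids most of the conversion work.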
A typical case for initial data exploration is the analysis of customer transaction data from an e-commerce platform. With Power Query, you can connect to the database that contains the transaction data and select a sample of, say, 10,000 transactions. You can then calculate the average transaction value per customer, identify the top-selling product types and segment customers based on their transaction history. This initial analysis can provide valuable insights and indicate which data sets are worth analysing in more detail using specialised tools.
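The calculations in this example can be sketched as follows. The code is an illustrative Python version, not Power Query; the six sample transactions, the field layout and the `HIGH_VALUE` threshold used for segmentation are assumptions chosen only to make the logic visible.

```python
from collections import defaultdict

# Hypothetical sample of transactions: (customer, product_type, amount).
transactions = [
    ("C1", "Books", 20.0),
    ("C1", "Electronics", 180.0),
    ("C2", "Books", 15.0),
    ("C3", "Electronics", 400.0),
    ("C3", "Electronics", 200.0),
    ("C2", "Toys", 25.0),
]

totals = defaultdict(float)         # total spend per customer
counts = defaultdict(int)           # number of transactions per customer
sales_by_type = defaultdict(float)  # total sales per product type

for customer, product_type, amount in transactions:
    totals[customer] += amount
    counts[customer] += 1
    sales_by_type[product_type] += amount

# Average transaction value per customer.
avg_value = {c: totals[c] / counts[c] for c in totals}

# Product types ordered from highest to lowest sales.
top_types = sorted(sales_by_type, key=sales_by_type.get, reverse=True)

# Simple two-way segmentation; the threshold is an assumption.
HIGH_VALUE = 150.0
segments = {
    c: "high" if totals[c] >= HIGH_VALUE else "standard" for c in totals
}
```

With this sample data, customers C1 and C3 fall into the "high" segment and Electronics is the top-selling product type. On a real sample the threshold and the number of segments would need to be chosen from the distribution of the data.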
To summarise, Power Query is a practical and accessible tool for gaining initial insights into Big Data. Its data selection and preparation functions allow users to explore huge data sets without the need for a full Big Data infrastructure, so even if no Big Data processing platform is available, you can still carry out a first, small-scale exploration. To expand your knowledge of Big Data, we recommend that you explore resources on data warehousing, cloud computing and distributed processing techniques.
If you have any comments on this article please email them to lv_mindlink@pwc.com