Data Mining Issues that Still Persist in 2018


Data mining has become more important than ever. Brands and consumers use a wide range of platforms to exchange data. There are a lot of applications for these platforms as well.

Many people use digital platforms to look for products and services. Unfortunately, organizations that use data mining to streamline their customer service strategies or improve their services have encountered a number of challenges.

Understanding the Scope of Data Mining

On the internet, people share views on almost every product and topic. They rate restaurants, tourist destinations, software applications and accounting services. They also share their insights on political trends.

Many digital platforms make it easy for different consumers to share their points of view on given topics. There are several reasons these platforms enable scalable data sharing:

  • Customers can share their insights any time they want and from anywhere in the world. This gives organizations access to a larger pool of information.
  • The impersonal and anonymous nature of the Internet makes people feel more comfortable about sharing information. This doesn’t just increase the amount of information customers provide. It also improves transparency.

Unfortunately, companies also have to overcome many challenges, which means they need to use more effective data mining solutions. Some of these challenges are fairly easy to solve, such as the diversity of platforms or the need to process and score data that includes spelling and grammatical issues. On the other hand, some challenges are more complex. These include dealing with opinion spam, false reporting, deep data mining and fabrication bias.

Data Mining Challenges That Must Be Overcome

In 2018, companies routinely apply data mining methodologies to social networks to establish new trends, identify solid business opportunities and monitor the effectiveness of their marketing strategies. However, mining consumer data has become more difficult for a variety of reasons. Here are six of the biggest:

  • Ability to score the level of nuance in a respondent’s view.
  • Diversity of data on each product or service.
  • Variety of platforms where user ratings are shared.
  • False inputs, which could be provided by malicious competitors or trolls.
  • Use of informal language or slang. This is a bigger problem on some Internet platforms where opinions are disseminated. The reliability of blogs, forums or social networks must be taken into account.
  • Use of emoticons within the text to describe the feelings that a given circumstance, good or service evokes. Some data mining tools have difficulty deciphering these.

When these challenges are overcome, a new challenge may arise. Some solutions introduce other data biases or make it harder to extract data. Let’s take a look at some of these issues.

Scoring the Level of Nuance

People use different social media for different purposes. Therefore, the information collected from each of them is often incomplete. This means that there may never be enough information on the customer.

When you are trying to collect data on customers, you need to be careful about treating a specific statement the same for every customer. You may find that some people lean a particular way on a specific issue, but they may not have very strong views on it. Others may technically argue the same point, but have much more intense positions.

This is a common challenge for brands trying to establish customer loyalty. Some customers may make a statement that indicates they like the product, but they have only recently started using it. They might not have enough brand loyalty to recommend it to their friends and may be willing to try another product in the future. Others may be intensely loyal followers of it.

Brands should strive to determine the level of intensity a customer has on a particular view. It is difficult to make these nuanced observations. However, it is also very necessary.
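One rough way to picture the difference between what a customer says and how strongly they say it is a score whose sign is the stance and whose size is the intensity. The sketch below is only an illustration: the word lists, the weights and the function name `intensity_score` are assumptions made up for this example, not part of any particular data mining tool.

```python
import re

# Hypothetical word lists and weights, for illustration only.
POSITIVE = {"like", "good", "love", "great"}
NEGATIVE = {"dislike", "bad", "hate", "awful"}
INTENSIFIERS = {"really": 1.5, "very": 1.5, "absolutely": 2.0}
HEDGES = {"somewhat": 0.5, "kinda": 0.5, "maybe": 0.5}


def intensity_score(text):
    """Return a signed score: the sign is the stance, the size is the intensity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    weight = 1.0  # modifier applied to the next sentiment word
    for token in tokens:
        if token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
        elif token in HEDGES:
            weight *= HEDGES[token]
        elif token in POSITIVE:
            score += weight
            weight = 1.0
        elif token in NEGATIVE:
            score -= weight
            weight = 1.0
    return score


mild = intensity_score("I kinda like it")
strong = intensity_score("I absolutely love it")
```

Both statements lean the same way, but the second customer scores four times higher than the first. A real system would need far richer language handling than a fixed word list.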

Diversity of Data

You need a large sample size of data to draw any conclusions. The amount of data that you need obviously depends on the size of the population that you are trying to observe.

Scaling data sets has been a priority for this reason. However, it has unfortunately led experts to make some other mistakes. In their quest to increase the quantity of data, they often inject other biases.

One of the most common mistakes that brands make is using the wrong traffic source to attract people to their own platforms to get input. You might use Facebook ads to attract people to your site. However, this might give you a skewed sample if only a fraction of your audience is on Facebook. International brands can run into the same problem if they are using Google Ads to get customers to their profiles. In some countries, fewer than 2% of customers use the Google search engine network.

You don’t want to have data that doesn’t adequately represent the audience that you are studying. This bias will be very difficult to untangle during the data mining process later.
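If you at least record which traffic source each respondent came from, one common mitigation is to reweight the sample so that each source counts in proportion to its share of your real audience. This is a minimal sketch of that idea, not a method the platforms themselves provide. The function names, the group labels and the audience shares are all assumptions chosen for the example.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight each group so the weighted sample matches the audience mix."""
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


def weighted_mean(ratings_by_group, weights):
    """Average the ratings, counting each one with its group's weight."""
    numerator = 0.0
    denominator = 0.0
    for group, ratings in ratings_by_group.items():
        for rating in ratings:
            numerator += weights[group] * rating
            denominator += weights[group]
    return numerator / denominator


# Hypothetical numbers: 80% of respondents arrived through Facebook ads,
# but only 40% of the real audience is assumed to be on Facebook.
ratings = {"facebook": [5] * 80, "other": [3] * 20}
counts = {group: len(values) for group, values in ratings.items()}
shares = {"facebook": 0.4, "other": 0.6}

weights = poststratification_weights(counts, shares)
adjusted = weighted_mean(ratings, weights)
```

Here the unweighted average rating is 4.6, while the reweighted average is about 3.8. Reweighting cannot help with a group that is missing from the sample entirely, so it is no substitute for choosing traffic sources carefully.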

Tradeoff Between Data Quality on a User and Data Duplication in the Data Pool

When sources of complementary information are integrated, it is possible to build a more complete and reliable user profile that can help improve the quality of online services. However, it can also create a situation where data is duplicated. If the same individual shared their opinions multiple times on the same platform or data from multiple platforms were aggregated, then one particularly vocal customer could create the impression that their views were more prevalent than they actually were.
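The effect of one vocal customer is easy to see with a small example. The sketch below assumes that each opinion has already been matched to a single user ID across platforms, which is the hard part in practice. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_rating_per_user(opinions):
    """Average ratings so that each user counts once, however often they post.

    opinions: iterable of (user_id, platform, rating) tuples.
    """
    by_user = defaultdict(list)
    for user_id, _platform, rating in opinions:
        by_user[user_id].append(rating)
    # Collapse each user to one average first, then average across users.
    return mean(mean(user_ratings) for user_ratings in by_user.values())


# Hypothetical data: "ann" posts the same complaint three times.
opinions = [
    ("ann", "forum", 1),
    ("ann", "forum", 1),
    ("ann", "blog", 1),
    ("bob", "forum", 5),
    ("cara", "blog", 5),
]

naive = mean(rating for _user, _platform, rating in opinions)
per_user = average_rating_per_user(opinions)
```

The naive average is 2.6 because one user supplies three of the five ratings. Counting each user once gives roughly 3.67.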

Dealing with These Data Mining Challenges

Some data mining challenges are more difficult to address than others. If brands don’t have the capacity to resolve all of them, then they should focus on the ones within their control.

Dealing with customer identification is the best place to start. To address problems associated with user identification on different platforms, it is a good idea to exploit the redundancies between the profiles that the same user keeps on those platforms. You must also evaluate the behavior of specific users, which includes their participation on multiple social media sites. This makes data mining and analysis much easier, because it allows you to collect more detailed data for:

  • Creating new online services.
  • Providing better online recommendations.
  • Advertising to the user and their contacts.

Dealing with opinion spam is more difficult. According to The Guardian, Amazon identified more than 1,000 people who shared false opinions last year. The company might take legal action against these users, because false opinions create a problem that affects producers, distributors and users.

How do brands identify false opinions during the data mining process? They can start by using existing data (opinions) in different databases (primarily from large platforms like social networks) to transform data into new research and results. Here are some factors that come into play:

  • This process relies on artificial intelligence, machine learning and database management to extract new patterns from large data sets, along with the knowledge associated with those patterns.
  • It extracts the data that the organization needs through automatic or semi-automatic means.
  • The techniques used during data extraction include grouping, forecasting, route analysis and predictive analysis, among others.

Data mining algorithms that evaluate and score online opinions take the comments made by different people about a specific product and analyze the reliability of the data. Data scientists must then develop a mathematical model that lets them examine both the historical data for each component and the new data. They must also identify the user, which may involve matching the user to a social media account, email address or IP address. With appropriate data mining techniques, false statements created by competitors posing as legitimate users or members of the general public can be detected.
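As a very simple illustration of matching users by IP address, the sketch below flags products that were reviewed by several different accounts from the same address. It is a heuristic made up for this example, not a description of how any platform detects opinion spam. The function name, the field names and the threshold are assumptions, and the addresses come from a range reserved for documentation.

```python
from collections import defaultdict


def flag_shared_ip_reviews(reviews, max_accounts_per_ip=1):
    """Return (product, ip) pairs reviewed by more accounts than the threshold.

    reviews: iterable of dicts with 'product', 'account' and 'ip' keys.
    """
    accounts = defaultdict(set)
    for review in reviews:
        accounts[(review["product"], review["ip"])].add(review["account"])
    return {
        key
        for key, seen_accounts in accounts.items()
        if len(seen_accounts) > max_accounts_per_ip
    }


# Hypothetical reviews for illustration.
reviews = [
    {"product": "blender", "account": "user_a", "ip": "192.0.2.10"},
    {"product": "blender", "account": "user_b", "ip": "192.0.2.10"},
    {"product": "blender", "account": "user_c", "ip": "192.0.2.77"},
    {"product": "kettle", "account": "user_a", "ip": "192.0.2.10"},
]

suspicious = flag_shared_ip_reviews(reviews)
```

Only the blender reviews from 192.0.2.10 are flagged. Households and offices legitimately share addresses, so a match like this is a reason for a closer look and not proof of a false opinion.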

Since the opinions of online users have the potential to influence the success or failure of a product, detecting users that share erroneous information should be a top priority. Brands should also take appropriate measures, such as blocking that user’s access to the website or informing the person responsible for the site that the user has abused their privileges and that future abuses may have consequences.

Dealing with Data Mining Challenges is a Top Concern in 2018

Data mining is a big challenge these days. Brands need to take the right steps to minimize bias, data duplication and other problems they will encounter. Fortunately, new data mining methodologies are making this easier.




