Tag Archives: big data

Big data: The next frontier for innovation, competition, and productivity

Abstract Data have become a torrent flowing into every area of the global economy. Companies churn out a burgeoning volume of transactional data, capturing trillions of bytes of information about their customers, suppliers, and operations. Millions of networked sensors are being embedded in the physical world in devices such as mobile phones, smart energy meters, [...]

Towards Trustworthy Participatory Sensing

Abstract Grassroots Participatory Sensing empowers people to collect and share sensor data using mobile devices across many applications, spanning intelligent transportation, air quality monitoring and social networking. In this paper, we argue that the very openness of such a system makes it vulnerable to abuse by malicious users who may poison the information, collude to [...]


Abstract The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering [...]

Scaling Social Science with Hadoop

Abstract “The methods of social science are dear in time and money and getting dearer every day.” — George C. Homans, Social Behavior: Its Elementary Forms, 1974…When Homans — one of my favorite 20th century social scientists — wrote the above, one of the reasons the data needed to do social science was expensive was [...]

MACOSPOL (Mapping Controversies on Science for Politics)

Abstract In modern societies, collective life is assembled through the superposition of scientific and technical controversies. The inequities of growth, the ecological crisis, the bioethical dilemma and all other major contemporary issues occur today as tangles of human and non-human actors, politics and science, morality and technology. Because of this growing hybridization complexity, getting involved [...]

Top 10 algorithms in data mining

Abstract This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we [...]

The next Google

Abstract Ten years ago this month, Google’s first employee turned up at the garage where the search engine was originally housed. What technology at a similar early stage today will have changed our world as much by 2018? Nature asked some researchers and business people to speculate — or lay out their wares. Their responses [...]

The Promise and Peril of Big Data

Abstract According to a recent report, the amount of digital content on the Internet is now close to five hundred billion gigabytes. This number is expected to double within a year…The explosion of mobile networks, cloud computing and new technologies has given rise to incomprehensibly large worlds of information, often described as “Big Data.” Using [...]

Privacy-Preserving Data Mining

Abstract A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We [...]

The Pathologies of Big Data

Abstract Scale up your datasets enough and all your apps will come undone. What are the typical problems and where do the bottlenecks generally surface? Comments An account of issues inherent to big data from an engineering perspective. Informed by constraints in computer hardware and search procedures, the author makes some back-of-the-envelope calculations with simple [...]

Live Linked Open Sensor Database

Abstract There are millions of sensors being deployed all over the world. Data generated by these sensors is provided in different formats and interfaces and is rarely associated with semantics that describe its meaning. The heterogeneity and lack of semantic descriptions pose a big barrier for accessing sensor data and combining it with other data [...]

The Human Infrastructure of Cyberinfrastructure

Abstract Despite their rapid proliferation, there has been little examination of the coordination and social practices of cyberinfrastructure projects. We use the notion of “human infrastructure” to explore how human and organizational arrangements share properties with technological infrastructures. We conducted an 18-month ethnographic study of a large-scale distributed biomedical cyberinfrastructure project and discovered that human [...]

Data Protection Law and the Ethical Use of Analytics

Abstract Organizations now work in a data-rich environment. As the Article 29 Working Group of the EU recently noted, “[W]e are witnessing a so-called ‘data deluge’ effect, where the amount of personal data that exists, is processed and is further transferred continues to grow.” From all indications, the data deluge will not only continue, but [...]

A Comprehensive Survey of Data Mining-based Fraud Detection Research

Abstract This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of [...]

Big Data and Cloud Computing: Current State and Future Opportunities

Abstract Scalable database management systems (DBMS)—both for update intensive application workloads as well as decision support systems for descriptive and deep analytics—are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from the traditional enterprise infrastructures to next generation cloud infrastructures. Though scalable data management [...]

Awash in Stardust: Data Practices in Astronomy

Abstract One of several major research initiatives into the grand challenge of data curation, the Data Conservancy (DC)…is investigating data use, sharing, and preservation in multiple fields of science. Our group at the University of California, Los Angeles is conducting a deep case study of astronomy and astrophysics…This poster will summarize findings from the first [...]

Privacy and Publicity in the Context of Big Data

Abstract Unless you’ve been hiding under a rock, you’ve witnessed all sorts of grumblings about privacy issues in relation to social media. Sometimes, this comes in the form of complete panic. “OMG, kids these days! What are they putting up online!?!?” At other times, we hear this issue emerge in relation to security issues: data [...]

Interconnected Media for Human-Centered Understanding

Abstract Today, there are many systems with large amounts of complex data sets. Visualizing these systems in a way that enlightens the user and provides a profound understanding of the respective information space is one of the big information visualization research challenges…To overcome this incapacity and to provide a solution to the dilemma of time [...]

Big Brother’s Little Helpers: How ChoicePoint and Other Commercial Data Brokers Collect and Package Your Data for Law Enforcement

Abstract Traditionally, law enforcement officers obtained information by speaking with suspects’ neighbors, employers, or friends. They would analyze paper arrest records and crime reports. In order to obtain personal information stored in private databases, they would have to call a variety of different vendors. The shift to a digital environment has brought many changes to [...]