Pinterest open-sources big data analytics tool Querybook

Pinterest today open-sourced Querybook, a data management solution for remote engineering collaboration at enterprise scale. The company says the tool, which it uses in-house, can help engineers compose queries, create analyses, and collaborate with one another through a notebook interface.

Querybook started in 2017 as an internal project at Pinterest. Early on, the development team settled on a document-like interface that lets users write queries and analyses in one place, with metadata kept alongside them and the simplicity of a note-taking app. Querybook was released internally in March 2018 and has become the preferred solution for big data analytics at Pinterest. It now averages 500 daily active users and 7,000 daily query runs.

“With Querybook, Pinterest engineers have merged the power of metadata with the simplicity of a note-taking app for a better query experience where teams can query and write analysis in one place,” a spokesperson told VentureBeat. “Querybook can be set up and deployed in minutes.”

Every query executed in Querybook is parsed to extract metadata such as referenced tables and query runners. Querybook uses this information to automatically update the data schema and search ranking, as well as to display the common users and query examples of a table. The more queries there are in Querybook, the better the tables are documented.
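
The idea of mining executed queries for table metadata can be illustrated with a minimal Python sketch. This is not Querybook's actual implementation: the regular expression, the `extract_tables` function, and the `TableUsageIndex` class are all hypothetical names introduced here, and a real system would use a proper SQL parser rather than a pattern match (this one would, for example, mistake a CTE name for a table).

```python
import re
from collections import Counter, defaultdict

# Assumption for illustration: a table name is the identifier that
# directly follows FROM or JOIN. Subqueries ("FROM (SELECT ...") are skipped
# because "(" cannot start an identifier.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_tables(query):
    """Return distinct lowercased table names in order of first appearance."""
    seen = []
    for name in TABLE_PATTERN.findall(query):
        name = name.lower()
        if name not in seen:
            seen.append(name)
    return seen


class TableUsageIndex:
    """Hypothetical in-memory store of per-table users and example queries."""

    def __init__(self):
        self.users = defaultdict(Counter)   # table -> Counter of user names
        self.examples = defaultdict(list)   # table -> queries that touched it

    def record(self, user, query):
        """Attribute one executed query to every table it references."""
        for table in extract_tables(query):
            self.users[table][user] += 1
            self.examples[table].append(query)

    def top_users(self, table, n=3):
        """Return up to n users who queried the table most often."""
        return [u for u, _ in self.users[table.lower()].most_common(n)]


index = TableUsageIndex()
index.record("alice", "SELECT * FROM pins.events e JOIN users u ON e.uid = u.id")
index.record("bob", "select count(*) from Users")
index.record("alice", "SELECT id FROM users")
print(index.top_users("users"))
```

Each recorded query adds to the per-table counts and examples, which is why, under this model, tables become better documented as more queries run.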

Querybook also has an administration interface that companies can use to configure query engines, table metadata collection, and access permissions. Administrators can use this interface to make live changes to Querybook without touching any code or configuration files. Visualizations can also be created, including line, bar, stacked area, pie, donut, scatter, and table charts.

“The common starting point for any analysis at Pinterest is an ad hoc query that is run on the internal Hadoop or Presto cluster. To make these improvements continually, especially in an increasingly remote environment, it’s more important than ever for teams to query, conduct analysis, and collaborate with one another,” Pinterest wrote in a blog post. “We designed Querybook to provide a responsive and simple web interface for such analysis, so that data scientists, product managers, and engineers can get the right data, assemble their queries, and share their results.”

Pinterest previously open-sourced Teletraan, a tool for deploying code on virtual machines, such as those available through Amazon Web Services in the public cloud. Before that, the company released Terrapin, software that moves data more efficiently out of Hadoop’s open source big data software and makes it available to other systems.

VentureBeat
