Data | Software Engineering Daily (public)
Modern companies leverage dozens or even hundreds of software solutions to meet specific business needs. Organizations need to consolidate these disparate data sources into a data warehouse in order to add value. The raw data typically needs transformation before it can be analyzed. In many cases, companies develop homegrown solutions, thus…
 
Instabase is a technology platform for building automation solutions. Users deploy it onto their own infrastructure and can leverage the tools offered by the platform to build complex workflows for handling tasks like income verification and claims processing. In this episode we interview Anant Bhardwaj, founder of Instabase. He describes Instabase…
 
Shinji Kim is Founder and CEO of Select Star. In this episode we discuss data discovery and more. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com The post Data Discovery with Shinji Kim appeared first on Software Engineering D…
 
Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data (influxdata.com). The platform InfluxData is designed for build…
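The downsampling and aggregation mentioned here can be sketched in plain Python: group raw readings into fixed-width time buckets and average each bucket. The readings and bucket width below are invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical per-second metric readings: (timestamp, value).
base = datetime(2021, 1, 1)
readings = [(base + timedelta(seconds=s), 40.0 + s) for s in range(120)]

def downsample(points, bucket_seconds=60):
    """Average raw points into fixed-width time buckets."""
    buckets = {}
    for ts, value in points:
        # Truncate each timestamp down to the start of its bucket.
        epoch = int(ts.timestamp())
        bucket = datetime.fromtimestamp(epoch - epoch % bucket_seconds)
        buckets.setdefault(bucket, []).append(value)
    return {b: mean(v) for b, v in sorted(buckets.items())}

rollup = downsample(readings)
# 120 per-second points collapse into 2 one-minute averages.
```

Real time series databases apply the same idea at scale, typically with retention policies that keep raw data briefly and downsampled rollups for longer.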
 
Whether sending messages, shopping in an app, or watching videos, modern consumers expect information and responsiveness to be near-instant in their apps and devices. From a developer’s perspective, this means clean code and a fast database. Apache Druid is a database built to power real-time analytic workloads for event-driven data, like user-faci…
 
Auren Hoffman is the CEO of SafeGraph. In this episode we discuss data as a service and more. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com The post DaaS with Auren Hoffman appeared first on Software Engineering Daily.…
 
Enterprise data warehouses store all company data in a single place to be accessed, queried, and analyzed. They’re essential for business operations because they support managing data from multiple sources, providing context, and have built-in analytics tools. While keeping a single source of truth is important, easily moving data from the warehous…
 
Prophecy is a complete low-code data engineering platform for the enterprise. Prophecy enables all your teams on Apache Spark with a unique low-code designer. As you visually build your dataflows, Prophecy generates high-quality Spark code on Git. Then, you can schedule Spark workflows with Prophecy’s low-code Airflow. Not only that, Prophecy p…
 
In the previous episode, Pulsar Revisited, we discussed how the company DataStax has added to their product stack Astra Streaming, their cloud-native messaging and event streaming service that’s built on top of Apache Pulsar. We discussed Apache Pulsar and the added features DataStax offers like injecting machine learning into your data streams and…
 
In 2003, Google developed a robust cluster management system called Borg. This enabled them to manage clusters with tens of thousands of machines, moving away from virtual machines and firmly into container management. Then, in 2014, Google open sourced Kubernetes, or K8s, a container orchestration system built on the lessons of Borg. Now, in 2021, CockroachDB is a distributed datab…
 
Big data analytics is the process of collecting data, processing and cleaning it, then analyzing it with techniques like data mining, predictive analytics, and deep learning. This process requires a suite of tools to operate efficiently. Data analytics can save companies money, drive product development, and give insight into the market and custome…
 
DevOps has shortened the development life cycle for countless applications and is embraced by companies around the world. But managing and monitoring multiple environments is still a major pain point, particularly when companies need to mix cloud and legacy systems. Knowing when services go down and quickly pinpointing the cause is essential for co…
 
Data science is an interdisciplinary field that combines strong technical skills with industry knowledge to perform a large range of jobs. Data scientists solve business questions with hands-on work cleaning and analyzing data, building machine learning models and applying algorithms, and generating dynamic visuals and tools to understand the world…
 
Big Data has exploded over the past decade as cloud computing and more efficient hardware made scaling essentially limitless. Products like Uber revolve entirely around analyzing data to provide rides. According to an EMC/IDC study, there was approximately 5.2TB of data for every person in 2020. That estimate was made before the transition to remote wor…
 
There are over 4 billion people using email. Many people using email for business communicate quick questions to colleagues, send repetitive, template-based information to potential customers and freshly hired employees, and repeat a lot of the same phrases. We actually repeat phrases in a lot of written formats. How often do you copy and paste the…
 
Continuous integration is a coding practice where engineers deliver incremental and frequent code changes to create higher quality software and collaborate more. Teams attempting to continuously integrate new code need a consistent and automated pipeline for reviewing, testing, and deploying the changes. Otherwise change requests pile up in the que…
 
ELT is a process for copying data from a source system into a target system. It stands for “Extract, Load, Transform” and starts with extracting a copy of data from the source location. It’s loaded into the target system like a data warehouse, and then it’s ready to be transformed into a usable format for things like modern cloud applications. The …
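The three ELT steps can be sketched with sqlite3 standing in for both the source system and the warehouse; the table and column names below are made up for illustration.

```python
import sqlite3

# Source system with some raw operational data (names are invented).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 399)])

warehouse = sqlite3.connect(":memory:")

# Extract: pull a raw copy of the data out of the source system.
rows = source.execute("SELECT id, amount_cents FROM orders").fetchall()

# Load: land it unmodified in a staging table in the target.
warehouse.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)

# Transform: reshape inside the warehouse, after loading.
warehouse.execute(
    "CREATE TABLE orders_usd AS "
    "SELECT id, amount_cents / 100.0 AS amount_usd FROM raw_orders"
)
```

The defining point, versus classic ETL, is that the transform runs last, using the warehouse's own compute rather than a separate transformation layer.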
 
Uber is one of many examples we’ve discussed on this show of a company that has changed the world with big data analysis. With over 8 million users, 1 billion Uber trips, and drivers in over 400 cities and 66 countries, Uber has redefined an entire industry in a very short time frame. It’s difficult to find precise details about Uber’s big data i…
 
The quantity and quality of a company’s data can mean the difference between major success and major failure. Companies like Google have used big data from their earliest days to steer their product suite in the direction consumers need. Other companies, like Apple, didn’t always use big data analytics to drive product design, but they do now. The c…
 
The company StreamSets is enabling DataOps practices in today’s enterprises. StreamSets is a data engineering platform designed to help engineers design, deploy, and operate smart data pipelines. StreamSets Data Collector is a codeless solution for designing pipelines, triggering CDC operations, and monitoring data in flight. StreamSets Transformer…
 
Delivering SaaS products involves a lot more than just building the product. SaaS management involves customer relationship management, licensing, renewals, maintaining software visibility, and general management of the technology portfolio. The company Blissfully helps businesses manage their SaaS products from within a complete IT platform wi…
 
Amundsen was started at Lyft and is the leading open-source data catalog, with the fastest-growing community and the most integrations. Amundsen enables you to search across your entire organization by text search, see automated and curated metadata, share context with co-workers, and learn from others by seeing the most common queries on a table or frequently…
 
Data exploration uses visual exploration to understand what is in a dataset and the characteristics of the data. Data scientists explore data to understand things like customer behavior and resource utilization. Common programming languages for data exploration include Python, R, and MATLAB. Doris Jung-Lin Lee is currently a Graduate Research…
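A first exploratory pass over a dataset often starts with simple summary statistics before any visualization; the sample of session durations below is invented for illustration.

```python
from statistics import mean, median

# Hypothetical session durations in seconds (made-up sample data).
durations = [12, 95, 30, 44, 280, 31, 18, 60]

summary = {
    "count": len(durations),
    "mean": mean(durations),
    "median": median(durations),
    "min": min(durations),
    "max": max(durations),
}

# A mean well above the median hints at a skewed distribution,
# and the max (280) stands far from the rest: worth a closer look
# before any modeling.
```

Libraries like pandas wrap exactly this kind of summary in one call, but the underlying questions, shape, center, spread, outliers, are the same.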
 
Cloud data warehouses are databases hosted in cloud environments. They provide typical benefits of the cloud like flexible data access, scalability, and performance. The company Firebolt provides a cloud data warehouse built for modern data environments. It decouples storage and compute to operate on top of existing data lakes like S3. It computes …
 
Apache Superset is an open-source, fast, lightweight and modern data exploration and visualization platform. It can connect to any SQL based data source through SQLAlchemy at petabyte scale. Its architecture is highly scalable and it ships with a wide array of visualizations. The company Preset provides a powerful, easy to use data exploration and …
 
Columnar databases store and retrieve columns of data rather than rows. Each block in a columnar database can hold up to 3 times as many records as row-based storage, so scanning a single column reads roughly a third of the data a row store would touch, among other advantages. The company Altinity is the leading enterprise provider for Clic…
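The row-versus-column contrast can be sketched in a few lines; the records are made up, and the in-memory layouts are toy stand-ins for real storage formats.

```python
# Row store: each record keeps all of its fields together.
rows = [
    {"id": 1, "price": 9.99, "qty": 3},
    {"id": 2, "price": 4.50, "qty": 1},
    {"id": 3, "price": 2.00, "qty": 7},
]

# Column store: each column is one contiguous array.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# Summing one column in the columnar layout touches only that array
# (3 values), instead of every field of every record (9 values).
total_qty = sum(columns["qty"])
```

Analytical queries usually touch a few columns of many rows, which is why this layout, plus per-column compression, pays off so heavily for warehouses.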
 
Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development. The framework manages business requirements like data lifecycle more efficiently and improves data quality. Some common use cases for Hudi are record-level insert, update, and delete, simplified file management, and nea…
 
An application programming interface, or API, is the connector between two applications. For example, a user interface that needs user data will call an endpoint, a special URL, with request parameters and receive the data back if the request is valid. Modern applications rely on APIs to send data back and forth to each other and save, ed…
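The endpoint-plus-parameters round trip described here can be sketched on the client side with the standard library; the endpoint URL and parameter names below are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint; not a real service.
BASE = "https://api.example.com/v1/users"

def build_request(user_id: int, fields: list) -> str:
    """Compose an endpoint URL with query parameters, as a client would."""
    query = urlencode({"id": user_id, "fields": ",".join(fields)})
    return f"{BASE}?{query}"

def validate_request(url: str) -> bool:
    """A server-side style check: the request needs an 'id' parameter."""
    params = parse_qs(urlparse(url).query)
    return "id" in params

url = build_request(42, ["name", "email"])
```

A real client would then send this URL with an HTTP library and parse the JSON response; the point here is just the contract: a known URL, named parameters, and validation on the other end.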
 
The traveling salesman problem is a classic challenge of finding the shortest and most efficient route for a person to take given a list of destinations. This is one of many real-world optimization problems that companies encounter. How should they schedule product distribution, or promote product bundles, or define sales territories? The answers t…
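A brute-force solution to the traveling salesman problem makes the combinatorial cost concrete: with n stops there are (n-1)! round trips to check, which is why real routing systems use heuristics instead. The distance matrix below is invented.

```python
from itertools import permutations

# Symmetric pairwise distances between four made-up stops.
DIST = {
    ("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 4,
    ("C", "D"): 8,
}

def dist(a, b):
    return DIST.get((a, b)) or DIST.get((b, a))

def shortest_tour(cities, start="A"):
    """Try every ordering of the remaining stops; keep the cheapest loop."""
    rest = [c for c in cities if c != start]
    best = None
    for perm in permutations(rest):
        route = (start, *perm, start)
        cost = sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))
        if best is None or cost < best[0]:
            best = (cost, route)
    return best

cost, route = shortest_tour(["A", "B", "C", "D"])
# Optimal loop for this matrix costs 23.
```

Exhaustive search is fine for a handful of stops but explodes quickly: ten stops already mean 362,880 candidate routes.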
 
In software engineering, telemetry is the data that is collected about your applications. Unlike logging, which is used in the development of apps to pinpoint errors and code flows, telemetry data includes all operational data including logs, metrics, events, traces, usage, and other analytical data. Companies usually visualize this information to …
 
Modern applications are increasingly built as large, distributed systems. A distributed system is a program where its components are located on different machines that communicate with one another to create a single cohesive app. Components may exist as multiple instances across “nodes,” the computers hosting them, which form clusters of nodes that…
 
Geospatial technology impacts every person who uses a smartphone, drives a car, or flies in airplanes. It refers to all of the technology used to acquire and interpret geographic information. In more advanced settings, geospatial technology is used for constructing dynamic maps, 3D visualizations, and scientific and governmental simulations. The co…
 
A data warehouse is a data management system that often contains large amounts of historical data and is used for business intelligence activities like analytics. It centralizes customer data from multiple sources to be an organization’s single source of truth. Getting the data from your data warehouse into the different applications used by your o…
 
A major change in the software industry is the expectation of automation. The infrastructure for deploying code, hosting it, and monitoring it is now being viewed as a fully automatable substrate. Equinix Metal has taken the bare metal servers that you would see in data centers and fitted them with supreme automation and repeatability. This movemen…
 
Digital communities have exponentially grown in importance ever since most of the world went remote. Basically every popular online forum, message board, chat app, and other online social aggregators were created before this new normal. Many of these platforms lack sufficient organization or are just outdated for a fully remote environment. If soci…
 
Product teams sometimes double as data teams. They struggle through import errors, scrub long and complicated data sheets for consistency, and map spreadsheet fields on step 3 in a long instruction document. Data structuring and synchronization is a very real problem that product teams regularly overcome. Flatfile uses AI-assisted data onboarding t…
 
Right now, more than 10 million people use notebooks like Jupyter in their workflow. Notebooks are open-source tools for creating and sharing documents with live code, equations, visualizations, and explanatory text. Notebooks like Jupyter have exploded in popularity over the past 5 years to become the standard tool for data science teams. They became es…
 
Email has become such a routine feature of knowledge work that we often take it, and the email clients we use for it, for granted. While advancements such as intelligent spam filtering have improved the experience, many email clients retain the same basic structure and offer a largely similar experience. Superhuman is building a modern email client…
 
Studies show that people in “maker” professions such as developers and writers are most productive when they can carve out dedicated time for focused work, without the frequent context-switching that comes with an irregular meeting schedule. Meetings and other non-development work are necessary parts of the job, but a team will be much more product…
 
A “co-location” center is a data center that leases out networking and compute infrastructure to retail clients. Co-location centers host clients with a wide variety of infrastructure strategies, from small retail customers, to medium-size teams running hybrid cloud models, to large corporate clients who prefer not to incur the capital cost of buil…
 
Over the past few years, the conventional wisdom around the value proposition of Big Data has begun to shift. While the prevailing attitude towards Big Data may once have been “bigger is better,” many organizations today recognize that broad-scale data collection comes with its own set of risks. Data privacy is becoming a hotly debated topic both i…
 
A data-driven organization collects a wide variety of data to help in strategic decision-making. The cost of storing large amounts and variety of data has dropped dramatically in the last two decades, but too much unstructured data may not improve decision-making, and can even lead to “analysis paralysis.” Organizations react by extracting the most…
 
Video calling over the internet has experienced explosive growth in the last decade. In 2010, surveys estimated that around 1 in 5 Americans had tried online video calling for any reason. By May of 2020, that number had nearly tripled. A significant factor in the growth of video calling has been an open-source project called WebRTC, or “Web Real-Ti…
 
In a distributed application, observability is key to handling incidents and building better, more stable software. Legacy monitoring methods were built to respond to predictable failure modes, and to aggregate high-level data like access speed, connectivity, and downtime. Observability, on the other hand, is a measure of how well you can infer the…
 
Kafka has achieved widespread popularity as a distributed queue and event streaming platform, with broad enterprise adoption and a billion-dollar company (Confluent) built around it. But could there be value in building a new platform from scratch? Redpanda is a streaming platform built to be compatible with Kafka, that does not require the JVM n…
 
GraphQL has changed the common design patterns for the interface between backend and frontend. This is usually achieved by the presence of a GraphQL server, which interprets and federates a query from the frontend to the backend server infrastructure. Dgraph is a distributed graph database with native GraphQL support. Manish Jain is a founder of Dg…
 
Agriculture infrastructure allows crops such as corn, soy, and wheat to move from large-scale farms to consumers all around the world. The relevant players in agricultural infrastructure include growers, shippers, and planners. These groups need new technology to interact more efficiently. Growers need to be able to connect more smoothly…
 
Data lakes and data warehouses store high volumes of multidimensional data. Data sources for these pieces of infrastructure can become unreliable for a variety of reasons. When data sources break, it can cause downstream problems. One company working to solve the problem of data reliability is Monte Carlo Data. Barr Moses and Lior Gavish are founde…
 
When Tim Wagner worked at Amazon, he invented AWS Lambda. After working on the early serverless infrastructure, he joined Coinbase and worked as VP of Engineering. Since leaving Coinbase, he has started a new company called Vendia. Vendia combines his learnings from the serverless space with the innovations around blockchains to work on the problem…
 
Originally published July 19, 2019 In 2011, Facebook had begun to focus its efforts on mobile development. Mobile phones did not have access to reliable, high bandwidth connections, and the Facebook engineering team needed to find a solution to improve the request latency between mobile clients and the backend Facebook infrastructure. One source of…
 