Build an AI Tech News Aggregator: Azure Functions & Claude

There’s a lot of noise on the internet. Between Reddit, Hacker News, and tech blogs, keeping up with what actually matters in enterprise software is a full-time job. So I built a fully automated system that does it for me. It runs in the cloud, is powered by AI, and was deployed end-to-end in less than two hours using Claude Code.

Here’s how.

What We Built (Mostly What Claude Did)

A C# Azure Function that runs every hour and:

  1. Fetches posts from configurable Reddit subreddits and Hacker News
  2. Filters for recency: only posts from the last 7 days
  3. Deduplicates across runs: never evaluates the same URL twice
  4. Applies an AI editorial filter: Claude decides what’s genuinely newsworthy
  5. Writes curated results to Azure Blob Storage as timestamped JSON

The output is clean, structured JSON ready to feed into a newsletter, dashboard, or notification system.
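As a minimal sketch of the last step, the snippet below builds a timestamped blob name and serializes the curated posts. It is written in Python for brevity rather than the project's C#. The function names, the post fields (title, url, source), and the use of UTC are assumptions for illustration. Only the path layout (year/month/day/hour-minute-second) follows the blob name shown in the sample run.

```python
import json
from datetime import datetime, timezone


def blob_name(run_time: datetime) -> str:
    """Build a blob path like 2026/03/24/09-00-01.json (UTC is an assumption)."""
    return run_time.astimezone(timezone.utc).strftime("%Y/%m/%d/%H-%M-%S.json")


def serialize_posts(posts: list[dict]) -> str:
    """Serialize curated posts as indented JSON; the field names are hypothetical."""
    return json.dumps(posts, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    run = datetime(2026, 3, 24, 9, 0, 1, tzinfo=timezone.utc)
    print(blob_name(run))
    print(serialize_posts([{"title": "Example post", "url": "https://example.com", "source": "hn"}]))
```

Date-based path segments keep each run in its own blob and make it easy to list a single day's output by prefix.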

The Architecture

The system has three layers: data collection, AI filtering, and persistence.

Reddit RSS feeds ──┐
                   ├─► Aggregator Function ─► Claude AI Filter ─► Blob Storage
HN Firebase API ───┘         │
                             └─► State Store (seen URLs)

Tech Stack

Concern         Choice
Runtime         Azure Functions v4, .NET 8 isolated worker
Reddit data     Public Atom/RSS feed (r/{sub}/top.rss)
HN data         Firebase REST API
AI filtering    Anthropic Claude (claude-opus-4-6) via raw HttpClient
Storage         Azure Blob Storage
Schedule        NCRONTAB timer trigger

Interesting Engineering Decisions

Reddit: RSS over JSON API

The Reddit JSON API (/top.json) started returning 403s without authentication. Rather than deal with OAuth, we switched to Reddit’s public Atom/RSS feed (no credentials required) and parsed it with System.Xml.Linq in a handful of lines. Simple wins.
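To illustrate how little code the feed approach needs, here is a rough Python equivalent of the parsing step (the project itself uses System.Xml.Linq in C#). It reads Atom entries and applies the 7-day recency filter. The function names and the post dictionary shape are assumptions, and the sketch assumes every timestamp carries a UTC offset.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # standard Atom namespace


def parse_atom_feed(xml_text: str) -> list[dict]:
    """Turn Atom <entry> elements into simple post dictionaries."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM)
        if link is None or not updated:
            continue  # skip entries without a URL or a timestamp
        posts.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "url": link.get("href", ""),
            # Normalize a trailing "Z" so older Python versions can parse it.
            "updated": datetime.fromisoformat(updated.replace("Z", "+00:00")),
        })
    return posts


def recent_only(posts: list[dict], now: datetime, max_age_days: int = 7) -> list[dict]:
    """Keep posts whose timestamp falls within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [p for p in posts if p["updated"] >= cutoff]


if __name__ == "__main__":
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<entry><title>Hello</title><link href="https://example.com/a"/>'
        "<updated>2026-03-23T08:00:00+00:00</updated></entry></feed>"
    )
    now = datetime(2026, 3, 24, 9, 0, tzinfo=timezone.utc)
    print(recent_only(parse_atom_feed(sample), now))
```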

Claude as an Editorial Filter

Instead of writing brittle keyword heuristics to judge whether a post is “real tech news,” we hand that job to Claude with a carefully crafted system prompt based on Editorial Guidelines:

A post qualifies if it is relevant to enterprise software development AND meets at least one of the following: Change, Innovation, or Emergent Ideas, and is not a minor patch release, pure marketing, or clickbait.

Claude receives posts in batches of 25, returns a JSON array of qualifying indices, and we map those back to posts. If the API is unreachable, the batch passes through unfiltered as a deliberate fail-safe so the pipeline never breaks.

We used structured JSON output (output_config.format.type = “json_schema”) to guarantee a parseable response every time, no regex needed.
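The batching and index-mapping logic can be sketched roughly as follows, in Python rather than the project's C#. The ask_model callable is a hypothetical stand-in for the HTTP call to Claude and is assumed to return a JSON array of indices as text. Treating a malformed response the same way as an unreachable API (passing the batch through) is an assumption of this sketch, not something the original states.

```python
import json
from typing import Callable

BATCH_SIZE = 25  # batch size mentioned in the text


def chunk(posts: list[dict], size: int = BATCH_SIZE) -> list[list[dict]]:
    """Split posts into consecutive batches of at most `size` items."""
    return [posts[i:i + size] for i in range(0, len(posts), size)]


def filter_batch(batch: list[dict], ask_model: Callable[[list[dict]], str]) -> list[dict]:
    """Keep the posts whose indices the model returned; fail open on any error."""
    try:
        indices = json.loads(ask_model(batch))
        if not isinstance(indices, list):
            raise ValueError("expected a JSON array of indices")
    except Exception:
        # Fail-safe: if the call or the parsing fails, pass the batch through unfiltered.
        return list(batch)
    # Ignore anything that is not a valid position in this batch.
    return [batch[i] for i in indices if isinstance(i, int) and 0 <= i < len(batch)]


def filter_all(posts: list[dict], ask_model: Callable[[list[dict]], str]) -> list[dict]:
    """Run every batch through the filter and concatenate the survivors."""
    kept: list[dict] = []
    for batch in chunk(posts):
        kept.extend(filter_batch(batch, ask_model))
    return kept
```

Because indices are local to each batch, the mapping back to posts never needs the model to echo titles or URLs, which keeps responses short.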

Deduplication Without a Database

To prevent re-evaluating the same URLs across hourly runs (and paying for unnecessary AI API calls), we persist a rolling state file — state/seen-urls.json — in Blob Storage. On each run:

  • Load seen URLs into a HashSet<string> for O(1) lookup
  • Filter new posts against it
  • After filtering, mark all new posts as seen (not just the ones that passed the AI filter — rejected posts shouldn’t be retried)
  • Prune entries older than 7 days to keep the file small

No database, no Redis, no infrastructure overhead. A blob file is enough.
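The steps above can be sketched as below, in Python instead of the project's C#. The state format is an assumption: a JSON object mapping each URL to the ISO timestamp at which it was first seen, which is one simple way to support pruning by age. The helper names are hypothetical, and skipping duplicate URLs within a single run is an extra precaution the original does not mention.

```python
import json
from datetime import datetime, timedelta, timezone


def load_seen(state_json: str) -> dict[str, datetime]:
    """Parse the state blob into {url: first_seen}; an empty blob means no state yet."""
    raw = json.loads(state_json) if state_json else {}
    return {url: datetime.fromisoformat(ts) for url, ts in raw.items()}


def split_new(posts: list[dict], seen: dict[str, datetime]) -> list[dict]:
    """Return posts whose URL was not seen in earlier runs or earlier in this run."""
    fresh, in_run = [], set()
    for post in posts:
        url = post["url"]
        if url in seen or url in in_run:
            continue
        in_run.add(url)
        fresh.append(post)
    return fresh


def mark_seen_and_prune(seen: dict[str, datetime], new_posts: list[dict],
                        now: datetime, max_age_days: int = 7) -> dict[str, datetime]:
    """Record every new post (kept or rejected) and drop entries older than the cutoff."""
    updated = dict(seen)
    for post in new_posts:
        updated[post["url"]] = now
    cutoff = now - timedelta(days=max_age_days)
    return {url: ts for url, ts in updated.items() if ts >= cutoff}


def dump_seen(seen: dict[str, datetime]) -> str:
    """Serialize the state back to JSON for upload to Blob Storage."""
    return json.dumps({url: ts.isoformat() for url, ts in seen.items()})
```

A run would call load_seen on the downloaded blob, split_new before the AI filter, and mark_seen_and_prune plus dump_seen afterwards.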

The AI Filter in Practice

A typical hourly run might look like this:

Fetched 312 posts from the last 7 days.
Deduplication: 47 new / 265 already seen (skipped).
Running news quality filter on 47 new posts…
News filter: 11/25 posts passed.
News filter: 9/22 posts passed.
Filter complete: 20/47 posts kept.
20 posts saved to 2026/03/24/09-00-01.json

Out of 312 raw posts, 20 make it through. That’s the kind of signal-to-noise ratio that makes a curated feed actually worth reading.

Deployment

The whole thing deploys with two commands:

# Push app settings (API keys, schedule, etc.)
az functionapp config appsettings set \
  --name FuncNewsAggregation \
  --resource-group rg-news-aggregators \
  --settings @appsettings.json

# Publish the function
func azure functionapp publish FuncNewsAggregation --dotnet-isolated

Done. The function is live, running on Azure’s infrastructure, costing pennies per day.

What’s Next

A few natural extensions:

  • Email or Slack digest — trigger a Logic App when a new blob is written
  • Web frontend — serve the JSON blobs as a read-only news feed
  • Scoring — weight HN scores more heavily now that RSS drops Reddit scores
  • More sources — dev.to, lobste.rs, or custom RSS feeds are easy to add

Takeaways

The most interesting lesson here isn’t the code; it’s the division of labor. Deterministic logic handles the mechanical work: fetching, deduplicating, and scheduling. The judgment call, “Is this actually news?”, goes to the model.

That separation keeps the system simple, cheap to run, and easy to adjust. Change the system prompt, and you change the editorial policy. No retraining, no feature engineering.

Two hours from idea to deployed function. That’s the pace at which you can build now.


All source code is C# targeting .NET 8. The function runs on an Azure Consumption plan and incurs roughly $0 in hourly costs, well within the free tier.

My Experience with Microsoft Excel During IT Projects

During my extensive career in IT, I often ran into Microsoft Excel. One of my first projects was leveraging Excel to create site survey documentation for a telco. I built a solution with Visual Basic for Applications (VBA), a programming language for Excel and the other Microsoft Office programs such as Word and PowerPoint. With VBA, I could generate multiple worksheets in a Workbook filled with static and dynamic data, taken from user input or a configuration file. Once populated with data and rendered, the Workbook was converted to a Portable Document Format (PDF) file.

Over the last couple of years, I had other projects involving Excel. In this post, I will dive into the details of two implementations (use cases) concerning Excel Workbooks. One project involved processing Excel files in a container running on an Azure Kubernetes Service (AKS) cluster; the other involved generating an Excel Workbook for reporting purposes, orchestrated by an Azure Logic App.

Use Case – Processing an Excel Workbook in a Container

A few years ago, I worked on a project for a client that required processing a standardized Excel template that their customers could submit for enrichment. The data in the Excel file needed to end up in a database for further processing (enrichment) so that it could be presented back to them. The diagram below shows the process of a customer uploading an Excel file via an API. The API would store the Excel file in an Azure storage container and trigger code inside a container responsible for processing it (parsing the Excel to JSON). A second container had code to persist the data in SQL Azure.

Use Case 1

The code snippet (as an example) responsible for processing the Excel file:

For creating the Excel Workbook and its sheet with data, I found the EPPlus library, a spreadsheet library for the .NET framework and .NET core. In the project, I imported the EPPlus NuGet package – specifically, I used the ExcelPackage class.
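As a rough illustration of the parsing step (Excel to JSON), and explicitly not the EPPlus-based C# code used in the project, the sketch below reads the first worksheet of an .xlsx file with the Python standard library only. It relies on the fact that an .xlsx file is a ZIP archive of SpreadsheetML XML parts. The function names are hypothetical, and the sketch assumes the sheet lives at xl/worksheets/sheet1.xml, that the first row holds the column headers, and that every cell carries a reference attribute such as A1. Numbers and dates are left as their stored text.

```python
import io
import json
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = f"{{{NS['m']}}}t"  # fully qualified <t> (text) element name


def _shared_strings(book: zipfile.ZipFile) -> list[str]:
    """Load the shared string table, joining text runs within each entry."""
    if "xl/sharedStrings.xml" not in book.namelist():
        return []
    root = ET.fromstring(book.read("xl/sharedStrings.xml"))
    return ["".join(t.text or "" for t in si.iter(T_TAG)) for si in root.findall("m:si", NS)]


def _cell_value(cell: ET.Element, strings: list[str]):
    """Resolve a cell to its text: inline string, shared string, or raw value."""
    kind = cell.get("t")
    if kind == "inlineStr":
        return "".join(t.text or "" for t in cell.iter(T_TAG))
    raw = cell.findtext("m:v", default=None, namespaces=NS)
    if raw is not None and kind == "s":
        return strings[int(raw)]  # the value is an index into the shared strings
    return raw


def sheet_to_records(xlsx_bytes: bytes, sheet_path: str = "xl/worksheets/sheet1.xml") -> list[dict]:
    """Convert a worksheet into a list of {header: value} dictionaries."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as book:
        strings = _shared_strings(book)
        root = ET.fromstring(book.read(sheet_path))
    rows = []
    for row in root.findall("m:sheetData/m:row", NS):
        cells = {}
        for cell in row.findall("m:c", NS):
            match = re.match(r"[A-Z]+", cell.get("r", ""))
            if match is None:
                continue  # cells without a reference are skipped in this sketch
            cells[match.group()] = _cell_value(cell, strings)
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    # Map each column letter to its header name; missing cells become None.
    return [{name: r.get(col) for col, name in header.items()} for r in body]


def sheet_to_json(xlsx_bytes: bytes) -> str:
    """Serialize the worksheet records as JSON text."""
    return json.dumps(sheet_to_records(xlsx_bytes), indent=2)
```

In practice a dedicated library (EPPlus in the project described here) also handles cell types, multiple sheets, and styles, which this sketch deliberately leaves out.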

Now let’s move on to the second use case.

Use Case – Generating an Excel Report in Azure

In a recent project for another customer, I had to generate a report of products inside D365 as an Excel file (a workbook containing a worksheet with data). The file had to be written to an on-premises file share to allow the target system to consume it. The solution I built used a Logic App to orchestrate the process of generating the Excel file.

Below you see a diagram visualizing the steps from triggering a package in D365 until the writing of the Excel file in a file share on-premises.

Use Case 2

The steps are:

  1. Logic App triggering a package in D365 (schedule trigger).
  2. Executing the package to retrieve and export data to a SQL Azure Database.
  3. Query by the same Logic App that triggered the package to retrieve the data from the SQL Azure Database.
  4. Passing the data (the result of the query) to an Azure Function, which will create an Excel Workbook with one sheet containing the data in a given format. The function will write the Excel file to an Azure Storage container.
  5. Subsequently, the Logic App will download and write the file to the on-premises file share (leveraging the On-Premises Data Gateway – OPDGW).

The sequence diagram below shows the flow (orchestration) of the process.

Sequence diagram

And below is a part of the Logic App workflow definition resembling the sequence diagram above.

The code snippet (as an example) in the Azure Function responsible for creating the Excel file:

For the creation of the Excel Workbook and sheet with data, I used NPOI – an open-source project that can help you read and write XLS, DOC, and PPT files. In Visual Studio, I imported the NPOI NuGet package. The package covers most of the features of Excel, such as styling, formatting, data formulas, and extracting images. In addition, it does not require the presence of Microsoft Office. Furthermore, I used the StorageAccountClass to write the Excel file.

Conclusion

Microsoft Excel is a popular product available for decades and used by millions of people ranging from businesses heavily relying on Excel to home users for basic administration. Moreover, in IT, Excel is used in many scenarios such as project planning, environment overviews, project member administration, reporting, etc. As said earlier, I have encountered Microsoft Excel various times in my career and built solutions involving the product. The two use-cases are examples of that.

In the first example, I faced a challenge finding a library that supported .NET Core 2.0. I found EPPlus, which did the job for us after experimenting with it first. In the second example, the cost and simplicity were the benefits of using the NPOI library. The project had constraints on using solutions that carry a cost (subscription-based or one-off). Furthermore, the solution proved to be stable enough to generate the report.

Note that the libraries I found are not the only ones available to work with Excel. There are others, for instance SpreadsheetGear, which are listed here. In Logic Apps, you can find connectors that can do the job for you, such as CloudMersive (an API you connect to in order to convert, for instance, CSV to Excel).

I do feel that code gives you the most flexibility when it comes to dealing with Excel. A standard, off-the-shelf product can do the job for you; however, cost (licensing) or other considerations might be involved. What you choose in your scenarios depends on the given context and requirements.