LangChain Integration for Vector Support for Azure SQL and SQL database in Microsoft Fabric
Microsoft SQL now supports native vector search capabilities in Azure SQL and SQL database in Microsoft Fabric. We also released the langchain-sqlserver package, enabling the management of SQL Server as a Vectorstore in LangChain. In this step-by-step tutorial, we will show […]
They cite Kaggle as the dataset source, but that entry reads:
About Dataset
This dataset contains 7 txt files of 7 books of Harry Potter. First, I downloaded the ebooks and then converted them to txt files.
I removed the front page and the ending lines of the books to make it more clean.
This dataset can be used for NLP tasks like text processing, text mining, etc.
“Your honor, this isn’t pirated video content. It’s a series of MP4 files that have had the commercials removed to make it more clean, so now it’s a dataset I can use to train video models.”
So the dataset is public domain…? Is that how copyright works?