be careful with your hyphens

Quick Monday lesson: not all characters are the same, even when they look identical.

is not the same as

Don’t believe me? Copy and paste both here and check the ASCII value.
https://www.duplichecker.com/text-to-ascii.php

The first one has an ASCII value of: 226 128 144
The second one has an ASCII value of: 45

The difference is one is a hyphen (Unicode Character: U+2010) and one is a hyphen-minus (Unicode Character: U+002D), but many fonts don’t make them look any different.

But, try to use one instead of the other in a tool like Power BI, when searching, and one will fail while the other will succeed. By contrast, SQL Server (in the SQL_Latin1_General_CP1_CI_AS collation) doesn’t care.

Here are a bunch of similar looking things, which are different:
https://www.compart.com/en/unicode/category/Pd

🤓

A Case for Columnstore

Let me start by saying this is by no means a technical document. There are SQL Server giants out there who are doing that excellently. This post is just a recap for myself, and a brief intro for people not in the the industry, maybe, if anyone cases. When I got started with databases, these were not a thing, and they hadn’t really come up for me in the wild, other than theoretically, until recently when I was completely blown away by the performance advantage in my use case. So, I wanted to pay a brief tribute here, for my memory, of the occasion.

I recently had to move a large fact table into a database. After doing so, I really began to appreciate the value of a clustered columnstore index. The table had approximately 2.15 billion rows and was comprised of a bigint (primary key), an int (foreign key), and 2 nvarchar columns, one for key and one for value. The table had to be quickly restored from a replicated production database, which contained the most recent data, and then batch-backfilled from a restored backup of a data warehouse table. The process took most of the day to complete. Once it had finally finished, however, I discovered that the table was basically unusable. The process of replication gave the table the index from the production database, which was a primary key on the bigint ID column. This had worked fine for the 32 million rows in the production database but was impossible to query for the 2.15 billion rows on the reporting server. To give an example, a COUNT_BIG(Id) query had to eventually be killed after an hour because it was holding up replication and other reporting tasks (never mind the fact that this information could more quickly be obtained by querying the sys.partitions Catalong View instead, or the sys.dm_db_partition_stats DMV, since those defeat the purpose of the example and the point of this discussion). There is no telling how long this simple query would have taken had I not killed it. Everything had come to a grinding halt in the attempt. Skipping ahead to the contrasted scenario, once a clustered columnstore index was implemented on the table, the same COUNT_BIG(Id) query finished in 00:00:01, while rows were also being inserted into the table by replication and while the table was also under load from reporting queries.

Why was it so much more dramatically efficient? A columnstore index uses batch mode execution, also known as vector-based execution, which allows SQL Server to process multiple rows, each stored in a separate memory area, at the same time. A columnstore index also reads fewer rows by using rowgroup elimination, in which by storing the min and max of each rowsegment it eliminates the need to read any unnecessary rowgroups. Column elimination, also known as segment elimination, is also possible since, in a columnstore index, the data is organized and compressed by column, so if a wide table only uses several columns in the query, those are the only columns that need to be read from disk. With rowstore, the column values for each row are stored together and are not easily separated. Finally, there can be a huge compression advantage. A columnstore index can provide about 10 times the level of data compression to a B-tree index, the self-balancing tree which is the typical data structure used for indexing row-oriented data pages. This high level of compression allows for a smaller memory footprint. The compression is achieved by columns storing values in rowgroups of similar values, greatly reducing I/O.

Why not just always use columnstore then? A rowstore (or B-tree) table might perform better. Columnstore is best for very large tables, or tables in which you can not predict the filter columns, or if you don’t need all the columns. It is not ideal for tables with frequent updates. Here is some design guidance from Microsoft on the topic.

A few other useful links about the topic:

Some Database-Related Links

Table that lists the past 12 months in SQL Server

Calendar tables are incredibly useful for all sorts of things and most people use them in BI and in databases, but what I also find seems to come up is the need for just a listing of the past 12 months. I was searching and searching and couldn’t find it in the way that I wanted it, so I pieced together various things I found and added some things that I already knew and made a table-valued function to return what I want. You can call the function with something like:

SELECT * FROM GetPrevious12Months()

(although you would probably not do it that way, but you get the idea)
It will return a table like this:

It is currently December of 2022, so this looks like it just returns the months of the year, but what it does is actually go back 12 months from whatever the current month may be. So, if it were currently July-2022 then row 1 would be July-2022 and row 12 would be June-2021.

I made it because I wanted to use it to populate a dropdown list in a a Retool app I was making, and also because this kind of thing tends to come up anyway for me periodically (although I usually just deal with it in Power BI instead.) I could find bits and pieces of what I wanted, but not everything I wanted, so I put it together and here it is. It makes use of a recursive CTE and also the MONTHNAME() ODBC Scalar Function.

--function GetPrevious12Months
IF OBJECT_ID('GetPrevious12Months') IS NULL
	EXEC ('CREATE FUNCTION GetPrevious12Months AS;')
GO

ALTER FUNCTION [dbo].[GetPrevious12Months] ()
RETURNS @GetMonths TABLE (
	MonthName VARCHAR(9)
	,MonthNameYear VARCHAR(14)
	,FOTM DATE
	,EOM DATE
	,YearMonth VARCHAR(7)
	)
AS
BEGIN
	DECLARE @DateFrom DATE = DATEADD(DAY, 1, EOMONTH(GETDATE(), - 1))
	DECLARE @DateTo DATE = DATEADD(YEAR, - 1, DATEADD(day, 1, EOMONTH(@DateFrom)));

	WITH DateRanges
	AS (
		SELECT {fn MONTHNAME(@DateFrom) } AS MonthName
			,@DateFrom AS FOTM
		
		UNION ALL
		
		SELECT {fn MONTHNAME(DATEADD(MONTH, - 1, EOMONTH(FOTM))) } AS MonthName
			,DATEADD(MONTH, - 1, DATEADD(DAY, 1, EOMONTH(FOTM, - 1))) AS FOTM
		FROM DateRanges
		WHERE FOTM > @DateTo
		)
	INSERT INTO @GetMonths
	SELECT [MonthName]
		,[MonthName] + '-' + CAST(DATEPART(year, FOTM) AS VARCHAR)
		,FOTM
		,EOMONTH(FOTM)
		,CONVERT(VARCHAR(7), FOTM, (23))
	FROM DateRanges
	OPTION (MAXRECURSION 0)

	RETURN
END
GO